Wednesday, October 4, 2017

Exploratory data analysis (EDA)

This is the start of a series of posts in which we will look at statistical analysis, predominantly using Python. Statistical analysis involves collecting and scrutinizing every data sample drawn from a set of items (a population).
Statistical analysis can be broken down into the following steps:
· Explore the relation of the data to the underlying population (EDA).
· Create a model to summarize understanding of how the data relates to the underlying population.
· Prove (or disprove) the validity of the model.
· Employ predictive analytics to run scenarios that will help guide future actions.
The goal of statistical analysis is to identify trends. A retail business, for example, might use statistical analysis to find patterns in unstructured and semi-structured customer data that can be used to create a more positive customer experience and increase sales.

In this post we will go through exploratory data analysis, which is the first step in most data analysis work.
Exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model may or may not be used, but EDA is primarily about seeing what the data can tell us beyond formal modeling or hypothesis testing. In short, the process of organizing, plotting and summarizing a data set is known as EDA. It often involves converting tabular data into graphical form, and when done well, a graphical representation allows much faster interpretation of the data.
In this post we will look at a couple of ways to visualize data with the intention of gaining useful insight from it, using Python with its workhorse plotting library matplotlib, and also seaborn. The latter is built on top of matplotlib and offers a simple API for advanced visualizations and better-looking plots by default.
1. Plotting a Histogram:
    ·  A histogram is essentially a plot of the frequency distribution of data grouped into bins. Suppose we have carefully measured the anatomical properties of samples from three species of iris: Iris setosa, Iris versicolor and Iris virginica. This is the popular iris dataset commonly used in data science. Here, we will work with the petal length measurements.
    We have three NumPy arrays, one per species, each containing petal lengths.
    The following code plots a histogram of versicolor petal lengths:

    # Import plotting modules

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Set default Seaborn style

    sns.set()

    # Plot histogram of versicolor petal lengths

    plt.hist(versicolor_petal_length)

    # Label axes

    plt.xlabel('petal length (cm)')
    plt.ylabel('count')

    # Show histogram

    plt.show()

    Running this code produces the following histogram.

    The histogram shows that versicolor petal lengths range from roughly 3.0 to 5.0 cm, and that the majority of the 50 samples are greater than 3.5 cm.

    We can also choose the number of bins, which can give a better picture of the data. By default plt.hist() uses 10 bins, but we can set the number explicitly with the bins argument, as shown below.

    # Plot the histogram with 7 bins
    plt.hist(versicolor_petal_length, bins = 7)

    # Label axes

    plt.xlabel('petal length (cm)')
    plt.ylabel('count')

    # Show histogram

    plt.show()

    Running this code produces the following histogram.
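Under the hood, plt.hist() computes its bar heights with numpy.histogram: it splits the range of the data into equal-width bins and counts the values falling in each. The minimal sketch below assumes a small made-up sample of petal lengths (not the real iris measurements) and shows the counts and bin edges for the default 10 bins and for 7 bins.

```python
import numpy as np

# Hypothetical sample of petal lengths in cm (made up for illustration)
petal_length = np.array([3.0, 3.3, 3.5, 3.8, 3.9, 4.0, 4.0, 4.1, 4.2, 4.4,
                         4.5, 4.5, 4.6, 4.7, 4.7, 4.8, 4.9, 5.0, 5.1, 4.3])

# Default: 10 equal-width bins spanning the minimum to the maximum value
counts_10, edges_10 = np.histogram(petal_length)

# Explicit bin count, matching plt.hist(..., bins=7)
counts_7, edges_7 = np.histogram(petal_length, bins=7)

# n bins always have n + 1 edges, and every value lands in exactly one bin
print(counts_10, edges_10)
print(counts_7, edges_7)
```

Fewer bins give taller, coarser bars; more bins show finer detail but each bar rests on fewer points.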

    The biggest disadvantage of a histogram is that the same data may be interpreted differently depending on the choice of bins. This is known as bin bias.
    2. Plotting a Bee Swarm:
    • Let's make a bee swarm plot of the iris petal lengths. The x-axis should contain each of the three species, and the y-axis the petal lengths. The data is in a DataFrame df with the columns 'sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)' and 'species'.
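Bin bias is easy to demonstrate. In the sketch below (made-up values), the same five measurements are binned twice with the same bin width of 1.0, only shifted by 0.5, and the apparent shape of the distribution flips.

```python
import numpy as np

# Hypothetical measurements (made up for illustration)
data = np.array([1.0, 1.9, 2.1, 2.9, 3.1])

# Two sets of bin edges with the same width (1.0), shifted by 0.5
edges_a = [1.0, 2.0, 3.0, 4.0]
edges_b = [0.5, 1.5, 2.5, 3.5]

# np.histogram accepts explicit edges; the last bin includes its right edge
counts_a, _ = np.histogram(data, bins=edges_a)
counts_b, _ = np.histogram(data, bins=edges_b)

print(counts_a)  # [2 2 1]: the data look left-heavy
print(counts_b)  # [1 2 2]: the same data now look right-heavy
```

Plots that show every data point, such as the bee swarm plot and the ECDF, avoid this choice altogether.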

      # Create bee swarm plot with Seaborn's default settings
      sns.swarmplot(x='species',y='petal length (cm)',data=df)

      # Label the axes
      plt.xlabel('species')
      plt.ylabel('petal length (cm)')

      # Show the plot
      plt.show()
    Running this code produces the following bee swarm plot.
    We can clearly see from the plot that virginica petals tend to be the longest, and setosa petals tend to be the shortest of the three species.
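The visual impression from the swarm plot can be backed up with a quick numeric summary. The minimal sketch below assumes three small made-up arrays standing in for setosa_petal_length, versicolor_petal_length and virginica_petal_length, and compares the median petal length of each species.

```python
import numpy as np

# Hypothetical stand-ins for the per-species arrays (values made up)
petal_lengths = {
    "setosa": np.array([1.3, 1.4, 1.4, 1.5, 1.6]),
    "versicolor": np.array([3.9, 4.2, 4.4, 4.5, 4.7]),
    "virginica": np.array([5.1, 5.5, 5.6, 5.8, 6.1]),
}

# Median petal length per species (robust to a few outlying points)
medians = {name: float(np.median(values)) for name, values in petal_lengths.items()}

# Species ordered from shortest to longest median petal length
order = sorted(medians, key=medians.get)
print(medians)
print(order)
```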
    Suppose we want to find the percentage of versicolor flowers with a petal length less than 4 cm.
    3. ECDF (Empirical Cumulative Distribution Function):
    The empirical distribution function is an estimate of the cumulative distribution function that generated the points in the sample.
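In symbols, for a sample x_1, ..., x_n the ECDF at a value x is the fraction of sample points less than or equal to x. When the sorted values are distinct, the i-th smallest value has ECDF height i/n, which is exactly what the ecdf() function defined next computes with np.arange(1, n+1) / n.

```latex
\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\},
\qquad
\hat{F}_n\big(x_{(i)}\big) = \frac{i}{n}
\quad \text{for sorted distinct values } x_{(1)} < x_{(2)} < \dots < x_{(n)}
```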
    • Let's define an ecdf() function that we can reuse to make ECDF plots again and again.
      import numpy as np

      def ecdf(data):
          """Compute ECDF for a one-dimensional array of measurements."""

          # Number of data points: n
          n = len(data)

          # x-data for the ECDF: x
          x = np.sort(data)

          # y-data for the ECDF: y
          y = np.arange(1, n+1) / n

          return x, y
      We will now use our ecdf() function to compute the ECDF for the petal lengths of versicolor flowers.
      # Compute ECDF for versicolor data: x_vers, y_vers

      x_vers, y_vers = ecdf(versicolor_petal_length)



      # Generate plot using above x_vers, y_vers which we found by ecdf() function.
      plt.plot(x_vers,y_vers,marker='.',linestyle = 'none')

      # Make the margins nice
      plt.margins(0.02)

      # Label the axes
      plt.xlabel('versicolor petal length')
      plt.ylabel('ECDF')


      # Display the plot
      plt.show()
      Running this code produces the following ECDF plot.


      From the plot we can say that around 30% of versicolor petal lengths are less than 4 cm.
      We can also plot the ECDFs of all three species on a single plot for easier comparison.
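Reading a percentage off the plot by eye is approximate. The minimal sketch below uses a hypothetical helper, ecdf_at() (not part of NumPy or of the original post), that evaluates the ECDF exactly with np.searchsorted on a made-up array. Note that the ECDF counts values less than or equal to the threshold; passing strict=True switches to a strict "less than", which matters when values tie with the threshold.

```python
import numpy as np

def ecdf_at(data, value, strict=False):
    """Fraction of data points <= value (or < value if strict=True)."""
    sorted_data = np.sort(data)
    # side='right' counts values equal to `value`; side='left' excludes them
    side = "left" if strict else "right"
    return np.searchsorted(sorted_data, value, side=side) / len(sorted_data)

# Hypothetical petal lengths in cm (made up for illustration)
sample = np.array([3.3, 3.5, 3.9, 4.0, 4.0, 4.2, 4.5, 4.7, 4.9, 5.1])

print(ecdf_at(sample, 4.0))               # 0.5: five of ten values are <= 4.0
print(ecdf_at(sample, 4.0, strict=True))  # 0.3: three of ten values are < 4.0
```

With real measurements rounded to 0.1 cm, ties at the threshold are common, so it is worth deciding which of the two readings is meant.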

      # Compute ECDFs
      x_set, y_set = ecdf(setosa_petal_length)
      x_vers, y_vers = ecdf(versicolor_petal_length)
      x_virg, y_virg = ecdf(virginica_petal_length )

      # Plot all ECDFs on the same plot
      plt.plot(x_set, y_set,marker='.',linestyle='none')
      plt.plot(x_vers, y_vers,marker='.',linestyle='none')
      plt.plot(x_virg, y_virg,marker='.',linestyle='none')

      # Make nice margins
      plt.margins(0.02)

      # Annotate the plot
      plt.legend(('setosa', 'versicolor', 'virginica'), loc='lower right')
      plt.xlabel('petal length (cm)')
      plt.ylabel('ECDF')

      # Display the plot
      plt.show()

      Running this code produces the following ECDF plot.

      We can see that about 40% of setosa, versicolor and virginica petal lengths are less than 1.5 cm, 4.5 cm and 5.5 cm, respectively.
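Going the other way, from a fraction to a value, is what percentiles do: the 40th percentile is the value below which roughly 40% of the data lie. A minimal sketch using np.percentile on a made-up array (the real per-species arrays would be passed the same way):

```python
import numpy as np

# Hypothetical petal lengths in cm (made up for illustration)
sample = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# 40th percentile with NumPy's default linear interpolation between data points
p40 = np.percentile(sample, 40)
print(round(p40, 2))  # 2.6

# Several percentiles at once
print(np.percentile(sample, [25, 50, 75]))  # [2. 3. 4.]
```

Because the default method interpolates between neighbouring sorted values, a percentile need not be one of the measured values.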




    Saturday, August 26, 2017

    Tarkarli


    Tarkarli is a small town made famous by its white sand beaches and water sports.
    We took an overnight train from Mumbai to Sindhudurg station to spend a short three-day vacation in the Konkan region on the west coast of India.
    We arrived by 10 am and luckily found a shared taxi to the small guest house we had booked in advance.
    (This place is not very commercialised, so it is hard to find luxury resorts here, but staying in a guest house with locals gives you a whole other level of experience.)
    Our stay was right next to the Tarkarli beach from where the soothing sound of the sea waves lashing against the sandy beach is clearly audible.

    After breakfast we went to the beach for a long walk.
    This place really offers ocean bliss, with soft white sand beaches, a cool breeze and awesome sunshine.

    After a delicious lunch of local delicacies, we rented a scooter for 3 days at around 300 INR/day (excluding fuel) and went to the awesome Devbag and Wayari beaches, both untouched natural beauties.

    The next day we had an early start, as we had to go to Tsunami Island for water sports.

    We joined a group of around eight people who were also staying at the same guest house and enjoyed water sports like kayaking, high-speed motorboat rides, water scooters, banana rides and bumper rides, all of which were very thrilling. We also played with a flying disc (frisbee) in the shallow waters around the island and had a lot of fun. This place is quite famous for scuba diving as well.
    We returned to our stay before sunset and spent the sunset on the calm and silent Tarkarli beach.

    After dinner we all sat around a campfire on the beach, enjoying music and long chats.
    On the last day we went to Sindhudurg Fort and also visited the awesome ocean rock garden.
    Sindhudurg Fort

    Rock Garden


    One of the most amazing experiences of Tarkarli was parasailing right above the deep ocean waters, with massive waves rocking the boat and giving the feeling of overcoming the power of the sea and the wind at the same time.



    Like all good trips, we came back with some memorabilia as a token from the ocean.





    Thursday, August 10, 2017

    Basic Data Analysis in Excel: Charts and Tables

    This post covers the following :
    •  Introduction to Reporting in Excel
    •  Excel Tables
    •  Basic Pivot Tables and Chart
    •  Dashboards
    Introduction to Reporting in Excel :

    We often report data in Excel but do not know how to make use of that data or how to represent it graphically. The screenshot below shows the steps to follow to represent data graphically.
    Consider the sales data for a bicycle company over the years for different countries.

    Select the data in Excel, go to the Insert tab and choose any type of chart: Bar, Line, Pie, etc.


    Excel Tables : Excel tables are formatted tables that are more user-friendly and convenient for calculations and formulas.

    To create an Excel table, select the data and then select Table from the Insert tab.
    The Total Row check box adds a row at the bottom of the data that can show results such as Sum, Max and Min.




    Basic Pivot Tables and Chart : Pivot tables are one of Excel's most powerful features. A pivot table lets you extract meaningful summaries from a large, detailed data set.

    Steps to create a pivot table :

    1) Select the PivotTable option from the Insert tab.
    2) The Create PivotTable dialog appears, as shown below.


    3) After you select "OK", a pivot table appears on a new sheet, with all columns listed on the left-hand side.
    4) Fill in the Column Labels, Row Labels and Values sections according to the report you want.




    5) You can select multiple columns/rows/report filters as shown below.



    Pivot Chart :
    Steps to create pivot charts :
    1) Select pivot table data.
    2) Select PivotChart from the Options tab.
    3) The Insert Chart dialog will appear.
    4) Select the chart type and template, then select "OK".
    5) Pivot chart will appear as per the pivot table with all applied filters.


    Wednesday, July 12, 2017

    Sagar Resort Ooty


    Valley View, Lovedale Post, Grand Duff Road, Ooty, TN 643003



    This place sits right among the hills of Ooty. The view from the big balcony is so amazing that the beauty of nature still mesmerizes me. We were lucky enough to enjoy rain in the hills, and that was one of the most wonderful experiences of our trip. The architecture and design of the resort are so unique that every room has a complete valley view (Associated Architects). Our stay at Sagar Holiday Resort was cozy and comfortable.





    Wednesday, July 5, 2017

    Memories and Mehendi (Heena)





    I still remember that my friend and I used to make Heena designs after every semester's exams. After a long time, I tried one more. It is surely not as great as what professionals do, but I am happy with the outcome and, more importantly, I cherish all my memories related to Heena💖




    Saturday, July 1, 2017

    Ooty




    In mid-June we took a bus from Bangalore at 10 P.M. and reached Ooty at 10:30 A.M. the next morning, on what was to be one of the most amazing trips in South India so far.
    I woke up at around 5 A.M. and was completely mesmerized by the scenic beauty of the road to Ooty through the hilly regions of the Nilgiris.



    After reaching our stay at "Sagar Holiday Resort", we had a heavy breakfast and left for the day to explore Ooty.
    We started off with the beautiful Rose Garden, and after lunch we enjoyed an awesome sunset at Doddabetta Peak and also found time for boating on Ooty Lake.








    We had already booked a day tour bus for Pykara and Mudumalai, which included the following places:
    1) KAMARAJ SAGAR DAM
    2) PYKARA 
        a) DAM 
        b) LAKE 
        c) WATER FALLS
    3) PINE FOREST
    4) MUDUMALAI WILD LIFE SANCTUARY
    5) ONE HOUR JUNGLE SAFARI

    Timing : 9:30 A.M.-9:00 P.M.
    Cost : 250 PP

    For more information, see : Tamilnadutourism Local Ooty Tour Packages

    We had a wonderful early-morning start to this tour and enjoyed our day at Pykara waterfall and lake. Later in the day we reached the Pine Forest, the most awesome and amazing experience of the Pykara trip. At the end we reached Mudumalai Wildlife Sanctuary and were lucky enough to see wildlife in all its raw beauty. It was quite an adventure: we saw a big family of bears while crossing the road, a couple of elephants, bison and much more. By sunset, we were at a very peaceful and scenic place in the forest :)








      



    Next day we got a chance to enjoy the Nilgiri mountain railways from Ooty to Coonoor.
    This railway line runs through the Nilgiri forests and mountains, and we enjoyed awesome valley views from the train. We also visited the green valley of Ooty (Dolphin's Nose) and the beautiful Sim's Park. After lunch we moved on to the famous and awesome botanical garden and the tea gardens.

    Ooty is one of the most beautiful places to visit in South India and has rightly earned its fame among tourists.










    Monday, June 26, 2017

    Sauteed Stories


    Lane 5&6, Opp Wellness Forever, North Main Road, KP, Pune
    Contact : 020 3043 1851


    We were invited by our friends to dinner for a small get-together. We finally settled on an amazing place with a nice old heritage ambiance: Sauteed Stories on North Main Road in KP. It’s a very small place with delicious options on the menu. Recommended for anyone who wants to try some fresh and juicy mocktails with Tex-Mex rice.




    Goldfield Lake Resort

    Block No. 12, Near Navarazreth Church, Kumarakom South, Illikkalam Rd, Kottayam, Kerala 686563
    Contact : 0481 252 5388


    This resort is located in the interior of Kumarakom. It feels like being in the heaven of lush green Kerala. It was a very relaxed and calm place, with a breathtaking lake view. The decor and architecture convey the freshness of nature, and we enjoyed ultimate comfort throughout our stay. Our day started with a delicious and varied complimentary breakfast, accompanied by birdsong and a scenic view of the houseboats on the lakeside.






    DIY Nail Art :)

    Last Saturday I was feeling a bit bored and had no plans for the weekend, so I tried this new DIY nail art.
    All you need for this nail art is:
    • Nail colors (whatever color combination you want)
    • Nail polish remover
    • Dotting tool
    • Nail art lining brush

    I loved doing the zig-zag pattern, and the lining gives the nails a very easy yet awesome look. The dotting is the icing on the cake.
    Here is how the final outcome turned out: