End to End Project - Descriptive Statistics Project (P1)

Project for Descriptive Statistics

In this project, we shall import a new dataset about cars and examine some of its descriptive measures. The tasks include calculating the mean, median, variance and standard deviation, plotting histograms, density plots and box plots, and drawing inferences from them.

Create a new notebook (Descriptive_Analysis.ipynb) by going to Files > New > Python 3 on CloudXLab. Keep this Jupyter notebook at hand; you will need it to answer the questions that follow this exercise.

  1. Import the relevant Python packages (numpy, pandas, matplotlib).

Hint:

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
  1. Import 'mtcars' data from ggplot.

    Hint: from ggplot import mtcars

  2. Also set the index of mtcars from the column "name", like this: mtcars.index = mtcars["name"]. This helps you identify each row by car name.

  3. Display the header of the data using head() function of mtcars object.
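
The loading steps above can be sketched as follows. This is only an illustration under one assumption: instead of importing from the ggplot package (which may not be installed in every environment), it builds a small DataFrame from the first six rows of mtcars, copied by hand. The exercise itself uses the full 32-row dataset.

```python
import pandas as pd

# Hand-copied subset of mtcars (first six rows). This stands in for
# `from ggplot import mtcars` so that the sketch runs on its own.
rows = [
    ("Mazda RX4",         21.0, 6, 160.0, 110, 3.90, 2.620, 16.46, 0, 1, 4, 4),
    ("Mazda RX4 Wag",     21.0, 6, 160.0, 110, 3.90, 2.875, 17.02, 0, 1, 4, 4),
    ("Datsun 710",        22.8, 4, 108.0,  93, 3.85, 2.320, 18.61, 1, 1, 4, 1),
    ("Hornet 4 Drive",    21.4, 6, 258.0, 110, 3.08, 3.215, 19.44, 1, 0, 3, 1),
    ("Hornet Sportabout", 18.7, 8, 360.0, 175, 3.15, 3.440, 17.02, 0, 0, 3, 2),
    ("Valiant",           18.1, 6, 225.0, 105, 2.76, 3.460, 20.22, 1, 0, 3, 1),
]
columns = ["name", "mpg", "cyl", "disp", "hp", "drat", "wt",
           "qsec", "vs", "am", "gear", "carb"]
mtcars = pd.DataFrame(rows, columns=columns)

# Use the car name as the row label so each row is easy to identify.
mtcars.index = mtcars["name"]

# Display the first rows of the data.
print(mtcars.head())
```

Once the index is set, a single car can be looked up by name, for example mtcars.loc["Datsun 710"].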

These are the columns present in mtcars dataset:

    name - Name of the car;
    mpg - Miles/(US) gallon;
    cyl - Number of cylinders;
    disp - Displacement (cu.in.);
    hp - Gross horsepower;
    drat - Rear axle ratio;
    wt - Weight (1000 lb);
    qsec - 1/4 mile time (seconds);
    vs - Engine shape (0 = V-shaped, 1 = straight);
    am - Transmission (0 = automatic, 1 = manual);
    gear - Number of forward gears;
    carb - Number of carburetors;
  1. Calculate the mean of each of the columns of mtcars. This can be achieved by calling the mean function on mtcars without any argument.

  2. Get the mean of each row (car) and store it in the variable 'mtcarsMeanCarWise'. This can be achieved by calling the mean function on mtcars with the argument axis=1.

  3. Get the median of each column and store it in the variable 'medianCarsFeatures'. This can be achieved by calling the median function on mtcars with the argument axis=0.

  4. Plot the histograms for each of the columns in mtcars using the hist() function with bins=10 and figsize=(20,15). Do not forget to call plt.show().

  5. Plot the density curve for each column of mtcars using a 'for' loop and the plot command. You can plot the density by calling the plot function with the arguments kind="density" and figsize=(2,2).

  6. Plot the pairwise correlations between the following attributes: mpg, disp, hp, wt, qsec. You can use the scatter_matrix function from the pandas.plotting module.

  7. We use skewness as a measure of asymmetry. A perfectly symmetric distribution has a skewness of zero. If the skewness is negative, the distribution is skewed to the left; if it is positive, the distribution is skewed to the right. Skewness can therefore serve as a rough check for normality, since normal distributions are symmetric and have zero skew, although zero skew alone does not guarantee that a distribution is normal. Calculate the skewness of each column of 'mtcars' using mtcars.skew().

  8. Use describe and quantile to calculate the interquartile range (IQR, the difference between the 75th and 25th percentiles) of the 'mpg' attribute of mtcars.

  9. Plot the box plot for 'mpg' column of mtcars.

  10. Calculate the mean, standard deviation and variance of the 'mpg' column.

  11. Extract cars with automatic transmission and cars with manual transmission into separate variables 'mtCarsAutomatic' and 'mtCarsManual' respectively.

  12. Plot the box plot for 'mpg' (miles per gallon) for each of the subsets of cars grouped by transmission (automatic/manual).
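
The numeric steps in the list above can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes a six-row, four-column subset of mtcars typed in by hand, so the numbers it produces differ from those of the full dataset. The names column_means, skewness, q1, q3, iqr, mpg_mean, mpg_std and mpg_var are chosen here for illustration. If your DataFrame still contains the text column "name", recent pandas versions may require numeric_only=True in calls such as mean() and skew().

```python
import pandas as pd

# Hand-copied subset of mtcars (first six rows, four columns), used only
# so the sketch is self-contained; the exercise uses the full dataset.
mtcars = pd.DataFrame(
    {
        "mpg": [21.0, 21.0, 22.8, 21.4, 18.7, 18.1],
        "hp": [110, 110, 93, 110, 175, 105],
        "wt": [2.620, 2.875, 2.320, 3.215, 3.440, 3.460],
        "am": [1, 1, 1, 0, 0, 0],
    },
    index=["Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
           "Hornet 4 Drive", "Hornet Sportabout", "Valiant"],
)

# Column-wise mean, row-wise (per car) mean, column-wise median.
column_means = mtcars.mean()
mtcarsMeanCarWise = mtcars.mean(axis=1)
medianCarsFeatures = mtcars.median(axis=0)

# Skewness of each column.
skewness = mtcars.skew()

# Interquartile range of mpg, via quantile() and via describe().
q1 = mtcars["mpg"].quantile(0.25)
q3 = mtcars["mpg"].quantile(0.75)
iqr = q3 - q1
summary = mtcars["mpg"].describe()
iqr_from_describe = summary["75%"] - summary["25%"]

# Mean, sample standard deviation and sample variance of mpg.
mpg_mean = mtcars["mpg"].mean()
mpg_std = mtcars["mpg"].std()
mpg_var = mtcars["mpg"].var()

# Split by transmission: am == 0 is automatic, am == 1 is manual.
mtCarsAutomatic = mtcars[mtcars["am"] == 0]
mtCarsManual = mtcars[mtcars["am"] == 1]

print(column_means)
print("IQR of mpg:", iqr)
```

Note that pandas computes std() and var() with the sample (n - 1) denominator by default; pass ddof=0 if you want the population versions.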

What can be observed from the box plot? Can you come up with similar box plots to observe how other variables might significantly affect miles per (US) gallon? Which variables should we track if we need to predict mpg?
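
One way to explore these questions is to draw box plots of mpg grouped by a categorical column. The sketch below is only a starting point under stated assumptions: it uses a hand-typed six-row subset of mtcars, groups by 'am' and 'cyl' as two example variables, and selects a non-interactive matplotlib backend so that it also runs outside a notebook. With so few rows it shows the mechanics only; conclusions should be drawn from the full dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hand-copied subset of mtcars (first six rows, three columns).
mtcars = pd.DataFrame(
    {
        "mpg": [21.0, 21.0, 22.8, 21.4, 18.7, 18.1],
        "cyl": [6, 6, 4, 6, 8, 6],
        "am": [1, 1, 1, 0, 0, 0],
    }
)

# One panel per grouping variable.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# mpg by transmission (0 = automatic, 1 = manual).
mtcars.boxplot(column="mpg", by="am", ax=axes[0])
axes[0].set_xlabel("am (0 = automatic, 1 = manual)")
axes[0].set_ylabel("mpg")

# mpg by number of cylinders, as an example of another grouping variable.
mtcars.boxplot(column="mpg", by="cyl", ax=axes[1])
axes[1].set_xlabel("cyl")

# Remove the automatic "Boxplot grouped by ..." super-title.
fig.suptitle("")

# The group medians are the centre lines drawn inside each box.
median_mpg_by_am = mtcars.groupby("am")["mpg"].median()
print(median_mpg_by_am)

# In a notebook, finish with plt.show() to display the figure.
```

The same pattern works for other low-cardinality columns such as 'gear' or 'carb'; for continuous columns such as 'wt' or 'hp', a scatter plot against mpg is usually more informative than a box plot.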

Please refer to the Jupyter notebook to answer the questions that follow this exercise.
