Project - Bike Rental Forecasting


End to End Project - Bikes Assessment - Analyzing dataset - Correlation Matrix

Task 1: Complete the statement to plot the correlation matrix between all the features and the dependent variable.

Hint

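import numpy as np
import matplotlib.pyplot as plt
import statsmodels.graphics.correlation as pltcor
from sklearn import preprocessing

# bikesData is the DataFrame prepared in the earlier steps
arr = bikesData.drop('dayWeek', axis = 1)
cols = list(arr)
arr = arr.to_numpy()  # as_matrix() was removed in pandas 1.0
arr = preprocessing.scale(arr, axis = 1)  # axis = 1 standardizes each row, as given in the hint
corrMat = np.corrcoef(arr, rowvar = 0)  # rowvar = 0: each column is a variable
np.fill_diagonal(corrMat, 0)  # zero the diagonal so self-correlations do not dominate the plot
fig = plt.figure(figsize=(9,9))
ax = fig.gca()  # axes for the correlation plot
pltcor.plot_corr(corrMat, xnames = cols, ax = ax)
plt.show()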

Observation: The correlation plot is dominated by the strong correlations between many of the features.

  • For example, date-time features are correlated, as are weather features.

  • There is also some significant correlation between date-time and weather features. This correlation results from seasonal variation (annual, daily, etc.) in weather conditions.

  • There is also strong correlation between the count feature and several other features.
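To read the last point off the matrix rather than the plot, one option is to rank the features by their absolute correlation with the count column. The following is a minimal sketch on synthetic data, not the real bikesData; the helper name rank_by_target_correlation and the generated values are assumptions made for illustration.

```python
import numpy as np

def rank_by_target_correlation(data, names, target):
    """Return (feature, r) pairs sorted by |r| with the target, strongest first."""
    corr = np.corrcoef(data, rowvar=False)  # each column is a variable
    t = names.index(target)
    pairs = [(name, float(corr[i, t])) for i, name in enumerate(names) if i != t]
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)

# Synthetic stand-in for bikesData (hypothetical values, not the real dataset)
rng = np.random.default_rng(0)
temp = rng.normal(size=500)
hum = rng.normal(size=500)
cnt = 3.0 * temp - 1.0 * hum + rng.normal(size=500)

data = np.column_stack([temp, hum, cnt])
ranking = rank_by_target_correlation(data, ["temp", "hum", "cnt"], "cnt")
for name, r in ranking:
    print(f"{name}: {r:+.2f}")
```

Sorting by absolute value keeps strongly negative correlations near the top, where a plain descending sort would bury them.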

Action: It is clear that many of these features are redundant with each other, and some significant pruning of this dataset is in order.
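One way to make the pruning step concrete is to list the feature pairs whose absolute correlation exceeds a cutoff and then keep one feature from each pair. The sketch below uses synthetic data; the helper redundant_pairs, the column name tempNoisy, and the 0.8 threshold are illustrative assumptions, not values taken from the project.

```python
import numpy as np

def redundant_pairs(corr, names, threshold=0.8):
    """List feature pairs whose absolute correlation exceeds the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only, skips the diagonal
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j]))
    return pairs

# Synthetic stand-in: tempNoisy is a near copy of temp, hum is independent
rng = np.random.default_rng(1)
temp = rng.normal(size=500)
temp_noisy = temp + 0.1 * rng.normal(size=500)
hum = rng.normal(size=500)

corr = np.corrcoef(np.column_stack([temp, temp_noisy, hum]), rowvar=False)
pairs = redundant_pairs(corr, ["temp", "tempNoisy", "hum"])
print(pairs)
```

The threshold is a judgment call; a lower value prunes more aggressively and should be checked against model performance.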

Task 2: Complete the statement to calculate correlation among these variables: 'yr', 'mnth', 'isWorking', 'xformWorkHr', 'dayCount', 'temp', 'hum', 'windspeed', 'cntDeTrended'

Hint

columnToPlotScatter = ['yr','mnth','isWorking','xformWorkHr','dayCount','temp','hum','windspeed','cntDeTrended']
arry = bikesData[columnToPlotScatter].to_numpy()  # as_matrix() was removed in pandas 1.0
arry = preprocessing.scale(arry, axis = 1)  # axis = 1 standardizes each row, as given in the hint
corrs = np.corrcoef(arry, rowvar = 0)  # rowvar = 0: each column is a variable
np.fill_diagonal(corrs, 0)  # zero the diagonal so self-correlations do not dominate the plot
fig = plt.figure(figsize = (9,9))
ax = fig.gca()
pltcor.plot_corr(corrs, xnames = columnToPlotScatter, ax = ax)
plt.show()
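A point worth checking when reusing the hint: in scikit-learn, preprocessing.scale with axis = 0 standardizes each feature (column), while axis = 1 standardizes each sample (row). Pearson correlation between columns is unchanged by column-wise standardization but generally not by row-wise standardization. The sketch below illustrates this on synthetic data with a small NumPy helper, standardize, which is an assumption standing in for preprocessing.scale.

```python
import numpy as np

def standardize(x, axis):
    """Zero mean and unit variance along the given axis (population std)."""
    return (x - x.mean(axis=axis, keepdims=True)) / x.std(axis=axis, keepdims=True)

# Three independent synthetic columns on very different scales
rng = np.random.default_rng(2)
x = rng.normal(size=(500, 3)) * [1.0, 10.0, 100.0]

base = np.corrcoef(x, rowvar=False)
by_column = np.corrcoef(standardize(x, axis=0), rowvar=False)
by_row = np.corrcoef(standardize(x, axis=1), rowvar=False)

same_after_column_scaling = np.allclose(base, by_column)
same_after_row_scaling = np.allclose(base, by_row)
print(same_after_column_scaling, same_after_row_scaling)
```

If the goal is the plain correlation between features, it may be worth comparing the plot with and without the scaling step before drawing conclusions from it.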

Observation: The correlation plot for this subset of features confirms that several features are redundant.

  • We should not confuse correlation with causation: a high correlation between two variables may or may not reflect a causal relationship.

  • A feature that is highly correlated with the dependent variable is not necessarily a good predictor.
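The two cautions above can be illustrated with a shared trend: two series that have nothing to do with each other can still correlate strongly if both drift upward over time. The following sketch uses synthetic series and a simple linear detrend via np.polyfit; the names series_a, series_b, and detrend_linear are hypothetical, and this is not necessarily how cntDeTrended was produced in the project.

```python
import numpy as np

def detrend_linear(y):
    """Subtract the least-squares straight line fitted against the sample index."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)

# Two synthetic series: independent noise plus the same upward trend
rng = np.random.default_rng(3)
t = np.arange(500)
series_a = 0.05 * t + rng.normal(size=500)
series_b = 0.05 * t + rng.normal(size=500)

r_raw = np.corrcoef(series_a, series_b)[0, 1]
r_detrended = np.corrcoef(detrend_linear(series_a), detrend_linear(series_b))[0, 1]
print(f"raw: {r_raw:.2f}, detrended: {r_detrended:.2f}")
```

The raw correlation is driven almost entirely by the common trend; once the trend is removed, what remains is the relationship between the noise terms, which are independent here.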

Action: When we eventually train the model, we can consider using only one date-time feature and one weather feature.

