Task 1: Complete the statement to plot the correlation matrix between all the features and the dependent variable.
Hint
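import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
import statsmodels.graphics.correlation as pltcor
arr = bikesData.drop('dayWeek', axis = 1)
cols = list(arr)
# as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
arr = arr.to_numpy()
# Standardize each feature (column), not each row
arr = preprocessing.scale(arr, axis = 0)
corrMat = np.corrcoef(arr, rowvar = 0)
# Zero the diagonal so self-correlations do not dominate the color scale
np.fill_diagonal(corrMat, 0)
fig = plt.figure(figsize=(9,9))
ax = fig.gca()
pltcor.plot_corr(corrMat, xnames = cols, ax = ax)
plt.show()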
Observation: The correlation plot is dominated by the strong correlations between many of the features.
For example, date-time features are correlated, as are weather features.
There is also some significant correlation between date-time and weather features. This correlation results from seasonal variation (annual, daily, etc.) in weather conditions.
There is also strong correlation between the count feature and several other features.
Action: It is clear that many of these features are redundant with each other, and some significant pruning of this dataset is in order.
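One way to turn the plot into a concrete pruning list is to enumerate the feature pairs whose absolute correlation exceeds a cutoff. The sketch below is only an illustration under stated assumptions: the helper name correlated_pairs, the 0.8 threshold, and the synthetic columns (including 'temp_noisy') are hypothetical and are not part of the exercise or of bikesData.

```python
import numpy as np

def correlated_pairs(data, names, threshold=0.8):
    """Return (name_a, name_b, r) for column pairs with |r| >= threshold."""
    corr = np.corrcoef(data, rowvar=False)  # columns are the variables
    found = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            if abs(corr[i, j]) >= threshold:
                found.append((names[i], names[j], float(corr[i, j])))
    # Strongest relationships first
    return sorted(found, key=lambda p: abs(p[2]), reverse=True)

# Synthetic stand-in data, NOT the bike rental dataset
rng = np.random.default_rng(0)
temp = rng.normal(size=500)
temp_noisy = temp + rng.normal(scale=0.1, size=500)  # near-duplicate of temp
hum = rng.normal(size=500)                           # independent of both
demo = np.column_stack([temp, temp_noisy, hum])

pairs = correlated_pairs(demo, ['temp', 'temp_noisy', 'hum'])
for a, b, r in pairs:
    print(f"{a} ~ {b}: r = {r:.2f}")
```

With the real data, the same function could be called with the standardized array and the cols list from the hint. The appropriate threshold is a judgment call.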
Task 2: Complete the statement to calculate correlation among these variables: 'yr', 'mnth', 'isWorking', 'xformWorkHr', 'dayCount', 'temp', 'hum', 'windspeed', 'cntDeTrended'
Hint
columnToPlotScatter = ['yr','mnth','isWorking','xformWorkHr','dayCount','temp','hum','windspeed','cntDeTrended']
# as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
arry = bikesData[columnToPlotScatter].to_numpy()
# Standardize each feature (column), not each row
arry = preprocessing.scale(arry, axis = 0)
corrs = np.corrcoef(arry, rowvar = 0)
# Zero the diagonal so self-correlations do not dominate the color scale
np.fill_diagonal(corrs, 0)
fig = plt.figure(figsize = (9,9))
ax = fig.gca()
pltcor.plot_corr(corrs, xnames = columnToPlotScatter, ax = ax)
plt.show()
Observation: The correlation plot for this subset of features confirms our understanding that several features are redundant.
We should not confuse correlation with causation: a high correlation between two variables does not by itself imply that one causes the other.
Likewise, a feature that is highly correlated with the dependent variable is not necessarily a good predictor.
Action: We can eventually consider using only one date-time feature and one weather feature for training the model.
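As a rough starting point for that selection, one could pick, within each feature group, the member with the largest absolute correlation to the target. This is a minimal sketch and not the exercise's prescribed method: the helper pick_representatives, the grouping into 'datetime' and 'weather', the target name 'cnt', and the synthetic data are all assumptions made for illustration. Because correlation alone does not guarantee a good predictor, the result should be validated against model performance.

```python
import numpy as np

def pick_representatives(data, names, target, groups):
    """For each group, return the member with the largest |r| to target."""
    corr = np.corrcoef(data, rowvar=False)  # columns are the variables
    t = names.index(target)
    return {
        group: max(members, key=lambda m: abs(corr[names.index(m), t]))
        for group, members in groups.items()
    }

# Synthetic stand-in data, NOT the bike rental dataset
rng = np.random.default_rng(1)
n = 400
dayCount = np.arange(n, dtype=float)                # steady upward trend
mnth = rng.integers(1, 13, size=n).astype(float)    # unrelated to target here
temp = rng.normal(size=n)
hum = rng.normal(size=n)                            # unrelated to target here
cnt = 0.02 * dayCount + 3.0 * temp + rng.normal(size=n)

names = ['mnth', 'dayCount', 'temp', 'hum', 'cnt']
demo = np.column_stack([mnth, dayCount, temp, hum, cnt])

# Hypothetical grouping of features into date-time and weather families
groups = {'datetime': ['mnth', 'dayCount'], 'weather': ['temp', 'hum']}
chosen = pick_representatives(demo, names, 'cnt', groups)
print(chosen)
```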