Now, let us prepare the data for training the model.
As part of data preparation, we need to perform the following techniques on the data:
Shuffling
Feature Scaling
Shuffling the training dataset - to get uniform samples for cross validation
We need to shuffle our training data to ensure that we don't miss out on any article (fashion product) in any cross-validation fold. A small illustration of permutation-based shuffling is shown below.
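Here is a minimal sketch, on a made-up toy array, of how a single random permutation shuffles the data while keeping each instance paired with its label (the values are only for illustration):

import numpy as np

np.random.seed(42)                        # fix the seed for reproducibility
data = np.array([10, 20, 30, 40, 50])    # pretend these are 5 training instances
labels = np.array([0, 1, 0, 1, 0])       # and their corresponding labels

perm = np.random.permutation(len(data))  # a random ordering of indices 0..4
data, labels = data[perm], labels[perm]  # the same permutation keeps pairs aligned
print(data, labels)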
Feature Scaling
Each image (instance) in the dataset has 784 pixels (features), and the value of each feature (pixel) ranges from 0 to 255. This range is too wide, so we need to apply feature scaling here, standardizing the dataset X_train so that the values of each feature (pixel) fall within a small range (expressed in units of the standard deviation):
x_scaled = (x - x_mean) / x_standard_deviation
Note that scaling is not needed for tree-based algorithms such as Decision Trees and Random Forests.
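As a quick sanity check, here is a minimal sketch on a made-up 2-feature array showing that StandardScaler applies exactly the formula above (the numbers are only for illustration):

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[0.0, 255.0],
              [128.0, 64.0],
              [255.0, 0.0]])             # 3 instances, 2 "pixel" features

# By hand: subtract the per-feature mean, divide by the per-feature std
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# With StandardScaler, which uses the same formula internally
X_scaled = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_scaled))   # True: both apply the same standardization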
Please follow the steps below:
Set the random seed to 42 so that the shuffle is reproducible
np.random.seed(<<your code comes here>>)
Create shuffle indices of size 60000 (as we have 60000 images in the training dataset) and store them in a variable called 'shuffle_index'
shuffle_index = np.random.permutation(<<your code comes here>>)
Shuffle the X_train and y_train datasets by indexing them with the 'shuffle_index' variable created above.
X_train, y_train = X_train[<<your code comes here>>], y_train[<<your code comes here>>]
Import StandardScaler from scikit-learn's preprocessing module
from <<your code comes here>> import StandardScaler
Create an instance of StandardScaler and store it in a variable called 'scaler'
scaler = <<your code comes here>>
Apply standardization to the training dataset X_train using the StandardScaler instance 'scaler' created above: call its fit_transform method and store the scaled training dataset in the X_train_scaled variable.
X_train_scaled = scaler.<<your code comes here>>(X_train.astype(np.float64))
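If you want to check how the completed steps fit together, here is one possible sketch. The randomly generated arrays below are only stand-ins so the snippet runs on its own; in the exercise, X_train and y_train come from the Fashion-MNIST dataset already loaded in your notebook:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in data: 60000 "images" of 784 pixel values in 0..255
X_train = np.random.randint(0, 256, size=(60000, 784))
y_train = np.random.randint(0, 10, size=60000)

np.random.seed(42)                            # fix the seed for reproducibility
shuffle_index = np.random.permutation(60000)  # random ordering of the 60000 rows
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))
print(X_train_scaled.shape)                   # (60000, 784)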