Now, let us prepare the data for training the model.
As part of data preparation, we need to apply the following techniques to the data:
Shuffling
Feature Scaling
Shuffling the training dataset - to get uniformly distributed samples for cross-validation
We shuffle the training data so that every cross-validation fold contains a representative mix of articles (fashion products), and no article type is left out of any fold.
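For illustration, here is a minimal sketch of permutation-based shuffling on a tiny made-up array (not the actual Fashion-MNIST data); the key point is that the same permutation is applied to both the features and the labels, so they stay aligned:

import numpy as np

np.random.seed(42)                              # make the shuffle reproducible
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])  # toy features
y = np.array([0, 1, 0, 1])                      # toy labels

idx = np.random.permutation(len(X))  # a random ordering of the row indices
X, y = X[idx], y[idx]                # reorder features and labels together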
Feature Scaling
Each image (instance) in the dataset has 784 pixels (features), and the value of each feature (pixel) ranges from 0 to 255. This range is too wide for many learning algorithms, so we apply standardization to the training dataset X_train so that the values of each feature (pixel) fall within a small range determined by that feature's standard deviation:
x_scaled = (x - mean) / standard_deviation
Note: scaling is not needed for tree-based algorithms such as Decision Tree and Random Forest, as they are insensitive to the scale of the features.
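To see what standardization does, here is a minimal sketch (again on a small made-up array, not the real images) showing that the formula above matches what scikit-learn's StandardScaler computes:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy dataset: 3 instances, 2 features, values in the 0-255 pixel range
X = np.array([[0.0, 100.0],
              [127.0, 150.0],
              [255.0, 200.0]])

# Manual standardization: subtract each feature's mean, divide by its std
manual = (X - X.mean(axis=0)) / X.std(axis=0)

# StandardScaler applies the same per-feature formula
scaled = StandardScaler().fit_transform(X)

print(np.allclose(manual, scaled))  # True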
Please follow the steps below:
Set the random seed to 42 (so the shuffle is reproducible):
np.random.seed(<<your code comes here>>)
Create shuffle indices of size 60000 (as we have 60000 images in the training dataset) and store them in a variable 'shuffle_index':
shuffle_index = np.random.permutation(<<your code comes here>>)
Shuffle the X_train and y_train datasets by indexing them with the 'shuffle_index' variable created above:
X_train, y_train = X_train[<<your code comes here>>], y_train[<<your code comes here>>]
Import StandardScaler from scikit-learn's preprocessing module:
from <<your code comes here>> import StandardScaler
Create an instance of StandardScaler and store it in a variable called 'scaler':
scaler = <<your code comes here>>
Apply standardization to the training dataset X_train using the StandardScaler instance scaler created above (via its fit_transform method), and store the scaled training dataset in the X_train_scaled variable:
X_train_scaled = scaler.<<your code comes here>>(X_train.astype(np.float64))
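For reference, here is one way the completed steps could fit together end to end. This is only a sketch: the randomly generated array below merely stands in for the real Fashion-MNIST X_train and y_train, which are already loaded in the exercise environment.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in data with the Fashion-MNIST shape: 60000 images x 784 pixels
rng = np.random.RandomState(0)
X_train = rng.randint(0, 256, size=(60000, 784)).astype(np.uint8)
y_train = rng.randint(0, 10, size=60000)

# Steps 1-3: reproducible shuffle of the 60000 training instances
np.random.seed(42)
shuffle_index = np.random.permutation(60000)
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]

# Steps 4-6: standardize every pixel to roughly zero mean and unit variance
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))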