

End to End ML Project - Fashion MNIST - Data Preparation

Now, let us prepare data for training the model.

As part of data preparation, we need to apply the following techniques to the data:

  1. Shuffling

  2. Feature Scaling

Shuffling the training dataset - to get uniform samples for cross validation

We need to shuffle the training data so that every cross-validation fold contains examples of each article (fashion product) and no category is left out of a fold.
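As a small illustration of why this matters, the sketch below uses a hypothetical toy label array (not the Fashion MNIST labels) that is deliberately sorted by class, and splits it into three contiguous folds before and after shuffling. The names y_toy and classes_per_fold are made up for this example.

```python
import numpy as np

# Hypothetical toy labels, sorted by class: three classes, four samples each.
y_toy = np.repeat([0, 1, 2], 4)

def classes_per_fold(labels, n_folds=3):
    """Return the set of classes present in each contiguous fold."""
    return [set(fold.tolist()) for fold in np.array_split(labels, n_folds)]

# Without shuffling, each contiguous fold contains exactly one class.
unshuffled = classes_per_fold(y_toy)

# After a seeded permutation, the classes are spread across the folds.
rng = np.random.RandomState(42)
perm = rng.permutation(len(y_toy))
shuffled = classes_per_fold(y_toy[perm])

print("unshuffled folds:", unshuffled)
print("shuffled folds:  ", shuffled)
```

If the data happened to be ordered by class, a model validated on an unshuffled fold would be scored on a class it never saw during training. Shuffling avoids that situation.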

Feature Scaling

Each image (instance) in the dataset has 784 pixels (features), and the value of each feature (pixel) ranges from 0 to 255. This range is too wide, so we apply feature scaling, specifically standardization, to the X_train dataset. After standardization, each feature (pixel) has zero mean and unit standard deviation, so its values fall in a small range.

x_scaled = (x - x_mean) / x_std

Here x_mean and x_std are the mean and standard deviation of the feature, computed over the training dataset.
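To make the formula concrete, here is a minimal sketch that applies it by hand with NumPy to a hypothetical toy array (3 "images" with 2 "pixels" each, not the real dataset). The names X_toy, x_std and X_toy_scaled are illustrative only.

```python
import numpy as np

# Hypothetical toy data: 3 "images" with 2 "pixels" each, values in 0..255.
X_toy = np.array([[0.0, 100.0],
                  [127.5, 150.0],
                  [255.0, 200.0]])

# Per-feature (per-column) statistics.
x_mean = X_toy.mean(axis=0)
x_std = X_toy.std(axis=0)  # population standard deviation (ddof=0)

# Apply the standardization formula to every value.
X_toy_scaled = (X_toy - x_mean) / x_std

print(X_toy_scaled.mean(axis=0))  # approximately 0 for each feature
print(X_toy_scaled.std(axis=0))   # approximately 1 for each feature
```

This hand-written version would divide by zero for a feature whose values are all identical. StandardScaler guards against that case by leaving such a feature unscaled.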

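Scaling is not needed for Decision Tree and Random Forest algorithms, because they split on thresholds of individual features and are therefore unaffected by the scale of each feature.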


Please follow the steps below:

Set NumPy's random seed to 42 so that the shuffle is reproducible

np.random.seed(<<your code comes here>>)

Create an array of 60000 shuffled indices (one for each of the 60000 images in the training dataset) and store it in a variable called 'shuffle_index'

shuffle_index = np.random.permutation(<<your code comes here>>)

Shuffle the X_train and y_train datasets by indexing both with the 'shuffle_index' variable created above, so that each image stays paired with its label.

X_train, y_train = X_train[<<your code comes here>>], y_train[<<your code comes here>>]

Import StandardScaler from scikit-learn's preprocessing module

from <<your code comes here>> import StandardScaler

Create an instance of StandardScaler and store it in a variable called 'scaler'

scaler = <<your code comes here>>

Standardize the training dataset X_train by calling the fit_transform method of the 'scaler' instance created above, and store the scaled training dataset in a variable called 'X_train_scaled'.

X_train_scaled = scaler.<<your code comes here>>(X_train.astype(np.float64))
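The steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses a small random stand-in (600 synthetic "images" of 784 pixels) in place of the real 60000-image Fashion MNIST training set, so the sizes and values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Fix the seed so the synthetic data and the shuffle are reproducible.
np.random.seed(42)

# Hypothetical stand-in for the real data: 600 random "images" of 784 pixels.
n_samples = 600
X_train = np.random.randint(0, 256, size=(n_samples, 784)).astype(np.uint8)
y_train = np.random.randint(0, 10, size=n_samples)

# One permutation, applied to both arrays, keeps images and labels paired.
shuffle_index = np.random.permutation(n_samples)
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]

# Learn per-pixel mean and standard deviation, then standardize.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))

print(X_train_scaled.shape)
print(X_train_scaled.mean(), X_train_scaled.std())
```

Because the scaler stores the statistics learned from the training data, the same fitted instance can later be applied to other data with scaler.transform, rather than being fitted again.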
