End to End ML Project - Fashion MNIST - Loading the data

Let us load the Fashion MNIST dataset from the Cloudxlab folder given below (this dataset is copied from the Zalando Research repository).

Location - '/cxldata/datasets/project/fashion-mnist/'

You need to load the following 4 dataset files:

train-images-idx3-ubyte.gz - this contains the Training dataset
train-labels-idx1-ubyte.gz - this contains the Training labels (target dataset)
t10k-images-idx3-ubyte.gz - this contains the Test dataset
t10k-labels-idx1-ubyte.gz - this contains the Test labels

The class labels for Fashion MNIST are:

  Label    Description

    0         T-shirt/top
    1         Trouser 
    2         Pullover 
    3         Dress
    4         Coat
    5         Sandal 
    6         Shirt
    7         Sneaker
    8         Bag
    9         Ankle boot
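
As a convenience, the table above can be held in a dictionary so that a numeric label can be turned into its description. This is only a sketch: the names class_names and example_label are hypothetical and not part of the exercise.

```python
# Mapping from numeric label to description, copied from the table above
class_names = {
    0: 'T-shirt/top',
    1: 'Trouser',
    2: 'Pullover',
    3: 'Dress',
    4: 'Coat',
    5: 'Sandal',
    6: 'Shirt',
    7: 'Sneaker',
    8: 'Bag',
    9: 'Ankle boot',
}

# Hypothetical label value, standing in for an entry of the label array
example_label = 9
print(class_names[example_label])  # Ankle boot
```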

Our training dataset consists of 60,000 images (the test dataset has 10,000), and each image has 784 features. An image consists of 28x28 pixels (28 x 28 = 784), and each pixel is a value from 0 to 255 describing the pixel intensity: 0 for white and 255 for black.
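
The relationship between the 784 features and the 28x28 pixel grid can be sketched as below. The array flat_image is a synthetic stand-in for one row of the dataset, not real Fashion MNIST data, and both variable names are hypothetical.

```python
import numpy as np

# Synthetic stand-in for one image row: 784 intensities in the range 0-255
flat_image = (np.arange(784) % 256).astype(np.uint8)

# Reshape the flat feature vector into the 28x28 pixel grid
image_2d = flat_image.reshape(28, 28)

print(flat_image.shape)                 # (784,)
print(image_2d.shape)                   # (28, 28)
print(image_2d.min(), image_2d.max())   # 0 255
```

Row r, column c of the grid corresponds to feature index r * 28 + c of the flat vector.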

INSTRUCTIONS

Please define the following string variables to store the paths of the dataset files. Each path should also contain the file name (appended at the end of the path).

The variable below contains the path of the Training dataset

filePath_train_set = << your code comes here >>

The variable below contains the path of the Training labels (target dataset)

filePath_train_label = << your code comes here >>

The variable below contains the path of the Test dataset

filePath_test_set = << your code comes here >>

The variable below contains the path of the Test labels

filePath_test_label = << your code comes here >>
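
One possible way to build these four paths is to join the folder location and each file name with os.path.join. This is a sketch, not the official answer: data_dir is a hypothetical helper variable, and writing each full path as a plain string literal would work just as well.

```python
import os

# Folder location given in the exercise text
data_dir = '/cxldata/datasets/project/fashion-mnist/'

# Append each file name to the folder location
filePath_train_set = os.path.join(data_dir, 'train-images-idx3-ubyte.gz')
filePath_train_label = os.path.join(data_dir, 'train-labels-idx1-ubyte.gz')
filePath_test_set = os.path.join(data_dir, 't10k-images-idx3-ubyte.gz')
filePath_test_label = os.path.join(data_dir, 't10k-labels-idx1-ubyte.gz')

print(filePath_train_set)
```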

Please create the variables trainLabel, trainSet, testLabel and testSet using the code given below. You can copy the code as it is.

import gzip
import numpy as np

# Label files: skip the 8-byte header, then read one byte per label
with gzip.open(filePath_train_label, 'rb') as trainLbpath:
    trainLabel = np.frombuffer(trainLbpath.read(), dtype=np.uint8,
                               offset=8)

# Image files: skip the 16-byte header, then reshape to one row of 784 pixels per image
with gzip.open(filePath_train_set, 'rb') as trainSetpath:
    trainSet = np.frombuffer(trainSetpath.read(), dtype=np.uint8,
                             offset=16).reshape(len(trainLabel), 784)

with gzip.open(filePath_test_label, 'rb') as testLbpath:
    testLabel = np.frombuffer(testLbpath.read(), dtype=np.uint8,
                              offset=8)

with gzip.open(filePath_test_set, 'rb') as testSetpath:
    testSet = np.frombuffer(testSetpath.read(), dtype=np.uint8,
                            offset=16).reshape(len(testLabel), 784)

trainLabel - contains Training label (target dataset)

trainSet - contains Training dataset

testLabel - contains Test label

testSet - contains Test dataset
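
The offset values in the loading code skip the file header: in the IDX format used by these files, a label file starts with a 4-byte magic number and a 4-byte item count (8 bytes), and an image file additionally stores the row and column counts (16 bytes). The sketch below illustrates this on a tiny synthetic image file built in memory, so it does not touch the real dataset. All variable names in it are hypothetical.

```python
import gzip
import io
import struct

import numpy as np

# Build a tiny synthetic IDX image file in memory: 2 images of 28x28 pixels
n_images, rows, cols = 2, 28, 28
pixels = (np.arange(n_images * rows * cols) % 256).astype(np.uint8)

# Header: four big-endian 32-bit integers (magic number 2051 marks an image file)
header = struct.pack('>IIII', 2051, n_images, rows, cols)
raw = gzip.compress(header + pixels.tobytes())

# Read it back the same way the exercise code does
with gzip.open(io.BytesIO(raw), 'rb') as f:
    data = f.read()

# The first 16 bytes are the header, everything after is pixel data
magic, count, n_rows, n_cols = struct.unpack('>IIII', data[:16])
images = np.frombuffer(data, dtype=np.uint8, offset=16).reshape(count, n_rows * n_cols)

print(magic, count, n_rows, n_cols)  # 2051 2 28 28
print(images.shape)                  # (2, 784)
```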

Please copy the values of the variables created above (trainSet, testSet, trainLabel and testLabel) into new variables X_train, X_test, y_train and y_test respectively.
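
A plain assignment such as X_train = trainSet only gives the same array a second name, and arrays created with np.frombuffer from bytes are read-only. If an independent, writable copy is wanted, the array's .copy() method is one option. The sketch below shows the difference on a small stand-in array. The names demo_set and X_demo are hypothetical, and whether the exercise expects a plain assignment or a .copy() is an assumption left open here.

```python
import numpy as np

# Hypothetical stand-in for trainSet: a small read-only array built with np.frombuffer
demo_set = np.frombuffer(bytes(range(8)), dtype=np.uint8).reshape(2, 4)

# .copy() allocates a new, writable array holding the same values
X_demo = demo_set.copy()
X_demo[0, 0] = 99  # changes the copy only

print(demo_set.flags.writeable)      # False
print(X_demo.flags.writeable)        # True
print(demo_set[0, 0], X_demo[0, 0])  # 0 99
```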

To get a feel for the data, you can view the article image at, say, index 0 of the Training dataset (X_train) and its corresponding label in the target dataset (y_train). You can use the showImage() function that we defined earlier for this, e.g. showImage(X_train[0]).
