Fashion-MNIST using Deep Learning with TensorFlow Keras

A few months back, I had presented results of my experiments with Fashion-MNIST using Machine Learning algorithms which you can find in the below mentioned blog:

https://cloudxlab.com/blog/fashion-mnist-using-machine-learning/

In the current article, I am presenting the results of my experiments with Fashion-MNIST using Deep Learning (Convolutional Neural Network – CNN) which I have implemented using  TensorFlow Keras APIs (version 2.1.6-tf).

The complete code for this project you can find here :

https://github.com/cloudxlab/ml/tree/master/projects/Fashion-MNIST

Fashion-MNIST is a dataset of Zalando’s fashion article images —consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each instance is a 28×28 grayscale image, associated with a label.

Fashion-MNIST dataset sample images
Fashion-MNIST dataset sample images

Objective

This work is part of my experiments with Fashion-MNIST dataset using Convolutional Neural Network (CNN) which I have implemented using TensorFlow Keras APIs(version 2.1.6-tf). The objective is to identify (predict) different fashion products from the given images using a CNN model. For regularization, I have used ‘dropout’ technique for this problem.

Acknowledgements

I have used Fashion-MNIST dataset for this experiment with Deep Learning. Fashion-MNIST dataset is a collection of fashion articles images provided by Zalando . Thanks to Zalando Research for hosting the dataset.

Understanding and Analysing the dataset

Fashion MNIST Training dataset consists of 60,000 images and each image has 784 features (i.e. 28×28 pixels). Each pixel is a value from 0 to 255, describing the pixel intensity. 0 for white and 255 for black.

The class labels for Fashion MNIST are:

Label  Description
0T-shirt/top
1Trouser
2Pullover
3Dress
4Coat
5Sandal
6Shirt
7Sneaker
8Bag
9Ankle boot

Let us have a look at one instance (an article image), say at index 220, of the training dataset.

One sample Fashion-MNIST image (at index 220)
One sample Fashion-MNIST image (at index 220)

So, we see that image at index (instance no.) 220 is a bag, and the corresponding label value also indicates the same (8 – Bag).

Problem Definition

The ‘target’ dataset has 10 class labels, as we can see from above (0 – T-shirt/top, 1 – Trouser,,….9 – Ankle Boot).

Given the images of the articles, we need to classify them into one of these classes, hence, it is essentially a ‘Multi-class Classification’ problem.

We will be using CNN to come up with a model for this problem and will use “Accuracy” as the performance measure.

Preparing the Data

We already have the splitted dataset (training and test) available in ratio 85:15 (60,000:10,000) from Zalando Research, we will use the same for this experiment.

As part of data preparation, following techniques were applied on the dataset:

    1. Shuffling
    2. Normalization

We shuffle the training dataset to get uniform samples for cross-validation. This also  ensures that we don’t miss out any digit in a cross-validation fold.

Each image (instance) in the dataset has 784 pixels (features) and value of each feature(pixel) ranges from 0 to 255, this range is too wide, hence we have performed Normalization on the training and test dataset, by dividing the pixels by 255, so that values of all features (pixels) are in a small range (0 to 1).

CNN Model Architecture

Below is the architecture of my final CNN model. It has 3 convolutional layers, 2 max. Pool layers, 2 dropout layers, 1 fully connected (dense) layer and 1 output (softmax) layer. Apart from these, it also has a flatten layer whose purpose is just to ‘flatten’ the output, i.e. convert a 2-D output to a 1-D output (which is then fed to the dense layer).

CNN Model Architecture for Fashion-MNIST
CNN Model Architecture for Fashion-MNIST
CNN Model Summary
CNN Model Summary

Training, Validation and Test Results

During the training phase itself the cross validation (validation dataset = 10% of training dataset) and hyperparameter tuning were performed on the training dataset.

First, I tried with a CNN model without any regularization technique applied to it, below table shows the results of training and cross validation (validation) for the same.

Deep learning models have a high tendency to overfit (perform well on training dataset than on the validation/test dataset), the same can be seen in the below results.

Results of CNN model without applying Regularization

Results for CNN Model without applying regularization
Results for CNN Model without applying regularization

A training accuracy of 99% and test accuracy of 92% confirms that model is overfitting.

To solve the model overfitting issue, I applied regularization technique called ‘Dropout’ and also introduced a few more max. pool layers. The architecture diagram for this CNN model is shown above (under section – CNN Model Architecture). Below are the results for the same.

Results of the CNN model with regularization (dropout) applied

Results for CNN Model with regularization(dropout) applied
Results for CNN Model with regularization(dropout) applied

A training accuracy value of 94% and test accuracy of 93% confirms that model is performing fine and there is no overfitting. Thus, this is our final CNN model.

Please note that ‘Dropout’ should only be applied during the training phase and not during the test phase. For the test phase, TensorFlow Keras API (evaluate() method) automatically takes care of this internally, we needn’t specify any parameter for this (to not apply Dropout).

Below are the plots for ‘Accuracy’ and ‘Loss’ for training and validation(test) phases for this final CNN model.

'Accuracy' plot for CNN Model for training and validation(test) dataset
‘Accuracy’ plot for CNN Model for training and validation(test) dataset
'Loss' plot for CNN Model for training and validation(test) dataset
‘Loss’ plot for CNN Model for training and validation(test) dataset

Let us make a prediction for one of the instances/images (say instance no. 88) using our final CNN model (new_model).

Thus, we see that, our CNN model has predicted it right, actual instance (image) at instance no. 88 is a T-shirt and our predicted value also confirms the same (test labels[88] = 0; test_images[88] – T-shirt).

Conclusion

With our final CNN model, we could achieve a training accuracy of 94% and test accuracy of 93% confirming that model is fine with no overfitting.

If you remember, with Machine Learning model (XGBoost) I had achieved a test accuracy of 84.72 %, and with Deep Learning model (CNN) here I could achieve a test accuracy of 93 %. Thus, we got around 8% improvement in accuracy by using Deep Learning.

Though, in this case, we got a good improvement in accuracy score (8%), still there may be a chance to improve performance further, by say, increasing the number of convolutional layers (and neurons/filters) or trying out different combinations of different layers.

The complete code for this project you can find here : https://github.com/cloudxlab/ml/tree/master/projects/Fashion-MNIST

For the complete course on Machine Learning, please visit Specialization Course on Machine Learning & Deep Learning


Things to Consider While Managing Machine Learning Projects

Generally, Machine Learning (or Deep Learning) projects are quite unique and also different from traditional web application projects due to the inherent complexity involved with them.

The goal of this article is, not to go through full project management life cycle, but to discuss a few complexities and finer points which may impact different project management phases and aspects of a Machine Learning(or Deep Learning) project, and, which should be taken care of, to avoid any surprises later.

Below is a quick ready reckoner for the topics that we will be discussing in this article.

Project Management - ML Project
Project Management – ML Project

‘Machine Learning’ term in this article means both – ‘Machine Learning’ and ‘Deep Learning’.

Continue reading “Things to Consider While Managing Machine Learning Projects”

Deploying Machine Learning model in production

In this article, I am going to explain steps to deploy a trained and tested Machine Learning model in production environment.

Though, this article talks about Machine Learning model, the same steps apply to Deep Learning model too.

Below is a typical setup for deployment of a Machine Learning model, details of which we will be discussing in this article.

Process to build and deploy a REST service (for ML model) in production
Process to build and deploy a REST service (for ML model) in production

The complete code for creating a REST service for your Machine Learning model can be found at the below link:

https://github.com/cloudxlab/ml/tree/master/projects/deploy_mnist

Let us say, you have trained, fine-tuned and tested Machine Learning(ML) model – sgd_clf, which was trained and tested using SGD Classifier on MNIST dataset.  And now you want to deploy it in production, so that consumers of this model could use it. What are different options you have to deploy your ML model in production?

Continue reading “Deploying Machine Learning model in production”

Fashion-MNIST using Machine Learning

One of the classic problem that has been used in the Machine Learning world for quite sometime is the MNIST problem. The objective is to identify the digit based on image. But MNIST is not very great problem because we come up with great accuracy even if we are looking at few pixels in the image. So, another common example problem against which we test algorithms is Fashion-MNIST.

The complete code for this project you can find here : https://github.com/cloudxlab/ml/tree/master/projects/Fashion-MNIST

Fashion-MNIST is a dataset of Zalando’s fashion article images —consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each instance is a 28×28 grayscale image, associated with a label.

Continue reading “Fashion-MNIST using Machine Learning”

One-on-one discussion on Gradient Descent

Usually, the learners from our classes schedule 1-on-1 discussions with the mentors to clarify their doubts. So, thought of sharing the video of one of these 1-on-1 discussions that one of our CloudxLab learner – Leo – had with Sandeep last week.

Below are the questions from the same discussion.

You can go through the detailed discussion which happened around these questions, in the attached video below.

One-on-one discussion with Sandeep on Gradient Descent
Continue reading “One-on-one discussion on Gradient Descent”