Fashion-MNIST using Deep Learning with TensorFlow Keras

A few months back, I presented the results of my experiments with Fashion-MNIST using Machine Learning algorithms, which you can find in the blog post below:

https://cloudxlab.com/blog/fashion-mnist-using-machine-learning/

In the current article, I present the results of my experiments with Fashion-MNIST using Deep Learning, specifically a Convolutional Neural Network (CNN), which I have implemented using the TensorFlow Keras API (version 2.1.6-tf).

You can find the complete code for this project here:

https://github.com/cloudxlab/ml/tree/master/projects/Fashion-MNIST

Fashion-MNIST is a dataset of Zalando’s fashion article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each instance is a 28×28 grayscale image, associated with a label from one of 10 classes.

Fashion-MNIST dataset sample images

Objective

This work is part of my experiments with the Fashion-MNIST dataset using a Convolutional Neural Network (CNN), which I have implemented using the TensorFlow Keras API (version 2.1.6-tf). The objective is to identify (predict) different fashion products from the given images using a CNN model. For regularization, I have used the ‘dropout’ technique for this problem.

Acknowledgements

I have used the Fashion-MNIST dataset for this experiment with Deep Learning. The Fashion-MNIST dataset is a collection of fashion article images provided by Zalando. Thanks to Zalando Research for hosting the dataset.

Understanding and Analysing the Dataset

The Fashion-MNIST training dataset consists of 60,000 images, and each image has 784 features (i.e. 28×28 pixels). Each pixel holds a value from 0 to 255 describing its intensity: 0 for white and 255 for black.
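As a quick orientation, here is a minimal sketch of loading the dataset via the Keras datasets API and inspecting its shape; the project notebook may load the data differently.

from tensorflow import keras

# Load Fashion-MNIST directly from the Keras datasets module
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()

print(train_images.shape)                      # (60000, 28, 28) - 60,000 images of 28x28 pixels
print(test_images.shape)                       # (10000, 28, 28)
print(train_images.min(), train_images.max())  # pixel values range from 0 to 255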

The class labels for Fashion MNIST are:

Label   Description
0       T-shirt/top
1       Trouser
2       Pullover
3       Dress
4       Coat
5       Sandal
6       Shirt
7       Sneaker
8       Bag
9       Ankle boot

Let us have a look at one instance (an article image), say at index 220, of the training dataset.
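For illustration, a short sketch (assuming the train_images and train_labels arrays loaded above) that displays this instance with Matplotlib:

import matplotlib.pyplot as plt

plt.imshow(train_images[220], cmap='binary')  # show the 28x28 grayscale image
plt.title("Label: %d" % train_labels[220])    # prints 8, i.e. 'Bag'
plt.axis('off')
plt.show()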

One sample Fashion-MNIST image (at index 220)

So, we see that the image at index (instance no.) 220 is a bag, and the corresponding label value indicates the same (8 – Bag).

Problem Definition

The ‘target’ dataset has 10 class labels, as we can see from above (0 – T-shirt/top, 1 – Trouser, …, 9 – Ankle boot).

Given an image of an article, we need to classify it into one of these 10 classes; hence, this is essentially a ‘Multi-class Classification’ problem.

We will be using a CNN to come up with a model for this problem and will use ‘Accuracy’ as the performance measure.

Preparing the Data

The dataset already comes split into training and test sets in a roughly 85:15 ratio (60,000:10,000) from Zalando Research, and we will use this split for the experiment.

As part of data preparation, the following techniques were applied to the dataset:

    1. Shuffling
    2. Normalization

We shuffle the training dataset to get uniform samples for cross-validation. This also ensures that we don’t miss out any class in a cross-validation fold.

Each image (instance) in the dataset has 784 pixels (features), and the value of each feature (pixel) ranges from 0 to 255. This range is too wide, hence we have performed normalization on the training and test datasets by dividing the pixel values by 255, so that the values of all features (pixels) lie in a small range (0 to 1).
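A minimal sketch of both steps, continuing from the arrays loaded earlier (the exact shuffling code in the notebook may differ):

import numpy as np

# Shuffle the training set so that cross-validation folds get uniform samples
shuffle_index = np.random.permutation(len(train_images))
train_images, train_labels = train_images[shuffle_index], train_labels[shuffle_index]

# Normalize pixel values from the 0-255 range down to 0-1
train_images = train_images / 255.0
test_images = test_images / 255.0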

CNN Model Architecture

Below is the architecture of my final CNN model. It has 3 convolutional layers, 2 max-pooling layers, 2 dropout layers, 1 fully connected (dense) layer and 1 output (softmax) layer. Apart from these, it also has a flatten layer whose purpose is just to ‘flatten’ the output, i.e. convert the 2-D feature maps to a 1-D vector (which is then fed to the dense layer).

CNN Model Architecture for Fashion-MNIST

CNN Model Summary
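In Keras, a model with this layer structure can be expressed roughly as follows; the filter counts, kernel sizes and dropout rates below are illustrative assumptions, not the exact values from my notebook.

from tensorflow import keras

# Layer counts match the architecture described above; hyperparameters are assumed
model = keras.models.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                        input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Dropout(0.25),
    keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    keras.layers.Dropout(0.25),
    keras.layers.Flatten(),                        # 2-D feature maps -> 1-D vector
    keras.layers.Dense(128, activation='relu'),    # fully connected (dense) layer
    keras.layers.Dense(10, activation='softmax'),  # one probability per class
])

model.summary()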

Training, Validation and Test Results

Cross-validation (with a validation dataset of 10% of the training dataset) and hyperparameter tuning were performed on the training dataset during the training phase itself.
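A sketch of how this looks with the Keras API; the optimizer and epoch count are assumptions, while the 10% validation split matches the setup described above.

# The CNN expects a channel dimension, so reshape the 28x28 images
X_train = train_images.reshape(-1, 28, 28, 1)

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # labels are integers 0-9
              metrics=['accuracy'])

history = model.fit(X_train, train_labels,
                    epochs=10,
                    validation_split=0.1)  # hold out 10% of training data for validation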

First, I tried a CNN model without any regularization technique applied to it; the table below shows the training and cross-validation (validation) results for the same.

Deep learning models have a high tendency to overfit (perform much better on the training dataset than on the validation/test dataset), and the same can be seen in the results below.

Results of CNN model without applying Regularization

Results for CNN Model without applying regularization

A training accuracy of 99% and a test accuracy of 92% confirm that the model is overfitting.

To solve the overfitting issue, I applied a regularization technique called ‘Dropout’ and also introduced a few more max-pooling layers. The architecture diagram for this CNN model is shown above (under the section ‘CNN Model Architecture’). Below are the results for the same.

Results of the CNN model with regularization (dropout) applied

Results for CNN Model with regularization (dropout) applied

A training accuracy of 94% and a test accuracy of 93% confirm that the model is performing well and there is no overfitting. Thus, this is our final CNN model.

Please note that dropout should only be applied during the training phase and not during the test phase. For the test phase, the TensorFlow Keras API (the evaluate() method) takes care of this automatically; we needn’t specify any parameter to disable dropout.
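For example, continuing the sketch above, evaluate() runs the model in inference mode, with the dropout layers switched off automatically:

X_test = test_images.reshape(-1, 28, 28, 1)

# evaluate() runs in inference mode: dropout is not applied here
test_loss, test_acc = model.evaluate(X_test, test_labels)
print("Test accuracy:", test_acc)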

Below are the plots of ‘Accuracy’ and ‘Loss’ for the training and validation (test) phases of this final CNN model.

‘Accuracy’ plot for CNN Model for training and validation (test) datasets

‘Loss’ plot for CNN Model for training and validation (test) datasets

Let us make a prediction for one of the instances/images (say instance no. 88) using our final CNN model (new_model).
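A sketch of what this prediction looks like; new_model is the final trained model, and X_test and test_labels come from the earlier sketches.

import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

probabilities = new_model.predict(X_test[88:89])  # shape (1, 10): one probability per class
predicted_label = int(np.argmax(probabilities, axis=1)[0])

print(class_names[predicted_label])   # 'T-shirt/top'
print(class_names[test_labels[88]])   # actual label, also 'T-shirt/top'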

Thus, we see that our CNN model has predicted it right: the actual image at instance no. 88 is a T-shirt, and the predicted value confirms the same (test_labels[88] = 0, i.e. T-shirt/top).

Conclusion

With our final CNN model, we could achieve a training accuracy of 94% and a test accuracy of 93%, confirming that the model performs well with no overfitting.

If you remember, with the Machine Learning model (XGBoost) I had achieved a test accuracy of 84.72%, while with the Deep Learning model (CNN) here I could achieve a test accuracy of 93%. Thus, we got an improvement of around 8 percentage points by using Deep Learning.

Though we got a good improvement in the accuracy score here (around 8 percentage points), there may still be room to improve performance further, for example by increasing the number of convolutional layers (and neurons/filters) or by trying out different combinations of layers.

You can find the complete code for this project here: https://github.com/cloudxlab/ml/tree/master/projects/Fashion-MNIST

For the complete course on Machine Learning, please visit the Specialization Course on Machine Learning & Deep Learning.


Things to Consider While Managing Machine Learning Projects

Generally, Machine Learning (or Deep Learning) projects are quite unique and differ from traditional web application projects due to the inherent complexity involved in them.

The goal of this article is not to go through the full project management life cycle, but to discuss a few complexities and finer points which may impact the different project management phases and aspects of a Machine Learning (or Deep Learning) project, and which should be taken care of to avoid any surprises later.

Below is a quick ready reckoner for the topics that we will be discussing in this article.

Project Management – ML Project

The term ‘Machine Learning’ in this article covers both ‘Machine Learning’ and ‘Deep Learning’.

Continue reading “Things to Consider While Managing Machine Learning Projects”

Conference on Computer Vision at Google Asia, Singapore

Deep learning algorithms and frameworks have changed the approach to computer vision entirely. With recent developments in computer vision based on Convolutional Neural Networks, such as YOLO, a new era has begun. It will open doors to new industries as well as personal applications.

After the successful bootcamps held at IIT Bombay, NUS Singapore, RV College of Engineering, etc., CloudxLab, in collaboration with IoTSG and Google Asia, conducted a successful conference on Understanding Computer Vision with AI using TensorFlow on May 11, 2019, at the Google Asia office in Singapore.

Continue reading “Conference on Computer Vision at Google Asia, Singapore”

Creating AI Based Cameraman

Whenever we have live talks at CloudxLab, in presentations or at a conference, we want to live stream and record them. The main challenge is that the presenter goes out of focus as they move around, and hiring a cameraman for a three-hour session is not a viable option for us. So, we thought of creating an AI-based pan-and-tilt platform which keeps the camera focused on the speaker.

So, here are the step-by-step instructions to create such a camera, along with the code needed.

Continue reading “Creating AI Based Cameraman”

Regression with Neural Networks using TensorFlow Keras API

As part of this blog post, I am going to walk you through how an Artificial Neural Network figures out a complex relationship in data by itself, without much hand-holding from us. You should modify the data generation function and observe whether it is able to predict the result correctly. I am going to use the Keras API of TensorFlow, which makes it really easy to create Deep Learning models.

Machine learning is about the computer figuring out relationships in data by itself, as opposed to programmers figuring them out and writing code/rules. Machine learning is generally categorized into two types: supervised and unsupervised. In supervised learning, supervision (i.e. labelled data) is available, and it is further classified into regression and classification. In classification, we have training data with features and labels, and the machine should learn from this training data how to label a record. In regression, the computer/machine should be able to predict a value, mostly numeric. An example of regression is predicting the salary of a person based on various attributes: age, years of experience, domain of expertise, gender.

The notebook with all the code is available on GitHub as part of the cloudxlab repository, at deep_learning/tensorflow_keras_regression.ipynb. I am going to walk you through the code from this notebook here.

Generate Data: Here we are going to generate some data using our own function. This function is non-linear, so a usual line fitting may not work for it.
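For instance, a hypothetical non-linear generating function might look like this (the actual function used in the notebook may differ):

import numpy as np

def generate_data(n=1000):
    # A noisy quadratic: a straight line cannot fit this relationship well
    x = np.random.uniform(-10, 10, size=(n, 1))
    y = 2 * x**2 + 3 * x + 5 + np.random.normal(0, 4, size=(n, 1))
    return x, y

X, y = generate_data()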

Continue reading “Regression with Neural Networks using TensorFlow Keras API”

Deploying Machine Learning model in production

In this article, I am going to explain the steps to deploy a trained and tested Machine Learning model in a production environment.

Though this article talks about a Machine Learning model, the same steps apply to a Deep Learning model too.

Below is a typical setup for the deployment of a Machine Learning model, the details of which we will be discussing in this article.

Process to build and deploy a REST service (for ML model) in production

The complete code for creating a REST service for your Machine Learning model can be found at the link below:

https://github.com/cloudxlab/ml/tree/master/projects/deploy_mnist

Let us say you have a trained, fine-tuned and tested Machine Learning (ML) model – sgd_clf – built using an SGD classifier on the MNIST dataset. Now you want to deploy it in production so that consumers of this model can use it. What options do you have to deploy your ML model in production?
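As one common option, here is a minimal sketch of wrapping such a model in a Flask REST service; the file name, endpoint and payload shape are illustrative assumptions, not the repository’s exact API.

import joblib
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load('sgd_clf.pkl')  # assumes the trained model was saved with joblib

@app.route('/predict', methods=['POST'])
def predict():
    # Expect a JSON body like {"image": [784 pixel values]}
    pixels = np.array(request.get_json()['image']).reshape(1, -1)
    prediction = model.predict(pixels)
    return jsonify({'digit': int(prediction[0])})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)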

Continue reading “Deploying Machine Learning model in production”

One-on-one discussion on Gradient Descent

Usually, learners from our classes schedule 1-on-1 discussions with the mentors to clarify their doubts. So, we thought of sharing the video of one of these 1-on-1 discussions, which one of our CloudxLab learners – Leo – had with Sandeep last week.

Below are the questions from the same discussion.

You can go through the detailed discussion which happened around these questions, in the attached video below.

One-on-one discussion with Sandeep on Gradient Descent
Continue reading “One-on-one discussion on Gradient Descent”

How To Optimise A Neural Network?

When we are solving an industry problem involving neural networks, we very often end up with bad performance. Here are some suggestions on what should be done in order to improve it.

Is your model underfitting or overfitting?

You must break the input data set into two parts – training and test. The general practice is to use 80% for training and 20% for testing.

You should train your neural network with the training set and test it with the test set. This sounds like common sense, but we often skip it.

Compare the performance (MSE in the case of regression; accuracy/F1/recall/precision in the case of classification) of your model on the training set and on the test set.

If it performs badly on both the training and test sets, it is underfitting; if it performs great on the training set but not on the test set, it is overfitting.
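A minimal sketch of this check using scikit-learn’s train_test_split, with a generic classifier standing in for your model (X and y are placeholders for your features and labels):

from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# 80:20 split of the input data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = SGDClassifier()
model.fit(X_train, y_train)

# Bad on both -> underfitting; great on train but bad on test -> overfitting
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy:", model.score(X_test, y_test))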

Continue reading “How To Optimise A Neural Network?”