Fashion-MNIST using Machine Learning

One of the classic problem that has been used in the Machine Learning world for quite sometime is the MNIST problem. The objective is to identify the digit based on image. But MNIST is not very great problem because we come up with great accuracy even if we are looking at few pixels in the image. So, another common example problem against which we test algorithms is Fashion-MNIST.

The complete code for this project you can find here :

Fashion-MNIST is a dataset of Zalando’s fashion article images —consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each instance is a 28×28 grayscale image, associated with a label.

Fashion MNIST dataset images
Fashion-MNIST dataset images


This work is part of my experiments with Fashion-MNIST dataset using various Machine Learning algorithms/models. The objective is to identify (predict) different fashion products from the given images using various best possible Machine Learning Models (Algorithms) and compare their results (performance measures/scores) to arrive at the best ML model. I have also experimented with ‘dimensionality reduction’ technique for this problem.


I have used Fashion-MNIST dataset for this experiment with Machine Learning. Fashion-MNIST dataset is a collection of fashion articles images provided by Zalando . Thanks to Zalando Research for hosting the dataset.

Understanding and Analysing the dataset

Fashion MNIST Training dataset consists of 60,000 images and each image has 784 features (i.e. 28×28 pixels). Each pixel is a value from 0 to 255, describing the pixel intensity. 0 for white and 255 for black.

The class labels for Fashion MNIST are:

9Ankle boot

Let us have a look at one instance (an article image) of the training dataset.

A sample instance(image) from Fashion-MNIST dataset.
sample image Fashion-MNIST

Problem Definition

The ‘target’ dataset has 10 class labels, as we can see from above (0 – T-shirt/top, 1 – Trouser,,….9 – Ankle Boot).

Given the images of the articles, we need to classify them into one of these classes, hence, it is essentially a ‘Multi-class Classification’ problem.

We will be using various Classifiers and comparing their results/scores.

Preparing the Data

We already have the splitted dataset (training and test) available in ratio 85:15 (60,000:10,000) from Zalando Research, we will use the same for this experiment.

As part of data preparation, following techniques were applied on the dataset:

      • Shuffling
      • Feature Scaling

We shuffle the training dataset to get uniform samples for cross-validation. This also  ensures that we don’t miss out any article in a cross-validation fold.

Each image (instance) in the dataset has 784 pixels (features) and value of each feature(pixel) ranges from 0 to 255, this range is too wide, hence we have performed feature scaling (using Standardization) on the training dataset, so that values of all features (pixels) are in a small range.

The scaled dataset is created using the below formula:

This essentially means that we calculate how many standard deviation away is each point from the mean.

Please note that scaling is not needed for Decision Tree based ML algorithms which also includes Random Forest and XGBoost.

Training various ML Models

Based on the problem type (multi-class classification), various relevant classifiers were used to train the model.

Training dataset was trained on following ML Algorithms:

      1. SGD Classifier
      2. Softmax Regression
      3. Decision Tree Classifier
      4. Random Forest Classifier
      5. Ensemble with soft voting (with Softmax Regression and Random Forest Classifier)
      6. XGBoost Classifier

Since, accuracy score is not relevant for skewed dataset, and in future we may get skewed dataset, I decided to go for other scores too like – Precision, Recall, F1 score.

Validating the Training Results

During the training phase, I validated the results using k-fold cross-validation. Generally, 10 folds (cv=10) are used, but,  in this case, I used only 3 folds (cv=3), just to reduce the cross-validation time.

Below are the results for various ML models (algorithms) from cross-validation:

Fashion MNIST training dataset results for various ML algorithms
F-MNIST training

Standard deviation in all cases was around 0.0020, except for XGBoost (standard deviation – 0.00063).

Cross-validation was used to find the proper score of each model, and also to ensure that the model is not overfitting or underfitting.

From the cross-validation results above, we see that XGBoost model performs best in the training phase, hence, it was selected for the next steps of this problem.

Fine-Tuning the Selected Model

For fine-tuning of the selected XGBoost model, I used Grid Search technique.

Grid search process on training dataset (with 784 features) was taking a lot of time, hence I decided to go for Dimensionality Reduction (to get the reduced training dataset) to reduce the grid search and prediction time.

For dimensionality reduction, since it’s not Swiss-roll kind of data, I decided to go for PCA for this dataset.

I tried various values of variance ratio (0.95, 0.97, 0.99), but, since, 0.99 variance ratio gave enough number of features(459 features out of original 784 features) without losing any significant information (quality), I selected variance ratio of 0.99for performing the dimensionality reduction on the dataset.

Grid search was performed on the selected XGBoost model, on reduced training dataset with below set of hyperparameters:

Below are the grid search results:

Best Parameters:

Best Estimator XGBClassifier with following parameter values:

Evaluating the Final Model on Test Dataset

I have used the final model, that we got from grid search (best estimator) above, for predictions on the reduced test dataset (got after applying dimensionality reduction with 0.99 variance ratio).

Following are the results for the test dataset:

Fashion MNIST test dataset results
F-MNIST test

Below are the Results from training dataset that we got earlier:

Fashion MNIST training dataset results for final model
F-MNIST training final


Difference between results (say accuracy score) of test dataset and training dataset is

As we see, the difference between scores of test dataset and training dataset is very low (3.3%), hence, the conclusion is that our final model (XGBoost) is good enough, and it doesn’t have any overfitting or underfitting.

Although XGBoost (with n_estimators=20 and max_depth = 10) is good enough, there may be a chance to improve this model further, by say, increasing the number of estimators and trying out some more hyperparameters.

As we see above, Ensemble also has given good results, we can try Ensemble with some more models and with some more hyperparameters to improve the results further.

This experiment was limited to Machine Learning algorithms. You can try Deep Learning techniques (say CNN, etc.) to improve the results further.

I didn’t try SVM for this problem, because for larger dataset, SVM (kernel=poly) takes a long time to train.

Again, for your reference, the complete code for this project you can find here :

For the complete course on Machine Learning, please visit Specialization Course on Machine Learning & Deep Learning

One-on-one discussion on Gradient Descent

Usually, the learners from our classes schedule 1-on-1 discussions with the mentors to clarify their doubts. So, thought of sharing the video of one of these 1-on-1 discussions that one of our CloudxLab learner – Leo – had with Sandeep last week.

Below are the questions from the same discussion.

You can go through the detailed discussion which happened around these questions, in the attached video below.

One-on-one discussion with Sandeep on Gradient Descent

Q.1. In the Life Cycle of Node Value chapter, please explain me the below line of code. What are zz_v, z_v and y_v values ?

Ans.1. The complete code looks like below:[zz, z, y]) expression above is basically evaluating zz, z and y, and is returning their evaluated values, which we are storing in variables zz_v, z_v and y_v variables respectively.  Basically, the the run function of session object returns the same data type as passed in the first argument. Here, the run method is returning an array of values.

TensorFlow is a Python library and in Python, we can return multiple values from a function in the form of a tuple. Here is a simple example:

Same thing is happening here, is returning multiple values which are being stored in variables zz_v, z_v and y_v respectively.

So, the evaluated value of zz is stored in variable zz_v, evaluated value of variable z in z_v and evaluated value of variable y in variable y_z.

Q.2. In Linear Regression chapter, we are using housing price dataset. How do we know that the model for housing price dataset is a linear equation ? Is it just an assumption ?

Ans.2. Yes, it is an assumption that model for housing price dataset is a linear equation.

We can use linear equation for a non-linear problem also.

We convert most of the non-linear problems into a linear problem by using polynomial features.

Even Polynomial Regression problem is solved using Linear Regression by converting a non-linear problem to linear problem by adding polynomial features.

Suppose your equation is

where x1 and x2 are polynomial features and

ϴ0, ϴ1, ϴ2, …. etc are weights or also called coefficients.

In Linear Regression, when Gradient Descent is applied on this equation, weights ϴ1 and ϴ3 will go down to 0 (zero) and weight ϴ2 will become bigger. Hence, at the end of Gradient Descent, our above equation will look like below i.e. we get a non-linear equation

Q.3. Equations of Gradient and Gradient Descent, I don’t understand them

Equation for Gradient for Linear Regression


The below equation is for calculating the Gradient

Equation for Gradient for Linear Regression

MSE is ‘Mean Squared Error’

m is total number of instances.

X dataset is a matrix with ‘n’ columns (features) and ‘m’ rows (instances).

y is a vector (containing  actual values of label) with ‘m’ rows and 1 column

y^ is a also a vector (containing predicted values of label) with ‘m’ rows and 1 column

Therefore, we get,

Equation for Gradient for Linear Regression

Below equation is for calculating the Gradient Descent

Gradient Descent Equation for Linear Regression

η  is the learning rate here.

ϴ is an array of theta values.

 is also an array of values, and is called the Gradient or the rate of change of error (E).

If the Gradient increases, we need to decrease the ϴ, and if the Gradient decreases, we need to increase the ϴ. Eventually, we need to move towards making the Gradient equal to 0 (zero) or nearly 0.

Q.4. In Gradient Descent, what we can do to avoid getting stuck in local minima ?

Ans.4. You can use Stochastic Gradient Descent to avoid getting stuck in local minima.

You can find more details about this in our Machine Learning course.

For the complete course on Machine Learning, please visit Specialization Course on Machine Learning & Deep Learning

Use-cases of Machine Learning in E-Commerce

What computing did to the usual industry earlier, Machine Learning is doing the same to usual rule-based computing now. It is eating the market of the same. Earlier, in organizations, there used to be separate groups for Image Processing, Audio Processing, Analytics and Predictions. Now, these groups are merged because machine learning is basically overlapping with every domain of computing. Let us discuss how machine learning is impacting e-commerce in particular.

The first use case of Machine Learning that became really popular was Amazon Recommendations. Afterwards, the Netflix launched a challenge of Movie Recommendations which gave birth to Kaggle, now an online platform of various machine learning challenges.

Before I dive deep into the details further, lets quickly brief the terms that are found often confusing. AI stands for Artificial Intelligence which means being able to display human-like intelligence. AI is basically an objective. Machine learning is making computers learn based on historical or empirical data instead of explicitly writing the rules. Artificial Neural networks are the computing constructs designed on a similar structure like the animal brain. Deep Learning is a branch of machine learning where we use a complex Artificial Neural network for predictions.

Continue reading “Use-cases of Machine Learning in E-Commerce”

Top Machine Learning Interview Questions for 2018 (Part-1)


These Machine Learning Interview Questions, are the real questions that are asked in the top interviews.

For hiring machine learning engineers or data scientists, the typical process has multiple rounds.

  1. A basic screening round – The objective is to check the minimum fitness in this round.
  2. Algorithm Design Round – Some companies have this round but most don’t. This involves checking the coding / algorithmic skills of the interviewee.
  3. ML Case Study – In this round, you are given a case study problem of machine learning on the lines of Kaggle. You have to solve it in an hour.
  4. Bar Raiser / Hiring Manager  – This interview is generally with the most senior person in the team or a very senior person from another team (at Amazon it is called Bar raiser round) who will check if the candidate fits in the company-wide technical capabilities. This is generally the last round.

Continue reading “Top Machine Learning Interview Questions for 2018 (Part-1)”

Scholarship Test for Machine Learning Course

After receiving a huge response in our last scholarship test, we are once again back with a basic conceptual test to attain scholarship for our upcoming Specialization course on Machine Learning and Deep Learning.

Concepts to be tested: Linear algebra, probability theory, statistics, multivariable calculus, algorithms and complexity, aptitude and Data Interpretation.

  • Date and Time: September 2, 2018, 8:00 am PDT (8:30 pm IST)
  • Type: objective (MCQ)
  • Number of questions: 25
  • Duration: 90 minutes
  • Mode: Online

If you have a good aptitude and general problem-solving skills, this test is for you. So, go ahead and earn what you deserve.

If you have any questions on the test or if anything else comes up, just click here to let us know. We’re always happy to help.


How To Optimise A Neural Network?

When we are solving an industry problem involving neural networks, very often we end up with bad performance. Here are some suggestions on what should be done in order to improve the performance.

Is your model underfitting or overfitting?

You must break down the input data set into two parts – training and test. The general practice is to have 80% for training and 20% for testing.

You should train your neural network with the training set and test with the testing set. This sounds like common sense but we often skip it.

Compare the performance (MSE in case of regression and accuracy/f1/recall/precision in case of classification) of your model with the training set and with the test set.

If it is performing badly for both test and training it is underfitting and if it is performing great for the training set but not test set, it is overfitting.

Continue reading “How To Optimise A Neural Network?”

Machine Learning with Mahout

[This blog is from It is pretty old. Many things have changed since then. People have moved to MLLib. We have also moved to]

What is Machine Learning?

Machine Learning is programming computers to optimize a Performance using example data or past experience, it is a branch of Artificial Intelligence.

Types of Machine Learning

Machine learning is broadly categorized into three buckets:

  • Supervised Learning – Using Labeled training data, to create a classifier that can predict the output for unseen inputs.
  • Unsupervised Learning – Using Unlabeled training data to create a function that can predict the output.
  • Semi-Supervised Learning – Make use of unlabeled data for training – typically a small amount of labeled data with a large amount of unlabeled data.

Machine Learning Applications

  • Recommend Friends, Dates, Products to end-user.
  • Classify content into pre-defined groups.
  • Find Similar content based on Object Properties.
  • Identify key topics in large Collections of Text.
  • Detect Anomalies within given data.
  • Ranking Search Results with User Feedback Learning.
  • Classifying DNA sequences.
  • Sentiment Analysis/ Opinion Mining
  • Computer Vision.
  • Natural Language Processing,
  • BioInformatics.
  • Speech and HandWriting Recognition.


Mahout – Keeper/Driver of Elephants. Mahout is a Scalable Machine Learning Library built on Hadoop, written in Java and its Driven by Ng et al.’s paper “MapReduce for Machine Learning on Multicore”. Development of Mahout Started as a Lucene sub-project and it became Apache TLP in Apr’10.

Topics Covered

  • Introduction to Machine Learning and Mahout
  • Machine Learning- Types
  • Machine Learning- Applications
  • Machine Learning- Tools
  • Mahout – Recommendation Example
  • Mahout – Use Cases
  • Mahout Live Example
  • Mahout – Other Recommender Algos

Machine Learning with Mahout Presentation

Machine Learning with Mahout Video

AutoQuiz: Generating ‘Fill in the Blank’ Type Questions with NLP

Can a machine create quiz which is good enough for testing a person’s knowledge of a subject?

So, last Friday, we wrote a program which can create simple ‘Fill in the blank’ type questions based on any valid English text.

This program basically figures out sentences in a text and then for each sentence it would first try to delete a proper noun and if there is no proper noun, it deletes a noun.

We are using textblob which is basically a wrapper over NLTK – The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing for English written in the Python programming language.

Continue reading “AutoQuiz: Generating ‘Fill in the Blank’ Type Questions with NLP”