Deploying Machine Learning model in production

In this article, I am going to explain the steps to deploy a trained and tested Machine Learning model in a production environment.

Though this article talks about a Machine Learning model, the same steps apply to a Deep Learning model too.

Below is a typical setup for deploying a Machine Learning model, the details of which we will discuss in this article.

Process to build and deploy a REST service (for ML model) in production

The complete code for creating a REST service for your Machine Learning model can be found at the below link:

Let us say you have trained, fine-tuned and tested a Machine Learning (ML) model, sgd_clf, which was trained and tested using an SGD Classifier on the MNIST dataset. Now you want to deploy it in production so that consumers of this model can use it. What are the different options for deploying your ML model in production?

Before you start thinking about deploying your ML model in production, there are a few more tasks that need to be performed.

There may be more steps involved, depending on your specific requirements, but below are some of the main steps:

      • Packaging your ML model
      • Securing your packaged ML model
      • Planning how to serve/expose your ML model to the consumers (as a REST service, etc.)
      • If you have planned for a REST service, creating a REST API for your ML model
      • Securing your REST API
      • Deploying your REST service in production (using Docker and Kubernetes)

Packaging your ML model

Instead of saving the ML model as it is, you can package your ML model (say, as mnist) by creating a .pkl file for it with Python’s joblib library.

To create and restore the .pkl file, you can use either Python’s joblib library or its pickle library. joblib is normally used for objects that hold large amounts of data (such as large NumPy arrays); otherwise, pickle is enough. In this case, we have used the joblib library.

A .pkl file is simply a serialized pickle file. If you want, you can compress it further to save storage space, using, say, Python’s gzip library. After compression, your ML model file name will look like mnist.pkl.gz.

Securing your packaged ML model

The content of your pickle file (in this case, your ML model) could be intercepted and modified by anyone on the network if your network is not secured. Hence, it is advisable to use a secure (encrypted) network connection when exchanging the pickle file.

In addition, the pickle file can be signed with a cryptographic signature before it is stored or transmitted, and this signature can be verified before the file is restored at the receiver’s end (say, your REST API). The signature helps detect any alteration to your pickle file data. Verifying it before loading matters, because unpickling data from an untrusted source can execute arbitrary code.

A common way to compute such a signature is a keyed hash (HMAC): a cryptographic hash function computes a hash of your pickle file data together with a shared secret key, so only someone holding the key can produce a matching value. Use a modern hash function such as SHA-256 for this. SHA-1 is no longer recommended, because practical collision attacks against it have been demonstrated.
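Here is a minimal sketch of HMAC signing and verification with Python’s standard hmac, hashlib and pickle modules. The function names and the demo key are hypothetical; in practice the shared key would come from a secret store rather than from source code. The signature is checked before pickle.loads() is called, so a tampered payload is never unpickled.

```python
import hashlib
import hmac
import pickle


def sign(data: bytes, key: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of `data` under the shared secret `key`."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def dumps_signed(obj, key: bytes):
    """Pickle `obj` and return (payload, signature) to store or send together."""
    payload = pickle.dumps(obj)
    return payload, sign(payload, key)


def loads_verified(payload: bytes, signature: str, key: bytes):
    """Unpickle `payload` only if `signature` matches; otherwise raise ValueError."""
    expected = sign(payload, key)
    # compare_digest() is designed to resist timing attacks.
    if not hmac.compare_digest(expected, signature):
        raise ValueError("signature mismatch: the pickle data may have been altered")
    return pickle.loads(payload)


if __name__ == "__main__":
    key = b"demo-key-load-me-from-a-secret-store"  # hypothetical shared secret
    payload, signature = dumps_signed({"model": "mnist"}, key)
    print(loads_verified(payload, signature, key))
```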

Options to deploy your ML model in production

First option

One way to deploy your ML model is simply to save the trained and tested ML model (sgd_clf) under a proper, relevant name (e.g. mnist) in some file location on the production machine. The consumers can read (restore) this ML model file (mnist.pkl) from this file location and start using it to make predictions on their dataset.

But simply deploying your ML model file on the production machine may not be sufficient, as only the handful of consumers who have access to your production machine will be able to use it.

Second option

In most cases, the consumers of your model may not be limited to your team members who have access to your production machine. There may be consumers from different departments (and located globally) who don’t have access to your production environment. Also, if you are building the ML model to be consumed “as a service” by the public (anyone), then your consumers will not have access to your production environment either.

In this case, where the consumers don’t have access to your production environment, how do you make your deployed ML model available to them?

The answer is to expose your deployed ML model to the consumers as a service (say, a REST service or REST API).

A REST API extends the reach of your ML model to a wider audience, as it can be called from any application: mobile app, Java application, web application, .NET application, PHP/JavaScript, Python, etc.

The ‘second option’ is a good fit if your audience is located globally and you want to provide this ML model as a service to a wider audience.
Hence, in this article, we will focus our discussion on this ‘second option’: exposing your ML model as a REST service.

Tools and libraries to build REST API in Python

Most ML models are written using Python libraries nowadays, so in this article we will discuss how to expose your ML model as a REST service using Python frameworks.

One of the most common Python frameworks used to create a REST API in Python is Flask.

Flask is a lightweight micro web framework written in Python, which can be used to create small web applications in Python.

Flask is not only a framework; it also comes with a built-in web server, which can be used to develop and test small Python web applications.

Flask also has APIs/functions with which you can create a REST API (service) in Python, and to test this REST service, you can deploy it on Flask’s web server.

However, since Flask’s web server is a lightweight development server, it should not be used in a production environment to deploy your web application or REST service. It should only be used in a development environment, for developing and testing web applications or REST APIs.

For production purposes, you can deploy this REST service (written using Python and the Flask framework) on a more mature, WSGI-compliant application server such as Gunicorn or uWSGI, with Nginx as the web server.

If you are coming from Java background, then, you can think of:

      • Flask framework as something like Spring or Jersey framework – to create REST API
      • Nginx as something like Apache Tomcat – web server
      • uWSGI (or Gunicorn) as something like JBoss or WebSphere – application server
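Gunicorn can read its settings from a configuration file written in plain Python (passed with -c), so a minimal production setup can be sketched in the same language as the rest of the code. The port, timeout and log choices below are assumptions for illustration; uWSGI uses its own, non-Python configuration format instead.

```python
# gunicorn.conf.py: Gunicorn executes this file as ordinary Python.
import multiprocessing

# Listen only on localhost; Nginx (the web server) forwards public traffic here.
bind = "127.0.0.1:8000"

# Rule of thumb from the Gunicorn docs: (2 x CPU cores) + 1 worker processes.
workers = multiprocessing.cpu_count() * 2 + 1

# Allow slow predictions up to 60 seconds before a worker is killed and restarted.
timeout = 60

# "-" sends access and error logs to stdout/stderr, where Docker can collect them.
accesslog = "-"
errorlog = "-"
```

Assuming your Flask object is named app inside a hypothetical app.py, you would start it with gunicorn -c gunicorn.conf.py app:app and point an Nginx proxy_pass directive at http://127.0.0.1:8000.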

Process to build and deploy a REST service (for ML model) in production

Building (and testing) your REST API (service) using Flask framework

In your ML model training Python code, you can save your trained and tested ML model (say, sgd_clf) under a proper file name, in a file location on your production application server, using Python’s joblib library.
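The original listing is not reproduced here, so the following is a minimal sketch assuming scikit-learn and joblib are installed. Because MNIST has to be downloaded, it trains the SGDClassifier on scikit-learn’s small bundled digits dataset (8x8 images, i.e. 64 features, rather than MNIST’s 28x28), and the helper names train_stand_in_model(), save_model() and load_model() are hypothetical.

```python
import os

import joblib
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier


def train_stand_in_model():
    """Train an SGDClassifier on scikit-learn's bundled 8x8 digits (MNIST stand-in)."""
    X, y = load_digits(return_X_y=True)
    sgd_clf = SGDClassifier(random_state=42)
    sgd_clf.fit(X, y)
    return sgd_clf, X


def save_model(model, directory="trained_models", filename="mnist_model.pkl"):
    """Persist `model` with joblib and return the path it was written to."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    joblib.dump(model, path)
    return path


def load_model(path):
    """Restore a model saved by save_model(); only load files you trust."""
    return joblib.load(path)
```

A typical sequence is sgd_clf, X = train_stand_in_model(), then path = save_model(sgd_clf) in the training script, and model = load_model(path) inside the REST API. With the real MNIST-trained sgd_clf, you would skip the stand-in training and pass that model to save_model() directly.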

Here, the ML model sgd_clf is saved with the file name mnist_model.pkl in the file location ‘trained_models/’ on the production machine. You can retrieve (restore) this ML model inside your REST API code to make predictions on input digit images.

Next, we create a REST API, predict_image(), in Python using the Flask framework. It predicts the digit in a given image containing a handwritten digit.

Currently, this REST API just takes an image file (‘file’), sent in the request, as input.

The REST API code loads (retrieves) the saved ML model mnist_model.pkl from the file location ‘../trained_models/’ using the joblib library and stores it in a variable called model. (Older scikit-learn versions bundled joblib as sklearn.externals.joblib; that copy has since been removed, so import joblib directly.)

We can use this model object to make the ML predictions using its predict() function.

In Python, we use decorators (somewhat like annotations in Java/Spring) to modify the behaviour of a function or a class.

We define a decorator ( @app.route(…) ) for the predict_image() function, which says that any ‘POST’ request whose URL matches the ‘/predict’ pattern should be routed to this function, predict_image().
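To make the idea concrete, here is a toy, framework-free decorator that records which function handles which (method, path) pair. It only imitates what @app.route does conceptually; the names route, dispatch and ROUTES are hypothetical, and real Flask does much more (URL variables, and answering 405 Method Not Allowed rather than 404 when the path exists but the method does not match).

```python
from typing import Callable, Dict, Tuple

# Maps (HTTP method, path) to the function that handles it.
ROUTES: Dict[Tuple[str, str], Callable] = {}


def route(path: str, methods=("GET",)):
    """Decorator factory: register the decorated function for `path`."""
    def decorator(func: Callable) -> Callable:
        for method in methods:
            ROUTES[(method.upper(), path)] = func
        return func  # the function itself is returned unchanged
    return decorator


def dispatch(method: str, path: str, *args, **kwargs):
    """Look up the handler registered for (method, path) and call it."""
    handler = ROUTES.get((method.upper(), path))
    if handler is None:
        return 404, "Not Found"
    return 200, handler(*args, **kwargs)


@route("/predict", methods=["POST"])
def predict_image():
    return {"digit": 7}  # placeholder result; no real model involved
```

Calling dispatch("POST", "/predict") returns (200, {'digit': 7}), while dispatch("GET", "/predict") returns (404, 'Not Found'), because only POST was registered for that path.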

Thus, we are defining the ‘End Point’ for our REST API (predict_image()). The ‘End Point URL’ for your REST API will be your server’s address followed by the /predict path, i.e. of the form http://<host>:<port>/predict.

The mnist ML model was trained on grayscale images, so an input image must be converted to grayscale too. To do this, we can use the convert() method of the Image class in Python’s Pillow library.

Don’t forget to pass the value ‘L’ (Pillow’s 8-bit grayscale mode) to the convert() function.
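A minimal preprocessing sketch, assuming Pillow and NumPy are installed and that the model expects each image flattened into one row of 28 x 28 = 784 pixel values, as an sgd_clf trained on 784-feature MNIST rows would. Pillow documents the ‘L’ conversion as the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000. The function name to_mnist_vector() is hypothetical.

```python
import numpy as np
from PIL import Image


def to_mnist_vector(img: Image.Image) -> np.ndarray:
    """Return `img` as a 1 x 784 float array of 8-bit grayscale values."""
    gray = img.convert("L")        # 'L': one 0-255 luminance value per pixel
    small = gray.resize((28, 28))  # MNIST images are 28 x 28 pixels
    return np.asarray(small, dtype=np.float64).reshape(1, 28 * 28)
```

For a pure red pixel (255, 0, 0), the formula gives 255 * 0.299, about 76, which is the value Pillow produces. MNIST digits are light strokes on a dark background, so photos of dark ink on white paper may also need inverting (for example with PIL.ImageOps.invert) before prediction; this sketch does not do that.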

Now, we can make a prediction on this input image by calling the model object’s predict() function.

We use Flask’s jsonify() function to convert the ML model’s result into a JSON response. Our REST API predict_image() returns its result in JSON format.

Your calling application can then read the result from this JSON object (‘digit’: result).
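Putting the loading, routing, grayscale conversion, prediction and jsonify() steps together, a minimal sketch of the endpoint might look like the following. It assumes Flask, Pillow and NumPy are installed and that the image arrives as a multipart upload under the key ‘file’; the create_app() factory is a hypothetical structure chosen so the model can be passed in rather than loaded at import time.

```python
import numpy as np
from flask import Flask, jsonify, request
from PIL import Image


def create_app(model):
    """Build a Flask app whose /predict endpoint classifies an uploaded digit image.

    `model` is any object with a scikit-learn style predict() method, e.g. the
    classifier restored with joblib.load().
    """
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict_image():
        # Assumption: the image is uploaded as multipart form data under "file".
        if "file" not in request.files:
            return jsonify({"error": "no file uploaded"}), 400
        image = Image.open(request.files["file"].stream)
        # Grayscale ('L'), 28x28, flattened to one row of 784 features, as in MNIST.
        pixels = np.asarray(image.convert("L").resize((28, 28)), dtype=np.float64)
        result = model.predict(pixels.reshape(1, 28 * 28))
        # int() turns the NumPy scalar into a plain int that jsonify() can serialize.
        return jsonify({"digit": int(result[0])})

    return app
```

In production you would pass in the real model, e.g. app = create_app(joblib.load('../trained_models/mnist_model.pkl')), and serve app with Gunicorn or uWSGI behind Nginx. A request could then be sent with curl -F "file=@digit.png" http://<host>:<port>/predict, where digit.png is a hypothetical image file.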

This REST API can be called from any application: mobile app, Java application, web application, .NET application, PHP/JavaScript, Python, etc. The only information your consumers need about this REST API is its ‘end point URL’ and the credentials to access it.

Using a web browser to test a web service isn’t the best idea, since web browsers can’t easily generate all types of HTTP requests (for example, a POST request with a file upload). Instead, use the Unix ‘curl’ command to test your REST API.

Some more work remains after you have created and deployed your REST API: you need to secure it. Not all REST API functions may need to be secured, though.

You can serve your REST service over HTTPS (https://…) to encrypt all communications and make them more secure.

You can secure your REST service using a Flask extension called flask-httpauth. By applying the @auth.login_required decorator to your REST API functions, you can specify which functions (APIs) in the REST service are secured (protected).

You can use ‘token’ based authentication for requests. In this type of authentication, the client application calling the REST API sends its credentials to the REST service with its first request and gets back a token, which it must send to the REST service with all its future requests. Tokens must be hard to guess or forge, so generate them with a cryptographically secure random generator, or sign them with a keyed hash (HMAC) using a modern algorithm such as SHA-256; a plain SHA-1 hash is not recommended.
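One way to build such tokens with Python’s standard library is sketched below, assuming a stateless design: the token carries the username and an expiry time, plus an HMAC-SHA256 signature made with a server-side secret, so the service can check it without a database lookup. The token format and the names SECRET_KEY, issue_token() and verify_token() are hypothetical; in practice you might prefer an established library such as itsdangerous or a JWT implementation.

```python
import base64
import hashlib
import hmac
import secrets
import time

# Hypothetical server-side secret. It is regenerated on every start, so tokens
# would not survive a restart or work across replicas; load a shared key from
# configuration in a real deployment.
SECRET_KEY = secrets.token_bytes(32)


def _sign(message: bytes) -> bytes:
    """HMAC-SHA256 of `message` under SECRET_KEY."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()


def issue_token(username: str, ttl_seconds: int = 3600) -> str:
    """Return a token of the form '<username>:<expiry>:<signature>'."""
    expires = str(int(time.time()) + ttl_seconds)
    signature = _sign(f"{username}:{expires}".encode())
    return f"{username}:{expires}:" + base64.urlsafe_b64encode(signature).decode()


def verify_token(token: str):
    """Return the username if the token is authentic and unexpired, else None."""
    try:
        username, expires, sig_b64 = token.rsplit(":", 2)
        signature = base64.urlsafe_b64decode(sig_b64)
        expires_at = int(expires)
    except ValueError:  # also covers binascii.Error from malformed base64
        return None
    expected = _sign(f"{username}:{expires}".encode())
    if not hmac.compare_digest(expected, signature):
        return None  # forged or altered token
    if time.time() > expires_at:
        return None  # expired token
    return username
```

The client calls issue_token() indirectly by logging in once, then sends the returned string with every request; the service only needs verify_token() to decide whether to serve it.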

Also, requests to these protected REST API functions may fail with an unauthorized access error, which needs to be handled in the code.

In your REST API code file, you can write a separate function to handle these authorization errors. To specify which function in your REST API code file handles these errors, use the @auth.error_handler decorator on top of that function.

Here, unauthorized() is the function that will be invoked if your protected REST API function throws an unauthorized access error. You can return a relevant error message from this function as a response.

There are many other security implementations available that you can use according to the level of your security requirements. The main idea is to secure your REST API.

Achieving Scalability and Fault Tolerance for your deployed ML model

When you are creating the deployment plan for your model, you may need to consider two important aspects – Scalability and Fault Tolerance.

To achieve fault tolerance and scalability for your REST service, you can use Docker and Kubernetes for the deployment.

Using Docker, you can package your REST service code/application as a Docker image and run it in Docker containers. For this, you need to create a Dockerfile for your REST service application (code), specifying the following:

      • the OS (operating system) your application needs, say CentOS 7, Ubuntu 16.04 LTS, etc.
      • the required libraries and servers (or base images that provide them), such as Python, Flask, Nginx, uWSGI
      • the working directory of your application
      • the pip install command to install all the libraries and packages your code depends on (e.g. Flask, Pillow, etc.), as listed in your requirements.txt file.
      • Python command to create package file for your REST API application/code

You can use the Dockerfile created above to build a Docker image with the ‘docker build’ command, and then start Docker containers from that image with ‘docker run’.

Now, after packaging your REST service as a Docker image, you can deploy and run its containers on any machine or VM (virtual machine) that has Docker installed.

Docker enables you to create new Docker containers (for your REST service). And Kubernetes can be used to deploy these Docker containers on a Kubernetes cluster.

Since you can spin up (create) new Docker containers on the fly (say, when an existing container goes down or when you need more containers because user requests have increased), this gives you the required ‘fault tolerance’.

A Kubernetes cluster (a cluster of machines) provides the required ‘scalability’ for your REST service: it can spin up new Docker containers from the Docker image based on demand. The Kubernetes master node (control plane) also takes care of ‘fault tolerance’: if a container goes down, it starts another one.
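When demand-based scaling is configured through a HorizontalPodAutoscaler, the Kubernetes documentation gives its core rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change made when that ratio is within a tolerance (0.1 by default) of 1. The function below is a simplified sketch of that rule only; it ignores stabilization windows, pod readiness and missing metrics, and its min/max defaults are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count suggested by the Horizontal Pod Autoscaler's core formula."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no scaling action
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured minimum and maximum replica counts.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 containers averaging 90% CPU against a 50% target give ceil(3 * 1.8) = 6 replicas, while 6 containers at 20% scale down to ceil(6 * 0.4) = 3.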

Kubernetes also has a load-balancing mechanism (a Kubernetes Service), which distributes the load (user requests) across the containers.

Deployment architecture for REST service using Nginx, uWSGI, Flask, Docker and Kubernetes

Kubernetes, Spark MLLib and TensorFlow Distributed

Since we are discussing clusters (of machines) and Kubernetes, I thought I would share a short note on Kubernetes clusters and Spark clusters. Often, when we talk about Big Data and Spark MLLib, people get confused between Kubernetes clusters and Spark clusters. When should you use which cluster?

To understand when (and why) we use Kubernetes clusters and when we use Spark clusters, we need to understand the purpose of each.

Spark MLLib is basically a library of Spark, which has various Machine Learning algorithms (which are also available in Scikit Learn), customized to run on a Spark cluster i.e. using multiple machines. We use the ML algorithms from Spark MLLib library (in place of normal Scikit Learn version of ML algorithms), when our dataset is so huge that we need Big Data kind of processing to reduce the training and prediction time of our ML model.

For Deep Learning models, you can use TensorFlow Distributed instead of Spark MLLib.

Hence, in a nutshell, we use Spark MLLib on Spark cluster to reduce the training and prediction time of our ML model.

Whereas we use a Kubernetes cluster to achieve ‘scalability’ for the final trained and tested ML model when we deploy it in production.

When we get multiple requests simultaneously, Kubernetes spins new Docker containers (containing your ML model) and distributes the requests to the multiple containers to reduce the load.

Hence, after we have trained and tested our Spark MLLib Machine Learning model, using huge amount of data (Big Data), on a Spark cluster, we can package and deploy the same on a Kubernetes cluster in production.

But if your requirement is to run Spark on a Kubernetes cluster of machines, you can do so, as Spark supports this from version 2.3 onwards. You can create Docker containers for Spark nodes and deploy these Docker containers on the Kubernetes cluster (of machines). Spark (version 2.3 and above) already comes with a Dockerfile that you can use for this purpose. For more details, refer to the Spark documentation.

Case Studies

Below are a few case studies for different types of deployment modes:

    1. Building Amazon-style ‘You may also like’ recommendations:
      1. Perform ML model training on a Spark cluster.
      2. Serve predictions from the ML model in a scalable environment built using a mature application server and a Kubernetes cluster.
    2. Say, on a huge database like LinkedIn’s, we need to find profile pictures that don’t contain faces:
      1. Train a Deep Learning model using GPUs, etc.
      2. Run the predictions on a YARN cluster (Spark, EC2, etc.) or on Kubernetes.
    3. Build a small gadget that detects motion or tracks faces without any internet connectivity:
      1. Train a Deep Learning model using CPU/GPU, etc. and push it to a device (e.g. a mobile phone).
      2. Run predictions on the device itself, without any service. In this case, you need to copy the model onto the device.
    4. Build a firewall that predicts whether packets are malicious:
      1. Train a Deep Learning model using CPU/GPU, etc. and push it to the router.
      2. Run predictions on the router, without any service. You will need to copy the model onto the router for this.

Again, the complete code for creating a REST service for your Machine Learning model can be found at the link below:

For the complete course on Machine Learning, please visit Specialization Course on Machine Learning & Deep Learning

Fashion-MNIST using Machine Learning

One of the classic problems that has been used in the Machine Learning world for quite some time is the MNIST problem, whose objective is to identify a handwritten digit from its image. But MNIST is no longer a very good benchmark, because models reach high accuracy even when they look at only a few pixels of each image. So another common problem against which we test algorithms is Fashion-MNIST.

You can find the complete code for this project here:

Fashion-MNIST is a dataset of Zalando’s fashion article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each instance is a 28×28 grayscale image, associated with a label.


Top Machine Learning Interview Questions for 2018 (Part-1)


These Machine Learning interview questions are real questions asked in top interviews.

For hiring machine learning engineers or data scientists, the typical process has multiple rounds.

  1. A basic screening round – The objective of this round is to check minimum fitness.
  2. Algorithm Design Round – Some companies have this round, but most don’t. It checks the coding/algorithmic skills of the interviewee.
  3. ML Case Study – In this round, you are given a machine learning case study problem along the lines of a Kaggle competition. You have to solve it in an hour.
  4. Bar Raiser / Hiring Manager – This interview is generally with the most senior person in the team or a very senior person from another team (at Amazon, it is called the Bar Raiser round), who checks whether the candidate meets the company-wide technical bar. This is generally the last round.


Phrase matching using Apache Spark

Recently, a friend whose company is working on a large-scale project reached out to us for a solution to a simple problem: finding a list of phrases (approximately 80,000) in a huge set of rich text documents (approximately 6 million).

The problem at first looked simple. The engineers had solved it by simply loading the two datasets into Apache Spark’s DataFrames and joining them using “like”, something along these lines:

select * from phrases, docs where docs.txt like concat('%', phrases.phrase, '%')

But it took a huge amount of time even on a small subset of the data, even though the processing was done in a distributed fashion. Any guesses why?

They had also tried using Apache Spark’s broadcast mechanism on the smaller dataset, but it still took a long while to finish even a small task.


AutoQuiz: Generating ‘Fill in the Blank’ Type Questions with NLP

Can a machine create a quiz that is good enough for testing a person’s knowledge of a subject?

So, last Friday, we wrote a program that can create simple ‘Fill in the blank’ type questions based on any valid English text.

This program first splits a text into sentences. Then, for each sentence, it tries to delete a proper noun and, if there is no proper noun, it deletes a noun instead.

We are using textblob, which is a wrapper over NLTK. The Natural Language Toolkit, more commonly known as NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing of English, written in the Python programming language.
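The blanking rule described above can be sketched without NLTK’s data files by starting from sentences that are already POS-tagged, i.e. lists of (word, tag) pairs, which is the shape TextBlob’s Sentence.tags property returns. Choosing the first matching word, and the names make_blank(), PROPER_NOUN_TAGS and NOUN_TAGS, are assumptions rather than the original program; the tags follow the Penn Treebank set (NNP/NNPS for proper nouns, NN/NNS for common nouns).

```python
from typing import List, Optional, Tuple

PROPER_NOUN_TAGS = {"NNP", "NNPS"}  # Penn Treebank tags for proper nouns
NOUN_TAGS = {"NN", "NNS"}           # Penn Treebank tags for common nouns


def make_blank(tagged: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Turn one POS-tagged sentence into (question, answer).

    Blanks out the first proper noun; if there is none, the first noun.
    Returns None when the sentence contains no noun at all.
    """
    for wanted in (PROPER_NOUN_TAGS, NOUN_TAGS):
        for i, (word, tag) in enumerate(tagged):
            if tag in wanted:
                words = [w for w, _ in tagged]
                words[i] = "_____"
                return " ".join(words), word
    return None
```

For instance, the tagged sentence [("Paris", "NNP"), ("is", "VBZ"), ("the", "DT"), ("capital", "NN"), ("of", "IN"), ("France", "NNP")] becomes the question "_____ is the capital of France" with the answer "Paris". With TextBlob installed, the input could come from [s.tags for s in TextBlob(text).sentences].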


Predicting Income Level, An Analytics Casestudy in R

Percentage of Income more than 50k Country wise

1. Introduction

In this data analytics case study, we will use the US census data to build a model to predict if the income of any individual in the US is greater than or less than USD 50000 based on the information available about that individual in the census data.

The dataset used for the analysis is an extraction from the 1994 census data by Barry Becker and donated to the public site. This dataset is popularly called the “Adult” data set. We will approach this case study in the following order:

  1. Describe the data: Specifically, the predictor variables (also called independent variables or features) from the Census data and the dependent variable, which is the level of income (either “greater than USD 50000” or “less than USD 50000”).
  2. Acquire and Read the data: Download the data directly from the source and read it.
  3. Clean the data: Real-world data is always messy and noisy. The data needs to be reshaped to aid exploration and the modeling used to predict the income level.
  4. Explore the independent variables of the data: A very crucial step before modeling is exploring the independent variables. Exploration gives an analyst great insight into the predictive power of each variable: the analyst looks at the variable’s distribution, how well it predicts the income level, what skew it has, etc. In most analytics projects, the analyst then goes back to get more data, or better context or clarity, based on these findings.
  5. Build the prediction model with the training data: Since data like the Census data can have many weak predictors, for this particular case study I have chosen the non-parametric predicting algorithm of Boosting. Boosting is a classification algorithm (here we classify whether an individual’s income is “greater than USD 50000” or “less than USD 50000”) that combines many weak learners and tends to give good prediction accuracy when individual predictors are weak. Cross-validation, a mechanism that helps detect and limit overfitting while modeling, is also used with Boosting.
  6. Validate the prediction model with the testing data: Here, the built model is applied to test data that it has never seen. This estimates how accurate the model will be in the field once it is deployed. Since this is a case study, only the crucial steps are retained to keep the content concise and readable.


Building Real-Time Analytics Dashboard Using Apache Spark

Apache Spark


In this blog post, we will learn how to build a real-time analytics dashboard using Apache Spark streaming, Kafka, Node.js, Socket.IO and Highcharts.

To get the most out of this guide, complete the Spark Streaming topic on CloudxLab to refresh your Spark Streaming and Kafka concepts.

Problem Statement

An e-commerce portal wants to build a real-time analytics dashboard to visualize the number of orders shipped every minute, in order to improve the performance of its logistics.


Before working on the solution, let’s take a quick look at all the tools we will be using:

Apache Spark – A fast and general engine for large-scale data processing, which can run programs up to 100 times faster than Hadoop MapReduce in memory and up to 10 times faster on disk.

Python – Python is a widely used high-level, general-purpose, interpreted, dynamic programming language.

Kafka – A high-throughput, distributed, publish-subscribe messaging system.

Node.js – An event-driven I/O server-side JavaScript environment based on V8.

Socket.IO – Socket.IO is a JavaScript library for real-time web applications. It enables real-time, bi-directional communication between web clients and servers.

Highcharts – Interactive JavaScript charts for web pages.

CloudxLab – Provides a real cloud-based environment for practicing and learning various tools. You can start practicing right away by just signing up online.

How To Build A Data Pipeline?

Below is the high-level architecture of the data pipeline

Data Pipeline

Our real-time analytics dashboard will look like this

Real-Time Analytics Dashboard
