Deploying Machine Learning model in production

In this article, I am going to explain steps to deploy a trained and tested Machine Learning model in production environment.

Though, this article talks about Machine Learning model, the same steps apply to Deep Learning model too.

Below is a typical setup for deployment of a Machine Learning model, details of which we will be discussing in this article.

Process to build and deploy a REST service (for ML model) in production
Process to build and deploy a REST service (for ML model) in production

The complete code for creating a REST service for your Machine Learning model can be found at the below link:

Let us say, you have trained, fine-tuned and tested Machine Learning(ML) model – sgd_clf, which was trained and tested using SGD Classifier on MNIST dataset.  And now you want to deploy it in production, so that consumers of this model could use it. What are different options you have to deploy your ML model in production?

Before you start thinking about the deployment of your ML model in production, there are a few more tasks that needs to be performed.

There may be more steps involved, depending on what specific requirements you have, but below are some of the main steps:

      • Packaging your ML model
      • Securing your packaged ML model
      • Planning of how to serve/expose your ML model to the consumers (as a REST service, etc.)
      • If you have planned for a REST service, then, creating a REST API for your ML model
      • Securing your REST API
      • Deploying your REST service in production (using Docker and Kubernetes)

Packaging your ML model

Instead of saving the ML model as it is, you can package your ML model (say mnist), and create a .pkl file for it, using Python’s joblib library.

For creating and restoring the .pkl file, you can either use joblib library or pickle library of Python. You normally use joblib to save an object having large data, else, you use pickle library. Here, in this case, we have used joblib library.

.pkl file is nothing but a serialized pickle file, which if you want, you can compress it further, to save storage space, using say Python’s gzip library. After you apply compression, your ML model file name will look like – mnist.pkl.gz

Securing your packaged ML model

The content of your pickle file (in this case your ML model) could be intercepted and modified by anyone over the network if your network is not secured. Hence, it is advisable to use secured (encrypted) network connection while exchanging the pickle file.

In addition to this, the pickle file could be signed (using ‘cryptographic signature’) before storing or transmitting, and this signature can be verified before it is restored at the receiver’s end (say your REST API). Cryptographic signature helps in detecting any alterations to your pickle file data.

Cryptographic signature uses a cryptographic algorithm which generates a cryptographic hash of your pickle file data along with shared secret key. SHA-1 cryptographic algorithm is considered to be the best algorithm to create a stronger hash, hence, it is highly advisable to use it.

Options to deploy your ML model in production

First option

One way to deploy your ML model is, simply save the trained and tested ML model (sgd_clf), with a proper relevant name (e.g. mnist), in some file location on the production machine. The consumers can read (restore) this ML model file (mnist.pkl) from this file location and start using it to make predictions on their dataset.

But, simply deploying your ML model file on the production machine may not be sufficient, as only a handful of consumers who have access to your production machine, will be able to use it.

Second option

In most of the cases, the consumers of your model may not limited to your team members, who have access to your production machine. There may be consumers, who are from different departments (and located globally) who don’t have access to your production environment. Also, if you are building the ML model to be consumed “as a service” by public (anyone), then, in that case also your consumers will not have access to your production environment.

In this case, where the consumers don’t have access to your production environment, how do you make your deployed ML model available to them?

The answer is – exposing your deployed ML model to the consumers, as a service (say REST service or REST API).

REST API increases the reach of your ML model to wider audience and as it can be called from any Application – mobile app, Java application, web application, .Net application, PHP/javascript, Python etc.

The ‘second option’ option seems to be a good option, if your audience is globally located and you want to provide this ML model as a service to a wider audience.
Hence, in this article, we will be focussing our discussion around this ‘second option’ – exposing your ML model as a REST service.

Tools and libraries to build REST API in Python

Most of the ML models are written using Python libraries now a days, hence, in this article, we will discuss about how to expose your ML model as a REST service using Python frameworks.

The most common Python framework, which is used to create a REST API in Python is, Flask.

Flask is a lightweight micro web framework written in Python, which can be used to create small web applications in Python.

Flask is not only a framework, but it also has a web server, which could be used to develop and test small Python web applications.

Flask also has APIs/functions using which you can create a REST API (service) in Python, and for testing this REST service, you can deploy in on Flask web server.

However, since, Flask is a lightweight web server, it should not be used in production environment, to deploy your web application or REST service. It should only be used in development environment, for development and testing of the web applications or REST API.

For production purposes, you can deploy this REST service (written using Python and Flask framework) on a more matured WSGI protocol compliant application server like Gunicorn or uWSGI along with Nginx as the web server.

If you are coming from Java background, then, you can think of:

      • Flask framework as something like Spring or Jersey framework – to create REST API
      • Nginx as like Apache Tomcat  – web server
      • uWSGI (or Gunicorn) as like JBoss or Websphere  – application server

Process to build and deploy a REST service (for ML model) in production
Process to build and deploy a REST service (for ML model) in production

Building (and testing) your REST API (service) using Flask framework

In your ML model training Python code, you can save your trained and tested ML model (say sgd_clf), using a proper file name, on a file location of your production application server using joblib library of Python, as shown below:

Here, ML model sgd_clf is being saved with a file name mnist_model.pkl in file location ‘trained_models/’ on the production machine. This ML model you can retrieve (restore) inside your REST API code to make the predictions on the input digit images.

Below is the code in Python (using Flask framework) to create a REST API (predict_image()) which predicts a digit from a given image (containing a handwritten digit).

Currently, this REST API just takes a ‘image file name’ (‘file’) as input (sent in the request)

The below line of code loads (retrieves) the stored (saved) ML model mnist_model.pkl from the file location ‘../trained_models/’ using the Scikit Learn’s (sklearn)  library joblib and stores it in a variable called model.

We can use this model object to make the ML predictions using its predict() function.

We use something called as decorators in Python (like we have annotations in Java/Spring) to modify behaviour of a function or a class.

Below, we are defining a decorator ( @app.route(…) ) for the predict_image() function, which says that any ‘POST’ request URL which matches ‘/predict’ pattern, should be redirected to this function – predict_image().

Thus, here, we are defining the ‘End Point’ for our REST API (predict_image()). The ‘End Point URL’ for your REST API will look like as below

To convert an input image to grayscale image, which is required by mnist ML model (as the model was trained on grayscale images), we can use convert() function of Image module of Python’s Pillow library.

Please don’t forget to pass value ‘L’ value to the convert() function.

Now, we can make predictions on this input image using the model object, using its predict() function.

We have used Flask’s jsonify() function to convert our ‘result’ of the ML model to a JSON object. Our REST API predict_image() returns result in JSON format.

This JSON object (‘digit’: result), you can use to retrieve the result in your calling application.

This REST API can be called from any application – mobile app, Java application, web application, .Net application, PHP/javascript, Python, etc. The only information your consumers need about this REST API is its ‘end point URL’ and the credentials to access it.

Using a web browser to test a web service isn’t the best idea since web browsers can’t generate all types of http requests easily. Instead, use ‘curl’command of unix to test your REST API.

Still some more work needs to be done, after you have created and deployed your REST API, you need to secure it. But, not all REST API functions may need to be secured.

You can use HTTP secure server (https://….) to encrypt all the communications, to make your REST service communications more secure.

You can secure your REST service using Flask extension called flask-httpauth. Using @auth.login_required decorator with your REST API functions, you can specify which functions (APIs) in the REST service are secured (protected).

You can use a ‘token’ based authentication for requests. In this type of authentication, the client application (which is calling this REST API), for the first request, sends the credentials to the REST service, and in return gets back a token, which it needs to send to the REST service in all its future requests. You can use any of the hashing algorithms (like SHA1) to create these tokens.

Also, you may get unauthorized access error thrown from these protected REST APIs (functions), which would need to be handled in the code.

In your REST API code file, you can write a separate function to handle these authorization errors. And, to specify, which function in your REST API code file, will handle these authorization errors, you can use @auth.error_handler decorator on top of such a function, as shown below:

Here, unauthorized() is the function that will be invoked if your protected REST API function throws an unauthorized access error. You can return a relevant error message from this function as a response.

There are many other security implementations available that you can use according to the level of your security requirements. The main idea is to secure your REST API.

Achieving Scalability and Fault Tolerance for your deployed ML model

When you are creating the deployment plan for your model, you may need to consider two important aspects – Scalability and Fault Tolerance.

To achieve fault tolerance and scalability for your REST service, you can use Docker and Kubernetes for the deployment.

Using Docker application, you can package your REST service code/application in a Docker container. For this you need to create a Dockerfile for your REST service application (code), mentioning the following:

      • the OS (operating system) requirement for your application, say CentOS 7, Ubuntu 16.04 LTS, etc.
      • images of required libraries, servers like – Python, Flask, Nginx, uWSGI
      • working directory of your application
      • pip install command to install all the dependent libraries and packages (e.g. Flask, Pillow, etc.) for your code as listed in your requirements.txt file.
      • Python command to create package file for your REST API application/code

You can use the above created Dockerfile to create a Docker container using ‘docker build’ command.

Now, after packaging your REST service in a Docker container, you can deploy and run your Docker containers on any machine or the VM (virtual machine).

Docker enables you to create new Docker containers (for your REST service). And Kubernetes can be used to deploy these Docker containers on a Kubernetes cluster.

Since, you can spin (create) new Docker containers on the fly (say when an existing container goes down or you need new containers due to increase in user requests), it provides you the required ‘fault tolerance’.

Kubernetes cluster (cluster of machines) provides the required ‘scalability’ for your REST service, it can spin (create) new Docker containers based on demand, using the Docker image. Kubernetes master node which also takes care of ‘fault tolerance’, if a container goes down, it spins another Docker container.

Kubernetes also has a load balancer mechanism, which takes care of distributing the load (user requests) on the containers.

Deployment architecture for REST service using Nginx, uWSGI, Flask, Docker and Kubernetes
Deployment architecture for REST service using Nginx, uWSGI, Flask, Docker and Kubernetes

Kubernetes, Spark MLLib and TensorFlow Distributed

Since, we are discussing about clusters (of machines) and Kubernetes, thought of sharing a small note on Kubernetes clusters and Spark clusters. Often, when we talk about Big Data and Spark MLLib, people get confused between Kubernetes clusters and Spark clusters. When to use which cluster?

To understand, when (and why) we use Kubernetes clusters and when we use Spark clusters, we need to understand their purpose of use.

Spark MLLib is basically a library of Spark, which has various Machine Learning algorithms (which are also available in Scikit Learn), customized to run on a Spark cluster i.e. using multiple machines. We use the ML algorithms from Spark MLLib library (in place of normal Scikit Learn version of ML algorithms), when our dataset is so huge that we need Big Data kind of processing to reduce the training and prediction time of our ML model.

For Deep Learning models, you can use TensorFlow Distributed instead of Spark MLLib.

Hence, in a nutshell, we use Spark MLLib on Spark cluster to reduce the training and prediction time of our ML model.

Whereas, we use Kubernetes cluster is used to achieve ‘scalability’, of the finally trained and tested ML model, when we deployed it on production.

When we get multiple requests simultaneously, Kubernetes spins new Docker containers (containing your ML model) and distributes the requests to the multiple containers to reduce the load.

Hence, after we have trained and tested our Spark MLLib Machine Learning model, using huge amount of data (Big Data), on a Spark cluster, we can package and deploy the same on a Kubernetes cluster in production.

But, in case, your requirement is to run Spark on a Kubernetes cluster of machines, you can do so, as Spark (version 2.3 onwards) supports this. You can create Docker containers for Spark nodes and deploy these Docker containers on Kubernetes cluster (of machines). Spark (version 2.3 and above) already comes with a Dockerfile that you can use for this purpose. For more details on this, you can refer Spark documentation.

Case Studies

Below are a few case studies for different types of deployment modes:

    1. Building a ‘You may also like it’ Amazon recommendations:
      1. Perform ML model training on Spark Cluster
      2. Perform prediction on the ML model on a scalable environment built using a mature application server and Kubernetes cluster.
    2. Say, on  a huge database like  LinkedIn, we need to find profile pictures which don’t have faces
      1. Train a Deep Learning Model using GPU etc.
      2. Run the prediction on the yarn cluster (Spark, EC2, etc.) or Kubernetes
    3. Build a small gadget that detects motions or tracks faces without any internet connectivity
      1. Train a Deep Learning Model using CPU/GPU etc. and push it to a device (e.g. mobile phone)
      2. Prediction on the device without any service. In this case you need to copy the model on the device itself.
    4. Build a small gadget that detects motions or tracks faces without any internet connectivity
      1. Train a Deep Learning Model using CPU/GPU etc. and push it to a device (e.g. mobile phone)
      2. Prediction on the device without any service. In this case you need to copy the model on the device itself.
    5. Build a firewall which predict the packets as malicious or not
      1. Train a Deep Learning Model using CPU/GPU etc. and push it to the router.
      2. Prediction on the router without any service. You will need to copy the model on the router for this.

Again, complete code for creating a REST service for your Machine Learning model, can be found at the below link:

For the complete course on Machine Learning, please visit Specialization Course on Machine Learning & Deep Learning

9 Indian Mathematicians Who Transformed The Norms Of Knowledge- Now It’s On Us

Mathematics is the science which deals with the logic of quantity, shape, and arrangement. Undeniably, math is all around us, in fact in everything we do. It wouldn’t be wrong to say, math is the building block for everything in our daily life period. Money, sports, architecture (ancient and modern), television, mobile devices, and even art, all of it has some mathematical concepts involved in it.

In India, mathematics has its origins in Vedic literature which is nearly four thousand years old. It should come as no surprise that the concept of number ‘0’ was discovered in India; also, various treatises on mathematics were authored by Indian mathematicians. The techniques of trigonometry, algebra, algorithm, square root, cube root, negative numbers, and the most significant decimal system are concepts which were discovered by Indian mathematician from ancient India and are employed worldwide even today.

Continue reading “9 Indian Mathematicians Who Transformed The Norms Of Knowledge- Now It’s On Us”

Fashion-MNIST using Machine Learning

One of the classic problem that has been used in the Machine Learning world for quite sometime is the MNIST problem. The objective is to identify the digit based on image. But MNIST is not very great problem because we come up with great accuracy even if we are looking at few pixels in the image. So, another common example problem against which we test algorithms is Fashion-MNIST.

The complete code for this project you can find here :

Fashion-MNIST is a dataset of Zalando’s fashion article images —consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each instance is a 28×28 grayscale image, associated with a label.

Continue reading “Fashion-MNIST using Machine Learning”

One-on-one discussion on Gradient Descent

Usually, the learners from our classes schedule 1-on-1 discussions with the mentors to clarify their doubts. So, thought of sharing the video of one of these 1-on-1 discussions that one of our CloudxLab learner – Leo – had with Sandeep last week.

Below are the questions from the same discussion.

You can go through the detailed discussion which happened around these questions, in the attached video below.

One-on-one discussion with Sandeep on Gradient Descent
Continue reading “One-on-one discussion on Gradient Descent”

Women’s Day 2019 – Showcasing 11 Incredible Women In AI Today

You already know that artificial intelligence is grabbing the world, transforming nearly every industry, business, trade, and function. But what you might not know are the incredible AI technologist and researchers powering the edge of this momentous revolution. 

Breakthroughs in AI are incredible, isn’t it? But how does it all occur? It’s when extremely talented and diverse thinkers from unique backgrounds, disciplines, expertise, and perspectives come forward. You might think of names like Andrew Ng (Baidu), Amit Singhal (Uber), Elon Musk (SpaceX & Tesla), Ginni Rometty (IBM) and Ray Kurzweil (Google) – but wait why all men?

2019 March, on International Women’s Day, CloudXLab aims to showcase 11 incredible women in AI.

Continue reading “Women’s Day 2019 – Showcasing 11 Incredible Women In AI Today”

Career In Artificial Intelligence: Myths vs. Realities

In recent years, career opportunities in artificial intelligence (AI) has grown exponentially to meet the rising demand of digitally transformed industries. But while there are amply of jobs available in AI, there’s a momentous shortage of top talent with the essential skills. But why does this demand and supply gap exist? Many aspiring candidates wish to join the AI bandwagon, but several myths hold them behind.

Job site Indeed highlights, the demand for AI skills has doubled over the last 3 years, and the total number of job postings has upsurged by 119 percent. But, job-aspirants interest in a career in artificial intelligence seems to have leveled off. This clearly indicates employers are struggling to get good talent. This is surely good news for all those planning to transit their careers into AI!

Well, building a career in artificial intelligence demands a self-controlled approach. You might be interested in a career switch because of the exciting opportunities floating in this booming industry or maybe for that long deep-rooted interest to pursue AI as a career. Regardless of what your inspiration is, the first step to moving ahead in the AI career path is to ditch the myths and misconceptions that for long has been blocking your path.

Continue reading “Career In Artificial Intelligence: Myths vs. Realities”

How to create an Apache Thrift Service – Tutorial


Say you come up with a wonderful idea such as a really great phone service. You would want this phone service to be available to the APIs in various languages. Whether people are using Python, C++, Java or any other programming language, the users should be able to use your service. Also, you would want the users to be able to access globally. In such scenarios, you should create the Thrift Service. Thrift lets you create a generic interface which can be implemented on the server. The clients of this generic interface can be automatically generated in all kinds of languages.

Let us get started! Here we are going to create a very simple service that just prints the server time.

Continue reading “How to create an Apache Thrift Service – Tutorial”

Use-cases of Machine Learning in E-Commerce

What computing did to the usual industry earlier, Machine Learning is doing the same to usual rule-based computing now. It is eating the market of the same. Earlier, in organizations, there used to be separate groups for Image Processing, Audio Processing, Analytics and Predictions. Now, these groups are merged because machine learning is basically overlapping with every domain of computing. Let us discuss how machine learning is impacting e-commerce in particular.

The first use case of Machine Learning that became really popular was Amazon Recommendations. Afterwards, the Netflix launched a challenge of Movie Recommendations which gave birth to Kaggle, now an online platform of various machine learning challenges.

Before I dive deep into the details further, lets quickly brief the terms that are found often confusing. AI stands for Artificial Intelligence which means being able to display human-like intelligence. AI is basically an objective. Machine learning is making computers learn based on historical or empirical data instead of explicitly writing the rules. Artificial Neural networks are the computing constructs designed on a similar structure like the animal brain. Deep Learning is a branch of machine learning where we use a complex Artificial Neural network for predictions.

Continue reading “Use-cases of Machine Learning in E-Commerce”

What are the pre-requisites to learn big data?

Pre-requisites for Big Data Hadoop

We, at CloudxLab, keep getting a lot of questions online, sometimes offline, asking us

“I want to learn big data. But, just don’t know whether I am eligible or not.”

“I am so and so, can I learn big data?”

We have compiled the most common questions here. And, we will answer each one of them.

So, here we go.

What are those questions?

  1. I am from a non-technical background. Can I learn big data?
  2. Do I need to know programming languages such as Java, Python, PHP, etc.?
  3. Or, since it is big data, do I need to know any other relational databases such as Oracle or in general do I need to be well versed with SQL?
  4. And also, do I need to know the Unix or Linux?

Continue reading “What are the pre-requisites to learn big data?”

Top Machine Learning Interview Questions for 2018 (Part-1)


These Machine Learning Interview Questions, are the real questions that are asked in the top interviews.

For hiring machine learning engineers or data scientists, the typical process has multiple rounds.

  1. A basic screening round – The objective is to check the minimum fitness in this round.
  2. Algorithm Design Round – Some companies have this round but most don’t. This involves checking the coding / algorithmic skills of the interviewee.
  3. ML Case Study – In this round, you are given a case study problem of machine learning on the lines of Kaggle. You have to solve it in an hour.
  4. Bar Raiser / Hiring Manager  – This interview is generally with the most senior person in the team or a very senior person from another team (at Amazon it is called Bar raiser round) who will check if the candidate fits in the company-wide technical capabilities. This is generally the last round.

Continue reading “Top Machine Learning Interview Questions for 2018 (Part-1)”