Most Asked Machine Learning Interview Questions with Answers

Many websites have called machine learning the sexiest job of the 21st century. Looking at the demand for machine learning professionals and the ever-growing pay packages for such roles, the label seems justified. There are plenty of job opportunities in machine learning in both academic and corporate settings. So if you are one of the aspirants who would like to join the machine learning bandwagon, then you’ve come to the right place.

In this post, we have collected the machine learning interview questions most often asked in startups and corporates. We don’t want you to struggle to find the answers, so we’ve tried to provide a simple explanation for each question.

What is regularization and how is it used to solve the problem of overfitting?

In statistical models, overfitting is a very common problem. One of the methods to solve this problem is regularization. Before going further and writing a plain definition of regularization, it is important to understand the problem of overfitting.

Let’s take an example. Say you’ve been given the problem of predicting the genre of music a person likes based on that person’s age. You first try a linear regression model with age as the independent variable and music genre as the dependent one. Unfortunately, this model will mostly fail because it is too simplistic.

Next, you will surely want to add more explanatory variables to make your model more informative. You go ahead and add the sex and the education of each individual in your dataset. Now, you measure the model’s error with a loss metric L(X, Y), where X is your design matrix and Y is the vector of targets (music genre in your case). You find that the results are better but still not very accurate.

So you go ahead and add more variables like marital status, location, and profession. Much to your surprise, you find that your model now has poor prediction power on new data. You have just experienced the problem of overfitting, which means your model sticks too closely to the training data and might have learned the background noise. In other words, your model has high variance and low bias.

To overcome this problem, we use a technique called regularization. Basically, you penalize the loss function by adding a multiple of a norm of the weight vector w, for example the L1 norm (Lasso). You then minimize the following expression:

L(X, Y) + λN(w), where λ is the regularization strength and N is either the L1 norm (Lasso), the L2 norm (Ridge), or any other norm. The larger λ is, the more the weights are pushed towards zero, which makes the model simpler and reduces its variance.
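The penalized loss above can be sketched in a few lines of Python. This is only an illustration under assumptions not in the original: the loss L is taken to be mean squared error of a linear model without an intercept, the toy data and the helper names (mse, l1_norm, l2_norm_squared, regularized_loss) are made up, and the Ridge penalty uses the squared L2 norm, as is common in practice.

```python
def mse(weights, rows, targets):
    """Mean squared error of a linear model y_hat = w . x (no intercept)."""
    total = 0.0
    for x, y in zip(rows, targets):
        y_hat = sum(w * xi for w, xi in zip(weights, x))
        total += (y_hat - y) ** 2
    return total / len(targets)


def l1_norm(weights):
    """Lasso penalty: sum of absolute weights."""
    return sum(abs(w) for w in weights)


def l2_norm_squared(weights):
    """Ridge penalty: sum of squared weights (squared L2 norm)."""
    return sum(w * w for w in weights)


def regularized_loss(weights, rows, targets, lam, penalty):
    """L(X, Y) + lambda * N(w) for a chosen penalty N."""
    return mse(weights, rows, targets) + lam * penalty(weights)


# Hypothetical toy data: three observations, two features.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [2.0, 0.0, 2.0]
w = [2.0, 0.0]  # these weights fit the toy data exactly

plain = mse(w, X, Y)
lasso = regularized_loss(w, X, Y, 0.1, l1_norm)
ridge = regularized_loss(w, X, Y, 0.1, l2_norm_squared)
print(plain, lasso, ridge)
```

Even though the weights fit the toy data perfectly, the penalized losses are larger than the plain loss, which is what discourages large weights during training.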

Explain how a ROC curve works?

A Receiver Operating Characteristic (ROC) curve is a graphical representation of the performance of a binary classifier system at various discrimination thresholds.

Let’s understand this definition step by step. First, we need to understand what a discrimination threshold is.

In a binary classifier system, the model outputs the probability that an observation belongs to class 1. Once you decide on a threshold, you can turn that probability into one of two classes. For example, say your model needs to classify a tumor as cancerous or non-cancerous. If you set the threshold of your system to 0.8, every tumor with a predicted probability of being cancerous above 0.8 is classified as cancerous. The performance of the system varies as you change the threshold.

Now that you understand what a discrimination threshold is, let’s look at two more terms you need in order to understand how a ROC curve works.

True Positive Rate:

This is the fraction of actual positives that your model correctly classifies as positive.

False Positive Rate:

This is the fraction of actual negatives that your model wrongly classifies as positive.

Now, to get the ROC curve, you plot the True Positive Rate against the False Positive Rate at various threshold settings.
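As a rough sketch of this procedure, the snippet below computes one (False Positive Rate, True Positive Rate) point per threshold. The scores, labels, thresholds, and the function name roc_points are invented for illustration, and an observation is assumed to be classified as positive when its score is greater than or equal to the threshold.

```python
def roc_points(scores, labels, thresholds):
    """Return one (FPR, TPR) pair for each threshold."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # Predicted positive when the score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical predicted probabilities and true classes (1 = positive).
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels, [0.9, 0.8, 0.5, 0.0])
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting these points with FPR on the x-axis and TPR on the y-axis gives the ROC curve. Lowering the threshold moves you towards the top-right corner, where everything is classified as positive.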

[Figure: ROC curve]


What is MAP Hypothesis?

You need to understand Bayes’ theorem first. Bayes’ theorem gives a formula for the conditional probability that something will happen, given that something else has already happened.

The formula for calculating this conditional probability is

P (h | d) = P (d | h) * P (h) / P(d)


  • P(h|d) is the probability of hypothesis h given the data d. This is also called the posterior probability
  • P(d|h) is the probability of data d given that the hypothesis h was true.
  • P(h) is the probability of hypothesis h being true. This is also called the prior probability of h.
  • P(d) is the probability of the data.

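We calculate the posterior probability for different hypotheses and select the hypothesis with the highest probability. What you get is the most probable hypothesis, also called the maximum a posteriori (MAP) hypothesis. Since P(d) is the same for every hypothesis, it is enough to compare P(d|h) * P(h) when picking the MAP hypothesis.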
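Here is a small sketch of that selection step. The coin example, its numbers, and the function name map_hypothesis are assumptions made for illustration: the data d is "five heads in a row", and the two hypotheses are assumed to be the only possibilities, so P(d) can be computed by summing over them.

```python
def map_hypothesis(priors, likelihoods):
    """Return the hypothesis with the highest posterior, plus all posteriors."""
    # Numerator of Bayes' theorem for each hypothesis: P(d|h) * P(h).
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
    # P(d), assuming the hypotheses are exhaustive and mutually exclusive.
    evidence = sum(unnormalized.values())
    posteriors = {h: value / evidence for h, value in unnormalized.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Hypothetical setup: is the coin fair, or biased towards heads (P(heads) = 0.9)?
priors = {"fair": 0.9, "biased": 0.1}
# P(d|h) for d = five heads in a row.
likelihoods = {"fair": 0.5 ** 5, "biased": 0.9 ** 5}

best, posteriors = map_hypothesis(priors, likelihoods)
print(best, posteriors)
```

Although the prior strongly favours the fair coin, five heads in a row are much more likely under the biased coin, so the biased hypothesis ends up with the higher posterior.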

Give the difference between concordant and discordant pairs with an example.

Concordant and discordant pairs are calculated for ordinal variables, and they tell you whether there is agreement or disagreement between two sets of scores. Note that you must order your data and place the observations into pairs before calculating the concordance or discordance.

Let’s see an example to show the difference between the two.

Say you have the scores given to 5 job applicants by two interviewers.

Candidate   Interviewer 1   Interviewer 2
A           1               1
B           2               2
C           3               4
D           4               3
E           5               6

By and large, you are checking whether both interviewers have scored a pair of candidates in the same or the opposite order. Take candidates A and C: both interviewers scored A below C, so A and C form a concordant pair. Candidates C and D, on the other hand, form a discordant pair, because Interviewer 1 scored C below D while Interviewer 2 scored C above D.
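To check every pair in the table above, a short sketch like the following can help. The function name classify_pairs is made up for illustration, and tied pairs (where an interviewer gives both candidates the same score) are simply skipped, which is one common convention.

```python
from itertools import combinations

# Scores from the table above: candidate -> (Interviewer 1, Interviewer 2).
scores = {"A": (1, 1), "B": (2, 2), "C": (3, 4), "D": (4, 3), "E": (5, 6)}


def classify_pairs(scores):
    """Split all candidate pairs into concordant and discordant lists."""
    concordant, discordant = [], []
    for a, b in combinations(sorted(scores), 2):
        diff_1 = scores[a][0] - scores[b][0]  # order according to Interviewer 1
        diff_2 = scores[a][1] - scores[b][1]  # order according to Interviewer 2
        if diff_1 * diff_2 > 0:
            concordant.append((a, b))  # same order for both interviewers
        elif diff_1 * diff_2 < 0:
            discordant.append((a, b))  # opposite order
        # A product of zero means a tie, which is ignored here.
    return concordant, discordant


concordant, discordant = classify_pairs(scores)
print(len(concordant), "concordant pairs")
print("discordant pairs:", discordant)
```

For this table, C and D are the only discordant pair; the other nine pairs are concordant.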

Describe a situation when you would use logistic regression vs random forest vs SVM.

When the classes are cleanly separable and the data has many features, go with SVM. Keep in mind that kernel SVMs can be slow to train when the number of observations is very large.

If model interpretability is not that important and your data contains outliers that you do not want to remove, go with random forest, which is fairly robust to them.

When the data is small, the problem is a two-class classification, and the classes can be separated reasonably well by a linear boundary, go with logistic regression.

What is Homoscedasticity and how is it different from Heteroscedasticity?

In linear regression, you should check that the data is homoscedastic, i.e. that the variance of the errors is the same for all the points in the data. You can see whether the data is homoscedastic by observing the distance of each point from the regression line. The spread of these distances should be roughly the same along the whole line for the data to be homoscedastic.

As a common rule of thumb, the data is treated as homoscedastic if the ratio of the largest variance to the smallest variance is below 1.5.
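The ratio check can be sketched as follows. The residual values and the function name variance_ratio are invented for illustration, and the residuals are assumed to have already been split into groups (for example, low and high values of the independent variable).

```python
from statistics import variance


def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(group) for group in groups]
    return max(variances) / min(variances)


# Hypothetical residuals (distance of each point from the regression line).
similar_spread = [[-1.0, 0.5, 1.0, -0.5], [-1.1, 0.6, 0.9, -0.4]]
growing_spread = [[-1.0, 0.5, 1.0, -0.5], [-4.0, 2.0, 4.0, -2.0]]

ratio_similar = variance_ratio(similar_spread)
ratio_growing = variance_ratio(growing_spread)
print(ratio_similar < 1.5)  # spread is about the same in both groups
print(ratio_growing < 1.5)  # spread grows a lot in the second group
```

The first dataset passes the 1.5 threshold, while the second one, whose residuals are four times as wide in the second group, does not.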


But, in reality, you often have to deal with heteroscedastic data, where the variance is not constant across the data points. In a scatter plot, heteroscedastic data has a cone shape that spreads out in either direction, i.e. left to right or right to left.

One example of such data is the prediction of annual income by age. More often than not, people in their teens earn close to the minimum wage, so the variance of such data points is small at low ages. But the income gap widens with age. For example, one person could be driving a Ferrari while another cannot even afford a car. We can illustrate this example with the graph below.


What is bagging?

Bagging is another term for bootstrap aggregation. To understand it better, you should first have a clear understanding of a statistical method called the bootstrap. Let’s see an example to make the concept easier to grasp.

Let’s say you want to estimate the mean of a sample of 100 values. As you probably know, you can calculate the mean directly with the following formula:

Mean = Sum of all values / Total no. of values

But if your sample is small, your mean will almost certainly have an error in it. What should you do to improve your estimate? This is where the bootstrap method comes into the picture. Below are the steps of the bootstrap method:

  • Create many sub-samples out of your dataset with replacement. For example, if your dataset is (1,2,3,4,5), then your sub-samples could be (1,2,3) and (3,4,5). Sampling with replacement means each drawn value is put back before the next draw, so a value can appear in several sub-samples, as 3 does here, or even more than once within a single sub-sample, e.g. (2,2,5).
  • Calculate the mean of each sub-sample
  • Average the means of all the sub-samples

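Now, let’s come back to bagging. Bagging applies the bootstrap method to machine learning: a model such as a decision tree is trained on each bootstrap sample, and the predictions of all the models are then averaged (or put to a vote). Random Forest is a well-known algorithm built on bagging.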
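The bootstrap steps described above can be sketched with the Python standard library. The function name bootstrap_mean, the number of resamples, and the fixed seed are assumptions made for illustration; each sub-sample here has the same size as the dataset, which is the usual bootstrap convention.

```python
import random
from statistics import mean


def bootstrap_mean(values, n_resamples=1000, seed=0):
    """Estimate the mean by averaging the means of many bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    resample_means = []
    for _ in range(n_resamples):
        # Draw len(values) items with replacement: values can repeat.
        resample = rng.choices(values, k=len(values))
        resample_means.append(mean(resample))
    return mean(resample_means), resample_means


data = [1, 2, 3, 4, 5]
estimate, resample_means = bootstrap_mean(data)
print("bootstrap estimate of the mean:", estimate)
print("smallest and largest resample mean:", min(resample_means), max(resample_means))
```

The spread of the resample means also gives a feel for how uncertain the estimate is. Bagging follows the same pattern, except that a model is trained on each resample instead of computing a mean.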

What are the different ways of model assessment?

  • Complexity parameter
  • AIC
  • BIC
  • Adjusted R Square
  • ROC curve
  • AUC Value
  • Cross-Validation
  • K-Fold Cross-Validation

Next Steps

Don’t worry, we will continue to expand the list of questions with answers. Also, if you couldn’t understand a particular question or need more explanation, we encourage you to ask in the comment section below. We will try our best to address your query.

We also run live courses on Machine Learning that are taught by industry experts. We know that hands-on practice is very important to master what you learn, which is why we provide free access to our lab, where you can practice along with the training.

We provide the following courses on machine learning

Streaming Twitter Data using Flume

In this blog post, we will learn how to stream Twitter data using Flume on CloudxLab.

To download tweets from Twitter, we first have to configure a Twitter app.

Create Twitter App

Step 1-

Navigate to Twitter app URL and sign in with your Twitter account

Step 2-

Click on “Create New App”


Continue reading “Streaming Twitter Data using Flume”

A Simple Tutorial on Scala – Part – 2

Welcome back to the Scala tutorial.

This post is the continuation of A Simple Tutorial on Scala – Part – 1

In Part 1, we learned the following topics on Scala

  • Scala Features
  • Variables and Methods
  • Condition and Loops
  • Variables and Type Inference
  • Classes and Objects

Keeping up the same pace, we will learn the following topics in the 2nd part of the Scala series.

  • Functions Representation
  • Collections
  • Sequence and Sets
  • Tuples and Maps
  • Higher Order Functions
  • Build Tool – SBT

Functions Representation

We have already discussed functions. We can write a function in different styles in Scala. The first style is the usual way of defining a function.

Please note that the return type is specified as Int.

In the second style, please note that the return type is omitted and there is no “return” keyword. The Scala compiler will infer the return type of the function in this case.

If the function body has just one statement, then the curly braces are optional. In the third style, please note that there are no curly braces.

Continue reading “A Simple Tutorial on Scala – Part – 2”

A Simple Tutorial on Scala – Part – 1

Welcome to the Scala tutorial. We will cover Scala in a two-part blog series. In this part, we will learn the following topics

  • Scala Features
  • Variables and Methods
  • Condition and Loops
  • Variables and Type Inference
  • Classes and Objects

For a better understanding, try the examples hands-on as you read. We’ve written this post so that it is easy to follow along with hands-on practice.

Scala Features

Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way.

It is a statically typed language, which means it does type checking at compile time as opposed to run time. Let me give you an example to better understand this concept.

When we deploy jobs that will run for hours in production, we do not want to discover midway that the code has unexpected runtime errors. With Scala, type errors are caught at compile time, so you can be more confident that your code will not fail with such errors while running in production.

Since Scala is statically typed, it generally offers better performance and speed than dynamic languages.

How is Scala different from Java?

Unlike Java, Scala does not require us to write as much code to perform simple tasks, and its syntax is very similar to that of other data-centric languages. You could say that Scala is a modified version of Java with less boilerplate code.

Continue reading “A Simple Tutorial on Scala – Part – 1”

A Simple Tutorial on Linux – Part-2

This post is the continuation of A Simple Tutorial on Linux – Part-1

In Part 1, we learned the following topics on Linux.

  • Linux Operating System
  • Linux Files & Process
  • The Directory Structure
  • Permissions
  • Process

Keeping up the same pace, we will learn the following topics in the 2nd part of the Linux series.

  • Shell Scripting
  • Networking
  • Files & Directories
  • Chaining Unix Commands
  • Pipes
  • Filters
  • Word Count Exercise
  • Special System commands
  • Environment variables

Writing first shell script

A shell script is a file containing a list of commands. Let’s create a simple script that prints two words:

1. Open a text editor to create a file

2. Write the following into the editor:

Note: In Unix, the extension doesn’t dictate the program to be used while executing a script. It is the first line of the script that would dictate which program to use. In the example above, the program is “/bin/bash” which is a Unix shell.

3. Press Ctrl+X and then “y” to save and exit

4. Now, by default, the file would not have executable permission. You can make it executable like this:

5. To run the script, use:

Continue reading “A Simple Tutorial on Linux – Part-2”

A Simple Tutorial on Linux – Part-1

We have started this series of Linux tutorials, which is divided into two blog posts. Each of them covers basic concepts with practical examples. We have also provided quizzes on some of the topics, which you can take for free.

In the first part of the series, we will learn the following topics in detail

  • Linux Operating System
  • Linux Files & Process
  • The Directory Structure
  • Permissions
  • Process


Linux is a Unix-like operating system. It is open source and free. We might sometimes use the word “Unix” instead of Linux.

A user can interact with Linux either using a ‘graphical interface’ or using the ‘command line interface’.

The command line interface has a steeper learning curve than the graphical interface, but it makes automating tasks very easy. Also, most server-side work is generally done using the command line interface.

Linux Operating System

The operating system is made of three parts:

1. The Programs

A user executes programs. For example, AngryBird is a program that gets executed by the kernel. When a program is launched, it creates processes. In this tutorial, the words program and process are used interchangeably.

2. The Kernel

The Kernel handles the main work of an operating system:

  • Allocates time & memory to programs
  • Handles File System
  • Responds to various Calls

3. The Shell

A user interacts with the Kernel via the Shell. The console as opened in the previous slide is the shell. A user writes instructions in the shell to execute commands. The shell is itself a program that keeps asking you to type the names of other programs to run.

Continue reading “A Simple Tutorial on Linux – Part-1”

A Successful Machine Learning Bootcamp by CloudxLab

CloudxLab has hosted several webinars in the past and all of them have been successful. But this time we wanted to try something different. So, we all sat together and decided to do an offline meetup for Machine Learning. Though we had done some in the past, the engagement and interaction of an online webinar are not comparable to those of an in-person event. We then got in touch with Drupal Bangalore, who were holding an event at R. V. College of Engineering, and one of the topics was Introduction to Machine Learning. We found this a good opportunity to bring our knowledge to the offline circle too.

Machine Learning Bootcamp

So it all happened on Nov 17, when Machine Learning enthusiasts gathered to attend the one-day workshop on Machine Learning. The presenter was none other than Mr. Sandeep Giri, who has over 15 years of experience in the domain of Machine Learning and Big Data technologies. He has worked in companies like Amazon, InMobi, and D. E. Shaw.

Continue reading “A Successful Machine Learning Bootcamp by CloudxLab”

Machine Learning Bootcamp – Introduction and Hands-on @ RV College of Engineering, Bangalore

We have a one-day workshop on Introduction to Machine Learning with Drupal Bangalore. In this workshop, you will learn how to apply various Machine Learning techniques for everyday business problems.

  • Date: Saturday, Dec 16, 2017
  • Place: R. V. College of Engineering, Bangalore
  • Time: 11.30 am – 1.30 pm: Presentation and Demo, 2.30 pm – 4.30 pm: Hands-on

What will be covered?

An exposure to Machine Learning using Python to analyze, draw intelligence and build powerful models using real-world datasets. You’ll also gain the insights to apply data processing and Machine Learning techniques in real time.

After completing this workshop, you will be able to build and optimize your own automated classifier to extract insights from real-world data sets.

Continue reading “Machine Learning Bootcamp – Introduction and Hands-on @ RV College of Engineering, Bangalore”

Introduction to NumPy and Pandas – A Simple Tutorial

Python is increasingly being used as a scientific language. Matrix and vector manipulations are extremely important for scientific computations. Both NumPy and Pandas have emerged as essential libraries for any scientific computation in Python, including machine learning, due to their intuitive syntax and high-performance matrix computation capabilities.

In this post, we will provide an overview of the common functionalities of NumPy and Pandas. We will see how similar these libraries are to existing toolboxes in R and MATLAB. This similarity and the added flexibility have resulted in wide acceptance of Python in the scientific community lately. The topics covered in this blog are:

  1. Overview of NumPy
  2. Overview of Pandas
  3. Using Matplotlib

This post is an excerpt from a live hands-on training conducted by CloudxLab on 25th Nov 2017. It was attended by more than 100 learners around the globe. The participants came from the following countries: United States, Canada, Australia, Indonesia, India, Thailand, Philippines, Malaysia, Macao, Japan, Hong Kong, Singapore, United Kingdom, Saudi Arabia, Nepal, and New Zealand.

Continue reading “Introduction to NumPy and Pandas – A Simple Tutorial”

Introduction to Machine Learning – An Informative Webinar

On November 3, CloudxLab conducted a successful webinar on “Introduction to Machine Learning”. It was a 3-hour session in which the instructor shed some light on Machine Learning and its terminology.

It was attended by more than 200 learners around the globe. The participants came from the following countries: United States, Canada, Australia, Indonesia, India, Thailand, Philippines, Malaysia, Macao, Japan, Hong Kong, Singapore, United Kingdom, Saudi Arabia, Nepal, and New Zealand.

Presented By

Sandeep Giri - Instructor for the Machine Learning webinar


Topics Covered in The Webinar

  • What is Machine Learning?
  • Automating Mario Game
  • The Machine Learning Tsunami
  • Collecting Data
  • Processing Data
  • Spam filter Using Traditional and Machine Learning
  • What is AI?
  • Sub-objectives of AI
  • Different Type of Machine Learning
  • Artificial Neural Network
  • Introduction to Deep Learning
  • TensorFlow Demo
  • Machine Learning Frameworks
  • Deep Learning Frameworks

Continue reading “Introduction to Machine Learning – An Informative Webinar”