Machine Learning Process

3 / 38

Machine Learning Types

Not able to play video? Try with youtube

In machine learning, there are different kinds of problems to solve. Knowledge of the type of the problem helps us communicate the problem better and makes it easier to solve.

There are so many different types of Machine Learning systems that it is useful to classify them in broad categories based on:

Whether or not they are trained with human supervision (supervised, unsupervised, semi-supervised, and Reinforcement Learning)

Whether or not they can learn incrementally on the fly (online versus batch learning)

Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do. This is known as instance-based learning versus model-based learning

Let us explore further the first way of classifying machine learning on the basis of whether it requires human supervision or not.

The machine learning problems can be divided into four major categories - supervised learning, unsupervised learning, semi-supervised learning, and Reinforcement Learning.

What does it mean by Supervised machine learning? The tasks that need the supervision are known as supervised machine learning tasks. In other words, whether or not models are trained with given data which has a history of what is needed.

In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels. We tend to predict the values or classes based on the given data. Say, we want to predict the value of the house in the US from the given data on houses in the US.

Or Predict whether a fruit is an apple or banana

There are two types of tasks in supervised machine learning: classification and regression.

In classification, we predict whether an instance belongs to one of the predefined classes, for example, we want to predict whether an email is a spam or not a spam.

So, the machine learning task that involves predicting whether something belongs to one of the classes is called classification.

Let us take some examples.

Classifying the objects present in an image is an Example of a supervised classification task

Or you are given a huge set of emails which are already labeled as spam or ham. Now, using this data, you need to build a model that predicts whether a new email is a spam or not.

Say, you are given images of handwritten digits which are between 0 and 9. Each image is also labeled with the actual digit. The objective of this task to train a model which will be able to predict the digit in a given image. It needs to predict one of the 10 labels.

This is a classification supervised machine learning task.

In regression supervised machine learning tasks, we predict a value instead of categories. Let us take some examples.

Let us say we need to predict the price of a car and we are given the historical data of cars having various features also called predictors such as mileage, age, and brand etcetera.

Since we are trying to predict a value instead of classifying, this task is regression which is supervised machine learning.

Let me take another simple example. Say, I am trying to predict what will be the height of 11 year old will be and I have kept the record of his heights since birth every month.

First, I will be training the model on the heights data we already have and then this model will be able to predict the height on the basis of the age.

In unsupervised learning, we don't have a label or target to predict. Instead, the unsupervised learning tasks usually involve detecting pattern in given data.

In unsupervised learning, the most common example is clustering. In clustering tasks, we try to form the clusters in the data based on some measure of similarity.

Let us take an example. We have a blog which has many users. Say, we want to form the groups of users such that similar users are in the same group. The similarity could be based on how many posts they have read, their city, their topics of interest etcetera.

We don't have a label or value to predict. Instead, we group the users together. Such a group could be useful in identifying outliers and also in the understanding of user behavior.

We don't have a predefined set of groups. At no point, we tell the algorithm which group a visitor belongs to. Instead, we let the group formation happen automatically.

Such tasks are termed as unsupervised learning tasks.

As a result, we may discover various insights such as 40% of visitors are comic lovers and read the blog in the evening and 20% of visitors are sci-fi lovers and read the blog during weekends.

This data helps us in targeting our blog posts for each group.

A type of clustering is hierarchical clustering where grouping happens in the form of a tree.

Say, We are given the information about the characteristics of all species on the planet and we need to form a tree such that the sibling at every node are very similar.

This is what the biologists like Charles Darvin did to come up with the theory of evolution.

A common use case of unsupervised learning is to detect unusual credit card transactions in order to prevent the fraud.

By forming the groups of transactions based on the similarity, we try to figure out the anomaly in the credit card transactions.

Semi-supervised learning falls between unsupervised learning and supervised learning. It makes use of labeled as well as unlabeled data for training.

This is an example of semi-supervised learning where the model is trained typically with a small amount of labeled data and with a large amount of unlabeled data to categorize the type of news in a document.

Reinforcement learning is one of the most exciting fields of machine learning today. In Reinforcement Learning, a software agent makes observations and takes actions within an environment, and in return it receives rewards. In other words, the model learns to make predictions on the basis of rewards and penalties it gets.

The learning system an agent in this context

Observes the environment

Selects and performs the actions and

Get rewards or penalties in return

It learns by itself what is the best strategy or policy to get the most reward over time

Let’s look into some applications of Reinforcement learning

Used by robots to learn how to walk

DeepMind’s AlphaGo

Which defeated world champion Lee Sedol at the game of Go

Usually, a project involves multiple kinds of tasks. Let us take an example of identifying the claim value of insurance based on a photo of a car.

First, we would identify which car is it based on the picture. This is a classification task. And then using the photo or the picture and the information about the car we would identify the amount of damage and thus the claim value which is a regression task.

So this project involves a classification as well as regression task.

Another criterion used to classify Machine Learning is whether or not the system can learn incrementally from a stream of incoming data.

In batch learning, the system is trained first using all the available data and then the trained model is launched into production and then it runs there without any further training.

When the new data is available, we train the model again using both the old data and new data. In batch learning, we generally get better model because the model is built using the whole data.

If the data is huge then batch learning takes a lot of time and computing resources. That is why batch learning is generally done offline.

So what are disadvantages of batch learning?

Training using a full set of data can take many hours. That is why in batch learning we train a new model only when it is really required.

If the data is huge and the model needs to be updated frequently using the newly available data, then the batch learning may be impossible to use as it will take time to train a model again and again. And by the time the new model is trained, it may not be of any use. It may have gone stale by the time. For example in the stock market, the model should learn from the new trends immediately. If it takes time to train the new model, by the time the new model will start giving predictions, it will be too late for trading the stocks.

Moreover, training on the full set of data requires a lot of computing resources such as CPU, memory and disk space. It will end up costing you a lot of money.

Also if your system has limited resources such as smartphone application or a rover on the Mars, then storing a lot of amount of data and processing it every day for hours is not feasible.

Unlike batch learning, in online learning, the model is trained incrementally by feeding it the data sequentially or in batches.

In online learning, the model learns about new data on the fly as it arrives.

Online learning is great for systems that receive data continuously (like stock market applications) and need to learn from the new data rapidly.

Online learning is also a good option if you have limited computing resources. Once the model has learned from the new data, it does not need the new data anymore. So we can discard the new data and save huge amount of space.

Online learning algorithms can also be used on huge datasets. Huge datasets cannot fit in one machine's memory. In online learning, the algorithm loads part of the data into the memory, runs a training on that part and repeats the process until it has run on all of the data.

One big challenge in online learning is that if bad data is fed to the model, the model’s performance will gradually decline.

Bad data could come from a malfunctioning sensor on a robot or

from someone spamming a search engine to try to rank high in the search results.

To reduce the drop in performance, we should closely monitor the model and switch off the learning as soon as there is a drop in the performance. Also, it is better to revert the model to previously working state when there is a drop in performance.

Also to reduce the drop in the performance, we can monitor the input data and remove the anomalies in the input data before feeding it to the model.

One more way to categorize Machine Learning systems is by how well they generalize. Most machine learning tasks are about making predictions. The machine learning system should be able to make good predictions for examples it has never seen before. The true goal of machine learning is to perform well on the unknown or new instances.

There are two main approaches to generalization: instance-based learning and model-based learning.

Let’s understand instance-based learning.

The most trivial way of learning is to learn by heart. If you were to create a spam filter this way, your spam filter could just flag all the emails that are identical to the emails that have been already flagged by the users.

Instead of just flagging emails that are identical to known spam, your spam filter could also be programmed to filter emails that are similar to the known spam emails.

In this case, you require a measure of similarity between the two emails. A very basic measure of similarity can be to count the number of words they have in common. Then the spam filter would flag an email as spam if the email has too many words in common with known spam emails.

This is called instance-based learning, the system learns the examples by heart, then generalizes to new or unknown cases using a similarity measure.

Let’s understand model-based learning.

In model-based learning, system trains a model from the set of examples. Then the trained model is used for making the predictions.

We will learn more about model-based learning later in the course.

We all get that the machine learning can help us do prediction just like human brains do. But how do we get to that?

We need to convert the problem in the format that algorithm wants. Basically we need to convert the problem into numbers - present real life objects as numbers.

On the left you see a robot from a movie called ex-machina and on the right we see numbers from a movie called matrix. Basically, in matrix the real world objects were being interpreted as sequence of numbers. This is what we are going to do. Not really but close!

We need to present our data in the form of a table which has rows and columns.

The rows represent the instances or data points and columns represent the features or dimensions.

Let us take a concrete example.

The iris dataset is quite popular in machine learning domain. The dataset contains 50 samples of three varieties of iris flower. The objective is to identify the iris flower type whether it is versicolor, setosa or virginica ...

... based on the measurements of sepal and petal.

This dataset is represented as table. Each sample or observation is a row. As there are 50 samples of each variety, there are 150 total rows. Each sample is also called instance or observation.

And the features such as length and width of the sepal and length and width of the petal in each sample are represented as columns. One of the columns is the label identifying the variety.

Features are also called attributes, measurements or dimensions.

We feed this table of data to an algorithm which creates the model. In other words, the model is trained using this tabular data. And once model is ready, model can be used to predict the label given the features of an instance.

Now let us take a case of MNIST dataset. MNIST stands Modified National Institute of Standards and Technology database. The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning.

Now, question is how will we present this is a tabular form. What will be the features and what will be the instances?

Since each observation is a single digit, a row in our dataset will be representing a single digit. And the manually identified what is actually written in handwriting is the label.

Now, let us understand what will be the features. How can we represent the details of an image as columns?

Let us understand what does it mean by an image. Here the image is grayscale or you can black-and-white. Such an image is basically two dimensional array of numbers. The black color is represented as 0 and white is represented as 255 and everything in between. So, an image is a two dimensional matrix of number. The are many ways of converting a two dimensional image into one dimensional array or list of numbers. For now, we will take a simple approach whereby will append all rows and make a single dimension array. An image having size 28 by 28 with total 784 pixels will be represented an array of 784 numbers.

So, each row will have 784 columns hence 784 features.

MNIST dataset can be represented in the tabular form where each row represents one hand-drawn digit and value of each pixel in the drawing is a feature.

We will train our model using this dataset and then our model will be able to predict the label given the instance features.

We learnt that once we have formatted our data in the form of a table, we can feed it to algorithm to train our model. But how would we know if the the trained model is doing good or bad?

To solve this problem, we generally break the dataset into two sets: training and test. Using the training set, we will train the model and using the testing set, we will measure how well the model is doing.

We will not let our model peek into the testing set to avoid snooping bias. Usually we split the dataset into 80:20 ratio. 80% for training and 20% for testing.The test dataset should be picked at random.

Also, if the dataset is not well balanced, the splitting may not be right. For example, say we are identifying whether an image represents male or female. Suppose we have 10 records having 8 male and 2 females. And we pick 80% for training set. And say we have no samples representing females. There we will end training our model only on males and the model will not have clue about the female pick. So, we need to ensure that the records from all stratas go in equal numbers. Hence, we need to do the stratified sampling on the basis of the features that are very important or labels which do not have equal representation.

Let us take a quick at various kinds of sampling. There are majorly three kinds of sampling: Random Sampling, cluster sampling and stratified sampling.

In random sampling, we pick the sample for test set completely at random. If there is class imbalance (i.e. some of the values of labels are more frequent than others), we can't use the random sampling.

Second one is the cluster sampling, we form the clusters of samples and we pick some of the clusters. This is again not very useful in case of picking the test set.

Third one is stratified sampling. In stratified sampling, we form the stratas such male and female, kids and adults, rich and poor. And we pick the random sample from each strata. This is generally we useful in sampling for picking a random test set.

Once we have splitted the data and trained a model on training set, we can compare the prediction on test set with actual values and measure the performance of the model.

Question is how do we measure the performance?

In case of regression, we can compare the predicted values against the actual values by using mean square error as the criteria. Mean squared error is basically summing up the squared differences of each prediction and actual values.

But in case of classification, we can't use the mean squared error. For that we need to either use accuracy or other performance measure such as confusion matrix, precision, recall rate, f1 score or ROC. Let us briefly discuss what it means by accuracy: How many instances were correctly labelled out of total predictions is known as accuracy.

By choosing the right performance measure as per the problem and having a right test set, we can build our confidence on the model. If we don't have a proper way of testing the model, we will never be able to trust our model.

In this chapter, we learned about AI and how it is used everywhere. We discussed how the idea of AI originated, sub-objectives of AI and how it is achieved.

We also learned about machine learning and deep learning and how to achieve them. In the end, we discussed use cases of AI in the various industries.

In the next chapter, we will learn the machine learning process and the key concepts in the machine learning. Hope you enjoyed the video. Stay tuned and happy learning!

https://discuss.cloudxlab.com/c/course-discussions/ai-and-ml-for-managers

We can summarize the process in the following way. Using machine learning algorithm on historical data we generate the model ..

.. and with model we do the predictions.

Or we want to predict whether an email is a spam or not. These are some of the examples of supervised learning.

What kind of machine learning task is this?

Is this is a supervised learning or unsupervised learning?

This is supervised learning because the training data has labels marked. The given dataset of email has each email marked as spam or ham.

Also, since we are trying to predict the label or category, it is a classification task.

In regression supervised learning tasks, we predict a value instead of categories. Let us take some examples.

As a result, we may discover various insights such as 40% visitors are comic lovers and read the blog in evening and 20% visitors are sci-fi lovers and read the blog during weekends.

This data helps us in targeting our blog posts for each group.

A type of clustering is hierarchical clustering where grouping happens in the form of a tree.

Say, We are given the information about the characteristics of all species on the planet and we need to form a tree such that the sibling at every node are very similar.

This is what the biologists like Charles Darvin did to come up with theory of evolution.

Another criterion used to classify Machine Learning systems is whether or not the system can learn incrementally from a stream of incoming data.

Welcome to the second chapter of AI and ML course for managers.

As part of this chapter, we are going to first learn how the machine learning approach to solving problems is different from the traditional approach.

Then we will learn the training and prediction phases. We will also learn about the various types of machine learning tasks. We will go through various examples of supervised, unsupervised, reinforcement etcetera type of machine learning tasks. Afterwards, we will learn how to represent our data and frame the problem. We will also learn the training and testing phases along with an understanding of various biases. Afterwards, We will learn with a very simple example what it means by underfitting and overfitting.

Alright, let's get started!

So what is the difference between the traditional approach and machine learning based approach to solving problems and making machines intelligent? Let us understand this with a couple of examples.

We've already discussed the spam filter in the first chapter. A spam filter is a program which marks the incoming emails as spam or ham. Something that is not a spam is called as ham. If the incoming email is not a spam, it appears in the inbox, else it appears in the spam folder. This helps in keeping the inbox clean by avoiding promotional or unwanted emails.

Now, the question is how would you build such a spam filter?

Let us first look at the spam filter using the traditional approach. Or the approach without AI or machine learning. This approach is also known as the rule-based approach.

Let us look at this diagram.

In this diagram, the first step usually is to study the problem and look at what classifies an email as a spam email. We study the various features or properties of email.

We may observe that the subject line or the email body of a spam email generally have words like "credit card", "free" and "Amazing" etcetera.

Also, we may observe that the emails sent from a particular email address are spam.

In step 2, we would write an algorithm or set of rules which detect if an incoming email has above observations.

When an incoming email passes through the rules we have written, it would be classified correctly as spam or ham.

Once we have written the rules or conditions for classifying emails, we will evaluate our rules on the data that has already been marked as spam or ham.

In case, our algorithm or set of rules is giving good accuracy we launch. In other words, if it doing good on test dataset, we put in production to do the real job.

If the accuracy is not good, we go back to studying the problem and write better or more rules.

This was the traditional approach where manually identify and code the rules.

So what are the problems with this traditional approach?

We will have to manually write a long list of complex rules because we do not know what all words or parameters are the characteristics of a spam email.

Also, let's say we have written a rule to classify an email as spam if it contains the word "amazing". If spammers notice that all their emails that contain "amazing" are blocked, they might start writing "Fantastic" instead of "amazing" .

Now we will have to write one more rule to block emails containing the word "fantastic". Spammers will keep working around our spam filter, we will need to keep writing new rules forever.

Let us take a look at how the same problem is solved using machine learning.

In the ML approach, Instead of us writing the rules, the rules are inferred by the algorithm based on the data.

Unlike the traditional approach, a spam filter based on Machine Learning techniques automatically notices that "fantastic" has

become unusually frequent in spam flagged by the users, and it starts flagging them without your intervention.

Let us take an example of speech recognition. Speech recognition involves inferring what is being spoken basically it is a conversion of sound into text.

How would we write a speech recognition program? For simplicity let's assume that the program should be able to distinguish only two words one and two.

How would we write such a program using traditional approach?

The word "two" starts with a high-pitch sound "T". So we can write an algorithm that measures the high-pitch sound intensity and hard code a rule that if the pitch is high, classify the word as "Two", else classify it as "one".

What are the problems with the traditional approach in speech recognition example?

The traditional approach will not work for the words spoken by millions of very different people in noisy environments.

Also, the rule-based approach will not really work great with the voice of people with different accents.

How does machine learning approach solves this problem?

We collect the recordings of each word by millions of people in different accents and noisy environments. Then we feed the recordings to the algorithm and the algorithm learns on its own. Then our algorithm learns on itself using the collected data. The algorithm creates its own rules to recognize the spoken words. This is found to be performing better than human created rules.

The other advantage is that the ML-based Approach of solving problems can help us in improving our understanding of a subject. Machine Learning algorithms can be inspected to see what they have learned.

For example, in the case of a spam filter, our algorithm can reveal the list and combinations of the words that it believes are the best predictors of the spam. Sometimes this will reveal

unsuspected correlation or the new trends, and thereby leads to a better understanding of the problem.

Now, we understand the difference between the machine learning approach and the traditional approach. Let us take a look at typical machine learning process.

During the training phase, the algorithm creates the model and using the model the predictions are done.

The model is nothing but the list of rules as we discussed earlier just that these rules are created by an algorithm based on the past data, not humans.

Let us take a detail look at the training phase.

Before we start training, we need to collect the data, clean the data and process the data.

The collection of data might involve building apps, installing sensors and writing programs that gather the data. Once we have collected the data, we clean it to bring it to a manageable format.

Afterwards, we perform various kinds of preprocessing tasks such that it can be fed to the machine learning algorithm.

To understand the training phase, let us take the example of the spam filter.

We have data containing the historical emails along with the label marking which email is spam and which is not a spam.

During training phase, we feed these emails along with the label to an algorithm which generates the model.

Once we have the model, we can do the prediction.

For example, if we have a model trained to classify spam or ham, we can feed the incoming email in the real world and it will predict whether it is spam or not.

We can summarize the process in the following way. Using a machine learning algorithm on historical data we generate the model ...

... and with model we do the predictions.

Let us understand the meaning of features, labels, and instances before diving deep into machine learning. You can frame the business problem into a machine learning problem if you can identify features, labels, and instances. Let’s understand these with the spam filter example again.

Say our data contains subject, sender and if the email is spam or not. We feed this data to the algorithm to train the model. And then later we use the trained model to classify if a new incoming email is a spam or not.

Here subject and sender are “features”. You can think of features as attributes of an object. For example, color, size, weight and taste are the attributes of an an apple.

A label is a column which we have to predict for the unknown data. In case of email filter, the algorithm classifies if we the incoming emails are spam or ham so “Is Spam” is a label.

Each row of the input data is called “instance”. In this data, there are 3 rows so the number of instances is 3.

We will learn more about features, labels, and instances later in the chapter.

Here subject and sender are “features”. You can think of features as attributes of an object. For example, color, size, taste are attributes of an apple.

For example, color, size, taste are attributes of an apple.

A label is a column which the model predicts. In the case of email filter, the algorithm classifies if the incoming emails are spam or ham so “Is Spam” is a label.

Each row of the input data is called and “instance”. In this data, there are 3 rows so the number of instances are 3.

We will learn more about features, labels, and instances later in the chapter again.