As of today, the hottest jobs in the industry are around AI, Machine Learning and Deep Learning. Let me outline a learning path in machine learning for job profiles such as Data Scientist, Machine Learning Engineer, AI Engineer or ML Researcher.
AI stands for Artificial Intelligence: making machines behave like an intelligent being. AI is defined by its purpose. To achieve AI, we use various hardware and software. In software, we use two kinds of approaches: rule-based and machine learning based.
In the rule-based approach, the logic is coded by people by understanding the problem statement. In the machine learning approach, the logic is inferred using the data or experience.
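To make the contrast concrete, here is a minimal sketch in Python. The toy task (flagging spam by counting exclamation marks), the data and all function names are hypothetical and only meant for illustration. The first function hard-codes a threshold chosen by a person; the second infers the threshold from labelled examples.

```python
# Hypothetical toy task: flag a message as "spam" from one feature,
# the number of exclamation marks it contains.

def rule_based_is_spam(exclamations):
    # Rule-based: a person picked the threshold 3 by reasoning about the problem.
    return exclamations > 3


def learn_threshold(examples):
    # Machine learning based: pick the threshold that makes the fewest
    # mistakes on labelled examples of the form (exclamations, is_spam).
    candidates = sorted({x for x, _ in examples})
    best_threshold, best_errors = None, len(examples) + 1
    for t in candidates:
        errors = sum((x > t) != label for x, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


# Made-up labelled data, used only for illustration.
training_data = [(0, False), (1, False), (2, False), (5, True), (6, True), (8, True)]
learned_threshold = learn_threshold(training_data)


def learned_is_spam(exclamations):
    return exclamations > learned_threshold
```

If the data changes, the learned threshold changes with it, while the hand-written rule stays the same until someone edits the code.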
Machine learning includes many algorithms and approaches, such as linear regression (fitting a line), support vector machines, decision trees, random forests, ensemble learning and artificial neural networks.
The artificial neural network-based algorithms have proven very effective in recent years. The area of machine learning that deals with complex neural networks is called Deep Learning.
As part of this post, I want to help you plan your learning path in Machine Learning.
If you are looking for a non-mathematical and light-on-coding approach, please go through the course on “AI for Managers”. It is a carefully curated course that covers AI and Machine Learning for those who are looking for a less mathematical approach.
If you are planning to become a Data Scientist, Machine Learning Engineer or Machine Learning Researcher, please follow this learning path. This learning path is also covered completely in our Certification Course on Machine Learning Specialization.
A. Foundations
1. Linux/Unix or Command Line
When you are planning a career in a stream such as ML Engineer, you must make yourself comfortable with the basics of Linux or Unix or the command line (you will find the terminal in macOS too). This helps you automate various tasks as well as understand ideas such as networking.
At CloudxLab, you can learn this for free in a hands-on way here.
2. Python (Or a programming language)
There are many tools, such as AzureML, that let you do machine learning without any coding, but honestly, if you are planning a career as an engineer, it would be better to learn to program. It not only helps in machine learning, it also helps you automate many other tasks.
Also, once you know how to program, you will be able to think better. You can learn Python for free here at CloudxLab in a very hands-on manner. Also, finish at least one project such as Project – Churn Emails Inbox with Python.
3. Data Cleaning & Analysis
About 95% of machine learning work is cleaning data: loading the data from different formats, parsing text, adding more columns or converting unstructured data into rows and columns. Once your data is well framed in rows and columns, such that each row represents an example and each column represents a feature, you can use the ML algorithms which are available out of the box.
Once you have structured and cleaned data, you need to do a basic analysis of it to understand characteristics such as distributions and missing values. Further, you will have to visualize the data, guided by your knowledge of it. This is what comes under the Analytics or Data Analysis part.
Usually, data cleaning and analysis is performed using Pandas in Python. The end goal is to prepare the data as arrays of a library called NumPy. Afterwards, you can feed the data to the various algorithms. Pandas and NumPy are covered in Machine Learning Prerequisites and Pandas for Machine Learning at CloudxLab.
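As a rough sketch of what this looks like in practice, the snippet below loads a small table, checks for missing values, cleans it and ends with NumPy arrays. The CSV content and the column names (age, city, income, bought) are made up for illustration, and filling with the median is just one of several reasonable choices.

```python
import io

import numpy as np
import pandas as pd

# Made-up raw data with typical problems: a missing value and a text column.
raw_csv = io.StringIO(
    "age,city,income,bought\n"
    "25,Delhi,30000,0\n"
    "32,Mumbai,,1\n"
    "47,Delhi,82000,1\n"
    "51,Pune,61000,0\n"
)
df = pd.read_csv(raw_csv)

# Basic analysis: count missing values per column.
missing_counts = df.isna().sum()

# Cleaning: fill the missing income with the column median.
df["income"] = df["income"].fillna(df["income"].median())

# Convert the text column into numeric indicator columns (one per city).
df = pd.get_dummies(df, columns=["city"], dtype=float)

# Each row is one example; each remaining column is one feature.
X = df.drop(columns=["bought"]).to_numpy(dtype=float)  # feature matrix
y = df["bought"].to_numpy()                            # labels
```

After this, X and y are plain NumPy arrays in the rows-and-columns shape that out-of-the-box algorithms expect.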
B. Getting Started into Machine Learning
This is your first step into machine learning. The objective should be to get started with machine learning instead of learning the nuances of the underlying algorithms. This will let you start solving problems using machine learning.
1. Basic concepts of ML
You should learn about the different kinds of machine learning problems such as supervised, unsupervised, reinforcement learning. Also, you should learn the difference between the rule-based approach and machine learning-based approach. This is discussed in Introduction to ML.
If we have to prepare a model using a dataset having labels along with images and this model has to predict the label given an image, it is called Supervised machine learning. If we are only given a set of images (and not the labels) and we have to group the images based on some similarity criteria, it is called Unsupervised learning.
If we have to build a model which acts in an environment to accomplish a goal, it is called Reinforcement Learning. So, a robot roaming around in an environment to accomplish something is an example of reinforcement learning.
2. Build End-to-End machine learning projects
This is the most important part of machine learning. Learn to build an end-to-end project first, specifically via a guided project. This will make you very confident. Usually, in courses, people do the project last, but I recommend doing it first. It is covered in End-to-End Project.
Once you have finished building a machine learning model, you will have a good understanding of analytics and the machine learning process. You will also have learned about underfitting and overfitting by now. At this point, you may not understand the exact workings of the various models, which you will learn in the next chapters.
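If underfitting and overfitting still feel abstract, a small experiment can help. The sketch below fits polynomials of increasing degree to made-up noisy data and compares the error on the training points with the error on separate validation points. The data, the degrees and the noise level are my own assumptions for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Made-up data: a smooth curve plus noise, split into training and validation sets.
x_train = np.linspace(0.0, 1.0, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, x_train.size)
x_val = np.linspace(0.03, 0.97, 15)
y_val = np.sin(2 * np.pi * x_val) + rng.normal(0.0, 0.2, x_val.size)


def mse(model, x, y):
    # Mean squared error of the model's predictions.
    return float(np.mean((model(x) - y) ** 2))


errors = {}
for degree in (1, 3, 12):
    model = Polynomial.fit(x_train, y_train, degree)
    errors[degree] = (mse(model, x_train, y_train), mse(model, x_val, y_val))

for degree, (train_err, val_err) in errors.items():
    print(f"degree {degree:2d}: train MSE {train_err:.4f}, validation MSE {val_err:.4f}")
```

A straight line (degree 1) is too simple for this curve, so it tends to do poorly on both sets, which is underfitting. A very high degree keeps lowering the training error, but the validation error typically stops improving or gets worse, which is overfitting.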
Just after this, you should try more regression projects such as Project – Forecast Bike Rentals.
C. Immerse into Classic ML
1. Understand underlying models
Once you have mastered the machine learning project, you should learn the working of models such as Linear Regression, Logistic Regression, Decision Trees, Support Vector Machines and ensemble learning.
In linear regression, you will learn how to draw a straight line through the data points such that the line can be used to predict the outcome. Please note that we can draw a line only when there is a single characteristic or feature in the data. If there are many features in the dataset, we fit the multidimensional equivalent of a line, called a hyperplane.
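Here is a minimal sketch of fitting such a hyperplane with NumPy's least-squares solver. The data is made up and generated without noise from y = 3 + 2*x1 - 1*x2, so you can check that the fitted weights match; the predict helper is a hypothetical name.

```python
import numpy as np

# Made-up data generated from y = 3 + 2*x1 - 1*x2 (no noise, for clarity).
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 3.0], [5.0, 5.0]])
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]

# Add a column of ones so the model can learn the intercept.
X_with_bias = np.column_stack([np.ones(len(X)), X])

# Least squares: find the weights that minimise the squared prediction error.
weights, *_ = np.linalg.lstsq(X_with_bias, y, rcond=None)
intercept, coefficients = weights[0], weights[1:]


def predict(features):
    # Prediction is the intercept plus a weighted sum of the features.
    return intercept + np.asarray(features) @ coefficients
```

With two features the fitted model is a plane; with one feature the same code would give the familiar straight line.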
Logistic regression is basically a classification method (covered in Classification Topic at CloudxLab), i.e. it separates one kind of object from another. In logistic regression, we learn to separate the instances using a straight line or plane. You should complement your learning with more projects such as Noise removal from images, Predicting Titanic Passenger Survival and Building Spam Classifier.
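The idea of separating instances with a straight line can be sketched with plain NumPy and gradient descent. The two clouds of points, the learning rate and the number of steps below are assumptions chosen for illustration; a library implementation would add regularisation and a better optimiser.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data: two clouds of points, class 0 around (-2, -2) and class 1 around (2, 2).
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])


def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))


weights = np.zeros(2)
bias = 0.0
learning_rate = 0.1

for _ in range(500):
    probabilities = sigmoid(X @ weights + bias)
    error = probabilities - y                      # gradient of log loss w.r.t. the logit
    weights -= learning_rate * (X.T @ error) / len(y)
    bias -= learning_rate * error.mean()

# The decision boundary is the straight line where X @ weights + bias == 0.
predictions = (sigmoid(X @ weights + bias) >= 0.5).astype(float)
accuracy = float((predictions == y).mean())
```

Points on one side of the learned line get a probability above 0.5 and are assigned to class 1; points on the other side go to class 0.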
You will also learn how to train the models. This is covered in Training Models at CloudxLab.
In Decision Trees, we learn to build a decision tree, which is much like a binary tree. Every internal node represents a condition and every leaf represents a value. Decision Trees are used when the results need to be very interpretable. It is covered in Decision Trees topic at CloudxLab.
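To see why such a model is easy to interpret, here is a tiny hand-written tree in Python. It is not learned from data; the feature names (age, income), the thresholds and the predict helper are hypothetical and chosen only to show how conditions and leaves fit together.

```python
# A hand-written toy tree (not learned from data); the feature names and
# thresholds are hypothetical and chosen only for illustration.
tree = {
    "feature": "age", "threshold": 30,
    "left": {"value": "no"},                 # age <= 30
    "right": {                               # age > 30
        "feature": "income", "threshold": 50000,
        "left": {"value": "no"},             # income <= 50000
        "right": {"value": "yes"},           # income > 50000
    },
}


def predict(node, example):
    # Walk from the root, following one branch per condition, until a leaf.
    while "value" not in node:
        branch = "left" if example[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["value"]
```

For example, predict(tree, {"age": 40, "income": 60000}) follows the right branch twice and returns "yes". A training algorithm does the same thing in reverse: it searches for the conditions that split the data best.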
Support vector machines were very powerful before neural networks came into the picture. In a support vector machine, we learn how to draw the widest possible lane (the margin) that separates the class labels.
Ensemble learning is a really powerful technique that improves performance by combining other models. If you have multiple models which are performing well, you can use all of those models together to get better results. The Random Forest model is an ensemble of multiple decision trees. Further, gradient boosting is a form of ensemble learning whereby we learn to gradually improve the performance of a model by chaining it with another. It is covered in Ensemble Learning and XGBoost topic.
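The simplest form of ensemble is a majority vote. In the sketch below, the labels and the predictions of three hypothetical models are made up so that each model is wrong on different examples; that is the situation in which voting helps the most.

```python
from collections import Counter

# Made-up true labels and the predictions of three hypothetical models.
true_labels = [1, 0, 1, 1, 0, 0, 1, 0]
model_predictions = {
    "model_a": [1, 0, 1, 1, 0, 0, 0, 1],   # wrong on the last two examples
    "model_b": [0, 1, 1, 1, 0, 0, 1, 0],   # wrong on the first two examples
    "model_c": [1, 0, 0, 0, 0, 0, 1, 0],   # wrong on the third and fourth
}


def accuracy(predictions, labels):
    # Fraction of examples where the prediction matches the label.
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def majority_vote(predictions_per_model):
    # For each example, take the label predicted by most models.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions_per_model)]


ensemble_predictions = majority_vote(model_predictions.values())
individual_accuracy = {name: accuracy(p, true_labels) for name, p in model_predictions.items()}
ensemble_accuracy = accuracy(ensemble_predictions, true_labels)
```

Each model alone gets 75% of these examples right, while the vote gets all of them right. With real models the gain is smaller, because their mistakes usually overlap.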
2. Unsupervised Learning and Dimensionality Reduction
The area of machine learning which deals with problems where the labels are not provided is called Unsupervised Learning. This is fairly complex and is a continuously improving area.
In dimensionality reduction, we learn how to reduce the dimensions, i.e. the features, of our data. This helps in reducing complexity and also overfitting. This is covered in Dimensionality Reduction topic.
You should learn the various algorithms such as K-Means, K-Nearest Neighbours, t-SNE and more.
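One common technique here is Principal Component Analysis (PCA), which I am using as an example of my own choosing. The sketch below builds made-up data with three features, one of which is nearly redundant, and reduces it to two features using NumPy's SVD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 200 examples with 3 features, where the third feature is
# almost a copy of the first, so it adds little new information.
base = rng.normal(0.0, 1.0, (200, 2))
X = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + rng.normal(0.0, 0.05, 200)])

# PCA via singular value decomposition of the centred data.
X_centred = X - X.mean(axis=0)
_, singular_values, components = np.linalg.svd(X_centred, full_matrices=False)

# Share of the total variance captured by each direction.
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)

# Keep the 2 directions with the most variance: 3 features become 2.
X_reduced = X_centred @ components[:2].T
```

Because the third feature carried almost no new information, the two kept directions retain nearly all of the variance in the data.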
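To get a feel for how K-Means groups unlabelled data, here is a minimal NumPy sketch. The data is made up, the function name is hypothetical, and the farthest-point initialisation is a simplification I chose to keep the example deterministic; libraries normally use randomised schemes such as k-means++.

```python
import numpy as np


def k_means(points, k, iterations=20):
    # Simplified initialisation (an assumption of this sketch): start from the
    # first point, then repeatedly add the point farthest from chosen centres.
    centres = [points[0]]
    while len(centres) < k:
        distances = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centres], axis=0
        )
        centres.append(points[np.argmax(distances)])
    centres = np.array(centres, dtype=float)

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        distances = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Update step: each centre moves to the mean of its points.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = points[labels == j].mean(axis=0)
    return labels, centres


rng = np.random.default_rng(1)
# Made-up, unlabelled data: two well-separated groups of 30 points each.
points = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(5.0, 0.5, (30, 2))])
labels, centres = k_means(points, k=2)
```

No labels were given, yet the algorithm recovers the two groups purely from the distances between points.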
D. Deep Learning – Fundamentals
Once you have finished classic machine learning, it is time to start with Artificial Neural Networks (ANNs). An ANN is a kind of model which is able to deal with really complex problems. ANNs are roughly based on the structure of animal brains. Artificial neural networks can be used for supervised and unsupervised machine learning. Since you are going to be using either TensorFlow or PyTorch, you should make yourself comfortable with the framework. A tutorial on Hands-On with TensorFlow is provided at CloudxLab.
1. Create Artificial Neural Network using Tensorflow
Learn the structure of an ANN, and learn to build an ANN and train the model using TensorFlow. You may not understand at this point how exactly it is learning, but you should cover that shortly after this.
2. Understanding Deep Neural Networks
Further, you should learn how exactly the backpropagation algorithm works. Though there are libraries available to do the same, I strongly suggest that you learn the mechanics of backpropagation. It would help you understand various complex ideas.
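A good way to learn the mechanics is to write backpropagation by hand for a tiny network and compare it with gradients estimated numerically. The sketch below does this in NumPy; the network shape (2 inputs, 3 tanh hidden units, 1 output), the squared-error loss and the random data are assumptions chosen to keep it small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: 2 inputs -> 3 hidden units (tanh) -> 1 output, squared-error loss.
X = rng.normal(size=(4, 2))          # made-up inputs
y = rng.normal(size=(4, 1))          # made-up targets
W1 = rng.normal(size=(2, 3))
W2 = rng.normal(size=(3, 1))


def loss_fn(W1, W2):
    hidden = np.tanh(X @ W1)
    output = hidden @ W2
    return 0.5 * np.mean(np.sum((output - y) ** 2, axis=1))


def backprop(W1, W2):
    # Forward pass, keeping the intermediate values.
    hidden = np.tanh(X @ W1)
    output = hidden @ W2
    # Backward pass: apply the chain rule layer by layer, output to input.
    d_output = (output - y) / len(X)             # dLoss/dOutput
    grad_W2 = hidden.T @ d_output                # dLoss/dW2
    d_hidden = d_output @ W2.T                   # dLoss/dHidden
    d_pre = d_hidden * (1.0 - hidden**2)         # tanh'(z) = 1 - tanh(z)^2
    grad_W1 = X.T @ d_pre                        # dLoss/dW1
    return grad_W1, grad_W2


def numerical_gradient(f, W, eps=1e-6):
    # Finite differences: nudge one weight at a time and watch the loss change.
    grad = np.zeros_like(W)
    for index in np.ndindex(W.shape):
        original = W[index]
        W[index] = original + eps
        loss_plus = f()
        W[index] = original - eps
        loss_minus = f()
        W[index] = original
        grad[index] = (loss_plus - loss_minus) / (2 * eps)
    return grad


grad_W1, grad_W2 = backprop(W1, W2)
num_W1 = numerical_gradient(lambda: loss_fn(W1, W2), W1)
num_W2 = numerical_gradient(lambda: loss_fn(W1, W2), W2)
max_difference = max(np.abs(grad_W1 - num_W1).max(), np.abs(grad_W2 - num_W2).max())
```

If the two sets of gradients agree to many decimal places, the chain-rule derivation is very likely correct. This kind of gradient check is also a handy debugging tool when you write custom layers later.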
Further, learn about various challenges such as vanishing or exploding gradients in training neural networks and how to overcome those. Also, learn about various optimizers and how to create custom models.
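A quick calculation shows where vanishing gradients come from. The sketch assumes sigmoid activations and ignores the weights entirely, so it is only a best-case illustration: even at its steepest point, each sigmoid layer multiplies the gradient by at most 0.25.

```python
import numpy as np


def sigmoid_derivative(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)


# The sigmoid derivative peaks at z = 0, where it equals 0.25.
max_slope = sigmoid_derivative(0.0)

# During backpropagation the gradient is multiplied by one such factor per
# layer (weights are ignored here for simplicity), so it shrinks with depth.
gradient_scale = {depth: max_slope**depth for depth in (1, 5, 10, 20)}
```

After 10 such layers the factor is already below one millionth, which is why the early layers of a deep sigmoid network can learn very slowly. Exploding gradients are the mirror image: factors larger than 1 multiplied many times over.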
E. Deep learning – Deep Dive
Once you are thorough with the fundamentals of ANNs, you should deep dive into how to process images using Convolutional Neural Networks and how to process sequences using Recurrent Neural Networks. After RNNs, you should try a time series project such as Predict the hourly rain gauge total.
Also, learn what Autoencoders are.
Learn Natural Language Processing using the classic machine learning models as well as the deep learning methods. Learn what word embeddings and sentence embeddings are.
Further, learn about Transformers and WaveNet and how these are being used in translation and speech recognition.
Learn how GANs (Generative Adversarial Networks) can be used for generating images and more.
F. Reinforcement Learning
Reinforcement Learning is a complete subject in itself. It involves game theory, classic machine learning approaches and deep learning approaches.
You should learn about various environments such as OpenAI’s Gym and libraries such as TF-Agents. You should study various techniques such as Behavior replication, Markov Decision Process, Q-Learning, Deep Q-Learning and more.
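Before moving to Gym or TF-Agents, it can help to see Q-Learning on a problem small enough to hold in your head. The corridor environment below, its reward and all the constants are hypothetical and written from scratch in NumPy; they do not come from any library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: a corridor of 5 cells. The agent starts in cell 0
# and receives a reward of 1 only when it reaches cell 4 (the goal).
N_STATES, GOAL = 5, 4
LEFT, RIGHT = 0, 1


def step(state, action):
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


q_table = np.zeros((N_STATES, 2))
alpha, gamma = 0.5, 0.9          # learning rate and discount factor

for episode in range(500):
    state = 0
    for _ in range(100):         # cap on steps per episode
        # Explore with random actions; Q-learning is off-policy, so it can
        # still learn the values of the greedy policy from this experience.
        action = int(rng.integers(2))
        next_state, reward, done = step(state, action)
        target = reward + (0.0 if done else gamma * q_table[next_state].max())
        q_table[state, action] += alpha * (target - q_table[state, action])
        state = next_state
        if done:
            break

greedy_policy = q_table[:GOAL].argmax(axis=1)   # best action in each non-goal cell
```

Even though the agent only ever wandered randomly, the learned table says that moving right is the better action in every cell. Deep Q-Learning replaces the table with a neural network so that the same idea scales to environments with far too many states to list.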
Once you complete the entire course, you must try the various projects at CloudxLab. The learning is going to be lifelong!
If you are able to complete the above, please apply for the jobs.