Welcome to a short session on MLlib, the machine learning library of Spark.
Let us first understand what machine learning means. Machine learning is essentially programming computers to optimize their performance using example data or past experience.
In the words of Arthur Samuel, machine learning is the field of study that gives "computers the ability to learn without being explicitly programmed."
Let us understand the meaning of machine learning by way of an example.
Have you played Mario? I am sure most of you have.
How much time did it take you to learn the game and rescue the princess? A couple of months?
Did anyone teach you? No, right?
Can you believe that a computer can also learn to play Mario by itself, just like you, without any knowledge of the physical world?
Let's take a look at this automation.
In this setup, the bot will automatically learn and play Mario to maximize the score.
Here, the game Super Mario Brothers is being run inside a virtual machine, or emulator, and a computer program, or bot, is hooked to this emulator.
The computer program is observing the memory of the emulator and pressing keys. So, this bot is observing the screen and pressing those six keys to maximize the score. The bot is also observing the location in memory where the score is kept. The single aim is to maximize the score.
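To make this setup concrete, here is a minimal sketch of the observe-and-press loop. It is only a toy, not the actual bot: `ToyEmulator`, the key names, and the addresses `SCORE_ADDR` and `X_ADDR` are all made up for illustration. The bot presses keys at random, the way it does on day 1, and reads the score back from a fixed location in the emulator's memory.

```python
import random

# Assumed names for the six keys; the real controller layout is not given here.
KEYS = ["up", "down", "left", "right", "a", "b"]

# Hypothetical memory addresses, chosen only for this toy example.
SCORE_ADDR = 0x10  # where the toy game keeps the score
X_ADDR = 0x11      # where the toy game keeps the player's position


class ToyEmulator:
    """Stand-in for an emulator: all game state lives in a block of memory."""

    def __init__(self):
        self.memory = bytearray(32)

    def press(self, key):
        x = self.memory[X_ADDR]
        if key == "right" and x < 255:
            x += 1
        elif key == "left" and x > 0:
            x -= 1
        self.memory[X_ADDR] = x
        # In this toy game the score is the furthest position reached so far.
        self.memory[SCORE_ADDR] = max(self.memory[SCORE_ADDR], x)


def run_random_bot(frames, seed=0):
    """Press one random key per frame and record the score read from memory."""
    rng = random.Random(seed)
    emulator = ToyEmulator()
    history = []
    for _ in range(frames):
        emulator.press(rng.choice(KEYS))
        history.append(emulator.memory[SCORE_ADDR])
    return history


history = run_random_bot(frames=50)
print("score after 50 random key presses:", history[-1])
```

Notice that the bot never looks inside the game's rules. It only presses keys and watches one number in memory, which is all a learning step needs in order to decide whether a sequence of key presses was any good.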
Let us look at day 1.
It is pressing the keys randomly. Just jumping. Going nowhere. Mario dies of old age. After days of learning, it is a little better. It looks more like a toddler. It is able to look a few frames into the future. But for some reason, it is walking backward.
After a few more weeks of learning, it seems to have no problem at all playing Mario. It is now able to cross the bigger hurdles.
After a few more weeks of learning, it is now an expert at playing Mario. It is even better than most players.
You can see how it exploits a bug in the Mario game. What was that? The Goomba landed on Mario and it still got points. It figured out automatically that if it starts moving downwards quickly at the moment a Goomba is about to touch it from the top, the game makes a mistake in detecting the collision.
The program was able to discover bugs or mistakes in the game that we might never have found. You are about to see it exploiting another bug. What was that? It is about to fall, but it saves itself. No idea how it did that.
Next, the program was made to play other games. For some games, such as Tetris, it could not do much, but you will observe another sign of intelligence. Here, when it is about to lose the game, it just pauses the game. It realizes that pausing is the best way to retain the score.
So, the program learned to play Mario and other games without any need for explicit programming. Of course, programming was required to interact with the emulator, but not to write rules on how to play each individual game.
The question to you is: if we have to make this program learn another game, such as Pac-Man, what will we have to do? Option 1: would we have to write new rules for each game? Or option 2: do we just need to hook it to the new game and let it play for a while?
The answer is the second option. We just need to hook the program to the new game and let it play the game for a while.
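Here is a rough sketch of why no new rules are needed. This is an assumption-laden toy, not the method the real bot uses: the learner below is plain random-mutation hill climbing, and `RunnerGame` and `PelletGame` are hypothetical games invented for the example. The point to notice is that `learn` contains no game-specific rules. It only needs the list of keys and the score.

```python
import random


class RunnerGame:
    """Hypothetical side-scroller: the score is the furthest point reached."""
    keys = ["left", "right", "jump"]

    def __init__(self):
        self.x = 0
        self.score = 0

    def press(self, key):
        if key == "right":
            self.x += 1
        elif key == "left":
            self.x -= 1
        self.score = max(self.score, self.x)


class PelletGame:
    """Hypothetical corridor: one point for each new cell visited above the start."""
    keys = ["up", "down", "left", "right"]

    def __init__(self):
        self.y = 0
        self.visited = set()
        self.score = 0

    def press(self, key):
        if key == "up":
            self.y += 1
        elif key == "down":
            self.y -= 1
        if self.y > 0 and self.y not in self.visited:
            self.visited.add(self.y)
            self.score += 1


def play(game_cls, key_sequence):
    """Start a fresh game, press the keys in order, and return the final score."""
    game = game_cls()
    for key in key_sequence:
        game.press(key)
    return game.score


def learn(game_cls, steps=20, rounds=500, seed=0):
    """Hill climbing: change one key press at a time, keep it if the score does not drop."""
    rng = random.Random(seed)
    best = [rng.choice(game_cls.keys) for _ in range(steps)]
    first_score = best_score = play(game_cls, best)
    for _ in range(rounds):
        candidate = list(best)
        candidate[rng.randrange(steps)] = rng.choice(game_cls.keys)
        score = play(game_cls, candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return first_score, best_score


# The same learner is hooked to two different games without any changes.
for game_cls in (RunnerGame, PelletGame):
    first, best = learn(game_cls)
    print(game_cls.__name__, "first try:", first, "after learning:", best)
```

Hooking up a third game would mean writing another class with `keys`, `press`, and `score`, which is the "interact with the emulator" kind of programming, and leaving `learn` untouched.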
Now the definition of machine learning should make sense.
So, machine learning is basically the branch of artificial intelligence concerned with the design and development of algorithms that let computers evolve their behavior based on empirical data.