Introduction to Big Data and Distributed Systems


Why do we need big data now?





Paper, tapes, etc. are analog storage, while CDs, DVDs, and hard disk drives are considered digital storage.

The graph shows that digital storage started increasing exponentially after 2002, while analog storage remained practically the same.

The year 2002 is therefore often called the beginning of the digital age.

Why? The answer is twofold: devices and connectivity. On one hand, devices became cheaper, faster, and smaller; the smartphone is a great example. On the other hand, connectivity improved: we now have Wi-Fi, 4G, Bluetooth, NFC, etc.

This led to many useful applications, such as a vibrant World Wide Web, social networks, and the Internet of Things, which in turn generate huge amounts of data.

Roughly, a computer is made of four components. The first is the CPU, which executes instructions. A CPU is characterized by its speed: the more instructions it can execute per second, the faster it is considered.

Next comes RAM (random access memory). While processing, we load data into RAM. The more data we can load into RAM, the better the CPU can perform. So RAM has two main attributes that matter: its size and its read/write speed.
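When the data is larger than the available RAM, one common workaround on a single machine is to stream it in fixed-size chunks instead of loading it all at once. Below is a minimal Python sketch of that idea. It assumes a hypothetical chunk size of 1 MiB and uses an in-memory byte stream as a stand-in for a large file; the function name count_newlines is illustrative only.

```python
import io

CHUNK_SIZE = 1 << 20  # hypothetical: read 1 MiB at a time


def count_newlines(stream, chunk_size=CHUNK_SIZE):
    """Count newline bytes while holding at most one chunk in RAM."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        total += chunk.count(b"\n")
    return total


if __name__ == "__main__":
    # Stand-in for a large file: 1000 short lines.
    data = io.BytesIO(b"line\n" * 1000)
    print(count_newlines(data, chunk_size=64))  # → 1000
```

Memory use stays bounded by the chunk size no matter how big the input is, but the total time is now limited by how fast the disk (or network) can deliver the chunks.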

To store data permanently, we need a hard disk drive (HDD) or a solid-state drive (SSD). An SSD is faster but usually smaller and costlier. The faster and bigger the disk, the faster we can process data.

Another component that we frequently forget when thinking about the speed of computation is the network. Why does it matter? Our data is often stored on different machines, and we need to read it over the network to process it.

When processing Big Data, at least one of these four components becomes the bottleneck. That is when we need to move to multiple computers, i.e., a distributed computing architecture.
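A back-of-the-envelope sketch can make the bottleneck idea concrete. The throughput figures below are hypothetical placeholders, not measurements, and the helper names bottleneck and processing_time_s are illustrative. The sketch assumes data streams through network, disk, RAM, and CPU as a pipeline, so the overall rate is capped by the slowest stage, and that splitting the work across machines divides the load evenly with no coordination overhead (an idealization).

```python
# Hypothetical sustained throughputs in MB/s; replace with measured values.
THROUGHPUT_MB_S = {
    "network": 125.0,  # roughly what a 1 Gbit/s link can carry
    "disk": 200.0,
    "ram": 10_000.0,
    "cpu": 1_000.0,    # rate at which the processing code consumes data
}


def bottleneck(throughputs):
    """Return (component, rate) of the slowest stage in the pipeline."""
    name = min(throughputs, key=throughputs.get)
    return name, throughputs[name]


def processing_time_s(data_mb, throughputs, machines=1):
    """Rough processing time in seconds, assuming the data is split evenly
    across `machines` identical machines (idealized: no overhead)."""
    _, rate = bottleneck(throughputs)
    return data_mb / (rate * machines)


if __name__ == "__main__":
    one_tb_in_mb = 1_000_000
    print(bottleneck(THROUGHPUT_MB_S))  # → ('network', 125.0)
    print(processing_time_s(one_tb_in_mb, THROUGHPUT_MB_S))  # → 8000.0
    print(processing_time_s(one_tb_in_mb, THROUGHPUT_MB_S, machines=10))  # → 800.0
```

With these made-up numbers, one machine needs about 8000 seconds (over two hours) to get 1 TB through its slowest component, while ten machines would need about 800 seconds under the even-split assumption. Real clusters fall short of this ideal because of coordination and data-movement costs.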

