Introduction to Big Data and Distributed Systems


What is Big Data?





In very simple words, Big Data is data so large that it cannot be processed with the usual tools. To process such data, we need a distributed architecture. The data could be structured or unstructured.

Generally, we classify the problems related to handling data into three buckets:

Volume: When the problem we are solving is how to store a huge amount of data, we call it Volume. For example, Facebook handles more than 500 TB of data per day and has 300 PB of data in storage.
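To get a feel for the Volume figures quoted above, here is a small back-of-the-envelope sketch. It assumes decimal units (1 PB = 1000 TB), treats the 500 TB per day as data that has to be stored, and uses a hypothetical 10 TB disk size that is not from the lesson; the function names are illustrative only.

```python
# Figures quoted in the text. Decimal units are an assumption (1 PB = 1000 TB).
DAILY_DATA_TB = 500
TOTAL_STORAGE_PB = 300
TB_PER_PB = 1000

# Hypothetical capacity of a single disk, chosen only for illustration.
DISK_CAPACITY_TB = 10


def days_to_fill(total_pb, daily_tb):
    """Days needed to accumulate total_pb of data at daily_tb per day."""
    return total_pb * TB_PER_PB / daily_tb


def disks_needed(total_pb, disk_tb):
    """Whole disks needed to hold total_pb, ignoring replication and overhead."""
    total_tb = total_pb * TB_PER_PB
    return (total_tb + disk_tb - 1) // disk_tb  # integer ceiling division


if __name__ == "__main__":
    print("Days to reach 300 PB:", days_to_fill(TOTAL_STORAGE_PB, DAILY_DATA_TB))
    print("10 TB disks for 300 PB:", disks_needed(TOTAL_STORAGE_PB, DISK_CAPACITY_TB))
```

Under these assumptions, the data would fill tens of thousands of disks, which is far more than a single machine can hold. This is why Volume problems call for a distributed architecture.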

Velocity: When the problem is handling a very large number of requests per second, we call this characteristic Velocity. The number of requests that Facebook or Google receives per second is an example of Big Data due to Velocity.

Variety: If the problem at hand is complex, or the data we are processing is complex, we say the problem is related to Variety.

For example, imagine you have to find the fastest route on a map. Because the problem involves enumerating many possible routes, it is complex even though the map itself is not very large.
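The route example can be sketched in a few lines. This is a minimal illustration of brute-force enumeration, not how a production route finder works; the map `ROADS`, its travel times, and the function names are all hypothetical.

```python
# Hypothetical road map: travel time in minutes between directly connected places.
ROADS = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1, "D": 5},
    "C": {"A": 2, "B": 1, "D": 8},
    "D": {"B": 5, "C": 8},
}


def all_routes(graph, start, goal, visited=None):
    """Yield every route from start to goal that visits no place twice."""
    visited = (visited or []) + [start]
    if start == goal:
        yield visited
        return
    for nxt in graph[start]:
        if nxt not in visited:
            yield from all_routes(graph, nxt, goal, visited)


def travel_time(graph, route):
    """Total travel time along a route, summing each consecutive leg."""
    return sum(graph[a][b] for a, b in zip(route, route[1:]))


def fastest_route(graph, start, goal):
    """Pick the fastest route by checking every possible route."""
    return min(all_routes(graph, start, goal), key=lambda r: travel_time(graph, r))


if __name__ == "__main__":
    routes = list(all_routes(ROADS, "A", "D"))
    best = fastest_route(ROADS, "A", "D")
    print("Routes checked:", len(routes))
    print("Fastest route:", best, "taking", travel_time(ROADS, best), "minutes")
```

Even this map of four places has several routes to compare. The number of candidates grows very quickly as places and roads are added, so the work becomes large while the map data itself stays small.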

Data can be termed Big Data if any one of Volume, Velocity, or Variety becomes impossible to handle using traditional tools.

