Learn Python, NumPy, Pandas, Scikit-learn, HDFS, ZooKeeper, Hive, HBase, NoSQL, Oozie, Flume, Sqoop, Spark, Spark RDD, Spark Streaming, Kafka, SparkR, SparkSQL, MLlib, Regression, Clustering, Classification, SVM, Random Forests, Decision Trees, Dimensionality Reduction, TensorFlow 2, Keras, Convolutional & Recurrent Neural Networks, Autoencoders, and Reinforcement Learning
In this chapter, we learn the basics of Big Data: the core concepts, common use cases, and an overview of the ecosystem.
This chapter doesn't require any knowledge of programming or technology. We believe it is very useful for everyone to learn the basics of Big Data. So, jump in!
Whenever you request a page from a web server, the server records the request in a file called a log.
A web server's logs are a gold mine of insight into user behaviour. Data scientists usually look at the logs first to understand how users behave. But because the logs are humongous in size, it takes a distributed framework like Hadoop or Spark to process them.
As part of this project, you will learn to parse the text data stored in the logs of a web server using Apache Spark.
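To give a feel for what parsing a log line involves, here is a minimal sketch in plain Python. It assumes the logs follow the Apache Common Log Format, which this description does not actually specify; the names `LOG_PATTERN`, `parse_log_line`, `count_paths`, and the sample lines are hypothetical and used only for illustration.

```python
import re
from collections import Counter

# Assumption: lines follow the Apache Common Log Format, e.g.
# host ident user [timestamp] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-)$'
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    host, _ident, _user, timestamp, method, path, protocol, status, size = match.groups()
    return {
        "host": host,
        "timestamp": timestamp,
        "method": method,
        "path": path,
        "protocol": protocol,
        "status": int(status),
        # A size of "-" means no body was returned; treat it as 0 bytes.
        "size": 0 if size == "-" else int(size),
    }


def count_paths(lines):
    """Count requests per path, skipping lines that fail to parse."""
    parsed = (parse_log_line(line) for line in lines)
    return Counter(record["path"] for record in parsed if record is not None)


# Made-up sample lines for illustration only.
SAMPLE_LINES = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /about.html HTTP/1.0" 304 -',
    '10.0.0.5 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'this line is malformed',
]

if __name__ == "__main__":
    for path, hits in count_paths(SAMPLE_LINES).most_common():
        print(path, hits)
```

In a Spark application the same function could be applied to every line of a large file in parallel, for example with `sc.textFile(path).map(parse_log_line).filter(lambda r: r is not None)`, where `sc` is an existing SparkContext and `path` is a placeholder for the location of the logs.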
Learn how to write Spark applications.