Note: In some of the videos in this course, you may notice that the instructor uses Scala instead of Python. Although this course is based on Python, you will see as you progress that Apache Spark can also be used with Scala and other languages.
We also have a dedicated course on Apache Spark with Scala; see our course page for more details.
You may also notice Hue being used in some of the videos. We have deprecated Hue in our lab; you can follow this discussion on our forum for more details.
What is Apache Spark?
Resource Managers
- A cluster resource manager (or simply resource manager) is a software component that manages the resources of the machines in a cluster, such as memory, disk, and CPU.
- Apache Spark can run on top of many cluster resource managers, such as YARN, Amazon EC2, or Mesos.
- If you don't have a resource manager yet, you can run Apache Spark in standalone mode, which uses Spark's own built-in cluster manager.
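As a rough sketch of how the choice of resource manager shows up in practice: Spark selects it through the master URL passed to `spark-submit --master` or to `SparkSession.builder.master()`. The helper below, `master_url`, is hypothetical (it is not part of Spark) and covers only the local, YARN, standalone, and Mesos URL formats, using their documented default ports (7077 for standalone, 5050 for Mesos). Amazon EC2 supplies machines rather than a master URL scheme of its own, so it is left out.

```python
from typing import Optional

# Hypothetical helper: default ports for the host-based master URL schemes.
_DEFAULT_PORTS = {"standalone": 7077, "mesos": 5050}


def master_url(manager: str, host: Optional[str] = None,
               port: Optional[int] = None, threads: str = "*") -> str:
    """Build a Spark master URL string for the given resource manager."""
    manager = manager.lower()
    if manager == "local":
        # Single JVM on this machine; "*" means one worker thread per core.
        return f"local[{threads}]"
    if manager == "yarn":
        # YARN's ResourceManager is located via HADOOP_CONF_DIR / YARN_CONF_DIR,
        # so no host is embedded in the URL.
        return "yarn"
    if manager in _DEFAULT_PORTS:
        if host is None:
            raise ValueError(f"{manager} mode needs the master host")
        scheme = "spark" if manager == "standalone" else "mesos"
        return f"{scheme}://{host}:{port or _DEFAULT_PORTS[manager]}"
    raise ValueError(f"unsupported resource manager: {manager!r}")


if __name__ == "__main__":
    print(master_url("local"))
    print(master_url("yarn"))
    # "master.example.com" is a placeholder host name.
    print(master_url("standalone", host="master.example.com"))
```

Assuming PySpark is installed, the result would be used as `SparkSession.builder.master(master_url("local")).getOrCreate()`. In cluster deployments the master is more often supplied to `spark-submit` than hard-coded in the application.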
Sources
- Apache Spark does not build its own file system or data store; instead, it can read from many kinds of existing data sources.
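In PySpark, reading goes through the `DataFrameReader` returned by `spark.read`, whose generic form is `spark.read.format(name).options(...).load(path)`. The sketch below assumes a hypothetical helper, `read_by_extension`, that picks one of Spark's built-in format names (`csv`, `json`, `parquet`, `orc`, `text`) from the file extension. Paths with schemes such as `hdfs://` or `s3a://` are passed through unchanged; whether they resolve depends on how your cluster is configured.

```python
import os
from typing import Any

# Hypothetical mapping: file extension -> Spark built-in data source name.
_FORMATS = {
    ".csv": "csv",
    ".json": "json",
    ".parquet": "parquet",
    ".orc": "orc",
    ".txt": "text",
}


def read_by_extension(reader: Any, path: str, **options: Any) -> Any:
    """Load `path` with the format implied by its extension.

    `reader` is expected to be `spark.read` (a PySpark DataFrameReader).
    Extra keyword options, e.g. header="true" for CSV, are forwarded
    to DataFrameReader.options().
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in _FORMATS:
        raise ValueError(f"no data source registered for extension {ext!r}")
    return reader.format(_FORMATS[ext]).options(**options).load(path)
```

With a live session, a call would look like `read_by_extension(spark.read, "data/sales.csv", header="true")`, where the path is only a placeholder. Using `format(...).load(...)` keeps the dispatch uniform across sources instead of calling a different convenience method for each one.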
Libraries
Apache Spark comes with a rich set of libraries: Spark SQL, Spark Streaming, MLlib, and GraphX.
Spark and its libraries can be used with Scala, Java, Python, R, and SQL. The only exception is GraphX, which can be used only from Scala and Java.
With this set of libraries, it is possible to do ETL, machine learning, real-time data processing, and graph processing on big data.
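For Python users, a minimal lookup sketch can connect the workloads above to the PySpark package that serves each one. The `PYSPARK_PACKAGES` table and the `pyspark_package` function are illustrative only and not part of Spark. The package names are PySpark's own (`pyspark.sql` for Spark SQL, `pyspark.ml` for MLlib's DataFrame-based API, `pyspark.sql.streaming` for Structured Streaming), and graph processing maps to nothing because GraphX has no Python API.

```python
# Illustrative table: workload -> PySpark package (None: no Python API).
PYSPARK_PACKAGES = {
    "etl": "pyspark.sql",                  # Spark SQL and DataFrames
    "machine_learning": "pyspark.ml",      # MLlib, DataFrame-based API
    "real_time": "pyspark.sql.streaming",  # Structured Streaming
    # (the older DStream-based Spark Streaming API lives in pyspark.streaming)
    "graph": None,                         # GraphX: Scala and Java only
}


def pyspark_package(workload: str) -> str:
    """Return the PySpark package for a workload, or fail with a reason."""
    if workload not in PYSPARK_PACKAGES:
        raise KeyError(f"unknown workload: {workload!r}")
    package = PYSPARK_PACKAGES[workload]
    if package is None:
        raise ValueError("GraphX has no Python API; use Scala or Java for graph processing")
    return package
```
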
We will cover each component in detail as we go forward.