In this project, we will learn the basics of Apache Spark using Python. Before you can run the code, however, there are a couple of things you need to do:
Note 1: To run the code from this course, you must first run the code given in this slide.
Note 2: It is mandatory to run the code in the default Jupyter notebook on the right side of this split screen so that the assessment engine can detect and assess your code.
Note 3: Some steps in this course depend on the output or successful execution of previous steps, so it is mandatory to go through the course sequentially.
Note 4: Do not open more than one Jupyter notebook while completing this course; doing so will result in an error. If you do open more than one notebook by mistake, close the other tabs, shut down the kernel for this notebook, and then restart it to clear the error.
Here is a link to a post in our discussion forum describing various debugging steps you can go through yourself to resolve most of the issues you might come across in this course. If these do not resolve your issue, please reach out to us by leaving a comment on the slide where you are facing the problem (preferably with a screenshot), and we would be more than happy to help.
To append the Spark and Python locations to the path, copy and paste the code below as-is into the notebook on the right side of this split screen and run it:
import os
import sys

os.environ["SPARK_HOME"] = "/usr/hdp/current/spark2-client"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
os.environ["PYSPARK_PYTHON"] = "/usr/local/anaconda/bin/python"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/anaconda/bin/python"
sys.path.insert(0, os.environ["PYLIB"] + "/py4j-0.10.4-src.zip")
sys.path.insert(0, os.environ["PYLIB"] + "/pyspark.zip")
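After running the setup, you can sanity-check that the Spark archives actually landed on the Python path. The sketch below repeats the two relevant lines (using the same cluster-specific paths as above, which may differ on your own installation) and then inspects the result:

```python
import os
import sys

# Repeat the relevant setup lines (cluster-specific paths assumed from above).
os.environ["SPARK_HOME"] = "/usr/hdp/current/spark2-client"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
sys.path.insert(0, os.environ["PYLIB"] + "/py4j-0.10.4-src.zip")
sys.path.insert(0, os.environ["PYLIB"] + "/pyspark.zip")

# Sanity check: the pyspark archive was inserted last, so it sits first on sys.path.
print(sys.path[0])  # /usr/hdp/current/spark2-client/python/lib/pyspark.zip
```

If the printed path does not point at pyspark.zip, the later `import pyspark` will fail, so it is worth checking before moving on.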
Now, to initialize Spark, you first need to import from pyspark. Then you can initialize the config object using SparkConf, and finally initialize SparkContext with that config variable:
from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("appName")
<<your code goes here>> = SparkContext(conf=conf)
No hints are available for this assessment
Answer is not available for this assessment
Note - Having trouble with the assessment engine? Follow the steps listed here