On the right-hand side is a Python Jupyter notebook where you can type your Python code and run it by pressing SHIFT+ENTER.
Your task is to initialize the Spark entry point, SparkContext, with the variable name sc. This SparkContext should have YARN as its master.
The usual steps are:
Step 1. Appending the Spark and Python location to the path
import os
import sys
# Point Spark at the HDP Spark 2 installation and locate its Python libraries
os.environ["SPARK_HOME"] = "/usr/hdp/current/spark2-client"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
# In the two lines below, use /usr/bin/python2.7 if you want to use Python 2
os.environ["PYSPARK_PYTHON"] = "/usr/local/anaconda/bin/python"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/anaconda/bin/python"
# Make py4j and pyspark importable from this notebook
sys.path.insert(0, os.environ["PYLIB"] + "/py4j-0.10.4-src.zip")
sys.path.insert(0, os.environ["PYLIB"] + "/pyspark.zip")
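If you want to sanity-check Step 1 before moving on, pyspark should now be importable. This optional check is not part of the original steps:
# Optional: confirm that pyspark can now be imported
import pyspark
print(pyspark.__version__)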
Step 2. Initialize Spark
from pyspark import SparkContext, SparkConf
# Initialize the config object; you will have to correct this line
conf = SparkConf().setAppName("appName")
# Create Spark Context
sc = SparkContext(conf=conf)
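Once sc exists, a quick smoke test (not part of the exercise) confirms that the context can run jobs:
# Smoke test: distribute the numbers 0..99 and sum them on the cluster
rdd = sc.parallelize(range(100))
print(rdd.sum())  # prints 4950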
You have to change Step 2 above so that the master runs on YARN. This is usually done by setting a property on the conf object.
Please note that you cannot initialize Spark twice in the same kernel. To re-initialize it, restart the kernel from the menu.
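For reference, one possible shape of the corrected Step 2 is sketched below, using SparkConf's setMaster method; try working it out yourself first:
# Build the config with YARN as the master
conf = SparkConf().setAppName("appName").setMaster("yarn")
# Create Spark Context against YARN
sc = SparkContext(conf=conf)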
Note: You can press TAB after typing an object's name followed by a dot to see its members. You can also call dir() and help() on an object to explore its various methods.
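For example, with sc defined:
dir(sc)               # list the attributes and methods of the SparkContext
help(sc.parallelize)  # show the documentation for one method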