Apache Spark with Python - PySpark Assessment


Initialize Spark with YARN as the master

On the right-hand side is a Jupyter notebook where you can type your Python code and run it by pressing SHIFT+ENTER.

Your task is to initialize the Spark entry point, SparkContext, in a variable named sc. This Spark context should use YARN as its master.

The usual steps are:

Step 1. Append the Spark and Python locations to the path

import os
import sys
os.environ["SPARK_HOME"] = "/usr/hdp/current/spark2-client"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
# In the two lines below, use /usr/bin/python2.7 if you want to use Python 2
os.environ["PYSPARK_PYTHON"] = "/usr/local/anaconda/bin/python" 
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/anaconda/bin/python"
sys.path.insert(0, os.environ["PYLIB"] +"/py4j-0.10.4-src.zip")
sys.path.insert(0, os.environ["PYLIB"] +"/pyspark.zip")

Step 2. Initialize Spark

from pyspark import SparkContext, SparkConf
# Initialize the config object; you will have to correct this line
conf = SparkConf().setAppName("appName")
# Create Spark Context
sc = SparkContext(conf=conf)

You have to change Step 2 above so that the master is YARN. This is usually done by setting a property on the conf object, as in the sketch below.
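For illustration, a minimal sketch of what Step 2 could look like once the master is set on the configuration object (the app name "appName" is simply kept from the snippet above):

from pyspark import SparkContext, SparkConf
# Set the application name and point the master at YARN
conf = SparkConf().setAppName("appName").setMaster("yarn")
# Create the Spark context from the updated configuration
sc = SparkContext(conf=conf)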

Please note that you cannot initialize Spark twice. If you need to re-initialize it, you will have to restart the kernel from the menu.

INSTRUCTIONS

Please change Step 2 so that the master is YARN.

Note: You can press TAB after typing an object's name to see its members. You can also use the dir and help functions on an object to explore its various methods.
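For example, once sc has been created, these built-in functions can be used to explore it (sc.textFile is just one of its methods, picked here for illustration):

# List the attributes and methods available on the SparkContext object
dir(sc)
# Show the documentation for one of its methods
help(sc.textFile)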



