In this project, we will learn the basics of Apache Spark using Python. Before you can run the code, there are a few things you need to do:
Note 1: To run the code in this course, you must first run the code given in this slide.
Note 2: You must run the code in the default Jupyter notebook on the right side of this split screen so that the assessment engine can detect and assess your code.
Note 3: Some steps in this course depend on the output or successful execution of earlier steps, so you must go through the course sequentially.
Note 4: Do not open more than one Jupyter notebook while completing this course; doing so results in an error. If you open more than one by mistake, close the other tabs, shut down the kernel for this Jupyter notebook, and then restart it to clear the error.
A post in our discussion forum describes debugging steps that you can follow yourself to solve most of the issues you might come across in this course. If these do not resolve your issue, please leave a comment in the slide where you are facing it (preferably with a screenshot), and we will be more than happy to help.
Happy learning!
To append the Spark and Python locations to the path, copy the code given below as-is into the notebook on the right side of this split screen and run it:
import os
import sys
os.environ["SPARK_HOME"] = "/usr/hdp/current/spark2-client"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
os.environ["PYSPARK_PYTHON"] = "/usr/local/anaconda/bin/python"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/anaconda/bin/python"
sys.path.insert(0, os.environ["PYLIB"] +"/py4j-0.10.4-src.zip")
sys.path.insert(0, os.environ["PYLIB"] +"/pyspark.zip")
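The setup above hard-codes the py4j version (0.10.4), which only matches one particular Spark installation. As a rough sketch, and not part of the course's required code, the same path setup could look up whatever py4j archive is bundled under SPARK_HOME. The helper names spark_python_paths and add_spark_to_path are hypothetical, introduced here only for illustration, and the python/lib layout is assumed to match the one used above.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the PySpark zip archives found under <spark_home>/python/lib.

    Hypothetical helper: assumes the same directory layout as the setup code.
    """
    pylib = os.path.join(spark_home, "python", "lib")
    # Match any bundled py4j version instead of hard-coding 0.10.4.
    py4j_zips = sorted(glob.glob(os.path.join(pylib, "py4j-*-src.zip")))
    pyspark_zip = os.path.join(pylib, "pyspark.zip")
    # Keep only archives that actually exist on disk.
    return [p for p in py4j_zips + [pyspark_zip] if os.path.isfile(p)]


def add_spark_to_path(spark_home):
    """Prepend the archives to sys.path and return the list that was found."""
    paths = spark_python_paths(spark_home)
    for p in paths:
        if p not in sys.path:
            sys.path.insert(0, p)
    return paths


if __name__ == "__main__":
    # Falls back to the SPARK_HOME value used in this slide if the variable is unset.
    home = os.environ.get("SPARK_HOME", "/usr/hdp/current/spark2-client")
    print(add_spark_to_path(home))
```

An empty list from add_spark_to_path means no archives were found, which usually points to a wrong SPARK_HOME. For the assessment itself, keep using the code exactly as given in this slide.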
Now, to initialize Spark, first import SparkContext and SparkConf from pyspark. Then initialize the config object using SparkConf, and finally initialize SparkContext with the variable sc:
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("appName")
<<your code goes here>> = SparkContext(conf=conf)