Running PySpark in Jupyter / IPython notebook

You can run PySpark code in a Jupyter notebook on CloudxLab. The following instructions cover Apache Spark versions 2.2, 2.3, and 2.4.

What is Jupyter notebook?

The IPython Notebook is now known as the Jupyter Notebook. It is an interactive computational environment, in which you can combine code execution, rich text, mathematics, plots and rich media. For more details on the Jupyter Notebook, please see the Jupyter website.

Please follow the steps below to access the Jupyter notebook on CloudxLab.

To start a Python notebook, click the “Jupyter” button under My Lab and then click “New -> Python 3”.

The initialization code is also available in the GitHub repository here.

If you want to use Spark 2.2, use the code below:
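The snippet below is a minimal sketch of that initialization. The Spark install path and the Py4J zip file name are assumptions, not CloudxLab-specific facts — check the actual locations on your lab node and adjust them accordingly.

```python
import os
import sys

# Assumed install location of Spark 2.2 -- verify the real path on your node.
os.environ["SPARK_HOME"] = "/usr/spark2.2"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"

# Put the bundled PySpark and Py4J libraries on the Python path so the
# notebook kernel can import them. The Py4J version in the zip name varies
# by Spark release -- confirm it with: ls $SPARK_HOME/python/lib
sys.path.insert(0, os.environ["PYLIB"] + "/py4j-0.10.4-src.zip")
sys.path.insert(0, os.environ["PYLIB"] + "/pyspark.zip")
```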

If you plan to use version 2.3, use the code below to initialize:
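Initialization for 2.3 follows the same pattern; only the install path and the Py4J zip name change. Both values below are assumptions — verify them against your environment.

```python
import os
import sys

# Assumed install location of Spark 2.3 -- adjust to your environment.
os.environ["SPARK_HOME"] = "/usr/spark2.3"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
# Confirm the Py4J zip name with: ls $SPARK_HOME/python/lib
sys.path.insert(0, os.environ["PYLIB"] + "/py4j-0.10.6-src.zip")
sys.path.insert(0, os.environ["PYLIB"] + "/pyspark.zip")
```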

If you plan to use version 2.4, use the code below to initialize:
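For 2.4 the pattern is again the same; the path and Py4J zip name below are assumptions to be checked on your node.

```python
import os
import sys

# Assumed install location of Spark 2.4 -- adjust to your environment.
os.environ["SPARK_HOME"] = "/usr/spark2.4"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
# Confirm the Py4J zip name with: ls $SPARK_HOME/python/lib
sys.path.insert(0, os.environ["PYLIB"] + "/py4j-0.10.7-src.zip")
sys.path.insert(0, os.environ["PYLIB"] + "/pyspark.zip")
```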

Now, initialize the entry points of Spark: SparkContext and SparkConf
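A minimal sketch of that step is shown below. The application name is arbitrary, and the master URL depends on your cluster — "yarn" is typical on a Hadoop cluster such as CloudxLab, while "local[*]" works for a single-machine run.

```python
from pyspark import SparkConf, SparkContext

# SparkConf holds the configuration; the app name here is arbitrary and
# the master URL ("yarn" vs "local[*]") depends on where Spark runs.
conf = SparkConf().setAppName("myFirstApp").setMaster("yarn")

# SparkContext is the entry point to the cluster, built from the conf.
sc = SparkContext(conf=conf)
```

Note that only one SparkContext can be active per notebook kernel; if you re-run the cell, call `sc.stop()` first.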

Once you have successfully initialized sc and conf, use the code below to test the setup:
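For instance, the following smoke test distributes a small dataset as an RDD and runs two actions on it; `sc` is the SparkContext created in the previous step. If both actions return, your PySpark setup is working.

```python
# Distribute the numbers 1..10 across the cluster as an RDD.
rdd = sc.parallelize(range(1, 11))

# Run two actions to force actual computation on the executors.
print(rdd.count())  # 10
print(rdd.sum())    # 55
```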