Getting Started with Various Tools


Spark

Purpose: Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Home Page: https://spark.apache.org/

Documentation: https://spark.apache.org/docs/latest/

Related resources to learn: https://cloudxlab.com/assessment/playlist-intro/17/apache-spark-basics?course_id=1&playlist_id=17

https://cloudxlab.com/blog/running-pyspark-jupyter-notebook/

How to get started:

  1. In the web console tab on the right side of the screen, type the following command to run the Spark Scala interactive shell

    spark-shell
    
  2. Type the following command to run the Python Spark (PySpark) interactive shell

    pyspark
    
  3. Type the following command to run R on Spark (located at /usr/spark2.6/bin/sparkR)

    sparkR
    
  4. Type the following command to submit a JAR or Python application for execution on the cluster

    spark-submit
    
  5. Type the following command to run the Spark SQL interactive shell

    spark-sql
    
