Getting Started with Various Tools

Spark

Purpose: Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Home Page: https://spark.apache.org/

Documentation: https://spark.apache.org/docs/latest/

Related resources to learn: https://cloudxlab.com/assessment/playlist-intro/17/apache-spark-basics?course_id=1&playlist_id=17

https://cloudxlab.com/blog/running-pyspark-jupyter-notebook/

How to get started:

  1. In the web console tab on the right side of the screen, type the following command to launch the Spark Scala interactive shell:

    spark-shell
    
  2. Type the following command to launch the Python Spark (PySpark) interactive shell:

    pyspark
    
  3. Type the following command to run R on Spark (/usr/spark2.6/bin/sparkR):

    sparkR
    
  4. Type the following command to submit a JAR or Python application for execution on the cluster:

    spark-submit
    
  5. Type the following command to launch the Spark SQL interactive shell:

    spark-sql
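
Unlike the interactive shells, spark-submit needs to be told which application to run. The sketch below assembles such a command line in Python so the pieces are easy to see. The helper name `build_spark_submit_command`, the file names `wordcount.py` and `input.txt`, and the choice of `yarn` as the master are assumptions for illustration only; `--master` and `--deploy-mode` are standard spark-submit options, but the right values depend on how your cluster is set up.

```python
import shlex


def build_spark_submit_command(app_path, master="yarn", deploy_mode="client", app_args=()):
    """Return the argument list for a spark-submit call (hypothetical helper)."""
    command = [
        "spark-submit",
        "--master", master,            # assumed cluster manager; adjust for your cluster
        "--deploy-mode", deploy_mode,  # "client" runs the driver on the submitting machine
        app_path,                      # the JAR or Python file to execute
    ]
    # Anything after the application path is passed through to the application itself.
    command.extend(app_args)
    return command


if __name__ == "__main__":
    # wordcount.py and input.txt are placeholder names, not files provided by the course.
    cmd = build_spark_submit_command("wordcount.py", app_args=["input.txt"])
    print(shlex.join(cmd))
```

The printed line can be pasted into the web console once the placeholder file names are replaced with a real application and its arguments.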
    
