Apache Spark - Running On Cluster - Cluster Mode - Standalone

In the other mode, Spark runs on a real resource manager. It can run in standalone mode, on Mesos, on YARN, or on EC2.

In standalone mode, Spark does not need an external resource manager such as YARN. Instead, it uses its own built-in resource manager.

So if you do not have an existing cluster, you can use Spark's built-in component to set up a cluster for distributed computing. All you need to do is install Spark on all nodes, inform the nodes about each other, and then launch Spark on all of them. The Spark nodes will find each other and form a cluster, and whatever you launch on it will be distributed across the machines.

As outlined earlier, you can install your own standalone cluster. Here are the details:

Copy a compiled version of Spark to the same location on all your machines—for example, /home/yourname/spark.

Set up password-less SSH access from your master machine to the others.
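One common way to set up password-less SSH is to generate a key pair on the master and copy the public key to every worker. The sketch below assumes OpenSSH (ssh-keygen and ssh-copy-id) is installed and that an RSA key at ~/.ssh/id_rsa is acceptable. The names run, setup_passwordless_ssh and DRY_RUN, as well as the worker hostnames, are hypothetical and not part of Spark.

```shell
# DRY_RUN=1 (the default in this sketch) only prints the commands.
# Set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# setup_passwordless_ssh HOST...
# Copies the master's public key to each worker host.
setup_passwordless_ssh() {
  key="$HOME/.ssh/id_rsa"   # assumed key location
  # Generate a key pair with an empty passphrase only if none exists yet.
  [ -f "$key" ] || run ssh-keygen -t rsa -N "" -f "$key"
  for host in "$@"; do
    run ssh-copy-id -i "$key.pub" "$host"
  done
}
```

For example, calling setup_passwordless_ssh worker1 worker2 on the master prints the commands that would be run. With DRY_RUN=0, ssh-copy-id asks for each worker's password once, after which logins from the master should no longer prompt.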

Edit the conf/slaves file on your master and fill in the workers’ hostnames, one per line. (In Spark 3.1 and later, this file is named conf/workers.)
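As a rough illustration, the workers file can also be generated from a list of hostnames instead of being edited by hand. The function name write_workers_file and the hostnames in the usage note are hypothetical. The file name conf/slaves is the one used in the text.

```shell
# write_workers_file SPARK_DIR HOST...
# Writes one worker hostname per line to SPARK_DIR/conf/slaves,
# replacing any existing file.
write_workers_file() {
  if [ "$#" -lt 2 ]; then
    echo "usage: write_workers_file SPARK_DIR HOST..." >&2
    return 1
  fi
  spark_dir="$1"
  shift
  mkdir -p "$spark_dir/conf"
  # printf repeats the format for every remaining argument.
  printf '%s\n' "$@" > "$spark_dir/conf/slaves"
}
```

For example, write_workers_file /home/yourname/spark worker1 worker2 produces a file with two lines, worker1 and worker2.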

Run sbin/start-all.sh on your master.

Check the master's web UI at http://masternode:8080 to confirm that the workers have registered.
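If you prefer to check from a terminal rather than a browser, a small helper along these lines can test whether the master's web UI responds. This is a sketch that assumes curl is installed. The function name master_ui_up is hypothetical, and masternode stands for your master's hostname.

```shell
# master_ui_up URL
# Succeeds (exit status 0) if an HTTP request to URL returns a
# successful response within 5 seconds, and fails otherwise.
master_ui_up() {
  curl --silent --fail --max-time 5 --output /dev/null "$1"
}
```

For example, master_ui_up http://masternode:8080 && echo "master is up" prints the message only when the UI is reachable. The check says nothing about how many workers have joined, so the web page itself is still the place to look for that.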

To stop the cluster, run sbin/stop-all.sh on your master node.
