Spark On Cluster


Apache Spark - Running On Cluster - Cluster Mode - Mesos+AWS

Mesos is a general-purpose cluster manager.

It runs both analytics workloads and long-running services (e.g., databases).

To use Spark on Mesos, pass a mesos:// URI to spark-submit:

spark-submit --master mesos://masternode:5050 yourapp
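A slightly fuller sketch of the same submission; the resource values and application name here are illustrative, not from the original:

# Illustrative only: same Mesos master, with executor resources set explicitly
spark-submit --master mesos://masternode:5050 \
  --executor-memory 2G \
  --total-executor-cores 8 \
  yourapp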

You can use ZooKeeper to elect the master in Mesos in a multi-master setup.

Use a mesos://zk:// URI pointing to a list of ZooKeeper nodes.

For example, if you have 3 nodes (n1, n2, n3) running ZooKeeper on port 2181, use the URI:

mesos://zk://n1:2181/mesos,n2:2181/mesos,n3:2181/mesos
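A minimal sketch of a submission using this highly available URI (same placeholder application name as before):

# Spark asks ZooKeeper which Mesos master is currently leading
spark-submit --master mesos://zk://n1:2181/mesos,n2:2181/mesos,n3:2181/mesos yourapp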

Spark comes with a built-in script to launch clusters on Amazon EC2.

First, create an Amazon Web Services (AWS) account.

Obtain an access key ID and secret access key.

Export these as environment variables:

export AWS_ACCESS_KEY_ID="..."

export AWS_SECRET_ACCESS_KEY="..."

Create an EC2 SSH key pair and download its private key file (used for SSH access).
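If you prefer the command line to the AWS console, a key pair can also be created with the AWS CLI; this sketch assumes the aws tool is installed and configured, and the name mykeypair simply matches the launch example below:

# Create the key pair and save the private key locally
aws ec2 create-key-pair --key-name mykeypair --query 'KeyMaterial' --output text > mykeypair.pem

# Restrict permissions so SSH will accept the key file
chmod 400 mykeypair.pem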

Launch command for the spark-ec2 script:

cd /path/to/spark/ec2

./spark-ec2 -k mykeypair -i mykeypair.pem launch mycluster
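Beyond launch, the script manages the cluster's lifecycle; a sketch using the same cluster name and flags as the launch example:

# SSH into the cluster's master node
./spark-ec2 -k mykeypair -i mykeypair.pem login mycluster

# Stop the cluster's instances (data on ephemeral disks is lost)
./spark-ec2 stop mycluster

# Restart a stopped cluster
./spark-ec2 -k mykeypair -i mykeypair.pem start mycluster

# Permanently terminate the cluster
./spark-ec2 destroy mycluster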

Start with local mode if this is a new deployment.

To use richer resource scheduling capabilities (e.g., queues), use YARN or Mesos.

When sharing among many users is the primary criterion, use Mesos.

In all cases, it is best to run Spark on the same nodes as HDFS for fast access to storage. You can either install Mesos or a Standalone cluster on the DataNodes, or use a Hadoop distribution, which installs YARN and HDFS together.
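For comparison with the Mesos submissions above, a minimal sketch of submitting to YARN; the Hadoop configuration path here is an assumption for illustration:

# Point Spark at the cluster's Hadoop configuration (path is an assumption)
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Run the driver inside the cluster
spark-submit --master yarn --deploy-mode cluster yourapp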
