Spark On Cluster


Apache Spark - Running On Cluster - Cluster Mode - Mesos+AWS

Mesos is a general-purpose cluster manager.

It runs both analytics workloads and long-running services (e.g., databases).

To use Spark on Mesos, pass a mesos:// URI to spark-submit:

spark-submit --master mesos://masternode:5050 yourapp

You can use ZooKeeper to elect a master in Mesos in a multi-master setup.

Use a mesos://zk:// URI pointing to a list of ZooKeeper nodes.

For example, if you have 3 nodes (n1, n2, n3) running ZooKeeper on port 2181, use the URI:

mesos://zk://n1:2181/mesos,n2:2181/mesos,n3:2181/mesos
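The multi-master URI plugs directly into spark-submit; a sketch, assuming the same node names as above and a hypothetical application file app.py:

```shell
# Submit to a multi-master Mesos cluster; ZooKeeper resolves the current leader.
# Node names (n1, n2, n3) and app.py are placeholders for illustration.
spark-submit \
  --master mesos://zk://n1:2181/mesos,n2:2181/mesos,n3:2181/mesos \
  app.py
```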

Spark comes with a built-in script to launch clusters on Amazon EC2.

First, create an Amazon Web Services (AWS) account.

Obtain an access key ID and secret access key.

Export these as environment variables:

export AWS_ACCESS_KEY_ID="..."

export AWS_SECRET_ACCESS_KEY="..."

Create an EC2 SSH key pair and download its private key file (needed for SSH access).
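A key pair can also be created from the command line; a sketch, assuming the AWS CLI is installed and configured (the key-pair name mykeypair is a placeholder):

```shell
# Create an EC2 key pair and save the private key locally.
aws ec2 create-key-pair \
  --key-name mykeypair \
  --query 'KeyMaterial' \
  --output text > mykeypair.pem

# Restrict permissions so SSH accepts the key file.
chmod 400 mykeypair.pem
```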

Launch a cluster with the spark-ec2 script:

cd /path/to/spark/ec2

./spark-ec2 -k mykeypair -i mykeypair.pem launch mycluster
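Once launched, the same script manages the cluster lifecycle; a sketch of common actions, using the same placeholder key pair and cluster name:

```shell
# Log in to the cluster's master node over SSH.
./spark-ec2 -k mykeypair -i mykeypair.pem login mycluster

# Stop the cluster's instances (they can be restarted later with 'start').
./spark-ec2 stop mycluster

# Permanently terminate the cluster and its instances.
./spark-ec2 destroy mycluster
```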

Start with local mode if this is a new deployment.

To use richer resource-scheduling capabilities (e.g., queues), use YARN or Mesos.

When sharing among many users is the primary criterion, use Mesos.

In all cases, it is best to run Spark on the same nodes as HDFS for fast access to storage.

You can either install Mesos or a Standalone cluster on the DataNodes, or use a Hadoop distribution, which installs YARN and HDFS together.
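These deployment choices map to different --master URIs in spark-submit; a sketch (host names, ports, and app.py are placeholders):

```shell
# Local mode: run everything in one JVM with 2 worker threads.
spark-submit --master "local[2]" app.py

# Standalone cluster manager.
spark-submit --master spark://masternode:7077 app.py

# YARN: the ResourceManager is located via HADOOP_CONF_DIR, not the URI.
spark-submit --master yarn app.py

# Mesos.
spark-submit --master mesos://masternode:5050 app.py
```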




9 Comments

Please help with this.


Hi Mohd,

These commands will not work on the lab as we do not have Mesos. Also, these commands should be run in the console, not in the Spark shell.


Which cluster/resource manager is more efficient with respect to Spark, YARN or Mesos?


Hi Punit,

Both YARN and Mesos are good for distributed resource management, and they support a variety of workloads like MapReduce, Spark, Flink, Storm, etc. So there is no specific answer as to which one is more efficient with respect to Spark.

Hope this helps.


Thanks Abhinav


1. Is there a Mesos setup on CloudXLab? Are there any examples for Mesos and EC2?

2. "Spark comes with a built-in script to launch clusters on Amazon EC2."

Is a built-in script available for other cloud platforms like Azure, GCP, etc.?


> 1. Is there a Mesos setup on CloudXLab?

No.

> Are there any examples for Mesos and EC2?

No.

> 2. "Spark comes with a built-in script to launch clusters on Amazon EC2." Is a built-in script available for other cloud platforms like Azure, GCP, etc.?

Nope.


OK, not fully understood.


What are your doubts? Don't spam.
