Spark On Cluster


Apache Spark - Running On Cluster - Deployment Modes

Now, another question arises: where does the driver run? Like the executors, the driver can also run inside the cluster.

Based on where the driver is deployed, we define two deployment modes: client and cluster.

If we are running the driver program locally, we call it client deployment mode.

If the driver is also running inside the cluster on one of the worker machines, we call it cluster deployment mode.
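For instance, with YARN as the cluster manager, the deploy mode is selected through the --deploy-mode flag of spark-submit. Here is a minimal sketch using the SparkPi example jar path from CloudxLab (adjust the path for your own installation):

# Client mode: the driver runs on the machine where spark-submit is invoked
spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10

# Cluster mode: the driver runs inside a YARN container on one of the worker nodes
spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10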

The diagram above shows the driver application running on the local machine, that is, client deployment mode, while the Spark executors run inside the cluster.

The example we ran earlier while discussing the YARN resource manager used client deployment mode.

In client mode, if the driver application shuts down, the whole job is killed. This mode is not resilient, but it is quicker to run.

In the architecture diagram, we are running the driver inside the cluster on one of the nodes. If we are using YARN, the driver would be running inside one of the containers.

If the launcher shuts down, the job continues in the background like a batch process.

This is the preferred way to run long-running processes.

Now, let us run the same example of computing PI in cluster mode.

Here we first export the two environment variables pointing to Hadoop's configuration directory. Spark uses these variables to locate the Hadoop configuration and, from it, figures out the YARN and HDFS details.

Then we run spark-submit with the additional argument --deploy-mode set to cluster.
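Putting it together, the full sequence looks like this (the configuration paths are the ones used on CloudxLab; they may differ on other clusters):

# Tell Spark where to find the Hadoop/YARN configuration
export YARN_CONF_DIR=/etc/hadoop/conf/
export HADOOP_CONF_DIR=/etc/hadoop/conf/

# Submit the SparkPi example in cluster deploy mode
spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10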

Let us now try this hands-on.

First, log in to the CloudxLab web console or connect using SSH.

Then export the two variables YARN_CONF_DIR and HADOOP_CONF_DIR.

Now, let us launch the spark-submit command. Once it finishes, open the tracking URL. The tracking URL redirects to an internal hostname; find the corresponding public domain name using the "IP Mappings" tab of "My Lab", replace the private hostname in the URL with it, and open the page. You can see the complete details about the job. Go to the "Environment" tab and scroll down to the deploy mode entry; you can see that the job was executed in cluster mode.
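Note that in cluster mode the driver's console output, including the "Pi is roughly ..." line, goes to the driver container's log on the cluster rather than to your terminal. One way to retrieve it after the job finishes is with yarn logs, assuming the application id printed by spark-submit was application_1234567890123_0001 (a hypothetical id; use the one from your own run):

# Fetch the aggregated container logs and pick out the result line
yarn logs -applicationId application_1234567890123_0001 | grep -i "pi is roughly"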

Apache Spark - Running On Cluster



28 Comments

Seeing error as below (screenshot attached). Please advise.


Hi, 

It's working fine on my end. Please use

spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10

Thanks.


The submit command does not print any result on the console, such as the "Pi is roughly 3.1434751434751433" line which we used to get in local mode.

The instruction given is - To check the status, use:
- http://e.cloudxlab.com:4040/ [4040 to be replaced by the Spark port on which the user is connected. In my case it is 4041. The 'e' in e.cloudxlab could be 'f' or some other letter. In my case it is 'f']
- http://a.cloudxlab.com:8088/cluster

What would the status be like? What exactly is to be checked, and in which section of the Spark UI? Images 1 and 2 depict Spark jobs and executors after execution of the YARN submit.

Port 8080 is giving the error "connection timed out". Please check image 3.


1. Is there a possibility (from the architecture perspective) to have a separate machine/node for the driver application in cluster mode, instead of using one of the worker nodes?

2. Is the standalone deployment mode the default, since we explicitly provided 'cluster' mode?

spark-submit --master yarn --deploy-mode cluster --class

 


Hi Punit,

It is not a good idea for the driver to run only on a specific machine/node in a production environment. If it did, think about what happens when that machine goes down: the whole system goes down, as there is no driver node.

So when we use cluster mode, we rely on this behaviour: the cluster manager dynamically picks a node for the driver.

 

Hope this helps you.


Fine, thanks Vikas.


Hi,

I don't see the value of Pi computed in the log.

Please advise.

 


Hi,

Please let me know if this issue was not resolved.

Thanks.



@Sandeep -- Sir, when running in local mode with parameters as local[5], where in the command line do you specify the worker/node on which to run the driver, assuming in this case there are 5 nodes?


Hi,

I ran this:

spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10

It executed successfully, but I don't see the Pi result (3.14whatever) in the output. What's the point?


Hi Siddarth,
Please post full screenshot.

-- Sachin Giri


export YARN_CONF_DIR=/etc/hadoop/conf/
export HADOOP_CONF_DIR=/etc/hadoop/conf/
spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10

Getting error: Permission denied?


Hi Team,

The video is incomplete, please fix it.

Thanks


Hi,
Actually, the video is complete.
You only need to run the code below to get hands-on experience running a job on the cluster using Apache Spark.

spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10

All the best!


Video is incomplete!!!!
Please fix it ASAP.
Didn't expect this..


Hi,
Actually the video is complete.
You only need to run the code below to get hands-on experience running a job on the cluster using Apache Spark.

spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10

All the best!



The spark-submit command is not executing. Please check the screenshot attached.


Hi Sandeep,
This video is not complete. It shows only the execution of commands; the final output is not displayed.


When running in cluster mode, where in the command line do you specify the worker/node on which to run the driver?


Hi Harry,

Good Question!

If we are running the driver on the cluster, the decision is taken by the cluster manager (such as YARN) based on the availability of resources, and we don't have control over it.

So, there is no option to specify a particular node for the driver while running in cluster mode.


Hi Sandeep
Please configure Spark to the Spark 2.x version. It's now pointing to the Spark 1.5 version.
I am not able to run a Spark 2 program in cluster mode. Running Spark 2 in YARN mode gives the exception below:

java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig

Thanks
Kapil


We are working on it.


This should be rectified by now, right?


Hi
This video is also not proper.
I am very disappointed.


Hi Manoj,
Let me look into this.


Hi Manoj,

Thank you for letting us know. I have located the problem. There is a green screen.

Let me try to fix it.

Regards,
Sandeep Giri


This has been fixed. Thanks to you, Manoj.

Please let us know if you face any other problems.
