Spark On Cluster


Apache Spark - Running On Cluster - Deployment Modes

Now, another question arises: where does the driver run? Like the executors, the driver can also run inside the cluster.

Based on where the driver is deployed, we define two deployment modes: client and cluster.

If we are running the driver program locally, we call it client deployment mode.

If the driver is also running inside the cluster on one of the worker machines, we call it cluster deployment mode.
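For instance, with YARN as the cluster manager, the deploy mode is selected through the --deploy-mode flag of spark-submit. Here is a minimal sketch using the SparkPi example jar path from CloudxLab (adjust the path for your own installation):

# Client mode: the driver runs on the machine where spark-submit is invoked
spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10

# Cluster mode: the driver runs inside a YARN container on one of the worker nodes
spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10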

The diagram above shows the driver application running on the local machine, that is, client deployment mode, while the Spark executors run inside the cluster.

The example we ran earlier while discussing the YARN resource manager used client deployment mode.

In client mode, if the driver application shuts down, the whole job is killed. This mode is not resilient, but it is quicker to run.

In the architecture diagram, we are running the driver inside the cluster on one of the nodes. If we are using YARN, the driver would be running inside one of the containers.

If the launcher shuts down, the job continues in the background like a batch process.

This is the preferred way to run long-running processes.

Now, let us run the same example of computing PI in cluster mode.

Here we first export the two environment variables pointing to Hadoop's configuration directory. Spark uses these variables to locate the Hadoop configuration and, from it, figures out the YARN and HDFS details.

Then we run spark-submit with the additional argument --deploy-mode set to cluster.
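Putting it together, the full sequence looks like this (the configuration paths are the ones used on CloudxLab; they may differ on other clusters):

# Tell Spark where to find the Hadoop/YARN configuration
export YARN_CONF_DIR=/etc/hadoop/conf/
export HADOOP_CONF_DIR=/etc/hadoop/conf/

# Submit the SparkPi example in cluster deploy mode
spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10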

Let us now try this hands-on.

First, log in to the CloudxLab web console or connect using SSH.

Then export the two variables YARN_CONF_DIR and HADOOP_CONF_DIR.

Now, let us launch the spark-submit command. Once it finishes, open the tracking URL. The tracking URL redirects to an internal hostname; find the corresponding public domain name using the "IP Mappings" tab of "My Lab", replace the private hostname in the URL with it, and open the page. You can see the complete details about the job. Go to the "Environment" tab and scroll down to the deploy mode entry; you can see that the job was executed in cluster mode.
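Note that in cluster mode the driver's console output, including the "Pi is roughly ..." line, goes to the driver container's log on the cluster rather than to your terminal. One way to retrieve it after the job finishes is with yarn logs, assuming the application id printed by spark-submit was application_1234567890123_0001 (a hypothetical id; use the one from your own run):

# Fetch the aggregated container logs and pick out the result line
yarn logs -applicationId application_1234567890123_0001 | grep -i "pi is roughly"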

Apache Spark - Running On Cluster



28 Comments

Seeing error as below (screenshot attached). Please advise.


Hi, 

It's working fine on my end. Please use

spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10

Thanks.


The submit command does not print any result on the console, such as the "Pi is roughly 3.1434751434751433" line which we used to get in local mode.

The instruction given is - To check the status, use:
- http://e.cloudxlab.com:4040/ [4040 to be replaced by the Spark port on which the user is connected. In my case it is 4041. The 'e' in e.cloudxlab could be 'f' or some other letter. In my case it is 'f']
- http://a.cloudxlab.com:8088/cluster

What would the status be like? What exactly is to be checked, and in which section of the Spark UI? Images 1 and 2 depict Spark jobs and executors after execution of the YARN submit.

Port 8080 is giving the error "connection timed out". Please check image 3.


1. Is there a possibility (from the architecture perspective) to have a separate machine/node for the driver application in cluster mode, instead of using one of the worker nodes?

2. Is the standalone deployment mode the default, since we explicitly provided 'cluster' mode?

spark-submit --master yarn --deploy-mode cluster --class

 


Hi Punit,

It is not a good idea for the driver to run only on a specific machine/node in a production environment. If it did, think about what happens when that machine goes down: the whole system goes down, as there is no driver node.

So when we use cluster mode, we rely on this behaviour: the cluster manager dynamically picks a node for the driver.

 

Hope this helps you.


Fine, thanks Vikas.


Hi,

I don't see the value of Pi computed in the log.

Please advise.

 


Hi,

Please let me know if this issue was not resolved.

Thanks.



@Sandeep -- Sir, when running in local mode with parameters as local[5], where in the command line do you specify the worker/node on which to run the driver, assuming in this case there are 5 nodes?


Hi,

I ran this:

spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10

It executed successfully, but I don't see the Pi result (3.14whatever) in the output. What's the point?


Hi Siddarth,
Please post full screenshot.

-- Sachin Giri


export YARN_CONF_DIR=/etc/hadoop/conf/
export HADOOP_CONF_DIR=/etc/hadoop/conf/
spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10

Getting error: Permission denied?


Hi Team,

The video is incomplete, please fix it.

Thanks


Hi,
Actually, the video is complete.
You only need to run the code below to get hands-on experience running a job on the cluster using Apache Spark.

spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10

All the best!


Video is incomplete!!!!
Please fix it ASAP.
Didn't expect this..


Hi,
Actually the video is complete.
You only need to run the code below to get hands-on experience running a job on the cluster using Apache Spark.

spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10

All the best!



The spark-submit command is not executing. Please check the screenshot attached.


Hi Sandeep,
This video is not complete. It shows only the execution of commands; the final output is not displayed.


When running in cluster mode, where in the command line do you specify the worker/node on which to run the driver?


Hi Harry,

Good Question!

If we are running the driver on the cluster, the decision is taken by the cluster manager (such as YARN) based on the availability of resources, and we don't have control over it.

So, there is no option to specify a particular node for the driver while running in cluster mode.


Hi Sandeep
Please configure Spark to the Spark 2.x version. It's now pointing to the Spark 1.5 version.
I am not able to run a Spark 2 program in cluster mode. Running Spark 2 in YARN mode gives the exception below:

java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig

Thanks
Kapil


We are working on it.


This should be rectified by now, right?


Hi
This video is also not proper.
I am very disappointed.


Hi Manoj,
Let me look into this.


Hi Manoj,

Thank you for letting us know. I have located the problem. There is a green screen.

Let me try to fix it.

Regards,
Sandeep Giri


This has been fixed. Thanks to you, Manoj.

Please let us know if you face any other problems.
