Spark On Cluster


Apache Spark - Running On Cluster - Deployment Modes

Now, another question arises: where does the driver run? Like the executors, the driver can also run inside the cluster.

Based on where the driver is deployed, we define two deployment modes: client and cluster.

If we are running the driver program locally, we call it client deployment mode.

If the driver is also running inside the cluster on one of the worker machines, we call it cluster deployment mode.
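
As a minimal sketch, the mode is selected via the --deploy-mode flag of spark-submit. The examples jar path below is a placeholder; the actual path depends on your Spark installation:

    # Client mode: the driver runs on the machine where spark-submit is invoked
    spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi <path-to-spark-examples-jar> 10

    # Cluster mode: the driver runs inside a container on one of the worker nodes
    spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi <path-to-spark-examples-jar> 10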

The diagram above shows the driver application running on the local machine, i.e., in client deployment mode, while the executors are running inside the cluster.

The example we ran earlier while discussing the YARN resource manager used client deployment mode.

In client mode, if the driver application shuts down, the job is killed. This mode does not offer resilience, but it is quicker to run.

In the architecture diagram for cluster mode, the driver runs inside the cluster on one of the nodes. If we are using YARN, the driver runs inside one of the containers.

If the launching process shuts down, the job continues running in the background like a batch process.

This is the preferred way to run long-running processes.

Now, let us run the same example of computing PI in cluster mode.

Here we first export the two environment variables pointing to Hadoop's configuration directory. Spark uses these variables to locate the Hadoop configuration, and from that configuration it figures out the YARN and HDFS details.
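
For example, on a typical Hadoop installation the configuration lives under /etc/hadoop/conf (this path is an assumption; it may differ on your cluster):

    export HADOOP_CONF_DIR=/etc/hadoop/conf
    export YARN_CONF_DIR=/etc/hadoop/conf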

Then we run spark-submit with the additional argument --deploy-mode set to cluster.
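
Put together, the submission might look roughly like this (the jar location is an assumption based on a typical HDP layout; adjust it for your distribution):

    spark-submit --master yarn --deploy-mode cluster \
      --class org.apache.spark.examples.SparkPi \
      /usr/hdp/current/spark-client/lib/spark-examples.jar 10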

Let us now walk through these steps on CloudxLab.

First, log in to the CloudxLab web console or log in using SSH.

Then export the two variables YARN_CONF_DIR and HADOOP_CONF_DIR.

Now, let's launch the spark-submit command. Once it finishes, open the tracking URL. The tracking URL redirects to an internal hostname. Find the corresponding public domain name using the "IP Mappings" tab of "My Lab", replace the private hostname in the URL with this public hostname, and open it. You can see the complete details about the job. Go to the "Environment" tab and scroll down to the deploy mode entry. You can see that the job was executed in cluster mode.
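
Note that in cluster mode the computed value of Pi is printed in the driver's container logs rather than on your console. If YARN log aggregation is enabled, you can retrieve the logs with the YARN CLI, substituting the application id reported by spark-submit (the id below is just a placeholder):

    yarn logs -applicationId application_1234567890123_0001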
