Writing Spark Applications

Process for Large Spark Projects

The approach outlined earlier works well if you, as a data scientist, need to run an analysis once and move on. But if the same work has to run every day as a scheduled task, the interactive Spark shell is not practical. Code typed into the shell is also hard to manage: we cannot track changes, there is no backup, and so on. For these reasons, this approach is rarely used in real-world environments.

Instead, large projects follow the process shown in the diagram below. Let's try to understand the approach.

In smaller teams and projects, there may not be an Artifact Repository, so the workflow would look like this.

Figure: Process for large Spark applications

The developer writes the code and the unit test cases. A build tool compiles the code and runs the tests. The developer then commits the code to the source code repository. Testing is performed after checking out the code from the source code repository.

Once testing is finished, the code is deployed on the production servers.
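To make the "code plus unit test cases" step more concrete, here is a minimal sketch in Python. It assumes the transformation logic is kept in a plain function so that it can be tested without a running Spark cluster. The names `count_words` and `CountWordsTest` are hypothetical and chosen only for illustration. They do not come from this course, and a real project may organise its code differently.

```python
import unittest


def count_words(lines):
    """Count how often each word occurs across an iterable of text lines.

    Hypothetical example logic: lower-cases each line, splits on whitespace,
    and tallies the words in a dictionary.
    """
    counts = {}
    for line in lines:
        for word in line.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts


class CountWordsTest(unittest.TestCase):
    """Unit test cases that a build or CI step could run before deployment."""

    def test_counts_are_case_insensitive(self):
        self.assertEqual(
            count_words(["Spark spark", "shell"]),
            {"spark": 2, "shell": 1},
        )

    def test_empty_input_gives_empty_result(self):
        self.assertEqual(count_words([]), {})
```

If this were saved in a file, a test runner such as `python -m unittest` could pick up the test cases. Because both the function and its tests live in ordinary source files, they can be committed to the source code repository, checked out again for testing, and tracked over time, which is exactly what the interactive shell does not offer.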

We will follow the approach shown in the diagram in the steps that follow.