GraphFrames on CloudxLab

GraphFrames is a useful Spark package that brings DataFrames and the GraphX package together.

From the GraphFrames website:

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs. It provides high-level APIs in Scala, Java, and Python. It aims to provide both the functionality of GraphX and extended functionality taking advantage of Spark DataFrames. This extended functionality includes motif finding, DataFrame-based serialization, and highly expressive graph queries.

You can use GraphFrames very easily with spark-shell on CloudxLab by passing the --packages option, as follows.

For spark-shell:
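Here is a minimal sketch of the launch command. The package coordinate (GraphFrames 0.1.0 built for Spark 1.6) is an assumption. Pick the GraphFrames release listed on spark-packages.org that matches your cluster's Spark and Scala versions:

```shell
# Assumed coordinate: GraphFrames 0.1.0 for Spark 1.6; change it to match your Spark version.
spark-shell --packages graphframes:graphframes:0.1.0-spark1.6
```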

For the Python Spark shell (pyspark):
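The same option works with pyspark. As above, the package coordinate is an assumed example and should match your Spark version:

```shell
# Assumed coordinate: GraphFrames 0.1.0 for Spark 1.6; change it to match your Spark version.
pyspark --packages graphframes:graphframes:0.1.0-spark1.6
```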

When you launch the shell with the --packages argument, it downloads GraphFrames and makes it available in the shell. Now, let's create a graph frame. Here is some example code in Scala:
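The following is a minimal sketch modeled on the GraphFrames quick-start. It assumes a Spark 1.6 spark-shell, where sqlContext is predefined. In Spark 2.x and later, spark.createDataFrame can be used the same way. The three people and their relationships are illustrative data:

```scala
import org.graphframes.GraphFrame

// Vertex DataFrame: GraphFrames requires a column named "id".
val v = sqlContext.createDataFrame(List(
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30)
)).toDF("id", "name", "age")

// Edge DataFrame: GraphFrames requires "src" and "dst" columns.
// Any other columns (here "relationship") become edge attributes.
val e = sqlContext.createDataFrame(List(
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow")
)).toDF("src", "dst", "relationship")

// Build the graph from the vertex and edge DataFrames.
val g = GraphFrame(v, e)
```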

This displays the in-degree (the number of incoming edges) of each vertex:
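This one-liner assumes g is the GraphFrame created in the previous step. The inDegrees method returns an ordinary DataFrame, so show() prints it like any other table:

```scala
// Assumes g: GraphFrame from the previous step in this shell session.
// inDegrees returns a DataFrame with columns "id" and "inDegree";
// vertices that have no incoming edges do not appear in it.
g.inDegrees.show()
```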

Now, let's try filtering. The following code counts the edges whose relationship is "follow", which is 2.
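Here is a sketch of that filter. It assumes g is the GraphFrame created earlier and that its edges carry a relationship column. Because g.edges is a plain DataFrame, the usual DataFrame filter accepts a SQL expression string:

```scala
// Assumes g: GraphFrame from the earlier step, with a "relationship" edge column.
// filter keeps only matching edges; count() is the action that runs the job.
val followCount = g.edges.filter("relationship = 'follow'").count()
println(followCount)
```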

Now, let's run an algorithm such as PageRank on the graph.
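This sketch uses the GraphFrames PageRank builder on the graph g from the earlier steps. The reset probability of 0.01 and the 20 iterations are illustrative values, not tuned ones:

```scala
// Assumes g: GraphFrame from the earlier steps.
// pageRank returns a builder. resetProbability is the random-jump probability,
// and maxIter fixes how many iterations are run.
val results = g.pageRank.resetProbability(0.01).maxIter(20).run()

// run() returns a new GraphFrame whose vertices gain a "pagerank" column.
results.vertices.select("id", "pagerank").show()
```

Fixing maxIter makes the running time predictable. The builder also offers a tol setting if you would rather iterate until the ranks converge.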

After a few iterations, it should display the PageRank of each vertex.