{"id":598,"date":"2017-07-18T05:17:01","date_gmt":"2017-07-18T05:17:01","guid":{"rendered":"http:\/\/blog.cloudxlab.com\/?p=598"},"modified":"2017-09-15T06:16:41","modified_gmt":"2017-09-15T06:16:41","slug":"graphframes-on-cloudxlab","status":"publish","type":"post","link":"https:\/\/cloudxlab.com\/blog\/graphframes-on-cloudxlab\/","title":{"rendered":"GraphFrames on CloudxLab"},"content":{"rendered":"<p><a href=\"https:\/\/graphframes.github.io\/\">GraphFrames<\/a> is a useful Spark library that brings DataFrames and the GraphX package together.<\/p>\n<p>From the GraphFrames website:<\/p>\n<blockquote><p>GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs. It provides high-level APIs in Scala, Java, and Python. It aims to provide both the functionality of GraphX and extended functionality taking advantage of Spark DataFrames. This extended functionality includes motif finding, DataFrame-based serialization, and highly expressive graph queries.<\/p><\/blockquote>\n<p>You can use GraphFrames easily with spark-shell on CloudxLab by passing the --packages option as follows.<!--more--><\/p>\n<p>For spark-shell:<\/p>\n<pre class=\"theme:github toolbar:2 plain:false plain-toggle:false popup:false lang:default decode:true\">\/usr\/spark2.0.1\/bin\/spark-shell --packages graphframes:graphframes:0.4.0-spark2.0-s_2.11<\/pre>\n<p>For the Python Spark shell (pyspark):<\/p>\n<pre class=\"theme:github toolbar:2 plain:false plain-toggle:false popup:false lang:default decode:true\">\/usr\/spark2.0.1\/bin\/pyspark --packages graphframes:graphframes:0.4.0-spark2.0-s_2.11<\/pre>\n<p>When you launch the shell with the --packages argument, Spark downloads GraphFrames and makes it available in the shell. Now, let's create a GraphFrame. 
Here is some example code (Scala):<\/p>\n<pre class=\"theme:github toolbar:2 plain:false plain-toggle:false popup:false lang:default decode:true\">\/\/ Import all classes of the graphframes package\r\nimport org.graphframes._\r\nimport spark.sqlContext\r\n\r\n\/\/ Create a DataFrame representing the vertices of the graph from a list of tuples,\r\n\/\/ naming the columns with toDF. GraphFrames requires a unique \"id\" column;\r\n\/\/ the other columns hold vertex attributes.\r\nval v = sqlContext.createDataFrame(List(\r\n  (\"x\", \"Jack\", 34),\r\n  (\"y\", \"Jill\", 36),\r\n  (\"z\", \"Maggie\", 30)\r\n)).toDF(\"id\", \"name\", \"age\")\r\n\r\n\/\/ Create a DataFrame representing the edges of the graph.\r\n\/\/ GraphFrames requires \"src\" and \"dst\" columns; \"relationship\" is an edge attribute.\r\nval e = sqlContext.createDataFrame(List(\r\n  (\"x\", \"y\", \"follow\"), \/\/ Jack follows Jill\r\n  (\"y\", \"z\", \"friend\"), \/\/ Jill is a friend of Maggie\r\n  (\"z\", \"y\", \"follow\")  \/\/ Maggie follows Jill\r\n)).toDF(\"src\", \"dst\", \"relationship\")\r\n\r\n\/\/ With these two DataFrames, we can create a GraphFrame\r\nimport org.graphframes.GraphFrame\r\nval g = GraphFrame(v, e)\r\n\r\n\/\/ Query: Get the in-degree of each vertex.\r\ng.inDegrees.show()\r\n<\/pre>\n<p>This displays the in-degree of each vertex. Vertices with no incoming edges, such as x, are omitted:<\/p>\n<pre class=\"theme:github toolbar:2 plain:false plain-toggle:false popup:false lang:default decode:true\">+---+--------+\r\n| id|inDegree|\r\n+---+--------+\r\n|  z|       1|\r\n|  y|       2|\r\n+---+--------+\r\n<\/pre>\n<p>Now, let's try to filter. 
The following code counts the edges that have the follow relationship, which is 2 in this graph.<\/p>\n<pre class=\"theme:github toolbar:2 plain:false plain-toggle:false popup:false lang:default decode:true\">\/\/ Query: Count the number of \"follow\" connections in the graph.\r\ng.edges.filter(\"relationship = 'follow'\").count()\r\n<\/pre>\n<p>Now, let's run an algorithm such as PageRank on the graph.<\/p>\n<pre class=\"theme:github toolbar:2 plain:false plain-toggle:false popup:false lang:default decode:true\">\/\/ Run the PageRank algorithm and show the results.\r\nval results = g.pageRank.resetProbability(0.01).maxIter(20).run()\r\nresults.vertices.select(\"id\", \"pagerank\").show()\r\n<\/pre>\n<p>Once the iterations complete, it should display the PageRank of each vertex as follows:<\/p>\n<pre class=\"theme:github toolbar:2 plain:false plain-toggle:false popup:false lang:default decode:true\">+---+-------------------+\r\n| id|           pagerank|\r\n+---+-------------------+\r\n|  x|               0.01|\r\n|  z|0.27995525261339177|\r\n|  y| 0.2808611427228327|\r\n+---+-------------------+\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>GraphFrames is a useful Spark library that brings DataFrames and the GraphX package together. From the GraphFrames website: GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs. It provides high-level APIs in Scala, Java, and Python. 
It aims to provide both the functionality of GraphX and extended functionality taking &hellip; <a href=\"https:\/\/cloudxlab.com\/blog\/graphframes-on-cloudxlab\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;GraphFrames on CloudxLab&#8221;<\/span><\/a><\/p>\n","protected":false},"author":14,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[14],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v16.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>GraphFrames on CloudxLab | CloudxLab Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/cloudxlab.com\/blog\/graphframes-on-cloudxlab\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"GraphFrames on CloudxLab | CloudxLab Blog\" \/>\n<meta property=\"og:description\" content=\"GraphFrames is quite a useful library of spark which helps in bringing Dataframes and GraphX package together. From the website of Graphframes: GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs. It provides high-level APIs in Scala, Java, and Python. 
It aims to provide both the functionality of GraphX and extended functionality taking &hellip; Continue reading &quot;GraphFrames on CloudxLab&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cloudxlab.com\/blog\/graphframes-on-cloudxlab\/\" \/>\n<meta property=\"og:site_name\" content=\"CloudxLab Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cloudxlab\" \/>\n<meta property=\"article:published_time\" content=\"2017-07-18T05:17:01+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2017-09-15T06:16:41+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CloudxLab\" \/>\n<meta name=\"twitter:site\" content=\"@CloudxLab\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\">\n\t<meta name=\"twitter:data1\" content=\"2 minutes\">\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#website\",\"url\":\"https:\/\/cloudxlab.com\/blog\/\",\"name\":\"CloudxLab Blog\",\"description\":\"Learn AI, Machine Learning, Deep Learning, Devops &amp; Big Data\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/cloudxlab.com\/blog\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/graphframes-on-cloudxlab\/#webpage\",\"url\":\"https:\/\/cloudxlab.com\/blog\/graphframes-on-cloudxlab\/\",\"name\":\"GraphFrames on CloudxLab | CloudxLab 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/#website\"},\"datePublished\":\"2017-07-18T05:17:01+00:00\",\"dateModified\":\"2017-09-15T06:16:41+00:00\",\"author\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/#\/schema\/person\/4835f1b3d5000626cb15e9311d748e09\"},\"breadcrumb\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/graphframes-on-cloudxlab\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/cloudxlab.com\/blog\/graphframes-on-cloudxlab\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/graphframes-on-cloudxlab\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"item\":{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/\",\"url\":\"https:\/\/cloudxlab.com\/blog\/\",\"name\":\"Home\"}},{\"@type\":\"ListItem\",\"position\":2,\"item\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/graphframes-on-cloudxlab\/#webpage\"}}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#\/schema\/person\/4835f1b3d5000626cb15e9311d748e09\",\"name\":\"Sandeep Giri\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/1393214840cf7455bb4cba055cb30468?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/1393214840cf7455bb4cba055cb30468?s=96&d=mm&r=g\",\"caption\":\"Sandeep Giri\"},\"sameAs\":[\"https:\/\/cloudxlab.com\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","_links":{"self":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/598"}],"collection":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/users\/14"}],"replies":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/comments?post=598"}],"version-history":[{"count":6,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/598\/revisions"}],"predecessor-version":[{"id":703,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/598\/revisions\/703"}],"wp:attachment":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/media?parent=598"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/categories?post=598"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/tags?post=598"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}