{"id":226,"date":"2016-10-10T13:16:08","date_gmt":"2016-10-10T13:16:08","guid":{"rendered":"http:\/\/blog.cloudxlab.com\/?p=226"},"modified":"2017-09-15T06:20:14","modified_gmt":"2017-09-15T06:20:14","slug":"access-s3-files-spark","status":"publish","type":"post","link":"https:\/\/cloudxlab.com\/blog\/access-s3-files-spark\/","title":{"rendered":"Access S3 Files in Spark"},"content":{"rendered":"<p>In this blog post, we will learn how to access S3 files using Spark on CloudxLab.<br \/>\nFollow the steps below to access S3 files:<\/p>\n<pre class=\"toolbar:2 toolbar-overlay:false toolbar-hide:false toolbar-delay:false show-title:false nums-toggle:false wrap-toggle:false plain:false plain-toggle:false popup:false lang:sh decode:true\">#Log in to the web console\r\n\r\n#Specify the Hadoop config\r\nexport YARN_CONF_DIR=\/etc\/hadoop\/conf\/\r\nexport HADOOP_CONF_DIR=\/etc\/hadoop\/conf\/\r\n\r\n#Specify the Spark classpath\r\nexport SPARK_CLASSPATH=\"$SPARK_CLASSPATH:\/usr\/hdp\/current\/hadoop-client\/hadoop-aws.jar\"\r\nexport SPARK_CLASSPATH=\"$SPARK_CLASSPATH:\/usr\/hdp\/current\/hadoop-client\/lib\/aws-java-sdk-1.7.4.jar\"\r\nexport SPARK_CLASSPATH=\"$SPARK_CLASSPATH:\/usr\/hdp\/current\/hadoop-client\/lib\/guava-11.0.2.jar\"\r\n\r\n#Launch the Spark shell\r\n\/usr\/spark1.6\/bin\/spark-shell\r\n\r\n\/\/ In the Spark shell, specify the AWS keys\r\nsc.hadoopConfiguration.set(\"fs.s3n.awsAccessKeyId\", \"YOUR_AWS_ACCESS_KEY\")\r\nsc.hadoopConfiguration.set(\"fs.s3n.awsSecretAccessKey\", \"YOUR_AWS_SECRET_ACCESS_KEY\")\r\n\r\n\/\/ Now access S3 files using Spark\r\n\/\/ Create an RDD from the S3 file\r\nval nationalNames = sc.textFile(\"s3n:\/\/cxl-spark-test-data\/sss\/baby-names.csv\")\r\n\r\n\/\/ Check the first line\r\nnationalNames.take(1)\r\n<\/pre>\n<div id=\"cxl-affiliate\"><\/div>\n<p><script>\/\/ <![CDATA[ (function(a, b, c) { affiliate_code = \"CW7CF05WBUZ55SHRSH8R\"; lab = \"hadoop\"; s = b.createElement('script'); s.type = 'text\/javascript'; s.src = 
\"\/\/s3.amazonaws.com\/cloudxlab\/embed\/affl-without-analytics.js\"; s.async = 1; (b.head || b.body).appendChild(s); } (window, document)); \/\/ ]]><\/script><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this blog post we will learn how to access S3 Files using Spark on CloudxLab. Please follow below steps to access S3 files: #Login to Web Console #Specify the hadoop config export YARN_CONF_DIR=\/etc\/hadoop\/conf\/ export HADOOP_CONF_DIR=\/etc\/hadoop\/conf\/ #Specify the Spark Class Path export SPARK_CLASSPATH=&#8221;$SPARK_CLASSPATH:\/usr\/hdp\/current\/hadoop-client\/hadoop-aws.jar&#8221; export SPARK_CLASSPATH=&#8221;$SPARK_CLASSPATH:\/usr\/hdp\/current\/hadoop-client\/lib\/aws-java-sdk-1.7.4.jar&#8221; export SPARK_CLASSPATH=&#8221;$SPARK_CLASSPATH:\/usr\/hdp\/current\/hadoop-client\/lib\/guava-11.0.2.jar&#8221; #Launch Spark Shell \/usr\/spark1.6\/bin\/spark-shell #On the spark shell &hellip; <a href=\"https:\/\/cloudxlab.com\/blog\/access-s3-files-spark\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Access S3 Files in Spark&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[14],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v16.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Access S3 Files in Spark | CloudxLab Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/cloudxlab.com\/blog\/access-s3-files-spark\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Access S3 Files in Spark | CloudxLab Blog\" \/>\n<meta property=\"og:description\" content=\"In this blog post we will learn how to access S3 Files using 
Spark on CloudxLab. Please follow below steps to access S3 files: #Login to Web Console #Specify the hadoop config export YARN_CONF_DIR=\/etc\/hadoop\/conf\/ export HADOOP_CONF_DIR=\/etc\/hadoop\/conf\/ #Specify the Spark Class Path export SPARK_CLASSPATH=&quot;$SPARK_CLASSPATH:\/usr\/hdp\/current\/hadoop-client\/hadoop-aws.jar&quot; export SPARK_CLASSPATH=&quot;$SPARK_CLASSPATH:\/usr\/hdp\/current\/hadoop-client\/lib\/aws-java-sdk-1.7.4.jar&quot; export SPARK_CLASSPATH=&quot;$SPARK_CLASSPATH:\/usr\/hdp\/current\/hadoop-client\/lib\/guava-11.0.2.jar&quot; #Launch Spark Shell \/usr\/spark1.6\/bin\/spark-shell #On the spark shell &hellip; Continue reading &quot;Access S3 Files in Spark&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cloudxlab.com\/blog\/access-s3-files-spark\/\" \/>\n<meta property=\"og:site_name\" content=\"CloudxLab Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cloudxlab\" \/>\n<meta property=\"article:published_time\" content=\"2016-10-10T13:16:08+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2017-09-15T06:20:14+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CloudxLab\" \/>\n<meta name=\"twitter:site\" content=\"@CloudxLab\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\">\n\t<meta name=\"twitter:data1\" content=\"1 minute\">\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#website\",\"url\":\"https:\/\/cloudxlab.com\/blog\/\",\"name\":\"CloudxLab Blog\",\"description\":\"Learn AI, Machine Learning, Deep Learning, Devops &amp; Big Data\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/cloudxlab.com\/blog\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/access-s3-files-spark\/#webpage\",\"url\":\"https:\/\/cloudxlab.com\/blog\/access-s3-files-spark\/\",\"name\":\"Access S3 Files in Spark | CloudxLab Blog\",\"isPartOf\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/#website\"},\"datePublished\":\"2016-10-10T13:16:08+00:00\",\"dateModified\":\"2017-09-15T06:20:14+00:00\",\"author\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/#\/schema\/person\/0efa3c54df68406de820ea466f002d3c\"},\"breadcrumb\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/access-s3-files-spark\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/cloudxlab.com\/blog\/access-s3-files-spark\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/access-s3-files-spark\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"item\":{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/\",\"url\":\"https:\/\/cloudxlab.com\/blog\/\",\"name\":\"Home\"}},{\"@type\":\"ListItem\",\"position\":2,\"item\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/access-s3-files-spark\/#webpage\"}}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#\/schema\/person\/0efa3c54df68406de820ea466f002d3c\",\"name\":\"Abhinav 
Singh\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/fc74fe31169bf872f6ab11bbab621d53?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/fc74fe31169bf872f6ab11bbab621d53?s=96&d=mm&r=g\",\"caption\":\"Abhinav Singh\"},\"sameAs\":[\"https:\/\/cloudxlab.com\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","_links":{"self":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/226"}],"collection":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/comments?post=226"}],"version-history":[{"count":2,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/226\/revisions"}],"predecessor-version":[{"id":754,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/226\/revisions\/754"}],"wp:attachment":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/media?parent=226"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/categories?post=226"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/tags?post=226"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}