{"id":1133,"date":"2018-01-09T11:46:11","date_gmt":"2018-01-09T11:46:11","guid":{"rendered":"http:\/\/blog.cloudxlab.com\/?p=1133"},"modified":"2019-01-08T12:18:36","modified_gmt":"2019-01-08T12:18:36","slug":"streaming-twitter-data-using-flume","status":"publish","type":"post","link":"https:\/\/cloudxlab.com\/blog\/streaming-twitter-data-using-flume\/","title":{"rendered":"Streaming Twitter Data using Flume"},"content":{"rendered":"<p>In this blog post, we will learn how to stream Twitter data using Flume on CloudxLab.<\/p>\n<p>To download tweets from Twitter, we first have to configure a Twitter app.<\/p>\n<h2>Create Twitter App<\/h2>\n<h3 style=\"padding-left: 30px;\">Step 1<\/h3>\n<p style=\"padding-left: 30px;\">Navigate to the <a href=\"https:\/\/apps.twitter.com\/\" target=\"_blank\" rel=\"noopener\">Twitter app URL\u00a0<\/a>and sign in with your Twitter account.<\/p>\n<h3 style=\"padding-left: 30px;\">Step 2<\/h3>\n<p style=\"padding-left: 30px;\">Click on &#8220;Create New App&#8221;.<\/p>\n<h3 style=\"padding-left: 30px;\"><img class=\"aligncenter size-large wp-image-1134\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.34.02-PM-1024x145.png\" alt=\"Create New App\" width=\"840\" height=\"119\" srcset=\"https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.34.02-PM-1024x145.png 1024w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.34.02-PM-300x43.png 300w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.34.02-PM-768x109.png 768w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.34.02-PM-1200x170.png 1200w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.34.02-PM.png 1439w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/h3>\n<p><!--more--><\/p>\n<h3 
style=\"padding-left: 30px;\">Step 3<\/h3>\n<p style=\"padding-left: 30px;\">Provide Name, Description, and Website of your app. Check the &#8220;<span class=\"fieldset-legend\">Developer Agreement&#8221; checkbox and click on &#8220;Create your Twitter Application&#8221;<\/span><\/p>\n<p style=\"padding-left: 30px;\"><img class=\"aligncenter size-large wp-image-1135\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.35.46-PM-1024x624.png\" alt=\"Create an application form\" width=\"840\" height=\"512\" srcset=\"https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.35.46-PM-1024x624.png 1024w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.35.46-PM-300x183.png 300w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.35.46-PM-768x468.png 768w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.35.46-PM-1200x732.png 1200w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.35.46-PM.png 1250w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/p>\n<h3 style=\"padding-left: 30px;\">Step 4<\/h3>\n<p style=\"padding-left: 30px;\">After your application is successfully created, Twitter will show\u00a0Consumer Key,\u00a0Consumer Secret,\u00a0<span class=\"heading\">Access Token and\u00a0Access Token Secret. We will need these tokens to get tweets from Twitter. 
Please do not share these tokens and keys with others.<\/span><\/p>\n<p style=\"padding-left: 30px;\"><img class=\"aligncenter wp-image-1136 size-large\" title=\"Twitter app keys and tokens\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.40.24-PM-1024x604.png\" alt=\"Twitter app keys and tokens\" width=\"840\" height=\"495\" srcset=\"https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.40.24-PM-1024x604.png 1024w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.40.24-PM-300x177.png 300w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.40.24-PM-768x453.png 768w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.40.24-PM-1200x708.png 1200w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-3.40.24-PM.png 1260w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/p>\n<h2>Set up Flume agent<\/h2>\n<h3 style=\"padding-left: 30px;\">Step 1<\/h3>\n<p style=\"padding-left: 30px;\">Log in to the <a href=\"https:\/\/cloudxlab.com\/faq\/28\/how-do-i-connect-to-cloudxlab-from-my-local-machine\" target=\"_blank\" rel=\"noopener\">web console<\/a>.<\/p>\n<h3 style=\"padding-left: 30px;\">Step 2<\/h3>\n<p style=\"padding-left: 30px;\">Create a directory named flume in your home folder in the web console.<\/p>\n<pre class=\"lang:sh decode:true\">mkdir flume<\/pre>\n<h3 style=\"padding-left: 30px;\">Step 3<\/h3>\n<p style=\"padding-left: 30px;\">Create the flume.conf file and open it in an editor.<\/p>\n<pre class=\"lang:default decode:true \">vi flume\/flume.conf<\/pre>\n<h3 style=\"padding-left: 30px;\">Step 4<\/h3>\n<p style=\"padding-left: 30px;\">Copy and paste the code below into flume.conf.<\/p>\n<pre class=\"lang:sh decode:true\">TwitterAgent.sources = Twitter\r\nTwitterAgent.channels = MemChannel\r\nTwitterAgent.sinks 
= HDFS\r\n\r\nTwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource\r\nTwitterAgent.sources.Twitter.channels = MemChannel\r\nTwitterAgent.sources.Twitter.consumerKey = xxxxxx\r\nTwitterAgent.sources.Twitter.consumerSecret = xxxxxxx\r\nTwitterAgent.sources.Twitter.accessToken = xxxxxxx\r\nTwitterAgent.sources.Twitter.accessTokenSecret = xxxxxx\r\nTwitterAgent.sources.Twitter.keywords = theinterview, 17YearsOfNash, Warnock, RioCompetition, cpfc, Palace, London, Christmas, New Years\r\n\r\n################## SINK #################################\r\nTwitterAgent.sinks.HDFS.channel = MemChannel\r\nTwitterAgent.sinks.HDFS.type = hdfs\r\nTwitterAgent.sinks.HDFS.hdfs.path = hdfs:\/\/\/user\/abhinav9884\/Tweets\r\nTwitterAgent.sinks.HDFS.hdfs.fileType = DataStream\r\nTwitterAgent.sinks.HDFS.hdfs.writeFormat = Text\r\n\r\nTwitterAgent.sinks.HDFS.hdfs.batchSize = 10\r\nTwitterAgent.sinks.HDFS.hdfs.rollSize = 0\r\nTwitterAgent.sinks.HDFS.hdfs.rollInterval = 600\r\nTwitterAgent.sinks.HDFS.hdfs.rollCount = 10000\r\n\r\n#################### CHANNEL #########################\r\nTwitterAgent.channels.MemChannel.type = memory\r\nTwitterAgent.channels.MemChannel.capacity = 100\r\n#default - TwitterAgent.channels.MemChannel.capacity = 100\r\nTwitterAgent.channels.MemChannel.transactionCapacity = 100<\/pre>\n<p style=\"padding-left: 30px;\">Replace the values of\u00a0TwitterAgent.sources.Twitter.consumerKey,\u00a0TwitterAgent.sources.Twitter.consumerSecret,\u00a0TwitterAgent.sources.Twitter.accessToken and\u00a0TwitterAgent.sources.Twitter.accessTokenSecret with your own keys and tokens.<\/p>\n<p style=\"padding-left: 30px;\">Replace abhinav9884 with your CloudxLab username.<\/p>\n<p style=\"padding-left: 30px;\">Save the file and exit the editor.<\/p>\n<h3 style=\"padding-left: 30px;\">Step 5<\/h3>\n<p style=\"padding-left: 30px;\">Run the Flume agent using the command below. 
Replace abhinav9884 with your CloudxLab username.<\/p>\n<pre class=\"lang:sh decode:true\">flume-ng agent -n TwitterAgent -Dtwitter4j.streamBaseURL=https:\/\/stream.twitter.com\/1.1\/ -c conf -f \/home\/abhinav9884\/flume\/flume.conf<\/pre>\n<h3 style=\"padding-left: 30px;\">Step 6<\/h3>\n<p style=\"padding-left: 30px;\">Check the Twitter data in HDFS. Files named FlumeData.* will appear inside the Tweets directory in your HDFS home directory.<\/p>\n<pre class=\"lang:sh decode:true \">hadoop fs -ls Tweets\/<\/pre>\n<p><img class=\"aligncenter size-large wp-image-1143\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-4.44.54-PM-1.png\" alt=\"\" width=\"840\" height=\"113\" srcset=\"https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-4.44.54-PM-1.png 937w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-4.44.54-PM-1-300x40.png 300w, https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/Screen-Shot-2018-01-09-at-4.44.54-PM-1-768x103.png 768w\" sizes=\"(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/p>\n<p style=\"padding-left: 30px;\">View the tweets with the command below. Replace\u00a0FlumeData.1515474234091 with the name of a file inside your Tweets directory.<\/p>\n<pre class=\"lang:default decode:true\" style=\"padding-left: 30px;\">hadoop fs -cat Tweets\/FlumeData.1515474234091<\/pre>\n<h3>Step 7<\/h3>\n<p>Stop the Flume agent once you are done by pressing &#8220;Ctrl + C&#8221;.<\/p>\n<p>In this blog post, we learned how to stream Twitter data using Flume and store it on HDFS. We hope you liked it. Please feel free to leave your comments.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this blog post, we will learn how to stream Twitter data using Flume on CloudxLab For downloading tweets from Twitter, we have to configure Twitter App first. 
Create Twitter App Step 1 Navigate to Twitter app URL\u00a0and sign in with your Twitter account Step 2 Click on &#8220;Create New App&#8221;<\/p>\n","protected":false},"author":1,"featured_media":1152,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[24,14],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v16.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Streaming Twitter Data using Flume | CloudxLab Blog<\/title>\n<meta name=\"description\" content=\"In this guide, we will learn how to stream Twitter tweets using Flume and store it on HDFS using CloudxLab. This guide contains the Flume code and steps\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/cloudxlab.com\/blog\/streaming-twitter-data-using-flume\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Streaming Twitter Data using Flume | CloudxLab Blog\" \/>\n<meta property=\"og:description\" content=\"In this guide, we will learn how to stream Twitter tweets using Flume and store it on HDFS using CloudxLab. 
This guide contains the Flume code and steps\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cloudxlab.com\/blog\/streaming-twitter-data-using-flume\/\" \/>\n<meta property=\"og:site_name\" content=\"CloudxLab Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cloudxlab\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-09T11:46:11+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-01-08T12:18:36+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/flume-3.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"512\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CloudxLab\" \/>\n<meta name=\"twitter:site\" content=\"@CloudxLab\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\">\n\t<meta name=\"twitter:data1\" content=\"2 minutes\">\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#website\",\"url\":\"https:\/\/cloudxlab.com\/blog\/\",\"name\":\"CloudxLab Blog\",\"description\":\"Learn AI, Machine Learning, Deep Learning, Devops &amp; Big Data\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/cloudxlab.com\/blog\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/streaming-twitter-data-using-flume\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/flume-3.png\",\"contentUrl\":\"https:\/\/cloudxlab.com\/blog\/wp-content\/uploads\/2018\/01\/flume-3.png\",\"width\":1024,\"height\":512,\"caption\":\"Stream twitter data using flume and 
hdfs\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/streaming-twitter-data-using-flume\/#webpage\",\"url\":\"https:\/\/cloudxlab.com\/blog\/streaming-twitter-data-using-flume\/\",\"name\":\"Streaming Twitter Data using Flume | CloudxLab Blog\",\"isPartOf\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/streaming-twitter-data-using-flume\/#primaryimage\"},\"datePublished\":\"2018-01-09T11:46:11+00:00\",\"dateModified\":\"2019-01-08T12:18:36+00:00\",\"author\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/#\/schema\/person\/0efa3c54df68406de820ea466f002d3c\"},\"description\":\"In this guide, we will learn how to stream Twitter tweets using Flume and store it on HDFS using CloudxLab. This guide contains the Flume code and steps\",\"breadcrumb\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/streaming-twitter-data-using-flume\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/cloudxlab.com\/blog\/streaming-twitter-data-using-flume\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/streaming-twitter-data-using-flume\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"item\":{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/\",\"url\":\"https:\/\/cloudxlab.com\/blog\/\",\"name\":\"Home\"}},{\"@type\":\"ListItem\",\"position\":2,\"item\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/streaming-twitter-data-using-flume\/#webpage\"}}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#\/schema\/person\/0efa3c54df68406de820ea466f002d3c\",\"name\":\"Abhinav 
Singh\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/fc74fe31169bf872f6ab11bbab621d53?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/fc74fe31169bf872f6ab11bbab621d53?s=96&d=mm&r=g\",\"caption\":\"Abhinav Singh\"},\"sameAs\":[\"https:\/\/cloudxlab.com\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","_links":{"self":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/1133"}],"collection":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/comments?post=1133"}],"version-history":[{"count":11,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/1133\/revisions"}],"predecessor-version":[{"id":1266,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/1133\/revisions\/1266"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/media\/1152"}],"wp:attachment":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/media?parent=1133"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/categories?post=1133"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/tags?post=1133"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}