Scala & Spark Instructor-led Sessions (8th July, 2017)


Session 10 - Hands On - July 01, 2017

Hands-On - Flume:

# Get a copy of the sample Flume conf from the common data
hadoop fs -copyToLocal /data/flume/conf

# Change the port if needed, and the target location in HDFS
nano conf/
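For reference, a minimal netcat-to-HDFS agent configuration might look like the sketch below. The agent name a1 matches the --name a1 flag used later, the port matches the nc command, and the flume_webdata path matches the later hadoop fs -ls check; the exact values in the lab's sample conf may differ.

```properties
# Name the source, sink and channel of agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Netcat source - listens on the port you connect to with nc
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44443

# HDFS sink - writes FlumeData* files under flume_webdata
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = flume_webdata
a1.sinks.k1.hdfs.fileType = DataStream

# In-memory channel wiring the source to the sink
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```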

# Launch the Flume agent
flume-ng agent --conf conf --conf-file conf/ --name a1 -Dflume.root.logger=INFO,console

# Open a new console and connect to the same port that you defined in the config
nc localhost 44443

# Generate some data 
Type something in the console

# Open a new console and check in HDFS:
hadoop fs -ls flume_webdata
hadoop fs -cat 'flume_webdata/FlumeData*'


# Get the details of the MySQL server from the "My Lab" tab

# Check MySQL: connect
mysql -u sqoopuser -p -h ip-172-31-13-154 sqoopex

# Check MySQL: explore the table using
select * from widgets;

# Import - it will prompt for the password (because of -P); keep the password handy
sqoop import --connect jdbc:mysql:// --table widgets -m 2 --hive-import --username sqoopuser -P --hive-database sqoop_testing

# Start the Hive client using the "hive" command; on the Hive prompt:
use sqoop_testing;
select * from widgets;


# In Hue, get the location of the MySQL JAR

# Go to Workflows -> Editors

# Click on the Create button on the right side

# Drag and drop the Sqoop 1 action

# Set the password in the following command and copy-paste it
# Also, change the HDFS absolute location
import --connect jdbc:mysql://ip-172-31-13-154:3306/sqoopex --username sqoopuser --password  --table widgets --target-dir hdfs:///user/sandeepgiri9034/widgets_import

# Add the MySQL connector JAR file

# Save and Submit

# Open the File Browser and check whether the files were created.

Kafka Streaming - Word Count from nc

# Terminal 1
nc -lk 9999

# Terminal 2

# Copy the code from the following URL

# Also, try to understand the code and correct the port

# Launch the shell and paste the copied code
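The code at that URL is not reproduced here; as a sketch, the classic Spark Streaming word count over a socket looks roughly like the following (assuming it is pasted into spark-shell, where sc already exists, and that 9999 is the port used with nc above):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// 2-second micro-batches on top of the existing SparkContext (sc)
val ssc = new StreamingContext(sc, Seconds(2))

// Read lines from the nc socket -- the port must match nc -lk 9999
val lines = ssc.socketTextStream("localhost", 9999)

// Split each line into words and count each word per batch
val counts = lines.flatMap(_.split(" "))
                  .map(word => (word, 1))
                  .reduceByKey(_ + _)

counts.print()   // print the counts of each batch to the console
ssc.start()
ssc.awaitTermination()
```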

# Go to Terminal 1
# type something

# See on Terminal 2
# Is something being printed? The word counts should appear for whatever you typed.