# Apache Spark - More Operations - Transformations & Actions

Correction: from 2:48 to 2:50 in the video, `var dbls = nums.map(MultiplyByTwo)` should use `flatMap` instead of `map`.

INSTRUCTIONS
• flatMap

`flatMap` converts each record of an RDD into zero or more records and flattens all of them into a single RDD.

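``````
// An RDD with two records (lines of text)
var linesRDD = sc.parallelize(Array("this is a dog", "named jerry"))
// Split one line into an array of words
def toWords(line: String): Array[String] = line.split(" ")
// flatMap flattens the per-line arrays into one RDD of words
var wordsRDD = linesRDD.flatMap(toWords)
wordsRDD.collect()
``````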
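• Using map

With `map`, each line produces exactly one record, an `Array[String]`, so the result is an RDD of arrays rather than an RDD of words.

    var linesRDD = sc.parallelize(Array("this is a dog", "named jerry"))
    def toWords(line: String): Array[String] = line.split(" ")
    var wordsRDD1 = linesRDD.map(toWords)
    wordsRDD1.collect()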
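
To see the difference without a cluster, here is a small sketch that uses plain Scala `List`s in place of RDDs. This is an analogy, not Spark code: it assumes only that Scala's collection `map` and `flatMap` follow the same one-to-one versus one-to-many rule as the RDD methods. It shows that `map` leaves one array per line, while `flatMap` yields the flattened words, which is the same as flattening the output of `map`.

```scala
// Plain Scala collections standing in for linesRDD (an analogy, not Spark code)
val lines = List("this is a dog", "named jerry")

// Same function as above: one line in, an array of words out
def toWords(line: String): Array[String] = line.split(" ")

// map: exactly one output record per input record, so the arrays stay nested
val nested: List[Array[String]] = lines.map(toWords)

// flatMap: the per-line arrays are concatenated into a single list of words
val words: List[String] = lines.flatMap(line => toWords(line))

// Flattening the output of map gives the same result as flatMap
val flattened: List[String] = nested.flatMap(arr => arr.toList)
```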

• flatMap as map

If the function returns exactly one element for every input, `flatMap` behaves like `map`:

    val arr = 1 to 10000
    val nums = sc.parallelize(arr)
    def multiplyByTwo(x: Int) = Array(x * 2)
    multiplyByTwo(5)

Write the following commands in a new cell:

``````
var dbls = nums.flatMap(multiplyByTwo)
// Returns Array(2, 4, 6, 8, 10)
dbls.take(5)
``````
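
A local sketch with a Scala `List` standing in for the `nums` RDD (an analogy, not Spark code; the names `viaFlatMap` and `viaMap` are illustrative) checks that when the function always returns a one-element array, `flatMap` produces exactly what `map` does. In real code `map` remains the clearer choice; the point here is only the equivalence.

```scala
// A local List standing in for the nums RDD (an analogy, not Spark code)
val nums = (1 to 10000).toList

// Same idea as multiplyByTwo above: always exactly one element out
def multiplyByTwo(x: Int): Array[Int] = Array(x * 2)

// One output per input, so flatMap and map agree element for element
val viaFlatMap = nums.flatMap(x => multiplyByTwo(x))
val viaMap = nums.map(x => x * 2)
```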
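• flatMap as filter

Returning an empty array for an input drops that record, so `flatMap` can act as a filter:

    var arr = 1 to 1000
    var nums = sc.parallelize(arr)
    def isEven(x: Int): Array[Int] = {
      if (x % 2 == 0) Array(x) else Array.empty[Int]
    }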

Write the following commands in a new cell:

``````
var evens = nums.flatMap(isEven)
// Returns Array(2, 4, 6)
evens.take(3)
``````
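
The same local check, again with a Scala `List` in place of the RDD (an analogy, not Spark code; `viaFlatMap` and `viaFilter` are illustrative names), shows that `flatMap(isEven)` keeps exactly the elements that `filter` would. Spark RDDs also have a `filter` transformation, which is the more direct way to express this.

```scala
// A local List standing in for the nums RDD (an analogy, not Spark code)
val nums = (1 to 1000).toList

// Zero elements for odd numbers, one element for even numbers
def isEven(x: Int): Array[Int] =
  if (x % 2 == 0) Array(x) else Array.empty[Int]

// Dropping records via empty arrays matches an ordinary filter
val viaFlatMap = nums.flatMap(x => isEven(x))
val viaFilter = nums.filter(x => x % 2 == 0)
```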
• Transformations: union()

`union` returns an RDD containing the elements of both RDDs; duplicates are not removed.

    var a = sc.parallelize(Array('1', '2', '3'))
    var b = sc.parallelize(Array('A', 'B', 'C'))
    var c = a.union(b)
    c.collect()
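
`union` behaves like concatenation rather than a mathematical set union. The sketch below uses Scala's `++` on `List`s as a stand-in for `RDD.union` (an analogy, not Spark code): in the list version the elements of `a` come first, then those of `b`, and duplicates survive until `distinct` is called. Spark RDDs have a `distinct()` transformation for the same purpose.

```scala
// Plain Scala Lists standing in for the RDDs a and b (an analogy, not Spark code)
val a = List('1', '2', '3')
val b = List('A', 'B', 'C')

// Like union, ++ concatenates: a's elements first, then b's
val c = a ++ b

// Nothing is deduplicated: '3' now appears twice
val withDup = a ++ List('3', 'D')

// A set-style union needs an explicit distinct step
val deduped = withDup.distinct
```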

• Actions: saveAsTextFile()

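Writes every element of the RDD as a line of text (using its `toString`) into the given directory in HDFS, with one file per partition. A relative path such as `myresult` is created under your HDFS home directory.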

``````
var a = sc.parallelize(Array(1, 2, 3, 4, 5, 6, 7))
a.saveAsTextFile("myresult")
``````

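Check HDFS, for example with `hadoop fs -ls myresult`: there should be a `myresult` directory in your home directory containing one `part-` file per partition. Running `saveAsTextFile("myresult")` a second time fails because the output directory already exists.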
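
The on-disk layout can be imitated locally to show why `myresult` is a directory rather than a single file. The helper `saveAsTextFileLocally` below is hypothetical and not part of Spark: it writes each partition of a hand-made partitioning to its own `part-NNNNN` file in a temporary local directory, one element per line. Spark typically also writes bookkeeping files such as a `_SUCCESS` marker, which this sketch skips.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper, NOT a Spark API: it only imitates the output layout of
// saveAsTextFile on the local file system, one file per partition and one
// element per line (written with toString). Returns the file names it wrote.
def saveAsTextFileLocally(partitions: Seq[Seq[Any]], dir: Path): Seq[String] = {
  Files.createDirectories(dir)
  partitions.zipWithIndex.map { case (part, i) =>
    val name = f"part-$i%05d"
    val text = part.map(elem => elem.toString + "\n").mkString
    Files.write(dir.resolve(name), text.getBytes(StandardCharsets.UTF_8))
    name
  }
}

// Seven elements in two partitions; this split is chosen by hand for the example
val partitions = Seq(Seq(1, 2, 3), Seq(4, 5, 6, 7))
val outDir = Files.createTempDirectory("myresult")
val written = saveAsTextFileLocally(partitions, outDir)
```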

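• Actions: collect()

`collect()` returns all elements of the RDD to the driver as a local array, so use it only when the result fits in the driver's memory.

    var a = sc.parallelize(Array(1, 2, 3, 4, 5, 6, 7))
    a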

Write the following commands in a new cell:

``````
var localarray = a.collect()
localarray
``````
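
One way to picture `collect()`: each partition holds part of the data, and the driver concatenates all partitions, in partition order, into one local array. The sketch below models partitions as nested Scala sequences; the three-way split and the helper name `collectLocally` are assumptions for illustration, not Spark internals.

```scala
// Partitions modelled as nested local sequences (an illustration, not Spark internals)
val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 4), Seq(5, 6, 7))

// Hypothetical helper: gather every partition's elements into one local array,
// keeping partition order, as collect() does on the driver
def collectLocally(parts: Seq[Seq[Int]]): Array[Int] = parts.flatten.toArray

val localarray = collectLocally(partitions)
```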
• Actions: take()

`take(n)` returns the first `n` elements of the RDD as a local array.

    var a = sc.parallelize(Array(1, 2, 3, 4, 5, 6, 7))
    var localarray = a.take(4)
    localarray
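
Unlike `collect()`, `take(n)` does not need to read every partition. The simplified sketch below (hypothetical helper `takeLocally`, partitions modelled as nested sequences) reads partitions in order, stops as soon as it has `n` elements, and reports how many partitions it touched. Spark's actual implementation is more involved, so treat this only as an illustration of the idea.

```scala
// Simplified sketch of take(n); takeLocally is a hypothetical helper, not Spark code.
// Reads partitions in order and stops once n elements have been gathered.
def takeLocally(parts: Seq[Seq[Int]], n: Int): (Array[Int], Int) = {
  val buf = scala.collection.mutable.ArrayBuffer.empty[Int]
  var scanned = 0
  val it = parts.iterator
  while (buf.size < n && it.hasNext) {
    // Take only as many elements as are still missing from this partition
    buf ++= it.next().take(n - buf.size)
    scanned += 1
  }
  (buf.toArray, scanned)
}

// Three hand-made partitions of the seven elements
val partitions = Seq(Seq(1, 2), Seq(3, 4), Seq(5, 6, 7))
val (localarray, partitionsScanned) = takeLocally(partitions, 4)
```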

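• Actions: count()

`count()` returns the number of elements in the RDD as a `Long`. The second argument to `parallelize` (here `3`) sets the number of partitions.

    var a = sc.parallelize(Array(1, 2, 3, 4, 5, 6, 7), 3)
    var mycount = a.count()
    mycount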
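
`count()` does not need to move the data to the driver: each partition counts its own elements and the driver adds up the partial counts. The sketch below imitates this with a hypothetical `sliceEvenly` helper that cuts the seven elements into three contiguous chunks. That split is an assumption; in Spark you can inspect the real partition contents with `a.glom().collect()`. The sketch produces an `Int`, whereas `RDD.count()` returns a `Long`.

```scala
// Hypothetical helper: split a sequence into numSlices contiguous, roughly equal chunks
def sliceEvenly[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] =
  (0 until numSlices).map { i =>
    val start = (i * data.length) / numSlices
    val end = ((i + 1) * data.length) / numSlices
    data.slice(start, end)
  }

val partitions = sliceEvenly(Seq(1, 2, 3, 4, 5, 6, 7), 3)

// Each partition counts its own elements...
val perPartition = partitions.map(_.size)
// ...and the driver only adds up the partial counts
val mycount = perPartition.sum
```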