Apache Spark Basics


Apache Spark - More Operations - Transformations & Actions





Edit: at 2:48 to 2:50 in the video, var dbls = nums.map(MultiplyByTwo) should use flatMap instead of map.

INSTRUCTIONS
  • flatMap

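    flatMap converts one record of an RDD into multiple records: the function you pass returns a collection for each input record, and the results are flattened into a single RDD.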

    var linesRDD = sc.parallelize( Array("this is a dog", "named jerry"))
    def toWords(line:String):Array[String]= line.split(" ")
    var wordsRDD = linesRDD.flatMap(toWords)
    wordsRDD.collect()
    
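    • Using Map

      With map, each line produces exactly one output record, so the result is an RDD of arrays rather than an RDD of words.

      var linesRDD = sc.parallelize( Array("this is a dog", "named jerry"))
      def toWords(line:String):Array[String]= line.split(" ")
      var wordsRDD1 = linesRDD.map(toWords)
      wordsRDD1.collect()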
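The difference in result shape between map and flatMap can also be seen without Spark. The sketch below is an illustration only: it uses a plain Scala Seq as a stand-in for the RDD (an assumption, so that it runs in any Scala REPL without a SparkContext), and its toWords returns a Seq instead of an Array so the results are easy to compare.

```scala
// Plain Scala collections stand in for the RDD here (an assumption made
// so that the snippet runs without a SparkContext).
val lines = Seq("this is a dog", "named jerry")

// Same idea as toWords in the lesson, but returning a Seq for easy comparison.
def toWords(line: String): Seq[String] = line.split(" ").toSeq

// map: exactly one output element per input line, so the result is nested.
val nested: Seq[Seq[String]] = lines.map(line => toWords(line))

// flatMap: the per-line results are concatenated into one flat sequence.
val flat: Seq[String] = lines.flatMap(line => toWords(line))

println(nested.size) // 2 (one entry per line)
println(flat.size)   // 6 (one entry per word)
```

On an RDD the same contrast holds: map yields one array per line, while flatMap yields one record per word.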

    • flatMap as Map

      val arr = 1 to 10000
      val nums = sc.parallelize(arr)
      def multiplyByTwo(x:Int) = Array(x*2)
      multiplyByTwo(5)

    Write the following commands in a new cell:

    var dbls = nums.flatMap(multiplyByTwo);
    dbls.take(5)
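Why this behaves like map: when the function always returns a one-element collection, flattening leaves exactly one output per input. A minimal sketch on a plain Scala range (used here as a stand-in for the RDD, which is an assumption so that no SparkContext is needed; the names smallNums, doubledViaFlatMap and doubledViaMap are illustrative):

```scala
// A small range stands in for the RDD of numbers (assumption: no Spark needed).
val smallNums = 1 to 10

// Wraps the doubled value in a one-element collection, like multiplyByTwo above.
def multiplyByTwo(x: Int): Seq[Int] = Seq(x * 2)

// flatMap with a one-element result per input ...
val doubledViaFlatMap = smallNums.flatMap(x => multiplyByTwo(x))

// ... gives the same elements as a plain map.
val doubledViaMap = smallNums.map(x => x * 2)

println(doubledViaFlatMap == doubledViaMap) // true
println(doubledViaFlatMap.take(5).mkString(", ")) // 2, 4, 6, 8, 10
```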
    
    • flatMap as filter

      var arr = 1 to 1000
      var nums = sc.parallelize(arr)
      def isEven(x:Int):Array[Int] = {
        if(x%2 == 0) Array(x) else Array()
      }

    Write the following commands in a new cell:

    var evens = nums.flatMap(isEven)
    evens.take(3)
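Why this behaves like filter: returning an empty collection drops the record, and returning a one-element collection keeps it. The sketch below checks this against filter on a plain Scala range (a stand-in for the RDD, assumed here so the code runs without a SparkContext; the names numsTo1000, evensViaFlatMap and evensViaFilter are illustrative):

```scala
// A plain range stands in for the RDD of numbers (assumption: no Spark needed).
val numsTo1000 = 1 to 1000

// Keep even numbers as a one-element collection, drop odd ones as an empty one.
def isEven(x: Int): Seq[Int] = if (x % 2 == 0) Seq(x) else Seq.empty[Int]

// flatMap with zero-or-one results per input ...
val evensViaFlatMap = numsTo1000.flatMap(x => isEven(x))

// ... keeps the same elements as filter with the equivalent predicate.
val evensViaFilter = numsTo1000.filter(x => x % 2 == 0)

println(evensViaFlatMap == evensViaFilter) // true
println(evensViaFlatMap.take(3).mkString(", ")) // 2, 4, 6
```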
    
    • Transformations: union

      Returns a new RDD containing the elements of both RDDs. Duplicates are not removed.

      var a = sc.parallelize(Array('1','2','3'));
      var b = sc.parallelize(Array('A','B','C'));
      var c = a.union(b)
      c.collect();

    • Actions: saveAsTextFile()

    Saves all the elements of the RDD as text files in a directory on HDFS, one part file per partition.

    var a = sc.parallelize(Array(1,2,3, 4, 5 , 6, 7));
    a.saveAsTextFile("myresult");
    

    Check HDFS. There should be a myresult folder in your home directory.

    • Actions: collect()

      var a = sc.parallelize(Array(1,2,3, 4, 5 , 6, 7));
      a

    Write the following commands in a new cell:

    var localarray =  a.collect();
    localarray
    
    • Actions: take()

      var a = sc.parallelize(Array(1,2,3, 4, 5 , 6, 7));
      var localarray = a.take(4);
      localarray

    • Actions: count()

      // The second argument (3) is the number of partitions; it does not change the count.
      var a = sc.parallelize(Array(1,2,3, 4, 5 , 6, 7), 3);
      var mycount = a.count();
      mycount



2 Comments

Getting error on executing following lines of code:
l={1,3,5}
rdd=sc.parallelize(l)
rdd2=rdd.flatMap(lambda x:x%2==0)
Replacing flatMap with the map transformation gives output.
Can anyone please explain why it is so?


Hi

It works fine for me. What error are you getting?
