Apache Spark Basics

Apache Spark - More Operations - Transformations & Actions




Correction: from 2:48 to 2:50 in the video, the line var dbls = nums.map(MultiplyByTwo) should use flatMap instead of map.

INSTRUCTIONS
  • flatMap

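    Converts each record of an RDD into zero or more records: the function returns a collection for every input record, and the collections are flattened into a single RDD.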

    var linesRDD = sc.parallelize( Array("this is a dog", "named jerry"))
    def toWords(line:String):Array[String]= line.split(" ")
    var wordsRDD = linesRDD.flatMap(toWords)
    wordsRDD.collect()
    
    • Using map

      var linesRDD = sc.parallelize( Array("this is a dog", "named jerry"))
      def toWords(line:String):Array[String]= line.split(" ")
      var wordsRDD1 = linesRDD.map(toWords)
      wordsRDD1.collect()
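
To compare the two results without a cluster, here is a minimal sketch that uses a plain Scala array as a stand-in for the RDD (an assumption for illustration; `lines`, `nested` and `flat` are hypothetical names). With map, every input line yields exactly one output record, which is itself an array of words. With flatMap, those per-line arrays are flattened into individual words.

```scala
// A plain Scala array stands in for the RDD here (assumption for illustration).
val lines = Array("this is a dog", "named jerry")
def toWords(line: String): Array[String] = line.split(" ")

// map: one output element per input line, each element is an Array[String]
val nested = lines.map(line => toWords(line))
// flatMap: the per-line arrays are flattened into single words
val flat = lines.flatMap(line => toWords(line))

println(nested.length)       // 2
println(flat.length)         // 6
println(flat.mkString(", ")) // this, is, a, dog, named, jerry
```

On a real RDD the same shapes should appear after collect(): an array of arrays for map, a flat array of words for flatMap.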

    • flatMap as map

      val arr = 1 to 10000
      val nums = sc.parallelize(arr)
      def multiplyByTwo(x:Int) = Array(x*2)
      multiplyByTwo(5)

    Write the following commands in a new cell:

    var dbls = nums.flatMap(multiplyByTwo);
    dbls.take(5)
    
    • flatMap as filter

      var arr = 1 to 1000
      var nums = sc.parallelize(arr)
      def isEven(x:Int):Array[Int] = {
        if(x%2 == 0) Array(x)
        else Array()
      }

    Write the following commands in a new cell:

    var evens = nums.flatMap(isEven)
    evens.take(3)
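
The filter pattern above can be checked locally. The sketch below uses a plain Scala range in place of the RDD (an assumption for illustration; `viaFlatMap` and `viaFilter` are hypothetical names) and compares the flatMap version with an ordinary filter: returning an empty array drops the element, returning a one-element array keeps it.

```scala
// A plain Scala range stands in for the RDD here (assumption for illustration).
val nums = 1 to 1000
def isEven(x: Int): Array[Int] =
  if (x % 2 == 0) Array(x) else Array.empty[Int]

// Empty array: element is dropped. One-element array: element is kept.
val viaFlatMap = nums.flatMap(x => isEven(x))
// The same selection written as a plain filter
val viaFilter = nums.filter(x => x % 2 == 0)

println(viaFlatMap.take(3).mkString(", ")) // 2, 4, 6
println(viaFlatMap == viaFilter)           // true
```

In practice filter states the intent more directly; the flatMap form mainly shows that flatMap can emit zero records for an input.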
    
    • Transformations: union()

      var a = sc.parallelize(Array('1','2','3'));
      var b = sc.parallelize(Array('A','B','C'));
      var c=a.union(b)
      c.collect();
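
Note that union on RDDs concatenates the two inputs and keeps duplicates; removing them takes a separate distinct() step. The sketch below illustrates this with plain Scala arrays and `++` as a stand-in for the RDDs and union (an assumption for illustration; the overlapping element '3' is added here and is not part of the example above).

```scala
// Plain Scala arrays and ++ stand in for RDDs and union (assumption for illustration).
val a = Array('1', '2', '3')
val b = Array('3', 'A', 'B') // '3' appears in both inputs

// Concatenation keeps the duplicate '3'
val c = a ++ b
// A separate distinct step removes it
val deduped = c.distinct

println(c.mkString(", "))       // 1, 2, 3, 3, A, B
println(deduped.mkString(", ")) // 1, 2, 3, A, B
```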

    • Actions: saveAsTextFile()

    Saves all the elements of the RDD to HDFS as text files, one part file per partition.

    var a = sc.parallelize(Array(1,2,3, 4, 5 , 6, 7));
    a.saveAsTextFile("myresult");
    

    Check HDFS: there should be a myresult folder in your home directory.

    • Actions: collect()

      var a = sc.parallelize(Array(1,2,3, 4, 5 , 6, 7));
      a

    Write the following commands in a new cell:

    var localarray =  a.collect();
    localarray
    
    • Actions: take()

      var a = sc.parallelize(Array(1,2,3, 4, 5 , 6, 7));
      var localarray = a.take(4);
      localarray

    • Actions: count()

      var a = sc.parallelize(Array(1,2,3, 4, 5 , 6, 7), 3);
      var mycount = a.count();
      mycount
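
The second argument to parallelize, 3, is the number of partitions. count() counts the elements of each partition separately and adds up the partial counts. The sketch below mimics this with plain Scala collections; the particular split into three partitions is hypothetical (Spark decides the real split), as are the names `partitions` and `partialCounts`.

```scala
// Hypothetical split of the 7 elements into 3 partitions (Spark decides the real split).
val partitions = Seq(Seq(1, 2), Seq(3, 4), Seq(5, 6, 7))

// Each partition is counted on its own ...
val partialCounts = partitions.map(_.size)
// ... and the partial counts are added up to give the total
val mycount = partialCounts.sum

println(partialCounts.mkString(" + ") + " = " + mycount) // 2 + 2 + 3 = 7
```

However the elements are distributed, the total stays 7; only the partial counts change.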

