Apache Spark Basics


Apache Spark - Actions - take & saveAsTextFile





INSTRUCTIONS
  • Action example - take()

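    // Build a local range of integers and distribute it as an RDD
    val arr = 1 to 1000000
    val nums = sc.parallelize(arr)
    // Function applied to each element of the RDD
    def multipleByTwo(x:Int):Int = x*2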
    

    Write the following commands in a new cell:

    // map is a lazy transformation; take(5) is the action that triggers the job
    val dbls = nums.map(multipleByTwo)
    dbls.take(5)
    
  • Action example - saveAsTextFile()

    val arr = 1 to 1000
    val nums = sc.parallelize(arr)
    def multipleByTwo(x:Int):Int = x*2
    

    Write the following commands in a new cell:

    // saveAsTextFile is an action that writes the RDD to the directory "mydirectory"
    val dbls = nums.map(multipleByTwo)
    dbls.saveAsTextFile("mydirectory")
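
The two actions differ in where the data ends up: take(n) returns the first n elements to the driver as a local array, while saveAsTextFile() writes one part file per partition. The sketch below illustrates both, assuming it runs in the same Spark shell or notebook where sc is already defined. The partition count of 2 and the directory name mydirectory_demo are hypothetical choices made for illustration, not part of the lesson.

```scala
// Assumes `sc` (SparkContext) is provided by the Spark shell or notebook.
val nums = sc.parallelize(1 to 1000, 2)   // 2 partitions: an illustrative choice
val dbls = nums.map(_ * 2)                // transformation: lazy, nothing runs yet

// take(5) is an action: it returns the first 5 elements to the driver as an Array[Int].
val firstFive = dbls.take(5)
println(firstFive.mkString(", "))         // 2, 4, 6, 8, 10

// saveAsTextFile is also an action: it writes one part file per partition.
// It fails if the target directory already exists, so use a fresh name.
println(dbls.getNumPartitions)            // 2
dbls.saveAsTextFile("mydirectory_demo")   // hypothetical directory name
```

With two partitions, the output directory should contain part-00000 and part-00001, which matches the two files inspected in the commands further down.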
    

Note - In this video, we used Hue to access the results in HDFS. Hue has since been deprecated, so please use the commands below in the web console to access the files.

  • Login to the web console
  • Check the files

    hadoop fs -ls  mydirectory
    
  • Check the content of the first part

    hadoop fs -cat mydirectory/part-00000 | more
    
  • Check the content of the second part

    hadoop fs -cat mydirectory/part-00001 | more
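
The saved output can also be checked from Spark itself rather than from the command line. This is a minimal sketch, assuming sc is available in the Spark shell or notebook and that mydirectory was written by the saveAsTextFile() example above (numbers 1 to 1000, doubled). sc.textFile reads every part file in the directory as a single RDD of lines.

```scala
// Assumes `sc` is available and "mydirectory" was written by saveAsTextFile above.
val saved = sc.textFile("mydirectory")    // reads every part-* file in the directory

// Each element was written as one line of text, so 1000 numbers give 1000 lines.
println(saved.count())

// Lines come back as strings; convert to Int to work with the numbers again.
val restored = saved.map(_.toInt)
println(restored.take(5).mkString(", "))
```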
    

If you run into disk quota issues, see https://discuss.cloudxlab.com/t/the-diskspace-quota-of-user-is-execeeded/5156

