Apache Spark Basics

INSTRUCTIONS
  • Method 1: By loading a file directly from remote storage

    val lines = sc.textFile("/data/mr/wordcount/input/big.txt")
    

    Write the following command in a new cell:

    lines.take(10)
    
  • Method 2: By distributing an existing collection

    val arr = 1 to 10000
    val nums = sc.parallelize(arr)
    

    Write the following command in a new cell:

    nums.take(10)
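Both methods can also be tried outside the lab notebook. The following is a minimal sketch, assuming Spark is on the classpath and running in local mode. In spark-shell, `getOrCreate()` simply returns the existing session. A small temporary file with 20 numbered lines stands in for `big.txt`; that file and its contents are hypothetical. The partition count of 4 is also an arbitrary illustrative choice.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// In spark-shell this returns the existing session; elsewhere it starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("rdd-basics").getOrCreate()
val sc = spark.sparkContext

// Method 1: load a file. A hypothetical temp file with 20 numbered lines stands in
// for big.txt; the file:// URI makes Spark read it from the local disk.
val tmp = Files.createTempFile("big", ".txt")
Files.write(tmp, (1 to 20).map(i => s"line $i").mkString("\n").getBytes("UTF-8"))
val lines = sc.textFile(tmp.toUri.toString)    // lazy: nothing is read yet
val firstLines: Array[String] = lines.take(10) // action: first 10 lines come back to the driver

// Method 2: distribute a local collection. numSlices = 4 is an arbitrary choice
// of how many partitions to split the range into.
val arr = 1 to 10000
val nums = sc.parallelize(arr, numSlices = 4)
val firstNums: Array[Int] = nums.take(10)

println(firstLines.mkString("\n")) // prints "line 1" through "line 10", one per line
println(firstNums.mkString(", "))  // prints 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
println(nums.getNumPartitions)     // prints 4
```

In both cases `take(n)` is an action. It returns the first `n` elements of the RDD to the driver as an ordinary Scala `Array`, whereas `textFile` and `parallelize` only describe the dataset.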
    


Comments

scala> sc
<console>:18: error: not found: value sc
       sc

It gives me "sc not found".


Hi Troydon,

Can you please share a screenshot of the error so we can understand your issue better?


Does the take function display the value of the variable, or is it used to display the contents of the RDD?


Why don't you try it out?


What exactly is sc.parallelize doing here with the array?


The sc.parallelize() method is SparkContext's parallelize method. It creates a parallelized collection (an RDD) from an existing local collection. This allows Spark to distribute the data across multiple nodes instead of depending on a single node to process it.
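The following is a minimal sketch of what that distribution looks like, assuming Spark is on the classpath or the code runs in a spark-shell session. `glom()` gathers each partition's elements into an array, which makes the split visible. The three partitions are an arbitrary illustrative choice. In local mode the partitions are processed by threads on one machine; on a cluster they would be handled by executors on different nodes.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell this returns the existing session; elsewhere it starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("parallelize-layout").getOrCreate()
val sc = spark.sparkContext

// Split a small local collection into 3 partitions so the layout is easy to read.
val small = sc.parallelize(1 to 10, numSlices = 3)

// glom() turns each partition into one Array, so collect() returns
// an Array with one entry per partition.
val layout: Array[Array[Int]] = small.glom().collect()
layout.zipWithIndex.foreach { case (part, i) =>
  println(s"partition $i: ${part.mkString(", ")}")
}

// Each partition is reduced independently, then the partial results are combined.
val total: Int = small.reduce(_ + _)
println(total) // prints 55
```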
