Method 1: By loading a file directly from remote storage
val lines = sc.textFile("/data/mr/wordcount/input/big.txt")
Write the following command in a new cell:
lines.take(10)
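`textFile` is lazy. It only records where the data lives, and nothing is read until an action such as `take` runs. `take(10)` then returns the first 10 lines to the driver as a local `Array[String]`. The sketch below is a minimal standalone version built on several assumptions: spark-core is on the classpath, Spark runs in local mode (`local[*]`) instead of on the lab cluster, and a hypothetical 20-line temp file stands in for `big.txt`. It creates its own `SparkContext` because only spark-shell and Spark notebooks predefine `sc`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object TextFileDemo {
  // Returns the first n lines of the file at `uri`, read through Spark.
  def firstLines(uri: String, n: Int): Array[String] = {
    // Assumption: local mode, running Spark inside this JVM on all available cores.
    val conf = new SparkConf().setAppName("TextFileDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.textFile(uri) // lazy: builds an RDD[String], reads nothing yet
      lines.take(n)                // action: scans only as many partitions as needed for n lines
    } finally {
      sc.stop() // release the context so another one can be created later
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample file standing in for /data/mr/wordcount/input/big.txt.
    val tmp = Files.createTempFile("sample", ".txt")
    val content = (1 to 20).map(i => s"line $i").mkString("\n")
    Files.write(tmp, content.getBytes(StandardCharsets.UTF_8))
    firstLines(tmp.toUri.toString, 10).foreach(println)
  }
}
```

Because `take` returns a plain array, anything printed after it comes from the driver, not from the cluster.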
Method 2: By distributing an existing collection
val arr = 1 to 10000
val nums = sc.parallelize(arr)
Write the following command in a new cell:
nums.take(10)
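`parallelize` also accepts an optional second argument, `numSlices`, which sets how many partitions the collection is cut into. `getNumPartitions` reports the result, and `glom()` exposes each partition as an array so its size can be inspected. The sketch below assumes local mode and an arbitrarily chosen value of 4 slices; it is an illustration, not the lab's configuration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelizeDemo {
  // Distributes 1..10000 over `numSlices` partitions and reports
  // (number of partitions, first ten elements, size of each partition).
  def run(numSlices: Int): (Int, Array[Int], Array[Int]) = {
    // Assumption: local mode, no cluster.
    val conf = new SparkConf().setAppName("ParallelizeDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val arr = 1 to 10000
      val nums = sc.parallelize(arr, numSlices)      // RDD[Int] split into numSlices partitions
      val sizes = nums.glom().map(_.length).collect() // glom: one Array per partition, then its length
      (nums.getNumPartitions, nums.take(10), sizes)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (parts, firstTen, sizes) = run(4)
    println(s"partitions = $parts")                       // partitions = 4
    println(s"first ten  = ${firstTen.mkString(", ")}")   // first ten  = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
    println(s"sizes      = ${sizes.mkString(", ")}")
  }
}
```

Leaving out `numSlices`, as the exercise does, lets Spark fall back to its default parallelism for the number of partitions.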
6 Comments
scala> sc
<console>:18: error: not found: value sc
sc
It says sc is not found.
Hi Troydon,
Can you please share a screenshot of the error so we can understand your issue better?
Does the take function display the contents of a variable, or is it used to display the contents of an RDD?
Why don't you try it out?
What exactly is sc.parallelize doing here with the array?
The sc.parallelize() method is SparkContext's method for creating a parallelized collection: it turns an existing local collection (here, the range arr) into an RDD whose elements are split into partitions. This allows Spark to distribute the data across multiple nodes instead of depending on a single node to process it.
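To picture the first step the answer above describes, here is a plain-Scala sketch (no Spark required) of cutting a local collection into contiguous slices, each of which would become one partition. The helper name `slice` and its even-split rule are assumptions for illustration only; Spark's own slicing code is internal and may differ in detail.

```scala
object SliceSketch {
  // Hypothetical helper: split `data` into `numSlices` contiguous chunks whose
  // sizes differ by at most one, the way a collection might be cut into
  // partitions before each chunk is handed to a different executor.
  def slice[A](data: IndexedSeq[A], numSlices: Int): Seq[IndexedSeq[A]] = {
    require(numSlices > 0, "numSlices must be positive")
    val n = data.length
    (0 until numSlices).map { i =>
      // Long arithmetic avoids overflow for very large collections.
      val start = ((i.toLong * n) / numSlices).toInt
      val end = (((i + 1).toLong * n) / numSlices).toInt
      data.slice(start, end)
    }
  }

  def main(args: Array[String]): Unit = {
    slice(1 to 10, 3).foreach(p => println(p.mkString(",")))
    // 1,2,3
    // 4,5,6
    // 7,8,9,10
  }
}
```

Computing boundaries from `i * n / numSlices` keeps the chunks balanced without any leftover pass at the end.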