Apache Spark Basics

Apache Spark - Wordcount with spark-shell (scala spark shell)

In this exercise, we will learn how to perform a word count using Spark.

Step 1: Start the Spark shell using the following command and wait for the prompt to appear:

spark-shell

Step 2: Create an RDD from a file in HDFS. Type the following in spark-shell and press Enter:

val linesRDD = sc.textFile("/data/mr/wordcount/input/big.txt")
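Because sc.textFile is lazy, a wrong path only shows up once an action runs. A quick check might look like the sketch below. It assumes the same spark-shell session (where sc is predefined) and that the file exists at the path used in Step 2; the name checkRDD is only illustrative.

```scala
// Assumes a spark-shell session, where `sc` is predefined,
// and that the input file from Step 2 exists in HDFS.
val checkRDD = sc.textFile("/data/mr/wordcount/input/big.txt")

// take(n) is an action: it triggers the read and returns up to n lines
// to the driver as a local Array[String].
val firstLines = checkRDD.take(3)
firstLines.foreach(println)

// count() is also an action; it scans the whole file.
println(s"Number of lines: ${checkRDD.count()}")
```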

Step 3: Split each record (line) into words:

val wordsRDD = linesRDD.flatMap(_.split(" "))
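To see what flatMap does here, it can help to try it on a tiny in-memory sample. The sketch below assumes a spark-shell session (sc predefined); sampleLines is a hypothetical stand-in for linesRDD. It also shows that splitting on a single space produces an empty "word" wherever two spaces appear in a row, and one possible way to avoid that.

```scala
// Assumes a spark-shell session, where `sc` is predefined.
// `sampleLines` is a hypothetical stand-in for linesRDD.
// The second line deliberately contains a double space.
val sampleLines = sc.parallelize(Seq("to be or", "not  to be"))

// flatMap flattens the per-line arrays into a single RDD of words.
// Splitting on one space yields an empty string at the double space.
val bySpace = sampleLines.flatMap(_.split(" ")).collect()

// Splitting on runs of whitespace and dropping empty tokens avoids that.
val byWhitespace = sampleLines
  .flatMap(_.split("\\s+"))
  .filter(_.nonEmpty)
  .collect()

println(bySpace.mkString("[", ", ", "]"))
println(byWhitespace.mkString("[", ", ", "]"))
```

Whether empty tokens matter depends on the input; the exercise itself uses the single-space split.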

Step 4: Convert each word into a key-value pair of the form (word, 1):

val wordsKvRdd = wordsRDD.map((_, 1))

Step 5: Group by key and sum the counts for each key:

val wordCounts = wordsKvRdd.reduceByKey(_ + _)
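The pairing and aggregation steps can be tried on a small sample before running them on the full file. This is a minimal sketch assuming a spark-shell session (sc predefined); sampleWords, sampleCounts and countsMap are illustrative names, not part of the exercise.

```scala
// Assumes a spark-shell session, where `sc` is predefined.
// `sampleWords` is a hypothetical stand-in for wordsRDD.
val sampleWords = sc.parallelize(Seq("to", "be", "or", "not", "to", "be"))

// Pair each word with 1, then sum the 1s for each distinct word.
// The function passed to reduceByKey should be associative and commutative.
val sampleCounts = sampleWords.map((_, 1)).reduceByKey(_ + _)

// collectAsMap() brings the (small) result to the driver as a local Map.
val countsMap = sampleCounts.collectAsMap()

// Print the counts in alphabetical order of the word.
countsMap.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
```

Collecting to the driver is only reasonable for small results like this sample; for the full dataset, saving to HDFS is the safer choice.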

Step 6: Save the results to HDFS:

wordCounts.saveAsTextFile("my_spark_shell_wc_output")
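saveAsTextFile writes a directory of part files rather than a single file, and it fails if the output directory already exists. To look at the most frequent words without opening those files, one option is to sort by count and take the first few results. The sketch below assumes a spark-shell session (sc predefined) and uses a small hypothetical RDD, demoCounts, in place of wordCounts.

```scala
// Assumes a spark-shell session, where `sc` is predefined.
// `demoCounts` is a hypothetical stand-in for wordCounts.
val demoCounts = sc.parallelize(Seq(("to", 3), ("be", 2), ("or", 1), ("the", 5)))

// sortBy is a transformation; take(2) is the action that triggers the job.
// Sorting on the second tuple element (the count), highest first.
val top2 = demoCounts.sortBy(_._2, ascending = false).take(2)

top2.foreach { case (word, n) => println(s"$word\t$n") }
```

The same call on wordCounts, for example wordCounts.sortBy(_._2, ascending = false).take(10), should list the ten most frequent words in the input.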
