In the last slide we saw that, since averaging is not associative, we could not use reduce directly to calculate the average of a set of numbers. So how do we calculate the average using reduce in that case? Let's see.
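The problem can be seen without Spark. The sketch below is plain Python, not PySpark: it uses functools.reduce as a stand-in for the RDD reduce, and pair_avg is a hypothetical helper introduced only for illustration. It shows that the result of pairwise averaging depends on how the elements are grouped, so folding it over the data does not give the true average.

```python
from functools import reduce

numbers = [1.0, 2, 3, 4, 5, 6, 7]

def pair_avg(x, y):
    # Hypothetical helper: the average of just two values.
    return (x + y) / 2

# The same three values, grouped in two different ways.
left_grouped = pair_avg(pair_avg(1.0, 2), 3)
right_grouped = pair_avg(1.0, pair_avg(2, 3))

# Folding pair_avg over the whole list, left to right.
naive = reduce(pair_avg, numbers)

# The true average, for comparison.
true_avg = sum(numbers) / len(numbers)

print(left_grouped, right_grouped)  # the two groupings disagree
print(naive, true_avg)              # the folded result is not the true average
```

Since Spark may combine partial results from different partitions in any grouping, an operation like this is not safe to pass to reduce.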
First, let's define a set of elements for which we will be calculating the average, and store them in an RDD named rdd
<<your code goes here>> = sc.parallelize([1.0, 2, 3, 4, 5, 6, 7], 3)
Now let's calculate the average by using reduce to compute the sum of the elements, and count to get the number of elements. We then divide the sum by the count and store the result in a variable called avg
avg = rdd.<<your code goes here>>(lambda x, y: x + y) / rdd.count()
This gives an average of 4.0, which is correct. However, this is not an efficient approach, since the RDD is computed twice: once during reduce and once during count. So, we will move to the next approach.
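To make the cost of the sum-then-count approach concrete, here is a minimal sketch in plain Python rather than PySpark. CountingData is a hypothetical stand-in for an RDD that simply records how many times its values are traversed, and functools.reduce stands in for the RDD reduce.

```python
from functools import reduce

class CountingData:
    """Hypothetical stand-in for an RDD; counts how often it is traversed."""

    def __init__(self, values):
        self.values = list(values)
        self.passes = 0

    def __iter__(self):
        # Every new traversal of the data goes through here.
        self.passes += 1
        return iter(self.values)

data = CountingData([1.0, 2, 3, 4, 5, 6, 7])

total = reduce(lambda x, y: x + y, data)  # first traversal: the sum
n = sum(1 for _ in data)                  # second traversal: the count
avg_two_pass = total / n

print(avg_two_pass, data.passes)
```

The answer is right, but the data is walked twice. In Spark, unless the RDD is cached, each action can recompute it from its source, which is why this matters more there than in a local list.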
In the next approach, we first transform every value into a composite value, so that each element of the RDD holds a value together with the number of elements that have been summed up to reach it. Each element becomes a tuple of the value and 1, where 1 is the initial count, since each value starts out as the sum of exactly one number. We will use map for this, as shown below
rdd_count = rdd.<<your code goes here>>(lambda x: (x, 1))
Next, we will define a function add_tuples that takes two such tuples and returns a new tuple holding the sum of their values and the sum of their counts
def <<your code goes here>>(x, y):
    return (x[0] + y[0], x[1] + y[1])
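As a quick check of what such a function does, the sketch below defines add_tuples in plain Python and applies it to a few hand-written (value, count) tuples. The sample tuples are chosen only for illustration. Unlike pairwise averaging, the result does not depend on how the tuples are grouped.

```python
def add_tuples(x, y):
    # Each tuple is (running sum, number of values in that sum).
    return (x[0] + y[0], x[1] + y[1])

# Combine two single-value tuples, then fold in a third.
a = add_tuples((1.0, 1), (2, 1))
b = add_tuples(a, (3, 1))

# The same three tuples, grouped the other way round.
c = add_tuples((1.0, 1), add_tuples((2, 1), (3, 1)))

print(a, b, c)
```

Both groupings end at the same (sum, count) pair, which is the property reduce relies on.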
Now, we will use reduce with this function to get the sum of the values and their count. We will store the results in the variables sum and count
(sum, count) = rdd_count.<<your code goes here>>(add_tuples)
Finally, we will calculate the average using these values
avg = sum / <<your code goes here>>
This approach traverses the RDD only once instead of twice, so it generally takes less time than the previous one.
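The whole single-pass idea can be sketched in plain Python, without Spark. The three-way split below is a hypothetical partitioning chosen for illustration (Spark decides the real split), and functools.reduce stands in for the RDD reduce. Each partition is reduced to one (sum, count) tuple, and the partial tuples are then combined the same way.

```python
from functools import reduce

def add_tuples(x, y):
    # Each tuple is (running sum, number of values in that sum).
    return (x[0] + y[0], x[1] + y[1])

values = [1.0, 2, 3, 4, 5, 6, 7]

# Hypothetical split into 3 partitions; the actual split is up to Spark.
partitions = [values[0:2], values[2:4], values[4:7]]

# Map every value to (value, 1) and reduce within each partition.
partials = [reduce(add_tuples, [(v, 1) for v in part]) for part in partitions]

# Combine the per-partition results into one (sum, count) tuple.
total, n = reduce(add_tuples, partials)
avg_one_pass = total / n

print(partials, avg_one_pass)
```

Every value is visited once, and the final division happens only after all partial sums and counts have been merged.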