In the last slide we saw that, since averaging is not associative, we could not use `reduce` directly to calculate the average of a set of numbers. So how do we calculate the average using `reduce` in that case? Let's see.
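
To see why averaging cannot be plugged into a reduction directly, here is a small plain-Python sketch that uses `functools.reduce` rather than Spark. The helper `pairwise_avg` and the three-way split of the data are assumptions made for illustration; the split only imitates how a partitioned reduce might group the elements.

```python
from functools import reduce

def pairwise_avg(x, y):
    # Average of two numbers; hypothetical helper for illustration only.
    return (x + y) / 2

values = [1.0, 2, 3, 4, 5, 6, 7]

# Reduce the whole list from left to right.
left_to_right = reduce(pairwise_avg, values)

# Reduce each assumed chunk separately, then combine the partial results.
chunks = [values[0:2], values[2:4], values[4:7]]
partials = [reduce(pairwise_avg, chunk) for chunk in chunks]
chunked = reduce(pairwise_avg, partials)

# The real average, for comparison.
true_avg = sum(values) / len(values)

print(left_to_right, chunked, true_avg)
```

The two reductions disagree with each other and with the real average, because the result of a pairwise average depends on how the elements are grouped.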

First, let's define a set of elements for which we will calculate the average, and store them in an RDD named `rdd`:

`<<your code goes here>> = sc.parallelize([1.0, 2, 3, 4, 5, 6, 7], 3)`

Now let's calculate the average by using `reduce` to compute the sum of the elements and `count` to get the number of elements. We then divide the sum by the count and store the result in a variable called `avg`:

`avg = rdd.<<your code goes here>>(lambda x, y: x + y) / rdd.count()`

The average given here is `4.0`, which is correct. However, this is not the best approach, since we are evaluating the RDD twice: once during `reduce` and once during `count`. So, let's move to the next approach.

With the next approach, we first translate every value into a composite value, so that each element of the RDD holds a value along with the number of elements that have been summed up to reach it. We transform each element into a tuple of the value and `1`, where the `1` represents how many numbers have been added to reach the value (initially just the element itself). We will use `map` for this, as shown below:

`rdd_count = rdd.<<your code goes here>>(lambda x: (x, 1))`

Next, we will define a function `add_tuples` that combines two such tuples by adding their sums and their counts, and returns the resulting tuple:

`def <<your code goes here>>(x, y): return (x[0] + y[0], x[1] + y[1])`
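
Unlike a pairwise average, adding `(sum, count)` tuples gives the same result however the elements are grouped or ordered, which is what a reduction needs. The sketch below checks this in plain Python with `functools.reduce`, not Spark; the three-way split into `partitions` is an assumption made for illustration.

```python
from functools import reduce

def add_tuples(x, y):
    # Combine two (sum, count) pairs element-wise.
    return (x[0] + y[0], x[1] + y[1])

# Each value paired with a count of 1, as produced by the map step.
pairs = [(v, 1) for v in [1.0, 2, 3, 4, 5, 6, 7]]

# Reduce everything in one left-to-right pass.
flat = reduce(add_tuples, pairs)

# Reduce each assumed partition separately, then merge the partial results.
partitions = [pairs[0:2], pairs[2:4], pairs[4:7]]
merged = reduce(add_tuples, [reduce(add_tuples, p) for p in partitions])

# Reduce the same pairs in reverse order.
reversed_result = reduce(add_tuples, reversed(pairs))

print(flat, merged, reversed_result)
```

All three results are the same tuple, so dividing its first element by its second yields the average regardless of how the work was split.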

Now, we will use `reduce` with this function to get the sum of the values and the number of values. We will store these in the variables `sum` and `count`:

`(sum, count) = rdd_count.<<your code goes here>>(add_tuples)`

Finally, we will calculate the average using these values:

`avg = sum / <<your code goes here>>`

This approach needs only a single pass over the RDD, so it typically takes significantly less time than the previous one.
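
The difference between the two approaches can be sketched in plain Python by counting how often the data is traversed. `CountingSource` is a hypothetical stand-in for an uncached dataset, not a Spark class. The sketch only counts passes and does not measure time, and actual Spark behaviour will also depend on factors such as caching.

```python
from functools import reduce

class CountingSource:
    """Hypothetical stand-in for an uncached dataset; counts traversals."""

    def __init__(self, values):
        self.values = values
        self.passes = 0

    def __iter__(self):
        # Every new iteration over the data counts as one pass.
        self.passes += 1
        return iter(self.values)

data = [1.0, 2, 3, 4, 5, 6, 7]

# Approach 1: one traversal for the sum, a second one for the count.
source_a = CountingSource(data)
total_a = reduce(lambda x, y: x + y, source_a)
count_a = sum(1 for _ in source_a)
avg_a = total_a / count_a

# Approach 2: a single traversal that carries (sum, count) pairs.
source_b = CountingSource(data)
total_b, count_b = reduce(
    lambda x, y: (x[0] + y[0], x[1] + y[1]),
    map(lambda v: (v, 1), source_b),
)
avg_b = total_b / count_b

print(avg_a, source_a.passes)
print(avg_b, source_b.passes)
```

Both approaches return the same average, but the first traverses the data twice and the second only once.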

