# Apache Spark with Python - Problem Solving - Compute Average

In the last slide we saw that because averaging is not associative (the average of partial averages is generally not the overall average), we cannot pass an averaging function to `reduce` directly to calculate the average of a set of numbers. So how can we calculate the average using `reduce`? Let's see.
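To see why a plain averaging function cannot be handed to `reduce`, here is a small plain-Python check (using `functools.reduce`, no Spark needed) with a hypothetical reducer `pairwise_avg`. It is commutative but not associative, so the result depends on how the values are grouped, and folding the seven numbers gives the wrong answer.

```python
from functools import reduce

data = [1.0, 2, 3, 4, 5, 6, 7]

def pairwise_avg(x, y):
    # Hypothetical naive reducer: average two values at a time
    return (x + y) / 2

naive = reduce(pairwise_avg, data)   # folds left to right
true_avg = sum(data) / len(data)     # 28.0 / 7

print(naive, true_avg)  # 6.015625 4.0

# Order of arguments does not matter (commutative)...
print(pairwise_avg(1, 2) == pairwise_avg(2, 1))  # True
# ...but grouping does (not associative): 2.25 vs 1.75
print(pairwise_avg(pairwise_avg(1, 2), 3), pairwise_avg(1, pairwise_avg(2, 3)))
```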

INSTRUCTIONS
• First, let's define the elements whose average we will calculate and store them in an RDD named `rdd`, split into 3 partitions (the second argument to `parallelize`)

```python
# sc is the SparkContext (predefined in the pyspark shell)
rdd = sc.parallelize([1.0, 2, 3, 4, 5, 6, 7], 3)
```
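To reason about what `reduce` does across the 3 partitions, it helps to model the RDD locally as a list of partitions. The sketch below is plain Python (no Spark needed), and `split_into_partitions` is a hypothetical helper that assumes contiguous, near-equal slices. In a real session, `rdd.glom().collect()` shows the actual layout; we do not claim Spark always splits data exactly this way.

```python
data = [1.0, 2, 3, 4, 5, 6, 7]
num_slices = 3

def split_into_partitions(values, n):
    """Hypothetical helper: cut values into n contiguous, near-equal slices."""
    length = len(values)
    return [values[i * length // n:(i + 1) * length // n] for i in range(n)]

partitions = split_into_partitions(data, num_slices)
print(partitions)  # [[1.0, 2], [3, 4], [5, 6, 7]]
```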
• Now let's calculate the average by using `reduce` to compute the sum of the elements and `count` to get the number of elements. We then divide the sum by the count to get the average and store the result in a variable called `avg` (a plain Python number, not an RDD)

```python
# reduce() and count() are two separate actions, so Spark evaluates rdd twice
avg = rdd.reduce(lambda x, y: x + y) / rdd.count()
```

The result is `4.0` (28 / 7), which is correct. However, this approach is inefficient: unless the RDD is cached, it is computed twice, once for `reduce` and once for `count`. So we will move on to the next approach.
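A minimal sketch of the double work, assuming an uncached RDD. Here `compute_rdd` is a hypothetical stand-in for Spark re-evaluating the RDD's lineage every time an action runs, and `passes` counts those evaluations. It is a local model in plain Python, not Spark's actual scheduler.

```python
from functools import reduce

data = [1.0, 2, 3, 4, 5, 6, 7]
passes = 0  # how many times the "RDD" has been (re)computed

def compute_rdd():
    """Hypothetical stand-in for evaluating an uncached RDD from its lineage."""
    global passes
    passes += 1
    return [float(x) for x in data]

total = reduce(lambda x, y: x + y, compute_rdd())  # action 1: reduce
n = len(compute_rdd())                             # action 2: count
avg = total / n

print(avg, passes)  # 4.0 2
```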

• With the next approach, we first transform each element into a composite value that carries both a partial sum and the number of elements that were added to reach it. Each element becomes a tuple of the value and `1`, because on its own it is a sum of exactly one number. We will use `map` for this as shown below

```python
# Each element x becomes the pair (x, 1): (partial sum, number of elements in it)
rdd_count = rdd.map(lambda x: (x, 1))
```
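As a local analogue of `rdd.map`, the same transformation can be sketched with Python's built-in `map` on a plain list, which makes it easy to check what the pairs look like. The name `pairs` is illustrative only.

```python
data = [1.0, 2, 3, 4, 5, 6, 7]

# Plain-Python counterpart of rdd.map(lambda x: (x, 1))
pairs = list(map(lambda x: (x, 1), data))

print(pairs[:3])  # [(1.0, 1), (2, 1), (3, 1)]
```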
• Next, we will define a function `add_tuples` that combines two such tuples: it adds their sums, adds their counts, and returns the resulting tuple

```python
def add_tuples(x, y):
    # x and y are (sum, count) pairs; add them component-wise
    return (x[0] + y[0], x[1] + y[1])
```
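Spark's `reduce` may combine elements in any order and grouping, so the function it receives should be associative and commutative. A quick plain-Python check on a few sample pairs (a sketch, not a proof) that `add_tuples` behaves that way:

```python
import itertools
from functools import reduce

def add_tuples(x, y):
    # Component-wise sum of two (sum, count) pairs
    return (x[0] + y[0], x[1] + y[1])

a, b, c = (1.0, 1), (2, 1), (3, 1)

# Associative: grouping does not change the result
print(add_tuples(add_tuples(a, b), c) == add_tuples(a, add_tuples(b, c)))  # True

# Commutative as well: every ordering of the three pairs folds to the same tuple
results = {reduce(add_tuples, p) for p in itertools.permutations([a, b, c])}
print(results)  # {(6.0, 3)}
```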
• Now we use `reduce` with this function to obtain the total sum of the values and the total count. We store these in the variables `sum` and `count`

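```python
# One action: reduce folds all (value, 1) pairs into a single (sum, count) pair
# (note: the name sum shadows Python's built-in sum() from here on)
(sum, count) = rdd_count.reduce(add_tuples)
```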
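In PySpark, `reduce` works roughly in two stages: it reduces the elements inside each partition, then combines the per-partition results on the driver. The sketch below mimics that locally in plain Python. The partition layout is an assumption (inspect the real one with `rdd.glom().collect()`), and it uses the names `total` and `count` to avoid shadowing the built-in `sum`.

```python
from functools import reduce

def add_tuples(x, y):
    # Component-wise sum of two (sum, count) pairs
    return (x[0] + y[0], x[1] + y[1])

# Assumed layout of the 3 partitions
partitions = [[1.0, 2], [3, 4], [5, 6, 7]]

# Step 1: map each element to (value, 1), partition by partition
mapped = [[(x, 1) for x in part] for part in partitions]
# Step 2: reduce inside each partition -> one (sum, count) per partition
partials = [reduce(add_tuples, part) for part in mapped]
# Step 3: merge the per-partition results into a single (sum, count)
total, count = reduce(add_tuples, partials)

print(partials)      # [(3.0, 2), (7, 2), (18, 3)]
print(total, count)  # 28.0 7
```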
• Finally, we will calculate the average using these values

```python
avg = sum / count  # 28.0 / 7 = 4.0
```
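Putting the steps together, a hypothetical helper `average` can wrap the tuple approach for a plain Python list. One design point carries over from Spark: `RDD.reduce` raises a `ValueError` on an empty RDD, so this sketch also raises `ValueError` for empty input, rather than letting `functools.reduce` fail with a `TypeError`.

```python
from functools import reduce

def add_tuples(x, y):
    # Component-wise sum of two (sum, count) pairs
    return (x[0] + y[0], x[1] + y[1])

def average(values):
    """Hypothetical helper: average via a single fold over (value, 1) pairs."""
    pairs = [(x, 1) for x in values]
    if not pairs:
        # Mirror Spark, whose reduce() refuses to reduce an empty RDD
        raise ValueError("cannot average an empty collection")
    total, count = reduce(add_tuples, pairs)
    return total / count

print(average([1.0, 2, 3, 4, 5, 6, 7]))  # 4.0
```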

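This approach needs only one action (`reduce`), so the RDD is computed once instead of twice. That can make it noticeably faster than the previous approach, especially when the RDD is expensive to compute and is not cached.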
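A hypothetical pass counter, modelling an uncached RDD whose lineage is re-evaluated by every action, shows the tuple approach evaluating the data only once. As before, `compute_rdd` and `passes` are illustrative names in a plain-Python model, not a measurement of Spark itself.

```python
from functools import reduce

data = [1.0, 2, 3, 4, 5, 6, 7]
passes = 0  # how many times the "RDD" has been (re)computed

def compute_rdd():
    """Hypothetical stand-in for evaluating an uncached RDD from its lineage."""
    global passes
    passes += 1
    return [float(x) for x in data]

# A single action: map to (value, 1) and fold with component-wise addition
total, count = reduce(lambda x, y: (x[0] + y[0], x[1] + y[1]),
                      [(x, 1) for x in compute_rdd()])

print(total / count, passes)  # 4.0 1
```

Calling `rdd.cache()` before running several actions is another way to avoid recomputation; the tuple approach avoids the second pass without having to keep the data in memory.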

