# Apache Spark with Python - Reduce, Commutative & Associative

Now let's look at `reduce`. It aggregates the elements of a dataset using a function. `reduce` can be used as follows:

```
RDD.reduce(function)
```

Please note that the function must be commutative and associative; otherwise the results can be unpredictable and wrong. Also, the function's return type must be the same as the type of its arguments.
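To see why these properties matter, here is a small sketch in plain Python (using `functools.reduce` locally rather than Spark): addition satisfies both properties, so any grouping of the elements gives the same answer, while subtraction satisfies neither, so different groupings disagree.

```python
from functools import reduce

# Addition is commutative (x + y == y + x) and associative
# ((x + y) + z == x + (y + z)), so any grouping gives the same result.
nums = [1, 2, 3, 4]
assert reduce(lambda x, y: x + y, nums) == 10

# Subtraction is neither, so the grouping changes the answer:
left = (1 - 2) - 3   # -4
right = 1 - (2 - 3)  #  2
assert left != right
```

In Spark this matters because the elements of an RDD are reduced per partition first and the partial results are then combined, so the grouping and order of applications are not fixed.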

Let's see how `reduce` works.

INSTRUCTIONS
• First, let's create an array of `100` numbers and save it in an RDD named `seq`

```
<<your code goes here>> = sc.parallelize(range(1, 101))
```
• Now let's define a function named `sum` that takes `2` arguments, and returns the sum of those `2` numbers

```
def <<your code goes here>>(x, y):
    return x + y
```
• Now let's call this function using `reduce` to sum up the elements of the `seq` RDD, and then save it in `total`

```
total = seq.<<your code goes here>>(sum)
```
• Let's print `total` and see the results

```
print(total)
```
• This can be written more concisely using the following code

```
print(sc.parallelize(range(1, 101)).reduce(lambda x, y: x + y))
```
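Outside Spark, the same one-liner can be checked with Python's built-in `functools.reduce`; the sum of the integers 1 through 100 is 5050, which is what the Spark job should print as well.

```python
from functools import reduce

# Same reduction as the Spark one-liner, on a local range.
total = reduce(lambda x, y: x + y, range(1, 101))
print(total)  # 5050
```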
• Let's check if we can use `reduce` to calculate the average of a set of numbers, keeping in mind that averaging is commutative but not associative. First, let's define the set of numbers and store them in an RDD named `seq`

```
seq = sc.parallelize([3.0, 7, 13, 16, 19])
```
• Now let's define the function `avg` which takes `2` numbers as arguments and returns their average

```
def <<your code goes here>>(x, y):
    return (x + y) / 2
```
• Now let's use `reduce` to calculate the average of the elements of the `seq` RDD, and then save it in `total`

```
total = seq.<<your code goes here>>(avg)
```
• Now let's print `total` and see the result

```
print(total)
```

This result is incorrect, since the average of this sequence of numbers is `11.6`.
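The failure can be reproduced locally with `functools.reduce` (a sketch; Spark's actual result also depends on how the elements are partitioned, so it may differ from this left-to-right pairing). Pairwise averaging weights later elements more heavily, so the grouping changes the answer. The correct approach is to reduce with an associative operation (sum) and divide by the count afterwards.

```python
from functools import reduce

seq = [3.0, 7, 13, 16, 19]

def avg(x, y):
    return (x + y) / 2

# Pairwise averaging, left to right:
# avg(avg(avg(avg(3, 7), 13), 16), 19) = 15.75, not 11.6.
wrong = reduce(avg, seq)
print(wrong)    # 15.75 for this particular grouping

# Correct: sum (associative) first, then divide by the count.
correct = reduce(lambda x, y: x + y, seq) / len(seq)
print(correct)  # 11.6
```

In Spark, the equivalent correct version would be `seq.reduce(lambda x, y: x + y) / seq.count()`.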

No hints are available for this assessment

Answer is not available for this assessment

Note - Having trouble with the assessment engine? Follow the steps listed here