Apache Spark Basics with Python

Apache Spark with Python - Transformations - map & filter

There are many operations that can be performed on an RDD. These operations can be classified into 2 categories:

  1. Transformation
  2. Action

A transformation is an operation that produces a new RDD from an existing one without modifying the original. Many transformations are available; here we will learn about map and filter.

map is a transformation that applies the provided function to each element of an RDD and creates a new RDD from the values that function returns.

filter is a transformation that keeps only those elements of an existing RDD for which the provided function returns True, and creates a new RDD out of those elements.

Both have a similar syntactic structure, as shown below:

rdd.map(function)
rdd.filter(function)
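
As a rough local analogy, Python's built-in map and filter behave in a similar way on an ordinary list. The sketch below is plain Python, not Spark: the list nums and the lambda functions are illustrative assumptions, and nothing here runs on a cluster. It only shows that map yields one result per element, filter keeps the elements for which the function returns True, and the source collection stays unchanged.

```python
# Plain-Python analogy only: built-in map/filter on a list, not RDD methods.
nums = [1, 2, 3, 4, 5]

# map: apply the function to every element, one output per input
doubled = list(map(lambda x: x * 2, nums))

# filter: keep only the elements for which the function returns True
evens = list(filter(lambda x: x % 2 == 0, nums))

print(doubled)  # [2, 4, 6, 8, 10]
print(evens)    # [2, 4]
print(nums)     # [1, 2, 3, 4, 5], the source list is not modified
```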

Now let's see them in action.

INSTRUCTIONS
  • First, let's see a map transformation. Define a sequence of the 10000 numbers from 1 to 10000 and store it in a variable named arr. The end value of range is exclusive, so it has to be 10001

    <<your code goes here>> = range(1, 10001)
    
  • Next, convert that array into an RDD named nums

    nums = sc.<<your code goes here>>(arr)
    
  • Now let's define a function multiplyByTwo that takes an element, multiplies it by 2 and returns the result

    def <<your code goes here>>(x):
        return x*2
    
  • Let's check the output of this function by passing the number 5 to it

    multiplyByTwo(<<your code goes here>>)
    
  • Now, let's use map on the RDD nums using this function and store the result in a new RDD named dbls

    <<your code goes here>> = nums.map(multiplyByTwo)
    
  • Let's take a look at the first 5 elements of the resulting RDD

    dbls.take(5)
    
  • Great! Now let's take a look at filter. We will use the nums RDD we created earlier. First, let's define a function isEven that takes an element and returns True if it is even and False otherwise

    def <<your code goes here>>(x):
        return x%2 == 0
    
  • Let's check this function with the number 41 as an input

    isEven(41)
    
  • Now let's use filter on the nums RDD using this function and save the result in a new RDD named evens

    <<your code goes here>> = nums.filter(isEven)
    
  • Let's take a look at the first 3 elements of the resulting RDD

    evens.take(3)
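
The steps above can be pulled together into one small sketch. It assumes that sc is an already running SparkContext, as it is in the PySpark shell used for this exercise. The snake_case names multiply_by_two and is_even and the helper first_results are hypothetical stand-ins for illustration, not part of the exercise.

```python
def multiply_by_two(x):
    """Function for map: returns a new value for each element."""
    return x * 2


def is_even(x):
    """Predicate for filter: True keeps the element, False drops it."""
    return x % 2 == 0


def first_results(sc):
    """Hypothetical helper; sc is assumed to be an existing SparkContext."""
    nums = sc.parallelize(range(1, 10001))  # the numbers 1 to 10000
    dbls = nums.map(multiply_by_two)        # new RDD, nums is left unchanged
    evens = nums.filter(is_even)            # new RDD holding only even numbers
    return dbls.take(5), evens.take(3)


# The two functions can be checked locally, without Spark:
print(multiply_by_two(5))  # 10
print(is_even(41))         # False
```

In a shell where sc is defined, first_results(sc) should return ([2, 4, 6, 8, 10], [2, 4, 6]).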
    