Enrollments closing soon for Post Graduate Certificate Program in Applied Data Science & AI By IIT Roorkee | 3 Seats Left
Apply NowLogin using Social Account
     Continue with GoogleLogin using your credentials
To create your custom partitioner you would need to define a function that takes an element and returns a number which is a hashcode.
def TwoPartsPartitioner(el):
if (el[0].upper() > 'J'):
return 1
else:
return 0
Now, let us use this partitioner.
First, we create an RDD x having key-value pairs by using parallelize over an Array. We are passing 3 as the second argument to parallelize. This creates an RDD x with three partitions.
x = sc.parallelize([("sandeep",1),("giri",1),("abhishek",1),("sravani",1),("jude",1)], 3)
Let us convert the entire RDD into array of arrays such that each partition becomes an array element. We can use collect() to get and print the whole array of arrays. This helps us understand the partitions very clearly.
x.glom().collect()
Next, we run the partitionBy
method on our RDD with partitioner
function we just created. We assign the result to the new RDD y.
y = x.partitionBy(2, partitionFunc=TwoPartsPartitioner)
Let’s check using glom how is the structure of our new RDD y.
y.glom().collect()
It should display something like this:
[[('giri', 1), ('abhishek', 1), ('jude', 1)], [('sandeep', 1), ('sravani', 1)]]
You can see that this new RDD has two partitions. The first partition contains all the keys starting with the alphabet a to j. And the second partition has the keys whose first character is after j. You can see that we were able to successfully use our newly created partitioner.
*Customer Partitioner - Scala Video *
Taking you to the next exercise in seconds...
Want to create exercises like this yourself? Click here.
No hints are availble for this assesment
Answer is not availble for this assesment
Loading comments...