Apache Spark Basics


What is not true about map transformations?




No hints are available for this assessment

Answer is not available for this assessment


13 Comments

This option should be true:

The number of elements in resulting RDD will always be same as original

As shown by:

var evens = nums.filter(isEven)

 1  Upvote    Share

Hi,

In the question, it is asked which statement is not true.

After a transformation, the resulting RDD can differ from its parent RDD in data type.

It can be smaller (e.g. filter(), distinct(), sample()), bigger (e.g. flatMap(), union(), cartesian()) or the same size (e.g. map()).

So, "The data type of elements in resulting RDD will always be same as original" is the false one.
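
For instance, a quick sketch (assuming a SparkContext named sc) illustrating each case:

val nums = sc.parallelize(1 to 10)   // 10 elements, RDD[Int]
nums.filter(_ % 2 == 0)              // smaller: 5 elements
nums.flatMap(x => Seq(x, x))         // bigger: 20 elements
nums.map(_ + 1)                      // same size: 10 elements
nums.map(_.toString)                 // same size, but RDD[Int] becomes RDD[String]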

 

All the best!

  Upvote    Share

Hi,

1. The resulting RDD is created from the same elements as the original RDD. Hence the data type in the original and resulting RDD should be the same, right? Hence this statement looks to be true. (last option in the above question)

2. The number of elements in the resulting RDD can be fewer than in the original RDD. So, this statement looks false. (first option in the above question)

So, shouldn't the answer to the above question be the first option? Please clarify if I am missing anything.

  Upvote    Share

> The resulting RDD is created from the same elements as the original RDD. 

Yes, but the result is a function of what you are doing in the transformation.

Check this:

var src_rdd = sc.parallelize(1 to 10)

// src_rdd is an RDD of integers

var result_rdd = src_rdd.map(x => ":" + x.toString + ":")

// result_rdd is an RDD of strings

 1  Upvote    Share

I agree; however, this statement is also false:

"The number of elements in resulting RDD will always be same as original"

sc.parallelize(1 to 10).filter(_ % 5 == 0)

...so shouldn't it be an accepted answer as well?

  Upvote    Share

> The number of elements in resulting RDD will always be same as original

This is true for map.
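
A minimal sketch (assuming a SparkContext sc) showing that map keeps the element count unchanged:

val nums = sc.parallelize(1 to 10)
nums.count                    // 10
nums.map(x => x * x).count    // still 10: map emits exactly one output element per input element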

  Upvote    Share


I think I have not come across anything so far on map transformations. Please can you give more examples of map transformations? It's required for deep learning.

  Upvote    Share
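
A few common map patterns you can try on the lab (a rough sketch, assuming a SparkContext sc is available):

val words = sc.parallelize(Seq("spark", "map", "rdd"))
words.map(_.toUpperCase)                   // RDD[String] => RDD[String], e.g. "SPARK"
words.map(w => (w, w.length))              // RDD[String] => RDD[(String, Int)], key-value pairs
sc.parallelize(1 to 5).map(x => x * 2.0)   // RDD[Int] => RDD[Double]

In every case the number of elements stays the same; only the value (and possibly the type) of each element changes.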


Please provide some examples to support the answer.

  Upvote    Share

Can you please give an example where the data type of the elements in the resulting RDD differs from the original after a map transformation?

 1  Upvote    Share

Hello,

As per the first option, "The number of elements in resulting RDD will always be same as original", here it is different. Can you please verify?

scala> val stringRdd = sc.parallelize(Array("one","two","three","four","five"))
stringRdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[20] at parallelize at <console>:24

scala> stringRdd.count
res9: Long = 5

scala> val unionRdd = stringRdd.union(stringRdd)
unionRdd: org.apache.spark.rdd.RDD[String] = UnionRDD[21] at union at <console>:25

scala> unionRdd.count
res11: Long = 10

  Upvote    Share

In the question we are talking about map(), but in your example you have used union().
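
For comparison, a rough sketch of the same check done with map() (assuming the stringRdd defined above):

val mappedRdd = stringRdd.map(_.toUpperCase)
mappedRdd.count   // 5, same as stringRdd.count: map never changes the number of elements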

  Upvote    Share