hadoop fs -ls /data/msprojects/in_table.csv
$ python
>>> 8303338297.0/128.0/1024.0/1024.0
61.86469120532274
hdfs fsck /data/msprojects/in_table.csv
spark-shell --packages net.sf.opencsv:opencsv:2.3 --master yarn
var myrdd = sc.textFile("/data/msprojects/in_table.csv")
myrdd.partitions.length
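So, for sc.textFile, the number of partitions is a function of the number of data blocks in the file: by default there is roughly one partition per HDFS block. With 128 MB blocks, this file spans 61.86 blocks, which rounds up to 62 because the last block is only partly filled.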
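The relationship above can be sketched in plain Python. This is a rough estimate only, assuming a 128 MB HDFS block size (the divisor used in the calculation earlier). `estimated_partitions` is a hypothetical helper, not a Spark or Hadoop API, and it ignores details such as a `minPartitions` argument or how Hadoop treats a very small trailing split.

```python
import math

BLOCK_SIZE_BYTES = 128 * 1024 * 1024  # assumed HDFS block size (128 MB)


def estimated_partitions(file_size_bytes, block_size_bytes=BLOCK_SIZE_BYTES):
    """Estimate sc.textFile partitions as one per HDFS block (hypothetical helper)."""
    # A partly filled last block still counts as a whole block.
    return math.ceil(file_size_bytes / block_size_bytes)


# File size in bytes, taken from the calculation above.
print(estimated_partitions(8303338297))  # prints 62
```

The actual value should still be confirmed with myrdd.partitions.length, since the block size configured on a given cluster may differ from the 128 MB assumed here.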
var myrdd = sc.parallelize(1 to 100000)
myrdd.partitions.length
[sandeep@ip-172-31-60-179 ~]$ cat /proc/cpuinfo|grep processor
processor : 0
processor : 1
processor : 2
processor : 3
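Since my machine has 4 cores, sc.parallelize has created 4 partitions. When no partition count is given and Spark runs with a local master, the default is the number of cores available to it.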
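To make the 4 partitions concrete, here is a minimal Python sketch of how a collection of 100000 elements could be divided into evenly sized slices. It follows, as far as I understand it, the start/end arithmetic Spark applies to parallelized collections, but `slice_bounds` is a hypothetical helper written for illustration, not a Spark API.

```python
def slice_bounds(length, num_slices):
    """Return (start, end) index pairs, end exclusive, one pair per slice."""
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    return [
        ((i * length) // num_slices, ((i + 1) * length) // num_slices)
        for i in range(num_slices)
    ]


# 1 to 100000 holds 100000 elements; 4 slices assumed, matching the 4 cores above.
for start, end in slice_bounds(100000, 4):
    print(start, end, end - start)
```

With these inputs each slice holds 25000 elements. When the length does not divide evenly, the integer division spreads the remainder so that slice sizes differ by at most one.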
$ spark-shell --master yarn
scala> var myrdd = sc.parallelize(1 to 100000)
scala> myrdd.partitions.length
res6: Int = 2
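The result of 2 here, versus 4 earlier, is consistent with how Spark picks a default when sc.parallelize is called without a partition count. The sketch below models that rule in Python as I understand it: a local master defaults to the number of cores, while a cluster manager such as YARN defaults to the larger of 2 and the total executor cores registered so far, unless `spark.default.parallelism` is set. `default_parallelism` and its parameters are hypothetical names for illustration, and the executor core count of 1 in the usage is an assumed value, not something measured on this cluster.

```python
def default_parallelism(master, total_cores, configured=None):
    """Model the default partition count for sc.parallelize (hypothetical helper)."""
    # An explicit spark.default.parallelism setting takes precedence.
    if configured is not None:
        return configured
    # Local masters ("local", "local[4]", "local[*]") use the core count as is.
    if master.startswith("local"):
        return total_cores
    # Cluster managers such as YARN never go below 2.
    return max(total_cores, 2)


print(default_parallelism("local[*]", 4))  # prints 4
print(default_parallelism("yarn", 1))      # prints 2 (1 executor core is assumed)
```

To avoid depending on the default at all, the partition count can be passed as the second argument, for example sc.parallelize(1 to 100000, 8).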