First, check the size of the file on HDFS:
hadoop fs -ls /data/msprojects/in_table.csv
How many 128 MB HDFS blocks does an 8,303,338,297-byte file span? A quick calculation:
$ python
>>> 8303338297.0/128.0/1024.0/1024.0
61.86469120532274
So the file occupies about 62 blocks (61.86 rounded up).
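The same arithmetic in Scala, rounding up to whole blocks (a minimal sketch; the byte count is taken from the ls output above):

// 128 MB block size in bytes = 128 * 1024 * 1024
val blockSize = 128L * 1024 * 1024
val fileSize = 8303338297L
val blocks = math.ceil(fileSize.toDouble / blockSize).toInt // 62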
hdfs fsck confirms the block count:
hdfs fsck /data/msprojects/in_table.csv
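If you are unsure what block size your cluster uses, you can query it directly (dfs.blocksize is the standard HDFS configuration key; the value is reported in bytes):

hdfs getconf -confKey dfs.blocksize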
Now launch spark-shell and load the file with sc.textFile:
spark-shell --packages net.sf.opencsv:opencsv:2.3 --master yarn
val myrdd = sc.textFile("/data/msprojects/in_table.csv")
myrdd.partitions.length
So, with sc.textFile, the number of partitions is determined by the number of HDFS blocks in the file: one partition per block.
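Note that sc.textFile also takes a minPartitions argument, so you can request a finer split than one-per-block (a sketch; 200 is an arbitrary example value):

// minPartitions is a lower bound: Spark may split blocks further,
// but it never merges partitions below the block count.
val finer = sc.textFile("/data/msprojects/in_table.csv", minPartitions = 200)
finer.partitions.length // >= 200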
sc.parallelize behaves differently:
val myrdd = sc.parallelize(1 to 100000)
myrdd.partitions.length
Why that number? Check the machine's core count:
[sandeep@ip-172-31-60-179 ~]$ cat /proc/cpuinfo | grep processor
processor : 0
processor : 1
processor : 2
processor : 3
Since my machine has 4 cores, Spark created 4 partitions: sc.parallelize splits the data into sc.defaultParallelism slices by default, which here equals the number of cores.
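You can also set the slice count explicitly instead of relying on the default (numSlices is the standard second parameter of sc.parallelize):

val explicit = sc.parallelize(1 to 100000, numSlices = 8) // request 8 partitions
explicit.partitions.length // 8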
Running the same code on YARN gives a different answer:
$ spark-shell --master yarn
scala> val myrdd = sc.parallelize(1 to 100000)
scala> myrdd.partitions.length
res6: Int = 2
On YARN, spark.default.parallelism defaults to the total number of cores on all executor nodes, with a minimum of 2; that minimum is what we see here.
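If you want a higher default on YARN, you can set it when launching the shell (a sketch; 16 is an arbitrary example value):

$ spark-shell --master yarn --conf spark.default.parallelism=16
scala> sc.defaultParallelism                          // 16
scala> sc.parallelize(1 to 100000).partitions.length  // 16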