Apache Spark - Key Value RDD


Apache Spark - Key Value RDD - ReduceByKey





Code for word count

val lines = sc.textFile("/data/mr/wordcount/big.txt")
// Split each line into words.
val words = lines.flatMap(x => x.split(" "))
// Normalize case and pair each word with an initial count of 1.
val pairs = words.map(s => (s.toLowerCase, 1))
// Sum the counts for each word.
val counts = pairs.reduceByKey((a, b) => a + b)
counts.take(10)
// counts.saveAsTextFile("word-count-spark")
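reduceByKey merges the values for each key with an associative, commutative function, combining values locally on each partition before shuffling. The following is a local, single-machine sketch (no Spark required) of what that reduction computes on the (word, 1) pairs; the variable names and sample sentence are illustrative only.

```scala
// Local-collection sketch of reduceByKey's result:
// group the (word, 1) pairs by key, then sum each group's values.
val localWords = "the quick fox the lazy the".split(" ").toSeq
val localPairs = localWords.map(w => (w.toLowerCase, 1))
val localCounts = localPairs
  .groupBy { case (word, _) => word }
  .map { case (word, ps) => (word, ps.map(_._2).sum) }
// localCounts("the") == 3
```

Unlike this sketch, Spark never materializes the full group per key; reduceByKey folds values pairwise as they arrive, which is why it scales better than groupByKey followed by a sum.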

Code for calculating the maximum temperature per location

val txtRDD = sc.textFile("/data/spark/temps.csv")

// Each line is "temperature,location"; emit a (location, temperature) pair.
def cleanRecord(line: String) = {
    val arr = line.split(",")
    (arr(1).trim, arr(0).toInt)
}
val recordsRDD = txtRDD.map(cleanRecord)

// Associative, commutative combiner for reduceByKey.
def max(a: Int, b: Int) = if (b > a) b else a

val result = recordsRDD.reduceByKey(max)
result.collect()
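The same max-per-key reduction can be sketched on a plain Scala collection (no Spark required); the sample (city, temperature) records below are made up for illustration.

```scala
// Local sketch of reduceByKey(max): group records by city,
// then reduce each group's temperatures with the max function.
def maxOf(a: Int, b: Int) = if (b > a) b else a
val sample = Seq(("delhi", 41), ("pune", 33), ("delhi", 44), ("pune", 29))
val maxByCity = sample
  .groupBy { case (city, _) => city }
  .map { case (city, recs) => (city, recs.map(_._2).reduce(maxOf)) }
// maxByCity("delhi") == 44
```

Because max is associative and commutative, Spark can safely apply it within each partition first and then across partitions, producing the same answer as this single-pass local version.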
