Apache Spark Basics

Is the following a good way to find the elements that start with "cloudxlab" in an RDD whose records are strings?

myrdd.collect().filter(_.startsWith("cloudxlab"))

  • Yes, it will be very efficient even if the data is big
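  • No. collect() brings every record of the RDD to the driver, and the filtering is then done there by the local Scala collection's filter function. If the data is large, collect() can exhaust the driver's memory. Filtering on the RDD first, with myrdd.filter(_.startsWith("cloudxlab")), runs the work in parallel on the executors and sends only the matches back.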
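The difference between the two approaches can be sketched as a small Scala program. This is a minimal illustration, assuming a Scala project with Spark's spark-core on the classpath and a local master for trying it on one machine. The object name FilterBeforeCollect, the helper startingWith, the sample strings and the take(10) limit are all hypothetical choices made for this example, not part of the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FilterBeforeCollect {
  // Hypothetical helper: keep only records that start with the prefix.
  // RDD.filter is a lazy transformation that runs on the executors.
  def startingWith(rdd: RDD[String], prefix: String): RDD[String] =
    rdd.filter(_.startsWith(prefix))

  def main(args: Array[String]): Unit = {
    // local[*] is only for experimenting on one machine; on a cluster the
    // master URL is usually supplied by spark-submit instead.
    val conf = new SparkConf().setAppName("FilterBeforeCollect").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample records standing in for myrdd.
      val myrdd = sc.parallelize(Seq(
        "cloudxlab spark", "apache hadoop", "cloudxlab hdfs", "scala basics"))

      // Anti-pattern from the question: collect() ships every record to
      // the driver, then Array.filter runs there in a single JVM.
      // myrdd.collect().filter(_.startsWith("cloudxlab"))

      // Filter on the cluster first; nothing is computed until an action runs.
      val matches = startingWith(myrdd, "cloudxlab")

      // Bring back only a bounded sample. Calling collect() on the filtered
      // RDD is reasonable only when the result is known to fit in driver memory.
      matches.take(10).foreach(println)
      println(s"total matches: ${matches.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

The choice of take(n) or count() over collect() is deliberate: both are actions that return a bounded amount of data to the driver, so the driver's memory use no longer grows with the size of the input.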