DataFrames, Spark SQL, R


Spark SQL - Introduction

Spark - DataFrames & Spark SQL (Part 1)

Spark SQL is a module of Apache Spark for handling structured data. With Spark SQL, you can process structured data using a SQL-like interface. So, if your data can be represented in tabular format, or already lives in a structured data source such as a SQL database, you can use Spark SQL to process it.

Spark SQL provides an API called the DataFrames API, which makes it possible to mix SQL queries, R-like dataframe manipulation techniques, and the usual transformations and actions of an RDD. So, it is very well integrated.
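As a minimal Scala sketch of this integration (assuming a running `SparkSession` named `spark` and a sample `people.json` file with `name` and `age` columns — both hypothetical), the same data can be handled with DataFrame operations, plain SQL, and RDD-style calls:

```scala
import spark.implicits._  // enables the $"colName" syntax

val df = spark.read.json("people.json")

// R/pandas-style DataFrame manipulation
df.select($"name", $"age" + 1).show()

// The same data queried through SQL
df.createOrReplaceTempView("people")
spark.sql("SELECT name, age + 1 FROM people").show()

// A DataFrame is backed by an RDD, so the usual
// transformations and actions remain available
df.rdd.map(row => row.getAs[String]("name")).collect()
```

All three styles run through the same optimizer, so you can pick whichever reads best for a given step.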

Whether your data is in HDFS, Hive, or relational databases, and whether it is in Avro, Parquet, ORC, or JSON format, you can access and process it uniformly.

With Spark SQL, you can run your Hive queries without any modifications, and you can use your existing BI tools to query big data.
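A hedged sketch of how this is wired up (table and app names are hypothetical; it assumes Spark was built with Hive support and a `hive-site.xml` is on the classpath): running Hive queries usually only requires enabling Hive support when the session is built.

```scala
import org.apache.spark.sql.SparkSession

// enableHiveSupport() connects the session to the Hive metastore,
// so existing Hive tables and queries work unchanged
val spark = SparkSession.builder()
  .appName("HiveExample")
  .enableHiveSupport()
  .getOrCreate()

// An existing Hive query runs as-is through spark.sql
spark.sql("SELECT * FROM my_hive_table LIMIT 10").show()
```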

Moreover, you can even join data across different formats and different data sources.
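For illustration (file names and the `user_id` join key are hypothetical), a JSON file and a Parquet file can be loaded through the same reader interface and joined directly:

```scala
// Uniform access: the same API regardless of the underlying format
val users  = spark.read.json("users.json")        // JSON source
val orders = spark.read.parquet("orders.parquet") // Parquet source

// Join across the two formats as if they were ordinary tables
val joined = users.join(orders, "user_id")
joined.show()
```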

This is how Spark SQL provides uniform data access.



5 Comments

Jupyter username/password from the clipboard is not working.


Hi Santanu,

It may not work in some browsers. Could you please click on the "eye" icon to see the password, and then paste it manually?


In the DataFrame operation
df.select($"name", $"age" + 1).show()
why is the $ symbol used in the above command?
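A note on the question above: in Scala, `$` is a string interpolator brought into scope by `spark.implicits._`; `$"age"` constructs a `Column` object, which is what makes column arithmetic such as `$"age" + 1` possible. A minimal sketch (assuming a DataFrame `df` with `name` and `age` columns):

```scala
import spark.implicits._  // brings the $ interpolator into scope

// $"age" is equivalent to df("age") or col("age"): it builds a Column,
// so the expression $"age" + 1 is a column expression, not string math
df.select($"name", $"age" + 1).show()
df.select(df("name"), df("age") + 1).show()  // same result without $
```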


19 days have passed and still no one has come forward to answer this question.


Hi, Dinesh.

Can you please reply in the comments section below the lecture, or post your queries on our lab-support forum at https://discuss.cloudxlab.c... [1]?

All the best.

Links:
------
[1] https://discuss.cloudxlab.c...

--
Best,
Satyajit Das
