DataFrames, Spark SQL, R

Spark SQL - SQL Queries On Dataframes

If you are more comfortable with SQL, you can process a dataframe with SQL queries as follows.

First, create a temporary view of the dataframe by calling the createOrReplaceTempView method on it, passing the name of the view as the argument. Here we register the df dataframe as a view named people.

Afterward, you can call the sql method on the Spark session object with any SQL query you want. The dataframe is available to your queries as a table under the view name. Here we created the temporary view people from df and then ran select * from people as the SQL query.

The sql method returns another dataframe, on which you can call the usual dataframe methods.

To see the result, call the show() method on the returned dataframe. The output confirms that the SQL query ran successfully.

To recap: df was loaded from a JSON file and registered as the view people. We then created another dataframe, sqlDF, from a SQL query and displayed it using the show() method.

We can also register another dataframe as a second view and join it with the people view using SQL.