DataFrames, Spark SQL, R


Spark SQL - Using Hive tables

To make Spark read and write Hive tables, you need to configure it. Create a conf folder in your home directory.

Then place your Hadoop configuration files hive-site.xml, core-site.xml, and hdfs-site.xml in the conf/ directory.

These configurations are not required on CloudxLab, as it is already configured.

Start the Spark shell using the command shown on screen. You first need to import everything from spark.implicits._.

If you have a table in Hive named a_student, you can simply load it using the spark.sql method with a query like select * from a_student.

Here it has created a DataFrame named df. Now we can see the data inside using the show() method.
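The steps above can be sketched as follows in spark-shell, where the `spark` session object is predefined; the table name `a_student` is taken from the lecture (this assumes a Hive environment with that table, so it is a sketch rather than a standalone program):

```scala
// In spark-shell, the SparkSession is already available as `spark`.
import spark.implicits._

// Load the Hive table a_student into a DataFrame.
val df = spark.sql("select * from a_student")

// Inspect the contents.
df.show()
```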

Did you notice that this SQL query method is the same as the one used earlier for querying a temporary view? It is entirely possible to join a temporary view with a Hive table using SQL queries. So you can register any DataFrame as a view and join its data with a Hive table.
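As a sketch of such a join (assuming a Hive table a_student and a hypothetical DataFrame `grades` with columns `student_id` and `grade` — these names are illustrative, not from the lecture):

```scala
// Register an existing DataFrame as a temporary view so SQL can see it.
grades.createOrReplaceTempView("grades_view")

// Join the temporary view with the Hive table in one SQL query.
val joined = spark.sql("""
  select s.*, g.grade
  from a_student s
  join grades_view g on s.id = g.student_id
""")
joined.show()
```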

If you wish to read data from Hive using spark-submit, you need to create the spark object yourself using the SparkSession.builder method, specifying the required configuration settings on this object.
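A minimal sketch of such a spark-submit application might look like the following; the app name is arbitrary, and enableHiveSupport() tells Spark to use the Hive metastore discovered via hive-site.xml:

```scala
import org.apache.spark.sql.SparkSession

object HiveExample {
  def main(args: Array[String]): Unit = {
    // Unlike spark-shell, a spark-submit application must build
    // its own session. enableHiveSupport() connects it to Hive.
    val spark = SparkSession.builder()
      .appName("HiveExample")
      .enableHiveSupport()
      .getOrCreate()

    val df = spark.sql("select * from a_student")
    df.show()

    spark.stop()
  }
}
```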

Using Spark SQL, you can read data from various databases such as MySQL, Oracle, or Microsoft SQL Server via JDBC connectivity. JDBC stands for Java Database Connectivity; using JDBC, you can read almost any database from Java code. With the same JDBC drivers, Spark can read data from your traditional database into a DataFrame, which can then be combined with other data.
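A hedged sketch of a JDBC read; the host, database, table, and credentials below are placeholders you would substitute with your own, and the appropriate JDBC driver jar (for example, the MySQL connector) must be on the classpath, e.g. passed via --jars:

```scala
// Read a table from a relational database over JDBC.
// All connection details here are hypothetical placeholders.
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/mydb")
  .option("dbtable", "employees")
  .option("user", "myuser")
  .option("password", "mypassword")
  .load()

// The result is an ordinary DataFrame that can be joined
// with Hive tables or temporary views.
jdbcDF.show()
```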

Spark - Dataframes & Spark SQL (Part2)



22 Comments

Getting following error:

Table or view not found :

[screenshot]


Hi,

As the error suggests, please make sure that the table (or view) named a_student exists before selecting any records from it.

Thanks.


How do I launch Spark? I am getting this error:

[screenshot]


Hi,

You may use the adjacent Jupyter notebook for this exercise.

Nevertheless, please use 

spark-shell

to launch spark-shell in web console.

Thanks.


Can you please explain what properties we need to change in hive-site.xml for Hive-Spark integration?


Please explain the internal data flow between Hive and Spark.


Hi, Bharat. 

As mentioned in the lecture, you have to create the SparkSession builder object and provide the required configuration settings for connecting to various databases.

import org.apache.spark.sql.SparkSession 

It is like an entry point to all databases.

Kindly refer to the following for more details about the fields and configurations you need to provide:

https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/SparkSession.html

https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-SparkSession.html

All the best!


Hi

Can you please check why I am getting an error? I have a table named "emp_details" and can access this table.


https://cloudxlab.com/assessment/displayslide/615/spark-sql-using-hive-tables?course_id=73&playlist_id=338


Thanks


What error are you getting?


In Spark SQL we use "val" while creating a DataFrame, whereas with Hive tables, why are we using "var" while creating a DataFrame?


Hello Team,
var df = spark.sql("select * from a_student")
gives below error
org.apache.spark.sql.AnalysisException: Table or view not found: a_student; line 1 pos 14.
I did run the above command. Why does it give an error?
Please explain with a solution.


Hi, Dinesh.

The error is self-explanatory; can you check whether the table exists in your database?
All the best!


I am trying to follow the example but when I type import spark.implicits._ in the spark shell, I get 25: error: not found: value spark


Strangely, it worked in spark-shell but it didn't work.

This works:
val spark2 = spark
import spark2.implicits._


See my comment below in response to Thiago.


Hi Harry,
I could run this command 'import spark.implicits._' successfully from spark-shell, without any error. Request you to try again.
Thanks


Hello,

But then, how does Spark know that it's a Hive table?
Only from the configuration files?


Yes, Hive is discovered using the configuration files hive-site.xml, core-site.xml, and hdfs-site.xml in the conf/ directory.

If hive support is enabled, the tables are created and searched in Hive.


How will Spark resolve the conflict in the following scenario? From which source will Spark retrieve data?

A temporary view is created, and a Hive table also has an identical name.


Spark will give preference to the registered view or registered temp table over the Hive views or tables.

If there is a table X already existing in Hive and you register another table with the same name in Spark, then when you access X in Spark it will give the data of the registered DataFrame, not the Hive table. Please note that this is temporary and is only valid in the current session.

Outside this session, if you try to access X, it will give the Hive data, not Spark's.
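This shadowing behavior can be sketched in spark-shell as follows (assuming a Hive table named X already exists; the DataFrame columns here are made up for illustration):

```scala
import spark.implicits._

// Register a temp view with the same name as the Hive table X.
// For this session, the temp view shadows the Hive table.
val localDF = Seq((1, "temp-data")).toDF("id", "value")
localDF.createOrReplaceTempView("X")

// Returns the temp view's data, not the Hive table's.
spark.sql("select * from X").show()

// Dropping the temp view makes the Hive table visible again.
spark.catalog.dropTempView("X")
spark.sql("select * from X").show()
```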
