To make Spark read and write Hive tables, you need to configure Spark. Create a conf folder in your home directory and place your Hadoop configuration files hive-site.xml, core-site.xml, and hdfs-site.xml in that conf/ directory.
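As a minimal sketch, the setup might look like this (the source paths are illustrative; on your cluster the Hadoop and Hive config files may live elsewhere):

```shell
# Create the conf directory Spark will look in (paths are illustrative)
mkdir -p ~/conf
cp /etc/hive/conf/hive-site.xml ~/conf/
cp /etc/hadoop/conf/core-site.xml ~/conf/
cp /etc/hadoop/conf/hdfs-site.xml ~/conf/
```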
These configurations are not required on CloudxLab, as the environment is already set up to run.
Start the Spark shell using the command shown on screen. First, you need to import all classes from spark.implicits using import spark.implicits._.
If you have a table in Hive named a_student, you can load it using the spark.sql method with a query such as select * from a_student. This creates a DataFrame named df, and we can see the data inside using the show() method.
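The steps above can be sketched as follows in spark-shell (this assumes the Hive table a_student exists and Hive support is enabled; the `df` name follows the lecture):

```scala
// Load a Hive table into a DataFrame with a SQL query
var df = spark.sql("select * from a_student")

// Inspect the result
df.show()        // prints the first rows in tabular form
df.printSchema() // shows columns and types picked up from the Hive metastore
```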
Did you notice that this SQL query method is the same one used earlier for querying temporary views? It is entirely possible to join a temporary view with a Hive table in a single SQL query. So you can register any DataFrame as a view and then join that view's data with Hive tables in SQL.
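Such a join might be sketched like this (the people DataFrame, the view name, and the assumption that a_student has an id column are all hypothetical):

```scala
// Register an arbitrary DataFrame as a temporary view
val people = Seq((1, "Asha"), (2, "Ravi")).toDF("id", "name")
people.createOrReplaceTempView("people_view")

// Join the temporary view with the Hive table a_student in one SQL query
// (assumes a_student has an id column to join on)
val joined = spark.sql(
  """select p.name, s.*
    |from people_view p
    |join a_student s on p.id = s.id""".stripMargin)
joined.show()
```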
If you wish to read data from Hive using spark-submit, you need to create the SparkSession object using the SparkSession.builder method, enable Hive support, and specify the required configuration settings on this object.
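A minimal sketch of such a spark-submit application (the object name, app name, and warehouse path are illustrative):

```scala
import org.apache.spark.sql.SparkSession

object HiveReader {
  def main(args: Array[String]): Unit = {
    // Build a SparkSession with Hive support so spark.sql can see Hive tables
    val spark = SparkSession.builder()
      .appName("HiveReader")                                     // illustrative name
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse") // illustrative path
      .enableHiveSupport()
      .getOrCreate()

    val df = spark.sql("select * from a_student")
    df.show()

    spark.stop()
  }
}
```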
Using Spark SQL, you can also read data from various databases such as MySQL, Oracle, or Microsoft SQL Server via JDBC connectivity. JDBC stands for Java Database Connectivity; using JDBC, Java code can read from almost any database. So, with the same JDBC driver, you can read data from your traditional database into a DataFrame, which can then be combined with other data.
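A hedged sketch of reading a MySQL table over JDBC into a DataFrame (the hostname, database, table, and credentials are all placeholders, and the MySQL JDBC driver jar must be on the classpath):

```scala
// Read a table from MySQL over JDBC; all connection details are placeholders
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/mydb") // placeholder host/database
  .option("driver", "com.mysql.jdbc.Driver")      // MySQL JDBC driver class
  .option("dbtable", "employees")                 // placeholder table
  .option("user", "dbuser")                       // placeholder credentials
  .option("password", "dbpass")
  .load()

// The resulting DataFrame can be joined with Hive tables or temporary views
jdbcDF.show()
```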
Getting the following error:
Table or view not found
[screenshot]
Hi,
As the error suggests, please make sure that the table (or view) named a_student exists before selecting any records from it.
Thanks.
How do I launch Spark? I am getting this error:
[screenshot]
Hi,
You may use the adjacent Jupyter notebook for this exercise.
Nevertheless, please use
to launch spark-shell in the web console.
Thanks.
Can you please explain what properties we need to change in hive-site.xml for Hive-Spark integration?
Please explain the internal data flow between Hive and Spark.
Upvote ShareHi, Bharat.
As mentioned in the lecture, you have to create the SparkSession builder object and provide the required configuration settings for connecting to various databases.
import org.apache.spark.sql.SparkSession
It is like an entry point to all databases.
Kindly refer to the following for more details about the fields and configurations you need to provide:
https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/SparkSession.html
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-SparkSession.html
All the best!
Hi,
Can you please check why I am getting an error? I have a table named "emp_details" and can access this table.
https://cloudxlab.com/assessment/displayslide/615/spark-sql-using-hive-tables?course_id=73&playlist_id=338
Thanks
What error are you getting?
In Spark SQL we use "val" while creating a DataFrame, whereas with Hive tables we use "var" while creating a DataFrame. Why is that?
Upvote ShareHello Team,
Upvote Sharevar df = spark.sql("select * from a_student")
gives the below error:
org.apache.spark.sql.AnalysisException: Table or view not found: a_student; line 1 pos 14.
I did run the above command. Why does it give this error?
Please explain with a solution.
Hi, Dinesh.
The error is self-explanatory; can you check whether the table exists in your database?
All the best!
I am trying to follow the example, but when I type import spark.implicits._ in the spark shell, I get: 25: error: not found: value spark
Strangely, it worked in spark-shell earlier, but it didn't work here.
This works:
val spark2 = spark
import spark2.implicits._
See my comment below in response to Thiago.
Hi Harry,
I could run this command 'import spark.implicits._' successfully from spark-shell, without any error. Request you to try again.
Thanks
Hello,
But then, how does Spark know that it is a Hive table? Only from the configuration files?
Yes, Hive is discovered using the configuration files hive-site.xml, core-site.xml, and hdfs-site.xml in the conf/ directory.
If Hive support is enabled, the tables are created and searched in Hive.
How will Spark resolve the conflict in the following scenario, and from which source will it retrieve the data?
A temporary view and a Hive table have identical names.
Spark gives preference to the registered view or registered temp table over Hive views or tables.
If a table X already exists in Hive and you register another table with the same name in Spark, then accessing X in Spark will return the data of the registered DataFrame, not the Hive table. Please note that this registration is temporary and is only valid in the current session.
Outside this session, if you try to access X, you will get only the Hive data, not Spark's.
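This precedence can be sketched in spark-shell (the table name X and the sample data are illustrative, and a Hive table named X is assumed to exist):

```scala
// Suppose a Hive table named X already exists.
// Registering a temporary view with the same name shadows it:
val local = Seq((1, "from-spark")).toDF("id", "source")
local.createOrReplaceTempView("X")

spark.sql("select * from X").show() // returns the temp view's data, not Hive's

// Drop the temp view and the Hive table becomes visible again
spark.catalog.dropTempView("X")
spark.sql("select * from X").show() // now returns the Hive table's data
```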