Spark - DataFrames & Spark SQL (Part 1)
Spark SQL is an Apache Spark module for processing structured data. It lets you work with structured data through a SQL-like interface. So if your data can be represented in tabular form, or already lives in a structured data source such as a SQL database, you can use Spark SQL to process it.
Spark SQL provides the DataFrame API, which lets you mix SQL queries, R-like data frame manipulation, and the usual RDD transformations and actions in one program. These styles are well integrated: a SQL query returns a DataFrame, which you can keep transforming with method calls or convert to an RDD.
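A minimal sketch of that mix in Scala, assuming Spark 3.x on the classpath and a local master. The object name `MixedApis`, the view name `people`, and the sample rows are made up for illustration. It runs a SQL query, keeps working on the result with DataFrame methods, and then drops to the underlying RDD.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object MixedApis {
  // Runs one pipeline that mixes SQL, the DataFrame API, and an RDD operation.
  def adultNames(spark: SparkSession): Array[String] = {
    import spark.implicits._ // enables .toDF on local Scala collections

    // Hypothetical sample data: (name, age) rows turned into a DataFrame.
    val people = Seq(("Alice", 34), ("Bob", 19), ("Carol", 27)).toDF("name", "age")

    // 1. SQL: register the DataFrame as a temporary view and query it.
    people.createOrReplaceTempView("people")
    val adults = spark.sql("SELECT name, age FROM people WHERE age >= 21")

    // 2. DataFrame API: the SQL result is an ordinary DataFrame, so keep chaining methods.
    val byAgeDesc = adults.orderBy(col("age").desc)

    // 3. RDD: drop down to the underlying RDD[Row] for a plain map + collect.
    byAgeDesc.rdd.map(row => row.getString(0)).collect()
  }

  def main(args: Array[String]): Unit = {
    // local[*] is only for experimenting; spark-submit normally sets the master.
    val spark = SparkSession.builder().appName("MixedApis").master("local[*]").getOrCreate()
    try println(adultNames(spark).mkString(", ")) // prints: Alice, Carol
    finally spark.stop()
  }
}
```

Because `spark.sql` returns a DataFrame, there is no conversion step between the SQL style and the method-call style; `.rdd` is the escape hatch when you need plain RDD transformations.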
Whether your data is in HDFS, Hive, or a relational database, and whether it is stored as Avro, Parquet, ORC, or JSON, you can access and process it uniformly.
With Spark SQL, you can run your existing Hive queries without modification, and you can use your existing BI tools to query big data.
Moreover, you can even join data across different formats and different data sources.
This is how Spark SQL provides uniform data access.
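A minimal sketch of uniform access and a cross-format join, assuming Spark 3.x and a local temporary directory standing in for HDFS. It writes one small dataset as JSON and another as Parquet, reads both back through the same `spark.read` API, and joins them. The object name, paths, column names, and data are hypothetical. ORC follows the same pattern via `spark.read.orc`; Avro needs the separately packaged spark-avro module and `spark.read.format("avro")`.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object CrossFormatJoin {
  // Writes two datasets in different formats, reads them back, and joins them.
  def joinedRows(spark: SparkSession): Array[(String, String, Double)] = {
    import spark.implicits._

    // Hypothetical scratch directory (not cleaned up). A file: URI keeps the
    // paths local even if a default HDFS is configured.
    val base = Files.createTempDirectory("spark-formats").toUri.toString
    val customersPath = base + "customers.json"
    val ordersPath = base + "orders.parquet"

    // Customers stored as JSON, orders stored as Parquet.
    Seq((1, "Alice"), (2, "Bob")).toDF("id", "name").write.json(customersPath)
    Seq((1, "book", 12.5), (2, "pen", 1.5), (1, "lamp", 30.0))
      .toDF("customer_id", "item", "price")
      .write.parquet(ordersPath)

    // Same reader API for both formats; only the format method differs.
    val customers = spark.read.json(customersPath)
    val orders = spark.read.parquet(ordersPath)

    // Join across the two formats as if they were tables in one database.
    customers
      .join(orders, customers("id") === orders("customer_id"))
      .select("name", "item", "price")
      .orderBy("name", "item")
      .as[(String, String, Double)] // tuple fields are matched by column position
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CrossFormatJoin").master("local[*]").getOrCreate()
    try joinedRows(spark).foreach(println)
    finally spark.stop()
  }
}
```

The join itself never mentions JSON or Parquet: once each source is loaded, both sides are just DataFrames.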
5 Comments
Pasting the Jupyter username and password from the clipboard does not work.
Hi Santanu,
It may not work in some browsers. Please click the "eye" icon to reveal the password, then copy and paste it.
In the DataFrame operation
df.select($"name", $"age" + 1).show()
why is the $ symbol used?
It has been 19 days and no one has answered this question yet.
Hi, Dinesh.
Could you please post your query in the comments section below the lecture, or on our lab support forum: https://discuss.cloudxlab.c...
All the best.
Best,
Satyajit Das
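On the `$` question in the comment above: `$"name"` is Scala string-interpolation syntax that becomes available after `import spark.implicits._`, and it turns the string into a Column (specifically a `ColumnName`). A Column is needed there because `"age" + 1` on a plain string is just concatenation (`"age1"`), and `select` does not accept a mix of String and Column arguments. `col("age")` and `df("age")` are equivalent spellings. The sketch below assumes Spark 3.x; the object and method names and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

object DollarColumns {
  // Returns age + 1 computed three ways; all three should agree.
  def agePlusOne(spark: SparkSession): Seq[Seq[Int]] = {
    // Brings the $"..." string interpolator (and .toDF on Seqs) into scope.
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq(("Alice", 34), ("Bob", 19)).toDF("name", "age")

    // $"age" is a ColumnName (a Column), so "+ 1" builds a per-row expression.
    val viaDollar: Column = $"age" + 1
    // Equivalent spellings that do not need the implicit interpolator.
    val viaCol: Column = col("age") + 1
    val viaApply: Column = df("age") + 1

    Seq(viaDollar, viaCol, viaApply).map(c => df.select(c).as[Int].collect().toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DollarColumns").master("local[*]").getOrCreate()
    try {
      import spark.implicits._
      // Without $, "age" + 1 is plain Scala string concatenation.
      println("age" + 1) // prints: age1
      // The query from the comment: both arguments are Column objects.
      val df = Seq(("Alice", 34), ("Bob", 19)).toDF("name", "age")
      df.select($"name", $"age" + 1).show()
      println(agePlusOne(spark))
    } finally spark.stop()
  }
}
```

Which spelling to use is mostly style; `col(...)` has the advantage of working without the implicits import.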