Spark - DataFrames & Spark SQL (Part 1)
Spark SQL is a module of Apache Spark for handling structured data. With Spark SQL, you can process structured data through a SQL-like interface. So, if your data can be represented in tabular form, or already lives in a structured data source such as a SQL database, you can use Spark SQL to process it.
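As a minimal sketch of this idea, the Scala snippet below builds a local SparkSession, turns a small hypothetical "employees" dataset into a DataFrame, registers it as a view, and queries it with plain SQL. The data, app name and local master are illustrative assumptions, not part of the lesson.

```scala
import org.apache.spark.sql.SparkSession

// Build a SparkSession (assumption: running locally for illustration)
val spark = SparkSession.builder()
  .appName("SparkSQLIntro")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Hypothetical in-memory tabular data
val employees = Seq(("Alice", 34), ("Bob", 45), ("Cathy", 29))
  .toDF("name", "age")

// Register the DataFrame as a temporary view so it can be queried with SQL
employees.createOrReplaceTempView("employees")

val adults = spark.sql("SELECT name, age FROM employees WHERE age > 30")
adults.show()
```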
Spark SQL provides the DataFrames API, which makes it possible to mix SQL queries, R-like DataFrame manipulations, and the usual transformations and actions of an RDD. So, it is very well integrated.
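The sketch below illustrates that mixing, reusing the hypothetical "employees" DataFrame from the previous snippet: the same result can be reached through a SQL query, through DataFrame operations similar to those in R or pandas, or by dropping down to the underlying RDD.

```scala
// 1. SQL-style query against the registered view
val viaSql = spark.sql("SELECT name FROM employees WHERE age > 30")

// 2. R-like DataFrame manipulation
val viaDataFrame = employees.filter($"age" > 30).select("name")

// 3. Ordinary RDD transformations and actions on the underlying RDD
val viaRdd = employees.rdd
  .map(row => row.getAs[String]("name"))
  .collect()
```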
Whether your data is in HDFS, Hive or relational databases, and whether it is stored in Avro, Parquet, ORC or JSON format, you can access and process it uniformly.
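As a sketch of this uniform access (the file paths are placeholders, not real datasets), the same reader and writer API covers several of these formats:

```scala
// Reading different formats through the same API (paths are hypothetical)
val fromJson    = spark.read.json("/data/events.json")
val fromParquet = spark.read.parquet("/data/events.parquet")
val fromOrc     = spark.read.orc("/data/events.orc")

// Writing back out in a different format uses the same uniform API
fromJson.write.mode("overwrite").parquet("/data/events_as_parquet")
```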
With Spark SQL, you can run your Hive queries without any modifications, and you can use your existing BI tools to query big data.
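A minimal sketch of this, assuming a Hive metastore is reachable from the cluster and a hypothetical Hive table named "sales" exists, looks like the following; enabling Hive support lets existing Hive tables be queried through spark.sql() as-is.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: the cluster has a configured Hive metastore
val sparkWithHive = SparkSession.builder()
  .appName("HiveOnSpark")
  .enableHiveSupport()
  .getOrCreate()

// "sales" is a hypothetical Hive table; the HiveQL query runs unmodified
sparkWithHive.sql("SELECT region, SUM(amount) FROM sales GROUP BY region").show()
```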
Moreover, you can even join data across different formats and different data sources.
This is how Spark SQL provides uniform data access.
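As an illustration of such a cross-source join, the sketch below joins a hypothetical JSON file of orders with a hypothetical "customers" table read over JDBC; the URL, table name and credentials are placeholder assumptions.

```scala
// Hypothetical JSON dataset of orders
val orders = spark.read.json("/data/orders.json")

// Hypothetical relational table read over JDBC (placeholder connection details)
val customers = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/shop")
  .option("dbtable", "customers")
  .option("user", "reader")
  .option("password", "secret")
  .load()

// Join the two sources as ordinary DataFrames, regardless of where they came from
val joined = orders.join(customers, orders("customer_id") === customers("id"))
joined.show()
```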