Earlier we saw that we can create a dataframe directly from a JSON file using the spark.read.json function. That was easy because a dataframe needs to know its column names and the datatype of each column, and Spark can infer both from the JSON records.
What if we want to create a dataframe out of unstructured data? Unstructured data carries no column names or type information.
We would first create RDDs, as learned earlier, and then convert these RDDs to dataframes. But how?
Spark SQL supports two different methods for converting existing RDDs into dataframes.
The first method uses reflection to infer the schema of an RDD that contains specific types of objects. This reflection-based approach leads to more concise code and works well when you already know the schema while writing your Spark application.
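As a minimal sketch of the reflection-based approach in PySpark: each text line is parsed into a typed record, wrapped in a Row, and Spark infers the column names and types from those Row objects. The "name,age" line layout, the helper names parse_person and people_dataframe, and the sample values are assumptions made for illustration, not part of the original lesson.

```python
def parse_person(line):
    """Turn a raw 'name,age' text line into a typed (name, age) tuple."""
    name, age = line.split(",")
    return name.strip(), int(age)


def people_dataframe(spark, lines):
    """Build a dataframe whose schema Spark infers from Row objects.

    `spark` is an existing SparkSession; `lines` is a list of raw text lines.
    """
    # Imported here so that parse_person can be used without Spark installed.
    from pyspark.sql import Row

    rdd = spark.sparkContext.parallelize(lines)
    # The Row field names become the column names; the Python value types
    # (str, int) are what Spark uses to infer the column types.
    rows = rdd.map(parse_person).map(lambda p: Row(name=p[0], age=p[1]))
    return spark.createDataFrame(rows)
```

With a running SparkSession named spark, calling people_dataframe(spark, ["Michael, 29", "Andy, 30"]).printSchema() should show a string column name and an integer column age (Python ints are inferred as Spark's long type).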
The second method for creating dataframes is through a programmatic interface that allows you to construct a schema and then apply it to an existing RDD. While this method is more verbose, it allows you to construct dataframes when the columns and their types are not known until runtime.
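The programmatic approach could look roughly like the sketch below, again in PySpark. Here the column names and types arrive at runtime as a string such as "name:string,age:int"; that spec format, the helper names, and the two supported types are hypothetical choices for this example. The schema is built as a StructType and then applied to an RDD of tuples.

```python
def build_schema_fields(spec):
    """Parse a runtime spec like 'name:string,age:int' into (column, type) pairs."""
    return [tuple(part.strip() for part in item.split(":"))
            for item in spec.split(",")]


def to_typed_tuple(line, fields):
    """Convert a comma-separated text line into a tuple matching `fields`."""
    casts = {"string": str, "int": int}
    values = [value.strip() for value in line.split(",")]
    return tuple(casts[type_name](value)
                 for value, (_, type_name) in zip(values, fields))


def dataframe_with_schema(spark, lines, spec):
    """Build a dataframe by applying a schema constructed at runtime.

    `spark` is an existing SparkSession; `lines` is a list of raw text lines.
    """
    # Imported here so the parsing helpers can be used without Spark installed.
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark_types = {"string": StringType(), "int": IntegerType()}
    fields = build_schema_fields(spec)
    # One StructField per column: name, Spark type, nullable flag.
    schema = StructType([StructField(name, spark_types[type_name], True)
                         for name, type_name in fields])
    rdd = spark.sparkContext.parallelize(lines).map(
        lambda line: to_typed_tuple(line, fields))
    return spark.createDataFrame(rdd, schema)
```

Because the schema is ordinary data built inside the function, the same code can serve inputs whose columns are only known when the job starts, at the cost of more code than the reflection-based version.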