To load XML, we generally use the spark-xml package, which is published to a public Maven repository. It can be downloaded automatically by declaring it as a dependency in build.sbt for an application that is run with spark-submit, or it can be loaded into spark-shell by way of the --packages argument.
Let's launch spark-shell with the argument --packages com.databricks:spark-xml_2.10:0.4.1
It might take a while to launch the first time because it has to download the package and its dependencies from the repository.
Now we can call spark.read.format with xml as the argument, describe how the XML maps to rows and columns using the .option method (for spark-xml this is typically the rowTag option, which names the element that forms one row), and then load the data from HDFS.
Instead of simply xml, we can also pass the fully qualified format name, com.databricks.spark.xml.
Finally, we can take a look at the data in the DataFrame using the show() method. You can see that every row in this DataFrame is a book, and the columns are the book's id, author, description, etc.
So, by default, spark-xml expects the top level to contain the records, and each record to contain the attributes that become columns.
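The record-to-row mapping described above can be illustrated outside Spark with a minimal plain-Python sketch. It is not spark-xml itself and does not reproduce its schema inference or column naming. The sample XML, the catalog and book tag names, and the xml_to_rows helper are all hypothetical, chosen only to mirror the books example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: a top-level <catalog> whose children are <book> records.
SAMPLE_XML = """
<catalog>
  <book id="bk101">
    <author>Author One</author>
    <title>First Title</title>
  </book>
  <book id="bk102">
    <author>Author Two</author>
    <title>Second Title</title>
  </book>
</catalog>
"""


def xml_to_rows(xml_text, row_tag):
    """Turn each <row_tag> element under the root into a dict of column -> value."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for record in root.findall(row_tag):
        row = dict(record.attrib)        # XML attributes (such as id) become columns
        for child in record:
            row[child.tag] = child.text  # child elements become columns as well
        rows.append(row)
    return rows


rows = xml_to_rows(SAMPLE_XML, "book")
for row in rows:
    print(row)
```

Each dictionary plays the role of one DataFrame row, and its keys play the role of the column names.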
Let's try to understand what is meant by a remote procedure call (RPC). Imagine that there is a phonebook service which stores your phonebook, or contact list. The user accesses this phonebook in order to look up someone's phone number, update a number, or download the entire phonebook. The user can use a browser, or a mobile app which internally calls the service. The user can also create a bot or an automated script to query the server. So, the service could be accessed by a bot, a browser, or a mobile app. Each such call to the server is a remote procedure call.
In the example diagram, the getPhoneBook method is being called, and it returns a complex object containing an array of phone numbers. Here the returned value is in JSON format. Many formats have been designed for such communication, such as Protocol Buffers and Avro.
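The phonebook example can be sketched in a few lines of Python to show the shape of such a call. This is a simplified illustration under stated assumptions, not a real RPC framework: the client and server run in the same process, and a direct function call stands in for the network. Only getPhoneBook comes from the example. The lookUp and updateNumber method names, the message layout, and the sample contacts are hypothetical.

```python
import json

# Hypothetical in-memory contact list standing in for the service's storage.
PHONEBOOK = {"alice": "555-0100", "bob": "555-0101"}


def get_phone_book():
    """Return the whole phonebook as a list of entries."""
    return [{"name": n, "number": p} for n, p in sorted(PHONEBOOK.items())]


def look_up(name):
    """Return the number stored for a name, or None if it is unknown."""
    return PHONEBOOK.get(name)


def update_number(name, number):
    """Store a new number for a name."""
    PHONEBOOK[name] = number
    return True


# Map the method names used on the wire to the server-side functions.
METHODS = {
    "getPhoneBook": get_phone_book,
    "lookUp": look_up,
    "updateNumber": update_number,
}


def handle_request(request_json):
    """Server side: decode the request, run the named method, encode the result."""
    request = json.loads(request_json)
    method = METHODS[request["method"]]
    result = method(*request.get("params", []))
    return json.dumps({"result": result})


def call_remote(method, *params):
    """Client side: encode the call as JSON and decode the JSON reply."""
    request_json = json.dumps({"method": method, "params": list(params)})
    response_json = handle_request(request_json)  # stand-in for the network hop
    return json.loads(response_json)["result"]


print(call_remote("getPhoneBook"))
print(call_remote("lookUp", "alice"))
```

A browser, a mobile app, or a bot would each act as the client here. Only the JSON strings would need to travel between the two sides, which is why the choice of wire format matters.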