AutoQuiz: Generating ‘Fill in the Blank’ Type Questions with NLP

Can a machine create quiz which is good enough for testing a person’s knowledge of a subject?

So, last Friday, we wrote a program which can create simple ‘Fill in the blank’ type questions based on any valid English text.

This program basically figures out sentences in a text and then for each sentence it would first try to delete a proper noun and if there is no proper noun, it deletes a noun.

We are using textblob which is basically a wrapper over NLTK – The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing for English written in the Python programming language.

Continue reading “AutoQuiz: Generating ‘Fill in the Blank’ Type Questions with NLP”

Top 50 Apache Spark Interview Questions And Answers

Here are the top Apache Spark interview questions and answers. There is a massive growth in the big data space, and job opportunities are skyrocketing, making this the perfect time to launch your career in this space.

Our experts have curated these questions to give you an idea of the type of questions which may be asked in an interview. Hope these Apache Spark interview questions and answers guide will help you in getting prepared for your next interview.

Spark Interview Questions
Spark Interview Questions

1. What is Apache Spark and what are the benefits of Spark over MapReduce?

  • Spark is really fast. If run in-memory it is 100x faster than Hadoop MapReduce.
  • In Hadoop MapReduce, you write many MapReduce jobs and then tie these jobs together using Oozie/shell script. This mechanism is very time consuming and MapReduce tasks have heavy latency. Between two consecutive MapReduce jobs, the data has to be written to HDFS and read from HDFS. This is time-consuming. In case of Spark, this is avoided using RDDs and utilizing memory (RAM). And quite often, translating the output of one MapReduce job into the input of another MapReduce job might require writing another code because Oozie may not suffice.
  • In Spark, you can basically do everything from single code or console (PySpark or Scala console) and get the results immediately. Switching between ‘Running something on cluster’ and ‘doing something locally’ is fairly easy and straightforward. This also leads to less context switch of the developer and more productivity.
  • Spark kind of equals to MapReduce and Oozie put together.

Watch this video to learn more about benefits of using Apache Spark over MapReduce.

Continue reading “Top 50 Apache Spark Interview Questions And Answers”

GraphFrames on CloudxLab

GraphFrames is quite a useful library of spark which helps in bringing Dataframes and GraphX package together.

From the website of Graphframes:

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs. It provides high-level APIs in Scala, Java, and Python. It aims to provide both the functionality of GraphX and extended functionality taking advantage of Spark DataFrames. This extended functionality includes motif finding, DataFrame-based serialization, and highly expressive graph queries.

You can use graph frames very easily with spark-shell at CloudxLab by using —package option in the following way. Continue reading “GraphFrames on CloudxLab”

Using TensorFlow on CloudxLab

We are glad to inform you that the TensorFlow is now available on CloudxLab. In this example, we will walk you through a basic tutorial on how to use TensorFlow.

What is TensorFlow?
TensorFlow is an Open Source Software Library for Machine Intelligence. It is developed and supported by Google and is being adopted very fast.

What is CloudxLab?
CloudxLab provides a real cloud-based environment for practicing and learn various tools. You can start learning right away by just signing up online.

Continue reading “Using TensorFlow on CloudxLab”