AutoQuiz: Generating ‘Fill in the Blank’ Type Questions with NLP

Can a machine create a quiz that is good enough to test a person’s knowledge of a subject?

So, last Friday, we wrote a program which can create simple ‘Fill in the blank’ type questions based on any valid English text.

The program splits the text into sentences. For each sentence, it first tries to blank out a proper noun; if the sentence has no proper noun, it blanks out a common noun instead.

We use TextBlob, which is a simple wrapper over NLTK. The Natural Language Toolkit, more commonly known as NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing of English, written in the Python programming language.

# With ! we can run shell commands from the Jupyter notebook
# NLTK is a great natural language processing library for Python
!pip install -U nltk

# Let's install textblob
# textblob is a simple wrapper over NLTK
!pip install -U textblob
!python -m textblob.download_corpora

# Import TextBlob module
from textblob import TextBlob

# This is the text that we are going to use.
# It is taken from the Wikipedia article on World War II - https://en.wikipedia.org/wiki/World_War_II
# Note: triple quotes are used to define a multi-line string
ww2 = '''
World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945, although related conflicts began earlier. It involved the vast majority of the world's countries—including all of the great powers—eventually forming two opposing military alliances: the Allies and the Axis. It was the most widespread war in history, and directly involved more than 100 million people from over 30 countries. In a state of total war, the major participants threw their entire economic, industrial, and scientific capabilities behind the war effort, erasing the distinction between civilian and military resources.

World War II was the deadliest conflict in human history, marked by 50 million to 85 million fatalities, most of which were civilians in the Soviet Union and China. It included massacres, the deliberate genocide of the Holocaust, strategic bombing, starvation, disease and the first use of nuclear weapons in history.[1][2][3][4]

The Empire of Japan aimed to dominate Asia and the Pacific and was already at war with the Republic of China in 1937,[5] but the world war is generally said to have begun on 1 September 1939[6] with the invasion of Poland by Nazi Germany and subsequent declarations of war on Germany by France and the United Kingdom. Supplied by the Soviet Union, from late 1939 to early 1941, in a series of campaigns and treaties, Germany conquered or controlled much of continental Europe, and formed the Axis alliance with Italy and Japan. Under the Molotov–Ribbentrop Pact of August 1939, Germany and the Soviet Union partitioned and annexed territories of their European neighbours, Poland, Finland, Romania and the Baltic states. The war continued primarily between the European Axis powers and the coalition of the United Kingdom and the British Commonwealth, with campaigns including the North Africa and East Africa campaigns, the aerial Battle of Britain, the Blitz bombing campaign, and the Balkan Campaign, as well as the long-running Battle of the Atlantic. On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the major part of the Axis military forces into a war of attrition. In December 1941, Japan attacked the United States and European colonies in the Pacific Ocean, and quickly conquered much of the Western Pacific.
'''
# On Python 2 the literal above is a byte string, so decode it to unicode.
# On Python 3, str is already unicode and this step is skipped.
if isinstance(ww2, bytes):
    ww2 = ww2.decode('utf-8')

ww2b = TextBlob(ww2)

# For every sentence, group its words by part-of-speech tag:
# {sentence: {part-of-speech: [word1, word2]}}
sposs = {}
for sentence in ww2b.sentences:
    poss = {}
    sposs[sentence.string] = poss
    # sentence.tags is a list of (word, tag) pairs
    for word, tag in sentence.tags:
        if tag not in poss:
            poss[tag] = []
        poss[tag].append(word)


import random
import re

# Replace every case-insensitive occurrence of word in sentence with a blank
def replaceIC(word, sentence):
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return pattern.sub('__________________', sentence)

# Create a blank in a sentence.
# It first tries to pick a proper noun (NNP) at random;
# if the sentence has no proper noun, it picks a noun (NN) at random.
# Returns a tuple: (removed word, original sentence, sentence with blank).
def removeWord(sentence, poss):
    words = None
    if 'NNP' in poss:
        words = poss['NNP']
    elif 'NN' in poss:
        words = poss['NN']
    else:
        print("NN and NNP not found")
        return (None, sentence, None)
    if len(words) > 0:
        word = random.choice(words)
        replaced = replaceIC(word, sentence)
        return (word, sentence, replaced)
    else:
        print("words are empty")
        return (None, sentence, None)

# Iterate over the sentences and print each question followed by its answer
for sentence in sposs.keys():
    poss = sposs[sentence]
    (word, osentence, replaced) = removeWord(sentence, poss)
    if replaced is None:
        print("Found none for")
        print(sentence)
    else:
        print(replaced)
        print("\n===============")
        print(word)
        print("===============")
        print("\n")

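Because the blanked word is picked at random, the output changes from run to run. A sample of the results is as follows (some of these sentences come from later in the article than the excerpt shown above):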

In __________________ 1941, Japan attacked the United States and European colonies in the Pacific Ocean, and quickly conquered much of the Western Pacific.

===============
December
===============

 

The war continued primarily between the European Axis powers and the coalition of the United Kingdom and the British Commonwealth, with campaigns including the North Africa and East Africa campaigns, the aerial __________________ of Britain, the Blitz bombing campaign, and the Balkan Campaign, as well as the long-running __________________ of the Atlantic.

===============
Battle
===============

 

The __________________ advance halted in 1942 when Japan lost the critical Battle of Midway, and Germany and Italy were defeated in North Africa and then, decisively, at Stalingrad in the Soviet Union.

===============
Axis
===============

 

During 1944 and 1945 the Japanese suffered major reverses in mainland Asia in South Central China and Burma, while the Allies crippled the Japanese __________________ and captured key Western Pacific islands.

===============
Navy
===============

…..
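In the second result above, both occurrences of "Battle" were blanked, because replaceIC replaces every case-insensitive match. It also has no notion of word boundaries, so a short word could be blanked inside a longer one. The following is a minimal sketch of a stricter variant, not part of the original notebook; the name replace_whole_word and its count parameter are assumptions made for illustration.

```python
import re

BLANK = '__________________'

def replace_whole_word(word, sentence, count=0):
    """Blank out `word` only where it stands alone, ignoring case.

    count=0 replaces every whole-word match; count=1 replaces only the first.
    (Hypothetical helper, not from the original notebook.)
    """
    # The lookarounds reject matches that touch another word character,
    # so 'war' does not match inside 'warfare'.
    pattern = re.compile(r'(?<!\w)' + re.escape(word) + r'(?!\w)', re.IGNORECASE)
    return pattern.sub(BLANK, sentence, count=count)

print(replace_whole_word('war', 'The war reshaped warfare.'))
print(replace_whole_word(
    'Battle', 'the Battle of Britain and the Battle of the Atlantic', count=1))
```

Whether to blank one occurrence or all of them is a design choice: blanking all keeps the answer from leaking elsewhere in the sentence, while blanking one keeps the question easier to read.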

We can improve this further in many ways. Some of them are:

  1. Better selection of the word to blank out as the question.
  2. Conversion into a proper question: “Who won the war?” instead of “_____ won the war”.
  3. Creating multiple-choice questions with good distractors (plausible alternative options).
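As a starting point for the third idea, one simple approach is to draw the alternative options from other words in the text that carry the same part-of-speech tag as the answer. The sketch below assumes a dictionary shaped like sposs above, and assumes that removeWord would be extended to also return the tag it used. The function make_multiple_choice and the sample data are hypothetical and only illustrate the idea.

```python
import random

def make_multiple_choice(answer, tag, sposs, num_distractors=3, rng=random):
    """Return a shuffled list of options: the answer plus a few distractors.

    Distractors are other words with the same part-of-speech tag, gathered
    from every sentence in sposs ({sentence: {tag: [words]}}).
    (Hypothetical helper, not from the original notebook.)
    """
    pool = set()
    for poss in sposs.values():
        for candidate in poss.get(tag, []):
            if candidate.lower() != answer.lower():
                pool.add(candidate)
    # Sort first so that sampling starts from a stable order
    distractors = rng.sample(sorted(pool), min(num_distractors, len(pool)))
    options = distractors + [answer]
    rng.shuffle(options)
    return options

# Hand-written stand-in for the sposs dictionary (illustrative data only)
sample_sposs = {
    'Japan attacked the United States.': {'NNP': ['Japan', 'United', 'States']},
    'Germany conquered much of Europe.': {'NNP': ['Germany', 'Europe'], 'NN': ['much']},
}
options = make_multiple_choice('Japan', 'NNP', sample_sposs)
print(options)
```

Matching only on the tag is crude: it can pair a country with a month, for example. Better distractors would likely need some notion of semantic similarity, which this sketch does not attempt.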

The Jupyter notebook for this is available here: https://github.com/cloudxlab/ml/tree/master/projects/autoquiz

If you are interested in working on it further with us, drop an email to reachus@cloudxlab.com.