{"id":946,"date":"2017-11-29T11:28:22","date_gmt":"2017-11-29T11:28:22","guid":{"rendered":"http:\/\/blog.cloudxlab.com\/?p=946"},"modified":"2019-01-08T12:46:53","modified_gmt":"2019-01-08T12:46:53","slug":"generating-fill-blanks-nlp","status":"publish","type":"post","link":"https:\/\/cloudxlab.com\/blog\/generating-fill-blanks-nlp\/","title":{"rendered":"AutoQuiz: Generating &#8216;Fill in the Blank&#8217; Type Questions with NLP"},"content":{"rendered":"<p>Can a machine create a quiz that is good enough to test a person&#8217;s knowledge of a subject?<\/p>\n<p>Last Friday, we wrote a program that creates simple &#8216;Fill in the blank&#8217; type questions from any valid English text.<\/p>\n<p>The program splits the text into sentences and, for each sentence, blanks out a randomly chosen proper noun. If the sentence has no proper noun, it blanks out a noun instead.<\/p>\n<p>We are using TextBlob, which is a wrapper over NLTK. The Natural Language Toolkit, more commonly known as NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing for English, written in the Python programming language.<\/p>\n<p><!--more--><\/p>\n<pre class=\"lang:default decode:true\"># A leading ! runs a Unix shell command from the Jupyter notebook\r\n# NLTK is a great natural language processing library for Python\r\n!pip install -U nltk\r\n\r\n# Let's install textblob\r\n# textblob is a simple wrapper over NLTK\r\n!pip install -U textblob\r\n!python -m textblob.download_corpora\r\n\r\n# Import TextBlob module\r\nfrom textblob import TextBlob\r\n\r\n# This is the text that we are going to use. 
\r\n# This text is from wikipedia on World War 2 - https:\/\/en.wikipedia.org\/wiki\/World_War_II\r\n# Note: triple quotes are used for defining multi line string\r\nww2 = '''\r\nWorld War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945, although related conflicts began earlier. It involved the vast majority of the world's countries\u2014including all of the great powers\u2014eventually forming two opposing military alliances: the Allies and the Axis. It was the most widespread war in history, and directly involved more than 100 million people from over 30 countries. In a state of total war, the major participants threw their entire economic, industrial, and scientific capabilities behind the war effort, erasing the distinction between civilian and military resources.\r\n\r\nWorld War II was the deadliest conflict in human history, marked by 50 million to 85 million fatalities, most of which were civilians in the Soviet Union and China. It included massacres, the deliberate genocide of the Holocaust, strategic bombing, starvation, disease and the first use of nuclear weapons in history.[1][2][3][4]\r\n\r\nThe Empire of Japan aimed to dominate Asia and the Pacific and was already at war with the Republic of China in 1937,[5] but the world war is generally said to have begun on 1 September 1939[6] with the invasion of Poland by Nazi Germany and subsequent declarations of war on Germany by France and the United Kingdom. Supplied by the Soviet Union, from late 1939 to early 1941, in a series of campaigns and treaties, Germany conquered or controlled much of continental Europe, and formed the Axis alliance with Italy and Japan. Under the Molotov\u2013Ribbentrop Pact of August 1939, Germany and the Soviet Union partitioned and annexed territories of their European neighbours, Poland, Finland, Romania and the Baltic states. 
The war continued primarily between the European Axis powers and the coalition of the United Kingdom and the British Commonwealth, with campaigns including the North Africa and East Africa campaigns, the aerial Battle of Britain, the Blitz bombing campaign, and the Balkan Campaign, as well as the long-running Battle of the Atlantic. On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the major part of the Axis military forces into a war of attrition. In December 1941, Japan attacked the United States and European colonies in the Pacific Ocean, and quickly conquered much of the Western Pacific.\r\n'''\r\n# Python 3 strings are already Unicode, so the text needs no explicit decoding\r\n\r\nww2b = TextBlob(ww2)\r\nsposs = {}\r\nfor sentence in ww2b.sentences:\r\n\r\n    # Build a dictionary that maps each part-of-speech tag to the list of words with that tag:\r\n    # {part-of-speech: [word1, word2]}\r\n    # In other words, we group the words of the sentence by part of speech\r\n\r\n    poss = {}\r\n    sposs[sentence.string] = poss\r\n    for word, tag in sentence.tags:\r\n        if tag not in poss:\r\n            poss[tag] = []\r\n        poss[tag].append(word)\r\n\r\n\r\nimport random\r\nimport re\r\n\r\n# Replace every case-insensitive occurrence of word in sentence with a blank\r\ndef replaceIC(word, sentence):\r\n    insensitive_hippo = re.compile(re.escape(word), re.IGNORECASE)\r\n    return insensitive_hippo.sub('__________________', sentence)\r\n\r\n# For a sentence create a blank space.\r\n# It first tries to randomly select a proper noun (tag NNP),\r\n# and if no proper noun is found, it randomly selects a noun (tag NN).\r\ndef removeWord(sentence, poss):\r\n    words = None\r\n    if 'NNP' in poss:\r\n        words = poss['NNP']\r\n    elif 'NN' in poss:\r\n        words = poss['NN']\r\n    else:\r\n        print(\"NN and NNP not found\")\r\n        return (None, sentence, None)\r\n    if len(words) &gt; 
0:\r\n        word = random.choice(words)\r\n        replaced = replaceIC(word, sentence)\r\n        return (word, sentence, replaced)\r\n    else:\r\n        print(\"words are empty\")\r\n        return (None, sentence, None)\r\n\r\n# Iterate over the sentences\r\nfor sentence in sposs.keys():\r\n    poss = sposs[sentence]\r\n    (word, osentence, replaced) = removeWord(sentence, poss)\r\n    if replaced is None:\r\n        print(\"Found none for \")\r\n        print(sentence)\r\n    else:\r\n        print(replaced)\r\n        print(\"\\n===============\")\r\n        print(word)\r\n        print(\"===============\")\r\n        print(\"\\n\")\r\n\r\n<\/pre>\n<p>The results are as follows:<\/p>\n<p>In __________________ 1941, Japan attacked the United States and European colonies in the Pacific Ocean, and quickly conquered much of the Western Pacific.<\/p>\n<p>===============<br \/>\nDecember<br \/>\n===============<\/p>\n<p>&nbsp;<\/p>\n<p>The war continued primarily between the European Axis powers and the coalition of the United Kingdom and the British Commonwealth, with campaigns including the North Africa and East Africa campaigns, the aerial __________________ of Britain, the Blitz bombing campaign, and the Balkan Campaign, as well as the long-running __________________ of the Atlantic.<\/p>\n<p>===============<br \/>\nBattle<br \/>\n===============<\/p>\n<p>&nbsp;<\/p>\n<p>The __________________ advance halted in 1942 when Japan lost the critical Battle of Midway, and Germany and Italy were defeated in North Africa and then, decisively, at Stalingrad in the Soviet Union.<\/p>\n<p>===============<br \/>\nAxis<br \/>\n===============<\/p>\n<p>&nbsp;<\/p>\n<p>During 1944 and 1945 the Japanese suffered major reverses in mainland Asia in South Central China and Burma, while the Allies crippled the Japanese __________________ and captured key Western Pacific islands.<\/p>\n<p>===============<br \/>\nNavy<br \/>\n===============<\/p>\n<p>&#8230;..<\/p>\n<p>We 
can further improve this in many ways. Some of these are as follows:<\/p>\n<ol>\n<li>Better selection of the word to be blanked out as the question.<\/li>\n<li>Conversion into a proper question: &#8220;Who won the war?&#8221; instead of &#8220;_____ won the war&#8221;<\/li>\n<li>Creating multiple-choice questions with good distractors, that is, plausible alternative options.<\/li>\n<\/ol>\n<p>The Jupyter notebook for this is available here: <a href=\"https:\/\/github.com\/cloudxlab\/ml\/tree\/master\/projects\/autoquiz\">https:\/\/github.com\/cloudxlab\/ml\/tree\/master\/projects\/autoquiz<\/a><\/p>\n<p>If you are interested in working on it further with us, drop an email to reachus@cloudxlab.com.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Can a machine create a quiz that is good enough to test a person&#8217;s knowledge of a subject? Last Friday, we wrote a program that creates simple &#8216;Fill in the blank&#8217; type questions from any valid English text. The program splits the text into sentences and, for each sentence, &hellip; <a href=\"https:\/\/cloudxlab.com\/blog\/generating-fill-blanks-nlp\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;AutoQuiz: Generating &#8216;Fill in the Blank&#8217; Type Questions with NLP&#8221;<\/span><\/a><\/p>\n","protected":false},"author":14,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[28,13,14],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v16.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>AutoQuiz - Generate questions automatically with NLP | CloudxLab Blog<\/title>\n<meta name=\"description\" content=\"Can a machine auto-generate quizzes? 
In this blog post, we will learn how to auto-generate fill in the blank type questions with NLP.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/cloudxlab.com\/blog\/generating-fill-blanks-nlp\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"AutoQuiz - Generate questions automatically with NLP | CloudxLab Blog\" \/>\n<meta property=\"og:description\" content=\"Can a machine auto-generate quizzes? In this blog post, we will learn how to auto-generate fill in the blank type questions with NLP.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cloudxlab.com\/blog\/generating-fill-blanks-nlp\/\" \/>\n<meta property=\"og:site_name\" content=\"CloudxLab Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cloudxlab\" \/>\n<meta property=\"article:published_time\" content=\"2017-11-29T11:28:22+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-01-08T12:46:53+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CloudxLab\" \/>\n<meta name=\"twitter:site\" content=\"@CloudxLab\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\">\n\t<meta name=\"twitter:data1\" content=\"5 minutes\">\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#website\",\"url\":\"https:\/\/cloudxlab.com\/blog\/\",\"name\":\"CloudxLab Blog\",\"description\":\"Learn AI, Machine Learning, Deep Learning, Devops &amp; Big Data\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/cloudxlab.com\/blog\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/generating-fill-blanks-nlp\/#webpage\",\"url\":\"https:\/\/cloudxlab.com\/blog\/generating-fill-blanks-nlp\/\",\"name\":\"AutoQuiz - Generate questions automatically with NLP | CloudxLab Blog\",\"isPartOf\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/#website\"},\"datePublished\":\"2017-11-29T11:28:22+00:00\",\"dateModified\":\"2019-01-08T12:46:53+00:00\",\"author\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/#\/schema\/person\/4835f1b3d5000626cb15e9311d748e09\"},\"description\":\"Can a machine auto-generate quizzes? 
In this blog post, we will learn how to auto-generate fill in the blank type questions with NLP.\",\"breadcrumb\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/generating-fill-blanks-nlp\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/cloudxlab.com\/blog\/generating-fill-blanks-nlp\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/generating-fill-blanks-nlp\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"item\":{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/\",\"url\":\"https:\/\/cloudxlab.com\/blog\/\",\"name\":\"Home\"}},{\"@type\":\"ListItem\",\"position\":2,\"item\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/generating-fill-blanks-nlp\/#webpage\"}}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#\/schema\/person\/4835f1b3d5000626cb15e9311d748e09\",\"name\":\"Sandeep Giri\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/1393214840cf7455bb4cba055cb30468?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/1393214840cf7455bb4cba055cb30468?s=96&d=mm&r=g\",\"caption\":\"Sandeep Giri\"},\"sameAs\":[\"https:\/\/cloudxlab.com\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","_links":{"self":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/946"}],"collection":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/users\/14"}],"replies":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/comments?post=946"}],"version-history":[{"count":6,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/946\/revisions"}],"predecessor-version":[{"id":1076,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/946\/revisions\/1076"}],"wp:attachment":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/media?parent=946"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/categories?post=946"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/tags?post=946"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}