AutoQuiz: Generating ‘Fill in the Blank’ Type Questions with NLP

Can a machine create quiz which is good enough for testing a person’s knowledge of a subject?

So, last Friday, we wrote a program which can create simple ‘Fill in the blank’ type questions based on any valid English text.

This program basically figures out sentences in a text and then for each sentence it would first try to delete a proper noun and if there is no proper noun, it deletes a noun.

We are using textblob which is basically a wrapper over NLTK – The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing for English written in the Python programming language.

Continue reading “AutoQuiz: Generating ‘Fill in the Blank’ Type Questions with NLP”

Python Setup Using Anaconda For Machine Learning and Data Science Tools

In this post, we will learn how to configure tools required for CloudxLab’s Python for Machine Learning course. We will use Python 3 and Jupyter notebooks for hands-on practicals in the course. Jupyter notebooks provide a really good user interface to write code, equations, and visualizations.

Please choose one of the options listed below for practicals during the course.

Continue reading “Python Setup Using Anaconda For Machine Learning and Data Science Tools”

Predicting Income Level, An Analytics Casestudy in R

1. Introduction

In this data analytics case study, we will use the US census data to build a model to predict if the income of any individual in the US is greater than or less than USD 50000 based on the information available about that individual in the census data.

The dataset used for the analysis is an extraction from the 1994 census data by Barry Becker and donated to the public site http://archive.ics.uci.edu/ml/datasets/Census+Income. This dataset is popularly called the “Adult” data set. The way that we will go about this case study is in the following order:

Describe the data- Specifically the predictor variables (also called independent variables features) from the Census data and the dependent variable which is the level of income (either “greater than USD 50000” or “less than USD 50000”).
Acquire and Read the data- Downloading the data directly from the source and reading it.
Clean the data- Any data from the real world is always messy and noisy. The data needs to be reshaped in order to aid exploration of the data and modeling to predict the income level.
Explore the independent variables of the data- A very crucial step before modeling is the exploration of the independent variables. Exploration provides great insights to an analyst on the predicting power of the variable. An analyst looks at the distribution of the variable, how variable it is to predict the income level, what skews it has, etc. In most analytics project, the analyst goes back to either get more data or better context or clarity from his finding.
Build the prediction model with the training data- Since data like the Census data can have many weak predictors, for this particular case study I have chosen the non-parametric predicting algorithm of Boosting. Boosting is a classification algorithm (here we classify if an individual’s income is “greater than USD 50000” or “less than USD 50000”) that gives the best prediction accuracy for weak predictors. Cross validation, a mechanism to reduce over fitting while modeling, is also used with Boosting.
Validate the prediction model with the testing data- Here the built model is applied on test data that the model has never seen. This is performed to determine the accuracy of the model in the field when it would be deployed. Since this is a case study, only the crucial steps are retained to keep the content concise and readable.

Continue reading “Predicting Income Level, An Analytics Casestudy in R”

CloudxLab Conducts Another Successful Webinar On “Big Data & AI”

Buoyed by the success of our previous webinar and excited by the unending curiosity of our audience, we at CloudxLab decided to conduct another webinar on “Big Data & AI” on 24th August. Mr Sandeep Giri, founder of CloudxLab, was the lead presenter in the webinar. A graduate from IIT Roorkee with more than 15 years of experience in companies such as DE Shaw, Inmobi & Amazon, Sandeep conducted the webinar to the appreciation of all.

Continue reading “CloudxLab Conducts Another Successful Webinar On “Big Data & AI””

Future Of Mobility – Shaped By Big Data & AI

The advancements in the field of Big Data & Artificial Intelligence (AI) are occurring at an unprecedented pace and everyone from researchers to engineers to common folk are wondering how their lives will be affected. While almost all industries are estimating significant disruption from advancements in Big Data & AI, I believe the industry that will actually experience the maximum impact will be the Automotive or Transportation industry. Here is my perspective on how Big Data & AI will change the Automotive & Transportation industry landscape. It should appeal to engineers as well as to common folk interested in technological developments. I will discuss the challenges, existing solutions and will propose two alternative solutions.

Continue reading “Future Of Mobility – Shaped By Big Data & AI”

What, How & Why of Artificial Intelligence

Artificial Intelligence (AI) is the buzzword that is resounding and echoing all over the world. While large corporations, organizations & institutions are publicly proclaiming and publicizing their massive investments toward development and deployment of AI capabilities, people, in general, are feeling perplexed regarding the meaning and nuances of AI. This blog is an attempt to demystify AI and provide a brief introduction to the various aspects of AI to all such persons, engineers, non-engineers & beginners, who are seeking to understand AI. In the forthcoming discussion, we will explore the following questions:

What is AI & what does it seek to accomplish?
How will the goals of AI be accomplished, through which methodologies?
Why is AI gaining so much momentum now?

Continue reading “What, How & Why of Artificial Intelligence”

GraphFrames on CloudxLab

GraphFrames is quite a useful library of spark which helps in bringing Dataframes and GraphX package together.

From the website of Graphframes:

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs. It provides high-level APIs in Scala, Java, and Python. It aims to provide both the functionality of GraphX and extended functionality taking advantage of Spark DataFrames. This extended functionality includes motif finding, DataFrame-based serialization, and highly expressive graph queries.
—

You can use graph frames very easily with spark-shell at CloudxLab by using —package option in the following way. Continue reading “GraphFrames on CloudxLab”

CloudxLab Webinar on “Big Data & AI”: An Overwhelming Success

Understanding the current scenario of tremendous interest of students and professionals regarding “Big Data & AI”, CloudxLab conducted a webinar on July 12, 2017 to introduce and explain the many nuances of this upcoming field to all the enthusiasts. Mr Sandeep Giri, founder of CloudxLab with more than 15 years of experience in the industry with companies such as Amazon, InMobi, DE Shaw etc., was the lead presenter in the webinar.

The Scope:

The webinar covered the following:

Overview of Big Data
Big Data Technology Stack
Introduction to Artificial Intelligence
Demo of Machine Learning Example
Artificial Intelligence Technology Stack

Continue reading “CloudxLab Webinar on “Big Data & AI”: An Overwhelming Success”

How CloudxLab Helped Our User With A Job

We recently had a heart warming moment – one of our subscribers had been made an offer by Tata Consultancy Services. His thank you note to us made our day. Meara Laxman had subscribed to CloudxLab to practice his Big Data skills and as per him got more than he expected. Here is our interview with him.

CxL: How did CloudxLab help you learn Big Data tools better?
Laxman: Cloudxlab helped me a lot in learning all the Bigdata eco system components. I had gained enough theoretical knowledge on big data tools from the internet but I ran into trouble trying to practice due to my incompatible system requirements and configurations. That is when I found Cloudxlab and subscribed to it. I got good exposure to the practical aspects as Cloudxlab provided some sample lab session video material which are very clear and easy to practice and understand. Moreover, the Cloudxlab team helped me every time I had an issue and clarified all my queries.

CxL: How did CloudxLab help you with finding a new job?
Laxman: CloudxLab played a key role in getting me my new job. I lacked Continue reading

CloudxLab Blog

AutoQuiz: Generating ‘Fill in the Blank’ Type Questions with NLP

Python Setup Using Anaconda For Machine Learning and Data Science Tools

Top 50 Apache Spark Interview Questions And Answers

Predicting Income Level, An Analytics Casestudy in R

1. Introduction

CloudxLab Conducts Another Successful Webinar On “Big Data & AI”

Future Of Mobility – Shaped By Big Data & AI

What, How & Why of Artificial Intelligence

GraphFrames on CloudxLab

CloudxLab Webinar on “Big Data & AI”: An Overwhelming Success

How CloudxLab Helped Our User With A Job

<img class="aligncenter wp-image-734 size-full" src="https://blog.cloudxlab.com/wp-content/uploads/2017/09/Percentage-of-Income-more-than-50k-Country-wise.png" alt="Percentage of Income more than 50k Country wise" width="614" height="359" />

1. Introduction