How to create an Apache Thrift Service – Tutorial

Overview

Say you come up with a wonderful idea such as a really great phone service. You would want this service to be available through APIs in various languages. Whether people are using Python, C++, Java or any other programming language, they should be able to use your service. You would also want them to be able to access it from anywhere in the world. In such scenarios, a Thrift service is a good fit. Thrift lets you define a generic interface that is implemented on the server. Clients for this interface can then be generated automatically in many languages.

Let us get started! Here we are going to create a very simple service that just returns the server time.

Step 0: Install Thrift

This step is not required if you are using CloudxLab. You can just log in to the web console. In case you want to set up Apache Thrift on your own machine please follow these instructions: https://thrift.apache.org/docs/install/

Step 1: Create the interface definition

Let us create a file with name Example.thrift with the following code in it:

Step 2: Generate the server and client side code in Python

Once the code has been generated, you will see a folder named “gen-py” in your current directory. Inside gen-py, a folder named “Example” contains the generated Python code for the service.

Step 3: Create the server

First, create a directory with the name “server” and go into that directory:

Inside this folder create a file with the name PythonServer.py and the following contents:

Notice that we implement the service through the class ExampleHandler.
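To make the role of the handler concrete, here is a minimal sketch in plain Python. It is not the original PythonServer.py: the method name showCurrentTimestamp is a hypothetical stand-in for whatever Example.thrift declares, and the Thrift transport and processor wiring is deliberately left out. The only idea it illustrates is that a handler is an ordinary class whose methods carry the service logic.

```python
import time


class ExampleHandler:
    """Plain-Python stand-in for the service implementation."""

    def showCurrentTimestamp(self):  # hypothetical method name, not from the original
        # Return the server clock as seconds since the epoch, formatted as a string.
        return "%.2f" % time.time()


if __name__ == "__main__":
    handler = ExampleHandler()
    print(handler.showCurrentTimestamp())
```

In a real server, an instance of this class would be handed to the processor class generated from the interface definition, which dispatches incoming calls to the handler's methods.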

Step 4: Start the server

Step 5: Create a client

Let the server run and open a new terminal. Follow the remaining instructions in the new terminal. Create a folder with the name “client”. Inside that folder, create a file with the name PythonClient.py and the following code:

Step 6: Run the client

It should print the current time such as:

1549721664.93

It is an extremely simple example. You can extend it to add more functions and objects.

The code for the whole project is available here: https://github.com/cloudxlab/thrift-examples

Use-cases of Machine Learning in E-Commerce

What computing once did to traditional industries, machine learning is now doing to traditional rule-based computing: it is eating into its market. Earlier, organizations used to have separate groups for image processing, audio processing, analytics and predictions. Now these groups are being merged, because machine learning overlaps with almost every domain of computing. Let us discuss how machine learning is impacting e-commerce in particular.

The first use case of machine learning that became really popular was Amazon Recommendations. Afterwards, Netflix launched a movie recommendation challenge, which gave birth to Kaggle, now an online platform for machine learning challenges.

Before I dive deeper into the details, let's quickly define the terms that are often confused. AI stands for Artificial Intelligence, which means being able to display human-like intelligence. AI is basically an objective. Machine learning means making computers learn from historical or empirical data instead of explicitly writing the rules. Artificial neural networks are computing constructs whose structure is modelled on the animal brain. Deep learning is a branch of machine learning where we use a complex artificial neural network for predictions.


What are the pre-requisites to learn big data?

We, at CloudxLab, keep getting a lot of questions, online and sometimes offline, such as:

“I want to learn big data. But, just don’t know whether I am eligible or not.”

“I am so and so, can I learn big data?”

We have compiled the most common questions here, and we will answer each of them.

So, here we go.

What are those questions?

  1. I am from a non-technical background. Can I learn big data?
  2. Do I need to know programming languages such as Java, Python, PHP, etc.?
  3. Since it is big data, do I need to know relational databases such as Oracle, or more generally, do I need to be well versed in SQL?
  4. Do I also need to know Unix or Linux?


Top Machine Learning Interview Questions for 2018 (Part-1)

These machine learning interview questions are real questions asked in top interviews.

For hiring machine learning engineers or data scientists, the typical process has multiple rounds.

  1. Basic Screening Round – The objective of this round is to check that the candidate meets the minimum requirements.
  2. Algorithm Design Round – Some companies have this round but most don’t. It checks the coding and algorithmic skills of the interviewee.
  3. ML Case Study – In this round, you are given a machine learning case study problem along the lines of a Kaggle challenge. You have to solve it in an hour.
  4. Bar Raiser / Hiring Manager – This interview is generally with the most senior person in the team or a very senior person from another team (at Amazon it is called the Bar Raiser round), who checks whether the candidate meets the company-wide technical bar. This is generally the last round.


Financial Aid, Scholarship Test & Free Resources

Financial Aid

At CloudxLab, we have always believed that quality education must be affordable for everyone, so that we can help learners achieve their career goals and build innovative products.

If you can’t afford to pay for a course, you can apply for financial aid using this form. Learners with Financial Aid in a course will be able to access all of the course content and complete all work required to earn a certificate. Financial Aid only applies to the course that the Financial Aid application was approved for. Most courses offer Financial Aid, but Financial Aid may not be available for certain courses. It will take a minimum of 7 days for us to review your financial aid application. When your application is reviewed, you’ll get an email letting you know whether it’s been approved or denied.


Phrase matching using Apache Spark

Recently, a friend whose company is working on a large-scale project reached out to us for a solution to a seemingly simple problem: finding a list of phrases (approximately 80,000) in a huge set of rich text documents (approximately 6 million).

The problem looked simple at first. The engineers had solved it by loading the two datasets into Apache Spark DataFrames and joining them using “like”. Something along these lines:

select phrases.id, docs.id from phrases, docs where docs.txt like concat('%', phrases.phrase, '%')

But it was taking a huge amount of time even on a small subset of the data, despite the processing being done in a distributed fashion. Any guesses why?

They had also tried to use Apache Spark’s broadcast mechanism on the smaller dataset, but it was still taking a long time to finish even a small task.
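One plausible reason, offered here as an assumption rather than as the diagnosis reached in the original project, is that a join on “like” has no equality key, so every phrase ends up being compared against every document. The plain-Python sketch below mimics that behaviour on toy data and counts the substring checks; the function name and the sample data are hypothetical.

```python
def naive_phrase_match(phrases, docs):
    """Return (phrase_id, doc_id) matches and the number of substring checks made."""
    matches = []
    comparisons = 0
    for phrase_id, phrase in phrases.items():
        for doc_id, text in docs.items():
            comparisons += 1  # one substring test per (phrase, document) pair
            if phrase in text:
                matches.append((phrase_id, doc_id))
    return matches, comparisons


# Toy data, invented for illustration.
phrases = {1: "big data", 2: "machine learning"}
docs = {
    10: "an introduction to big data tools",
    11: "machine learning on big data",
    12: "cooking recipes",
}

matches, comparisons = naive_phrase_match(phrases, docs)
print(matches)      # [(1, 10), (1, 11), (2, 11)]
print(comparisons)  # 6, i.e. 2 phrases x 3 documents

# The same pattern at the scale quoted above:
full_scale_comparisons = 80_000 * 6_000_000
print(full_scale_comparisons)  # 480000000000
```

At roughly 480 billion substring checks, distributing the work helps only by a constant factor, which would be consistent with the slowness described above.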


Scholarship Test for Machine Learning Course

After receiving a huge response to our last scholarship test, we are back once again with a basic conceptual test that lets you earn a scholarship for our upcoming Specialization course on Machine Learning and Deep Learning.

Concepts to be tested: linear algebra, probability theory, statistics, multivariable calculus, algorithms and complexity, aptitude and data interpretation.

  • Date and Time: September 2, 2018, 8:00 am PDT (8:30 pm IST)
  • Type: objective (MCQ)
  • Number of questions: 25
  • Duration: 90 minutes
  • Mode: Online

If you have a good aptitude and general problem-solving skills, this test is for you. So, go ahead and earn what you deserve.

If you have any questions on the test or if anything else comes up, just click here to let us know. We’re always happy to help.

How to Teach Online Effectively

I founded KnowBigData.com in 2014 after working at Amazon. Teaching is my passion, and technology, specifically large-scale computing, is my forte, thanks to my working experience with Amazon, InMobi, D. E. Shaw and my own startup tBits Global. Therefore, I wanted to help people learn technology online. I launched KnowBigData.com as an online instructor-led training on MongoDB, followed by Big Data and Machine Learning. Eventually, we innovated a lot in learning and shaped KnowBigData into Cloudxlab.com, which is currently a major gamified learning environment for Machine Learning, AI, and Big Data.


How To Optimise A Neural Network?

When we are solving an industry problem involving neural networks, very often we end up with bad performance. Here are some suggestions on what should be done in order to improve the performance.

Is your model underfitting or overfitting?

You must break down the input data set into two parts – training and test. The general practice is to have 80% for training and 20% for testing.

You should train your neural network with the training set and test with the testing set. This sounds like common sense but we often skip it.

Compare the performance of your model (MSE in the case of regression; accuracy, F1, recall or precision in the case of classification) on the training set with its performance on the test set.

If it performs badly on both the training set and the test set, it is underfitting. If it performs well on the training set but not on the test set, it is overfitting.
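The check described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a recipe from the original post: the helper names are hypothetical, the scores are assumed to be “higher is better” metrics such as accuracy, and the thresholds good_enough and max_gap are arbitrary values that you would tune for your own problem.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and split them into (train, test), 80/20 by default."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def diagnose(train_score, test_score, good_enough=0.9, max_gap=0.1):
    """Classify a model from its train and test scores (higher is better).

    good_enough and max_gap are illustrative assumptions, not standard values.
    """
    if train_score < good_enough and test_score < good_enough:
        return "underfitting"  # bad on both sets
    if train_score - test_score > max_gap:
        return "overfitting"   # good on train, clearly worse on test
    return "ok"


train, test = train_test_split(range(100))
print(len(train), len(test))  # 80 20
print(diagnose(0.60, 0.58))   # underfitting
print(diagnose(0.99, 0.70))   # overfitting
print(diagnose(0.95, 0.93))   # ok
```

For an error metric such as MSE, where lower is better, the comparisons would need to be reversed.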


10 Things to Look for When Choosing a Big Data course / Institute

Every now and then, I see a new company coming up with Hadoop classes or courses, and my friends keep asking me which of these courses is worth taking. I gave them a few tips for choosing the course that suits them best. Here are those tips to help you decide which course to attend:

1. Does the instructor have domain expertise?

Know your instructor. You must know about the instructor’s background. Has (s)he done any big data related work? I have seen a lot of instructors who just attend a course somewhere and become instructors.

If the instructor has never worked in the domain, do not take such classes. Also, avoid training institutes that do not give you details about the instructor.

2. Is the instructor hands-on? When did (s)he last code?

In the domain of technology, there is a huge difference between an instructor who is hands-on in coding and one who teaches from theoretical knowledge alone. Also, find out when the instructor last worked on code. If the instructor has never coded, do not attend the class.

3. Does the instructor encourage & answer your questions?

There are many recorded free videos available across the internet. The only reason you would go for live classes would be to get your questions answered and doubts cleared immediately.

If the instructor does not encourage questions and answers, such classes are fairly useless.
