A Simple Tutorial on Linux – Part-2

This post is a continuation of A Simple Tutorial on Linux – Part-1.

In Part-1, we covered the following Linux topics:

  • Linux Operating System
  • Linux Files & Process
  • The Directory Structure
  • Permissions
  • Process

Keeping up the same pace, we will learn the following topics in the 2nd part of the Linux series.

  • Shell Scripting
  • Networking
  • Files & Directories
  • Chaining Unix Commands
  • Pipes
  • Filters
  • Word Count Exercise
  • Special System commands
  • Environment variables

Writing first shell script

A shell script is a file containing a list of commands. Let’s create a simple script that prints a three-word greeting:

1. Open a text editor to create a file myfirstscript.sh:

nano myfirstscript.sh

2. Write the following into the editor:

#!/bin/bash
name=linux
echo "hello $name world"

Note: In Unix, the file extension does not determine which program executes a script. The first line of the script (the “shebang” line) does. In the example above, that program is “/bin/bash”, which is a Unix shell.

3. Press Ctrl+X to exit, press “y” when asked whether to save the changes, and then press Enter to confirm the file name.

4. By default, the new file does not have executable permission. You can make it executable like this:

chmod +x myfirstscript.sh

5. To run the script, use:

./myfirstscript.sh
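
The steps above can also be tried without an editor. The following is a minimal sketch, assuming bash is installed at /bin/bash and that the temporary directory allows running files; the file name greeting.txt is a hypothetical name chosen only to show that the extension does not matter.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create the script with a here-document instead of an editor.
# The quoted 'EOF' keeps $name from being expanded while writing the file.
cat > myfirstscript.sh <<'EOF'
#!/bin/bash
name=linux
echo "hello $name world"
EOF

# Add the execute permission, then run the script and capture what it prints.
chmod +x myfirstscript.sh
output=$(./myfirstscript.sh)
echo "$output"    # prints: hello linux world

# Copy the script under a hypothetical name with a different extension.
# The first line still selects /bin/bash, so it runs the same way.
cp -p myfirstscript.sh greeting.txt
renamed_output=$(./greeting.txt)
echo "$renamed_output"
```

Both runs print the same greeting, because the first line of the file picks the interpreter and the file name plays no part in that choice.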

A Simple Tutorial on Linux – Part-1

This Linux tutorial series is divided into two blog posts. Each post covers basic concepts with practical examples. We have also provided quizzes on some of the topics, which you can take for free.

In the first part of the series, we will learn the following topics in detail:

  • Linux Operating System
  • Linux Files & Process
  • The Directory Structure
  • Permissions
  • Process

Introduction

Linux is a Unix-like operating system. It is open source and free. We might sometimes use the word “Unix” instead of Linux.

A user can interact with Linux using either a ‘graphical interface’ or the ‘command line interface’.

The command line interface has a steeper learning curve than the graphical interface, but it makes automation much easier. Most server-side work is also done using the command line interface.

Linux Operating System

The operating system is made of three parts:

1. The Programs

A user executes programs. For example, AngryBird is a program that gets executed by the kernel. When a program is launched, it creates processes. In this tutorial, “program” and “process” are used interchangeably.

2. The Kernel

The Kernel handles the main work of an operating system:

  • Allocates time & memory to programs
  • Handles File System
  • Responds to various Calls

3. The Shell

A user interacts with the Kernel via the Shell. The console opened in the previous slide is the shell. A user types instructions into the shell to execute commands. The shell is itself a program that keeps prompting you for the name of another program to run.
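
The idea that the shell is a program that keeps asking for other programs to run can be sketched in a few lines of bash. This is only an illustration, not how a real shell is implemented: the function name tiny_shell is hypothetical, no prompt is printed, and quoting, pipes, and variables are not handled.

```shell
#!/bin/bash
# tiny_shell: hypothetical helper that mimics the read-and-run cycle of a shell.
tiny_shell() {
    local line
    while IFS= read -r line; do
        [ "$line" = "exit" ] && break   # stop when the user types "exit"
        [ -z "$line" ] && continue      # ignore empty lines
        $line                           # run the named program with its arguments
    done
}

# Feed two commands through the loop instead of typing them interactively.
result=$(printf 'echo hello\necho from the loop\nexit\n' | tiny_shell)
echo "$result"
```

Each line read from the input is treated as a program name followed by its arguments, which is the same cycle a user goes through when typing commands at the console.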

A Successful Machine Learning Bootcamp

CloudxLab has hosted several webinars in the past, and all of them have been successful. This time, we wanted to try something different, so we all sat together and decided to do an offline meetup for Machine Learning. Though we had done some in the past, the engagement and interaction that one can get in an online webinar are not comparable. We then got in touch with Drupal Bangalore, who were hosting an event at R. V. College of Engineering, and one of the topics was Introduction to Machine Learning. We saw this as a good opportunity to bring our knowledge to the offline circle too.

Machine Learning Bootcamp

It all happened on Nov 17, when Machine Learning enthusiasts gathered to attend the one-day workshop on Machine Learning. The presenter was Mr. Sandeep Giri, who has over 15 years of experience in Machine Learning and Big Data technologies. He has worked at companies like Amazon, InMobi, and D. E. Shaw.

Machine Learning Bootcamp – Introduction and Hands-on @ RV College of Engineering, Bangalore

We have a one-day workshop on Introduction to Machine Learning with Drupal Bangalore. In this workshop, you will learn how to apply various Machine Learning techniques for everyday business problems.

  • Date: Saturday, Dec 16, 2017
  • Place: R. V. College of Engineering, Bangalore
  • Time: 11.30 am – 1.30 pm: Presentation and Demo, 2.30 pm – 4.30 pm: Hands-on

What will be covered?

An exposure to Machine Learning using Python to analyze, draw intelligence and build powerful models using real-world datasets. You’ll also gain the insights to apply data processing and Machine Learning techniques in real time.

After completing this workshop, you will be able to build and optimize your own automated classifier to extract insights from real-world data sets.

NumPy and Pandas Tutorial – Data Analysis with Python

Python is increasingly being used as a scientific language. Matrix and vector manipulations are extremely important for scientific computations. Both NumPy and Pandas have emerged as essential libraries for scientific computation in Python, including machine learning, due to their intuitive syntax and high-performance matrix computation capabilities.

In this post, we will provide an overview of the common functionalities of NumPy and Pandas. We will see how similar these libraries are to existing toolboxes in R and MATLAB. This similarity and the added flexibility have resulted in wide acceptance of Python in the scientific community lately. The topics covered in this post are:

  1. Overview of NumPy
  2. Overview of Pandas
  3. Using Matplotlib

This post is an excerpt from a live hands-on training conducted by CloudxLab on 25th Nov 2017. It was attended by more than 100 learners from around the globe. The participants came from the United States, Canada, Australia, Indonesia, India, Thailand, Philippines, Malaysia, Macao, Japan, Hong Kong, Singapore, United Kingdom, Saudi Arabia, Nepal, and New Zealand.

Introduction to Machine Learning – An Informative Webinar

On November 3, CloudxLab conducted a successful webinar on “Introduction to Machine Learning”. It was a 3-hour session in which the instructor shed some light on Machine Learning and its terminology.

It was attended by more than 200 learners from around the globe. The participants came from the United States, Canada, Australia, Indonesia, India, Thailand, Philippines, Malaysia, Macao, Japan, Hong Kong, Singapore, United Kingdom, Saudi Arabia, Nepal, and New Zealand.

Presented By

Sandeep Giri - Instructor for the Machine Learning webinar

Topics Covered in The Webinar

  • What is Machine Learning?
  • Automating Mario Game
  • The Machine Learning Tsunami
  • Collecting Data
  • Processing Data
  • Spam filter Using Traditional and Machine Learning
  • What is AI?
  • Sub-objectives of AI
  • Different Types of Machine Learning
  • Artificial Neural Network
  • Introduction to Deep Learning
  • TensorFlow Demo
  • Machine Learning Frameworks
  • Deep Learning Frameworks

What is Big Data? An Easy Introduction to Big Data Terminologies

Unless you’ve been living under a rock, you must have heard or read the term Big Data. But many people don’t know what Big Data actually means, and even those who do often lack a clear definition. If you’re one of them, don’t be disheartened. By the time you finish reading this article, you will have a clear idea about Big Data and its terminology.

What is Big Data?

In very simple words, Big Data is data so large that it cannot be processed with usual tools like file systems and relational databases. To process such data, we need a distributed architecture. In other words, we need multiple systems working on the data to achieve a common goal.

AutoQuiz: Generating ‘Fill in the Blank’ Type Questions with NLP

Can a machine create a quiz that is good enough for testing a person’s knowledge of a subject?

So, last Friday, we wrote a program that can create simple ‘Fill in the blank’ type questions based on any valid English text.

This program identifies the sentences in a text. For each sentence, it first tries to delete a proper noun; if there is no proper noun, it deletes a noun.

We are using textblob, which is essentially a wrapper over NLTK. The Natural Language Toolkit, more commonly known as NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing of English, written in the Python programming language.

Python Setup Using Anaconda For Machine Learning and Data Science Tools

Python for Machine Learning

In this post, we will learn how to configure tools required for CloudxLab’s Python for Machine Learning course. We will use Python 3 and Jupyter notebooks for hands-on practicals in the course. Jupyter notebooks provide a really good user interface to write code, equations, and visualizations.

Please choose one of the options listed below for practicals during the course.

Top 50 Apache Spark Interview Questions And Answers

Here are the top Apache Spark interview questions and answers. The big data space is growing massively and job opportunities are skyrocketing, making this the perfect time to launch your career in this space.

Our experts have curated these questions to give you an idea of the type of questions that may be asked in an interview. We hope this guide to Apache Spark interview questions and answers will help you prepare for your next interview.

Spark Interview Questions

1. What is Apache Spark and what are the benefits of Spark over MapReduce?

  • Spark is really fast. When run in-memory, it can be up to 100x faster than Hadoop MapReduce.
  • In Hadoop MapReduce, you write many MapReduce jobs and then tie them together using Oozie or a shell script. This mechanism is time-consuming, and MapReduce tasks have high latency: between two consecutive MapReduce jobs, the data has to be written to HDFS and read back from it. Spark avoids this by using RDDs and keeping data in memory (RAM). Quite often, translating the output of one MapReduce job into the input of another also requires writing additional code because Oozie may not suffice.
  • In Spark, you can do basically everything from a single program or console (PySpark or Scala console) and get the results immediately. Switching between running something on a cluster and doing something locally is easy and straightforward. This also means less context switching for the developer and more productivity.
  • Spark is roughly equivalent to MapReduce and Oozie put together.

Watch this video to learn more about benefits of using Apache Spark over MapReduce.
