How to install Python packages on CloudxLab?

In this blog post, we will learn how to install Python packages on CloudxLab.

Step 1 – Create a virtual environment

Create a virtual environment for your project. A virtual environment keeps the dependencies required by different projects in separate places by giving each project its own isolated Python environment. Log in to the CloudxLab web console and create the environment there.
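The excerpt does not show the exact commands used on CloudxLab, so the following is only a minimal sketch. It assumes Python 3's standard `venv` module is available on the console. The helper name `create_project_env` is hypothetical. It creates an isolated environment and returns the path to that environment's own Python interpreter:

```python
import sys
import venv
from pathlib import Path


def create_project_env(env_dir, with_pip=True):
    """Create a virtual environment in env_dir and return the path to its Python interpreter.

    create_project_env is a hypothetical helper for illustration only.
    """
    env_path = Path(env_dir).expanduser().resolve()
    # EnvBuilder.create() builds the directory layout, writes pyvenv.cfg and
    # places a Python interpreter inside; with_pip=True also bootstraps pip
    # through the bundled ensurepip module.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(str(env_path))
    # The interpreter lives in Scripts\ on Windows and in bin/ elsewhere.
    if sys.platform == "win32":
        return env_path / "Scripts" / "python.exe"
    return env_path / "bin" / "python"
```

For example, `create_project_env("~/my_project_env")` creates the environment in a directory whose name is made up for illustration. Running the returned interpreter with `-m pip install <package>` then installs packages into that environment only. From the shell, `python3 -m venv <directory>` does the same job.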

Continue reading “How to install Python packages on CloudxLab?”

CloudxLab Reviews

Jose Manual Ramirez Leon

It is really a great site. As a 37-year-old with a master's
in mechanical engineering, I decided to switch careers
and get another master's. One of my courses was
Big Data, and at the beginning I was completely lost
and falling behind in my assignments. After
searching the internet for a solution, I finally found CloudxLab.

Not only do they have every conceivable Big Data
technology on their servers, but they also have superb
customer support. Whenever I have had a question,
even when debugging my own programs, they have
answered with the correct solution within a few hours.

I earnestly recommend it to everyone.

Continue reading “CloudxLab Reviews”

Building Real-Time Analytics Dashboard Using Apache Spark

In this blog post, we will learn how to build a real-time analytics dashboard using Apache Spark Streaming, Kafka, Node.js, Socket.IO and Highcharts.

Problem Statement

An e-commerce portal (http://www.aaaa.com) wants to build a real-time analytics dashboard that shows the number of orders shipped each minute, so that it can improve the performance of its logistics.
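The number the dashboard displays comes from a simple aggregation: count the "shipped" events that fall into each one-minute window. The sketch below shows only that counting step, in plain Python, without Spark or Kafka. The event format, a tuple of (ISO timestamp, status), and the sample data are hypothetical. In the pipeline this post describes, the same kind of counting would run in Spark on the order stream coming from Kafka.

```python
from collections import Counter
from datetime import datetime


def shipped_orders_per_minute(events):
    """Count 'shipped' order events per minute.

    Assumes each event is a hypothetical (iso_timestamp, status) tuple.
    """
    counts = Counter()
    for timestamp, status in events:
        if status != "shipped":
            continue
        # Truncate the timestamp to the start of its minute (a 1-minute tumbling window).
        minute = datetime.fromisoformat(timestamp).replace(second=0, microsecond=0)
        counts[minute] += 1
    # Return (minute, count) pairs in chronological order, ready to plot on a time axis.
    return sorted(counts.items())


# Made-up sample events for illustration.
sample = [
    ("2024-01-01T20:00:05", "shipped"),
    ("2024-01-01T20:00:40", "processing"),
    ("2024-01-01T20:00:59", "shipped"),
    ("2024-01-01T20:01:10", "shipped"),
]
for minute, n in shipped_orders_per_minute(sample):
    print(minute.strftime("%H:%M"), n)
```

For the sample data, minute 20:00 has two shipped orders and minute 20:01 has one. The "processing" event is ignored.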

Solution

Before working on the solution, let’s take a quick look at all the tools we will be using:

Apache Spark – A fast and general engine for large-scale data processing. According to the Spark project, it can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Learn more about Apache Spark here

Python – Python is a widely used high-level, general-purpose, interpreted, dynamic programming language. Learn more about Python here

Kafka – A high-throughput, distributed, publish-subscribe messaging system. Learn more about Kafka here

Node.js – Event-driven I/O server-side JavaScript environment based on V8. Learn more about Node.js here

Socket.IO – A JavaScript library for real-time web applications. It enables real-time, bidirectional communication between web clients and servers. Read more about Socket.IO here

Highcharts – Interactive JavaScript charts for web pages. Read more about Highcharts here

CloudxLab – Provides a real cloud-based environment for practicing and learning various tools. You can start practicing right away by signing up online.

How To Build A Data Pipeline?

Below is the high-level architecture of the data pipeline:

Data Pipeline

Our real-time analytics dashboard will look like this:

Real-Time Analytics Dashboard

Continue reading “Building Real-Time Analytics Dashboard Using Apache Spark”

Running PySpark in Jupyter / IPython notebook

We are glad to announce that you can now run PySpark code in a Jupyter notebook on CloudxLab.

What is Jupyter notebook?

The IPython Notebook is now known as the Jupyter Notebook. It is an interactive computational environment in which you can combine code execution, rich text, mathematics, plots and rich media. For more details on the Jupyter Notebook, please see the Jupyter website.

Please follow the steps below to access the Jupyter notebook on CloudxLab:

Step 1 – Log in to the web console

Continue reading “Running PySpark in Jupyter / IPython notebook”

Access S3 Files in Spark

In this blog post, we will learn how to access S3 files using Spark on CloudxLab.
Please follow the steps below to access S3 files:

Access Spark 1.2.1, Spark 1.6 and Spark 2.0 on CloudxLab

In this blog post, we will learn how to access various versions of Spark on CloudxLab. Spark 1.2.1 will be helpful if you are preparing for the CCA (Cloudera Certified Associate) exam. Spark 1.6 will be useful for practicing SparkR. Please note that Spark 1.2.1, Spark 1.6 and Spark 2.0.1 may not integrate tightly with Hadoop, but you will be able to run most of the commands.

How to access Spark 1.2.1?

Continue reading “Access Spark 1.2.1, Spark 1.6 and Spark 2.0 on CloudxLab”

CloudxLab Getting Started Guide

Please use the resources below to make the most of your CloudxLab subscription:

CloudxLab hands-on videos

Hadoop videos on CloudxLab

Spark videos on CloudxLab

Stream Processing Using Apache Spark and Kafka

Thank you all for your overwhelming response to our “Stream Processing using Apache Spark and Apache Kafka” session in the “Apache Spark Hands-On” series, held on June 15, 2016 at 8:00 pm IST.

Key takeaways:

+ Introduction to Apache Spark
+ Introduction to stream processing
+ Understanding RDD (Resilient Distributed Datasets)
+ Understanding DStream
+ Introduction to Kafka
+ Understanding Stream Processing flow
+ Real-time hands-on using CloudxLab
+ Questions and Answers

Continue reading “Stream Processing Using Apache Spark and Kafka”

Apache Spark Introduction

Thank you all for your overwhelming response to our “Apache Spark Introduction” session in the “Apache Spark Hands-On” series, held on April 28, 2016 at 8:00 pm IST.

Presented By
Sandeep Giri

Key takeaways for this webinar were:

+ Introduction to Apache Spark
+ Introduction to RDD (Resilient Distributed Datasets)
+ Loading data into an RDD
+ RDD Operations – Transformation
+ RDD Operations – Actions
+ Hands-on demos using CloudxLab
+ Questions and Answers
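The difference between RDD transformations and actions is easiest to see in a small example. In Spark, transformations such as `map` and `filter` are lazy and only record how a new dataset is derived. Actions such as `collect` and `count` trigger the actual computation. The toy class below mimics that behaviour in plain Python on a single machine. `ToyRDD` is a hypothetical name, and the class is not Spark's API or implementation:

```python
class ToyRDD:
    """A toy, single-machine stand-in for an RDD (hypothetical; not the Spark API).

    Transformations return a new ToyRDD that only records the step to apply;
    actions walk the recorded lineage and actually compute results.
    """

    def __init__(self, source, steps=()):
        self._source = source       # the original data
        self._steps = tuple(steps)  # recorded lineage of transformations

    # Transformations: lazy, they only extend the lineage.
    def map(self, fn):
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    # Actions: evaluate the whole lineage from the source data.
    def _compute(self):
        data = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())


calls = []


def square(x):
    calls.append(x)  # record when the function actually runs
    return x * x


numbers = ToyRDD(range(1, 6))
squares = numbers.map(square).filter(lambda x: x % 2 == 1)  # nothing computed yet
print(calls)              # []
print(squares.collect())  # [1, 9, 25]
```

`square` is not called until `collect()` runs. This mirrors the principle that building a chain of transformations costs nothing until an action asks for a result.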

Continue reading “Apache Spark Introduction”

CloudxLab Introduction

What is CloudxLab?

CloudxLab is a cloud-based virtual lab for practicing Big Data (Hadoop, Spark, etc.), Machine Learning and Deep Learning technologies.

Origins

While training students on Big Data technologies at KnowBigData, we realized that our learners had a lot of trouble downloading and configuring the virtual machines (VMs) provided by major Hadoop vendors. These VMs were often slow, and they made it hard to use any other application on the same computer.

Moreover, working on a VM did not give a real-world experience, because one is still dealing with a single machine instead of a cluster of machines. A cluster is the whole idea behind Big Data technologies, which are primarily based on distributed computing.

CloudxLab was conceived to resolve these pain points for learners. The video below shows how one of our clients, Simplilearn, is using CloudxLab to provide a better learning experience to its course takers.

Continue reading “CloudxLab Introduction”