Tutorials Archives

Using TensorFlow on CloudxLab

We are glad to announce that TensorFlow is now available on CloudxLab. In this post, we will walk you through a basic tutorial on how to use TensorFlow.

What is TensorFlow?
TensorFlow is an open-source software library for machine intelligence. It is developed and supported by Google and is being adopted rapidly.

What is CloudxLab?
CloudxLab provides a real cloud-based environment for practicing and learning various tools. You can start learning right away by signing up online.


Access S3 Files in Spark

In this blog post, we will learn how to access S3 files using Spark on CloudxLab.
Follow the steps below to access S3 files:

# Log in to the web console

# Specify the Hadoop config
export YARN_CONF_DIR=/etc/hadoop/conf/
export HADOOP_CONF_DIR=/etc/hadoop/conf/

# Specify the Spark classpath
export SPARK_CLASSPATH="$SPARK_CLASSPATH:/usr/hdp/current/hadoop-client/hadoop-aws.jar"
export SPARK_CLASSPATH="$SPARK_CLASSPATH:/usr/hdp/current/hadoop-client/lib/aws-java-sdk-1.7.4.jar"
export SPARK_CLASSPATH="$SPARK_CLASSPATH:/usr/hdp/current/hadoop-client/lib/guava-11.0.2.jar"

# Launch the Spark shell
/usr/spark1.6/bin/spark-shell

// In the Spark shell, specify the AWS keys
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_AWS_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_AWS_SECRET_ACCESS_KEY")

// Now access S3 files using Spark
// Create an RDD from the S3 file
val nationalNames = sc.textFile("s3n://cxl-spark-test-data/sss/baby-names.csv")

// Check the first line
nationalNames.take(1)
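
The classpath step above appends each jar unconditionally, so a missing jar only shows up later as a class-loading error inside the Spark shell. As a minimal sketch, the same step could be written with an existence check. The helper name `build_classpath` is hypothetical and not part of the original post; the jar paths are the ones listed above and may differ on your cluster image.

```shell
# Hypothetical helper (not from the original post): join the jar paths that
# actually exist into a colon-separated classpath and warn about missing ones.
build_classpath() {
  cp=""
  for jar in "$@"; do
    if [ -f "$jar" ]; then
      # ${cp:+$cp:} adds the ':' separator only when cp is already non-empty
      cp="${cp:+$cp:}$jar"
    else
      echo "warning: jar not found: $jar" >&2
    fi
  done
  printf '%s\n' "$cp"
}

# Jar locations taken from the steps above; adjust if your image differs.
HADOOP_CLIENT=/usr/hdp/current/hadoop-client
extra="$(build_classpath \
  "$HADOOP_CLIENT/hadoop-aws.jar" \
  "$HADOOP_CLIENT/lib/aws-java-sdk-1.7.4.jar" \
  "$HADOOP_CLIENT/lib/guava-11.0.2.jar")"

# Append to any existing SPARK_CLASSPATH without leaving a leading colon.
if [ -n "$extra" ]; then
  export SPARK_CLASSPATH="${SPARK_CLASSPATH:+$SPARK_CLASSPATH:}$extra"
fi
```

Run this in the web console before launching the Spark shell. Any warning printed here points at a jar path that needs correcting.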

Access Spark 1.6 and Spark 2.3 on CloudxLab

To access Spark 2.3, type the commands below in the web console:

pyspark (For Python)
spark-shell (For Scala)

To access Spark 1.6, first set the version in the web console:

export SPARK_MAJOR_VERSION=1

Then type the commands below in the web console:

pyspark (For Python)
spark-shell (For Scala)
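
If you switch between versions often, the steps above can be wrapped in a small shell function. This is only a sketch: the name `use_spark` is hypothetical, and the Spark 2 branch assumes that leaving `SPARK_MAJOR_VERSION` unset returns the console to its default, which this post describes as Spark 2.3.

```shell
# Hypothetical helper (not from the original post): choose the Spark major
# version for the current shell session.
use_spark() {
  case "$1" in
    1)
      # Spark 1.6, as described above
      export SPARK_MAJOR_VERSION=1
      ;;
    2)
      # Assumption: with the variable unset, the console uses its default,
      # which the text above describes as Spark 2.3.
      unset SPARK_MAJOR_VERSION
      ;;
    *)
      echo "usage: use_spark 1|2" >&2
      return 1
      ;;
  esac
}

# Example session:
#   use_spark 1
#   spark-shell      # or: pyspark
```

The setting applies only to the shell session in which the function is called, so a new web console tab starts from the default again.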

CloudxLab Getting Started Guide

Use the resources below to make the most of your CloudxLab subscription.

You can find the link to the complete getting started guide here.

CloudxLab hands-on videos

Hadoop videos on CloudxLab

Spark videos on CloudxLab

What is CloudxLab?

CloudxLab is a cloud-based virtual lab for practicing Big Data (Hadoop, Spark, etc.), Machine Learning, and Deep Learning technologies.

Origins

While training students on Big Data technologies at KnowBigData, we realized that our learners were facing a lot of trouble downloading and configuring the virtual machines (VMs) provided by major Hadoop vendors. Most often, these virtual machines were slow and did not allow the use of any other application on the same computer.

Moreover, working on a VM did not give a real-world experience, since one is still dealing with a single machine rather than a cluster of machines. Working on a cluster is the whole idea of Big Data technologies, which are primarily based on distributed computing.

This is how CloudxLab was conceptualized, in an effort to resolve these pain points for learners. The video below shows how one of our clients, Simplilearn, is using CloudxLab to provide a better learning experience to their course takers.
