In this blog post, we will learn how to access files stored in Amazon S3 using Spark on CloudxLab.
Follow the steps below to access S3 files:
#Log in to the Web Console
#Specify the Hadoop configuration
#Specify the Spark classpath
#Launch the Spark shell
#In the Spark shell, specify the AWS keys
#Now access S3 files using Spark
#Create an RDD from the S3 file
val nationalNames = sc.textFile("s3n://cxl-spark-test-data/sss/baby-names.csv")
#Check the first line of the RDD
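The commands for setting the AWS keys and checking the first line are not shown in the steps above. In the Spark shell they might look like the following. This is a minimal sketch, not necessarily the exact commands used on CloudxLab. It assumes the s3n connector reads the standard Hadoop properties `fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey`, and it assumes you exported your keys as the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` before launching the shell.

```scala
// Paste into spark-shell, where `sc` (the SparkContext) is predefined.
// ASSUMPTION: credentials were exported as AWS_ACCESS_KEY_ID and
// AWS_SECRET_ACCESS_KEY before launching the shell; sys.env(...) throws
// a NoSuchElementException if either variable is missing.
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
hadoopConf.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

// Create an RDD from the S3 file. textFile is lazy, so nothing is read yet.
val nationalNames = sc.textFile("s3n://cxl-spark-test-data/sss/baby-names.csv")

// first() is an action: this is where Spark actually contacts S3
// and returns the first line of the file.
val firstLine: String = nationalNames.first()
println(firstLine)
```

Reading the keys from the environment keeps them out of the shell history and out of any notebook you share. Newer Hadoop releases have removed the s3n connector in favor of s3a, which uses different property names, so the properties above apply only to `s3n://` paths like the one in this post.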
Adding to an already impressive list of collaborations, the International School of Engineering (INSOFE) has recently partnered with CloudxLab (CxL). The partnership will enable INSOFE’s students to practice in real-world scenarios using CloudxLab’s cloud-based labs.
INSOFE’s flagship program, CPEE (Certificate Program in Engineering Excellence), was created to transform “individuals into analytics professionals.” CIO.com lists it at #3, between Columbia at #2 and Stanford at #4, and INSOFE is the only institute outside the US to earn a spot on that list. It achieved this within an admirable three years of its inception. Having established itself as one of the top institutes globally, INSOFE is constantly on the lookout for innovative ways to engage its students and enhance their experience.
In a recent strategic partnership that demonstrates SCMHRD’s superior vision in pedagogy, its Post Graduate Program in Business Analytics (PGPBA) has tied up with the well-known learning innovation firm CloudxLab. Through this partnership, SCMHRD’s students will learn and work with Big Data and analytics tools the same way enterprises learn and use them.
SCMHRD’s flagship analytics program, PGPBA, emphasizes Big Data analytics rather than standard analytics, which makes it relevant to a wider range of employers and hence the better choice. This emphasis isn’t easy to cater to. Providing Big Data tools to learners means providing a cluster (a group of networked computers) for them to practice on, which in turn translates to expensive infrastructure, large support teams, and the operational costs that come with them.
CloudxLab is a cloud-based virtual lab for practicing Big Data (Hadoop, Spark, etc.), Machine Learning, and Deep Learning technologies.
While training students on Big Data technologies at KnowBigData, we realized that our learners had a lot of trouble downloading and configuring the virtual machines (VMs) provided by major Hadoop vendors. These VMs were often slow and made it hard to use any other application on the same computer.
Moreover, working on a VM did not give a real-world experience: learners were still dealing with a single machine rather than a cluster, even though the whole idea of Big Data technologies is distributed computing across many machines.
This is how CloudxLab was conceived: as an effort to resolve these pain points for learners. The video below shows how one of our clients, Simplilearn, uses CloudxLab to provide a better learning experience to its course takers.