Future Of Mobility – Shaped By Big Data & AI

Advancements in the field of Big Data & Artificial Intelligence (AI) are occurring at an unprecedented pace, and everyone from researchers to engineers to common folk is wondering how their lives will be affected. While almost every industry expects significant disruption from advancements in Big Data & AI, I believe the industry that will experience the greatest impact will be the Automotive or Transportation industry. Here is my perspective on how Big Data & AI will change the landscape of the Automotive & Transportation industry. It should appeal to engineers as well as to common folk interested in technological developments. I will discuss the challenges and the existing solution, and then propose two alternative solutions.

Challenges in the Automotive Industry

The automotive industry is currently grappling with the following issues:

  • Pollution: Almost all automobiles burn fossil fuels in an “Internal Combustion Engine” (ICE). The emissions from this process include carbon dioxide, carbon monoxide, oxides of nitrogen and other pollutants, and they are harming public health throughout the world.
  • Congestion: As the number of people commuting by automobile increases every day, congested roads and traffic jams are becoming more and more frequent.
  • Accidents: Most automobiles currently in use are driven manually, and many accidents are caused by errors committed by human drivers.

Existing Solution

Self-driving electric cars

It is widely believed that in the future, self-driving electric cars will solve many of the issues currently faced by the automotive industry. Since these cars will run on electrical power, they are expected to produce no tailpipe emissions. Since they will drive themselves, accidents due to human error are also expected to be eliminated. Furthermore, these cars will automatically choose their routes based on real-time traffic information for all possible roads, so congestion too is expected to ease. All these features will become possible through Big Data enabled AI:

Visual Image Processing: The most crucial aspect of self-driving automobiles will be the ability of their image processing system to correctly identify road boundaries, obstacles on the road and broken road surfaces. This identification is possible through AI systems powered by Big Data. The Visual Image Processing system will first need to convert the captured real-time analog visuals into a digital format. The digitized images then need to be matched against a database of previously encountered images, their actual interpretations and the corresponding remedial course of action. Every such action needs to be recorded in the database for future use. Particularly challenging will be partial road blockages or harmful substances spilled on the road. Such cases will truly test the Big Data enabled AI mechanism of the image processing system, which will need to accurately interpret unexpected road situations and respond to them.
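The matching step described above can be illustrated with a nearest-neighbour lookup. This is a minimal sketch under stated assumptions, not how any production self-driving system works: it assumes each camera frame has already been reduced to a small numeric feature vector, and the stored scenes, labels, actions and the `interpret_frame`/`remember` helpers are all hypothetical.

```python
import math

# Hypothetical database of previously encountered scenes. Each entry pairs a
# feature vector (assumed to come from an upstream image-processing step)
# with its interpretation and the remedial action that was taken.
KNOWN_SCENES = [
    ((0.9, 0.1, 0.0), "clear road", "continue"),
    ((0.2, 0.8, 0.1), "partial blockage", "slow down and change lane"),
    ((0.1, 0.2, 0.9), "spilled substance", "stop and reroute"),
]


def interpret_frame(features, database, max_distance=0.5):
    """Return (interpretation, action) of the closest stored scene.

    If no stored scene lies within max_distance, the frame is treated as an
    unfamiliar situation that needs a safe fallback.
    """
    closest = min(database, key=lambda entry: math.dist(features, entry[0]))
    if math.dist(features, closest[0]) > max_distance:
        return ("unknown", "fall back to a safe stop")
    return (closest[1], closest[2])


def remember(features, interpretation, action, database):
    """Store a newly interpreted scene so that future frames can match it."""
    database.append((tuple(features), interpretation, action))
```

A frame close to a stored scene returns that scene's interpretation and action; a frame far from everything in the database is flagged as unknown, which is the kind of unexpected situation the paragraph warns about. Calling `remember` afterwards corresponds to updating the database for future use.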

Navigation: The next crucial challenge is that the navigation system of self-driving automobiles should be able to access, interpret and analyze real-time traffic information to decide on the best possible route from the source to the destination. There are likely to be many possible routes between the source and the destination, each with a different length, different road conditions and a different level of real-time traffic. The Big Data powered AI system of the self-driving automobile should be able to weigh these variables according to the requirements and priorities of the traveller so as to decide on the best-suited route. Developments in the Internet of Things (IoT) will ensure that all the required information is available to the navigation system of the automobile, and AI capabilities will ensure that the automobile can process the information accurately.
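Choosing the best route while weighing length, road condition and live traffic against the traveller's priorities is a weighted shortest-path problem. The sketch below uses Dijkstra's algorithm on a tiny hypothetical road network; the edge attributes, the cost formula in `edge_cost` and the priority weights are illustrative assumptions, not a description of any real navigation system.

```python
import heapq

# Hypothetical road network: each edge is (next_node, length_km,
# condition_penalty, traffic_factor). Condition 0 = good road, 1 = poor road;
# traffic 1.0 = free flow, higher = slower.
ROADS = {
    "source": [("A", 5.0, 0.0, 1.0), ("B", 3.0, 0.5, 2.0)],
    "A": [("destination", 4.0, 0.0, 1.2)],
    "B": [("destination", 2.0, 1.0, 1.0)],
    "destination": [],
}


def edge_cost(length, condition, traffic, weights):
    """Combine the road variables according to the traveller's priorities."""
    return (weights["length"] * length
            + weights["condition"] * condition * length
            + weights["traffic"] * traffic * length)


def best_route(graph, start, goal, weights):
    """Dijkstra's algorithm using the weighted edge cost above."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, length, condition, traffic in graph[node]:
            if nxt not in visited:
                step = edge_cost(length, condition, traffic, weights)
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return float("inf"), []
```

With weights that value only distance, the shorter route via B wins; a traveller who cares about traffic and road quality instead gets the longer but smoother route via A.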

Advantages:

  • Electric automobiles will not cause pollution while driving
  • Self-driving capabilities will provide convenience and possibly reduce accidents

Limitations:

  • Since electricity itself is produced by a variety of means, including burning fossil fuels and nuclear fission reactors, the overall reduction in pollution may not be very significant
  • An almost infinite range of situations can arise on the road, such as small sharp objects, acids or inflammable substances being spilled on it. These will test the capabilities of the AI systems of self-driving automobiles
  • Since each self-driving automobile will operate on its own algorithms and systems, unexpected behaviour from one automobile can upset previously predicted traffic estimates
  • As the number of electric automobiles increases and each carries a greater electrical charge for increased travel range, many automobiles carrying large amounts of electrical charge and passing each other at high speeds in very close proximity could create unexpected electromagnetic or fire hazards

Alternative Solutions

1. Reducing pollution from Internal Combustion Engine (ICE):

Very recently, the Swedish automaker Volvo announced plans to scrap the internal combustion engine and shift entirely to electric automobiles by 2040. The Economist magazine published an article proclaiming the demise of the internal combustion engine (ICE). However, in complete contrast to these developments, the Japanese automaker Mazda announced a major breakthrough that would significantly reduce emissions from such engines. The great dilemma confronting the automotive industry, therefore, is whether to discard the ICE or to make significant investments and efforts to improve its efficiency and reduce its emissions.

My personal opinion, bolstered by the recent breakthrough from Mazda, is that, alongside other areas of engineering, advancements in Big Data & AI can enable researchers to develop superior ICE designs that are much more efficient and much less polluting than the current ones. Doing so will require rapidly deploying and testing new modifications, and equally rapidly implementing further changes based on the results of previous modifications. Such deployment, testing and performance data needs to be collated from as many cases as possible. Furthermore, automakers need to test newer types of fuel, such as fossil/bio-fuel blends, that are inherently less polluting than the fossil fuels currently used. Only through Big Data powered AI can automakers experiment with, deploy, test and analyse the maximum possible combinations of ICE configurations and fuels.

Advantages:

  • Superior ICE technology coupled with inherently less polluting fuel can offer a credible alternative to all-electric cars and give people more options
  • Fuel prices are expected to decline significantly in the future
  • Using fuel directly to power the ICE is more appealing than using the same fuel, or coal, to first produce electricity and then using that electricity to power electric automobiles
  • Storing fuel is generally easier and safer than storing electricity

Limitations:

  • If it becomes possible in the future to generate all the required electricity from solar cells at prices comparable to those of fossil fuels, this approach will lose significance

2. Centrally controlled and routed Electric pods:

In my view, the goal of automated electric mobility will be better achieved through a “Track-Network-based Centrally Controlled & Routed On-Demand Pod System”. In this system, electrically propelled pods will run on a track network that is separate from the general road network. The user of each pod will enter their destination, and the automated central routing station (ACRS) will determine the exact route and speeds for that pod's entire journey before the journey starts. This is possible because the ACRS itself decides on and executes the routes of all the pods on this segregated track network. And since the network is separate from ordinary road traffic, trespassing and road accidents should not cause any unexpected disruptions.

However, this system will involve centralised route determination and execution for all the pods on the network. Hence, there will be a need to capture, process, analyse, interpret and act upon huge amounts of real-time data. This will be possible only through Big Data enabled AI systems powering the ACRS.
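One way to picture the ACRS planning a pod's whole journey before it starts is as a central reservation table of track segments and time slots. The following is a toy sketch under simplifying assumptions: each pod's route is already chosen, every segment takes one time slot to traverse, and a segment holds one pod per slot. `CentralRoutingStation` and its methods are hypothetical names, and a real system would also need to handle speeds, junction conflicts and re-planning.

```python
class CentralRoutingStation:
    """Toy model of an automated central routing station (ACRS).

    Assumptions: a pod's route is a known list of track segments, each
    segment takes exactly one time slot, and a segment can hold only one
    pod per time slot.
    """

    def __init__(self):
        # (segment, time_slot) -> id of the pod that reserved it
        self.reservations = {}

    def _is_free(self, route, departure):
        """True if every (segment, slot) on this journey is unreserved."""
        return all((segment, departure + i) not in self.reservations
                   for i, segment in enumerate(route))

    def schedule(self, pod_id, route, earliest_departure=0):
        """Reserve the entire journey before the pod starts moving.

        Returns the departure slot and the timetable [(slot, segment), ...].
        """
        departure = earliest_departure
        while not self._is_free(route, departure):
            departure += 1  # hold the pod back until its whole path is clear
        timetable = []
        for i, segment in enumerate(route):
            self.reservations[(segment, departure + i)] = pod_id
            timetable.append((departure + i, segment))
        return departure, timetable
```

A second pod requesting the same route is simply held back one slot, so the conflict is resolved centrally before either pod moves.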

Advantages:

  • Automated central control and routing of all the pods on the network will ensure the highest possible efficiency in minimising congestion and travel time, as the ACRS itself will determine and execute the itinerary of each individual pod with a time precision of milliseconds and a location precision of millimeters.
  • Unexpected situations such as trespassing or spillages will either not arise or will be easily detected on the segregated track network
  • Faced with any unexpected contingency, the central control system can reconfigure the speeds and routes of as many pods as required
  • Much of the electrical propulsion power will be stored in the “network grid” rather than in the individual pods, thereby making the pods safer
  • Furthermore, since the electrical power will be stored in the stationary network grid, the moving pods will not have to carry the storage and safety mechanisms associated with large amounts of electrical charge. This will make the pods super-light and minimise the propulsive thrust they need. In contrast, road-based electric automobiles will have to carry the storage and safety mechanisms for their electrical charge themselves, which makes them heavy.

Limitations:

  • Any disruption or issue in the ACRS will disrupt the entire network
  • Any error by the ACRS in assessing the speed or location of any pod will disrupt the entire network


Amongst all industries, the automotive and transportation industry has the most significant impact on our lives. As the debate on the future of this industry intensifies, the industry finds itself at a crossroads, with many possibilities and alternatives to choose from. The most remarkable point here is that, irrespective of which possibilities or alternatives prevail, the future of the automotive industry will undoubtedly be shaped by advancements in the field of Big Data & AI.

Furthermore, in my opinion, the future of ground transportation will see the coexistence of all three possibilities that we discussed:

  • Prime regions of metropolitan cities will run on track-based, on-demand, centrally routed electric pods
  • Most regions of metropolitan and Tier-I cities will run on road-based, individually controlled electric automobiles (manually driven or self-driving)
  • Smaller towns and cities will continue to run on manually driven road-based automobiles that use improved ICE technology and superior fossil/bio-fuel blends


What, How & Why of Artificial Intelligence

Artificial Intelligence (AI) is the buzzword echoing all over the world. While large corporations, organizations and institutions are publicly proclaiming their massive investments in the development and deployment of AI capabilities, people in general are perplexed about the meaning and nuances of AI. This blog is an attempt to demystify AI and provide a brief introduction to its various aspects for everyone, engineers, non-engineers and beginners alike, who is seeking to understand AI. In the forthcoming discussion, we will explore the following questions:

  • What is AI & what does it seek to accomplish?
  • How will the goals of AI be accomplished, through which methodologies?
  • Why is AI gaining so much momentum now?

I. What is AI & what does it seek to accomplish?

The term “Artificial Intelligence” was coined by John McCarthy for a 1956 workshop at Dartmouth College in New Hampshire. AI refers to intelligence exhibited by machines, with the aim of matching or exceeding the intelligence exhibited by humans. The objective of AI is to enable machines to accomplish the following:

  • Sensory Perception: An AI-enabled bot should be able to perceive and classify stimuli of vision, sound, touch, taste and odour. Examples: vision-perceiving self-driving automobiles can avoid and circumvent obstacles on the road; sound-perceiving bots can suggest improvements to musical compositions; touch-perceiving bots can allow apparel buyers to e-touch the fabric of clothing; taste-perceiving bots can help humans choose the best restaurant; and odour-perceiving bots can replace sniffer dogs at crime scenes
  • Natural Language Processing: AI-enabled bots should be able to read, speak and write human languages such as English, Spanish etc.
  • Reasoning: AI-enabled bots should be able to analyze and understand the many options, possibilities and scenarios inherent in any problem and thereby suggest the best course of action to humans. For example, an AI-enabled bot should be able to diagnose a patient's diseases, analyze the implications of the various possible medications and thereby suggest the best course of treatment.

II. How will the goals of AI be accomplished?

The following snapshot provides a picture of the available methodologies to achieve AI:

Let us consider the possibilities in some detail:

  • Rule-based Systems: These are software-hardware packages in which many possibilities have been hard-coded, so that pre-coded instructions determine the system's response to each input or stimulus. The benefit of such systems is that they provide a completely accurate response to every input they have been explicitly programmed for. However, the system fails when it encounters an input it has not been programmed to handle.
  • Domain-Specific Computing: Such systems are an extension of rule-based systems in which the overall software-hardware system has been explicitly programmed to respond to the many possible inputs or stimuli that can occur within a specific industrial domain. For example, a domain-specific computing system for a self-driving car will be programmed to respond to situations, such as partial road blockages, that can arise while driving. Similarly, a domain-specific computing system for medical diagnosis will be programmed to check certain medical parameters and decide on medication based on the results.
  • Robotics: In this approach, computer programming is combined with human-like capabilities such as walking and hand movement, producing robots that can perform tasks requiring both decision-making and movement. Robots that clean houses or serve food in restaurants are examples.
  • Machine Learning: This is the most relevant approach to achieving AI: the computer system “learns by doing” without being explicitly programmed for the many situations that can arise. The system makes repeated trial attempts at the required task and collects and analyzes the results of each attempt. By continuously analyzing the data and results of its previous attempts, the system keeps learning, modifying and improving its approach. One example is a computer system that learned to play Mario not through explicitly programmed instructions, but through repeated attempts and continuous improvement based on the results of earlier attempts.
  • Deep Learning: This is a subset of Machine Learning that uses “Artificial Neural Networks”, which are loosely modelled on the neural networks of the human brain. A neural network has three types of layers of neurons: an input layer, an output layer, and any number of hidden layers in between. A network with two or more hidden layers is referred to as a deep learning network.
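To contrast a rule-based system with machine learning, here is a minimal sketch. The rule table only answers inputs it was explicitly given, while a perceptron (a single artificial neuron, far simpler than a deep network) learns its weights through repeated attempts, correcting itself after each mistake. The braking task, the feature encoding (obstacle nearness and speed, each between 0 and 1) and all names are hypothetical.

```python
# Rule-based system: every input it can handle is hard-coded.
# Inputs are (obstacle_near, moving_fast), each encoded as 0 or 1.
RULES = {(1, 1): "brake", (1, 0): "continue", (0, 1): "continue", (0, 0): "continue"}

# Training examples for the learner: label 1 means "brake".
TRAINING_EXAMPLES = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]


def rule_based(inputs):
    """Look up the hard-coded answer; returns None for unprogrammed inputs."""
    return RULES.get(inputs)


def train_perceptron(examples, epochs=100, learning_rate=1.0):
    """Learn two weights and a bias by repeated attempts (perceptron rule)."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), target in examples:
            prediction = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
            error = target - prediction  # 0 when the attempt was correct
            if error:
                mistakes += 1
                weights[0] += learning_rate * error * x1
                weights[1] += learning_rate * error * x2
                bias += learning_rate * error
        if mistakes == 0:  # a full pass with no mistakes: learning is done
            break
    return weights, bias


def learned(inputs, weights, bias):
    """Apply the learned model; it also answers inputs never seen in training."""
    x1, x2 = inputs
    return "brake" if weights[0] * x1 + weights[1] * x2 + bias > 0 else "continue"
```

After training on `TRAINING_EXAMPLES`, `rule_based((0.9, 0.8))` returns `None` because that input was never programmed, whereas the learned model still answers `"brake"`.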


III. Why is AI gaining so much momentum now?

The following developments are propelling the recent advancements in AI:

  • Powerful Computing: Significant advances in microprocessor speed and memory (RAM) capacity have enabled real-time processing and analysis of massive amounts of data, generating insights that bots can use for improved decision-making
  • Distributed Computing: To process huge amounts of data, it is now possible to distribute both the data and the processing logic across many machines, so that the storage and processing capacity of all the available computers can be used in parallel
  • Internet of Things: Everything from industrial machinery to personal gadgets is now connected through the internet or intranets. It is therefore possible to collect huge amounts of real-time data from all these interconnected devices, which can be analyzed and processed to draw better insights and continuously improve automated decision-making
  • Deep Learning Advancements: Recent breakthroughs have helped scientists better understand the structure and working of neural networks. Based on this, scientists have developed Artificial Neural Networks that loosely mimic human neural networks and enable increasingly human-like learning and decision-making by bots
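The idea of distributing both data and logic across machines can be sketched in the map-reduce style that Hadoop and Spark popularised. This is only an illustration: threads in a single process stand in for separate machines, and `distributed_word_count` is a hypothetical helper, not a Hadoop or Spark API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(records, num_workers):
    """Split the data into roughly equal chunks, one per worker."""
    return [records[i::num_workers] for i in range(num_workers)]


def map_count(chunk):
    """The logic shipped to each worker: count words in its own chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(records, num_workers=4):
    chunks = partition(records, num_workers)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(map_count, chunks))
    # Reduce step: merge the partial results into a single answer.
    return reduce(lambda a, b: a + b, partial_counts, Counter())
```

Each worker only ever sees its own chunk; the final answer exists only after the partial counts are merged.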

IV. Examples of Current Use of AI

  • Australian bank Westpac is using AI powered bots to answer customer queries and provide financial advice
  • Gmail is using AI to auto-suggest replies to emails
  • Pinterest is using AI to suggest images to users that are similar to the images that those users have shown interest in
  • Airbnb is using AI to set prices for various types of accommodation at different locations so that its sales are maximized
  • Policing agencies are using AI to recognize and match the faces and voices of persons over the internet with those of criminals being pursued
  • Facebook is using AI to suggest possible friend options to its users based on factors such as location, interests, profession, age etc.

V. Conclusion:

It is clearly evident that developments in the field of AI represent the most significant paradigm shift in the technological landscape, and more and more people are not just accepting AI but are excited about the many benefits it can offer. Scientists, for their part, are making every effort to ensure that advancements in AI serve to enhance the quality of life for all.


GraphFrames on CloudxLab

GraphFrames is a useful Spark library that brings DataFrames and the GraphX package together.

From the GraphFrames website:

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs. It provides high-level APIs in Scala, Java, and Python. It aims to provide both the functionality of GraphX and extended functionality taking advantage of Spark DataFrames. This extended functionality includes motif finding, DataFrame-based serialization, and highly expressive graph queries.

You can use GraphFrames very easily with spark-shell on CloudxLab by passing the --packages option in the following way.

For spark-shell:

For the Python Spark shell (pyspark):

When you launch the shell with the --packages argument, it downloads GraphFrames and makes it available in the shell. Now, let's create a GraphFrame. Here is some example code (Scala):

This would display the in-degree (number of incoming edges) of each vertex:

Now, let's try filtering. The following code would display the count of edges that have the “follow” relationship, which is 2.

Now, let's run an algorithm such as PageRank on the graph.

After a few iterations, it should display the PageRank of each vertex as follows:
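As a hedged, self-contained illustration of what these three operations compute, here is a pure-Python sketch (it does not use GraphFrames) on a small hypothetical graph of three vertices and three edges. It computes in-degrees, counts the edges with the "follow" relationship, and runs a basic power-iteration PageRank with a damping factor of 0.85. GraphFrames' own PageRank may use different parameters and scaling, so compare the relative ranking rather than the exact numbers.

```python
# Hypothetical example graph: edges are (src, dst, relationship).
VERTICES = ["a", "b", "c"]
EDGES = [("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow")]


def in_degrees(edges):
    """Number of incoming edges per vertex (vertices with none are omitted)."""
    degrees = {}
    for _, dst, _ in edges:
        degrees[dst] = degrees.get(dst, 0) + 1
    return degrees


def count_relationship(edges, relationship):
    """Count the edges whose relationship matches, like a filter + count."""
    return sum(1 for _, _, rel in edges if rel == relationship)


def page_rank(vertices, edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank; assumes every vertex has an outgoing edge."""
    n = len(vertices)
    out_links = {v: [dst for src, dst, _ in edges if src == v] for v in vertices}
    ranks = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        new_ranks = {v: (1.0 - damping) / n for v in vertices}
        for v in vertices:
            share = ranks[v] / len(out_links[v])  # split rank over out-links
            for dst in out_links[v]:
                new_ranks[dst] += damping * share
        converged = max(abs(new_ranks[v] - ranks[v]) for v in vertices) < tol
        ranks = new_ranks
        if converged:
            break
    return ranks
```

For this graph, `b` (with two incoming edges) ranks highest and `a` (with none) ranks lowest.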

CloudxLab Webinar on “Big Data & AI”: An Overwhelming Success

Recognizing the tremendous interest among students and professionals in “Big Data & AI”, CloudxLab conducted a webinar on July 12, 2017 to introduce this emerging field and explain its many nuances to all enthusiasts. Mr Sandeep Giri, founder of CloudxLab, with more than 15 years of industry experience at companies such as Amazon, InMobi and DE Shaw, was the lead presenter in the webinar.

The Scope:

The webinar covered the following:

  • Overview of Big Data
  • Big Data Technology Stack
  • Introduction to Artificial Intelligence
  • Demo of Machine Learning Example
  • Artificial Intelligence Technology Stack

The Response:

  • Approximately 1000 persons registered for the webinar, and 600+ attended
  • Overwhelming enthusiasm was evident as 100+ questions were asked
  • Participants asked for the session to be extended, as they found the flow of knowledge tremendously beneficial

The Flow:

  • Sandeep initiated the webinar by comparing the current state of “Big Data & AI” technologies with the ideal future goal of creating an artificial person who is human-equivalent in all aspects
  • Presenter provided a sneak peek into Machine Learning through an example where computer learned and mastered the game “Mario” by repeatedly practicing and improving through each iteration
  • Thereafter, it was explained how Internet of Things is creating humongous amounts of data called Big Data, and how various technologies have evolved to Collect-Store-Process Big Data
  • The presenter then delved into the relevance of the Hadoop & Spark ecosystems in the Collect-Store-Process cycle of Big Data
  • At this stage the discussion became a combination of presentation and query-resolution on the many nuances of Hadoop & Spark ecosystems as participants raised many interesting questions
  • The discussion then progressed to the next stage where the presenter explained the meaning and history of development of “Artificial Intelligence” (AI)
  • Thereafter, he explained that Reasoning, Navigation, Natural Language Processing, Knowledge & Perception are the current scope of AI — and future scope should include Emotional Intelligence & Intuition
  • The presenter, then, progressed to explain the meaning and nuances of Machine Learning & Deep Learning along with their frameworks
  • The webinar culminated in an extended Q&A session to answer all the queries, in which most participants asked how they could learn more about these topics.

The Recording:

What about those who missed the webinar?

  • Worry not, the recording of the webinar is available here:

  • The presentation slides are available here:


The Feedback:

How CloudxLab Helped Our User With A Job

We recently had a heartwarming moment: one of our subscribers had received an offer from Tata Consultancy Services. His thank-you note to us made our day. Meara Laxman had subscribed to CloudxLab to practice his Big Data skills and, by his own account, got more than he expected. Here is our interview with him.

CxL: How did CloudxLab help you learn Big Data tools better?
Laxman: CloudxLab helped me a lot in learning all the Big Data ecosystem components. I had gained enough theoretical knowledge of Big Data tools from the internet, but I ran into trouble trying to practice because my system did not meet the required specifications and configurations. That is when I found CloudxLab and subscribed to it. I got good exposure to the practical aspects, as CloudxLab provided sample lab session videos that are very clear and easy to understand and practice. Moreover, the CloudxLab team helped me every time I had an issue and clarified all my queries.

CxL: How did CloudxLab help you with finding a new job?
Laxman: CloudxLab played a key role in getting me my new job. I lacked …

How to install Python packages on CloudxLab?

In this blog post, we will learn how to install Python packages on CloudxLab.

Step 1-

Create a virtual environment for your project. A virtual environment is a tool that keeps the dependencies required by different projects in separate places by creating an isolated Python environment for each of them. Log in to the CloudxLab web console and create a virtual environment for your project.
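As a hedged illustration in Python, the standard-library `venv` module can create such an environment, doing the same job as running `python3 -m venv <dir>` in the console. The helper name `create_project_env` and the directory name are assumptions for illustration.

```python
import sys
import venv
from pathlib import Path


def create_project_env(env_dir, with_pip=True):
    """Create an isolated virtual environment for a single project.

    Packages installed into env_dir stay separate from the system Python
    and from the environments of other projects.
    """
    env_path = Path(env_dir)
    venv.create(env_path, with_pip=with_pip)
    # The environment's own interpreter lives in bin/ (Scripts\ on Windows).
    if sys.platform == "win32":
        return env_path / "Scripts" / "python.exe"
    return env_path / "bin" / "python"
```

For example, `create_project_env("myproject_env")` builds the environment; on the console it is then typically activated with `source myproject_env/bin/activate`, after which `pip install <package>` installs packages into that environment alone.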


CloudxLab Reviews


Jose Manual Ramirez Leon

It is really a great site. As a 37-year-old with a master's in mechanical engineering, I decided to switch careers and get another master's. One of my courses was Big Data and, at the beginning, I was completely lost and falling behind in my assignments. After searching the internet for a solution, I finally found CloudxLab.

Not only do they have every conceivable Big Data technology on their servers, they have superb customer support. Whenever I have had a doubt, even in debugging my own programs, they have answered me with the correct solution within a few hours.

I earnestly recommend it to everyone.


Building Real-Time Analytics Dashboard Using Apache Spark



In this blog post, we will learn how to build a real-time analytics dashboard using Apache Spark streaming, Kafka, Node.js, Socket.IO and Highcharts.

Problem Statement

An e-commerce portal (http://www.aaaa.com) wants to build a real-time analytics dashboard to visualize the number of orders getting shipped every minute to improve the performance of their logistics.
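At its core, the dashboard needs a count of shipped orders per one-minute window. The sketch below shows that aggregation in plain Python; in the pipeline described here, Spark Streaming would perform it continuously on data arriving from Kafka. The event format (`order_id`, `status` and `timestamp` fields, and the `"shipped"` status value) is a hypothetical assumption.

```python
from collections import Counter
from datetime import datetime


def shipped_per_minute(events):
    """Count orders with status 'shipped' in one-minute tumbling windows.

    Each event is assumed to be a dict such as
    {"order_id": 1, "status": "shipped", "timestamp": "2024-01-01T10:15:42"}.
    """
    counts = Counter()
    for event in events:
        if event["status"] != "shipped":
            continue  # only shipped orders are plotted on the dashboard
        ts = datetime.fromisoformat(event["timestamp"])
        window_start = ts.replace(second=0, microsecond=0)
        counts[window_start] += 1
    # Sorted by window so the chart can plot the minutes in order.
    return dict(sorted(counts.items()))
```

Each key is the start of a one-minute window and each value is the number of shipped orders in it, which is exactly the series a live chart would plot.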


Before working on the solution, let’s take a quick look at all the tools we will be using:

Apache Spark – A fast and general engine for large-scale data processing. According to the Spark project, it can run programs up to 100 times faster than Hadoop MapReduce in memory and up to 10 times faster on disk. Learn more about Apache Spark here

Python – Python is a widely used high-level, general-purpose, interpreted, dynamic programming language. Learn more about Python here

Kafka – A high-throughput, distributed, publish-subscribe messaging system. Learn more about Kafka here

Node.js – Event-driven I/O server-side JavaScript environment based on V8. Learn more about Node.js here

Socket.io – Socket.IO is a JavaScript library for real-time web applications. It enables real-time, bi-directional communication between web clients and servers. Read more about Socket.io here

Highcharts – Interactive JavaScript charts for web pages. Read more about Highcharts here

CloudxLab – Provides a real cloud-based environment for practicing and learning various tools. You can start practicing right away by signing up online.

How To Build A Data Pipeline?

Below is the high-level architecture of the data pipeline:

Data Pipeline

Our real-time analytics dashboard will look like this:

Real-Time Analytics Dashboard


Cloudera Certification Practice On CloudxLab

How does CloudxLab help with preparing for Cloudera, Hortonworks, and related certifications? Here is an interview with one of our users who has successfully completed the ‘Cloudera Certified Associate for Spark and Hadoop Developer’ (CCA175) certification using CloudxLab for hands-on practice. Having completed the certification, Senthil Ramesh, who is currently working with Accenture, gladly discussed his experience with us.

CxL: How did CloudxLab help you with the Cloudera certification and help you learn Big Data overall?

Senthil: CloudxLab played an important part in the hands-on experience for my Big Data learning. As soon as I realized that my laptop might not be able to support all the tools necessary to work towards the certification, I started looking for a cloud-based solution and found CloudxLab. The sign-up was easy and everything was set up in a short time. I must say, without hands-on practice it would have been harder to crack the certification. Thanks to CloudxLab for that.

CxL: Why CloudxLab and not a Virtual Machine?


CloudxLab Joins Hands With TechM’s UpX Academy


CloudxLab is proud to announce its partnership with Tech Mahindra’s UpX Academy. TechM’s e-learning platform, UpX Academy, delivers courses in Big Data & Data Sciences. With programs spanning 6 to 12 weeks and covering in-demand skills such as Hadoop, Spark, Machine Learning, R and Tableau, UpX has tied up with CloudxLab to give its course takers access to the latest tools.

UpX is run by an excellent team, and we at CloudxLab are in awe of the attention it pays to its users’ needs. As Sandeep (CEO at CloudxLab) puts it, “We were not surprised when UpX decided to come on board. Their ultimate interest is in keeping their users happy and we are more than glad to work with them on this.”
