CloudxLab is a shared cluster where you will be sharing resources with other users. For dedicated cluster requests for multiple users, please write to reachus@cloudxlab.com.
CloudxLab is a managed service where the configurations and installations are taken care of by us. We have already set up most of the tools needed for practice.
Please contact us at reachus@cloudxlab.com if you are looking for other tools, and we will try our best to make them available on the cluster. We strive to provide the best experience for our learners. If the tool or library you need is open source or free and could be useful to more than 5% of users, we would like to install it.
You can also install libraries in your own environment, such as a virtualenv, which does not require administrator privileges.
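A minimal sketch of setting up such an environment with Python's standard-library `venv` module, which creates virtual environments much like the third-party `virtualenv` tool. It assumes the target directory is somewhere you can write to, such as under your home directory; `make_user_env` and the path `~/envs/myenv` are illustrative names, not CloudxLab conventions.

```python
import os
import venv
from pathlib import Path


def make_user_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment in env_dir and return its interpreter path.

    No administrator rights are needed as long as env_dir is writable by you,
    for example a folder under your home directory. (Helper name is hypothetical.)
    """
    # with_pip=True bootstraps pip into the new environment via ensurepip.
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in bin/ on Linux/macOS and in Scripts/ on Windows.
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return Path(env_dir) / bin_dir / exe
```

For example, `make_user_env(os.path.expanduser("~/envs/myenv"))` creates the environment, after which you could install packages into it with `~/envs/myenv/bin/python -m pip install <package>`. Keep in mind that anything installed under your home directory counts toward the local storage quota.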
CloudxLab is a dedicated Big Data & AI organization focusing on lab services. The lab offers a Machine Learning ecosystem alongside Big Data tools, including TensorFlow, scikit-learn, NumPy, SciPy, Pandas, and analytics tools such as R and Jupyter. Multiple versions of Spark are available. We also provide automated assessments and email support.
The tools and components available in the cluster include Hadoop, Spark, Kafka, Hive, Pig, HBase, Oozie, ZooKeeper, Flume, Sqoop, Mahout, R, Linux, Python, Scala, MongoDB, NumPy, SciPy, Pandas, Scikit-learn etc. Again, if you are looking for other tools please contact us at reachus@cloudxlab.com.
We provide the Hortonworks Data Platform. The Hadoop version on the cluster is 2.7. Please find the versions of all the software components installed on CloudxLab here.
Currently, we have 5 nodes in the cluster, and we automatically scale up and down based on the cluster load. Three nodes have 8 cores and 32 GB RAM each, and the other two nodes have 16 cores and 60 GB RAM each, sized according to the services running on them.
The CloudxLab cluster is used for educational and proof-of-concept (PoC) purposes. We are able to provide the cluster at a very low cost because the systems are shared among users.
The system resources are limited, so if you use more than your share, you hurt other users. We have avoided putting hard limits on resource consumption because we do not want to put roadblocks in our users' learning path.
Here are the limits as per the fair usage policy:
HDFS - We provide 4.5 GB of raw storage space on HDFS, with a replication factor of 3 by default. Because every replica counts against this space, with a replication factor of 3 you can store up to 1.5 GB of data, and with a replication factor of 1 you can store up to 4.5 GB. This is a hard limit: HDFS will throw an error if you try to go beyond it.
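The quota arithmetic is simply the raw quota divided by the replication factor, since HDFS space quotas count every replica of every block. A small sketch of that calculation; the function name `max_logical_storage_gb` is illustrative only.

```python
def max_logical_storage_gb(raw_quota_gb: float, replication_factor: int) -> float:
    """Largest amount of data (in GB) that fits in an HDFS space quota.

    Each GB of data consumes replication_factor GB of the quota,
    because every replica is charged against it.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_quota_gb / replication_factor


QUOTA_GB = 4.5  # raw HDFS quota per user, as stated above

for rf in (3, 2, 1):
    print(f"replication {rf}: up to {max_logical_storage_gb(QUOTA_GB, rf):.2f} GB")
# replication 3: up to 1.50 GB
# replication 2: up to 2.25 GB
# replication 1: up to 4.50 GB
```

To lower the replication of a file you have already uploaded, `hdfs dfs -setrep -w 1 <path>` can be used; this trades fault tolerance for space, which is usually acceptable for practice data.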
Local storage on the Linux console - You can store up to 3 GB in your home directory on the web console.
If you exceed this 3 GB quota, you are given a 7-day grace period to reduce your usage to less than 3 GB.
During this grace period, you can have a maximum of 4 GB of data in your home directory.
Once the grace period expires or you exceed the 4 GB limit, whichever comes first, you will no longer be able to create new files in your home directory, and your Jupyter server will stop working until you reduce your usage to less than 3 GB.
Our scripts also monitor storage consumption continuously, and our bots will automatically delete your data if your usage exceeds 4 GB.
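The storage rules above can be summarized as a small state check. This is a hypothetical model for illustration only (`home_quota_status` is a made-up name, and CloudxLab's own scripts may differ); it assumes that exactly 3 GB already counts as over quota, since the text asks you to get back to less than 3 GB.

```python
SOFT_LIMIT_GB = 3.0   # normal home-directory quota
HARD_LIMIT_GB = 4.0   # ceiling allowed during the grace period
GRACE_DAYS = 7        # length of the grace period


def home_quota_status(usage_gb: float, days_over_soft_limit: float) -> str:
    """Classify home-directory usage as "ok", "grace" or "blocked".

    usage_gb: current usage of the home directory in GB.
    days_over_soft_limit: days since usage first reached 3 GB
        (ignored when usage is below 3 GB).
    """
    if usage_gb < SOFT_LIMIT_GB:
        return "ok"
    # Whichever comes first: going above 4 GB or the grace period expiring.
    # Above 4 GB the bots may also delete data, per the policy text.
    if usage_gb > HARD_LIMIT_GB or days_over_soft_limit >= GRACE_DAYS:
        return "blocked"
    return "grace"
```

For instance, 3.6 GB two days after crossing the quota is still within the grace period, while the same usage on day 8, or 4.2 GB on any day, means new files can no longer be created.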
To clean up unnecessary files, please follow the instructions given here.
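When deciding what to delete, it helps to know which files take up the most space. A minimal sketch using only the Python standard library; `scan_home` is a hypothetical helper, and it reports apparent file sizes (`st_size`), which can differ slightly from the disk usage the quota system or `du` measures.

```python
from __future__ import annotations

import heapq
import os


def scan_home(root: str, top_n: int = 10) -> tuple[int, list[tuple[int, str]]]:
    """Return (total_bytes, [(size, path), ...]) for regular files under root.

    The list holds the top_n largest files, biggest first. Symlinks are
    skipped so data outside root is not counted, and os.walk does not
    descend into symlinked directories by default.
    """
    total = 0
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or permission denied
            total += size
            sizes.append((size, path))
    return total, heapq.nlargest(top_n, sizes)
```

For example, `scan_home(os.path.expanduser("~"))` returns your home directory's total and its ten largest files, which are usually the first candidates to remove.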
Hive - Please do not create multiple databases in Apache Hive: the permissible number of databases in Apache Hive is one.
RAM - Please ensure that your programs do not consume more than 2 GB of memory (RAM), because heavier usage hurts other users. Our bots will automatically kill your processes if their RAM usage exceeds 2 GB.
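Since the bots enforce the 2 GB limit from outside your process, it can help to watch memory from inside your own Python programs. The sketch below uses the Unix-only standard-library `resource` module; `peak_rss_bytes` and `cap_address_space` are hypothetical helper names, treating 2 GB as 2 GiB is an assumption, and this is not how CloudxLab's bots measure usage.

```python
import resource
import sys

RAM_LIMIT_BYTES = 2 * 1024 ** 3  # 2 GB, interpreted as GiB (assumption)


def peak_rss_bytes() -> int:
    """Peak resident memory of the current process so far, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def cap_address_space(limit_bytes: int = RAM_LIMIT_BYTES) -> None:
    """Lower this process's soft virtual-memory limit (RLIMIT_AS).

    Oversized allocations then fail with MemoryError inside your program
    instead of growing past the limit. Virtual address space is larger than
    resident RAM, so this is a conservative guard suited to plain Python
    scripts; JVM-based tools reserve a lot of address space and may not
    start under such a cap.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        limit_bytes = min(limit_bytes, hard)  # soft limit may not exceed hard
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
```

Calling `cap_address_space()` at the top of a script (or in the first cell of a notebook, which applies it to the kernel process) turns a runaway allocation into a catchable error, while `peak_rss_bytes()` can be logged periodically to see how close a job gets to the limit.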
Duration - Please do not run long-lived processes such as Hive, pyspark, spark-shell, or a Jupyter notebook. Our bots will kill your process if 1) it has been running for more than 3 hours, 2) your notebook has been idle for more than 60 minutes, or 3) you are using more than one YARN container at a time. Note that Hive, pyspark, and spark-shell consume containers from YARN, whereas the Jupyter notebook consumes local memory.
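The three kill conditions above can be expressed as a small rule check. This is a hypothetical model of the policy for illustration (`Session` and `kill_reasons` are made-up names); it does not query YARN or Jupyter, and the bots' actual implementation may differ.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_RUNTIME_MINUTES = 3 * 60  # processes running longer than 3 hours
MAX_IDLE_MINUTES = 60         # notebooks idle longer than 60 minutes
MAX_YARN_CONTAINERS = 1       # more than one YARN container at a time


@dataclass
class Session:
    runtime_minutes: float
    idle_minutes: float = 0.0
    yarn_containers: int = 0


def kill_reasons(s: Session) -> list[str]:
    """Return the policy rules this session violates (empty list if none)."""
    reasons = []
    if s.runtime_minutes > MAX_RUNTIME_MINUTES:
        reasons.append("running for more than 3 hours")
    if s.idle_minutes > MAX_IDLE_MINUTES:
        reasons.append("idle for more than 60 minutes")
    if s.yarn_containers > MAX_YARN_CONTAINERS:
        reasons.append("using more than one YARN container")
    return reasons
```

For example, a 30-minute spark-shell session holding a single container passes every rule, while a notebook left idle for 90 minutes would be flagged even if its total runtime is short.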
CPU - Please do not run CPU-intensive tasks such as bitcoin mining or infinite loops.
Bandwidth - Please do not download more than 5 GB of data a month.
MySQL - Please note that you will not be able to create new databases in MySQL, because MySQL becomes difficult to manage if everyone is allowed to create databases.
Please note that violating these terms is an offense, and your account may be disabled if you commit one.
An additional 10% discount is available when upfront payment is made for more than 100 subscriptions. For more details, reach out to reachus@cloudxlab.com.
Yes, we do provide sample datasets. We have tried to make everything available so you can start practicing without delay. Please find the list of available datasets here.
Yes! Please feel free to ask your questions on the CloudxLab forum, and our community and team of experts will answer them. We believe the forum brings better perspectives, ideas, and solutions to your questions.
No. The lab is like a buffet: you can use as much as you want under the fair usage policy, but you cannot share it with others.
Also, the lab is highly personalized. Experience points are awarded based on lab usage, so the lab should be used individually.
If you share your login with others, our bot will automatically disable your account, and this is irrevocable. In such cases, you may not be able to use CloudxLab services in the future.
If you are unhappy with the product for any reason, let us know within 7 days of purchasing or upgrading your account, and we'll cancel your account and issue a full refund. Please contact us at reachus@cloudxlab.com to request a refund within the stipulated time. We will be sorry to see you go though!
You can use the Jupyter notebook and the Unix text editors on CloudxLab to write code. It is not possible to install software such as IntelliJ or Eclipse in a cloud-based environment. You can use such IDEs on your desktop or laptop and then upload the code to CloudxLab for execution. See this chapter to learn about it in more detail:
Yes! Your whole team or class can opt for our Corporate Training Program. We also provide assessment platforms for your current employees and new candidates.
We can help you create assessments in our assessment engine. With this engine, you will be able to track whether students have completed the assigned hands-on exercises. For more details, let us know at reachus@cloudxlab.com.
Please sign up here as an instructor and we will provide you the details.
Absolutely! Please contact us here. You can also reach us anytime on our 24/7 support helpline by calling us on +918049202224