The CloudxLab cluster is used for educational and PoC purposes. We are able to provide the cluster at a very low cost because the systems are shared among users.
The system resources are limited. If you use more than your share, it hurts other users. We have avoided putting hard limits on resource consumption because we do not want to put roadblocks in our users' learning path.
Here are the limits as per the fair usage policy:
HDFS - We provide 4.5 GB of storage space on HDFS, and every replica of your data counts toward it. That means with a replication factor of 3, you can store up to 1.5 GB of data, and with a replication factor of 1, you can store up to 4.5 GB of data. This is a hard limit, meaning HDFS will throw an error if you try to go beyond that storage.
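The arithmetic behind the HDFS limit can be sketched as follows. This is a minimal illustration, assuming the 4.5 GB quota is consumed by every replica of your data; the function name usable_hdfs_gb is hypothetical and is not part of any CloudxLab or Hadoop API.

```python
RAW_QUOTA_GB = 4.5  # quota from the policy; assumed to count every replica


def usable_hdfs_gb(raw_quota_gb: float, replication_factor: int) -> float:
    """Return how much data fits in the quota at a given replication factor."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    # Each byte of data is stored replication_factor times.
    return raw_quota_gb / replication_factor


if __name__ == "__main__":
    for factor in (1, 2, 3):
        usable = usable_hdfs_gb(RAW_QUOTA_GB, factor)
        print(f"replication factor {factor}: up to {usable} GB of data")
```

On a standard Hadoop installation, the replication factor of existing files can be changed with the hdfs dfs -setrep command; whether a lower replication factor is appropriate for your data is a judgment call.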
Local storage on the Linux console - You are allowed 3 GB of storage in your home directory on the web console.
If you exceed this 3 GB quota, you are given a 7-day grace period to reduce your usage to less than 3 GB.
During this grace period, you can have a maximum of 4 GB of data in your home directory.
Once the grace period expires or you exceed the 4 GB limit, whichever comes first, you will no longer be able to create any new files in your home directory, and your Jupyter server will stop working until you reduce your usage to less than 3 GB.
Our scripts also monitor storage consumption continuously. Our bots will automatically delete your data if your storage usage exceeds 4 GB.
To clean up unnecessary files, please follow the instructions given here.
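To see where you stand against the 3 GB quota and the 4 GB grace-period ceiling, something like the following sketch may help. It assumes that "GB" means 2**30 bytes and that usage is the sum of file sizes; the actual quota accounting on the cluster may differ (for example, it may count disk blocks). The names directory_size_bytes and quota_status are hypothetical.

```python
import os

GB = 1024 ** 3               # assumption: the policy's "GB" means 2**30 bytes
SOFT_LIMIT_BYTES = 3 * GB    # quota from the policy
HARD_LIMIT_BYTES = 4 * GB    # maximum allowed during the grace period


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of the files under root, skipping symbolic links."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.lstat(path).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total


def quota_status(used_bytes: int) -> str:
    """Classify usage against the soft and hard limits."""
    if used_bytes > HARD_LIMIT_BYTES:
        return "over hard limit"
    if used_bytes > SOFT_LIMIT_BYTES:
        return "grace period"
    return "within quota"


if __name__ == "__main__":
    home = os.path.expanduser("~")
    used = directory_size_bytes(home)
    print(f"{home}: {used / GB:.2f} GB used, status: {quota_status(used)}")
```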
Hive - Please ensure that you do not create too many databases in Apache Hive. You are permitted only one database.
RAM - Please ensure that your programs do not consume more than 2 GB of memory (RAM), as doing so hurts other users. Our bots will automatically kill your processes if your RAM usage exceeds 2 GB.
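A Python program can check its own peak memory use against the 2 GB figure with the standard resource module, as in the sketch below. It assumes "GB" means 2**30 bytes, measures only the current process (not child processes or JVM-based tools), and uses the hypothetical names peak_rss_bytes and within_ram_limit.

```python
import resource
import sys

GB = 1024 ** 3             # assumption: the policy's "GB" means 2**30 bytes
RAM_LIMIT_BYTES = 2 * GB   # limit from the policy


def peak_rss_bytes() -> int:
    """Return this process's peak resident set size in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def within_ram_limit(used_bytes: int, limit_bytes: int = RAM_LIMIT_BYTES) -> bool:
    """Return True if used_bytes does not exceed the limit."""
    return used_bytes <= limit_bytes


if __name__ == "__main__":
    peak = peak_rss_bytes()
    print(f"peak memory: {peak / GB:.2f} GB, within limit: {within_ram_limit(peak)}")
```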
Duration - Please do not run long-lived processes such as Hive, pyspark, spark-shell, or a Jupyter notebook. Our bots will kill your process if 1) it runs for more than 3 hours, 2) your notebook is idle for more than 60 minutes, or 3) you are using more than one YARN container at a time. Hive, pyspark, and spark-shell consume containers from YARN, whereas the Jupyter notebook consumes local memory.
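One way to stay under the 3-hour limit is to split work into batches and stop cleanly before the deadline. The sketch below is only an illustration: the 10-minute safety margin and the names seconds_remaining and run_batches are assumptions, not part of the policy.

```python
import time

MAX_RUNTIME_SECONDS = 3 * 60 * 60    # 3-hour limit from the policy
SAFETY_MARGIN_SECONDS = 10 * 60      # hypothetical buffer before the limit


def seconds_remaining(started_at: float, now: float) -> float:
    """Return the seconds left before the self-imposed deadline, never negative."""
    deadline = started_at + MAX_RUNTIME_SECONDS - SAFETY_MARGIN_SECONDS
    return max(0.0, deadline - now)


def run_batches(batches, process, clock=time.monotonic) -> int:
    """Process batches in order, stopping once the runtime budget is used up.

    Returns the number of batches that were processed.
    """
    started_at = clock()
    done = 0
    for batch in batches:
        if seconds_remaining(started_at, clock()) <= 0:
            break  # out of time: stop before starting another batch
        process(batch)
        done += 1
    return done


if __name__ == "__main__":
    processed = run_batches(range(5), lambda batch: print("processing batch", batch))
    print("batches processed:", processed)
```

The check happens only between batches, so each batch should be short compared with the safety margin.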
CPU - Please do not run CPU-intensive tasks such as bitcoin mining or an infinite loop.
Bandwidth - Please do not download more than 5 GB of data a month.
MySQL - Please note that you will not be able to create new databases in MySQL, because it becomes difficult to manage when everyone is allowed to create databases.
Please note that violating these terms is an offense and may result in your account being disabled.