The CloudxLab cluster is meant for educational and proof-of-concept (PoC) purposes. We can provide the cluster at a very low cost because its systems are shared among users.
System resources are limited, so if you use more than your share, it hurts other users. We have avoided putting hard limits on resource consumption because we do not want to put roadblocks in our users' learning path.
Here are the limits as per the fair usage policy:
HDFS - We provide 4.5 GB of raw storage space on HDFS, and every replica counts against it. With a replication factor of 3, you can store up to 1.5 GB of data; with a replication factor of 1, you can store up to 4.5 GB. This is a hard limit, meaning HDFS will throw an error if you try to go beyond it.
Local storage on the Linux console - The allowed storage on the web console is 1 GB. Our scripts continuously monitor storage consumption, and our bots will automatically delete your data if you use more than 1 GB.
Hive - Please do not create too many databases in Apache Hive. The permissible number of databases in Apache Hive is one.
RAM - Please ensure that your programs do not consume more than 2 GB of memory (RAM), as this hurts other users. Our bots will automatically kill your processes if your RAM usage exceeds 2 GB.
Duration - Please do not run long-lived processes such as hive, pyspark, spark-shell, or a Jupyter notebook for more than 60 minutes. While hive, pyspark, and spark-shell consume containers from YARN, a Jupyter notebook consumes local memory.
CPU - Please do not run CPU-intensive tasks such as bitcoin mining or infinite loops.
Bandwidth - Please do not download more than 5 GB of data a month.
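The HDFS and local storage limits above are easy to check for yourself. The following is a minimal Python sketch, not a CloudxLab tool: the function names are hypothetical, it assumes 1 GB means 1024^3 bytes, and it assumes you point it at whichever directory counts toward your local limit (the policy does not name one).

```python
import os

HDFS_RAW_QUOTA_GB = 4.5          # raw HDFS quota stated in the policy
LOCAL_LIMIT_BYTES = 1024 ** 3    # 1 GB, assuming binary gigabytes


def usable_hdfs_gb(replication, raw_quota_gb=HDFS_RAW_QUOTA_GB):
    """Data you can store when each block is kept `replication` times."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_quota_gb / replication


def dir_size_bytes(path):
    """Sum the apparent sizes of regular files under `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def over_local_limit(path, limit_bytes=LOCAL_LIMIT_BYTES):
    """True if the files under `path` add up to more than the limit."""
    return dir_size_bytes(path) > limit_bytes
```

For example, `usable_hdfs_gb(3)` returns 1.5 and `usable_hdfs_gb(1)` returns 4.5, matching the figures in the HDFS item. Calling `over_local_limit(os.path.expanduser("~"))` reports whether your home directory has grown past 1 GB; the actual monitoring scripts may measure usage differently.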
Please note that violating these terms is an offence, and your account may be disabled as a result.