To make efficient use of our cluster resources and keep the experience smooth for all researchers, we would like to remind you of our policies on resource requests, monitoring, and Slurm Fairshare.
📉 Active Monitoring & Under-Utilization
While users are free to request the specific resources required for their research (for example, requesting 4+ GPUs for high-parallelism workloads), we actively monitor all GPU, CPU, and memory usage to prevent waste.
- The 10% Rule: To maintain fair access for those in the queue, any job found to be using less than 10% of its allocated resources for at least one-third of its walltime may be canceled by the administrators without prior notice.
- Why? This action frees up idle resources for other users who are waiting to run their code.
- Track Your Usage: You can monitor your resource utilization in real time by visiting DICC OnDemand > Active Jobs and opening the Utilization graph for each job.
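As a rough illustration of the 10% Rule, the sketch below checks a list of utilization readings against the two numbers stated above. It is not the monitoring tool the administrators use: the function name `is_underutilized`, the idea of evenly spaced samples, and the reading of "1/3 of its walltime" as a cumulative share of samples are all assumptions made for this example.

```python
from fractions import Fraction

def is_underutilized(samples, threshold=0.10, min_fraction=Fraction(1, 3)):
    """Return True if a job looks under-utilized under the 10% Rule.

    samples: utilization readings between 0.0 and 1.0, assumed to be taken
             at equal intervals across the job's walltime (an assumption,
             not a description of the real monitoring system).
    threshold: readings strictly below this count as "low" (10% by default).
    min_fraction: share of the walltime that must be "low" to flag the job.
    """
    if not samples:
        return False  # no data, nothing to flag
    low = sum(1 for u in samples if u < threshold)
    # Fraction keeps the comparison exact at the one-third boundary.
    return Fraction(low, len(samples)) >= min_fraction

# A job idle for 2 of 6 samples (one-third of its walltime) is flagged.
print(is_underutilized([0.05, 0.02, 0.9, 0.9, 0.9, 0.9]))
# A job idle for only 1 of 6 samples is not.
print(is_underutilized([0.05, 0.9, 0.9, 0.9, 0.9, 0.9]))
```

If you export your own utilization numbers, a check like this can help you spot jobs that request far more than they use before they are canceled.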
⚖️ Slurm Fairshare: How Priority is Calculated
We continue to implement a Fairshare policy via Slurm to ensure equitable access across the cluster. The system is designed so that a single user submitting many jobs cannot block others.
- Usage vs. Score: The more resources you consume, the lower your Fairshare score becomes.
- Resource “Cost”: Using “expensive” resources like GPUs reduces your score faster than CPU-only jobs.
- Priority: A lower Fairshare score gives your newly submitted jobs a lower priority in the queue. As your usage tapers off, your score will gradually recover.
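The three points above can be illustrated with a toy model. It is loosely based on the form of Slurm's classic fairshare formula, where the factor is 2 raised to the negative ratio of usage to shares, and it leaves out the dampening factor. The GPU weight of 8, the 7-day half-life, and all function names are hypothetical values chosen for this example. They do not describe the actual DICC configuration, which may use a different algorithm and different weights.

```python
def job_cost(cpu_hours, gpu_hours, gpu_weight=8.0):
    """Billed units for a job; gpu_weight is a hypothetical value."""
    return cpu_hours + gpu_weight * gpu_hours

def decayed_usage(charges, half_life_days=7.0):
    """Sum past charges, halving each one every half_life_days.

    charges: list of (age_in_days, billed_units) pairs.
    The 7-day half-life is an assumption for illustration only.
    """
    return sum(units * 0.5 ** (age / half_life_days) for age, units in charges)

def fairshare_factor(usage_fraction, share_fraction):
    """Toy score in (0, 1]: 1.0 with no usage, 0.5 when usage equals share."""
    return 2.0 ** (-(usage_fraction / share_fraction))

# Resource "cost": the same 10 hours bills more when GPUs are involved.
print(job_cost(cpu_hours=10, gpu_hours=0), job_cost(cpu_hours=10, gpu_hours=10))

# Usage vs. score: more usage relative to your share gives a lower score.
print(fairshare_factor(0.0, 0.25), fairshare_factor(0.25, 0.25), fairshare_factor(0.5, 0.25))

# Recovery: a charge of 100 units counts for less as it ages.
print(decayed_usage([(0, 100)]), decayed_usage([(7, 100)]))
```

The model is only meant to show the direction of each effect: heavier and more expensive usage lowers the score, and the score recovers as old usage decays.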
🚀 Optimization is Key
We encourage all users to optimize their jobs and request only the resources those jobs can effectively utilize. Efficient requests lead to:
- Faster queue times for everyone.
- Better overall performance of the cluster.
- Protection of your Fairshare score for future jobs.
Thank you for your cooperation in keeping the DICC HPC environment efficient and fair for the entire research community.