Greetings, all HPC users,
Over the years, we have observed that many users follow poor practices when requesting resources. In particular, many try to 'squeeze' jobs into whatever resources happen to be free at that moment. While this may let a job start earlier, the habit causes problems in the long run and, if it continues, becomes a self-reinforcing cycle: requesting whatever is currently unused can repeatedly block jobs with proper resource requirements from running.
We have never encouraged 'squeezing' jobs into the HPC cluster based on whatever resources are currently unused. Users should always request the resources their jobs actually require, even if the resources currently free are insufficient to start the job immediately. Fitting jobs onto available resources is the role of the HPC scheduler and should not be a user's concern.
To address this issue and promote good practice in HPC resource planning and usage, we have decided to remove easy access to information about currently unused resources from the cluster-info command. Information on the maximum cores, threads, and memory of each node is still available through cluster-info. If you have been relying on cluster-info to 'squeeze' your jobs in, now is the time to change how you plan your resource requests.
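As an illustration, declare what the job actually needs in the job script itself rather than matching it to what is currently free. The following is a minimal sketch of a Slurm batch script header; the job name, resource values, walltime, and program name are placeholders, not recommendations, so adjust them to your own workload:

    #!/bin/bash
    #SBATCH --job-name=my_analysis     # descriptive job name (placeholder)
    #SBATCH --ntasks=1                 # number of tasks (processes) the job needs
    #SBATCH --cpus-per-task=8          # cores the job actually needs
    #SBATCH --mem=32G                  # memory the job actually needs
    #SBATCH --time=24:00:00            # realistic walltime estimate

    srun ./my_program                  # replace with your own command

The scheduler will then queue the job until suitable resources become free, which is exactly what it is designed to do.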
If you wish to know when your pending jobs are expected to run, you can use the squeue --start command to display an estimated start date and time for all currently pending jobs. Only jobs that have been submitted and pending for at least 60 seconds will show an estimate. The estimate may change over time as new jobs are submitted and as running jobs finish before reaching their walltime limits.
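For example, to list only your own jobs together with their estimated start times, you can combine the standard squeue options:

    squeue --start -u $USER

The START_TIME column shows the scheduler's current estimate for each pending job; treat it as indicative rather than guaranteed.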
If you have any issues with this change, please let us know through the service desk. We are ready to hear your suggestions and feedback.
Thank you.