Dear HPC Users,

We have recently noticed that many users are starting interactive sessions via the DICC OnDemand portal, and that a significant number of these jobs leave their allocated resources idle for long periods. This is a waste of HPC resources.

Based on our observations over the past months, these idle periods were due to one or more of the following reasons:

  • Queued interactive jobs in the OnDemand portal started around midnight or during the weekend, when the users were away.
  • Jobs completed some calculations but then sat waiting for further action from the user, owing to their interactive nature.
  • Users created jobs by mistake; most of these jobs were cancelled by DICC administrators after prolonged periods of inactivity.

To minimise the impact of these problems on the utilisation of the HPC cluster, we have decided, after internal discussion, to reduce the time limit for such jobs. Going forward, all interactive jobs started via the DICC OnDemand portal will be limited to a maximum of ONE hour. Batch jobs submitted via sbatch are not affected. This also means that any job observed with an idle gap of 1 hour or longer in CPU or GPU usage will be terminated immediately by the DICC administrators.

The rationale for this decision is that interactive sessions are not meant for long-running jobs; they are intended mainly for visual debugging and quick data analysis. If you need to run a long-running calculation, please run it as a proper batch job. We hope everyone will take responsibility for the jobs they submit, be considerate of other users, and avoid wasting HPC resources.
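
As a rough illustration of moving a long-running calculation into a batch job, a minimal Slurm job script might look like the sketch below. The job name, resource values, time limit, and the script name my_analysis.py are all placeholders chosen for this example, not DICC defaults. Partition and account settings are deliberately left out, so please check the DICC documentation for the values that apply to your project.

```shell
#!/bin/bash
#SBATCH --job-name=long-calc     # placeholder job name
#SBATCH --ntasks=1               # run a single task
#SBATCH --cpus-per-task=4        # placeholder CPU count; request only what you need
#SBATCH --time=12:00:00          # placeholder wall-time limit (HH:MM:SS)
#SBATCH --output=%x-%j.out       # log file named after the job name (%x) and job ID (%j)

# Hypothetical workload: replace with your own program or script.
python3 my_analysis.py
```

Assuming the script is saved as job.sh, it can be submitted with "sbatch job.sh" and monitored with "squeue -u $USER". The job then runs unattended and releases its resources as soon as the calculation finishes.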

Jobs that are already running will not be affected by the new time limit, but will still be terminated if they remain idle for too long. All currently queued OnDemand interactive jobs will be capped at a maximum wall time of 1 hour.

If you have any concerns or questions about this change, please let us know via the service desk.

Categories: HPCNews