Changes between Version 3 and Version 4 of Workshops/JobCheckpointing
Timestamp: 01/20/26 16:19:36
Workshops/JobCheckpointing
v3  v4
10  10
11  11     * Checkpointed jobs can get started sooner out of the job queue pending state with a reduced requested run time. (See "backfill scheduling" in [https://slurm.schedmd.com/sched_config.html|SLURM Scheduling Configuration Guide].)
    12  +  * Most production clusters enforce strict walltime limits. (See [wiki:cypress/about#SLURMresourcemanager SLURM (resource manager)].)
12  13     * A parallel MPI job can fail as soon as a single node in use crashes.
13      -  * Most production clusters enforce strict walltime limits. (See [wiki:cypress/about#SLURMresourcemanager SLURM (resource manager)].)
    14  +  * Cloud-based job queues with high availability can enforce the use of pre-emptible job queues.
14  15
15  16     == Impacts of Job Checkpointing ==
