Changes between Version 3 and Version 4 of Workshops/JobCheckpointing


Timestamp: 01/20/26 16:19:36 (8 hours ago)
Author: Carl Baribault
Comment: Added cloud-based item

  • Workshops/JobCheckpointing

    v3  v4
    10  10
    11  11  * Checkpointed jobs can get started sooner out of the job queue pending state with a reduced requested run time. (See "backfill scheduling" in [https://slurm.schedmd.com/sched_config.html|SLURM Scheduling Configuration Guide].)
        12  * Most production clusters enforce strict walltime limits. (See [wiki:cypress/about#SLURMresourcemanager SLURM (resource manager)].)
    12  13  * A parallel MPI job can fail as soon as a single node in use crashes.
    13      * Most production clusters enforce strict walltime limits. (See [wiki:cypress/about#SLURMresourcemanager SLURM (resource manager)].)
        14  * Cloud-based job queues with high availability can enforce the use of pre-emptible job queues.
    14  15
    15  16  == Impacts of Job Checkpointing ==