Changes between Version 7 and Version 8 of Workshops/JobCheckpointing


Timestamp:
01/22/26 12:05:22 (2 days ago)
Author:
Carl Baribault
Comment:

Added multi-node benefit

  • Workshops/JobCheckpointing

    v7 v8  
    43 43  * Checkpointed jobs running parallel MPI (especially long-running jobs recording at regular intervals) can fail as soon as a single node in use crashes.
    44 44  * Checkpointed jobs running in certain cloud-based job queues with high availability can experience strictly enforced job pre-emption (SIGTERM signals).
       45  * On Cypress, for example, many more nodes are available to multi-node jobs with a 24-hour time limit. See [wiki:cypress/about#SLURMresourcemanager SLURM (resource manager)].
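
The pre-emption case above (a SIGTERM arriving mid-run) can be sketched as follows. This is a minimal illustration, not the workshop's own code: the `Checkpointer` class, the `run` function, the JSON state layout, and the toy running-sum workload are all hypothetical names chosen for the example. The handler only sets a flag, and the main loop saves state and exits at the next step boundary, so a resubmitted job resumes from the last saved step.

```python
import json
import os
import signal
import tempfile


class Checkpointer:
    """Hypothetical helper: saves/loads job state and records stop requests."""

    def __init__(self, path):
        self.path = path
        self.stop_requested = False

    def request_stop(self, signum, frame):
        # Signal handler: only set a flag; the main loop does the real work.
        self.stop_requested = True

    def save(self, state):
        # Write to a temporary file, then rename, so an interrupted write
        # never leaves a half-written checkpoint behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            json.dump(state, handle)
        os.replace(tmp_path, self.path)

    def load(self, default):
        try:
            with open(self.path) as handle:
                return json.load(handle)
        except FileNotFoundError:
            return default


def run(path, total_steps, checkpoint_every=100, checkpointer=None):
    """Toy workload (a running sum) that resumes from the last checkpoint."""
    ckpt = checkpointer or Checkpointer(path)
    signal.signal(signal.SIGTERM, ckpt.request_stop)

    state = ckpt.load({"step": 0, "total": 0})
    while state["step"] < total_steps and not ckpt.stop_requested:
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            ckpt.save(state)

    # Final save covers both normal completion and a pre-emption request.
    ckpt.save(state)
    return state
```

Calling `run("state.json", 1000)` a second time after an interrupted first run would pick up at the saved step rather than at zero. How much warning a job receives between SIGTERM and being killed depends on the scheduler configuration, so the work done per step should be short relative to that grace period.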
    45 46
    46 47  == What are the impacts of job checkpointing? ==