Changes between Version 7 and Version 8 of Workshops/JobCheckpointing
Timestamp: 01/22/26 12:05:22
Workshops/JobCheckpointing
v7  v8
43  43     * Checkpointed jobs running parallel MPI (especially long-running jobs recording at regular intervals) can fail as soon as a single node in use crashes.
44  44     * Checkpointed jobs running in certain cloud-based job queues with high availability can experience strictly enforced job pre-emption (SIGTERM signals).
    45  +  * On Cypress, as an example, there are many more nodes available for multi-node jobs with a 24-hour time limit. See [wiki:cypress/about#SLURMresourcemanager SLURM (resource manager)].
45  46
46  47    == What are the impacts of job checkpointing? ==
