
After your job has completed - determining cumulative core efficiency

Assumptions

See Assumptions - same as for running jobs.

Preliminary: tools available

On LONI clusters

LONI clusters provide the self-contained commands seff and qshow.

  • seff (provided by LONI)

On LONI QB4 cluster:

[loniID@qbd2 ~]$ seff -h
Usage: seff [Options] <Jobid>
       Options:
       -h    Help menu
       -v    Version
       -d    Debug mode: display raw Slurm data
[loniID@qbd2 ~]$ seff -v
seff Version 2.1
  • qshow (provided by LONI)

On LONI QB4 cluster:

[loniID@qbd2 ~]$ qshow -h
** usage: qshow -n <options> <base-name> <begin #> <end #> <command>
...
Show and optionally kill user processes on remote nodes or execute
commands...
[loniID@qbd2 ~]$ qshow -v
qshow 2.74
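
For a completed job, seff is typically run with just the job ID (masked here as XXXXXXX, matching the convention used below); it summarizes the job's CPU and memory efficiency, and -d additionally shows the raw Slurm accounting data. A minimal sketch (no output shown, since the exact layout depends on the seff version):

[loniID@qbd2 ~]$ seff XXXXXXX      # per-job CPU and memory efficiency summary
[loniID@qbd2 ~]$ seff -d XXXXXXX   # same summary plus the raw Slurm data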

On Cypress

In the following we'll use the sacct command to analyze completed jobs on Cypress. (Cypress runs an older version of SLURM (v14.03.0) with insufficient support for the seff command.)

Here are the relevant outputs that we'll be using from sacct.

sacct output column | Description               | Format          | Notes
TotalCPU            | Total core hours used     | [DD-[hh:]]mm:ss | Needs conversion to seconds
CPUTimeRAW          | Total core hours allocated| Seconds         | No conversion needed
REQMEM              | Requested memory          | GB or MB        | Defaults to 3200MB per core
MaxRSS              | Maximum memory used       | GB per node     | Sampled every 30 seconds on Cypress

Cumulative core efficiency: (total core hours used) / (total core hours allocated)

Ideal case

Ideally we have TotalCPU = CPUTimeRAW, as in the following example.

  • TotalCPU=20 hours, CPUTimeRAW=20 hours - using all 20 requested cores, full time for 1 hour
  • Core efficiency = (20 hours TotalCPU / 20 hours CPUTimeRAW) = 1

Actual case

Using sacct

Here is the sacct command used for a completed job, where we've masked the job ID as XXXXXXX.

[tulaneID@cypress1 ~]$sacct  -P -n --format JobID,AllocCPUS,TotalCPU,CPUTimeRaw,REQMEM,MaxRSS -j XXXXXX
XXXXXXX|10|11-04:18:08|1213660|128Gn|
XXXXXXX.batch|1|11-04:18:08|121366|128Gn|3860640K

In the following we'll use TotalCPU=11-04:18:08, as reported for the XXXXXXX.batch step on the 2nd line, and CPUTimeRAW=1213660, as reported for the overall job on the 1st line of the above output.
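
Since the -P option produces pipe-delimited output, these two values can also be extracted with awk instead of being read off by eye. A minimal sketch, assuming the same masked job ID XXXXXXX and a reduced --format list:

[tulaneID@cypress1 ~]$sacct -P -n --format JobID,TotalCPU,CPUTimeRaw -j XXXXXXX | awk -F'|' '
   NR == 1   { cputimeraw = $3 }   # CPUTimeRAW from the 1st line (overall job record)
   /\.batch/ { totalcpu   = $2 }   # TotalCPU from the .batch step
   END { print "TotalCPU=" totalcpu " CPUTimeRAW=" cputimeraw }'
TotalCPU=11-04:18:08 CPUTimeRAW=1213660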

Converting TotalCPU to seconds

We'll use the following shell function to convert TotalCPU from the [DD-[hh:]]mm:ss format to seconds.

[tulaneID@cypress1 ~]$convert_totalcpu_to_seconds() {
   seconds=$(echo "$1" | awk -F'[:-]' '{
      if (NF == 4) {
          # Format: D-HH:MM:SS
          total = ($1 * 86400) + ($2 * 3600) + ($3 * 60) + $4
      } else if (NF == 3) {
          # Format: HH:MM:SS
          total = ($1 * 3600) + ($2 * 60) + $3
      } else if (NF == 2) {
          # Format: MM:SS
          total = ($1 * 60) + $2
      } else {
          total = $1 # Assume only seconds if no separators found
      }
      print total
   }')

   echo "$seconds"
}
[tulaneID@cypress1 ~]$convert_totalcpu_to_seconds 11-04:18:08
965888

Compute cumulative core efficiency

Now that we have the job's TotalCPU in seconds, we can calculate the job's cumulative core efficiency.

[tulaneID@cypress1 ~]$bc <<< "scale=2; 965888 / 1213660"
.79
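
The conversion and the division can also be wrapped into a single helper. A minimal sketch, reusing the convert_totalcpu_to_seconds function defined above (the name core_efficiency is our own):

[tulaneID@cypress1 ~]$core_efficiency() {
   # $1 = TotalCPU in [DD-[hh:]]mm:ss format, $2 = CPUTimeRAW in seconds
   local used_seconds
   used_seconds=$(convert_totalcpu_to_seconds "$1")
   bc <<< "scale=2; $used_seconds / $2"
}
[tulaneID@cypress1 ~]$core_efficiency 11-04:18:08 1213660
.79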

Summary for this job

Fewer requested resources = faster job queueing

In general, if a job can still run to completion in a comparable elapsed time while requesting less memory and/or fewer processors (cores and/or nodes), then the resource manager SLURM can more easily find an earlier time slot - possibly an immediate one - in which to start and run the job.

Suggestions for requested processor count and RAM

  • With the above result of 0.79, we conclude that not all 10 requested cores were in use throughout the duration of the job.
    • We may be able to request fewer cores depending on the requirements of the parallel segments of the computation.
    • We should consult the software provider's information.
  • Also, the job used only ~3.9GB (MaxRSS) out of the requested 128GB (REQMEM) of RAM.
    • We could reasonably expect the job to run in the same amount of time while requesting 10 cores and greatly reduced memory, say, --mem=32000 (32GB); see the sketch below.
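
As a hypothetical illustration only (the actual partition/queue names and task layout depend on the job and on the software being run), the resource request in the job script might be trimmed along these lines:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=10   # keep the 10 cores the job actually exercised
#SBATCH --mem=32000            # ~32GB instead of 128GB, still well above the ~3.9GB MaxRSS observed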