[[PageOutline]]

= After your job has completed - determining cumulative core efficiency =

== Assumptions ==

See [wiki:Workshops/JobParallelism/WhileYourJobIsRunning#Assumptions Assumptions] - the same assumptions apply as for running jobs.

== Preliminary: tools available ==

=== On LONI clusters ===

LONI clusters provide the self-contained commands '''seff''' and '''qshow'''.

* '''seff''' (See [https://github.com/SchedMD/slurm/tree/master/contribs/seff seff on GitHub].)

On the LONI QB4 cluster:
{{{
[loniID@qbd2 ~]$ seff -h
Usage: seff [Options]
Options:
-h    Help menu
-v    Version
-d    Debug mode: display raw Slurm data
[loniID@qbd2 ~]$ seff -v
seff Version 2.1
}}}

* '''qshow''' (provided by LONI)

On the LONI QB4 cluster:
{{{
[loniID@qbd2 ~]$ qshow -h
** usage: qshow -n ...
Show and optionally kill user processes on remote nodes or execute commands...
[loniID@qbd2 ~]$ qshow -v
qshow 2.74
}}}

=== On Cypress ===

In the following we'll use the '''sacct''' command to analyze completed jobs on Cypress. (Cypress runs an older version of SLURM (v14.03.0) with insufficient support for the seff command.)

Here are the relevant '''sacct''' output columns that we'll be using.

||='''sacct''' output column=||='''Description'''=||='''Format'''=||='''Notes'''=||
||'''TotalCPU'''||Total core hours used||[DD-[hh:]]mm:ss||Needs conversion to seconds||
||'''CPUTimeRAW'''||Total core hours allocated||Seconds||No conversion needed||
||'''REQMEM'''||Requested memory||GB or MB||Defaults to 3200MB per core||
||'''MaxRSS'''||Maximum memory used||Reported with a unit suffix (e.g. K for KB)||Sampled every 30 seconds on Cypress||

== Cumulative core efficiency: (total core hours used) / (total core hours allocated) ==

=== Ideal case ===

Ideally we have '''TotalCPU''' = '''CPUTimeRAW''', as in the following example.

* TotalCPU=20 hours, CPUTimeRAW=20 hours - all 20 requested cores in use, full time, for 1 hour
* Core efficiency = (20 hours TotalCPU) / (20 hours CPUTimeRAW) = 1

=== Actual case ===

==== Using sacct ====

Here is the sacct command used for a completed job, where we've masked the job ID as XXXXXXX:
{{{
[tulaneID@cypress1 ~]$sacct -P -n --format JobID,AllocCPUS,TotalCPU,CPUTimeRaw,REQMEM,MaxRSS -j XXXXXXX
XXXXXXX|10|11-04:18:08|1213660|128Gn|
XXXXXXX.batch|1|11-04:18:08|121366|128Gn|3860640K
}}}

In the following we'll use the values TotalCPU=11-04:18:08 and CPUTimeRAW=1213660 from the first line above (the job as a whole, with 10 allocated cores); the '''MaxRSS''' value of 3860640K appears on the second line, the XXXXXXX.batch step.

==== Converting TotalCPU to seconds ====

We'll use the following shell function to convert '''TotalCPU''' from the format [DD-[hh:]]mm:ss to seconds.
{{{
[tulaneID@cypress1 ~]$convert_totalcpu_to_seconds() {
    seconds=$(echo "$1" | awk -F'[:-]' '{
        if (NF == 4) {
            # Format: D-HH:MM:SS
            total = ($1 * 86400) + ($2 * 3600) + ($3 * 60) + $4
        } else if (NF == 3) {
            # Format: HH:MM:SS
            total = ($1 * 3600) + ($2 * 60) + $3
        } else if (NF == 2) {
            # Format: MM:SS
            total = ($1 * 60) + $2
        } else {
            # No separators found: assume the value is already in seconds
            total = $1
        }
        print total
    }')
    echo "$seconds"
}
[tulaneID@cypress1 ~]$convert_totalcpu_to_seconds 11-04:18:08
965888
}}}

=== Compute cumulative core efficiency ===

Now that we have the job's '''TotalCPU''' in seconds, we can calculate the job's cumulative core efficiency.
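As a convenience, the conversion and the division can be wrapped into a single helper. The following is only a sketch: the function name '''core_efficiency''' is ours (not a Slurm or Cypress tool), and it assumes the '''convert_totalcpu_to_seconds''' function defined above is already loaded in the current shell.
{{{
[tulaneID@cypress1 ~]$core_efficiency() {
    # $1 = TotalCPU string (e.g. 11-04:18:08), $2 = CPUTimeRAW in seconds (e.g. 1213660)
    used=$(convert_totalcpu_to_seconds "$1")
    bc <<< "scale=2; $used / $2"
}
[tulaneID@cypress1 ~]$core_efficiency 11-04:18:08 1213660
.79
}}}

The same ratio, computed as a one-off calculation with '''bc''':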
{{{
[tulaneID@cypress1 ~]$bc <<< "scale=2; 965888 / 1213660"
.79
}}}

=== Summary for this job ===

==== Fewer requested resources = faster job queueing ====

In general, if a job can still run to completion in a comparable elapsed time while requesting less memory and/or fewer processors (cores and/or nodes), then the resource manager SLURM will find it easier to fit the job into an earlier time slot - possibly immediately - and start it sooner.

==== Suggestions for requested processor count and RAM ====

* With the above result of 0.79, we conclude that not all 10 requested cores were in use throughout the duration of the job.
* We may be able to request fewer cores, depending on the requirements of the parallel segments of the computation.
* We should consult the software provider's documentation.
* Also, the job used only ~3.9GB ('''MaxRSS''') out of the requested 128GB ('''REQMEM''') of RAM.
* We could reasonably expect the job to run in the same amount of time while requesting 10 cores and a greatly reduced memory limit, say '''!--mem=32000''' (32GB). An illustrative set of directives follows below.
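To make the last point concrete, a resubmission could carry Slurm directives along the following lines. This is only an illustrative sketch: the job name, walltime, and task layout are placeholders (how the 10 cores are split between tasks and cpus-per-task depends on how the application parallelizes); only the memory request reflects the analysis above.
{{{
#!/bin/bash
#SBATCH --job-name=myjob           # placeholder job name
#SBATCH --time=48:00:00            # placeholder walltime
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1        # placeholder task layout
#SBATCH --cpus-per-task=10         # keep the 10 cores for now
#SBATCH --mem=32000                # 32GB instead of 128GB, per the MaxRSS observation
}}}
A 32GB request still leaves a comfortable margin above the observed ~3.9GB '''MaxRSS'''; it could be reduced further once the memory footprint has been confirmed over several runs.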