||='''sacct''' output column=||='''Description'''=||='''Format'''=||='''Notes'''=||
||'''TotalCPU'''||Total core hours used||[DD-[hh:]]mm:ss||Needs conversion to seconds||
||'''CPUTimeRAW'''||Total core hours allocated||Seconds||No conversion needed||
||'''REQMEM'''||Requested memory||GB or MB||Defaults to 3200MB per core||
||'''MaxRSS'''||Maximum memory used||GB per node||Sampled every 30 seconds on Cypress||
=== Ideal case ===

Ideally we have '''TotalCPU''' = '''CPUTimeRAW''', as in the following example.

* TotalCPU=20 hours, CPUTimeRAW=20 hours - using all 20 requested cores, full time for 1 hour
* Core efficiency = (20 hours TotalCPU / 20 hours CPUTimeRAW) = 1

=== Actual case ===

==== Using sacct ====

Here is the '''sacct''' command for a completed job, where we've masked the job ID as XXXXXXX.

{{{
[tulaneID@cypress1 ~]$sacct -P -n --format JobID,AllocCPUS,TotalCPU,CPUTimeRaw,REQMEM,MaxRSS -j XXXXXXX
XXXXXXX|10|11-04:18:08|1213660|128Gn|
XXXXXXX.batch|1|11-04:18:08|121366|128Gn|3860640K
}}}

In the following we'll use the values TotalCPU=11-04:18:08 (reported on both lines) and CPUTimeRAW=1213660 from the 1st line, the whole-job record, in the above.

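Since '''sacct -P''' produces pipe-delimited records, the two values can also be pulled out mechanically rather than by eye. This is only a sketch - here the whole-job record is pasted in as a literal string, whereas in practice you would capture the '''sacct''' output directly:

```shell
# Whole-job record from the sacct output above, pasted as a literal string
line='XXXXXXX|10|11-04:18:08|1213660|128Gn|'

totalcpu=$(echo "$line" | cut -d'|' -f3)      # 3rd field: TotalCPU
cputimeraw=$(echo "$line" | cut -d'|' -f4)    # 4th field: CPUTimeRAW
echo "TotalCPU=$totalcpu CPUTimeRAW=$cputimeraw"
```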
==== Converting TotalCPU to seconds ====

We'll use the following shell function to convert '''TotalCPU''' in the format [DD-[hh:]]mm:ss to seconds.

{{{
[tulaneID@cypress1 ~]$convert_totalcpu_to_seconds() {
    seconds=$(echo "$1" | awk -F'[:-]' '{
        if (NF == 4) {
            # Format: DD-HH:MM:SS
            total = ($1 * 86400) + ($2 * 3600) + ($3 * 60) + $4
        } else if (NF == 3) {
            # Format: HH:MM:SS
            total = ($1 * 3600) + ($2 * 60) + $3
        } else if (NF == 2) {
            # Format: MM:SS
            total = ($1 * 60) + $2
        } else {
            total = $1  # Assume only seconds if no separators found
        }
        print total
    }')

    echo "$seconds"
}
[tulaneID@cypress1 ~]$convert_totalcpu_to_seconds 11-04:18:08
965888
}}}
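As a sanity check on the function's result, the same conversion can be done by hand with shell arithmetic: 11 days, 4 hours, 18 minutes and 8 seconds.

```shell
# 11-04:18:08 expanded by hand: days*86400 + hours*3600 + minutes*60 + seconds
echo $(( 11 * 86400 + 4 * 3600 + 18 * 60 + 8 ))   # prints 965888
```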

=== Compute cumulative core efficiency ===

Now that we have the job's '''TotalCPU''' in seconds, we can calculate the job's cumulative core efficiency.

{{{
[tulaneID@cypress1 ~]$bc <<< "scale=2; 965888 / 1213660"
.79
}}}

=== Summary for this job ===

==== Fewer requested resources = faster job queueing ====

In general, when a job can still run to completion in a comparable elapsed time while requesting less memory and/or fewer processors (cores and/or nodes), the resource manager SLURM can more easily find an earlier time slot - if not an immediate one - in which to start the job.

==== Suggestions for requested processor count and RAM ====

* With the above result of 0.79, we conclude that not all 10 requested cores were in use throughout the duration of the job.
* We may be able to request fewer cores, depending on the requirements of the parallel segments of the computation.
* We should consult the software provider's documentation.
* Also, the job used only ~3.9GB ('''MaxRSS''') of the requested 128GB ('''REQMEM''') of RAM.
* We could reasonably expect the job to run in the same amount of time while requesting 10 cores and greatly reduced memory, say, '''!--mem=32000''' or 32GB.