116 | | === Job scheduling and priority === |
117 | | |
118 | | We would like each of our research groups to have equal opportunity to use the cluster. Instead of giving each research group a fixed allocation of CPU-time (where the ability to run jobs is cut off after the allocation is reached), SLURM uses a "Fair-share" feature to attempt to give each research group its fair share of resources. Each job has a priority, which is a number that determines which queued jobs are to be scheduled to run first. |
119 | | |
120 | | You may use the "sprio" command to see the priority of queued jobs. For example, the command: |
121 | | |
122 | | {{{ |
123 | | sprio -o "%Y %u %i" | sort -nr |
124 | | }}} |
125 | | |
126 | | will return a list of queued jobs in priority order, and |
127 | | {{{ |
128 | | sprio -j <jobid> |
129 | | }}} |
130 | | |
131 | | (where <jobid> should be replaced by the actual Job ID) will show the components which go into the priority. These components are: |
132 | | |
133 | | * Fair-share: Fair-share is based on historical usage. For details on SLURM's Fair-share implementation, see: |
134 | | https://slurm.schedmd.com/priority_multifactor.html#fairshare . |
135 | | In short, the more CPU-time previously used, the lower the priority of subsequent jobs (temporarily) becomes. SLURM applies a half-life decay to past usage, so that more recent usage is weighted more strongly. We set this half-life on Cypress to 1 week. |
136 | | |
137 | | * Age: Jobs that have been waiting in the queue longer get higher priority. |
138 | | |
139 | | * Job Size: Larger jobs (i.e. jobs with more CPUs/nodes requested) have higher priority to favor jobs that take advantage of parallel processing (e.g. MPI jobs). |
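
To make the half-life decay in the Fair-share component concrete, here is a minimal Python sketch. It is only an illustration, not SLURM's actual implementation: the function names and the (age, CPU-hours) record format are hypothetical, and the real Fair-share factor also depends on assigned shares and normalized usage as described in the SLURM documentation linked above. The only value taken from this page is the 1-week half-life.

```python
# Illustrative sketch only: names and the record format are assumptions,
# not part of SLURM. Only the 1-week half-life comes from this page.
HALF_LIFE_DAYS = 7.0


def decayed_weight(age_days, half_life_days=HALF_LIFE_DAYS):
    """Weight given to usage that occurred `age_days` ago."""
    return 0.5 ** (age_days / half_life_days)


def effective_usage(records, half_life_days=HALF_LIFE_DAYS):
    """Sum past usage, discounting older usage by the half-life decay.

    `records` is an iterable of (age_days, cpu_hours) pairs.
    """
    return sum(cpu_hours * decayed_weight(age_days, half_life_days)
               for age_days, cpu_hours in records)


if __name__ == "__main__":
    # 100 CPU-hours used today, one week ago, and two weeks ago.
    history = [(0.0, 100.0), (7.0, 100.0), (14.0, 100.0)]
    print(effective_usage(history))  # 175.0
```

In this sketch, usage from one week ago counts half as much as usage from today, and usage from two weeks ago counts a quarter as much, so heavy use of the cluster stops weighing on priority after a few weeks.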
140 | | |
141 | | SLURM calculates each priority component as a fraction (value between 0 and 1), which is then multiplied by a weight. The current weights are: Fair-share: 100,000; Age: 10,000; Job Size: 1,000. That is, Fair-share is the major contributor to priority. The weighted components are added to give the final priority. |
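
The weighted sum described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name and the factor values in the example are hypothetical, and only the three weights come from this page. SLURM's multifactor plugin can include additional factors that are not covered here.

```python
# Illustrative sketch only: the weights are the ones quoted on this page;
# the function name and the example factor values are assumptions.
WEIGHTS = {"fairshare": 100_000, "age": 10_000, "jobsize": 1_000}


def job_priority(factors, weights=WEIGHTS):
    """Return the sum of weight * factor over all priority components.

    `factors` maps each component name to a value between 0 and 1.
    """
    for name in weights:
        value = factors[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} factor must be in [0, 1], got {value}")
    return sum(weights[name] * factors[name] for name in weights)


if __name__ == "__main__":
    # A large job that has waited a long time, from a heavy recent user.
    heavy_user = {"fairshare": 0.25, "age": 1.0, "jobsize": 1.0}
    # A small, freshly submitted job from a light recent user.
    light_user = {"fairshare": 0.75, "age": 0.0, "jobsize": 0.0}
    print(job_priority(heavy_user))  # 36000.0
    print(job_priority(light_user))  # 75000.0
```

With these weights, the second job outranks the first even though the first has the maximum Age and Job Size factors, which illustrates why Fair-share is the major contributor to priority.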
142 | | |
143 | | |
| 116 | * [https://wiki.hpc.tulane.edu/trac/wiki/cypress/using#Jobschedulingandpriority Job scheduling and priority] |