Changes between Version 54 and Version 55 of cypress/using


Timestamp:
Aug 21, 2018 12:29:52 PM (3 years ago)
Author:
fuji
Comment:

  • cypress/using

    v54 v55  
    161161
    162162This shows that job 2660 allocated close to 40 GB of memory.
     163
     164
     165==== Job scheduling and priority ====
     166
     167We would like each of our research groups to have equal opportunity to use the cluster.  Instead of giving each research group a fixed allocation of CPU-time (where the ability to run jobs is cut off after the allocation is reached), SLURM uses a "Fair-share" feature to attempt to give each research group its fair share of resources.  Each job has a priority, which is a number that determines which queued jobs are to be scheduled to run first.
     168
     169You may use the "sprio" command to see the priority of queued jobs.  For example, the command:
     170
     171{{{
     172sprio -o "%Y %u %i" | sort -nr
     173}}}
     174
     175will return a list of queued jobs in priority order, and
     176{{{
     177sprio -j <jobid>
     178}}}
     179
     180(where <jobid> should be replaced by the actual Job ID) will show the components which go into the priority.  These components are:
     181
     182* Fair-share:  Fair-share is based on historical usage.  For details on SLURM's Fair-share implementation, see here:
     183  https://slurm.schedmd.com/priority_multifactor.html#fairshare .
     184  In short, the more CPU-time previously used, the lower the priority of subsequent jobs (temporarily) becomes.  SLURM applies a half-life decay to past usage, so that more recent usage is weighted more strongly.  On Cypress this half-life is set to 1 week.
     185
     186* Age:  Jobs that have been waiting in the queue longer get higher priority.
     187
     188* Job Size:  Larger jobs (i.e. jobs with more CPUs/nodes requested) have higher priority to favor jobs that take advantage of parallel processing (e.g. MPI jobs).
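
To give a feel for the 1-week half-life mentioned under Fair-share, here is a minimal sketch of how past usage fades over time.  It is only an illustration of half-life decay, not SLURM's actual accounting code; the function names and the sample usage records are hypothetical.

```python
# Illustration of half-life decay only; this is NOT SLURM's implementation.
# HALF_LIFE_DAYS mirrors the 1-week setting described for Cypress; the
# function names and sample records are hypothetical.
HALF_LIFE_DAYS = 7.0


def decay_factor(age_days, half_life_days=HALF_LIFE_DAYS):
    """Return the weight applied to usage that occurred age_days ago."""
    return 0.5 ** (age_days / half_life_days)


def decayed_usage(records):
    """Sum CPU-hours, discounting each record by its age.

    records is an iterable of (cpu_hours, age_days) pairs.
    """
    return sum(cpu_hours * decay_factor(age_days) for cpu_hours, age_days in records)


# 1000 CPU-hours used today, one week ago, and two weeks ago.
usage = [(1000.0, 0.0), (1000.0, 7.0), (1000.0, 14.0)]
print(decayed_usage(usage))  # 1000 + 500 + 250 = 1750.0
```

Under this simplified model, usage from one week ago counts half as much as usage from today, and usage from two weeks ago counts a quarter as much.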
     189
     190SLURM calculates each priority component as a fraction (value between 0 and 1), which is then multiplied by a weight.  The current weights are:  Fair-share:  100,000; Age:  10,000;  Job Size:  1,000.  That is, Fair-share is the major contributor to priority.  The weighted components are added to give the final priority.
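
As a worked example of the weighted sum described above, the following sketch combines the three components using the weights quoted in this section.  The factor values for the two sample jobs are made up for illustration, and the helper name is hypothetical; the real priority reported by sprio is computed by SLURM itself.

```python
# Worked example of the weighted priority sum described above.
# The weights are the ones quoted in this section; the factor values for
# job_a and job_b are invented, and job_priority() is a hypothetical helper.
WEIGHTS = {"fairshare": 100_000, "age": 10_000, "jobsize": 1_000}


def job_priority(factors):
    """Return the sum of weight * factor, with each factor in [0, 1]."""
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor {name!r} must be between 0 and 1")
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)


# job_a: moderate fair-share, short wait, small job.
job_a = {"fairshare": 0.5, "age": 0.25, "jobsize": 0.125}
# job_b: low fair-share, but maximum age and size factors.
job_b = {"fairshare": 0.25, "age": 1.0, "jobsize": 1.0}

print(job_priority(job_a))  # 50000 + 2500 + 125 = 52625.0
print(job_priority(job_b))  # 25000 + 10000 + 1000 = 36000.0
```

Even with the largest possible Age and Job Size factors, job_b ranks below job_a, which illustrates why Fair-share is the major contributor to priority.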
    163191
    164192=== MPI Jobs ===