Changes between Version 42 and Version 43 of cypress/about


Timestamp:
Jul 13, 2017 5:06:37 PM (4 years ago)
Author:
hoang
Comment:

  • cypress/about

    v42 v43  
    118118
    119119Other QOS's on the system are for staff testing use only.
     120
     121=== Job scheduling and priority ===
     122
     123We would like each of our research groups to have equal opportunity to use the cluster.  Instead of giving each research group a fixed allocation of CPU-time (where the ability to run jobs is cut off after the allocation is reached), SLURM uses a "Fair-share" feature to attempt to give each research group its fair share of resources.  Each job has a priority, which is a number that determines which queued jobs are to be scheduled to run first.
     124
     125You may use the "sprio" command to see the priority of queued jobs.  For example, the command:
     126
     127{{{
     128sprio -o "%Y %u %i" | sort -nr
     129}}}
     130
     131will return a list of queued jobs in priority order, and
     132{{{
     133sprio -j <jobid>
     134}}}
     135
     136(where <jobid> should be replaced by the actual Job ID) will show the components which go into the priority.  These components are:
     137
     138* Fair-share:  Fair-share is based on historical usage.  For details on SLURM's Fair-share implementation, see here:
     139  https://slurm.schedmd.com/priority_multifactor.html#fairshare .
     140  In short, the more CPU-time previously used, the lower the priority for subsequent jobs will (temporarily) become.  SLURM has a half-life decay parameter so that more recent usage is weighted more strongly.  We set this half-life on Cypress to 1 week.
     141
     142* Age:  Jobs that have been waiting in the queue longer get higher priority.
     143
     144* Job Size:  Larger jobs (i.e. jobs with more CPUs/nodes requested) have higher priority to favor jobs that take advantage of parallel processing (e.g. MPI jobs).
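
The effect of the 1-week half-life on Fair-share can be illustrated with a small sketch. It assumes plain exponential decay, so usage that is one half-life old counts half as much as usage from today; the function name `decay_weight` and its arguments are hypothetical and are not part of SLURM, whose scheduler applies the decay internally.

```shell
#!/bin/sh
# Hypothetical helper (not a SLURM command): weight given to CPU-time
# that was used AGE_DAYS ago, assuming simple exponential decay.
# The default half-life of 7 days is the Cypress setting quoted above.
decay_weight() {
    age_days=$1
    half_life_days=${2:-7}
    LC_ALL=C awk -v a="$age_days" -v h="$half_life_days" \
        'BEGIN { printf "%.3f\n", exp(-log(2) * a / h) }'
}

decay_weight 0     # usage from today:        prints 1.000
decay_weight 7     # usage from 1 week ago:   prints 0.500
decay_weight 14    # usage from 2 weeks ago:  prints 0.250
```

Under this assumption, heavy usage from a month ago (about four half-lives) counts for only about 6% of its original amount, which is why a lowered priority is temporary.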
     145
     146SLURM calculates each priority component as a fraction (a value between 0 and 1), multiplies it by a weight, and adds the weighted components to give the final priority.  The current weights are:  Fair-share:  100,000; Age:  10,000;  Job Size:  1,000.  Fair-share is therefore the major contributor to priority.
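
As a rough sketch of this arithmetic, the function below combines three factors using the weights listed above. The name `job_priority` is hypothetical, the factor values are made up for illustration, and rounding to a whole number is an assumption; the actual values for a queued job are the ones reported by sprio.

```shell
#!/bin/sh
# Hypothetical helper (not a SLURM command): weighted sum of the three
# priority factors, each a fraction between 0 and 1.
#   $1 = Fair-share factor, $2 = Age factor, $3 = Job Size factor
# Weights are the Cypress values quoted above: 100,000 / 10,000 / 1,000.
job_priority() {
    LC_ALL=C awk -v f="$1" -v a="$2" -v s="$3" \
        'BEGIN { printf "%.0f\n", f * 100000 + a * 10000 + s * 1000 }'
}

job_priority 0.5 0.2 0.1   # prints 52100
job_priority 0.2 0 0       # prints 20000
job_priority 0 1 1         # prints 11000
```

The last two calls show why Fair-share dominates: a newly queued, small job with a modest Fair-share factor of 0.2 still outranks a job with a Fair-share factor of 0 and the maximum Age and Job Size factors.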
    120147
    121148=== Running serial jobs ===