Changes between Version 47 and Version 48 of cypress/about


Timestamp:
Aug 21, 2018 12:30:41 PM (3 years ago)
Author:
fuji
  • cypress/about

    v47 v48  
    114 114 * [https://wiki.hpc.tulane.edu/trac/wiki/cypress/using#Requestingmemoryforyourjob Requesting memory for your job]
    115 115
    116 === Job scheduling and priority ===
    117 
    118 We would like each of our research groups to have equal opportunity to use the cluster.  Instead of giving each research group a fixed allocation of CPU-time (where the ability to run jobs is cut off after the allocation is reached), SLURM uses a "Fair-share" feature to attempt to give each research group its fair share of resources.  Each job has a priority, which is a number that determines which queued jobs are to be scheduled to run first.
    119 
    120 You may use the "sprio" command to see the priority of queued jobs.  For example, the command:
    121 
    122 {{{
    123 sprio -o "%Y %u %i" | sort -nr
    124 }}}
    125 
    126 will return a list of queued jobs in priority order, and
    127 {{{
    128 sprio -j <jobid>
    129 }}}
    130 
    131 (where <jobid> should be replaced by the actual Job ID) will show the components which go into the priority.  These components are:
    132 
    133 * Fair-share:  Fair-share is based on historical usage.  For details on SLURM's Fair-share implementation, see here:
    134   https://slurm.schedmd.com/priority_multifactor.html#fairshare .
    135   In short, the more CPU-time previously used, the lower the priority of subsequent jobs will (temporarily) become.  SLURM applies a half-life decay to past usage, so that more recent usage is weighted more strongly.  On Cypress, this half-life is set to 1 week.
    136 
    137 * Age:  Jobs that have been waiting in the queue longer get higher priority.
    138 
    139 * Job Size:  Larger jobs (i.e. jobs with more CPUs/nodes requested) have higher priority to favor jobs that take advantage of parallel processing (e.g. MPI jobs).
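
The half-life decay mentioned under Fair-share can be illustrated with a small sketch. It assumes a simplified, continuous form of the decay (SLURM applies its decay periodically to accumulated usage, so the real bookkeeping differs), and the function names and the `(age_days, cpu_hours)` record layout are hypothetical, chosen only for this example. The 1-week half-life is the value stated above for Cypress.

```python
# Simplified illustration of half-life decay of past usage.
# Assumption: continuous decay; names and record layout are hypothetical.

HALF_LIFE_DAYS = 7.0  # half-life stated above for Cypress (1 week)


def decay_factor(age_days, half_life_days=HALF_LIFE_DAYS):
    """Return the weight given to usage that occurred age_days ago."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


def effective_usage(records, half_life_days=HALF_LIFE_DAYS):
    """Sum CPU-hours after decay.

    records is an iterable of (age_days, cpu_hours) pairs.
    """
    return sum(
        cpu_hours * decay_factor(age_days, half_life_days)
        for age_days, cpu_hours in records
    )


if __name__ == "__main__":
    # 100 CPU-hours used today, one week ago, and two weeks ago.
    history = [(0, 100.0), (7, 100.0), (14, 100.0)]
    print(effective_usage(history))
```

Under this simplification, usage from one week ago counts half as much as usage from today, and usage from two weeks ago counts a quarter as much.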
    140 
    141 SLURM calculates each priority component as a fraction (value between 0 and 1), which is then multiplied by a weight.  The current weights are:  Fair-share:  100,000; Age:  10,000;  Job Size:  1,000.  That is, Fair-share is the major contributor to priority.  The weighted components are added to give the final priority.
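
The weighted sum described above can be sketched as follows. This covers only the three components and weights listed in this section; a real SLURM configuration may include further factors, and the function name and dictionary keys are hypothetical labels for this example, not SLURM identifiers.

```python
# Minimal sketch of the weighted-sum priority described above.
# Assumption: only the three listed components contribute; names are hypothetical.

WEIGHTS = {
    "fairshare": 100_000,
    "age": 10_000,
    "job_size": 1_000,
}


def job_priority(factors, weights=WEIGHTS):
    """Combine per-component fractions (each between 0 and 1) into one priority."""
    for name in weights:
        value = factors[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} factor must be between 0 and 1, got {value}")
    return sum(weights[name] * factors[name] for name in weights)


if __name__ == "__main__":
    # A job with little recent usage but no waiting time and minimal size...
    light_user = {"fairshare": 0.9, "age": 0.0, "job_size": 0.0}
    # ...versus a large, long-waiting job from a heavy recent user.
    heavy_user = {"fairshare": 0.1, "age": 1.0, "job_size": 1.0}
    print(job_priority(light_user), job_priority(heavy_user))
```

With these weights, the first job scores about 90,000 and the second about 21,000, which illustrates why Fair-share dominates: even the maximum Age and Job Size contributions together add at most 11,000.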
    142 
    143 
        116 * [https://wiki.hpc.tulane.edu/trac/wiki/cypress/using#Jobschedulingandpriority Job scheduling and priority]
    144 117
    145 118