Changes between Version 54 and Version 55 of cypress/using

Aug 21, 2018 12:29:52 PM (3 years ago)



    v54 v55  
    162162This shows that job 2660 allocated close to 40 GB of memory.
     165==== Job scheduling and priority ====
     167We would like each of our research groups to have equal opportunity to use the cluster.  Instead of giving each research group a fixed allocation of CPU-time (where the ability to run jobs is cut off after the allocation is reached), SLURM uses a "Fair-share" feature to attempt to give each research group its fair share of resources.  Each job has a priority, which is a number that determines which queued jobs are to be scheduled to run first.
     169You may use the "sprio" command to see the priority of queued jobs.  For example, the command:
     172sprio -o "%Y %u %i" | sort -nr
     175will return a list of queued jobs in priority order, and
     177sprio -j <jobid>
     180(where <jobid> should be replaced by the actual Job ID) will show the components which go into the priority.  These components are:
     182* Fair-share:  Fair-share is based on historical usage.  For details on SLURM's Fair-share implementation, see here:
     183 .
     184  In short, the more CPU-time a group has previously used, the lower the priority of its subsequent jobs will (temporarily) become.  SLURM has a half-life decay parameter so that more recent usage is weighted more strongly.  We set this half-life on Cypress to 1 week.
     186* Age:  Jobs that have been waiting in the queue longer get higher priority.
     188* Job Size:  Larger jobs (i.e. jobs with more CPUs/nodes requested) have higher priority to favor jobs that take advantage of parallel processing (e.g. MPI jobs).
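To get a feel for the 1-week half-life mentioned under Fair-share, the following is a rough sketch only. It assumes plain exponential decay, and the helper name usage_weight is hypothetical (it is not a SLURM command). SLURM's actual accounting may differ in detail.

```shell
# Hypothetical helper: relative weight of usage that is <days> old,
# assuming simple exponential decay with a 7-day half-life.
usage_weight() {
    awk -v days="$1" 'BEGIN { printf "%.3f\n", exp(-log(2) * days / 7) }'
}

usage_weight 0    # usage from today
usage_weight 7    # usage from one week ago
usage_weight 14   # usage from two weeks ago
```

Under this assumption, usage from one week ago counts half as much as usage from today, and usage from two weeks ago counts a quarter as much.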
     190SLURM calculates each priority component as a fraction (value between 0 and 1), which is then multiplied by a weight.  The current weights are:  Fair-share:  100,000; Age:  10,000;  Job Size:  1,000.  That is, Fair-share is the major contributor to priority.  The weighted components are added to give the final priority.
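As an illustration of the weighted sum described above, here is a minimal sketch using the three weights quoted in this section. The helper name job_priority and the sample factor values are made up for the example; actual factors for a queued job can be seen with sprio.

```shell
# Hypothetical helper: combine the three priority factors
# (each a fraction between 0 and 1) using the weights quoted above.
# Arguments: <fair-share factor> <age factor> <job-size factor>
job_priority() {
    awk -v fs="$1" -v age="$2" -v size="$3" 'BEGIN {
        printf "%.0f\n", 100000 * fs + 10000 * age + 1000 * size
    }'
}

# Made-up factors: 50000 (Fair-share) + 2500 (Age) + 125 (Job Size)
job_priority 0.5 0.25 0.125
```

With these sample values the Fair-share term accounts for 50000 of the total, while even a maximal Age and Job Size factor together could add at most 11000.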
    164192=== MPI Jobs ===