Changes between Version 2 and Version 3 of Workshops/cypress/SlurmPractice


Timestamp:
08/22/18 13:24:53 (6 years ago)
Author:
fuji
  • Workshops/cypress/SlurmPractice

    v2 v3  
    9191|| long        || 168 ||168 hours ||  8 ||
    9292
    93 The differences between '''normal''' and '''long''' are the number of nodes you can request and duration you can run your code.
     93The differences between '''normal''' and '''long''' are the number of nodes you can request and the duration for which your code can run.
    9494The details will be explained in Parallel Jobs below.
    9595
     
    161161}}}
    162162
    163 
    164 
     163=== Cancel Jobs ===
     164Look at '''slurmscript2''':
     165{{{
     166[fuji@cypress1 SerialJob]$ cat slurmscript2
     167#!/bin/bash
     168#SBATCH --qos=workshop            # Quality of Service
     169#SBATCH --partition=workshop      # partition
     170#SBATCH --job-name=pythonLong       # Job Name
     171#SBATCH --time=00:01:00         # WallTime
     172#SBATCH --nodes=1               # Number of Nodes
     173#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
     174#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)
     175
     176module load anaconda
     177python hello.py
     178
     179sleep 3600
     180}}}
     181
     182The major difference from '''slurmscript1''' is the last line, '''sleep 3600''', which makes Bash wait for 3600 seconds at that point.
     183
     184Submit the job,
     185{{{
     186[fuji@cypress1 SerialJob]$ sbatch slurmscript2
     187Submitted batch job 773951
     188}}}
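
When scripting around '''sbatch''', the job ID from the submission message is what later '''squeue''' and '''scancel''' calls need. The following is a minimal sketch, assuming the message has the form "Submitted batch job <id>" shown above; the helper name `job_id_from_sbatch` is hypothetical and not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm command): extract the numeric job ID
# from a submission message of the form "Submitted batch job <id>".
job_id_from_sbatch() {
    # Print the last whitespace-separated field of the matching line.
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $NF }'
}

# Example usage on the cluster (commented out; requires Slurm):
#   jobid=$(job_id_from_sbatch "$(sbatch slurmscript2)")
#   echo "Submitted job $jobid"
```

If the installed Slurm version supports it, `sbatch --parsable` prints the job ID without the surrounding text, which may make a helper like this unnecessary.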
     189
     190The '''squeue''' command lists the jobs that are running or queued on Cypress.
     191It also shows which node each running job has been assigned to.
     192To list only your own jobs, use the '''-u''' option with your user name.
     193
     194{{{
     195[fuji@cypress1 SerialJob]$ squeue -u fuji
     196     JOBID    QOS               NAME     USER ST       TIME NO NODELIST(REASON)
     197    773951 worksh         pythonLong     fuji  R       0:07  1 cypress01-117
     198}}}
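
The '''ST''' column holds the job state ('''R''' in the listing above, meaning running). As a rough sketch, the state of one job can be pulled out of the listing with awk. This assumes the column layout shown above (JOBID first, ST fifth) and a job name without spaces; the helper name `job_state` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read squeue-style output on stdin and print the
# ST (state) column for the given job ID.
# Assumes the column order JOBID QOS NAME USER ST TIME NO NODELIST(REASON)
# and that the job name contains no spaces.
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Example usage on the cluster (commented out; requires Slurm):
#   squeue -u "$USER" | job_state 773951
```

The function prints nothing if the job is not in the listing, which is what happens once the job has finished or been cancelled.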
     199
     200To stop the job, pass its job ID to '''scancel''',
     201{{{
     202[fuji@cypress1 SerialJob]$ scancel 773951
     203[fuji@cypress1 SerialJob]$ squeue -u fuji
     204     JOBID    QOS               NAME     USER ST       TIME NO NODELIST(REASON)
     205}}}
     206
     207The cancelled job leaves a new output file, '''slurm-773951.out''', in the directory.
     208{{{
     209[fuji@cypress1 SerialJob]$ ls
     210hello.py  slurm-773944.out  slurm-773951.out  slurmscript1  slurmscript2
     211}}}
     212
     213The new file contains the program output followed by the cancellation message,
     214{{{
     215[fuji@cypress1 SerialJob]$ cat slurm-773951.out
     216Hello, world!
     2172018-08-22T13:17:00.965433
     218cypress01-117
     219slurmstepd: error: *** JOB 773951 CANCELLED AT 2018-08-22T13:17:25 ***
     220}}}
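
To check from a script whether a job ended by cancellation, one option is to search its output file for the '''slurmstepd''' message. This is only a sketch, assuming the message contains the word CANCELLED as in the output above; the helper name `was_cancelled` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) if the given Slurm output
# file contains a slurmstepd cancellation message, fail otherwise.
# Assumes the message format shown in the tutorial output.
was_cancelled() {
    grep -q 'slurmstepd: error: \*\*\* JOB [0-9]* CANCELLED' "$1"
}

# Example usage:
#   if was_cancelled slurm-773951.out; then
#       echo "job was cancelled"
#   fi
```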