Changes between Initial Version and Version 1 of Workshops/cypress/JobDependency


Timestamp: 08/22/18 14:59:59
Author: fuji
= Job Dependency =
If you haven't done so already, check out the sample files:

{{{svn co file:///home/fuji/repos/workshop ./workshop}}}

To check out the sample files onto your local machine instead (Linux shell):

{{{svn co svn+ssh://USERID@cypress1.tulane.edu/home/fuji/repos/workshop ./workshop}}}


----

Job dependencies are used to defer the start of a job until the specified dependencies have been satisfied. They are specified with the '''--dependency''' option of the '''sbatch''' command:
{{{
sbatch --dependency=<type:job_id[:job_id][,type:job_id[:job_id]]> ...
}}}
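A minimal Python sketch of how that option value is put together. '''dependency_spec''' is a hypothetical helper, not part of Slurm or of the workshop files. Job IDs of one type are joined with a colon and different types with a comma (with commas, all of the listed dependencies must be satisfied):
{{{#!python
# Hypothetical helper (not part of Slurm or the workshop files): builds the
# string that follows "sbatch --dependency=".
def dependency_spec(*conditions):
    """Join (type, job_ids) pairs into e.g. 'afterok:773997:773999,afterany:774000'."""
    parts = []
    for dep_type, job_ids in conditions:
        if not job_ids:
            raise ValueError(f"{dep_type} needs at least one job id")
        # one type followed by its job ids, all separated by ':'
        parts.append(":".join([dep_type] + [str(j) for j in job_ids]))
    # different dependency types are separated by ','
    return ",".join(parts)


print("sbatch --dependency=" + dependency_spec(("afterok", [773997])) + " slurmscript")
# prints: sbatch --dependency=afterok:773997 slurmscript
}}}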

Dependency types:
* '''after:jobid[:jobid...]''': the job can begin after the specified jobs have started.
* '''afterany:jobid[:jobid...]''': the job can begin after the specified jobs have terminated.
* '''afternotok:jobid[:jobid...]''': the job can begin after the specified jobs have failed.
* '''afterok:jobid[:jobid...]''': the job can begin after the specified jobs have run to completion with an exit code of zero (see the user guide for caveats).
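To make the difference between the four types concrete, here is a deliberately simplified Python model of when each condition holds. It is an illustration only, not Slurm's implementation: it treats "failed" as a non-zero exit code and ignores cancellation, time limits and node failures. '''JobStatus''' and '''satisfied''' are hypothetical names:
{{{#!python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobStatus:
    started: bool                # the job has begun running
    exit_code: Optional[int]     # None until the job has terminated


def satisfied(dep_type: str, jobs: List[JobStatus]) -> bool:
    """Simplified model: may a job with dependency dep_type:<jobs> start now?"""
    if dep_type == "after":        # all listed jobs have started
        return all(j.started for j in jobs)
    if dep_type == "afterany":     # all listed jobs have terminated, any exit code
        return all(j.exit_code is not None for j in jobs)
    if dep_type == "afterok":      # all listed jobs terminated with exit code 0
        return all(j.exit_code == 0 for j in jobs)
    if dep_type == "afternotok":   # all listed jobs terminated with a non-zero code
        return all(j.exit_code not in (None, 0) for j in jobs)
    raise ValueError(f"unknown dependency type: {dep_type}")


running = JobStatus(started=True, exit_code=None)
print(satisfied("after", [running]), satisfied("afterok", [running]))
# prints: True False
}}}
In this model, a job that depends on '''afterok''' of a job that exited with a non-zero code can never start, which is why a chain of '''afterok''' jobs stops at the first failure.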


== Submitting Dependent Jobs ==
Change into the '''JobDependencies''' directory under '''workshop''':
{{{
[fuji@cypress1 ~]$ cd workshop/
[fuji@cypress1 workshop]$ cd JobDependencies
[fuji@cypress1 JobDependencies]$ ls
addOne.py  number.dat  script.sh  slurmscript  SubmitDependentJobs.sh
}}}

The Python script '''addOne.py''' prints a greeting, the current time and the host name. It then reads an integer from '''number.dat''', prints it, and writes the number plus one back to '''number.dat''':
{{{
[fuji@cypress1 JobDependencies]$ cat addOne.py
# HELLO PYTHON
import datetime
import socket

now = datetime.datetime.now()
print 'Hello, world!'
print now.isoformat()
print socket.gethostname()
#
with open('number.dat','r') as f:
        data = f.readline()
        number = int(data)
#
print "Number = %d" % number
with open('number.dat','w') as f:
        f.write(str(number + 1))
#
}}}
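The listing above uses Python 2 '''print''' statements, which are syntax errors under Python 3. If the '''anaconda''' module on your system provides Python 3 (an assumption about your environment; check with {{{python --version}}}), the following sketch behaves the same way. The function names '''add_one''' and '''main''' are illustrative, not taken from the workshop files:
{{{#!python
# Python 3 sketch of addOne.py (function names are illustrative, not the workshop's)
import datetime
import socket


def add_one(path="number.dat"):
    """Read an integer from path, write back that integer plus one, return the value read."""
    with open(path, "r") as f:
        number = int(f.readline())
    with open(path, "w") as f:
        f.write(str(number + 1))
    return number


def main(path="number.dat"):
    """Print the same four lines as the original script."""
    print("Hello, world!")
    print(datetime.datetime.now().isoformat())
    print(socket.gethostname())
    print("Number = %d" % add_one(path))
}}}
To use it as '''addOne.py''', add the usual {{{if __name__ == "__main__": main()}}} at the end so that {{{python addOne.py}}} in '''slurmscript''' calls '''main()'''.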


'''slurmscript''' simply runs the code and then sleeps for 10 seconds:
{{{
[fuji@cypress1 JobDependencies]$ cat slurmscript
#!/bin/bash
#SBATCH --qos=workshop            # Quality of Service
#SBATCH --partition=workshop      # partition
#SBATCH --job-name=python       # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load anaconda
python addOne.py

sleep 10
}}}

Submit one job, then submit another job that depends on the first one, using the job ID reported by the first '''sbatch''':
{{{
[fuji@cypress1 JobDependencies]$ sbatch slurmscript
Submitted batch job 773997
[fuji@cypress1 JobDependencies]$ sbatch --dependency=afterok:773997 slurmscript
Submitted batch job 773998
}}}
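Copying job IDs by hand is error-prone for longer chains. As a minimal Python sketch (hypothetical helpers, not the contents of the workshop's '''SubmitDependentJobs.sh'''), the submission can be scripted with '''sbatch --parsable''', which prints only the job ID (followed by {{{;cluster}}} on multi-cluster setups) instead of "Submitted batch job N":
{{{#!python
import subprocess
from typing import Callable, List, Optional


def submit(script: str, after_ok: Optional[str] = None) -> str:
    """Submit script with sbatch and return the new job ID."""
    cmd = ["sbatch", "--parsable"]
    if after_ok is not None:
        cmd.append(f"--dependency=afterok:{after_ok}")
    cmd.append(script)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    # --parsable prints "jobid" or "jobid;cluster"
    return result.stdout.strip().split(";")[0]


def submit_chain(script: str, count: int,
                 submit_fn: Callable[..., str] = submit) -> List[str]:
    """Submit count copies of script, each waiting for the previous one (afterok)."""
    job_ids: List[str] = []
    previous: Optional[str] = None
    for _ in range(count):
        previous = submit_fn(script, after_ok=previous)
        job_ids.append(previous)
    return job_ids
}}}
On the cluster, {{{submit_chain("slurmscript", 2)}}} would reproduce the two '''sbatch''' commands above. The '''submit_fn''' parameter exists only so the chaining logic can be tried without Slurm.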

List the jobs. The second job is pending (ST '''PD''') with the reason '''(Dependency)''', while the first job is running (ST '''R'''):
{{{
[fuji@cypress1 JobDependencies]$ squeue -u fuji
     JOBID    QOS               NAME     USER ST       TIME NO NODELIST(REASON)
    773998 worksh             python     fuji PD       0:00  1 (Dependency)
    773997 worksh             python     fuji  R       0:00  1 cypress01-117
}}}
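When many jobs are queued, it can help to pick out the ones held back by a dependency. A minimal Python sketch that parses output laid out like the listing above; it assumes exactly these columns (ST in the fifth position, the reason last) and job names without spaces, and '''waiting_on_dependency''' is a hypothetical name:
{{{#!python
def waiting_on_dependency(squeue_text: str) -> list:
    """Return IDs of pending (PD) jobs whose reason column is '(Dependency)'."""
    job_ids = []
    for line in squeue_text.splitlines()[1:]:     # skip the header row
        fields = line.split()
        # assumed columns: JOBID QOS NAME USER ST TIME NODES NODELIST(REASON)
        if len(fields) >= 8 and fields[4] == "PD" and fields[-1] == "(Dependency)":
            job_ids.append(fields[0])
    return job_ids


sample = """\
     JOBID    QOS               NAME     USER ST       TIME NO NODELIST(REASON)
    773998 worksh             python     fuji PD       0:00  1 (Dependency)
    773997 worksh             python     fuji  R       0:00  1 cypress01-117"""
print(waiting_on_dependency(sample))  # prints: ['773998']
}}}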

After the first job has completed, the second job starts running:
{{{
[fuji@cypress1 JobDependencies]$ squeue -u fuji
     JOBID    QOS               NAME     USER ST       TIME NO NODELIST(REASON)
    773998 worksh             python     fuji  R       0:05  1 cypress01-117
}}}

The results are below. The first job (773997) read 2 from '''number.dat''' and wrote back 3. The second job (773998) ran about 10 seconds later, once the first job (including its {{{sleep 10}}}) had finished, so it read 3:
{{{
[fuji@cypress1 JobDependencies]$ ls
addOne.py  number.dat  script.sh  slurm-773997.out  slurm-773998.out  slurmscript  SubmitDependentJobs.sh
[fuji@cypress1 JobDependencies]$ cat slurm-773997.out
Hello, world!
2018-08-22T14:55:37.421310
cypress01-117
Number = 2
[fuji@cypress1 JobDependencies]$ cat slurm-773998.out
Hello, world!
2018-08-22T14:55:47.619183
cypress01-117
Number = 3
}}}