
Version 1 (modified by fuji, 6 years ago)

Job Dependency

If you have not already done so, download the sample files:

svn co file:///home/fuji/repos/workshop ./workshop

To check out the sample files onto a local machine (Linux shell):

svn co svn+ssh://USERID@cypress1.tulane.edu/home/fuji/repos/workshop ./workshop


Job dependencies defer the start of a job until the specified dependencies have been satisfied. They are specified with the --dependency option of the sbatch command.

sbatch --dependency=<type:job_id[:job_id][,type:job_id[:job_id]]> ...

Dependency types:

  • after:jobid[:jobid…] job can begin after the specified jobs have started
  • afterany:jobid[:jobid…] job can begin after the specified jobs have terminated
  • afternotok:jobid[:jobid…] job can begin after the specified jobs have failed
  • afterok:jobid[:jobid…] job can begin after the specified jobs have run to completion with an exit code of zero (see the user guide for caveats).
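
The option value follows a regular pattern, so it can be assembled programmatically. The following is a minimal Python sketch, not part of the workshop samples: the helper name build_dependency is hypothetical, it only knows the four dependency types listed above, and job id 773999 is made up for illustration.

```python
def build_dependency(spec):
    """Build the value for sbatch --dependency from (type, job_ids) pairs."""
    # Only the four types listed on this page are accepted in this sketch.
    allowed = {"after", "afterany", "afternotok", "afterok"}
    parts = []
    for dep_type, job_ids in spec:
        if dep_type not in allowed:
            raise ValueError("unknown dependency type: %s" % dep_type)
        if not job_ids:
            raise ValueError("no job ids given for %s" % dep_type)
        # type:job_id[:job_id...]
        parts.append(":".join([dep_type] + [str(j) for j in job_ids]))
    # Several type groups are joined with commas.
    return ",".join(parts)


print(build_dependency([("afterok", [773997])]))
print(build_dependency([("afterok", [773997, 773998]), ("afternotok", [773999])]))
```

The first call prints afterok:773997, which is the value used with sbatch later on this page.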

Submitting Dependent Jobs

Change into the JobDependencies directory under workshop:

[fuji@cypress1 ~]$ cd workshop/
[fuji@cypress1 workshop]$ cd JobDependencies
[fuji@cypress1 JobDependencies]$ ls
addOne.py  number.dat  script.sh  slurmscript  SubmitDependentJobs.sh

The Python script addOne.py reads an integer from number.dat, adds one to it, and writes the result back to number.dat.

[fuji@cypress1 JobDependencies]$ cat addOne.py
# HELLO PYTHON
import datetime
import socket

now = datetime.datetime.now()
print 'Hello, world!'
print now.isoformat()
print socket.gethostname()
#
with open('number.dat','r') as f:
	data = f.readline()
	number = int(data)
#
print "Number = %d" % number
with open('number.dat','w') as f:
	f.write(str(number + 1))
#
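
The listing above uses Python 2 print statements. For readers whose Python module provides Python 3, the read-increment-write step could be sketched as below. This is an illustrative rewrite, not the workshop file: the function name add_one is hypothetical, the greeting lines are omitted, and a temporary file stands in for number.dat so the sketch can be run anywhere.

```python
import os
import tempfile


def add_one(path):
    """Read an integer from path, write back value + 1, return the value read."""
    with open(path, "r") as f:
        number = int(f.readline())
    with open(path, "w") as f:
        f.write(str(number + 1))
    return number


# Stand-in for number.dat (assumption: the file initially holds a single integer).
demo_path = os.path.join(tempfile.mkdtemp(), "number.dat")
with open(demo_path, "w") as f:
    f.write("2")

# Two consecutive runs, as two dependent jobs would perform them.
print("Number = %d" % add_one(demo_path))
print("Number = %d" % add_one(demo_path))
```

Because each run reads what the previous run wrote, the two runs must not overlap, which is what the job dependency guarantees.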

slurmscript simply runs the code:

[fuji@cypress1 JobDependencies]$ cat slurmscript
#!/bin/bash
#SBATCH --qos=workshop            # Quality of Service
#SBATCH --partition=workshop      # partition
#SBATCH --job-name=python       # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load anaconda
python addOne.py

sleep 10

Submit one job, then submit another job that depends on the first:

[fuji@cypress1 JobDependencies]$ sbatch slurmscript
Submitted batch job 773997
[fuji@cypress1 JobDependencies]$ sbatch --dependency=afterok:773997 slurmscript
Submitted batch job 773998
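
Copying job ids by hand gets tedious for longer chains. The following Python sketch shows one way to automate it, assuming sbatch prints a line of the form "Submitted batch job N" as in the transcript above. The function names parse_job_id, run_sbatch, and submit_chain are hypothetical, and this is not necessarily how the SubmitDependentJobs.sh sample does it.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the job id from a line like 'Submitted batch job 773997'."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("unexpected sbatch output: %r" % sbatch_output)
    return int(match.group(1))


def run_sbatch(args):
    """Run sbatch with the given arguments and return its standard output."""
    return subprocess.check_output(["sbatch"] + args, universal_newlines=True)


def submit_chain(script, count, run=run_sbatch):
    """Submit count copies of script, each afterok-dependent on the previous one."""
    job_ids = []
    for _ in range(count):
        args = [script]
        if job_ids:
            # Every job after the first waits for its predecessor to succeed.
            args = ["--dependency=afterok:%d" % job_ids[-1], script]
        job_ids.append(parse_job_id(run(args)))
    return job_ids


# Parsing check that needs no cluster access:
print(parse_job_id("Submitted batch job 773997"))
# On a login node, a chain of three jobs would be submitted with:
#     print(submit_chain("slurmscript", 3))
```

The run parameter exists only so the chaining logic can be exercised with a stand-in function on a machine without Slurm.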

List the jobs:

[fuji@cypress1 JobDependencies]$ squeue -u fuji
     JOBID    QOS               NAME     USER ST       TIME NO NODELIST(REASON)
    773998 worksh             python     fuji PD       0:00  1 (Dependency)
    773997 worksh             python     fuji  R       0:00  1 cypress01-117
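
To pick out the jobs that are still waiting on a dependency, the listing can be parsed as whitespace-separated columns. The Python sketch below is a rough illustration only: it assumes the column layout shown above and that no field (for example a job name) contains spaces. The function name parse_squeue is hypothetical.

```python
def parse_squeue(text):
    """Turn squeue's tabular output into a list of dicts keyed by column header."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


# Sample text copied from the listing above.
sample = """\
     JOBID    QOS               NAME     USER ST       TIME NO NODELIST(REASON)
    773998 worksh             python     fuji PD       0:00  1 (Dependency)
    773997 worksh             python     fuji  R       0:00  1 cypress01-117
"""

jobs = parse_squeue(sample)
# Pending (PD) jobs whose reason column reads (Dependency).
waiting = [j["JOBID"] for j in jobs
           if j["ST"] == "PD" and j["NODELIST(REASON)"] == "(Dependency)"]
print(waiting)
```

Here only job 773998 is reported as waiting, since 773997 is already running.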

After the first job completes, the second job begins to run:

[fuji@cypress1 JobDependencies]$ squeue -u fuji
     JOBID    QOS               NAME     USER ST       TIME NO NODELIST(REASON)
    773998 worksh             python     fuji  R       0:05  1 cypress01-117

The results are shown below. The first job read 2 from number.dat and wrote back 3, which the second job then read:

[fuji@cypress1 JobDependencies]$ ls
addOne.py  number.dat  script.sh  slurm-773997.out  slurm-773998.out  slurmscript  SubmitDependentJobs.sh
[fuji@cypress1 JobDependencies]$ cat slurm-773997.out
Hello, world!
2018-08-22T14:55:37.421310
cypress01-117
Number = 2
[fuji@cypress1 JobDependencies]$ cat slurm-773998.out
Hello, world!
2018-08-22T14:55:47.619183
cypress01-117
Number = 3