= Job Dependency =

If you have not done so yet, download the sample files with:

{{{
svn co file:///home/fuji/repos/workshop ./workshop
}}}

To check out the sample files onto a local machine (Linux shell):

{{{
svn co svn+ssh://USERID@cypress1.tulane.edu/home/fuji/repos/workshop ./workshop
}}}

----

Job dependencies defer the start of a job until the specified dependencies have been satisfied. They are specified with the '''--dependency''' option of the '''sbatch''' command.

{{{
sbatch --dependency=<type:jobid[:jobid...]> ...
}}}

Dependency types:
 * '''after:jobid[:jobid...]''' the job can begin after the specified jobs have started
 * '''afterany:jobid[:jobid...]''' the job can begin after the specified jobs have terminated, whether they succeeded or failed
 * '''afternotok:jobid[:jobid...]''' the job can begin after the specified jobs have failed
 * '''afterok:jobid[:jobid...]''' the job can begin after the specified jobs have run to completion with an exit code of zero (see the user guide for caveats)

== Submitting Dependent Jobs ==

Change into the '''JobDependencies''' directory under '''workshop''':

{{{
[fuji@cypress1 ~]$ cd workshop/
[fuji@cypress1 workshop]$ cd JobDependencies
[fuji@cypress1 JobDependencies]$ ls
addOne.py  number.dat  script.sh  slurmscript  SubmitDependentJobs.sh
}}}

The Python code '''addOne.py''' reads an integer from '''number.dat''', prints it, adds one, and stores the result back in '''number.dat'''.

{{{
[fuji@cypress1 JobDependencies]$ cat addOne.py
# HELLO PYTHON
import datetime
import socket

now = datetime.datetime.now()
print 'Hello, world!'
print now.isoformat()
print socket.gethostname()
#
with open('number.dat','r') as f:
    data = f.readline()
    number = int(data)
#
print "Number = %d" % number
with open('number.dat','w') as f:
    f.write(str(number + 1))
#
}}}

'''slurmscript''' simply runs the code and then sleeps for 10 seconds:

{{{
[fuji@cypress1 JobDependencies]$ cat slurmscript
#!/bin/bash
#SBATCH --qos=workshop          # Quality of Service
#SBATCH --partition=workshop    # partition
#SBATCH --job-name=python       # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load anaconda
python addOne.py
sleep 10
}}}

Submit one job, then submit another job that depends on the first one:

{{{
[fuji@cypress1 JobDependencies]$ sbatch slurmscript
Submitted batch job 773997
[fuji@cypress1 JobDependencies]$ sbatch --dependency=afterok:773997 slurmscript
Submitted batch job 773998
}}}

List the jobs. The second job is pending (PD) with the reason '''(Dependency)''':

{{{
[fuji@cypress1 JobDependencies]$ squeue -u fuji
   JOBID    QOS    NAME  USER  ST  TIME  NO  NODELIST(REASON)
  773998 worksh  python  fuji  PD  0:00   1  (Dependency)
  773997 worksh  python  fuji   R  0:00   1  cypress01-117
}}}

After the first job completes, the second job begins to run:

{{{
[fuji@cypress1 JobDependencies]$ squeue -u fuji
   JOBID    QOS    NAME  USER  ST  TIME  NO  NODELIST(REASON)
  773998 worksh  python  fuji   R  0:05   1  cypress01-117
}}}

The results are:

{{{
[fuji@cypress1 JobDependencies]$ ls
addOne.py  number.dat  script.sh  slurm-773997.out  slurm-773998.out  slurmscript  SubmitDependentJobs.sh
[fuji@cypress1 JobDependencies]$ cat slurm-773997.out
Hello, world!
2018-08-22T14:55:37.421310
cypress01-117
Number = 2
[fuji@cypress1 JobDependencies]$ cat slurm-773998.out
Hello, world!
2018-08-22T14:55:47.619183
cypress01-117
Number = 3
}}}

The second job read the value written by the first job (3 instead of 2), and its timestamp is about 10 seconds later, which shows that it started only after the first job had finished.
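
Copying job IDs by hand becomes tedious when more than two jobs are chained. The following is a minimal sketch of how the two manual steps above (read the job ID from the '''sbatch''' output, then pass it to '''--dependency''') could be automated. The helper names `parse_job_id`, `dependent_command`, and `submit_chain` are hypothetical and are not part of the workshop samples; the sketch assumes that '''sbatch''' prints a line of the form `Submitted batch job <id>`, as in the session above.

```python
import re
import subprocess

# Dependency types described on this page.
DEPENDENCY_TYPES = ("after", "afterany", "afternotok", "afterok")


def parse_job_id(sbatch_output):
    """Extract the job ID from a line like 'Submitted batch job 773997'."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("no job ID found in: %r" % sbatch_output)
    return int(match.group(1))


def dependent_command(script, job_ids, dep_type="afterok"):
    """Build the sbatch argument list for a job that depends on job_ids."""
    if dep_type not in DEPENDENCY_TYPES:
        raise ValueError("unknown dependency type: %s" % dep_type)
    if not job_ids:
        raise ValueError("at least one job ID is required")
    id_list = ":".join(str(job_id) for job_id in job_ids)
    return ["sbatch", "--dependency=%s:%s" % (dep_type, id_list), script]


def submit_chain(script, count, dep_type="afterok"):
    """Submit `count` jobs, each depending on the previous one.

    This function calls sbatch, so it only works on a machine with Slurm.
    """
    job_ids = []
    for _ in range(count):
        if job_ids:
            # Depend only on the most recently submitted job.
            command = dependent_command(script, job_ids[-1:], dep_type)
        else:
            command = ["sbatch", script]
        output = subprocess.check_output(command, universal_newlines=True)
        job_ids.append(parse_job_id(output))
    return job_ids


if __name__ == "__main__":
    # Offline demonstration using the job ID from the session above.
    first = parse_job_id("Submitted batch job 773997")
    print(" ".join(dependent_command("slurmscript", [first])))
```

Run as a script, the demonstration only builds and prints the command line for the second job without submitting anything. On a login node, `submit_chain("slurmscript", 3)` would submit three jobs that run one after another, under the assumptions stated above.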