= Job Dependency =

If you have not done so already, download the samples with:

{{{
git clone https://hidekiCCS:@bitbucket.org/hidekiCCS/hpc-workshop.git
}}}

----

Job dependencies defer the start of a job until the specified dependencies have been satisfied. They are specified with the '''--dependency''' option of the '''sbatch''' command.

{{{
sbatch --dependency=<type:jobid[:jobid...]> ...
}}}

Dependency types:

 * '''after:jobid[:jobid...]''' the job can begin after the specified jobs have started
 * '''afterany:jobid[:jobid...]''' the job can begin after the specified jobs have terminated
 * '''afternotok:jobid[:jobid...]''' the job can begin after the specified jobs have failed
 * '''afterok:jobid[:jobid...]''' the job can begin after the specified jobs have run to completion with an exit code of zero (see the user guide for caveats).

== Submitting Dependent Jobs ==

Change into the '''!JobDependencies''' directory under '''workshop''':

{{{
[fuji@cypress1 ~]$ cd workshop/
[fuji@cypress1 workshop]$ cd JobDependencies
[fuji@cypress1 JobDependencies]$ ls
addOne.py  number.dat  script.sh  slurmscript  SubmitDependentJobs.sh
}}}

The Python script '''addOne.py''' reads an integer from '''number.dat''', adds one, and writes the result back to '''number.dat'''.

{{{
[fuji@cypress1 JobDependencies]$ cat addOne.py
# HELLO PYTHON
import datetime
import socket
now = datetime.datetime.now()
print 'Hello, world!'
print now.isoformat()
print socket.gethostname()
#
with open('number.dat','r') as f:
    data = f.readline()
    number = int(data)
#
print "Number = %d" % number
with open('number.dat','w') as f:
    f.write(str(number + 1))
#
}}}

'''slurmscript''' simply runs the code:

{{{
[fuji@cypress1 JobDependencies]$ cat slurmscript
#!/bin/bash
#SBATCH --qos=workshop          # Quality of Service
#SBATCH --partition=workshop    # partition
#SBATCH --job-name=python       # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load anaconda
python addOne.py
sleep 10
}}}

Submit one job, then submit another job that depends on the first:

{{{
[fuji@cypress1 JobDependencies]$ sbatch slurmscript
Submitted batch job 773997
[fuji@cypress1 JobDependencies]$ sbatch --dependency=afterok:773997 slurmscript
Submitted batch job 773998
}}}

List the jobs:

{{{
[fuji@cypress1 JobDependencies]$ squeue -u fuji
  JOBID    QOS    NAME  USER ST  TIME NO NODELIST(REASON)
 773998 worksh  python  fuji PD  0:00  1 (Dependency)
 773997 worksh  python  fuji  R  0:00  1 cypress01-117
}}}

After the first job completes, the second job begins to run:

{{{
[fuji@cypress1 JobDependencies]$ squeue -u fuji
  JOBID    QOS    NAME  USER ST  TIME NO NODELIST(REASON)
 773998 worksh  python  fuji  R  0:05  1 cypress01-117
}}}

The results are:

{{{
[fuji@cypress1 JobDependencies]$ ls
addOne.py  number.dat  script.sh  slurm-773997.out  slurm-773998.out  slurmscript  SubmitDependentJobs.sh
[fuji@cypress1 JobDependencies]$ cat slurm-773997.out
Hello, world!
2018-08-22T14:55:37.421310
cypress01-117
Number = 2
[fuji@cypress1 JobDependencies]$ cat slurm-773998.out
Hello, world!
2018-08-22T14:55:47.619183
cypress01-117
Number = 3
}}}

== Submitting Many Dependent Jobs with Bash Script ==

Look at '''!SubmitDependentJobs.sh''':

{{{
[fuji@cypress1 JobDependencies]$ cat SubmitDependentJobs.sh
#!/bin/bash
EMAIL=$USER@tulane.edu
WALLTIME_LIMIT=1:00:00
export WORKDIR=`pwd`
#
QUEUE='--partition=workshop --qos=workshop'
WALLTIME="--time=$WALLTIME_LIMIT"
RESORCE="--nodes=1 --ntasks-per-node=1 --cpus-per-task=1"
OTHERS="--export=ALL --mail-type=END --mail-user=$EMAIL"
#
JOB_SETTING="$QUEUE $WALLTIME $RESORCE $OTHERS"
DEPENDENCY=""
while [[ $# > 0 ]]
do
    JOB=`sbatch --job-name=$1 $DEPENDENCY $JOB_SETTING ./$1 | awk '{print $4}'`;
    echo $JOB submitted;
    DEPENDENCY="--dependency=afterok:$JOB" ;
    shift
done
}}}

This bash script takes script names as command-line arguments and submits them as a sequence of dependent jobs. Each job is submitted with '''afterok''' on the job before it, so it starts only after the previous job has finished with an exit code of zero.

The bash script '''script.sh''' just runs '''addOne.py''':

{{{
[fuji@cypress1 JobDependencies]$ cat script.sh
#!/bin/bash
module load anaconda
python addOne.py
sleep 1
}}}

Let's submit '''script.sh''' ten times:

{{{
[fuji@cypress1 JobDependencies]$ ./SubmitDependentJobs.sh script.sh script.sh script.sh script.sh script.sh script.sh script.sh script.sh script.sh script.sh
774001 submitted
774002 submitted
774003 submitted
774004 submitted
774005 submitted
774006 submitted
774007 submitted
774008 submitted
774009 submitted
774010 submitted
}}}

List the jobs:

{{{
[fuji@cypress1 JobDependencies]$ squeue -u fuji
  JOBID    QOS       NAME  USER ST  TIME NO NODELIST(REASON)
 774005 worksh  script.sh  fuji PD  0:00  1 (Dependency)
 774006 worksh  script.sh  fuji PD  0:00  1 (Dependency)
 774007 worksh  script.sh  fuji PD  0:00  1 (Dependency)
 774008 worksh  script.sh  fuji PD  0:00  1 (Dependency)
 774009 worksh  script.sh  fuji PD  0:00  1 (Dependency)
 774010 worksh  script.sh  fuji PD  0:00  1 (Dependency)
 774004 worksh  script.sh  fuji  R  0:01  1 cypress01-117
}}}
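
The loop in '''!SubmitDependentJobs.sh''' recovers each job ID by taking the fourth word of the "Submitted batch job ..." message. As a minimal sketch of the same chaining idea, the variant below asks '''sbatch''' for the ID directly with its '''--parsable''' option, which prints the job ID (optionally followed by a semicolon and the cluster name). The names `submit_chain` and `SBATCH_CMD` are hypothetical and are not part of the workshop samples; the queue, walltime, and mail settings of the original script are left out for brevity and would need to be added for real use.

```shell
#!/bin/bash
# Sketch only: submit every script given as an argument as a chain of
# afterok-dependent jobs.
# SBATCH_CMD is a hypothetical override hook (an assumption, not in the
# workshop samples) so the loop can be tried without a scheduler.
SBATCH_CMD=${SBATCH_CMD:-sbatch}

submit_chain() {
    local dependency out jobid script
    dependency=""
    for script in "$@"; do
        # $dependency is left unquoted on purpose: it is empty for the
        # first job and a single word for every later job.
        out=$("$SBATCH_CMD" --parsable --job-name="$script" $dependency "./$script") || return 1
        # --parsable prints "jobid" or "jobid;cluster"; keep the ID only.
        jobid=${out%%;*}
        echo "$jobid submitted"
        # The next job may start only if this one exits with code zero.
        dependency="--dependency=afterok:$jobid"
    done
}

submit_chain "$@"
```

Saved under a name of your choice, it would be invoked the same way as '''!SubmitDependentJobs.sh''', with one script name per job in the chain. Stopping at the first failed submission (the `return 1`) avoids queuing later jobs with a dependency on an empty job ID.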