= Job Dependency =
If you have not done so yet, download the sample files with:

{{{svn co file:///home/fuji/repos/workshop ./workshop}}}

To check out the sample files onto a local machine (Linux shell), replacing USERID with your user ID:

{{{svn co svn+ssh://USERID@cypress1.tulane.edu/home/fuji/repos/workshop ./workshop}}}


----

Job dependencies are used to defer the start of a job until the specified dependencies have been satisfied. They are specified with the '''--dependency''' option of the '''sbatch''' command:
{{{
sbatch --dependency=<type:job_id[:job_id][,type:job_id[:job_id]]> ...
}}}
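
The option value above is a comma-separated list of `type:job_id[:job_id]` groups. As a minimal sketch of how such a string could be assembled, the helper names `build_dependency` and `sbatch_command` below are hypothetical and are not part of Slurm or of the workshop files:

```python
def build_dependency(groups):
    """Build the --dependency value from a list of (type, [job_ids]) pairs."""
    parts = []
    for dep_type, job_ids in groups:
        if not job_ids:
            raise ValueError("dependency type %r needs at least one job id" % dep_type)
        # One group looks like type:job_id[:job_id]
        parts.append(":".join([dep_type] + [str(j) for j in job_ids]))
    # Groups are joined with commas
    return ",".join(parts)


def sbatch_command(script, groups=None):
    """Return the sbatch argument list, adding --dependency when groups are given."""
    cmd = ["sbatch"]
    if groups:
        cmd.append("--dependency=" + build_dependency(groups))
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # Only prints the command line; nothing is submitted.
    print(" ".join(sbatch_command("slurmscript", [("afterok", [773997, 773998])])))
```

Building the argument list separately from running it makes it easy to inspect the command before anything is submitted.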

Dependency types:
 * '''after:jobid[:jobid...]''' the job can begin after the specified jobs have started
 * '''afterany:jobid[:jobid...]''' the job can begin after the specified jobs have terminated (successfully or not)
 * '''afternotok:jobid[:jobid...]''' the job can begin after the specified jobs have failed (for example, with a non-zero exit code or a timeout)
 * '''afterok:jobid[:jobid...]''' the job can begin after the specified jobs have run to completion with an exit code of zero (see the user guide for caveats)
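
The four types differ only in which state of the earlier job releases the waiting job. The sketch below restates the list as a small truth table. It is a simplified model with four made-up state names (assumptions for illustration, not Slurm's full state list), where `COMPLETED` stands for an exit code of zero and `FAILED` for any failure:

```python
# Simplified job states, used only in this sketch.
PENDING, RUNNING, COMPLETED, FAILED = "PENDING", "RUNNING", "COMPLETED", "FAILED"


def dependency_satisfied(dep_type, state):
    """Return True if a job in `state` satisfies a dependency of `dep_type`."""
    started = state in (RUNNING, COMPLETED, FAILED)
    terminated = state in (COMPLETED, FAILED)
    if dep_type == "after":
        return started
    if dep_type == "afterany":
        return terminated
    if dep_type == "afternotok":
        return state == FAILED
    if dep_type == "afterok":
        return state == COMPLETED
    raise ValueError("unknown dependency type: %r" % dep_type)


if __name__ == "__main__":
    for dep_type in ("after", "afterany", "afternotok", "afterok"):
        row = [str(dependency_satisfied(dep_type, s))
               for s in (PENDING, RUNNING, COMPLETED, FAILED)]
        print("%-10s %s" % (dep_type, " ".join(row)))
```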


== Submitting Dependent Jobs ==
Change into the '''JobDependencies''' directory under '''workshop''':
{{{
[fuji@cypress1 ~]$ cd workshop/
[fuji@cypress1 workshop]$ cd JobDependencies
[fuji@cypress1 JobDependencies]$ ls
addOne.py  number.dat  script.sh  slurmscript  SubmitDependentJobs.sh
}}}

The Python script '''addOne.py''' (Python 2 syntax) reads an integer from '''number.dat''', adds one, and stores the result back to '''number.dat'''.
{{{
[fuji@cypress1 JobDependencies]$ cat addOne.py
# HELLO PYTHON
import datetime
import socket

now = datetime.datetime.now()
print 'Hello, world!'
print now.isoformat()
print socket.gethostname()
#
with open('number.dat','r') as f:
    data = f.readline()
    number = int(data)
#
print "Number = %d" % number
with open('number.dat','w') as f:
    f.write(str(number + 1))
#
}}}
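
Because each run reads the value left by the previous run, the order of the runs matters, which is what makes this script a useful test for job dependencies. As a hedged sketch, the same read-increment-write step can be written as a Python 3 function. The name `add_one` and the use of a temporary directory are assumptions for illustration, not part of the workshop files:

```python
import os
import tempfile


def add_one(path="number.dat"):
    """Read an integer from `path`, write back the value plus one, return the value read."""
    with open(path, "r") as f:
        number = int(f.readline())
    with open(path, "w") as f:
        f.write(str(number + 1))
    return number


if __name__ == "__main__":
    # Work on a temporary file so a real number.dat is left untouched.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "number.dat")
        with open(path, "w") as f:
            f.write("2")
        print("Number = %d" % add_one(path))  # first run reads 2
        print("Number = %d" % add_one(path))  # second run reads 3
```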


The batch script '''slurmscript''' runs the code and then sleeps for 10 seconds:
{{{
[fuji@cypress1 JobDependencies]$ cat slurmscript
#!/bin/bash
#SBATCH --qos=workshop            # Quality of Service
#SBATCH --partition=workshop      # partition
#SBATCH --job-name=python         # Job Name
#SBATCH --time=00:01:00           # WallTime
#SBATCH --nodes=1                 # Number of Nodes
#SBATCH --ntasks-per-node=1       # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1         # Number of threads per task (OMP threads)

module load anaconda
python addOne.py

sleep 10
}}}

Submit one job, then submit another job that depends on the first:
{{{
[fuji@cypress1 JobDependencies]$ sbatch slurmscript
Submitted batch job 773997
[fuji@cypress1 JobDependencies]$ sbatch --dependency=afterok:773997 slurmscript
Submitted batch job 773998
}}}
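
Copying the job ID by hand becomes tedious for longer chains. The sketch below shows one way the two steps above could be automated: it reads the job ID from the "Submitted batch job" line and passes it to the next submission as an '''afterok''' dependency. The function names are hypothetical, and this is not the content of '''SubmitDependentJobs.sh''', which is not shown on this page:

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the job id from a line like 'Submitted batch job 773997'."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("no job id found in: %r" % sbatch_output)
    return int(match.group(1))


def submit(script, after_ok=None, run=subprocess.check_output):
    """Submit `script`, optionally after job `after_ok` has finished with exit code zero."""
    cmd = ["sbatch"]
    if after_ok is not None:
        cmd.append("--dependency=afterok:%d" % after_ok)
    cmd.append(script)
    output = run(cmd)
    if isinstance(output, bytes):
        output = output.decode()
    return parse_job_id(output)


def submit_chain(script, count, run=subprocess.check_output):
    """Submit `count` copies of `script`, each depending on the one before it."""
    job_ids = []
    previous = None
    for _ in range(count):
        previous = submit(script, after_ok=previous, run=run)
        job_ids.append(previous)
    return job_ids

# On a machine where sbatch is available, a chain of three jobs would be:
#   submit_chain("slurmscript", 3)
```

The `run` parameter exists only so that the functions can be tried without a scheduler by passing a stand-in that returns a fake "Submitted batch job" line.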

List the jobs. The second job stays pending (PD) with the reason (Dependency) while the first one runs:
{{{
[fuji@cypress1 JobDependencies]$ squeue -u fuji
  JOBID    QOS   NAME USER ST  TIME NO NODELIST(REASON)
 773998 worksh python fuji PD  0:00  1 (Dependency)
 773997 worksh python fuji  R  0:00  1 cypress01-117
}}}
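
To pick out the waiting jobs from a listing like this in a script, the text can be split into columns. The sketch below assumes the eight-column layout shown above (the default layout may differ on other systems) and that no field contains spaces. The function name is hypothetical:

```python
def pending_on_dependency(squeue_text):
    """Return the ids of jobs in state PD whose reason is (Dependency)."""
    waiting = []
    for line in squeue_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        job_id, state, reason = fields[0], fields[4], fields[-1]
        if state == "PD" and reason == "(Dependency)":
            waiting.append(int(job_id))
    return waiting


if __name__ == "__main__":
    sample = """\
JOBID QOS NAME USER ST TIME NO NODELIST(REASON)
773998 worksh python fuji PD 0:00 1 (Dependency)
773997 worksh python fuji R 0:00 1 cypress01-117
"""
    print(pending_on_dependency(sample))  # prints [773998]
```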

After the first job completes, the second job begins to run:
{{{
[fuji@cypress1 JobDependencies]$ squeue -u fuji
  JOBID    QOS   NAME USER ST  TIME NO NODELIST(REASON)
 773998 worksh python fuji  R  0:05  1 cypress01-117
}}}

The results show that the second job read the value written by the first:
{{{
[fuji@cypress1 JobDependencies]$ ls
addOne.py  number.dat  script.sh  slurm-773997.out  slurm-773998.out  slurmscript  SubmitDependentJobs.sh
[fuji@cypress1 JobDependencies]$ cat slurm-773997.out
Hello, world!
2018-08-22T14:55:37.421310
cypress01-117
Number = 2
[fuji@cypress1 JobDependencies]$ cat slurm-773998.out
Hello, world!
2018-08-22T14:55:47.619183
cypress01-117
Number = 3
}}}