| | 1 | = Job Dependency = |
| | 2 | If you haven't done so yet, download the sample files with: |
| | 3 | |
| | 4 | {{{svn co file:///home/fuji/repos/workshop ./workshop}}} |
| | 5 | |
| | 6 | To check out the sample files onto your local machine (Linux shell): |
| | 7 | |
| | 8 | {{{svn co svn+ssh://USERID@cypress1.tulane.edu/home/fuji/repos/workshop ./workshop}}} |
| | 9 | |
| | 10 | |
| | 11 | ---- |
| | 12 | |
| | 13 | Job dependencies defer the start of a job until the specified dependencies have been satisfied. They are specified with the '''--dependency''' option of the '''sbatch''' command. |
| | 14 | {{{ |
| | 15 | sbatch --dependency=<type:job_id[:job_id][,type:job_id[:job_id]]> ... |
| | 16 | }}} |
| | 17 | |
| | 18 | Dependency types: |
| | 19 | * '''after:jobid[:jobid...]''' job can begin after the specified jobs have started |
| | 20 | * '''afterany:jobid[:jobid...]''' job can begin after the specified jobs have terminated |
| | 21 | * '''afternotok:jobid[:jobid...]''' job can begin after the specified jobs have terminated in a failed state (for example, with a non-zero exit code) |
| | 22 | * '''afterok:jobid[:jobid...]''' job can begin after the specified jobs have run to completion with an exit code of zero (see the user guide for caveats). |
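
The option syntax above can be assembled programmatically. The following is a minimal sketch in Python 3; the helper name `build_dependency` and the job ID 773999 are hypothetical and chosen only for illustration. It joins job IDs with colons and separate dependency types with commas, following the syntax shown above.

```python
def build_dependency(spec):
    """Build the value for sbatch's --dependency option.

    `spec` is a list of (type, [job_id, ...]) pairs,
    e.g. [("afterok", [773997])].
    """
    parts = []
    for dep_type, job_ids in spec:
        if not job_ids:
            raise ValueError("dependency type %r needs at least one job ID" % dep_type)
        # type:job_id[:job_id...]
        parts.append(":".join([dep_type] + [str(j) for j in job_ids]))
    # Separate dependency types are comma-separated.
    return ",".join(parts)


print(build_dependency([("afterok", [773997])]))
print(build_dependency([("afterok", [773997, 773998]), ("afternotok", [773999])]))
```

The returned string would be passed to the command as `sbatch --dependency=<string> ...`.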
| | 23 | |
| | 24 | |
| | 25 | == Submitting Dependent Jobs == |
| | 26 | Change into the '''JobDependencies''' directory under '''workshop''': |
| | 27 | {{{ |
| | 28 | [fuji@cypress1 ~]$ cd workshop/ |
| | 29 | [fuji@cypress1 workshop]$ cd JobDependencies |
| | 30 | [fuji@cypress1 JobDependencies]$ ls |
| | 31 | addOne.py number.dat script.sh slurmscript SubmitDependentJobs.sh |
| | 32 | }}} |
| | 33 | |
| | 34 | The Python script '''addOne.py''' reads an integer from '''number.dat''', adds one to it, and writes the result back to '''number.dat'''. |
| | 35 | {{{ |
| | 36 | [fuji@cypress1 JobDependencies]$ cat addOne.py |
| | 37 | # HELLO PYTHON |
| | 38 | import datetime |
| | 39 | import socket |
| | 40 | |
| | 41 | now = datetime.datetime.now() |
| | 42 | print 'Hello, world!' |
| | 43 | print now.isoformat() |
| | 44 | print socket.gethostname() |
| | 45 | # |
| | 46 | with open('number.dat','r') as f: |
| | 47 |     data = f.readline() |
| | 48 |     number = int(data) |
| | 49 | # |
| | 50 | print "Number = %d" % number |
| | 51 | with open('number.dat','w') as f: |
| | 52 |     f.write(str(number + 1)) |
| | 53 | # |
| | 54 | }}} |
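
The listing above uses Python 2 `print` statements. For readers on Python 3, the read, increment, and write-back step could be sketched as below. The function name `add_one` is hypothetical, and the demonstration runs on a temporary file rather than the real '''number.dat'''.

```python
import os
import tempfile


def add_one(path):
    """Read an integer from `path`, write back the value plus one,
    and return the value that was read."""
    with open(path, "r") as f:
        number = int(f.readline())
    with open(path, "w") as f:
        f.write(str(number + 1))
    return number


# Demonstrate on a temporary file (an assumption for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "number.dat")
    with open(demo, "w") as f:
        f.write("2")
    print("Number = %d" % add_one(demo))  # value read by the first run
    print("Number = %d" % add_one(demo))  # value read by the second run
```

Because each run reads the value left by the previous one, the order in which the runs execute matters; this is what the job dependency enforces.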
| | 55 | |
| | 56 | |
| | 57 | '''slurmscript''' simply runs the code: |
| | 58 | {{{ |
| | 59 | [fuji@cypress1 JobDependencies]$ cat slurmscript |
| | 60 | #!/bin/bash |
| | 61 | #SBATCH --qos=workshop # Quality of Service |
| | 62 | #SBATCH --partition=workshop # partition |
| | 63 | #SBATCH --job-name=python # Job Name |
| | 64 | #SBATCH --time=00:01:00 # WallTime |
| | 65 | #SBATCH --nodes=1 # Number of Nodes |
| | 66 | #SBATCH --ntasks-per-node=1 # Number of tasks (MPI processes) |
| | 67 | #SBATCH --cpus-per-task=1 # Number of threads per task (OMP threads) |
| | 68 | |
| | 69 | module load anaconda |
| | 70 | python addOne.py |
| | 71 | |
| | 72 | sleep 10 |
| | 73 | }}} |
| | 74 | |
| | 75 | Submit one job, then submit another job that depends on the first: |
| | 76 | {{{ |
| | 77 | [fuji@cypress1 JobDependencies]$ sbatch slurmscript |
| | 78 | Submitted batch job 773997 |
| | 79 | [fuji@cypress1 JobDependencies]$ sbatch --dependency=afterok:773997 slurmscript |
| | 80 | Submitted batch job 773998 |
| | 81 | }}} |
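
When the second submission is scripted, the job ID has to be extracted from the first command's output. A minimal Python 3 sketch, assuming the `Submitted batch job <id>` message format shown above; the helper names `parse_job_id` and `dependent_command` are hypothetical.

```python
import re


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch's confirmation message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("no job ID found in: %r" % sbatch_output)
    return int(match.group(1))


def dependent_command(job_id, script, dep_type="afterok"):
    """Return the argument list for a job that depends on `job_id`."""
    return ["sbatch", "--dependency=%s:%d" % (dep_type, job_id), script]


first_id = parse_job_id("Submitted batch job 773997")
print(" ".join(dependent_command(first_id, "slurmscript")))
```

This only builds the command; it does not call '''sbatch''', so it can be tried on a machine without Slurm.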
| | 82 | |
| | 83 | List the jobs. The second job is pending ('''PD''') with the reason '''(Dependency)''' while the first one runs: |
| | 84 | {{{ |
| | 85 | [fuji@cypress1 JobDependencies]$ squeue -u fuji |
| | 86 | JOBID QOS NAME USER ST TIME NO NODELIST(REASON) |
| | 87 | 773998 worksh python fuji PD 0:00 1 (Dependency) |
| | 88 | 773997 worksh python fuji R 0:00 1 cypress01-117 |
| | 89 | }}} |
| | 90 | |
| | 91 | After the first job completes, the second job begins to run: |
| | 92 | {{{ |
| | 93 | [fuji@cypress1 JobDependencies]$ squeue -u fuji |
| | 94 | JOBID QOS NAME USER ST TIME NO NODELIST(REASON) |
| | 95 | 773998 worksh python fuji R 0:05 1 cypress01-117 |
| | 96 | }}} |
| | 97 | |
| | 98 | The results are: |
| | 99 | {{{ |
| | 100 | [fuji@cypress1 JobDependencies]$ ls |
| | 101 | addOne.py number.dat script.sh slurm-773997.out slurm-773998.out slurmscript SubmitDependentJobs.sh |
| | 102 | [fuji@cypress1 JobDependencies]$ cat slurm-773997.out |
| | 103 | Hello, world! |
| | 104 | 2018-08-22T14:55:37.421310 |
| | 105 | cypress01-117 |
| | 106 | Number = 2 |
| | 107 | [fuji@cypress1 JobDependencies]$ cat slurm-773998.out |
| | 108 | Hello, world! |
| | 109 | 2018-08-22T14:55:47.619183 |
| | 110 | cypress01-117 |
| | 111 | Number = 3 |
| | 112 | }}} |
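
The same pattern extends to longer chains, where each job waits for the previous one. The following Python 3 sketch (3.7 or later, for `capture_output`) is one possible way to do this and is not the contents of '''SubmitDependentJobs.sh'''. The names `sbatch_submit` and `submit_chain` are hypothetical. It assumes `sbatch --parsable`, which prints only the job ID (followed by a semicolon and the cluster name, if present).

```python
import subprocess


def sbatch_submit(args):
    """Run sbatch with `args` and return the new job ID (requires Slurm)."""
    result = subprocess.run(
        ["sbatch", "--parsable"] + args,
        check=True, capture_output=True, text=True,
    )
    # --parsable prints "jobid" or "jobid;cluster".
    return int(result.stdout.strip().split(";")[0])


def submit_chain(script, count, submit=sbatch_submit):
    """Submit `count` copies of `script`, each depending (afterok)
    on the job submitted just before it. Returns the job IDs in order."""
    job_ids = []
    for _ in range(count):
        args = [script]
        if job_ids:
            args = ["--dependency=afterok:%d" % job_ids[-1]] + args
        job_ids.append(submit(args))
    return job_ids


# On a login node with Slurm available:
# submit_chain("slurmscript", 3)
```

The `submit` parameter is injectable so the chaining logic can be checked with a stand-in function on a machine without Slurm.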