= Work with SLURM on Cypress =
If you haven't done so yet, download the sample files on Cypress by:

{{{svn co file:///home/fuji/repos/workshop ./workshop}}}

To check out the sample files onto your local machine (Linux shell):

{{{svn co svn+ssh://USERID@cypress1.tulane.edu/home/fuji/repos/workshop ./workshop}}}


== Introduction to Managed Cluster Computing ==
On your desktop, you would open a terminal, compile the code with your favorite C compiler, and execute it. You can do this without worry because you are the only person using your computer, and you know what demands are being made on your CPU and memory when you run your code. On a cluster, many users must share the available resources equitably and simultaneously. It is the job of the resource manager to choreograph this sharing of resources: it accepts a description of your program and the resources it requires, searches the available hardware for resources that meet your requirements, and makes sure that no one else is given those resources while you are using them.

Occasionally the manager will be unable to find the resources you need because they are in use by other users. In those instances your job will be "queued"; that is, the manager will wait until the needed resources become available before running your job. This will also occur if the total resources you request for all your jobs exceed the limits set by the cluster administrator. This ensures that all users have equal access to the cluster.

[[Image(https://docs.google.com/drawings/d/e/2PACX-1vQL7pibkwB5EK2z6d2I9wIu28baQt8Mu3U4FCfwOttWncEwurGa8r-sP2wQxNA1no0j_ik3bVV5s0X8/pub?w=480&h=360)]]

== Serial Job Submission ==
Under the 'workshop' directory,
{{{
[fuji@cypress1 ~]$ cd workshop
[fuji@cypress1 workshop]$ ls
BlasLapack Eigen3 HeatMass JobArray1 JobDependencies MPI PETSc precision Python ScaLapack SimpleExample TestCodes uBLAS
CUDA FlowInCavity hybridTest JobArray2 Matlab OpenMP PI PSE R SerialJob SLU40 TextFiles VTK
}}}

Under the 'SerialJob' directory,
{{{
[fuji@cypress1 workshop]$ cd SerialJob
[fuji@cypress1 SerialJob]$ ls
hello.py slurmscript1 slurmscript2
}}}

When your code runs on a single core only, your job script should request a single core. The Python code 'hello.py' runs on a single core:
{{{
# HELLO PYTHON
import datetime
import socket

now = datetime.datetime.now()
print 'Hello, world!'
print now.isoformat()
print socket.gethostname()
}}}

Since this runs for only a short time, you can try running it on the login node.
{{{
[fuji@cypress1 SerialJob]$ python ./hello.py
Hello, world!
2018-08-22T11:46:05.394952
cypress1
}}}
This code prints a message, the current time, and the host name.

Look at 'slurmscript1':
{{{
[fuji@cypress1 SerialJob]$ more slurmscript1
#!/bin/bash
#SBATCH --qos=workshop # Quality of Service
#SBATCH --partition=workshop # partition
#SBATCH --job-name=python # Job Name
#SBATCH --time=00:01:00 # WallTime
#SBATCH --nodes=1 # Number of Nodes
#SBATCH --ntasks-per-node=1 # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1 # Number of threads per task (OMP threads)

module load anaconda
python hello.py
}}}

Notice that the SLURM script begins with '''#!/bin/bash'''. This tells the Linux shell which interpreter to run the script with. In this example we use Bash (the Bourne Again SHell).
The choice of interpreter (and the corresponding syntax) is up to the user, but every SLURM script should begin this way.

For more on Bash and shell scripting, see
[https://en.wikibooks.org/wiki/Bash_Shell_Scripting]

In a Bash shell script, '''#''' and everything after it on a line is a comment.
So all the '''#SBATCH''' lines in the script above are comments as far as Bash is concerned,
but they are directives for the '''SLURM''' job scheduler.
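
As an illustration (a minimal sketch, not one of the workshop files), the snippet below shows how the same lines are treated by Bash and by SLURM:
{{{
#!/bin/bash
# An ordinary Bash comment: ignored by both Bash and SLURM.
#SBATCH --job-name=example   # Bash also ignores this line, but SLURM reads it as a directive.
echo "Only lines like this one are executed as shell commands."
}}}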

=== qos, partition ===
These two lines determine the quality of service (qos) and the partition.
{{{
#SBATCH --qos=workshop # Quality of Service
#SBATCH --partition=workshop # partition
}}}
The default partition is '''defq'''. In '''defq''', you can choose either '''normal''' or '''long''' for '''qos'''.
||||||||= '''QOS limits''' =||
|| '''QOS name''' || '''maximum job size (node-hours)''' || '''maximum walltime per job''' || '''maximum nodes per user''' ||
|| normal || N/A || 24 hours || 18 ||
|| long || 168 || 168 hours || 8 ||

The differences between '''normal''' and '''long''' are the number of nodes you can request and how long your job can run.
The details will be explained in Parallel Jobs below.

If you are using a workshop account, you can use only the '''workshop''' qos and partition.
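
For example, outside of the workshop a long-running job on the default partition might request the '''long''' qos like this (a sketch; check which qos values your account is allowed to use):
{{{
#SBATCH --qos=long # up to 168 hours of walltime (see the QOS limits table above)
#SBATCH --partition=defq # the default partition
}}}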

=== job-name ===
{{{
#SBATCH --job-name=python # Job Name
}}}
This sets the job name; you can choose any name you like.
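
The job name is simply a label that helps you identify jobs in the queue. For example, assuming the standard SLURM client tools, you can filter the queue by job name:
{{{
[fuji@cypress1 SerialJob]$ squeue --name=python --user=$USER
}}}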

=== time ===
{{{
#SBATCH --time=00:01:00 # WallTime
}}}
The maximum walltime is specified by '''#SBATCH --time=T''', where T has the format hh:mm:ss.
Normally, a job is expected to finish before the specified maximum walltime.
Once the walltime reaches the maximum, the job is terminated regardless of whether its processes are still running.
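
SLURM also accepts a days-hours:minutes:seconds form for longer jobs. A few examples (only one --time line should appear in an actual script):
{{{
#SBATCH --time=00:30:00 # 30 minutes
#SBATCH --time=24:00:00 # 24 hours
#SBATCH --time=2-12:00:00 # 2 days and 12 hours (days-hours:minutes:seconds)
}}}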

=== Resource Request ===
{{{
#SBATCH --nodes=1 # Number of Nodes
#SBATCH --ntasks-per-node=1 # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1 # Number of threads per task (OMP threads)
}}}

The resource request '''#SBATCH --nodes=N''' determines how many compute nodes the scheduler allocates to a job; only 1 node is allocated for this job.

'''#SBATCH --ntasks-per-node=n''' determines the number of tasks (MPI processes) launched on each node. The details will be explained in Parallel Jobs below.

'''#SBATCH --cpus-per-task=c''' determines the number of cores/threads assigned to each task. The details will be explained in Parallel Jobs below.
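
The total number of cores a job requests is the product of these three values. For instance, the following (hypothetical) request asks for 2 x 4 x 5 = 40 cores in total:
{{{
#SBATCH --nodes=2 # 2 nodes
#SBATCH --ntasks-per-node=4 # 4 MPI processes on each node
#SBATCH --cpus-per-task=5 # 5 threads (cores) per MPI process
}}}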

This script requests one core on one node.

[[Image(https://docs.google.com/drawings/d/e/2PACX-1vSlffILDUxxzh_QpD4M7P5-bY_tCkYNjA9xIYWuUUqz_HBBczQ18o5AWA9OZ5_w5Q0bwQJbdgmUCuMJ/pub?w=594&h=209)]]

There are 124 nodes on the Cypress system. Each node has 20 cores.

[[Image(https://docs.google.com/drawings/d/e/2PACX-1vQR7ztCNSIQhIjyW28FyYaQn92XC4Zq_vZzoPwALkywmXoyRl8qC2MEpT1t68zMopZv2yHNt2unMf-i/pub?w=155&h=134)]]
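
Once the script is ready, a serial job like this is typically submitted with '''sbatch''' and monitored with '''squeue''' (a sketch of the usual workflow; note that the '''workshop''' qos and partition in slurmscript1 are only usable with workshop accounts):
{{{
[fuji@cypress1 SerialJob]$ sbatch slurmscript1
[fuji@cypress1 SerialJob]$ squeue --user=$USER
}}}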