
=== MPI Jobs ===

Now let’s look at how to run an MPI-based job across multiple nodes. SLURM does a nice job of interfacing with the mpirun command to minimize the amount of information the user needs to provide. For instance, SLURM will automatically provide a host list and the number of processes based on the script directives provided by the user.

Let’s say that we would like to run an MPI-based executable named myMPIexecutable, and that we wish to run it using a total of 80 MPI processes. Recall that each node of Cypress is equipped with two Intel Xeon 10-core processors. A natural way of breaking up our problem would therefore be to run it on four nodes using 20 processes per node. Here we run into the semantics of SLURM: we would ask SLURM for four nodes and 20 “tasks” per node.
{{{#!bash
#!/bin/bash
#SBATCH --qos=normal
#SBATCH --job-name=MPI_JOB
#SBATCH --time=0-01:00:00
#SBATCH --output=MPIoutput.out
#SBATCH --error=MPIerror.err
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=20

module load intel-psxe/2015-update1

############ THE JOB ITSELF #############################
echo Start Job

echo nodes: $SLURM_JOB_NODELIST
echo job id: $SLURM_JOB_ID
echo Number of tasks: $SLURM_NTASKS

mpirun myMPIexecutable

echo End Job
}}}
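
Assuming the script above is saved under a name of your choosing, say mpi_job.srun (the filename here is only illustrative), it is submitted to the scheduler with sbatch:

{{{#!bash
# Submit the batch script to SLURM; the job ID is printed on success.
sbatch mpi_job.srun

# Check the status of your queued and running jobs.
squeue -u $USER
}}}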

Again, notice that we did not need to feed any of the usual information to mpirun regarding the number of processes, hostfiles, etc., as this is handled automatically by SLURM. Another thing to note is the loading of the intel-psxe (Parallel Studio) module. This loads the Intel instantiation of MPI, including mpirun. If you would like to use OpenMPI, then you should load the openmpi/gcc/64/1.8.2-mlnx-ofed2 module or one of the other OpenMPI versions currently available on Cypress. We also take advantage of a couple of SLURM’s output environment variables to automate our record keeping. Now a record of what nodes we ran on, our job ID, and the number of tasks used will be written to the MPIoutput.out file. While this is certainly not necessary, it often pays dividends when errors arise.
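
For example, if you prefer OpenMPI over Intel MPI, only the module line in the script above needs to change; a minimal sketch of that variation (still using the hypothetical myMPIexecutable) is:

{{{#!bash
# Use OpenMPI instead of Intel MPI; the rest of the script is unchanged.
module load openmpi/gcc/64/1.8.2-mlnx-ofed2

# mpirun again picks up the node list and task count from SLURM,
# so no hostfile or process count is needed here either.
mpirun myMPIexecutable
}}}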

=== OpenMP Jobs ===

When running OpenMP (OMP) jobs on Cypress, it’s necessary to set your environment variables to reflect the resources you’ve requested. Specifically, you must export the variable OMP_NUM_THREADS so that its value matches the number of cores you have requested from SLURM. This can be accomplished through the use of SLURM’s built-in output environment variables.

{{{#!bash
#!/bin/bash
#SBATCH --qos=normal
#SBATCH --job-name=OMP_JOB
#SBATCH --time=1-00:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20

export OMP_NUM_THREADS=$SLURM_JOB_CPUS_PER_NODE

./myOMPexecutable
}}}

In the script above we request 20 cores on one node of Cypress (which is all the cores available on any node). Since SLURM regards tasks as being analogous to MPI processes, it’s better to use the cpus-per-task directive when employing OpenMP parallelism. Additionally, the SLURM output environment variable $SLURM_JOB_CPUS_PER_NODE stores the number of cores allocated on the node, which, with a single task per node as above, matches whatever value we assign to cpus-per-task; it is therefore our candidate for passing to OMP_NUM_THREADS.
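
As an alternative, SLURM also sets $SLURM_CPUS_PER_TASK whenever the cpus-per-task directive is given; a minimal sketch of the same export using that variable, with a fallback to a single thread in case it is unset, might look like:

{{{#!bash
# $SLURM_CPUS_PER_TASK is set only when --cpus-per-task is specified;
# fall back to one thread if it is missing.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo Running with $OMP_NUM_THREADS OpenMP threads

./myOMPexecutable
}}}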