Starting with version 2.14.0, R has offered direct support for parallel computation through the "parallel" package. We will present two examples of running a parallel job in batch mode. They differ in how they communicate the number of cores reserved by SLURM to R. Both are based on code found in [https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf "Getting Started with doParallel and foreach" by Steve Weston and Rich Calaway] and modified by [https://rcc.uchicago.edu/docs/software/environments/R/index.html The University of Chicago Research Computing Center].

In the first example, we use the built-in R function '''Sys.getenv( )''' to read the SLURM environment variable '''SLURM_NTASKS_PER_NODE''' from the operating system.

{{{#!r
# Based on code from the UChicago RCC website

library(doParallel)  # also attaches foreach, iterators and parallel

# Use the environment variable SLURM_NTASKS_PER_NODE to set the number of cores.
# Sys.getenv() returns a character string, so convert it to an integer.
registerDoParallel(cores = as.integer(Sys.getenv("SLURM_NTASKS_PER_NODE")))

# Bootstrapping iteration example: versicolor vs. virginica by sepal length.
# droplevels() removes the unused "setosa" level; without it, glm() treats
# setosa (the first level) as failure and every remaining row as success.
x <- droplevels(iris[which(iris[,5] != "setosa"), c(1,5)])
iterations <- 10000  # Number of iterations to run

# Parallel version of code
# Note the '%dopar%' instruction
part <- system.time({
  r <- foreach(icount(iterations), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]

# Show the number of parallel workers in use
getDoParWorkers()
# Print the elapsed (wall-clock) time of the parallel loop
part
}}}
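
To see what the '''%dopar%''' loop computes, and to get a serial timing to compare against, the same bootstrap can be run sequentially with base R's '''replicate( )'''. This is a minimal sketch rather than part of the UChicago code: the smaller iteration count, the seed, and the '''droplevels( )''' call are our assumptions.

{{{#!r
# Sequential baseline for the bootstrap (sketch; base R only, no doParallel).
# droplevels() removes the unused "setosa" level so that glm() models
# versicolor (first level, coded 0) against virginica (coded 1).
x <- droplevels(iris[which(iris[, 5] != "setosa"), c(1, 5)])
iterations <- 1000  # smaller than the parallel example to keep the demo quick

set.seed(42)  # reproducible resamples
seq_time <- system.time({
  r_seq <- replicate(iterations, {
    ind <- sample(100, 100, replace = TRUE)
    result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
    coefficients(result1)
  })
})[3]

# replicate() stacks the length-2 coefficient vectors as columns,
# giving a 2 x iterations matrix
dim(r_seq)
# Elapsed (wall-clock) seconds for the sequential loop
seq_time
}}}

Because '''replicate( )''' turns each length-2 coefficient vector into one column, '''r_seq''' has the same 2-by-iterations layout that '''.combine=cbind''' builds in the parallel version, which makes the two easy to compare.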

Saved as '''bootstrap.R''', the first script reads the number of tasks per node set in our SLURM script and passes that value to the '''registerDoParallel( )''' function. To use it, we need only set the correct parameters in the SLURM submission script. For example, to use 16 cores on a single node, the submission script would be:

{{{#!bash
#!/bin/bash
#SBATCH --qos=workshop # Quality of Service
#SBATCH --partition=workshop # Partition
#SBATCH --job-name=R # Job Name
#SBATCH --time=00:01:00 # WallTime
#SBATCH --nodes=1 # Number of Nodes
#SBATCH --ntasks-per-node=16 # Number of Tasks per Node

module load R/3.1.2

Rscript bootstrap.R
}}}
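
'''Sys.getenv( )''' returns an empty string when a variable is unset, for example when '''bootstrap.R''' is tested on a login node outside a SLURM job, and '''as.integer("")''' is '''NA'''. A minimal sketch of a more defensive lookup follows; the helper name '''get_slurm_cores''' and the fallback of one core are assumptions for illustration, not part of the original script.

{{{#!r
# Hypothetical helper: read the core count that SLURM exports, falling back
# to a default when the variable is unset or not a positive integer.
get_slurm_cores <- function(var = "SLURM_NTASKS_PER_NODE", default = 1L) {
  value <- Sys.getenv(var, unset = "")          # "" when the variable is unset
  cores <- suppressWarnings(as.integer(value))  # "" or "abc" become NA
  if (is.na(cores) || cores < 1L) {
    return(as.integer(default))
  }
  cores
}

# In bootstrap.R this would replace the direct Sys.getenv() call:
# registerDoParallel(cores = get_slurm_cores())

# Demonstration only: simulate a job where the variable is set, then unset
Sys.setenv(SLURM_NTASKS_PER_NODE = "16")
get_slurm_cores()  # returns 16
Sys.unsetenv("SLURM_NTASKS_PER_NODE")
get_slurm_cores()  # returns 1
}}}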

The disadvantage of this approach is that it is system-specific. If we move our code to a machine that uses PBS-Torque as its resource manager (sphynx, for example), we have to change our source code, because the SLURM environment variables are not set there. An alternative that keeps the code base more portable uses command line arguments to pass the value of the environment variable into the script.

{{{#!r
# Based on code from the UChicago RCC website

library(doParallel)  # also attaches foreach, iterators and parallel

# Read the trailing command line arguments (those after the script name)
args <- commandArgs(TRUE)

# Use the first command line argument to set the number of cores
registerDoParallel(cores = as.integer(args[1]))

# Bootstrapping iteration example: versicolor vs. virginica by sepal length.
# droplevels() removes the unused "setosa" level; without it, glm() treats
# setosa (the first level) as failure and every remaining row as success.
x <- droplevels(iris[which(iris[,5] != "setosa"), c(1,5)])
iterations <- 10000  # Number of iterations to run

# Parallel version of code
# Note the '%dopar%' instruction
part <- system.time({
  r <- foreach(icount(iterations), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]

# Show the number of parallel workers in use
getDoParWorkers()
# Print the elapsed (wall-clock) time of the parallel loop
part
}}}

Note the use of '''args<-commandArgs(TRUE)''' and of '''as.integer(args[1])'''. '''commandArgs(TRUE)''' returns only the arguments that follow the script name, so '''args[1]''' is the first value we pass on the command line, and the number of cores is set to that value. Using the same submission script as before, we need only pass the value of the appropriate SLURM environment variable to the script at runtime:

{{{#!bash
#!/bin/bash
#SBATCH --qos=workshop # Quality of Service
#SBATCH --partition=workshop # Partition
#SBATCH --job-name=R # Job Name
#SBATCH --time=00:01:00 # WallTime
#SBATCH --nodes=1 # Number of Nodes
#SBATCH --ntasks-per-node=16 # Number of Tasks per Node

module load R/3.1.2

Rscript bootstrap.R $SLURM_NTASKS_PER_NODE
}}}
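
If the variable is unset, the shell passes no argument and '''args[1]''' is '''NA'''; the multi-node form of '''SLURM_TASKS_PER_NODE''' (for example ''16(x2)'') also does not convert to an integer. A minimal sketch of stricter argument handling follows; the helper name '''parse_cores_arg''' and its error messages are hypothetical, not part of the original script.

{{{#!r
# Hypothetical helper: convert the first trailing command line argument
# into a core count, stopping with a clear message when it is missing or
# is not a positive integer.
parse_cores_arg <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) < 1L || is.na(args[1]) || args[1] == "") {
    stop("usage: Rscript bootstrap.R <number-of-cores>", call. = FALSE)
  }
  cores <- suppressWarnings(as.integer(args[1]))  # "16(x2)" becomes NA
  if (is.na(cores) || cores < 1L) {
    stop("invalid core count: '", args[1], "'", call. = FALSE)
  }
  cores
}

# In bootstrap.R this would replace the lines that read args[1]:
# registerDoParallel(cores = parse_cores_arg())

parse_cores_arg(c("16"))  # returns 16
}}}

Failing early with a usage message is a design choice: a job that silently falls back to one core can run for hours before anyone notices.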

Note that since we did not specify an output file, the output is written to slurm-<JobNumber>.out in the directory from which the job was submitted. For example:

{{{
[cmaggio@cypress1 ~]$ sbatch RsubmissionWargs.srun
Submitted batch job 52481
[tulaneID@cypress1 ~]$ cat slurm-52481.out
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
[1] "16"
elapsed
3.282
[tulaneID@cypress1 ~]$
}}}