Changes between Version 15 and Version 16 of cypress/R
Timestamp: 08/22/18 19:06:05
{{{
[tulaneID@cypress01-035 pp-1.6.4]$ module load R/3.2.4
[tulaneID@cypress01-035 pp-1.6.4]$ module list
Currently Loaded Modulefiles:
  1) git/2.4.1           3) idev                  5) R/3.2.4
  2) slurm/14.03.0       4) bbcp/amd64_rhel60
}}}

...

{{{
[tulaneID@cypress01-035 pp-1.6.4]$ R

R version 3.2.4 (2016-03-10) -- "Very Secure Dishes"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
...
Type 'q()' to quit R.

>
}}}

...

{{{#!bash
...
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load R/3.2.4
Rscript myRscript.R
}}}

'''For Workshop''' :
If you use a temporary workshop account, modify the SLURM script like:
{{{#!bash
#!/bin/bash
#SBATCH --partition=workshop    # Partition
#SBATCH --qos=workshop          # Quality of Service
##SBATCH --qos=normal          ### Quality of Service (like a queue in PBS)
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load R/3.2.4
Rscript myRscript.R
}}}

== Running a Parallel R Job ==

Starting with version 2.14.0, R has offered direct support for parallel computation through the "parallel" package. We will present two examples of running a parallel job in BATCH mode. They differ in the ways in which they communicate the number of cores reserved by SLURM to R.
Both are based on code found in [https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf "Getting Started with doParallel and foreach" by Steve Weston and Rich Calaway] and modified by [https://rcc.uchicago.edu/docs/software/environments/R/index.html the University of Chicago Research Computing Center].

=== Passing (SLURM) Environment Variables ===

In the first example, we will use the built-in R function '''Sys.getenv( )''' to get the SLURM environment variable from the operating system.

Edit the new file '''bootstrap.R''' to contain the following code.

{{{#!r
...
}}}

The above script will obtain the number of tasks per node via the SLURM environment variable '''SLURM_CPUS_PER_TASK''' set in our SLURM script, and will pass that value to the '''registerDoParallel( )''' function. To implement this, we need only set the correct parameters in our SLURM script. Suppose we wanted to use 16 cores. Then the correct SLURM script would be as follows.

{{{#!bash
...
#SBATCH --cpus-per-task=16      # Number of threads per task (OMP threads)

module load R/3.2.4

Rscript bootstrap.R
}}}
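The '''bootstrap.R''' listing itself is elided in this changeset. As a point of reference only, a minimal sketch of the approach described above could look like the following; the bootstrap details (iris data, 1000 trials) are taken from the doParallel vignette cited earlier, not from this page.

```r
# Sketch of a bootstrap.R along the lines described above (an assumption,
# not this page's actual listing): read the core count that SLURM exported,
# register that many workers, then time a parallel bootstrap.
library(doParallel)  # also loads foreach, iterators, parallel

# SLURM sets SLURM_CPUS_PER_TASK to the value given to --cpus-per-task
ncores <- as.numeric(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
registerDoParallel(cores = ncores)
print(Sys.getenv("SLURM_CPUS_PER_TASK"))

# Bootstrap of a logistic regression, from the doParallel vignette
x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 1000
ptime <- system.time({
  r <- foreach(icount(trials), .combine = cbind) %dopar% {
    ind <- sample(100, 100, replace = TRUE)
    result <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
    coefficients(result)
  }
})[3]
print(ptime)
```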
'''For Workshop''' :
If you use a temporary workshop account, modify the SLURM script like:
{{{#!bash
#!/bin/bash
#SBATCH --partition=workshop    # Partition
#SBATCH --qos=workshop          # Quality of Service
##SBATCH --qos=normal          ### Quality of Service (like a queue in PBS)
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of Tasks per Node
#SBATCH --cpus-per-task=16      # Number of threads per task (OMP threads)

module load R/3.2.4

Rscript bootstrap.R
}}}

Edit the new file '''bootstrap.sh''' to contain the above SLURM script code.

=== R Package Dependency ===

To run the above, we would submit the SLURM script via the '''sbatch''' command, but in general we would not expect this code to run the very first time without additional setup. If we try running the above scripts without any additional setup, we can expect an error such as the one in the following R session.

{{{#!r
> library(doParallel)
Error in library(doParallel) : there is no package called ‘doParallel’
}}}

Thus we should first ensure that the required R package, in this case '''doParallel''', is installed and available in your environment. For a range of options for installing R packages, depending on the desired level of reproducibility, see the section [#InstallingRPackages Installing R Packages on Cypress].

'''For Workshop''' :
If you use a temporary workshop account, use [#RPackageAlternative1 Alternative 1] for installing R packages.
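In an interactive R session, the missing package can typically be installed from CRAN into a per-user library (as in Alternative 1 below); the mirror URL used here is one common choice, an assumption rather than something this page prescribes.

```r
# Install doParallel (pulls in foreach and iterators) into the default
# per-user library; R offers to create ~/R/<platform>-library/<version>
# if it does not exist yet.
install.packages("doParallel", repos = "https://cloud.r-project.org")

# Confirm the package now loads
library(doParallel)
```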
Once we have resolved our package dependencies, we expect the job to run without errors.

Also, note that since we did not specify an output file in the SLURM script, the output will be written to slurm-<!JobNumber>.out. For example:

{{{
[tulaneID@cypress2 R]$ sbatch bootstrap.sh
Submitted batch job 774081
[tulaneID@cypress2 R]$ cat slurm-774081.out
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
[1] "16"
elapsed
  2.954
[tulaneID@cypress2 R]$
}}}

=== Passing Parameters ===

The disadvantage of the above approach is that it is system specific. If we move our code to a machine that uses PBS-Torque as its manager (sphynx, for example), we have to change our source code. An alternative method that results in a more portable code base uses command line arguments to pass the value of our environment variables into the script.

Edit the new file '''bootstrapWargs.R''' to contain the following code.

{{{#!r
...
}}}

...

{{{#!bash
...
#SBATCH --cpus-per-task=16      # Number of threads per task (OMP threads)

module load R/3.2.4

Rscript bootstrapWargs.R $SLURM_CPUS_PER_TASK
}}}
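As with '''bootstrap.R''', the '''bootstrapWargs.R''' listing is elided in this changeset; a hypothetical sketch of the command-line-argument variant described above might be:

```r
# Sketch (an assumption, not this page's actual listing): take the worker
# count from the command line instead of a scheduler-specific environment
# variable, so the same script works under SLURM, PBS-Torque, etc.
library(doParallel)

args <- commandArgs(trailingOnly = TRUE)  # e.g. "16" from $SLURM_CPUS_PER_TASK
ncores <- if (length(args) >= 1) as.numeric(args[1]) else 1
registerDoParallel(cores = ncores)
print(ncores)

# Same bootstrap workload as before (from the doParallel vignette)
x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 1000
ptime <- system.time({
  r <- foreach(icount(trials), .combine = cbind) %dopar% {
    ind <- sample(100, 100, replace = TRUE)
    result <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
    coefficients(result)
  }
})[3]
print(ptime)
```

Because the core count arrives as an ordinary argument, only the submission script changes between schedulers; under PBS-Torque one could launch it with a Torque core-count variable instead of `$SLURM_CPUS_PER_TASK`.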
'''For Workshop''' :
If you use a temporary workshop account, modify the SLURM script like:
{{{#!bash
#!/bin/bash
#SBATCH --partition=workshop    # Partition
#SBATCH --qos=workshop          # Quality of Service
##SBATCH --qos=normal          ### Quality of Service (like a queue in PBS)
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of Tasks per Node
#SBATCH --cpus-per-task=16      # Number of threads per task (OMP threads)

module load R/3.2.4

Rscript bootstrapWargs.R $SLURM_CPUS_PER_TASK
}}}

Edit the new file '''bootstrapWargs.sh''' to contain the above SLURM script code.

Now submit as in the following.

{{{
[tulaneID@cypress1 ~]$ sbatch bootstrapWargs.sh
Submitted batch job 52481
[tulaneID@cypress1 ~]$ cat slurm-52481.out
...
}}}

== [=#InstallingRPackages Installing R Packages on Cypress] ==

If you want to use R packages that are not yet installed in your desired version of R on Cypress, you have several alternatives for where to install them: either your user home directory or your lustre sub-directory. The methods vary depending on your desired level of reproducibility.

=== [=#RPackageAlternative1 Alternative 1] - default to home sub-directory ===

From your R session, you may choose to have R install its packages into a sub-directory under your home directory.
By default, R will create such a sub-directory whose name corresponds to the R version of your current R session and install your packages there.

...

{{{
...
~/R/x86_64-pc-linux-gnu-library/3.4
to install packages into? (y/n) y
--- Please select a CRAN mirror for use in this session ---
PuTTY X11 proxy: unable to connect to forwarded X server: Network error: Connection refused
HTTPS CRAN mirror

 1: 0-Cloud [https]                  2: Algeria [https]
...
79: Vietnam [https]                 80: (HTTP mirrors)


Selection: 77
...
}}}

Note that the above example was performed without X11 forwarding, resulting in a command-line prompt for the selection of a CRAN mirror site; at that prompt, enter the number corresponding to the desired mirror, e.g. '''77'''.

=== Alternative 2 - specify your lustre sub-directory via exported environment variable ===
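The body of Alternative 2 is not shown in this changeset. The usual mechanism behind an "exported environment variable" for package locations is R's '''R_LIBS''' (or '''R_LIBS_USER''') variable; the following is a sketch using a hypothetical path, not the page's actual instructions.

```shell
# Hypothetical path; substitute your own lustre project directory,
# for example /lustre/project/<group>/$USER/R/library
export R_LIBS="$HOME/R/lustre-library"
mkdir -p "$R_LIBS"

# R sessions started from this shell (including Rscript inside a SLURM
# job script) will now install to, and search for, packages in $R_LIBS
# before the system library.
```

Placing the `export` line in your SLURM script before `Rscript` makes the batch job see the same library location as your interactive sessions.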