Running R on Cypress
R Modules
As of June 7, 2019 the following versions of R are available on Cypress as modules.
- R/3.1.2(default)
- R/3.2.4
- R/3.2.5-intel
- R/3.3.1-intel
- R/3.4.1-intel
- R/3.5.2-intel
Running R Interactively
For Workshop
If you use a temporary workshop account, do this.
export MY_PARTITION=workshop
export MY_QUEUE=workshop
Start an interactive session using idev
[tulaneID@cypress1 ~]$ idev
Requesting 1 node(s) task(s) to workshop queue of workshop partition
1 task(s)/node, 20 cpu(s)/task, 2 MIC device(s)/node
Time: 0 (hr) 60 (min).
Submitted batch job 1164332
JOBID=1164332 begin on cypress01-121
--> Creating interactive terminal session (login) on node cypress01-121.
--> You have 0 (hr) 60 (min).
--> Assigned Host List : /tmp/idev_nodes_file_tuhpc002
Last login: Wed Aug 21 15:56:37 2019 from cypress1.cm.cluster
[tulaneID@cypress01-121 ~]$
For Workshop
If you use a temporary workshop account, request only 2 CPUs per node so that nodes can be shared among many users.
[tulaneID@cypress1 ~]$ idev -c 2
Load the R module
[tulaneID@cypress01-121 ~]$ module load R/3.2.4
[tulaneID@cypress01-121 ~]$ module list
Currently Loaded Modulefiles:
  1) slurm/14.03.0       3) bbcp/amd64_rhel60
  2) idev                4) R/3.2.4
Run R in the command line window
[tulaneID@cypress01-121 ~]$ R

R version 3.2.4 (2016-03-10) -- "Very Secure Dishes"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>
Running an R Script in Batch Mode
You can also submit your R job to the batch nodes (compute nodes) on Cypress. Inside your SLURM script, include a command to load the desired R module. Then invoke the Rscript command on your R script.
#!/bin/bash
#SBATCH --qos=normal            # Quality of Service
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load R/3.2.4
Rscript myRscript.R
For Workshop : If you use a temporary workshop account, modify the SLURM script like:
#!/bin/bash
#SBATCH --partition=workshop    # Partition
#SBATCH --qos=workshop          # Quality of Service
##SBATCH --qos=normal           ### Quality of Service (like a queue in PBS)
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load R/3.2.4
Rscript myRscript.R
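If a batch job ends immediately, one thing worth checking is whether the R module actually loaded, that is, whether Rscript is on the PATH. The following is a minimal sketch of such a check rather than part of the Cypress instructions; the function name require_cmd is hypothetical.

```shell
#!/bin/bash
# require_cmd: succeed if the named command is on the PATH, otherwise complain.
# (require_cmd is a hypothetical helper, not something provided by Cypress.)
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "Command '$1' not found; was the module loaded?" >&2
    return 1
}

if require_cmd Rscript; then
    Rscript --version
else
    echo "Load an R module first, for example: module load R/3.2.4"
fi
```

Inside a SLURM script, the same helper could be placed after the module load line as require_cmd Rscript || exit 1, so that the job stops with a clear message instead of a less obvious failure later.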
Running a Parallel R Job
Starting with version 2.14.0, R has offered direct support for parallel computation through the "parallel" package. We will present two examples of running a parallel job in batch mode. They differ in how they communicate the number of cores reserved by SLURM to R. Both are based on code found in "Getting Started with doParallel and foreach" by Steve Weston and Rich Calaway, as modified by the University of Chicago Research Computing Center.
Passing (SLURM) Environment Variables
In the first example, we will use the built-in R function Sys.getenv( ) to get the SLURM environment variable from the operating system.
Edit the new file bootstrap.R to contain the following code.
#Based on code from the UCRCC website
library(doParallel)

# use the environment variable SLURM_CPUS_PER_TASK to set the number of cores
registerDoParallel(cores=(Sys.getenv("SLURM_CPUS_PER_TASK")))

# Bootstrapping iteration example
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
iterations <- 10000 # Number of iterations to run

# Parallel version of code
# Note the '%dopar%' instruction
part <- system.time({
  r <- foreach(icount(iterations), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]

# Shows the number of Parallel Workers to be used
getDoParWorkers()
# Executes the functions
part
The above script obtains the number of CPUs per task from the SLURM environment variable SLURM_CPUS_PER_TASK, which is set by the --cpus-per-task option in our SLURM script, and passes that value to the registerDoParallel( ) function. To implement this we need only set the correct parameters in our SLURM script. Suppose we wanted to use 16 cores. Then the correct SLURM script would be as follows.
#!/bin/bash
#SBATCH --qos=normal            # Quality of Service
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of Tasks per Node
#SBATCH --cpus-per-task=16      # Number of threads per task (OMP threads)

module load R/3.2.4
Rscript bootstrap.R
For Workshop : If you use a temporary workshop account, modify the SLURM script like:
#!/bin/bash
#SBATCH --partition=workshop    # Partition
#SBATCH --qos=workshop          # Quality of Service
##SBATCH --qos=normal           ### Quality of Service (like a queue in PBS)
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of Tasks per Node
#SBATCH --cpus-per-task=16      # Number of threads per task (OMP threads)

module load R/3.2.4
Rscript bootstrap.R
Edit the new file bootstrap.sh to contain the above SLURM script code.
R Package Dependency
To run the above example, we would submit the SLURM script with the sbatch command. In general, however, we should not expect the code to run the very first time without additional setup. If we try, we can expect an error such as the one in the following R session.
> library(doParallel) Error in library(doParallel) : there is no package called ‘doParallel’
Thus we should first ensure that the required R package, in this case doParallel, is installed and available in your environment. For a range of options for installing R packages, depending on the desired level of reproducibility, see the section Installing R Packages on Cypress.
For Workshop : If you use a temporary workshop account, use Alternative 1 for installing R packages.
Once we have resolved our package dependencies, we now expect the job to run without errors.
Also, note that since we did not specify an output file in the SLURM script, the output will be written to slurm-<JobNumber>.out. For example:
[tulaneID@cypress2 R]$ sbatch bootstrap.sh
Submitted batch job 774081
[tulaneID@cypress2 R]$ cat slurm-774081.out
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
[1] "16"
elapsed
  2.954
[tulaneID@cypress2 R]$
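If a predictable output file name is preferred over the default slurm-<JobNumber>.out, sbatch accepts an --output option in which %j expands to the job ID. The following is a minimal sketch rather than part of the original instructions: the file names bootstrap_named.sh and bootstrap-%j.out are assumptions chosen for illustration, and the script is submitted only where sbatch is actually available.

```shell
#!/bin/bash
# Sketch: write a variant of bootstrap.sh that names its own output file.
# The names bootstrap_named.sh and bootstrap-%j.out are illustrative assumptions.
cat > bootstrap_named.sh <<'EOF'
#!/bin/bash
#SBATCH --qos=normal                # Quality of Service
#SBATCH --job-name=R                # Job Name
#SBATCH --time=00:01:00             # WallTime
#SBATCH --nodes=1                   # Number of Nodes
#SBATCH --ntasks-per-node=1         # Number of Tasks per Node
#SBATCH --cpus-per-task=16          # Number of threads per task (OMP threads)
#SBATCH --output=bootstrap-%j.out   # %j is replaced by the job ID

module load R/3.2.4
Rscript bootstrap.R
EOF

# Submit only where SLURM is installed; elsewhere just report what was written.
if command -v sbatch >/dev/null 2>&1; then
    sbatch bootstrap_named.sh
else
    echo "sbatch not found; wrote bootstrap_named.sh without submitting"
fi
```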
Passing Parameters
The disadvantage of the above approach is that it is system-specific. If we move our code to a machine that uses PBS-Torque as its resource manager (sphynx, for example), we have to change our source code. A more portable alternative uses command line arguments to pass the values of environment variables into the script.
Edit the new file bootstrapWargs.R to contain the following code.
#Based on code from the UCRCC website
library(doParallel)

# Enable command line arguments
args<-commandArgs(TRUE)

# use the first command line argument to set the number of cores
registerDoParallel(cores=(as.integer(args[1])))

# Bootstrapping iteration example
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
iterations <- 10000 # Number of iterations to run

# Parallel version of code
# Note the '%dopar%' instruction
part <- system.time({
  r <- foreach(icount(iterations), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]

# Shows the number of Parallel Workers to be used
getDoParWorkers()
# Executes the functions
part
Note the use of args<-commandArgs(TRUE) and of as.integer(args[1]). This allows us to pass in a value from the command line when we call the script, and the number of cores will be set to that value. Using the same basic submission script as last time, we need only pass the value of the correct SLURM environment variable to the script at runtime.
#!/bin/bash
#SBATCH --qos=normal            # Quality of Service
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of Tasks per Node
#SBATCH --cpus-per-task=16      # Number of threads per task (OMP threads)

module load R/3.2.4
Rscript bootstrapWargs.R $SLURM_CPUS_PER_TASK
For Workshop : If you use a temporary workshop account, modify the SLURM script like:
#!/bin/bash
#SBATCH --partition=workshop    # Partition
#SBATCH --qos=workshop          # Quality of Service
##SBATCH --qos=normal           ### Quality of Service (like a queue in PBS)
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of Tasks per Node
#SBATCH --cpus-per-task=16      # Number of threads per task (OMP threads)

module load R/3.2.4
Rscript bootstrapWargs.R $SLURM_CPUS_PER_TASK
Edit the new file bootstrapWargs.sh to contain the above SLURM script code.
Now submit as in the following.
[tulaneID@cypress1 ~]$ sbatch bootstrapWargs.sh
Submitted batch job 52481
[tulaneID@cypress1 ~]$ cat slurm-52481.out
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
[1] "16"
elapsed
  3.282
[tulaneID@cypress1 ~]$
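Because the core count now arrives as a plain argument, the submission script is a natural place to guard against the variable being unset, for example when the script is tried outside a SLURM job (SLURM sets SLURM_CPUS_PER_TASK only when --cpus-per-task is requested). The following is a minimal sketch, assuming a fallback of one core is acceptable. The function name pick_cores is hypothetical, and PBS_NUM_PPN is included only as an assumption about what a PBS-Torque system provides.

```shell
#!/bin/bash
# pick_cores: print the number of cores to hand to the R script.
# Order of preference: SLURM's value, then a PBS-Torque value (assumed
# here to be PBS_NUM_PPN), then a conservative default of 1.
pick_cores() {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        echo "$SLURM_CPUS_PER_TASK"
    elif [ -n "${PBS_NUM_PPN:-}" ]; then
        echo "$PBS_NUM_PPN"
    else
        echo 1
    fi
}

CORES="$(pick_cores)"
echo "Would run: Rscript bootstrapWargs.R $CORES"
```

In a real submission script the last line would become Rscript bootstrapWargs.R "$CORES", leaving the R source unchanged across schedulers.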
Installing R Packages on Cypress
If you want to use R packages that are not yet installed in your desired version of R on Cypress, you have several alternatives, described below, for where to install those packages. The locations include your user home directory or a lustre sub-directory, and the methods vary depending on your desired level of reproducibility.
Alternative 1 - default to home sub-directory
From your R session, you may choose to have R install its packages into a sub-directory under your home directory. By default R will create such a sub-directory whose name corresponds to the R version of your current R session and install your packages there.
> R.version.string
[1] "R version 3.4.1 (2017-06-30)"
> install.packages("copula")
Installing package into ‘/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib’
(as ‘lib’ is unspecified)
Warning in install.packages("copula") :
  'lib = "/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib"' is not writable
Would you like to use a personal library instead? (y/n) y
Would you like to create a personal library
~/R/x86_64-pc-linux-gnu-library/3.4
to install packages into? (y/n) y
--- Please select a CRAN mirror for use in this session ---
PuTTY X11 proxy: unable to connect to forwarded X server: Network error: Connection refused
HTTPS CRAN mirror

 1: 0-Cloud [https]
 2: Algeria [https]
...
79: Vietnam [https]
80: (HTTP mirrors)

Selection: 77
...
Note that the above example was performed without X11 forwarding, so R prompted for the CRAN mirror at the command line. At that prompt, enter the number corresponding to the desired mirror site, e.g. 77.
Alternative 2 - specify your lustre sub-directory via exported environment variable
Alternatively, if you prefer to use, say, your lustre sub-directory rather than your home directory, you may do so by exporting an environment variable as in the following. The environment variable R_LIBS_USER points to the desired location of user packages.
First, create a directory and export the environment variable.
mkdir -p /lustre/project/<your-group-name>/R/Library
export R_LIBS_USER=/lustre/project/<your-group-name>/R/Library
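Before starting R, it can be worth confirming that the directory exists and is writable, since R leaves out library paths that do not exist when it builds its search path. The following is a minimal sketch. LIB_DIR is a hypothetical variable that defaults to a throwaway location purely so the snippet runs anywhere; on Cypress it would be set to the lustre path shown above.

```shell
#!/bin/bash
# LIB_DIR is a stand-in; on Cypress use /lustre/project/<your-group-name>/R/Library.
LIB_DIR="${LIB_DIR:-${TMPDIR:-/tmp}/R-library-demo}"

mkdir -p "$LIB_DIR"
export R_LIBS_USER="$LIB_DIR"

# R skips library paths that do not exist, so check up front.
if [ -d "$R_LIBS_USER" ] && [ -w "$R_LIBS_USER" ]; then
    echo "R_LIBS_USER is ready: $R_LIBS_USER"
else
    echo "Cannot write to $R_LIBS_USER" >&2
fi
```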
Then run R and install a package. Note that we can use the R function .libPaths() as confirmation of the user library location.
> .libPaths()
[1] "/lustre/project/<your-group-name>/R/Library"
[2] "/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib"
[3] "/share/apps/R/3.4.1-intel/lib64/R/library"
> install.packages("copula")
Installing package into ‘/lustre/project/<your-group-name>/R/Library’
(as ‘lib’ is unspecified)
...
Alternative 3 - specify lustre sub-directory via environment file
Similarly, you may accomplish the above via the same environment variable setting as above but in a local file as in the following.
First, create a directory as above.
mkdir -p /lustre/project/<your-group-name>/R/Library
Then set R_LIBS_USER in the file ~/.Renviron to tell R the default location.
Note however that setting or unsetting the environment variable R_LIBS_USER in the file ~/.Renviron will override any previously exported value of that same environment variable!
echo 'R_LIBS_USER="/lustre/project/<your-group-name>/R/Library"' > ~/.Renviron
Or use a text editor in order to create and edit the file ~/.Renviron so that the file includes the following line.
R_LIBS_USER="/lustre/project/<your-group-name>/R/Library"
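Note that the echo command above uses >, which replaces any existing ~/.Renviron. Where the file may already hold other settings, a more cautious approach is to append the line only if no R_LIBS_USER entry is present. The following is a minimal sketch. RENVIRON_FILE is a hypothetical variable that defaults to a scratch file so the snippet can be tried safely, and add_r_libs_user is a hypothetical helper name; for real use RENVIRON_FILE would be set to ~/.Renviron.

```shell
#!/bin/bash
# RENVIRON_FILE is a stand-in so the sketch does not touch a real ~/.Renviron.
RENVIRON_FILE="${RENVIRON_FILE:-$(mktemp)}"
LINE='R_LIBS_USER="/lustre/project/<your-group-name>/R/Library"'

add_r_libs_user() {
    # Append only when no R_LIBS_USER entry exists yet, leaving other lines intact.
    if ! grep -q '^R_LIBS_USER=' "$RENVIRON_FILE" 2>/dev/null; then
        echo "$LINE" >> "$RENVIRON_FILE"
    fi
}

add_r_libs_user
add_r_libs_user   # second call is a no-op
cat "$RENVIRON_FILE"
```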
Then run R and install a package. Note again the use of R function .libPaths() as confirmation of the user library location.
> .libPaths()
[1] "/lustre/project/<your-group-name>/R/Library"
[2] "/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib"
[3] "/share/apps/R/3.4.1-intel/lib64/R/library"
> install.packages("copula")
Installing package into ‘/lustre/project/<your-group-name>/R/Library’
(as ‘lib’ is unspecified)
...
Alternative 4 - specify lustre sub-directory via R profile file
Similarly, you may set the sub-directory depending on R major.minor version via the R profile file as in the following.
Edit the file ~/.Rprofile as follows.
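# Derive "major.minor" (e.g. "3.4") from the running R version
majorMinorPatch <- paste(R.version[c("major", "minor")], collapse=".")
majorMinor <- gsub("(.*)\\..*", "\\1", majorMinorPatch)
#print(paste0("majorMinor=", majorMinor))

# Create the version-specific library directory, including parents, if needed
myLibPath <- paste0("/lustre/project/<your-group-name>/R/Library/", majorMinor)
dir.create(myLibPath, showWarnings = FALSE, recursive = TRUE)
#print(paste0("myLibPath=", myLibPath))

# Put the new directory at the front of the library search path
newLibPaths <- c(myLibPath, .libPaths())
.libPaths(newLibPaths)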
Note that setting the R library trees directly via the R function .libPaths() in the file ~/.Rprofile can either override or add to any location previously set through R_LIBS_USER!
Then run R and install a package. Note again the use of R function .libPaths() as confirmation of the user library location.
> .libPaths()
[1] "/lustre/project/<your-group-name>/R/Library/3.4"
[2] "/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib"
[3] "/share/apps/R/3.4.1-intel/lib64/R/library"
> install.packages("copula")
Installing package into ‘/lustre/project/<your-group-name>/R/Library/3.4’
(as ‘lib’ is unspecified)
...
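The same per-version layout can also be prepared from the shell before R starts, for instance in a SLURM script that exports R_LIBS_USER as in Alternative 2. The following is a minimal sketch, assuming the R version is known in advance (here hard-coded to match the session above). BASE_DIR is a hypothetical stand-in that defaults to a throwaway location so the snippet runs anywhere.

```shell
#!/bin/bash
R_VERSION="3.4.1"                  # assumed; match the R module you load
MAJOR_MINOR="${R_VERSION%.*}"      # strip the patch level: 3.4.1 -> 3.4

# BASE_DIR is a stand-in; on Cypress use /lustre/project/<your-group-name>/R/Library.
BASE_DIR="${BASE_DIR:-${TMPDIR:-/tmp}/R-library-demo}"

export R_LIBS_USER="$BASE_DIR/$MAJOR_MINOR"
mkdir -p "$R_LIBS_USER"
echo "$R_LIBS_USER"
```

Unlike the ~/.Rprofile approach, this version does not ask R for its own version number, so the hard-coded R_VERSION has to be kept in step with the module load line by hand.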
Alternative 5 - specify lustre sub-directory via R code
As for yet another alternative, you can accomplish the above entirely in your R code via the following. First, create a directory as before.
mkdir -p /lustre/project/<your-group-name>/R/Library
Then run R and install a package, but note that you must also specify the location from which to load the package in the ensuing call to the R function library().
> myLib <- "/lustre/project/<your-group-name>/R/Library"
> install.packages("copula", lib=myLib)
...
> library(copula, lib.loc=myLib)