[[PageOutline]]

= Running R on Cypress = 

== R Modules ==
As of July 31st, 2017, five versions of R are installed on Cypress as modules:

* R/3.1.2 (default)
* R/3.2.4
* R/3.2.5-intel
* R/3.3.1-intel
* R/3.4.1-intel

== Running R Interactively ==

==== For Workshop ====
If you are using a temporary workshop account, first set the following environment variables.
{{{
export MY_PARTITION=workshop
export MY_QUEUE=workshop
}}}

Start an interactive session using '''idev'''.

{{{
[tulaneID@cypress1 pp-1.6.4]$ idev 
Requesting 1 node(s)  task(s) to normal queue of defq partition
1 task(s)/node, 20 cpu(s)/task, 2 MIC device(s)/node
Time: 0 (hr) 60 (min).
Submitted batch job 52311
Seems your requst is pending.
JOBID=52311 begin on cypress01-035
--> Creating interactive terminal session (login) on node cypress01-035.
--> You have 0 (hr) 60 (min).
[tulaneID@cypress01-035 pp-1.6.4]$ 
}}}

Load the R module.

{{{
[tulaneID@cypress01-035 pp-1.6.4]$ module load R/3.1.2
[tulaneID@cypress01-035 pp-1.6.4]$ module list
Currently Loaded Modulefiles:
  1) git/2.4.1           3) idev                5) R/3.1.2
  2) slurm/14.03.0       4) bbcp/amd64_rhel60
}}}

Run R at the command line.

{{{
[tulaneID@cypress01-035 pp-1.6.4]$ R

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> 

}}}

== Running an R Script in Batch Mode ==

You can also submit your R job to the batch nodes (compute nodes) on Cypress. Inside your SLURM script, include a command to load the desired R module. Then invoke the '''Rscript''' command on your R script.

{{{#!bash
#!/bin/bash
#SBATCH --qos=normal            # Quality of Service
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load R/3.1.2
Rscript myRscript.R
}}}

== Running a Parallel R Job ==

Starting with version 2.14.0, R has offered direct support for parallel computation through the "parallel" package. We present two examples of running a parallel job in batch mode. They differ in how they communicate the number of cores reserved by SLURM to R. Both are based on code found in [https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf "Getting Started with doParallel and foreach" by Steve Weston and Rich Calaway] and modified by [https://rcc.uchicago.edu/docs/software/environments/R/index.html The University of Chicago Research Computing Center].

In the first example, we use the built-in R function '''Sys.getenv( )''' to read the SLURM environment variable '''SLURM_NTASKS_PER_NODE''' from the operating system.

{{{#!r
#Based on code from the UCRCC website

library(doParallel)

# use the environment variable SLURM_NTASKS_PER_NODE to set the number of cores
# Sys.getenv() returns a character string, so convert it to an integer
registerDoParallel(cores=as.integer(Sys.getenv("SLURM_NTASKS_PER_NODE")))

# Bootstrapping iteration example
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
iterations <- 10000  # Number of iterations to run

# Parallel version of code 
# Note the '%dopar%' instruction
part <- system.time({
  r <- foreach(icount(iterations), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]

# Show the number of parallel workers in use
getDoParWorkers()
# Print the elapsed time of the parallel loop
part
}}}

This script obtains the number of tasks per node set in our SLURM script and passes that value to the '''registerDoParallel( )''' function. To implement this, we need only set the correct parameters in our SLURM script. Suppose we want to use 16 cores. Then the correct script is:

{{{#!bash
#!/bin/bash
#SBATCH --qos=normal            # Quality of Service
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=16    # Number of Tasks per Node

module load R/3.1.2

Rscript bootstrap.R
}}}

The disadvantage of this approach is that it is system specific. If we move our code to a machine that uses PBS-Torque as its resource manager (sphynx, for example), we have to change our source code. An alternative method that results in a more portable code base uses command-line arguments to pass the value of our environment variables into the script.

{{{#!r
#Based on code from the UCRCC website

library(doParallel)
# Enable command line arguments
args<-commandArgs(TRUE)

# use the first command line argument to set the number of cores
registerDoParallel(cores=as.integer(args[1]))

# Bootstrapping iteration example
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
iterations <- 10000  # Number of iterations to run

# Parallel version of code 
# Note the '%dopar%' instruction
part <- system.time({
  r <- foreach(icount(iterations), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]

# Show the number of parallel workers in use
getDoParWorkers()
# Print the elapsed time of the parallel loop
part
}}}

Note the use of '''args<-commandArgs(TRUE)''' and of '''as.integer(args[1])'''. This lets us pass a value on the command line when we call the script, and the number of cores is set to that value. Using the same basic submission script as before, we need only pass the value of the correct SLURM environment variable to the script at runtime.

{{{#!bash
#!/bin/bash
#SBATCH --qos=normal            # Quality of Service
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=16    # Number of Tasks per Node

module load R/3.1.2

Rscript bootstrapWargs.R $SLURM_NTASKS_PER_NODE
}}}

Note that since we did not specify an output file, the output will be written to slurm-<!JobNumber>.out. For example:

{{{
[tulaneID@cypress1 ~]$ sbatch RsubmissionWargs.srun 
Submitted batch job 52481
[tulaneID@cypress1 ~]$ cat slurm-52481.out 
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
[1] "16"
elapsed 
  3.282 
[tulaneID@cypress1 ~]$ 
}}}
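
To keep the submission side portable as well, the script can pick the core count from whichever scheduler variable happens to be set. The following is only a sketch: the function name '''detect_cores''' is hypothetical, and it assumes that PBS-Torque exports the per-node core count as '''PBS_NUM_PPN''', which you should verify on the target machine.

{{{#!bash
#!/bin/bash
# detect_cores (hypothetical helper): print the number of cores reserved per node.
# Checks the SLURM variable first, then the assumed PBS-Torque variable,
# and falls back to a single core when neither is set.
detect_cores() {
  if [ -n "${SLURM_NTASKS_PER_NODE:-}" ]; then
    echo "$SLURM_NTASKS_PER_NODE"
  elif [ -n "${PBS_NUM_PPN:-}" ]; then
    echo "$PBS_NUM_PPN"
  else
    echo 1
  fi
}

echo "cores: $(detect_cores)"
# In a submission script, the value would be passed to R like this:
#   Rscript bootstrapWargs.R "$(detect_cores)"
}}}

With this approach, neither the R script nor the final '''Rscript''' line needs to change when moving between schedulers; only the '''#SBATCH''' or '''#PBS''' header lines differ.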


== Installing R Packages into user home directory or lustre sub-directory ==
If you want to use R packages that are not yet installed for your desired version of R on Cypress,
you have several alternatives, depending on your desired level of reproducibility.

=== Alternative 1 - default to home sub-directory ===
From your R session, you may choose to have R install its packages into a sub-directory under your home directory.
By default R will create such a sub-directory whose name corresponds to the R version of your current R session and install your packages there.

{{{
> R.version.string
[1] "R version 3.4.1 (2017-06-30)"
> install.packages("copula")
Installing package into ‘/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib’
(as ‘lib’ is unspecified)
Warning in install.packages("copula") :
  'lib = "/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib"' is not writable
Would you like to use a personal library instead?  (y/n) y
Would you like to create a personal library
~/R/x86_64-pc-linux-gnu-library/3.4
to install packages into?  (y/n) y
...
}}}

=== Alternative 2 - specify your lustre sub-directory via exported environment variable ===
Alternatively, if you prefer to use, say, your lustre sub-directory rather than your home directory, then you may do so via an exported environment variable setting as in the following.
The environment variable **R_LIBS_USER** points to the desired location of user packages.

First, create a directory and export the environment variable.
{{{
mkdir -p /lustre/project/<your-group-name>/R/Library
export R_LIBS_USER=/lustre/project/<your-group-name>/R/Library
}}} 

Then run R and install a package. Note that we can use the R function **.libPaths()** as confirmation of the user library location.
{{{
> .libPaths()
[1] "/lustre/project/<your-group-name>/R/Library"
[2] "/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib"
[3] "/share/apps/R/3.4.1-intel/lib64/R/library"
> install.packages("copula")
Installing package into ‘/lustre/project/<your-group-name>/R/Library’
(as ‘lib’ is unspecified)
...
}}}

=== Alternative 3 - specify lustre sub-directory via environment file ===
Similarly, you may set the same environment variable in a local file rather than exporting it in your shell, as in the following.

First, create a directory as above.
{{{
mkdir -p /lustre/project/<your-group-name>/R/Library
}}}
Then set **R_LIBS_USER** in the file **~/.Renviron** to tell R the default location.

Note, however, that setting or unsetting the environment variable **R_LIBS_USER** in the file **~/.Renviron** will //override// any previously exported value of that same environment variable!

{{{
echo 'R_LIBS_USER="/lustre/project/<your-group-name>/R/Library"' > ~/.Renviron
}}}
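Note that the command above overwrites any existing **~/.Renviron**. If the file already contains other settings, use a text editor instead to create or edit the file **~/.Renviron** so that it includes the following line.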
{{{
R_LIBS_USER="/lustre/project/<your-group-name>/R/Library"
}}}

Then run R and install a package. Note again the use of R function **.libPaths()** as confirmation of the user library location.
{{{
> .libPaths()
[1] "/lustre/project/<your-group-name>/R/Library"
[2] "/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib"
[3] "/share/apps/R/3.4.1-intel/lib64/R/library"
> install.packages("copula")
Installing package into ‘/lustre/project/<your-group-name>/R/Library’
(as ‘lib’ is unspecified)
...
}}}
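
If **~/.Renviron** already holds other settings, the **R_LIBS_USER** line can be added or updated without touching the rest of the file. The following is a minimal sketch; the helper name **set_r_libs_user** is hypothetical, and it assumes the file uses one **NAME=value** setting per line.

{{{#!bash
#!/bin/bash
# set_r_libs_user FILE DIR (hypothetical helper)
# Replace any existing R_LIBS_USER line in FILE with one pointing at DIR,
# keeping every other line of FILE unchanged.
set_r_libs_user() {
  local file="$1" dir="$2"
  touch "$file"
  # Copy everything except old R_LIBS_USER lines; grep exits non-zero
  # when nothing is left to copy, which is not an error here.
  grep -v '^R_LIBS_USER=' "$file" > "$file.tmp" || true
  printf 'R_LIBS_USER="%s"\n' "$dir" >> "$file.tmp"
  mv "$file.tmp" "$file"
}

# Example call (not executed here):
#   set_r_libs_user ~/.Renviron "/lustre/project/<your-group-name>/R/Library"
}}}

Running the helper twice leaves a single **R_LIBS_USER** line in the file, so it is safe to call it from a setup script.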

=== Alternative 4 - specify lustre sub-directory via R profile file ===
Similarly, you may set the sub-directory depending on R major.minor version via the R profile file as in the following.

Edit the file **~/.Rprofile** as follows.
{{{
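# Build "major.minor.patch" (e.g. "3.4.1"), then strip the patch level
majorMinorPatch <- paste(R.version[c("major", "minor")], collapse=".")
majorMinor <- gsub("(.*)\\..*", "\\1", majorMinorPatch)
#print(paste0("majorMinor=", majorMinor))
myLibPath <- paste0("/lustre/project/<your-group-name>/R/Library/", majorMinor)
# Create the version-specific library directory (and any missing parents)
dir.create(myLibPath, showWarnings = FALSE, recursive = TRUE)
#print(paste0("myLibPath=", myLibPath))
# Put the new directory at the front of the library search path
newLibPaths <- c(myLibPath, .libPaths())
.libPaths(newLibPaths)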
}}}

Note that calling the R function **.libPaths()** in the file **~/.Rprofile** changes the R library trees directly. Depending on how you call it, it can therefore either //override// or //extend// the library path derived from any previously set value of **R_LIBS_USER**!

Then run R and install a package. Note again the use of R function **.libPaths()** as confirmation of the user library location.
{{{
> .libPaths()
[1] "/lustre/project/<your-group-name>/R/Library/3.4"
[2] "/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib"
[3] "/share/apps/R/3.4.1-intel/lib64/R/library"
> install.packages("copula")
Installing package into ‘/lustre/project/<your-group-name>/R/Library/3.4’
(as ‘lib’ is unspecified)
...
}}}
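
As a variation on the version-specific directory above, R can expand the version itself: according to the R documentation for **.libPaths()**, a **%v** in **R_LIBS_USER** is replaced by the major.minor version of the running R. The following shell sketch relies on that behavior. The variable names **LIB_ROOT** and **R_VERSION** and the scratch default location are hypothetical, and because R does not create the directory for you, the sketch creates it by hand.

{{{#!bash
#!/bin/bash
# Hypothetical root for the demo; on Cypress this would be something like
# /lustre/project/<your-group-name>/R/Library
LIB_ROOT="${LIB_ROOT:-${TMPDIR:-/tmp}/R-Library-demo}"

# Version of the R module you intend to load (assumed here)
R_VERSION="3.4.1"

# Strip the patch level (3.4.1 -> 3.4) and create the matching directory
mkdir -p "$LIB_ROOT/${R_VERSION%.*}"

# R expands %v to its own major.minor version at startup
export R_LIBS_USER="$LIB_ROOT/%v"
echo "R_LIBS_USER=$R_LIBS_USER"
}}}

After starting R from the same shell, **.libPaths()** should list the version-specific directory first, provided that directory exists.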

=== Alternative 5 - specify lustre sub-directory via R code ===
As yet another alternative, you can accomplish the above entirely in your R code, as follows.
First, create a directory as before.
{{{
mkdir -p /lustre/project/<your-group-name>/R/Library
}}} 

Then run R and install a package, but note that you must also specify the location from which to load the package in the ensuing call to the R function **library()**.
{{{
> myLib <- "/lustre/project/<your-group-name>/R/Library"
> install.packages("copula", lib=myLib)
...
> library(copula, lib.loc=myLib)
}}}


[[cypress/Python|Next Section: Running Python on Cypress]]