Workshops/IntroToHpc2015/R (version 1, 10/12/15 15:58:31, author pdejesus)

[[PageOutline]]

= Running R on Cypress =

== R Modules ==
As of August 18th, 2015, there is one version of R installed on Cypress as a module:

* R/3.1.2

== Installing Packages ==

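The system-wide R library is generally not writable by ordinary users on a shared cluster. One common approach, sketched below with illustrative paths (the directory name and the example package are assumptions, not site policy), is to install packages into a personal library under your home directory:

```shell
# Create a personal library directory (the path is an illustrative choice)
mkdir -p "$HOME/R/library"

# Tell R to search it first; add this line to ~/.bashrc to make it permanent
export R_LIBS_USER="$HOME/R/library"

# Then, inside an R session, packages install into that library, e.g.:
#   > install.packages("doParallel")
```

With '''R_LIBS_USER''' set, subsequent R sessions under the same account will find the locally installed packages automatically.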
== Running R Interactively ==

Start an interactive session using idev:

{{{
[tulaneID@cypress1 pp-1.6.4]$ idev
Requesting 1 node(s)  task(s) to normal queue of defq partition
1 task(s)/node, 20 cpu(s)/task, 2 MIC device(s)/node
Time: 0 (hr) 60 (min).
Submitted batch job 52311
Seems your requst is pending.
JOBID=52311 begin on cypress01-035
--> Creating interactive terminal session (login) on node cypress01-035.
--> You have 0 (hr) 60 (min).
[tulaneID@cypress01-035 pp-1.6.4]$
}}}

Load the R module:

{{{
[tulaneID@cypress01-035 pp-1.6.4]$ module load R/3.1.2
[tulaneID@cypress01-035 pp-1.6.4]$ module list
Currently Loaded Modulefiles:
  1) git/2.4.1           3) idev                5) R/3.1.2
  2) slurm/14.03.0       4) bbcp/amd64_rhel60
}}}

Run R at the command line:

{{{
[tulaneID@cypress01-035 pp-1.6.4]$ R

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

}}}

== Running an R script in Batch mode ==

You can also submit your R job to the batch (compute) nodes on Cypress. Inside your SLURM script, load the desired R module, then invoke the '''Rscript''' command on your R script.

{{{#!bash
#!/bin/bash
#SBATCH --qos=workshop          # Quality of Service
#SBATCH --partition=workshop    # Partition
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)

module load R/3.1.2
Rscript myRscript.R
}}}

== Running a Parallel R Job ==

Starting with version 2.14.0, R has offered direct support for parallel computation through the "parallel" package. We will present two examples of running a parallel job in batch mode. They differ in how they communicate the number of cores reserved by SLURM to R. Both are based on code found in [https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf "Getting Started with doParallel and foreach" by Steve Weston and Rich Calaway] and modified by [https://rcc.uchicago.edu/docs/software/environments/R/index.html the University of Chicago Research Computing Center].

In the first example, we use the built-in R function '''Sys.getenv()''' to read the SLURM environment variable from the operating system.

{{{#!r
# Based on code from the UCRCC website

library(doParallel)

# Use the SLURM_NTASKS_PER_NODE environment variable to set the number of cores.
# Sys.getenv() returns a string, so convert it to an integer first.
registerDoParallel(cores=as.integer(Sys.getenv("SLURM_NTASKS_PER_NODE")))

# Bootstrapping iteration example
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
iterations <- 10000  # Number of iterations to run

# Parallel version of the code
# Note the '%dopar%' instruction
part <- system.time({
  r <- foreach(icount(iterations), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]

# Show the number of parallel workers in use
getDoParWorkers()
# Print the elapsed time
part
}}}

This script obtains the number of tasks per node set in our SLURM script and passes that value to the '''registerDoParallel()''' function. To implement this we need only set the correct parameters in our SLURM script. Suppose we wanted to use 16 cores. Then the correct script would be:

{{{#!bash
#!/bin/bash
#SBATCH --qos=workshop          # Quality of Service
#SBATCH --partition=workshop    # Partition
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=16    # Number of Tasks per Node

module load R/3.1.2

Rscript bootstrap.R
}}}

The disadvantage of this approach is that it is system specific. If we move our code to a machine that uses PBS-Torque as its manager (sphynx, for example), we have to change our source code. An alternative method that results in a more portable code base uses command line arguments to pass the value of our environment variables into the script.

{{{#!r
# Based on code from the UCRCC website

library(doParallel)
# Enable command line arguments
args <- commandArgs(TRUE)

# Use the first command line argument to set the number of cores
registerDoParallel(cores=as.integer(args[1]))

# Bootstrapping iteration example
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
iterations <- 10000  # Number of iterations to run

# Parallel version of the code
# Note the '%dopar%' instruction
part <- system.time({
  r <- foreach(icount(iterations), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]

# Show the number of parallel workers in use
getDoParWorkers()
# Print the elapsed time
part
}}}

Note the use of '''args <- commandArgs(TRUE)''' and of '''as.integer(args[1])'''. This allows us to pass in a value from the command line when we call the script, and the number of cores will be set to that value. Using the same basic submission script as last time, we need only pass the value of the correct SLURM environment variable to the script at runtime.

{{{#!bash
#!/bin/bash
#SBATCH --qos=workshop          # Quality of Service
#SBATCH --partition=workshop    # Partition
#SBATCH --job-name=R            # Job Name
#SBATCH --time=00:01:00         # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=16    # Number of Tasks per Node

module load R/3.1.2

Rscript bootstrapWargs.R $SLURM_TASKS_PER_NODE
}}}

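Because the R script now reads a plain number from its first argument, only the submission script needs to know which scheduler it runs under. A minimal sketch of a scheduler-agnostic version of the last line ('''PBS_NUM_PPN''' is Torque's rough analogue of the SLURM variable; the fallback value of 1 is an assumption for testing on a login shell):

```shell
# Resolve the core count from whichever scheduler defined it,
# defaulting to 1 when neither variable is set
CORES="${SLURM_TASKS_PER_NODE:-${PBS_NUM_PPN:-1}}"

# The R script sees only a number, never the scheduler:
#   Rscript bootstrapWargs.R "$CORES"
echo "$CORES"
```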
Note that since we did not specify an output file, the output will be written to slurm-<!JobNumber>.out. For example:

{{{
[tulaneID@cypress1 ~]$ sbatch RsubmissionWargs.srun
Submitted batch job 52481
[tulaneID@cypress1 ~]$ cat slurm-52481.out
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
[1] "16"
elapsed
  3.282
[tulaneID@cypress1 ~]$
}}}

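If you would rather have a predictable file name, sbatch's '''--output''' option overrides the default; in the filename pattern, '''%j''' expands to the job number. A sketch (the name Rjob is an arbitrary choice) to add to the submission script's header:

```shell
#SBATCH --output=Rjob-%j.out    # e.g. Rjob-52481.out instead of slurm-52481.out
```

By default stderr goes to the same file; a separate '''--error''' option exists if you want it split out.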
[[cypress/Python|Next Section: Running Python on Cypress]]