Changes between Version 15 and Version 16 of cypress/R


Timestamp:
Aug 22, 2018 7:06:05 PM (3 years ago)
Author:
cbaribault
Comment:

CEB 20180822 - Fixed R package dependency error, clarified workflow in general.

  • cypress/R

    v15 v16  
    3939
    4040{{{
    41 [tulaneID@cypress01-035 pp-1.6.4]$ module load R/3.1.2
     41[tulaneID@cypress01-035 pp-1.6.4]$ module load R/3.2.4
    4242[tulaneID@cypress01-035 pp-1.6.4]$ module list
    4343Currently Loaded Modulefiles:
    44   1) git/2.4.1           3) idev                5) R/3.1.2
     44  1) git/2.4.1           3) idev                5) R/3.2.4
    4545  2) slurm/14.03.0       4) bbcp/amd64_rhel60
    4646}}}
     
    5151[tulaneID@cypress01-035 pp-1.6.4]$R
    5252
    53 R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
    54 Copyright (C) 2014 The R Foundation for Statistical Computing
    55 Platform: x86_64-unknown-linux-gnu (64-bit)
     53R version 3.2.4 (2016-03-10) -- "Very Secure Dishes"
     54Copyright (C) 2016 The R Foundation for Statistical Computing
     55Platform: x86_64-pc-linux-gnu (64-bit)
    5656
    5757R is free software and comes with ABSOLUTELY NO WARRANTY.
     
    6969Type 'q()' to quit R.
    7070
    71 >
    72 
     71>
    7372}}}
    7473
     
    8685#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)
    8786
    88 module load R/3.1.2
     87module load R/3.2.4
    8988Rscript myRscript.R
    9089}}}
    9190
     91'''For Workshop''' :
     92If you use a temporary workshop account, modify the SLURM script as follows:
     93{{{#!bash
     94#!/bin/bash
     95#SBATCH --partition=workshop    # Partition
     96#SBATCH --qos=workshop          # Quality of Service
     97##SBATCH --qos=normal          ### Quality of Service (like a queue in PBS)
     98#SBATCH --job-name=R            # Job Name
     99#SBATCH --time=00:01:00         # WallTime
     100#SBATCH --nodes=1               # Number of Nodes
     101#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
     102#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)
     103
     104module load R/3.2.4
     105Rscript myRscript.R
     106}}}
     107
    92108== Running a Parallel R Job ==
    93109
    94110Starting with version 2.14.0, R has offered direct support for parallel computation through the "parallel" package. We will present two examples of running a parallel job in batch mode. They differ in how they communicate the number of cores reserved by SLURM to R. Both are based on code found in [https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf "Getting Started with doParallel and foreach" by Steve Weston and Rich Calaway] and modified by [https://rcc.uchicago.edu/docs/software/environments/R/index.html The University of Chicago Research Computing Center].
    95111
    96 In the first example, we will use the built in R function '''Sys.getenv( )''' to get the SLURM environment variable from the operating system.
     112=== Passing (SLURM) Environment Variables ===
     113
     114In the first example, we will use the built-in R function '''Sys.getenv( )''' to read the SLURM environment variable from the operating system.
     115
     116Edit the new file '''bootstrap.R''' to contain the following code.
    97117
    98118{{{#!r
     
    124144}}}
    125145
    126 This script will obtain the number of tasks per node set in our SLURM script and will pass that value to the '''registerDoParallel( )''' function. To implement this we need only set the correct parameters in our SLURM script. Suppose we wanted to use 16 cores. Then the correct script would be
     146The above script obtains the number of CPUs per task from the SLURM environment variable '''SLURM_CPUS_PER_TASK''', which is set by the '''--cpus-per-task''' option in our SLURM script, and passes that value to the '''registerDoParallel( )''' function. To implement this, we need only set the correct parameters in our SLURM script. Suppose we want to use 16 cores. Then the correct SLURM script would be as follows.
    127147
    128148{{{#!bash
     
    135155#SBATCH --cpus-per-task=16      # Number of threads per task (OMP threads)
    136156
    137 module load R/3.1.2
    138 
    139 Rscript bootstrap.R $SLURM_CPUS_PER_TASK
    140 }}}
    141 
    142 The disadvantage of this approach is that it is system specific. If we move our code to a machine that uses PBS-Torque as it's manager (sphynx for example) we have to change our source code. An alternative method that results in a more portable code base uses command line arguments to pass the value of our environment variables into the script.
     157module load R/3.2.4
     158
     159Rscript bootstrap.R
     160}}}
     161
     162'''For Workshop''' :
     163If you use a temporary workshop account, modify the SLURM script as follows:
     164{{{#!bash
     165#!/bin/bash
     166#SBATCH --partition=workshop    # Partition
     167#SBATCH --qos=workshop          # Quality of Service
     168##SBATCH --qos=normal          ### Quality of Service (like a queue in PBS)
     169#SBATCH --job-name=R            # Job Name
     170#SBATCH --time=00:01:00         # WallTime
     171#SBATCH --nodes=1               # Number of Nodes
     172#SBATCH --ntasks-per-node=1    # Number of Tasks per Node
     173#SBATCH --cpus-per-task=16      # Number of threads per task (OMP threads)
     174
     175module load R/3.2.4
     176
     177Rscript bootstrap.R
     178}}}
     179
     180Edit the new file '''bootstrap.sh''' to contain the above SLURM script code.
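
The R script depends on '''SLURM_CPUS_PER_TASK''' being present in the environment that R inherits. The following is a minimal sketch of that inheritance, assuming a dry run outside of SLURM; the value 4 and the use of `sh` as a stand-in for Rscript are illustrative choices, not part of the original workflow.

```shell
#!/bin/bash
# Dry run outside SLURM: set the variable by hand for one command only.
# A child process (here sh, standing in for Rscript) inherits it, which is
# what Sys.getenv("SLURM_CPUS_PER_TASK") relies on inside a real job.
seen="$(SLURM_CPUS_PER_TASK=4 sh -c 'printf "%s" "$SLURM_CPUS_PER_TASK"')"
echo "child process saw: $seen"
```

Inside a real job, SLURM sets the variable itself when '''--cpus-per-task''' is given, so no manual assignment is needed.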
     181
     182=== R Package Dependency ===
     183
     184To run the above, we would submit the SLURM script with the '''sbatch''' command. In general, however, we should not expect this code to run the first time without additional setup. If we run the above scripts as they are, we can expect an error such as the one in the following R session.
     185
     186{{{#!r
     187> library(doParallel)
     188Error in library(doParallel) : there is no package called ‘doParallel’
     189}}}
     190
     191Thus we should first ensure that the required R package, in this case '''doParallel''', is installed and available in your environment. For a range of options for installing R packages, depending on the desired level of reproducibility, see the section [#InstallingRPackages Installing R Packages on Cypress].
     192
     193'''For Workshop''' :
     194If you use a temporary workshop account, use [#RPackageAlternative1 Alternative 1] for installing R packages.
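
One way to catch the missing-package error before submitting a job is to test for the package from the shell. The following is a minimal sketch, not part of the original workflow: the helper name `r_package_available` and the `RSCRIPT` override variable are hypothetical, and it assumes the R module is already loaded so that Rscript is on the PATH.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the named R package can be loaded.
# RSCRIPT may be set to another executable; it defaults to Rscript on PATH.
r_package_available() {
    local pkg="$1"
    "${RSCRIPT:-Rscript}" -e "quit(status = if (requireNamespace('${pkg}', quietly = TRUE)) 0 else 1)" >/dev/null 2>&1
}

if r_package_available doParallel; then
    echo "doParallel is installed"
else
    echo "doParallel is missing (or Rscript is not on PATH); install it before submitting"
fi
```

The check relies on the R function `requireNamespace( )`, which reports whether a package can be loaded without attaching it.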
     195
     196Once we have resolved our package dependencies, we now expect the job to run without errors.
     197
     198Also, note that since we did not specify an output file in the SLURM script, the output will be written to slurm-<!JobNumber>.out. For example:
     199
     200{{{
     201[tulaneID@cypress2 R]$ sbatch bootstrap.sh
     202Submitted batch job 774081
     203[tulaneID@cypress2 R]$ cat slurm-774081.out
     204Loading required package: foreach
     205Loading required package: iterators
     206Loading required package: parallel
     207[1] "16"
     208elapsed
     209  2.954
     210[tulaneID@cypress2 R]$
     211}}}
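
The job number in the output file name can also be picked out of the '''sbatch''' message in a script. The following is a minimal sketch, assuming the message keeps the form "Submitted batch job <number>" shown above; the sample line is copied from that session, and in practice it would come from running sbatch.

```shell
#!/bin/bash
# Sample message copied from the session above; in a real script this
# would be captured with: submit_line="$(sbatch bootstrap.sh)"
submit_line="Submitted batch job 774081"

# Keep only the text after the last space, i.e. the job number.
job_id="${submit_line##* }"
out_file="slurm-${job_id}.out"
echo "$out_file"
```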
     212
     213=== Passing Parameters ===
     214
     215The disadvantage of the above approach is that it is system specific. If we move our code to a machine that uses PBS-Torque as its manager (sphynx, for example), we have to change our source code. An alternative method that results in a more portable code base uses command line arguments to pass the values of our environment variables into the script.
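
With the command line approach, the scheduler-specific part can live entirely in the submission script. The following is a minimal sketch of that idea, not the authors' method: the helper name `reserved_cores` is hypothetical, and the use of '''PBS_NUM_PPN''' as the PBS-Torque counterpart of '''SLURM_CPUS_PER_TASK''' is an assumption that should be checked against the target system.

```shell
#!/bin/bash
# Hypothetical helper: print the number of cores the scheduler reserved.
# SLURM_CPUS_PER_TASK is set by SLURM when --cpus-per-task is given.
# PBS_NUM_PPN is ASSUMED to be the PBS-Torque equivalent.
# Outside any scheduler, fall back to a single core.
reserved_cores() {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        echo "$SLURM_CPUS_PER_TASK"
    elif [ -n "${PBS_NUM_PPN:-}" ]; then
        echo "$PBS_NUM_PPN"
    else
        echo 1
    fi
}

cores="$(reserved_cores)"
# Printed rather than executed so the sketch runs anywhere.
echo "Would run: Rscript bootstrapWargs.R $cores"
```

The R script then only ever sees a plain number as its first argument, regardless of which scheduler supplied it.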
     216
     217Edit the new file '''bootstrapWargs.R''' to contain the following code.
    143218
    144219{{{#!r
     
    183258#SBATCH --cpus-per-task=16      # Number of threads per task (OMP threads)
    184259
    185 module load R/3.1.2
     260module load R/3.2.4
    186261
    187262Rscript bootstrapWargs.R $SLURM_CPUS_PER_TASK
    188263}}}
    189264
    190 Not that since we did not specify an output file, the output will be written to slurm-<!JobNumber>.out. For example:
    191 
    192 {{{
    193 [cmaggio@cypress1 ~]$ sbatch RsubmissionWargs.srun
     265'''For Workshop''' :
     266If you use a temporary workshop account, modify the SLURM script as follows:
     267{{{#!bash
     268#!/bin/bash
     269#SBATCH --partition=workshop    # Partition
     270#SBATCH --qos=workshop          # Quality of Service
     271##SBATCH --qos=normal          ### Quality of Service (like a queue in PBS)
     272#SBATCH --job-name=R       # Job Name
     273#SBATCH --time=00:01:00         # WallTime
     274#SBATCH --nodes=1               # Number of Nodes
     275#SBATCH --ntasks-per-node=1    # Number of Tasks per Node
     276#SBATCH --cpus-per-task=16      # Number of threads per task (OMP threads)
     277
     278module load R/3.2.4
     279
     280Rscript bootstrapWargs.R $SLURM_CPUS_PER_TASK
     281}}}
     282
     283Edit the new file '''bootstrapWargs.sh''' to contain the above SLURM script code.
     284
     285Now submit the job as follows.
     286
     287{{{
     288[tulaneID@cypress1 ~]$ sbatch bootstrapWargs.sh
    194289Submitted batch job 52481
    195290[tulaneID@cypress1 ~]$ cat slurm-52481.out
     
    204299
    205300
    206 == Installing R Packages into user home directory or lustre sub-directory ==
     301== [=#InstallingRPackages Installing R Packages on Cypress] ==
    207302If you want to use some R packages that are not yet installed in your desired version of R on Cypress,
    208 then you have several alternatives - depending on your desired level of reproducibility.
    209 
    210 === Alternative 1 - default to home sub-directory ===
     303then you have several alternatives, described below, for where to install those packages. The locations include your user home directory or a Lustre sub-directory, and the methods vary depending on your desired level of reproducibility.
     304
     305=== [=#RPackageAlternative1 Alternative 1] - default to home sub-directory ===
    211306From your R session, you may choose to have R install its packages into a sub-directory under your home directory.
    212307By default R will create such a sub-directory whose name corresponds to the R version of your current R session and install your packages there.
     
    224319~/R/x86_64-pc-linux-gnu-library/3.4
    225320to install packages into?  (y/n) y
    226 ...
    227 }}}
     321--- Please select a CRAN mirror for use in this session ---
     322PuTTY X11 proxy: unable to connect to forwarded X server: Network error: Connection refused
     323HTTPS CRAN mirror
     324
     325 1: 0-Cloud [https]                   2: Algeria [https]
     326...
     32779: Vietnam [https]                  80: (HTTP mirrors)
     328
     329
     330Selection: 77
     331...
     332}}}
     333
     334Note that the above example was performed without X11 forwarding, so the CRAN mirror prompt appears at the command line. At that prompt, enter the number corresponding to the desired mirror site, e.g. '''77'''.
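
The mirror prompt can be avoided by naming the repository in the call to `install.packages( )` through its `repos` argument. The following is a minimal sketch that only builds and prints such a command; the helper name `install_expr` is hypothetical, and the choice of package and of the `https://cloud.r-project.org` mirror are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: build an install.packages() call with the mirror
# fixed in advance, so R has no need to ask which CRAN mirror to use.
install_expr() {
    printf 'install.packages("%s", repos = "%s")' "$1" "$2"
}

expr="$(install_expr doParallel https://cloud.r-project.org)"
# Printed rather than executed; run the printed command on Cypress
# after loading the R module.
echo "Rscript -e '$expr'"
```

This sketch addresses only the mirror selection; whether R still asks about creating a personal library depends on how it is run.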
    228335
    229336=== Alternative 2 - specify your lustre sub-directory via exported environment variable ===