
Version 3 (modified by cbaribault, 3 years ago)

Added section Capturing Logging Output From R Package Installation

Installing R Packages on Cypress

If you want to use R packages that are not yet installed for your desired version of R on Cypress, you have several alternatives, described below, for where to install them. The locations include a sub-directory of either your home directory or your Lustre project directory, and the methods vary depending on your desired level of reproducibility.

To capture logging output during installation, including error messages, see Capturing Logging Output From R Package Installation below.

For more information on how the R startup process works, see https://cran.r-project.org/web/packages/startup/vignettes/startup-intro.html

Alternative 1 - default to home sub-directory

From your R session, you may have R install packages into a sub-directory of your home directory. If the default library location is not writable, R offers to create a personal library whose name includes the R major.minor version of your current session (e.g. ~/R/x86_64-pc-linux-gnu-library/3.4) and to install your packages there.

> R.version.string
[1] "R version 3.4.1 (2017-06-30)"
> install.packages("copula")
Installing package into ‘/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib’
(as ‘lib’ is unspecified)
Warning in install.packages("copula") :
  'lib = "/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib"' is not writable
Would you like to use a personal library instead?  (y/n) y
Would you like to create a personal library
~/R/x86_64-pc-linux-gnu-library/3.4
to install packages into?  (y/n) y
--- Please select a CRAN mirror for use in this session ---
PuTTY X11 proxy: unable to connect to forwarded X server: Network error: Connection refused
HTTPS CRAN mirror

 1: 0-Cloud [https]                   2: Algeria [https]
...
79: Vietnam [https]                  80: (HTTP mirrors)


Selection: 77
...

Note that the above example was run without X11 forwarding, so R prompted at the command line for a CRAN mirror. At that prompt, enter the number corresponding to the desired mirror site, e.g. 77.

Alternative 2 - specify your lustre sub-directory via exported environment variable

Alternatively, if you prefer to use, say, your Lustre sub-directory rather than your home directory, you may do so by exporting an environment variable as in the following. The environment variable R_LIBS_USER points to the desired location of user packages.

First, create a directory and export the environment variable.

mkdir -p /lustre/project/<your-group-name>/R/Library
export R_LIBS_USER=/lustre/project/<your-group-name>/R/Library

Then run R and install a package. You can use the R function .libPaths() to confirm the user library location.

> .libPaths()
[1] "/lustre/project/<your-group-name>/R/Library"
[2] "/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib"
[3] "/share/apps/R/3.4.1-intel/lib64/R/library"
> install.packages("copula")
Installing package into ‘/lustre/project/<your-group-name>/R/Library’
(as ‘lib’ is unspecified)
...
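R generally ignores library directories that do not exist when it starts, so if the mkdir step is skipped, the R_LIBS_USER entry may silently be missing from .libPaths(). The following is a minimal sketch, assuming a Bash shell, of a hypothetical helper check_r_libs_user that checks the exported value before you start R. It assumes R_LIBS_USER holds a single directory; R also accepts a colon-separated list, which this sketch does not handle.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of R or Cypress): sanity-check R_LIBS_USER.
# Assumes R_LIBS_USER names a single directory, not a colon-separated list.
# Returns 0 and prints "OK: ..." if the directory exists and is writable;
# otherwise prints a message to stderr and returns 1.
check_r_libs_user() {
  if [ -z "${R_LIBS_USER:-}" ]; then
    echo "R_LIBS_USER is not set" >&2
    return 1
  fi
  if [ ! -d "$R_LIBS_USER" ]; then
    echo "R_LIBS_USER directory does not exist: $R_LIBS_USER" >&2
    return 1
  fi
  if [ ! -w "$R_LIBS_USER" ]; then
    echo "R_LIBS_USER directory is not writable: $R_LIBS_USER" >&2
    return 1
  fi
  echo "OK: R_LIBS_USER=$R_LIBS_USER"
}
```

For example, after the export above, running check_r_libs_user && R starts R only if the check passes.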

Alternative 3 - specify lustre sub-directory via environment file

Alternatively, you can put the same R_LIBS_USER setting in the file ~/.Renviron, which R reads at startup, instead of exporting it in your shell.

First, create a directory as above.

mkdir -p /lustre/project/<your-group-name>/R/Library

Then set R_LIBS_USER in the file ~/.Renviron to tell R the default location for your packages.

Note, however, that setting or unsetting R_LIBS_USER in ~/.Renviron overrides any value of that variable previously exported in your shell!

echo 'R_LIBS_USER="/lustre/project/<your-group-name>/R/Library"' > ~/.Renviron

Note that the > redirection in this command replaces any existing contents of ~/.Renviron. Alternatively, use a text editor to create or edit the file ~/.Renviron so that it includes the following line.

R_LIBS_USER="/lustre/project/<your-group-name>/R/Library"
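To add or update this setting from the shell without discarding other lines that ~/.Renviron may already contain, here is a minimal sketch, assuming a Bash shell. The helper name set_r_libs_user is hypothetical; it removes any existing R_LIBS_USER line and appends the new one, keeping every other line unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set R_LIBS_USER in an .Renviron-style file
# without discarding the other lines already in that file.
# Usage: set_r_libs_user <renviron-file> <library-dir>
set_r_libs_user() {
  local renviron="$1" libdir="$2"
  touch "$renviron"                       # create the file if it is missing
  # Copy every line except an existing R_LIBS_USER setting ...
  grep -v '^R_LIBS_USER=' "$renviron" > "$renviron.tmp" || true
  # ... append the new setting, then replace the original file.
  printf 'R_LIBS_USER="%s"\n' "$libdir" >> "$renviron.tmp"
  mv "$renviron.tmp" "$renviron"
}
```

For example: set_r_libs_user ~/.Renviron /lustre/project/<your-group-name>/R/Library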

Then run R and install a package. Again, the R function .libPaths() confirms the user library location.

> .libPaths()
[1] "/lustre/project/<your-group-name>/R/Library"
[2] "/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib"
[3] "/share/apps/R/3.4.1-intel/lib64/R/library"
> install.packages("copula")
Installing package into ‘/lustre/project/<your-group-name>/R/Library’
(as ‘lib’ is unspecified)
...

Alternative 4 - specify lustre sub-directory via R profile file

You can also choose a sub-directory that depends on the R major.minor version (e.g. 3.4) by setting the library path in the R profile file, ~/.Rprofile, which R runs at startup.

Edit the file ~/.Rprofile as follows.

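# Build "major.minor" from R.version, dropping the patch level
majorMinorPatch <- paste(R.version[c("major", "minor")], collapse=".")  # e.g. "3.4.1"
majorMinor <- gsub("(.*)\\..*", "\\1", majorMinorPatch)                 # e.g. "3.4"
#print(paste0("majorMinor=", majorMinor))
myLibPath <- paste0("/lustre/project/<your-group-name>/R/Library/", majorMinor)
# recursive = TRUE also creates any missing parent directories
dir.create(myLibPath, recursive = TRUE, showWarnings = FALSE)
#print(paste0("myLibPath=", myLibPath))
# Put the version-specific library first, ahead of the existing library trees
newLibPaths <- c(myLibPath, .libPaths())
.libPaths(newLibPaths)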

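Note that setting the R library trees directly via the R function .libPaths() in the file ~/.Rprofile can either override or supplement any previously set value of R_LIBS_USER! The code above keeps the existing trees and places the version-specific directory ahead of them, whereas calling .libPaths(myLibPath) alone would drop the R_LIBS_USER entry.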

Then run R and install a package. Again, the R function .libPaths() confirms the user library location.

> .libPaths()
[1] "/lustre/project/<your-group-name>/R/Library/3.4"
[2] "/share/apps/spark/spark-2.0.0-bin-hadoop2.6/R/lib"
[3] "/share/apps/R/3.4.1-intel/lib64/R/library"
> install.packages("copula")
Installing package into ‘/lustre/project/<your-group-name>/R/Library/3.4’
(as ‘lib’ is unspecified)
...
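If you prefer to create the version-specific directory from the shell before starting R, the truncation performed by the gsub() call in the ~/.Rprofile code above, which drops the patch level from a version string such as 3.4.1, can be sketched in Bash with parameter expansion. The helper name r_major_minor is hypothetical, and you supply the full R version yourself (e.g. as shown by R.version.string).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the major.minor part of an R version string,
# mirroring gsub("(.*)\\..*", "\\1", ...) in the ~/.Rprofile code.
# ${1%.*} removes the shortest suffix starting with ".", i.e. the patch level.
r_major_minor() {
  printf '%s\n' "${1%.*}"
}
```

For example, mkdir -p "/lustre/project/<your-group-name>/R/Library/$(r_major_minor 3.4.1)" creates the directory .../R/Library/3.4.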

Alternative 5 - specify lustre sub-directory via R code

As yet another alternative, you can do all of this in your R code. First, create a directory as before.

mkdir -p /lustre/project/<your-group-name>/R/Library

Then run R and install a package into that directory via the lib argument. Note that you must also specify the same location, via the lib.loc argument, in the ensuing call to the R function library().

> myLib <- "/lustre/project/<your-group-name>/R/Library"
> install.packages("copula", lib = myLib)
...
> library(copula, lib.loc = myLib)

Capturing Logging Output From R Package Installation

When you install an R package, the logging output can run to multiple screens of information before ending with a brief indication of success or failure, such as "installation of package 'RSQLite' had non-zero exit status".

As a result, helpful diagnostic information, including critical error messages, can easily be lost as it scrolls out of view and out of your terminal window's buffer.

To avoid this loss of diagnostic information, the R function install.packages() provides an option keep_outputs=T (or keep_outputs=TRUE).

With the keep_outputs=T option, install.packages() saves the logging output in files in the current working directory, one file per attempted R package, each named after the package with .out appended. You can then inspect these files for possible error messages, as in the following.

> install.packages("RSQLite", keep_outputs=T)  # captures log output in a file RSQLite.out

Then, from the Bash command line, you can search the file for occurrences of the string error: with either the less or the grep command. (See Linux Commands.)

For example, in the following, multiple error messages captured in the file RSQLite.out indicate that the Boost C++ libraries are unexpectedly missing. On Cypress, these can be provided by loading the appropriate module, boost/1.76.0, before retrying the installation. (See Module Command.)

[tulaneid@cypress2 ~]$ grep -i error: RSQLite.out
vendor/boost/preprocessor/list/fold_left.hpp(341): catastrophic error: cannot open source file "boost/preprocessor/list/detail/edg/fold_left.hpp"
...
vendor/boost/preprocessor/list/fold_left.hpp(341): catastrophic error: cannot open source file "boost/preprocessor/list/detail/edg/fold_left.hpp"
ERROR: compilation failed for package ‘RSQLite’
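If you install several packages with keep_outputs=T, you may end up with several .out files. The following is a minimal sketch, assuming a Bash shell, of a hypothetical helper summarize_install_errors that reports, for each .out file in a directory, how many lines match error: case-insensitively, as the grep -i command above does. Files with no matches are not listed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize "error:" lines in the .out files that
# install.packages(..., keep_outputs=TRUE) writes to the working directory.
# Usage: summarize_install_errors [directory]   (defaults to ".")
summarize_install_errors() {
  local dir="${1:-.}" f n
  for f in "$dir"/*.out; do
    [ -e "$f" ] || continue                   # glob matched no .out files
    # -c counts matching lines, -i ignores case (so ERROR: also matches)
    n=$(grep -ci 'error:' "$f" || true)
    if [ "$n" -gt 0 ]; then
      echo "$(basename "$f"): $n line(s) matching error:"
    fi
  done
}
```

For example, running summarize_install_errors in the directory where you started R lists which packages' logs deserve a closer look with less or grep.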

The following excerpt is taken from the output of help("install.packages") in an R session.

keep_outputs: a logical: if true, keep the outputs from installing
          source packages in the current working directory, with the
          names of the output files the package names with ‘.out’
          appended.  Alternatively, a character string giving the
          directory in which to save the outputs.  Ignored when
          installing from local files.