Modernizing my approach to my R packages

I’ve been using R since 2000 or so, probably earlier, off and on. I’ve always just hacked out big old spaghetti-code programs. More recently, as alluded to in a past post, I’ve been migrating to using node.js to call into R. The initial impetus was to solve two problems: memory leakage, and a single crash disrupting a really big sequence of jobs. By setting up my big outer loops in node.js, I can now fire off as many simultaneous R jobs as my RAM can handle, and if any die, node.js can handle the errors appropriately.
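A minimal sketch of such an outer loop follows, under stated assumptions: the helper names runJob and runAll are hypothetical, and this is not the code from the actual jobs. It uses only Node's built-in child_process module. It caps the number of simultaneous processes and records a failed job instead of letting it abort the batch.

```javascript
const { spawn } = require('node:child_process');

// Run one command. Always resolve (never reject) with the exit status,
// so that one crashed job cannot take down the whole batch.
function runJob(command, args) {
  return new Promise((resolve) => {
    const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    let stdout = '';
    let stderr = '';
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.stderr.on('data', (chunk) => { stderr += chunk; });
    // 'error' fires when the process could not be started at all
    child.on('error', (err) => {
      resolve({ args, ok: false, code: null, stdout, stderr: String(err) });
    });
    // 'close' fires after the process has exited and its streams are closed
    child.on('close', (code) => {
      resolve({ args, ok: code === 0, code, stdout, stderr });
    });
  });
}

// Run every job, keeping at most `limit` processes alive at once.
// `jobs` is an array of argument arrays, one per process.
async function runAll(command, jobs, limit) {
  const results = new Array(jobs.length);
  let next = 0;
  async function worker() {
    while (next < jobs.length) {
      const i = next++;
      results[i] = await runJob(command, jobs[i]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, jobs.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

With R installed, a call such as runAll('Rscript', [['job.R', '2012'], ['job.R', '2013']], 2) would run a hypothetical job.R once per year, two at a time. Any job whose result has ok set to false can then be logged or retried.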

The one remaining issue is that my R code was still pretty brutish. I dislike the formal R packaging stuff, and I wanted something more lightweight, more like what node.js uses. I first tried to use component, but that was the wrong choice for a number of reasons. Way back in October I toyed with the idea of using npm to package up my R code, but I didn’t really start to do that in earnest until very recently. It turns out, with just a few quirks, this works pretty well. This post outlines my general approach to using npm to load R packages.

First, I owe a big debt to Hadley Wickham, whose book on R packaging and many useful R packages (devtools and testthat are indispensable) helped tremendously with my comprehension of R packaging. What I discovered by trial and error is that even though I don’t want to use CRAN to distribute my code, it really does work best to use R’s native packaging constructs. So kicking and screaming, I dragged myself out of my state of willful ignorance.

What I like about node.js packages is the fact that they are local to the code that requires them. For example, I might have a web server that relies on some old version of connect and express, and I can simultaneously have other servers that rely on the newer express versions. While I could install express globally using npm’s -g flag, that would be unhelpful, as I’d have to update old code every time a new version of any of its dependencies came out. Languages like R take a different approach, with the global library the usual default install location. The presumption is that all programs use the same, modern version of their dependencies. I must admit it took me a while to accept node’s and npm’s “local versions” approach, but now that I have, I like it.

So my approach is to use npm install to download and install my R dependencies. Obviously this only works for my code at the moment, as everybody else in the universe of R programming is most definitely not using npm for packaging.

First of all, npm needs a valid “package.json” file that tells it what to download. This is usually created by running npm init.

mkdir testrepo
cd testrepo
npm init

This utility will walk you through creating a package.json file.
It only covers the most common items, and tries to guess sane defaults.

See `npm help json` for definitive documentation on these fields
and exactly what they do.

Use `npm install <pkg> --save` afterwards to install a package and
save it as a dependency in the package.json file.

Press ^C at any time to quit.
name: (testrepo) 
version: (1.0.0) 
description: A test repo
entry point: (index.js) 
test command: Rscript Rtest.R
git repository: jmarca/testrepo
keywords: 
author: James E. Marca
license: (ISC) GPL-V2
About to write to /home/james/repos/jem/testrepo/package.json:

{
  "name": "testrepo",
  "version": "1.0.0",
  "description": "A test repo",
  "main": "index.js",
  "scripts": {
    "test": "Rscript Rtest.R"
  },
  "repository": {
    "type": "git",
    "url": "https://github.com/jmarca/testrepo"
  },
  "author": "James E. Marca",
  "license": "GPL-V2",
  "bugs": {
    "url": "https://github.com/jmarca/testrepo/issues"
  },
  "homepage": "https://github.com/jmarca/testrepo"
}


Is this ok? (yes) 

If this were an R package we’d also need to add all the R package scaffolding. But it isn’t; it is just an R program that uses R packages.

Suppose I want to use my R package rcouchutils in this package, because I need to save stuff in CouchDB. To do that I need to add the dependency to the package.json file, like so:

...
  "dependencies": {
      "rcouchutils":"jmarca/rstats_couch_utils"
  },
...

Now, after saving the new package.json file, I run npm install in the root directory of the project. If the devtools R package is already installed, you should see lots of output scroll by. If it isn’t, the install will most likely fail. I haven’t fixed this yet.

At the end of a successful install, the final lines should look like:

Reloading installed rcouchutils
rcouchutils@0.1.0 node_modules/rcouchutils
└── configr@0.1.0

What this means is that rcouchutils was installed under node_modules, and that its dependency configr was installed beneath it as well. To make this work, both rcouchutils and configr were set up with another hack on the npm package.json. Specifically, instead of the default test and install scripts, I have the following:

...
  "scripts": {
    "test": "Rscript Rtest.R",
    "install": "Rscript Rinstall.R"
  },
...

This configuration snippet tells npm that the command npm test should run the command Rscript Rtest.R, and npm install should run Rscript Rinstall.R. A more complicated package could use a full-blown Makefile, and the test and install scripts could be assigned make test and make install respectively. For this example, my code is very simple and can get by with a small R script to test and install.

The install program Rinstall.R uses devtools to install the package in a local directory .Rlibs. Specifically, Rinstall.R looks like:

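## when installed as a dependency, npm runs this from node_modules/<package>
dot_is <- getwd()
## find any .Rlibs directories nested below this package
node_paths <- dir(dot_is,pattern='\\.Rlibs',
                  full.names=TRUE,recursive=TRUE,
                  ignore.case=TRUE,include.dirs=TRUE,
                  all.files = TRUE)
## the shared local library sits one level up
path <- normalizePath('../.Rlibs', winslash = "/", mustWork = FALSE)
if(!file.exists(path)){
    dir.create(path)
}
lib_paths <- .libPaths()
## the first entry of .libPaths() is where install() puts the package
.libPaths(c(path,node_paths,lib_paths))
## ready to go
devtools::document()
devtools::install()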

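The first lines of that program set up the local directory .Rlibs as the preferred library location. Because the path is ../.Rlibs, it sits one level above the package being installed, which is node_modules/.Rlibs when npm installs the package as a dependency. The directory is created if it doesn’t exist, and then is prepended to the list of existing library paths. This will cause devtools::install() to install this package into .Rlibs.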

I call devtools::document() first to make sure that the NAMESPACE and documentation files are regenerated, so that they don’t have to be stored with the package on GitHub.

The test program Rtest.R looks like this:

## need node_modules directories
dot_is <- getwd()
node_paths <- dir(dot_is,pattern='\\.Rlibs',
                  full.names=TRUE,recursive=TRUE,
                  ignore.case=TRUE,include.dirs=TRUE,
                  all.files = TRUE)
path <- normalizePath(node_paths, winslash = "/", mustWork = FALSE)
lib_paths <- .libPaths()
.libPaths(c(path, lib_paths))

## need env for test file
Sys.setenv(RCOUCHUTILS_TEST_CONFIG=paste(dot_is,'test.config.json',sep='/'))

devtools::check()

Again, the first few lines set up the correct R library paths, prepending the local .Rlibs to the existing, global R library paths. The reason I do a search (dir(...)) for all subdirectories called .Rlibs is that, following the node/npm way, each dependency might also include dependencies that are installed locally. This isn’t perfect, and hasn’t been polished as much as node’s management of local node_modules paths. However, it works well enough for now. The idea is that the top level program can call exported functions from its immediate dependencies (in this case rcouchutils), and those dependencies can also invoke functions from their dependencies (here, rcouchutils depends on configr).

You can see actual code that uses this approach in my GitHub repos for rcouchutils (https://github.com/jmarca/rstats_couch_utils) and configr (https://github.com/jmarca/configr).

When developing a package that relies on local libraries in node_modules/.Rlibs (and all nested .Rlibs), I make sure to run the starting part of Rtest.R (up to, but not including, the devtools::check() command) in order to let my interactive R session know about these local libraries. Likewise, when calling a program from node.js that uses locally installed libraries, I just stick that snippet of code at the start of my R script.
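An alternative to pasting the snippet into every script is to let node.js do the search and pass the result to R through the R_LIBS environment variable, which R reads at startup to initialise .libPaths(). The sketch below is only an illustration of that idea, not what the packages above do. The function names findRlibs, rEnv and runRscript are hypothetical, and unlike the regular-expression search in Rtest.R, it matches only directories named exactly .Rlibs.

```javascript
const fs = require('node:fs');
const path = require('node:path');
const { spawn } = require('node:child_process');

// Recursively collect every directory named .Rlibs below `root`,
// roughly mirroring the dir(..., recursive=TRUE) search in Rtest.R.
function findRlibs(root) {
  const found = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue; // skips plain files and symlinks
    const full = path.join(root, entry.name);
    if (entry.name === '.Rlibs') {
      found.push(full); // a local library; do not descend into it
    } else {
      found.push(...findRlibs(full));
    }
  }
  return found;
}

// Copy `baseEnv` and put the local libraries in front of any R_LIBS
// value that is already set.
function rEnv(root, baseEnv = process.env) {
  const libs = findRlibs(root);
  if (baseEnv.R_LIBS) libs.push(baseEnv.R_LIBS);
  const env = { ...baseEnv };
  if (libs.length > 0) env.R_LIBS = libs.join(path.delimiter);
  return env;
}

// Start an R script with the local libraries on its library path.
// Assumes Rscript is on the PATH.
function runRscript(script, root = process.cwd()) {
  return spawn('Rscript', [script], { env: rEnv(root), stdio: 'inherit' });
}
```

With this in place, something like runRscript('analysis.R') (a hypothetical script name) should let library(rcouchutils) resolve without any path setup inside the R script. The trade-off is that the script then depends on being launched from node.js, whereas the snippet approach keeps each R script usable on its own.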

Finally, the best advantage of this approach, which skips CRAN entirely, is that you can safely ignore the warnings from devtools::check() that complain about grammar in your DESCRIPTION file. But do pay attention to all the other notes and warnings, as I’ve caught lots of minor bugs related to global variables that my interactive development sessions don’t reveal.
