Using npm with R is great

A few weeks ago I wrote up how I am using npm, the standard package manager for node.js. When I first started using node.js, and when npm first started cropping up as the best package manager to use, I was annoyed by the idea of copying libraries over and over again into each package’s node_modules directory. It seemed wasteful, since they were mostly copies, so I would generally use the -g flag and install globally.

Then I ran into trouble with versions, and I decided my insistence on using -g was stupid. Better to have the required version locally installed than to fight with multiple versions of a package at the global level.

The point is that today, in R, I need to depend on readr, but on the github version rather than the CRAN version, because I need to parse a column of times that use “AM/PM” notation. As far as I am aware, R has no clean way to load conflicting versions of a package. I don’t want all of my programs to use the bleeding edge of readr, but I am willing to accept the devel version for this one package.

Unfortunately, I’m the only person using npm to load R packages local to my project. Phooey. But I can hack my R launching script to use devtools to load the package I need locally as follows.

First, I have a standard incantation to make my runtime R find my local, node_modules-installed R libraries:

## need node_modules directories
dot_is <- getwd()
node_paths <- dir(dot_is,pattern='.Rlibs',
                  full.names=TRUE,recursive=TRUE,
                  ignore.case=TRUE,include.dirs=TRUE,
                  all.files = TRUE)
path <- normalizePath(node_paths, winslash = "/", mustWork = FALSE)
lib_paths <- .libPaths()
.libPaths(c(path, lib_paths))

This bit of code searches recursively from the working directory (which picks up the local node_modules directory), finds all of the .Rlibs directories, and prepends them to the runtime .libPaths, so that local libraries take precedence over global ones.
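To check that the lookup order really changes, here is a minimal sketch that runs the same discovery step against a throwaway directory. The helper name find_rlibs and the temporary project layout are my own assumptions for illustration, and the pattern is anchored so that only directories named exactly .Rlibs match.

```r
## find_rlibs is a hypothetical helper wrapping the discovery step above
find_rlibs <- function(root = getwd()) {
    found <- dir(root, pattern = '^\\.Rlibs$',
                 full.names = TRUE, recursive = TRUE,
                 include.dirs = TRUE, all.files = TRUE)
    ## keep directories only, in case a plain file is named .Rlibs
    found <- found[dir.exists(found)]
    normalizePath(found, winslash = "/", mustWork = FALSE)
}

## build a fake project with an empty local library
root <- tempfile("proj")
dir.create(file.path(root, "node_modules", ".Rlibs"), recursive = TRUE)

local_libs <- find_rlibs(root)
old_paths <- .libPaths()
.libPaths(c(local_libs, old_paths))
first <- .libPaths()[1]   # the library R searches (and installs into) first
print(first)
.libPaths(old_paths)      # put things back the way they were
```

Because install.packages and friends default to the first entry of .libPaths(), anything installed while the local path is in front lands in the project rather than in the global library.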

All I have to do is to insert a command to load the required devel-level packages before installing and testing my code. Something like:

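## need node_modules directories
dot_is <- getwd()
node_paths <- dir(dot_is,pattern='.Rlibs',
                  full.names=TRUE,recursive=TRUE,
                  ignore.case=TRUE,include.dirs=TRUE,
                  all.files = TRUE)
path <- normalizePath('node_modules/.Rlibs', winslash = "/", mustWork = FALSE)
## create the local library if it is missing (test the variable, not the string 'path')
if(!file.exists(path)){
    dir.create(path, recursive=TRUE)
}
## keep the existing library paths after the local ones
lib_paths <- .libPaths()
.libPaths(c(path,node_paths, lib_paths))
## want readr at or above the devel version
vc <-  list(op=">=",version=package_version("0.1.1.9000"))
if(!requireNamespace(package='readr',versionCheck=vc)){
    ## installs into the first entry of .libPaths(), i.e. the local library
    devtools::install_github('hadley/readr')
}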

I can save that as Requirements.R, and then add the following to my package.json file:

...
  "scripts": {
      "test": "/usr/bin/Rscript Rtest.R",
      "preinstall": "/usr/bin/Rscript Requirements.R",
      "install":"/usr/bin/Rscript Rinstall.R"
  },
...

That works and is cool, but it is extremely one-off. Better would be to add dependencies in the package.json and get them loaded automatically. My start at this is to create an entry “rDependencies” in the package.json, which npm will then expose to my script in the system environment as “npm_package_rDependencies_…”. But I have to move on, so this is unfinished for now:

package.json

...
  "dependencies": {
      "calvad_rscripts": "jmarca/calvad_rscripts",
      "rcouchutils":"git://github.com/jmarca/rstats_couch_utils.git",
      "configr":"git://github.com/jmarca/configr.git"
  },
  "devDependencies": {
    "should": "^6.0.1"
  },
  "rDependencies":{
      "readr":"0.1.1.9000"
  },
  "scripts": {
      "test": "/usr/bin/Rscript Rtest.R",
      "preinstall": "/usr/bin/Rscript Requirements.R",
      "install":"/usr/bin/Rscript Rinstall.R"
  },
...

script snippet to read package.json dependencies

## ideally I would plumb versions from package.json environment variables?

envrr <- Sys.getenv()
print(envrr)
dependencies <- grep(pattern='npm_package_rDependencies'
                    ,x=names(envrr),perl=TRUE,value=TRUE)
print(dependencies)
pkgs <- strsplit(x=dependencies,split='npm_package_rDependencies_')
print(pkgs)
for(i in seq_along(dependencies)){
    pkg <- pkgs[[i]][2]
    ver <- envrr[[dependencies[i]]]
    vc <-  list(op=">=",version=package_version(ver))
    print(vc)
    if(!requireNamespace(package=pkg,versionCheck=vc)){
        print('need to download')
        devtools::install_github(paste('hadley',pkg,sep='/'))
        ## whoops, need to add proper github user, repo name here
    }else{
        print(paste('got',pkg,ver,'already'))
    }
}

Really I need to specify the required development R package like:

  "rDependencies":{
      "readr":{
          "repo":"hadley/readr",
          "version":"0.1.1.9000"
      }
  },

But the hacking gets uglier and uglier, because this is passed to the script as npm_package_rDependencies_readr_repo and npm_package_rDependencies_readr_version, which means my braindead grep and strsplit calls will need to be tweaked and patched some more to combine the repo and the version with the package.
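One way the regrouping might go, as a rough sketch rather than a finished solution: collect the _repo and _version variables into one list entry per package. The helper name parse_rdeps is hypothetical, and the environment here is simulated with a plain named vector instead of a real npm run, so it only assumes the variable naming described above.

```r
## parse_rdeps is a hypothetical helper; env defaults to the real environment
parse_rdeps <- function(env = Sys.getenv()) {
    prefix <- 'npm_package_rDependencies_'
    keys <- grep(paste0('^', prefix, '.+_(repo|version)$'),
                 names(env), value = TRUE)
    deps <- list()
    for (key in keys) {
        rest  <- sub(paste0('^', prefix), '', key)  # e.g. "readr_repo"
        field <- sub('^.*_', '', rest)              # "repo" or "version"
        pkg   <- sub('_(repo|version)$', '', rest)  # e.g. "readr"
        if (is.null(deps[[pkg]])) deps[[pkg]] <- list()
        deps[[pkg]][[field]] <- env[[key]]
    }
    deps
}

## simulated environment standing in for what npm would set
fake_env <- c(npm_package_rDependencies_readr_repo = 'hadley/readr',
              npm_package_rDependencies_readr_version = '0.1.1.9000',
              HOME = '/tmp')
deps <- parse_rdeps(fake_env)
str(deps)
```

With the pieces grouped like this, the install loop could pass deps[[pkg]]$version to package_version() for the version check and deps[[pkg]]$repo to devtools::install_github(), instead of hard-coding the github user.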

So, future me, you have work to do and another blog post when you get this cleaned up.
