How to use npm to conditionally download a file

I am working on a package to process OpenStreetMap data, cleaning up some older code. My old REAMDE used to say "download the file from…", but sometimes people have trouble with that step. What I wanted to do was to automate the process, to check if the downloaded OSM file was older than some period (say 30 days), and if so, to download a new snapshot. I also wanted to use npm, because then it would cooperate nicely with all my other crazy uses of npm, such as building and packaging R scripts. Because I couldn’t find any exact recipes on the internet, here’s how I did it.

First, check out how to use npm as a build tool and the more recent why npm scripts. Both of these posts are excellent introductions to using npm scripts.

For my problem, there are two basic tasks I need to solve with npm scripts. First, I need to be able to check the age of a file, and second I need to be able to download a file. Note that because I only run Linux, I’m not even going to pretend that my solution is portable. Mac OSX users can probably use similar commands, but Windows users are likely going to have to change things around a bit. With that Linux-centric caveat aside, here is how I solved this problem.

File age

To determine if a file is too old I can use find.

find . -name "thefilename" -mtime +30

This will find a file called "thefilename" if it is older than 30 days (more or less…there is some gray area about how fractional days get counted). Rather than using this as an if statement, it’s probably easier to just use the built-in "-delete" operator in find to remove any file older than 30 days.

Download a file

To download a file, I can use curl. Specifically, I want to download the California latest file from geofabrik, so I would use

curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf > california-latest.osm.pbf

Fitting into run scripts, mistakes and all

Delete the old file

My idea was to use the find command to delete a file that is older than my desired age, and then to use the curl command to download a file if and only if it doesn’t yet exist.

First, I codified the delete operation into a run script as follows:

"build:pbfclean":"find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete"

Running that failed spectacularly!

james@emma osm_load[master]$ npm run build:pbfclean

> osm_load@1.0.0 build:pbfclean /home/james/repos/jem/calvad/sqitch_packages/osm_load
> find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete

find: `./binaries': No such file or directory

npm ERR! Linux 4.4.10
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "run" "build:pbfclean"
npm ERR! node v6.2.0
npm ERR! npm  v3.8.9
npm ERR! code ELIFECYCLE
npm ERR! osm_load@1.0.0 build:pbfclean: `find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the osm_load@1.0.0 build:pbfclean script 'find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the osm_load package,
npm ERR! not with npm itself.
...

The problem is that find failed because I hadn’t created the destination directory yet. I don’t really want to create a directory just to empty it, so instead I tried running a test first.

So I extended the script a little bit:

"build:pbfclean":"test -d binaries && test -d osm && find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete"

This was another crashing failure:

james@emma osm_load[master]$ npm run build:pbfclean

> osm_load@1.0.0 build:pbfclean /home/james/repos/jem/calvad/sqitch_packages/osm_load
> test -d binaries && test -d osm && find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete


npm ERR! Linux 4.4.10
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "run" "build:pbfclean"
npm ERR! node v6.2.0
npm ERR! npm  v3.8.9
npm ERR! code ELIFECYCLE
npm ERR! osm_load@1.0.0 build:pbfclean: `test -d binaries && test -d osm && find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the osm_load@1.0.0 build:pbfclean script 'test -d binaries && test -d osm && find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the osm_load package,

The problem here is that the test -d binaries was doing its job, but was exiting with a non-zero condition. Reading the docs (npm help scripts) shows that non-zero exit is interpreted as a problem:

BEST PRACTICES

  • Don’t exit with a non-zero error code unless you really mean it. Except for uninstall scripts, this will cause the npm action to fail, and potentially be rolled back. If the failure is minor or only will prevent some optional features, then it’s better to just print a warning and exit successfully.

So test is the wrong tool to use here, so I switched to if;then;fi:

"build:pbfclean":"if [ -d binaries -a -d binaries/osm ]; then find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete; fi",

And the results are better:

james@emma osm_load[master]$ npm run build:pbfclean

> osm_load@1.0.0 build:pbfclean /home/james/repos/jem/calvad/sqitch_packages/osm_load
> if [ -d binaries -a -d binaries/osm ]; then find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete; fi

Unfortunately, while that doesn’t crash, I also want to check that it works to delete a file older than 30 days. So I made the directory in question, grabbed any old file older than 30 days, copied it into place and renamed it california-latest.osm.pbf:

find ~ -maxdepth 1 -mtime +30

...
/home/james/3.7.10.generic.config
...

james@emma osm_load[master]$ ls -lrt ~/3.7.10.generic.config
-rw-r--r-- 1 james users 129512 Sep 27  2014 /home/james/3.7.10.generic.config
james@emma osm_load[master]$ mkdir binaries/osm -p
james@emma osm_load[master]$ rsync -a ~/3.7.10.generic.config binaries/osm/california-latest.osm.pbf
james@emma osm_load[master]$ ls -lrt binaries/osm/
total 128
-rw-r--r-- 1 james users 129512 Sep 27  2014 california-latest.osm.pbf
james@emma osm_load[master]$  find ./binaries/osm -name california-latest.osm.pbf -mtime +30
./binaries/osm/california-latest.osm.pbf

Now running my build:pbfclean should delete that file:

james@emma osm_load[master]$ npm run build:pbfclean

> osm_load@1.0.0 build:pbfclean /home/james/repos/jem/calvad/sqitch_packages/osm_load
> if [ -d binaries -a -d binaries/osm ]; then find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete; fi

james@emma osm_load[master]$ ls -lrt binaries/osm/
total 0

Success!

Download a new file

To download a new file I need to run a simple curl command, but I also need to do two other things first. I need to make sure first that the destination directory is there, and second that the file does not already exist.

To make sure the destination directory exists, all I have to do is run mkdir -p. Alternately, I could check if the directories exist, and then run mkdir -p if they don’t, but that seems excessive for a simple two level path.

"build:pbfdir":"mkdir -p binaries/osm",

To test if the file exists already (and so to skip the download), I used if; then; fi again (having already been burned by test) as follows:

"build:pbfget":"if [ ! -e binaries/osm/california-latest.osm.pbf ]; then curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf -o binaries/osm/california-latest.osm.pbf; fi "

Here the -e option checks if the file exists, and if it does not (the ! modifier before the -e) then it will run the curl download. If the file does exist, then nothing will happen.

Putting them together, I first call the build:pbfdir script, and then do the curl download check and execute:

"build:pbfdir":"mkdir -p binaries/osm",
"build:pbfget":"npm run build:pbfdir -s && if [ ! -e binaries/osm/california-latest.osm.pbf ]; then curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf -o binaries/osm/california-latest.osm.pbf; fi "

(I couldn’t find the -s option documented anywhere in the npm docs, but apparently it suppresses output.)

It works fine:

james@emma osm_load[master]$ npm run build:pbfget

> osm_load@1.0.0 build:pbfget /home/james/repos/jem/calvad/sqitch_packages/osm_load
> npm run build:pbfdir -s && if [ ! -e binaries/osm/california-latest.osm.pbf ]; then curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf -o binaries/osm/california-latest.osm.pbf; fi

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  578M    0  1194    0     0   1329      0   5d 06h --:--:--   5d 06h  1669^C

Of course, I could have just slotted the mkdir -p command inside of the build:pbfget command, but this is a better example of how to cascade two run scripts. And besides, maybe in the future I will be using a symbolic link pointing into a big disk, and so mkdir -p would be inappropriate.

The final scripts portion of my package looks like this:

{
  "name": "osm_load",
  "version": "1.0.0",
  "description": "Load OSM data (California) into a local database",
  "main": "load.js",
  "scripts": {
      "test": "tap test/*.js",
      "build": "npm run build:pbfclean  && npm run build:pbfget",
      "build:pbfclean":"if [ -d binaries -a -d binaries/osm ]; then find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete; fi",
      "build:pbfdir":"mkdir -p binaries/osm",
      "build:pbfget":"npm run build:pbfdir -s && if [ ! -e binaries/osm/california-latest.osm.pbf ]; then curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf -o binaries/osm/california-latest.osm.pbf; fi "
  }
}
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s