Writing tests and learning promises in node.js (part 2)

(In which I explore how errors are thrown and caught from within promises.)

Following up on my last post documenting work fixing up my bulkdoc appender code, for the paste few days I’ve worked on my couch_set_state code. The purpose of this package is to facilitate using CouchDB as a way to save state. I find this incredibly convenient for tracking jobs across multiple machines. I just set up my state tracking database to replicate between all my computers, and then can coordinate work between them by having code check the local CouchDB for state—things like whether a chunk of data has been processed, whether figures have been rendered, and so on. This package has utilities to set state, and a companion package couch_check_state facilitates checking state.

One of the things I wrote for the bulkdocs package was the ability to generate database conflicts. One of the design features of CouchDB is that it refuses to save a document unless the saver is updating the latest version of the document. The way my libraries work (both the bulkdocs appender and this state setting code) they first get the latest version of the document from the database, then they make their changes, then they save the updated document to the database. The problem is that if many jobs are running at once it is always possible that a race condition will arise where two processes are trying to update the same document at the same time. I actually hit this in practice last week, which is why I decided to write test code to trigger this issue. Continue reading

Writing tests and learning promises in node.js

(In which I describe how I did a naive transition to using promises instead of callbacks, and then did something slightly better.)

At the moment, I’m in an under-employed state. Hopefully the consulting will pick up soon, but instead of looking for a real job I’m cleaning up lots of old library code I use (and learning new things like Elixir and Kafka, but that’s a different topic). This post is the second about my work learning to use promises and async/await.

Today I worked on upgrading my code for saving and/or updating bulk docs to CouchDB (couchdb_bulkdoc_appender). As I wrote in my last post, I’m switching from mocha to tap, trying never to use lodash and d3-queue, and trying to use promises or async/await rather than callbacks. Replacing callbacks with promises is tricky because most of my code follows the usual node.js callback idiom. The last thing I want to do is break old code as a result of a stray npm install.

One strategy I’ve been using to migrate to promises is to test the callback function. If it is falsy (null or undefined), then I return a Promise, and if it is true, I run through my usual callback approach. For example, my old code did this:

Continue reading

Mixing async/await, promises, and callbacks

I’ve been editing some older code, but at the same time I’ve been trying to be strict about upgrading my node.js to a more modern idiom. That means wherever and whenever possible, I want to use node tap for my tests, and async/await and Promise instead of callbacks.

In the past, I have used the async library, and then later the d3 queue library for handling asynchronous functions. But no longer—my intent is to never use those libraries again.

Of course one big barrier to all this is that I have little experience with using Promises and async/await. This is the first in hopefully a set of posts that talk about some small things that I discover. Hopefully they will be useful to someone else.

First up, mixing callbacks and promises. As I said above, I’m switching to using the node tap library for testing. The docs indicate that the callback function passed into t.test (the signature is t.test([name], [options], [function])) should return a Promise in order to inform tap that this test is asynchronous. I’m writing new tests for older code, and I don’t have the luxury of rewriting my old code to switch from callbacks to Promises. On the other hand, I didn’t want to mess around with figuring out how to use old-style node.js callback conventions with t.test.

As an example, an old library code looked like:

// pass in year, config_file (optional) in opts object
function get_wim_imputed_status(opts,cb){
    var year = +opts.year
    var o = set_couchdb_options(opts)
    o['view']= '_design/wim/_view/imputed_status'
    o['startkey'] = [year]
    o['endkey'] = [year,"\ufff0"] // verified high sentinel as of 1.6.2 couchdb
    o['reduce'] = false
                  throw new Error('oops')
               return null
    return null

Pretty basic function—all it does is poke CouchDB and get some response. My old tests, using mocha, looked something like this:

    it('should get the imputed status of all the wim sites 2012',function(done){
                                             ... other stuff ...
                                             return done()

Eventually I will rewrite everything to use Promises, doing that now would also require rewriting my viewer library, and then rewriting all of the code that uses the viewer library, and so on. Someday maybe, but not today. So what I really need is a way to quickly wrap this callback in a Promise.

There are a few libraries that exist to wrap callback style functions in Promises. Node version 8.0.0 adds a very handy function called util.promisify (source code link is here), but I’m only running node version 8 on my laptop and on Travis CI, so that won’t work for me.

While the node.js Promisify wrapper is robust and covers lots of edge cases, it is pretty easy to hack up a much simpler version for more limited use. What I did was Continue reading

Learning Elm

After trying to learn Clojure and ClojureScript, I got to the point where I was able to make my map app and load it with data. However, loading the initial map of California took 10 seconds, and loading up and coloring traffic volume tiles on the map took another 5. Super slow.

This is not a knock against ClojureScript, etc, but rather a knock on my understanding and abilities in said language. I suck at clojure.

So I switched to Elm.

Here is my first trial with Elm, doing the letterfall thing.


Transfer prints

It is possible to make transfer prints from ink jet print outs. Professor Gerald R. Van Hecke was absolutely correct when he said, back in 1989, that I should use my knowledge of chemistry rather than saying lighter fluid “magically” lifts off the images from magazines. Had I listened, I might have been open to other methods.

Apparently, the techniques are all dependent upon chemistry—something needs to attack the bonds between the ink particles and the paper. In my old use of Zippo fluid and magazines, the lighter fluid did the trick. With Polaroid type 669 transfers, in the one case the ink hasn’t yet transferred to the photo paper, and in the emulsion transfer technique, the hot water dissolves the bond between the emulsion and the paper. These new-to-me techniques (it seems most articles on the internet were from 2011 through 2013, with nothing much new happening that I can find) some substance is used to lift the ink.

A good series of articles is here, a long article covering lots of different lifting media is here, and some all-in-one PDFs are here for gel printing and here for direct transfers. This last recipe is one of many approaches that print to non-porous surfaces (cheap plastic overheads; glossy backing to printable stickers; etc.) and then slap that surface down on the receiving surface before the ink has had much chance to dry.

So next weekend’s project is lined up I guess.

How to use npm to conditionally download a file

I am working on a package to process OpenStreetMap data, cleaning up some older code. My old REAMDE used to say "download the file from…", but sometimes people have trouble with that step. What I wanted to do was to automate the process, to check if the downloaded OSM file was older than some period (say 30 days), and if so, to download a new snapshot. I also wanted to use npm, because then it would cooperate nicely with all my other crazy uses of npm, such as building and packaging R scripts. Because I couldn’t find any exact recipes on the internet, here’s how I did it.

First, check out how to use npm as a build tool and the more recent why npm scripts. Both of these posts are excellent introductions to using npm scripts.

For my problem, there are two basic tasks I need to solve with npm scripts. First, I need to be able to check the age of a file, and second I need to be able to download a file. Note that because I only run Linux, I’m not even going to pretend that my solution is portable. Mac OSX users can probably use similar commands, but Windows users are likely going to have to change things around a bit. With that Linux-centric caveat aside, here is how I solved this problem.

File age

To determine if a file is too old I can use find.

find . -name "thefilename" -mtime +30

This will find a file called "thefilename" if it is older than 30 days (more or less…there is some gray area about how fractional days get counted). Rather than using this as an if statement, it’s probably easier to just use the built-in "-delete" operator in find to remove any file older than 30 days.

Download a file

To download a file, I can use curl. Specifically, I want to download the California latest file from geofabrik, so I would use

curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf > california-latest.osm.pbf

Fitting into run scripts, mistakes and all

Delete the old file

My idea was to use the find command to delete a file that is older than my desired age, and then to use the curl command to download a file if and only if it doesn’t yet exist.

First, I codified the delete operation into a run script as follows:

"build:pbfclean":"find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete"

Running that failed spectacularly!

james@emma osm_load[master]$ npm run build:pbfclean

> osm_load@1.0.0 build:pbfclean /home/james/repos/jem/calvad/sqitch_packages/osm_load
> find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete

find: `./binaries': No such file or directory

npm ERR! Linux 4.4.10
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "run" "build:pbfclean"
npm ERR! node v6.2.0
npm ERR! npm  v3.8.9
npm ERR! osm_load@1.0.0 build:pbfclean: `find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the osm_load@1.0.0 build:pbfclean script 'find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the osm_load package,
npm ERR! not with npm itself.

The problem is that find failed because I hadn’t created the destination directory yet. I don’t really want to create a directory just to empty it, so instead I tried running a test first.

So I extended the script a little bit:

"build:pbfclean":"test -d binaries && test -d osm && find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete"

This was another crashing failure:

james@emma osm_load[master]$ npm run build:pbfclean

> osm_load@1.0.0 build:pbfclean /home/james/repos/jem/calvad/sqitch_packages/osm_load
> test -d binaries && test -d osm && find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete

npm ERR! Linux 4.4.10
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "run" "build:pbfclean"
npm ERR! node v6.2.0
npm ERR! npm  v3.8.9
npm ERR! osm_load@1.0.0 build:pbfclean: `test -d binaries && test -d osm && find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the osm_load@1.0.0 build:pbfclean script 'test -d binaries && test -d osm && find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the osm_load package,

The problem here is that the test -d binaries was doing its job, but was exiting with a non-zero condition. Reading the docs (npm help scripts) shows that non-zero exit is interpreted as a problem:


  • Don’t exit with a non-zero error code unless you really mean it. Except for uninstall scripts, this will cause the npm action to fail, and potentially be rolled back. If the failure is minor or only will prevent some optional features, then it’s better to just print a warning and exit successfully.

So test is the wrong tool to use here, so I switched to if;then;fi:

"build:pbfclean":"if [ -d binaries -a -d binaries/osm ]; then find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete; fi",

And the results are better:

james@emma osm_load[master]$ npm run build:pbfclean

> osm_load@1.0.0 build:pbfclean /home/james/repos/jem/calvad/sqitch_packages/osm_load
> if [ -d binaries -a -d binaries/osm ]; then find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete; fi

Unfortunately, while that doesn’t crash, I also want to check that it works to delete a file older than 30 days. So I made the directory in question, grabbed any old file older than 30 days, copied it into place and renamed it california-latest.osm.pbf:

find ~ -maxdepth 1 -mtime +30


james@emma osm_load[master]$ ls -lrt ~/3.7.10.generic.config
-rw-r--r-- 1 james users 129512 Sep 27  2014 /home/james/3.7.10.generic.config
james@emma osm_load[master]$ mkdir binaries/osm -p
james@emma osm_load[master]$ rsync -a ~/3.7.10.generic.config binaries/osm/california-latest.osm.pbf
james@emma osm_load[master]$ ls -lrt binaries/osm/
total 128
-rw-r--r-- 1 james users 129512 Sep 27  2014 california-latest.osm.pbf
james@emma osm_load[master]$  find ./binaries/osm -name california-latest.osm.pbf -mtime +30

Now running my build:pbfclean should delete that file:

james@emma osm_load[master]$ npm run build:pbfclean

> osm_load@1.0.0 build:pbfclean /home/james/repos/jem/calvad/sqitch_packages/osm_load
> if [ -d binaries -a -d binaries/osm ]; then find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete; fi

james@emma osm_load[master]$ ls -lrt binaries/osm/
total 0


Download a new file

To download a new file I need to run a simple curl command, but I also need to do two other things first. I need to make sure first that the destination directory is there, and second that the file does not already exist.

To make sure the destination directory exists, all I have to do is run mkdir -p. Alternately, I could check if the directories exist, and then run mkdir -p if they don’t, but that seems excessive for a simple two level path.

"build:pbfdir":"mkdir -p binaries/osm",

To test if the file exists already (and so to skip the download), I used if; then; fi again (having already been burned by test) as follows:

"build:pbfget":"if [ ! -e binaries/osm/california-latest.osm.pbf ]; then curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf -o binaries/osm/california-latest.osm.pbf; fi "

Here the -e option checks if the file exists, and if it does not (the ! modifier before the -e) then it will run the curl download. If the file does exist, then nothing will happen.

Putting them together, I first call the build:pbfdir script, and then do the curl download check and execute:

"build:pbfdir":"mkdir -p binaries/osm",
"build:pbfget":"npm run build:pbfdir -s && if [ ! -e binaries/osm/california-latest.osm.pbf ]; then curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf -o binaries/osm/california-latest.osm.pbf; fi "

(I couldn’t find the -s option documented anywhere in the npm docs, but apparently it suppresses output.)

It works fine:

james@emma osm_load[master]$ npm run build:pbfget

> osm_load@1.0.0 build:pbfget /home/james/repos/jem/calvad/sqitch_packages/osm_load
> npm run build:pbfdir -s && if [ ! -e binaries/osm/california-latest.osm.pbf ]; then curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf -o binaries/osm/california-latest.osm.pbf; fi

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  578M    0  1194    0     0   1329      0   5d 06h --:--:--   5d 06h  1669^C

Of course, I could have just slotted the mkdir -p command inside of the build:pbfget command, but this is a better example of how to cascade two run scripts. And besides, maybe in the future I will be using a symbolic link pointing into a big disk, and so mkdir -p would be inappropriate.

The final scripts portion of my package looks like this:

  "name": "osm_load",
  "version": "1.0.0",
  "description": "Load OSM data (California) into a local database",
  "main": "load.js",
  "scripts": {
      "test": "tap test/*.js",
      "build": "npm run build:pbfclean  && npm run build:pbfget",
      "build:pbfclean":"if [ -d binaries -a -d binaries/osm ]; then find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete; fi",
      "build:pbfdir":"mkdir -p binaries/osm",
      "build:pbfget":"npm run build:pbfdir -s && if [ ! -e binaries/osm/california-latest.osm.pbf ]; then curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf -o binaries/osm/california-latest.osm.pbf; fi "

An example of using sqitch with cross-project dependencies

File this as yet another post of something I couldn’t find when searching the internet. I recently started using sqitch and despite the horrible spelling (dude, my brain always puts ‘u’ after ‘q’; not cool), the tool is incredibly helpful to bring order to my chaotic database mis-management practices.

This post isn’t about sqitch itself—there are lots of tutorials available for that—but rather about using sqitch to manage dependencies across projects. When learning sqitch I made a big project and dumped all of the deploy/revert/verify/test rules for each table/schema/function I needed for part of a database. Now that I have a little bit of a clue, I’m moving towards smaller, do-one-thing packages. But to enable that, I had to figure out how to enable cross-project dependencies in sqitch.

There isn’t an example that I could find anywhere, so I just hacked on the sqitch.plan file until things worked.

This is the original plan from the monolithic project. This snippet first adds a schema, then a counties table, then a city abbreviations table. The cities are located inside counties, and there are links between the two tables, so the cities sql needs to depend on the counties, and both need to depend on the schema.


appschema 2016-02-02T20:56:11Z James E. Marca <james@example.com> # Add schema for geocoding work.
counties_fips [appschema] 2016-02-02T23:23:13Z James E. Marca <james@example.com> # Add counties_fips table.
city_abbrevs [appschema counties_fips] 2016-02-04T17:57:02Z James E. Marca <james@example.com> # Add city abbreviations.

Splitting this into three projects, one for the schema, one for counties, and one for cities:

First, after initializing and adding for the geocode_schema package, the sqitch.plan looks like:


geocode_schema 2016-03-16T16:30:54Z James E. Marca <james@example.com> # add schema for geocoding

Now when creating the county package, when I add the new sql I have to declare its dependency on the geocoding schema package:

sqitch add county_fips --requires calvad_db_geocode_schema:geocode_schema -n 'county_fips table'

Unlike in the sqitch tutorials, here the project-specific dependency has the form “project_name:change_name” instead of just “change_name”.

This creates the plan file something like the following:


county_fips [calvad_db_geocode_schema:geocode_schema] 2016-03-16T17:02:35Z James E. Marca <james@example.com> # county_fips table

As usual, after slotting in a non-trivial deploy/county_fips.sql, etc, if I simply attempt to deploy this new addition to a database it will fail.

james@emma calvad_db_county[testingsqitch]$ sqitch deploy db:pg:sqitchtesting
Adding registry tables to db:pg:sqitchtesting
Deploying changes to db:pg:sqitchtesting
Missing required change: calvad_db_geocode_schema:geocode_schema

So the acid test: deploy the calvad_db_geocode_schema:geocode_schema project:

james@emma calvad_db_county[testingsqitch]$ cd ../calvad_db_geocode_schema/           
james@emma calvad_db_geocode_schema$ sqitch deploy --verify db:pg:sqitchtesting
Deploying changes to db:pg:sqitchtesting
  + geocode_schema .. ok
james@emma calvad_db_geocode_schema$ cd ../calvad_db_county/                   
james@emma calvad_db_county[testingsqitch]$ sqitch deploy --verify db:pg:sqitchtesting
Deploying changes to db:pg:sqitchtesting
  + county_fips .. ok

It worked!

Popping into the database and looking at the sqitch tables is also instructive:

psql (9.4.5)
Type "help" for help.

sqitchtesting=# \dt sqitch.
           List of relations
 Schema |     Name     | Type  | Owner 
 sqitch | changes      | table | james
 sqitch | dependencies | table | james
 sqitch | events       | table | james
 sqitch | projects     | table | james
 sqitch | releases     | table | james
 sqitch | tags         | table | james
(6 rows)

sqitchtesting=# select change_id,change,project,note from sqitch.changes ;
                change_id                 |     change     |         project          |           note           
 f0964df0e223700ad34d9bd50bd48a8cde14d0f5 | geocode_schema | calvad_db_geocode_schema | add schema for geocoding
 e4f6cae819e3c6753518dac9c4922c18853f6d88 | county_fips    | calvad_db_county         | county_fips table
(2 rows)

sqitchtesting=# \d sqitch.projects
                            Table "sqitch.projects"
    Column     |           Type           |             Modifiers              
 project       | text                     | not null
 uri           | text                     | 
 created_at    | timestamp with time zone | not null default clock_timestamp()
 creator_name  | text                     | not null
 creator_email | text                     | not null
    "projects_pkey" PRIMARY KEY, btree (project)
    "projects_uri_key" UNIQUE CONSTRAINT, btree (uri)
Referenced by:
    TABLE "sqitch.changes" CONSTRAINT "changes_project_fkey" FOREIGN KEY (project) REFERENCES sqitch.projects(project) ON UPDATE CASCADE
    TABLE "sqitch.events" CONSTRAINT "events_project_fkey" FOREIGN KEY (project) REFERENCES sqitch.projects(project) ON UPDATE CASCADE
    TABLE "sqitch.tags" CONSTRAINT "tags_project_fkey" FOREIGN KEY (project) REFERENCES sqitch.projects(project) ON UPDATE CASCADE

sqitchtesting=# select * from sqitch.projects;
         project          |                           uri                     |          created_at           |  creator_name  |      creator_email      
 calvad_db_county         | git@example.com/a/jmarca/calvad_db_county         | 2016-03-16 10:38:05.402549-07 | James E. Marca | james@example.com
 calvad_db_geocode_schema | git@example.com/a/jmarca/calvad_db_geocode_schema | 2016-03-16 10:38:14.244119-07 | James E. Marca | james@example.com
(2 rows)

Actually I don’t like the design of the projects table at all. In my opinion, the unique key for projects should be the URI, not the project name. That quibble aside, it is clear that sqitch can indeed use dependencies that are defined in external projects.

Now the next step for me is to wire this up inside of npm to make npm install pull down sqitch dependencies from the sqitch URI and then deploy/verify them, so that the package is ready for its own deploy/verify/test dance.

Dump a doc from CouchDB with attachments

In order to dummy up a test in node.js, I need data to populate a testing CouchDB database. Specifically, I am testing some code that creates statistics plots (in R) and then saves them to a doc as attachments. So for my tests, I need at least one document with its PNG attachments already in place.

I couldn’t find a simple “howto” for this on the Internet, so here’s a note to my future self.

First of all, the CouchDB docs are great, and curl is your friend. Curl lets you set the headers. In this case, I don’t want HTML to come back, I want a valid JSON document, so (in typical belt-and-suspenders style) I specify both the content type and the accept header parameters to be application/json as follows:

curl -H 'Content-Type: application/json' \
-H 'Accept: application/json' \> 801447.json

The returned document has encoded the binary PNG files as JSON fields, in accordance with the CouchDB specs:

  {"name":"SERFAS CLUB",

Lovely binary-to-hex, looking good.

To verify that the returned document is actually valid json, I use the command line some more (and I’m not sure which Linux library installed this, but there are several JSON pretty printers and verifiers out there):

james@emma files[bug/fixplots]$ json_verify< 801451.json

JSON is valid

Then to use the document in my test, all I have to do is read it in and send it off:

function put_json_file(file,couchurl,cb){
    var db_dump = require(file) // in node you can require json too!
        return cb(e)
    return null

To see that in action, I put my various CouchDB-related utilities in a file here, and then my actual test has a before job that creates the CouchDB database and populates it, and a corresponding after task that deletes the temporary database.

The choice between hard-coding and parameterization

I am revising some older code, and once again facing the choice of whether to hard-code database names and database tables in my code. Ordinarily I’d say no way, that’s stupid. But in this case, it might not be.

This post has zero pat answers, and is really just me thinking about things now so that my future self can revisit my thoughts and reevaluate my conclusions. However, I haven’t written a post in a while, and there is a slight chance someone else might find this useful in some way. Also, I’ve been chosen to present at this year’s PGConf US, and my talk is about testing SQL, so this is somewhat relevant to that.

So the code. First of all, a while back I wrote a general purpose query generator to grab shape data from PostgreSQL/PostGIS. See https://github.com/jmarca/shapes_postgis. That library allows me to do some fairly complicated queries from my express-based server through liberal use of parameterization. For example, in my tests, I have a handler defined as:

var app = express()
var vds_options=_.assign({
    ,'select_properties':{'tvd.freeway_id' : 'freeway'
                          ,'tvd.freeway_dir': 'direction'
                          ,"'vdsid_' || id"   : 'detector_id'
                          ,'vdstype'        : 'type'
    ,'username' : config.postgresql.auth.username
    ,'password' : config.postgresql.auth.password

var vdsservice = shape_service(vds_options)

var server=http

This is cool because the query coming in from the client can specify which type of vds to return. For example (again pulling from the tests), the query can be:

request({url:'http://'+ testhost +':'+_testport+'/points/10/174/407.json?vdstype=\'ff\''
            if(e) return done(e)

By adding vdstype='ff' in my http query, the generated SQL query will limit the result to those entries that only match “ff”. The generated SQL is:

with bounding_area as (
    select geom4326 as geom
    from public.carb_airdistricts_aligned_03
    where dis='SC'
SELECT tvd.freeway_id as freeway,
       tvd.freeway_dir as direction,
       'vdsid_' || id as detector_id,
       vdstype as type,
            )).geom,0.1),1) as geojson
FROM newtbmap.tvd as tvd
JOIN bounding_area ON (st_intersects(tvd.geom,bounding_area.geom))
WHERE vdstype~*'ff'

Continue reading