Transfer prints

It is possible to make transfer prints from inkjet printouts. Professor Gerald R. Van Hecke was absolutely correct when he said, back in 1989, that I should use my knowledge of chemistry rather than saying lighter fluid “magically” lifts off the images from magazines. Had I listened, I might have been open to other methods.

Apparently, the techniques are all dependent upon chemistry—something needs to attack the bonds between the ink particles and the paper. In my old use of Zippo fluid and magazines, the lighter fluid did the trick. With Polaroid type 669 transfers, in the one case the ink hasn’t yet transferred to the photo paper, while in the emulsion transfer technique the hot water dissolves the bond between the emulsion and the paper. In these new-to-me techniques (it seems most articles on the internet are from 2011 through 2013, with nothing much new happening since that I can find), some substance is used to lift the ink.

A good series of articles is here, a long article covering lots of different lifting media is here, and some all-in-one PDFs are here for gel printing and here for direct transfers. This last recipe is one of many approaches that print to non-porous surfaces (cheap plastic overheads; glossy backing to printable stickers; etc.) and then slap that surface down on the receiving surface before the ink has had much chance to dry.

So next weekend’s project is lined up I guess.

How to use npm to conditionally download a file

I am working on a package to process OpenStreetMap data, cleaning up some older code. My old README used to say "download the file from…", but sometimes people have trouble with that step. What I wanted to do was to automate the process: check whether the downloaded OSM file is older than some period (say 30 days), and if so, download a new snapshot. I also wanted to use npm, because then it would cooperate nicely with all my other crazy uses of npm, such as building and packaging R scripts. Because I couldn’t find any exact recipes on the internet, here’s how I did it.

First, check out how to use npm as a build tool and the more recent why npm scripts. Both of these posts are excellent introductions to using npm scripts.

For my problem, there are two basic tasks I need to solve with npm scripts. First, I need to be able to check the age of a file, and second, I need to be able to download a file. Note that because I only run Linux, I’m not even going to pretend that my solution is portable. Mac OS X users can probably use similar commands, but Windows users are likely going to have to change things around a bit. With that Linux-centric caveat aside, here is how I solved this problem.

File age

To determine if a file is too old I can use find.

find . -name "thefilename" -mtime +30

This will find a file called "thefilename" if it is older than 30 days (more or less: find counts age in whole 24-hour periods and ignores any fractional remainder, so there is some gray area right at the boundary). Rather than wrapping this in an if statement, it’s easier to just use find’s built-in "-delete" action to remove any file older than 30 days.
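Spelled out, that is just the same command with -delete appended:

find . -name "thefilename" -mtime +30 -delete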

Download a file

To download a file, I can use curl. Specifically, I want to download the california-latest file from Geofabrik, so I would use

curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf > california-latest.osm.pbf

Fitting into run scripts, mistakes and all

Delete the old file

My idea was to use the find command to delete a file that is older than my desired age, and then to use the curl command to download a file if and only if it doesn’t yet exist.

First, I codified the delete operation into a run script as follows:

"build:pbfclean":"find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete"

Running that failed spectacularly!

james@emma osm_load[master]$ npm run build:pbfclean

> osm_load@1.0.0 build:pbfclean /home/james/repos/jem/calvad/sqitch_packages/osm_load
> find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete

find: `./binaries': No such file or directory

npm ERR! Linux 4.4.10
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "run" "build:pbfclean"
npm ERR! node v6.2.0
npm ERR! npm  v3.8.9
npm ERR! code ELIFECYCLE
npm ERR! osm_load@1.0.0 build:pbfclean: `find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the osm_load@1.0.0 build:pbfclean script 'find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the osm_load package,
npm ERR! not with npm itself.
...

The problem is that find failed because I hadn’t created the destination directory yet. I don’t really want to create a directory just to empty it, so instead I tried running a test first.

So I extended the script a little bit:

"build:pbfclean":"test -d binaries && test -d osm && find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete"

This was another crashing failure:

james@emma osm_load[master]$ npm run build:pbfclean

> osm_load@1.0.0 build:pbfclean /home/james/repos/jem/calvad/sqitch_packages/osm_load
> test -d binaries && test -d osm && find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete


npm ERR! Linux 4.4.10
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "run" "build:pbfclean"
npm ERR! node v6.2.0
npm ERR! npm  v3.8.9
npm ERR! code ELIFECYCLE
npm ERR! osm_load@1.0.0 build:pbfclean: `test -d binaries && test -d osm && find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the osm_load@1.0.0 build:pbfclean script 'test -d binaries && test -d osm && find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the osm_load package,

The problem here is that the test -d binaries was doing its job, but was exiting with a non-zero status. Reading the docs (npm help scripts) shows that a non-zero exit is interpreted as a problem:

BEST PRACTICES

  • Don’t exit with a non-zero error code unless you really mean it. Except for uninstall scripts, this will cause the npm action to fail, and potentially be rolled back. If the failure is minor or only will prevent some optional features, then it’s better to just print a warning and exit successfully.
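You can see the offending exit status in a plain shell. This is just my own illustration, assuming the ./binaries directory does not exist yet:

test -d binaries && echo "binaries directory exists"
# test exits with status 1 because ./binaries is missing, the && chain
# short-circuits, and npm treats that non-zero status as a failed script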

Clearly test is the wrong tool to use here, so I switched to if; then; fi:

"build:pbfclean":"if [ -d binaries -a -d binaries/osm ]; then find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete; fi",

And the results are better:

james@emma osm_load[master]$ npm run build:pbfclean

> osm_load@1.0.0 build:pbfclean /home/james/repos/jem/calvad/sqitch_packages/osm_load
> if [ -d binaries -a -d binaries/osm ]; then find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete; fi

That no longer crashes, but I also want to check that it actually deletes a file older than 30 days. So I made the directory in question, grabbed an old file (anything older than 30 days would do), copied it into place, and renamed it california-latest.osm.pbf:

find ~ -maxdepth 1 -mtime +30

...
/home/james/3.7.10.generic.config
...

james@emma osm_load[master]$ ls -lrt ~/3.7.10.generic.config
-rw-r--r-- 1 james users 129512 Sep 27  2014 /home/james/3.7.10.generic.config
james@emma osm_load[master]$ mkdir binaries/osm -p
james@emma osm_load[master]$ rsync -a ~/3.7.10.generic.config binaries/osm/california-latest.osm.pbf
james@emma osm_load[master]$ ls -lrt binaries/osm/
total 128
-rw-r--r-- 1 james users 129512 Sep 27  2014 california-latest.osm.pbf
james@emma osm_load[master]$  find ./binaries/osm -name california-latest.osm.pbf -mtime +30
./binaries/osm/california-latest.osm.pbf

Now running my build:pbfclean should delete that file:

james@emma osm_load[master]$ npm run build:pbfclean

> osm_load@1.0.0 build:pbfclean /home/james/repos/jem/calvad/sqitch_packages/osm_load
> if [ -d binaries -a -d binaries/osm ]; then find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete; fi

james@emma osm_load[master]$ ls -lrt binaries/osm/
total 0

Success!

Download a new file

To download a new file I need to run a simple curl command, but I need to do two other things first: make sure that the destination directory exists, and check that the file does not already exist.

To make sure the destination directory exists, all I have to do is run mkdir -p. Alternatively, I could check whether the directories exist and only run mkdir -p if they don’t, but that seems excessive for a simple two-level path.

"build:pbfdir":"mkdir -p binaries/osm",

To test if the file exists already (and so to skip the download), I used if; then; fi again (having already been burned by test) as follows:

"build:pbfget":"if [ ! -e binaries/osm/california-latest.osm.pbf ]; then curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf -o binaries/osm/california-latest.osm.pbf; fi "

Here the -e option checks if the file exists, and if it does not (the ! modifier before the -e) then it will run the curl download. If the file does exist, then nothing will happen.

Putting them together, I first call the build:pbfdir script, and then do the curl download check and execute:

"build:pbfdir":"mkdir -p binaries/osm",
"build:pbfget":"npm run build:pbfdir -s && if [ ! -e binaries/osm/california-latest.osm.pbf ]; then curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf -o binaries/osm/california-latest.osm.pbf; fi "

(The -s option isn’t easy to find in the npm docs, but it is the shorthand for --silent and suppresses npm’s own log output.)

It works fine:

james@emma osm_load[master]$ npm run build:pbfget

> osm_load@1.0.0 build:pbfget /home/james/repos/jem/calvad/sqitch_packages/osm_load
> npm run build:pbfdir -s && if [ ! -e binaries/osm/california-latest.osm.pbf ]; then curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf -o binaries/osm/california-latest.osm.pbf; fi

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  578M    0  1194    0     0   1329      0   5d 06h --:--:--   5d 06h  1669^C

Of course, I could have just slotted the mkdir -p command inside of the build:pbfget command, but this is a better example of how to cascade two run scripts. And besides, maybe in the future I will be using a symbolic link pointing into a big disk, and so mkdir -p would be inappropriate.

The final scripts portion of my package looks like this:

{
  "name": "osm_load",
  "version": "1.0.0",
  "description": "Load OSM data (California) into a local database",
  "main": "load.js",
  "scripts": {
      "test": "tap test/*.js",
      "build": "npm run build:pbfclean  && npm run build:pbfget",
      "build:pbfclean":"if [ -d binaries -a -d binaries/osm ]; then find ./binaries/osm -name california-latest.osm.pbf -mtime +30 -delete; fi",
      "build:pbfdir":"mkdir -p binaries/osm",
      "build:pbfget":"npm run build:pbfdir -s && if [ ! -e binaries/osm/california-latest.osm.pbf ]; then curl http://download.geofabrik.de/north-america/us/california-latest.osm.pbf -o binaries/osm/california-latest.osm.pbf; fi "
  }
}
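With those pieces in place, the whole clean-check-download cycle is a single command:

npm run build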

An example of using sqitch with cross-project dependencies

File this as yet another post of something I couldn’t find when searching the internet. I recently started using sqitch and despite the horrible spelling (dude, my brain always puts ‘u’ after ‘q’; not cool), the tool is incredibly helpful to bring order to my chaotic database mis-management practices.

This post isn’t about sqitch itself—there are lots of tutorials available for that—but rather about using sqitch to manage dependencies across projects. When learning sqitch I made a big project and dumped all of the deploy/revert/verify/test rules for each table/schema/function I needed for part of a database. Now that I have a little bit of a clue, I’m moving towards smaller, do-one-thing packages. But to enable that, I had to figure out how to enable cross-project dependencies in sqitch.

There isn’t an example that I could find anywhere, so I just hacked on the sqitch.plan file until things worked.

This is the original plan from the monolithic project. This snippet first adds a schema, then a counties table, then a city abbreviations table. The cities are located inside counties, and there are links between the two tables, so the cities sql needs to depend on the counties, and both need to depend on the schema.

%syntax-version=1.0.0
%project=ge0coding
%uri=git@example.com/hpms_geocode

appschema 2016-02-02T20:56:11Z James E. Marca <james@example.com> # Add schema for geocoding work.
counties_fips [appschema] 2016-02-02T23:23:13Z James E. Marca <james@example.com> # Add counties_fips table.
city_abbrevs [appschema counties_fips] 2016-02-04T17:57:02Z James E. Marca <james@example.com> # Add city abbreviations.

Splitting this into three projects, one for the schema, one for counties, and one for cities:

First, after initializing and adding for the geocode_schema package, the sqitch.plan looks like:

%syntax-version=1.0.0
%project=calvad_db_geocode_schema
%uri=git@example.com/a/jmarca/calvad_db_geocode_schema

geocode_schema 2016-03-16T16:30:54Z James E. Marca <james@example.com> # add schema for geocoding
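For the record, that plan came from the usual init-and-add incantation, something like the following (the --engine flag is my assumption based on the db:pg targets used below; the URI and note match the plan above):

sqitch init calvad_db_geocode_schema --uri git@example.com/a/jmarca/calvad_db_geocode_schema --engine pg
sqitch add geocode_schema -n 'add schema for geocoding'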

Now when creating the county package, when I add the new sql I have to declare its dependency on the geocoding schema package:

sqitch add county_fips --requires calvad_db_geocode_schema:geocode_schema -n 'county_fips table'

Unlike in the sqitch tutorials, here the project-specific dependency has the form “project_name:change_name” instead of just “change_name”.

This creates a plan file something like the following:

%syntax-version=1.0.0
%project=calvad_db_county
%uri=git@example.com/a/jmarca/calvad_db_county

county_fips [calvad_db_geocode_schema:geocode_schema] 2016-03-16T17:02:35Z James E. Marca <james@example.com> # county_fips table

As usual, after slotting in a non-trivial deploy/county_fips.sql, etc, if I simply attempt to deploy this new addition to a database it will fail.

james@emma calvad_db_county[testingsqitch]$ sqitch deploy db:pg:sqitchtesting
Adding registry tables to db:pg:sqitchtesting
Deploying changes to db:pg:sqitchtesting
Missing required change: calvad_db_geocode_schema:geocode_schema

So the acid test: deploy the calvad_db_geocode_schema project first, then try the county deploy again:

james@emma calvad_db_county[testingsqitch]$ cd ../calvad_db_geocode_schema/           
james@emma calvad_db_geocode_schema$ sqitch deploy --verify db:pg:sqitchtesting
Deploying changes to db:pg:sqitchtesting
  + geocode_schema .. ok
james@emma calvad_db_geocode_schema$ cd ../calvad_db_county/                   
james@emma calvad_db_county[testingsqitch]$ sqitch deploy --verify db:pg:sqitchtesting
Deploying changes to db:pg:sqitchtesting
  + county_fips .. ok

It worked!

Popping into the database and looking at the sqitch tables is also instructive:

psql (9.4.5)
Type "help" for help.

sqitchtesting=# \dt sqitch.
           List of relations
 Schema |     Name     | Type  | Owner 
--------+--------------+-------+-------
 sqitch | changes      | table | james
 sqitch | dependencies | table | james
 sqitch | events       | table | james
 sqitch | projects     | table | james
 sqitch | releases     | table | james
 sqitch | tags         | table | james
(6 rows)

sqitchtesting=# select change_id,change,project,note from sqitch.changes ;
                change_id                 |     change     |         project          |           note           
------------------------------------------+----------------+--------------------------+--------------------------
 f0964df0e223700ad34d9bd50bd48a8cde14d0f5 | geocode_schema | calvad_db_geocode_schema | add schema for geocoding
 e4f6cae819e3c6753518dac9c4922c18853f6d88 | county_fips    | calvad_db_county         | county_fips table
(2 rows)

sqitchtesting=# \d sqitch.projects
                            Table "sqitch.projects"
    Column     |           Type           |             Modifiers              
---------------+--------------------------+------------------------------------
 project       | text                     | not null
 uri           | text                     | 
 created_at    | timestamp with time zone | not null default clock_timestamp()
 creator_name  | text                     | not null
 creator_email | text                     | not null
Indexes:
    "projects_pkey" PRIMARY KEY, btree (project)
    "projects_uri_key" UNIQUE CONSTRAINT, btree (uri)
Referenced by:
    TABLE "sqitch.changes" CONSTRAINT "changes_project_fkey" FOREIGN KEY (project) REFERENCES sqitch.projects(project) ON UPDATE CASCADE
    TABLE "sqitch.events" CONSTRAINT "events_project_fkey" FOREIGN KEY (project) REFERENCES sqitch.projects(project) ON UPDATE CASCADE
    TABLE "sqitch.tags" CONSTRAINT "tags_project_fkey" FOREIGN KEY (project) REFERENCES sqitch.projects(project) ON UPDATE CASCADE

sqitchtesting=# select * from sqitch.projects;
         project          |                           uri                     |          created_at           |  creator_name  |      creator_email      
--------------------------+---------------------------------------------------+-------------------------------+----------------+-------------------------
 calvad_db_county         | git@example.com/a/jmarca/calvad_db_county         | 2016-03-16 10:38:05.402549-07 | James E. Marca | james@example.com
 calvad_db_geocode_schema | git@example.com/a/jmarca/calvad_db_geocode_schema | 2016-03-16 10:38:14.244119-07 | James E. Marca | james@example.com
(2 rows)

Actually I don’t like the design of the projects table at all. In my opinion, the unique key for projects should be the URI, not the project name. That quibble aside, it is clear that sqitch can indeed use dependencies that are defined in external projects.

Now the next step for me is to wire this up inside of npm to make npm install pull down sqitch dependencies from the sqitch URI and then deploy/verify them, so that the package is ready for its own deploy/verify/test dance.

Dump a doc from CouchDB with attachments

In order to dummy up a test in node.js, I need data to populate a testing CouchDB database. Specifically, I am testing some code that creates statistics plots (in R) and then saves them to a doc as attachments. So for my tests, I need at least one document with its PNG attachments already in place.

I couldn’t find a simple “howto” for this on the Internet, so here’s a note to my future self.

First of all, the CouchDB docs are great, and curl is your friend. Curl lets you set the headers. In this case, I don’t want HTML to come back, I want a valid JSON document, so (in typical belt-and-suspenders style) I specify both the content type and the accept header parameters to be application/json as follows:

curl -H 'Content-Type: application/json' \
-H 'Accept: application/json' \
127.0.0.1:5984/my%2freal%2fdatabase/801447?attachments=true > 801447.json

The returned document has encoded the binary PNG files as JSON fields, in accordance with the CouchDB specs:

{"_id":"801447","_rev":"55-8e15623f21dce9ed556cfe96b9c85a8e",
"2012":{"properties":[
  {"name":"SERFAS CLUB",
   "cal_pm":"R3.688",
   "abs_pm":40.920000000000001705,
   "latitude_4269":"33.880712",
   "longitude_4269":"-117.613596",
   "lanes":1,
   "segment_length":"0.316",
   "freeway":91,
   "direction":"E",
   "vdstype":"ML",
   "district":8,
   "versions":["2012-12-04","2012-12-12"],
     "geojson":{"type":"Point",
                "crs":{"type":"name",
                       "properties":{"name":"EPSG:4326"}},
                "coordinates":[-117.62000000000000455,
                                 33.881000000000000227]}
               }
   ]},
"_attachments":{
 "801447_2012_raw_004.png":{
  "content_type":"image/png",
  "revpos":53,"digest":"md5-tF2vnhvNw7pLHlK31DVNUw==",
  "data":"iVBORw0KGgoAAAANSUhEUgAABkAAAAGQCAIAAAB59ztRAAAgAElEQVR4
nOzdeYAUxd038Opr7tmbXVYRXEQEOeSSy3greK1sIJqIRImaRONLPBJDTFBRDGp4Dl
GjicYj4oEJyimsyHItyiWPoCAYjQQQuZZll71m5+r3jwrtOEd1z0xP9TDz/fzDzNBb
v6ruquqemupqQVVVAgAAAAAAAAAAkK1EqzMAAAAAAAAAAADAggEsAAAAAAAAAADIah
jAAgAAAAAAAACArIYBLAAAAAAAAAAAyGoYwAIAAAAAAAAAgKyGASwAAAAAAAAAAMhq
GMACAAAAAAAAAICshgEsAAAAAAAAAADIahjAAgAAAAAAAACArIYBLAAAAAAAAAAAyG
oYwAIAAAAAAAAAgKyGASwAAAAAAAAAAMhqGMACAAAAAAAAAICshgEsAAAAAAAAAADI
ahjAAgAAAAAAAACArIYBLAAAAAAAAAAAyGoYwAIAAAAAAAAAgKyGASwAAAAAAAAAAM
hqGMACAAAAAAAAAICshgEsAAA ..."

Lovely binary-to-base64, looking good.

To verify that the returned document is actually valid JSON, I use the command line some more (I’m not sure which Linux package installed json_verify, though I think it comes with yajl; in any case there are several JSON pretty printers and verifiers out there):

james@emma files[bug/fixplots]$ json_verify< 801451.json

JSON is valid

Then to use the document in my test, all I have to do is read it in and send it off:

function put_json_file(file,couchurl,cb){
    var db_dump = require(file) // in node you can require json too!
    superagent.post(couchurl)
    .type('json')
    .send(db_dump)
    .end(function(e,r){
        should.not.exist(e)
        should.exist(r)
        return cb(e)
    })
    return null
}

To see that in action, I put my various CouchDB-related utilities in a file here, and then my actual test has a before job that creates the CouchDB database and populates it, and a corresponding after task that deletes the temporary database.
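For reference, those before and after steps amount to creating and then deleting a throwaway database through CouchDB’s HTTP API, something like this (the database name is made up here, and you may need admin credentials depending on how your CouchDB is configured):

# before the tests: create a temporary database
curl -X PUT 127.0.0.1:5984/test_plots_db

# after the tests: delete it again
curl -X DELETE 127.0.0.1:5984/test_plots_db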

stupid patents

Okay, Google just patented automated delivery vehicles. Dumb. Car with a lock on it. Not hard, super obvious. US009256852

And to paraphrase Mr. Bumble, “If the law supposes [that this kind of invention is patentable before we even have widespread use of driverless cars], then the law [(and Google)] is a ass—a idiot.”

Musing on summer tarts and cobblers

Two weeks ago I made a blueberry and nectarine cobbler, more or less sticking to the recipe from Thomas Keller’s book Ad Hoc at Home. My only variation was that I added nectarines too, not just blueberries. It was terrible; in my opinion the worst fruit cobbler I’ve ever made. The “cobbler” part became a gross, soggy layer of cake-like stuff on top of a too-thin layer of fruit. On the one hand, perhaps my pan was too big and the fruit spread out too much, but on the other hand, if the pan was too big, why did the topping (which was supposed to come out like individual dumplings) glob together into a single surface? Sucky recipe, bad quality control on the cookbook authors’ part, thereby reinforcing my dislike of celebrity chefs and their vanity cookbook projects.

Anyway, that disaster got me thinking about making another blueberry and nectarine cobbler. While I usually go for b&n pie with a proper crust, the time constraints of yesterday’s dinner party precluded putting in the time to make the crust. And Brooke wanted a cobbler.

So I started thinking what would make a good cobbler topping, and I remembered the success I had a long time ago making the caramel topping on pecan rolls. The basic idea is to press half a stick of butter into a cake pan, then layer on a cup or so of brown sugar. As the pecan rolls bake in the oven, the butter and sugar turn into caramel and infuse the pecan rolls with sticky goodness.

So I raided the fridge for some butter and discovered (horrors) that all I had left was a little blob of unsalted butter. But I also spied some clarified butter in a little container. Good enough, so I mixed the two and pressed them into the bottom of my cake pan. Being the good cook that I am, I licked the butter off my fingers—and discovered that the clarified butter wasn’t clarified butter, but rather left over butter-sage sauce!

It’s a funny thing but I am actually a pretty good taster of food (although I am not a very good taster of wine) (or else maybe I just drink a lot of swill) (but I digress). As I tasted the butter, I definitely tasted the sage, and I decided I was okay with that, but I could also taste a hint of garlic, and I was not okay with that. Since I had just crushed and chopped garlic for the sizzling shrimp I was going to make, I really had to think about whether it was my tongue tasting the garlic or my nose smelling it, and that gave me time to think about how the sage would work with the fruit.

I decided the garlic really was in the butter, and it had to go (actually I just added it to the oil I was going to use for the shrimp), and I grabbed a fresh stick of, sadly, salted butter. But I also decided that I really wanted the sage, so I trucked out to the garden to grab some sage leaves. My sage plant of several years got uprooted and didn’t survive this spring’s planting, so all I have is a variegated sage plant with lots of very small leaves. Still good, but I wanted the visual of the leaves, not just the flavor. Then I saw the lemon verbena plant we have growing next to the sage, which we intended to use for tea but instead just let grow. I remember Emma made some fruit dessert once—poached peaches I think—with lemon verbena in the sugar syrup, so I grabbed about 10 nice looking leaves along with the sage.

After washing all the leaves, I placed them in a sunburst pattern on top of the brown sugar I had pressed into the thick layer of butter. Then I added about a quarter of the crumble from Julia Child’s apple crumble recipe on top of the leaves so that I couldn’t see them any more, and then I tumbled alternating layers of nectarines and blueberries on top of that. Finally, when the fruit was about to the top of the cake pan, I topped it with the rest of the crumble topping (one cup oats, half cup flour, 6 tbsp butter, pinch of salt, 3/4 cup brown sugar, a quick buzz in the Cuisinart to mix) and pressed it down firmly to make a solid layer of sugar-butter-oats.

My idea was to bake it for about an hour at 350 until I could see the caramel bubbling up the sides, and until I could see the fruit begin to bubble through the topping. Then I was going to flip the whole mess onto a big plate, so that the caramel and leaves ended up on top, and the crumble ended up on the bottom like a tart crust.

The results were visually disastrous, but the flavors were great. The few sage leaves really spiked the sugars and flavors of the fruit, and the lemon verbena added a hint of “mystery flavor” that is always fun in a dessert. The crumble crust didn’t add much for me, however, and I don’t think I’ll do it quite like that again.

Unfortunately, I used completely the wrong pan. I actually used a removable-bottom cake pan, which was pretty stupid because a lot of the caramel seeped out onto the baking sheet (I’m not that stupid) rather than bubbling up the sides. And after I flipped the whole thing onto the serving platter, I realized this was just like a tarte tatin, and I could have made it in a cast iron skillet with a pie crust bottom.

So I’m going to make this again, but this time:

  1. use a cast iron pan
  2. maybe put the lemon verbena and sage leaves down first, then the butter, then the sugar, so that the leaves show
  3. perhaps a graham cracker crust on top, so it holds together a bit more than the crumble, and gives a bit more crunch
  4. or else perhaps a puff pastry topping that becomes the bottom, because how cool is it to have crispy puff pastry at the bottom of an oozy, drippy fruit tart?

The best part about this dessert was its reception. I had a small serving and really liked the flavor, which is rare for me (I usually just eat my cooking rather than enjoy the flavors). After the first round there was about half the dessert still left on the plate. I mentioned that it looked like we hadn’t really made a dent in the dessert, and suddenly all the adults said they’d like more. In this day and age of low carbs and healthy eating, that’s a resounding success. Finally, when we were cleaning up, there was a very small serving left. I said, “Hah, we almost finished it!” whereupon Marc asked for a fork and finished it off right from the serving platter. A dessert that is all gone the night it was served is the best kind of dessert, in my opinion.

But while the flavors were great, there is room for improvement, and I have inspiration for more tarts and crumbles.

Using npm with R is great

A few weeks ago I wrote up how I am using npm, the standard package manager for node.js. When I first started using node.js, and when npm first started cropping up as the best package manager to use, I was annoyed by the idea of copying libraries over and over again into each package’s node_modules directory. It seemed wasteful, since they were mostly copies, so I would generally use the -g flag and install globally.

Then I ran into trouble with versions, and I decided my insistence on using -g was stupid. Better to have the required version locally installed than to fight with multiple versions of a package at the global level.

The point is that today, in R, I need to depend on readr, but on the GitHub version, not the CRAN version, because I need to parse a column of times that use “AM/PM”. In R, there isn’t a clean way that I am aware of to load conflicting versions of a package. I don’t want my programs to depend on the bleeding edge of readr in general, but I am willing to accept the devel version for this package.

Unfortunately, I’m the only person using npm to load R packages local to my project. Phooey. But I can hack my R launching script to use devtools to load the package I need locally as follows.

First, I have a standard incantation to make my runtime R find my local, node_modules-installed R libraries:

## need node_modules directories
dot_is <- getwd()
node_paths <- dir(dot_is,pattern='.Rlibs',
                  full.names=TRUE,recursive=TRUE,
                  ignore.case=TRUE,include.dirs=TRUE,
                  all.files = TRUE)
path <- normalizePath(node_paths, winslash = "/", mustWork = FALSE)
lib_paths <- .libPaths()
.libPaths(c(path, lib_paths))

This bit of code will dive down into the local node_modules directory, recursively find all of the .Rlibs directories, and prepend them to the runtime .libPaths, so that local libraries take precedence over global ones.

All I have to do is to insert a command to load the required devel-level packages before installing and testing my code. Something like:

## need node_modules directories
dot_is <- getwd()
node_paths <- dir(dot_is,pattern='.Rlibs',
                  full.names=TRUE,recursive=TRUE,
                  ignore.case=TRUE,include.dirs=TRUE,
                  all.files = TRUE)
## make sure the local .Rlibs directory exists so the install has a target
path <- normalizePath('node_modules/.Rlibs', winslash = "/", mustWork = FALSE)
if(!file.exists(path)){
    dir.create(path)
}
## prepend the local libraries to the search path
lib_paths <- .libPaths()
.libPaths(c(path,node_paths,lib_paths))
## grab the devel version of readr if a new enough version isn't already available
vc <- list(op=">=",version=package_version("0.1.1.9000"))
if(!requireNamespace(package='readr',versionCheck=vc)){
    devtools::install_github('hadley/readr')
}

I can save that as Requirements.R, and then add the following to my package.json file:

...
  "scripts": {
      "test": "/usr/bin/Rscript Rtest.R",
      "preinstall": "/usr/bin/Rscript Requirements.R",
      "install":"/usr/bin/Rscript Rinstall.R"
  },
...

That works and is cool, but extremely one-off. Better would be to add dependencies in the package.json and get them loaded automatically. My unfinished start at this is to create an entry “rDependencies” in the package.json, which npm will then expose to my script in the system environment as “npm_package_rDependencies_…”. But I have to move on and so this is unfinished as of yet:

package.json

...
  "dependencies": {
      "calvad_rscripts": "jmarca/calvad_rscripts",
      "rcouchutils":"git://github.com/jmarca/rstats_couch_utils.git",
      "configr":"git://github.com/jmarca/configr.git"
  },
  "devDependencies": {
    "should": "^6.0.1"
  },
  "rDependencies":{
      "readr":"0.1.1.9000"
  },
  "scripts": {
      "test": "/usr/bin/Rscript Rtest.R",
      "preinstall": "/usr/bin/Rscript Requirements.R",
      "install":"/usr/bin/Rscript Rinstall.R"
  },
...

script snippet to read package.json dependencies

## ideally I would plumb versions from package.json environment variables?

envrr <- Sys.getenv()
print(envrr)
dependencies <- grep(pattern='npm_package_rDependencies'
                    ,x=names(envrr),perl=TRUE,value=TRUE)
print(dependencies)
pkgs <- strsplit(x=dependencies,split='npm_package_rDependencies_')
print(pkgs)
for(i in 1:length(dependencies)){
    pkg <- pkgs[[i]][2]
    ver <- envrr[[dependencies[i]]]
    vc <-  list(op=">=",version=package_version(ver))
    print(vc)
    if(!requireNamespace(package=pkg,versionCheck=vc)){
        print('need to download')
        devtools::install_github(paste('hadley',pkg,sep='/'))
        ## whoops, need to add proper github user, repo name here
    }else{
        print(paste('got',pkg,ver,'already'))
    }
}

Really I need to specify the required development R package like:

  "rDependencies":{
      "readr":{
          "repo":"hadley/readr",
          "version":"0.1.1.9000"
      }
  },

But the hacking gets uglier and uglier because this is passed to the script as npm_package_rDependencies_readr_repo and npm_package_rDependencies_readr_version
which means my braindead regexpr and split calls will need to be tweaked and patched some more to combine the repo and the version with the package.

So, future me, you have work to do and another blog post when you get this cleaned up.

Modernizing my approach to my R packages

I’ve been using R since 2000 or so, probably earlier, off and on. I’ve always just hacked out big old spaghetti-code programs. More recently, as alluded to with this past post, I’ve been migrating to using node.js to call into R. The initial impetus was to solve a problem with memory leakage, and with a single crash disrupting a really big sequence of jobs. By setting up my big outer loops in node.js, I can now fire off as many simultaneous R jobs as my RAM can handle, and if any die, node.js can handle the errors appropriately.

The one remaining issue is that my R code was still pretty brutish. I dislike the formal R packaging stuff, and I wanted something more lightweight, more like what node.js uses. I first tried to use component, but that was the wrong choice for a number of reasons. Way back in October I toyed with the idea of using npm to package up my R code, but I didn’t really start to do that in earnest until very recently. It turns out, with just a few quirks, this works pretty well. This post outlines my general approach to using npm to load R packages.


Another note to my future self on DBIx::Class

I’ve been writing a lot of javascript, and I really like node.js. I like lots of languages, but I find that node.js tends to work how I expect.

That said, sometimes I need to use perl. Last week, after some searching and testing out libraries, I was generally dissatisfied with the node.js packages available for parsing spreadsheets. The node.js way is to be non-blocking and streaming, but I couldn’t find a package that handled old and new spreadsheets that was either non-blocking or streaming (or both). Faced with that, I’d much rather use the tried, true, and extremely well tested Spreadsheet::Read perl module. It is also blocking, but at least it is pretty much guaranteed to work.

So using perl to parse a spreadsheet means I also had to dust off my database code to put the parsed results into my database. Since my last round of perl programming, I’ve gotten much more diligent about testing things as I hack, and writing much smaller modules. So I’m writing a small module to save a list of data to the database. Pretty simple with DBIx::Class.

Creating a test database from Perl

One wrinkle came in testing my code. What I normally do in node.js (with mocha) is to write a little “before” script that creates a database, and then a little “after” script that tears it down. Then all the testing code can write and delete without worrying about bombing the production db, and without requiring me to manually create and delete databases.

The missing link for me (and the purpose of this blog post) was how to create a database and slot in tables from perl and DBIx::Class.

My final solution is a hack of sorts. Instead of being creative, I just dropped down to DBD::Pg and issued a “create database” command directly. My code looks like this:

# create a test database

use Carp;
use DBI;

my $host = $ENV{PGHOST} || '127.0.0.1';
my $port = $ENV{PGPORT} || 5432;
my $db = $ENV{PGTESTDATABASE} || 'test_db';
my $user = $ENV{PGTESTUSER} || $ENV{PGUSER} || 'postgres';
my $pass =  '';

my $admindb = $ENV{PGADMINDATABASE} || 'postgres';
my $adminuser = $ENV{PGADMINUSER} || 'postgres';


my $dbh;
eval{
    $dbh = DBI->connect("dbi:Pg:dbname=$admindb", $adminuser);
};
if($@) {
    croak $@;
}
my $create = "create database $db";
if($user ne $adminuser){
    $create .= " with owner $user";
}
eval {
        $dbh->do($create);
};

That works fine, and is mirrored at the end of the test with a similar $dbh->do("drop database $db"); statement. Sadly, I can’t remember how to do before and after type blocks in perl tests. I seem to remember doing them long ago, but the semantics escape me. Like the subjunctive tense in Italian.

Creating test tables using DBIx::Class

Now the next step that tripped me up was populating the few tables I need for the tests. I have a large crufty db, and lazily used an automated script to create my DBIx::Class schema from the existing PostgreSQL tables. But running $schema->deploy() didn’t work because I have views and so on that muck things up. I really only need two tables for my current spreadsheet data save tests, so I only wanted to deploy() those two tables.

The documentation says:

Additionally, the DBIx::Class parser accepts a sources parameter as a
hash ref or an array ref, containing a list of source to deploy. If
present, then only the sources listed will get deployed.

That’s great, but I couldn’t find any examples of exactly what that meant. So I tried a few things, and one thing worked, and so here I am writing a note to my future self (and anyone else who lands on this page).

My database has multiple postgresql schemas, and so my DBIx::Class schema generation script took that into account. That needs its own documentation, but essentially what I did was:

{"schema_class":"Testbed::Spatial::VDS::Schema",

 "connect_info":{
     "dsn":"dbi:Pg:dbname=spatialvds;host=localhost",
     "user":"myuser"
 },
 "loader_options":{
     "dump_directory": "./lib",
     "db_schema": ["public","hsis","wim","newctmlmap","tempseg"],
     "debug":false,
     "moniker_parts":  ["schema", "name"],
     "moniker_part_separator":  "::",
     "naming": {"ALL":"v8", "force_ascii" : true}
 }
}

The super long class name of Testbed::Spatial::VDS::Schema is cruft from the distant past, but not too difficult to work with. The upshot is that my schema definitions are buried in a directory structure rooted at ./lib/Testbed/Spatial/VDS/Schema/Result/[Public,Hsis,Wim,...]. The two packages that I want to deploy for my tests are called Testbed::Spatial::VDS::Schema::Result::Public::WimStatus and Testbed::Spatial::VDS::Schema::Result::Public::WimStatusCodes.

So.

To deploy just these two tables, I first wrote the fully qualified package names as the “sources” option. But that didn’t work. Then I remembered that when using DBIx::Class, you usually refer to the result classes (representing tables) by their source names, that is, everything after the Result:: part of the package name. So in this case, I could drop the Testbed::Spatial::VDS::Schema::Result:: prefix in both names. My final, working bit of code is:

## deploy via DBIx::Class

use Carp;
use Testbed::Spatial::VDS::Schema;

my $schema = Testbed::Spatial::VDS::Schema->connect(
    "dbi:Pg:dbname=$db;host=$host;port=$port",
    $user,
    );

## deploy just the tables I'm going to be accessing during testing

my $deploy_result;
eval{
    $deploy_result =  $schema->deploy(
        { 'sources'=>["Public::WimStatus",
                      "Public::WimStatusCode"]});
};
if($@) {
    carp 'test db deploy failed';
    croak $@;
}

One final note to my future self. I never like passing passwords around in my programs. What I’ve found is that PostgreSQL uses a .pgpass file, documented here. So as long as the username, host, and database name match one of the lines in that file, it will pull out the correct password. Because this file is chmod 0600, it is less likely to get accidentally read by someone else, and also it will never get slurped up into a git repository. Because perl uses the PostgreSQL C libraries, it automatically inherits this behavior. So with Postgres, you should never be putting passwords into command lines or environment variables or source code.
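For future reference, each line of ~/.pgpass is a colon-separated entry of the form below (the second line is a made-up example; a * acts as a wildcard in any of the first four fields):

hostname:port:database:username:password
localhost:5432:test_db:james:not_my_real_password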

Non-obvious fix to a dzil problem

Last Friday I decided to skip trying to use node.js to parse spreadsheet files and instead stick with my existing perl solution based on Spreadsheet::Read. Because the code was really old, I had no proper tests, so I just started over from scratch. Poking around Modern Perl 2014 I found a note about using dzil to set up packages.

So I followed along with the choose-your-own-adventure style documentation at http://dzil.org/tutorial/start.html and had good success with setting things up. I rewrote my old code using Moose and immutable state and all that great stuff, and wrote pretty thorough test coverage of the various conditions and edge cases I can think of at the moment.

All was going well until I tried to set up the [AutoPrereqs] plugin.
When asked to compute the dependencies automatically, dzil choked on my binary spreadsheet files stored in the ./t/files directory:

ParseStatusSpreadsheeets[master]$ dzil listdeps
Could not decode UTF-8 t/files/07-2009.xls; filename set by GatherDir 
(Dist::Zilla::Plugin::GatherDir line 215); encoded_content added by 
GatherDir (Dist::Zilla::Plugin::GatherDir line 216); error was: utf8 
"xD0" does not map to Unicode at /usr/lib64/perl5/Encode.pm line 176.

I can easily set the [Prereqs] configuration to list my dependencies manually, but I wanted to do it automatically. I couldn’t believe I was the only person to have binary files mucking up the AutoPrereqs plugin, but the documentation was not helpful at all. The only hint given was to use a custom FileFinder, with no explanation of what exactly a FileFinder is or how to set one up in the config file.

Eventually I searched for “binary file” in the Github issues, and found this old bug: https://github.com/rjbs/Dist-Zilla/issues/407. The solution I found there is to tell dzil to ignore files as being binary by listing them in the config file. In my case, that fix works out to be:

[Encoding]
encoding = bytes
match    = xls    ; these are all spreadsheet test files

[AutoPrereqs]

(Another thing not mentioned in the dzil docs is that when they say that “match” is a regex, they don’t mean that you should write match = /xls/i because that won’t work! The config file isn’t perl, it is text that gets manipulated by perl.)

Anyway, with that fix to my dist.ini file, the AutoPrereqs plugin works as expected:

ParseStatusSpreadsheeets[master]$ dzil listdeps
Carp
Data::Dumper
DateTime::Format::DateParse
DateTime::Format::Pg
ExtUtils::MakeMaker
Moose
namespace::autoclean
Spreadsheet::Read
strict
Test::More
warnings