Using npm with R is great

A few weeks ago I wrote up how I am using npm, the standard package manager for node.js. When I first started using node.js, and when npm first started cropping up as the best package manager to use, I was annoyed by the idea of copying libraries over and over again into each package’s node_modules directory. It seemed wasteful, since they were mostly copies, so I would generally use the -g flag and install globally.

Then I ran into trouble with versions, and I decided my insistence on using -g was stupid. Better to have the required version locally installed than to fight with multiple versions of a package at the global level.

The point is that today, in R, I need to depend on readr but the github version, not the CRAN version, because I need to match a column of times that use “AM/PM” time. In R, there isn’t a clean way to load conflicting versions of a package that I am aware of. I don’t want my programs to use the bleeding edge of readr, but I am willing to accept the devel version for this package.

Unfortunately, I’m the only person using npm to load R packages local to my project. Phooey. But I can hack my R launching script to use devtools to load the package I need locally as follows.

First, I have a standard incantation to make my runtime R find my local, node_modules-installed R libraries:

## need node_modules directories
dot_is <- getwd()
node_paths <- dir(dot_is, pattern='\\.Rlibs$',
                  full.names = TRUE, recursive = TRUE,
                  all.files = TRUE, include.dirs = TRUE)
path <- normalizePath(node_paths, winslash = "/", mustWork = FALSE)
lib_paths <- .libPaths()
.libPaths(c(path, lib_paths))

This bit of code will dive down into the local node_modules directory, recursively find all of the .Rlibs directories, and prepend them to the runtime .libPaths, so that local libraries take precedence over global ones.

All I have to do is to insert a command to load the required devel-level packages before installing and testing my code. Something like:

## need node_modules directories
dot_is <- getwd()
node_paths <- dir(dot_is, pattern='\\.Rlibs$',
                  full.names = TRUE, recursive = TRUE,
                  all.files = TRUE, include.dirs = TRUE)
path <- normalizePath('node_modules/.Rlibs', winslash = "/", mustWork = FALSE)
.libPaths(c(path, node_paths, .libPaths()))

## make sure the devel version of readr is available locally
vc <- list(op=">=", version=package_version("0.1.1.9000")) # or whatever devel version is required
if(!requireNamespace('readr', quietly=TRUE) ||
   !do.call(vc$op, list(packageVersion('readr'), vc$version))){
    ## installs into the first .libPaths() entry, i.e. the local .Rlibs
    devtools::install_github('hadley/readr')
}

I can save that as Requirements.R, and then add the following to my package.json file:

  "scripts": {
      "test": "/usr/bin/Rscript Rtest.R",
      "preinstall": "/usr/bin/Rscript Requirements.R",
      "install": "/usr/bin/Rscript Rinstall.R"
  }

That works and is cool, but extremely one-off. Better would be to add dependencies in the package.json and get them loaded automatically. My unfinished start at this is to create an entry “rDependencies” in the package.json, which npm will then expose to my script in the system environment as “npm_package_rDependencies_…”. But I have to move on and so this is unfinished as of yet:


  "dependencies": {
      "calvad_rscripts": "jmarca/calvad_rscripts"
  },
  "rDependencies": {
      "readr": "0.1.1.9000"
  },
  "devDependencies": {
      "should": "^6.0.1"
  },
  "scripts": {
      "test": "/usr/bin/Rscript Rtest.R",
      "preinstall": "/usr/bin/Rscript Requirements.R",
      "install": "/usr/bin/Rscript Rinstall.R"
  }

Here’s the script snippet to read the package.json dependencies from the environment:

## ideally I would plumb versions from package.json environment variables?

envrr <- Sys.getenv()
dependencies <- grep(pattern='npm_package_rDependencies',
                     x=names(envrr), value=TRUE)
pkgs <- strsplit(x=dependencies, split='npm_package_rDependencies_')
for(i in seq_along(dependencies)){
    pkg <- pkgs[[i]][2]
    ver <- envrr[[dependencies[i]]]
    vc <- list(op=">=", version=package_version(ver))
    if(!requireNamespace(pkg, quietly=TRUE) ||
       !do.call(vc$op, list(packageVersion(pkg), vc$version))){
        print('need to download')
        ## whoops, need to add proper github user, repo name here
    }
}

Really I need to specify the required development R package like:
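A sketch of what that entry could look like — the repo and version values here are illustrative placeholders, not the exact ones I used:

```json
"rDependencies": {
    "readr": {
        "repo": "hadley/readr",
        "version": "0.1.1.9000"
    }
}
```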


But the hacking gets uglier and uglier because this is passed to the script as npm_package_rDependencies_readr_repo and npm_package_rDependencies_readr_version
which means my braindead regexpr and split calls will need to be tweaked and patched some more to combine the repo and the version with the package.
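To make that concrete, here is a hedged node.js sketch of the grouping that the R code would have to mimic. The two `process.env` assignments at the top simulate what npm would set from the package.json above; the values are invented for illustration:

```javascript
// simulate what npm would set for
// "rDependencies": { "readr": { "repo": "hadley/readr", "version": "0.1.1.9000" } }
process.env.npm_package_rDependencies_readr_repo = 'hadley/readr';
process.env.npm_package_rDependencies_readr_version = '0.1.1.9000';

// group the flattened variables back into { pkg: { repo, version } }
const rdeps = {};
for (const [key, value] of Object.entries(process.env)) {
    const m = key.match(/^npm_package_rDependencies_(.+)_(repo|version)$/);
    if (m) {
        rdeps[m[1]] = rdeps[m[1]] || {};
        rdeps[m[1]][m[2]] = value;
    }
}
console.log(rdeps); // { readr: { repo: 'hadley/readr', version: '0.1.1.9000' } }
```

Run as an npm lifecycle script, the two assignments at the top would come for free from package.json.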

So, future me, you have work to do and another blog post when you get this cleaned up.

Modernizing my approach to my R packages

I’ve been using R since 2000 or so, probably earlier, off and on. I’ve always just hacked out big old spaghetti-code programs. More recently, as alluded to with this past post, I’ve been migrating to using node.js to call into R. The initial impetus was to solve a problem with memory leakage, and with a single crash disrupting a really big sequence of jobs. By setting up my big outer loops in node.js, I can now fire off as many simultaneous R jobs as my RAM can handle, and if any die, node.js can handle the errors appropriately.

The one remaining issue is that my R code was still pretty brutish. I dislike the formal R packaging stuff, and I wanted something more lightweight, more like what node.js uses. I first tried to use component, but that was the wrong choice for a number of reasons. Way back in October I toyed with the idea of using npm to package up my R code, but I didn’t really start to do that in earnest until very recently. It turns out, with just a few quirks, this works pretty well. This post outlines my general approach to using npm to load R packages.


Another note to my future self on DBIx::Class

I’ve been writing a lot of javascript, and I really like node.js. I like lots of languages, but I find that node.js tends to work how I expect.

That said, sometimes I need to use perl. Last week, after some searching and testing out libraries, I was generally dissatisfied with the node.js packages available for parsing spreadsheets. The node.js way is to be non-blocking and streaming, but I couldn’t find a package that handled old and new spreadsheets that was either non-blocking or streaming (or both). Faced with that, I’d much rather use the tried, true, and extremely well tested Spreadsheet::Read perl module. It is also blocking, but at least it is pretty much guaranteed to work.

So using perl to parse a spreadsheet means I also had to dust off my database code to put the parsed results into my database. Since my last round of perl programming, I’ve gotten much more diligent about testing things as I hack, and writing much smaller modules. So I’m writing a small module to save a list of data to the database. Pretty simple with DBIx::Class.

Creating a test database from Perl

One wrinkle came in testing my code. What I normally do in node.js (with mocha) is to write a little “before” script that creates a database, and then a little “after” script that tears it down. Then all the testing code can write and delete without worrying about bombing the production db, and without requiring me to manually create and delete databases.

The missing link for me (and the purpose of this blog post) was how to create a database and slot in tables from perl and DBIx::Class.

My final solution is a hack of sorts. Instead of being creative, I just dropped down to DBD::Pg and issued a “create database” command directly. My code looks like this:

# create a test database

use DBI;
use Carp;

my $host = $ENV{PGHOST} || '';
my $port = $ENV{PGPORT} || 5432;
my $db = $ENV{PGTESTDATABASE} || 'test_db';
my $user = $ENV{PGTESTUSER} || $ENV{PGUSER} || 'postgres';
my $pass =  '';

my $admindb = $ENV{PGADMINDATABASE} || 'postgres';
my $adminuser = $ENV{PGADMINUSER} || 'postgres';

my $dbh;
eval {
    $dbh = DBI->connect("dbi:Pg:dbname=$admindb", $adminuser);
};
if($@) {
    croak $@;
}
my $create = "create database $db";
if($user ne $adminuser){
    $create .= " with owner $user";
}
eval {
    $dbh->do($create);
};
if($@) { croak $@; }

That works fine, and is mirrored at the end of the test with a similar $dbh->do("drop database $db"); statement. Sadly, I can’t remember how to do before and after type blocks in perl tests. I seem to remember doing them long ago, but the semantics escape me. Like the subjunctive mood in Italian.

Creating test tables using DBIx::Class

Now the next step that tripped me up was populating the few tables I need for the tests. I have a large crufty db, and lazily used an automated script to create my DBIx::Class schema from the existing PostgreSQL tables. But running $schema->deploy() didn’t work because I have views and so on that muck things up. I really only need two tables for my current spreadsheet data save tests, so I only wanted to deploy() those two tables.

The documentation says:

Additionally, the DBIx::Class parser accepts a sources parameter as a
hash ref or an array ref, containing a list of source to deploy. If
present, then only the sources listed will get deployed.

That’s great, but I couldn’t find any examples of exactly what that meant. So I tried a few things, and one thing worked, and so here I am writing a note to my future self (and anyone else who lands on this page).

My database has multiple postgresql schemas, and so my DBIx::Class schema generation script took that into account. That needs its own documentation, but essentially what I did was:


     {
      "dump_directory": "./lib",
      "db_schema": ["public","hsis","wim","newctmlmap","tempseg"],
      "moniker_parts":  ["schema", "name"],
      "moniker_part_separator":  "::",
      "naming": {"ALL":"v8", "force_ascii" : true}
     }

The super long class name of Testbed::Spatial::VDS::Schema is cruft from the distant past, but not too difficult to work with. The upshot is that my schema definitions are buried in a directory structure rooted at ./lib/Testbed/Spatial/VDS/Schema/Result/[Public,Hsis,Wim,...]. The two packages that I want to deploy for my tests are called Testbed::Spatial::VDS::Schema::Result::Public::WimStatus and Testbed::Spatial::VDS::Schema::Result::Public::WimStatusCodes.


To deploy just these two tables, I first wrote the fully qualified package names as the “sources” option. But that didn’t work. Then I remembered that when using DBIx::Class, usually you just refer to the different classes (representing tables) by everything after the top level package name. So in this case, I could drop the Testbed::Spatial::VDS::Schema part in both names. My final, working bit of code is:

## deploy via DBIx::Class

use Testbed::Spatial::VDS::Schema;

my $schema = Testbed::Spatial::VDS::Schema->connect(
    "dbi:Pg:dbname=$db", $user, $pass);

## deploy just the tables I'm going to be accessing during testing

my $deploy_result;
eval {
    $deploy_result = $schema->deploy(
        { 'sources' => [ "Public::WimStatus",
                         "Public::WimStatusCodes" ] } );
};
if($@) {
    carp 'test db deploy failed';
    croak $@;
}

One final note to my future self. I never like passing passwords around in my programs. What I’ve found is that PostgreSQL uses a .pgpass file, documented in the PostgreSQL manual. As long as the username, host, and database name match one of the lines in that file, the client will pull out the correct password. Because this file must be chmod 0600, it is less likely to get accidentally read by someone else, and it will never get slurped up into a git repository. Because the Perl driver uses the PostgreSQL C libraries, it automatically inherits this behavior. So with Postgres, you should never be putting passwords into command lines, environment variables, or source code.
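For reference, each line of ~/.pgpass is five colon-separated fields; a line like the following (password invented, obviously) would cover the test database above:

```
# hostname:port:database:username:password
localhost:5432:test_db:postgres:s3cret
```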

how to yf sl 1, k2tog, psso

As I usually do after publishing a new post, I scanned my previous hits and such and found a new search term that somehow hit my blog: “how to yf sl 1, k2tog, psso”.

Well, I know how to do that, and I even have a feeling that the search string contains a typo! So to help out future people who are accidentally thrown here searching for how to knit, here’s my mini tutorial.

yf sl 1, k2tog, psso

Taken as a whole, this instruction will reduce three stitches down to one stitch in the next row. The “standard” way to do this is not to bring the working yarn forward (the yf part). This is a non-standard way of doing this three into one reduction. Obviously the pattern author has some reason for requesting that the yarn be brought forward, but for my money it doesn’t make much difference in the final product (see my pictures at the end of this article).

The next sections will break down each part of this instruction.

yf sl 1

Translated, yf sl 1 means “Yarn forward, slip one.” There are two parts to this “move”—first to bring the yarn forward, then to slip one.

yf or Yarn Forward

Ordinarily when one knits, the yarn is held in the back of the work. When doing purl stitches, the yarn is held in front of the work. So the instruction yf or “yarn forward” means to bring the yarn in front of your work as if you are about to purl.


Usually, before a reduction like this, you do a yarn over, or “yo”, not a yarn forward.

sl 1 or Slip One

Slipping a stitch is simply moving one stitch from the left needle to the right needle without working it. There are two ways to slip. First, you can stick the right needle into the next stitch as if you are about to knit. Second you can stick the right needle into the next stitch as if you are about to purl.

The first way (sticking your needle into the next stitch as if you are about to knit) is the “non-standard” way to slip a stitch. This way of slipping results in a twisted stitch. If the pattern author wants you to slip in this way, it is usually indicated by a note that says “slip one as if to knit” or something similar. The only exception to this is when performing Barbara Walker’s SSK move, or “slip-slip-knit”. In that move, both the slips are done as if to knit.

The second way (sticking your right needle into the next stitch as if to purl) is the “standard” way to slip a stitch. If the instructions say “slip 1″ or similar, with no other explanation, you can safely assume that you should slip as if to purl. This way of slipping the stitch does not produce a twist in the stitch.

k2tog or Knit Two Together

Just as the instructions suggest, this move asks you to knit two stitches together. One thing that the instructions leave out is the fact that you’re supposed to move the working yarn back to the back of the work.

To knit together two stitches, you insert your needle into the second and next stitch at the same time, and then knit one stitch from the working yarn. Pictures are easier than words here. One tip to a novice knitter or one who happens to knit very tightly is that sometimes it is easier to do this move if you first put your right needle into both stitches purl-wise to loosen up both stitches.

psso or Pass Slipped Stitch Over

Take the left needle and stick it into the stitch you passed to the right needle earlier. Lift that stitch up and carefully move it over the stitch you just made. Drop it off the end of the needle and it will fall neatly into place around the stitch you just made, facing the opposite direction of the two stitches you knit together.


So I tried this and it looks ugly. The yf is clearly wrong, or the designer is being extra cutesy for no reason at all. I think the instruction and the search string should have been yo, sl1-k2tog-psso.

Non-obvious fix to a dzil problem

Last Friday I decided to skip trying to use node.js to parse spreadsheet files and instead stick with my existing perl solution based on Spreadsheet::Read. Because the code was really old, I had no proper tests, so I just started over from scratch. Poking around Modern Perl 2014 I found a note about using dzil to setup packages.

So I followed along with dzil’s choose-your-own-adventure style documentation and had good success with setting things up. I rewrote my old code using Moose and immutable state and all that great stuff, and wrote pretty thorough test coverage of the various conditions and edge cases I can think of at the moment.

All was going well until I tried to set up the [AutoPrereqs] plugin.
When asked to compute the dependencies automatically, dzil choked on my binary spreadsheet files stored in the ./t/files directory:

ParseStatusSpreadsheeets[master]$ dzil listdeps
Could not decode UTF-8 t/files/07-2009.xls; filename set by GatherDir 
(Dist::Zilla::Plugin::GatherDir line 215); encoded_content added by 
GatherDir (Dist::Zilla::Plugin::GatherDir line 216); error was: utf8 
"xD0" does not map to Unicode at /usr/lib64/perl5/ line 176.

I can easily set the [Prereqs] configuration to list my dependencies manually, but I wanted to do it automatically. I couldn’t believe I was the only person to have binary files mucking up the AutoPrereqs plugin, but the documentation was not helpful at all. The only hint given was to use a custom FileFinder, but with no help on what exactly a FileFinder was or how to set one up in the config file.

Eventually I searched for “binary file” in the GitHub issues and found an old bug report. The solution from that discussion is to tell dzil to treat the files as binary by listing them in the config file. In my case, that fix works out to be:

[Encoding]
encoding = bytes
match    = xls    ; these are all spreadsheet test files


(Another thing not mentioned in the dzil docs is that when they say that “match” is a regex, they don’t mean that you should write match = /xls/i because that won’t work! The config file isn’t perl, it is text that gets manipulated by perl.)

Anyway, with that fix to my dist.ini file, the AutoPrereqs plugin works as expected:

ParseStatusSpreadsheeets[master]$ dzil listdeps

Recompiling glibc with a diff patch

Update: don’t do this, Pat did it for you!

Right after posting this, of course the official patched binaries hit the internet:


Wed Jan 28 19:23:00 UTC 2015
patches/packages/glibc-2.17-x86_64-10_slack14.1.txz: Rebuilt.

So the following is only to be used as a general howto, not as specific instructions.

How to

Yesterday I recompiled glibc with a few diff patches from a helpful post on LinuxQuestions. Today I refreshed that page to see if anything else new was happening on that bug, and there was a comment that applying a diff patch was difficult to do.

So I was inspired to write this post, because applying diff patches to glibc in Slackware is actually fairly easy (on a scale of 1 being “download a binary” and 10 being “compile and install glibc from scratch,” I’d put it at a 3).

What makes it so easy is that Slackware uses build scripts for each package, and these scripts are included when you download the slackware sources.

So here are the steps I took.

Download the Slackware glibc source directory

The first step is to get the source code you need to compile. Slackware is mirrored in a number of locations; I went to the Slackware main site and picked a mirror from the list there. Then, using a browser, navigate around the source tree to find what you’re looking for—in this case, the glibc sources. They live under source/l/ in both the 32-bit tree and the 64-bit tree (what I’m using). (Of course, if you’re using a different mirror, the hostname will change.)

Because I like rsync, and because the mirrors support rsync, I used that to download the source directory:

mkdir -p Downloads/slackware/source/l
rsync -av rsync:// Downloads/slackware/source/l/.

But you can also just right-click and save every file if you want. You can’t use wget as follows:

wget --mirror --no-parent

because at least for this mirror, the robots.txt file disallows using wget.

Download the patches

Next you need the patches. Go to the LinuxQuestions page linked above and download them:

cd Downloads/slackware/source/l/glibc

Then to make your life easier in the next step, use gzip to compress each diff file:

gzip glibc-2.17_CVE-2014-7817.diff
gzip glibc-2.17_CVE-2014-9402.diff
gzip glibc-2.17_CVE-2015-0235.diff

Modify the stock Slackware build script

The next step is to apply these patches by modifying the stock Slackware build script. Open up the file glibc.SlackBuild in your favorite text editor, and scroll down until you see the function that applies the patches. It should look like this:

# This is a patch function to put all glibc patches in the build script
# up near the top.
apply_patches() {
# Use old-style locale directories rather than a single (and strangely
# formatted) /usr/lib/locale/locale-archive file:
zcat $CWD/ | patch -p1 --verbose || exit 1
# The is_IS locale is causing a strange error about the "echn" command

Scroll to the very bottom of this function, and using the prior zcat ... statements as a guide, apply the three patches. I chose to apply them in the order listed in the posting on the Linux Questions post:

... (existing patches) ...
# various patches from GHOST alert thing
zcat $CWD/glibc-2.17_CVE-2014-9402.diff.gz | patch -p1 --verbose || exit 1
zcat $CWD/glibc-2.17_CVE-2015-0235.diff.gz | patch -p1 --verbose || exit 1
zcat $CWD/glibc-2.17_CVE-2014-7817.diff.gz | patch -p1 --verbose || exit 1

Notice that the patches are “unzipped” using zcat. That’s why I recommended gzipping the diff files. If you don’t do that, you have to mess around with the syntax of that line. We’re just lazily copypasting here, and changing a file name is the path of least effort. So gzip it, and use zcat just like all the other diff patches.

An additional edit is needed that will save you some pain. I like to run my build scripts using sudo, but apparently our fearless Slackware leader runs as root, or else has a different $PATH than I do. There are various calls to makepkg towards the end of the script. You should search and replace so that they all read /sbin/makepkg. If your $PATH is set like mine, your regular user account won’t be able to see the binaries in /sbin. The first time I ran this script, I didn’t get any packages at all for this reason.

So somehow, using your editor, search and replace.

Be careful…the very last call to makepkg already has the /sbin/ part. A double /sbin/sbin/ is not going to work!
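If you’d rather let the machine do the search and replace, here is a hedged sed sketch (the demo file below is a stand-in, not the real build script); the second expression collapses exactly the /sbin//sbin/ doubling warned about above:

```shell
# fix_makepkg: prefix bare makepkg calls with /sbin/, then collapse any
# /sbin//sbin/ doubling on lines that were already correct
fix_makepkg() {
    sed -e 's#makepkg#/sbin/makepkg#g' -e 's#/sbin//sbin/#/sbin/#g' "$1"
}

# demo on a stand-in file (use `sed -i` on the real glibc.SlackBuild)
printf 'makepkg -l y -c n $TMP/pkg.txz\n/sbin/makepkg -l y -c n $TMP/pkg.txz\n' > /tmp/demo.SlackBuild
fix_makepkg /tmp/demo.SlackBuild
```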

Finally, scroll to the very top of the build script and look for the line that sets the BUILD variable (something like BUILD=${BUILD:-7}). Change the 7 to an 8.

Finally finally, make sure that Pat hasn’t already done this work. If you don’t see a 7 as the BUILD variable, then chances are the appropriate patches have already been applied, and you should go download the binary.

Build the packages

The next step is to build the packages. This step must be done as the root user.

sudo ./glibc.SlackBuild

Go have a refreshing drink, this takes a while.

Upgrade glibc

The last step is to upgrade glibc. At the very end of the build process, you should see a message that says something like: “glibc packages built in /glibc-tmp-3ea66757c2278dca4f7e829eb4a941f7″. That is where the packages are.

Because I am neurotic, I usually drop to runlevel 1 before upgrading glibc.

So, as root:

sudo /sbin/telinit 1

This will shut things down and drop you into a console mode. Log in as root, change to the glibc-tmp directory, and upgrade the packages:

cd /glibc-tmp
/sbin/upgradepkg glibc*t?z

Most likely you will see at least one issue, with the “debug” package not installing. upgradepkg will only upgrade packages that are already installed. That is a good thing.

If the upgrade went well (only the “debug” and maybe the “profile” packages refusing to install), then you’re done.

Reboot, and you should be good to go.

If the upgrade didn’t go well, you’re on your own. Best advice is to force install the existing (version 7) glibc packages by using the --reinstall --install-new options to upgradepkg, so that your system isn’t completely unstable. Then dig down, figure out what broke, and write your own blog post.

Final words

Hosing your system is a time-honored tradition for part-time sysadmins like myself. I’ve done it many times. All it means is that I have an afternoon of work ahead of me to rebuild things. Building glibc can fail, and restarting a system with a broken glibc can be challenging. Make sure you have a rescue disk on hand if this is your first time doing it.

If you’re new to building glibc, there are many ways to completely screw up. Read through the glibc.SlackBuild carefully, and see all the hints that Pat has left, especially the note about needing sanitized kernel headers towards the top.

But don’t be afraid to break your system. Failure is a lesson learned.

Obsessed with cases

Ever since I bought a Lemolo Daypack, I’ve become mildly obsessed with quality bags and cases. Part of the reason was that I was looking for a decent, good looking pair of bike panniers. There isn’t a large market for panniers, so they’re a little bit hard to find. And when I did find them they tended to be made out of waterproof nylon with plastic buckles and clips. As I searched the internet, I slowly found lots of beautiful panniers, and a lot more beautiful bags and wallets. (In the end I bought the Lemolo Daypack, and a Lemolo Toolroll, but I still don’t have a decent pair of panniers.)

So that brings me to this evening’s entertainment. Today I pulled the trigger and ordered a new iphone. So now I need a case for it that doesn’t look like a clunky chunky clippy thing my daughter would have liked when she was 3. Mind you, I haven’t touched any of these cases, so I know nothing at all about whether I really want to plunk down my cash on these, but I thought I’d collect today’s bookmarks and thoughts in one place for posterity.

First off, carryology is a great resource, but as of right now, if you type iphone6 in their search tool you get back zip.

Second off, I haven’t owned a phone since my tiny sony-ericsson phone from 2002 (or thereabouts).

Third, my reason for wanting a case (your reasons will be different, of course) is that I like to toss my things in my backpack (my Lemolo Daypack, of course) or in my pockets with my keys and change and other abusive items. The case will need to prevent casual scrapes, scratches, and dings, and will need to guard against the occasional busted fountain pen. A bonus would be if the case could protect against the occasional drop, but the more common role will be minimizing wear and tear. Pretty much I’m thinking leather, but now that I think more about it, it might also be possible to do this with heavy duty canvas.

After searching for “leather handmade iphone6 case” and clicking through various links and references (again, carryology articles were great), I’ve lumped my choices into two broad categories: a sleeve design, or a bill fold approach. There is a third category of a shell or backing, such as Apple’s own leather case, but I don’t really see the point of that for my purposes. To protect the phone properly from getting dinked in a big pack or in a pocket, a backing has to be a bit bulky. Apple’s good looking leather backing leaves the screen wide open to scratches and dings.


A sleeve design has the advantage of being simple. You slide the phone in to keep it safe, and slide it out to use it. If you slide it in upside down, you can get access to the headphone port and listen to music. The problem with a sleeve is that it looks like it can be quite tricky to design the sleeve properly. The phone has to slide in easily, but not so easily that it slips out accidentally. It has to stay securely in the pouch, but must be easy to snag with just the tip of a finger. An errant, fat-fingered tug is going to send the phone for a tumble to the floor. My guess is that the more expensive products tend to incorporate slight variations in the design to allow for easy access, as well as tight tolerances to make sure the phone fits just right. For example, the Judas has a slight notch in the top edge of the leather, probably to make it easier to grab the phone. The filzstuek has a wool felt lining that will make it easier to slide the phone than raw leather. And the makr is a little hard to figure out without a picture with a phone in it, but I suspect that the two leather tabs at the top open up slightly below the top of the phone to allow for easy access.

Some excellent examples of sleeves:

There are many many more examples on Etsy to fit every taste and budget.


The wallet or bill fold style are the other approach I’ve been considering, and some of these apparently allow complete access to the phone’s functionality without having to extract the phone from its case.

  • The Grovemade cases look great. The cases are made of leather and wood. The leather wraps around the wood, and apparently even provides a handy stand, as shown in the pic below.
    the grovemade leather cover doubles as a stand

  • Pad & Quill offer two styles of cases that I’m considering. The Luxury Pocket Book is similar to the Grovemade ones, in that they are also made of wood and leather, and the leather wraps around the front of the phone and folds back for easy access to the phone’s active surface. However, unlike the Grovemade, the leather does not appear to create a stand. Still a very good looking case. My only quibble with the design is that fake bookmark.
    the pad&quill case looks like a little notebook

  • Pad & Quill also offer an all leather case called the Bella Fino. The ad copy appears to claim that they are using a nifty sticker tech to keep the phone stuck to the leather case. I’d like to see this and hold it in my hands to figure out what it does and whether it lives up to its claims, but it certainly looks like it makes for a low profile case.
    the pad&quill bella fino case

The deal breaker

When I was in Japan last spring, I had some time to kill in a department store while my daughters and wife inspected the luxe toilets. We were in the Ginza area of Tokyo, and this department store was holding a leather artisan event. I wandered around, and was really impressed by the beautiful wallets and card holders. I stood a bit too long in front of one gentleman’s stand, and he broke out some English that was better than my phrasebook Japanese. I really liked his card holder, but then my brain snapped on properly and I realized that 22,000円 was roughly $220! I politely asked for his card and shuffled away before I blew my travel budget.

When I got home, the cards and other souvenirs got dumped into a pile and forgotten. But a few weeks ago I came across this guy’s card and went to his website. Now I’m stuck, because what I really want is something like this wallet or this one, but sized a bit smaller to fit the iphone properly (they are a bit big). But I really don’t have $300 to spend on a wallet.
Can you imagine an iphone hiding in this wallet that looks like an envelope?

So the search continues. I’ve got a few weeks until the phone arrives. Perhaps I’ll find something when we’re wandering around NYC in 2 weeks.


My kids’ school is having a knitting fundraiser. They were going to call it knitapalooza, but I suggested knitapurlooza instead. I guess the idea is to knit squares, sew them up into blankets, and then mail them to a third world charity that will distribute them. Much better to have your kids hit up the neighbors for sponsorships to knit squares than it is to run laps as with a jog-a-thon or walk-a-thon.

Cast on

Over the weekend I cast on for a cowl using some yarn my sister got me from Germany. Cast on 300 stitches on my new circular 4.5mm needles (also from my sister’s trip), joined in the round, and started knitting in a 1×1 rib. My sister liked a cowl that she saw in this shop, knit on the same needles and with the same yarn, but the instructions she got (jotted down in German) were to cast on 78 or so stitches and then knit up in garter stitch. The problem with that is then at the end I’d have to graft together the beginning and the end, and the fact that I hate back and forth knitting in garter stitch—too boring for words.

My goal is to do a round or two a day, so 300 to 600 stitches per day.

I’m surprised how weak my hands are. Pinkies and ring fingers on both hands are griping about being sore.

A real-world use of PL/Perl

Last week I wrote a node.js program to parse and copy a CSV file into PostgreSQL. The data included several columns of detector data, and then a catch-all column called XML that was supposed to contain the raw read from the detector. The XML column was a big old ASCII escaped blob of text, and I just ignored it and stuffed it into its own table.

Unfortunately, as is always the case with these things, the XML column wasn’t XML at all. Instead, it contained what looked like a Perl object dumped using Data::Dumper. I couldn’t easily rewrite my node.js program to break up that Perl object, and I certainly didn’t want to rewrite my well-tested node.js program in Perl.

Enter PL/Perl.

I’ve never really had a need for PL/Perl. The PostgreSQL documentation page promotes the ability to use Perl’s string-munging facilities. But here I had an even simpler use case. I just want to call out to Perl, eval() the object, then stash the results.

The reason I’m writing this post is that I’ve never quite gotten the hang of how to use stored procedures in PostgreSQL. This is sort of a “note to my future self” containing some of the things I figured out, in case I forget.

First, the initial program I wrote looks like this:

CREATE OR REPLACE FUNCTION perl_xml_segment_decoder (TEXT) RETURNS bt_xml_segment AS $$
    use strict;
    my $unescape = sub {
        my $escaped = shift;
        $escaped =~ s/%u([0-9a-f]{4})/chr(hex($1))/eig;
        $escaped =~ s/%([0-9a-f]{2})/chr(hex($1))/eig;
        return $escaped;
    }; # borrowed from URI::Escape::JavaScript

    my $chars = $unescape->( $_[0] );
    my $VAR1;
    eval $chars; # the unescaped text is Data::Dumper output; eval populates $VAR1

    # clean up some entries we are not using
    my $segment = $VAR1->{'segment'};
    $segment->{'ts'} = $segment->{'Timestamp'};
    # keys, lowercased to match the bt_xml_segment columns defined below
    my %bar = map { lc $_ => $segment->{$_} } qw{
        SegmentID FromLocationID ToLocationID Route GroupBy ProjectID
        ts NumTrips Speed Distance EstimatedTimeTaken TravelTime
    };
    return \%bar;
$$ LANGUAGE plperl;

This takes in one of the “XML” strings, and returns a column type bt_xml_segment that is defined by:

CREATE TABLE bt_xml_segment (
  segmentid      integer primary key,
  fromlocationid integer REFERENCES bt_xml_location (locationid),
  tolocationid   integer REFERENCES bt_xml_location (locationid),
  route          varchar(128),
  groupby        integer,
  projectid      integer REFERENCES bt_xml_project (projectid),
  ts    timestamp with time zone not null,
  numtrips       integer,
  speed          numeric,
  distance           numeric,
  estimatedtimetaken numeric,
  traveltime         numeric
);
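As an aside, the $unescape helper above is a transliteration of JavaScript’s legacy unescape() behavior (hence the URI::Escape::JavaScript credit). A minimal sketch of the same %uXXXX / %XX decoding in plain JavaScript, with a function name of my own choosing:

```javascript
// Sketch of the same decoding the Perl $unescape helper does.
// unescapePercent is a hypothetical name, for illustration only.
function unescapePercent(escaped) {
    return escaped
        .replace(/%u([0-9a-fA-F]{4})/g, function (m, hex) {
            return String.fromCharCode(parseInt(hex, 16));  // %uXXXX escapes
        })
        .replace(/%([0-9a-fA-F]{2})/g, function (m, hex) {
            return String.fromCharCode(parseInt(hex, 16));  // %XX escapes
        });
}

unescapePercent('%24VAR1%20%3D%20%7B');  // "$VAR1 = {"
```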

One thing I’ve never gotten the hang of is how to call functions. Following the docs, I can call this function as follows:

select * from  perl_xml_segment_decoder('%24VAR1%20%3D%20%7B%0A%20%20%27location%27%20%3D%3E%20%7B%0A%20%20%20%20%27Active%27%20%3D%3E%201%2C%0A%20%20%20%20%27LastCheckin%27%20%3D ... %20%20%27TravelTime%27%20%3D%3E%20%27356.285714285714%27%0A%20%20%7D%0A%7D%3B%0A');

and I would get back a lovely tabular output like this:

 segmentid | fromlocationid | tolocationid | route | groupby | projectid |           ts           |  numtrips |      speed       | distance | estimatedtimetaken |    traveltime    
      4558 |           3481 |         3472 | SR-39 |      15 |       672 | 2014-07-15 17:30:00-07 |       14 | 8.04274565301844 |      0.8 |                 86 | 356.285714285714
(1 row)

But the semantics of that call are strange to me. What the query says is to treat the function like it is a table. This is reasonable, but what I want to do is call the function on each row of another table, like so:

select perl_xml_segment_decoder(xml.data) from perlhash as xml;

But that returns the result as a single composite value per row:

 (4558,3481,3472,SR-39,15,672,"2014-07-15 17:30:00-07",14,8.04274565301844,0.8,86,356.285714285714)
(1 row)

This is more difficult to use in an INSERT clause. While I could contort that, and make it work, I decided to instead just keep the function as a function, and include the query to the XML data table within the function. Again, the excellent PostgreSQL docs are quite helpful, and explain how to query a table from Perl and then iterate over each returned row. My new function looks like this:

CREATE OR REPLACE FUNCTION perl_xml_segment_obs_decoder () RETURNS setof bt_xml_observation AS $$
    use strict;
    my $unescape = sub {
        my $escaped = shift;
        $escaped =~ s/%u([0-9a-f]{4})/chr(hex($1))/eig;
        $escaped =~ s/%([0-9a-f]{2})/chr(hex($1))/eig;
        return $escaped;
    }; # borrowed from URI::Escape::JavaScript

    my $sth = spi_query("SELECT * FROM perlhash");
    while ( defined( my $row = spi_fetchrow($sth) ) ) {
        my $chars = $unescape->( $row->{data} );
        my $VAR1;
        eval $chars; # populates $VAR1 from the Data::Dumper text

        # clean up some entries we are not using
        my $segment = $VAR1->{'segment'};
        $segment->{'ts'} = $segment->{'Timestamp'};
        # keys, lowercased to match the bt_xml_observation columns defined below
        my %bar = map { lc $_ => $segment->{$_} } qw{
            SegmentID ts NumTrips Speed Distance EstimatedTimeTaken TravelTime
        };
        $bar{data_ts}         = $row->{ts};
        $bar{radar_lane_id}   = $row->{radar_lane_id};
        $bar{station_lane_id} = $row->{station_lane_id};
        return_next \%bar;
    }
    return undef;
$$ LANGUAGE plperl;

Because I’m actually following along with my git commits, and because I was refactoring and tuning my relational database tables as I developed, this function returns a different table type than before:

CREATE TABLE bt_xml_observation(
  segmentid      integer not null references bt_xml_segment(segmentid),
  ts    timestamp with time zone not null,
  data_ts timestamp with time zone not null,
  radar_lane_id integer,
  station_lane_id integer,
  numtrips       integer,
  speed          numeric,
  distance           numeric,
  estimatedtimetaken numeric,
  traveltime         numeric,
  primary key(segmentid,ts,data_ts,radar_lane_id,station_lane_id),
  foreign key (data_ts,radar_lane_id,station_lane_id) references smartsig.bluetooth_data(ts,radar_lane_id,station_lane_id)
);

I use this function within an insert statement, as follows:

insert into bt_xml_observation  (select  * from perl_xml_segment_obs_decoder()) ;

In some cases (when populating the segments and location tables, for example), the output of the function includes duplicates. Rather than handle them in the Perl code using a hash or something, I decided to keep the PL/Perl simple and use SQL to remove duplicates. My query for loading up the segments table (the 8 unique segments about which the data was collected) is:

insert into smartsig.bt_xml_segment  (select distinct * from smartsig.perl_xml_segment_decoder()) ;
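For contrast, the in-code alternative I decided against (deduplicating with a hash before returning rows) amounts to something like the following; dedupeBy and the sample rows are mine, for illustration only:

```javascript
// Hash-based de-duplication: the approach that SELECT DISTINCT replaces.
// dedupeBy and the sample rows are hypothetical, for illustration.
function dedupeBy(rows, keyFn) {
    var seen = {};
    return rows.filter(function (row) {
        var key = keyFn(row);
        if (seen[key]) return false;  // already emitted this key
        seen[key] = true;
        return true;
    });
}

var rows = [{segmentid: 4558}, {segmentid: 4558}, {segmentid: 4559}];
var unique = dedupeBy(rows, function (r) { return r.segmentid; });  // 2 rows
```

Pushing the deduplication into SQL keeps the PL/Perl function a dumb decoder, which is the design choice the post settles on.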

Finally, I expanded my node.js code to make use of these functions. Each data file (representing an hour of data) was 18MB. My code loads up one file, saves the XML/Perl hash data into a “TEMP” table, and then uses that table to populate the observations. The insert statements use WITH clauses to query the functions, as well as to join those calls with the existing data so as to avoid the error of inserting duplicates. Finally, my code is careful to populate the tables in order so that the various foreign key constraints are satisfied. (Note that I like to build my SQL statements as an array that I then “join” together. I do that in whatever language I’m programming in because it makes it easy to slot in dynamic variables, print diagnostic output, etc.)

        // essentially, I have to do these in order:

        var insert_statements = []
        insert_statements.push(["with"
            ,"a as ("
            ,"  select distinct * from perl_xml_project_decoder_from_location()"
            ,"),"
            ,"b as ("
            ,"  select a.*"
            ,"  from a"
            ,"  left outer join bt_xml_project z USING (projectid)"
            ,"  where z.projectid is null"
            ,")"
            ,"insert into bt_xml_project (projectid,title) (select projectid,title from b)"
        ].join(' '))
        insert_statements.push(
            ["with a as ("
             ,"select aa.*,count(*) as cnt from perl_xml_location_decoder_from_location() aa"
             ,"left outer join bt_xml_location z USING(locationid)"
             ,"where z.locationid is null"
             ,"group by aa.locationid,aa.locationname,aa.latitude,aa.longitude,aa.projectid"
             ,"),"
             ,"b as ("
             ,"select locationid,locationname,latitude,longitude,projectid,"
             ,"rank() OVER (PARTITION BY locationid ORDER BY cnt DESC) AS pos"
             ,"from a"
             ,")"
             ,"insert into bt_xml_location (locationid,locationname,latitude,longitude,projectid)"
             ,"(select locationid,locationname,latitude,longitude,projectid"
             ,"from b"
             ,"where pos=1)"].join(' '))
        insert_statements.push(
            ["with a as (select distinct aa.* from perl_xml_segment_decoder() aa"
            ,"left outer join bt_xml_segment z USING(segmentid)"
            ,"where z.segmentid is null)"
            ,"insert into bt_xml_segment (segmentid,fromlocationid,tolocationid,route,groupby,projectid)"
            ,"(select segmentid,fromlocationid,tolocationid,route,groupby,projectid from a)"
        ].join(' '))
        insert_statements.push(
            'insert into bt_xml_observation (select * from perl_xml_segment_obs_decoder())'
        )

        var q = queue(1);  // using queue with parallelism
                           // of 1 to make sure each task
                           // executes in order

        insert_statements.forEach(function(statement) {
            q.defer(function(cb) {
                // client is the PostgreSQL client used earlier in the program
                client.query(statement
                             ,function (err, result) {
                                 return cb(err)
                             })
                return null
            })
        })
        q.awaitAll(function(error, results) {
            //console.log("all done with insert statements")
            return callback()
        })
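In case the queue library is unfamiliar: what queue(1) plus awaitAll amounts to here is running callback-style tasks strictly one after another, then firing a final callback. A dependency-free sketch (runInOrder is my stand-in, not the library’s API):

```javascript
// Minimal stand-in for queue(1) + defer + awaitAll: run callback-style
// tasks one at a time, in order, then invoke a final callback.
// runInOrder is hypothetical; the post uses the queue library instead.
function runInOrder(tasks, done) {
    var i = 0;
    (function next(err) {
        if (err || i === tasks.length) return done(err || null);
        tasks[i++](next);  // each task calls next(err) when finished
    })(null);
}

// usage: each "insert statement" becomes a task taking a callback(err)
var order = [];
runInOrder([
    function (cb) { order.push('first insert'); cb(null); },
    function (cb) { order.push('second insert'); cb(null); }
], function (err) { /* all statements done */ });
```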


And there you have it: a node.js program that runs SQL queries that use Perl code embedded in PL/Perl functions.

The gory details can be found in my github repo for this.