Another note to my future self on DBIx::Class

I’ve been writing a lot of javascript, and I really like node.js. I like lots of languages, but I find that node.js tends to work how I expect.

That said, sometimes I need to use perl. Last week, after some searching and testing out libraries, I was generally dissatisfied with the node.js packages available for parsing spreadsheets. The node.js way is to be non-blocking and streaming, but I couldn’t find a package that handled old and new spreadsheets that was either non-blocking or streaming (or both). Faced with that, I’d much rather use the tried, true, and extremely well tested Spreadsheet::Read perl module. It is also blocking, but at least it is pretty much guaranteed to work.

So using perl to parse a spreadsheet means I also had to dust off my database code to put the parsed results into my database. Since my last round of perl programming, I’ve gotten much more diligent about testing things as I hack, and writing much smaller modules. So I’m writing a small module to save a list of data to the database. Pretty simple with DBIx::Class.

Creating a test database from Perl

One wrinkle came in testing my code. What I normally do in node.js (with mocha) is to write a little “before” script that creates a database, and then a little “after” script that tears it down. Then all the testing code can write and delete without worrying about bombing the production db, and without requiring me to manually create and delete databases.

The missing link for me (and the purpose of this blog post) was how to create a database and slot in tables from perl and DBIx::Class.

My final solution is a hack of sorts. Instead of being creative, I just dropped down to DBD::Pg and issued a “create database” command directly. My code looks like this:

# create a test database

use Carp;    # provides carp and croak, used below
use DBI;

my $host = $ENV{PGHOST} || '127.0.0.1';
my $port = $ENV{PGPORT} || 5432;
my $db = $ENV{PGTESTDATABASE} || 'test_db';
my $user = $ENV{PGTESTUSER} || $ENV{PGUSER} || 'postgres';
my $pass =  '';

my $admindb = $ENV{PGADMINDATABASE} || 'postgres';
my $adminuser = $ENV{PGADMINUSER} || 'postgres';


# RaiseError makes DBI die on failure, so the evals actually catch errors
my $dbh;
eval{
    $dbh = DBI->connect("dbi:Pg:dbname=$admindb", $adminuser, undef,
                        { RaiseError => 1 });
};
if($@) {
    croak $@;
}
my $create = "create database $db";
if($user ne $adminuser){
    $create .= " with owner $user";
}
eval {
    $dbh->do($create);
};
if($@) {
    # report the failure instead of silently swallowing it
    carp "create database $db failed: $@";
}

That works fine, and is mirrored at the end of the test with a similar $dbh->do("drop database $db"); statement. Sadly, I can’t remember how to do before and after type blocks in perl tests. I seem to remember doing them long ago, but the semantics escape me. Like the subjunctive tense in Italian.

Creating test tables using DBIx::Class

Now the next step that tripped me up was populating the few tables I need for the tests. I have a large crufty db, and lazily used an automated script to create my DBIx::Class schema from the existing PostgreSQL tables. But running $schema->deploy() didn’t work because I have views and so on that muck things up. I really only need two tables for my current spreadsheet data save tests, so I only wanted to deploy() those two tables.

The documentation says:

Additionally, the DBIx::Class parser accepts a sources parameter as a
hash ref or an array ref, containing a list of source to deploy. If
present, then only the sources listed will get deployed.

That’s great, but I couldn’t find any examples of exactly what that meant. So I tried a few things, and one thing worked, and so here I am writing a note to my future self (and anyone else who lands on this page).

My database has multiple postgresql schemas, and so my DBIx::Class schema generation script took that into account. That needs its own documentation, but essentially what I did was:

{"schema_class":"Testbed::Spatial::VDS::Schema",

 "connect_info":{
     "dsn":"dbi:Pg:dbname=spatialvds;host=localhost",
     "user":"myuser"
 },
 "loader_options":{
     "dump_directory": "./lib",
     "db_schema": ["public","hsis","wim","newctmlmap","tempseg"],
     "debug":false,
     "moniker_parts":  ["schema", "name"],
     "moniker_part_separator":  "::",
     "naming": {"ALL":"v8", "force_ascii" : true}
 }
}

The super long class name of Testbed::Spatial::VDS::Schema is cruft from the distant past, but not too difficult to work with. The upshot is that my schema definitions are buried in a directory structure rooted at ./lib/Testbed/Spatial/VDS/Schema/Result/[Public,Hsis,Wim,...]. The two packages that I want to deploy for my tests are called Testbed::Spatial::VDS::Schema::Result::Public::WimStatus and Testbed::Spatial::VDS::Schema::Result::Public::WimStatusCode.

So.

To deploy just these two tables, I first wrote the fully qualified package names as the “sources” option. But that didn’t work. Then I remembered that when using DBIx::Class, usually you just refer to the different classes (representing tables) by everything after the Result namespace. So in this case, I could drop the Testbed::Spatial::VDS::Schema::Result:: prefix from both names. My final, working bit of code is:

## deploy via DBIx::Class

use Testbed::Spatial::VDS::Schema;

my $schema = Testbed::Spatial::VDS::Schema->connect(
    "dbi:Pg:dbname=$db;host=$host;port=$port",
    $user,
    );

## deploy just the tables I'm going to be accessing during testing

my $deploy_result;
eval{
    $deploy_result =  $schema->deploy(
        { 'sources'=>["Public::WimStatus",
                      "Public::WimStatusCode"]});
};
if($@) {
    carp 'test db deploy failed';
    croak $@;
}

One final note to my future self. I never like passing passwords around in my programs. What I’ve found is that PostgreSQL uses a .pgpass file, documented in the PostgreSQL manual. As long as the host, port, database name, and username match one of the lines in that file, it will pull out the correct password. The file has to be chmod 0600 (libpq ignores it otherwise), so it is less likely to get accidentally read by someone else, and because it lives in my home directory rather than in the project, it won’t get slurped up into a git repository. Because Perl’s DBD::Pg uses the PostgreSQL C library, it automatically inherits this behavior. So with Postgres, you should never need to put passwords into command lines or environment variables or source code.
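For the record, here is a minimal sketch of setting up such a password file from the shell. It writes to a scratch directory and points PGPASSFILE at it, so it can be tried without touching a real ~/.pgpass. Every value in the entry (host, port, database, user, password) is a placeholder, not something from my setup.

```shell
# Sketch only: use a scratch directory instead of the real ~/.pgpass.
pgpass_dir=$(mktemp -d)
pgpass_file="$pgpass_dir/pgpass"

# Create the file with owner-only permissions before any secret goes in.
( umask 077 && : > "$pgpass_file" )

# One entry per line, in the form hostname:port:database:username:password
# (placeholder values throughout).
printf '%s\n' '127.0.0.1:5432:test_db:postgres:not-a-real-password' >> "$pgpass_file"
chmod 0600 "$pgpass_file"

# libpq reads the file named by PGPASSFILE if set, otherwise ~/.pgpass.
export PGPASSFILE="$pgpass_file"

ls -l "$PGPASSFILE"
```

Note that PGPASSFILE only holds a path, not the password itself, so it doesn't break the "no passwords in environment variables" rule.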

how to yf sl 1, k2tog, psso

As I usually do after publishing a new post, I scanned my previous hits and such and found a new search term that somehow hit my blog: “how to yf sl 1, k2tog, psso”.

Well, I know how to do that, and I even have a feeling that the search string contains a typo! So to help out future people who are accidentally thrown here searching for how to knit, here’s my mini tutorial.

yf sl 1, k2tog, psso

Taken as a whole, this instruction will reduce three stitches down to one stitch in the next row. The “standard” way to do this is not to bring the working yarn forward (the yf part). This is a non-standard way of doing this three into one reduction. Obviously the pattern author has some reason for requesting that the yarn be brought forward, but for my money it doesn’t make much difference in the final product (see my pictures at the end of this article).

The next sections will break down each part of this instruction.

yf sl 1

Translated, yf sl 1 means “Yarn forward, slip one.” There are two parts to this “move”—first to bring the yarn forward, then to slip one.

yf or Yarn Forward

Ordinarily when one knits, the yarn is held in the back of the work. When doing purl stitches, the yarn is held in front of the work. So the instruction yf or “yarn forward” means to bring the yarn in front of your work as if you are about to purl.

Typo?

Usually, before a reduction like this, you do a yarn over, or “yo”, not a yarn forward.

sl 1 or Slip One

Slipping a stitch is simply moving one stitch from the left needle to the right needle without working it. There are two ways to slip. First, you can stick the right needle into the next stitch as if you are about to knit. Second you can stick the right needle into the next stitch as if you are about to purl.

The first way (sticking your needle into the next stitch as if you are about to knit) is the “non-standard” way to slip a stitch. This way of slipping results in a twisted stitch. If the pattern author wants you to slip in this way, it is usually indicated by a note that says “slip one as if to knit” or something similar. The only exception to this is when performing Barbara Walker’s SSK move, or “slip-slip-knit”. In that move, both the slips are done as if to knit.

The second way (sticking your right needle into the next stitch as if to purl) is the “standard” way to slip a stitch. If the instructions say “slip 1” or similar, with no other explanation, you can safely assume that you should slip as if to purl. This way of slipping the stitch does not produce a twist in the stitch.

k2tog or Knit Two Together

Just as the instructions suggest, this move asks you to knit two stitches together. One thing that the instructions leave out is the fact that you’re supposed to move the working yarn back to the back of the work.

To knit together two stitches, you insert your needle into the second and next stitch at the same time, and then knit one stitch from the working yarn. Pictures are easier than words here. One tip to a novice knitter or one who happens to knit very tightly is that sometimes it is easier to do this move if you first put your right needle into both stitches purl-wise to loosen up both stitches.

psso or Pass Slipped Stitch Over

Take the left needle and stick it into the stitch you passed to the right needle earlier. Lift that stitch up and carefully move it over the stitch you just made. Drop it off the end of the needle and it will fall neatly into place around the stitch you just made, facing the opposite direction of the two stitches you knit together.

Typo.

So I tried this and it looks ugly. The yf is clearly wrong, or the designer is being extra cutesy for no reason at all. I think the instruction and the search string should have been yo, sl1-k2tog-psso.

Non-obvious fix to a dzil problem

Last Friday I decided to skip trying to use node.js to parse spreadsheet files and instead stick with my existing perl solution based on Spreadsheet::Read. Because the code was really old, I had no proper tests, so I just started over from scratch. Poking around Modern Perl 2014 I found a note about using dzil to set up packages.

So I followed along with the choose-your-own-adventure style documentation at http://dzil.org/tutorial/start.html and had good success with setting things up. I rewrote my old code using Moose and immutable state and all that great stuff, and wrote pretty thorough test coverage of the various conditions and edge cases I can think of at the moment.

All was going well until I tried to set up the [AutoPrereqs] plugin.
When asked to compute the dependencies automatically, dzil choked on my binary spreadsheet files stored in the ./t/files directory:

ParseStatusSpreadsheeets[master]$ dzil listdeps
Could not decode UTF-8 t/files/07-2009.xls; filename set by GatherDir 
(Dist::Zilla::Plugin::GatherDir line 215); encoded_content added by 
GatherDir (Dist::Zilla::Plugin::GatherDir line 216); error was: utf8 
"\xD0" does not map to Unicode at /usr/lib64/perl5/Encode.pm line 176.

I can easily set the [Prereqs] configuration to list my dependencies manually, but I wanted to do it automatically. I couldn’t believe I was the only person to have binary files mucking up the AutoPrereqs plugin, but the documentation was not helpful at all. The only hint given was to use a custom FileFinder, with no help on what exactly a FileFinder was or how to set one up in the config file.

Eventually I searched for “binary file” in the Github issues, and found this old bug: https://github.com/rjbs/Dist-Zilla/issues/407. The solution I found there is to tell dzil to ignore files as being binary by listing them in the config file. In my case, that fix works out to be:

[Encoding]
encoding = bytes
match    = xls    ; these are all spreadsheet test files

[AutoPrereqs]

(Another thing not mentioned in the dzil docs is that when they say that “match” is a regex, they don’t mean that you should write match = /xls/i because that won’t work! The config file isn’t perl, it is text that gets manipulated by perl.)
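One more thing worth knowing about a bare pattern like xls: it is unanchored, so it matches anywhere in the file name. The sketch below uses grep -E as a stand-in for the Perl regex engine (for patterns this simple the two agree). Only the first file name comes from the error message above; the other two are made up for illustration. I have not tried the anchored form in dist.ini myself.

```shell
# File list for illustration; only the first name is real.
files='t/files/07-2009.xls
t/files/07-2009.xlsx
lib/Parse/xlsutil.pm'

# Unanchored pattern, as in "match = xls": matches all three names.
loose=$(printf '%s\n' "$files" | grep -E 'xls')

# Anchored to a literal ".xls" suffix: matches only the old-format spreadsheet.
strict=$(printf '%s\n' "$files" | grep -E '\.xls$')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

For my test files the loose match is what I want, since it also catches .xlsx files, but it would also treat a source file with xls in its name as binary.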

Anyway, with that fix to my dist.ini file, the AutoPrereqs plugin works as expected:

ParseStatusSpreadsheeets[master]$ dzil listdeps
Carp
Data::Dumper
DateTime::Format::DateParse
DateTime::Format::Pg
ExtUtils::MakeMaker
Moose
namespace::autoclean
Spreadsheet::Read
strict
Test::More
warnings

Recompiling glibc with a diff patch

Update: don’t do this, Pat did it for you!

Right after posting this, of course the official patched binaries hit the internet:

From: http://www.slackware.com/changelog/stable.php?cpu=x86_64

Wed Jan 28 19:23:00 UTC 2015
patches/packages/glibc-2.17-x86_64-10_slack14.1.txz: Rebuilt.

So the following is only to be used as a general howto, not as specific instructions.

How to

Yesterday I recompiled glibc with a few diff patches from the helpful post on linux questions here. Today I refreshed that page to see if anything else new was happening on that bug, and there was a comment that applying a diff patch was difficult to do.

So I was inspired to write this post because in fact applying diff patches to glibc in Slackware is actually fairly easy (on a scale of 1 being download a binary and 10 being compile and install glibc from scratch, I’d put it at a 3).

What makes it so easy is that Slackware uses build scripts for each package, and these scripts are included when you download the slackware sources.

So here are the steps I took.

Download the Slackware glibc source directory

The first step is to get the source code you need to compile. Slackware is available from a number of locations. I went to the slackware main site and picked a mirror from the list at http://mirrors.slackware.com/mirrorlist/. I chose http://mirrors1.kernel.org. Next using a browser, navigate around the source tree to find what you’re looking for—in this case, glibc sources. For 32-bit slackware, they are located at http://mirrors1.kernel.org/slackware/slackware-14.1/source/l/glibc/. For 64-bit Slackware (what I’m using) I went to http://mirrors1.kernel.org/slackware/slackware64-14.1/source/l/glibc/. (Of course, if you’re using a different mirror, the first bit with “kernel.org” would change)

Because I like rsync, and because the kernel.org mirrors support rsync, I used that to download the source directory:

mkdir -p Downloads/slackware/source/l
rsync -av rsync://mirrors1.kernel.org/slackware/slackware64-14.1/source/l/glibc Downloads/slackware/source/l/.

But you can also just right-click and save every file if you want. You can’t use wget as follows:

wget --mirror --no-parent http://mirrors1.kernel.org/slackware/slackware64-14.1/source/l/glibc/

because at least for this mirror, the robots.txt file disallows using wget.

Download the patches

Next you need the patches. Go to the page linked above, and download the patches:

cd Downloads/slackware/source/l/glibc
wget http://sf.net/projects/mancha/files/sec/glibc-2.17_CVE-2014-7817.diff
wget http://sf.net/projects/mancha/files/sec/glibc-2.17_CVE-2014-9402.diff
wget http://sf.net/projects/mancha/files/sec/glibc-2.17_CVE-2015-0235.diff

Then to make your life easier in the next step, use gzip to compress each diff file:

gzip glibc-2.17_CVE-2014-7817.diff
gzip glibc-2.17_CVE-2014-9402.diff
gzip glibc-2.17_CVE-2015-0235.diff

Modify the stock Slackware build script

The next step is to apply these patches by modifying the stock Slackware build script. Open up the file glibc.SlackBuild in your favorite text editor, and scroll down until you see the function that applies the patches. It should look like this:

# This is a patch function to put all glibc patches in the build script
# up near the top.
apply_patches() {
# Use old-style locale directories rather than a single (and strangely
# formatted) /usr/lib/locale/locale-archive file:
zcat $CWD/glibc.locale.no-archive.diff.gz | patch -p1 --verbose || exit 1
# The is_IS locale is causing a strange error about the "echn" command
...

Scroll to the very bottom of this function, and using the prior zcat ... statements as a guide, apply the three patches. I chose to apply them in the order listed in the Linux Questions post:

... (existing patches) ...
# various patches from GHOST alert thing
zcat $CWD/glibc-2.17_CVE-2014-9402.diff.gz | patch -p1 --verbose || exit 1
zcat $CWD/glibc-2.17_CVE-2015-0235.diff.gz | patch -p1 --verbose || exit 1
zcat $CWD/glibc-2.17_CVE-2014-7817.diff.gz | patch -p1 --verbose || exit 1
}

Notice that the patches are “unzipped” using zcat. That’s why I recommended gzipping the diff files. If you don’t do that, you have to mess around with the syntax of that line. We’re just lazily copypasting here, and changing a file name is the path of least effort. So gzip it, and use zcat just like all the other diff patches.

An additional edit is needed that will save you some pain. I like to run my build scripts using sudo, but apparently our fearless Slackware leader runs as root, or else has a different $PATH than I do. There are various calls to makepkg towards the end of the script. You should search and replace to make them all read /sbin/makepkg. If your $PATH is set like mine, your regular user account won’t be able to see the binaries in /sbin. The first time I ran this script, I didn’t get any packages at all for this reason.

So somehow, using your editor, search and replace.

Be careful…the very last call to makepkg already has the /sbin/ part. A double /sbin/sbin/ is not going to work!
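If you would rather not do this by hand, a sed expression can add the prefix only where it is missing. This is a sketch run against a small stand-in file in a scratch directory; the three makepkg lines are illustrative, not copied from the real glibc.SlackBuild, so check the result with diff before trusting it on the real script.

```shell
# Stand-in file with made-up makepkg calls (two bare, one already prefixed).
workdir=$(mktemp -d)
sample="$workdir/sample.SlackBuild"
cat > "$sample" <<'EOF'
makepkg -l y -c n /tmp/example-a.txz
  makepkg -l y -c n /tmp/example-b.txz
/sbin/makepkg -l y -c n /tmp/example-c.txz
EOF

# Prefix makepkg with /sbin/ unless it is already preceded by a slash.
# The group captures the start of line or the preceding non-slash character
# and puts it back, so indentation is preserved.
sed -E 's#(^|[^/])makepkg#\1/sbin/makepkg#g' "$sample" > "$sample.new"

cat "$sample.new"
```

All three lines should end up calling /sbin/makepkg, with no doubled /sbin/sbin/ on the last one.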

Finally, scroll to the very top of the build script, and look for the line that says

BUILD=${BUILD:-7}

Change the 7 to an 8.

Finally finally, make sure that Pat hasn’t already done this work. If you don’t see a 7 as the BUILD variable, then chances are the appropriate patches have already been applied, and you should go download the binary.
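The check-then-bump step can also be scripted. The sketch below works on a one-line stand-in file in a scratch directory rather than the real build script, and only changes the build number if it is still 7.

```shell
# Stand-in copy containing just the BUILD line from the stock script.
workdir=$(mktemp -d)
script="$workdir/glibc.SlackBuild"
printf '%s\n' 'BUILD=${BUILD:-7}' > "$script"

# Only bump the build number if it is still the stock value.
if grep -q '^BUILD=\${BUILD:-7}$' "$script"; then
    sed 's/^BUILD=\${BUILD:-7}$/BUILD=${BUILD:-8}/' "$script" > "$script.new"
    mv "$script.new" "$script"
    echo "bumped BUILD to 8"
else
    echo "BUILD is not 7; look for an official patched package first"
fi

grep '^BUILD=' "$script"
```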

Build the packages

The next step is to build the packages. This step must be done as the root user.

sudo ./glibc.SlackBuild

Go have a refreshing drink, this takes a while.

Upgrade glibc

The last step is to upgrade glibc. At the very end of the build process, you should see a message that says something like: “glibc packages built in /glibc-tmp-3ea66757c2278dca4f7e829eb4a941f7”. That is where the packages are.

Because I am neurotic, I usually drop to runlevel 1 before upgrading glibc.

So, as root

sudo /sbin/telinit 1

This will shut things down and drop you into a console mode. Log in as root, change to the glibc-tmp directory, and upgrade the packages:

cd /glibc-tmp-3ea66757c2278dca4f7e829eb4a941f7    # use the directory name your build reported
/sbin/upgradepkg glibc*t?z

Most likely you will see at least one issue, with the “debug” package not installing. upgradepkg will only upgrade packages that are already installed. That is a good thing.

If the upgrade went well (only the “debug” and maybe the “profile” packages refusing to install), then you’re done.

Reboot, and you should be good to go.

If the upgrade didn’t go well, you’re on your own. Best advice is to force install the existing (build 7) glibc packages by using the --reinstall --install-new options to upgradepkg, so that your system isn’t completely unstable. Then dig down, figure out what broke, and write your own blog post.

Final words

Hosing your system is a time-honored tradition for part-time sysadmins like myself. I’ve done it many times. All it means is that I have an afternoon of work ahead of me to rebuild things. Building glibc can fail, and restarting a system with a broken glibc can be challenging. Make sure you have a rescue disk on hand if this is your first time doing it.

If you’re new to building glibc, there are many ways to completely screw up. Read through the glibc.SlackBuild carefully, and see all the hints that Pat has left, especially the note about needing sanitized kernel headers towards the top.

But don’t be afraid to break your system. Failure is a lesson learned.

Obsessed with cases

Ever since I bought a Lemolo Daypack, I’ve become mildly obsessed with quality bags and cases. Part of the reason was that I was looking for a decent, good looking pair of bike panniers. There isn’t a large market for panniers, so they’re a little bit hard to find. And when I did find them they tended to be made out of waterproof nylon with plastic buckles and clips. As I searched the internet, I slowly found lots of beautiful panniers, and a lot more beautiful bags and wallets. (In the end I bought the Lemolo Daypack, and a Lemolo Toolroll, but I still don’t have a decent pair of panniers.)

So that brings me to this evening’s entertainment. Today I pulled the trigger and ordered a new iphone. So now I need a case for it that doesn’t look like a clunky chunky clippy thing my daughter would have liked when she was 3. Mind you, I haven’t touched any of these cases, so I know nothing at all about whether I really want to plunk down my cash on these, but I thought I’d collect today’s bookmarks and thoughts in one place for posterity.

First off, carryology is a great resource, but as of right now, if you type iphone6 in their search tool you get back zip.

Second off, I haven’t owned a phone since my tiny sony-ericsson phone from 2002 (or thereabouts).

Third, my reason for wanting a case (your reasons will be different, of course) is that I like to toss my things in my backpack (my Lemolo Daypack, of course) or in my pockets with my keys and change and other abusive items. The case will need to prevent casual scrapes, scratches, and dings, and will need to guard against the occasional busted fountain pen. A bonus would be if the case could protect against the occasional drop, but the more common role will be minimizing wear and tear. Pretty much I’m thinking leather, but now that I think more about it, it might also be possible to do this with heavy duty canvas.

After searching for “leather handmade iphone6 case” and clicking through various links and references (again, carryology articles were great), I’ve lumped my choices into two broad categories: a sleeve design, or a bill fold approach. There is a third category of a shell or backing, such as Apple’s own leather case, but I don’t really see the point of that for my purposes. To protect the phone properly from getting dinked in a big pack or in a pocket, a backing has to be a bit bulky. Apple’s good looking leather backing leaves the screen wide open to scratches and dings.

Sleeves

A sleeve design has the advantage of being simple. You slide the phone in to keep it safe, and slide it out to use it. If you slide it in upside down, you can get access to the headphone port and listen to music. The problem with a sleeve is that it looks like it can be quite tricky to design the sleeve properly. The phone has to slide in easily, but not so easily that it slips out accidentally. It has to stay securely in the pouch, but must be easy to snag with just the tip of a finger. An errant, fat-fingered tug is going to send the phone for a tumble to the floor. My guess is that the more expensive products tend to incorporate slight variations in the design to allow for easy access, as well as tight tolerances to make sure the phone fits just right. For example, the Judas has a slight notch in the top edge of the leather, probably to make it easier to grab the phone. The filzstuek has wool felt lining that will make it easier to slide the phone than raw leather. And the makr is a little hard to figure out without a picture with a phone in it, but I suspect that the two leather tabs at the top open up slightly below the top of the phone to allow for easy access.

Some excellent examples of sleeves:

There are many many more examples on Etsy to fit every taste and budget.

Wallets

The wallet or bill fold style is the other approach I’ve been considering, and some of these apparently allow complete access to the phone’s functionality without having to extract the phone from its case.

  • The Grovemade cases look great. The cases are made of leather and wood. The leather wraps around the wood, and apparently even provides a handy stand, as shown in the pic below.
    the grovemade leather cover doubles as a stand

  • Pad & Quill offer two styles of cases that I’m considering. The Luxury Pocket Book is similar to the Grovemade ones, in that they are also made of wood and leather, and the leather wraps around the front of the phone and folds back for easy access to the phone’s active surface. However, unlike the Grovemade, the leather does not appear to create a stand. Still a very good looking case. My only quibble with the design is that fake bookmark.
    the pad&quill case looks like a little notebook

  • Pad & Quill also offer an all leather case called the Bella Fino. The ad copy appears to claim that they are using a nifty sticker tech to keep the phone stuck to the leather case. I’d like to see this and hold it in my hands to figure out what it does and whether it lives up to its claims, but it certainly looks like it makes for a low profile case.
    the pad&quill bella fino case

The deal breaker

When I was in Japan last spring, I had some time to kill in a department store while my daughters and wife inspected the luxe toilets. We were in the Ginza area of Tokyo, and this department store was holding a leather artisan event. I wandered around, and was really impressed by the beautiful wallets and card holders. I stood a bit too long in front of one gentleman’s stand, and he broke out some English that was better than my phrasebook Japanese. I really liked his card holder, but then my brain snapped on properly and I realized that 22,000円 was roughly $220! I politely asked for his card and shuffled away before I blew my travel budget.

When I got home, the cards and other souvenirs got dumped into a pile and forgotten. But a few weeks ago I came across this guy’s card, and went to his website at munekawa.jp. Now I’m stuck, because what I really want is something like this wallet or this one but sized a bit smaller to fit the iphone properly (they are a bit big). But I really don’t have $300 to spend on a wallet.
Can you imagine an iphone hiding in this wallet that looks like an envelope?

So the search continues. I’ve got a few weeks until the phone arrives. Perhaps I’ll find something when we’re wandering around NYC in 2 weeks.

Knitapurlooza

My kids’ school is having a knitting fundraiser. They were going to call it knitapalooza, but I suggested knitapurlooza instead. I guess the idea is to knit squares, sew them up into blankets, and then mail them to a third world charity that will distribute them. Much better to have your kids hit up the neighbors for sponsorships to knit squares than to run laps as with a jog-a-thon or walk-a-thon.

Cast on

Over the weekend I cast on for a cowl using some yarn my sister got me from Germany. Cast on 300 stitches on my new circular 4.5mm needles (also from my sister’s trip), joined in the round, and started knitting in a 1×1 rib. My sister liked a cowl that she saw in this shop, knit on the same needles and with the same yarn, but the instructions she got (jotted down in German) were to cast on 78 or so stitches and then knit up in garter stitch. The problem with that is then at the end I’d have to graft together the beginning and the end, and the fact that I hate back and forth knitting in garter stitch—too boring for words.

My goal is to do a round or two a day, so 300 to 600 stitches per day.

I’m surprised how weak my hands are. Pinkies and ring fingers on both hands are griping about being sore.

A real-world use of PL/Perl

Last week I wrote a node.js program to parse and copy a CSV file into PostgreSQL. The data included several columns of detector data, and then a catch-all column called XML that was supposed to contain the raw read from the detector. The XML column was a big old ASCII escaped blob of text, and I just ignored it and stuffed it into its own table.

Unfortunately, as is always the case with these things, the XML column wasn’t XML at all. Instead, it contained what looked like a Perl object dumped using Data::Dumper. I couldn’t easily rewrite my node.js program to break up that Perl object, and I certainly didn’t want to rewrite my well-tested node.js program in Perl.

Enter PL/Perl.

I’ve never really had a need for PL/Perl. The PostgreSQL documentation page promotes the ability to use Perl’s string-munging facilities. But here I had an even simpler use case. I just want to call out to Perl, eval() the object, then stash the results.

The reason I’m writing this post is that I’ve never quite gotten the hang of how to use stored procedures in PostgreSQL. This is sort of a “note to my future self”, in case I forget, containing some of the things I figured out.

First, the initial program I wrote looks like this:

CREATE OR REPLACE FUNCTION perl_xml_segment_decoder (TEXT) RETURNS bt_xml_segment AS $$
    use strict;
    my $unescape = sub {
        my $escaped = shift;
        $escaped =~ s/%u([0-9a-f]{4})/chr(hex($1))/eig;
        $escaped =~ s/%([0-9a-f]{2})/chr(hex($1))/eig;
        return $escaped;
    }; # borrowed from  URI::Escape::JavaScript 

    my $chars = $unescape->( $_[0] );
    my $VAR1;
    eval($chars);

    # clean up some entries we are not using
    my $segment = $VAR1->{'segment'};
    $segment->{'ts'} = $segment->{'Timestamp'};
    my %bar = map { lc $_ => $segment->{$_} } qw{
      SegmentID
      FromLocationID
      ToLocationID
      Route
      GroupBy
      ProjectID
      ts
      NumTrips
      Speed
      Distance
      EstimatedTimeTaken
      TravelTime
    };
    return \%bar;
$$ LANGUAGE plperl;

This takes in one of the “XML” strings, and returns a row of type bt_xml_segment, the composite type that PostgreSQL creates automatically for this table:

CREATE TABLE bt_xml_segment (
  segmentid      integer primary key,
  fromlocationid integer REFERENCES bt_xml_location (locationid),
  tolocationid   integer REFERENCES bt_xml_location (locationid),
  route          varchar(128),
  groupby        integer,
  projectid      integer REFERENCES bt_xml_project (projectid),
  ts    timestamp with time zone not null,
  numtrips       integer,
  speed          numeric,
  distance           numeric,
  estimatedtimetaken numeric,
  traveltime         numeric
);

One thing I’ve never gotten the hang of is how to call functions. Following the docs, I can call this function as follows:

select * from  perl_xml_segment_decoder('%24VAR1%20%3D%20%7B%0A%20%20%27location%27%20%3D%3E%20%7B%0A%20%20%20%20%27Active%27%20%3D%3E%201%2C%0A%20%20%20%20%27LastCheckin%27%20%3D ... %20%20%27TravelTime%27%20%3D%3E%20%27356.285714285714%27%0A%20%20%7D%0A%7D%3B%0A');

and I would get back a lovely tabular output like this:

 segmentid | fromlocationid | tolocationid | route | groupby | projectid |           ts           | numtrips |      speed       | distance | estimatedtimetaken |    traveltime    
-----------+----------------+--------------+-------+---------+-----------+------------------------+----------+------------------+----------+--------------------+------------------
      4558 |           3481 |         3472 | SR-39 |      15 |       672 | 2014-07-15 17:30:00-07 |       14 | 8.04274565301844 |      0.8 |                 86 | 356.285714285714
(1 row)

But the semantics of that call are strange to me. What the query says is to treat the function like it is a table. This is reasonable, but what I want to do is call the function on each row of another table, like so:

select perl_xml_segment_decoder(xml.data) from perlhash as xml;

But that returns each row as a single composite value rather than as separate columns:

                                      perl_xml_segment_decoder                                      
----------------------------------------------------------------------------------------------------
 (4558,3481,3472,SR-39,15,672,"2014-07-15 17:30:00-07",14,8.04274565301844,0.8,86,356.285714285714)
(1 row)
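For what it's worth, PostgreSQL can unpack that composite value back into ordinary columns. Here is a minimal sketch, written as a node.js query string in the same array-and-join style used later in this post. It assumes PostgreSQL 9.3 or later (for LATERAL), and the variable name per_row_decode is made up for illustration. I have not run it against these tables.

```javascript
// LATERAL calls the decoder once for each row of perlhash and exposes
// the result as regular columns instead of one packed composite value.
var per_row_decode = [
    "select d.*"
    ,"from perlhash as xml,"
    ,"lateral perl_xml_segment_decoder(xml.data) as d"
].join(' ')
```

A query of that shape returns the same twelve columns as the table-style call above, one row per decoded entry.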

That single-column output is more difficult to use in an INSERT statement. While I could contort it and make it work, I decided instead to keep the function as a function, and move the query against the XML data table inside the function. Again, the excellent PostgreSQL docs are quite helpful, and explain how to query a table from PL/Perl and then iterate over each returned row. My new function looks like this:

CREATE OR REPLACE FUNCTION perl_xml_segment_obs_decoder () RETURNS setof bt_xml_observation AS $$
    use strict;
    my $unescape = sub {
        my $escaped = shift;
        $escaped =~ s/%u([0-9a-f]{4})/chr(hex($1))/eig;
        $escaped =~ s/%([0-9a-f]{2})/chr(hex($1))/eig;
        return $escaped;
    }; # borrowed from  URI::Escape::JavaScript 

    my $sth = spi_query("SELECT * FROM perlhash");
    while ( defined( my $row = spi_fetchrow($sth) ) ) {
        my $chars = $unescape->( $row->{data} );
        my $VAR1;
        eval($chars);

        # clean up some entries we are not using
        my $segment = $VAR1->{'segment'};
        $segment->{'ts'} = $segment->{'Timestamp'};
        my %bar = map { lc $_ => $segment->{$_} } qw{
          SegmentID
          ts
          NumTrips
          Speed
          Distance
          EstimatedTimeTaken
          TravelTime
        };
        $bar{data_ts}         = $row->{ts};
        $bar{radar_lane_id}   = $row->{radar_lane_id};
        $bar{station_lane_id} = $row->{station_lane_id};
        return_next \%bar;
    }
    return undef;
$$ LANGUAGE plperl;
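The $unescape closure in that function applies two substitutions: %uXXXX sequences first, then %XX sequences. That is the same decoding JavaScript's old unescape() performs, which suggests the data was escaped on the JavaScript side before it was stored (an assumption on my part, based on the URI::Escape::JavaScript credit). A rough node.js sketch of the same logic, with a made-up function name, run on the first few characters of the payload shown earlier:

```javascript
// Same two substitutions as the Perl $unescape closure:
// first %uXXXX (four hex digits), then %XX (two hex digits).
function unescapeLikePerl(escaped) {
    return escaped
        .replace(/%u([0-9a-f]{4})/gi, function (match, hex) {
            return String.fromCharCode(parseInt(hex, 16))
        })
        .replace(/%([0-9a-f]{2})/gi, function (match, hex) {
            return String.fromCharCode(parseInt(hex, 16))
        })
}

// the start of the escaped string passed to the decoder above
var sample = '%24VAR1%20%3D%20%7B%0A'
var decoded = unescapeLikePerl(sample)
// decoded is the opening of a Data::Dumper dump: '$VAR1 = {' plus a newline
```

Once decoded, the string is plain Data::Dumper output, which is why the PL/Perl function can eval it to get $VAR1 back.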

Because I'm actually following along with my git commits, and because I was refactoring things and tuning my relational database tables as I developed, this function returns a different table type from before:

CREATE TABLE bt_xml_observation(
  segmentid      integer not null references bt_xml_segment(segmentid),
  ts    timestamp with time zone not null,
  data_ts timestamp with time zone not null,
  radar_lane_id integer,
  station_lane_id integer,
  numtrips       integer,
  speed          numeric,
  distance           numeric,
  estimatedtimetaken numeric,
  traveltime         numeric,
  primary key(segmentid,ts,data_ts,radar_lane_id,station_lane_id),
  foreign key (data_ts,radar_lane_id,station_lane_id) references smartsig.bluetooth_data(ts,radar_lane_id,station_lane_id)
);

I use this function within an insert statement, as follows:

insert into bt_xml_observation  (select  * from perl_xml_segment_obs_decoder()) ;

In some cases (when populating the segments and location tables, for example), the output of the function includes duplicates. Rather than handle them in the Perl code using a hash or something, I decided to keep the PL/Perl simple and use SQL to remove duplicates. My query for loading up the segments table (the 8 unique segments about which the data was collected) is:

insert into smartsig.bt_xml_segment  (select distinct * from smartsig.perl_xml_segment_decoder()) ;

Finally, I expanded my node.js code to make use of these functions. Each data file (representing an hour of data) was 18MB. My code loads up one file, saves the XML/Perl hash data into a “TEMP” table, and then uses that table to populate the observations. The insert statements use WITH clauses to query the functions, and to join the results of those calls against the existing data so as to avoid errors from inserting duplicates. Finally, my code is careful to populate the tables in order so that the various foreign key constraints are satisfied. (Note that I like to build my SQL statements as an array that I then “join” together. I do that in whatever language I’m programming in because it makes it easy to slot in dynamic variables, print diagnostic output, etc.)

    this.perl_parser=function(client,callback){
        // essentially, I have to do these in order:

        var insert_statements = []
        insert_statements.push([
            "with"
            ,"a as ("
            ,"  select distinct * from perl_xml_project_decoder_from_location()"
            ,"),"
            ,"b as ("
            ,"  select a.*"
            ,"  from a"
            ,"  left outer join bt_xml_project z USING (projectid)"
            ,"  where z.projectid is null"
            ,")"
            ,"insert into bt_xml_project (projectid,title) (select projectid,title from b)"
        ].join(' '))

        insert_statements.push(
            ["with a as ("
             ,"select aa.*,count(*) as cnt from perl_xml_location_decoder_from_location() aa"
             ,"left outer join bt_xml_location z USING(locationid)"
             ,"where z.locationid is null"
             ,"group by aa.locationid,aa.locationname,aa.latitude,aa.longitude,aa.projectid"
             ,"),"
             ,"b as ("
             ,"select locationid,locationname,latitude,longitude,projectid,"
             ,"rank() OVER (PARTITION BY locationid ORDER BY cnt DESC) AS pos"
             ,"from a"
             ,")"
             ,"insert into bt_xml_location (locationid,locationname,latitude,longitude,projectid)"
             ,"(select locationid,locationname,latitude,longitude,projectid"
             ,"from b"
             ,"where pos=1)"].join(' ')
            )
        insert_statements.push([
            "with a as (select distinct aa.* from perl_xml_segment_decoder() aa"
            ,"left outer join bt_xml_segment z USING(segmentid)"
            ,"where z.segmentid is null)"
            ,"insert into bt_xml_segment (segmentid,fromlocationid,tolocationid,route,groupby,projectid)"
            ,"(select segmentid,fromlocationid,tolocationid,route,groupby,projectid from a)"
        ].join(' '))
        insert_statements.push(
            'insert into bt_xml_observation  (select  * from perl_xml_segment_obs_decoder())'
        )


        var q = queue(1);  // using queue (https://github.com/mbostock/queue)
                           // with parallelism of 1 to make sure each task 
                           // executes in order

        insert_statements.forEach(function(statement) {
            q.defer(function(cb){
                client.query(statement
                             ,function (err, result) {
                                 //console.log(statement)
                                 return cb(err)
                             })
            })
            return null
        })
        q.awaitAll(function(error, results) {
            //console.log("all done with insert statements")
            // pass any error along rather than silently dropping it
            return callback(error)
        })

    }
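Three of those statements share one shape: select from a decoder function, left outer join against the target table, and keep only the rows whose key is missing there. As a sketch of how the array-and-join habit makes that easy to parameterize, here is a hypothetical helper (antiJoinInsert and its option names are made up, and are not part of the code above). It pastes identifiers straight into the SQL, so it should only ever be given trusted, hard-coded names.

```javascript
// Hypothetical helper: build an "insert only the rows that are not
// already in the target table" statement from a few names.
function antiJoinInsert(opts) {
    var cols = opts.columns.join(',')
    return [
        "with a as ("
        ,"  select distinct aa.* from " + opts.source + " aa"
        ,"  left outer join " + opts.table + " z USING (" + opts.key + ")"
        ,"  where z." + opts.key + " is null"
        ,")"
        ,"insert into " + opts.table + " (" + cols + ")"
        ,"(select " + cols + " from a)"
    ].join(' ')
}

// rebuilds the segment statement from the listing above
var segment_insert = antiJoinInsert({
    table: 'bt_xml_segment'
    ,key: 'segmentid'
    ,source: 'perl_xml_segment_decoder()'
    ,columns: ['segmentid','fromlocationid','tolocationid'
               ,'route','groupby','projectid']
})
```

The location statement would not fit this helper as written, since it also ranks duplicates by count before inserting.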

And there you have it: a node.js program that runs SQL queries that use Perl code embedded in PL/Perl functions.

The gory details can be found in my github repo for this.

More with the GDAL/OGR perl bindings

So my last post talked about my struggles to finally get something saved in the database using the native perl bindings into the GDAL/OGR library. Once I got that working and pushed out the post, I immediately started loading up multiple files and playing around with the data. One thing I noticed was that it was impossible to separate different “trips” within the data without playing around with space and time. What I wanted was an easy way to flag each batch of points with a field identifying the run.

The auto-generated schema for the GPX data looks like this:

\d testogr.track_points
                                              Table "testogr.track_points"
       Column       |           Type           |                               Modifiers                                
--------------------+--------------------------+------------------------------------------------------------------------
 ogc_fid            | integer                  | not null default nextval('testogr.track_points_ogc_fid_seq'::regclass)
 wkb_geometry       | geometry(Point,4326)     | 
 track_fid          | integer                  | 
 track_seg_id       | integer                  | 
 track_seg_point_id | integer                  | 
 ele                | double precision         | 
 time               | timestamp with time zone | 
 magvar             | double precision         | 
 geoidheight        | double precision         | 
 name               | character varying        | 
 cmt                | character varying        | 
 desc               | character varying        | 
 src                | character varying        | 
 link1_href         | character varying        | 
 link1_text         | character varying        | 
 link1_type         | character varying        | 
 link2_href         | character varying        | 
 link2_text         | character varying        | 
 link2_type         | character varying        | 
 sym                | character varying        | 
 type               | character varying        | 
 fix                | character varying        | 
 sat                | integer                  | 
 hdop               | double precision         | 
 vdop               | double precision         | 
 pdop               | double precision         | 
 ageofdgpsdata      | double precision         | 
 dgpsid             | integer                  | 
 speed              | double precision         | 
Indexes:
    "track_points_pkey" PRIMARY KEY, btree (ogc_fid)
    "track_points_wkb_geometry_geom_idx" gist (wkb_geometry)

There are three fields that are completely blank: src, desc, and name. I decided to use src to identify the source of the data as the file name it came from.

First I modified my previous program to parse the command line options using Getopt::Long. I don’t use all of its power in this example, but in the past I’ve been well served by starting with that in case the script grows and mutates.

With Getopt::Long, I understand there are ways to pass a list of things as arguments. You can have multiple invocations of the same option, for example, --file mydata.gpx --file moredata.gpx, or you can pass them as a comma-separated list and follow the recipe in the perldoc for the module. However, I wanted to use a glob, like --file data/*.gpx, so I instead decided to just stick all the files after a double dash on the command line. So really, in the following code, I’m only using Getopt::Long to parse out a --help option! However, it’s there if I need to expand functionality in the future.

use strict;
use warnings;
use Carp;

use Geo::GDAL;
use Data::Dumper;

use Getopt::Long;
use Pod::Usage;

my $man = 0;
my $help = 0;

my @files;

my $result = GetOptions(
    'help|?' => \$help,    # GetOptions needs a reference to the variable
    ) or pod2usage(2);

pod2usage(-exitval => 0, -verbose => 2) if $help;

@files = @ARGV;
...

With that, I have all of my input files in an array, and I can loop over them and store the filename in the source field in the db by using $new_feature->SetField('src',$_);, as follows:

foreach (@files){

    my $ds = Geo::OGR::Open($_);

    my $layer         = $ds->Layer($layer_name);
    my $feature_count = $layer->GetFeatureCount();
    carp "$layer_name, $feature_count";
    if ( $feature_count < 10 ) {
        next;
    }

    carp "saving $_ to pg";

    # now append each feature
    my $x = 0;
    $pg_layer->StartTransaction();
    while ( my $feature = $layer->GetNextFeature() ) {

        my $new_feature = Geo::OGR::Feature->new($defn);
        $new_feature->SetFrom($feature);

        # write the filename as the src field, for making lines later
        $new_feature->SetField('src',$_);

        my $pgf = $pg_layer->CreateFeature($new_feature);

        $x += 1;
        if ( $x % 128 == 0 ) {
            carp $x;
            # uncomment the following to crash your program
            # $pg_layer->CommitTransaction();
            # StartTransaction() seems to auto commit prior transaction?
            $pg_layer->StartTransaction(); 
            $x = 0;
        }

    }
    if ($x) {
        carp "all done, $x remaining";
        $pg_layer->CommitTransaction(); # this one doesn't crash for some reason
        carp "last transaction committed";
    }
}

That does its magic, and the database now has distinct groups of points. Now if you want to make “lines” out of those points, you can do this in PostGIS:

SELECT ST_MakeLine(wkb_geometry ORDER BY track_seg_point_id ASC) AS linegeom, src
INTO table testogr.lines
FROM testogr.track_points
GROUP BY src;

Et voilà.

QGIS rendering the new lines table, on top of OSM lines data

Of course, that isn’t at all helpful, as I want to see speeds, not just the lines. Next step is to try to figure out how to add a measure to each point, and then collect those (X,Y,M) type points into a line with a measure dimension. I guess that will be my next post.