Using the replicator database in CouchDB 1.1.0

I am testing out the new replicator db features in CouchDB 1.1 (documented here), and I came across a quirk that took me a while to figure out, so I thought I’d write it up. It isn’t a bug, and it is totally consistent with the rules, but for some reason it was counter-intuitive to me.

The fundamental problem is that I am using slashes in database names. This is fine and supported, but when used in URLs the slashes have to be escaped.

The database I am replicating between machines is called vdsdata/d12/2007. Ordinarily in CouchDB, because it uses HTTP for everything, I’d have to escape that as “vdsdata%2fd12%2f2007”. For example, if I want to get the status of the database, I’d write

curl 127.0.0.1:5984/vdsdata%2Fd12%2F2007

which will return:

{"db_name":"vdsdata/d12/2007",
 "doc_count":109,
 "doc_del_count":3,
 "update_seq":151,
 "purge_seq":0,
 "compact_running":false,
 "disk_size":479323,
 "instance_start_time":"1308159829266270",
 "disk_format_version":5,
 "committed_update_seq":151}

So this habit of always escaping the slashes is ingrained in me, and I always call a URL escape routine in my programs to escape the database names. For example, in the R code I am working on I just call tolower(paste(components,collapse='%2F')).
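For anyone not working in R, the same escaping habit can be sketched in Python. This is only an illustration, not the R code I actually use: the helper names escape_db_name and db_url and the default server address are made up for the example, and the only library call involved is urllib.parse.quote from the standard library.

```python
from urllib.parse import quote


def escape_db_name(components):
    """Join path-like components into a URL-safe CouchDB database name."""
    # safe="" forces "/" to be percent-encoded as well;
    # quote() emits uppercase hex digits, so "/" becomes "%2F".
    return quote("/".join(components).lower(), safe="")


def db_url(components, server="http://127.0.0.1:5984"):
    """Build the URL used to reach the database over HTTP."""
    return server + "/" + escape_db_name(components)


print(db_url(["vdsdata", "d12", "2007"]))
# prints http://127.0.0.1:5984/vdsdata%2Fd12%2F2007
```

The R one-liner lowercases the whole string, so it produces “%2f” where this sketch produces “%2F”. Percent-encoded hex digits are case-insensitive, so both refer to the same database.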

However, this doesn’t work in the replicator database. As documented, the replicator database entries are of the format:

{
    "_id": "doc_bar_pull",
    "source":  "http://myserver.com:5984/foo",
    "target":  "bar"
}

or going the other way

{
    "_id": "doc_bar_push",
    "source":  "bar",
    "target":  "http://myserver.com:5984/foo"
}

The docs don’t mention the odd use case of putting slashes in the database names, so I just continued to call my escaping routines and created the following replicator database entry:

{
    "_id":"pull_broken",
    "source":"http://example.com:5984/vdsdata%2fd12%2f2007",
    "target":"vdsdata%2fd12%2f2007",
    "continuous":true
}

This spews illegal database errors in the log files, and if you create the replicator document via Futon, you’ll see the error state in the document as soon as you save it, as in:

{
    "_id":"pull_broken",
    "_rev":"2-3546101f93d73f8f7d3a185569b036d3",
    "continuous":true,
    "source":"http://example.com:5984/vdsdata%2fd12%2f2007",
    "target":"vdsdata%2fd12%2f2007",
    "_replication_state":"error",
    "_replication_state_time":"2011-06-15T11:24:06-07:00",
    "_replication_id":"3540d1d63ad94edd9f0731928ebaf2b1"
}

What is going on is that CouchDB does not use HTTP to access its own local databases, and it knows them by their real names, slashes and other funny characters included. So when I escape the database name in the replicator document, CouchDB happily does what I asked and looks for a local database with “%2f” literally in its name. The replicator entry must therefore use plain slashes for the local db, while keeping the escapes for the remote db, since that remote database is accessed over HTTP. The correct entry looks something like:

curl 127.0.0.1:5984/_replicator/vdsdata%2Fd12%2F2007_pull
{
    "_id":"vdsdata/d12/2007_pull",
    "_rev":"18-878ae6f27325ca11a1339d6bd1f68c39",
    "source":"http://example.com:5984/vdsdata%2fd12%2f2007",
    "target":"vdsdata/d12/2007",
    "continuous":true,
    "_replication_state":"triggered",
    "_replication_state_time":"2011-06-15T10:40:29-07:00",
    "_replication_id":"85b8199b41199783ef25048ca8913dad"
}
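The rule (escape the remote name, leave the local name alone) is easy to capture in a small helper. The sketch below is in Python and is only an illustration under a few assumptions: replicator_doc and its parameters are hypothetical names, not part of CouchDB, and it covers just the pull case where the remote side is given as a full HTTP URL.

```python
import json
from urllib.parse import quote


def replicator_doc(db_name, remote_server, continuous=True):
    """Build a pull-replication document for the _replicator database.

    remote_server is a base URL such as "http://example.com:5984".
    """
    return {
        "_id": db_name + "_pull",
        # Remote database: reached over HTTP, so slashes must be escaped.
        "source": remote_server.rstrip("/") + "/" + quote(db_name, safe=""),
        # Local database: referred to by its real name, slashes and all.
        "target": db_name,
        "continuous": continuous,
    }


doc = replicator_doc("vdsdata/d12/2007", "http://example.com:5984")
print(json.dumps(doc, indent=4))
```

Note that the document _id here also contains slashes, so it has to be escaped whenever it appears in a URL, exactly as in the curl request above.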

Now that I’ve sorted that out, it’s time to actually use replicating databases in my work!
