From simple examples to complicated real world cases

I have a really irritating use-case for a CouchDB view. I have several hundred million documents representing hourly data for 4km grid cells in California, and I need to group them by areas. For example, grid cell i=100, j=223 is in Mendocino County, and in the “NORTH COAST” air basin. Of course I have the geometry of the grid cells and the geometry of the counties, air basins, and so on, in PostgreSQL/PostGIS, and I usually just shoot off a query to get the relationship and I’m done. This is CouchDB, however, and views cannot rely on external information lest they become idemimpotent (I made that up). Everything that the view needs must be in the view from the start.

Fair enough, I set up the SQL queries and generated my 9,800+ row JavaScript hash lookup table that maps grid cell to various areas of interest. Now I want to mix that into the view without pulling my hair out.

There is a really simple example in the CouchDB wiki. I’ve reproduced it below:

 {
   _id:"_design/test",
   language: "javascript",
   whatever : {
     stringzone : "exports.string = 'plankton';",
     commonjs : {
       whynot : "exports.test = require('../stringzone')",
       upper : "exports.testing = require('./whynot').test.string.toUpperCase()"
     }
   },
   shows: {
     simple: "function() {return 'ok'};",
     requirey : "function() { var lib = require('whatever/commonjs/upper'); return lib.testing; };"
   },
   views: {
     lib: { 
       foo: "exports.bar = 42;" 
     },
     test: { 
       map: "function(doc) { emit(doc._id, require('views/lib/foo').bar); }"
     }
   }
  }

So where the above example says foo: "exports.bar = 42;", I want to add in my massive hashtable. Obviously cutting and pasting so many lines is not the way to go. Instead, I’m using a couchapp tool.

The concept of a couchapp used to get more press that it currently seems to, but the basic idea is to use code to load up your design doc with attachments and views. In my case, I couldn’t care less about the attachments and the notion of a webapp stored and served by CouchDB. I just want to programmatically construct the view document, and push it to CouchDB. I chose to use node.couchapp.js. I could also have "rolled my own", and in fact I probably will this afternoon. I am playing around with grunt, so I used grunt_couchapp (after patching it a bit to use cookie based authentication).

The basic structure of my directory is the following


config.json
package.json
Gruntfile.js
app.js
lib
├── cellmembership.json
└── dump_membership.js
node_modules
├── ...
└── ...

The config.json file contains my database details, including my username and password. package.json contains the npm dependencies, mostly containing what was pulled in by the grunt_couchapp tool, and the node_modules directory holds all the node modules. I do not have an _attachments directory, so I make sure my design doc has no attachments!

Before getting to app.js, in which the design document is defined, I will first talk about what goes into it. The lookup table is stored as a JSON object in lib/cellmembership.json. The contents looks like:

{ "100_223":{"airbasin":"NORTH COAST","bas":"NC","county":"MENDOCINO","fips":"23","airdistrict":"MENDOCINO COUNTY AQMD","dis":"MEN"},
 "100_224":{"airbasin":"NORTH COAST","bas":"NC","county":"MENDOCINO","fips":"23","airdistrict":"MENDOCINO COUNTY AQMD","dis":"MEN"},
   ... 9,890 more lines like this ...
 "304_48":{"airbasin":"SALTON SEA","bas":"SS","county":"IMPERIAL","fips":"13","airdistrict":"IMPERIAL COUNTY APCD","dis":"IMP"},
 "98_247":{"airbasin":"NORTH COAST","bas":"NC","county":"HUMBOLDT","fips":"12","airdistrict":"NORTH COAST UNIFIED AQMD","dis":"NCU"}
}

The view code that uses this file is saved to lib/dump_membership.js, and looks like:

module.exports = function(doc){
    var lookup = require('views/lib/cellmembership').lookup
    emit(lookup[doc.cell_id].county, doc.value)
}

These two pieces are put together in app.js, that looks like this:

var couchapp = require('couchapp')
var cellmembership = require('./lib/cellmembership.json')
var mapfun = require('./lib/dump_membership')

var ddoc = {
    _id: '_design/calvad',
    rewrites: [{
      from: '',
      to: 'index.html',
      method: 'GET',
      query: {}
    },{
      from: '/*',
      to: '/*'
    }],
    views: {
        "lib":{
            "cellmembership":"exports.lookup="+JSON.stringify(cellmembership)
        },
        "test":{
            "map":mapfun
        }
    },
    lists: {},
    shows: {}
};


module.exports = ddoc;

So instead of "exports.bar=42;", I put in "exports.lookup="+JSON.stringify(...). The key insight that the simple example didn’t really convey is that you want your entire "library" module to be a string. So in this case that means saving my JSON lookup document as a string using JSON.stringify. I probably could have just loaded it directly using fs.readfile(), but I like this way, because it soothes my worries about malformed JSON. If the JSON is screwed up, the app.js won’t run, and the failure happens right away, not in the midst of cranking through hundreds of millions of documents.

The other bit that I didn’t get from the example was how to include an external function in the design document. What I did was pretty simple, and it worked. I just did "map":mapfun. This is exactly the opposite of what needed to be done with the views:lib:cellmembership.. construct. There the exports.lookup= statement needs to be a string inside of the JavaScript, whereas the assignment of the map function needs to be actual JavaScript code, not the string representation of that code.

This is exactly the kind of inconsistency that drives me nuts and that nobody ever thinks to document, because only crazies like me run into those edge cases.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s