Development server logs during development

In a prior post trumpeting my modest success with getting GeoJSON tiles to work, I typed in my server address but didn’t make it a link. That way robots wouldn’t automatically follow the link, and my development server wouldn’t get indexed by Google indirectly.

What is interesting to me is that I still get the occasional hit from that posting. And this is with the server bouncing up and down almost continuously as I add functionality. Just now I was refactoring the tile caching service I wrote, and in between server restarts, someone hit my demo app.

And the GeoJSON tiler is coming along. In making the caching part more robust, I added a recursive directory creation hack which I explain below.

The setup is that I have roads (generated from OpenStreetMap highway relations in California) and detectors on some of those roads. The detectors go in and out of service. When a detector is in service, I assign it a segment of road stretching from halfway to the upstream detector to halfway to the downstream detector. Over time, as detectors pop in and out of service, this leads to a variable mapping between detectors and road segments.

If one wishes to answer a question like “How many cars were on this (arbitrary) stretch of roadway in 2008?” then one must pull up all the mappings in that time, get the associated detector data, identify any gaps in coverage, and generate a plot of the results.

However, this isn’t live, for the most part. All this data is in the past, and therefore once I fetch it from the database it is fair game for storing as a cached GeoJSON tile. If I want to get excessive, I can even cache a tile at the end of each day. But now I have to rewrite my original caching handler because I am looking for more than just zoom, column, and row; the RESTful interface now also takes a year and an optional month. But of course, my old service at map/index.html is still active, so I needed a way to configure which parameters my tile cacher should expect.

My server is running in Node.js, using the Connect middleware framework. By reading through the example apps and through a healthy dose of trial and error, I figured out that it is possible to stack request handlers as follows:


function datatesting(app) {

    // if there is an existing tile, serve it first
    app.get('/data/:zoom/:column/:row/:year/:month?/:type.:format?'
            ,connect.staticProvider(__dirname + '/public')
           );

    // if there isn't an existing tile, trigger the tiler
    app.get('/data/:zoom/:column/:row/:year/:month?/:type.:format?'
            ,tilingGeoservice({'root':__dirname + '/public/data'
                               ,'pathParams':['zoom','column','row','year','month']
                               ,'fileParam':'type'})
           );

    // go and get the data and render the tile
    app.get('/data/:zoom/:column/:row/:year/:month?/:type.:format?'
            ,data_area_service.pg_detector_area_data_service(
                {'db':database
                 ,'host':host
                 ,'username':puser
                 ,'password':ppass
                })
           );

}

This pattern is similar to the Chain of Responsibility controllers I used to write in Struts. The idea is that if any of the controllers can handle a request, then they do, and thus short-circuit the rest of the controller stack. If they can’t handle the request, or if they are just tweaking things, then they can call next(), and the next handler in the chain will take a crack at the request.
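A stripped-down sketch of that idea, with no Connect involved (the handler names and the fake cache here are made up for illustration): each handler either answers the request itself or calls next() to pass it along.

```javascript
// Minimal chain-of-responsibility sketch: each handler either answers
// the request via send(), or calls next() to defer to the next handler.
function runChain(handlers, req) {
  var response = null;
  var i = 0;
  function next() {
    var handler = handlers[i++];
    if (handler) handler(req, function (body) { response = body; }, next);
  }
  next();
  return response;
}

// A cache handler that only answers if the tile is already stored,
// and a renderer that always answers (both hypothetical).
var tileCache = { '16/11310/26246': '{"type":"FeatureCollection"}' };

function cacheHandler(req, send, next) {
  if (tileCache[req.path]) send(tileCache[req.path]);
  else next();
}

function renderHandler(req, send, next) {
  send('rendered ' + req.path);
}

console.log(runChain([cacheHandler, renderHandler], { path: '16/11310/26246' }));
// → {"type":"FeatureCollection"}   (served from the cache)
console.log(runChain([cacheHandler, renderHandler], { path: '1/2/3' }));
// → rendered 1/2/3                 (fell through to the renderer)
```

The real stack works the same way, except Connect manages the handler list and the response object.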

Here the first handler just looks for the requested file in the file system. If I ever decide that my cached files are out of date, I can just comment this handler out, and all subsequent requests will hit the database directly.

The second handler is my data-to-filesystem cache writer. I think the proper term for this handler is a decorator, in that it tweaks the incoming response object and inserts its own version. Any following handlers are unaware of this subversion and send their responses as usual. The caching handler then intercepts that response, writes it out to the real response object, and also writes a copy to the filesystem.

The final handler does the actual work of hitting the PostgreSQL/PostGIS database for the data.

This app is queued up in the Connect framework with the following code snippet.


var server = connect.createServer(
    connect.logger()
    ,connect.gzip()
    ,connect.bodyDecoder()
    ,connect.cookieDecoder()
    ,connect.session({ store: memory, secret: 'public bl0g, s33kr1t cod3' })
    ,connect.router(geoservices)
    ,connect.router(datatesting)
    ,connect.errorHandler({ dumpExceptions: true, showStack: true })
);

(The geoservices app houses my other geojson tiling demo.)

For me, the tricky part was setting up an efficient way to parse the options so that one service could provide the tile caching for both apps, with different directories and so on. Enter JavaScript’s (relatively) new map and filter array methods, which I had seen here and there but had never used prior to this morning's hacking.

The caching service used to presume that the URL pattern was zoom/column/row.json, and so it would check whether it needed to create the directories zoom and column before writing out the row.json file. However, in this new use case, I have zoom/column/row/year/month?/type.json. Not only are there more directories to parse out of the query, there is also an optional month (and perhaps eventually an optional day). To handle this, I have the following code, which is much more flexible than my original (the original is not shown because it is just too embarrassing):


module.exports = function fileCacheGeoJSON(options){
    var root = process.connectEnv.staticRoot || options.root || process.cwd();
    var pathParams = options.pathParams || ['zoom','column'];
    var fileParam  = options.fileParam  || 'row';
    function getPath(req){
        var activeParams = pathParams.filter(function(a){return req.params[a];});
        var dirs = activeParams.map(function(a){return req.params[a];});
        var targetpath = [root,dirs.join('/')].join('/');
        return targetpath;
    }
    function getFile(req){
        var format = req.format || 'json';
        var filename = req.params[fileParam]+'.'+format;
        return filename;
    }
    return function fileCacheGeoJSON(req,res,next){
        var end = res.end;
        var targetpath = getPath(req);
        var filename = [targetpath,getFile(req)].join('/');
        console.log('in the tiler with '+filename);
        var localWriteEnd = function(doc){
            var writeOut = writeGeoJSON({file:filename,doc:doc});
            makeParentDir(targetpath,writeOut);
        };
        res.end = function(chunk,encoding){
            if(chunk && chunk.length){
                end.call(res,chunk,encoding); // send that off asap
                localWriteEnd(chunk); // and then save to fs for next time
            }
            res.end = end;
        };
        next();
    };
};

The key bit of code that I thought was pretty cool is buried in the getPath(req) function. First, because I don’t know which of the pathParams might be optional (because Connect.router allows optional values and even regex matching), I created a filter to remove those parameters that are not in the current request object.

    function getPath(req){
        var activeParams = pathParams.filter(function(a){return req.params[a];});
        ....

All the filter callback does is return the value of each candidate pathParams entry from the req.params object. If the request contains a value for that parameter, it evaluates as truthy and the filter keeps the original entry from the calling array. If it does not, req.params[a] is undefined, which is falsy, and that parameter is dropped.
So activeParams contains only those elements of pathParams that are active for the current request.

Next I created an almost identical map, as follows.

        ...
        var dirs = activeParams.map(function(a){return req.params[a];});
        var targetpath = [root,dirs.join('/')].join('/');
        ...

With a map, unlike a filter, the return value of the callback becomes an element in the new array. So here the call to req.params[a] is actually returning the value of the parameter for this request. If I hadn’t run the filter before the map, the dirs array could contain empty elements, and the resulting targetpath string would have annoying // pairs in it.
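A quick demonstration of the two calls together, using a made-up request object (the parameter names match the route; the values are arbitrary, and month is deliberately absent to stand in for an optional parameter):

```javascript
// Demonstrate the filter-then-map idiom on a fake request object.
var pathParams = ['zoom', 'column', 'row', 'year', 'month'];
var req = { params: { zoom: '16', column: '11310', row: '26246', year: '2008' } };
var root = '/public/data';

// keep only the parameters present on this request...
var activeParams = pathParams.filter(function (a) { return req.params[a]; });
// ...then swap each parameter name for its value
var dirs = activeParams.map(function (a) { return req.params[a]; });
var targetpath = [root, dirs.join('/')].join('/');

console.log(activeParams); // [ 'zoom', 'column', 'row', 'year' ]
console.log(targetpath);   // /public/data/16/11310/26246/2008
```

Without the filter step, the missing month would map to undefined and the joined path would end in a spurious trailing slash.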

The final bit worth mentioning is the recursive directory creator. Recursive callbacks give me a headache, and I am never sure about the scope, so I was a little bit explicit here, but it works just fine:


function makeme(dir, cb){
    return function(){
        console.log('makeme: ', dir);
        fs.mkdir(dir, 0777, function(err){
            if(err){ console.log(err); }
            if(cb) cb();
        });
    };
}

function recursive(path, next){
    return makeParentDir(path, next);
}

function makeParentDir(path, next){
    // recursively make sure that directory exists
    if(/(.*?)\/\w+$/.exec(path)){
        console.log('recursing: ', path);
        fs.stat(path, function(err, stats){
            if(err){
                console.log('no path ', path);
                return recursive(RegExp.$1,
                                 makeme(path, next));
            }else{
                console.log('have path, recursing ends at ', path);
                next();
            }
        });
    }else{
        console.log('regex failed on: ', path);
    }
    return;
}

Here makeme actually creates the directory, recursive is a possibly superfluous wrapper function around makeParentDir, and makeParentDir does the recursion back down the path until it finds a directory that exists. Because I’m never sure about recursion until I’ve proven to myself it works, I wrote the logging statements. The output from a successful call looks like this:

recursing:  /home/james/repos/jem/node-data-proxy/public/data/16/11310/26246/2008
no path  /home/james/repos/jem/node-data-proxy/public/data/16/11310/26246/2008
recursing:  /home/james/repos/jem/node-data-proxy/public/data/16/11310/26246
no path  /home/james/repos/jem/node-data-proxy/public/data/16/11310/26246
recursing:  /home/james/repos/jem/node-data-proxy/public/data/16/11310
no path  /home/james/repos/jem/node-data-proxy/public/data/16/11310
recursing:  /home/james/repos/jem/node-data-proxy/public/data/16
no path  /home/james/repos/jem/node-data-proxy/public/data/16
recursing:  /home/james/repos/jem/node-data-proxy/public/data
have path, recursing ends at  /home/james/repos/jem/node-data-proxy/public/data
makeme:  /home/james/repos/jem/node-data-proxy/public/data/16
makeme:  /home/james/repos/jem/node-data-proxy/public/data/16/11310
makeme:  /home/james/repos/jem/node-data-proxy/public/data/16/11310/26246
makeme:  /home/james/repos/jem/node-data-proxy/public/data/16/11310/26246/2008

And thus I’ve made one more incremental step toward pushing data out in a useful form.
