The choice between hard-coding and parameterization

I am revising some older code, and once again facing the choice of whether to hard-code database names and database tables in my code. Ordinarily I’d say no way, that’s stupid. But in this case, it might not be.

This post has zero pat answers, and is really just me thinking about things now so that my future self can revisit my thoughts and reevaluate my conclusions. However, I haven’t written a post in a while, and there is a slight chance someone else might find this useful in some way. Also, I’ve been chosen to present at this year’s PGConf US, and my talk is about testing SQL, so this is somewhat relevant to that.

So the code. First of all, a while back I wrote a general purpose query generator to grab shape data from PostgreSQL/PostGIS. See https://github.com/jmarca/shapes_postgis. That library allows me to do some fairly complicated queries from my express-based server through liberal use of parameterization. For example, in my tests, I have a handler defined as:

var app = express()
var vds_options=_.assign({
    'db':'osm'
    ,'table':'newtbmap.tvd'
    ,'alias':'tvd'
    ,'select_properties':{'tvd.freeway_id' : 'freeway'
                          ,'tvd.freeway_dir': 'direction'
                          ,"'vdsid_' || id"   : 'detector_id'
                          ,'vdstype'        : 'type'
                         }
    ,'id_col':'detector_id'
    ,'dynamic_where_clause':{'vdstype':{'lhs':'vdstype',
                                        'comp':'~*'
                                       }}
    ,'username' : config.postgresql.auth.username
    ,'password' : config.postgresql.auth.password
},config.postgresql)

var vdsservice = shape_service(vds_options)

app.get('/points/:zoom/:column/:row.:format'
        ,function(req,res,next){
            vdsservice(req,res,next)}
       )
var server=http
       .createServer(app)
       .listen(_testport,done)

This is cool because the query coming in from the client can specify which type of vds to return. For example (again pulling from the tests), the query can be:

request({url:'http://'+ testhost +':'+_testport+'/points/10/174/407.json?vdstype=\'ff\''
         ,'headers':{'accept':'application/json'}}
        ,function(e,r,b){
            if(e) return done(e)
                ...

By adding vdstype='ff' in my http query, the generated SQL query will limit the result to those entries that only match “ff”. The generated SQL is:

with bounding_area as (
    select geom4326 as geom
    from public.carb_airdistricts_aligned_03
    where dis='SC'
)
SELECT tvd.freeway_id as freeway,
       tvd.freeway_dir as direction,
       'vdsid_' || id as detector_id,
       vdstype as type,
       st_asgeojson(ST_Simplify(
            (ST_Dump(
                ST_Intersection(bounding_area.geom,tvd.geom)
            )).geom,0.1),1) as geojson
FROM newtbmap.tvd as tvd
JOIN bounding_area ON (st_intersects(tvd.geom,bounding_area.geom))
WHERE vdstype~*'ff'

I built this library to stop manually writing up these sorts of queries every time. The basic pattern was that I wanted some point, line, or shape data that fell within a larger area, say a county boundary. So I need a with statement to define the boundary, and I need to allow for different names for geometry columns, etc etc. I also allow a join table parameter object, so that one can query columns from join tables in the final query. For example, if the above query pulls out freeway to freeway ramp detectors, I might want to join those with a table containing the dates between which the detector was active, and perhaps another table containing the daily total volumes.

The problem I’m facing now is defining the database names, table names, and join table names. As I said at the outset, I tend to avoid hard-coding strings in my code. But in this case, the code clearly knows lots and lots about the database schema already, so perhaps it makes sense to expect it to also know the name of the tables and database.

The code in question I’m revising now is below. It defines a route handler that returns county data inside of the usual /zoom/column/row.json type web-mapping query. It looks like:

function carb_areas(options,app){

    var opts = {'db':'spatialvds'
                ,'table':'public.carb_counties_aligned_03'
                ,'alias':'counties'
                ,'host':options.host
                ,'port':options.port
                ,'username':options.auth.username
                ,'password':options.auth.password
                ,'select_properties':{'gid'           : 'gid'
                                      ,'a.fips'         :'fips'
                                      ,'cacoa_'       : 'cacoa_'
                                      ,'cacoa_id'     : 'id'
                                      ,'coname'       : 'coname'
                                      ,'a.name'         : 'name'
                                      ,'conum'        : 'conum'
                                      ,'display'      : 'display'
                                      ,'symbol'       : 'symbol'
                                      ,'islandname'   : 'islandname'
                                      ,'baysplinte'   : 'baysplinte'
                                      ,'cntyi_area'   : 'cntyi_area'
                                      ,'island_id'    : 'island_id'
                                      ,'bay_id'       : 'bay_id'
                                     }
                ,'id_col':['fips','gid']
                ,'geo_col':'geom4326'
                ,'join_tables':[{'table':'counties_fips'
                                 ,'alias':'a'
                                 ,'join' :'on (counties.name ~* a.name)'}
                               ]
               }
    var county_handler = shape_service(opts)
    // set the routes
    app.get('/counties/:zoom/:column/:row.:format?'
           ,function(req,res,next){
                return county_handler(req,res,next)
            }
           )
    app.get('/counties.:format?'
           ,function(req,res,next){
                return county_handler(req,res,next)
            }
           )
}

Clearly, as I write the code, I know the columns I want, and the ids to be used to join the counties_fips table with the main counties geometry table in order to also retrieve each county’s fips code. However the name of the database is different on my development server than on my production server, and if I want to write a test for this, I’m going to want to create a database on the fly with a different name again. Furthermore, I’ve been thinking about reorganizing my database to make better use of schemas, so instead of the table name being public.carb_counties_aligned_03, it might become shapes.carb_counties_aligned_03. And suppose down the road the table is updated from the 2003 aligned shapes to, say, 2016 aligned shapes. In short, I have a strong case to pass in the database name and table name as options. Something like:

function carb_areas(options,app){

    var opts = {'db':options.db // 'spatialvds'
                ,'table':options.table //'public.carb_counties_aligned_03'
                ,'alias':'counties'
                ,'host':options.host
                ,'port':options.port
                ,'username':options.auth.username
                ,'password':options.auth.password
                ,'select_properties':{'gid'           : 'gid'
                                               ...
                                     }
                ,'id_col':['fips','gid']
                ,'geo_col':'geom4326'
                ,'join_tables':[{'table':'counties_fips'
                                 ,'alias':'a'
                                 ,'join' :'on (counties.name ~* a.name)'}
                               ]
               }

However, if I allow the main table to be a passed-in parameter, then what do I do about the “join tables” option? That should also be a passed in parameter, because I might move counties_fips out of the public schema and into, say, a “metadata” schema. But that starts to open a can of worms. If I parameterize counties_fips as options.join_table, what about the join variables, and so on?

Conceptually, this options object var opt is just JSON, and I could stick it all in an external query.json file. Actually, I kind of like that result. The new function definition would look like:

function carb_areas(options,configfile,app){
    var query_config = ... async config file loading code ... ;

    // extend query_config with global db options
    var opts = _.assign(query_config,options)
    var county_handler = shape_service(opts)
    // set the routes
    app.get('/counties/:zoom/:column/:row.:format?'
           ,function(req,res,next){
                return county_handler(req,res,next)
            }
           )
    app.get('/counties.:format?'
           ,function(req,res,next){
                return county_handler(req,res,next)
            }
           )
}

Here there is a bit of pseudo-code to handwave loading the query.json config file just once at initialization/program load time, and I’m also using lodash “assign” function to overload the config file with global database options like host, port, username, and password.

The external query.json file would look like:

{'db':'spatialvds'
 ,'table':'public.carb_counties_aligned_03'
 ,'alias':'counties'
 ,'select_properties':{'gid'           : 'gid'
                       ,'a.fips'         :'fips'
                       ,'cacoa_'       : 'cacoa_'
                       ,'cacoa_id'     : 'id'
                       ,'coname'       : 'coname'
                       ,'a.name'         : 'name'
                       ,'conum'        : 'conum'
                       ,'display'      : 'display'
                       ,'symbol'       : 'symbol'
                       ,'islandname'   : 'islandname'
                       ,'baysplinte'   : 'baysplinte'
                       ,'cntyi_area'   : 'cntyi_area'
                       ,'island_id'    : 'island_id'
                       ,'bay_id'       : 'bay_id'
                      }
 ,'id_col':['fips','gid']
 ,'geo_col':'geom4326'
 ,'join_tables':[{'table':'counties_fips'
                  ,'alias':'a'
                  ,'join' :'on (counties.name ~* a.name)'}
                ]
}

So. Thanks to writing this post up, I’ve decided to stash the query definitions in external JSON files.

The danger now is that someone is going to faff about with the JSON and break the query, but that’s true of the code as well. I think it is a fair trade-off. This also somewhat solves my “test/development vs production” issue, in that I can overload any differences in table names, etc by using the second options object to change things like database names that are different in the different environments.

Now to stop writing and get coding.

Update: The end result is so much cleaner now: https://github.com/jmarca/spatial_routes/blob/master/lib/carb_areas.js

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s