Consistency isn’t a design goal for CouchDB

Okay, so figure 2.1 of the CouchDB book says that consistency isn’t a goal of CouchDB.  So the worry in my prior post, that having no foreign keys or FK constraints could leave the database making inconsistent statements, isn’t something I should worry about.  Instead, I should expect that data from the database may be internally inconsistent, one record to the next, and try to minimize my reliance upon the DB to maintain consistency.

I understand that the figure probably doesn’t use “consistency” in the same sense that I do, but so what?  If I have data in a PostgreSQL db, then I can make sure that the city of Orange is in Orange County, which is in District 12, by using join tables.  The join table data will always be consistent across db nodes, and will prevent me from making false statements about what district the city of Orange is in.  At the same time, as a side effect, these foreign keys allow me to do joins in queries that let me get all of the cities in District 12.  Or all of the VDS detectors inside of a city, and so on.
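As a rough sketch of the relational side of that argument: the snippet below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and plain foreign key columns rather than separate join tables. The table names, column names, and sample rows are all assumptions made up for illustration, not the actual schema.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    CREATE TABLE districts (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE counties (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        district_id INTEGER NOT NULL REFERENCES districts(id)
    );
    CREATE TABLE cities (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        county_id INTEGER NOT NULL REFERENCES counties(id)
    );
    INSERT INTO districts VALUES (12, 'District 12');
    INSERT INTO counties VALUES (1, 'Orange County', 12);
    INSERT INTO cities VALUES (1, 'Orange', 1), (2, 'Anaheim', 1);
""")


def cities_in_district(conn, district_id):
    """The 'side effect' query: follow the foreign keys with a join."""
    rows = conn.execute(
        """
        SELECT cities.name
        FROM cities
        JOIN counties ON cities.county_id = counties.id
        WHERE counties.district_id = ?
        ORDER BY cities.name
        """,
        (district_id,),
    ).fetchall()
    return [name for (name,) in rows]


def try_insert_orphan_city(conn):
    """Try to add a city whose county does not exist; report whether it worked."""
    try:
        conn.execute("INSERT INTO cities VALUES (3, 'Nowhere', 999)")
    except sqlite3.IntegrityError:
        return False  # the constraint rejected the inconsistent row
    return True
```

Calling `cities_in_district(conn, 12)` walks city to county to district through the keys, and `try_insert_orphan_city(conn)` shows the database itself refusing a row that would make a false statement about the hierarchy.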

In CouchDB, the solution for the second problem (querying all detectors in D12, and so on) is to stuff the path data you might want to search on into the document.  This is bad from a design standpoint, because it forces the app to maintain the consistency of each node’s path data.  Apps make mistakes.  And frankly, I don’t need that sort of path searching capability.  I would like to select every node in District 12, but not by city or by county or whatnot.  If I really want to build the full tree, the best way is to store just the parent node in each document, and then rebuild the tree with a series of recursive queries.  The join-table side effect of consistent databases isn’t available, so I need to stop trying to use it.
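To make the two options concrete, here is a minimal sketch. A plain Python dict stands in for the CouchDB database, and each call to the hypothetical `fetch` helper represents one lookup of a document by id. The document ids and the `parent` and `path` field names are assumptions for illustration only.

```python
# Hypothetical documents keyed by _id; a dict stands in for CouchDB.
docs = {
    "d12": {"_id": "d12", "type": "district", "parent": None},
    "orange_county": {"_id": "orange_county", "type": "county", "parent": "d12"},
    "orange": {"_id": "orange", "type": "city", "parent": "orange_county"},
    "vds_1": {
        "_id": "vds_1",
        "type": "detector",
        "parent": "orange",
        # Denormalized copy of the ancestry, deliberately stale here.
        "path": ["d7", "la_county", "orange"],
    },
}


def fetch(doc_id):
    """Stand-in for one document lookup by id."""
    return docs[doc_id]


def ancestors(doc_id):
    """Rebuild the ancestry (root first) by following parent pointers."""
    chain = []
    seen = {doc_id}
    parent = fetch(doc_id)["parent"]
    while parent is not None:
        if parent in seen:
            raise ValueError("cycle in parent pointers at " + parent)
        seen.add(parent)
        chain.append(parent)
        parent = fetch(parent)["parent"]  # one more query per level
    chain.reverse()
    return chain


def path_is_current(doc_id):
    """Compare a document's self-reported path with the rebuilt one."""
    return fetch(doc_id).get("path") == ancestors(doc_id)
```

Here `ancestors("vds_1")` costs one lookup per level of the tree, while the stored `path` costs nothing to read but nothing in the database stops it from drifting, which is what `path_is_current("vds_1")` detects in this example.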

4 thoughts on “Consistency isn’t a design goal for CouchDB”

  1. Noah,

    I think you mistake what I am calling bad. I’m talking about *my* design, not couchdb’s design! I am criticizing my desire to graft RDBMS approaches onto a couchdb database. I don’t *need* joins, but I am forcing them on my database design by including the whole path as an attribute for a node in order to fake a traditional foreign key join. So I have design goals, and I can make the statement that my design is bad compared to them!

    As to the post, I am well aware of the post you reference. I just commented on it yesterday (after writing this post) on the couchdb user mailing list, with the thread branch starting about here:

    http://mail-archives.apache.org/mod_mbox/couchdb-user/200812.mbox/%3cF966D10F-78E0-43BB-BE02-BCEF6A2367B9@prima.de%3e

    I say the design of my database is bad (with joins accomplished in the same way documented in the link you reference) for two reasons. First, the presumption is that the path stored in the node will remain constant. In fact, the path may not remain constant, and I am not implementing any sort of checking to fix that state. In a relational database, I can accomplish the join by looking at foreign keys. If something changes in the database, the foreign keys will be updated accordingly. The information is *not* local to the node, and the expectation that the fk information will be accurate and consistent can be enforced by sticking various constraints on the join table in question. In my couchdb design, the local node claims that its parents’ hierarchy is “x->y->z” without actually checking if that statement is true. There isn’t any mechanism (aside from multiple queries, of course) to make sure that the statement is true. That is a bad design. If I really needed each leaf node to maintain knowledge of its parents’ hierarchy, and if I really wanted to be sure of that information, I would need some other mechanism to do it. If I didn’t care whether the document’s self-reported path was correct, then I would be happy with this solution (sticking the path in the document). Neither case holds for my application (I don’t need to know the path for most applications, and when I do need it, I must be sure that it is correct).

    The second reason I say it is a bad design is that I don’t need the join in practice, as I state in my original post. I’m using it only out of habit, because it was a cheap side effect of the RDBMS. All I really need is the top level of the hierarchy and the immediate parent, not the entire tree. It is nice to know if a detector is on such and such a freeway, since otherwise the data is meaningless. It is also nice to be able to extract aggregate data by district. But keeping track of county, city, post code, and so on in the leaf document is stupid because I don’t ever actually use that information in practice. And if I do need it, I have a PostgreSQL/PostGIS database I can use to fire off some geographic queries. So it is cruftiness left over from a prior iteration, and I need to cut it out.

    So, my original design plan is bad for a couchdb database.

  2. No problem. I need to work on my blog writing so it is less like my git commit notes and in-line comments, and more like public discourse!

    I do like the feel of couchdb, and my colleague and I are trying to figure out what it is by throwing different apps at it and taking its measure. Part of that process is fixing our old ways of thinking. That’s more or less what these blog posts are going to be about—documenting my confusion as I turn away from the reflections on the cave wall, and towards the light coming in from the outside.
