Iterating view and doc designs multiple times

Just a quick post so that I remember to elaborate on this later. I have found that whenever I have a large project to do in CouchDB, I go through several iterations of designing the documents and the views.

My latest project is typical.

  1. The first design was to push in really big documents.  The idea was to run map/reduce, copy the reduce output to a second db, and map/reduce that for the final result.  But the view generation was too slow, I never got around to designing the second db, and the biggest documents triggered a bug/memory issue.
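That two-stage idea can be sketched in plain JavaScript. This is a hypothetical illustration, not the actual design: the field names (`site`, `value`) are invented, and `emit` is passed in explicitly so the sketch runs standalone (in a real design document a map function is `function (doc)` with a global `emit`).

```javascript
// Hypothetical two-stage map/reduce: stage one reduces raw docs per
// key; the reduced rows are then copied into an array standing in
// for the second database, and reduced again for the final result.
function mapStage1(doc, emit) {
  emit(doc.site, doc.value);           // one row per document
}

function reduceSum(keys, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Tiny in-memory stand-in for building and querying a view.
function runView(docs, map, reduce) {
  var rows = {};
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      (rows[key] = rows[key] || []).push(value);
    });
  });
  return Object.keys(rows).map(function (key) {
    return { key: key, value: reduce([key], rows[key]) };
  });
}

var raw = [
  { site: 'a', value: 1 }, { site: 'a', value: 2 },
  { site: 'b', value: 5 },
];

// Stage 1: reduce per site.
var stage1 = runView(raw, mapStage1, reduceSum);

// Copy the stage-1 output into "second db" docs and reduce again,
// this time collapsing everything under one key for a grand total.
var copied = stage1.map(function (r) { return { site: 'all', value: r.value }; });
var stage2 = runView(copied, mapStage1, reduceSum);

console.log(stage2); // [ { key: 'all', value: 8 } ]
```

The catch described above is exactly the copy step in the middle: it has to be done outside CouchDB, since a view cannot feed another view directly.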


Watching views build oh so slowly

I have an application that is taxing my PostgreSQL install, and I’ve been taking a whack at using CouchDB to solve it instead.

On the surface, it looks like a pretty good use case, but I’m having trouble getting it to move fast enough.

In a nutshell, I am storing the output of a multiple imputation process. At the moment my production system uses PostgreSQL for this. I store each imputation output, one record per row, and have about 360 million imputations stored this way.

Each imputation represents an estimate of conditions at a mainline freeway detector. The imputation is done in R using the excellent Amelia package. While the imputation is done for all lanes at the site, because I am storing the data in a relational database with a schema, I decided to store one row per lane.
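Since CouchDB has no schema, one natural alternative is a single document per imputation that holds all lanes at the site, instead of one row per lane. A hypothetical shape (every field name here is invented for illustration, not taken from the actual system):

```javascript
// Hypothetical CouchDB document: one imputation covering all lanes
// at a detector site, replacing several relational rows.
var imputationDoc = {
  _id: 'detector-1234:2011-06-01T08:00',  // invented key scheme
  detector: '1234',
  timestamp: '2011-06-01T08:00',
  imputation: 3,                           // which of the multiple imputations
  lanes: [
    { lane: 1, volume: 42.1, occupancy: 0.08 },
    { lane: 2, volume: 38.7, occupancy: 0.07 },
  ],
};

// One document stands in for as many rows as the site has lanes.
console.log(imputationDoc.lanes.length); // 2
```

With hundreds of millions of records, the per-document overhead of the one-row-per-lane layout adds up, which is part of why view builds crawl.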

Replicator database in practice

The replicator database in CouchDB is cool, but one needs to be mindful when using it.

I like it better than sending a message to CouchDB to replicate dbx from machine y to machine z, because I can be confident that even if I happen to restart CouchDB, that replication is going to finish up.
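For reference, the two styles differ only in where the request lives. A one-shot message goes to `POST /_replicate` and dies with the server; a document written into the `_replicator` database is persisted and picked back up after a restart. Host and database names below are placeholders:

```javascript
// Request body for a one-shot POST /_replicate: the replication is
// lost if CouchDB restarts before it finishes.
var oneShot = {
  source: 'http://machine-y:5984/dbx',   // placeholder host/db names
  target: 'http://machine-z:5984/dbx',
};

// The same replication as a document PUT into the _replicator
// database: CouchDB persists it and restarts it after a reboot.
var persistent = {
  _id: 'dbx-y-to-z',                     // any id you like
  source: 'http://machine-y:5984/dbx',
  target: 'http://machine-z:5984/dbx',
  continuous: false,                     // one-shot, but still durable
};

console.log(persistent._id); // dbx-y-to-z
```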

The problem is that for replications that are not continuous, I end up with a bunch of replication entries in the replicator database. Thousands sometimes. Until I get impatient and just delete the whole thing.

For the way I use it, the best solution is to write a view into the db to pick off all of the replications that are not continuous and that have completed successfully, and then do a bulk delete of those documents. But I’m never organized enough to get that done.
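A sketch of that cleanup, assuming CouchDB 1.x `_replicator` behavior, where the server stamps a finished non-continuous replication with `_replication_state: "completed"`. The `emit` parameter is explicit so the sketch runs standalone (a real design-doc map function is `function (doc)` with a global `emit`), and the sample docs and revs are made up:

```javascript
// View map function over the _replicator database: pick off one-shot
// replications that have completed successfully.
function mapCompleted(doc, emit) {
  if (!doc.continuous && doc._replication_state === 'completed') {
    emit(doc._id, doc._rev);
  }
}

// Made-up _replicator documents standing in for the real database.
var docs = [
  { _id: 'r1', _rev: '3-a', continuous: false, _replication_state: 'completed' },
  { _id: 'r2', _rev: '1-b', continuous: true,  _replication_state: 'triggered' },
  { _id: 'r3', _rev: '2-c', continuous: false, _replication_state: 'error' },
];

// Collect the view rows, then build a _bulk_docs payload that
// deletes each completed entry by id and rev.
var rows = [];
docs.forEach(function (doc) {
  mapCompleted(doc, function (id, rev) { rows.push({ id: id, rev: rev }); });
});

var bulkDelete = {
  docs: rows.map(function (r) {
    return { _id: r.id, _rev: r.rev, _deleted: true };
  }),
};

console.log(bulkDelete.docs); // [ { _id: 'r1', _rev: '3-a', _deleted: true } ]
```

The `bulkDelete` object is what would be sent to `POST /<db>/_bulk_docs`; continuous and still-running replications are left alone.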

Here’s hoping such a function finds its way into Futon some day.