Contour Line

April 3, 2009

Bootstrap in a view

Filed under: couchdb, research, transportation — jmarca @ 11:35 am

Inspired by this post, I am playing around with implementing bootstrapping of various statistics as a view in CouchDB. I am not a statistician, so my definition should not be taken as gospel, but bootstrapping is a statistical method in which one repeatedly draws random samples (with replacement) from an observed set of data in order to estimate some statistic, such as the mean or the median. Most of the older sources I’ve read discuss it for small to medium-sized data sets, where each of the k samples is the same size n as the original data. I can’t do that because my input data is too big, so I have to pick a smaller n. I’m going with 1,000 for starters, and repeating the draw 10,000 times.

(There’s probably a secondary bootstrap I can do there to decide on the optimal size of the bootstrap sample, but I’m not going to dive into that yet.)
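To make the sampling scheme concrete, here is a minimal sketch in plain Python of drawing 10,000 samples of size 1,000 and recording the mean of each. It is not the CouchDB view itself; the function name `bootstrap_means` and its parameters are my own placeholders, and it assumes the data fits in an in-memory list, which the real input does not.

```python
import random


def bootstrap_means(data, sample_size=1000, draws=10000, seed=None):
    """Return one sample mean per bootstrap draw.

    Each draw picks `sample_size` values from `data` at random,
    with replacement, and records the mean of that sample.
    """
    rng = random.Random(seed)  # private generator so runs can be repeated
    means = []
    for _ in range(draws):
        sample = rng.choices(data, k=sample_size)  # sampling with replacement
        means.append(sum(sample) / len(sample))
    return means


if __name__ == "__main__":
    # Hypothetical stand-in data; the real observations live in CouchDB.
    observations = [random.gauss(50, 10) for _ in range(100000)]
    means = sorted(bootstrap_means(observations, seed=42))
    # Middle 95% of the bootstrap means as a rough interval.
    print(means[int(0.025 * len(means))], means[int(0.975 * len(means))])
```

Because each sample holds 1,000 values rather than the full data set, the spread of these means reflects a sample of 1,000, so it will be wider than the spread a full-size bootstrap would give.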
