Contour Line

August 17, 2009

rockwall

Filed under: Uncategorized — jmarca @ 12:58 pm

So Grace is signed up for the kids rockwall class.  Hopefully she has as much fun as she had Sunday.

August 7, 2009

Maven skipped out on me in Eclipse!?

Filed under: Uncategorized — jmarca @ 12:51 pm

Strange as it may seem, Maven decided to stop working in Eclipse. I was trying to get an old project up in Eclipse to edit it to use the new Sakai K1 code, and couldn’t import it as an existing Eclipse site.  So I used the Maven import function, but had problems (it kept insisting on making 5 projects instead of one with 4 sub projects).  I was also having problems with my pom.xml files, so I decided to turn off some options with Maven in the Eclipse settings.

That was my mistake.  Something about what I did with the options was very bad, and Maven entirely disappeared from my Eclipse install.  No menu, no Window-> Preferences -> Maven category, no resolution of Maven repository, nothing.  I tried ripping out maven and reinstalling it, but no joy.

So I just deleted .eclipse from my home directory, and have to start over.

Sometimes I wonder if Eclipse is worth it.

Update: the Sakai app builder might be the culprit here.

June 17, 2009

Good times

Filed under: Uncategorized — jmarca @ 8:14 pm

I remember when the Good Times email virus hoax hit my old company.  Ah those were the days.  (more…)

June 8, 2009

Not very regular posting

Filed under: lace knitting, starlight lace — jmarca @ 12:28 pm

I haven’t been posting anything here.  I’ve been trying out Twitter.  I think I get the idea.  But I really don’t want to talk to anybody.

I knit up a hat for Emma accidentally in starlight lace (from Barbara Walker, vol 2).  Pics to follow when I get them off my camera.  (more…)

April 28, 2009

Cow Chap

Filed under: couchdb — jmarca @ 9:37 pm

Digesting Couchapp. (more…)

April 23, 2009

Such a tool …

Filed under: Uncategorized — jmarca @ 9:55 pm

After a long hiatus from programming Sakai tools, I once again find the code base an opaque nest of terms.  Gotta get back into the Sakai way of thinking, so I’m going to write up my thoughts to make it easier the next time I take a break and get back into it.

What I want to do is properly integrate my couch glossary with Sakai.  So what I need is a java wrapper around the couch access.  I want the wrapper to accept simple jsonp calls, and emit json responses, just as the current couchdb-native glossary does.  I’m even up for serving the widget from a doc attached to the design doc, just as in couchdb-native.

So this has to be available everywhere, so it has to be a service.  I think.  Here is where sakai terminology just numbs me.  There is nothing in the Sakai confluence site (nothing recent, that is) describing how to program a simple service.  There is lots of awesome stuff up there to make writing tools easier, but I don’t want a tool.  A tool gets stuck in a site.  A site exists all by itself.  I want a service with a public stub inside Tomcat, I guess like the library?  Except library can be seen always.  I want a real webapp.  Just not a tool.

So I think that is a good start for how to code this up in Sakai.  Use the app builder to make a tool, just get rid of all the tool stuff, and pay close attention to getting in and out of the app from the web.  Make sure all access is mediated by the authorization service, and that should do it.

April 10, 2009

Close still doesn’t count …

Filed under: couchdb, research, transportation — jmarca @ 3:13 pm

… except for nukes and bocce.

I can *almost* make bootstrapping work, but not entirely within couchdb.  I am going to have to do external processing.  Which is probably fine.  (more…)

More thoughts on using bootstrap

Filed under: couchdb, research, transportation — jmarca @ 9:19 am

Closer, but still not yet there using bootstrap sampling in Couchdb.   My prior post was mostly thinking out loud.  I’ve tried some things since, and this post is an attempt to organize my thoughts on the topic.

(more…)

April 3, 2009

Bootstrap in a view

Filed under: couchdb, research, transportation — jmarca @ 11:35 am

Inspired by this post, I am playing around with implementing bootstrapping various statistics as a view in couchdb.  I am not a statistician, so my definition should not be used as gospel, but bootstrapping is a statistical method where one repeatedly draws random samples (with replacement) from an observed set of data in order to estimate some statistic, such as the mean or the median, and how much that estimate varies.  Most of the older sources I’ve read talk about using it for small to medium sized data sets, so each of the k samples is the same size n as the original data.  But I can’t do that, because my input data is too big.  So I have to pick a smaller n.  I’m going with 1,000 for starters, and repeating the draw 10,000 times.

(There’s probably a secondary bootstrap I can do there to decide on the optimal size of the bootstrap sample, but I’m not going to dive into that yet.) (more…)
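To make the idea concrete outside of couchdb, here is a minimal Python sketch of the resampling loop described above. It is only an illustration, not the view itself: the function name `bootstrap_means`, the made-up `observed` values, and the small `draws=200` used in the example are all assumptions chosen so it runs quickly. The defaults mirror the 1,000 and 10,000 mentioned above.

```python
import random
import statistics


def bootstrap_means(data, sample_size=1000, draws=10000, seed=None):
    """Return the mean of each resample.

    Each resample holds `sample_size` values drawn with replacement
    from `data`; the process is repeated `draws` times.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(data, k=sample_size))
        for _ in range(draws)
    ]


# Made-up stand-in for the observed values (assumption, not real loop data).
observed = [float(i % 97) for i in range(50_000)]

# Fewer draws than the 10,000 in the text, just to keep the example fast.
means = bootstrap_means(observed, sample_size=1000, draws=200, seed=1)

estimate = statistics.fmean(means)  # bootstrap estimate of the mean
spread = statistics.stdev(means)    # how much that estimate moves between draws
```

The spread of the resampled means is the interesting output: it gives a rough sense of how far the estimate could move if the data had come out slightly differently.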

March 10, 2009

Time and space

Filed under: research, transportation — jmarca @ 8:47 am

It takes a finite amount of time to process loop data into my database, and the results take up a finite amount of space.  So no matter what, if I process and save results, it will take time and space.  We’ve ordered a faster, bigger machine, and that will help speed things up and make space less of an issue, but there are more loop detectors to process.

So the presumption is that it is actually *worth* the time and space to compute and store the data.  This isn’t necessarily the case.  In fact, what I really want access to are the long-term averages of the accident risk values over time.  Going forward, I always want to keep around a little bit of data, but the primary use case is to compare historical averages (sliced and diced in various ways) to the current values.

The problem is that it is difficult to maintain historical trends without keeping the data handy.  As I’ve said in prior postings and in my notes, I really like how CouchDB’s map reduce approach allows the generation of different layers of statistics.  By emitting an array as the key, and a predicted risk quantity as the value, the reduce function that computes mean and variance will be run for a cascading tree of the keys.   So just by writing a map with a key like [loop_id,month,day, 15_minute_period], I can ask for averages over all data, over just a single loop, over a loop for a month, over a loop for a month for a particular Monday, etc etc.
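The cascading behaviour can be mimicked in a few lines of Python. This is a sketch of the idea only, not CouchDB code: `map_row`, `grouped_stats`, the field names, and the sample documents are all made up for illustration, and the `group_level` argument just truncates each key to its first N elements, which is my understanding of what the view query option of the same name does.

```python
from collections import defaultdict


def map_row(doc):
    """Emit a (key, value) pair the way the map function would."""
    key = (doc["loop_id"], doc["month"], doc["day"], doc["period"])
    return key, doc["risk"]


def grouped_stats(rows, group_level):
    """Aggregate rows by the first `group_level` elements of each key."""
    acc = defaultdict(lambda: [0, 0.0, 0.0])  # count, sum, sum of squares
    for key, value in rows:
        bucket = acc[key[:group_level]]
        bucket[0] += 1
        bucket[1] += value
        bucket[2] += value * value
    out = {}
    for key, (n, total, total_sq) in acc.items():
        mean = total / n
        # Population variance: E[x^2] - (E[x])^2
        out[key] = {"count": n, "mean": mean, "variance": total_sq / n - mean * mean}
    return out


# Hypothetical documents; the field names are assumptions.
docs = [
    {"loop_id": "A", "month": 3, "day": 9, "period": 0, "risk": 0.25},
    {"loop_id": "A", "month": 3, "day": 9, "period": 1, "risk": 0.75},
    {"loop_id": "A", "month": 3, "day": 10, "period": 0, "risk": 0.5},
    {"loop_id": "B", "month": 3, "day": 9, "period": 0, "risk": 1.0},
]
rows = [map_row(d) for d in docs]

overall = grouped_stats(rows, 0)   # one bucket for everything
per_loop = grouped_stats(rows, 1)  # one bucket per loop
per_day = grouped_stats(rows, 3)   # one bucket per loop, month, day
```

Note that the order of the key elements fixes which rollups are available: every bucket is a prefix of the key, so there is no way to group by day alone without a different key.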

On the other hand, this is limiting.  If I change my mind and want to aggregate over days but without splitting out months, or if I want to put a year field in there to evaluate annual variations, I can’t.  I have to rewrite the map, perhaps using the same view, and the whole shebang has to be recomputed—not trivial when the input set is about 15G per week.

As CouchDB matures, perhaps it will do a faster job computing views.  The approach is certainly there to parallelize the computations, but at the moment I only see a single process thrashing through the calculations.

Finally, if I delete old data, it isn’t clear to me how I would still maintain the running computations of mean and variance.  Technically it is possible: all you have to do is combine partial computations, knowing the number of observations that fed into each one.  But practically, I have a feeling that when I delete input data, the output will get blown away.
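For the record, here is a minimal Python sketch of what combining partial computations looks like. It uses the standard pairwise update for mean and sum of squared deviations; the `Partial` class and the function names are my own illustration, not anything CouchDB provides.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    count: int
    mean: float
    m2: float  # sum of squared deviations from the mean

    @property
    def variance(self):
        """Population variance of the observations summarized so far."""
        return self.m2 / self.count if self.count else 0.0


def summarize(values):
    """Build a Partial from a non-empty list of raw observations."""
    n = len(values)
    mean = sum(values) / n
    return Partial(n, mean, sum((v - mean) ** 2 for v in values))


def combine(a, b):
    """Merge two Partials without going back to the raw observations."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return Partial(n, mean, m2)


old = summarize([1.0, 2.0, 3.0])       # e.g. data that is about to be deleted
new = summarize([4.0, 5.0, 6.0, 7.0])  # e.g. data that is still around
merged = combine(old, new)
```

As long as the count, mean, and squared-deviation sum for the old data are saved somewhere, the raw observations themselves can go.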

Perhaps the best approach is to maintain couchdb for just a day’s worth of data, and run a separate postgresql process to store the map reduce output.  Then as couchdb matures, I can eventually store longer and longer time periods, but at all times I have a record of past history.

I think a table storing 5 minute-rounded timestamp, loop id, as the key, and all the different mean, variance, and count values for all of the different risk predictions would be good.  This would then feed higher level aggregation tables (like day, year, and so on).  By keeping the 5 minute mean and variance, I can compute any other variance pretty quickly (average across all loops, average for that day, average for a year of that loop and 5 minute period, etc).
