<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Contour Line</title>
	<atom:link href="http://contourline.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://contourline.wordpress.com</link>
	<description>Surround and define the edges of a subject, giving it shape and volume</description>
	<lastBuildDate>Wed, 18 Jan 2012 23:57:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='contourline.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Contour Line</title>
		<link>http://contourline.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://contourline.wordpress.com/osd.xml" title="Contour Line" />
	<atom:link rel='hub' href='http://contourline.wordpress.com/?pushpress=hub'/>
		<item>
		<title>How big is &#8220;too big&#8221; for documents in CouchDB:  Some biased and totally unscientific test results!</title>
		<link>http://contourline.wordpress.com/2012/01/18/how-big-is-too-big-for-documents-in-couchdb-some-biased-and-totally-unscientific-test-results/</link>
		<comments>http://contourline.wordpress.com/2012/01/18/how-big-is-too-big-for-documents-in-couchdb-some-biased-and-totally-unscientific-test-results/#comments</comments>
		<pubDate>Wed, 18 Jan 2012 23:56:54 +0000</pubDate>
		<dc:creator>jmarca</dc:creator>
				<category><![CDATA[couchdb]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://contourline.wordpress.com/?p=547</guid>
		<description><![CDATA[I have been storing documents somewhat heuristically in CouchDB. Without doing any rigorous tests, and without keeping track of versions and the associated performance enhancements, I have a general rule that tiny documents are too small, and really big documents are too big. To illustrate the issues, consider a simple detector that collects data every [...]]]></description>
			<content:encoded><![CDATA[<p>I have been storing documents somewhat heuristically in CouchDB.  Without doing any rigorous tests, and without keeping track of versions and the associated performance enhancements, I have a general rule that tiny documents are too small, and really big documents are too big.</p>
<p>To illustrate the issues, consider a simple detector that collects data every 30 seconds.  One approach is to create one document per observation.  Over a day, this will create 2880 documents (except for those pesky daylight saving time days, of course).  Over a year, this will create over a million documents.  If you have just one detector, then this is probably okay, but if you have thousands or millions of them, this is a lot of individual documents to store, and disk size becomes an issue.<br />
<span id="more-547"></span><br />
Without doing careful tests, I found that the compression CouchDB applies to stored data seems to work per document.  Because compression algorithms often achieve their gains by eliminating duplicated chunks of text, a very small document won&#8217;t give the algorithm much chance to remove duplicates, while a very large document will have lots of duplicates.  This is especially true if your documents are verbose and look like:</p>
<p><pre class="brush: jscript;">
{ 'volume':10, 'occupancy':0.01, 'timestamp':'2007-01-16 08:02:30 UTC' }
</pre></p>
<p>The compression algorithm can&#8217;t do much with just one such entry in a document.  But if a document contains hundreds or thousands of such entries, it can usually do a pretty good job of making those verbose labels only a minor hit on the total disk storage size.</p>
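<p>The effect is easy to see outside of CouchDB.  The following is a minimal sketch, not a measurement of CouchDB itself: it uses Node&#8217;s <code>zlib.gzipSync</code> as a stand-in for whatever compression the database applies, and the helper names <code>makeObservations</code> and <code>gzipRatio</code> are made up for illustration.</p>

```javascript
// Sketch only: gzip stands in for whatever compression CouchDB applies on disk.
const zlib = require('zlib');

// Build n observations shaped like the example entry above (values are invented).
function makeObservations(n) {
  const start = Date.UTC(2007, 0, 16, 8, 0, 0);
  return Array.from({ length: n }, (_, i) => ({
    volume: 10 + (i % 7),
    occupancy: 0.01 * (1 + (i % 5)),
    timestamp: new Date(start + i * 30000).toISOString(),
  }));
}

// Ratio of gzipped size to raw JSON size (smaller means better compression).
function gzipRatio(value) {
  const raw = Buffer.from(JSON.stringify(value));
  return zlib.gzipSync(raw).length / raw.length;
}

const single = gzipRatio(makeObservations(1)[0]);
const fullDay = gzipRatio({ data: makeObservations(2880) });
console.log('one observation per document:', single.toFixed(2));
console.log('one day per document:', fullDay.toFixed(2));
```

<p>The single observation barely compresses, if at all, while the day-long document shrinks considerably, because the repeated labels only need to be stored once.</p>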
<p>Given that I have years of old data lying around, I next tried to store a year of data per document.  I figured this would give the most bang for my compaction buck, and indeed it did.  However, the documents were <strong>too</strong> big.  The new problem I faced was that CouchDB views would mysteriously fail with OS process timeout errors.</p>
<p>My middle ground choice was to use one day per document, which also fit fairly well with my usage patterns.  While there were some cases in which I wanted to be able to draw just a few hours of a day out of the database, most of the time I wanted at least a day.</p>
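<p>A minimal sketch of the one-day-per-document layout, assuming the timestamps start with the date.  The <code>_id</code> scheme and the <code>detector</code>, <code>day</code> and <code>data</code> field names are assumptions for illustration, not the actual document design.</p>

```javascript
// Sketch only: the _id scheme and field names are assumptions.
function groupByDay(detectorId, observations) {
  const docs = new Map();
  observations.forEach((obs) => {
    const day = obs.timestamp.slice(0, 10); // the 'YYYY-MM-DD' prefix
    if (!docs.has(day)) {
      docs.set(day, { _id: detectorId + '-' + day, detector: detectorId, day: day, data: [] });
    }
    docs.get(day).data.push(obs);
  });
  return Array.from(docs.values());
}

const dayDocs = groupByDay('1201558', [
  { volume: 10, occupancy: 0.01, timestamp: '2007-01-16 08:02:30 UTC' },
  { volume: 12, occupancy: 0.02, timestamp: '2007-01-16 08:03:00 UTC' },
  { volume: 9, occupancy: 0.01, timestamp: '2007-01-17 00:00:00 UTC' },
]);
console.log(dayDocs.map((d) => d._id + ': ' + d.data.length + ' observations'));
```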
<p>One day of data for my application consists of about 800 to 900 KB.  If I run gzip on a single day&#8217;s JSON file, the size will shrink by a factor of 4.  I created one database per detector, and each database (after compaction) eats up only about 150 to 200 MB.  </p>
<p>Another issue with large documents is view generation.  Because I really want details on each observation, my views emit one row per 30s.  This means each document is sliced up into thousands of little rows in the view.  Here I suspect that I have no choice but to swallow the resulting file size.  A typical view before compaction is about 49MB, and after view compaction, that same view is still 43MB, which isn&#8217;t much of a reduction.</p>
<p>The last issue that is only somewhat related to the size of the document is the performance penalty from using a non-trivial reduce function. Typically, my views would have a map that would do something (apply some model, etc) to the raw data, and then emit one row per observation.  Those rows would then be picked up by the reduce function and collated in some way.  Until yesterday, my semi-standard reduce function would find the minimum, the maximum, the summed value, and the total count of values, and return that four element array.  </p>
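<p>For illustration, a hand-written reduce of that kind might look roughly like the sketch below.  It is not the original function: it assumes each mapped value is a single number, and <code>reduceStats</code> is a made-up name.  The <code>(keys, values, rereduce)</code> signature is the one CouchDB passes to JavaScript reduce functions.</p>

```javascript
// Sketch only: a hand-written reduce returning [min, max, sum, count].
// Assumes each mapped value is a single number.
function reduceStats(keys, values, rereduce) {
  if (rereduce) {
    // Here values are [min, max, sum, count] arrays from earlier reduce calls.
    return values.reduce(function (acc, v) {
      return [
        Math.min(acc[0], v[0]),
        Math.max(acc[1], v[1]),
        acc[2] + v[2],
        acc[3] + v[3],
      ];
    });
  }
  return [
    Math.min.apply(null, values),
    Math.max.apply(null, values),
    values.reduce(function (a, b) { return a + b; }, 0),
    values.length,
  ];
}

// Two partial reductions combined by a rereduce pass.
const partA = reduceStats(null, [3, 1, 4], false);
const partB = reduceStats(null, [1, 5, 9, 2], false);
console.log(reduceStats(null, [partA, partB], true));
```

<p>Every call crosses from the Erlang core into the JavaScript view server and back, which is a large part of why a function like this is slower than the built-ins.</p>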
<p>While I know that using the built-in functions is faster, I used to believe that the built-in Erlang reduce functions (_sum, _count, and _stats) could only be used on single value outputs, and about three-quarters of my views needed to dump arrays of numbers, not just a single value.  However, the hard cold empirical fact that my view was too slow led me to try something else.  Because I have thousands of databases, I&#8217;ve written a node.js program to apply views and trigger view generation.  Over the long MLK weekend, I ran the program applying 3 different views to a single year of data, and when I checked in on the process Tuesday morning none of the 3 view-application jobs had finished.  </p>
<p>On a whim, I checked out the latest version of the CouchDB source, grepped for _sum and looked at the code.  I don&#8217;t know Erlang from Lisp, but it certainly seemed like there was a special case or two set up to handle lists of numbers, so I removed my long reduce and rereduce code and replaced it with <code>_sum</code>.  And it worked!</p>
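<p>A minimal sketch of a design document that pairs a JavaScript map with the built-in <code>_sum</code> reduce.  The design document name, the view name, and the document fields (<code>data</code>, <code>detector</code>, <code>volume</code>, <code>occupancy</code>) are hypothetical, and the real map applies a model rather than passing raw values through.</p>

```javascript
// Sketch only: names and document fields are hypothetical.
// The map runs inside CouchDB, where emit() is provided by the view server.
function mapObservations(doc) {
  (doc.data || []).forEach(function (obs) {
    var day = obs.timestamp.slice(0, 10);
    // Key: [day, detector, timestamp]; value: an array of numbers.
    emit([day, doc.detector, obs.timestamp], [obs.volume, obs.occupancy]);
  });
}

const designDoc = {
  _id: '_design/model',
  language: 'javascript',
  views: {
    by_day: {
      map: mapObservations.toString(), // view functions are stored as strings
      reduce: '_sum',                  // built-in reduce, no JavaScript involved
    },
  },
};
console.log(JSON.stringify(designDoc, null, 2));
```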
<p>For the record, my views emit as keys some identifying characteristics, and as values the output of some function run on the raw data.  The output without running reduce looks like this:</p>
<p><pre class="brush: jscript;">
[&quot;2007-01-01&quot;, &quot;1201558&quot;, &quot;2007-01-01 00:19:30 UTC&quot;]  [0.000003690180399121373, 0.000006166469216729739, 0.000007345562901578842, 0.000005175674086671691]
[&quot;2007-01-01&quot;, &quot;1201558&quot;, &quot;2007-01-01 00:20:00 UTC&quot;]  [0.000003627454559477772, 0.000006325518959705681, 0.00000759499639553572, 0.000005182026970214938]
[&quot;2007-01-01&quot;, &quot;1201558&quot;, &quot;2007-01-01 00:20:30 UTC&quot;]  [0.000003599300956965342, 0.00000656187359400101, 0.000006868382743588068, 0.000005119872197890806]
</pre></p>
<p>I emit both the day and the timestamp so that I can group by day when I run the reduce, or get the output for each timestamp when I do not.  My reduce function is simply the plain, vanilla <code>_sum</code> function.  Reducing the above view output with <code>group_level=2</code> gives:</p>
<p><pre class="brush: jscript;">
[&quot;2007-01-01&quot;, &quot;1201558&quot;]  [0.01011209205149, 0.01385255704737369, 0.02016985702928209, 0.009153296503432225]
[&quot;2007-01-02&quot;, &quot;1201558&quot;]  [0.01065079552287105, 0.01443181962454101, 0.01961418238906556, 0.01053126869622315]
[&quot;2007-01-03&quot;, &quot;1201558&quot;]  [0.01132393733829414, 0.01471054680281484, 0.01937133599383705, 0.0092336843625161]
</pre></p>
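<p>To make the grouping concrete, here is a plain JavaScript imitation of what summing array values at a given group level amounts to.  It is only a sketch of the arithmetic, not how CouchDB implements <code>_sum</code>, and <code>sumByGroupLevel</code> and the toy rows are invented for illustration.</p>

```javascript
// Sketch only: imitates element-wise summing of array values per key prefix.
function sumByGroupLevel(rows, level) {
  const groups = new Map();
  rows.forEach((row) => {
    // Rows sharing the first `level` key elements fall into the same group.
    const groupKey = JSON.stringify(row.key.slice(0, level));
    const acc = groups.get(groupKey) || [];
    row.value.forEach((v, i) => { acc[i] = (acc[i] || 0) + v; });
    groups.set(groupKey, acc);
  });
  return Array.from(groups, ([k, value]) => ({ key: JSON.parse(k), value: value }));
}

const toyRows = [
  { key: ['2007-01-01', '1201558', '2007-01-01 00:19:30 UTC'], value: [1, 2, 3, 4] },
  { key: ['2007-01-01', '1201558', '2007-01-01 00:20:00 UTC'], value: [10, 20, 30, 40] },
  { key: ['2007-01-02', '1201558', '2007-01-02 00:00:00 UTC'], value: [5, 5, 5, 5] },
];
console.log(JSON.stringify(sumByGroupLevel(toyRows, 2)));
```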
<p>While the views still take a few minutes each to apply, they are definitely faster than the old way.  In just 18 hours I was able to apply a single model to 3 years of data, whereas with the old view I wasn&#8217;t able to apply three views to one year of data even after a long weekend of computer time.  Not quite apples to apples, but the new approach is going much faster.</p>
]]></content:encoded>
			<wfw:commentRss>http://contourline.wordpress.com/2012/01/18/how-big-is-too-big-for-documents-in-couchdb-some-biased-and-totally-unscientific-test-results/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0304452093efd125e7dac9cab0d0be5e?s=96&#38;d=monsterid" medium="image">
			<media:title type="html">jmarca</media:title>
		</media:content>
	</item>
		<item>
		<title>I want never gets</title>
		<link>http://contourline.wordpress.com/2012/01/07/i-want-never-gets/</link>
		<comments>http://contourline.wordpress.com/2012/01/07/i-want-never-gets/#comments</comments>
		<pubDate>Sun, 08 Jan 2012 02:28:31 +0000</pubDate>
		<dc:creator>jmarca</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://contourline.wordpress.com/?p=538</guid>
		<description><![CDATA[I want to buy an Atlantis from Rivendell . For some reason, I can never pull the trigger.]]></description>
			<content:encoded><![CDATA[<p>I want to buy an <a href='http://www.rivbike.com/product-p/f-atlantis.htm'>Atlantis from Rivendell</a>.  For some reason, I can never pull the trigger.</p>
]]></content:encoded>
			<wfw:commentRss>http://contourline.wordpress.com/2012/01/07/i-want-never-gets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0304452093efd125e7dac9cab0d0be5e?s=96&#38;d=monsterid" medium="image">
			<media:title type="html">jmarca</media:title>
		</media:content>
	</item>
		<item>
		<title>Iterating view and doc designs multiple times</title>
		<link>http://contourline.wordpress.com/2011/12/11/iterating-view-and-doc-designs-multiple-times/</link>
		<comments>http://contourline.wordpress.com/2011/12/11/iterating-view-and-doc-designs-multiple-times/#comments</comments>
		<pubDate>Mon, 12 Dec 2011 00:31:28 +0000</pubDate>
		<dc:creator>jmarca</dc:creator>
				<category><![CDATA[couchdb]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://contourline.wordpress.com/?p=541</guid>
		<description><![CDATA[Just a quick post so that I remember to elaborate on this later.  I have found that whenever I have a large project to do in CouchDB I go through several iterations of designing the documents and the views. My latest project is typical. First design was to push in really big documents.  The idea [...]]]></description>
			<content:encoded><![CDATA[<p>Just a quick post so that I remember to elaborate on this later.  I have found that whenever I have a large project to do in CouchDB I go through several iterations of designing the documents and the views.</p>
<p>My latest project is typical.</p>
<ol>
<li>First design was to push in really big documents.  The idea was to run map reduce, copy the reduce output to a second db, and map reduce that for the final result.   But the view generation was too slow, I never got around to designing the second db, and the biggest documents triggered a bug/memory issue.</li>
<p><span id="more-541"></span></p>
<li>The second design was to push in really small documents.  I had an insight, and the small documents were better designed than the rows of the first version&#8217;s megadocuments.  But I was generating over 90 million docs spread over four databases.  The view took even longer to generate than the first go around, and CouchDB complained that my reduce wasn&#8217;t reducing fast enough.  I turned off that warning and soldiered on, but gave up after 24 hours and 1% view generation.</li>
<li>The third design collected 100 documents from design 2 into a single doc, and used essentially the same map/reduce (accounting for the slight difference in the document design).  Still had the &#8220;not reducing fast enough&#8221; warning.  The idea is to reduce the marshalling of docs into and out of JSON by a factor of 100.</li>
<li>The fourth design switched to 50 docs rather than 100, because bulk uploading was crashing CouchDB repeatedly with the larger doc size.  I also figured out that I did not need the reduce step, as my client code can handle aggregating 4 or 5 documents per key without issue.</li>
<li>The fifth design changed the id of the doc to match the sha256 hash of the data, so that I wouldn&#8217;t duplicate data uploads.</li>
<li>The sixth design fixed the code that generated the sha256 hash!  Plus tweaks to the view to optimize the JavaScript.</li>
<li>The final design made sure that the timestamps of the original data were sorted prior to generating the 50 observation-long documents, so that I was guaranteed to always get the same output from the same input raw file, and therefore generate the same hash key/doc id, and therefore not upload data twice.  Plus code optimization attempts to the view JS.</li>
</ol>
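<p>The last three steps can be sketched as follows.  This is a minimal illustration rather than the actual loader: the chunk size of 50 comes from the list above, while <code>buildDocs</code> and the <code>ts</code>, <code>value</code> and <code>data</code> field names are assumptions.  It uses Node&#8217;s <code>crypto.createHash('sha256')</code> to derive the doc id from the sorted data.</p>

```javascript
// Sketch only: field names and the helper name are assumptions.
const crypto = require('crypto');

function buildDocs(observations, chunkSize) {
  // Sort a copy by timestamp so the same raw input always yields the same chunks.
  const sorted = observations.slice().sort((a, b) => Date.parse(a.ts) - Date.parse(b.ts));
  const docs = [];
  while (sorted.length) {
    const data = sorted.splice(0, chunkSize);
    // The doc id is the sha256 hash of the chunk's JSON, so re-uploading
    // identical data maps onto the same id instead of a new document.
    const id = crypto.createHash('sha256').update(JSON.stringify(data)).digest('hex');
    docs.push({ _id: id, data: data });
  }
  return docs;
}

const sample = Array.from({ length: 120 }, (_, i) => ({
  ts: new Date(Date.UTC(2007, 0, 1) + i * 30000).toISOString(),
  value: i,
}));
const forward = buildDocs(sample, 50);
const reversed = buildDocs(sample.slice().reverse(), 50);
console.log(forward.length, forward[0]._id === reversed[0]._id);
```

<p>Note that the hash also depends on the order of the fields within each observation, so the code that builds the observations has to be deterministic too.</p>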
<p>A few of those iterations were done on really small databases with just a few documents, but some of the problems only cropped up when I had my full set of data getting processed.  In the end, however, my data is stored and ready to read, and it loads much faster than the equivalent data in PostgreSQL.</p>
<p>Finally, contrary to my old idea that a view is like a SQL query, I now think of it as a far more expressive version of a SQL index.  Like indexes, very simple views that generate a sorting of the data are more generally useful, but unlike a PostgreSQL index, it is possible to write exactly what you need for an exact query, and make it run super duper fast.</p>
]]></content:encoded>
			<wfw:commentRss>http://contourline.wordpress.com/2011/12/11/iterating-view-and-doc-designs-multiple-times/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0304452093efd125e7dac9cab0d0be5e?s=96&#38;d=monsterid" medium="image">
			<media:title type="html">jmarca</media:title>
		</media:content>
	</item>
		<item>
		<title>Watching views build oh so slowly</title>
		<link>http://contourline.wordpress.com/2011/12/08/watching-views-build-oh-so-slowly/</link>
		<comments>http://contourline.wordpress.com/2011/12/08/watching-views-build-oh-so-slowly/#comments</comments>
		<pubDate>Fri, 09 Dec 2011 06:50:20 +0000</pubDate>
		<dc:creator>jmarca</dc:creator>
				<category><![CDATA[couchdb]]></category>

		<guid isPermaLink="false">http://contourline.wordpress.com/?p=539</guid>
		<description><![CDATA[I have an application that is taxing my PostgreSQL install, and I&#8217;ve been taking a whack at using CouchDB to solve it instead. On the surface, it looks like a pretty good use case, but I&#8217;m having trouble getting it to move fast enough. In a nutshell, I am storing the output of a multiple [...]]]></description>
			<content:encoded><![CDATA[<p>I have an application that is taxing my PostgreSQL install, and I&#8217;ve been taking a whack at using CouchDB to solve it instead.</p>
<p>On the surface, it looks like a pretty good use case, but I&#8217;m having trouble getting it to move fast enough.</p>
<p>In a nutshell, I am storing the output of a multiple imputation process. At the moment my production system uses PostgreSQL for this. I store each imputation output, one record per row. I have about 360 million imputations stored this way.</p>
<p>Each imputation represents an estimate of conditions at a mainline freeway detector. That is done in R using the excellent Amelia package. While the imputation is done for all lanes at the site, because I am storing the data in a relational database with a schema, I decided to store one row per lane. <span id="more-539"></span></p>
<p>The problem with that approach is that I have to run a nested select statement to use the data.  First I have to go through and collect all the lanes for a site for a given timestamp, and then average over the multiple imputations.  Then with the average of the imputations (or whatever is relevant to the problem), I would then sum up (or average or weight) the results across times for the detector.  So if I wanted to</p>
<p>Each imputation run computes an entire year, and saves the output to a CSV file.  I would then parse that file and save to the database.  My initial attempt to copy this to CouchDB was a straight copy, with a slight twist.  I just saved the entire year of imputation output to the database as a document.</p>
<p>Unfortunately, for sites with 5 lanes for which I performed 5 imputations, this triggered a bug deep in the bowels of CouchDB version 1.2.x during view generation.  From reading the bug reports and fixes, I think what was happening was that the JavaScript engine wasn&#8217;t getting allocated enough space to handle the very large doc.  I played with some settings, but nothing helped so I quit for a few days.</p>
<p>Then when I next returned to it, I realized that CouchDB didn&#8217;t have the same schema restriction as PostgreSQL, so I could just save all of the lanes in a single row of data.  So that was version 2.  I saved one document per imputation per time stamp per detector, merging out the per lane part when loading up the database from the CSV dump.  This is why I know I have 360 million odd documents in PostgreSQL&#8230;I generated 90 million or so documents in CouchDB, and my view was taking an age to generate even with the documents split between 4 different databases.  I also ran into the &#8220;you&#8217;re not reducing fast enough&#8221; scold.  The problem is that sometimes a site only has one or two imputations, while other sites have 4 and 5.  But I really wanted to run the reduce step for convenience. I wanted to just get a document that summarized all of the multiple imputations, rather than forcing my app to do the summary work.</p>
<p>That said, it is only a handful of imputations per site, so in this case I think the reduce is an incredible waste of computation power.  Cleaner design, but death by 90 million cuts.</p>
<p>I also decided to try to bulk up the documents a bit.  Rather than one document per time stamp per detector per imputation, I am currently trying to store 100 imputations per document, with no attempt at all to order or sort those 100&#8230;I just slice them off the docs array and reformat things prior to calling bulk docs.  My idea is to try to speed up the view generation by dividing by 100 the number of times that CouchDB has to serve up, parse, and return a JSON document.</p>
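<p>A minimal sketch of that batching step, under stated assumptions: <code>toBulkPayloads</code> and the <code>imputations</code> field are invented names, and the number of docs per request is arbitrary.  Each returned object has the <code>{"docs": [...]}</code> shape that CouchDB&#8217;s <code>_bulk_docs</code> endpoint accepts; actually sending it is left out.</p>

```javascript
// Sketch only: helper and field names are assumptions.
function toBulkPayloads(rows, perDoc, docsPerRequest) {
  const pending = rows.slice();
  const docs = [];
  // Slice perDoc rows at a time off the front, with no attempt to sort them.
  while (pending.length) {
    docs.push({ imputations: pending.splice(0, perDoc) });
  }
  const payloads = [];
  // Group the documents into request bodies for POST /{db}/_bulk_docs.
  while (docs.length) {
    payloads.push({ docs: docs.splice(0, docsPerRequest) });
  }
  return payloads;
}

const payloads = toBulkPayloads(
  Array.from({ length: 250 }, (_, i) => ({ n: i })),
  100,
  2
);
console.log(payloads.map((p) => p.docs.length));
```

<p>Lowering <code>perDoc</code> or <code>docsPerRequest</code> is the obvious knob to turn if the server struggles with the size of each request.</p>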
<p>However, this seems to be triggering another bug in CouchDB, but it is hard to detect.  I have 4 processes sending data to the CouchDB server.  3 of them are on the same machine as the CouchDB server, and the other two are on two different machines, because I spread the imputation jobs around.  Now CouchDB will just crash and restart periodically.  I think my choice of 100 documents was perhaps too high, and the bulk docs server is eating up just a little bit too much RAM so that it eventually gets killed.</p>
<p>As soon as the uploading stops I will run my views and see how fast they go now with the bigger doc size and the lack of a reduce.  I ran it on 1000 docs and it was reasonably fast, but it is hard to tell from that few documents.</p>
]]></content:encoded>
			<wfw:commentRss>http://contourline.wordpress.com/2011/12/08/watching-views-build-oh-so-slowly/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0304452093efd125e7dac9cab0d0be5e?s=96&#38;d=monsterid" medium="image">
			<media:title type="html">jmarca</media:title>
		</media:content>
	</item>
		<item>
		<title>Replicator database in practice</title>
		<link>http://contourline.wordpress.com/2011/12/07/replicator-database-in-practice/</link>
		<comments>http://contourline.wordpress.com/2011/12/07/replicator-database-in-practice/#comments</comments>
		<pubDate>Thu, 08 Dec 2011 07:19:54 +0000</pubDate>
		<dc:creator>jmarca</dc:creator>
				<category><![CDATA[couchdb]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://contourline.wordpress.com/?p=534</guid>
		<description><![CDATA[The replicator database in couchdb is cool, but one needs to be mindful when using it. I like it better than sending a message to couch db to replicate dbx from machine y to machine z, because I can be confident that even if I happen to restart couch, that replication is going to finish [...]]]></description>
			<content:encoded><![CDATA[<p>The replicator database in CouchDB is cool, but one needs to be mindful when using it.  </p>
<p>I like it better than sending a message to CouchDB to replicate dbx from machine y to machine z, because I can be confident that even if I happen to restart CouchDB, that replication is going to finish up.</p>
<p>The problem is that for replications that are not continuous, I end up with a bunch of replication entries in the replicator database.  Thousands sometimes. Until I get impatient and just delete the whole thing.</p>
<p>For the way I use it, the best solution is to write a view into the db to pick off all of the replications that are not continuous and that have completed successfully, and then do a bulk delete of those documents.  But I&#8217;m never organized enough to get that done.</p>
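<p>A rough sketch of what that could look like, untested against a live server.  It assumes replication documents carry a <code>continuous</code> flag and a <code>_replication_state</code> field that reads <code>"completed"</code> when done, which is worth checking against your CouchDB version; <code>mapFinished</code> and <code>toBulkDelete</code> are made-up names.</p>

```javascript
// Sketch only: check the field names against your CouchDB version.
// Map function for a view in the replicator database; emit() is
// provided by the view server when this runs inside CouchDB.
function mapFinished(doc) {
  if (!doc.continuous) {
    if (doc._replication_state === 'completed') {
      emit(doc._id, doc._rev);
    }
  }
}

// Turn the view rows into a _bulk_docs body that deletes each document.
function toBulkDelete(rows) {
  return {
    docs: rows.map(function (row) {
      return { _id: row.id, _rev: row.value, _deleted: true };
    }),
  };
}

// Example with a hand-written view row.
const exampleRows = [{ id: 'rep1', key: 'rep1', value: '1-abc' }];
console.log(JSON.stringify(toBulkDelete(exampleRows)));
```

<p>Emitting the revision as the view value means the delete request can be built straight from the view output, without fetching each document first.</p>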
<p>Here&#8217;s hoping such a function finds its way into Futon some day.</p>
]]></content:encoded>
			<wfw:commentRss>http://contourline.wordpress.com/2011/12/07/replicator-database-in-practice/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0304452093efd125e7dac9cab0d0be5e?s=96&#38;d=monsterid" medium="image">
			<media:title type="html">jmarca</media:title>
		</media:content>
	</item>
		<item>
		<title>When R and JSON fight</title>
		<link>http://contourline.wordpress.com/2011/11/09/when-r-and-json-fight/</link>
		<comments>http://contourline.wordpress.com/2011/11/09/when-r-and-json-fight/#comments</comments>
		<pubDate>Wed, 09 Nov 2011 22:01:58 +0000</pubDate>
		<dc:creator>jmarca</dc:creator>
				<category><![CDATA[couchdb]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://contourline.wordpress.com/?p=530</guid>
		<description><![CDATA[I have a love-hate relationship with R. R is extremely powerful and lots of fun when it works, but so often I spend hours at a time wondering what is going on (to put my irritation in printable prose). Today I finally figured out a nagging problem. I am pulling data from CouchDB into [...]]]></description>
			<content:encoded><![CDATA[<p>I have a love-hate relationship with R.  R is extremely powerful and lots of fun when it works, but so often I spend hours at a time wondering what is going on (to put my irritation in printable prose).</p>
<p>Today I finally figured out a nagging problem.  I am pulling data from CouchDB into R using the excellent RJSONIO and RCurl libraries.  JSON has a strict requirement that unknown values are called null, while R has a more nuanced concept that includes NA as well as NULL.  My original usage of the RJSONIO library to save data to CouchDB had to account for this fact, by using a regular expression to convert NA to proper JSON null values.  (I think the latest version of RJSONIO might actually handle this better, but I haven&#8217;t checked as my current code works fine since the regex is conditional).</p>
<p>Now coming the other way, from CouchDB into R, RJSONIO&#8217;s <code>fromJSON()</code> function will happily convert JSON null values into R NULL values.  My little <code>couch.get()</code> function looks like this:</p>
<p><pre class="brush: r;">
couch.get &lt;- function(db,docname, local=TRUE, h=getCurlHandle()){

  if(length(db)&gt;1){
    db &lt;- couch.makedbname(db)
  }
  uri &lt;- paste(couchdb,db,docname,sep=&quot;/&quot;);
  if(local) uri &lt;- paste(localcouchdb,db,docname,sep=&quot;/&quot;);
  ## hack to url encode spaces
  uri &lt;- gsub(&quot;\\s&quot;,&quot;%20&quot;,x=uri,perl=TRUE)
  fromJSON(getURL(uri,curl=h)[[1]])

}
</pre></p>
<p>The key line is the last one, where the results of RCurl&#8217;s <code>getURL()</code> function are passed directly to RJSONIO&#8217;s <code>fromJSON()</code> and then returned to the caller.</p>
<p>In my database, to save space, each document is a list of lists for a day.<br />
<pre class="brush: jscript;">
{
   &quot;_id&quot;: &quot;1213686 2007-02-28 10:38:30&quot;,
   &quot;_rev&quot;: &quot;1-c8f0463d1910cf4e89370ece6ef250e2&quot;,
   &quot;data&quot;: {
       &quot;nl1&quot;: [9,12,12, ... ],
       &quot;nr1&quot;: [ ... ],
       ...
       &quot;ts&quot; : [ ... ]
   }
}
</pre></p>
<p>Every entry in the <code>ts</code> list has a corresponding entry in every other array in the data object, but that entry could be null.  This makes it easy to plot the data against time (using d3, but that is another post) or reload back into R with a timestamp.</p>
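<p>As a rough illustration of why the aligned arrays are convenient, here is a minimal JavaScript sketch that pairs each timestamp with the value at the same index.  The document contents and the <code>toSeries</code> helper are made up for this example; only the <code>data</code>, <code>ts</code>, and <code>nl1</code> names come from the document above.</p>

```javascript
// Made-up document in the same shape as the one above (all values invented).
const doc = {
  data: {
    ts: [1172687910, 1172687940, 1172687970],
    nl1: [9, null, 12]
  }
};

// Hypothetical helper: pair each timestamp with the value at the same index.
// A null stays in its slot, so the series never drifts out of step with ts.
function toSeries(data, name) {
  return data.ts.map((t, i) => ({ ts: t, value: data[name][i] }));
}

console.log(toSeries(doc.data, 'nl1'));
```

<p>Because the null is kept rather than dropped, the second point still lines up with the second timestamp.</p>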
<p>But loading data into R isn&#8217;t quite the one-liner I was expecting, because of how R handles NULL compared to NA.  My first and incorrect attempt was:</p>
<p><pre class="brush: r;">
alldata &lt;- doc$data
colnames &lt;- names(alldata)
## deal with non ts first
varnames &lt;-  grep( pattern=&quot;^ts$&quot;,x=colnames,perl=TRUE,invert=TRUE,val=TRUE )
## keep only what I am interested in
varnames &lt;-  grep( pattern=&quot;^[no][lr]\\d+$&quot;,x=varnames,invert=TRUE,perl=TRUE,val=TRUE )
data.matrix &lt;- matrix(unlist(alldata[varnames]),byrow=FALSE,ncol=length(varnames))
</pre></p>
<p>First I grab just the <code>data</code> object, pull off the variables of interest, then make a matrix out of the data.</p>
<p>The problem is the recursive application of <code>unlist</code> buried in the matrix command.  The <code>alldata</code> object is really a list of lists, and some of those lists have NULL values, so recursive application of <code>unlist</code> <strong>SILENTLY</strong> wipes out the NULL values.  (So IRRITATING!)</p>
<p>Instead what you have to do is carefully replace all numeric NULL values with what R wants:  NA.  (And this is where learning how to do all that callback programming in javascript comes in handy, as I define a callback function for the <code>lapply</code> function inline and don&#8217;t get worked up about it anymore.)</p>
<p><pre class="brush: r;">
  ## first, make NULL into NA
  intermediate &lt;- lapply(alldata[varnames],function(l){
    nullmask &lt;- unlist(lapply(l, is.null))
    l[nullmask] &lt;- NA
    l
  })
  ## then do the unlisting
  data.matrix &lt;- matrix(unlist(intermediate),byrow=FALSE,ncol=length(varnames))
</pre></p>
<p>Most of the time the simple way worked fine, but it required special handling when I slapped the timeseries column back onto my data. What I ended up having to do (when I was just hacking code that worked (TM)) was to drop timestamps for which all of the rows of data I was interested in were all NULL.  And yes, the logic was as tortured as the syntax of that sentence.  </p>
<p>But every once in a while the data would be out of sync, because sometimes there would be different numbers of NULL values in the variables I was extracting (for example, the mean would be fine, but one of the correlation coefficients would be undefined).  In those cases the loop would either work and be wrong (if the odd number of NULL values was perfectly aliased with the length of varnames), or else it would crash and get noted by my error handler.  </p>
<p>With the new explicit loop to convert NULL to NA, the loading function works fine, with no more <code>try-error</code>s returned from my <code>try</code> call. And even better, I no longer have to lie awake nights wondering whether some data was just perfectly aliased with missing values so that it slipped through.</p>
]]></content:encoded>
			<wfw:commentRss>http://contourline.wordpress.com/2011/11/09/when-r-and-json-fight/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0304452093efd125e7dac9cab0d0be5e?s=96&#38;d=monsterid" medium="image">
			<media:title type="html">jmarca</media:title>
		</media:content>
	</item>
		<item>
		<title>Slacking on the Couch</title>
		<link>http://contourline.wordpress.com/2011/11/04/slacking-on-the-couch/</link>
		<comments>http://contourline.wordpress.com/2011/11/04/slacking-on-the-couch/#comments</comments>
		<pubDate>Fri, 04 Nov 2011 22:18:32 +0000</pubDate>
		<dc:creator>jmarca</dc:creator>
				<category><![CDATA[couchdb]]></category>
		<category><![CDATA[slackware]]></category>

		<guid isPermaLink="false">http://contourline.wordpress.com/?p=525</guid>
		<description><![CDATA[I run Slackware. I also use CouchDB. Seems like a natural fit, but the slackbuild on SlackBuilds.org is stuck at 0.11. That&#8217;s okay, it is a good script and works well with the latest version. However, I don&#8217;t want to run the latest release of CouchDB, I want to run 1.2.x from the git repository, [...]]]></description>
			<content:encoded><![CDATA[<p>I run Slackware. I also use CouchDB. Seems like a natural fit, but the slackbuild on SlackBuilds.org is stuck at 0.11.</p>
<p>That&#8217;s okay, it is a good script and works well with the latest version. However, I don&#8217;t want to run the latest release of CouchDB, I want to run 1.2.x from the git repository, because I really like the new replication engine for my work.</p>
<p>So, I had to do some tinkering with the SlackBuild script. <span id="more-525"></span> First go to SlackBuilds and get the slackbuild script for CouchDB and its dependencies. Okay, now I&#8217;m seeing that this is perhaps more involved than I thought when I started writing this!</p>
<p>Most of the dependencies are slightly out of date on SlackBuilds. I&#8217;d help out, but my slackbuild-fu is zilcho. That&#8217;s okay, so is yours or you wouldn&#8217;t still be reading, and it&#8217;s easy to get around this stuff.</p>
<p>First, Erlang. Get the slackbuild for Erlang, then click on the homepage URL and download the latest source and man page (14B04 at the time of this writing). Unzip the slackbuild, move the source and man packages into the directory, and change into the directory.</p>
<p><pre class="brush: bash;">
tar xvf erlang-otp.tar.gz
mv otp_src_R14B04.tar.gz erlang-otp/.
mv otp_doc_man_R14B04.tar.gz erlang-otp/.
cd erlang-otp
</pre></p>
<p>Then fire up your favorite editor and edit the erlang-otp.SlackBuild so that you can use the new version. Change the line:</p>
<pre>   VERSION=14B02</pre>
<p>to read</p>
<pre>   VERSION=${VERSION:-14B02}</pre>
<p>This change lets you specify the version on the command line. Then run the script with the incantation:</p>
<p><pre class="brush: bash;">
sudo VERSION=14B04 ./erlang-otp.SlackBuild
</pre></p>
<p>Do similar stuff to install js and icu4c:</p>
<ul>
<li>The js version on <a href="http://slackbuilds.org/repository/13.37/network/js/">SlackBuilds</a> is 1.8.5-rc1, but you should install js-1.8.5-1 from <a href="http://ftp.mozilla.org/pub/mozilla.org/js/?C=M;O=D">mozilla</a>.</li>
<li>The icu4c from <a href="http://slackbuilds.org/repository/13.37/libraries/icu4c/">SlackBuilds</a> should be upgraded to the latest from <a href="http://site.icu-project.org/download">icu4c</a></li>
</ul>
<p>When these packages are built and installed (or upgraded), then it is time to turn to CouchDB.</p>
<p>As with the other packages, get the SlackBuild, but don&#8217;t bother getting the source this time.  Instead go to github and clone it from there.</p>
<p><pre class="brush: bash;">
tar xvf couchdb.tar.gz
cd couchdb
</pre></p>
<p>Then clone the source from github and make a local checkout that mirrors the upstream 1.2.x branch.  (Never build from trunk or master or whatever unless you are testing stuff).</p>
<p><pre class="brush: bash;">
git clone -o github https://github.com/apache/couchdb.git source
cd source
git checkout -b 1.2.x github/1.2.x
</pre></p>
<p>Then modify the slackbuild to use this as the source of the build, rather than some tarball.<br />
But in doing that, I found that I needed to add a hard checkout as well, because over time older stuff (prior builds, messing around, you know, stuff you do with git-based code) builds up and can break the build (I fought with this today 6 times before I wised up!).  </p>
<p>The edits need to go in a few places, as follows.  First tweak the version as before.  Set the default to 1.2.x, but make it so you can set it on the command line too (1.3 is coming soon, I&#8217;m sure).</p>
<p><pre class="brush: bash;">
PRGNAM=couchdb
VERSION=${VERSION:-1.2.x}
BUILD=${BUILD:-1}
TAG=${TAG:-_SBo}

PKGSRC=apache-couchdb
</pre></p>
<p>Then skip down over the super-useful architecture and bail out stuff, until you see a line all by itself saying <code>set -e</code>.  Instead of unzipping a tgz file, I deleted the line that unpacks the tarball and use rsync to copy the git repository instead.  Yes I could use git, or I could use plain old copy, but I&#8217;m using rsync.  Also, I really want to keep the .git stuff, because CouchDB&#8217;s build script checks if it is being built from a git repository and appends the git revision hash to the couch version, which is pretty cool.</p>
<p><pre class="brush: bash;">
set -e

rm -rf $PKG
mkdir -p $TMP $PKG $OUTPUT
rm -rf $TMP/$PKGSRC-$VERSION
cd $TMP
rm -rf $PRGNAM-$VERSION
mkdir $PRGNAM-$VERSION
cd $PRGNAM-$VERSION
rsync -av  $CWD/source/. .
</pre></p>
<p>Then it is really important to add the following two lines.</p>
<p><pre class="brush: bash;">
git reset --hard
./bootstrap
</pre></p>
<p>The <code>git reset --hard</code> command &#8220;Resets the index and working tree. Any changes to tracked files in the working tree since &lt;commit&gt; are discarded.&#8221;  In other words, you can rest assured the source is in a clean state and all your trials and errors are dropped.</p>
<p>The bootstrap command is how CouchDB builds up its configure command and other stuff it needs.</p>
<p>One final change to the stock slackbuild is to modify the configure command, as follows:</p>
<p><pre class="brush: bash;">
CFLAGS=&quot;$SLKCFLAGS&quot; \
CXXFLAGS=&quot;$SLKCFLAGS&quot; \
  ./configure \
  --prefix=/usr \
  --sysconfdir=/etc \
  --mandir=/usr/man \
  --localstatedir=/var \
  --libdir=/usr/lib$LIBDIRSUFFIX \
  --build=$ARCH-slackware-linux
</pre></p>
<p>All I did was delete the erlang and js options, as they aren&#8217;t necessary in the latest version of CouchDB, at least not on my machine.</p>
<p>Then run the slackbuild and install</p>
<p><pre class="brush: bash;">
sudo ./couchdb.SlackBuild
sudo /sbin/upgradepkg --install-new /tmp/couchdb-1.2.x-x86_64-1_SBo.tgz
</pre></p>
<p>After that, make sure the file permissions and ownerships are correct.<br />
<pre class="brush: bash;">
sudo chmod 0770 /var/lib/couchdb 
sudo chmod 0770 /var/run/couchdb 
sudo chmod 0770 /var/log/couchdb 
sudo chmod 0770 /etc/couchdb
sudo chown couchdb:couchdb /var/lib/couchdb -R 
sudo chown couchdb:couchdb /var/run/couchdb -R 
sudo chown couchdb:couchdb /var/log/couchdb -R 
sudo chown couchdb:couchdb /etc/couchdb -R
</pre></p>
<p>Then make sure to inspect /etc/couchdb/default.ini.new and copy over changes to /etc/couchdb/default.ini, and you should be good to go.</p>
<p>Slacking on the couch.</p>
]]></content:encoded>
			<wfw:commentRss>http://contourline.wordpress.com/2011/11/04/slacking-on-the-couch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0304452093efd125e7dac9cab0d0be5e?s=96&#38;d=monsterid" medium="image">
			<media:title type="html">jmarca</media:title>
		</media:content>
	</item>
		<item>
		<title>super useful page for html escape codes</title>
		<link>http://contourline.wordpress.com/2011/10/27/super-useful-page-for-html-escape-codes/</link>
		<comments>http://contourline.wordpress.com/2011/10/27/super-useful-page-for-html-escape-codes/#comments</comments>
		<pubDate>Thu, 27 Oct 2011 20:06:58 +0000</pubDate>
		<dc:creator>jmarca</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://contourline.wordpress.com/?p=519</guid>
		<description><![CDATA[CouchDB wants its fancy startkey and endkey values properly escaped. So that means I have to look up '[' and ']' and so on for their hexadecimal equivalents. I usually turn to this super useful page, even though it is way down on the search results. The others look like spam websites. So, tune your [...]]]></description>
			<content:encoded><![CDATA[<p>CouchDB wants its fancy startkey and endkey values properly escaped.  So that means I have to look up '[' and ']' and so on for their hexadecimal equivalents.  I usually turn to this super useful page, even though it is way down on the search results.  The others look like spam websites.  </p>
<p>So, tune your linkages to <a href="http://web.cs.mun.ca/~michael/c/ascii-table.html">http://web.cs.mun.ca/~michael/c/ascii-table.html</a></p>
<p>Update: or as that anonymous comment says below, man 7 ascii</p>
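<p>If the query is being built in JavaScript anyway, the lookup can be skipped altogether.  Here is a minimal sketch, assuming the keys are JSON arrays; the <code>encodeViewKey</code> helper and the database, design document, and view names are all made up for illustration.</p>

```javascript
// Hypothetical helper: percent-encode a CouchDB view key for a query string.
function encodeViewKey(key) {
  // Keys are sent as JSON, so serialize first, then escape the result.
  return encodeURIComponent(JSON.stringify(key));
}

// The path below is invented; substitute your own database and view.
const query = '/mydb/_design/detectors/_view/by_time'
  + '?startkey=' + encodeViewKey(['1213686', '2007-02-28'])
  + '&endkey=' + encodeViewKey(['1213686', {}]);

console.log(query);
```

<p>The built-in <code>encodeURIComponent()</code> turns '[' into %5B and ']' into %5D, which are the same hexadecimal values the ASCII table gives.</p>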
]]></content:encoded>
			<wfw:commentRss>http://contourline.wordpress.com/2011/10/27/super-useful-page-for-html-escape-codes/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0304452093efd125e7dac9cab0d0be5e?s=96&#38;d=monsterid" medium="image">
			<media:title type="html">jmarca</media:title>
		</media:content>
	</item>
		<item>
		<title>keys() is an Object method</title>
		<link>http://contourline.wordpress.com/2011/10/27/keys-is-an-object-method/</link>
		<comments>http://contourline.wordpress.com/2011/10/27/keys-is-an-object-method/#comments</comments>
		<pubDate>Thu, 27 Oct 2011 17:24:26 +0000</pubDate>
		<dc:creator>jmarca</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://contourline.wordpress.com/?p=513</guid>
		<description><![CDATA[For a little while I didn&#8217;t really *get* the difference between underscore and async in node.js. Yesterday I wrote up some code to copy some data out of PostgreSQL and into CouchDB. At some point, I have a big object whose keys are the document IDs in my CouchDB database, and I needed to fire [...]]]></description>
			<content:encoded><![CDATA[<p>For a little while I didn&#8217;t really *get* the difference between <a href="http://documentcloud.github.com/underscore/">underscore</a> and <a href="https://github.com/caolan/async">async</a> in node.js.</p>
<p>Yesterday I wrote up some code to copy some data out of PostgreSQL and into CouchDB.  At some point, I have a big object whose keys are the document IDs in my CouchDB database, and I needed to fire up <a href="https://github.com/mikeal/request">request</a> to update each document in turn.  Because I&#8217;m lazy, I usually use <code>_.keys(object)</code> to get an object&#8217;s keys  so my server-side and client-side javascript follow the same conventions.  To apply a function to the object key value pairs, I would normally use <code>_.each(object,function(value,key){...})</code>, but in this case, where I want a little more control over how many simultaneous GETs then PUTs I fire off at CouchDB, underscore&#8217;s each is a little awkward to use.</p>
<p>In the past I&#8217;ve hacked up self-made limiters, but as I use async more I&#8217;ve been learning about useful ways to combine its functions.  In this case, I made an async.whilst loop that <code>splice</code>s out 30 or so ids, then uses <code>async.forEach()</code> to fire off request operations for each of these document ids.  The request operations themselves are nested&#8212;I usually try to use pipe whenever I can (pipe the get into the put), but I haven&#8217;t yet tested what happens when you modify the document in between the get and the put.  </p>
<p>In short, my current approach is that when I want simple iterators I use underscore, but if there is a whiff of blocking in the call, I will use async instead.  As I follow this convention, it begins to get more and more useful.  In underscore, the function just runs.  If it is something like a request call that will return right away and go do something asynchronously, then I have to program my own solution to figuring out when that call is done.  In contrast, async makes liberal use of callbacks.  <code>async.forEach()</code> will also fire off lots of simultaneous request objects, but it passes its own callback function to each one of them, and I can call that callback in the final callback of each request invocation.  Very handy.  And then <code>async.forEach()</code> has a third optional argument that is a function to execute when all of the parallel calls are done.  Again, very handy, and much cleaner than hacking up my own solution.</p>
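<p>The batching idea can be sketched without any library at all.  Note that this swaps in Promises and <code>async</code>/<code>await</code> for the async library&#8217;s callbacks, so it is not the code described above; <code>processInBatches</code> and <code>worker</code> are hypothetical names, and <code>worker</code> is assumed to return a Promise.</p>

```javascript
// Hypothetical helper: run `worker` over `ids`, at most `batchSize` at a time.
async function processInBatches(ids, batchSize, worker) {
  const pending = ids.slice();   // copy, so the caller's array is left alone
  const results = [];
  while (pending.length) {
    // splice out the next batch of ids, as in the whilst loop described above
    const batch = pending.splice(0, batchSize);
    // start the whole batch, then wait until every call in it has finished
    const done = await Promise.all(batch.map((id) => worker(id)));
    results.push(...done);
  }
  return results;
}

// Stand-in worker for illustration; a real one would do the GET and then the PUT.
processInBatches(['a', 'b', 'c', 'd', 'e'], 2, async (id) => id.toUpperCase())
  .then((out) => console.log(out));
```

<p>Each batch only starts once the previous one has completely finished, which is the simplest way to cap the number of simultaneous requests.</p>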
<p>Which brings me back to my title.  Because I&#8217;m not using underscore in this case, I suddenly didn&#8217;t want to use <code>_.keys(object)</code> to get the list of keys. Naively I tried <code>object.keys()</code>, but that is an error.  The proper semantics is <code>Object.keys(object)</code>, and I learned something super basic at the same time that I am settling far more complicated usage patterns.</p>
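<p>A tiny example of the difference, using a made-up object:</p>

```javascript
// Made-up object whose keys stand in for document ids.
const docs = { 'doc-a': { count: 1 }, 'doc-b': { count: 2 } };

// keys() lives on the Object constructor, not on each object.
const ids = Object.keys(docs);
console.log(ids);               // [ 'doc-a', 'doc-b' ]

// A plain object has no keys property, so calling docs.keys() throws a TypeError.
console.log(typeof docs.keys);  // undefined
```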
]]></content:encoded>
			<wfw:commentRss>http://contourline.wordpress.com/2011/10/27/keys-is-an-object-method/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0304452093efd125e7dac9cab0d0be5e?s=96&#38;d=monsterid" medium="image">
			<media:title type="html">jmarca</media:title>
		</media:content>
	</item>
		<item>
		<title>Overcoming shy programmer syndrome</title>
		<link>http://contourline.wordpress.com/2011/09/10/overcoming-shy-programmer-syndrome/</link>
		<comments>http://contourline.wordpress.com/2011/09/10/overcoming-shy-programmer-syndrome/#comments</comments>
		<pubDate>Sun, 11 Sep 2011 06:33:11 +0000</pubDate>
		<dc:creator>jmarca</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://contourline.wordpress.com/?p=499</guid>
		<description><![CDATA[I write a lot of programs, but I never publish them for others to use. Now with git and github, there aren&#8217;t any more real excuses. Because I have been documenting like mad and cleaning up code, I am also taking the opportunity to push up working packages to github. So far I&#8217;ve pushed up [...]]]></description>
			<content:encoded><![CDATA[<p>I write a lot of programs, but I never publish them for others to use.  Now with git and github, there aren&#8217;t any more real excuses. </p>
<p>Because I have been documenting like mad and cleaning up code, I am also taking the opportunity to push up working packages to github.  So far I&#8217;ve pushed up two node.js utilities I am using.  One is called <a href="https://github.com/jmarca/makedir">makedir</a>, and the other is called <a href="https://github.com/jmarca/cas_validate">cas_validate</a>.<br />
<span id="more-499"></span></p>
<p>I previously wrote about an <a href="http://contourline.wordpress.com/2011/09/01/more-progress-figuring-out-asynchronous-programming-in-node-js/" title="More progress figuring out asynchronous programming in node.js">earlier version of makedir</a>, but I am no longer using regular expressions; instead I am using the node path library.</p>
<p>I also previously wrote up that I forgot that I had written <a href="http://contourline.wordpress.com/2011/08/26/short-term-memory-loss/" title="Short term memory loss">cas_validate</a>.  </p>
<p>I almost didn&#8217;t publish cas_validate because it doesn&#8217;t have any tests.  But it works for me, so that is one sort of test.  For makedir I just copied the test approach used by a number of other node.js authors whose packages I use a lot.  But cas_validate requires a bit more effort, because it has to properly handle one-time keys and post messages and so on.</p>
<p>Still, I am tired of not pushing stuff out because it isn&#8217;t quite polished.  The core algorithms are there, and sometimes pretty cool.  And putting things out there will perhaps inspire me to do things like document and write tests more often.</p>
]]></content:encoded>
			<wfw:commentRss>http://contourline.wordpress.com/2011/09/10/overcoming-shy-programmer-syndrome/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0304452093efd125e7dac9cab0d0be5e?s=96&#38;d=monsterid" medium="image">
			<media:title type="html">jmarca</media:title>
		</media:content>
	</item>
	</channel>
</rss>
