CouchDB Bulk Load Performance

Yesterday I wrote a very long (sorry!) post about my technique for bulk loading data into CouchDB. I didn't want to make that post any longer, so today I'm going to talk about how well the bulk loading performs. All of these numbers come from my machine, which is a Core 2 Quad Q9000 @ 2 GHz with 8 GB of RAM and a single 7200 RPM drive.

As you may recall, the main problem I had was getting too much stuff in memory and having everything blow out with an out-of-memory exception. The solution I posted yesterday keeps memory under control and seems to keep all four cores busy. Here's a screenshot of Task Manager while the loader is running.

Armed with a stopwatch and the CouchDB web interface, I did some crude timings. I may go back at some point and wrap my loader with some timing code so that I can generate some minute-by-minute graphs, but this will do for now.
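For what it's worth, the stopwatch could be replaced by a small script that polls the database while the load runs. This is only a rough sketch in Python, separate from my loader: it reads the same document count and size that the web interface shows by issuing a GET against the database URL. The function names, the one-minute interval, and the `http://localhost:5984/bulk_test` URL in the usage note are all made up for illustration. The size field is an assumption that depends on the CouchDB version: newer releases report `sizes.file`, older ones report `disk_size`.

```python
import json
import time
import urllib.request


def fetch_db_info(db_url):
    """GET the database URL; CouchDB answers with a JSON info document."""
    with urllib.request.urlopen(db_url) as response:
        return json.load(response)


def sample(db_url, fetch=fetch_db_info):
    """Return (doc_count, size_in_bytes) for a single reading."""
    info = fetch(db_url)
    # Assumption: newer CouchDB releases expose sizes.file,
    # older ones expose disk_size. Fall back to 0 if neither is present.
    size = info.get("sizes", {}).get("file", info.get("disk_size", 0))
    return info["doc_count"], size


def monitor(db_url, interval_seconds=60, samples=30, fetch=fetch_db_info):
    """Print elapsed minutes, size in GB, and document count, tab-separated."""
    start = time.monotonic()
    for _ in range(samples):
        time.sleep(interval_seconds)
        docs, size = sample(db_url, fetch)
        minutes = (time.monotonic() - start) / 60
        print(f"{minutes:.1f}\t{size / 1024**3:.2f}\t{docs}")
```

Calling `monitor("http://localhost:5984/bulk_test")` in a second console while the loader runs would print one line per minute, which could then be pasted into a spreadsheet for graphing. Polling from outside keeps the timing code out of the loader itself, at the cost of one extra request per sample.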

Elapsed Time (minutes)   Database Size (gigabytes)   Document Count
 1                       0.33                            19,000
 2                       0.7                             49,000
 3                       1.0                             89,000
 4                       1.4                            134,000
 5                       1.8                            181,000
10                       3.6                            503,000
15                       5.1                            843,000
20                       6.7                          1,201,000
25                       7.9                          1,473,000
30                       9.1                          1,759,000

I was really impressed with the numbers! After 30 minutes, I was averaging 977 documents/second and 5.2 megabytes/second. Keep in mind this is all running locally on my machine, but the numbers sure are encouraging.
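Those averages come straight from the 30-minute reading in the timings above. Here is the arithmetic as a few lines of Python. The variable names are just for illustration, and the megabytes figure assumes 1 GB = 1024 MB.

```python
# The 30-minute reading: 9.1 GB and 1,759,000 documents.
elapsed_minutes = 30
size_gb = 9.1
doc_count = 1_759_000

elapsed_seconds = elapsed_minutes * 60

docs_per_second = doc_count / elapsed_seconds
# Assumes 1 GB = 1024 MB; with 1 GB = 1000 MB the result is closer to 5.1.
mb_per_second = size_gb * 1024 / elapsed_seconds

print(f"{docs_per_second:.0f} documents/second")
print(f"{mb_per_second:.1f} megabytes/second")
```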
