A couple of months ago I was writing about a C coding project; that code is now wired into the 4.1 release of Visual Net, which comes out sometime early next year, and there’s an interesting optimization lesson or two buried in there.
As It Was · When I first built Visual Net (back in 1999) we wanted to be as un-threatening as possible, so we offered people who wanted to load data two alternatives: a tab-delimited file and an SQL database. This worked fine (it’s late 2003, and we’re still here) but wasn’t perfect. First of all, nobody wanted to talk to our database; they all wanted to use the file. Of course, tab-delimited files generally suck, and we had i18n problems; the right answer is clearly XML, so as of release 4.1, we’re dumping the SQL option and adding an XML input file option.
The big problem, though, was performance. Since we offered the SQL option, the way we worked was to always load all the data into the RDBMS (MySQL usually) and then extract it again to build the Visual Net structures. When the customer had more than five or ten million records, it would really take a lot of time to load and unload the database. You see, we had some of the fields indexed (for performance, of course) and, well, it ain’t pretty. The database that drove us around the bend had only a few million records, but lots of big long metadata fields, which really gave the database loader heartburn. It was taking over a day of elapsed time to do the Visual Net load.
VNB · This is the new Visual Net Builder, a few thousand lines of C code that read the XML and construct the Visual Net data structures directly, with no database, relying instead on some really big in-memory data structures.
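To give a flavor of what "directly, with no database" can look like, here is a minimal sketch; it is not VNB's actual code. The `Record` layout, the `RecordTable` type, and the `table_append` and `table_free` names are all made up for illustration, and the XML parsing is left out entirely. Each record is appended to a growable in-memory array as it is read, so there is nothing to load into an RDBMS and extract again.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: an id plus one long metadata field. */
typedef struct {
    long id;
    char *metadata;   /* heap copy owned by the table */
} Record;

/* Hypothetical growable in-memory table of records. */
typedef struct {
    Record *items;
    size_t count;
    size_t capacity;
} RecordTable;

/* Append one record, doubling the array when it is full.
 * Returns 0 on success, -1 if memory runs out. */
int table_append(RecordTable *t, long id, const char *metadata)
{
    if (t->count == t->capacity) {
        size_t new_cap = t->capacity ? t->capacity * 2 : 1024;
        Record *grown = realloc(t->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        t->items = grown;
        t->capacity = new_cap;
    }

    /* Copy the string so the caller can reuse its parse buffer. */
    size_t len = strlen(metadata) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, metadata, len);

    t->items[t->count].id = id;
    t->items[t->count].metadata = copy;
    t->count++;
    return 0;
}

/* Release every metadata string, then the array itself. */
void table_free(RecordTable *t)
{
    for (size_t i = 0; i < t->count; i++)
        free(t->items[i].metadata);
    free(t->items);
    t->items = NULL;
    t->count = 0;
    t->capacity = 0;
}
```

Start with a zeroed table (`RecordTable t = {0};`), call `table_append` once per record as the parser produces it, and call `table_free` at the end. Doubling the capacity keeps the amortized cost of an append constant, which is the sort of thing that matters once you are into millions of records.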
The good news? It now takes 29 minutes to load up that database that used to take over a day. The other good news? It doesn’t seem to use any more memory than the previous DBMS-assisted code; proof, once again, that intuitions about where the bottlenecks are going to be in complex systems are usually wrong.
The bad news? None, really, except that I had to put in a couple of weeks of C coding, which does feel kind of primitive these days. Oh, and when VNB is running, it pegs the CPU meter right over and leaves it there; you don’t want to be trying to use that computer for much else.
The Moral of the Story · Databases suck. Except when they don’t; ongoing’s own MySQL database cut the publish-one-essay lag down from the best part of a minute to under five seconds. But if I were willing to write a few thousand lines of C, I betcha it’d be a lot faster. Hey, I’m joking.