Recently, in discussion of a design for a comments system, I noted that I wasn’t planning to use a database, and I even allowed my self a little fun sneering at the idea. I got several reasonable-sounding emails from reasonable-sounding people saying “Why on earth wouldn’t you?” Here’s why.
Current Setup ·
ongoing actually uses a database right now, MySQL to
be precise. But not at run-time; it only gets involved when I hit
the Publish button. And I don’t really
need it; it just lets me push some work down into the SQL engine that I’d
otherwise have to write a few lines of code for, as in the fragments on the
front page:
SELECT uri FROM Essays ORDER BY updated DESC LIMIT
$maxOnHomePage;
Also in randomizing the picture:
SELECT essay, dir, src, alt, UNIX_TIMESTAMP(potd) FROM Pix ORDER BY
RAND();
But at run-time, all the data you’re seeing comes out of static files, pulled in
either with img
or link
, and a bit via
XMLHttpRequest
.
This is good. It means that the little Athlon that runs
ongoing (and some other sites) can serve 130G/month,
200K transactions/day, and can stand up to a Slashdotting without breathing
hard.
Comment Storage · I may yet use a database for comments; but not at the runtime end. Which is to say, the comments themselves will live in flat files, and the per-article comment counts will too, the latter being pulled in by XMLHttpRequest in the obvious way.
I suspect that it’ll be handy to have a database to look things up by IP
address as part of the spam-control effort. But really, I won’t do that
unless it saves work. You could perfectly easily do per-IP-address data lookup in
a little directory tree where the data for 12.110.110.204
lives
in a file named ip-info/12/110/110/204
.
Yeah, it might not run as fast as a MySQL lookup, and I’d bet it definitely
wouldn’t be as fast as a BerkeleyDB lookup, but it’ll be pretty damn fast, and
it won’t interact toxically with anything else the database might be
doing.
On the other hand, it’ll probably be less work to just make a database table,
and “less work” trumps “runs fast” except when it doesn’t.
Why? · I like the semantics of the Unix filesystem, and I also really like the fact that whether you’re talking ufs, ext3, zfs, or whatever, this is some of the world’s most thoroughly-debugged and battle-hardened code. Also, most modern operating systems are really quite clever at noticing when part of the filesystem is getting hammered and caching the whole thing in memory, so you may never go near a disk.
I’m not religious—I’ve deployed runtime databases where I’ve had to. But there is a psychology out there in our profession,which says: if you have data that you want to store and retrieve, that means you need a database. But sometimes you don’t. And sometimes you come out ahead on one or both of the less-work and runs-fast metrics by not having one.