At some point in the transition to Debian Sarge, something broke in the ongoing
software. The Perl code reads text using an XML processor, and various
pieces of it get stashed in a MySQL database.
Only somewhere along the line, non-ASCII UTF-8 characters were getting
trashed. I tried all sorts of stupid dodges, and was whining away at Sam Ruby
via instant messenger, and he said “of course, you could do it all as
seven-bit ASCII via &#xBABE; ... or you could rewrite it in
Ruby and It Would Be Much Better”. I shrieked “Get thee behind me foul
tempter!” and have now jammed everything into 7-bit ASCII as it comes out of
the XML parser, and of course all the problems have gone away. Actually, the
code got simpler: lots of XML escaping/unescaping calls are no longer
necessary. This is one of the nice things about XML, I guess: it allows you to
be a good internationalization citizen even when your software infrastructure
isn’t. It still feels evil.
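For the curious, the trick amounts to something like the sketch below. It is in Python rather than the Perl the ongoing software actually uses, and the function name `to_ascii_xml` is made up for illustration; the idea is just that every character above 127 becomes a hexadecimal numeric character reference, which any XML or HTML consumer turns back into the real character.

```python
import html


def to_ascii_xml(text: str) -> str:
    """Replace every non-ASCII character with a hex reference like &#xBABE;.

    Hypothetical helper for illustration. It does NOT escape &, < or >;
    that markup-significant escaping is a separate step.
    """
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in text
    )


if __name__ == "__main__":
    original = "caf\u00e9"
    flattened = to_ascii_xml(original)
    # The result is pure 7-bit ASCII, so a byte-mangling layer
    # further down the stack has nothing left to trash.
    assert flattened.isascii()
    # html.unescape resolves numeric references, so the round trip is lossless.
    assert html.unescape(flattened) == original
```

Python's built-in `text.encode("ascii", "xmlcharrefreplace")` does the same job with decimal references instead of hex.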
Anyhow, the whole site’s been republished; let me know if anything’s busted.
(By the way, if you’re reading this in my RSS feed and all the entries show up
as new, switch to the Atom feed and that problem will go away, because Atom
actually has unique IDs and datestamps that work.)
[Updated: Tony Coates (interesting new blog there, BTW) reports that Opera 8.02
gets it backwards, which means that it’s one of the rare pieces of software
that respects guids in RSS, but that it’s doing Atom 1.0 wrong.]
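As for why the Atom IDs and datestamps matter after a full republish, here is a rough sketch of the check a feed reader could make. This is an assumption about how a reader might work, not a description of Opera or any other real client, and `classify_entries` is a made-up name. An entry whose `atom:id` has been seen before is not new, and a changed `atom:updated` value is the signal that it was revised.

```python
import xml.etree.ElementTree as ET

# Namespace of Atom 1.0 elements, in ElementTree's {uri}name notation.
ATOM = "{http://www.w3.org/2005/Atom}"


def classify_entries(feed_xml: str, seen: dict) -> dict:
    """Map each entry's atom:id to 'new', 'updated', or 'unchanged'.

    `seen` maps ids from earlier fetches to their atom:updated strings
    and is updated in place. Comparing the timestamp strings for
    equality is a simplification made for this sketch.
    """
    result = {}
    for entry in ET.fromstring(feed_xml).iter(f"{ATOM}entry"):
        entry_id = entry.findtext(f"{ATOM}id")
        updated = entry.findtext(f"{ATOM}updated")
        if entry_id not in seen:
            result[entry_id] = "new"
        elif seen[entry_id] != updated:
            result[entry_id] = "updated"
        else:
            result[entry_id] = "unchanged"
        seen[entry_id] = updated
    return result
```

Fetching a republished feed twice with the same `seen` dictionary should report every entry as unchanged the second time, however much the surrounding markup was regenerated.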