There’s been a flurry of debate on the PEAW mailing list about how to deal with broken feeds. Simultaneously, Aaron Swartz asserts Postel’s Law Has No Exceptions. Herewith a bit of back-fill on the relevant history and tribal knowledge, an excursus into Athenian jurisprudence, and opinions on what PEAW should do.
Draconianism · In the Annotated XML 1.0, I wrote:
Dracon (c.659-c.601 B.C.E.) introduced the first written legislation to Athens. His code was consistent in that it decreed the death penalty for crimes both low and high. Similarly, a conforming XML processor must "not continue normal processing" once it detects a fatal error. Phrases used to amplify this wording have included "halt and catch fire", "barf", "flush the document down the toilet", and "penalize innocent end-users".
The rest of that note provides a useful introduction to the issue.
[Excursus: I have since learned that the assertion above about Dracon is pretty flimsy, since Dracon is only barely historical. He and Solon feature as the Givers of Law in the traditions of Athenian history, but only very small fragments survive (having to do with involuntary homicide) which can plausibly be attributed to Dracon, and they’re not very Draconian.]
As the note says, this issue provoked what is probably the single most intense technical debate of my professional career. Enthusiasts can relive it via the several hundred emails in the April and May 1997 archives of what was then known as the W3C SGML Working Group. But don’t try to read it unless you have a couple of hours to spare.
Since I was arguably the leader of the “Draconian” forces in that debate, I’m hardly objective, but I think that XML’s cleanly-defined error-handling has been a net positive. Of course, we can never re-run history to find out for sure.
I will say, though, that it’s become awfully damn easy to test an allegedly-XML document for well-formedness: try to open it in either IE or Mozilla, and you’ll know right away. Anyone who finds this too much effort deserves little sympathy.
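If you’d rather not fire up a browser, a few lines of Python do the same job. This is a minimal sketch using the standard library’s xml.etree.ElementTree, whose parser rejects any document that isn’t well-formed; the name is_well_formed is just an illustrative choice. The default parser doesn’t validate, so this checks well-formedness and nothing more.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Return True if `data` parses as a well-formed XML document."""
    try:
        # fromstring() raises ParseError on any well-formedness error;
        # it does not validate against a DTD or schema.
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(b"<feed><title>ok</title></feed>"))  # True
print(is_well_formed(b"<feed><title>broken</feed>"))       # False
```

Before pointing something like this at arbitrary feeds from the open web, the XML security notes in the Python documentation are worth a read.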
What Should PEAW Do? · The range of applications where PEAW will be put to work is pretty wide, and different apps will have different error-handling requirements.
If, for example, I’m reading one of my favorite blogs and the aggregator turfs an entry because the (required) <modified> element is missing, I’m going to be irritated.
On the other hand, when I’m reading a feed describing my credit-card transactions, if a charge comes through without a date-stamp I want the aggregator to scream loudly and let me know; something here is gravely amiss, either with the credit card, the bank, or the software.
So if I were writing the spec, I’d do as XML does and divide the errors into two classes, fatal and non-fatal. I’d use SHOULD to encourage agents to report even non-fatal errors in the interests of the system working better, and I’d allow aggregators to turf entries on the basis of non-fatal errors, because this will be a requirement in some applications.
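Here’s one way that two-class scheme might look inside an aggregator. It’s a sketch under assumptions: the Entry, FeedError and FatalFeedError types, the check_entry and accept_entries functions, and the strict flag are all hypothetical, and the only rule checked is the missing-<modified> case from the blog example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class FatalFeedError(Exception):
    """Raised when processing must stop (e.g. the feed isn't well-formed)."""


@dataclass
class FeedError:
    message: str
    fatal: bool


@dataclass
class Entry:
    title: str
    modified: Optional[str] = None  # stands in for the required <modified>


def check_entry(entry: Entry) -> List[FeedError]:
    """Hypothetical checker; a real one would test many more rules."""
    errors = []
    if entry.modified is None:
        errors.append(FeedError("missing <modified>", fatal=False))
    return errors


def accept_entries(entries: List[Entry], strict: bool = False,
                   report: Callable[[str], None] = print) -> List[Entry]:
    """Return the entries to show the user.

    Every error is reported (the SHOULD). A fatal error stops processing.
    A non-fatal error drops the entry only when the application asks for
    strict=True, as a credit-card feed reader might.
    """
    kept = []
    for entry in entries:
        errors = check_entry(entry)
        for err in errors:
            report(f"{entry.title}: {err.message}")
            if err.fatal:
                raise FatalFeedError(err.message)
        if strict and errors:
            continue  # this application can't tolerate the problem
        kept.append(entry)
    return kept


feed = [Entry("Nice post"), Entry("Another", modified="2003-07-01T12:00:00Z")]
accept_entries(feed)               # reports the problem, keeps both entries
accept_entries(feed, strict=True)  # reports the problem, drops "Nice post"
```

The design point is that reporting and dropping are separate decisions: the blog reader and the credit-card reader run the same checks and differ only in the strict flag.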
So What’s a Fatal Error? · This hasn’t been discussed that much, but it’s not obvious which, if any, violations of the semantics or structure of a feed should constitute fatal errors, i.e. those where the client software is required to stop trying to work with the data.
However, I would absolutely require basic XML well-formedness. Here’s why:
If you require well-formedness, you require basically sane Unicode handling, which opens the gates of syndication to the vast majority of people in the world who don’t live in ASCII. I can’t emphasize this enough: if you can count on well-formed XML, you are empowered to handle the languages of the world. If you try to work around what look like illegal characters, you guarantee huge amounts of irritation getting internationalized later.
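To make that concrete: here’s what a conforming parser does with the classic Windows-1252 Euro byte (0x80) sitting in a document that claims to be UTF-8. This sketch uses Python’s standard xml.etree.ElementTree, and the two sample documents are made up. The stray byte is a fatal error; the same text, properly encoded, parses cleanly.

```python
import xml.etree.ElementTree as ET

# 0x80 is the Euro sign in Windows code page 1252, but in UTF-8 it can
# only appear as a continuation byte, so here it is an encoding error.
bad = b'<?xml version="1.0" encoding="utf-8"?><title>Price: \x80 5</title>'
good = ('<?xml version="1.0" encoding="utf-8"?>'
        '<title>Price: \u20ac 5</title>').encode("utf-8")

try:
    ET.fromstring(bad)
    print("accepted")  # not reached: a conforming parser must reject this
except ET.ParseError as err:
    print("rejected:", err)

print(ET.fromstring(good).text)  # Price: € 5
```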
As regards everything but the content, it’s just not very hard to create well-formed XML: escape <, &, ', ", and >, and you’re done. Except for that Unicode stuff: you need to know what encoding your data is in, so that, for example, when you see a Euro sign (€) you know enough to emit &#x20AC;, not some Microsoft code page byte that’s guaranteed not to work on lots of browsers. This can be tricky. But the alternative is, you’re a parochial bigot.
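Both halves of that recipe fit in a few lines. This Python sketch assumes the incoming text is known to be in Windows code page 1252 (the to_xml_text name and its encoding parameter are illustrative); it escapes the five special characters and then writes anything outside ASCII as a numeric character reference. The order matters: escaping & has to happen before the references are added, or their leading & would get escaped too.

```python
def to_xml_text(raw: bytes, encoding: str = "cp1252") -> str:
    """Turn raw bytes in a known encoding into text that is safe to drop
    into XML content or a quoted attribute value."""
    text = raw.decode(encoding)  # in cp1252, byte 0x80 becomes U+20AC (€)
    # Escape '&' first, so the entities added below aren't escaped again.
    for char, entity in (("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"),
                         ('"', "&quot;"), ("'", "&apos;")):
        text = text.replace(char, entity)
    # xmlcharrefreplace writes decimal references: &#8364; is the same
    # character as &#x20AC;.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_xml_text(b'Tom & Jerry: \x80 5 < "6"'))
# Tom &amp; Jerry: &#8364; 5 &lt; &quot;6&quot;
```

If you emit UTF-8, you can skip the character references and just encode the escaped text; what you can’t skip is knowing the input encoding. The standard library’s xml.sax.saxutils.escape also covers &, < and >, and accepts an extra dictionary for the two quote characters.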
As regards your content, you need to know whether or not you can guarantee that it’s well-formed. If you can’t, PEAW provides the mode="escaped" hatch.
If your software can’t manage to escape five special characters and fill in end-tags and quote attributes, it’s failing to meet such a very low barrier to entry that it’s probably pretty lame anyhow. And if developers are not willing to put in the effort to enable the non-white people of the world to use their software, I don’t think PEAW should condone or reward them.
The transition from RSS to PEAW is a line in the sand. Granted, the RSS legacy necessarily required the use of liberal parsers, but hey, that was then; we have better tools now. I just find it really hard to believe that someone sitting down to write a PEAW generator in A.D. 2003 can’t manage to generate well-formed XML, with content-escaping if (sigh) necessary.