There are two big problems with RSS that aren't going away and are just
going to have to be fixed to avoid a train-wreck, given the way this thing is
taking off.
They are first, what can go in a <description>, and
second, the issue of relative URIs.
(Warning: yet another incestuous self-referential post by a blogger
about blogging, of interest only to syndication geeks.)
(Substantially updated 11AM Pacific time)
Breakage ·
This essay tries to illustrate the problems it talks about.
In its RSS description, it tries to mention the <description> tag, with the
angle brackets visible, and it contains a relative reference to another
ongoing article; one or both of these may have failed in your aggregator.
Fixing <Description> ·
What provoked this was a complaint from
fellow-TAGger Norm Walsh that he
could see the HTML markup in the ongoing feed in his (Linux-based) RSS aggregator.
Well, yeah, all the HTML is escaped because I went and looked at other
people's feeds (Udell and Pilgrim I believe) and copied the way they did it;
that's how the Web's supposed to work.
After Norm's complaint, I decided to (sigh) RTFM. The RSS2 spec, marvel of informality that it is, notes in passing that “(entity-encoded HTML is allowed)” with no words about what this might mean or how such HTML might be interpreted. This underspecification (inherited from many previous versions of RSS) leads to really stupid behavior even in good software:
If I put &lt; rather than < in some text,
I'm saying “please ignore the semantics of this character!”
That's what escaping is for.
[...] <description> tag.”
Now it turns out that the ongoing generator is anal enough to do
double-escaping, which worked in at least one RSS reader, but there's
a word for this: stupid.
These days, the preferred method for dealing with this seems to be
an <html:body> element, in which markup need not be escaped.
This seems to work, but I don't see why RSS should make me do this.
Second, it seems like I'm lying: the text in the RSS entry isn't the body of
the ongoing essay, it's what <description> seems to be
designed for. (Since many ongoing pieces are over a thousand words and studded
with pictures, there's no way I'm putting the whole thing in the RSS
for every RSS scraper to grab whether or not the user is interested.)
I'm not 100% sure what the right solution is, but either
<description> should be totally plain text - no HTML
markup - or it should allow well-formed HTML markup, in which case it would
be OK for aggregators either to act on it or to ignore it.
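To make the ambiguity concrete, here is a minimal Python sketch of what single and double escaping put on the wire and what an aggregator gets back after XML parsing. The helper name feed_text and the sample string are illustrative assumptions, not anything the RSS spec or the ongoing generator defines.

```python
import xml.etree.ElementTree as ET
from html import escape, unescape

def feed_text(payload):
    """Return what an XML parser hands back for <description>payload</description>.

    Hypothetical helper for illustration only.
    """
    return ET.fromstring("<description>" + payload + "</description>").text

# What the author wants the reader to see, angle brackets and all.
intended = "the <description> tag"

single = escape(intended)  # escaped once, as XML requires for plain text
double = escape(single)    # escaped twice, the "entity-encoded HTML" reading

print(single)              # the &lt;description&gt; tag
print(double)              # the &amp;lt;description&amp;gt; tag

# XML parsing undoes exactly one level of escaping.
print(feed_text(single))   # the <description> tag
print(feed_text(double))   # the &lt;description&gt; tag

# An aggregator that then treats the result as HTML undoes a second level.
print(unescape(feed_text(double)))  # the <description> tag
```

With single escaping, a plain-text aggregator shows the right thing, but one that treats the parsed result as HTML sees an unknown element and will likely drop it. With double escaping the situation flips: the HTML-rendering aggregator shows the right thing and the plain-text one shows the entity soup. Neither reader is wrong as long as the spec does not say which interpretation applies.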
Relative URI References ·
If I, in an ongoing essay, want to refer to another ongoing essay, the natural,
correct, robust, flexible, concise way to do this is with a relative
reference. So I encode a link to my Colophon as
<a href="/ongoing/misc/Colophon">Colophon</a>, and
the browsers know how to deal with this and everything just works.
Also, it works identically both in production and on my staging site, which
isn't at www.tbray.org.
Of course, if I want to copy the first paragraph of my essay into my RSS
feed, apparently I have to parse the hyperlink and make the reference
absolute, which as a side-effect makes it less portable, more fragile, and
longer.
There's a word for this: wrong.
When you have a chunk of markup that looks like this:
<item><title>Wrong</title>
<link>http://example.com/114</link>
<description>My <a href="/113">note yesterday</a> about RSS was
wrong.</description>
</item>
Then the only sane interpretation of /113 is as http://example.com/113.
RSS needs to say this, and software needs to implement it.
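For what it's worth, the aggregator side of this is not much code. The sketch below is one possible reading, assuming the item's <link> is taken as the base URI (RSS doesn't currently say so, which is the whole complaint) and using a hypothetical helper named resolve_links; it leans on Python's standard urljoin for the actual resolution.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# The item from the example above, with well-formed markup in <description>.
ITEM = """<item><title>Wrong</title>
<link>http://example.com/114</link>
<description>My <a href="/113">note yesterday</a> about RSS was
wrong.</description>
</item>"""

def resolve_links(item_xml):
    """Resolve every <a href> in an item against the item's <link>.

    Assumption: the item's <link> serves as the base URI. Hypothetical
    helper for illustration only.
    """
    item = ET.fromstring(item_xml)
    base = item.findtext("link").strip()
    return [urljoin(base, a.get("href")) for a in item.iter("a")]

print(resolve_links(ITEM))  # ['http://example.com/113']
```

This only works because the markup inside <description> is well-formed and unescaped; if it arrived entity-encoded, the aggregator would first have to guess how many levels of escaping to undo before it could even find the link.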
Not A Toy · Because, boys and girls, RSS is no longer a science experiment; it's becoming an important part of the infrastructure, which means that a lot of programmers are going to get the assignment of generating and parsing it, and they need better instructions.