I cooked up a RelaxNG schema for Pie/Not-Echo or whatever you want to call it, in its 0.1 snapshot form; as a side-effect, this also generates a W3C XML Schema. This note includes specific commentary on this schema, general commentary on schemas (summary: Why would you ever use XML Schema?), and some recommendations for pruning Pie/Not-Echo.
Pie.rnc v0.1 · The schema is available at
http://www.tbray.org/ongoing/pie/0.1/pie.rnc;
as the snapshot versions advance, I’ll try to make sure there are
snapshot schemas under directories named by the version number; so the 0.2
schema will be in
http://www.tbray.org/ongoing/pie/0.2/pie.rnc
, and so on.
While I’ve fooled around with RelaxNG, this is my first attempt to take on something substantial from scratch. It’s perfectly possible that I’ve done this in a way that is stupid or wrong or suboptimal; I’d be delighted to get feedback and will incorporate it to the extent possible. I’ve created a discussion page at the Wiki; feedback there, please.
Here is a list of points about the existing schema, in no particular order:
The schema is written in RelaxNG’s
Compact Syntax
(tutorial
here);
thus the extension .rnc
; I’ll refer to it as “the RNC”
from here on in.
Using James Clark’s wonderful
Trang tool, I have
generated a
W3C XML Schema; I’ll go on doing
this, and the XSD can always be found at the same place as the RelaxNG
source, with the .xsd
extension.
The XSD version doesn’t apply some of the same controls as the RNC
version; I’m not enough of an XSD expert to know whether XSD just
can’t do this stuff, or whether Trang doesn’t know how to generate
the XSD. In particular the XSD doesn’t do the selection magic based on
the mode=
and src=
attributes of
<content>
.
I’d welcome feedback on the quality of the XSD as well as the RNC.
I can’t get Trang to generate a DTD, because there are just too many things in the RNC that have no remote equivalent in DTDs.
I changed the namespace, because
the snapshot uses
one based in example.com
, and it’s just not OK to use that
for anything but an example. So for the moment I’m using
http://www.intertwingly.net/wiki/pie/
.
This version of the schema forces the top-level version=
attribute to have the value 0.1
. Accepting 1.0
here would just be incorrect and dangerous.
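For anyone without Jing to hand, the version rule is easy to approximate in a few lines. What follows is only a rough sketch in Python, not the schema itself: it assumes the root element is `<feed>` in the interim namespace given above, and the function name `check_version` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Interim namespace named in this note; it may well change in later snapshots.
PIE_NS = "http://www.intertwingly.net/wiki/pie/"


def check_version(xml_text, expected="0.1"):
    """Return True only if the root is a Pie <feed> whose version equals `expected`.

    Hypothetical helper: it mimics one rule of the RNC, nothing more.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}feed" % PIE_NS:
        return False
    return root.get("version") == expected


# Two tiny illustrative documents.
good = '<feed xmlns="%s" version="0.1"/>' % PIE_NS
bad = '<feed xmlns="%s" version="1.0"/>' % PIE_NS
```

With these inputs, `check_version(good)` accepts the feed and `check_version(bad)` rejects it, which is the behavior the schema is after.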
I tried to follow the snapshot as closely as possible.
The elements inside <feed>
and <entry>
are
allowed to appear in any order.
I’m not sure this is cost-effective.
Since these things are usually going to be machine-generated, it might
be a good idea to lock down the order of the elements.
It might also be a good idea to force any foreign-namespace elements off into
a ghetto at the end of the parent element.
It would provide another level of sanity-checking and simplify the lives of
those who are doing quickie jobs with regular expressions or whatever.
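To make the "ghetto at the end" idea concrete, here is a small Python sketch of the check a consumer could run. It is an assumption about how such a rule might look, not something the current RNC enforces; the extension namespace `http://example.org/ext` and the element names in the samples are invented for the example.

```python
import xml.etree.ElementTree as ET

PIE_NS = "http://www.intertwingly.net/wiki/pie/"


def foreign_elements_last(parent):
    """True if no Pie-namespace child follows a foreign-namespace child."""
    seen_foreign = False
    for child in parent:
        is_pie = child.tag.startswith("{%s}" % PIE_NS)
        if is_pie and seen_foreign:
            # A Pie element showed up after the foreign block started.
            return False
        if not is_pie:
            seen_foreign = True
    return True


# Illustrative entries; "x:extra" is a made-up extension element.
tidy = ('<entry xmlns="%s" xmlns:x="http://example.org/ext">'
        '<title>t</title><x:extra/></entry>' % PIE_NS)
messy = ('<entry xmlns="%s" xmlns:x="http://example.org/ext">'
         '<x:extra/><title>t</title></entry>' % PIE_NS)
```

Calling `foreign_elements_last(ET.fromstring(tidy))` passes, while the `messy` entry fails because a Pie element appears after the foreign one.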
For <content mode="xml">
(the default), the most common
contents will be XHTML.
So for the moment, there’s a rule that allows any mixture of elements in
the XHTML namespace, with any attributes at all.
This means that you have to have a topmost XHTML element (for example
<div>
or <span>
or <body>)
immediately inside the <content>
element.
This will be useful anyhow because you have to have somewhere
to declare the XHTML namespace. Alternatively, if you had declared a prefix
for the XHTML namespace higher-up in the feed, you could just plunge into
mixed XHTML content with all the elements prefixed.
If there’s demand for that scenario it would be easy enough to re-write
the schema.
But requiring a top-level element feels cleaner anyhow to me.
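As a rough illustration of the wrapper requirement, the Python sketch below tests whether a `<content>` element holds a single XHTML element and no bare text. It reads the rule as "exactly one wrapper", which is an assumption and somewhat stricter than the wording above; the function name is hypothetical and the real arbiter is the RNC.

```python
import xml.etree.ElementTree as ET

PIE_NS = "http://www.intertwingly.net/wiki/pie/"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def content_has_xhtml_wrapper(content):
    """True if <content> holds one XHTML-namespace child and no stray text."""
    children = list(content)
    if len(children) != 1:
        return False
    child = children[0]
    if not child.tag.startswith("{%s}" % XHTML_NS):
        return False
    # Text directly inside <content>, before or after the wrapper.
    stray = (content.text or "") + (child.tail or "")
    return stray.strip() == ""


# Illustrative fragments only.
wrapped = ('<content xmlns="%s" mode="xml">'
           '<div xmlns="%s">Hello <em>world</em></div>'
           '</content>' % (PIE_NS, XHTML_NS))
bare = '<content xmlns="%s" mode="xml">Hello world</content>' % PIE_NS
```

The `wrapped` fragment passes because the `<div>` both carries the XHTML namespace declaration and contains all the mixed content; the `bare` fragment fails.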
For this cut, I didn’t put in support for embedding other things like
the <ent:topic>
found in the example.
This is trivially easy to add later with RelaxNG; let’s get the base
language right first.
I used the
Jing tool to
validate
a slightly-modified version of the example in the snapshot (namespace name,
version, and so on).
I’m not planning to post the modified version; anyone who is close
enough to the problem to care is capable of grabbing Jing and fixing it up
themselves.
I will also intermittently create a Pie version of the ongoing feed at
http://www.tbray.org/ongoing/ongoing.pie;
the one there right now validates with pie.rnc
.
The RNC makes use of the XSD predeclared datatypes anyURI
and
dateTime
, which are now built-in to Jing.
What Needs Fixing in Pie · The elements and attributes that are in the 0.1 snapshot are OK, except there are too many of them. The following need removal forthwith, simply because previous generations of syndication technology got by without them just fine, and we’re not here to invent stuff:
subtitle · Exactly what can we not do if we don’t have this? What prior art demonstrates its necessity?
weblog/homepage · The debate over in the Wiki had, I thought, some
crushing arguments in favor of just having a <web>
field
per-person; the extras are at best un-necessary and in some cases actively
harmful.
content* ·
Why do we ever need more than one <content>
element per
entry?
This has never been proved necessary in previous syndication formats, and now
is the wrong time to invent it.
We have the ability to embed XML in the <content>
element,
and XML provides many nice mechanisms for marking-up lists of things, so
anybody who really needs this functionality can work out the bugs in that
sandbox until we know what needs to go in at the Pie level.
<content src= ·
Content-by-reference is a bold new idea, and we don’t need bold new
ideas, we need to write down what already works.
Once again, <content>
can contain XML, and XML provides
excellent ways to insert hyperlinks to other things.
Work it out there and when you prove that you understand the issues, then
it’s a candidate for first-class citizenship in the syndication format
itself.
...But the Glass is Half-Full · These gripes aside, the Pie format feels reasonably well-baked to me. All we have to do is lose the superfluous bits, find it a name, sort out a pure-HTTP API and derive XML-RPC and SOAP versions from that (let the market sort ’em out), figure out a neutral, long-lived home for the spec, and declare victory.
RelaxNG vs W3C XML Schemas ·
I invite people, even those who don’t think they’re schema weenies
(I for example am definitely not a schema guy) to have a look at
that RelaxNG compact-syntax schema.
It’s readable, it only took me two hours to get it working (that includes
downloading the Jing and Trang software, downloading and installing Java 1.4
from Apple, rebooting, and sorting out the usual CLASSPATH
hell).
It does some pretty magical things with the allowed content of
<content>
, based on attribute values.
It calls out to precooked definitions of dates and URIs, and it generates
XML Schema files for free.
I’d really like to see a best effort from an XML Schemas maven which duplicates the functionality of the RNC as closely as possible, as readably as possible; and maybe does some more things that the RNC can’t do.
Until I’ve seen that, my provisional conclusion is that XML Schemas are basically second-rate in terms of functionality and usability, and you can get them for free by starting with Relax NG.
So, why would you use anything else?