In a recent essay I offered, given demand, to author some XML-writing software. There’s been quite a bit of feedback, and the consensus seems to be that the Java community is fairly well-served with XML-writing software, but that this would be really useful at the C level. So that’ll be my coding fun for the month of February. The rest of this essay lists some of the Java options that people told me about, and introduces some issues around the C implementation.
From Java · Elliotte Rusty Harold pointed out XMLStreamWriter in StAX and also David Megginson’s XMLWriter (which isn’t being maintained, but it shouldn’t need much).
Henri Sivonen recommends GNU JAXP with some reservations about the accompanying GNU DOM package.
Rogers Cadenhead pointed to his own article on Elliotte Rusty Harold’s XOM (hmm, which ERH himself didn’t mention).
And of course a couple of people recommended JDOM. The upshot is that this community seems well-served. But have a glance at what I propose for the C interface and see whether the Java options cover all the bases.
From C · In the C domain, several people pointed to xmlwriter from libxml2 as the best option. The trouble, if you’re generating a syndication feed, is that xmlwriter has way more stuff than you need, including entry points that would be actively harmful in this kind of simple application. Also, it doesn’t seem to guarantee well-formedness.
On the other hand, Daniel Veillard is a smart guy and the interface looks very sound, just too big. So I’ll use a small subset of calls with (I think) the same semantics but a different prefix.
Actually, I’ll add one semantic: if any call into this new library would cause the creation of non-well-formed XML, the call will abort, return a special distinguished error code, and optionally raise an exception.
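To make the “refuse to produce non-well-formed XML” idea concrete, here is a minimal sketch of what a small, strict writer might look like in C. Everything in it is a hypothetical illustration: the xw_ prefix, the function names, XW_ERR_NOTWF, and the fixed depth, name-length, and attribute limits are assumptions, not libxml2’s xmlTextWriter API and not a settled design. It keeps open element names on a stack and rejects misnested end tags, a second root element, text outside the root, and duplicate attributes. It does not check names against the XML Name production or validate the encoding of its input, and since C has no exceptions, it only returns the error code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical "xw_" API: an illustration, not libxml2's xmlTextWriter. */
#define XW_OK         0
#define XW_ERR_NOTWF  (-1)  /* call would have produced non-well-formed XML */
#define XW_MAX_DEPTH  64
#define XW_MAX_NAME   64
#define XW_MAX_ATTRS  16

typedef struct {
    FILE *out;
    int depth;          /* number of currently open elements */
    int in_start_tag;   /* 1 while "<name ..." is written but ">" is not */
    int root_done;      /* 1 once the root element has been closed */
    int failed;         /* sticky: after one error, every call fails */
    int nattrs;         /* attributes written on the current start tag */
    char stack[XW_MAX_DEPTH][XW_MAX_NAME];
    char attrs[XW_MAX_ATTRS][XW_MAX_NAME];
} xw_writer;

void xw_init(xw_writer *w, FILE *out) {
    assert(w != NULL && out != NULL);
    memset(w, 0, sizeof *w);
    w->out = out;
}

static int xw_fail(xw_writer *w) { w->failed = 1; return XW_ERR_NOTWF; }

/* Escape markup characters; '"' too, so this also serves attribute values. */
static void xw_escape(FILE *out, const char *s) {
    for (; *s; s++) {
        switch (*s) {
        case '<': fputs("&lt;", out); break;
        case '>': fputs("&gt;", out); break;
        case '&': fputs("&amp;", out); break;
        case '"': fputs("&quot;", out); break;
        default:  fputc(*s, out);
        }
    }
}

/* Finish a pending start tag before any content is written. */
static void xw_close_start(xw_writer *w) {
    if (w->in_start_tag) { fputc('>', w->out); w->in_start_tag = 0; }
}

int xw_start_element(xw_writer *w, const char *name) {
    size_t len = strlen(name);
    if (w->failed || w->root_done || len == 0 || len >= XW_MAX_NAME ||
        w->depth >= XW_MAX_DEPTH)
        return xw_fail(w);               /* second root, or limits exceeded */
    xw_close_start(w);
    fprintf(w->out, "<%s", name);
    strcpy(w->stack[w->depth++], name);
    w->in_start_tag = 1;
    w->nattrs = 0;
    return XW_OK;
}

int xw_attribute(xw_writer *w, const char *name, const char *value) {
    size_t len = strlen(name);
    if (w->failed || !w->in_start_tag || len == 0 || len >= XW_MAX_NAME ||
        w->nattrs >= XW_MAX_ATTRS)
        return xw_fail(w);
    for (int i = 0; i < w->nattrs; i++)
        if (strcmp(w->attrs[i], name) == 0) return xw_fail(w); /* duplicate */
    strcpy(w->attrs[w->nattrs++], name);
    fprintf(w->out, " %s=\"", name);
    xw_escape(w->out, value);
    fputc('"', w->out);
    return XW_OK;
}

int xw_text(xw_writer *w, const char *text) {
    if (w->failed || w->depth == 0) return xw_fail(w); /* text outside root */
    xw_close_start(w);
    xw_escape(w->out, text);
    return XW_OK;
}

int xw_end_element(xw_writer *w, const char *name) {
    if (w->failed || w->depth == 0 || strcmp(w->stack[w->depth - 1], name) != 0)
        return xw_fail(w);               /* misnested or never opened */
    w->depth--;
    if (w->in_start_tag) { fputs("/>", w->out); w->in_start_tag = 0; }
    else fprintf(w->out, "</%s>", name);
    if (w->depth == 0) w->root_done = 1;
    return XW_OK;
}
```

Writing a feed element with an xmlns attribute, a title containing “A & B”, and an empty link element produces `<feed xmlns="http://www.w3.org/2005/Atom"><title>A &amp; B</title><link href="a?x=1&amp;y=2"/></feed>`. The sticky failed flag is one way to give “abort” some teeth in C: a caller who ignores one error return still can’t add anything more to the output.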
I think there will be two versions of the interface: one that uses unsigned char * and accepts only UTF-8, and another that uses wchar_t and accepts U+0000-terminated arrays of integer Unicode codepoints. Output will always be UTF-8.
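For the wchar_t flavor, the core job is turning each code point into UTF-8 bytes. Here is a minimal sketch that needs nothing beyond stdio.h and stddef.h (one standard header that declares wchar_t); xw_put_codepoints is a hypothetical name, not part of any existing library. It checks the whole array against the XML 1.0 Char production before writing anything, so a bad code point produces no partial output. One assumption worth stating loudly: it presumes a 32-bit wchar_t. On platforms where wchar_t is 16 bits, Windows notably, it cannot hold code points above U+FFFF.

```c
#include <assert.h>
#include <stddef.h>   /* declares wchar_t */
#include <stdio.h>

/* XML 1.0 "Char" production: the only code points allowed in a document. */
static int xml_char_ok(unsigned long c) {
    return c == 0x9 || c == 0xA || c == 0xD ||
           (c >= 0x20 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD) ||
           (c >= 0x10000 && c <= 0x10FFFF);
}

/* Hypothetical helper: writes a U+0000-terminated array of code points to
   out as UTF-8. Returns 0 on success; returns -1 if any code point is not a
   legal XML character (nothing is written) or if a write error occurred. */
int xw_put_codepoints(FILE *out, const wchar_t *cps) {
    const wchar_t *p;
    assert(out != NULL && cps != NULL);
    /* Pass 1: validate everything first; negative values become huge
       unsigned numbers and are rejected along with surrogates etc. */
    for (p = cps; *p != 0; p++)
        if (!xml_char_ok((unsigned long)*p)) return -1;
    /* Pass 2: encode each code point as 1 to 4 UTF-8 bytes. */
    for (p = cps; *p != 0; p++) {
        unsigned long c = (unsigned long)*p;
        if (c < 0x80) {
            fputc((int)c, out);
        } else if (c < 0x800) {
            fputc((int)(0xC0 | (c >> 6)), out);
            fputc((int)(0x80 | (c & 0x3F)), out);
        } else if (c < 0x10000) {
            fputc((int)(0xE0 | (c >> 12)), out);
            fputc((int)(0x80 | ((c >> 6) & 0x3F)), out);
            fputc((int)(0x80 | (c & 0x3F)), out);
        } else {
            fputc((int)(0xF0 | (c >> 18)), out);
            fputc((int)(0x80 | ((c >> 12) & 0x3F)), out);
            fputc((int)(0x80 | ((c >> 6) & 0x3F)), out);
            fputc((int)(0x80 | (c & 0x3F)), out);
        }
    }
    return ferror(out) ? -1 : 0;
}
```

For example, the array { 0x61, 0xE9, 0x20AC, 0 } (“a”, “é”, “€”) comes out as the six bytes 61 C3 A9 E2 82 AC, while an array containing the lone surrogate 0xD800 is rejected without writing anything.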
Botched UTF-8, illegal XML characters, misnested tags, duplicate attributes, and all other artifacts of ill-formedness will be considered errors.
I think this should require exactly zero libraries aside from stdio and whatever .h you need to get wchar_t declared.
Name? · Anyone who can think of a snappy name for a basic C-language XML generation library will earn my eternal gratitude.