This is the permanent status page for Genx (tarball · docs). Genx is a library, written in the C language, for generating XML. Its goals are high performance, a simple and intuitive API, and output that is guaranteed to be well-formed; the output is also guaranteed to be Canonical XML, suitable for use with digital-signature technology. There is a Python wrapper. Genx comes with a GPL-Compatible but non-viral Open-Source license. Latest news: In production, carrying hundreds of thousands of subtitles per day; thinking of taking off the “beta” stamp.
2004/07/25 · Simon Steele writes to say he’s using Genx in both his Programmers Notepad Open-Source project, and in his day job at Softel in their messaging stack, where the systems, he says, run hundreds of thousands of XML messages per day: “The systems using these stacks provide subtitle (captioning) transmission and also interactive television content transmission for major and minor broadcasters around the world. The messages carry both automation control commands and also live caption data.”
And, having applied that kind of stress, he found, and supplied a patch for, a memory leak; carelessness in genxDispose.
The tarball’s refreshed, and I added a make target, make cpp, which makes sure that g++ can compile without warnings.
So I think we’re close. If someone sends me one more story about Genx being in real production somewhere, I’ll take the “beta” stamp off and announce the rewards-for-bugs program.
2004/06/13: Ruby Wrapper · Garrett Rooney (clever home page) writes:
I figured since you're listing the Python bindings for genx you might be interested in GenX4r, my Ruby wrapper for it. I just finished the first cut at it over the weekend, and it's not nearly finished (still need to test it on versions of Ruby other than 1.8.1, still need more tests, total lack of documentation, etc), but it's far enough along for other people to start looking at it, so I figured I'd drop you a line about it.
You can get a copy of version 0.01 at:
http://electricjellyfish.net/garrett/genx4r/
Or grab the current dev version from my Subversion repository [write Garrett if you want that address].
2004/05/30: Python Wrapper Docs · Mick Twomey cooked up a very nice manual for Pygenx, his Python wrapper for Genx.
2004/05/07: 25 Years Is Not Enough ·
And that’s how long I’ve been programming in C. But somehow it escaped my attention that if you say typedef unsigned char * utf8, and then later you say foo(const utf8 bar), then the const applies to the whole typedef (that is, to the pointer itself, not to the bytes it points at), so the optimizer thinks you’re passing a pointer that can’t change, so it optimizes away all sorts of code... sigh.
Getting this right is going to change some of the API declarations, sorry.
Thanks to Mick Twomey, Evan Jones, and Artem Khodush for bashing away on this.
Fixed, changed the rev to “beta6”. As a result, the casting is actually cleaner and clearer than it was, not to mention correct. A lot of declarations have changed; once again, sorry.
2004/05/07: Optimization Blowup ·
Mick Twomey reports, and I have verified, severe breakage in Genx exposed by using the gcc optimizer. On Debian, you have to go to -O3 to see it, while on OS X any -O will do it. The symptom is violently incorrect behavior in low-level routines like genxCheckText(); you can step through it and watch it go hurtling off the rails. It’s not hard to reproduce, just run tgx.
The last time I encountered severe breakage in a C compiler I was so surprised I fell over a dinosaur... my default assumption is that my code is hosed in some subtle way. The first suspect would be memory management, but diligent application of libdmalloc reveals nothing.
I’m poking around and Mick’s gonna visit Windows-land.
I suppose I could look at the generated assembler code, except that trying to relate the output of -O back to source level is, uh, well, I’d rather not, and then I can’t remember x86 assembler and I’ve never had the vaguest insight into PowerPC architecture.
Sigh. I’d call this serious suckage.
2004/05/05: Pygenx Refreshed · Mick has done a refresh. Python + Genx = goodness, if y’ask me.
2004/04/25: Beta 4 · First of all, I think I’m more or less finished. I totally don’t plan to change the API any more, and I’d be pretty surprised if there is any more refactoring required to get correct performance with good efficiency. I’m now thinking about taking the “beta” out of the version and declaring 1.0.
This one involved a lot of work. Elliotte Rusty Harold sent me the Canonicalization Test Suite, consisting of ten appallingly twisted, perverted, evil sample documents he dreamed up himself, plus the standard XML Conformance Test Suite with a few hundred canonicalizable examples done. This is a major contribution: muchas gracias!
To make a long story short, Genx now passes all of ERH’s sicko tests, plus 516 of the XML Conformance tests. By “pass”, I mean that when you use David Tolpin’s XG program to read them in with Expat and write them out with Genx, you get a correctly canonicalized version. This doesn’t work for any of the tests which use a validating parser and have it normalize attribute values or the like, because Expat doesn’t do those things.
The API has changed, slightly; there are restrictions as to when you can use the genxAddNamespace call. There’s a new section in the docs, Performance, that talks about that, and about how to exercise good practice and be rewarded with good performance.
This was really only made difficult by my being determined that the common use-cases (where you don’t change prefixes for a namespace, or undeclare the default namespace) should run really fast, while still being correct if you insist on bad practices.
The distribution includes a snapshot of Tolpin’s XG which you probably shouldn’t use because I hacked it pretty brutally in a couple of places to get it working with the C14n tests, and he will probably want to think about these and get it properly checked into his CVS and so on.
I didn’t include the C14n test suite, though, because it’s pretty big and I don’t think it’s interesting unless you’re debugging Genx.
Beta 4 also has a fix for a buglet turned up by Ed Davies.
2004/04/15: Refactoring · Making Genx do arbitrary namespace-twiddling (switching prefixes in midstream, declaring and undeclaring default namespaces) has proved to require major refactoring, which is going slow since I’m kind of busy at Sun.
I’m willing to live with the added complexity, first of all, because (with some refactoring) the original Genx methods, where you just declare what you need and use it by reference, will run as fast as they did previously. And, being able to run the whole C14n test suite will be, I think, a very powerful QA tool; one that would take me months of work to equal.
2004/03/17: Reworking Namespaces · Elliotte Rusty Harold sent me a copy of the C14n test suite and that, plus stern lectures from David Tolpin, has forced me into a rewrite of the namespace handling in Genx. This is only halfway done, but there are enough bugfixes and useful improvements to do a snapshot.
You can now declare any namespace to have an empty prefix ("") so that it is the default namespace when in effect.
Also, you can redeclare a namespace from one prefix to another as long as you’re not in the scope of a declaration. (This restriction will be lifted soon.)
There are new calls: genxAddNamespace, for explicitly inserting a namespace declaration, and genxUnsetDefaultNamespace, for removing the current default.
Finally, there is genxGetNamespacePrefix, which is useful for creating QNames-in-content where Genx generated the prefix.
This release also includes some bug-fixes from Martin Kenny, including a horrible segfault that I’m astounded that neither I nor Purify spotted, and a more or less complete failure of genxScrubText to work. Which shows that the test suite was busted, and Martin even provided additions to it to fix that too. Thanks Martin!
2004/03/08: Tolpin’s XG · David Tolpin has cooked up XG; I quote:
A simple glue for Expat and GenX. It reads XML, parses internal DTD, resolves system entities, defaults attributes (everything what Expat does) and then calls Genx to write canonical XML.
Something like cpp for XML.
The idea is once a preprocessor exists, it becomes possible to write powerful tools with simple XML parsers inside.
It’s not obvious to me whether this should be rolled into the Genx distribution or hosted at David’s site. David has already made some changes since he sent me a snapshot, which I’ve posted for the curious here even though, as I said, it’s a moving target.
I think the right thing to do down the road is to get Genx a nice public CVS’ed home at SourceForge or equivalent so that multiple people can maintain pieces.
By the way, for connoisseurs of code, have a peek inside xg.c; David’s style is extremely, well, extreme. Not at all like mine, much more fun to look at.
2004/03/01: Python Wrapper · Michael Twomey writes to say he’s blogged about his Python wrapper for Genx. He writes: “It's probably also a good example of someone eschewing the norm and picking about every slight variation of a software project's standard tools (I'm using GNU arch to manage the source, scons to build it and as I mentioned it's pyrex based).”
2004/02/29: SVG Pretties · Anthony J. Starks, who suggested the name Genx, wrote a demo program that generates cute SVG graphics.
Check out Anthony’s “Observations” at the end of his piece for some interesting findings on file sizes.
2004/02/25: Interface Change Exposes UTF-8 Decoder ·
The behavior of genxScrubText is not quite what Martin needs; he’d prefer to replace malformed UTF-8 bytes with ? rather than just suppressing them. Which seems reasonable, but perhaps not what everyone wants.
He also suggested a good solution: expose the internal routine that Genx uses to unpack and validate UTF-8; then you can write your own variant text-scrubber. So there’s a new utility routine genxNextUnicodeChar.
Interestingly, this is the third time I’ve written this routine; once for a private project in 1996, once for Antarctica in ’99 or so, and again now. I hope it’s the last.
Plus, fixed some docs bugs that Joe Gregorio turned up.
Since I changed the interface, I bumped the version to beta2.
2004/02/25: C++ ·
Dialogue with and patches from Martin Kenny on the subject of making Genx a good citizen of the C++ world. There were a few problems, most notably the facts that namespace is a reserved word, and of course C++’s refusal to believe that a const char * and unsigned char * and char * are all the same thing for the purposes of strcpy and friends.
After some quite tedious effort, the Genx code can be compiled cleanly as C++ and genx.h can be included by C++ programs. On OS X, g++ compiles the test driver with hundreds of warnings but the tests run fine; on Windows, the warnings are errors and the test driver has to be compiled as C; but the tests are still fine.
2004/02/24: Bugfix ·
Martin Kenny turned up a horribly broken assumption which causes an infinite loop in genxScrubText, and also reveals an inexcusable hole in the test suite. Fixed.
2004/02/24: More Windows ·
Robert Sayre writes to point out that the project files in the Windows directory only work with Visual Studio.NET 2003, not with the 2002 or any earlier versions.
Simon Steele, who did the Windows port, reports that building Genx can take several minutes if you opt for “Maximum Speed.” A bit of back-and-forth revealed that the culprit was charProps, which has hundreds of calls to rangeProp; adding a __declspec(noinline) to that function makes the problem go away, if you care.
2004/02/23: Windows · Simon Steele got Genx running on Windows, and also used Fortify to track down another memory leak.
The latest tarball includes a Windows directory with Visual Studio.NET solution and project files to build a lib and the tests project. Haven’t revved the version though; I’ll only do that for caller-visible changes. Thanks Simon!
2004/02/22: Sketching Genh · I’m thinking about Genh, a very thin layer on top of Genx that generates XHTML. No validity checking, but it tries to be intelligent about predeclaring the elements & attributes you’re sure to use and lazy about the others. It will of course use the namespace defaulting for the XHTML namespace, which is why it’s layered on top of Genx.
Along similar lines, I should cook up predeclarations of xml:space, xml:lang, and xml:base; they could go in their own teeny little library but there’s a case they should be right in Genx itself.
2004/02/22: beta1 Release ·
Given that the thing seems to more or less do what I say it does on multiple systems, I’ve called this the beta1 release. The changes include:
There’s a new genxGetVersion function.
I looked at the FSF’s list of GPL-compatible licenses (thanks Mark) and noticed that the Expat license qualifies. If it’s good enough for James it’s good enough for me, so Genx now has that license.
Fixed the bugs that Garrett and Joe found, plus a nit reported by Jeremy Dunck and one I found.
I don’t have Purify here, but it runs fine with -Wall, which it didn’t before.
Purify still complains about unclosed and unfreed things in the test suite; I see no reason to fix them.
Introduced Garrett’s first steps towards making this Windows-friendly. However, since neither Garrett nor I know the Windows equivalent of the srand() and random() calls in the checkStress test, and since Genx really shouldn’t be relied on till it runs checkStress, this awaits the ministrations of a real expert before we assert that Genx does Windows. (Hmm, but now I see Joe says it runs fine on XP?)
The major to-do before I rev again is to get it checked into CVS somewhere.
2004/02/19: News from the Field ·
Simon Willison reports a clean run on Red Hat 9.
Elliotte Rusty Harold and Mark Pilgrim pour cold water on the Apache 2.0 License idea; Mark points me to http://www.fsf.org/licenses/license-list.html.
Garrett Rooney runs Genx on Solaris 8, then breaks out Purify, finds some uninitialized data and (wait, there’s more) gets it going on Windows Visual Studio (minus the checkStress test)!
Joe Gregorio reports that he’s working on a Python wrapper, and finds a const-declaration boo-boo.
Thanks to one and all!
2004/02/17 · Posted the docs and mis-posted (sigh) the tarball.