What happened was, I needed a small improvement to Expat, probably the most widely-used XML parsing engine on the planet, so I coded it up and sent off a PR and it’s now in release 2.3.0. There’s nothing terribly interesting about the problem or the solution, but it certainly made me think about coding and tooling and so on. (Warning: Of zero interest to anyone who isn’t a professional programmer.)
Back story · As I mentioned last month, I took on a little programming job, partly as a favor to a friend: writing a parser to transmute a huge number of antique IBM GML files into XML. It wasn’t terribly hard, but there was quite a bit of input variation, so I couldn’t be confident unless I checked that every single output file was proper XML (“well-formed”, we XML geeks say).
Fortunately there’s an Expat-based command-line tool called xmlwf that can scan XML files for errors and produce useful human-readable complaints, and it operates at obscene speed. So what I wanted to do was run my parser over a few hundred GML files and then say, essentially, xmlwf * in the output directory.
Which didn’t work because, until very recently, xmlwf would just stop when it encountered the first non-well-formed file. So I added a -k option (“k” for “keep going”) so it could run over a thousand or so files and helpfully complain about the two that were broken.
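For readers who want the keep-going behavior without xmlwf, here is a minimal sketch in Python, whose standard-library xml.parsers.expat module is a binding to Expat. It is not how xmlwf implements -k; the function names check_well_formed and check_all are made up for illustration, and the message format is only an approximation of the kind of complaint a checker might print.

```python
from xml.parsers import expat


def check_well_formed(path):
    """Return None if the file parses as well-formed XML, else an error message.

    Hypothetical helper: the name and the message format are assumptions.
    """
    parser = expat.ParserCreate()
    try:
        # Expat wants bytes so it can handle the encoding declaration itself.
        with open(path, "rb") as f:
            parser.ParseFile(f)
    except expat.ExpatError as err:
        reason = expat.ErrorString(err.code)
        return f"{path}:{err.lineno}:{err.offset}: {reason}"
    return None


def check_all(paths):
    """Check every file, continuing past failures, and collect the complaints."""
    complaints = []
    for path in paths:
        message = check_well_formed(path)
        if message is not None:
            complaints.append(message)
    return complaints
```

Calling check_all on a list of file names (say, sys.argv[1:]) returns one message per broken file, so a thousand good files and two bad ones yield a list of two complaints.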
Lessons from the PR · Most important, I hadn’t realized how great the programming environment is inside Amazon. It’s all git, but there’s no need for branches or PR’s. You make your changes, you commit, you use the tooling to launch a code review, you argue, you make more changes, you (probably) commit --amend (unless you think multiple commits are more instructive for some reason), and this repeats until everyone’s happy and you push into the CI/CD vortex.
Obviously other people might be working on the same stuff so you might have to do a git pull --rebase and there might be pain sorting out the results but that’s what they pay us for. (Right?)
Anyhow, you end up with a nice clean commit sequence in your codebase history and nobody ever has to think about branches or PR’s. (Obviously some larger tasks require branches but you’d be amazed how much you can live without them.)
Finding: Pull requests · Now that I’m out in the real world, it’s How Things Are Done. For good reasons. Doesn’t mean I have to like them. As evidence, I offer How to Rebase a Pull Request. Ewwww.
Finding: Coding tools · The last time I edited actual C code, nobody’d ever heard of JetBrains and “VS Code” would have sounded like a mainframe thing. I found the back corner of my brain where those memories lived, shook it vigorously, and Emacs fell out. The thing I’m now using to type the text you’re now reading. Oh, yeah; that was then.
It worked fine. I mean, no autocomplete, but there was syntax coloring and indentation and whole cubic centimeters (probably) of brain cells woke up and remembered C. Dear reader, back in the day I wrote hundreds and hundreds of thousands of lines of the stuff, and I guess it doesn’t go away. In fact, the number of syntax errors was pretty well zero because the fingers just did the right thing.
Finding: The Mac as open-source platform · It’s not that great. Expat maintainer Sebastian Pipping quite properly drop-kicked my PR because it had coding-standards violations and a memory leak, revealed by the Travis CI setup. I lazily tried to avoid learning Travis and, with Sebastian’s help, figured out the shell incantations to run the CI. Only on the Mac they only sort of worked, and in particular Clang failed to spot the memory leak.
The best way to deal with this is probably to learn enough Docker (Docker Compose, probably) to make a fake Linux environment. I was well along the path to doing that when I realized I had a real Linux environment, namely tbray.org, the server sending you the HTML you are now reading.
(Except for it’s a Debian box that couldn’t do the clang-format coding-standards test but that’s OK, my Mac could manage after I used Homebrew to install coreutils and moreutils and gnu-sed and various other handsome ecosystem fragments.)
I mean, I got it to go. But if I do it again, I’ll definitely wrestle Docker to the ground first. Which is irritating; this stuff should Just Work on a Mac. Without having a Homebrew dance party.
C · Well, yeah. We shouldn’t diss it too much, basically every useful online service you interact with is running on it. But after my -k option was added, clang found a memory leak in xmlwf. Which I tracked down and yeah, it was real, but it had also been there before my changes. And it wouldn’t be a problem in normal circumstances, until it suddenly was, and then you’d be unhappy. Which is why, in the fullness of time, most C should be replaced by Go (if you can tolerate garbage-collection latency) and Rust (if you can’t). Won’t happen in my lifetime.
Anyhow · Thanks to Sebastian, who was polite in the face of my repeated out-of-practice cluelessness. And hey, if you need to syntax-check huge numbers of XML files, your life just got a little easier.
Comment feed for ongoing:
From: TimD (Mar 25 2021, at 10:54)
Get VirtualBox and stand up a real Linux distribution, if you're going to do more of this. Nothing like using the real thing.
From: Thom Hickey (Mar 25 2021, at 15:00)
I suppose it's nice to have the functionality, but I would have done it with an external script.
--Th