I just made a bunch of changes to the site here, which should make it run faster without visible effect. The details might be of interest to Web-tech and publishing-tech geeks. Plus, words on being sentimental about Perl code.
The H&J history · Back in 2011 I right-justified the text here, and for that to work you need hyphenation, which I did with Hyphenator.js on the grounds that “it makes perfect sense to run this sort of publishing busywork on the Web’s billions of underworked client systems, rather than on its millions of often-overworked servers.” [Pop quiz: What’s wrong with that thinking?]
That was fine, except that some combinations of Hyphenator.js and Internet Explorer apparently didn’t get along. It’s easy to say “get a better browser, dammit”, but lots of people don’t get to pick their at-work browser, and who am I to interfere with people refreshing themselves with ongoing during working hours?
Answer to the pop quiz: Well, suppose you do the hyphenation on the server-side once, at publication time. Enter TeX::Hyphen. As of now, the text is published preloaded with soft-hyphen (U+00AD a.k.a. &shy;) characters.
It was something of a pain in the ass, because Perl and Unicode have relationship problems (I recommend following that link).
I’ve been unable to figure out how to get the UTF-8 out of the upstream XML files, through the Perl/MySQL pipeline, and into your browser without breaking it, so an early stage of the pipeline turns all non-ASCII characters into XML character references, e.g. &#xad;. TeX::Hyphen then wanted to insert hyphens into those, so a certain amount of backing and filling was required.
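The backing and filling might look something like the following sketch, written in Python rather than the site's Perl. It is not the actual pipeline code: `toy_break_points` and its tiny `TOY_BREAKS` table are hypothetical stand-ins for a real pattern-based hyphenator such as TeX::Hyphen, and every function name here is made up for illustration. The idea is only to split the text at character references, hyphenate the words in between, and pass the references through untouched.

```python
import re

SHY = "&#xad;"  # soft hyphen written as an XML character reference

# Numeric character references (decimal or hex) and named entities.
REFERENCE = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")
WORD = re.compile(r"[A-Za-z]+")

# Hypothetical stand-in for a real hyphenation dictionary: each entry lists
# the character offsets at which a word may be broken.
TOY_BREAKS = {"hyphenation": [2, 6], "publication": [3, 5, 7]}


def toy_break_points(word):
    """Return permitted break offsets for a word (empty list if unknown)."""
    return TOY_BREAKS.get(word.lower(), [])


def hyphenate_word(word):
    """Insert a soft-hyphen reference at each permitted break point."""
    pieces = []
    last = 0
    for point in toy_break_points(word):
        pieces.append(word[last:point])
        last = point
    pieces.append(word[last:])
    return SHY.join(pieces)


def hyphenate_run(run):
    """Hyphenate every word in a stretch of text that holds no references."""
    return WORD.sub(lambda m: hyphenate_word(m.group()), run)


def hyphenate_text(text):
    """Hyphenate ordinary text while copying character references verbatim."""
    out = []
    pos = 0
    for ref in REFERENCE.finditer(text):
        out.append(hyphenate_run(text[pos:ref.start()]))
        out.append(ref.group())  # never look inside a reference
        pos = ref.end()
    out.append(hyphenate_run(text[pos:]))
    return "".join(out)


print(hyphenate_text("hyphenation&#x2014;publication"))
# hy&#xad;phen&#xad;ation&#x2014;pub&#xad;li&#xad;ca&#xad;tion
```

A word interrupted by a reference (an accented letter, say) is treated here as two shorter words, which is crude but keeps the references intact.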
The Effects · Well, all the pages here are quite a bit bigger because they are now loaded with 6-byte instances of &#xad; (those should be 2-byte UTF-8 characters but, as I said, Perl and UTF-8, sigh). On the other hand, you don’t have to fetch and run the 57K or so of Hyphenator.js, so I think it’s a win. And one of these years I’ll wrestle the Perl/UTF-8 combo to the ground. Plus, it should work in IE.
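For the byte-counting, a quick check in Python shows where the 6 and the 2 come from, assuming the reference is spelled `&#xad;`. The `overhead` helper is hypothetical, there only to illustrate the arithmetic on a sample string.

```python
REFERENCE = "&#xad;"  # soft hyphen as an XML character reference (assumed spelling)
RAW = "\u00ad"        # the soft-hyphen character itself

ref_bytes = len(REFERENCE.encode("utf-8"))  # six ASCII characters
raw_bytes = len(RAW.encode("utf-8"))        # U+00AD encodes as 0xC2 0xAD


def overhead(markup):
    """Extra bytes paid for writing each soft hyphen as a reference."""
    return markup.count(REFERENCE) * (ref_bytes - raw_bytes)


print(ref_bytes, raw_bytes)                 # 6 2
print(overhead("hy&#xad;phen&#xad;ation"))  # 8
```

So each break point costs four bytes more than it would as raw UTF-8, before any compression on the wire.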
There is a bug: If you copy text out of ongoing, it’ll be festooned with those soft-hyphen characters, which shouldn’t actually cause any problems when you paste it in elsewhere, but are displeasing to the eye. I’ll have to cook up some JS to remove them; or rather, probably steal it from Hyphenator.js which, I noticed, takes care of that.
Of course the right answer is CSS hyphenation, but apparently that’s not ready for prime time yet.
A lesson · Hm, I moved some reasonably complex application logic out of procedural code in the browser into server-generated declarative markup. This may be an unfashionable old-fart opinion, but I believe that’s generally a good thing to do.
Sentimental · Once a year or so I find myself fiddling with the publishing system here, which was whacked out quickly in 2002 and has thus been in production for a dozen years; 2,792 lines of Perl plus a supporting cast in JS and Ruby and SQL. The original design was ad-hoc, and bits have kept growing on the side here and there ever since. On a couple of occasions I’ve been able to cut it down, but usually not.
Nobody would call it beautiful; in particular, the code that generates the When and What structures looks like I was sleep-deprived and having relationship problems or something when I wrote it.
But hey, it’s mine; an old friend now, flaws and all, and there are lessons there; not that I have any urge to share it. Maybe in a decade or two, at which point it’ll be a playground for software archaeologists.
Republishing the whole site from scratch takes a little under 10 minutes elapsed and burns 3:20 of CPU.
From: David Magda (Mar 16 2014, at 12:03)
Of course instead of doing all of the above, you could simply go to a left-justified, ragged-right layout.
From: Vladimir Stepanov (Mar 16 2014, at 12:47)
The „bug“ with soft hyphens makes it impossible to look up word definitions from within a page in any (?) browser on iOS. This has bugged me for quite some time, since consistent vocabulary search across all apps is arguably the single best iOS feature :-)
From: Amy! (Mar 16 2014, at 12:54)
Interesting. There are some glitches to work out; it amuses me that one of the rather rare hyphens on the page as displayed in my browser occurs in "server-side".
That is:
"
[...] server--
side [...]
"
But I dunno how you would manage to special-case that one; what you actually want there instead of a 'shy' is invisible permission to break without inserting a hyphen.
From: Michael Zajac (Mar 16 2014, at 13:47)
Amy, doesn’t a regular hyphen already have permission to break? So the solution would be to avoid adding a soft hyphen in that place.
From: Rick Levine (Mar 16 2014, at 13:59)
A quick search for the post I remembered reveals your first dabbling with Python 10 years ago. As a fellow gray-hair, I can understand that reimplementing working Perl code is not something to be encouraged. But the time you'll spend trying to pound the Perl UTF-8 issues into submission might be the same as that for a quick port. Your UTF-8 problems might not go away completely, but they'd be different. :-) (And there's at least one impl of a TeX-style hyphenator out there...)
From: Pete Forman (Mar 16 2014, at 14:40)
How many search engines handle soft hyphens correctly?
From: Jonathan Hollin (Mar 16 2014, at 15:48)
I use CSS H&J on my website without any issues. Works fine in modern browsers and on mobile devices.
Where it doesn't work it falls back gracefully.
I have used soft-hyphens in the past but quickly discarded them as I encountered several issues with their use (although I don't remember what those issues were now).
I'd go the CSS route every time. Less complexity, graceful degradation, easy to implement and just as easy to remove if necessary.
From: Peter Flynn (Mar 16 2014, at 16:23)
Ironically the one place that copy-and-paste of your text works perfectly is the one place that doesn't need it: TeX :-)
From: Norman Walsh (Mar 17 2014, at 05:37)
I get "geek-s" in the first para and "character-s" later on, on the N5 this morning.
From: Paul Rodriguez (Mar 17 2014, at 09:15)
Before you give up completely on client-side hyphenation, consider Hypher.js: it does the same job as Hyphenator.js but is much smaller and *drastically* faster.
From: Kevin Reid (Mar 17 2014, at 10:08)
Thought about the pop quiz and came up with a different answer: Many of those billions of devices are battery-powered, which makes “underworked” the wrong idea.
From: Simon Griffee (Mar 17 2014, at 11:06)
The hyphenated ragged-right text reads terribly and doesn't look good. Ragged-right, please.
From: Paul Cotton (Apr 03 2014, at 12:48)
Thanks for fixing the Hyphenator.js and IE combination problem since I was one of the people that reported it.
/paulc