It’s all over the news these days, because it’s A Good Thing: the Web will be smarter and faster and better. And for other reasons involving politics and vituperation. I love parts of HTML5, but it’s clear that other parts are a science project. And as a sometime standards wonk, I’m puzzled by aspects of the way the spec (not the language, the spec for the language) is put together.
What’s Good · I suspect I agree with most external observers: what’s cool are the new elements like video, audio, and canvas. And since I’m a protocols guy, the closely-related Web Socket work; more on that below. I’ve also enjoyed how the video element has shone a remorseless and very useful light on the patent-troll infestation standing in the way of better Web multimedia.
Progress is well under way on implementing the pleasing parts of HTML5, and there are people thinking seriously that it may soon remove the need for compiled “native” applications on a variety of platforms.
That’s good!
What’s Bad · The process is clearly hard to manage. On a couple of occasions I’ve tried to take a sip or two of the HTML5 waters, and instantly been overwhelmed by the volume and intensity of the conversation; “drinking from a firehose” applies. It’s something that you really have to do full-time to do at all, I think.
It’s also self-evidently troubled. This week we have HTML5 Editor Ian Hickson publicly accusing Adobe of placing a “secret block” on the HTML5 spec. Adobe hotly denies it. Simon St. Laurent writes up the story and then hostilities break out in his comments.
Not a pretty picture.
Is it possible that they’ll fight through all this swampy stuff and get a good result? We’ll see.
The Networked-Object-Model Experiment · One of the distinguishing features of the Web is that it has never specified APIs or Object Models. Interoperability has been at the level of syntax: I send you these bits, here’s what they are defined to mean, in response you send me those bits, here’s what they’re defined to mean. And so on.
I have always felt that this is why the Internet and the Web took off so well, exceeding by orders of magnitude the deployment of other attempts to build networked application frameworks (CORBA, DCOM, Jini) that were based on objects and APIs. The lesson, it seems to me, is that we just don’t know how to do that, and interoperability should happen at the level of syntax.
The HTML5 draft seems to disagree. It provides detailed algorithms for parsing HTML, even in the face of severe syntax errors, and specifies how the results of parsing should be used to construct the Object Model. Thus, the syntax is ephemeral; the Object Model, interoperable across the network, is what matters.
The theory is that if all the User-Agent providers implement all these algorithms exactly as specified, complete interoperability will be achieved and people who build Web applications need no longer concern themselves with the differences between User Agents. Which would of course be wonderful.
Will it work? Nobody knows; it’s a science experiment. Just because nobody has ever succeeded in specifying a workable networked object model doesn’t mean this project will likewise fail. But it does mean that when considering the future of HTML5, we should recognize that this is a very hard problem, and there’s no guarantee that that part of it will come off.
Which may not matter that much; User-Agent implementors are increasingly subject to market pressure to be compatible, plus Web application authors increasingly work at a higher level, thinking in terms of things like Rails or jQuery constructs, thus insulating themselves somewhat from the compatibility nasties.
So for my money, I see little harm in the speculative parts of HTML5 if we get those tasty new elements, even at the current imperfect level of interoperability.
How To Spec? · [Note: At this point, I launch into a detailed discussion of the design of specifications for network protocols; the content will be of interest to a very small group of people, including almost nobody who just wants <video> to be here and work today.]
This was provoked by Joe Gregorio’s recent (amusing) Joel-in-a-box, calling out the excellence of the Web Socket protocol spec, which was produced by the same group and editor as HTML5, and is in a similar style. Joe admired the way it was “clearly directed at someone that is going to be implementing the protocol”, finding it refreshing compared to many other current RFCs. By the way, Joe did an outstanding job as co-editor of RFC5023.
So I went and read the Web Socket protocol and my reaction was more or less the opposite. I like the protocol and I gather it’s already been implemented and works. But I found the spec hard to read, amazingly long and complex for such an admirably simple protocol, and missing information that seemed important.
Like HTML5, it doesn’t just specify the bits to be interchanged and what they mean, it provides detailed algorithmic guidance, and I quote for flavor from Section 4.2 Data framing: “the user agent must run through the following state machine for the bytes sent by the server”. I assumed “must” meant “MUST”, and was relieved to find in Section 2. Conformance: “Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent.” Thus, we understand that the algorithms are provided for their explanatory value.
Let me deep-dive on a couple of the sections to examine the difference between styles of specification. I’ll start with the state machine mentioned earlier.
Framing · The section describing the data framing has six numbered top-level steps: three for receiving data and another three for sending it. The receiving-data part has two sub-lists of seven and five steps respectively. It’s all in an almost-pseudocode style and extends across a page and a half.
Here’s how framing’s done:
Messages sent by either side have to consist of Unicode characters encoded in UTF-8. They have to be framed by a leading 0x00 byte and a trailing 0xFF byte.
Either side has to accept (but discard) message frames where the first few bytes, with the high bit set, give the message length. Thus, a frame beginning with 0x81 0x82 0x83 0x04 has a length of (1 * 128 * 128) + (2 * 128) + 3, or 16643 bytes. (Presumably these are artifacts of an earlier version of Web Sockets?)
When clients see busted UTF-8, they replace the damaged text with U+FFFD REPLACEMENT CHARACTER. When servers see busted UTF-8, the behavior is undefined.
That’s all.
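In code, my reading of those three paragraphs comes down to something like the following Ruby sketch. The helper names are mine, the length decoding follows the prose above rather than the draft’s own pseudocode, and it’s here for flavor, not as a reference implementation:

```ruby
# A sketch of draft-75-style framing, per the description above.
# My names, my reading of the rules; illustrative, not normative.
require "socket"

# Send: UTF-8 text between a leading 0x00 byte and a trailing 0xFF byte.
def send_frame(sock, text)
  sock.write("\x00".b + text.encode("UTF-8").b + "\xFF".b)
end

# Receive: return the text of a 0x00-framed message. Counted frames,
# whose leading high-bit bytes give a base-128 length, are read but
# discarded.
def read_frame(sock)
  b = sock.readbyte
  if (b & 0x80) != 0
    length = 0
    while (b & 0x80) != 0            # high-bit bytes are length digits
      length = length * 128 + (b & 0x7F)
      b = sock.readbyte
    end
    sock.ungetbyte(b)                # first payload byte
    sock.read(length)                # accept, but discard
    return nil
  end
  bytes = []
  bytes << b while (b = sock.readbyte) != 0xFF
  # On the client side, busted UTF-8 becomes U+FFFD.
  bytes.pack("C*").force_encoding("UTF-8").scrub
end
```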
Headers · If you go to the first illustrative example at the top of the spec, 1.7 Writing a simple Web Socket server, you find:
Listen on a port for TCP/IP. Upon receiving a connection request, open a connection and send the following bytes back to the client:
48 54 54 50 2F 31 2E 31 20 31 30 31 20 57 65 62
20 53 6F 63 6B 65 74 20 50 72 6F 74 6F 63 6F 6C
20 48 61 6E 64 73 68 61 6B 65 0D 0A 55 70 67 72
61 64 65 3A 20 57 65 62 53 6F 63 6B 65 74 0D 0A
43 6F 6E 6E 65 63 74 69 6F 6E 3A 20 55 70 67 72
61 64 65 0D 0A 57 65 62 53 6F 63 6B 65 74 2D 4F
72 69 67 69 6E 3A 20
Send the ASCII serialization of the origin from which the server is willing to accept connections.
For example:
|http://example.com|
Continue by sending the following bytes back to the client:
0D 0A 57 65 62 53 6F 63 6B 65 74 2D 4C 6F 63 61
74 69 6F 6E 3A 20
At this point, I was wide-eyed; exactly what is going on here? Maybe I’m just not supposed to bother my pretty little head about what I’m sending down the pipe? So I poured the hex into a little scrap of Ruby to find out what I’d be sending:
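(Something along these lines does it; this is a reconstruction after the fact, not necessarily the original scrap:)

```ruby
# Turn the spec's space-separated hex into the text it spells.
hex = "48 54 54 50 2F 31 2E 31 20 31 30 31 20 57 65 62
       20 53 6F 63 6B 65 74 20 50 72 6F 74 6F 63 6F 6C
       20 48 61 6E 64 73 68 61 6B 65 0D 0A 55 70 67 72
       61 64 65 3A 20 57 65 62 53 6F 63 6B 65 74 0D 0A
       43 6F 6E 6E 65 63 74 69 6F 6E 3A 20 55 70 67 72
       61 64 65 0D 0A 57 65 62 53 6F 63 6B 65 74 2D 4F
       72 69 67 69 6E 3A 20"
print hex.split.map { |b| b.to_i(16).chr }.join
```

Run over both runs of hex, with the spec’s placeholder text spliced in, that prints: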
HTTP/1.1 101 Web Socket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
WebSocket-Origin: [server to accept connections from]
WebSocket-Location: [script to run]
Gosh, that sure looks familiar. And, in fact, it turns out that the Web Socket protocol handshake is a lot like HTTP, in that the messages back and forth begin with a request or status line just like HTTP’s and continue with CRNL-separated name/value pair headers just like HTTP’s.
So, if I were an implementor, my first question would be “Can I use my existing HTTP header library to read and generate headers?”
The answer turns out to be “probably”. Web Sockets forbid continuation lines (good!) and in some cases require that headers appear in a particular order. It’s possible that your HTTP library might do continuations or store the headers up in a hash and spit them out in a different order.
In fact, if you go to Section 4.1 Handshake, you’ll find an algorithm with 24(!) steps detailing header handling. Steps 15 through 21, with conditionals and GOTOs, detail how to pick apart standard HTTP-header syntax: Name, colon, optional space, value, CRNL. Um, wow.
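Stated declaratively, those seven steps come down to one regular expression. Here’s a sketch of mine (the constant and helper names are made up, and it leans on the fact that Web Sockets forbids continuation lines):

```ruby
# A header line is: name, colon, optional space, value, CRNL.
HEADER_LINE = /\A([^:\r\n]+):[ ]?([^\r\n]*)\r\n\z/

def parse_header(line)
  m = HEADER_LINE.match(line) or raise "malformed header: #{line.inspect}"
  [m[1], m[2]]
end

p parse_header("Upgrade: WebSocket\r\n")   # => ["Upgrade", "WebSocket"]
```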
For the Purposes of Comparison · Creating a spec in the HTML5 style seems like a whole lot of work. The Web Socket draft is long, and contains mind-numbing amounts of detail, much of it replicated in multiple places in the draft.
I was a little uncomfortable that the draft leaves many of its design decisions unexplained (e.g. why client and server must read and discard message frames with a leading byte-count). I’m wondering if the extreme difficulty of writing a spec in this style leads to a certain amount of triage on such niceties.
More evidence of the difficulty is that although this is labeled as draft number 75, it’s still at what I’d call an early/middle state of maturity. There are obvious quality/consistency issues here and there: monocasing of header names, whether to give messages as hex byte sequences or ASCII literals, fractured text about error handling, fuzzification in the counted-frame description. Nothing terribly serious (I’ll submit reports in one of the appropriate places), and since there are apparently interoperable implementations, the spec empirically seems to work.
But I still found it strange and counter-intuitive. I think this argument between the traditional and HTML5 style of specification is interesting, maybe important, and will be with us for a while.
So, as a contribution to the discussion, I whipped up an alternate version with the procedural specifications replaced, where I thought reasonable, by declarative versions. I omitted all the sections which are the same as in the original version and all the closing material. The top-level section numbers are the same, and also the subsections of Section 1, except that I added a Web Sockets and HTTP Headers subsection.
Please note:
I am not proposing to replace or amend the Web Sockets draft; this is purely for comparison of specification styles.
I am not claiming that my version of the spec is complete or accurate; I only put a couple of hours into this. On the other hand, if there are things I got really wrong, that’s useful information because I’m experienced at reading these things, thus others are likely to go similarly astray.
The declarative version is much shorter. Is it better or worse? No, that’s the wrong question. For what kinds of specification is the HTML5 style better and for what kinds does the declarative style win?
Comments
From: Dorian Taylor (Feb 17 2010, at 00:27)
Wow. Just… wow.
I have probably mentioned before in numerous places that I am wholly unimpressed with how HTML5 (and Hickson) first won its place, and second, assembled the most insane Katamari-ball pastiche of whizz-bang features to soak the panties of the children who pump out endless reams of AJAX. Every time someone airs another foetid bolus of this Roarkian disaster, as you just did with Web Sockets™ (I wonder if The Editor ever bothered to consult Dr. Fielding), I thank myself for not attempting to consume its (copious) bulk in its entirety.
HTML5, by its own admission, is designed to address the "vague subject" of Web Applications. Address it does, to the point of copious over-specification and a lingering stench of second-system syndrome. XHTML2, the specification that was supplanted, was subject to the critical flaw of having been designed without adequate input from prospective users, but at <em>least</em> it was a tidy vocabulary capable of describing dense semantic hypertext, and nothing else. The Editor complained, and was handed the keys. Widgets are now more important than semantics. Those, and whatever other ramblings the purportedly benevolent dictator sees fit.
There are two specific lobes of the HTML5 hairball which I find completely untenable. The first is the expansion of client-side state to an SQLite database (!@#$!#%), and the second is the heavy dependency on JavaScript.
In the first case, there is already a perfectly good data source: the markup. As a user of the Web it is absurd to have to go digging around in the innards of my browser for state data, a position abused already by cookies, and still of questionable value. But shipping code wins, and I understand the browsers have already implemented this. That is a treat.
In the second case, I wonder if The Editor has ever heard of a little thing called the halting problem. That is, it is impossible to infer the behaviour of a segment of Turing-complete code (such as JavaScript) without executing it. A site that requires me to execute its JavaScript in order for it to work is asking me exactly to permit it to execute any behaviour JavaScript is capable of — which thanks to HTML5 is now no small amount. The irony is most of what this JavaScript does in the wild could be accomplished with stricter, decidable grammars. (Canvas is my example of choice here; SVG is a long-standing, perfectly-usable decidable grammar which is <em>augmented</em> by <em>optional</em> JavaScript.)
The biggest disappointment of HTML5 is that it not only encourages bad behaviour, it defines it. HTML4 and XHTML1 did great work to reform the lingua franca of the Web; HTML5 by contrast <em>specifies how</em> it ought to be sloppy. This will not end well.
I believe XHTML2 bears another examination if not an outright resurrection. Again, it is a clean and strict, no-nonsense, no-legacy (actually low-legacy) vocabulary for describing hypertext, with explicit provisions for semantic annotation (RDFa), presentation (@role attribute) and most crucially missing from previous HTMLs, transclusion (@src attribute in each element). It's just not a suitable successor to HTML4. Good thing HTTP doesn't care (unless Hixie got his ha
From: Stefan Tilkov (Feb 17 2010, at 00:43)
I really don't like how Hixie's version simply assumes everyone knows what a byte is. I think the spec should clearly mention the individual bits instead.
From: kll (Feb 17 2010, at 01:06)
Hixie writes for browser implementors, and those few people gave very positive feedback about the style of the spec.
There was even a case when one of the people opposing the algorithmic style proposed a declarative version, which caused some to misunderstand a detail.
We've had so many screwed up implementations of basically everything in the browsers, that a really, really detailed spec may be worth a shot.
Ian does know about Turing-completeness and everyone in the WG agrees that pages should work without JS where possible, but you have to accept that some features are useless without JS and are better done as JS interfaces.
WebSockets are not HTTP. The handshake is made to look like HTTP to sneak past deep-inspecting firewalls.
From: Ian Hickson (Feb 17 2010, at 01:28)
The main reason I prefer the more verbose style is that it doesn't leave anything undefined. As you say, whether this is a good thing or not is a science experiment. So far that experiment has run the longest with CSS2.1, where a similar — though not quite as thorough — approach has been in use since around 2001. I think so far that experiment has shown excruciating detail to be successful in getting interoperability, though of course the detail is only half the problem: you also need a comprehensive test suite.
When you don't use the verbose style, it's much easier to miss things that aren't obvious. For example, looking at wsock-00, what are the lines to be terminated by? 0x0A? 0x0D? both? Is the case of the headers important? Is the space after the colon significant? If not, how many spaces are to be ignored? What if a line has no colon?
Also, there are parts of your draft that leave things ambiguous. When you say "the port need appear in the URI only if does not match the defaults", does that mean it's ok to include the ports in all cases, or not? When you say to convert to lowercase, what does that mean? What's the lowercase version of an uppercase dotless I? Is it a lowercase dotless I as in Turkish, or a lowercase dotted I as in English?
These are all questions whose answers come basically for free if you go down to the level of pseudo-code, because even if you don't think about it, you're very likely to have made an unconscious decision one way or the other.
Another benefit of the verbose approach is that you can use "must" and "may" (and "should") much more precisely. For example, a lot of the requirements in wsock-00 aren't invoking RFC2119, they're just statements of fact. I prefer to make sure that you can pin every normative conformance requirement to a specific "must".
Now that's not to say that you have to be as verbose as I am. I might go too far; I'm just more comfortable when I can implement a spec just by copying the spec into code almost line for line. Bits are basically free, so I don't think making the spec 20 pages long vs 40 pages long is a particularly big win, especially when the downside is potentially more ambiguous text.
In practice, I've found that writing specs this way actually takes me a whole lot less time than writing specs the more conventional way. With the more conventional way, you spend less up-front time writing, but you spend orders of magnitude more time on the back-end actually working with implementors getting the niggly details and edge cases figured out. At this point, in fact, I'm so used to writing specs this way that I can probably hammer them out quicker in this style than the more conventional style.
Regarding some of your other comments:
- the second framing type (with the length) is for a future version of the protocol, to make sure current clients and servers are forwards-compatible.
- the handshake is likely to change quite a bit in the near future, based on discussion on the hybi list (I'm still going through all that e-mail figuring out what the requirements for the new design, if any, should be).
- the draft is numbered 75 because every change I make (even typo fixes) gets published. HTML5 itself is up to revision 4700 or so!
- The Web _has_ had APIs and Object Models specified, ever since the late 90s. There's a whole raft of DOM specifications: DOM Core, DOM Events, DOM HTML, CSSOM, etc. The problem with that was that because they weren't integrated with the language they modelled, lots, and I mean _lots_, of errors crept in, and a huge number of things were never defined. What happens in HTML4 when you use DOM2 HTML to change the "type" attribute of an "input" element from "text" to "file"? It's not defined. HTML5 defines that, by making sure that the DOM, syntax, and so forth are all defined together.
HTML5 isn't the first spec to do that, by the way; SVG has been doing this for a while.
Cheers.
From: Maciej Stachowiak (Feb 17 2010, at 01:51)
I think some of your statements are inaccurate:
> One of the distinguishing features of the Web is
> that it has never specified APIs or Object Models.
The Web has been specifying object models since at least 1998: <http://www.w3.org/TR/REC-DOM-Level-1/> More recent syntax specs (SVG 1.1, MathML 3.0, XForms 1.0) have pulled their specific APIs into the syntax spec, and HTML5 follows their lead. HTML5 also specifies some of the APIs that have historically been implemented by browsers and are required for handling real-world Web content, but have never previously been written down in a formal specification.
> The HTML5 draft seems to disagree. It provides
> detailed algorithms for parsing HTML, even in the
> face of severe syntax errors, and specifies how
> the results of parsing should be used to
> construct the Object Model. Thus, the syntax is
> ephemeral; the Object Model, interoperable
> across the network, is what matters.
The first sentence is true, but the second does not follow, and isn't accurate. What's sent over the network is syntax. The DOM (Document Object Model) does not operate over the network. Strictly speaking, it is a specification formalism that allows most of the spec to define requirements in terms of an abstract model instead of concrete syntax. This
Note that XML has an abstract model that can be used to define things at a higher level than bytes or characters, namely the XML Infoset:
http://www.w3.org/TR/xml-infoset/
HTML5 directly states how text/html parses into a DOM. For the XML serialization, it relies on the Infoset and the existing Infoset to DOM mapping: <http://www.w3.org/TR/DOM-Level-3-Core/infoset-mapping.html>. In this way, it can support both "classic" error-tolerant HTML serialization and well-formed XML serialization without having to specify everything twice, since there is a common model that can be used, much like an Abstract Syntax Tree in a compiler abstracts away unimportant lexical details.
From: Rimantas (Feb 17 2010, at 02:26)
Ok, Dorian, we get it: you don't like Ian, you don't like HTML5, and you don't know what a web application is.
I am curious, however, how you are going to implement a client-side, off-line capable web app with cookies and markup only.
From: Jon Gretar (Feb 17 2010, at 07:26)
Oh how I do not agree with Dorian here. HTML5 goes very far in supporting what has been needed for web applications and the direction they are going.
It has reduced the XML disaster and taken a step away from the enterprise mediocracy some wanted to be the future of the web. Instead HTML5 took a look at the real web as it is, found out what really was missing, and met those requirements in a non-strict manner that allows continued evolution and unforeseen improvements. Sure, we can argue about the specific implementation of things like WebSockets, but the main point is that it's here and it's useful for when it's needed. Also, the choice of SQLite may be wrong for some, but at least we have a local datastore standard.
The strange idea some people had of implementing strict rules to improve interoperability and semantics was misguided and ensured less of both interoperability and semantics, in the same way as the SOAP standard ensured its goals would not be met.
I'm glad about HTML5. It's about the web in its current state and the direction the web is going. It's not about the direction some want to force it to go with a sledgehammer.
From: Edward O'Connor (Feb 17 2010, at 07:31)
I wrote a bit about Web specs and talking about things in terms of syntax v. object models on my blog: http://edward.oconnor.cx/2009/12/RDFa-and-the-DOM
From: Jonathan (Feb 17 2010, at 08:19)
Re: SQLite local storage and Dorian Taylor's remarks, I do appreciate not wanting to add another complex system to browsers. But, web applications exist, and client state/data has to live somewhere. Keeping it "in the markup" just means pumping out more reams of panty-soaking AJAX, to re-turn a phrase. A persistent, queryable client side data store seems like a nice tool to encourage clean, REST-ful applications in comparison, doesn't it?
From: Ted Han (Feb 17 2010, at 08:27)
So, as a consumer of standards (most recently the SMIL standard), one thing I absolutely have to remark upon is this:
Long standards are horribly daunting to get into.
I know that's an inherent problem for web standards, but there are steps that can be taken to mitigate the pain of people sticking a toe in.
In particular, PLEASE, PLEASE, *PLEASE* make the standards documents correct, semantically meaningful mark-up.
This is freaking 2010, we have the technology to let standards readers (who are often programmers) interrogate marked up documents.
There is *no* reason why it should be so difficult to generate accurate summaries of a semantically marked up standards document. This is important, because if you can generate summaries and references directly from a standards document, you can guarantee their correctness with respect to their standard.
Instead, I've spent the better part of two weeks trying to write a small Ruby program with Nokogiri which extracts the modules, elements and attributes of SMIL3 and their relationships to each other, only to discover, due to semantic inconsistencies, that the SMIL standard simply makes it impossible to programmatically generate summaries.
So now I've got this hybrid man-machine SMIL3 reference generator, which I will have to continue to maintain (with which I generated this graph).
I haven't read through the HTML5 markup enough to know whether their work fits the bill or not. I desperately hope it does. All standards should.
From: Jeff Schiller (Feb 17 2010, at 09:03)
I've been pondering this for a while too.
I think the level of specificity in the spec is required for implementors. I think it is the right approach.
I think the level of specificity in the spec is distracting/confusing to anyone else.
Is it as simple as providing informative and normative sections? No. I think this would make the spec even longer and even more confusing.
I've come to think that the best approach would be to provide two different documents: user agent requirements and an authors/developers/users guide.
And I think these documents should be provided by two different editors. This would be to ensure that the approach is feasible both from a browser and from an author standpoint - and writing skills can be matched to the document.
From: Tim (Feb 17 2010, at 09:05)
Ian, just one note of amplification on the headers part of the spec. I put in a normative reference to RFC2616, which I think covers the CR/NL and casing and similar issues quite comprehensively and has good demonstrated interoperability.
Then I added a specification of the additional constraints that WS imposes beyond those in 2616.
I suggest this as an alternative to redoing all that work, and perhaps more helpful to implementors who want to re-use existing well-debugged software.
From: Jeffrey (Feb 17 2010, at 09:14)
I have a hard time with "we just don’t know how to do that" as a valid reason for leaving something out of a spec, particularly when "we" really refers to the web-standards process that existed in 1999. Just because something is difficult or unknown doesn't mean we shouldn't pursue it; a prescriptive algorithm for parsing seems like it would solve real compatibility problems that have plagued browser makers since the start.
From: Scott Ferguson (Feb 17 2010, at 09:14)
As someone who's implemented my fair share of specifications, the websockets draft has been one of the most unhelpfully-written specs I've run across.
The length could have been easily described with a grammar instead of the mess in the current draft.
len ::= [\x80-\xff]* [\x00-\x7f]
From: Kevin H (Feb 17 2010, at 10:53)
Tim - I know your "What's Good/What's Bad" comparison was just an intro to this piece, so I apologize in advance for focusing in on that while ignoring the true topic of your post.
But I think it is worth noting: if you choose to include the most recent headline-grabbing news that has escaped from the Working Group as part of your "What's Bad", it should be balanced by a mention of the steady, hard-working, good-faith effort being put in by people like Sam, Maciej, and Shelly.
I'm only a casual observer of what has been going on with HTML5, but it has been my observation that every time some new dust-up makes the rounds through the blogosphere, when I dig into the details I find what I consider to be pragmatic people making pragmatic decisions. It is this fact that has sustained my optimism about the future potential of HTML5, even in the face of all the bad press.
From: Ian Hickson (Feb 17 2010, at 12:26)
Tim, if I have to read all of HTTP as well, then it's not shorter! :-)
In general I try to avoid having references to other specs that are not immediately obviously required to implement the spec (for example in the case of wsock-00, it's not obvious to me whether HTTP says anything useful... but one could miss some pretty important things, for example HTTP says things about character encodings and continuation lines, are they relevant?). The main reason for avoiding such references is that browser vendors don't read them, so they aren't a good way to get interoperability. For example, see this e-mail from one of the Mozilla developers:
http://lists.w3.org/Archives/Public/public-html/2010Feb/0158.html
If Mozilla devs aren't reading the specs, I assure you none of the other browser vendors are either. (Boris is the one guy I would have guessed _does_ read specs!)
From: Val (Feb 17 2010, at 12:36)
Just a couple of notes: @Ian, bits may be virtually free, but attention and comprehension are not. A well-written, accurate 20-page spec is much more attractive to me than a well-written, accurate 40-page spec.
@Tim, I like your draft a lot, especially the parts that show ASCII instead of hex (I'm with you; I don't want to send bytes I don't understand down the wire) and explain *why* a feature is specified, rather than *how*.
One minor nit: at the end of section 4.1.2, avoid the double negative, as it hinders comprehension. If the statement is correct (@kll disagrees), rewrite the sentence to read "...the Web Socket protocol is considered an HTTP API."
Back to spectating...
From: Ryan Williams (Feb 17 2010, at 13:46)
Speaking as a recent websocket implementor, I feel that the most important question that isn't concretely addressed by the websocket spec is whether we can use existing http libraries or not. It *seems* like we should be able to, but it is quite difficult for each reader of the spec to decide for himself.
This question is so important because there are major application design and deployment decisions that hinge on whether or not websockets can be integrated with existing webservers. If you have to open up a new port and start monitoring a new daemon, then you're going to think twice about designing your application around websockets. I believe, based on my reading of the spec, that it is indeed possible to use websockets with normal web servers, but this is something that could use clarification.
I also could not figure out what is up with the length-specified message format; it seems that it's documented but not intended to be used, which is a strange situation for such an explicit spec. It would be great if such guidance was provided (or the section removed, if it's not intended to be used). I appreciate the effort Tim went through to discover the answer to this question. :)
From: Mark Nottingham (Feb 17 2010, at 14:10)
Ian, the problem is that (as we've discussed many, many times, and as Tim points out above), your interpretation of HTTP isn't compatible with some existing tools, and will be very brittle in the face of proxies and firewalls. That makes WebSockets more -- not less -- brittle than Comet, etc.
If you actually made the reference to HTTP, you'd be compatible with it, and people would know what to do with it. As a bonus, you'd be able to use HTTP redirects instead of inventing your own.
From: Larry Masinter (Feb 17 2010, at 17:30)
I started to write about a related but somewhat different topic, around under- and over-specification.
http://masinter.blogspot.com/2010/01/over-specification-is-anti-competitive.html
From: Agh, my eyes (Feb 17 2010, at 18:42)
A lot of what's new in HTML5 is meant to appeal to designers--something you very clearly are not.
From: Carlos Pacheco (Feb 17 2010, at 21:28)
Ok ok... One quick question to all you pro markup gurus (and I apologize if it may seem off the topic): does SGML play a role in any of this? Does it have a future at all? HTML5 seems to lack extendability.
From: Steven (Feb 18 2010, at 00:49)
@Carlos I remember XHTML, like XML, had an emphasis on creating your own tags, but HTML5 as markup is closer to the idea of HTML. Not meant for eXtending.
As said before in the comments, there's not so much emphasis on semantics. The markup part of HTML5 is too simplified, I think. If the goal is to have every browser display HTML5 in the same way, then the video tag is a complete afterthought despite it being one of the most hyped tags of the spec. How can a spec claim that autobuffer true/false _can_ be ignored?
So even if I specify the video tag not to autobuffer, WebKit can (and does) choose to ignore it and autobuffer anyway. Why? What if I place three videos on one page? Why can such a critical option be optional according to the spec? WebKit also won't work with the Lazyload jQuery script. With it too, WebKit loads everything on the page when the script tells it not to. Do I see a pattern? Does WebKit have a policy of wasting bandwidth and HTML5 will make something that prevents that _optional_?
Plus, I would very much like to code the same way I do for XHTML, but those Zeldman-esque days seem to be over. <video controls> should be <video controls="true">, unless the concept here is that for any tag with true params, I can remove the true value. And for false, I just don't specify it at all?
From: Graham Klyne (Feb 18 2010, at 01:35)
It seems to me that a pseudo-code specification is the worst of all worlds.
On one hand, it's hard for a (human) engineer to unravel the constraints that the implementation needs to satisfy from the embedded details of how to create a particular style of implementation. On the other hand, it's not possible to test it using a computer.
If one wants to go down the route of algorithmic specification, then I'd advocate creating an *executable* implementation in a very high level programming language, which can be tested to understand what it achieves separately from how it does so. An example of the kind of thing I mean here is the Haskell implementation of URI parsing [1] [2], which I implemented while RFC3986 [4] was being deliberated, and used to provide feedback to the specification writers. The implementation very closely follows text in the specification. (I'm not suggesting this is a replacement for the actual spec, just an example of how a high-level executable description might be constructed.)
Several other W3C working groups have created prose descriptions coupled with a suite of test cases, which is a very successful way of linking a human-understandable description of the constraints with a computer-testable specification to iron out the ambiguities of prose.
And on the point of normatively referencing RFC2616, I think it's false to claim a WebSockets implementer would have to read all of the HTTP spec. That document is well organized and indexed, and it's quite easy to locate the small section that deals with line termination [3].
#g
--
[1] http://hackage.haskell.org/packages/archive/network/2.2.1/doc/html/Network-URI.html
[2] http://hackage.haskell.org/packages/archive/network/2.2.1/doc/html/src/Network-URI.html
[3] http://tools.ietf.org/html/rfc2616#section-2.2
[4] http://tools.ietf.org/html/rfc3986
From: Dan Brickley (Feb 18 2010, at 02:32)
Ian Hickson wrote "These are all questions whose answers come basically for free if you go down to the level of pseudo-code, because even if you don't think about it, you're very likely to have made an unconscious decision one way or the other."
Makes me wonder - why don't we go the whole way and specify in real code? W3C used to host libwww, a library used by several early browsers. Of course with real code you'd have to pick a real language (although the standard version could be transliterated). Not that I'm advocating this, but since the HTML5 work seems more about specifying behaviour than static document formats, why not take that direction to its natural conclusion?
From: Dave (Feb 18 2010, at 06:10)
Glad to hear that I'm not the only one that thinks the WebSocket spec is written in an unnecessarily over-complicated style. Hope they take this information to heart and simplify it!
From: len (Feb 18 2010, at 06:44)
Insofar as SGML and XML are rules for markup languages that adhere to those rules, the HTML5 design makes it clear that it has its own parsing rules and breaks those rules as desired. HTML5 is a new variant on an old approach to generic coding. It resembles XML/SGML languages but is not one.
Experience says that It Is A Bad Thing even where some of the science experiments are good. I'm sure there are defenders of the decision to make HTML5 convoluted due to trying to maintain parse independence while at the same time attempting to remain compatible with some XML application tools. I'd like to hear them, because so far decisions such as element content in attributes are widely held to be bad practice.
Something tells me that HTML5 may not fare better than XHTML in the long run, given that it is looking more technologically messy and is publicly in political difficulty.
From: Ian Hickson (Feb 18 2010, at 11:59)
Writing it in code (i.e. writing a reference implementation rather than a spec) makes it very hard to delineate what is a requirement vs what is supposed to be an implementor choice. For example, HTML is meant to be media-independent, but you have to pick a medium to write an implementation.
From: Graham Klyne (Feb 19 2010, at 10:21)
@Ian: writing in pseudo-code suffers the same problems, in my view. This was very much behind my comments on an earlier draft of WebSockets [1].
#g
--
[1] http://www.ietf.org/mail-archive/web/hybi/current/msg00503.html
From: Nick Johnson (Feb 27 2010, at 12:11)
Maybe I read a different version of the spec to everyone else, but the version I read was quite clear on why there was a length-prefixed framing type: To permit transmission of unescaped binary data as a potential future frame type, without breaking existing clients.