We seem to have pretty widespread consensus, these days, that HTTP, or perhaps the RESTful approach it exemplifies, offers a pretty sweet substrate for pushing and pulling data around at Web scale. We got further evidence this week when a bunch of smart people stepped slightly outside its sweet spot, into deep tangly weeds.
First, Joe Gregorio of Google, in How to do RESTful Partial Updates, proposed overloading PUT to update resources in a way that’s constrained using URI templates.
He got push-back from Sam Ruby and (more severely) James Snell. There is thoughtful commentary from Mark Nottingham, Rob Sayre, and Dare Obasanjo. I have three observations.
Wikipedia No · I think Joe’s analogy with Wikipedia is bogus. At Wikipedia, each entry is a resource. So is each section and sub-section and sub-sub-section and so on. There’s nothing at all in REST to say that resources can’t be part of other resources, and there’s no ambiguity at all about what’s going on at Wikipedia. Unfortunately, at Wikipedia there’s no way to do what Joe’s trying to do: update more than one disjoint section at a time.
It’s Important · The problem Joe’s trying to solve keeps coming up. Despite my skepticism about the analogy, I think his URI-templates approach is potentially a winner. Like Sam, I’m not crazy about verbs in URIs, but maybe, seen correctly, these are adverbs.
There’s some history here, private and public. Way back in 2004 I was trying to get Google to buy into the APP idea, and had a meeting with Adam Bosworth and Mark Lucovsky, who are obviously both heavyweights. They both were friendly to the idea, but had an immediate question: You’ve got a decent way to create or update a Web resource, but what if you want to update ten thousand, can you avoid doing an HTTP exchange for each? What about the idea of POSTing an Atom feed rather than an Atom entry?
I thought that sounded plausible; Google joined the Atompub WG and in fairly short order the POST-a-feed idea came up. And in fairly short order after that, it got shot down. People thought up all sorts of corner cases and problems. The two that stick in my mind were authorization and error handling. Can you specify different auth data for each of the ten thousand entries in the feed, and how do you react when twenty-seven of the ten thousand fail to update for some reason or another?
I was disappointed, but thought “Well, YAGNI, maybe.” Since then, there’s been Yaron Goland’s Web3S proposal, whose raison d’être was update at other than the singular-resource level. (Hmm, wondering what happened to Web3S?)
So this is something that obviously keeps coming up.
It’s Hard · I have the advantage here of having spent three years co-chairing an IETF Working Group, a role whose chief function is to discern or decree the existence of consensus. Speaking on the basis of that experience, I can tell you authoritatively that we do not have consensus here; rough, smooth, or any other kind.
But it’s totally worth arguing about.
Comment feed for ongoing:
From: Mark (Feb 16 2008, at 19:44)
> Hmm, wondering what happened to Web3S?
"The purpose of this document is to define a protocol that is a good fit for Live Contacts. ... it is critical that our protocol be as simple as possible. ... We have introduced, however, a new method -- UPDATE." And 8 open issues (of the form "hmm, whitespace; hmm, search; hmm, serialization") in a "specification" last updated the day it was released.
Yeah, I can't imagine why that didn't catch on.
From: Phil (Feb 16 2008, at 21:02)
> how do you react when twenty-seven of the ten thousand fail to update for some reason or another?
There's always WebDAV's "207 Multi-status" response, but the question of the structure of the response body always made me a little uneasy.
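For what it's worth, here is a minimal Python sketch of how a client might read per-entry outcomes out of a 207 body. It assumes the server uses the plain href-plus-status form of WebDAV's multistatus XML; the sample body, the `/entries/N` paths, and the helper names `parse_multistatus` and `failed` are all made up for illustration, and nothing here is part of APP.

```python
import xml.etree.ElementTree as ET

# WebDAV elements live in the "DAV:" namespace.
DAV = "{DAV:}"

# Hypothetical 207 body for a batch update of three entries (illustrative only).
SAMPLE_BODY = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/entries/1</D:href>
    <D:status>HTTP/1.1 200 OK</D:status>
  </D:response>
  <D:response>
    <D:href>/entries/2</D:href>
    <D:status>HTTP/1.1 403 Forbidden</D:status>
  </D:response>
  <D:response>
    <D:href>/entries/3</D:href>
    <D:status>HTTP/1.1 200 OK</D:status>
  </D:response>
</D:multistatus>
"""


def parse_multistatus(body):
    """Map each href in a multistatus body to its numeric status code."""
    root = ET.fromstring(body)
    results = {}
    for response in root.findall(DAV + "response"):
        href = response.findtext(DAV + "href")
        status_line = response.findtext(DAV + "status")
        # A status line looks like "HTTP/1.1 403 Forbidden"; keep the code.
        results[href] = int(status_line.split()[1])
    return results


def failed(results):
    """Return the hrefs whose status code signals a client or server error."""
    return [href for href, code in results.items() if code >= 400]


results = parse_multistatus(SAMPLE_BODY)
print(failed(results))
```

The client still has to decide what to do with the entries in `failed(results)`: retry, report, or roll back. That decision is the part the status codes alone don't settle.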
From: Roger (Feb 17 2008, at 04:28)
re: Mediawiki --- articles are just blobs; sections are updating parts of that one blob, not individual records. So Joe's analogy seems accurate.
From: Jean-Jacques Dubray (Feb 17 2008, at 06:47)
Tim, don't you think that there is a fundamental difference between expressing the changes you made to a resource representation as part of an inter-action and the expression of what you want the resource to look like once it is updated?
From: len (Feb 18 2008, at 08:23)
No argument that http offers a sweet spot for textual/image static web content or media designed to download and play locally.
On the other hand, industrial live streaming media servers don't seem to fit into web servers, and streaming media is increasingly a requirement for systems I'm seeing on the radar.
Multiple-bit-rate (MBR) video, multicast, video indexing, real-time vs bursty: all are media-server issues.
UDP isn't going away.
From: Donovan Preston (Feb 19 2008, at 10:26)
I think the argument "OMG 10000 requests" is bogus. What possible argument could there be saying that using a single request is more efficient? The 10000 requests, if indeed that much data is actually being updated, would actually interleave much better, and scale much better through the use of partitioning some of the requests onto different physical machines, than the "here's one big-ass request that changes 10000 things!" approach.
I think web developers perceive many requests as a problem because most web architectures are still stuck in the stone age. Modern web applications written using Erlang or Stackless Python or any other language that supports high-level coroutines along with non-blocking IO (or, more painfully, just non-blocking IO) can easily handle 10000 requests, taking advantage of HTTP keepalives and pipelining, and I bet it'd beat the pants off partial updates in performance and simplicity.
From: John (Feb 19 2008, at 18:01)
I'd separate the two concerns if possible. BATCHing (James Snell's proposal) lets you take care of separate auth and separate responses without doing separate round trips. PATCH lets you do general updates (combination of PUT and POST) to a restricted set of resources (those for which a server can promise a transactional update).
From: Nick Sydenham (Feb 27 2008, at 08:23)
Having to edit the source document with xml:id is do-able but somewhat cumbersome and using URI templates seems to be introducing another potential confusion. And HTTP POST vs PATCH - tbd.
Seems that part of the heat in this argument is around trying to change the existing protocol. Would it be easier to consider the solution (at least initially) as a pattern that works with existing standards and work from there?
What about something like:
POST /atomupdate
Host: www.tbray.org

<update auth="?">
  <doc src="atomfeed1.rss">
    <node location="...">content</node>
  </doc>
  <doc src="atomfeed2.rss" auth="?">
    <node location="...">content</node>
    <node xpath="..." action="add">content</node>
    <node location="..." action="delete"/>
  </doc>
</update>
Location would normally be an XPath for XML/XHTML documents, but is extensible for non-XML resources as well.
Missing things are obviously namespaces, authorisation details, etc.
The question is whether a pattern can be made universal for all XML content without either overloading the existing spec or introducing a new verb.
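As a rough illustration of how a server might apply one `<doc>` element from such an update, here is a small Python sketch. It rests on several assumptions the proposal leaves open: a missing `action` means "replace the target's text", `add` appends the node's child elements to the target, `location` and `xpath` are treated interchangeably, and only the limited XPath subset in Python's `xml.etree.ElementTree` is available. Namespaces and authorisation are ignored, and the function name `apply_doc_update` and the sample feed are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def apply_doc_update(root, doc_update):
    """Apply each <node> operation in a <doc> element to root, in order."""
    for node in doc_update.findall("node"):
        # Assumption: "location" and "xpath" both carry a path expression.
        path = node.get("location") or node.get("xpath")
        # Assumption: no action attribute means "replace the text content".
        action = node.get("action", "replace")
        targets = root.findall(path)
        if action == "delete":
            # ElementTree has no parent pointers, so build a child-to-parent map.
            # (Deleting the root element itself is not handled in this sketch.)
            parents = {child: parent for parent in root.iter() for child in parent}
            for target in targets:
                parents[target].remove(target)
        elif action == "add":
            # Assumption: "add" appends copies of the node's child elements.
            for target in targets:
                for child in node:
                    target.append(copy.deepcopy(child))
        elif action == "replace":
            for target in targets:
                target.text = node.text
        else:
            raise ValueError("unknown action: " + action)
    return root


# Hypothetical feed and update document, without namespaces for brevity.
FEED = (
    "<feed>"
    "<entry id='a'><title>Old</title><summary>Stale</summary></entry>"
    "<entry id='b'><title>Keep</title></entry>"
    "</feed>"
)
UPDATE = """<doc src="atomfeed1.rss">
  <node location="./entry[@id='a']/title">New title</node>
  <node location="./entry[@id='a']/summary" action="delete"/>
  <node xpath="./entry[@id='b']" action="add"><category term="rest"/></node>
</doc>"""

feed = apply_doc_update(ET.fromstring(FEED), ET.fromstring(UPDATE))
print(ET.tostring(feed, encoding="unicode"))
```

Even this toy version has to pick answers to questions the pattern leaves open, such as what happens when a path matches nothing or when one operation fails partway through a document.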