Which is to say, it’s Sunday and I just wired up my little publishing empire here to the new hotness in Web syndication technology, PubSubHubbub. If you’re running a hub and you’re not evil, let me know and I’ll ping you.
PubSubWhat? · It was initially a private project by a couple of Googlers which seems to have gotten legs here and there around the Web. The idea is best explained, with a slideshow even, on the home page; take a minute to read through it.
It seems painfully obvious that this whole thing was at least in part provoked by Twitter. Which is a new and very general-purpose communication medium; but it’s owned by a single company, and no matter how much we like them (and I do) it’s all wrong for a general-purpose Internet medium to be owned by anyone.
So I see PubSubHubbub, as much as anything, as an attempt to capture Twitter’s pattern of information flow in a reproducible, interoperable way. I think what we’d like to see is a large number of micro-publishers (just like on Twitter) and an even larger number of subscribers (just like on Twitter). But I think we’d like to see a moderate number of hubs to move all this goodness around — unlike Twitter which by definition has only one.
The effect I imagine is quite a bit like what my Twitter client looks like to me, except that, among the 140-character micro-posts, there’d be summaries of real, meatier posts, with links to the full content, all produced as an automatic side-effect of people hitting their “publish” buttons.
Easy and Hard · Hooking up a publishing system to the PubSubHubbub machinery is damn easy; I know because I just did it. You have to put <link> element(s) in your Atom feed pointing at one or more hubs that will be aggregating you. Then, when you update your site, you need to ping the hub(s) using HTTP POST. (In fact, you might not even have to; the hubs are perfectly capable of polling publishers to check for updates.)
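For the curious, here is a minimal Python sketch of both publisher-side steps. It assumes the PubSubHubbub 0.2 draft's conventions: a `<link rel="hub">` element in the feed, and a form-encoded POST to the hub carrying `hub.mode=publish` and `hub.url`. The hub and feed URLs and the function names are made-up placeholders; this is an illustration, not the code that runs this site.

```python
from urllib.request import Request
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Hypothetical placeholder URLs, for illustration only.
HUB_URL = "https://hub.example.com/"
FEED_URL = "https://www.example.org/feed.atom"


def hub_link_element(hub_url: str) -> str:
    """The <link> element that advertises a hub inside an Atom feed."""
    return f'<link rel="hub" href={quoteattr(hub_url)}/>'


def build_publish_ping(hub_url: str, topic_url: str) -> Request:
    """Build (but do not send) the form-encoded POST telling a hub
    that the topic feed has new content."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Actually sending the ping is then a single `urllib.request.urlopen(build_publish_ping(HUB_URL, FEED_URL))` call; a real publisher would also want to check the hub's response status and retry on failure.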
Subscribing through a hub is a bit trickier. You have to process a potentially-asynchronous callback from the hub to verify that you really want to subscribe and aren’t just a spammer. What’s going to be even harder for a lot of people is that you have to be prepared to accept POSTs from the hub when it wants to tell you that there’s an update in something you’re subscribing to.
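A minimal sketch of the subscriber's two obligations, assuming the 0.2 draft's parameter names (`hub.mode`, `hub.topic`, `hub.challenge`, `hub.verify_token`), a 404 as the refusal status, and an Atom document as the body of the hub's content POST. The `PENDING` table and both function names are hypothetical, and wiring them into a real, publicly reachable HTTP server is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical: topics this subscriber asked to follow, mapped to the
# verify token it sent along with each subscription request.
PENDING = {"https://www.example.org/feed.atom": "s3cret-token"}


def handle_verification(query_string: str, pending: dict[str, str]) -> tuple[int, str]:
    """Answer the hub's verification GET.

    Echo hub.challenge with 200 only if we really asked for this topic;
    otherwise refuse with 404, so nobody can sign us up for feeds we
    never requested."""
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    mode = params.get("hub.mode")
    topic = params.get("hub.topic")
    challenge = params.get("hub.challenge")
    if mode not in ("subscribe", "unsubscribe") or not topic or not challenge:
        return 404, ""
    if topic not in pending or params.get("hub.verify_token") != pending[topic]:
        return 404, ""
    return 200, challenge


def handle_notification(body: bytes) -> tuple[int, list[str]]:
    """Accept the hub's content POST (assumed to be an Atom document) and
    return the atom:id of each entry it carries; 204 acknowledges it."""
    feed = ET.fromstring(body)
    ids = [entry.findtext(f"{ATOM}id", default="") for entry in feed.findall(f"{ATOM}entry")]
    return 204, ids
```

The hard part the paragraph describes isn't in this code at all: both handlers only help if the hub can actually reach them with an HTTP request.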
This last one is a key limitation of the system as it stands. The vast majority of desktops, at the moment, can’t accept POSTs because there’s a firewall in the way. So the utility of PubSubHubbub for ordinary end-point subscribers on ordinary computers is pretty limited. I can think of a few different ways you might try to work around this, but they’d require some community energy, so let’s see if any develops.
Building a really big scalable hub would be a challenge too, but far from outside the scope of what we know how to do. Me, I’d use Erlang in a flash, but there’d be other ways to go too.
The Spec · I first read the 0.2 spec yesterday and, since I’m a hopeless specification pedant, had to send a bunch of comments to the discussion group. It’s not terrible. I thought there were a couple of places where it offered unnecessary flexibility and probably wandered into YAGNI territory.
But there’s really only one thing that made me seriously nervous. Let me quote from release 0.2 section 7.3, Content Distribution:
... the feed-level elements SHOULD be preserved aside from the atom:entry elements. However, the atom:id element MUST be reproduced exactly. The other atom:updated and atom:title elements required by the Atom specification SHOULD be present. Each atom:entry element in the feed contains the content from an entry in the single topic that the subscriber has an active subscription for. Essentially, in the single feed case the subscriber will receive an Atom document that looks like the original.
Um... Excuse me!? Is the space between the lines here crying out that a syndication hub should be considered within its rights to change anything in my feed that’s not atom:id? Like for example, insert a Cialis ad in my first paragraph?
Protocols can’t enforce good behavior; if a sleazeball hub operator wants to fuck with the content there ain’t no protocol specification that’s going to prevent it. But in this area, the expectations need to be very clearly set.
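Tampering can't be prevented by the protocol, but a suspicious subscriber could at least detect it by comparing what the hub delivers against the publisher's own feed. A rough Python sketch, assuming both documents are plain Atom and that comparing the text of each entry's title, summary, and content (matched by atom:id) is good enough. The function names are hypothetical, and changes to attributes such as link targets would slip past it.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
COMPARED = ("title", "summary", "content")


def fingerprint(entry: ET.Element) -> tuple[str, ...]:
    """All text (including nested XHTML) of the fields we compare."""
    parts = []
    for tag in COMPARED:
        child = entry.find(f"{ATOM}{tag}")
        parts.append("".join(child.itertext()) if child is not None else "")
    return tuple(parts)


def entries_by_id(feed_xml: str) -> dict[str, tuple[str, ...]]:
    """Map each entry's atom:id to its fingerprint."""
    feed = ET.fromstring(feed_xml)
    return {
        entry.findtext(f"{ATOM}id", default=""): fingerprint(entry)
        for entry in feed.findall(f"{ATOM}entry")
    }


def altered_entries(original_xml: str, delivered_xml: str) -> list[str]:
    """atom:ids of delivered entries that differ from, or are missing in,
    the publisher's own copy of the feed."""
    original = entries_by_id(original_xml)
    delivered = entries_by_id(delivered_xml)
    return [eid for eid, fp in delivered.items() if original.get(eid) != fp]
```

In practice the publisher's copy would have to be fetched directly, and harmless whitespace reformatting by a well-behaved hub could produce false alarms.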
Conclusions · I really don’t know. I just don’t see how, absent heroics like the ones Skype has to use, POST-to-the-client is going to deal with the reality of ubiquitous firewalls. On the other hand, Twitter clients which rely on polling seem to make their users happy. I see nothing in the spec about supporting polling, i.e. how a client might ask a hub for its version of a feed, but that seems to me like it might be a real useful function.
So, in closing:
If you’re running a hub and would like ongoing to ping you when I update, I can fix that up; the latency should be a single-digit number of seconds from the time when I hit the “publish” button here on my laptop.
If you know of any interesting PubSubHubbub clients, let me know; I think I’m probably exactly the kind of person who’s apt to get good use out of one.
Comment feed for ongoing:
From: Julien GENESTOUX (Oct 18 2009, at 23:27)
Great piece.
I think the last comment about "getting behind the firewall" is out of scope for PubSubHubbub. Right now, I think it is intended more as a server-to-server protocol, used mainly by web applications (say, social networks) to communicate with each other.
The problem of the 'last mile' to the desktop, the phone... remains to be solved, I agree.
PS : We run a hub at http://superfeedr.com/hubbub and we're not evil :D
From: Steven Koss (Oct 18 2009, at 23:38)
The "last mile" to the client is outside the scope of PubSubHubBub. This is a good thing because there are many ways to handle the last mile:
FriendFeed is a PubSubHubBub subscriber. FriendFeed uses long polling to get updates to a browser.
Mihai Parparita's PushBot is also a PubSubHubBub subscriber. The PushBot uses XMPP to deliver notifications to client.
I don't remember the name, but there's a PubSubHubBub subscriber that publishes updates to Twitter.
Somebody will probably build an application that uses the heroics Skype uses. Somebody might even write an application that uses SIP (SIP is not just for voice).
From: Pádraic Brady (Oct 18 2009, at 23:49)
The last mile is problematic but it's outside the spec's domain. Pubsubhubbub is primarily a server to server protocol for web applications and services. There are, I think, 2+ Hubs using XMPP to cover the last mile, for example.
From: Fred Blasdel (Oct 19 2009, at 00:02)
None of the PubSubHubBub implementers give two shits about desktop clients, and with good reason. This is a firehose protocol -- the whole point is to have the enormous volume of updates between 9000lb gorillas like Google Reader, Blogspot, Feedburner, Wordpress.com, Facebook, Twitter, et al. all go through a single socket for each pair.
The only real-world precedent I know of for such a thing is UUCP.
From: mike bradshaw (Oct 19 2009, at 00:05)
Actually Tim, I think it was inspired more by Jaiku than Twitter.
Jaiku *used* to have the option that you could add your _own_ feeds to your profile; people who followed you could then choose to unsubscribe from your various feeds, if they so desired.
Although Jaiku's user numbers were never huge, the action of fetching all the feeds (some tens or hundreds of thousands, I guess) might start to look like a DoS attack to popular services like Flickr, for instance (i.e. once every few hours a single IP, or a small number of IPs, makes lots of feed requests).
The federation concept (for micro-blogging) is an extra benefit that, I think, pleases at least one of the founders, Jyri Engeström.
From: Graham Parks (Oct 19 2009, at 00:46)
I tried to open a few issues regarding sloppily worded bits of the spec, but didn't get the feeling I was being listened to at all.
The guy at Google that's running this seems to have no idea how to write a good spec or of how important a rigorous one is.
So I'm soured on the whole project.
From: alexis richardson (Oct 19 2009, at 02:24)
Tim,
Thanks for this summary.
I have a couple of comments. Some of the points you raise have been discussed on the PSHB mailing list which is public. For example there is ongoing discussion about what needs to be added to (or perhaps more likely, removed from) PSHB 0.2 to get to 1.0 and what that means.
One area that we (at RabbitMQ) have been pushing on is support for fat pings and arbitrary payloads, which would make the federated / server2server aspects of this protocol more obvious, and IMO make it *simpler*.
Incidentally, if you want something in Erlang, we wrote RabbitHub, which is an experimental implementation of PSHB using RabbitMQ under the hood. It also relays into XMPP, for example. See: http://github.com/tonyg/rabbithub
Note also that the same URL includes a suggestion for using reverseHTTP to POST to the client through firewalls.
We welcome contributions, but more importantly I would urge anyone to join the PSHB list and get involved.
alexis
From: Simon Willison (Oct 19 2009, at 03:26)
On the HTTP server behind the firewall issue, http://www.reversehttp.net/ is really interesting. It's basically a comet-based HTTP proxy in the cloud: your web server sits on your local machine, behind a firewall, making long-polling style requests up to the reversehttp service, which exposes a separate URL to the rest of the world. When someone visits that URL, the long-polling request returns to your local server with details of the incoming request; you then send the HTTP response up to reversehttp, which forwards it on to the waiting client. It's a truly terrifying hack; try out their demo, which uses the same trick to implement a web server in client-side JavaScript running in your browser.
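To make that flow concrete, here is an in-process Python simulation using queues and a thread. The `Relay` class and `firewalled_server` function are hypothetical stand-ins: with the real service each step is an HTTP request over the network, and the local server sends its response back up in a separate request rather than through a shared queue.

```python
import queue
import threading


class Relay:
    """Stand-in for the public relay sitting outside the firewall."""

    def __init__(self) -> None:
        self._pending: queue.Queue = queue.Queue()

    def public_request(self, path: str, timeout: float = 5.0) -> str:
        # An outside client hits the relay's public URL and waits.
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._pending.put((path, reply))
        return reply.get(timeout=timeout)  # raises queue.Empty if nobody answers

    def long_poll(self, timeout: float = 30.0):
        # The firewalled server's outbound request; it only returns
        # once an incoming request is waiting to be handled.
        return self._pending.get(timeout=timeout)


def firewalled_server(relay: Relay, handle, max_requests: int) -> None:
    """Local server loop: long-poll, handle, send the response back up."""
    for _ in range(max_requests):
        path, reply = relay.long_poll()
        reply.put(handle(path))


relay = Relay()
worker = threading.Thread(
    target=firewalled_server, args=(relay, lambda p: f"hello from {p}", 1), daemon=True
)
worker.start()
print(relay.public_request("/feed"))  # prints "hello from /feed"
worker.join()
```

The point of the design is that every connection is opened outbound from behind the firewall; the only thing the outside world ever talks to is the relay.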
From: Tim Freeman (Oct 19 2009, at 03:34)
Here is an idea regarding the POST problem: http://grack.com/blog/2009/09/09/pubsubhubbub-to-xmpp-gateway/
From: DGentry (Oct 20 2009, at 06:54)
> Is the space between the lines here crying out that a syndication hub
> should be considered within its rights to change anything in my feed
> that’s not atom:id? Like for example, insert a Cialis ad in my first
> paragraph?
Actually, I think the hub should explicitly be allowed to modify the feed. For example, FeedBurner offers a number of different services that involve splicing elements into the original RSS feed, including Cialis ads (probably other types of ads as well).
The problem the hub can solve is a lack of capability in the software publishing the original RSS feed. RSS/Atom have been implemented over and over, sometimes well and sometimes poorly. If someone likes the web-facing aspects of their publishing software they often continue to use it even if its RSS implementation is terrible. I think the spec should allow, if not encourage, hubs to innovate in this way.
It's possible I've completely misunderstood how pubsubhubbub works and this comment is utter nonsense. Sadly, it would not be the first such comment in my arsenal.
From: directeur (Oct 20 2009, at 07:15)
Hi, I humbly started working on an idea called "scondes" which uses pubsubhubbub to build distributed realtime social networks.
A node is a pub and a sub at the same time, and feeds travel from one node to another (FOAF entries). The idea is described at http://socnode.org and I have a basic yet working proof of concept at http://socnode.org/code
Please let me know what you think about it :)
From: Graham (Oct 22 2009, at 23:53)
Why a few hubs? That's still about as healthy for the Internet as just a few food distribution companies controlling 80% of the world's food supply until they go bankrupt.
Forget hubs: distribute hub activity à la torrents but with super-peers, just not official, permanent hubs. Dynamic IP permanent, not Static.