I saw a notice from Google that their blog-search robot will start reporting subscriber counts. I poked into a recent log-file and found lots of agents doing this, so here’s a report with some numbers.
Volume · There may be some issues with these numbers (see the discussion below), but here’s the first cut.
Crawler | Subscribers |
---|---|
NewsGator | 6161 |
Google | 4552 |
Bloglines | 4363 |
NetVibes | 288 |
NewsHutch | 143 |
livedoor | 38 |
FeedLounge | 24 |
HanRSS | 20 |
Alesti | 18 |
PageFlakes | 13 |
Feedbringer | 7 |
kb.Rmail | 5 |
NewsAlloy | 3 |
Eldono | 2 |
Pluck | 2 |
RssFwd | 2 |
Feedpath | 1 |
Issues · As it says above, there are some. First of all, I’m only counting the number of subscribers to my Atom 1.0 feed, not my no-longer-present permanently-redirected RSS feed. I think the biggest sufferer is NetVibes, which drops 354.
Second, several of the services, notably Google and NewsGator, report what seem to be multiple different groups of subscribers to my Atom feed. A few of them are kind of understandable: two slashes instead of one, www.tbray.org vs. tbray.org, and a few to /ongoing/ongoing.atom?foo=bar (huh?); but for example Google, which helpfully labels each group with a unique feed-id, suggests that several different ones apply to http://www.tbray.org/ongoing/ongoing.atom. Beats me. This may lead to a little inaccuracy when adding up other multi-subscribers like NewsGator; but wouldn’t change the ranking, I think.
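For the curious, here is a minimal sketch of how per-crawler totals like these might be added up when a crawler reports several groups. It assumes combined-format log lines whose last quoted field is the user-agent, and that crawlers embed "N subscribers" (plus, for Google, "feed-id=...") in that field. The sample lines, numbers, and function names are invented for illustration and are not taken from the real log.

```python
import re
from collections import defaultdict

# Assumed log shape: ... "GET <path> HTTP/1.1" ... "<user-agent>" at end of line.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')
AGENT_RE = re.compile(r'"([^"]*)"\s*$')
SUBS_RE = re.compile(r'(\d+) subscribers?')
FEED_ID_RE = re.compile(r'feed-id=(\d+)')


def tally(lines):
    """Return {crawler: total}, summing the latest count of each reported group."""
    latest = {}  # (crawler, path, feed_id) -> most recently reported count
    for line in lines:
        req = REQUEST_RE.search(line)
        agent = AGENT_RE.search(line)
        if not (req and agent):
            continue
        ua = agent.group(1)
        subs = SUBS_RE.search(ua)
        if not subs:
            continue  # an ordinary browser or a crawler that reports nothing
        # Crude crawler name: everything before the first '/', ';', '(' or space.
        crawler = re.split(r'[/;( ]', ua, maxsplit=1)[0]
        fid = FEED_ID_RE.search(ua)
        key = (crawler, req.group(1), fid.group(1) if fid else None)
        latest[key] = int(subs.group(1))  # later polls overwrite earlier ones
    totals = defaultdict(int)
    for (crawler, _path, _fid), count in latest.items():
        totals[crawler] += count
    return dict(totals)


def sample_line(path, agent):
    """Build one made-up log line (hypothetical helper for the demo)."""
    return (f'192.0.2.1 - - [17/Feb/2007:10:00:00 -0800] '
            f'"GET {path} HTTP/1.1" 200 1234 "-" "{agent}"')


GOOGLE = "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; {} subscribers; feed-id={})"
SAMPLE = [
    sample_line("/ongoing/ongoing.atom", GOOGLE.format(10, 111)),
    sample_line("/ongoing/ongoing.atom", GOOGLE.format(5, 222)),
    sample_line("/ongoing/ongoing.atom", GOOGLE.format(12, 111)),  # later poll
    sample_line("/ongoing/ongoing.atom",
                "Bloglines/3.1 (http://www.bloglines.com; 7 subscribers)"),
    sample_line("/ongoing/ongoing.atom", "Mozilla/5.0 (X11; Linux)"),
]

print(tally(SAMPLE))
```

Keeping only the latest count per (crawler, path, feed-id) group avoids counting the same group once per poll; whether distinct groups really hold distinct people is exactly the uncertainty described above.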
Rah Rah Bloglines · Apparently the programmers of Bloglines’ robots are the only ones who have taken the trouble to read RFC2616 and RFC3986 (especially Section 6 of the latter); they handle all the redirects and variations correctly and only know about two ways to subscribe to me: the right one, and twenty subscribers to that ?foo=bar anomaly.
Unclear On The Concept · There are a few people who’ve subscribed to URIs that don’t exist, or aren’t feeds: /ongoing/ itself, my robots.txt, ongoing.pie (snicker), random pictures; that’d be part of your Internet Background Radiation.
Comment feed for ongoing:
From: Corey (Feb 17 2007, at 22:45)
Interesting info.
I track a lot of stats also and I'm trying to see if incoming traffic skews as highly towards Google as I see on my dinky little site, or if it is an anomaly:
http://www.goldb.org/goldblog/2007/02/18/ClarifyingArchitecturalStylesForTheWeb.aspx
I'm convinced almost all tech/geek/code searches originate from them, so stats from highly technical sites (like your blog) would be of interest.
p.s. Just a curious geek, not an SEO
- Corey
From: Dion Almaer (Feb 17 2007, at 22:54)
The big question is what is considered a subscriber.
If I set up a Bloglines (as an example) account years ago to ongoing but haven't checked it for a couple of years since I moved to something else, is it still counting me as a subscriber via its service?
If that is true then I should be a +1 in bloglines, google reader, newsgator, and others.
The right way to count is to see who is actually reading items. I don't get a good feel for who is doing this correctly.
Cheers,
Dion
From: Tim Bray (Feb 17 2007, at 23:23)
Ouch, Dion is right of course, I would appear in several of those lists too. I suppose that unless the subscriber numbers from the crawlers are being pruned for inactivity (somehow I doubt it), they're all high. Having said all that, I suspect that the big-picture takeaway is accurate enough to be useful: Right at the moment there are three big dogs in this business, and some interesting up-and-comers.
From: Janne (Feb 18 2007, at 00:48)
I subscribe to the atom feed via Sage (http://sage.mozdev.org/), a Firefox extension, but that does not seem to be counted. Makes you wonder how many other feed readers are missed.
From: Sander (Feb 18 2007, at 03:44)
In my observation, Bloglines has a slightly larger issue than counting people who're not actively reading anymore: it seems to never decrease the subscriber count even if people actively unsubscribe from a feed. So the bloglines count appears to be the count of all those who ever were subscribed.
I'm saying this kinda carefully as I'm basing that on a very tiny subscriber base, and really only one observation back in late 2005 (I wrote a post about that here: http://weblog.juima.org/showpost.asp?postid=3476 ). It would be interesting if someone with a larger subscriber base could plot a graph of reported count over the last couple of years (for all readers which report these numbers). Over time, you'd expect at least a few minor drops in the count, but if what I saw holds, then the line would only ever go up.
From: David Smith (Feb 18 2007, at 05:51)
Thanks for the note, Janne - I've been thinking that I'm the only Sage user out here in userland. Are we missing something?
From: wka (Feb 18 2007, at 07:11)
Note that the NewsGator number includes NetNewsWire and FeedDemon users who use NewsGator's feed sync service.
From: Jordan Christensen (Feb 18 2007, at 08:21)
Tim, are you sure you meant RFC2986? I'm not sure what "PKCS #10: Certification Request Syntax Specification" has to do with feeds and redirects, or what section 6 (which lists the authors' addresses) has to do with it either.
From: Randy Charles Morin (Feb 18 2007, at 08:36)
Another issue is that you are only counting aggregators that poll on behalf of multiple users. A native client, like GreatNews, polls either via Bloglines feed cache or directly. If you used FeedBurner or similar, then you could also capture the native client data.
From: Antone Roundy (Feb 19 2007, at 08:52)
I wrote about this issue back in June 2005 (http://antone.geckotribe.com/alpha-gecko/2005/06/16/improving-rssatom-metrics-by-proxy-reporting/). In summary, it would be useful to have one or more standardized HTTP headers to enable proxies to report how much activity they're handling on the publisher's behalf. Expanding on what I wrote then, perhaps something along these lines would do:
Proxy-Fetch-Count: 1000
Proxy-Active-Subscribers: 100
Proxy-Total-Subscribers: 300
The first would indicate how many times the resource was accessed out of a caching proxy's cache since the last time the cache was refreshed. The second and third would be specific to proxies that handle subscriptions (eg. online feed readers). The second would indicate the number of subscribers who had accessed the feed out of the proxy's cache since the last time the cache was refreshed. The third would be the total number of clients currently subscribed to the resource, even if they hadn't accessed it recently.
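To make the idea concrete, here is a small sketch of how a publisher's log processor might read these headers from a proxy's request. The header names are the proposed ones above and are not standardized anywhere; `ProxyReport` and `parse_proxy_report` are hypothetical names, and the sketch assumes each value is a plain non-negative integer.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ProxyReport:
    fetch_count: Optional[int]
    active_subscribers: Optional[int]
    total_subscribers: Optional[int]


def _int_header(headers: Mapping[str, str], name: str) -> Optional[int]:
    """Return the header as a non-negative int, or None if absent or malformed."""
    value = headers.get(name.lower(), "").strip()
    if value.isascii() and value.isdigit():
        return int(value)
    return None


def parse_proxy_report(headers: Mapping[str, str]) -> ProxyReport:
    """Read the three proposed (non-standard) headers from a request."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {k.lower(): v for k, v in headers.items()}
    return ProxyReport(
        fetch_count=_int_header(lowered, "Proxy-Fetch-Count"),
        active_subscribers=_int_header(lowered, "Proxy-Active-Subscribers"),
        total_subscribers=_int_header(lowered, "Proxy-Total-Subscribers"),
    )


report = parse_proxy_report({
    "Proxy-Fetch-Count": "1000",
    "proxy-active-subscribers": "100",
    "Proxy-Total-Subscribers": "300",
})
print(report)
```

A proxy that doesn't handle subscriptions would simply omit the last two headers, which come back as `None` here.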
From: Johann Thomas (Feb 21 2007, at 09:51)
I have Sage.
Today I found http://inforss.mozdev.org/ (see also https://addons.mozilla.org/firefox/361/).
It's kind of full-blown.
From: Antoine Imbert (Feb 26 2007, at 07:41)
I also wrote my own little Perl script to estimate the number of subscribers to my blog. The problem is how to maintain an up-to-date and mostly complete list of user-agents. I've got a YAML file with about 30 aggregators and their associated regexps, but I think I'm far from getting a reliable count.
Is there a user-agent database somewhere that is publicly available and accessible through an API?
From: Gordon Weakliem (Feb 28 2007, at 16:37)
Tim, that feed-id thing is key. What these services are doing is substituting another ID (possibly a primary key in a database) for a URL, which on its own is a perfectly good unique identifier. Services should be able to normalize that out, but you run into an additional problem when a service encounters a permanent redirect on a feed, where it redirects to another feed that the system already knows about. In that case, you either allow the duplicate records to exist, or resubscribe all your users to the new feed. If you're exposing those feeds as resources that use the internal ID, now you're faced with reconciling those ID changes to external clients. So the easier solution is to allow duplication. Of course, this doesn't include the possibility that there simply may be bugs in the server application allowing the duplicates in.
My takeaway is that in cases where you've already got a perfectly good ID, don't go creating another one. You've exchanged a little bit of convenience for an architectural headache.