The benchmark we’re grinding away at over in Wide Finder land is computing popularity stats using this weblog’s server logs from its birth through April 2008. The benchmark is more interesting than the results, but since I write this stuff, I find the results interesting too.
Popular Fragments · Ordinary ongoing pieces in the year/month/day directory tree.
Fetches | Fragment | Updated |
---|---|---|
218540 | nXML Oh My | 2003/09/18 |
148630 | XML Is Too Hard For Programmers | 2004/12/31 |
129746 | Debbie Does BitTorrent | 2003/11/27 |
110427 | The Door Is Ajar | 2003/07/17 |
102513 | On Search, the Series | 2004/12/31 |
100932 | Characters vs. Bytes | 2003/04/26 |
94408 | On Not Being a Gamer | 2003/07/25 |
93060 | On the Goodness of Unicode | 2003/04/06 |
92006 | Protecting Your Data | 2006/02/01 |
89795 | Statistics | 2007/02/11 |
Grains of Salt · Now that you’ve seen the first readout, there are a bunch of things that need to be said:
The numbers are too low because they don’t include things people read via my Atom feed, which these days is the (large) majority of traffic.
The numbers are too high because they include accesses by bots and crawlers and so on.
Virtually every piece here gets read a few times a year; the popular ones a few times a day, every day. So old stuff naturally has an advantage.
On the other hand, when I started blogging, I started with this huge backlog of hot stuff that I really wanted to write about, so the first year or so actually does have more really meaty, long-lived pieces of writing.
The numbers are approximate; the Wide Finder implementors have discovered all sorts of actual hits that aren’t caught, because the requesting software sent a weird-looking request that didn’t match my regular expression. Having said that, the error is a very small fractional percentage.
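To make that last caveat concrete, here is a minimal sketch of regex-based hit counting. It is not the actual Wide Finder code: the pattern, the `/ongoing/When/200x/` path prefix, and the sample log lines are all assumptions made up for illustration. It shows how a genuine fetch can go uncounted when the request line looks odd.

```python
import re
from collections import Counter

# Assumed pattern: a GET for an article in the year/month/day tree.
# The "/ongoing/When/200x/" prefix is a guess at the URL layout.
ARTICLE = re.compile(r'"GET /ongoing/When/\d{3}x/(\d{4}/\d{2}/\d{2}/[^ .]+) ')

def count_articles(lines):
    """Return (hits per article, number of lines the pattern did not match)."""
    hits = Counter()
    unmatched = 0
    for line in lines:
        m = ARTICLE.search(line)
        if m:
            hits[m.group(1)] += 1
        else:
            unmatched += 1
    return hits, unmatched

# Hypothetical log lines in Apache combined format.
sample = [
    '1.2.3.4 - - [01/Apr/2008:00:00:01 -0700] "GET /ongoing/When/200x/2003/09/18/NXML HTTP/1.1" 200 9000 "-" "Mozilla"',
    '1.2.3.5 - - [01/Apr/2008:00:00:02 -0700] "GET /ongoing/When/200x/2003/09/18/NXML HTTP/1.1" 200 9000 "-" "Mozilla"',
    # Proxy-style absolute URI: a real article fetch that this pattern misses.
    '1.2.3.6 - - [01/Apr/2008:00:00:03 -0700] "GET http://www.tbray.org/ongoing/When/200x/2003/09/18/NXML HTTP/1.1" 200 9000 "-" "odd-client"',
    # Not an article at all, so it is correctly left out of the counts.
    '1.2.3.7 - - [01/Apr/2008:00:00:04 -0700] "GET /ongoing/ongoing.atom HTTP/1.1" 200 50000 "-" "reader"',
]

hits, unmatched = count_articles(sample)
for article, n in hits.most_common(10):
    print(n, article)
```

Note that `unmatched` lumps together lines that really aren't article fetches and lines that are but look strange; telling the two apart takes a closer look at what fell through.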
Big Data · This aggregates all the successful fetches from ongoing and adds up the bytes the server pumped out for each. The picture is quite different:
Gigabytes | Resource |
---|---|
871 | ongoing’s Atom 1.0 feed. |
374 | The little random picture on the right side of the front page. |
279 | ongoing’s (now discontinued) RSS 2.0 feed. |
91 | The old orange RSS logo, no longer used. |
63 | A picture entitled “Saskatchewan prairie grass on the skyline at sunset” from Grass—for a while it was the top Google Image search result for “grass”. |
40 | A monster QuickTime video from Java One Day Zero Podcast. |
37 | Video of Shonen Knife taking the stage, from Shonen Knife with The Juliet Dagger. |
36 | ongoing’s front page. |
33 | One of the header graphics, mossy-green. |
32 | A photo of an Aussie street sign from Oz Out. |
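The aggregation behind a table like this can be sketched in a few lines. This is an illustration under stated assumptions, not the code that produced the numbers above: it assumes Apache combined log format, treats "successful" as status 200 only, and uses invented sample lines.

```python
import re
from collections import defaultdict

# Assumed combined log format: "GET path PROTO" status bytes ...
FETCH = re.compile(r'"GET (\S+) [^"]*" (\d{3}) (\d+|-)')

def bytes_by_resource(lines):
    """Sum the bytes sent for each path, counting only status-200 responses."""
    totals = defaultdict(int)
    for line in lines:
        m = FETCH.search(line)
        if not m:
            continue
        path, status, size = m.groups()
        # A "-" in the size field means no body was sent (e.g. a 304).
        if status == "200" and size != "-":
            totals[path] += int(size)
    return totals

# Hypothetical log lines.
sample = [
    '1.2.3.4 - - [01/Apr/2008:00:00:01 -0700] "GET /ongoing/ongoing.atom HTTP/1.1" 200 50000 "-" "reader"',
    '1.2.3.4 - - [01/Apr/2008:00:30:01 -0700] "GET /ongoing/ongoing.atom HTTP/1.1" 304 - "-" "reader"',
    '1.2.3.5 - - [01/Apr/2008:00:31:00 -0700] "GET /ongoing/ongoing.atom HTTP/1.1" 200 50000 "-" "reader"',
    '1.2.3.6 - - [01/Apr/2008:00:32:00 -0700] "GET /ongoing/ HTTP/1.1" 200 12000 "-" "Mozilla"',
]

totals = bytes_by_resource(sample)
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
for path, size in ranked[:10]:
    print(size, path)
```

Ranking by bytes instead of by hit count is what pushes feeds and images to the top: a resource that is fetched constantly, or is simply large, dominates even if nobody thinks of it as "popular".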
Referers · When people came to an ongoing page, where did they come from?
Fetches | Referer |
---|---|
182996 | ongoing’s front page. |
158562 | Slashdot. |
62218 | Google Reader. |
40243 | Daring Fireball. |
30197 | Reddit programming. |
20463 | Reddit. |
15182 | Scripting News. |
13864 | tbray.org’s empty top level. |
13618 | A redirect to the front page from somewhere in Ask.com. |
9917 | MacSurfer. |
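Counting referers works the same way, just keyed on a different field. The sketch below is an assumption-laden illustration rather than the real thing: it assumes combined log format (where the referer is the first quoted field after the byte count), counts any path under `/ongoing/` as "an ongoing page", and runs on invented sample lines.

```python
import re
from collections import Counter

# Assumed combined log format: "GET path PROTO" status bytes "referer" "agent"
REFERRED = re.compile(r'"GET (\S+) [^"]*" \d{3} \S+ "([^"]*)"')

def count_referers(lines, prefix="/ongoing/"):
    """Count referers for fetches under `prefix`, skipping empty ("-") ones."""
    referers = Counter()
    for line in lines:
        m = REFERRED.search(line)
        if not m:
            continue
        path, referer = m.groups()
        if path.startswith(prefix) and referer != "-":
            referers[referer] += 1
    return referers

# Hypothetical log lines.
sample = [
    '1.2.3.4 - - [01/Apr/2008:00:00:01 -0700] "GET /ongoing/When/200x/2003/09/18/NXML HTTP/1.1" 200 9000 "http://slashdot.org/" "Mozilla"',
    '1.2.3.5 - - [01/Apr/2008:00:00:02 -0700] "GET /ongoing/When/200x/2003/09/18/NXML HTTP/1.1" 200 9000 "http://slashdot.org/" "Mozilla"',
    '1.2.3.6 - - [01/Apr/2008:00:00:03 -0700] "GET /ongoing/When/200x/2003/09/18/NXML HTTP/1.1" 200 9000 "http://www.tbray.org/ongoing/" "Mozilla"',
    '1.2.3.7 - - [01/Apr/2008:00:00:04 -0700] "GET /ongoing/When/200x/2003/09/18/NXML HTTP/1.1" 200 9000 "-" "Mozilla"',
]

referers = count_referers(sample)
for referer, n in referers.most_common(10):
    print(n, referer)
```

Internal referers are deliberately left in, since the table above counts the front page as a source of traffic too.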
Pretty Pix · This wasn’t part of the Wide Finder work, but since I had the data, I asked myself which of the pictures I’ve run here are the most popular, as in they motivated people to click on them and see the full-size version.