The ongoing logfiles flip over early Sunday mornings, and sometimes I run some basic stats over them. This last Sunday they said that a total of 995,213 pages have been read, so there is a chance that if you’re reading this on the 29th or 30th of September, you will get the millionth page. Thanks to all; herewith a couple more statistics and some discussion of them.
But before the stats, I wanted to reiterate that thank-you to everyone who takes the time to read this. I really mean it: ongoing has filled a hole in my life that I previously didn’t know existed. I can’t imagine not doing this.
How Do You Count? · Anyone who publishes anything wants to know who’s reading it. On the Web, it’s hard to figure out the right questions, and then it’s hard to figure out the answers. So when I say “a million pages read,” what do I really mean? Well, for the unix-literate, here’s an exact characterization of what I mean:
zcat *.log.gz | \
egrep '"GET /ongoing/.* 200 ' | \
awk ' {print $7}' | \
egrep -v '\.' | \
wc -l
For the rest, an approximate English description would be “everything that was fetched successfully whose URI began with /ongoing/ and which didn’t contain a dot.” Excluding the dot excludes all graphics as well as the RSS feed and the CSS stylesheet. So it really is a pretty decent approximation of the number of times someone looked at a page.
It’s not perfect: it overestimates because some proportion of that million or so fetches were by Google, Inktomi, and many less-skilful robots and crawlers and so on. On the other hand, it underestimates because it excludes all the fetches of the full-size versions of the images, and all fetches of the source-code snippets and so on that I’ve posted. Also, it leaves out all the single-paragraph postings that are contained entirely in the RSS feed and are read that way. I’m willing to bet that the two errors kind of cancel each other out, and say that about a million stories have been read.
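For anyone who would rather measure the robot share than guess at it, here is a minimal sketch. It assumes the logs are in Apache's "combined" format, so that a User-Agent string is present, and it assumes that a case-insensitive match on `bot`, `crawl`, `slurp` or `spider` is a fair stand-in for "robot"; both are assumptions, not something the pipeline above relies on. The function name `page_fetches_by_agent` is made up for illustration, and the filter is only roughly the same as the one above.

```shell
# Tally page fetches from access-log lines on stdin, split into
# likely robots and everything else.
# Assumption: Apache "combined" format, so splitting each line on
# double quotes puts the request in field 2, the status and size in
# field 3, and the User-Agent in field 6.
page_fetches_by_agent() {
  awk -F'"' '
    {
      split($2, req, " ")             # req[1] = method, req[2] = URI
      split($3, st, " ")              # st[1] = status code
      if (req[1] != "GET" || st[1] != "200") next
      if (req[2] !~ /^\/ongoing\//) next
      if (req[2] ~ /\./) next         # skip graphics, feed, stylesheet
      # Crude, assumed heuristic for spotting crawlers by User-Agent.
      if (tolower($6) ~ /bot|crawl|slurp|spider/) robots++
      else others++
    }
    END { printf "robots %d\nothers %d\n", robots, others }
  '
}
```

Something like `zcat *.log.gz | page_fetches_by_agent` would then print the two tallies. Robots that send a browser-like User-Agent slip through, so the robot figure is a lower bound at best.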
In that same time-span, my RSS feed has been fetched 1,856,905 times.
How Many Different People? · Resources at ongoing have been accessed from 228,855 different IP addresses. The RSS feed has been fetched from 49,703; 21,836 since August first.
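Counts like these can be reproduced with a couple of small filters. The sketch below assumes the same log layout as the page-count pipeline (client address in field 1, URI in field 7); the function names are invented for illustration, and the feed path in the usage note is a placeholder rather than the real one.

```shell
# Print the number of distinct client addresses seen in access-log
# lines on stdin. Assumes the address is the first field.
unique_ips() {
  awk '{ print $1 }' | sort -u | wc -l | tr -d ' '
}

# Same, but only for requests whose URI (field 7) equals the path
# given as the first argument.
unique_ips_for() {
  awk -v path="$1" '$7 == path { print $1 }' | sort -u | wc -l | tr -d ' '
}
```

For example, `zcat *.log.gz | unique_ips` gives the overall figure, and `zcat *.log.gz | unique_ips_for /ongoing/feed.rss` (with a hypothetical feed path) gives the figure for the feed alone. The `tr` strips the padding that some versions of `wc` put in front of the number.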
Everyone knows that IPs are a lousy way to count people. The count runs high because people move around: I have one address at home, another at work, and have shown up from any number of hotel rooms and conferences. On the other hand, everyone at AOL has one IP address, as does everyone at Microsoft, which pushes the count low. My gut tells me that the number of unique IP addresses overcounts the number of unique people, maybe by a factor of two? But we shouldn’t have to rely on my gut, since there are people out there who count subscribers properly with cookies and so on, and would have a good feel for what the real ratio is. Anyhow, I’d be surprised if I had fewer than five thousand subscribers or more than fifteen thousand.
The Hit Parade · Q: What do people like reading? A: You’re a bunch of hopeless geeks, but that’s OK, so am I. I live in hope that one of my notes about nature or politics or music gets noticed outside the coterie of markup-slingin’ webheads who apparently are my natural audience.
Fetches | Essay |
---|---|
153116 | ongoing |
83816 | XML Is Too Hard For Programmers |
44539 | Why XML Doesn’t Suck |
30650 | The Web’s the Place |
17152 | The Door Is Ajar |
14601 | I Like Pie |
10133 | Truth |
9232 | Language Fermentation |
8649 | What This Is |
7715 | Author |
7402 | Technology |
7106 | What · Technology · XML |
6941 | iYear |
6286 | iTunes Music Store and the WWW |
5833 | Business |
5762 | On the Goodness of Unicode |
5739 | Colophon |
5474 | What |
5454 | The RDF.net Challenge |
5049 | When |
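A ranking like the one above can be sketched by extending the page-count pipeline with the usual `sort | uniq -c` idiom. This is an illustration under the same assumptions about the log layout, not necessarily how the table was produced; `top_pages` is an invented name, and `grep -E` stands in for `egrep`. It prints URIs rather than essay titles, so mapping one to the other is a separate step.

```shell
# Print the N most-fetched page URIs from access-log lines on stdin,
# one "count URI" pair per line, most popular first. N defaults to 20.
top_pages() {
  grep -E '"GET /ongoing/.* 200 ' |
    awk '{ print $7 }' |       # field 7 is the URI
    grep -v '\.' |             # drop graphics, feed, stylesheet
    sort | uniq -c |           # count identical URIs
    sort -rn |                 # biggest counts first
    head -n "${1:-20}" |
    awk '{ print $1, $2 }'     # tidy the padding uniq adds
}
```

Usage would be along the lines of `zcat *.log.gz | top_pages 20`. Pages with equal counts come out in whatever order `sort` leaves them.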
Pix · More geekery; the only full-size pictures that people look at are screen grabs and pictures of Macintoshes. The top three non-tech pictures that people actually looked at were the panoramic second shot in the write-up on my Canon S50 (330 views), the close-up of Byron’s Troy at the end of the Slim Book of Verse photo-essay (307 views), and of course the Bit Bucket (298 views). The lesson for me is obvious: the way I present the pictures on the page is the way they’re gonna get seen, so maybe the current approach of crushing them all down to 300 pixels wide is sub-optimal.