
Remembering Bonnie · The murderer I emailed with is still in prison. And the software that got him pissed off at me still runs, so I ran it. Now here I am to pass on the history and then go all geeky. Here’s the tell: If you don’t know what a “filesystem” is (that’s perfectly OK, few reasonable adults need to), you might want to stay for the murderer story, then step off the train ...
 
Time Machine Completed the Backup · Recently, I acquired a Synology DiskStation and wired up a nice comforting Time Machine-to-Synology-to-S3-to-Glacier backup data flow. But then I started to see “Time Machine couldn’t complete the backup” with something about “could not be accessed (error 21)”. Here’s how it got fixed ...
[5 comments]  
Network Storage · A couple days ago in New Home Network I posted a request for advice on a home NAS box and networking hardware. Now I have the storage box, and boy was it ever easy and straightforward and anxiety-relieving. If you haven’t done this already, you might want to ...
[11 comments]  
New Home Network · Holiday project: Redesign the domestic infrastructure. Looking for: Network and storage gear. Got any advice? ...
[22 comments]  
Maxed Book · My Google-issue Mac is pretty nice, but I decided to improve it by swapping obsolete optical storage for not-obsolete-yet spinning rust. With benchmarks for the disk geeks in the crowd ...
[12 comments]  
Database Helper · I’ve been in a lot of Cloud-flavored discussions recently about what kind of Platform-as-a-Service offerings might hit sweet spots. On several occasions, People Who Should Know have said things like “A huge proportion of apps, even really big apps, can coast along just fine on a single MySQL instance with help from memcached.” Some numbers crossed my radar today that would tend to support that theory; and they’re sort of astounding ...
[4 comments]  
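The trick those people are describing is a read-through cache in front of the database. Just for flavor, here’s a minimal sketch in Python; pymemcache is one modern client among several (note it hands back bytes on a get), and fetch_from_mysql is a hypothetical stand-in for whatever your real query layer is:

    # Read-through cache sketch: try memcached first, fall back to the
    # database on a miss. fetch_from_mysql is a hypothetical stand-in.
    from pymemcache.client.base import Client

    cache = Client(("localhost", 11211))  # assumes a local memcached

    def fetch_from_mysql(user_id):
        # Hypothetical: imagine a SELECT against the single MySQL instance.
        return f"user-record-{user_id}"

    def get_user(user_id):
        key = f"user:{user_id}"
        value = cache.get(key)
        if value is None:                      # cache miss: hit the database
            value = fetch_from_mysql(user_id)
            cache.set(key, value, expire=300)  # keep it warm for five minutes
        return value

If most reads hit the cache, the single MySQL instance only sees misses and writes, which is how “really big apps” coast along on one database.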
Tab Sweep — Tech · Herewith gleanings from a circle of browser tabs facing inward at the world of technology. Some are weeks and weeks old: Amber Road, Clojure, tail recursion, cloudzones, deep packet inspection, and key/value microbenchmarking ...
[5 comments]  
2008 Disk Performance · I did some research on storage-system performance for my QConf San Francisco preso and have numbers for your delectation (pictures too). Now is a good time to do this research because the hardware spectrum is shifting, and there are a bunch of things you might mean by the word “disk”. I managed to test some of the very hottest and latest stuff ...
[10 comments]  
Transparent Storage · In preparation for that Disk Performance piece and accompanying keynote last week, I spent quite a bit of time with the new Unified Storage “Fishworks” Analytics Software, which is fascinating stuff. Herewith an illustrated report ...
[2 comments]  
2008 Storage Hierarchy · We call them “computers”, but the software and hardware are overwhelmingly concerned with storing and retrieving data. Yesterday’s Disk Performance research fit into a larger context in the QConf presentation: a survey of all the levels of storage that make up a modern system ...
[8 comments]  
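For a rough sense of the shape of that ladder, some ballpark figures; these are generic order-of-magnitude numbers for the era, mine rather than the preso’s:

    # Ballpark access latencies, circa 2008, in nanoseconds. Each rung
    # is one to several orders of magnitude slower than the one above.
    # Order-of-magnitude illustration only, not measured figures.
    hierarchy_ns = [
        ("CPU register",                   1),
        ("L2 cache",                      10),
        ("Main memory",                  100),
        ("Flash SSD read",           100_000),  # ~0.1 ms
        ("Disk seek",             10_000_000),  # ~10 ms
        ("Tape mount + seek", 60_000_000_000),  # ~a minute
    ]
    for name, ns in hierarchy_ns:
        print(f"{name:<18} ~{ns:>14,} ns")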
Storage 7000 · This is certainly our biggest announcement of the year so far; just possibly the biggest since I showed up here in 2004. The official name is the “Sun Storage 7000” and there are three systems in the line-up. As usual, the real actual technology news is in the blogs; the hub is at the Storage News blog, but I’d start with the co-conspirators: Bryan Cantrill’s Fishworks: Now it can be told and Mike Shapiro’s Introducing the Sun Storage 7000 Series. I have some opinions too ...
[4 comments]  
Memories · I’ve got this new Mac Pro, and the 2G it came with just isn’t going to do the trick. Last week, both Lauren and I were in the Valley, at different Sun meetings. So one lunchtime, we snuck away to geek-shop. I picked up 4G of high-performance RAM at S.A. Technologies, a little memory specialist that I totally recommend; their prices are pretty hard to beat. It cost about $360 including tax. On the way back, we stopped at a big tech emporium for some other odds and ends, and at the checkout they were advertising high-capacity USB disks for not much; Lauren picked up 8G for $29.99. That’s quite a pricing spread: about $90 per gigabyte for the RAM against under $4 for the flash.
[8 comments]  
Online Data · That S3 outage sure concentrated people’s minds. And almost simultaneously, EMC announces that they’re getting into cloud storage. It’s obvious to me that we’re nowhere near having worked out the economics and safety and performance issues around where to put your data. There are some areas of clarity; geek über-photog James Duncan Davidson, in The Economics of Online Backup, shows that for a person with a ton of personal data, the online option is really unattractive. And you do hear sotto voce rumbles about going online in the geek hallways, for example “Amazon web services: 3x the price, 0.5x the reliability, and lower scalability than DIY. Buy only for the low capex and lead time.” That’s from Stanislav Shalunov, who by the way is a damn fine Twitterer. The big questions remain open.
[3 comments]  
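To see why the economics look bad, it helps to put rough numbers on it. The figures below are my own assumptions for illustration (S3’s storage price around then was in the neighborhood of $0.15 per GB-month, and a terabyte of consumer disk ran maybe $150); they are not numbers from Duncan Davidson’s piece:

    # Back-of-envelope: online vs. local backup for 1 TB of personal
    # data. All prices are illustrative assumptions, not quoted figures.
    data_gb = 1_000                # "a ton of personal data"
    s3_per_gb_month = 0.15         # assumed S3 storage price of the day
    local_disk_cost = 150.00       # assumed price of a 1 TB drive

    s3_per_year = data_gb * s3_per_gb_month * 12
    print(f"S3, one year of storage: ${s3_per_year:,.2f}")   # $1,800.00
    print(f"One local disk:          ${local_disk_cost:,.2f}")
    # Roughly 12x the hardware cost every year, before transfer fees;
    # at this scale the online option is hard to justify.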
Them Bits · Earlier this evening, I finished scanning the slides I have that my Dad took. That’s a lot of slides and a lot of bits. With observations about Wal-Mart and Ubuntu and the end of optical storage ...
[10 comments]  
Slow Bonnie · I’ve been noticing that it takes longer and longer to get a meaningful Bonnie run. To make sure you’ve busted the filesystem caching and are actually doing I/O, you need to use a test file two or three times the size of system memory, which can easily get into a couple of hundred gigs on a serious server these days. And while I/O has been getting faster, it still takes a while to process that much data; and Bonnie does it five times. So, the ratio that governs Bonnie testing time is something like memory size over I/O speed. From which we observe that, proportionately, memory size has grown faster than I/O speed. Thus, memcached and friends.
[1 comment]  
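To put numbers on that ratio, a quick sketch; the two-to-three-times-memory rule and the five passes are from the paragraph above, while the memory size and throughput are assumptions for illustration, not measurements:

    # Rough Bonnie wall-clock estimate: the test file must be 2-3x RAM
    # to defeat the filesystem cache, and Bonnie passes over it five
    # times. Figures below are illustrative assumptions.
    ram_gb = 64                  # assumed "serious server" memory
    test_file_gb = ram_gb * 3    # 192 GB test file
    passes = 5
    throughput_mb_s = 200        # assumed sustained sequential I/O

    total_gb = test_file_gb * passes
    seconds = total_gb * 1024 / throughput_mb_s
    print(f"{total_gb} GB of I/O, ~{seconds / 3600:.1f} hours")  # ~1.4 hours

Double the RAM and the estimate doubles; the disks would have to get twice as fast just to hold the line.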
Seeking Basement Disk · Dear LazyWeb: we’re looking for a great big honkin’ storage server to sit on the home network and be a backup pool for the motley crew of computers around the house: Mac, Solaris, Ubuntu, & Windows. Simon Phipps has a Buffalo TeraStation and is very happy with it. On the other hand, there’s a wiki which suggests it’s kind of loud, and it’s going to be hard for us to get it behind closed doors. Might the Net have a suggestion?
[38 comments]  
Postmodern Litigation · Well, it’s all over the news; we and NetApp are in court. Blecch. There is one interesting side-note in this dreary story, a first I suspect: NetApp’s CEO provided color commentary on his blog (no linkage from me to bloggers who are suing us). And then later on today, on our official PR blog, appears Sun response to NetApp lawsuit which says, more or less, “In yo face”. Now, I guess, it’s over to the lawyers. [Update: As of now, I’m rejecting all comments on this one. There were a pile in the in-basket this morning, and a couple were entirely inappropriate in a matter involving litigation, and I suddenly became uncomfortable trying to make judgment calls. So, sorry, but let’s just leave this.]
[Update: I think Bryan Cantrill’s DTrace on ONTAP? deserves a link, since Bryan was one of the guys who built the technology that’s now in play in court.]

[1 comment]  
ORM Bien Phu · I thought the laugh line “Object-Relational Mapping is the Vietnam of Computer Science” was ancient, but Ted Neward claims that he made it up in 2004. Ted has written an immense, detailed, essay on the subject, The Vietnam of Computer Science, which, just to be thorough, includes a capsule history of the Vietnam conflict. This ought to be required reading for all Computer Science undergrads, so they’ll at least be forewarned before they stumble into their own private Southeast Asia. Bonus: in the comments, the first commenter asks “If ORM = Vietnam, does SOA = Iraq?”
 
No Database!? · Recently, in discussion of a design for a comments system, I noted that I wasn’t planning to use a database, and I even allowed myself a little fun sneering at the idea. I got several reasonable-sounding emails from reasonable-sounding people saying “Why on earth wouldn’t you?” Here’s why ...
 
The Databox · After I reported on the Thumper announcement yesterday, Simon Phipps wrote: I want one. I kind of snickered, thinking “Simon, get real, that sucker weighs 77kg and probably sounds like a 747.” But last night, coincidentally, I ran a backup, which provoked thought, and you know, I think Simon’s right: there’s a huge opening for a consumer product in this space. [Update: Hah! Bill Pierce specs out a Databox, it’ll cost you $2,312.33; dig it!] ...
 
WinFS · Wow, it’s dead. You have to be sad when anything goes south that so many people have worked on so hard for so long. Still, I remember being told in the early Nineties, when I was talking up Unix servers, that I was silly and wrong because the Cairo object filesystem would make everything else irrelevant. And then years later, when I was selling search and content management for a living, being told once again that we’d all be casualties of the WinFS bandwagon. I wonder if, in other professions as in ours, the conventional wisdom is so often so wrong? [Update: Lots of thoughtful coverage: The OS Review, Developing on the Edge, The Fishbowl, Dare Obasanjo, Simon Phipps.]
 
The RAID in the Mirror · If you have lots of data to store and are figuring out how to lay out your disks, check out Roch Bourbonnais’ WHEN TO (AND NOT TO) USE RAID-Z. (Hey Roch; could you find a slightly less brutal way to format your blog?) For RAID & filesystem wonks only. It’s a lucid, quantitative explanation of the trade-offs between mirroring, striping, and RAID-ing. Some of the narrative is ZFS-specific, but I suspect that the lessons are pretty general. Out there in the real world of production applications, you’d be surprised how often, when you’re waiting for a slow app, you’re waiting for the disk, not the CPU. This stuff matters. A deliberately crude sketch of the headline trade-off follows.
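Roch’s numbers are worth reading in full, but the headline trade-off reduces to simple arithmetic. Here’s that sketch, under deliberately crude assumptions (identical disks; a RAID-Z group reads a full stripe per random read, which is the heart of the argument):

    # Crude capacity / random-read model for N identical disks.
    # Mirroring: half the raw capacity, but any disk can serve a read.
    # RAID-Z: one disk of parity, but each random read touches the
    # whole group, so it delivers roughly one disk's worth of IOPS.
    def mirrored(n, tb_per_disk, iops_per_disk):
        return n * tb_per_disk / 2, n * iops_per_disk

    def raidz(n, tb_per_disk, iops_per_disk):
        return (n - 1) * tb_per_disk, iops_per_disk

    print("mirror:", mirrored(6, 1, 150))  # (3.0 TB, 900 IOPS)
    print("raidz: ", raidz(6, 1, 150))     # (5 TB, 150 IOPS)

Same six disks, and you’re choosing between nearly double the capacity or six times the random-read throughput.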
 
Flat Files Rule · Yes, databases are useful. But there are a lot of good reasons not to use them: they’re a lot of work to administer and it’s very easy to make them run slow. Particularly when the alternative, ordinary flat files in an ordinary directory tree, is so incredibly useful. For more evidence, see Tim O’Reilly’s reportage on the subject, with inputs from Mark Fletcher (Bloglines) and Gabe Rivera (Memeorandum). Note that both of them are supplementing their flat files with memory-resident data stores; it’s a powerful combination. Now if Mark would only put some of that powerful machinery to fixing Bloglines’ broken Atom 1.0 handling...
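To see how little machinery the flat-file approach needs, here’s a minimal sketch; the one-record-per-file layout and the little in-memory cache (standing in for the memory-resident supplement mentioned above) are illustrations of the idea, not anybody’s production design:

    # Minimal flat-file store: one record per file in a plain
    # directory. Illustrative layout; the directory itself is the index.
    import os

    ROOT = "records"
    _cache = {}  # the memory-resident supplement, in miniature

    def put(key, text):
        os.makedirs(ROOT, exist_ok=True)
        with open(os.path.join(ROOT, key), "w", encoding="utf-8") as f:
            f.write(text)
        _cache[key] = text

    def get(key):
        if key not in _cache:  # fall through to the filesystem
            with open(os.path.join(ROOT, key), encoding="utf-8") as f:
                _cache[key] = f.read()
        return _cache[key]

    def keys():
        return sorted(os.listdir(ROOT))

No schema, no daemon, no administration; grep, rsync, and every backup tool on earth already understand it.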
 
JDiskReport · Hey, this is cool; it’s a little doo-hickey that draws pie charts and graphs of what you’ve got on your disk. I wonder on what set of hardware/OS combinations the web-start Just Works like it did on my Mac? The pie-charts of my life were so cool I had to publish a few. And I turned up a real problem, too ...
 
More ZFS Data · I see that Dana H. Myers has been digging away at ZFS performance using the only metric that really matters to the real geek: OS build performance. The numbers are interesting... I’m surprised that compression made so little difference, both source and object code compress quite well (I just ran a little test: the Emacs binary compressed to 18% of its size, a bunch of Java code to 19%.) Maybe the fact that it’s zillions of little files means that the file open/create overhead dominates the actual input/output time? There is no doubt there is a huge amount of work to be done on I/O performance, both understanding it and improving it. But ZFS is increasingly looking like a step forward.
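A test like that is easy to reproduce; here’s a sketch using Python’s zlib. ZFS’s compressor of the day was LZJB, not zlib, so its ratios will differ somewhat; treat this as ballpark only:

    # Ballpark compression-ratio check, like the Emacs-binary test
    # above. zlib, not ZFS's LZJB, so expect somewhat different numbers.
    import sys
    import zlib

    def ratio(path):
        with open(path, "rb") as f:
            raw = f.read()
        return len(zlib.compress(raw, 6)) / len(raw)

    for path in sys.argv[1:]:
        print(f"{path}: compresses to {ratio(path):.0%} of original size")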
 
Protecting Your Data · I was watching a mailing-list discussion of backup software, and how often you should back up, and based on some decades’ experience, found some of the thinking sloppy. Here are my life lessons on keeping your data safe while assuming that The Worst Will Happen. Some of it is Macintosh-specific, but there may be useful take-aways even from those parts, even for non-Mac-hacks ...
 
Filesystem Lessons · I had the idea that I’d chop up the disk on my Ultra 20 into a bunch of partitions and do some filesystem performance testing with UFS and ZFS and Ext3 and Reiser. This turned out to be a really bad idea, but I still got some interesting numbers ...
 
Bonnie Z · In case you hadn’t noticed, yesterday the much-announced ZFS finally shipped. There’s the now-typical flurry of blogging; the best place to start is with Bryan Cantrill’s round-up. I haven’t had time to break out Bonnie and ZFS myself, but I do have some raw data to report, from Dana Myers, who did some Bonnie runs on a great big honkin’ Dell [Surely you jest. -Ed.] server. The data is pretty interesting. [Update: Another run, with compression.] [And another, with bigger data. Very interesting.] ...
 
An Evening With Bonnie · Like almost everyone, I have a long list of things that I regret not having done, and mine includes writing a Unix filesystem. So instead, I measure ’em, with the help of my old friend Bonnie. I just spent some time addressing the question: “How much does FileVault slow down a Macintosh?” And turned up a couple other interesting results, too, including a fairly startling three-way OS X/Linux/Solaris comparison. [Update: Many readers write on the subject of Linux and hdparm(8).] ...
 
Bonnie 64 · Fifteen years ago I wrote a little filesystem benchmark called Bonnie. I hadn’t maintained it in years and there are a few interesting forks out there. Suddenly, by accident I found myself fiddling with the Bonnie code and I think I’m going to call the new version “Bonnie 64”. Herewith, for those who care about filesystem performance, the details ...
 
Moore Who? · Cyberspace is buzzin’ tonight over the release of the Reiser4 filesystem, which seems to be pretty hot stuff. I was looking at their benchmarks page and was charmed to see an appearance by Bonnie++, a direct descendant of the original Bonnie mentioned here just the other day. The benchmarks suggest that on a good computer with a modern filesystem, you can expect to get 130 or so random seeks/second in 1G of data, 105 in 3G. That’s not bad... in fact it’s three or four times faster than the best results I was able to get in 1990 (search for “asymptotically”). Check out the computers I ran that on; they’re museum pieces. Per Moore’s law, in fourteen years the CPUs ought to have sped up by a factor of 2^(14/1.5) ≈ 645 or so. Yep, one of them was a 4MHz 386: 4MHz × 645 ≈ 2.58GHz, damn that Moore is smart. I happen to remember that of the original computers I benchmarked, the biggest had 64M of memory. If you applied the same multiplier (645) to the memory, you’d get about 40G, quite a reasonable figure for a big modern Unix box. I think the lesson is obvious: for high-performance applications, keep your data away from those filthy disks, no matter what the filesystem; use memory.
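Spelled out, with a doubling every eighteen months (the reading of Moore’s law used above):

    # The Moore's-law arithmetic from the paragraph above: a doubling
    # every 18 months (1.5 years), compounded over fourteen years.
    multiplier = 2 ** (14 / 1.5)
    print(f"speed-up factor: {multiplier:.0f}")              # ~645
    print(f"4 MHz CPU -> {4 * multiplier / 1000:.2f} GHz")   # ~2.58 GHz
    print(f"64 MB RAM -> {64 * multiplier / 1024:.1f} GB")   # ~40.3 GB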
 