I spent the day in meetings with a maker of storage technology; it seems quite possible that Visual Net will find a handy application with these folks. My mind is just now unboggling, because these guys deal with disk subsystems measured in tens of terabytes. One customer, they said, is managing three petabytes up and down the East Coast. At this point in the meeting, I got real quiet for a while while a hamster in the back of my head got stuck on the treadmill of all those zeroes. Um er 3,000,000,000,000,000 bytes. We seriously need some perspective.
That's 3,000 terabytes; 3 million gig; 3 billion meg. No; I'm not getting any perspective.
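If you'd rather let the computer count the zeroes, here's the back-of-the-envelope arithmetic sketched in Python; the only input is the 3 petabytes, everything else is just powers of a thousand.

    # Back-of-the-envelope: 3 petabytes expressed in smaller units.
    # Decimal (powers-of-1000) units, the way storage vendors count.
    PETABYTE = 10 ** 15
    total_bytes = 3 * PETABYTE

    print(total_bytes // 10 ** 12, "terabytes")   # 3,000
    print(total_bytes // 10 ** 9, "gigabytes")    # 3,000,000
    print(total_bytes // 10 ** 6, "megabytes")    # 3,000,000,000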
There are about 6 meg in an electronic Bible, so that's room for half a billion Bibles - a resonant phrase to be sure, but no perspective.
The online text of the Oxford English Dictionary, Second Edition, is about half a gig. So we've got room for a paltry 6 million OED2s. Nope... that perspective thing isn't happening. Let's try a different direction.
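Before trying that different direction, here's the Bible-and-OED division spelled out; the 6 meg and half-a-gig figures are my rough guesses from above, not anybody's official sizes.

    # How many Bibles and OED2s fit in 3 petabytes?
    # 6 MB/Bible and 0.5 GB/OED are rough estimates, not official figures.
    total_bytes = 3 * 10 ** 15
    bible_bytes = 6 * 10 ** 6
    oed_bytes = 500 * 10 ** 6

    print(total_bytes // bible_bytes, "Bibles")   # 500,000,000 -- half a billion
    print(total_bytes // oed_bytes, "OED2s")      # 6,000,000 -- six million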
Let me see, I have here a chunk of Web server log file. It's 2,327,228 bytes in size and contains 11,573 lines; so just a touch over 200 bytes/record.
Suppose I'm running a million-hits-a-day web site; there are lots of those out there. When someone says "a million hits," if they're being honest they mean a million pages. I seem to recall a number somewhere suggesting that the average web page makes more or less five callouts to images and stylesheets and things; ongoing, which is about as minimal as you can get, has callouts to two images on every page (the rug at the top and the Antarctica logo), so I think five is really not stretching it. So a typical web-page fetch is going to write five logfile records, or about a K, and your million pages of Web traffic will write a gigabyte of logfile every day. I've run lots of Web servers and I think these estimates are low; there are more logs than just the Webserver's getting written.
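Here's that reasoning as a sketch; the record size comes from my sample log chunk, and the five-records-per-page figure is my guess from above, not a measured average.

    # Logfile volume for a million-pages-a-day site.
    # My estimates: ~200 bytes/record, ~5 records per page fetch.
    sample_bytes, sample_lines = 2_327_228, 11_573
    bytes_per_record = sample_bytes / sample_lines       # ~201 bytes

    pages_per_day = 1_000_000
    records_per_page = 5
    log_bytes_per_day = pages_per_day * records_per_page * bytes_per_record

    print(round(bytes_per_record), "bytes/record")
    print(round(log_bytes_per_day / 10 ** 9, 2), "GB of logfile per day")  # ~1 GB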
By the way, on a well-run Linux system, the web server log files roll over and get compressed once a day or so; they compress beautifully. But on a well-run Web server, the management is obsessive about doing all sorts of data mining and analysis on those log files, so a lot of them are being uncompressed as script fodder, or being loaded into great big honking Oracle or DB2 stores, which have horrible storage overhead.
So let's round way up and say that you get a handful of terabytes per year in log files; at that rate, the 3 petabytes at the top of the column would last you centuries. Er, perspective?
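And the punchline, with the rounded-way-up yearly figure plugged in; "a handful" here is an arbitrary 5 terabytes, which is my hedge, not a measurement.

    # How long would 3 PB last at a handful of terabytes of logs per year?
    # 5 TB/year is the rounded-way-up guess; the real number could easily differ.
    total_tb = 3_000
    log_tb_per_year = 5
    print(total_tb / log_tb_per_year, "years of logs")   # 600 -- centuries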
Let's take a run from another direction. In iMovie here on my Mac, an edited-down 10-minute video project, with all the discarded footage gone, costs 2.5 gig. If I were a pro, I'd be keeping all the footage around, and in my experience people frequently shoot hours of tape to get seconds of production video. So I'm going to go wild and throw a factor-of-20 multiplier in here: let's say 50 gig for 10 professional minutes, and 500 gig for a feature; hell, we're in rounding-up hog heaven, so let's say a terabyte. That poor customer mentioned above only has room for a few thousand feature-film productions.
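One more sketch, using the video guesses from that paragraph; the 2.5 gig is what iMovie reports here, while the factor of 20, the 100-minute feature, and the terabyte-per-feature rounding are pure hand-waving.

    # How many feature films fit in 3 petabytes of disk?
    # 2.5 GB per edited 10 minutes is from iMovie; the rest is rounding up.
    imovie_gb_per_10min = 2.5
    pro_multiplier = 20                      # keep all the discarded footage
    feature_minutes = 100
    feature_gb = imovie_gb_per_10min * pro_multiplier * (feature_minutes / 10)
    feature_tb = 1                           # round 500 GB way up to a terabyte

    print(feature_gb, "GB per feature before rounding")   # 500
    print(3_000 // feature_tb, "features in 3 PB")        # 3,000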
OK, I think I know what's actually going on here; if you're a bank with a couple million reasonably-active customers, or a manufacturer with a few tens of thousands of products and several thousand customers and a big SAP installation running across an Oracle plantation, I don't even have to use the back of a very big envelope to convince myself that you can eat terabytes for breakfast.
And I think we've all heard the folklore about some of NASA's terabyte/day satellite feeds.
And it's reasonable to assume that the NSA, and quite likely some of its competitors both in the US and abroad, capture a substantial volume of all the world's email traffic every day.
So maybe it's not as crazy as it sounds at first blush.
But Momma, them are some big damn disk drives.
I wonder how they back them up?