[This is part of the Wide Finder 2 series.] I made a couple of little changes to the Strawman Benchmark and let ’er rip on the big 45Gig data set. The results were miserable, but instructive.
Just How Miserable? · 29:31:30 elapsed, 29:30:52 CPU, i.e. it maxed one of the T2000’s 32 CPUs. That’s one day plus five hours and change. Ouch!
Small lessons · First of all, it was just using Ruby-by-default-out-of-the-box. As Haik pointed out, there’s a SPARC-optimized version available from Sun and I’ll install that.
Second, the code computes the various top-10 lists (hits, bytes, clients)
by storing the numbers in a hash and running Enumerable#sort_by
.
It says right in the docs for that function that it’s inefficient. Still, it
wasn’t too bad for sorting hashes keyed by ongoing
resources, because there are only a few thousand of those.
But when it sorted the hash keyed by the hostnames that had fetched resources,
well... it wasn’t pretty. The sort_by
call took
hours.
It was made worse by a programming error on my part; several people pointed out that one couldn’t diff the program output because if several of the top entries had the same numeric value, they could show up in arbitrary order. I simply changed the code to sort on the value, then the key. Of course, I stupidly didn’t take out the old sort, so it ended up sorting twice.
Anyhow, I’ve changed the code to do a single O(N)
scan
through the big hashes, rather than sort them.
Big Lessons · The naive Ruby code is making terrible use of the machine. Over the course of the run, it uses, on average, 3.2% of the available CPU resources, and 3.8% of the available memory.
If you look at the
Bonnie
results, this sucker can do simple block I/O at about 150M/second at a
cost of about 75% of one CPU. So you ought to be able to plow through the
45G dataset in under five minutes, leaving 31 CPUs to do the actual work, even
without fancy mmap()
tricks. I bet somebody will write a WF2
implementation that runs in about five minutes as opposed to 29 hours.
Next Steps · I’m going to re-run the cleaned-up Ruby code with the SPARC-optimized MRI and propose that as the baseline benchmark for WF2 implementations to beat. Then the fun should start.
Comment feed for ongoing:
From: Justin Sheehy (May 31 2008, at 16:14)
I just fired off a run to get my first pathetic results. It's also an intentionally naive implementation -- exactly the same code I showed you before. You can look for PID 9224 to see if it's still running.
On the first run, I'm not doing it inside a profiler or anything. I'll add a comment with the results when I get a chance.
[link]
From: Yuri Schimke (May 31 2008, at 22:57)
I'm itching to get access to this box.
I've managed to get this Groovy script
http://fisheye.codehaus.org/browse/kolja/trunk/kolja/kolja-widefinder/src/test/script/wfii.groovy?r=188
working with concurrent threads and also via GridGain. Have you got another machine? :)
The approach is to do the heavy lifting (parsing) with Java and leave customised reports to the scripting language.
[link]
From: Alex Morega (Jun 01 2008, at 12:21)
I've been working on a Python implementation, some preliminary (semi-pathetic) results here: http://grep.ro/blog/2008/05/wide_finder_going_parallel
[link]