This is the thirteenth progress report from the Wide Finder Project. It’s just a scratchpad to catalogue all the problems I’ve had getting contributed code to work. Probably not of general interest, but an essential part of a complete write-up.
Java · [2007/11/19]: Actually, disregard most of the following; there’s something horribly wrong with this computer. Whenever I spin up a big Java job of any sort, everything locks up, the computer won’t even ping. Working on it...
CB’s Java code causes the computer to just lock up while it goes away and grinds for minute after minute. But only on the big file, not smaller test files. WTF?! It bothers me not to have Java in this study, so I may have to write something myself.
Now I have another Java attempt, but it produces wildly incorrect results. I’ve pinged the author, but I’m definitely seeing some Java coding in my future.
Lisp · I couldn’t get Irate Nate’s Lisp code to run; SBCL ran into some problem installing cl-pcre, with completely incomprehensible error messages.
C++ · I can’t get Bob Miller’s C++ code to compile. Any experts out there with a clue? I’m reminded why it is that I’ve always hated C++.
Haskell · GHC Haskell won’t build on Solaris:
sca12-3200a-40 ~/n2/ghc-6.8.1/> ./configure
checking build system type... sparc-sun-solaris2.10
checking host system type... sparc-sun-solaris2.10
checking target system type... sparc-sun-solaris2.10
Canonicalised to: sparc-sun-solaris2
checking for ghc... no
checking for path to top of build tree... ./configure: line 2173: -v0: command not found
./configure: line 2177: utils/pwd/pwd: No such file or directory
configure: error: cannot determine current directory
Scala · I still don’t have a Scala version that takes the form of a .jar or something I can actually run.
OK, as of Nov. 19th I do, but Martin Probst’s .jar file seems to be sliding into CLASSPATH hell, can’t seem to see the scala-library classes.
PHP · I’d like to run Harry Fuecks’ PHP, but this computer is sitting behind a firewall proxy’d to the max, and getting Apache and PHP and the necessary connective tissue all installed and running is just a major pain in the butt. I’ll try to get to it though.
Erlang · Steve Vinoski’s last effort silently locks up, grinds away without making any progress. This is really unfair to Steve, since essentially all the really good entries are basing their code on ideas he was the first to cook up.
From: Brett (Nov 09 2007, at 11:59)
PHP can be compiled as a plain console app (no Apache, no mod_php), iirc.
From: Bob Monsour (Nov 09 2007, at 13:29)
For OS X PHP setup, you might want to consider the MAMP sandbox at http://sourceforge.net/projects/mamp
According to the site, "MAMP is a very easy to install compilation of Apache, PHP and MySQL for OSX. Everything will be installed in one folder. If you want to get rid of it, just move the folder into trash. An easy to use cocoa program to start and stop the servers is included."
Regards,
-Bob
From: Hub (Nov 09 2007, at 14:12)
Hard to help on the C++ if you don't post the error message you get.
From: Reinier Zwitserloot (Nov 09 2007, at 15:24)
I talked to a Sun dude last week at the Web 2.0 Expo, and so far Ian Murdock's Project Indiana is on track and should effectively be an apt-get knockoff. Same tool, properly maintained repository. (For those who don't know, Ian is the "ian" in Debian, and he's been working for Sun for the past couple of months.)
Personally I can't wait. Solaris has lots of cool stuff, but I maintain a Solaris server and 2 Debian servers, and it's a pain in the ass to install anything on the Solaris box, whereas on the Debian boxes everything just works.
From: Carey (Nov 09 2007, at 18:57)
To compile the C++ on Debian Etch, I had to install libpcre3-dev, then use this command:
g++ -O2 -o wfk wfk.c++ -lpcrecpp -lpthread
From: Bob Miller (Nov 09 2007, at 19:46)
I knew this code was going to have portability issues.
Tim, I suggest you (a) use gcc 4.x, (b) start with wfk-2.c++, which should be a little more portable, (c) give me access to a Solaris machine w/ compiler so I can clean up my own mess. (-:
I've been working on making it more portable (and a little faster), but until I get onto a SPARC/Solaris box, I can't guarantee success.
Find the latest sources and compilation tips here.
http://www.lug.corvallis.or.us/drupal/node/103
From: Russ Weeks (Nov 09 2007, at 21:39)
I can't believe the shortage of Java implementations... I figured you'd be swamped. A first stab at it is here: http://linuxmedianetwork.blogspot.com/2007/11/widefinderjava6.html
There's definitely some room for improvement... I was disappointed to see the T2K getting beaten out by an aging 4-CPU Xeon. Gonna have to run it through DTrace and see where the hot spots are.
Java 1.6 is required.
From: Michel S. (Nov 11 2007, at 00:02)
Scala should be straightforward to install on any system that has a recent version of Java, I should think -- just untar the Unix tarball somewhere and run it from that directory. sbaz works like a charm for keeping it up to date too.
From: Bob Miller (Nov 11 2007, at 01:51)
I've written wfk-3. My 900 MB benchmark now runs in 0.56 seconds on a generic Linux PC. I've been working hard to eliminate lock contention, so it should parallelize even better on the T2.
Tim, you have my email address. Please drop me a line if you can't get wfk-3 to compile.
Meanwhile, find the latest source file and a writeup here.
Wide Finder Three
http://www.lug.corvallis.or.us/drupal/node/104
Oh, and one more request. You said you'll make your "official" 900 MB test file available. Can I please get a copy? Don't worry, I won't redistribute it.
Thanks.
Bob
From: Penny (Nov 12 2007, at 13:00)
Haskell: Are you trying to compile ghc from source? That won't work unless you already have a version of ghc (to bootstrap). I think that's what the config output is complaining about (checking for ghc... no). There is a binary distribution of ghc 6.8.1 for Solaris.
From: Russ (Nov 13 2007, at 18:17)
I've updated my Java implementation a little bit and posted a revision here: http://gregchan.com/wf/wf_02.tar.gz
Description of the revision and some DTrace timings are available here: http://linuxmedianetwork.blogspot.com/2007/11/widefinderjava6-rev2.html
From: Nik Clayton (Nov 22 2007, at 00:09)
Have you considered building the interpreters with different compilers, and/or different compilation optimisation options, and adding that to the mix?
When I was comparing gcc with cc last year I saw a marked difference in the speed of the Perl binaries built by the two compilers.
The speeds of the binaries, by the compiler and options that built them (tested by running the SpamAssassin tests), were:
* gcc, -O2 (fastest)
* cc
* gcc, -O
* cc, -fast (slowest)
(yes, a binary built by 'cc' was faster than one built with 'cc -fast' -- this was on an Ultra 40)
Looking at the speed of the Perl implementations so far I doubt that it would make much difference given the size of the dataset you're using, but it might make a difference with larger datasets.
My notes on methodology and results are at http://jc.ngo.org.uk/blog/2006/09/06/day-59-of-60-developer-benchmarks-pt-4/. My tests were single threaded, so you might not see the same benefits. Or maybe the Sun compiler will generate dramatically better threaded code. It would be interesting to find out.
From: Erik Engbrecht (Nov 25 2007, at 15:53)
Complete Scala version in a jar here:
http://mysite.verizon.net/vze2rswi/scala-widefinder.jar
You should just be able to run it as:
java -server -jar scala-widefinder.jar /path/to/file
All the Scala library is in there. So is the source for my solution. It needs 1.5 or higher. I tested it under 1.6.
Discussion of a Scala version here:
http://erikengbrecht.blogspot.com/2007/11/adventures-in-widefinding.html