I read, via
Don Box,
Jan Gray’s monumental piece about
performance
of managed code in .NET.
If you care about performance in general it’s a good read.
This provoked a lot of thought and I’ll write more, but also suggested a
specific coding technique
for making loops faster; I tried it out and it failed the first rough-cut
test, but suggests an improvement for future language designs.
(Warning: in-the-trenches geeky.)
(Update: on iterators and dynamic languages and Java and C#, with more benchmarks.)
(Update: massively-erudite write-up from Erik van Konijnenburg.)
Update · Boy, this one touched a nerve, setting (I think) an all-time record for incoming email. Write-up tacked on at the end, but here is a massive, fascinating deep-dive into the subject from Erik van Konijnenburg, who detours into FORTRAN-land, where they really care about their loops. I haven’t had time to give it the attention it obviously deserves.
Array Dereferencing · Gray does some measurements and discovers that array references are a little more expensive than you’d like in .NET because it does bounds-checking on each one; the same is doubtless true in Java. This is a little bit irritating, especially if you believe as I do that arrays of native integers are the ideal in-memory data structure; no others come close in terms of conceptual simplicity and low impedence mismatch with the CPU’s view of the world.
In fact, consider the single most common idiom for looping through the elements of an array:
for (int i = 0; i < a.length; i++)
doWhateverTo(a[i]);
I wonder, since the virtual machine is bounds-checking each reference to
a[i]
, why it is that I have to redundantly check i
against a.length
every time around the loop?
So, since I had insomnia anyhow, I wrote a little teeny benchmark to investigate this a bit. Here's the core:
int frob(int in)
{
return (in % 1234) & 0xff;
}
int sum1(int [] a)
{
int sum = 0;
for (int i = 0; i < a.length; i++)
sum += frob(a[i]);
return sum;
}
int sum2(int [] a)
{
int sum = 0;
int i = 0;
try
{
while (true)
sum += frob(a[i++]);
}
catch (ArrayIndexOutOfBoundsException e)
{
if (i >= a.length)
return sum;
throw e;
}
}
The trick is that the sum2()
loop is simpler but the VM has to
build, throw, and
catch an exception to get out of it.
So you’d expect that if this is a good idea at all, it would do better on
big rather than small arrays.
I ran this against a million array elements in five different ways,
summarized in the table below (the numbers are seconds).
The results are partially as you’d expect in that the cost of the
exception-catching method falls inversely as the size of the array.
Runs | Array Size | sum1() | sum2() |
---|---|---|---|
10000 | 100 | 0.045 | 2.063 |
1000 | 1000 | 0.043 | 0.206 |
100 | 10000 | 0.035 | 0.068 |
10 | 100000 | 0.051 | 0.048 |
1 | 1000000 | 0.047 | 0.048 |
However, the exception-catching never really gets to be clearly better than the old-fashioned technique. So this is probably not a coding breakthrough (although it might be on some other JVMs, and I’d be interested to see what happened if someone recompiled that into C#, the source including mainline is here). However, I do wonder why designers of modern languages don’t provide an idiom for looping along the lines of:
for (int i indexes a)
doWhateverTo(a[i]);
Update June 12 · I got lots of email on this one. They told me some things I already knew, and some useful things I didn’t know.
Research ·
Piotr Kaminski writes to point at
research
by Michael Zastre.
Stan Dyck relates the story of someone who claims running the loop backward
from a.length
to 0
would be faster.
Dynamic Languages · Every dynamic language on the planet has this kind of facility, of course. (Thanks to Andrew Sidwell and Adam Turoff) Some perl idioms look like this:
foreach $i (0 .. #$array)
foreach $i (@array)
Some python stuff looks like this (thanks to Fazal Majid and Simon Willison):
for x in a:
doWhateverTo(x)
And:
for line in open('/etc/passwd'):
print line.split(':')[0]
Matt Brubeck points out you can do this with Javascript too.
Look, I knew this, OK? I don’t to hurt the feelings over in dynamic-language land, but if I’m obsessing about squeezing the last few cycles out of traversing an array, I ain’t programming in Perl or Python or any of their ilk.
Java and C# · Java 1.5 is going to have an iterator that looks like this (thanks to Piotr Kaminski and Elliot Hughes):
int[] xs;
for (x : xs) {
f(x);
}
C# has this already built-in (thanks to Ben Meadowcroft and Jim Ancona):
foreach (int i in integerArray)
doWhateverTo(i);
Jim Ancona went so far as to translate my benchmark into C# and run it once with the optimizer off:
Runs | Array Size | sum1() | sum2() | foreach |
---|---|---|---|---|
10000 | 100 | 0.070 | 0.491 | 0.060 |
1000 | 1000 | 0.100 | 0.090 | 0.040 |
100 | 10000 | 0.060 | 0.091 | 0.050 |
10 | 100000 | 0.080 | 0.090 | 0.080 |
1 | 1000000 | 0.080 | 0.080 | 0.090 |
And once with it on:
Runs | Array Size | sum1() | sum2() | foreach |
---|---|---|---|---|
10000 | 100 | 0.050 | 0.441 | 0.050 |
1000 | 1000 | 0.060 | 0.090 | 0.051 |
100 | 10000 | 0.060 | 0.020 | 0.030 |
10 | 100000 | 0.090 | 0.010 | 0.090 |
1 | 1000000 | 0.050 | 0.060 | 0.050 |
Jim writes:
The non-optimized results seem to pretty much parallel your Java results, except the optimizer seems to be able to do something with your exception-throwing approach, and the CLR exception-throwing overhead seems generally lower. The 'foreach' approach pretty much tracks the normal 'for' loop, so maybe they aren't turning off bounds checking.
Note that neither Java nor C#, unlike what I suggested, straightforwardly give you access to the index into the array, so would be less useful for dealing with multiple parallel arrays.
Anyhow.