Are there any computer programs that you wish were faster? Time was, you could solve that problem just by waiting; next year’s system would run them faster. No longer; Next year’s system will do more computing all right, but by giving you more CPUs, running at this year’s speed, to work with. So the only way to make your program faster is to work with more CPUs. Bad news: this is hard. Good news: we have some really promising technologies to help make it less hard. Bad news: none of them are mainstream. But I’m betting that will change.
The “Java Moment” · On my recent excursion to Japan, I had the chance for a few conversations with Bruce Tate. He advanced a line of thinking that I found compelling: Right now is the time when the concurrent-programming winners will emerge. He sees an analogy to Object-Orientation in the early nineties: Several O-O languages were in play (most notably C++ and Smalltalk), but it hadn’t penetrated the application-development mainstream. Then Java came along, and turned out to have just the right characteristics to push O-O into the middle of the road.
Thus the analogy. Right at the moment, we have a bunch of candidate technologies to fill in the concurrent-programming void; obvious examples include Erlang, Scala, Clojure, and Haskell. While there are common threads, they differ from each other in many essential ways. Between the lot of them, there are a whole lot of different characteristics. The fact is, we don’t know at this moment which laundry-list of features is going to turn some candidate into The Java Of Concurrency.
What I’m Up To · I think that right now is a good time to have a run at this problem. I’ve been scanning back and forth around the Internet, information-gathering, contrasting, and comparing. As I work through this, I’ll publish my research notes here on the blog as I go along.
Here’s what I have so far:
Java — The factors that led to Java’s success.
The Laundry List — Ingredients that might go into the winning formula that brings concurrent programming to the mainstream.
Crosstalk — High-quality feedback, yours truly accused of bias, and deltas to the laundry list.
My Take — Exposing my own experiences and bias.
C vs. P - “Concurrency” vs. “Parallelism”, and where does the distinction lead?
Messaging — First impressions of messaging in Clojure as opposed to Erlang.
References — Using Clojure refs.
Parallel I/O — More Clojure concurrency tools, with generally pleasing results.
More Clojure I/O — Refactoring and retuning, based mostly on your feedback. Kind of discouraging.
No Free Lunch — In which the compute costs of concurrency seem anomalously high.
Tab Sweep — Concurrency miscellany, including node.js, optimism, Joe Armstrong, and an Intel voice.
Idiomatic Clojure — Code from those who, unlike me, natively inhabit this space.
Eleven Theses on Clojure — Summing up my perceptions so far.
Tuning Concurrent Clojure — Making the code run faster (and slower).
Hard-Core Clojure — Alex Osborne bangs the pedal to the metal.
Who Should Care · The developers, actually; they have a direct incentive in that by doing concurrency well, their apps will run faster, and their employers do too in that the same performance level will require less iron.
In fact, I suspect the most likely candidates to get behind this are the chip builders; it’s traditionally been seen as their role to push the development tools that make their products shine. So I suspect that Intel, AMD, IBM, and the Sun part of Oracle are the most likely candidates to go all activist. In particular, the Sun SPARC processors have been leading the more-cores-and-damn-the-gigahertz charge for the last few years, so we ought to be the ones who care the most.
Linkography · I’ve started working on a Late-2009 Concurrency Linkography page over at Project Kenai; it seems something that fits more comfortably on a wiki than a blog. If there’s anyone else out there that wants to contribute, I’m open-minded. I’m leaning toward things that are contemporary rather than historic in value.
Non-Problem · I think the concurrency problem is pretty well solved for one class of application: the Web server and most of what runs in it. If you know what you’re doing, whether you’re working in an Apache module or Java EE or Rails or Django or PHP or .NET, given enough load you can usually arrange to saturate as many cores as you can get your hands on.
That’s a big chunk of the application space, but not all of it. And even in web apps, it’s not uncommon to have pure application code that needs to wrestle the concurrency dragon.
Assumption · I’m taking the following as an axiom: Exposing real pre-emptive threading with shared mutable data structures to application programmers is wrong. Once you get past Doug Lea, Brian Goetz, and a few people who write operating systems and database kernels for a living, it gets very hard to find humans who can actually reason about threads well enough to be usefully productive.
When I give talks about this stuff, I assert that threads are a recipe for deadlocks, race conditions, horrible non-reproducible bugs that take endless pain to find, and hard-to-diagnose performance problems. Nobody ever pushes back.
Comment feed for ongoing:
From: Terry Jones (@terrycojones) (Sep 30 2009, at 16:00)
Hi Tim
I guess this wont be a popular choice, but using Python and Twisted to build asynchronous event-driven systems is a nice choice. If you have C Python underneath, you'll need to use separate processes to take advantages of multiple cores - but that's OK (IMO) and I think there's a trend in that direction too (Chrome is one example, I think, and maybe MSIE 7 is too). Twisted's documentation could be better and in ways it seems impenetrably complex, but it's wonderful once you get your head more or less around it. That's too much to expect for a casual program, but if you're building a serious system the investment is worth the time (again IMO). I wish the FriendFeed folks had put in more effort on the Twisted front rather than writing their own async Python web server. Twisted offers so much more, and is apparently a little faster too (not that that matters much).
Regards as ever,
Terry
[link]
From: Chris Anderson (Sep 30 2009, at 16:02)
I think the Erlang story of being great for concurrency as a side effect of a focus on reliability and redundancy is very instructive. Concurrent programs are a lot easier to reason about, when you can keep them simple. Message passing is simple. Force programmers to make non-functional operations explicit, and then you can optimize whole languages to fit many different parallelism architectures.
In the "Coders At Work" book, Joe says he feels like Erlang is the Smalltalk of concurrency, and that someone will learn from it, and build the Java or the Ruby of concurrency.
[link]
From: roberthahn (Sep 30 2009, at 17:19)
I would be interested in hearing what you would have to say about libdispatch too; it strikes me as interesting that Apple chose to solve the problem in a completely different way (than building a new language for it).
[link]
From: Alex Miller (Sep 30 2009, at 17:25)
While I don't disagree with your assertions about threaded programming and shared state, you omit the fact that if you do get it right, it can be darn fast, which is why all the other approaches ultimately rely on...threads and locks.
I maintain a concurrency link blog at http://concurrency.tumblr.com that harvests as many good links as I can find if you want to rifle through it.
[link]
From: Wes Felter (Sep 30 2009, at 18:02)
I see a Web bias here in the lack of OpenCL, OpenMP, Cilk++, Ct, TBB, GCD, etc.
[link]
From: Brett Morgan (Sep 30 2009, at 18:55)
I agree with the core of your post, but I disagree with one assertion.
I disagree with your assertion that concurrency in websites is a solved problem. I'm currently working on a content management system for a major newspaper here in Oz, and it has some serious concurrency issues.
And then there is the concurrency of tomorrow's web applications with Wave functionality...
[link]
From: Greg Pfister (Sep 30 2009, at 19:09)
There's another reason besides speed for parallelism. As I noted in http://bit.ly/jEWqQ , if you do the same thing, in the same time, using more, slower, processors, you use less power: Power rises linearly with the number of processors, but as the square of the speed of a processor. So, for example, a parallelized browser can use less battery life on your cellphone. Or fewer KVA in a data center.
Greg Pfister
[link]
From: Paul Bone (Sep 30 2009, at 19:43)
It's important to distinguish concurrency with parallelism. You begun with the problem of "harnessing the Multicores", a problem of parallelism, and then spoke about concurrency.
Lets have some definitions:
Parallelism is the ability of a program to harness multiple processors, (such as in a multi-core system).
Concurrency is being able to run more than one process at once, whether that process is a task in an OS or a thread in a parallel program.
What your post talks about is achieving parallelism through concurrency. Parallelism can be achieved without concurrency just as concurrency can be achieved on a system with a single processor.
I agree that these two things are strongly related but it's also important to (attempt to) achieve parallelism without concurrency, otherwise you're limiting the amount of parallelism that can be achieved.
This will be especially important when machines have dozens if not hundreds of cores, it will be difficult for a programmer to track this many concurrent actions. Your example of a web-server is an exception to this, other examples also exist.
Otherwise I completely agree with your post :-).
[link]
From: astrange (Sep 30 2009, at 20:24)
> No longer; Next year’s system will do more computing all right, but by giving you more CPUs, running at this year’s speed, to work with.
This isn't true yet - for instance, Nehalem/stupid new brand name is definitely faster than Penryn/Core 2.
I guess it's a good way to encourage people, though.
[link]
From: Greg Pfister (Sep 30 2009, at 20:24)
Oh, and I forgot to mention: Isn't this the same problem as the now-standard conundrum of finding a "killer app" for parallelism? (E.g., see http://bit.ly/wAyCh .)
(Apologies for the self-promotion - both this and the prior ref is to my blog. It's an issue I've worried about for a bit.)
Greg Pfister
[link]
From: Cedric (Sep 30 2009, at 21:17)
I don't buy it.
Java became mainstream because it simplified development in several dimensions at the same time, and also because developers started hurting really badly with C++. Compare the number of programmers who enjoyed programming in C++ back then to the number of developers who enjoy programming in Java today (my guess: 10% and 90% respectively).
Java is liked, popular and amazingly adaptable (I'm talking about the language + the platform here). Very few people want to replace it, really.
But for the sake of argument, let's assume that concurrency is ready for a disruptive technology. Well, Erlang is not it. Erlang might simplify a few things in terms of concurrency but the amount of manual work that you have to do is still enormous (e.g. you have to create and maintain your trees of supervisors), the syntax is absolutely alien to 95% of the developers, it's very primitive at the lowest level (just take a look at the error messages), has virtually no tool support (debugger, IDE, ...). Note also that the compromises forced upon developers by the Erlang approach (everything by value, no references and no aliasing) comes with its own set of problems that might end up being just as complex to deal with as the other problems that Erlang is trying to solve.
On top of that, the Java concurrent libraries are very powerful and the existing frameworks (application servers and the likes) allow developers to really have to think little about concurrency overall.
Neither concurrency nor functional programming are anywhere close to pushing Java away. Nothing is, really, for quite a few years.
At any rate, looking forward to your future posts on the subject.
[link]
From: Bob Aman (Sep 30 2009, at 21:36)
'In the "Coders At Work" book, Joe says he feels like Erlang is the Smalltalk of concurrency, and that someone will learn from it, and build the Java or the Ruby of concurrency.'
And I'm sure he's not the first to say it or think it. I certainly have believed this for quite some time. I've been working off and on on a side-project that aspires to be exactly this, but it's slow going, especially now with Google keeping me busy.
[link]
From: rns123@ukr.net (Sep 30 2009, at 23:17)
http://en.wikipedia.org/wiki/Grand_Central_Dispatch also looks very cool, albeit Apple-only.
[link]
From: Erik Onnen (Oct 01 2009, at 00:18)
I think you're overstating the complexity of multithreaded programming, much as functional advocates tend to overstate the evil of a for loop. If you follow emerging idioms like immutable message passing (generally possible in Java even), multithreaded programming isn't all that difficult.
Likewise, I think
[link]
From: JulesLt (Oct 01 2009, at 01:35)
Thoughts - there's a lot of reasons why Java is where it is today, but I think a major one is the whole C to C++ to Java lineage - it was syntactically familiar, and only added a few new concepts - compared to the complete leap that Smalltalk required (and lest we forget Ruby is a similar vintage to Java).
Once people got their head - properly - around OO concepts using the familiar syntax of Java, then we've seen an interest in the pure OO children of Smalltalk.
To me, that suggests the likely 'winner' will be something like Scala - that minimises the difference from Java - even if superior choices exist.
Agreed on comment about race conditions in HTTP apps - we've seen 2 in the last 2 months - but only when the client can make asynchronous calls. I think it's something inherent to the problem space.
GCD/libdispatch is open source (but not GPL so may be some difficulties integrating into Linux). As someone who deals with queueing middleware to pass jobs between server, that is what GCD looks like to me (a very low level, local job server).
No bad thing. My mind does wander as to whether you could create a 'desktop'/mobile platform that used a browser view for the UI (like Palm's WebOS) with a web server behind - threads as asynchronous HTTP requests.
And I would agree it's worth looking at, because I suspect it could be a good back end for a lot of language runtime developers - especially if integrated into the kernel as it is on OS X/XNU.
I think there's also scope for encouraging developers working above the framework level to work in such a way that it could become a lot easier to improve concurrency below - in much the same way that we encourage people to work to an MVC pattern today, even if they don't fully appreciate why, it strikes me that an asynchronous / callback style is inherently easier to make concurrent at the framework/language/runtime level than a synchronous call one.
[link]
From: Sjuul Janssen (Oct 01 2009, at 04:01)
Before I really learned how programming and concurrency work I've viewed this problem differently.
My view was that one core should have the task of identifying instructions that can be sent as a package. Those packages then could be sent to a worker core. And then when the workers are finished somehow weave the results together.
Note though that those packages != a thread but a set of processor instructions
Now that I understand programming and concurrency more, I see that this is an oversimplification of the problem.
Still... If this problem was solvable by hardware that would be ideal. And even with now that I understand it better I still think that the ideal solution would be a hardware solution and not a sofware one
[link]
From: Timothy Collett (Oct 01 2009, at 06:13)
Grand Central Dispatch itself may be Apple-only, but the bones of it, libdispatch, have been open-sourced, and can be found at http://libdispatch.macosforge.org/
[link]
From: Tony Fisk (Oct 01 2009, at 06:25)
Deadlocks should be solved by wrapping the locks in a resource handler (a la PIMPL or shared pointers) That way the lock can be released by the associated destructor when the reference goes out of scope.
Anyway, I thought the real issue was configuring your problem to take advantage of that parallelism. To do this, I think you'll need to finely chop the problem (maybe into individual instructions?) sort them by dependency (A and B need to occur before C etc.) and place them on a queue to be pulled through by the horde of threads/processes/oompa-loompahs. I recall a system like this being discussed in a computer architecture course over twenty years ago.
Lots of fun with load balancing as well (ie ensuring stuff gets chucked in just as quickly as stuff can get tossed out)
[link]
From: Tomas Zezula (Oct 01 2009, at 09:39)
I fully agree that mutable shared state should be eliminated as possible. The Erlang (Scala) actors are very good pattern to achieve high level concurrency by message passing. The good news is that the JSR 166 (Fork Join) should be a part of JDK 7.0 (Scala actors are based on "the same" framework).
Another interesting way to concurrency, with no need to eliminate shared state, used for example by Clojure is Software Transactional Memory.
Finally for some tasks the data flow concurrency may be the best choice but a bit tricky to use. Example of such a framework is gparallelizer built on groovy.
[link]
From: Bob Foster (Oct 01 2009, at 11:22)
Your page seems a bit Java-centric. What about F#, Axum, LINQ (and DryadLINQ)?
[link]
From: thefre (Oct 02 2009, at 09:49)
> Deadlocks should be solved by wrapping the locks in a resource handler (a la PIMPL or shared pointers) That way the lock can be released by the associated destructor when the reference goes out of scope.
Waouh. You just don't know what a deadlock is, and yet propose ways to avoid them (which is evidently completely HS, so won't help). This is really a dangerous attitude for a programmer. Please do something else.
[link]
From: Robert Young (Oct 05 2009, at 18:34)
To sum up:
- concurrency is being used to describe parallelism
- the problem with multi-core/processor machines is a plethora of slow(er) cores/processors
- (left unsaid) application programs are inherently linear
- the mantra from the 80's when parallel machines were first au courant (Thinking Machines, et al), there just aren't that many parallel problems to solve
Now, this situation is distinct from multi-user programs; database servers, operating systems, web servers and the like. 99.44% of application programmers won't have the luxury of such obvious problems.
The notion of slicing these inherently linear problems into pieces to be pawned off on other cores, just leads to each slice waiting for the data needed from its predecessor, and (to within an arbitrary delta) running no faster than if the linear problem had been executed on a single core in the first place. Perhaps slower, since the multi-core solution requires all that sync-ing code.
There is a reason Thinking Machines and the rest went belly up: there just ain't dat much problem for parallel machines. Now, I suppose some enterprising young buck could invent a problem to fit this solution, but that's kind of cheating, you know?
[link]
From: Chris Quenelle (Oct 06 2009, at 20:55)
I'm not sure the needs of concurrency are enough to push a new langauge into general usage. In the long run, I think languages will move towards increased ability to abstract away from implementation details, and allow the compiler and libraries to do more of the heavy lifting. I wrote more about my ideas in a blog posting.
http://quenelle.org/unix/2009/concurrent-language/
[link]
From: mikhailfranco (Oct 07 2009, at 01:45)
I also fully agree that mutable shared state should be eliminated as far as possible. One of the problems with Erlang is that it also eliminates mutability for non-shared private state,
i.e. the internal state of a process.
This means many common operations and calculations can be very expensive in performance and memory bloat. Add this to the fact that all strings are linked lists of characters, and it means both data arrays and strings are never going to be efficient.
Mik
[link]
From: Karl Armstrong (Oct 08 2009, at 07:34)
While thinking about this subject it occurred to me that if we are looking at a new language shift, it would be useful to also consider security as a fundamental design principle. For example, the language should assume that each module runs in separate address space (which has some obvious congruence with concurrency), and support semantics for very leak resistant private data within a module.
Ideally, each module should also be signed, and that could form the basis of a trust paradigm supported by the language.
A bit more of a reach would be intrinsic support for Mandatory Access Control in the language - Enforceable protection labels on data and modules/processes.
I am not aware of any language that currently focuses on these concepts, and unfortunately I think they are unlikely to arise spontaneously in the FOSS/education community. I'm sure you could find some enthusiasm for such an approach if some big player (Oracle???) pushed it.
[link]
From: Tim (but not THE Tim) (Oct 26 2009, at 19:36)
This is a tangent, but I think one that deserves some thought. It comes from my mainframe background where we timeslice the machine to serve thousands of users/transactions every minute:
Perhaps the best future for the multi-hundred core machines is not serving one highly-parallelized application, but rather reducing the cost of "the cloud" by allowing thousands of independent applications to operate in parallel?
There are still problems to solve, but many have some sort of solution - multi-processing solutions where multiple CPUs exist in one machine seem to me almost the same as mutiple-core machines; in fact I have a hard time understanding the difference.
[link]
From: Carson Gross (Nov 24 2009, at 23:40)
Late to this post, but...
What if there are no winners? What if it turns out that the majority of parallelism has already been mined in the apps where it matters? What if completely changing our approach to programming for this problem makes us all less productive? What if todays processors are fast enough?
Cheers,
Carson
[link]
From: Jacek (Nov 25 2009, at 05:21)
Carson, if there's no way to improve parallel apps and processor speeds, that would really only mean that Tim et al. are wasting their time, not yours. Maybe they'll come up with something useful, maybe not, but we can't really know that there's no way to go unless many have tried and failed, and even then a bright mind can uncover what all others to date have been blind to.
In other words, your questions are not productive. I've found myself asking similar questions in the past, but I've realized that posing such questions is itself not a useful thing for me to do.
[link]
From: SusanJ (Dec 16 2009, at 13:54)
@RobertYoung makes an important point. Since at least the Cray-1 vector computer (released in 1976) scientific programmers have been trying to make single runs faster through parallelization. Where is the input from that community?
It doesn't make sense to debate what a supportive language/tool would look like until you can specify what the goal is. Amdahl's Law still holds. It is also still true that a change of algorithm typically makes a bigger difference to efficiency than even an optimal implementation of the same-old algorithm.
[link]