The more I look at Clojure, the more I think it’s a heroic attempt to Do The Right Thing, in fact All The Right Things, as we move toward the lots-of-not-particularly-fast-cores future. I’m still wrapping my head around Clojure’s concurrency primitives. We come to understand the things we don’t by contrast with the things we do; so I’m finding contrast between the Clojure and Erlang approaches to messaging instructive.
[This is part of the Concur.next series. By way of motivation, consider the recent 100-core processor from Tilera. I quote from the coverage, which says that the new chip should offer “somewhere between seven and eight times the performance of the original Tile64 chip that debuted two years ago. That is well above and beyond the Moore’s Law curve, provided your workload can scale across lots of cores.” Which, absent a good Concur.next solution, most workloads can’t.]
Let’s suppose we’ve got a program we want to smash across lots of cores, and we want it to count things. The counts aren’t going to be reported out till the program finishes, so there’s no reason for the main workload to wait for the counting arithmetic. So the obvious thing to do is to run the counters off in their own thread(s), send them please-count messages (no need to wait for the answers), and the numbers will all get added up eventually.
In Erlang · Here’s how you might do it. Let’s take the simplest possible approach and have a separate process for each item you’re counting.
```erlang
-module(counter).
-export([new/0, incr/1, count/1, counter/0]).

new() ->
    Counter = spawn(fun counter:counter/0),
    Counter.

incr(Counter) ->
    Counter ! { incr }.

count(Counter) ->
    Counter ! { count, self() },
    receive
        { count, Count } ->
            Count
    end.

%- The counter process -------------------
counter() ->
    counter_loop(0).

counter_loop(Count) ->
    receive
        { incr } ->
            counter_loop(Count + 1);
        { count, Requestor } ->
            Requestor ! { count, Count },
            counter_loop(Count)
    end.
```
When a user calls the incr function, it just sends off a one-element tuple in fire-and-forget mode. The counter’s main loop, counter_loop, receives that message in its { incr } clause, increments its counter, and waits for the next message.
To find out what the count is, the count function sends off a tuple containing the atom count and your process ID, then waits for the counter to report.
This probably isn’t the world’s most idiomatic Erlang, but I think it illustrates message-passing pretty clearly.
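For comparison, here is roughly how the same counter might look in more conventional OTP style. This is a sketch, not a canonical version. gen_server supplies the receive loop, incr/1 becomes an asynchronous cast (still fire-and-forget), and count/1 becomes a synchronous call with gen_server’s default five-second timeout. The module name counter_server is made up for illustration.

```erlang
%% Hypothetical module: an OTP-style take on the counter above.
-module(counter_server).
-behaviour(gen_server).

%% Public API, same shape as the hand-rolled counter module.
-export([new/0, incr/1, count/1]).
%% gen_server callbacks.
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

new() ->
    {ok, Pid} = gen_server:start(?MODULE, 0, []),
    Pid.

%% Fire-and-forget: cast returns ok immediately.
incr(Counter) ->
    gen_server:cast(Counter, incr).

%% Synchronous: blocks until the server replies.
count(Counter) ->
    gen_server:call(Counter, count).

init(Initial) ->
    {ok, Initial}.

handle_call(count, _From, Count) ->
    {reply, Count, Count}.

handle_cast(incr, Count) ->
    {noreply, Count + 1}.

%% Stray messages are ignored rather than left in the mailbox.
handle_info(_Msg, Count) ->
    {noreply, Count}.

terminate(_Reason, _Count) ->
    ok.

code_change(_OldVsn, Count, _Extra) ->
    {ok, Count}.
```

Usage looks the same as before: counter_server:new(), then incr/1 and count/1 on the returned pid. The extra callbacks are the price of fitting OTP’s standard process shape.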
In Clojure · You’d use an agent, which is a reference to some data; here’s an example from Stuart Halloway’s Programming Clojure book. First, you create an agent holding the initial value 0 and keep a reference to it:
(def counter (agent 0))
Then, to increment it, you send it a function; since Clojure has a built-in inc function, this is pretty easy.
(send counter inc)
Now, you could have sent over any old function that could apply to whatever counter referred to (in this case an integer) and have that function applied; you can send arguments too.
Clojure takes care of maintaining a thread pool and sequencing the messages to the agent. If there’s a chance that the function might block, you can use send-off rather than send; it uses an expandable thread pool.
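To make the contrast concrete, here is a rough emulation of a Clojure agent in Erlang. It is a process that holds a value and applies whatever function it is sent, the way (send a f x) sets the agent’s state to (f state x). This is a hypothetical sketch: agent_sketch, send_fun, and deref are made-up names. It models neither Clojure’s thread pools nor send-off, nor agent error handling, nor the fact that a real agent’s value can be read without a message round-trip.

```erlang
%% Hypothetical emulation of a Clojure agent; not a real library.
-module(agent_sketch).
-export([new/1, send_fun/2, send_fun/3, deref/1]).

%% Plays the role of (agent State).
new(State) ->
    spawn(fun() -> loop(State) end).

%% Like (send a f): asynchronous, returns immediately.
send_fun(Agent, Fun) ->
    send_fun(Agent, Fun, []).

%% Like (send a f x y ...): the new state is apply(Fun, [State | Args]).
send_fun(Agent, Fun, Args) when is_function(Fun), is_list(Args) ->
    Agent ! {send, Fun, Args},
    ok.

%% Unlike @a in Clojure, reading here costs a message round-trip.
deref(Agent) ->
    Ref = make_ref(),
    Agent ! {deref, self(), Ref},
    receive
        {Ref, Value} -> Value
    end.

loop(State) ->
    receive
        {send, Fun, Args} ->
            %% Actions run one at a time, in arrival order.
            loop(apply(Fun, [State | Args]));
        {deref, From, Ref} ->
            From ! {Ref, State},
            loop(State)
    end.
```

The receiving side here has no pattern-matching vocabulary of its own. It simply applies what it is given, which is exactly the difference discussed further down.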
Anyhow, to find the current value, you just ask; either of these will do.
(deref counter)
@counter
Different Strokes · The differences between these two approaches are extremely interesting. I’m wondering if the class of things you can do with one but not the other is very big or interesting; I suspect that both can be made isomorphic to the Actor model of computation, although the mapping from Erlang is more direct.
[Update:] Thanks to Phil Hagelberg for this comment, which links to a detailed discussion of the Clojure/Erlang trade-offs hosted by Bill Clementson. I see lots of stuff to argue about there...
I’ll dig into this some more (I’m doing a Wide Finder in Clojure to get some hands-on), but I place a certain value on first impressions and thought it might be useful to capture some.
This Is Important · Not all the practical, sane, lock-free solutions to the concurrency conundrum involve message-passing; for example, see Clojure’s Atoms and Refs. But I do think that message-passing is important: it presents a relatively simple mental model, it scales naturally outside the shared-global-memory space, and it maps reasonably well onto the semantics actually provided by the underlying operating systems.
Ceremony · Clojure has a lot less. You don’t have to create threads or take care of sending and receiving messages. The Clojure counter takes three lines of code, as opposed to 29 in Erlang. This is good.
To be fair, a user of the counter module would only ever have to say incr(Counter) and count(Counter). But still, the whole implementation of both sides is just three lines of Clojure.
I do wonder whether there are cases where I’d miss Erlang’s direct control over the process inventory. Sometimes I might be counting twelve things and sometimes twelve thousand.
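One way to handle the twelve-versus-twelve-thousand question is to avoid spawning a process per counter at all and instead keep every count in a map inside a single process. Here is a sketch under that assumption. The module name tally is hypothetical, and whether one process becomes a bottleneck depends on the workload.

```erlang
%% Hypothetical module: many keyed counters behind one process.
-module(tally).
-export([new/0, incr/2, count/2]).

new() ->
    spawn(fun() -> loop(#{}) end).

%% Fire-and-forget, like counter:incr/1, but keyed.
incr(Tally, Key) ->
    Tally ! {incr, Key},
    ok.

%% Synchronous read; keys never incremented count as zero.
count(Tally, Key) ->
    Ref = make_ref(),
    Tally ! {count, Key, self(), Ref},
    receive
        {Ref, Count} -> Count
    end.

%% Counts is a map of Key => Count.
loop(Counts) ->
    receive
        {incr, Key} ->
            loop(maps:update_with(Key, fun(N) -> N + 1 end, 1, Counts));
        {count, Key, From, Ref} ->
            From ! {Ref, maps:get(Key, Counts, 0)},
            loop(Counts)
    end.
```

Because processes are cheap, both designs are plausible in Erlang. The point is that the choice is explicit, whereas with Clojure agents the thread pool makes it for you.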
Coupling · Erlang has a lot less. I don’t need to know what kind of a thing I’m talking to, the messages I send it are just chunks of data, and it gets to decide how to deal with them. In particular, it can trivially ignore those it doesn’t care about.
This seems good to me. But maybe I’m kidding myself here; as a developer, wouldn’t I normally be controlling both sides of the conversation? And as I look at actual Erlang code, I notice that in practice a lot of the messages are along the lines of { opcode, argument, argument, ... }, which smell a lot like dressed-up function calls.
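Here is what “trivially ignore” can look like in practice, as a hypothetical variant of the counter loop, not part of the original module. It adds a guarded clause for incrementing by a positive step and ends with a catch-all clause that drops anything it does not recognize. Without that last clause, unmatched messages would simply accumulate in the process’s mailbox.

```erlang
%% Hypothetical variant of the counter; names are illustrative.
-module(tolerant_counter).
-export([new/0, count/1]).

new() ->
    spawn(fun() -> loop(0) end).

count(Counter) ->
    Ref = make_ref(),
    Counter ! {count, self(), Ref},
    receive
        {Ref, Count} -> Count
    end.

loop(Count) ->
    receive
        {incr} ->
            loop(Count + 1);
        %% The guard restricts this clause to positive integer steps.
        {incr, N} when is_integer(N), N > 0 ->
            loop(Count + N);
        {count, From, Ref} ->
            From ! {Ref, Count},
            loop(Count);
        _Unknown ->
            %% Anything else is discarded instead of piling up.
            loop(Count)
    end.
```

Sending {incr}, {incr, 5}, {incr, -3}, and {reset, please} to a fresh counter leaves it at 6. The last two messages fall through to the catch-all.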
Pattern Matching · I think this is actually the important difference between the approaches. The message you send to a Clojure agent is just a function, and the system goes about applying it like any other. On the other hand, the receiving end in Erlang gets to pattern-match, which feels natural and looks readable. Is this the proverbial Computer Science one-more-level-of-indirection which turns out to solve an important problem?
Scaling · My use-case here, counting integers, is awfully limited. I’ve built up a sense of what Erlang message wrangling is like in practice at significant-program scale. It’s one of the reasons I’m positive on Erlang: it’s easy to explain and easy to understand (well, compared to locking). If, when I stress Clojure a bit, I find it even easier, that’d be a big selling point.
If only Clojure weren’t a Lisp...
Comment feed for ongoing:
From: Kevin Smith (Oct 26 2009, at 17:03)
Disclaimer: I derive a significant portion of my income using and helping others use Erlang.
I like Erlang because it's a pretty small language (once you get past the syntax and fixed variables) with just a few core concepts. Message passing and pattern matching are pretty easy to understand and use. Message passing is even more powerful when you consider the fact it works across Erlang VM boundaries fairly transparently, too. I also appreciate Erlang's flexible combination of dynamic typing and pattern matching.
OTOH, it seems like Clojure can't make up its mind what kind of concurrency primitives it wants. I'm probably wrong -- I've only written a few toy Clojure programs -- but Clojure's array of concurrency primitives seems a bit much. I'm not sure when to use atoms versus refs versus agents in some cases. I'm sure my confusion would go away if I spent more time writing Clojure but that's my current predicament. Anecdotally, many of my friends have had the same reaction on their first few exposures to Clojure, too.
Message passing, especially distributed message passing, seems to be a really good fit for the kinds of concurrency problems we're facing now and in the future. It's a developer friendly mental model with good performance characteristics. Even if Erlang isn't the language of the future I'd like to think its message passing model and the Actor model in general will live on.
From: Kevin Scaldeferri (Oct 26 2009, at 17:05)
"Is this the proverbial Computer Science one-more-level-of-indirection which turns out to solve an important problem?"
Offhand, I'm thinking that it's unclear how the Clojure approach deals with (a) distributed systems and (b) live code updating, both of which are handled transparently by the Erlang approach.
These are both really different angles coming at the same root question which is: when you say "(send counter inc)", what exactly do you mean by "inc"?
From: Simon Koss (Oct 26 2009, at 17:48)
The key differences between Clojure agents and Erlang actors are:
- An agent's state can be shared without message passing. An actor's state cannot be shared.
- An actor has its own logical thread of execution, an agent does not.
- An actor can selectively receive messages from its mailbox. An agent has no control over the order in which functions are applied to it.
From: DGentry (Oct 26 2009, at 17:52)
Many of the decisions in the design of Erlang seem to have been driven by the requirements of the messaging system, optimized for the SMP systems which were current at the time. For example, immutability allows the runtime to reference items by address without having to worry about whether the sender tries to write to them again before the receiver reads them (or before the messaging service copies them into the packet).
For the common case of multiple Erlang processes running within a single Erlang virtual machine (BEAM) in a single address space, I believe the runtime never has to worry about copy-on-write or dirtying of objects. It can pass pointers around and count references; the objects are immutable by design.
This is all well and good, but systems with lots of cores tend to be NUMA. Passing a pointer isn't automatically ideal; I suspect that sometimes it really would be better to copy the message to the remote core's local memory. In principle I guess Erlang's messaging system could try to infer whether to use a pointer versus a copy, but I'm not sure that would be practical. A system where the messaging is explicitly a copy might actually work better.
From: Chris Anderson (Oct 26 2009, at 18:34)
Erlang maintains an independent heap for each lightweight process, which helps with cache coherency when the scheduler switches processes in and out. Instead of holding pointers all over the place, Erlang processes have high memory locality. Once a process is running most operations will avoid access to off-core memory.
From: Zach Tellman (Oct 26 2009, at 18:58)
Kevin:
'inc' is a function that increments a number. (inc 1) will return 2.
I don't think there's an official Clojure solution for either of those problems. I expect that there will be several libraries that try to tackle the distributed computing issue, and that at least some of them will use Erlang as glue.
Out of curiosity, though, how does Erlang handle transactional updates across several processes? It seems like the atomicity is limited to a single process, unless you use Mnesia or something. I'm certainly not an expert, though, so maybe there's something I'm not aware of.
From: Mark Reid (Oct 26 2009, at 19:19)
Regarding the second Kevin's comment, it is worth noting that Clojure wasn't intended for distributed computing, only single machine concurrent programming. There are efforts underway to do distributed computing with Clojure (search for "Clojure Terracotta") but that's not Clojure's main focus.
Regarding the plethora of primitives for concurrency mentioned by the first Kevin, have a look at Rich Hickey's InfoQ talk entitled "Persistent Data Structures and Managed References". He clearly articulates what needs to happen when dealing with states and values to have a chance of reasoning about concurrency. Once that's done, there are several kinds of semantics you can give to references that are consistent with those constraints.
The various models correspond to whether you want coordination or not and, independently, synchronisation or not (e.g., agents = uncoordinated + unsynchronised ; refs = coordinated + synchronised ; atoms = uncoordinated + unsynchronised). You pick the model you want based on what the problem you are solving requires.
From: Stan Dyck (Oct 26 2009, at 20:12)
@Simon: Clojure agents *can* selectively receive updates. When you create an agent you can pass it an optional validator function that can be used to reject attempts to update its state. Not quite as sophisticated as Erlang's and Scala's pattern matching, but more than nothing.
From: Phil (Oct 26 2009, at 20:14)
There's a good discussion here of why Rich (author of Clojure) decided not to implement erlang-style distributed concurrency: http://bc.tech.coop/blog/081201.html
From: Jeff Rose (Oct 26 2009, at 20:47)
As someone who came to Clojure after going through C++, Java, Python and Ruby, I can say that being a Lisp hasn't been so bad. It was awkward in the beginning, no doubt, and I found it hard to read other people's code. After my first couple months of programming in Clojure though, I found it easier to read than Ruby code, and macros make Ruby's meta-programming facilities look both weak and overcomplicated. I think after the first few weeks the phobia of parentheses goes away, and then it just takes a while to learn the standard library and the many ways that the basic sequence manipulation functions can be composed to do interesting things.
From: Kevin Scaldeferri (Oct 26 2009, at 22:47)
@Zach,
You've missed my point. You believe that you know what "inc" does, and what it will always do. However, in a distributed system, or a system which allows hot code upgrades, such an assumption is not necessarily true. The node that you send to might be running different code. Or, even within a single node, by the time your message is received you might be running different code. So, when you say that you're telling an agent to run the function "inc", there's an ambiguity that you need to clarify.
(I realize that this point seems strained and artificial, but that's because this example is artificial. Yes, I imagine it's very, very unlikely that anyone is going to redefine "inc". But in real distributed systems, this is a serious issue.)
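For reference, here is the Erlang mechanism behind live code updating, reduced to a minimal sketch. A process loop that calls itself by its fully qualified name (?MODULE:loop/1) continues in the newest loaded version of its module on the next iteration, while a plain local call would stay in the old version. The module name is hypothetical, and production systems would normally drive upgrades through OTP’s release handling rather than by hand.

```erlang
%% Hypothetical module illustrating upgrade-friendly recursion.
-module(upgradable_counter).
%% loop/1 must be exported for the fully qualified call to work.
-export([new/0, loop/1]).

new() ->
    spawn(fun() -> loop(0) end).

loop(Count) ->
    receive
        {incr} ->
            %% Fully qualified call: if a newer version of this module
            %% has been loaded, execution continues in the new code.
            ?MODULE:loop(Count + 1);
        {count, From, Ref} ->
            From ! {Ref, Count},
            ?MODULE:loop(Count)
    end.
```

This is also why the question “what exactly do you mean by inc?” has a concrete answer in Erlang: whichever version of the module is current when the call is made.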
From: Simon Koss (Oct 26 2009, at 22:49)
@Stan: An Erlang agent can select messages from the mailbox using a pattern. If a message is not selected, then it stays in the mailbox. This allows an Erlang agent to initiate an "rpc" to another agent and then wait for the response. Other messages sent to the agent are queued in the mailbox. Clojure validators cannot do this. (Note that Erlang uses the term "process" for an agent.)
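Here is a small sketch of the pattern Simon describes. The caller tags its request with a unique reference from make_ref(), then waits in a receive that matches only a reply carrying that reference, so unrelated messages already in its mailbox stay queued for later. The module, rpc/2, and the message shapes are assumptions for illustration.

```erlang
%% Hypothetical module demonstrating selective receive for an RPC.
-module(selective).
-export([rpc/2, echo_server/0]).

%% Send Request to Server and wait only for the matching reply.
rpc(Server, Request) ->
    Ref = make_ref(),
    Server ! {request, self(), Ref, Request},
    receive
        {reply, Ref, Reply} -> Reply
    after 5000 ->
        erlang:error(timeout)
    end.

%% A trivial server that echoes each request back to its sender.
echo_server() ->
    spawn(fun Loop() ->
              receive
                  {request, From, Ref, Req} ->
                      From ! {reply, Ref, Req},
                      Loop()
              end
          end).
```

If the caller already has, say, unrelated_message waiting when it calls rpc/2, that message is skipped by the receive and is still there afterwards.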
From: Tim Dysinger (Oct 26 2009, at 22:50)
I have had the pleasure of working with both languages. They both have their place.
Rich explains why he didn't use an actor-like setup in Clojure here http://clojure.org/state in the middle under "Message Passing and Actors". He makes some good points (i.e., unless you are doing distributed programming, you pay a price for message passing).
We have developed some neat, simple message passing on top of Clojure on our project using RabbitMQ and Clojure sequences & actors. So :P :)
Clojure wins for us by keeping it functional, being awesome on cores and giving access to a hundred thousand java libraries.
From: Attila Szegedi (Oct 27 2009, at 01:46)
@Mark Reid, you made the exactly same comment I wanted to make regarding the concurrency semantics of various reference primitives, except you unfortunately have the atoms wrong: atoms are synchronized. (I know you know it - just a typing error).
So, to repeat: there are four reference types[*] in Clojure, and they differ in three concurrency attributes, namely: synchronicity (whether the change-initiating thread waits for the change to complete), coordination (of multiple changes across multiple references - transactions, basically), and value sharing (among threads):
Vars: unshared (thread-local), synchronous (consequence of thread-locality), uncoordinated (consequence of thread-locality).
Atoms: shared, synchronous, uncoordinated. They're basically a compare-and-swap.
Agents: shared, asynchronous, uncoordinated. Change functions are queued, and each function is run by agent sequentially using a thread pool.
Refs: shared, synchronous, coordinated (transacted). They're the software transactional memory implementation.
Also worth noting is that agents are integrated with the STM system - any sends to an agent that are initiated from within an STM transaction will get delayed until the transaction commits (otherwise the transactions could have external side-effects via agent sends).
[*] Rich used to call them "reference types", don't know if the term is still current.
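Attila’s description of atoms as “basically a compare-and-swap” can be made concrete. Erlang has no atom reference type, but its atomics module (OTP 21.2 and later) provides compare_exchange/4, and a swap!-style update becomes a retry loop around it. This is an illustrative sketch of the technique under those assumptions, not how Clojure implements atoms internally, and it only stores 64-bit integers. The module and function names are hypothetical.

```erlang
%% Hypothetical module: a swap!-style update via compare-and-swap.
-module(cas_sketch).
-export([new/1, swap/2, value/1]).

%% A single signed 64-bit integer cell.
new(Initial) ->
    Ref = atomics:new(1, [{signed, true}]),
    atomics:put(Ref, 1, Initial),
    Ref.

value(Ref) ->
    atomics:get(Ref, 1).

%% Apply Fun to the current value. If another process changed the
%% cell in the meantime, compare_exchange fails and we retry.
swap(Ref, Fun) ->
    Old = atomics:get(Ref, 1),
    New = Fun(Old),
    case atomics:compare_exchange(Ref, 1, Old, New) of
        ok -> New;
        _Current -> swap(Ref, Fun)
    end.
```

As with a Clojure atom, the update function may run more than once under contention, so it should be free of side effects.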
From: Kevin Smith (Oct 27 2009, at 05:54)
I agree with Tim. I think Clojure's sweet spot is an app which needs some sanity around concurrency -- and what app these days doesn't? -- and needs either the JVM or some set of Java libraries.
Personally, I shudder a little bit at the prospect of doing distributed stuff with any JVM language since I've started using Erlang. The simplicity really is a game changer if you're doing any amount of distributed work.
From: DGentry (Oct 27 2009, at 06:15)
> Chris Anderson (Oct 26 2009, at 18:34)
> Erlang maintains an independent heap for each
> lightweight process, which helps with cache
> coherency when the scheduler switches
> processes in and out. Instead of holding
> pointers all over the place, Erlang processes
> have high memory locality. Once a process is
> running most operations will avoid access to
> off-core memory.
Data which a LWP writes will be stored in its own heap. When it sends a message to another LWP, does it point to the data in its local heap? So the recipient of the message will be making remote reads?
I may be emphasizing this too much, in that the recipient will extract whatever information it needs from the message and then operate on local data to handle the operation. My point is that the massively multicore world has somewhat different performance characteristics from the SMP world that Erlang evolved in.
From: Robert Virding (Oct 27 2009, at 08:12)
@DGentry How an Erlang implementation decides to handle processes and heaps is an implementation detail. It can, if it so wishes, share data between processes and handle messages by sending pointers or it can have separate process heaps and copy message data between them. Or it could decide to use a mixture depending on how to best use the underlying architecture. It is not a language detail, all that the language Erlang requires is that data structures are immutable and that the only way to communicate between processes is by sending messages. I think it is critical to be aware of this distinction otherwise you will get a skewed argument.
Another point I wish to make is that, irrespective of which is the "best" way to implement concurrency and communication, using the actor model with explicit asynchronous messages is a very good way of describing many concurrent applications. It was ideal, if not the best, for telecoms apps which is why we chose it. It is also very suited to many (most?) of today's internet apps. It just seems to map easily into many real world applications.
A third point is that the Erlang concurrency model is not just actors and asynchronous message passing. The error handling mechanisms are a central part of it, a fact which is often forgotten and tends to result in comparisons between apples and orchards. Any implementation which purports to implement Erlang style concurrency which does not include the error handling is just lacking.
And finally, when we first started designing Erlang and its concurrency model we had not heard of actors. At least I hadn't, and I did not "discover" it until much later.
From: Kevin Smith (Oct 27 2009, at 08:17)
Re: Erlang, SMP, and NUMA. The cool thing is the OTP team at Ericsson is well aware of this and actively working on enhancing Erlang to run more efficiently on NUMA machines. Ken Lundin, the manager of the OTP group, gave an informative talk outlining his group's plans on just this topic at the London Erlang Factory: http://video.yahoo.com/watch/5420968/14278461
From: Stuart Halloway (Oct 27 2009, at 09:37)
I don't think you can easily make Clojure's agents isomorphic to actors. Agents are deadlock-proofed, in the sense that you cannot await an agent from within another agent. This is sensible with an in-process model, but wouldn't make sense (IMO) for an actor model.
From: Robert Virding (Oct 27 2009, at 09:59)
@DGentry I don't think I was as clear as I wished in my comment. The design of Erlang was driven by the concurrency model, processes and asynchronous messaging, but the design was not optimised by hardware considerations, SMP or otherwise. The concurrency model and the error handling were driven by the needs of the type of applications which we were considering. Looking at it from this level, however, does allow us to optimise the *implementation* for various types of hardware.
From: James Swift (Oct 27 2009, at 16:56)
@Kevin
"You believe that you know what "inc" does, and what it will always do."
Forgive me if I'm missing the point as I'm relatively new to Clojure, but could you solve this by having the 'data' in an agent actually be the code that you want to hot-swap? E.g. agent data could be some real data and some data that is code which the sent function could change.
Not particularly clean compared to Erlang but definitely possible.
From: Mark (Oct 29 2009, at 00:45)
Although Clojure has several concurrency primitives, I always tell people that "when in doubt, use refs". It's really hard to go wrong with refs. As you work more with refs, you'll begin to notice cases where maybe atoms or agents are more suitable, but if you're not sure, just go ahead and use refs.
From: Ulf Wiger (Oct 29 2009, at 12:14)
A place where loose coupling comes in handy is in very large and complex systems that keep evolving over a long period of time. This is very typical of telecoms systems, and our practical experience of Erlang is that it shines in this regard.
Many of the defining aspects of Erlang are there exactly because hard-earned experience at Ericsson identified them as vital for writing complex, robust, constantly evolving, stateful message-passing systems. Their utility may not always show up in trivial examples, but when the feature requirements start piling up - that's when you start realizing why they're there.
From: Tom Faulhaber (Oct 30 2009, at 11:52)
re: Pattern Matching
While it's true that Clojure doesn't implement pattern matching natively, because it *is* a Lisp, it's easy to add it. Tim Lopez has an implementation here: http://www.brool.com/index.php/pattern-matching-in-clojure.
It doesn't map so well to this example because the example is so simplified (in fact you'd certainly use an atom for this in Clojure).
Also, many Erlang message patterns would be encapsulated by multimethod dispatch in Clojure. This isn't quite as general as pattern matching, but works for many cases.
But these comparisons are all hard to make, because message passing and Clojure agents are really not isomorphic, nor are meant to be.
From: Ulf Wiger (Oct 30 2009, at 17:01)
There have been applications written in Erlang where the messages passed to some processes were indeed just a function call spec (an "MFA" in Erlang speak, or a {Module, Function, Args} tuple). The idea is that this would be very convenient, since you save a lot of typing otherwise needed to specify _what_ specific functions the server is intended to serve.
This style of programming has its uses (indeed, the rpc module in OTP works exactly like this), but experience shows that it makes for a terrible style of concurrent GOTO programming that quickly renders your code unreadable unless you can contain it all within a fairly small body of code.
From: Vsevolod Dyomkin (Oct 31 2009, at 07:32)
> If only Clojure weren’t a Lisp...
Why won't you continue, if only Erlang weren't a Prolog, or if only Ruby weren't Perl?.. ;)
From: Travis Whitton (Nov 02 2009, at 10:56)
@Kevin Smith
I don't think Clojure has a hard time making up its mind; rather, it offers different tools for different jobs. Refs, agents, and vars all serve distinctly different purposes. Sorting them out is pretty easy once you know the basics.
Vars - Thread local mutable storage locations. Good for things like "flags" that need to change within a thread without coordinating with the outside world.
Refs - Synchronized mutation and access to values across threads. Great for reading and writing a value (or set of values) from multiple threads in a controlled fashion.
Agents - Action oriented async state handlers. If you want to fire and forget then come back to check on a value later, agents are what you're looking for.
From: Mauricio Arango (Nov 09 2009, at 07:58)
Agree messaging offers a simple model to reason about building concurrent applications.
However, Erlang and other languages supporting the Actor model only offer direct messaging, which has limitations when the number of tasks in an application is large and variable. As the number of tasks and types of tasks in a parallel application grows, it becomes extremely difficult for developers to program logic that keeps track of the names and addresses of other tasks. The problem becomes increasingly complex because the number of interacting tasks could expand or contract based on service load.
Indirect messaging solves the above problem through decoupling of communicating entities. With indirect communication, application tasks don't need to know about each other, they communicate by placing messages in an intermediate entity, tasks interested in certain types of messages access the intermediate entity and consume the messages they are interested in. The only addresses that the communicating tasks need to know are the access points to the intermediate entity. This is referred to as space decoupling. Indirect communication also enables time decoupling between application tasks, which means that a task can send a message without its receiver being active or in existence.
There are three main types of indirect messaging mechanisms: Tuple Space, Message Queue and Publish/Subscribe.
An ideal Erlang enhancement would be shared queues or tuple spaces, leveraging the language's rich tuple management and pattern matching features.
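As a rough illustration of the indirect style Mauricio describes, here is a shared queue built from ordinary Erlang processes. A broker process holds the queue; producers enqueue items without knowing who will consume them, and consumers dequeue items without knowing who produced them. Only the broker’s pid needs to be shared. This is a minimal sketch with hypothetical names. It has none of the persistence, pattern-based retrieval, or distribution that a real tuple space or message queue would offer.

```erlang
%% Hypothetical module: a minimal shared-queue broker.
-module(broker).
-export([new/0, enqueue/2, dequeue/1]).

new() ->
    spawn(fun() -> loop(queue:new(), queue:new()) end).

%% Producer side: fire-and-forget.
enqueue(Broker, Item) ->
    Broker ! {enqueue, Item},
    ok.

%% Consumer side: blocks until an item is available.
dequeue(Broker) ->
    Ref = make_ref(),
    Broker ! {dequeue, self(), Ref},
    receive
        {Ref, Item} -> Item
    end.

%% Items holds undelivered items; Waiting holds consumers with no item yet.
loop(Items, Waiting) ->
    receive
        {enqueue, Item} ->
            case queue:out(Waiting) of
                {{value, {From, Ref}}, Rest} ->
                    From ! {Ref, Item},
                    loop(Items, Rest);
                {empty, _} ->
                    loop(queue:in(Item, Items), Waiting)
            end;
        {dequeue, From, Ref} ->
            case queue:out(Items) of
                {{value, Item}, Rest} ->
                    From ! {Ref, Item},
                    loop(Rest, Waiting);
                {empty, _} ->
                    loop(Items, queue:in({From, Ref}, Waiting))
            end
    end.
```

Items are handed out in FIFO order, and a consumer that arrives before any producer simply waits in line. That provides the time decoupling described above, though only while the broker itself is alive.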