Why is the commute function called twice when changing a ref in Clojure?
I've done some experiments in order to understand how commute
works. I would like to break my explanation into 3 parts:
- Compare and Set Semantics
alter
commute
Compare and Set Semantics
I think Clojure for the Brave and True has explained it quite well:
swap!
implements "compare-and-set" semantics, meaning it does the following internally:
- It reads the current state of the atom
- It then applies the update function to that state
- Next, it checks whether the value it read in step 1 is identical to the atom's current value
- If it is, then swap! updates the atom to refer to the result of step 2
- If it isn't, then swap! retries, going through the process again with step 1.
swap!
is for atom
, but knowing it will help us to understand alter
and commute
, because they got use similar method to update ref
.
Unlike atom
, ref
modification (through alter
, commute
, ref-set
) has to be wrapped inside a transaction. When transaction start (or retry), it will capture a snapshot of all containing ref
(because alter
needs it). ref
will be modified only when the transaction is committed.
alter
In a transaction, all ref
that will be modified by alter
form a group. If any one of the ref
in the group fails its alteration, transaction will be retried. Basically alter
does the following:
- Compare its altering
ref
to the snapshot captured by transaction. If they look different, retry the transaction; else - Create a new state from the snapshot using function provided.
- Compare
ref
to the snapshot again. If they look different, retry the transaction; else - Try to write-lock the
ref
, don't let anybody modify it until end of this transaction trial. If failed (ref
has already been locked), wait for a while (e.g. 100 ms), then retry the transaction. - Tell transaction to update this
ref
into new state when it perform commission.
Let's demonstrate a smooth alteration. First, we are going to create a thread t1
to alter
3 counters c1
, c2
and c3
with slow-inc
.
(ns testing.core)
(def start (atom 0)) ; Record start time.
(def c1 (ref 0)) ; Counter 1
(def c2 (ref 0)) ; Counter 2
(def c3 (ref 0)) ; Counter 3
(defn milliTime
"Get current time in millisecond."
[]
(int (/ (System/nanoTime) 1000000)))
(defn lap
"Get elapse time since 'start' in millisecond."
[]
(- (milliTime) @start))
(defn slow-inc
"Slow increment, takes 1 second."
[x x-name]
(println "slow-inc beg" x-name ":" x "|" (lap) "ms")
(Thread/sleep 1000)
(println "slow-inc end" x-name ":" (inc x) "|" (lap) "ms")
(inc x))
(defn fast-inc
"Fast increment. The value it prints is incremented."
[x x-name]
(println "fast-inc " x-name ":" (inc x) "|" (lap) "ms")
(inc x))
(defn -main
[]
;; Initialize c1, c2, c3 and start.
(dosync (ref-set c1 0)
(ref-set c2 0)
(ref-set c3 0))
(reset! start (milliTime))
;; Start two new threads simultaneously.
(let [t1 (future
(dosync
(println "transaction start |" (lap) "ms")
(alter c1 slow-inc "c1")
(alter c2 slow-inc "c2")
(alter c3 slow-inc "c3")
(println "transaction end |" (lap) "ms")))
t2 (future)]
;; Dereference all of them (wait until all 2 threads finish).
@t1 @t2
;; Print final counters' values.
(println "c1 :" @c1)
(println "c2 :" @c2)
(println "c3 :" @c3)))
And we got this:
transaction start | 3 ms ; 1st try
slow-inc beg c1 : 0 | 8 ms
slow-inc end c1 : 1 | 1008 ms
slow-inc beg c2 : 0 | 1009 ms
slow-inc end c2 : 1 | 2010 ms
slow-inc beg c3 : 0 | 2010 ms
slow-inc end c3 : 1 | 3011 ms
transaction end | 3012 ms
c1 : 1
c2 : 1
c3 : 1
Smooth procedure. No surprise.
Let's see what will happen if a ref
(let's say, c3
) is modified before its alteration ((alter c3 ...)
). We'll modify it during alteration of c1
. Edit the let
's binding of t2
into:
t2 (future
(Thread/sleep 900) ; Increment at 900 ms
(dosync (alter c3 fast-inc "c3")))
Result:
transaction start | 2 ms ; 1st try
slow-inc beg c1 : 0 | 7 ms
fast-inc c3 : 1 | 904 ms ; c3 being modified in thread t2
slow-inc end c1 : 1 | 1008 ms
slow-inc beg c2 : 0 | 1009 ms
slow-inc end c2 : 1 | 2010 ms
transaction start | 2011 ms ; 2nd try
slow-inc beg c1 : 0 | 2011 ms
slow-inc end c1 : 1 | 3012 ms
slow-inc beg c2 : 0 | 3013 ms
slow-inc end c2 : 1 | 4014 ms
slow-inc beg c3 : 1 | 4015 ms
slow-inc end c3 : 2 | 5016 ms
transaction end | 5016 ms
c1 : 1
c2 : 1
c3 : 2
As you can see, in step 1 of 1st-try-(alter c3 ...)
, it realize c3
(val = 1) looks different to the snapshot (val = 0) captured by transaction, so it retry the transaction.
Now, what if a ref
(let's say, c1
) was modified during its alteration ((alter c1 ...)
)? We will modify c1
on thread t2
. Edit the let
's binding of t2
into:
t2 (future
(Thread/sleep 900) ; Increment at 900 ms
(dosync (alter c1 fast-inc "c1")))
Result:
transaction start | 3 ms ; 1st try
slow-inc beg c1 : 0 | 8 ms
fast-inc c1 : 1 | 904 ms ; c1 being modified in thread t2
slow-inc end c1 : 1 | 1008 ms
transaction start | 1009 ms ; 2nd try
slow-inc beg c1 : 1 | 1009 ms
slow-inc end c1 : 2 | 2010 ms
slow-inc beg c2 : 0 | 2011 ms
slow-inc end c2 : 1 | 3011 ms
slow-inc beg c3 : 0 | 3012 ms
slow-inc end c3 : 1 | 4013 ms
transaction end | 4014 ms
c1 : 2
c2 : 1
c3 : 1
This time, in step 3 of 1st-try-(alter c1 ...)
, it find out that ref
has been modified, so it call for a transaction retry.
Now, let's try to modify a ref
(let's say, c1
) after its alteration ((alter c1 ...)
). We'll modify it during alteration of c2
.
t2 (future
(Thread/sleep 1600) ; Increment at 1600 ms
(dosync (alter c1 fast-inc "c1")))
Result:
transaction start | 3 ms ; 1st try
slow-inc beg c1 : 0 | 8 ms
slow-inc end c1 : 1 | 1009 ms
slow-inc beg c2 : 0 | 1010 ms
fast-inc c1 : 1 | 1604 ms ; try to modify c1 in thread t2, but failed
fast-inc c1 : 1 | 1705 ms ; keep trying...
fast-inc c1 : 1 | 1806 ms
fast-inc c1 : 1 | 1908 ms
fast-inc c1 : 1 | 2009 ms
slow-inc end c2 : 1 | 2011 ms
slow-inc beg c3 : 0 | 2012 ms
fast-inc c1 : 1 | 2110 ms ; still trying...
fast-inc c1 : 1 | 2211 ms
fast-inc c1 : 1 | 2312 ms
fast-inc c1 : 1 | 2413 ms
fast-inc c1 : 1 | 2514 ms
fast-inc c1 : 1 | 2615 ms
fast-inc c1 : 1 | 2716 ms
fast-inc c1 : 1 | 2817 ms
fast-inc c1 : 1 | 2918 ms ; and trying....
slow-inc end c3 : 1 | 3012 ms
transaction end | 3013 ms ; 1st try ended, transaction committed.
fast-inc c1 : 2 | 3014 ms ; finally c1 modified successfully
c1 : 2
c2 : 1
c3 : 1
Since 1st-try-(alter c1 ...)
has locked c1
(step 4), no one can modify c1
, not until this round of transaction trial ended.
That's it for alter
.
So, what if we don't want c1
, c2
, c3
all group together? Let's say I want to retry the transaction only when c1
or c3
fail their alteration (being modified by other thread during the transaction). I don't care the state of c2
. There's no need to retry the transaction if c2
is modified during the transaction, so that I can save some time. How do we achieve that? Yes, through commute
.
commute
Basically, commute
does the following:
- Run the function provided using
ref
directly (not from snapshot) but don't do anything with the result. - Ask the transaction to call
real-commute
with the same arguments before transaction commits. (real-commute
is just a made up name by me.)
I'm don't actually know why commute
has to run step 1. It seems to me step 2 alone is enough. real-commute
does the following:
- Read-and-write-lock the
ref
until end of this transaction trial if it hasn't been locked, else retry the transaction. - Create a new state from
ref
using given function. - Tell transaction to update this
ref
into new state when it perform commission.
Let's examine it. Edit the let
's binding into:
t1 (future
(dosync
(println "transaction start |" (lap) "ms")
(alter c1 slow-inc "c1")
(commute c2 slow-inc "c2") ; changed to commute
(alter c3 slow-inc "c3")
(println "transaction end |" (lap) "ms")))
t2 (future)
Result:
transaction start | 3 ms
slow-inc beg c1 : 0 | 7 ms ; called by alter
slow-inc end c1 : 1 | 1008 ms
slow-inc beg c2 : 0 | 1009 ms ; called by commute
slow-inc end c2 : 1 | 2009 ms
slow-inc beg c3 : 0 | 2010 ms ; called by alter
slow-inc end c3 : 1 | 3011 ms
transaction end | 3012 ms
slow-inc beg c2 : 0 | 3012 ms ; called by real-commute
slow-inc end c2 : 1 | 4012 ms
c1 : 1
c2 : 1
c3 : 1
So slow-inc
get called twice if you use commute
, once by commute
and once by real-commute
before transaction commit. The first commute
didn't do anything with slow-inc
's result.
slow-inc
can get called more than twice. For example, let's try to modify c3
on thread t2
:
t2 (future
(Thread/sleep 500) ; modify c3 at 500 ms
(dosync (alter c3 fast-inc "c3")))
Result:
transaction start | 2 ms
slow-inc beg c1 : 0 | 8 ms
fast-inc c3 : 1 | 504 ms ; c3 modified at thread t2
slow-inc end c1 : 1 | 1008 ms
slow-inc beg c2 : 0 | 1009 ms ; 1st time
slow-inc end c2 : 1 | 2010 ms
transaction start | 2012 ms
slow-inc beg c1 : 0 | 2012 ms
slow-inc end c1 : 1 | 3013 ms
slow-inc beg c2 : 0 | 3014 ms ; 2nd time
slow-inc end c2 : 1 | 4015 ms
slow-inc beg c3 : 1 | 4016 ms
slow-inc end c3 : 2 | 5016 ms
transaction end | 5017 ms
slow-inc beg c2 : 0 | 5017 ms ; 3rd time
slow-inc end c2 : 1 | 6018 ms
c1 : 1
c2 : 1
c3 : 2
In first trial of transaction, after (commute c2 ...)
has been evaluated, (alter c3 ...)
found out that c3
was different from the snapshot, thus triggered transaction to retry. If (alter c3 ...)
is before (commute c2 ...)
, then retrial will be triggered before evaluation or (commute c2 ..)
. So, placing all commute
s after all alter
s may save you some time.
Let's see what will happen if we modify c2
in thread t2
while transaction in t1
is being evaluated.
t2 (future
(Thread/sleep 500) ; before evaluation of (commute c2 ...)
(dosync (alter c2 fast-inc "c2"))
(Thread/sleep 1000) ; during evaluation of (commute c2 ...)
(dosync (alter c2 fast-inc "c2"))
(Thread/sleep 1000) ; after evaluation of (commute c2 ...)
(dosync (alter c2 fast-inc "c2")))
Result:
transaction start | 3 ms
slow-inc beg c1 : 0 | 9 ms
fast-inc c2 : 1 | 504 ms ; before
slow-inc end c1 : 1 | 1009 ms
slow-inc beg c2 : 1 | 1010 ms
fast-inc c2 : 2 | 1506 ms ; during
slow-inc end c2 : 2 | 2011 ms
slow-inc beg c3 : 0 | 2012 ms
fast-inc c2 : 3 | 2508 ms ; after
slow-inc end c3 : 1 | 3013 ms
transaction end | 3013 ms
slow-inc beg c2 : 3 | 3014 ms
slow-inc end c2 : 4 | 4014 ms
c1 : 1
c2 : 4
c3 : 1
As you can see, no transaction retrial, and c2
still updated to our expected value (4), thanks to real-commute
.
Now I want to demonstrate the effect of step 1 in real-commute
: its ref
is read-and-write-locked. First, to confirm it is read-locked:
t2 (future
(Thread/sleep 3500) ; during real-commute
(println "try to read c2:" @c2 " |" (lap) "ms"))
Result:
transaction start | 3 ms
slow-inc beg c1 : 0 | 9 ms
slow-inc end c1 : 1 | 1010 ms
slow-inc beg c2 : 0 | 1010 ms
slow-inc end c2 : 1 | 2011 ms
slow-inc beg c3 : 0 | 2012 ms
slow-inc end c3 : 1 | 3012 ms
transaction end | 3013 ms
slow-inc beg c2 : 0 | 3013 ms
slow-inc end c2 : 1 | 4014 ms
try to read c2: 1 | 4015 ms ; got printed after transaction trial ended
c1 : 1
c2 : 1
c3 : 1
The @c2
is blocked until c2
got unlocked. That's why println
got evaluated after 4000 ms even though our order is sleep for 3500 ms.
Since commute
and alter
need to read their ref
to perform given function, they will be blocked until their ref
is unlocked too. You can try to replace (println ...)
with (alter c2 fast-inc "c2")
. The effect should be same as this example.
So, in order to confirm it is write-locked, we can use ref-set
:
t2 (future
(Thread/sleep 3500) ; during real-commute
(dosync (ref-set c2 (fast-inc 9 " 8"))))
Result:
transaction start | 3 ms
slow-inc beg c1 : 0 | 8 ms
slow-inc end c1 : 1 | 1008 ms
slow-inc beg c2 : 0 | 1010 ms
slow-inc end c2 : 1 | 2011 ms
slow-inc beg c3 : 0 | 2012 ms
slow-inc end c3 : 1 | 3013 ms
transaction end | 3014 ms
slow-inc beg c2 : 0 | 3014 ms
fast-inc 8 : 9 | 3504 ms ; try to ref-set but failed
fast-inc 8 : 9 | 3605 ms ; try again...
fast-inc 8 : 9 | 3706 ms
fast-inc 8 : 9 | 3807 ms
fast-inc 8 : 9 | 3908 ms
fast-inc 8 : 9 | 4009 ms
slow-inc end c2 : 1 | 4015 ms
fast-inc 8 : 9 | 4016 ms ; finally success, c2 ref-set to 9
c1 : 1
c2 : 9
c3 : 1
From here you can also guess what ref-set
does:
- If its
ref
has been write-locked, retry the transaction after some time (e.g. 100 ms); else tell the transaction to update thisref
into given value when it perform commission.
real-commute
can fail too, when its ref
has been locked on step 1. Unlike alter
or ref-set
, it doesn't wait for some time before retrying the transaction. This may cause problem if the ref
is locked for too long. For example, we'll try to modify c1
after its alteration, using commute
:
t2 (future
(Thread/sleep 2500) ; during alteration of c3
(dosync (commute c1 fast-inc "c1")))
Result:
transaction start | 3 ms
slow-inc beg c1 : 0 | 8 ms
slow-inc end c1 : 1 | 1008 ms
slow-inc beg c2 : 0 | 1010 ms
slow-inc end c2 : 1 | 2011 ms
slow-inc beg c3 : 0 | 2012 ms
fast-inc c1 : 1 | 2506 ms
fast-inc c1 : 1 | 2506 ms
fast-inc c1 : 1 | 2506 ms
...
Exception in thread "main" java.util.concurrent.ExecutionException:
java.lang.RuntimeException: Transaction failed after reaching retry
limit, compiling: ...
Recall that c1
is write-locked by alter
after its alteration, therefore real-commute
keep failing and keep retrying the transaction. Without buffer time, it reached transaction retrial limit and boomed.
Note
commute
helps to improve concurrency by letting user reduce ref
that will cause transaction retrial, with the cost of calling given function at least twice to update its ref
. There are some scenarios which commute
can be slower than alter
. For example, when the only thing to do in a transaction is update a ref
, commute
cost more than alter
:
(def c (ref 0)) ; counter
(defn slow-inc
[x]
(Thread/sleep 1000)
(inc x))
(defn add-2
"Create two threads to slow-inc c simultaneously with func.
func can be alter or commute."
[func]
(let [t1 (future (dosync (func c slow-inc)))
t2 (future (dosync (func c slow-inc)))]
@t1 @t2))
(defn -main
[& args]
(dosync (ref-set c 0))
(time (add-2 alter))
(dosync (ref-set c 0))
(time (add-2 commute)))
Result:
"Elapsed time: 2003.239891 msecs" ; alter
"Elapsed time: 4001.073448 msecs" ; commute
Here are the procedures of alter
:
- 0 ms:
t1
'salter
started. - 1 ms:
t2
'salter
started. - 1000 ms:
t1
'salter
successfully,t1
committed,c
became 1. - 1001 ms:
t2
'salter
found out thatc
is different from its snapshot (step 2), retry transaction. - 2001 ms:
t2
'salter
successfully,t2
committed,c
became 2.
And procedures of commute
:
- 0 ms:
t1
'scommute
started. - 1 ms:
t2
'scommute
started. - 1000 ms:
t1
'sreal-commute
started.c
is locked. - 1001 ms:
t2
'sreal-commute
started. It found out thatc
has been locked, so it retry the transaction (step 1). - 1002 ms:
t2
'scommute
started, butc
is locked, so it is blocked. - 2000 ms:
t1
'sreal-commute
ended, transaction committed.c
became 1.t2
is unblocked. - 3002 ms:
t2
'sreal-commute
started. - 4002 ms:
t2
'sreal-commute
ended, transaction committed.c
became 2.
That's why commute
is slower than alter
in this example.
This may look contradict to example of commute from clojuredocs.org. The key difference is, in his example, the delay (100 ms) happened in body of transaction, but the delay happened in slow-inc
in my example. This difference cause his real-commute
phase run very fast, thus reducing the locking time and blocking time. Less locking time means less probability of retrial. That's why in his example, commute
is faster than alter
. Change his inc
into slow-inc
and you will get same observation as mine.
That's all.
I figured it out.
This happens because commute functions are always executed twice.
Commute allows for more potential concurrency than alter because it doesn't lock the identity for the entire duration of the transaction.
Instead, it reads the identity's value once at the beginning of the transaction and when the commute operation is called it returns the commute function applied to THIS VALUE.
It's entirely possible that this value is now out of date because some other thread could have changed it some time between the start of the transaction and the execute of the commute function.
However, integrity is maintained because the commute function is executed AGAIN at commit time when it actually modifies the ref.
This site has a very clear explanation of the difference: http://squirrel.pl/blog/2010/07/13/clojure-alter-vs-commute/
In fact, when commute is called it instantly returns result of running the function on the ref. At the very end of transaction it performs the calculation again, this time synchronously (like alter) updating the ref. That’s why eventually the counter value is 51 even though the last thread printed 45.
So be careful if your commute function has side-effects because they will be executed twice!!