How expensive is the lock statement?

The technical answer is that this is impossible to quantify: it depends heavily on the state of the CPU's memory write-back buffers and on how much of the data gathered by the prefetcher has to be discarded and re-read. Both are very non-deterministic. I use 150 CPU cycles as a back-of-the-envelope approximation that avoids major disappointments.

The practical answer is that it is waaaay cheaper than the amount of time you'll burn on debugging your code when you think you can skip a lock.

To get a hard number you'll have to measure. Visual Studio has a slick concurrency analyzer available as an extension.
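If you just want a rough first number before reaching for a profiler, a minimal sketch along these lines can help. The class and method names (`LockCostSketch`, `MeasureUncontendedNs`) are made up for illustration, and the figure it prints covers only the uncontended, single-threaded case, including the loop and increment overhead:

```csharp
using System;
using System.Diagnostics;

public static class LockCostSketch
{
    private static readonly object Gate = new object();
    private static long _counter;

    // Hypothetical helper: average nanoseconds per uncontended lock acquire/release.
    public static double MeasureUncontendedNs(int iterations)
    {
        // Warm up so that JIT compilation is not part of the timed region.
        for (int i = 0; i < 10_000; i++)
        {
            lock (Gate) { _counter++; }
        }

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            lock (Gate) { _counter++; }
        }
        sw.Stop();

        // Convert total elapsed milliseconds to nanoseconds per iteration.
        return sw.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }

    public static void Main()
    {
        double ns = MeasureUncontendedNs(10_000_000);
        Console.WriteLine($"~{ns:F1} ns per uncontended lock (loop and increment included)");
    }
}
```

Run it in a Release build without a debugger attached; the result will vary from machine to machine and from run to run.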


Here is an article that goes into the cost. The short answer is 50 ns.


Further reading:

I would like to present a few articles of mine that look at general synchronization primitives and dig into Monitor and the C# lock statement: their behavior, properties, and costs depending on the scenario and the number of threads. They focus specifically on CPU wastage and throughput periods, to understand how much work can be pushed through in different scenarios:

https://www.codeproject.com/Articles/1236238/Unified-Concurrency-I-Introduction
https://www.codeproject.com/Articles/1237518/Unified-Concurrency-II-benchmarking-methodologies
https://www.codeproject.com/Articles/1242156/Unified-Concurrency-III-cross-benchmarking

Original answer:

Oh dear!

It seems that the answer flagged here as THE ANSWER is inherently incorrect! I would like to ask the author of that answer, respectfully, to read the linked article to the end.

The author of the 2003 article was measuring on a dual-core machine only. In the first case he measured locking with a single thread, and the result was about 50 ns per lock access.

That says nothing about a lock in a concurrent environment. So we have to keep reading: in the second half of the article, the author measured locking with two and three threads, which gets closer to the concurrency levels of today's processors.

The author says that with two threads on a dual-core machine the lock costs 120 ns, and with three threads it goes up to 180 ns. So the cost clearly depends on the number of threads accessing the lock concurrently.
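To see the effect of the thread count on your own hardware, a sketch like the following can be used. It is not the code from the article; `ContendedLockSketch` and `Run` are hypothetical names, and the elapsed time also includes thread start-up, so treat the numbers as a rough trend only:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class ContendedLockSketch
{
    // Hypothetical helper: threadCount threads each take the same lock
    // iterationsPerThread times. Returns the final counter value.
    public static long Run(int threadCount, int iterationsPerThread, out TimeSpan elapsed)
    {
        object gate = new object();
        long counter = 0;
        var threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < iterationsPerThread; i++)
                {
                    lock (gate) { counter++; }
                }
            });
        }

        var sw = Stopwatch.StartNew();
        foreach (var thread in threads) thread.Start();
        foreach (var thread in threads) thread.Join();
        sw.Stop();

        elapsed = sw.Elapsed;
        return counter;
    }

    public static void Main()
    {
        const int iterations = 2_000_000;
        foreach (int threadCount in new[] { 1, 2, 3, 4 })
        {
            long total = Run(threadCount, iterations, out TimeSpan elapsed);
            // Average wall-clock nanoseconds per lock access across all threads.
            double nsPerLock = elapsed.TotalMilliseconds * 1_000_000.0 / total;
            Console.WriteLine($"{threadCount} thread(s): ~{nsPerLock:F1} ns per lock access");
        }
    }
}
```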

So it is simple: it is not 50 ns unless there is only a single thread, in which case the lock is useless.

Another issue to consider is that this is measured as an average time!

If the time of individual iterations were measured, some of them would take anywhere from 1 ms to 20 ms. The majority are fast, but a few threads end up waiting for processor time and incur delays that are milliseconds long.
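One way to look at this is to time every lock access individually and report the worst case next to the average. The sketch below does that under my own assumptions; `LockLatencySketch` and `Measure` are hypothetical names, and calling `Stopwatch.GetTimestamp()` twice per iteration adds overhead of its own to every sample:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class LockLatencySketch
{
    // Hypothetical helper: times each lock access separately and returns
    // the average and the worst observed latency in nanoseconds.
    public static (double AverageNs, double MaxNs) Measure(int threadCount, int iterationsPerThread)
    {
        object gate = new object();
        long shared = 0;
        long[] totalTicks = new long[threadCount];
        long[] maxTicks = new long[threadCount];
        var threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++)
        {
            int index = t; // per-thread copy of the loop variable for the closure
            threads[t] = new Thread(() =>
            {
                long total = 0, max = 0;
                for (int i = 0; i < iterationsPerThread; i++)
                {
                    long start = Stopwatch.GetTimestamp();
                    lock (gate) { shared++; }
                    long ticks = Stopwatch.GetTimestamp() - start;
                    total += ticks;
                    if (ticks > max) max = ticks;
                }
                // Each thread writes only to its own slot, so no lock is needed here.
                totalTicks[index] = total;
                maxTicks[index] = max;
            });
        }

        foreach (var thread in threads) thread.Start();
        foreach (var thread in threads) thread.Join();

        double nsPerTick = 1_000_000_000.0 / Stopwatch.Frequency;
        long sum = 0, worst = 0;
        for (int t = 0; t < threadCount; t++)
        {
            sum += totalTicks[t];
            if (maxTicks[t] > worst) worst = maxTicks[t];
        }

        double averageNs = sum * nsPerTick / ((double)threadCount * iterationsPerThread);
        return (averageNs, worst * nsPerTick);
    }

    public static void Main()
    {
        var result = Measure(threadCount: 4, iterationsPerThread: 1_000_000);
        Console.WriteLine($"average ~{result.AverageNs:F1} ns, worst ~{result.MaxNs:F1} ns");
    }
}
```

The gap between the two numbers is the point: an average can look harmless while individual accesses stall for far longer.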

This is bad news for any kind of application that requires high throughput and low latency.

The last issue to consider is that there may be slower operations inside the lock, and very often that is the case. The longer the block of code executed inside the lock, the higher the contention, and the delays rise sky high.

Please consider that more than a decade has passed since 2003. That is several generations of processors designed specifically to run fully concurrently, and locking considerably harms their performance.