Handling out-of-order execution
Can anyone explain how to correctly deal with the possibility of reordered operations in a multi-threaded environment?

I will address this as a question about multithreading in a high-level language, rather than discussing CPU pipeline optimization.
Most, if not all, modern high-level multithreaded languages provide constructs for managing this potential for the compiler to reorder the logical execution of instructions. In C#, these include field-level constructs (the `volatile` modifier), block-level constructs (the `lock` keyword), and imperative constructs (`Thread.MemoryBarrier`).
Applying `volatile` to a field causes all accesses to that field by the CPU/memory to be executed in the same relative order in which they occur in the instruction sequence (the source code).
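For concreteness, here is a minimal sketch of a `volatile` field used as a completion flag between two threads (the class and field names are illustrative, not from the question):

```csharp
using System;
using System.Threading;

class VolatileFlagExample
{
    // volatile: reads/writes of _done are not cached in a register or
    // reordered relative to other volatile accesses.
    private volatile bool _done;

    public void Run()
    {
        var worker = new Thread(() =>
        {
            // ... do some work ...
            _done = true;            // signal completion to the main thread
        });
        worker.Start();

        while (!_done)               // the field is re-read on every iteration
        {
            Thread.Sleep(1);
        }
        Console.WriteLine("Worker finished.");
    }
}
```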
Using `lock` around a block of code causes the enclosed instruction sequence to be executed in the same relative order in which it occurs in the parent block of code.
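A small sketch of the idea, using an illustrative counter class:

```csharp
class LockedCounter
{
    private readonly object _sync = new object();
    private int _count;

    public void Increment()
    {
        lock (_sync)                 // entering/exiting the lock also acts as a memory barrier
        {
            _count = _count + 1;     // the read-modify-write cannot interleave with other threads
        }
    }

    public int Read()
    {
        lock (_sync)
        {
            return _count;
        }
    }
}
```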
The `Thread.MemoryBarrier` method indicates to the compiler and CPU that memory accesses must not be reordered across that point in the instruction sequence. This enables more advanced techniques for specialized requirements.
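Here is a hedged sketch of a publish/consume pattern with explicit barriers (field and method names are illustrative):

```csharp
using System.Threading;

class BarrierExample
{
    private int _value;
    private bool _ready;

    public void Publish(int v)
    {
        _value = v;
        Thread.MemoryBarrier();      // ensure _value is written before _ready
        _ready = true;
    }

    public bool TryRead(out int v)
    {
        bool ready = _ready;
        Thread.MemoryBarrier();      // ensure _ready is read before _value
        v = _value;
        return ready;
    }
}
```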
The techniques above are described in order of increasing complexity and performance. As with all concurrent programming, determining when and where to apply them is the challenge. When synchronizing access to a single field, the `volatile` keyword will work, but it can prove to be overkill. Sometimes you only need to synchronize writes (in which case a `ReaderWriterLockSlim` would accomplish the same thing with much better performance). Sometimes you need to manipulate the field multiple times in quick succession, or you must check a field and conditionally manipulate it; in those cases, the `lock` keyword is the better choice. Sometimes you have multiple threads manipulating shared state in a very loosely synchronized model to improve performance (not typically recommended); in that case, carefully placed memory barriers can prevent threads from using stale and inconsistent data.
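Since `ReaderWriterLockSlim` is mentioned above but not shown, here is a rough sketch of how it is typically used for read-mostly shared state (the cache class is purely illustrative):

```csharp
using System.Collections.Generic;
using System.Threading;

class SharedCache
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    private readonly Dictionary<string, string> _map = new Dictionary<string, string>();

    public string Get(string key)
    {
        _rw.EnterReadLock();         // many readers may hold this concurrently
        try
        {
            return _map.TryGetValue(key, out var value) ? value : null;
        }
        finally
        {
            _rw.ExitReadLock();
        }
    }

    public void Set(string key, string value)
    {
        _rw.EnterWriteLock();        // writers get exclusive access
        try
        {
            _map[key] = value;
        }
        finally
        {
            _rw.ExitWriteLock();
        }
    }
}
```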
Let me ask a question: given some program code (say, a single-threaded application), what is the correct execution? Intuitively, executing it on the CPU in order, exactly as the code specifies, would be correct. This illusion of sequential execution is what programmers rely on.
However, modern CPUs do not obey that restriction. As long as no dependences are violated (data dependences, control dependences, and memory dependences), CPUs execute instructions out of order. Yet this is completely hidden from programmers; you can never see from the program what is going on inside the CPU.
Compilers also exploit this fact. As long as the program's semantics (i.e., the inherent dependences in your code) are preserved, compilers will reorder any instructions they can to achieve better performance. One notable optimization is code hoisting: a compiler may hoist a load instruction earlier to hide memory latency. But don't worry, compilers guarantee correctness in this respect; they will NOT crash your program through such reordering, because they must at least preserve dependences. (Compilers might have bugs, though. :-))
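To make the hoisting point concrete, here is a sketch (in C#, with made-up names) of the kind of load a compiler or JIT is allowed to hoist, precisely because doing so preserves single-threaded semantics:

```csharp
class HoistingHazard
{
    // Nothing in Spin() writes _stop, so a compiler may read it once, keep the
    // value in a register, and never re-read it inside the loop. That is legal
    // for single-threaded code; it only becomes visible when another thread
    // calls RequestStop(). Marking the field volatile forbids this hoisting.
    private bool _stop;

    public void Spin()
    {
        while (!_stop)
        {
            // busy-wait
        }
    }

    public void RequestStop()
    {
        _stop = true;                // may never be observed by Spin() above
    }
}
```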
So if you are only writing a single-threaded application, you do not need to worry about such out-of-order execution, whether it comes from the compiler or the CPU.
(To learn more, I recommend taking a look at the concept of ILP (instruction-level parallelism). Single-thread performance depends mostly on how much ILP can be extracted from that one thread, so both CPUs and compilers do whatever they can to exploit it.)
However, when you consider multithreaded execution, there is a potential problem called the memory consistency problem. Programmers intuitively expect sequential consistency, but modern multi-core architectures perform dirty, aggressive optimizations (e.g., caches and buffers), and sequential consistency is hard to implement with low overhead on such hardware. As a result, very confusing situations can arise from out-of-order execution of memory loads and stores: you may observe that some loads and stores were executed out of order. Read some articles on relaxed memory models, such as Intel x86's memory model (Chapter 8, "Memory Ordering", of Volume 3A of the Intel 64 and IA-32 Architectures Software Developer's Manual). Memory barriers are needed in these situations, where you have to enforce the order of memory instructions for correctness.
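As an illustration of the kind of reordering a relaxed model permits, here is a sketch of the classic store-load example (names and structure are my own, not from the question). On x86, a store can be delayed in the store buffer past a later load, so both threads can read 0 in the same run unless a full barrier is used:

```csharp
using System.Threading;

class StoreLoadReordering
{
    private int _x, _y;

    public (int r1, int r2) RunOnce()
    {
        _x = 0; _y = 0;
        int r1 = 0, r2 = 0;

        var t1 = new Thread(() =>
        {
            _x = 1;
            // Thread.MemoryBarrier();   // uncomment to rule out r1 == r2 == 0
            r1 = _y;
        });
        var t2 = new Thread(() =>
        {
            _y = 1;
            // Thread.MemoryBarrier();
            r2 = _x;
        });

        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        return (r1, r2);                 // (0, 0) is a possible outcome without barriers
    }
}
```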
THE ANSWER TO THE QUESTION: It is not easy to answer this question briefly. There are no good tools that detect such out-of-order, memory-consistency-related misbehavior (though there are research papers), so in practice it is hard even to find out whether such bugs exist in your code. However, I strongly suggest you read articles on double-checked locking and the detailed papers about it. Due to relaxed memory consistency and compiler reordering (note that compilers are not aware of multithreaded behavior unless you explicitly tell them with memory barriers), a naive double-checked locking implementation can misbehave.
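For reference, here is a sketch of the usual C# rendering of double-checked locking; the `volatile` on the instance field is what guards against the reordering problem described above (this is the common pattern, not something specific from the question):

```csharp
class Singleton
{
    // Without volatile, a thread could observe a non-null reference to an
    // object whose constructor writes have not yet become visible.
    private static volatile Singleton _instance;
    private static readonly object _sync = new object();

    private Singleton() { }

    public static Singleton Instance
    {
        get
        {
            if (_instance == null)           // first check, without the lock
            {
                lock (_sync)
                {
                    if (_instance == null)   // second check, inside the lock
                    {
                        _instance = new Singleton();
                    }
                }
            }
            return _instance;
        }
    }
}
```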
In sum:
- If you're only working on a single-threaded program, then you don't need to worry about out-of-order behaviors.
- On multi-core, you may need to consider memory consistency problems. But it is actually rare that you really need to worry about memory consistency issues; mostly, it is data races, deadlocks, and atomicity violations that kill your multithreaded program.