Volatile in C++11
Whether it is optimized out depends entirely on the compiler and what it chooses to optimize away. The C++98/03 memory model has no notion of threads, so it does not recognize the possibility that x could change between the write and the later read of its value.
The C++11 memory model does recognize that x could be changed. However, it doesn't care. Unsynchronized access to a variable from multiple threads, where at least one access is a write (i.e., without std::atomic or a proper mutex), yields undefined behavior. So it's perfectly fine for a C++11 compiler to assume that x never changes between the write and the reads, since undefined behavior can mean, "the function never sees x change, ever."
Now, let's look at what C++11 says if you declare volatile int x; instead. If you put that in there, and you have some other thread mess with x, you still have undefined behavior. volatile does not affect threading behavior. C++11's memory model does not define reads or writes from/to x to be atomic, nor does it require the memory barriers needed for non-atomic reads/writes to be properly ordered. volatile has nothing to do with it one way or the other.
Oh, your code might work. But C++11 doesn't guarantee it.
What volatile tells the compiler is that it can't optimize away memory reads from that variable. However, CPU cores have different caches, and most memory writes do not immediately go out to main memory. They get stored in that core's local cache, and may be written... eventually.
CPUs have ways to force cache lines out into memory and to synchronize memory access among different cores. These memory barriers allow two threads to communicate effectively. Merely reading from memory in one core that was written in another core isn't enough; the core that wrote the memory needs to issue a barrier, and the core that's reading it needs to have had that barrier complete before reading it to actually get the data.
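In C++11 terms, the "writer issues a barrier, reader waits for it" pairing described above is usually spelled as a release store matched with an acquire load. Here is a minimal sketch of that idea; the names payload, ready, and run_message_passing are made up for illustration and come from nowhere in your code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names: payload is ordinary (non-atomic) data,
// ready is the atomic flag used to publish it.
int payload = 0;
std::atomic<bool> ready{false};

int run_message_passing() {
    // Reset state before any thread starts (thread creation orders this).
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread writer([] {
        payload = 42;  // plain write
        // Release store: everything written before it becomes visible
        // to a thread that acquire-loads the value true.
        ready.store(true, std::memory_order_release);
    });

    std::thread reader([&seen] {
        // Acquire load: spin until the writer's release store is observed.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        assert(payload == 42);  // ordered after the writer's plain write
        seen = payload;
    });

    writer.join();
    reader.join();
    return seen;
}
```

Note that both sides have to participate: the release store alone, or the acquire load alone, would not order the plain write to payload before the read.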
volatile guarantees none of this. volatile works with "hardware, mapped memory and stuff" because the hardware that writes that memory makes sure that the cache issue is taken care of. If CPU cores issued a memory barrier after every write, you could basically kiss any hope of performance goodbye. So C++11 has specific language saying when constructs are required to issue a barrier.
volatile is about memory access (when to read); threading is about memory integrity (what is actually stored there).
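To make the "when to read" half concrete, here is a rough single-threaded sketch of the kind of polling loop volatile is meant for. The variable fake_status_register is a stand-in I made up; on real hardware it would be a device register at a fixed address, and wait_for_bit is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a memory-mapped device register (an assumption for this
// sketch; real code would map a hardware address instead).
volatile std::uint32_t fake_status_register = 0;

// Polls the register until any bit in mask is set, or until max_polls
// reads have been made. Returns true if the bit was seen.
bool wait_for_bit(std::uint32_t mask, int max_polls) {
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; ++i) {
        // Because the object is volatile, the compiler must perform a
        // real read on every iteration instead of reusing an earlier one.
        if (fake_status_register & mask) {
            return true;
        }
    }
    return false;
}
```

This only constrains what the compiler does with the reads. It says nothing about atomicity or about ordering between threads, which is the memory integrity half.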
The C++11 memory model is specific about what operations will cause writes in one thread to become visible in another. It's about memory integrity, which is not something volatile handles. And memory integrity generally requires both threads to do something.
For example, if thread A locks a mutex, does a write, and then unlocks it, the C++11 memory model only requires that write to become visible to thread B if thread B later locks the same mutex. Until thread B actually acquires that particular lock, it's undefined what value is there. This stuff is laid out in great detail in section 1.10 of the standard.
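A minimal sketch of the mutex case, with made-up names (m, shared_value, read_under_lock) that are not from your code: thread A writes under the lock, and thread B only ever looks at the value while holding that same lock.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical names for illustration.
std::mutex m;
int shared_value = 0;  // only touched while holding m

int read_under_lock() {
    {
        std::lock_guard<std::mutex> lock(m);
        shared_value = 0;  // reset so the function can be called again
    }
    int seen = 0;

    std::thread a([] {
        std::lock_guard<std::mutex> lock(m);  // lock, write, unlock
        shared_value = 7;
    });

    std::thread b([&seen] {
        while (seen == 0) {
            {
                // B acquires the same mutex before every read.
                std::lock_guard<std::mutex> lock(m);
                seen = shared_value;
            }
            assert(seen == 0 || seen == 7);  // never anything in between
            std::this_thread::yield();
        }
    });

    a.join();
    b.join();
    return seen;
}
```

If thread B read shared_value without taking m, thread A's lock would buy nothing: the read and the write would be unordered, which is exactly the data race case.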
Let's look at the code you cite, with respect to the standard. Section 1.10, p8 speaks of the ability of certain library calls to cause a thread to "synchronize with" another thread. Most of the other paragraphs explain how synchronization (and other things) build an order of operations between threads. Of course, your code doesn't invoke any of this. There is no synchronization point, no dependency ordering, nothing.
Without such protection, without some form of synchronization or ordering, 1.10 p21 comes in:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
Your program contains two conflicting actions (reading from x and writing to x). Neither is atomic, and neither is ordered by synchronization to happen before the other.
Thus, you have achieved undefined behavior.
So the only case where you get guaranteed multithreaded behavior by the C++11 memory model is if you use a proper mutex or std::atomic<int> x with the proper atomic load/store calls.
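Since your original code isn't reproduced here, this is only a guess at its shape: one thread spins in a while loop until another thread changes x. With std::atomic<int> and explicit load/store calls, a sketch of the well-defined version could look like this (wait_for_change is a name I made up):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0};

// Hypothetical helper: starts a thread that changes x, then spins until
// the change is observed. Returns the observed value.
int wait_for_change() {
    x.store(0);  // reset before the other thread exists

    std::thread setter([] {
        x.store(1);  // atomic write (sequentially consistent by default)
    });

    int observed = 0;
    // Atomic read on each iteration; there is no data race here, so the
    // loop is guaranteed to see the store eventually.
    while ((observed = x.load()) == 0) {
        std::this_thread::yield();
    }

    setter.join();
    assert(observed == 1);
    return observed;
}
```

The default memory order is used for simplicity. Weaker orderings are possible, but they are an optimization to reach for only once the plain version is understood.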
Oh, and you don't need to make x volatile as well. Anytime you call a (non-inline) function, that function or something it calls could modify a global variable. So the compiler cannot optimize away the read of x in the while loop. And every C++11 mechanism to synchronize requires calling a function. That just so happens to invoke a memory barrier.