Java memory model - volatile and x86
The x86 provides TSO. So, at the hardware level, you get the following barriers for free: [LoadLoad][LoadStore][StoreStore]. The only one missing is the [StoreLoad].
A load has acquire semantics:
r1=X
[LoadLoad]
[LoadStore]
A store has release semantics:
[LoadStore]
[StoreStore]
Y=r2
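To make this concrete in Java terms: a volatile read gets the acquire barriers and a volatile write gets the release barriers. The sketch below is illustrative only (the class and field names are made up), with the conceptual barriers marked as comments:

class AcquireRelease {
    volatile int V;      // the volatile field
    int plain;           // an ordinary field

    int acquireRead() {
        int r1 = V;      // volatile load
                         // [LoadLoad]  : later loads can't float above the volatile load
                         // [LoadStore] : later stores can't float above the volatile load
        plain = r1;      // stays after the volatile load
        return r1;
    }

    void releaseWrite(int r2) {
        plain = r2;      // stays before the volatile store
                         // [LoadStore]  : earlier loads can't sink below the volatile store
                         // [StoreStore] : earlier stores can't sink below the volatile store
        V = r2;          // volatile store
    }
}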
If you do a store followed by a load, you end up with this:
[LoadStore]
[StoreStore]
Y=r2
r1=X
[LoadLoad]
[LoadStore]
The issue is that the store and the subsequent load can still be reordered, so the result isn't sequentially consistent; and sequential consistency for volatile accesses is mandatory under the Java Memory Model. The only way to prevent this is with a [StoreLoad]; see the sketch after the next listing for a concrete example of the reordering.
[LoadStore]
[StoreStore]
Y=r2
[StoreLoad]
r1=X
[LoadLoad]
[LoadStore]
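Here is a Dekker-style sketch of why the [StoreLoad] matters; the class below is illustrative and not from the original text. With plain fields, each thread's store can still sit in the store buffer while its following load executes, so both threads may read 0; declaring X and Y volatile rules that outcome out under the JMM:

public class StoreLoadReordering {
    static int X, Y;     // making these volatile forbids the r1 == 0 && r2 == 0 result
    static int r1, r2;

    public static void main(String[] args) throws InterruptedException {
        for (long run = 1; ; run++) {
            X = 0; Y = 0;

            Thread t1 = new Thread(() -> { Y = 1; r1 = X; });   // store Y, then load X
            Thread t2 = new Thread(() -> { X = 1; r2 = Y; });   // store X, then load Y

            t1.start(); t2.start();
            t1.join();  t2.join();

            // Under sequential consistency at least one thread must see the other's store.
            if (r1 == 0 && r2 == 0) {
                System.out.println("store->load reordering seen after " + run + " runs");
                return;
            }
        }
    }
}

Such a naive harness can take a long time to hit the window (the thread start/join dominates); a tool like OpenJDK's jcstress is the practical way to demonstrate it.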
And the most logical place to add it is the write, since reads are normally more frequent than writes. So the write becomes:
[LoadStore]
[StoreStore]
Y=r2
[StoreLoad]
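The same placement can be spelled out with the explicit fences from java.lang.invoke.VarHandle (Java 9+). This is a sketch of the equivalent ordering, not how HotSpot actually compiles a volatile write; the field name is illustrative:

import java.lang.invoke.VarHandle;

class WriteSide {
    int Y;

    void volatileStyleWrite(int r2) {
        VarHandle.releaseFence();   // ~ [LoadStore][StoreStore] before the store
        Y = r2;                     // the store itself
        VarHandle.fullFence();      // ~ [StoreLoad] (plus the barriers that are free anyway)
    }
}

On x86 the releaseFence is only a compiler barrier, while the fullFence is the one that has to emit an instruction, which leads to the next point.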
Because the x86 provides TSO, the following fences can be no-ops:
[LoadLoad][LoadStore][StoreStore]
So the only relevant one is the [StoreLoad], and this can be accomplished with an MFENCE or a lock addl $0,(%rsp).
The LFENCE and SFENCE are not relevant here; they exist for weakly ordered loads and stores (e.g. the non-temporal SSE operations).
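As a rough way to see the cost of that single [StoreLoad], here is a hedged JMH sketch (class and method names are made up); on x86 the plain store needs no fence at all, while the volatile store carries the locked instruction/MFENCE:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class StoreCost {
    int plainField;
    volatile int volatileField;

    @Benchmark
    public void plainStore() {
        plainField = 1;        // TSO already gives release semantics, no fence emitted
    }

    @Benchmark
    public void volatileStore() {
        volatileField = 1;     // followed by the [StoreLoad]: lock addl $0,(%rsp) or MFENCE
    }
}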
What the [StoreLoad] does on the x86 is stop executing loads until the store buffer has been drained. This makes sure that the load becomes globally visible (i.e. reads from memory/cache) AFTER the store has become globally visible (has left the store buffer and entered the L1d).
On x86, the load buffers are pinned to the cache line they read from. If the cache line is lost, the buffered value isn't used. So there's no need to fence or drain those buffers; the value they contain must be current, because another core can't modify the data without first invalidating the cache line.