AtomicInteger lazySet vs. set
A wider discussion of the origins and utility of lazySet and the underlying putOrdered can be found here: http://psy-lob-saw.blogspot.co.uk/2012/12/atomiclazyset-is-performance-win-for.html
To summarize: lazySet is a weak volatile write in the sense that it acts as a store-store and not a store-load fence. This boils down to lazySet being JIT compiled to a MOV instruction that cannot be re-ordered by the compiler rather then the significantly more expensive instruction used for a volatile set.
When reading the value you always end up doing a volatile read(with an Atomic*.get() in any case).
lazySet offers a single writer a consistent volatile write mechanism, i.e. it is perfectly legitimate for a single writer to use lazySet to increment a counter, multiple threads incrementing the same counter will have to resolve the competing writes using CAS, which is exactly what happens under the covers of Atomic* for incAndGet.
Cited straight from "JDK-6275329: Add lazySet methods to atomic classes":
As probably the last little JSR166 follow-up for Mustang, we added a "lazySet" method to the Atomic classes (AtomicInteger, AtomicReference, etc). This is a niche method that is sometimes useful when fine-tuning code using non-blocking data structures. The semantics are that the write is guaranteed not to be re-ordered with any previous write, but may be reordered with subsequent operations (or equivalently, might not be visible to other threads) until some other volatile write or synchronizing action occurs).
The main use case is for nulling out fields of nodes in non-blocking data structures solely for the sake of avoiding long-term garbage retention; it applies when it is harmless if other threads see non-null values for a while, but you'd like to ensure that structures are eventually GCable. In such cases, you can get better performance by avoiding the costs of the null volatile-write. There are a few other use cases along these lines for non-reference-based atomics as well, so the method is supported across all of the AtomicX classes.
For people who like to think of these operations in terms of machine-level barriers on common multiprocessors, lazySet provides a preceeding store-store barrier (which is either a no-op or very cheap on current platforms), but no store-load barrier (which is usually the expensive part of a volatile-write).
lazySet can be used for rmw inter thread communication, because xchg is atomic, as for visibility, when writer thread process modify a cache line location, reader thread's processor will see it at the next read, because the cache coherence protocol of intel cpu will garantee LazySet works, but the cache line will be updated at the next read, again, the CPU has to be modern enough.
http://sc.tamu.edu/systems/eos/nehalem.pdf For Nehalem which is a multi-processor platform, the processors have the ability to “snoop” (eavesdrop) the address bus for other processor’s accesses to system memory and to their internal caches. They use this snooping ability to keep their internal caches consistent both with system memory and with the caches in other interconnected processors. If through snooping one processor detects that another processor intends to write to a memory location that it currently has cached in Shared state, the snooping processor will invalidate its cache block forcing it to perform a cache line fill the next time it accesses the same memory location.
oracle hotspot jdk for x86 cpu architecture->
lazySet == unsafe.putOrderedLong == xchg rw( asm instruction that serve as a soft barrier costing 20 cycles on nehelem intel cpu)
on x86 (x86_64) such a barrier is much cheaper performance-wise than volatile or AtomicLong getAndAdd ,
In an one producer, one consumer queue scenario, xchg soft barrier can force the line of codes before the lazySet(sequence+1) for producer thread to happen BEFORE any consumer thread code that will consume (work on) the new data, of course consumer thread will need to check atomically that producer sequence was incremented by exactly one using a compareAndSet (sequence, sequence + 1).
I traced after Hotspot source code to find the exact mapping of the lazySet to cpp code: http://hg.openjdk.java.net/jdk7/jdk7/hotspot/file/9b0ca45cd756/src/share/vm/prims/unsafe.cpp Unsafe_setOrderedLong -> SET_FIELD_VOLATILE definition -> OrderAccess:release_store_fence. For x86_64, OrderAccess:release_store_fence is defined as using the xchg instruction.
You can see how it is exactly defined in jdk7 (doug lea is working on some new stuff for JDK 8): http://hg.openjdk.java.net/jdk7/jdk7/hotspot/file/4fc084dac61e/src/os_cpu/linux_x86/vm/orderAccess_linux_x86.inline.hpp
you can also use the hdis to disassemble the lazySet code's assembly in action.
There is another related question: Do we need mfence when using xchg