Is there any std::chrono thread safety guarantee even with multicore context?

Yes, calls to some_clock::now() from different threads should be thread safe.

As regards the specific issue you mention with QueryPerformanceCounter, it is just that the Windows API exposes a hardware issue on some platforms. Other OSes may or may not expose this hardware issue to user code.

As far as the C++ standard is concerned, if the clock claims to be a "steady clock" then it must never go backwards, so if there are two reads on the same thread, the second must never return a value earlier than the first, even if the OS switches the thread to a different processor.

For non-steady clocks (such as std::chrono::system_clock on many systems), there is no guarantee about this, since an external agent could change the clock arbitrarily anyway.

With my implementation of the C++11 thread library (including the std::chrono stuff) the implementation takes care to ensure that the steady clocks are indeed steady. This does impose a cost over and above a raw call to QueryPerformanceCounter to ensure the synchronization, but no longer pins the thread to CPU 0 (which it used to do). I would expect other implementations to have workarounds for this issue too.

The requirements for a steady clock are in 20.11.3 [time.clock.req] (C++11 standard)