Java thread executing remainder operation in a loop blocks all other threads

After all the explanations here (thanks to Peter Lawrey) we found that the main source of this pause is that safepoint inside the loop is reached rather rarely so it takes a long time to stop all threads for JIT-compiled code replacement.

But I decided to go deeper and find why safepoint is reached rarely. I found it a bit confusing why the back jump of while loop is not "safe" in this case.

So I summon -XX:+PrintAssembly in all its glory to help

-XX:+UnlockDiagnosticVMOptions \
-XX:+TraceClassLoading \
-XX:+DebugNonSafepoints \
-XX:+PrintCompilation \
-XX:+PrintGCDetails \
-XX:+PrintStubCode \
-XX:+PrintAssembly \
-XX:PrintAssemblyOptions=-Mintel

After some investigation I found that after third recompilation of lambda C2 compiler threw away safepoint polls inside loop completely.

UPDATE

During the profiling stage variable i was never seen equal to 0. That's why C2 speculatively optimized this branch away, so that the loop was transformed to something like

for (int i = OSR_value; i != 0; i++) {
    if (1 % i == 0) {
        uncommon_trap();
    }
}
uncommon_trap();

Note that originally infinite loop was reshaped to a regular finite loop with a counter! Due to JIT optimization to eliminate safepoint polls in finite counted loops, there was no safepoint poll in this loop either.

After some time, i wrapped back to 0, and the uncommon trap was taken. The method was deoptimized and continued execution in the interpreter. During recompilation with a new knowledge C2 recognized the infinite loop and gave up compilation. The rest of the method proceeded in the interpreter with proper safepoints.

There is a great must-read blog post "Safepoints: Meaning, Side Effects and Overheads" by Nitsan Wakart covering safepoints and this particular issue.

Safepoint elimination in very long counted loops is known to be a problem. The bug JDK-5014723 (thanks to Vladimir Ivanov) addresses this problem.

The workaround is available until the bug is finally fixed.

  1. You can try using -XX:+UseCountedLoopSafepoints (it will cause overall performance penalty and may lead to JVM crash JDK-8161147). After using it C2 compiler continue keeping safepoints at the back jumps and original pause disappears completely.
  2. You can explicitly disable compilation of problematic method by using
    -XX:CompileCommand='exclude,binary/class/Name,methodName'

  3. Or you can rewrite your code by adding safepoint manually. For example Thread.yield() call at the end of cycle or even changing int i to long i (thanks, Nitsan Wakart) will also fix pause.


Found the answer of why. They are called safepoints, and are best known as the Stop-The-World that happens because of GC.

See this articles: Logging stop-the-world pauses in JVM

Different events can cause the JVM to pause all the application threads. Such pauses are called Stop-The-World (STW) pauses. The most common cause for an STW pause to be triggered is garbage collection (example in github) , but different JIT actions (example), biased lock revocation (example), certain JVMTI operations , and many more also require the application to be stopped.

The points at which the application threads may be safely stopped are called, surprise, safepoints. This term is also often used to refer to all the STW pauses.

It is more or less common that GC logs are enabled. However, this does not capture information on all the safepoints. To get it all, use these JVM options:

-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime

If you are wondering about the naming explicitly referring to GC, don’t be alarmed – turning on these options logs all of the safepoints, not just garbage collection pauses. If you run a following example (source in github) with the flags specified above.

Reading the HotSpot Glossary of Terms, it defines this:

safepoint

A point during program execution at which all GC roots are known and all heap object contents are consistent. From a global point of view, all threads must block at a safepoint before the GC can run. (As a special case, threads running JNI code can continue to run, because they use only handles. During a safepoint they must block instead of loading the contents of the handle.) From a local point of view, a safepoint is a distinguished point in a block of code where the executing thread may block for the GC. Most call sites qualify as safepoints. There are strong invariants which hold true at every safepoint, which may be disregarded at non-safepoints. Both compiled Java code and C/C++ code be optimized between safepoints, but less so across safepoints. The JIT compiler emits a GC map at each safepoint. C/C++ code in the VM uses stylized macro-based conventions (e.g., TRAPS) to mark potential safepoints.

Running with the above mentioned flags, I get this output:

Application time: 0.9668750 seconds
Total time for which application threads were stopped: 0.0000747 seconds, Stopping threads took: 0.0000291 seconds
timeElapsed=1015
Application time: 1.0148568 seconds
Total time for which application threads were stopped: 0.0000556 seconds, Stopping threads took: 0.0000168 seconds
timeElapsed=1015
timeElapsed=1014
Application time: 2.0453971 seconds
Total time for which application threads were stopped: 10.7951187 seconds, Stopping threads took: 10.7950774 seconds
timeElapsed=11732
Application time: 1.0149263 seconds
Total time for which application threads were stopped: 0.0000644 seconds, Stopping threads took: 0.0000368 seconds
timeElapsed=1015

Notice the third STW event:
Total time stopped: 10.7951187 seconds
Stopping threads took: 10.7950774 seconds

JIT itself took virtually no time, but once the JVM had decided to perform a JIT compilation, it entered STW mode, however since the code to be compiled (the infinite loop) doesn't have a call site, no safepoint was ever reached.

The STW ends when JIT eventually gives up waiting and concludes the code is in an infinite loop.


In short, the loop you have has no safe point inside it except when i == 0 is reached. When this method is compiled and triggers the code to be replaced it needs to bring all the threads to a safe point, but this takes a very long time, locking up not just the thread running the code but all threads in the JVM.

I added the following command line options.

-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintCompilation

I also modified the code to use floating point which appears to take longer.

boolean b = 1.0 / i == 0;

And what I see in the output is

timeElapsed=100
Application time: 0.9560686 seconds
  41423  280 %     4       TestBlockingThread::lambda$main$0 @ -2 (27 bytes)   made not entrant
Total time for which application threads were stopped: 40.3971116 seconds, Stopping threads took: 40.3967755 seconds
Application time: 0.0000219 seconds
Total time for which application threads were stopped: 0.0005840 seconds, Stopping threads took: 0.0000383 seconds
  41424  281 %     3       TestBlockingThread::lambda$main$0 @ 2 (27 bytes)
timeElapsed=40473
  41425  282 %     4       TestBlockingThread::lambda$main$0 @ 2 (27 bytes)
  41426  281 %     3       TestBlockingThread::lambda$main$0 @ -2 (27 bytes)   made not entrant
timeElapsed=100

Note: for code to be replaced, threads have to be stopped at a safe point. However it appears here that such a safe point is reached very rarely (possibly only when i == 0 Changing the task to

Runnable task = () -> {
    for (int i = 1; i != 0 ; i++) {
        boolean b = 1.0 / i == 0;
    }
};

I see a similar delay.

timeElapsed=100
Application time: 0.9587419 seconds
  39044  280 %     4       TestBlockingThread::lambda$main$0 @ -2 (28 bytes)   made not entrant
Total time for which application threads were stopped: 38.0227039 seconds, Stopping threads took: 38.0225761 seconds
Application time: 0.0000087 seconds
Total time for which application threads were stopped: 0.0003102 seconds, Stopping threads took: 0.0000105 seconds
timeElapsed=38100
timeElapsed=100

Adding code to the loop carefully you get a longer delay.

for (int i = 1; i != 0 ; i++) {
    boolean b = 1.0 / i / i == 0;
}

gets

 Total time for which application threads were stopped: 59.6034546 seconds, Stopping threads took: 59.6030773 seconds

However, change the code to use a native method which always has a safe point (if it is not an intrinsic)

for (int i = 1; i != 0 ; i++) {
    boolean b = Math.cos(1.0 / i) == 0;
}

prints

Total time for which application threads were stopped: 0.0001444 seconds, Stopping threads took: 0.0000615 seconds

Note: adding if (Thread.currentThread().isInterrupted()) { ... } to a loop adds a safe point.

Note: This happened on a 16 core machine so there is no lack of CPU resources.