Lock by synchronize is acquired by shortest waiting threads
Many sources already point out that you should make no assumptions about the order in which threads acquire locks. But that doesn't mean the order has to be scrambled.
It probably depends at the very least on the JVM implementation. For example, this document about HotSpot says:
Contended synchronization operations use advanced adaptive spinning techniques to improve throughput even for applications with significant amounts of lock contention. As a result, synchronization performance becomes so fast that it is not a significant performance issue for the vast majority of real-world programs.
...
In the normal case when there's no contention, the synchronization operation will be completed entirely in the fast-path. If, however, we need to block or wake a thread (in monitorenter or monitorexit, respectively), the fast-path code will call into the slow-path. The slow-path implementation is in native C++ code while the fast-path is emitted by the JITs.
I'm not an expert on HotSpot (maybe someone else can provide a more authoritative answer), but based on the C++ code, it looks like the contending threads will be pushed onto a LIFO structure, which may explain the stack-like order you observed:
// * Contending threads "push" themselves onto the cxq with CAS
//   and then spin/park.
...
// Cxq points to the set of Recently Arrived Threads attempting entry.
// Because we push threads onto _cxq with CAS, the RATs must take the form of
// a singly-linked LIFO.
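To make the "push with CAS gives a LIFO" point concrete, here is a minimal Java sketch of that pattern. It is not HotSpot's code: the class `CxqSketch` and its `push` and `drain` methods are hypothetical names made up for illustration, and the sketch models only the list of arrivals, not how the JVM later chooses which thread gets the lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; not taken from the HotSpot sources.
public class CxqSketch {
    // One node per "arriving thread"; the name is just a label for the demo.
    static final class Node {
        final String name;
        Node next;
        Node(String name) { this.name = name; }
    }

    // Head of the singly-linked list of recent arrivals.
    private final AtomicReference<Node> head = new AtomicReference<>();

    // Push with a CAS retry loop: the new node always becomes the head.
    public void push(String name) {
        Node node = new Node(name);
        Node current;
        do {
            current = head.get();
            node.next = current;
        } while (!head.compareAndSet(current, node));
    }

    // Detach the whole list and read it from the head onward.
    public List<String> drain() {
        List<String> order = new ArrayList<>();
        for (Node n = head.getAndSet(null); n != null; n = n.next) {
            order.add(n.name);
        }
        return order;
    }

    public static void main(String[] args) {
        CxqSketch cxq = new CxqSketch();
        // Arrivals in the order T1, T2, T3 (sequential here so the demo is deterministic).
        cxq.push("T1");
        cxq.push("T2");
        cxq.push("T3");
        System.out.println(cxq.drain()); // prints [T3, T2, T1]
    }
}
```

Because a CAS push can only link a new node in at the head, walking the list from the head visits the most recent arrival first. If waiters were taken from such a list head-first, that alone would give the stack-like order described above.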