Determining which locks are most contended?
Get a good profiler like YourKit. It can tell you how much time threads spend waiting and blocking in particular methods and on the object monitors those methods use.
With regard to your comment about production metrics, you're quite limited in what you can gather. The most information you're going to get is from the ThreadMXBean, which can give you metadata about all the running threads. With thread contention monitoring enabled it reports per-thread blocked counts and blocked times, but it won't give you aggregate contention statistics for a specific object monitor.
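To give a rough idea of what ThreadMXBean exposes, here is a minimal sketch that dumps the per-thread blocked counters. The class name `ContentionSnapshot` and the output format are my own invention; only the `java.lang.management` calls are standard. Note that the blocked time is reported as -1 when contention monitoring is unsupported or disabled, and the lock name is only non-null while the thread is actually blocked or waiting.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not from the original answer): prints per-thread
// contention counters. It does NOT aggregate per monitor.
final class ContentionSnapshot {

    static String snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();

        // Blocked *time* is only tracked when contention monitoring is on.
        if (mx.isThreadContentionMonitoringSupported()) {
            mx.setThreadContentionMonitoringEnabled(true);
        }

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) {
                continue; // thread terminated between the two calls
            }
            sb.append(String.format(
                    "%s blockedCount=%d blockedTimeMs=%d lock=%s%n",
                    info.getThreadName(),
                    info.getBlockedCount(),  // times the thread blocked on a monitor
                    info.getBlockedTime(),   // -1 if monitoring is off
                    info.getLockName()));    // null unless currently blocked/waiting
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Sampling this periodically tells you which threads are blocking a lot, and the lock name gives you a hint about what they're stuck on at that instant, but you'd have to do the per-monitor bookkeeping yourself.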
I don't want to get on my ivory tower here but I really feel that your best bet is to try to replicate your production environment as closely as possible. Spending some time getting that set up now will pay dividends many times over in the future.
Even running a profiler in a simulated-but-not-quite-good-enough environment will probably give you good information.
Ted, I sympathize with your situation, but when performance is that critical, I'd recommend that you bite the bullet and simulate.
It shouldn't be as hard as you fear: instead of trying to generate message flow from your exchanges, why not record the incoming flow and replay it in the simulation?
Without something like this, you're always going to be running into the Heisenberg problem: affecting the system you're measuring.
For a similar problem in a database, we log a line just before requesting a lock and another immediately after acquiring it. We also log one after releasing it. We then post-process this data to generate the kind of stats you are looking for.
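In Java the same idea could look roughly like the sketch below. This is not our actual code: `TimedLock` and its method names are made up for illustration, and instead of writing log lines and post-processing them it accumulates the wait time per lock name in memory, which is a simplification. It wraps a `ReentrantLock` and takes a timestamp just before requesting and immediately after acquiring.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical wrapper: records how long callers wait to acquire each named lock.
final class TimedLock {
    private static final ConcurrentHashMap<String, LongAdder> WAIT_NANOS =
            new ConcurrentHashMap<>();
    private static final ConcurrentHashMap<String, LongAdder> ACQUISITIONS =
            new ConcurrentHashMap<>();

    private final String name;
    private final ReentrantLock lock = new ReentrantLock();

    TimedLock(String name) {
        this.name = name;
    }

    void lock() {
        long requested = System.nanoTime();           // just before requesting
        lock.lock();
        long waited = System.nanoTime() - requested;  // immediately after acquiring
        WAIT_NANOS.computeIfAbsent(name, k -> new LongAdder()).add(waited);
        ACQUISITIONS.computeIfAbsent(name, k -> new LongAdder()).increment();
    }

    void unlock() {
        lock.unlock();
    }

    // Total nanoseconds spent waiting for the named lock (0 if never used).
    static long totalWaitNanos(String name) {
        LongAdder a = WAIT_NANOS.get(name);
        return a == null ? 0L : a.sum();
    }

    // Number of successful acquisitions of the named lock (0 if never used).
    static long acquisitions(String name) {
        LongAdder a = ACQUISITIONS.get(name);
        return a == null ? 0L : a.sum();
    }

    public static void main(String[] args) {
        TimedLock orders = new TimedLock("orders");
        orders.lock();
        try {
            // critical section
        } finally {
            orders.unlock();
        }
        System.out.println("orders: acquisitions=" + acquisitions("orders")
                + " waitNanos=" + totalWaitNanos("orders"));
    }
}
```

Sorting the names by total wait time gives a first approximation of which locks are most contended. Bear in mind the two `nanoTime` calls and the counter updates add overhead of their own, so this is subject to the same measurement effect mentioned above.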
EDIT: For a system that is already built, AspectJ might be a good option for generating the logs.