JBoss threads waiting on random monitor
This sort of behaviour is to be expected. As you scale up a load test, you're always going to find bottlenecks, and in a complex system, those bottlenecks are going to shift around.
Your job is to identify those bottlenecks and try to fix them, one at a time. Each time you do, you'll always find another one, but hopefully the system will scale better each time. It's not easy, but then scaling for load isn't easy.
Take your 1st example. You have lots of calls to log4j's `Logger.debug()` method. Log4j doesn't perform well when logging under load, so you need to take some precautions. Even if your log4j config says "do not log DEBUG messages", log4j still has to do some work before realising this. The recommended approach is to wrap every `Logger.debug()` call in an `if (Logger.isDebugEnabled()) { ... }` block. This should shift that particular bottleneck.
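A minimal sketch of that guard, assuming a typical per-class log4j 1.x logger field (the class name and message are placeholders):

```java
import org.apache.log4j.Logger;

public class OrderService {

    private static final Logger LOG = Logger.getLogger(OrderService.class);

    public void process(String orderId) {
        // Guard the call so the message string is only built (and debug()
        // only invoked) when DEBUG is actually enabled for this logger.
        if (LOG.isDebugEnabled()) {
            LOG.debug("Processing order " + orderId);
        }
        // ... actual work ...
    }
}
```

The cheap `isDebugEnabled()` check skips both the string concatenation and the `debug()` call itself when DEBUG is off, which is where the contention under load tends to come from.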
In your 2nd example, you're calling XOM's `Node.query()` method. This method has to recompile the XPath expression on every call, and this seems to be a bottleneck. Find an API where you can pre-compile the XPath expression and re-use it.
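For illustration, this is what compile-once-and-reuse looks like with JAXP's `javax.xml.xpath` API. This is only a sketch of the idea, not a drop-in replacement for your XOM code: it evaluates against a standard `org.w3c.dom.Document`, and the `//item` expression is just a placeholder.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class CompiledXPathExample {

    // Compile the expression once and reuse it for every document.
    // XPathExpression isn't guaranteed to be thread-safe, so in a
    // multi-threaded server keep one per thread (e.g. in a ThreadLocal).
    private static final XPathExpression ITEM_EXPR = compile("//item");

    private static XPathExpression compile(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.compile(expression);
        } catch (Exception e) {
            throw new IllegalStateException("Bad XPath: " + expression, e);
        }
    }

    public static NodeList findItems(Document doc) throws Exception {
        // Only evaluation happens per call; no recompilation of the expression.
        return (NodeList) ITEM_EXPR.evaluate(doc, XPathConstants.NODESET);
    }
}
```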
In the 3rd example, you're reading a `File`. This isn't a good idea in a high-load system; file I/O is slow when you're doing a large number of small operations. Consider re-implementing this to work a different way if you can.
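One common rework, if it fits your case, is to read the file once and serve the contents from memory instead of hitting the filesystem on every request. This sketch assumes Java 7+, a reasonably small file, and content that is effectively static while the app is running:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CachedFile {

    private final String path;
    private volatile String cached; // read once, then served from memory

    public CachedFile(String path) {
        this.path = path;
    }

    public String contents() throws IOException {
        String result = cached;
        if (result == null) {
            synchronized (this) {
                if (cached == null) {
                    // The only disk read; every later call is a field access.
                    cached = new String(Files.readAllBytes(Paths.get(path)),
                                        StandardCharsets.UTF_8);
                }
                result = cached;
            }
        }
        return result;
    }
}
```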
All of these are unrelated, but all present performance bottlenecks you'll see when scaling for load. You'll never get rid of them all, but hopefully you can get it to a point where it's good enough.
I set up the application in Tomcat running through Eclipse and did not see the problem. Eventually I found we were starting JBoss using a 32-bit Windows service wrapper, even though we were using a 64-bit JDK. The machine was 64-bit. I'm not sure how this would even work? At any rate, changing to a 32-bit JDK caused the crazy problem to go away and I was able to move on with my life.