Apache+Tomcat having problems communicating. Unclear error messages. Bringing down websites hosted under Tomcat

Solution 1:

It turns out that this version of the Oracle driver (classes12, which is quite old) had various bugs that caused a deadlock (as seen in the TP-Processor2 state quoted above). The deadlock didn't show up until we switched to the new environment. Upgrading to the latest version (ojdbc14) has resolved the issue on the primary server.
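
If you're not sure which driver build your JVM is actually loading, something like the following may help. It's only a rough sketch: the class name ListJdbcDrivers is made up for illustration, and it assumes you run it with the same driver jar on the classpath that Tomcat uses. Older Oracle jars such as classes12 and ojdbc14 don't register themselves automatically, so pass the driver class name (e.g. oracle.jdbc.driver.OracleDriver) as an argument.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Tomcat or the Oracle driver.
public class ListJdbcDrivers {

    /** Describes every JDBC driver currently registered with DriverManager. */
    static List<String> describeRegisteredDrivers() {
        List<String> lines = new ArrayList<>();
        for (Driver d : Collections.list(DriverManager.getDrivers())) {
            lines.add(d.getClass().getName() + " "
                    + d.getMajorVersion() + "." + d.getMinorVersion());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Pre-JDBC4 drivers must be loaded explicitly before they show up.
        for (String className : args) {
            try {
                Class.forName(className);
            } catch (ClassNotFoundException e) {
                System.err.println("Not on classpath: " + className);
            }
        }
        List<String> drivers = describeRegisteredDrivers();
        if (drivers.isEmpty()) {
            System.out.println("No JDBC drivers registered in this JVM.");
        }
        for (String line : drivers) {
            System.out.println(line);
        }
    }
}
```

Run it before and after swapping the jar to confirm that the upgrade really took effect on each server.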

Solution 2:

From the description, I'd suggest the problem may be due to the database queries taking too long. If the queries take longer, requests take longer, and therefore you'll have more of them running at once. As you're seeing, you're running out of Tomcat threads. When you solve the problem with the database you should be okay.

  • Get a stack trace, either using jstack or using kill -3 $process_id (the JVM prints the dump to its standard output, which Tomcat usually writes to catalina.out). See what your threads are doing when it dies. If they're all waiting on the database, that's a good pointer to my theory. They might all be waiting on some lock.
  • Install LambdaProbe. It's invaluable for finding out what your Tomcat is doing.
  • Upgrade your Tomcat. 5.5.8 is incredibly old. I think they're now on 5.5.27.
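
If you can't get at jstack, a similar summary can be produced from inside the JVM with the standard java.lang.management API. This is a minimal sketch, not a replacement for a full thread dump: the class name ThreadStateSummary is hypothetical, it assumes Java 8 or later, and it only sees the JVM it runs in, so you would have to call it from code deployed inside Tomcat (for example a diagnostic servlet).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper; only inspects the JVM it is running in.
public class ThreadStateSummary {

    /** Counts live threads by state (RUNNABLE, BLOCKED, WAITING, ...). */
    static Map<Thread.State, Integer> countByState(ThreadMXBean mx) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // thread exited while the dump was being taken
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Number of threads the JVM reports as deadlocked, 0 if none. */
    static int deadlockedThreadCount(ThreadMXBean mx) {
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        System.out.println("Threads by state: " + countByState(mx));
        System.out.println("Deadlocked threads: " + deadlockedThreadCount(mx));
    }
}
```

A large BLOCKED or WAITING count while the site is hanging, or a non-zero deadlock count, points the same way as a dump full of threads stuck on the database.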

Solution 3:

Add connectionTimeout and keepAliveTimeout to your AJP connector found in /etc/tomcat7/server.xml.

<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" 
           connectionTimeout="10000" keepAliveTimeout="10000" />

Info about the AJP connector at https://tomcat.apache.org/tomcat-7.0-doc/config/ajp.html

  • connectionTimeout = The number of milliseconds this Connector will wait, after accepting a connection, for the request URI line to be presented. The default value for AJP protocol connectors is -1 (i.e. infinite).

  • keepAliveTimeout = The number of milliseconds this Connector will wait for another AJP request before closing the connection. The default value is to use the value that has been set for the connectionTimeout attribute.

If connectionTimeout and keepAliveTimeout are not defined, AJP connections are kept alive indefinitely. Each open connection holds on to a thread, so you can end up with too many threads; the default maxThreads is 200.

I recommend installing psi-probe - an advanced manager and monitor for Apache Tomcat, forked from Lambda Probe. https://code.google.com/p/psi-probe/


Solution 4:

Because of the way AJP works, the persistent connections between Apache (using either mod_proxy_ajp or mod_jk) and Tomcat can only be safely closed by the client. In this case, the client is the Apache worker, which opens a connection to Tomcat and then holds it for the life of the worker process.

Because of this behavior you cannot have more Apache workers than Tomcat worker threads. If you do, the additional Apache workers will fail to connect to Tomcat (as the accept queue is full) and Apache will mark your backend as DOWN!
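
To make the arithmetic concrete, here is a small sketch of the check. The class and method names are made up, and the 256 figure is only an illustrative value for Apache's worker limit (MaxClients / MaxRequestWorkers); plug in the numbers from your own httpd.conf and server.xml.

```java
// Hypothetical sanity check for the worker/thread limits described above.
public class AjpCapacityCheck {

    /**
     * Returns how many Apache workers could be left without a Tomcat thread
     * if every worker holds one persistent AJP connection. 0 means the
     * limits are compatible.
     */
    static int excessApacheWorkers(int apacheMaxWorkers, int tomcatMaxThreads) {
        return Math.max(0, apacheMaxWorkers - tomcatMaxThreads);
    }

    public static void main(String[] args) {
        int apacheMaxWorkers = 256; // assumed example value, not from the question
        int tomcatMaxThreads = 200; // Tomcat's default maxThreads
        int excess = excessApacheWorkers(apacheMaxWorkers, tomcatMaxThreads);
        System.out.println("Apache workers without a Tomcat thread: " + excess);
    }
}
```

With these example numbers, 56 Apache workers could end up queued against a Tomcat that has no free thread for them, so either lower the Apache limit or raise maxThreads on the AJP connector.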


Solution 5:

I've had better results with mod_proxy instead of mod_ajp in terms of stability, so try that solution. It's non-invasive - at best it will solve the problem and at worst it will rule out mod_ajp.

Other than that, it sounds like your Tomcats stop responding and all request threads are tied up. Have your dev team look into what's going on - taking a thread dump and delivering it to them will be useful.