How to Troubleshoot Intermittent SQL Timeout Errors

Performance problems boil down to CPU, IO, or Lock contention. It sounds like you have ruled out IO. I would guess CPU is not a problem since this is a database, not a number cruncher. So, that leaves lock contention.

If you can execute a sp_who2 while the queries are timing out, you can use the BlkBy column to trace back to the holding the lock that everyone else is waiting on. Since this is only happening a few times a day, you may have trouble catching enough data if you are running this manually, so I suggest you rig up an automated system to dump this output on a regular basis, or maybe to be triggered by the application timeout exceptions. You can also use the Activity Monitor to watch the degradation of query responsiveness in real-time, as suggested by peer.

Once you find the long-running query and the application that executes it, you can immediately resolve the domino of timeouts by reducing the timeout for that single application below all the others (right now, it must be longer). Then, you should inspect the code to determine a better solution. You could reduce the time the lock is held by committing the transaction sooner within a sproc, or reduce the lock required by the reading query with hints such as NOLOCK or UPDLOCK.

Here's some more reading on sp_who2: http://sqlserverplanet.com/dba/using-sp_who2/

And query hints: http://msdn.microsoft.com/en-us/library/ms181714.aspx http://msdn.microsoft.com/en-us/library/ms187373.aspx


Run SQL trace of long running queries and deadlocks. This shows no deadlocks at the times of the problems, and long running queries all coincide with our timeout errors, but look to be a side effect, and not the cause. Queries that are very basic that typically return instantly end up taking 30, 60 or 120 seconds to run at times. This happens for a few minutes then everything picks up and works fine after that.

It looks like some queries/transaction lock your database till they are done. You have to find out which queries are blocking and rewrite them/run them at an other time to avoid blocking other processes. At this moment the waiting queries just timeout.

An extra point to dig into is the auto increment size of your transaction log and database. Set them on a fixed size instead of a percentage of the current files. If files are getting taller the time it takes to allocate enough space will eventually longer as your transaction timeout. And your db comes to a halt.