Troubleshooting SQL connectivity issues after database migration
We have tracked down and fixed this issue in our environment. The description as I understand it is below (please excuse potential inaccuracies below; This is the way I (as a software developer) understand the descriptions given to me by our Network Administrator (who also was working with our hosting company).
The cause was eventually tracked down as a network configuration issue involving the Load Balancer. We had expected that the Load Balancer was sitting between the internet and our web servers, and that all of our servers were communicating freely with each other. Unfortunately the network was set up in such a way that all network traffic (including traffic between the SQL Servers and Web Servers) was passing through the Load Balancer. The Load Balancer was configured to limit bandwidth passing through it, and when the limit was exceeded it simply dropped packets. The limit was often exceeded when large file transfers were occurring between the servers (eg, when database backups were copied off of the database server, etc). This was hard for us to see as we didn't have access to the Load Balancer (only our hosting provider could access it), and as far as we could tell we were far from saturating our network interfaces. Additionally, these issues were extremely sporadic (on the order of a handful of minutes every 3-5 months).
The fix was to rearrange the environment so our internal network traffic did not go through the LB; I believe the network was rearranged to fit a One-armed Load Balancing Architecture. Since making this change we have not experienced the intermittent connectivity issues.