Why does my query run fast in Environment A, but slow in Environment B?
There are many factors, both internal and external to SQL Server, that can cause the same query to perform differently across environments, even when they are configured nearly identically. Any one of these factors can result in very different query plans and performance.
Server
- Is the hardware the same across environments (disks, memory, CPU, etc.)? (See the sketch after this list.)
- If VMs are being used, could noisy neighbors affect overall VM performance?
- If in the cloud, do autoscaling and other configurations have parity?
- Are environments mixed between physical/virtual/cloud?
- Do the OS versions match?
- Are environments in different datacenters?
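As a starting point for the hardware and OS questions above, the DMVs below report what each instance actually sees; this is a minimal sketch, assuming SQL Server 2017 or later for sys.dm_os_host_info.

```sql
-- Run on every environment and compare the output.
-- Host platform, distribution, and OS release (SQL Server 2017+):
SELECT host_platform, host_distribution, host_release
FROM   sys.dm_os_host_info;

-- Logical CPUs, memory, and whether the instance is virtualized:
SELECT cpu_count,
       physical_memory_kb / 1024 AS physical_memory_mb,
       virtual_machine_type_desc
FROM   sys.dm_os_sys_info;
```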
Instance
- Are the SQL Server versions the same?
- A Cumulative Update (CU) or Service Pack (SP) can make a world of difference even if the major versions are the same.
- Is the active workload during the query execution(s) comparable?
- Is the same volume of queries present in all environments?
- Is the nature of the workload the same across all environments?
- Do all environments participate in the same HA/DR setup?
- Lower environments often do not have Availability Groups, Log Shipping, or Replication set up, while production / DR may be using these technologies.
- Do the same maintenance jobs get run on the same schedules in all environments?
- Are trace flags equivalent across all environments?
- Are the same backup jobs running in all environments?
- Impact from backups should be minimal, but often they are not run at all on lower environments.
- Are the server-level configurations (sys.configurations) the same? (See the sketch below.)
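For the instance-level questions, one option is to capture the version, server configuration, and globally enabled trace flags on each instance and diff the results; this is a sketch, not an exhaustive comparison.

```sql
-- Exact build, including CU/SP level, plus edition:
SELECT SERVERPROPERTY('ProductVersion') AS product_version,
       SERVERPROPERTY('ProductLevel')   AS product_level,
       SERVERPROPERTY('Edition')        AS edition;

-- Server-level configuration options (sp_configure values):
SELECT name, value, value_in_use
FROM   sys.configurations
ORDER BY name;

-- Trace flags enabled globally on this instance:
DBCC TRACESTATUS(-1);
```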
Database
- Are the schemas / indexes / statistics / objects all equal across environments?
- Does the exact same data exist across environments?
- Amount of data
- Distribution of data
- Size of data (dummy data in variable-length data types may not reflect the size of actual values in other environments)
- Are database level configs the same?
- Are compatibility levels the same?
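For the database-level checks, the sketch below pulls the compatibility level, a few common database options, and the database-scoped configurations (SQL Server 2016+); YourDatabase is a placeholder name.

```sql
-- Compatibility level and common database options:
SELECT name,
       compatibility_level,
       snapshot_isolation_state_desc,
       is_read_committed_snapshot_on,
       is_auto_create_stats_on,
       is_auto_update_stats_on
FROM   sys.databases
WHERE  name = N'YourDatabase';   -- placeholder database name

-- Database-scoped configurations such as MAXDOP or the legacy cardinality estimator:
SELECT name, value, value_for_secondary
FROM   sys.database_scoped_configurations;
```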
With all of this in mind, it is not surprising that in many cases it is simply not possible to copy every aspect of a database perfectly across environments. Testing can give a good degree of confidence in how queries will perform in each environment, but discrepancies between them should not come as a surprise. When developing a new query, additional tuning is often needed as it moves toward production.
Usually, tuning a query that is slower in one environment will not cause regressions in the execution plans generated elsewhere, so it is an opportunity to adjust indexes, statistics, or the query itself for an overall improvement.
Final note: lower environments are often undersized and should not be expected to give the same performance as a production or pre-production environment.
More Resources:
- Different Plans for "Identical" Servers by Aaron Bertrand
The other answers are good, but I would add that you should consider the amount of data in Environment B, and any contention with other queries.
Some SQL queries show no performance problems in isolation (e.g. 1,000 rows in a table, no other queries running), but can be a horror show with 10,000,000 rows in the table (e.g. parameter sniffing issues), and/or with other queries writing to, updating, or locking the tables involved.
I agree with the other answers on checking that hardware/environments/configurations match first, but if nothing obvious comes up, start looking at query execution plans, running SQL Profiler, and so on (see the sketch below).
In short, you need to isolate whether the database itself is slower than the other, or whether its environment is. Rule out the easiest things first.
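One hedged way to compare plans is to capture runtime statistics and the actual execution plan for the same query in each environment and diff them; dbo.YourTable, SomeColumn, and the parameter value below are placeholders.

```sql
SET STATISTICS IO ON;    -- logical/physical reads per statement
SET STATISTICS TIME ON;  -- CPU and elapsed time per statement
SET STATISTICS XML ON;   -- returns the actual execution plan as XML

DECLARE @YourParam int = 42;   -- placeholder parameter

SELECT *
FROM   dbo.YourTable
WHERE  SomeColumn = @YourParam;

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```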
This has happened to me on a few occasions. Each time it turned out to be the environment: someone else was hammering away and starving the database of IOPS on one server.
Run top(1) on the slower server and see whether the CPU is spending a lot of time in wait states, or whether CPU steal is high if you're in a virtual environment.
This can also help point to missing indexes that cause execution plans to perform full table scans instead of index scans (though that is easy to spot with slow query logging). Heavy I/O wait will also show up in ps as processes in a D (uninterruptible sleep) state.
Once you've ruled that out, it's time to dig deeper into the hardware: is the work being spread across all the CPUs? Has a network port renegotiated itself down to 100 Mb? Run vmstat and/or iostat on both machines and compare the differences.
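Alongside the OS tools, SQL Server keeps its own wait statistics; the sketch below lists the top waits since the last restart (the excluded wait types are only a small, illustrative set of idle waits).

```sql
-- Run on both servers and compare which wait types dominate.
SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM   sys.dm_os_wait_stats
WHERE  wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                         N'XE_TIMER_EVENT', N'BROKER_TO_FLUSH')  -- illustrative idle waits only
ORDER BY wait_time_ms DESC;
```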
If the datasets are identical, does the same query generate the same execution plan on both? Do tables contain the same numbers of rows? Are the index definitions identical? Do the tables have similar levels of fragmentation? Similar numbers of active connections?
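A minimal sketch for the row-count and fragmentation checks above, run in the database being compared (YourDatabase is a placeholder):

```sql
-- Approximate row counts per table (heap or clustered index partitions):
SELECT t.name AS table_name, SUM(p.rows) AS row_count
FROM   sys.tables t
JOIN   sys.partitions p
       ON p.object_id = t.object_id AND p.index_id IN (0, 1)
GROUP BY t.name
ORDER BY row_count DESC;

-- Index fragmentation (LIMITED mode keeps the scan cheap):
SELECT OBJECT_NAME(ips.object_id)        AS table_name,
       i.name                            AS index_name,
       ips.avg_fragmentation_in_percent
FROM   sys.dm_db_index_physical_stats(DB_ID(N'YourDatabase'), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN   sys.indexes i
       ON i.object_id = ips.object_id AND i.index_id = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;
```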