How to hint a many-to-many join in SQL Server?
Just to be clear, the optimizer already knows that it's a many-to-many join. If you force merge joins and look at an estimated plan you can see a property for the join operator which tells you if the join could be many-to-many. The problem that you need to solve here is bumping up the cardinality estimates, presumably so you get a more efficient query plan for the part of the query that you left out.
The first thing that I would try is putting the results of the join between Object3 and Object5 into a temp table. For the plan that you posted it's just a single column on 51,393 rows, so it should hardly take up any space in tempdb. You can gather full stats on the temp table, and that alone might be enough to get a sufficiently accurate final cardinality estimate. Gathering full stats on Object1 may help as well. Cardinality estimates often get worse as you traverse a plan from right to left.
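A minimal sketch of that approach follows. Object1, Object3, and Object5 are the anonymized table names from the question; the dbo schema, the column name JoinCol, the temp table #o3_o5, and the statistics name st_o3_o5_JoinCol are hypothetical placeholders, so substitute the real join columns and the rest of your query.

```sql
-- Hypothetical column/table names: only Object1/Object3/Object5 come from the question.
DROP TABLE IF EXISTS #o3_o5;

-- Materialize the single-column result of the Object3/Object5 join
SELECT o3.JoinCol
INTO #o3_o5
FROM dbo.Object3 AS o3
JOIN dbo.Object5 AS o5
    ON o5.JoinCol = o3.JoinCol;

-- Build statistics on the temp table from every row rather than a sample
CREATE STATISTICS st_o3_o5_JoinCol ON #o3_o5 (JoinCol) WITH FULLSCAN;

-- Optionally refresh all statistics on Object1 with a full scan too
UPDATE STATISTICS dbo.Object1 WITH FULLSCAN;

-- Join the materialized rows to Object1 in place of the original Object3/Object5 join
SELECT o1.*
FROM #o3_o5 AS t
JOIN dbo.Object1 AS o1
    ON o1.JoinCol = t.JoinCol;
```

Because the optimizer now starts from a table with accurate, full-scan statistics, the estimates for the joins further up the plan have a better base to build on.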
If that doesn't work, you can try the ENABLE_QUERY_OPTIMIZER_HOTFIXES query hint if you don't already have it enabled at the database or server level. Microsoft locks plan-affecting performance fixes for SQL Server 2016 behind that setting. Some of them relate to cardinality estimates, so perhaps you'll get lucky and one of the fixes will help with your query. You can also try using the legacy cardinality estimator with a FORCE_LEGACY_CARDINALITY_ESTIMATION query hint. Certain data sets may get better estimates with the legacy CE.
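A sketch of how those hints are attached to a single statement is below. The SELECT against dbo.Object1 and the column JoinCol are hypothetical placeholders; only the OPTION clauses matter. The USE HINT syntax assumes SQL Server 2016 SP1 or later.

```sql
-- Placeholder query: replace with your own statement.

-- Enable optimizer hotfixes for this query only (same effect as trace flag 4199)
SELECT o1.JoinCol
FROM dbo.Object1 AS o1
OPTION (USE HINT('ENABLE_QUERY_OPTIMIZER_HOTFIXES'));

-- Use the legacy (pre-SQL Server 2014) cardinality estimator for this query only
SELECT o1.JoinCol
FROM dbo.Object1 AS o1
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

-- Both hints can be combined in one USE HINT list
SELECT o1.JoinCol
FROM dbo.Object1 AS o1
OPTION (USE HINT('ENABLE_QUERY_OPTIMIZER_HOTFIXES', 'FORCE_LEGACY_CARDINALITY_ESTIMATION'));

-- Database-level alternative to the per-query hotfix hint
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;
```

The per-query hints are the lower-risk option for experimenting, since they don't change plans for any other workload in the database.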
As a last resort, you can manually increase the cardinality estimate by whatever factor you like using Adam Machanic's MANY() function. I talk about it in another answer, but it looks like the link is dead. If you're interested, I can try to dig something up.
SQL Server statistics only contain a histogram for the leading column of the statistics object. Therefore, you could create filtered statistics that provide a histogram of values for Key2, but only among rows with Key1 = 1. Creating these filtered statistics on each table fixes the estimates and leads to the behavior you expect for the test query: each new join does not impact the final cardinality estimate (confirmed in both SQL Server 2016 SP1 and SQL Server 2017).
```sql
-- Note: Add "WITH FULLSCAN" to each if you want a perfect 20,000 row estimate
CREATE STATISTICS st_#Table1 ON #Table1 (Key2) WHERE Key1 = 1
CREATE STATISTICS st_#Table2 ON #Table2 (Key2) WHERE Key1 = 1
CREATE STATISTICS st_#Table3 ON #Table3 (Key2) WHERE Key1 = 1
```
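To check what the optimizer now sees, you could rebuild one of the filtered statistics from every row and inspect its histogram. This sketch assumes the #Table1 temp table and the st_#Table1 statistics created above; the 'tempdb..' prefix is the usual way to point DBCC SHOW_STATISTICS at a temp table.

```sql
-- Rebuild one filtered statistics object from every row instead of a sample
-- (equivalent to having added WITH FULLSCAN to the CREATE STATISTICS above)
UPDATE STATISTICS #Table1 st_#Table1 WITH FULLSCAN;

-- Show the Key2 histogram; it only describes rows where Key1 = 1
DBCC SHOW_STATISTICS ('tempdb..#Table1', 'st_#Table1') WITH HISTOGRAM;
```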
Without these filtered statistics, SQL Server will take a more heuristic-based approach to estimating the cardinality of your join. The following whitepaper contains good high-level descriptions of some of the heuristics that SQL Server uses: Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator.
For example, adding the USE HINT('ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS') hint to your query will change the join containment heuristic to assume some correlation (rather than independence) between the Key1 predicate and the Key2 join predicate, which may be beneficial to your query. For the final test query, this hint increases the cardinality estimate from 1,175 to 7,551 rows, but that is still well short of the correct 20,000-row estimate produced with the filtered statistics.
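The test query itself isn't reproduced here, so the sketch below assumes it joins the three temp tables on Key2 and filters each one on Key1 = 1, mirroring the extract script further down. Only the OPTION clause is the point of the example.

```sql
-- Assumed shape of the test query (reconstructed, not copied from the question)
DROP TABLE IF EXISTS #c;

SELECT col = 1
INTO #c
FROM #Table1 t1
JOIN #Table2 t2
    ON t1.Key2 = t2.Key2
JOIN #Table3 t3
    ON t1.Key2 = t3.Key2
WHERE t1.Key1 = 1
  AND t2.Key1 = 1
  AND t3.Key1 = 1
-- Assume correlation between the Key1 filters and the Key2 join predicate
-- (USE HINT requires SQL Server 2016 SP1 or later)
OPTION (USE HINT('ASSUME_JOIN_PREDICATE_DEPENDS_ON_FILTERS'));
```

Comparing the estimated rows on the final join operator with and without the OPTION clause is a quick way to see whether the hint helps for your data.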
Another approach we've used in similar situations is to extract the relevant subset of the data into #temp tables. Especially now that newer versions of SQL Server no longer eagerly write #temp tables to disk, we've had good results with this approach. Your description of your many-to-many join implies that each individual #temp table in your case would be relatively small (or at least smaller than the final result set), so this approach might be worth trying.
```sql
-- Clean up from any previous run (DROP TABLE IF EXISTS requires SQL Server 2016+)
DROP TABLE IF EXISTS #Table1_extract, #Table2_extract, #Table3_extract, #c
-- Extract only the subset of rows that match the filter predicate
-- (Or better yet, extract only the subset of columns you need!)
SELECT * INTO #Table1_extract FROM #Table1 WHERE Key1 = 1
SELECT * INTO #Table2_extract FROM #Table2 WHERE Key1 = 1
SELECT * INTO #Table3_extract FROM #Table3 WHERE Key1 = 1
-- Now perform the join on those extracts, removing the filter predicate
SELECT col = 1
INTO #c
FROM #Table1_extract t1
JOIN #Table2_extract t2
    ON t1.Key2 = t2.Key2
JOIN #Table3_extract t3
    ON t1.Key2 = t3.Key2
```