SQL Server: Accurate Row Estimates without Histogram?
First, as you noticed, SQL Server will automatically create missing statistics (where possible) when compiling an execution plan, if the database option AUTO_CREATE_STATISTICS
is on.
When statistics are not available, the default guess for an unknown inequality predicate is 0.3 (30%). When you use a local variable, the value in the variable cannot be sniffed (unless OPTION (RECOMPILE)
is also specified). With a table cardinality of 316, the estimate is 0.3 * 316 = 94.8.
When you use a constant literal, the value is sniffed. Without statistics, it cannot use this sniffed value to check the histogram as it normally would, but it can judge its impact on the CHECK
constraint (the one limiting Rate values to between $6.50 and $200).
If the sniffed value does not exclude the check constraint range completely, the estimate is based on the 0.09 (9%) guessed selectivity for a BETWEEN
predicate (from the check constraint). 316 * 0.09 = 28.44.
If the sniffed value does exclude the check constraint range completely, the estimate is always 1 row (the cardinality estimator (almost) never produces an estimated number of rows less than 1).
If the query is simple enough to qualify for a trivial plan, and it is considered safe for simple parameterization, the constant literal is replaced with a parameter marker e.g. @1
. This happens for a query like:
SELECT *
FROM HumanResources.EmployeePayHistory AS EPH
WHERE EPH.Rate > $200;
The execution plan shows an estimate of 1 row:
And the Scan Predicate shows a parameter marker:
If simple parameterization is prevented, e.g. by adding a clause that compares a constant with a constant:
SELECT *
FROM HumanResources.EmployeePayHistory AS EPH
WHERE EPH.Rate > $200
AND 1 = 1;
Without parameterization, this plan can never be reused with a different value for the parameter, so the optimizer can statically eliminate table access on the basis that the check constraint prevents any rows being returned. The resuting plan is:
Finally, be careful with types. The type of the Rate column is money, not decimal. Conversions can affect cardinality estimation in complicated ways. Specify a money literal with the $ prefix, or use an explicit CAST
or CONVERT
.
The internal details of cardinality estimation are not documented publicly (probably to avoid endless ever more detailed questions and complaints when things change), but there are a range of resources out there to aid you in this area. Just remember most of this is unoffical and so not supported by anyone. Some aspects are actually dangerous.
Some apply only to the original (pre-2014) cardinality estimator, some explain general principles applicable to both, and some are applicable to the "new" CE only. The following is not meant as an authoritative or complete list, just the ones that came to mind immediately:
- The SQL Server 2014 Cardinality Estimator
- 13 Things You Should Know About Statistics and the Query Optimizer
- Cardinality Estimation (SQL Server)
- Constant Folding and Expression Evaluation During Cardinality Estimation
- Statistics Used by the Query Optimizer
- Plan Caching and Recompilation
- Nacho Alonso Portillo's blog (MSFT) (search for cardinality)
- Dima Piliugin's blog
- SQLblog.com posts
- SQLperformance.com posts
- Microsoft SQL Server Internals (book) Kalen Delaney et al.