Why is query with phone = N'1234' slower than phone = '1234'?
Because nvarchar
has higher datatype precedence than varchar
so it needs to perform an implicit cast of the column to nvarchar
and this prevents an index seek.
Under some collations it is able to still use a seek and just push the cast
into a residual predicate against the rows matched by the seek (rather than needing to do this for every row in the entire table via a scan) but presumably you aren't using such a collation.
The effect of collation on this is illustrated below. When using the SQL collation you get a scan, for the Windows collation it calls the internal function GetRangeThroughConvert
and is able to convert it into a seek.
CREATE TABLE [dbo].[phone]
(
phone1 VARCHAR(500) COLLATE sql_latin1_general_cp1_ci_as CONSTRAINT uq1 UNIQUE,
phone2 VARCHAR(500) COLLATE latin1_general_ci_as CONSTRAINT uq2 UNIQUE,
);
SELECT phone1 FROM [dbo].[phone] WHERE phone1 = N'5554474477';
SELECT phone2 FROM [dbo].[phone] WHERE phone2 = N'5554474477';
The SHOWPLAN_TEXT
is below
Query 1
|--Index Scan(OBJECT:([tempdb].[dbo].[phone].[uq1]), WHERE:(CONVERT_IMPLICIT(nvarchar(500),[tempdb].[dbo].[phone].[phone1],0)=CONVERT_IMPLICIT(nvarchar(4000),[@1],0)))
Query 2
|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1005], [Expr1006], [Expr1004]))
|--Compute Scalar(DEFINE:(([Expr1005],[Expr1006],[Expr1004])=GetRangeThroughConvert([@1],[@1],(62))))
| |--Constant Scan
|--Index Seek(OBJECT:([tempdb].[dbo].[phone].[uq2]), SEEK:([tempdb].[dbo].[phone].[phone2] > [Expr1005] AND [tempdb].[dbo].[phone].[phone2] < [Expr1006]), WHERE:(CONVERT_IMPLICIT(nvarchar(500),[tempdb].[dbo].[phone].[phone2],0)=[@1]) ORDERED FORWARD)
In the second case the compute scalar emits the following values
Expr1004 = 62
Expr1005 = '5554474477'
Expr1006 = '5554474478'
the seek predicate shown in the plan is on phone2 > Expr1005 and phone2 < Expr1006
so on the face of it would exclude '5554474477'
but the flag 62
means that this does match.
Other answers already explain what happens; we've seen NVARCHAR
has higher type precedence than VARCHAR
. I want to explain why the database must cast every row for the column as an NVARCHAR
, rather than casting the single supplied value as VARCHAR
, even though the second option is clearly much faster, both intuitively and empirically.
Casting from NVARCHAR
to VARCHAR
is a narrowing conversion. That is, NVARCHAR
has potentially more information than a similar VARCHAR
value. It's not possible to represent every NVARCHAR
input with a VARCHAR
output, so casting from the former to the latter potentially loses some information. But the opposite cast is a widening conversion. Casting from a VARCHAR
value to an NVARCHAR
value never loses information; it's safe.
The principle is for Sql Server to always choose the safe conversion when presented with two mismatched types. It's the same old "correctness trumps performance" mantra. Or, to paraphrase Benjamin Franklin, "He who would trade essential correctness for a little performance deserve neither correctness nor performance." The type precedence rules, then, are designed to ensure the safe conversions are chosen.
Now you and I both know your narrowing conversion is also safe for this particular data, but the Sql Server query optimizer doesn't care about that. For better or worse, it sees the data type information first when building the execution plan and follows the type precedence rules.
Here's the real kicker: now we're making this cast, we have to do it for every row in the table. This is true even for rows which would not otherwise match the comparison filter. Morever, the cast values from the columns are no longer the same as the values stored in an index, such that any index on the column is now worthless for this query.
I think you're very lucky to be getting an index scan for this query, rather than a full table scan, and it's likely because there is a covering index that meets the needs of the query (the optimizer can choose to cast all the records in the index as easily as all the records in the table).
You can fix things for this query by explicitly resolving the type mismatch in a more favorable way. The best way to accomplish this is, of course, supplying a plain VARCHAR
in the first place and avoid any need for casting/conversion at all:
SELECT * FROM [dbo].[phone] WHERE phone = '5554474477'
But I suspect what we're seeing is a value provided by an application, where you don't necessarily control that part of the literal. If so, you can still do this:
SELECT * FROM [dbo].[phone] WHERE phone = cast(N'5554474477' as varchar(20))
Either example favorably resolves the type mismatch from the original code. Even with the latter situation, you may have more control over the literal than you know. For example, if this query was created from a .Net program the problem is possibly related to the AddWithValue()
function. I've written about this issue in the past and how to handle it correctly.
These fixes also help demonstrate why things are this way.
It may be possible at some point in the future the Sql Server developers enhance the query optimizer to look at situations where type precedence rules cause a per-row conversion resulting in a table or index scan, but the opposite conversion involves constant data and could be just an index seek, and in that case first look at the data to see if it would also be safe. However, I find it unlikely they will ever do this. In my opinion, the corrections to queries within the existing system are too easy relative to the additional performance cost completing the evaluation for individual queries and the complexity in understanding what the optimizer is doing ("Why didn't the server follow the documented precedence rules here?") to justify it.
SELECT * FROM [dbo].[phone] WHERE phone = N'5554474477'
is interpreted as
SELECT * from [dbo].[phone] WHERE CAST(phone as NVARCHAR) = N'5554474477'
which prevents index usage