Can I do a sargable first letter match on two tables?

If the lastname column is indexed in at least one of the tables then you could also use LIKE

SELECT *
FROM   persons p
       INNER JOIN persons2 p2
               ON p2.lastname LIKE LEFT(p.lastname, 1) + '%'

enter image description here

The plan for this can have a seek on the table specified to the left of the like.

i.e. ON p.lastname LIKE LEFT(p2.lastname, 1) + '%' would not be able to make use of the index on persons2that was used above but could seek one on persons.

The suggestion in the other answer of indexing a calculated column on both sides is more flexible however. As for a nested loops plan either table can be on the inside and it would also allow a many to many merge join without requiring a sort.

Create a view on the tables with a persisted computed column defined as the LEFT(lastname, 1) of each table, then compare the computed persisted column values.

Here is a test-bed showing how to do that:

Click to copy

CREATE TABLE dbo.Persons
(
    PersonID int NOT NULL
        CONSTRAINT PK_Persons
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
    , FirstName nvarchar(500) NOT NULL
    , LastName nvarchar(500) NOT NULL
);

CREATE TABLE dbo.Persons2
(
    PersonID int NOT NULL
        CONSTRAINT PK_Persons2
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
    , FirstName nvarchar(500) NOT NULL
    , LastName nvarchar(500) NOT NULL
);

GO
CREATE VIEW dbo.PersonsView
WITH SCHEMABINDING
AS
SELECT p1.PersonID
    , p1.FirstName
    , p1.LastName 
    , LastNameInitial = LEFT(p1.LastName, 1)
FROM dbo.Persons p1;
GO
CREATE VIEW dbo.PersonsView2
WITH SCHEMABINDING
AS
SELECT p2.PersonID
    , p2.FirstName
    , p2.LastName 
    , LastNameInitial = LEFT(p2.LastName, 1)
FROM dbo.Persons p2;
GO
CREATE UNIQUE CLUSTERED INDEX CX_PersonsView
ON dbo.PersonsView(PersonID);
CREATE NONCLUSTERED INDEX IX_PersonsView_LastNameInitial
ON dbo.PersonsView(LastNameInitial)
INCLUDE (FirstName, LastName);

CREATE UNIQUE CLUSTERED INDEX CX_PersonsView2
ON dbo.PersonsView2(PersonID);
CREATE NONCLUSTERED INDEX IX_PersonsView2_LastNameInitial
ON dbo.PersonsView2(LastNameInitial)
INCLUDE (FirstName, LastName);

CREATE STATISTICS ST_PersonsView_001
ON dbo.PersonsView(LastName);

CREATE STATISTICS ST_PersonsView2_001
ON dbo.PersonsView2(LastName);

Here, we'll insert some sample data:

Click to copy

INSERT INTO dbo.Persons(FirstName, LastName)
VALUES ('Max', 'Vernon')
    , ('Joe', 'Black');

INSERT INTO dbo.Persons2(FirstName, LastName)
VALUES ('Max', 'Vernon')
    , ('Joe', 'Black');

Here's the SELECT query:

Click to copy

SELECT *
FROM dbo.PersonsView pv1
    INNER JOIN dbo.PersonsView2 pv2 ON pv1.LastNameInitial = pv2.LastNameInitial;

And the results:

Click to copy

+----------+-----------+----------+-----------------+----------+-----------+----------+-----------------+
| PersonID | FirstName | LastName | LastNameInitial | PersonID | FirstName | LastName | LastNameInitial |
+----------+-----------+----------+-----------------+----------+-----------+----------+-----------------+
|        2 | Joe       | Black    | B               |        2 | Joe       | Black    | B               |
|        1 | Max       | Vernon   | V               |        1 | Max       | Vernon   | V               |
+----------+-----------+----------+-----------------+----------+-----------+----------+-----------------+

The execution plan, with only two rows per table (admittedly not a lot of rows!)

enter image description here

I happen to have a table with 3,423 rows and 195 distinct values in Name. I'll call this table P (person) and duplicate it to create P2 (person2). There is a unique, clustered primary key on an integer ID column. I'm using Microsoft SQL Server 2016 (KB3194716) Developer Edition (64-bit) on Windows 10 Pro 6.3 with 32GB RAM.

With the base query

Click to copy

select
    p.pid
from dbo.p
inner join dbo.p2 
    on LEFT(p.name, 1) = LEFT(p2.name, 1);

I get 1.5M rows returned in 3200-3300ms (from statistics io).

enter image description here

By re-writing thus -

Click to copy

select
    p.pid
from dbo.p
where exists
(
    select 1
    from dbo.p2 
    where LEFT(p.name, 1) = LEFT(p2.name, 1)
);

elapsed reduces to 50-60ms and the plan is:

enter image description here

Fewer rows are returned (3,423) because of the matching algorithm. The same plan and row count is achieved by changing the base query to select distinct.

By creating indexed, computed column

Click to copy

alter table dbo.p2
add Name1 as Left(Name, 1);

create index ix1 on dbo.p2(Name1);

The elapsed time drops to 45-50ms.

enter image description here

Can I do a sargable first letter match on two tables?

Tags:

Sql Server

Related

Recent Posts