Can I do a sargable first letter match on two tables?
If the lastname
column is indexed in at least one of the tables then you could also use LIKE
SELECT *
FROM persons p
INNER JOIN persons2 p2
ON p2.lastname LIKE LEFT(p.lastname, 1) + '%'
The plan for this can have a seek on the table specified to the left of the like.
i.e. ON p.lastname LIKE LEFT(p2.lastname, 1) + '%'
would not be able to make use of the index on persons2
that was used above but could seek one on persons
.
The suggestion in the other answer of indexing a calculated column on both sides is more flexible however. As for a nested loops plan either table can be on the inside and it would also allow a many to many merge join without requiring a sort.
Create a view on the tables with a persisted computed column defined as the LEFT(lastname, 1)
of each table, then compare the computed persisted column values.
Here is a test-bed showing how to do that:
CREATE TABLE dbo.Persons
(
PersonID int NOT NULL
CONSTRAINT PK_Persons
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
, FirstName nvarchar(500) NOT NULL
, LastName nvarchar(500) NOT NULL
);
CREATE TABLE dbo.Persons2
(
PersonID int NOT NULL
CONSTRAINT PK_Persons2
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
, FirstName nvarchar(500) NOT NULL
, LastName nvarchar(500) NOT NULL
);
GO
CREATE VIEW dbo.PersonsView
WITH SCHEMABINDING
AS
SELECT p1.PersonID
, p1.FirstName
, p1.LastName
, LastNameInitial = LEFT(p1.LastName, 1)
FROM dbo.Persons p1;
GO
CREATE VIEW dbo.PersonsView2
WITH SCHEMABINDING
AS
SELECT p2.PersonID
, p2.FirstName
, p2.LastName
, LastNameInitial = LEFT(p2.LastName, 1)
FROM dbo.Persons p2;
GO
CREATE UNIQUE CLUSTERED INDEX CX_PersonsView
ON dbo.PersonsView(PersonID);
CREATE NONCLUSTERED INDEX IX_PersonsView_LastNameInitial
ON dbo.PersonsView(LastNameInitial)
INCLUDE (FirstName, LastName);
CREATE UNIQUE CLUSTERED INDEX CX_PersonsView2
ON dbo.PersonsView2(PersonID);
CREATE NONCLUSTERED INDEX IX_PersonsView2_LastNameInitial
ON dbo.PersonsView2(LastNameInitial)
INCLUDE (FirstName, LastName);
CREATE STATISTICS ST_PersonsView_001
ON dbo.PersonsView(LastName);
CREATE STATISTICS ST_PersonsView2_001
ON dbo.PersonsView2(LastName);
Here, we'll insert some sample data:
INSERT INTO dbo.Persons(FirstName, LastName)
VALUES ('Max', 'Vernon')
, ('Joe', 'Black');
INSERT INTO dbo.Persons2(FirstName, LastName)
VALUES ('Max', 'Vernon')
, ('Joe', 'Black');
Here's the SELECT
query:
SELECT *
FROM dbo.PersonsView pv1
INNER JOIN dbo.PersonsView2 pv2 ON pv1.LastNameInitial = pv2.LastNameInitial;
And the results:
+----------+-----------+----------+-----------------+----------+-----------+----------+-----------------+ | PersonID | FirstName | LastName | LastNameInitial | PersonID | FirstName | LastName | LastNameInitial | +----------+-----------+----------+-----------------+----------+-----------+----------+-----------------+ | 2 | Joe | Black | B | 2 | Joe | Black | B | | 1 | Max | Vernon | V | 1 | Max | Vernon | V | +----------+-----------+----------+-----------------+----------+-----------+----------+-----------------+
The execution plan, with only two rows per table (admittedly not a lot of rows!)
I happen to have a table with 3,423 rows and 195 distinct values in Name
. I'll call this table P
(person) and duplicate it to create P2
(person2). There is a unique, clustered primary key on an integer ID column. I'm using Microsoft SQL Server 2016 (KB3194716) Developer Edition (64-bit) on Windows 10 Pro 6.3 with 32GB RAM.
With the base query
select
p.pid
from dbo.p
inner join dbo.p2
on LEFT(p.name, 1) = LEFT(p2.name, 1);
I get 1.5M rows returned in 3200-3300ms (from statistics io).
By re-writing thus -
select
p.pid
from dbo.p
where exists
(
select 1
from dbo.p2
where LEFT(p.name, 1) = LEFT(p2.name, 1)
);
elapsed reduces to 50-60ms and the plan is:
Fewer rows are returned (3,423) because of the matching algorithm. The same plan and row count is achieved by changing the base query to select distinct
.
By creating indexed, computed column
alter table dbo.p2
add Name1 as Left(Name, 1);
create index ix1 on dbo.p2(Name1);
The elapsed time drops to 45-50ms.