How group with min(date) and select an another column in the same table
I infer that your data looks like this:
Person Table
╔══════════╦═══════╦════════╗
║ PersonID ║ Name ║ Gender ║
╠══════════╬═══════╬════════╣
║ 1 ║ John ║ M ║
║ 2 ║ Vicky ║ F ║
║ 3 ║ Bob ║ M ║
╚══════════╩═══════╩════════╝
Job Table
╔══════════╦═════════════╦════════════╗
║ PersonID ║ JobName ║ HireDate ║
╠══════════╬═════════════╬════════════╣
║ 1 ║ Electrician ║ 2010-01-01 ║
║ 1 ║ Mechanic ║ 2012-05-08 ║
║ 2 ║ Scientific ║ 2012-11-11 ║
╚══════════╩═════════════╩════════════╝
The first task is to find the first job (by hire date) for each person. One neat way to do that is by using a correlated subquery:
SELECT j.*
FROM dbo.Job AS j
WHERE
j.HireDate =
(
SELECT MIN(j2.HireDate)
FROM dbo.Job AS j2
WHERE j2.PersonID = j.PersonID
);
Notice the correlation WHERE j2.PersonID = j.PersonID
between the inner and outer queries there. The output of that query is:
╔══════════╦═════════════╦════════════╗
║ PersonID ║ JobName ║ HireDate ║
╠══════════╬═════════════╬════════════╣
║ 1 ║ Electrician ║ 2010-01-01 ║
║ 2 ║ Scientific ║ 2012-11-11 ║
╚══════════╩═════════════╩════════════╝
The execution plan (given a clustered PRIMARY KEY
on PersonID, HireDate
) is:
The interesting thing about that plan is the Job table is only scanned once, despite there being two references to it in the original query. The plan uses an optimization that I call Segment Top. Essentially the execution engine takes advantage of the index order to detect the start of a new group (segment) and take just the first row from each group (top).
Now that we have that result, all we need do is join it back to the Person table:
SELECT
p.PersonName,
p.Gender,
j.JobName
FROM dbo.Person AS p
LEFT JOIN
(
-- Previous query
SELECT j.*
FROM dbo.Job AS j
WHERE
j.HireDate =
(
SELECT MIN(j2.HireDate)
FROM dbo.Job AS j2
WHERE j2.PersonID = j.PersonID
)
) AS j ON
j.PersonID = p.PersonID
OPTION (MERGE JOIN);
The execution plan is:
The OPTION (MERGE JOIN)
is not required; I just added it to show the plan you are likely to get when the tables contain a larger number of rows than in this small example.
Table definitions and sample data:
CREATE TABLE dbo.Person
(
PersonID integer NOT NULL,
PersonName varchar(30) NOT NULL,
Gender char(1) NOT NULL,
PRIMARY KEY (PersonID)
);
CREATE TABLE dbo.Job
(
PersonID integer NOT NULL,
JobName varchar(30) NOT NULL,
HireDate datetime NOT NULL,
PRIMARY KEY (PersonID, HireDate)
);
INSERT dbo.Person
(PersonID, PersonName, Gender)
SELECT 1, 'John', 'M' UNION ALL
SELECT 2, 'Vicky', 'F' UNION ALL
SELECT 3, 'Bob', 'M';
INSERT dbo.Job
(PersonID, JobName, HireDate)
SELECT 1, 'Mechanic', '20120508' UNION ALL
SELECT 1, 'Electrician', '20100101' UNION ALL
SELECT 2, 'Scientific', '20121111';