SQL join: selecting the last records in a one-to-many relationship
You could also try doing this using a sub select
SELECT c.*, p.*
FROM customer c INNER JOIN
(
SELECT customer_id,
MAX(date) MaxDate
FROM purchase
GROUP BY customer_id
) MaxDates ON c.id = MaxDates.customer_id INNER JOIN
purchase p ON MaxDates.customer_id = p.customer_id
AND MaxDates.MaxDate = p.date
The select should join on all customers and their Last purchase date.
Another approach would be to use a NOT EXISTS
condition in your join condition to test for later purchases:
SELECT *
FROM customer c
LEFT JOIN purchase p ON (
c.id = p.customer_id
AND NOT EXISTS (
SELECT 1 FROM purchase p1
WHERE p1.customer_id = c.id
AND p1.id > p.id
)
)
This is an example of the greatest-n-per-group
problem that has appeared regularly on StackOverflow.
Here's how I usually recommend solving it:
SELECT c.*, p1.*
FROM customer c
JOIN purchase p1 ON (c.id = p1.customer_id)
LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND
(p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id)))
WHERE p2.id IS NULL;
Explanation: given a row p1
, there should be no row p2
with the same customer and a later date (or in the case of ties, a later id
). When we find that to be true, then p1
is the most recent purchase for that customer.
Regarding indexes, I'd create a compound index in purchase
over the columns (customer_id
, date
, id
). That may allow the outer join to be done using a covering index. Be sure to test on your platform, because optimization is implementation-dependent. Use the features of your RDBMS to analyze the optimization plan. E.g. EXPLAIN
on MySQL.
Some people use subqueries instead of the solution I show above, but I find my solution makes it easier to resolve ties.