PostgreSQL MAX and GROUP BY
select *
from (
select id,
year,
thing,
max(thing) over (partition by id) as max_thing
from the_table
) t
where thing = max_thing
or:
select t1.id,
t1.year,
t1.thing
from the_table t1
where t1.thing = (select max(t2.thing)
from the_table t2
where t2.id = t1.id);
or
select t1.id,
t1.year,
t1.thing
from the_table t1
join (
select id, max(t2.thing) as max_thing
from the_table t2
group by id
) t on t.id = t1.id and t.max_thing = t1.thing
or (same as the previous with a different notation)
with max_stuff as (
select id, max(t2.thing) as max_thing
from the_table t2
group by id
)
select t1.id,
t1.year,
t1.thing
from the_table t1
join max_stuff t2
on t1.id = t2.id
and t1.thing = t2.max_thing
The shortest (and possibly fastest) query would be with DISTINCT ON
, a PostgreSQL extension of the SQL standard DISTINCT
clause:
SELECT DISTINCT ON (1)
id, count, year
FROM tbl
ORDER BY 1, 2 DESC, 3;
The numbers refer to ordinal positions in the SELECT
list. You can spell out column names for clarity:
SELECT DISTINCT ON (id)
id, count, year
FROM tbl
ORDER BY id, count DESC, year;
The result is ordered by id
etc. which may or may not be welcome. It's better than "undefined" in any case.
It also breaks ties (when multiple years share the same maximum count) in a well defined way: pick the earliest year. If you don't care, drop year
from the ORDER BY
. Or pick the latest year with year DESC
.
For many rows per id
, other query techniques are (much) faster. See:
- Select first row in each GROUP BY group?
- Optimize GROUP BY query to retrieve latest row per user