SELECT DISTINCT ON, ordered by another column
This is a classic greatest-n-per-group
problem. They frequently arise in a whole host of areas and, like Analytic functions
(see below) are well worth studying.
Nowadays, it is typically solved by using Analytic (aka Window) functions - see the fiddle here
.
You can use this query -
WITH cte AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS rn,
col1, col2, col3
FROM test
ORDER BY col3 DESC
)
SELECT * FROM cte
WHERE rn = 1
Result -
rn col1 col2 col3
1 2 xyz 2015-09-13
1 1 abc 2015-09-11
1 3 tcs 2015-01-18
Analytic functions are well worth getting to know - they are very powerful and you will find that they will repay you many times for any effort you put into learning them. Run the inner query on its own - experiment, it's how I learnt. BTW, it's always worth tagging your question with the version of PostgreSQL that you are using!
A more traditional method of doing this would be
SELECT x, y, mc FROM
(
SELECT col1 AS x, col2 AS y, MAX(col3) AS mc
FROM test
GROUP BY col1, col2
) AS tab
ORDER BY mc
Same result - also on the fiddle.
You can still use DISTINCT ON
. Just wrap it into an outer query to sort to your needs. See:
- Get distinct on one column, order by another
- PostgreSQL DISTINCT ON with different ORDER BY
SELECT *
FROM (
SELECT DISTINCT ON (col1)
col1, col2, col3
FROM test
ORDER BY col1, col3 DESC
) sub
ORDER BY col3 DESC, col2;
Assuming that col2
functionally depends on col1
, so we can ignore it in DISTINCT ON
and ORDER BY
of the inner query. But I added it to the outer ORDER BY
as meaningful tiebreaker. If col2
not unique without col1
, you might append col1
additionally.
Assuming col3
is defined NOT NULL
. Else append NULLS LAST
:
- PostgreSQL sort by datetime asc, null first?
With only few rows per (col1)
, this is typically the fastest solution. See:
- Select first row in each GROUP BY group?
db<>fiddle here
A subquery with the window function row_number()
(like
Vérace suggested) is a valid alternative, but typically slower. I have done many tests, but try yourself. It has to sort twice, just like DISTINCT ON
(which may switch to a hashing algorithm internally if that's expected to be faster), but it keeps all rows after the inner query, adding needless cost. Either way, you don't need ORDER BY
in the inner query:
SELECT col1, col2, col3
FROM (
SELECT col1, col2, col3
, row_number() OVER (PARTITION BY col1 ORDER BY col3 DESC) AS rn
FROM test
) sub
WHERE rn = 1
ORDER BY col3 DESC, col2;
And don't use a CTE if you don't need it. It's typically considerably more expensive (up until Postgres 12, where this was fixed, mostly).
For many rows per col1
, indexing becomes much more important, and there are typically much faster alternatives. See:
- Optimize GROUP BY query to retrieve latest row per user
Aside, unlike Oracle or SQL Server, PostgreSQL does not use the term "analytic functions" for window functions. (What's "analytic" about those functions?)