How to apply ORDER BY and LIMIT in combination with an aggregate function?

I am applying order by p.name desc in two places ... Is there a way to avoid that?

Yes. Aggregate with an ARRAY constructor in the lateral subquery directly:

SELECT c.identifier, p.persons
FROM   class c
CROSS  JOIN LATERAL (
   SELECT ARRAY (
      SELECT identifier
      FROM   person
      WHERE  class_identifier = c.identifier
      ORDER  BY name DESC
      LIMIT  2
      ) AS persons
   ) p
ORDER  BY c.identifier;

You also don't need GROUP BY in the outer SELECT this way. Shorter, cleaner, faster.

I replaced the LEFT JOIN with a plain CROSS JOIN since the ARRAY constructor always returns exactly 1 row. (Like you pointed out in a comment.)

db<>fiddle here.

Preserve order of array elements after join

Order of rows in subqueries

To address your comment:

I learned that order of rows in a subquery is never guaranteed to be preserved in the outer query.

Well, no. While the SQL standard does not offer any guarantees, there are limited guarantees in Postgres. The manual:

This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7. Alternatively, supplying the input values from a sorted subquery will usually work. For example:
SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
Beware that this approach can fail if the outer query level contains additional processing, such as a join, because that might cause the subquery's output to be reordered before the aggregate is computed.

If all you do in the next level is to aggregate rows, the order is positively guaranteed. Any yes, what we feed to the ARRAY constructor is a subquery, too. That's not the point. It would work with array_agg() as well:

SELECT c.identifier, p.persons
FROM   class c
CROSS  JOIN LATERAL (
   SELECT array_agg(identifier) AS persons
   FROM  (
      SELECT identifier
      FROM   person
      WHERE  class_identifier = c.identifier
      ORDER  BY name DESC
      LIMIT  2
      ) sub
   ) p
ORDER  BY c.identifier;

But I expect the ARRAY constructor to be faster for the case. See:

Why is array_agg() slower than the non-aggregate ARRAY() constructor?
convert right side of join of many to many into array

Here's an alternative, but it is not any better than what you already have:

with enumeration (class_identifier, identifier, name, n) as (
    select  p.class_identifier, p.identifier, p.name
         , row_number() over (partition by p.class_identifier 
                              order by p.name desc)
    from     person as p
)
select c.identifier, array_agg(e.identifier order by e.n) as persons
from class as c
left join  enumeration e
    on c.identifier = e.class_identifier
where e.n <= 2
group by   c.identifier
order by   c.identifier;

How to apply ORDER BY and LIMIT in combination with an aggregate function?

Order of rows in subqueries

Tags:

Postgresql

Subquery

Aggregate

Order By

Related

Recent Posts