Does it make sense to have CASE .. END in an ORDER BY?

General answer

First, I want to address ambiguity in the premise:

It is common to use parameters in the WHERE part of queries, and to have some computed columns in the SELECT part, but not that common to parameterize the ORDER BY clause.

Computed columns in the SELECT part are hardly ever relevant for the query plan or performance. But "in the WHERE part" is ambiguous.

It's common to parameterize values in the WHERE clause, which works for prepared statements. (And PL/pgSQL works with prepared statements internally.) A generic query plan often makes sense regardless of provided values. That is, unless tables have a very uneven data distribution, but since Postgres 9.2 PL/pgSQL re-plans queries a couple of times to test whether the generic plan seems good enough:

PostgreSQL Stored Procedure Performance

But it's not as common to parameterize whole predicates (including identifiers) in the WHERE clause, which is impossible with prepared statements to begin with. You need dynamic SQL with EXECUTE, or you assemble the query strings in the client.

Dynamic ORDER BY expressions are somewhere in between both. You can do it with a CASE expression, but that's very hard to optimize in general. Postgres might use indexes with a plain ORDER BY, but not with CASE expressions hiding the eventual sort order. The planner is smart, but not an AI. Depending on the rest of the query (ORDER BY may be relevant for the plan or not - it's relevant in your example), you may end up with a sub-optimal query plan all the time.
Plus, you add the minor cost of the CASE expression(s). And in your particular example also for multiple useless ORDER BY columns.

Typically, dynamic SQL with EXECUTE is faster or much faster for this.

Maintainability should not be a problem if you maintain a clear and readable code format in your function body.

Fix demo function

The function in the question is broken. The return type is defined to return an anonymous record:

RETURNS record AS

But the query actually returns a set of records, it would have to be:

RETURNS SETOF record AS

But that's still unhelpful. You'd have to provide a column definition list with every call. Your query returns columns of well-known type. Declare the return type accordingly! I am guessing here, use actual data types of returned columns / expressions:

RETURNS TABLE (make text, model text, year int, euro_price int, colour text) AS

I use the same column names for convenience. Columns in the RETURNS TABLE clause are effectively OUT parameters, visible in every SQL statement in the body (but not inside EXECUTE). So table-qualify columns in queries in the function body to avoid possible naming conflicts. The demo function would work like this:

CREATE or REPLACE FUNCTION get_car_list (
    _colour            text, 
    _min_price         numeric, 
    _max_price         numeric, 
    _sorting_criterium car_sorting_criteria) 
  RETURNS TABLE (make text, model text, year int, euro_price numeric, colour text) AS
$func$
      SELECT c.make, c.model, c.year, c.euro_price, c.colour
      FROM   cars c
      WHERE  c.euro_price BETWEEN _min_price AND _max_price
      AND    c.colour = _colour
      ORDER  BY CASE WHEN _sorting_criterium = 'colour' THEN c.colour     END
              , CASE WHEN _sorting_criterium = 'price'  THEN c.euro_price END;
$func$  LANGUAGE sql;

Do not confuse the RETURNS key word in the function declaration with the plpgsql RETURN command like Evan did in his answer. Details:

Difference between return next and return record

General difficulty of the example query

Predicate on some columns (even worse: range predicates), other columns in ORDER BY, that's already hard to optimize. But you mentioned in a comment:

Actual result sets can be in the order of several 1.000 rows (and thus, sorted and paginated in smaller chunks server-side)

So you'll add LIMIT and OFFSET to these queries, returning the n "best" matches first. Or some smarter pagination technique:

Improve performance for order by with columns from many tables

You need a matching index to make this fast. I don't see how this could possibly work with CASE expressions in ORDER BY.

Consider:

Optimize a query with small LIMIT, predicate on one column and order by another
Can spatial index help a "range - order by - limit" query

Three points that I would raise,

This is a very basic query, even in your applied version. Create a VIEW for it. Have your users customize the WHERE conditions using the VIEW. Functions are black boxes to the query planner. It's horrible to use them inside of other functions, only SQL is inlined. And, dynamic functions do not get cached plans.
If you want to continue to use plpgsql, use RETURNS QUERY (or RETURNS QUERY EXECUTE) not RETURNS SETOF. There is no reason to use RETURNS SETOF with a sort. It has to be buffered anyway, afaik. You will encounter problems with either one if your resultset is greater than work_mem.
I'm not sure what you're app is written in. I assume web. I was in the automotive industry for a decade though, made lots of things like Craigslist and posting tools for Craigslist. "Don't sort stuff for users in the database" is generally a good rule of thumb. There is no reason for it. Drop that stuff out in JSON, and let them handle it in the browser. Unless you're showing more than 1000 rows, it's not even worth thinking about. Just consider the round trip time from a cell phone tower. You'll never get that time back wondering about this problem.

Moving forward, I'd even consider wrapping a service like PostgREST which handles thoroughly arbitrary ordering,

If you care where nulls are sorted, add nullsfirst or nullslast:

GET /people?order=age.nullsfirst
GET /people?order=age.desc.nullslast

Does it make sense to have CASE .. END in an ORDER BY?

General answer

Fix demo function

General difficulty of the example query

Tags:

Postgresql

Dynamic Sql

Postgresql 9.6

Functions

Order By

Related

Recent Posts