Does it make sense to have CASE .. END in an ORDER BY?
General answer
First, I want to address ambiguity in the premise:
It is common to use parameters in the WHERE part of queries, and to have some computed columns in the SELECT part, but not that common to parameterize the ORDER BY clause.
Computed columns in the SELECT
part are hardly ever relevant for the query plan or performance. But "in the WHERE
part" is ambiguous.
It's common to parameterize values in the WHERE
clause, which works for prepared statements. (And PL/pgSQL works with prepared statements internally.) A generic query plan often makes sense regardless of provided values. That is, unless tables have a very uneven data distribution, but since Postgres 9.2 PL/pgSQL re-plans queries a couple of times to test whether the generic plan seems good enough:
- PostgreSQL Stored Procedure Performance
But it's not as common to parameterize whole predicates (including identifiers) in the WHERE
clause, which is impossible with prepared statements to begin with. You need dynamic SQL with EXECUTE
, or you assemble the query strings in the client.
Dynamic ORDER BY
expressions are somewhere in between both. You can do it with a CASE
expression, but that's very hard to optimize in general. Postgres might use indexes with a plain ORDER BY
, but not with CASE
expressions hiding the eventual sort order. The planner is smart, but not an AI. Depending on the rest of the query (ORDER BY
may be relevant for the plan or not - it's relevant in your example), you may end up with a sub-optimal query plan all the time.
Plus, you add the minor cost of the CASE
expression(s). And in your particular example also for multiple useless ORDER BY
columns.
Typically, dynamic SQL with EXECUTE
is faster or much faster for this.
Maintainability should not be a problem if you maintain a clear and readable code format in your function body.
Fix demo function
The function in the question is broken. The return type is defined to return an anonymous record:
RETURNS record AS
But the query actually returns a set of records, it would have to be:
RETURNS SETOF record AS
But that's still unhelpful. You'd have to provide a column definition list with every call. Your query returns columns of well-known type. Declare the return type accordingly! I am guessing here, use actual data types of returned columns / expressions:
RETURNS TABLE (make text, model text, year int, euro_price int, colour text) AS
I use the same column names for convenience. Columns in the RETURNS TABLE
clause are effectively OUT
parameters, visible in every SQL statement in the body (but not inside EXECUTE
). So table-qualify columns in queries in the function body to avoid possible naming conflicts. The demo function would work like this:
CREATE or REPLACE FUNCTION get_car_list (
_colour text,
_min_price numeric,
_max_price numeric,
_sorting_criterium car_sorting_criteria)
RETURNS TABLE (make text, model text, year int, euro_price numeric, colour text) AS
$func$
SELECT c.make, c.model, c.year, c.euro_price, c.colour
FROM cars c
WHERE c.euro_price BETWEEN _min_price AND _max_price
AND c.colour = _colour
ORDER BY CASE WHEN _sorting_criterium = 'colour' THEN c.colour END
, CASE WHEN _sorting_criterium = 'price' THEN c.euro_price END;
$func$ LANGUAGE sql;
Do not confuse the RETURNS
key word in the function declaration with the plpgsql RETURN
command like Evan did in his answer. Details:
- Difference between return next and return record
General difficulty of the example query
Predicate on some columns (even worse: range predicates), other columns in ORDER BY
, that's already hard to optimize. But you mentioned in a comment:
Actual result sets can be in the order of several 1.000 rows (and thus, sorted and paginated in smaller chunks server-side)
So you'll add LIMIT
and OFFSET
to these queries, returning the n "best" matches first. Or some smarter pagination technique:
- Improve performance for order by with columns from many tables
You need a matching index to make this fast. I don't see how this could possibly work with CASE
expressions in ORDER BY
.
Consider:
- Optimize a query with small LIMIT, predicate on one column and order by another
- Can spatial index help a "range - order by - limit" query
Three points that I would raise,
- This is a very basic query, even in your applied version. Create a
VIEW
for it. Have your users customize theWHERE
conditions using theVIEW
. Functions are black boxes to the query planner. It's horrible to use them inside of other functions, onlySQL
is inlined. And, dynamic functions do not get cached plans. - If you want to continue to use plpgsql, use
RETURNS QUERY
(orRETURNS QUERY EXECUTE
) notRETURNS SETOF
. There is no reason to useRETURNS SETOF
with a sort. It has to be buffered anyway, afaik. You will encounter problems with either one if your resultset is greater thanwork_mem
. - I'm not sure what you're app is written in. I assume web. I was in the automotive industry for a decade though, made lots of things like Craigslist and posting tools for Craigslist. "Don't sort stuff for users in the database" is generally a good rule of thumb. There is no reason for it. Drop that stuff out in JSON, and let them handle it in the browser. Unless you're showing more than 1000 rows, it's not even worth thinking about. Just consider the round trip time from a cell phone tower. You'll never get that time back wondering about this problem.
Moving forward, I'd even consider wrapping a service like PostgREST which handles thoroughly arbitrary ordering,
If you care where nulls are sorted, add nullsfirst or nullslast:
GET /people?order=age.nullsfirst
GET /people?order=age.desc.nullslast