"ORDER BY ... USING" clause in PostgreSQL
Nice answers, but they didn't mentioned one real valuable case for ´USING´.
When you have created an index with non default operators family, for example varchar_pattern_ops
( ~>~
, ~<~
, ~>=~
, ... ) instead of <
, >
, >=
then if you search based on index and you want to use index in order by clause you need to specify USING
with the appropriate operator.
This can be illustrated with such example:
CREATE INDEX index_words_word ON words(word text_pattern_ops);
Lets compare this two queries:
SELECT * FROM words WHERE word LIKE 'o%' LIMIT 10;
and
SELECT * FROM words WHERE word LIKE 'o%' ORDER BY word LIMIT 10;
The difference between their executions is nearly 100 times in a 500K words DB! And also results may not be correct within non-C locale.
How this could happend?
When you making search with LIKE
and ORDER BY
clause, you actually make this call:
SELECT * FROM words WHERE word ~>=~ 'o' AND word ~<~'p' ORDER BY word USING < LIMIT 10;
Your index created with ~<~
operator in mind, so PG cannot use given index in a given ORDER BY
clause. To get things done right query must be rewritten to this form:
SELECT * FROM words WHERE word ~>=~ 'o' AND word ~<~'p' ORDER BY word USING ~<~ LIMIT 10;
or
SELECT * FROM words WHERE word LIKE 'o%' ORDER BY word USING ~<~ LIMIT 10;
A very simple example would be:
> SELECT * FROM tab ORDER BY col USING <
But this is boring, because this is nothing you can't get with the traditional ORDER BY col ASC
.
Also the standard catalog doesn't mention anything exciting about strange comparison functions/operators. You can get a list of them:
> SELECT amoplefttype::regtype, amoprighttype::regtype, amopopr::regoper
FROM pg_am JOIN pg_amop ON pg_am.oid = pg_amop.amopmethod
WHERE amname = 'btree' AND amopstrategy IN (1,5);
You will notice, that there are mostly <
and >
functions for primitive types like integer
, date
etc and some more for arrays and vectors and so on. None of these operators will help you to get a custom ordering.
In most cases where custom ordering is required you can get away using something like ... ORDER BY somefunc(tablecolumn) ...
where somefunc
maps the values appropriately. Because that works with every database this is also the most common way. For simple things you can even write an expression instead of a custom function.
Switching gears up
ORDER BY ... USING
makes sense in several cases:
- The ordering is so uncommon, that the
somefunc
trick doesn't work. - You work with a non-primitive type (like
point
,circle
or imaginary numbers) and you don't want to repeat yourself in your queries with strange calculations. - The dataset you want to sort is so large, that support by an index is desired or even required.
I will focus on the complex datatypes: often there is more than one way to sort them in a reasonable way. A good example is point
: You can "order" them by the distance to (0,0), or by x first, then by y or just by y or anything else you want.
Of course, PostgreSQL has predefined operators for point
:
> CREATE TABLE p ( p point );
> SELECT p <-> point(0,0) FROM p;
But none of them is declared usable for ORDER BY
by default (see above):
> SELECT * FROM p ORDER BY p;
ERROR: could not identify an ordering operator for type point
TIP: Use an explicit ordering operator or modify the query.
Simple operators for point
are the "below" and "above" operators <^
and >^
. They compare simply the y
part of the point. But:
> SELECT * FROM p ORDER BY p USING >^;
ERROR: operator > is not a valid ordering operator
TIP: Ordering operators must be "<" or ">" members of __btree__ operator families.
ORDER BY USING
requires an operator with defined semantics: Obviously it must be a binary operator, it must accept the same type as arguments and it must return boolean. I think it must also be transitive (if a < b and b < c then a < c). There may be more requirements. But all these requirements are also necessary for proper btree-index ordering. This explains the strange error messages containing the reference to btree.
ORDER BY USING
also requires not just one operator to be defined but an operator class and an operator family. While one could implement sorting with only one operator, PostgreSQL tries to sort efficiently and minimize comparisons. Therefore, several operators are used even when you specify only one - the others must adhere to certain mathematical constraints - I've already mentioned transitivity, but there are more.
Switching Gears up
Let's define something suitable: An operator for points which compares only the y
part.
The first step is to create a custom operator family which can be used by the btree index access method. see
> CREATE OPERATOR FAMILY xyzfam USING btree; -- superuser access required!
CREATE OPERATOR FAMILY
Next we must provide a comparator function which returns -1, 0, +1 when comparing two points. This function WILL be called internally!
> CREATE FUNCTION xyz_v_cmp(p1 point, p2 point) RETURNS int
AS $$BEGIN RETURN btfloat8cmp(p1[1],p2[1]); END $$ LANGUAGE plpgsql;
CREATE FUNCTION
Next we define the operator class for the family. See the manual for an explanation of the numbers.
> CREATE OPERATOR CLASS xyz_ops FOR TYPE point USING btree FAMILY xyzfam AS
OPERATOR 1 <^ ,
OPERATOR 3 ?- ,
OPERATOR 5 >^ ,
FUNCTION 1 xyz_v_cmp(point, point) ;
CREATE OPERATOR CLASS
This step combines several operators and functions and also defines their relationship and meaning. For example OPERATOR 1
means: This is the operator for less-than
tests.
Now the operators <^
and >^
can be used in ORDER BY USING
:
> INSERT INTO p SELECT point(floor(random()*100), floor(random()*100)) FROM generate_series(1, 5);
INSERT 0 5
> SELECT * FROM p ORDER BY p USING >^;
p
---------
(17,8)
(74,57)
(59,65)
(0,87)
(58,91)
Voila - sorted by y.
To sum it up: ORDER BY ... USING
is an interesting look under the hood of PostgreSQL. But nothing you will require anytime soon unless you work in very specific areas of database technology.
Another example can be found in the Postgres docs. with source code for the example here and here. This example also shows how to create the operators.
Samples:
CREATE TABLE test
(
id serial NOT NULL,
"number" integer,
CONSTRAINT test_pkey PRIMARY KEY (id)
)
insert into test("number") values (1),(2),(3),(0),(-1);
select * from test order by number USING > //gives 3=>2=>1=>0=>-1
select * from test order by number USING < //gives -1=>0=>1=>2=>3
So, it is equivalent to desc
and asc
. But you may use your own operator, that's the essential feature of USING