Postgres UNIQUE CONSTRAINT for array

@mu already demonstrated how an index on an expression can solve your problem.

What caught my attention were the functions used. Both seem like overkill for an array of two integers, though the question may be a simplification of the real situation.

Anyway, I was intrigued and ran a test with a couple of variants.

Test setup

-- temporary table with 10000 random pairs of integers
CREATE TEMP TABLE arr (i int[]);

INSERT INTO arr 
SELECT ARRAY[(random() * 1000)::int, (random() * 1000)::int]
FROM   generate_series(1,10000);

Test candidates with a short comment to explain each one:

-- 1) mu's query
CREATE OR REPLACE FUNCTION sort_array1(integer[])  RETURNS int[] AS
$$
    SELECT array_agg(n) FROM (SELECT n FROM unnest($1) AS t(n) ORDER BY n) AS a;
$$ LANGUAGE sql STRICT IMMUTABLE;

-- 2) simplified with ORDER BY inside aggregate (pg 9.0+)
CREATE OR REPLACE FUNCTION sort_array2(int[])  RETURNS int[] AS
$$
SELECT array_agg(n ORDER BY n) FROM unnest($1) AS t(n);
$$ LANGUAGE sql STRICT IMMUTABLE;


-- 3) uralbash's query
CREATE OR REPLACE FUNCTION sort_array3(anyarray)  RETURNS anyarray AS
$$
SELECT ARRAY(
    SELECT $1[i]
    FROM   generate_series(array_lower($1,1), array_upper($1,1)) g(i)
    ORDER  BY 1)
$$ LANGUAGE sql STRICT IMMUTABLE;

-- 4) change parameters to int[]
CREATE OR REPLACE FUNCTION sort_array4(int[])  RETURNS int[] AS
$$
SELECT ARRAY(
    SELECT $1[i]
    FROM   generate_series(array_lower($1,1), array_upper($1,1)) g(i)
    ORDER  BY 1)
$$ LANGUAGE sql STRICT IMMUTABLE;

-- 5) drop array_lower() - the lower bound is 1 for these arrays (the default)
CREATE OR REPLACE FUNCTION sort_array5(int[])  RETURNS int[] AS
$$
SELECT ARRAY(
    SELECT $1[i]
    FROM   generate_series(1, array_upper($1,1)) g(i)
    ORDER  BY 1)
$$ LANGUAGE sql STRICT IMMUTABLE;

-- 6) further simplify to case with 2 elements
CREATE OR REPLACE FUNCTION sort_array6(int[])  RETURNS int[] AS
$$
SELECT ARRAY(
    SELECT i
    FROM  (VALUES ($1[1]),($1[2])) g(i)
    ORDER  BY 1)
$$ LANGUAGE sql STRICT IMMUTABLE;


-- 7) my radically simple query
CREATE OR REPLACE FUNCTION sort_array7(int[])  RETURNS int[] AS
$$
SELECT CASE WHEN $1[1] > $1[2] THEN ARRAY[$1[2], $1[1]] ELSE $1 END;
$$ LANGUAGE sql STRICT IMMUTABLE;

-- 8) without STRICT modifier
CREATE OR REPLACE FUNCTION sort_array8(int[])  RETURNS int[] AS
$$
SELECT CASE WHEN $1[1] > $1[2] THEN ARRAY[$1[2], $1[1]] ELSE $1 END;
$$ LANGUAGE sql IMMUTABLE;
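
Before comparing timings, it may be worth confirming that all candidates agree on the test data. A minimal sketch, assuming the `arr` table and the eight functions above exist in the current session; the count should be 0 if every variant returns the same sorted pair as `sort_array7`:

```sql
-- Count rows where any variant disagrees with sort_array7 (used as reference).
-- The test data contains no NULLs, so plain <> comparison is sufficient here.
SELECT count(*) AS mismatches
FROM   arr
WHERE  sort_array1(i) <> sort_array7(i)
   OR  sort_array2(i) <> sort_array7(i)
   OR  sort_array3(i) <> sort_array7(i)
   OR  sort_array4(i) <> sort_array7(i)
   OR  sort_array5(i) <> sort_array7(i)
   OR  sort_array6(i) <> sort_array7(i)
   OR  sort_array8(i) <> sort_array7(i);
```

Note that variants 6 to 8 are only equivalent to the others for arrays of exactly two elements.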

Results

I executed each around 20 times and took the best result from EXPLAIN ANALYZE.

SELECT sort_array1(i) FROM arr  -- Total runtime: 183 ms
SELECT sort_array2(i) FROM arr  -- Total runtime: 175 ms

SELECT sort_array3(i) FROM arr  -- Total runtime: 183 ms
SELECT sort_array4(i) FROM arr  -- Total runtime: 183 ms
SELECT sort_array5(i) FROM arr  -- Total runtime: 177 ms
SELECT sort_array6(i) FROM arr  -- Total runtime: 144 ms

SELECT sort_array7(i) FROM arr  -- Total runtime: 103 ms
SELECT sort_array8(i) FROM arr  -- Total runtime:  43 ms (!!!)
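
Each figure above comes from a statement of the following shape. This is a sketch, assuming the `arr` table and functions from the test setup; the number to read is the total at the bottom of the plan output, which 9.0 labels `Total runtime` (later releases use a different label):

```sql
-- Run this repeatedly and keep the best (lowest) time to reduce caching noise.
-- Swap in sort_array1 .. sort_array8 to compare the candidates.
EXPLAIN ANALYZE
SELECT sort_array7(i) FROM arr;
```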

These results are from a v9.0.5 server on Debian Squeeze. Results on v8.4 were similar.

I also tested plpgsql variants, which were a bit slower, as expected: too much overhead for a tiny operation, and no query plan to cache.
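
The plpgsql variants are not shown here. For illustration only, a plpgsql counterpart of function 7 might look like the sketch below; the name `sort_array7_plpgsql` is hypothetical and this is not necessarily the exact code that was timed:

```sql
-- Hypothetical plpgsql version of the two-element swap (function 7).
-- Assumes the input array has exactly two elements.
CREATE OR REPLACE FUNCTION sort_array7_plpgsql(int[]) RETURNS int[] AS
$$
BEGIN
    IF $1[1] > $1[2] THEN
        RETURN ARRAY[$1[2], $1[1]];  -- swap so the smaller value comes first
    END IF;
    RETURN $1;                       -- already in order (or contains NULL)
END
$$ LANGUAGE plpgsql STRICT IMMUTABLE;
```

It can be timed the same way as the others, with `SELECT sort_array7_plpgsql(i) FROM arr`.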

The simple function (no. 7) is substantially faster than the others. That was to be expected: the overhead of the other variants is just too much for a tiny array.

What I did not expect is that dropping the STRICT modifier more than doubles the speed. I posted a question about this phenomenon here.

I don't think you can use a function with a unique constraint, but you can with a unique index. So, given a sorting function like this:

create function sort_array(anyarray) returns anyarray as $$
    select array_agg(distinct n order by n) from unnest($1) as t(n);
$$ language sql immutable;

Then you could do this:

create table mytable (
    interface integer[2] 
);
create unique index mytable_uniq on mytable (sort_array(interface));

Then the following happens:

=> insert into mytable (interface) values (array[11,23]);
INSERT 0 1
=> insert into mytable (interface) values (array[11,23]);
ERROR:  duplicate key value violates unique constraint "mytable_uniq"
DETAIL:  Key (sort_array(interface))=({11,23}) already exists.
=> insert into mytable (interface) values (array[23,11]);
ERROR:  duplicate key value violates unique constraint "mytable_uniq"
DETAIL:  Key (sort_array(interface))=({11,23}) already exists.
=> insert into mytable (interface) values (array[42,11]);
INSERT 0 1

Just create a unique index on the two values:

create unique index ix on 
  mytable(least(interface[1], interface[2]), greatest(interface[1], interface[2]));
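
To see this index treat a pair and its reverse as the same key, something like the following can be tried. It is a sketch only: the table `pair_demo` and index `pair_demo_uniq` are hypothetical names chosen so the example does not collide with `mytable` above.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TEMP TABLE pair_demo (interface integer[2]);

-- Index the smaller and the larger element, regardless of stored order.
CREATE UNIQUE INDEX pair_demo_uniq ON pair_demo
  (least(interface[1], interface[2]), greatest(interface[1], interface[2]));

INSERT INTO pair_demo (interface) VALUES (ARRAY[11,23]);  -- accepted
INSERT INTO pair_demo (interface) VALUES (ARRAY[23,11]);  -- expected to be rejected: same pair, reversed
INSERT INTO pair_demo (interface) VALUES (ARRAY[42,11]);  -- accepted: different pair
```

This approach needs no helper function, but it only covers arrays of exactly two elements; the sort-based index handles arrays of any length.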
