Multicolumn indexes and OR clause
For your use case, you actually need two different indexes:
CREATE INDEX ON messages(recipient_id);
CREATE INDEX ON messages(sender_id);
Your multi-column indexed would be used by the recipient_id = ??
part of the query, but it would not be used by the other (it can be used by other databases that know how to do Skip Scanning, such as Oracle, although it might not be extremely efficient).
Depending on how your PRIMARY KEY
is for the message(s)
table, you might not need one of them (the implicity index associated with the PK will do the job).
You can then either use your query as is
, and let PostgreSQL do a BitmapOr, or convert the query to a UNION
and avoid the OR
. Timings should be about the same in both cases. If you are sure that there are not messages where both the sender_id
and the recipient_id
are the same (i.e.: nobody sends a message to him/herself, or if s/he does, you don't mind showing it twice), then a UNION ALL
(which is slightly quicker) could be used instead with the same result.
OR
conditions tend not to be handled too well by most databases. This specific case, which is rather common and rather simple, is handled well by PostgreSQL.
Experimental check
Table definitions (simplified)
CREATE TABLE users
(
user_id integer PRIMARY KEY,
user_name text
) ;
CREATE TABLE messages
(
sender_id integer NOT NULL REFERENCES users(user_id)
ON UPDATE CASCADE ON DELETE RESTRICT,
recipient_id integer NOT NULL REFERENCES users(user_id)
ON UPDATE CASCADE ON DELETE RESTRICT,
send_date timestamp NOT NULL DEFAULT now(),
message_text text,
PRIMARY KEY(sender_id, recipient_id, send_date)
) ;
CREATE INDEX ON messages (recipient_id) ;
-- Following index not needed, given our primary key
-- CREATE INDEX ON messages (sender_id) ;
We populate the tables with some simulated data:
-- Create 1000 users
INSERT INTO users (user_id, user_name)
SELECT
user_id, 'user' || user_id AS user_name
FROM
generate_series(1, 1000) AS x(user_id) ;
-- Create (aprox.) 100000 messages
INSERT INTO messages (sender_id, recipient_id, send_date)
SELECT
(random()*999+1)::integer AS sender_id,
(random()*999+1)::integer AS recipient_id,
send_date
FROM
generate_series(timestamp '2017-01-01', timestamp '2017-01-31', interval '25 seconds') AS x(send_date);
ANALYZE;
And, at this point, we check which are the execution plans that we get:
-- Checking the query with OR
EXPLAIN ANALYZE
SELECT *
FROM messages
WHERE recipient_id = 123 OR sender_id = 123
ORDER BY send_date ;
gives ...
QUERY PLAN
1 Sort (cost=420.21..420.71 rows=200 width=48) (actual time=0.430..0.445 rows=192 loops=1)
2 Sort Key: send_date
3 Sort Method: quicksort Memory: 34kB
4 -> Bitmap Heap Scan on messages (cost=10.31..412.56 rows=200 width=48) (actual time=0.092..0.389 rows=192 loops=1)
5 Recheck Cond: ((recipient_id = 123) OR (sender_id = 123))
6 Heap Blocks: exact=157
7 -> BitmapOr (cost=10.31..10.31 rows=200 width=0) (actual time=0.061..0.061 rows=0 loops=1)
8 -> Bitmap Index Scan on messages_recipient_id_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.033..0.033 rows=97 loops=1)
9 Index Cond: (recipient_id = 123)
10 -> Bitmap Index Scan on messages_pkey (cost=0.00..5.17 rows=100 width=0) (actual time=0.025..0.025 rows=95 loops=1)
11 Index Cond: (sender_id = 123)
12 Planning time: 0.379 ms
13 Execution time: 0.527 ms
And using UNION
:
-- Not using OR, but UNION-ing
EXPLAIN ANALYZE
SELECT *
FROM messages
WHERE recipient_id = 123
UNION
SELECT *
FROM messages
WHERE sender_id = 123
ORDER BY send_date ;
you get the following query plan:
QUERY PLAN
1 Sort (cost=538.87..539.37 rows=200 width=48) (actual time=0.387..0.399 rows=192 loops=1)
2 Sort Key: messages.send_date
3 Sort Method: quicksort Memory: 34kB
4 -> HashAggregate (cost=529.22..531.22 rows=200 width=48) (actual time=0.268..0.317 rows=192 loops=1)
5 Group Key: messages.sender_id, messages.recipient_id, messages.send_date, messages.message_text
6 -> Append (cost=5.07..527.22 rows=200 width=48) (actual time=0.038..0.192 rows=192 loops=1)
7 -> Bitmap Heap Scan on messages (cost=5.07..262.55 rows=100 width=48) (actual time=0.038..0.094 rows=97 loops=1)
8 Recheck Cond: (recipient_id = 123)
9 Heap Blocks: exact=89
10 -> Bitmap Index Scan on messages_recipient_id_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.022..0.022 rows=97 loops=1)
11 Index Cond: (recipient_id = 123)
12 -> Bitmap Heap Scan on messages messages_1 (cost=5.19..262.67 rows=100 width=48) (actual time=0.033..0.085 rows=95 loops=1)
13 Recheck Cond: (sender_id = 123)
14 Heap Blocks: exact=86
15 -> Bitmap Index Scan on messages_pkey (cost=0.00..5.17 rows=100 width=0) (actual time=0.021..0.021 rows=95 loops=1)
16 Index Cond: (sender_id = 123)
17 Planning time: 0.178 ms
18 Execution time: 0.521 ms
For the sake of completeness: UNION ALL
:
QUERY PLAN
1 Sort (cost=537.11..537.62 rows=201 width=48) (actual time=0.213..0.229 rows=160 loops=1)
2 Sort Key: messages.send_date
3 Sort Method: quicksort Memory: 32kB
4 -> Append (cost=5.08..529.42 rows=201 width=48) (actual time=0.034..0.173 rows=160 loops=1)
5 -> Bitmap Heap Scan on messages (cost=5.08..264.74 rows=101 width=48) (actual time=0.034..0.086 rows=82 loops=1)
6 Recheck Cond: (recipient_id = 123)
7 Heap Blocks: exact=77
8 -> Bitmap Index Scan on messages_recipient_id_idx (cost=0.00..5.05 rows=101 width=0) (actual time=0.022..0.022 rows=82 loops=1)
9 Index Cond: (recipient_id = 123)
10 -> Bitmap Heap Scan on messages messages_1 (cost=5.19..262.67 rows=100 width=48) (actual time=0.030..0.075 rows=78 loops=1)
11 Recheck Cond: (sender_id = 123)
12 Heap Blocks: exact=72
13 -> Bitmap Index Scan on messages_pkey (cost=0.00..5.17 rows=100 width=0) (actual time=0.020..0.020 rows=78 loops=1)
14 Index Cond: (sender_id = 123)
15 Planning time: 0.222 ms
16 Execution time: 0.280 ms
You can see that, in both cases, the execution plan is nearly the same for the two firsts approaches, and better for the UNION ALL.
You can check all this at Rextester.
Your original setup
If you do exactly the same but don't have the CREATE INDEX ON messages (recipient_id) ;
statement, what you would get is:
QUERY PLAN
1 Sort (cost=2123.86..2124.36 rows=200 width=48) (actual time=27.305..27.320 rows=227 loops=1)
2 Sort Key: send_date
3 Sort Method: quicksort Memory: 35kB
4 -> Seq Scan on messages (cost=0.00..2116.22 rows=200 width=48) (actual time=0.108..27.097 rows=227 loops=1)
5 Filter: ((recipient_id = 123) OR (sender_id = 123))
6 Rows Removed by Filter: 103454
7 Planning time: 0.474 ms
8 Execution time: 27.392 ms
QUERY PLAN
1 Sort (cost=2135.60..2136.10 rows=201 width=48) (actual time=15.716..15.803 rows=227 loops=1)
2 Sort Key: messages.send_date
3 Sort Method: quicksort Memory: 35kB
4 -> HashAggregate (cost=2125.90..2127.91 rows=201 width=48) (actual time=15.571..15.621 rows=227 loops=1)
5 Group Key: messages.sender_id, messages.recipient_id, messages.send_date, messages.message_text
6 -> Append (cost=0.00..2123.89 rows=201 width=48) (actual time=0.061..15.371 rows=227 loops=1)
7 -> Seq Scan on messages (cost=0.00..1857.01 rows=100 width=48) (actual time=0.060..15.092 rows=117 loops=1)
8 Filter: (recipient_id = 123)
9 Rows Removed by Filter: 103564
10 -> Bitmap Heap Scan on messages messages_1 (cost=5.20..264.87 rows=101 width=48) (actual time=0.086..0.248 rows=110 loops=1)
11 Recheck Cond: (sender_id = 123)
12 Heap Blocks: exact=101
13 -> Bitmap Index Scan on messages_pkey (cost=0.00..5.18 rows=101 width=0) (actual time=0.066..0.066 rows=110 loops=1)
14 Index Cond: (sender_id = 123)
15 Planning time: 0.333 ms
16 Execution time: 16.006 ms
QUERY PLAN
1 Sort (cost=2131.58..2132.08 rows=201 width=48) (actual time=14.847..14.865 rows=227 loops=1)
2 Sort Key: messages.send_date
3 Sort Method: quicksort Memory: 35kB
4 -> Append (cost=0.00..2123.89 rows=201 width=48) (actual time=0.076..14.731 rows=227 loops=1)
5 -> Seq Scan on messages (cost=0.00..1857.01 rows=100 width=48) (actual time=0.076..14.497 rows=117 loops=1)
6 Filter: (recipient_id = 123)
7 Rows Removed by Filter: 103564
8 -> Bitmap Heap Scan on messages messages_1 (cost=5.20..264.87 rows=101 width=48) (actual time=0.082..0.209 rows=110 loops=1)
9 Recheck Cond: (sender_id = 123)
10 Heap Blocks: exact=101
11 -> Bitmap Index Scan on messages_pkey (cost=0.00..5.18 rows=101 width=0) (actual time=0.060..0.060 rows=110 loops=1)
12 Index Cond: (sender_id = 123)
13 Planning time: 0.284 ms
14 Execution time: 14.931 ms
... which are much worse query plans, because you need some sequential scans. You can check this approach also at this Rextester.