Concurrent transactions result in race condition with unique constraint on insert
Try the insert first, with on conflict ... do nothing and returning id. If the value already exists, you will get no result from this statement, so you then have to execute a select to get the ID.
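In plain SQL that looks something like this (a sketch only; I am assuming a table articles with columns id, title, content and a unique constraint on title, and the values are just examples):

INSERT INTO articles (title, content)
VALUES ('my new blog', 'some content')   -- example values
ON CONFLICT (title) DO NOTHING           -- no error, no update on conflict
RETURNING id;                            -- returns the id only for a fresh insert

-- Only if the INSERT above returned no row:
SELECT id
FROM   articles
WHERE  title = 'my new blog';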
If two transactions try to do this at the same time, one of them will block on the insert (because the database does not yet know whether the other transaction will commit or roll back), and will continue only after the other transaction has finished.
The root of the problem is that, with the default READ COMMITTED isolation level, each concurrent UPSERT (or any query, for that matter) can only see rows that were visible at the start of the query. The manual:
When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions.
But a UNIQUE index is absolute and still has to consider concurrently entered rows - even rows that are not yet visible. So you can get an exception for a unique violation, but you still cannot see the conflicting row within the same query. The manual:
INSERT with an ON CONFLICT DO NOTHING clause may have insertion not proceed for a row due to the outcome of another transaction whose effects are not visible to the INSERT snapshot. Again, this is only the case in Read Committed mode.
The brute-force "solution" to this problem is to overwrite conflicting rows with ON CONFLICT ... DO UPDATE. The new row version is then visible within the same query. But there are several side effects and I would advise against it. One of them is that UPDATE triggers get fired - possibly the very thing you want to avoid. Closely related answer on SO:
- How to use RETURNING with ON CONFLICT in PostgreSQL?
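For completeness, a minimal sketch of that brute-force variant (same assumed articles table, using the articleid column name from the function further down; the no-op assignment only serves to produce a new row version so RETURNING can see it, but it still fires UPDATE triggers and writes a new row version even when nothing changes):

INSERT INTO articles AS a (title, content)
VALUES ('my new blog', 'some content')
ON CONFLICT (title) DO UPDATE
SET    title = EXCLUDED.title   -- no-op update, only to make the row visible
RETURNING a.articleid;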
The remaining option is to start a new command (in the same transaction), which can then see the conflicting rows from the previous query. Both existing answers suggest as much. The manual again:
However, SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed. Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.
But you want more:
-- Continue, using articleId to represent the article for next operations...
If concurrent write operations might change or delete the row, you also have to lock the selected row to be absolutely sure. (The inserted row is locked anyway.)
And since your transactions seem to be highly contentious, loop until one of the two steps succeeds. Wrapped into a plpgsql function:
CREATE OR REPLACE FUNCTION f_articleid(_title text, _content text, OUT _articleid int) AS
$func$
BEGIN
LOOP
   SELECT articleid
   FROM   articles
   WHERE  title = _title
   FOR    UPDATE        -- or maybe a weaker lock
   INTO   _articleid;

   EXIT WHEN FOUND;

   INSERT INTO articles AS a (title, content)
   VALUES (_title, _content)
   ON     CONFLICT (title) DO NOTHING  -- (new?) _content is discarded
   RETURNING a.articleid
   INTO   _articleid;

   EXIT WHEN FOUND;
END LOOP;
END
$func$ LANGUAGE plpgsql;
Detailed explanation:
- Is SELECT or INSERT in a function prone to race conditions?
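A call could then look like this (title and content values are just examples):

SELECT f_articleid('my new blog', 'some content') AS articleid;
-- Continue, using articleid to represent the article for next operations ...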
I think the best solution is to just do the insert, and catch the error and handle it properly. If you are prepared to handle errors, serializable isolation level is (apparently) unnecessary for your case. If you are not prepared to handle errors, serializable isolation level won't help--it will just create even more errors you aren't prepared to handle.
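A sketch of that insert-and-catch variant as a plpgsql function (the function name is made up; unique_violation is the standard condition name, and I am assuming the same articles table as above):

CREATE OR REPLACE FUNCTION f_articleid_catch(_title text, _content text, OUT _articleid int) AS
$func$
BEGIN
   INSERT INTO articles (title, content)
   VALUES (_title, _content)
   RETURNING articleid
   INTO _articleid;
EXCEPTION WHEN unique_violation THEN
   SELECT articleid
   FROM   articles
   WHERE  title = _title
   INTO   _articleid;
   -- Note: this can still find nothing if the conflicting row was deleted concurrently,
   -- in which case you need a retry loop as discussed below.
END
$func$ LANGUAGE plpgsql;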
Another option would be to do the ON CONFLICT DO NOTHING and then, if nothing happens, follow up with the query you are already doing to get the must-be-there-now value. In other words, move select id into articleId from articles where title = 'my new blog'; from a pre-emptive step to a step only executed if ON CONFLICT DO NOTHING does in fact do nothing. If it is possible for a record to be inserted and then deleted again, then you should do this in a retry loop.