Concurrent transactions result in race condition with unique constraint on insert
Try the insert first, with on conflict ... do nothing and returning id. If the value already exists, you will get no result from this statement, so you then have to execute a select to get the ID.
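In plain SQL that looks something like this (a sketch only; I am assuming a table articles with columns id, title, content and a unique constraint on title, and the values are just examples):

INSERT INTO articles (title, content)
VALUES ('my new blog', 'some content')   -- example values
ON CONFLICT (title) DO NOTHING           -- no error, no update on conflict
RETURNING id;                            -- returns the id only for a fresh insert

-- Only if the INSERT above returned no row:
SELECT id
FROM   articles
WHERE  title = 'my new blog';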
If two transactions try to do this at the same time, one of them will block on the insert (because the database does not yet know whether the other transaction will commit or roll back), and will continue only after the other transaction has finished.
The root of the problem is that, with the default READ COMMITTED isolation level, each concurrent UPSERT (or any query, for that matter) can only see rows that were visible at the start of the query. The manual:
When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions.
But a UNIQUE index is absolute and still has to consider concurrently entered rows - even rows that are not yet visible. So you can get an exception for a unique violation, but you still cannot see the conflicting row within the same query. The manual:
INSERT with an ON CONFLICT DO NOTHING clause may have insertion not proceed for a row due to the outcome of another transaction whose effects are not visible to the INSERT snapshot. Again, this is only the case in Read Committed mode.
The brute-force "solution" to this problem is to overwrite conflicting rows with ON CONFLICT ... DO UPDATE. The new row version is then visible within the same query. But there are several side effects and I would advise against it. One of them is that UPDATE triggers get fired - possibly the very thing you want to avoid. Closely related answer on SO:
- How to use RETURNING with ON CONFLICT in PostgreSQL?
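For completeness, a minimal sketch of that brute-force variant (same assumed articles table, using the articleid column name from the function further down; the no-op assignment only serves to produce a new row version so RETURNING can see it, but it still fires UPDATE triggers and writes a new row version even when nothing changes):

INSERT INTO articles AS a (title, content)
VALUES ('my new blog', 'some content')
ON CONFLICT (title) DO UPDATE
SET    title = EXCLUDED.title   -- no-op update, only to make the row visible
RETURNING a.articleid;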
The remaining option is to start a new command (in the same transaction), which can then see the conflicting rows from the previous query. Both existing answers suggest as much. The manual again:
However, SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed. Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.
But you want more:
-- Continue, using articleId to represent the article for next operations...
If concurrent write operations might change or delete the row, you also have to lock the selected row to be absolutely sure. (The inserted row is locked anyway.)
And since your transactions seem to be highly contentious, loop until one of the two steps succeeds. Wrapped into a plpgsql function:
CREATE OR REPLACE FUNCTION f_articleid(_title text, _content text, OUT _articleid int) AS
$func$
BEGIN
LOOP
   SELECT articleid
   FROM   articles
   WHERE  title = _title
   FOR    UPDATE        -- or maybe a weaker lock
   INTO   _articleid;

   EXIT WHEN FOUND;

   INSERT INTO articles AS a (title, content)
   VALUES (_title, _content)
   ON     CONFLICT (title) DO NOTHING  -- (new?) _content is discarded
   RETURNING a.articleid
   INTO   _articleid;

   EXIT WHEN FOUND;
END LOOP;
END
$func$ LANGUAGE plpgsql;
Detailed explanation:
- Is SELECT or INSERT in a function prone to race conditions?
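A call could then look like this (title and content values are just examples):

SELECT f_articleid('my new blog', 'some content') AS articleid;
-- Continue, using articleid to represent the article for next operations ...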
I think the best solution is to just do the insert, and catch the error and handle it properly. If you are prepared to handle errors, serializable isolation level is (apparently) unnecessary for your case. If you are not prepared to handle errors, serializable isolation level won't help--it will just create even more errors you aren't prepared to handle.
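A sketch of that insert-and-catch variant as a plpgsql function (the function name is made up; unique_violation is the standard condition name, and I am assuming the same articles table as above):

CREATE OR REPLACE FUNCTION f_articleid_catch(_title text, _content text, OUT _articleid int) AS
$func$
BEGIN
   INSERT INTO articles (title, content)
   VALUES (_title, _content)
   RETURNING articleid
   INTO _articleid;
EXCEPTION WHEN unique_violation THEN
   SELECT articleid
   FROM   articles
   WHERE  title = _title
   INTO   _articleid;
   -- Note: this can still find nothing if the conflicting row was deleted concurrently,
   -- in which case you need a retry loop as discussed below.
END
$func$ LANGUAGE plpgsql;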
Another option would be to do the ON CONFLICT DO NOTHING and then, if nothing happens, follow up with the query you are already doing to get the must-be-there-now value. In other words, move select id into articleId from articles where title = 'my new blog'; from a pre-emptive step to a step only executed if ON CONFLICT DO NOTHING does in fact do nothing. If it is possible for a record to be inserted and then deleted again, then you should do this in a retry loop.