When to use SELECT ... FOR UPDATE?
Short answers:
Q1: Yes.
Q2: Doesn't matter which you use.
Long answer:
A select ... for update will (as it implies) select certain rows but also lock them as if they have already been updated by the current transaction (or as if the identity update had been performed). This allows you to update them again in the current transaction and then commit, without another transaction being able to modify these rows in any way.
Another way of looking at it: it is as if the following two statements were executed atomically:
select * from my_table where my_condition;
update my_table set my_column = my_column where my_condition;
Since the rows affected by my_condition are locked, no other transaction can modify them in any way, and hence the transaction isolation level makes no difference here.
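As a rough illustration of the locking described above, here is a two-session sketch. It assumes PostgreSQL or MySQL/InnoDB syntax, reuses the my_table, my_column and my_condition placeholders from the statements above, and assumes my_column is numeric; none of this comes from a real schema.

```sql
-- Session A
START TRANSACTION;
-- Select the rows and lock them, as if they had already been updated.
SELECT * FROM my_table WHERE my_condition FOR UPDATE;
-- Session A can update the locked rows again.
UPDATE my_table SET my_column = my_column + 1 WHERE my_condition;

-- Session B (running concurrently, in its own transaction)
-- This statement waits: the rows are locked by Session A.
UPDATE my_table SET my_column = 0 WHERE my_condition;

-- Session A
COMMIT;  -- releases the locks; Session B's UPDATE can now proceed
```

Session B waits regardless of the isolation level either session has chosen, which is the point made above.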
Note also that transaction isolation level is independent of locking: setting a different isolation level doesn't allow you to get around locking and update rows in a different transaction that are locked by your transaction.
What transaction isolation levels do guarantee (at different levels) is the consistency of data while transactions are in progress.
The only portable way to achieve consistency between rooms and tags, and to make sure rooms are never returned after they have been deleted, is to lock them with SELECT FOR UPDATE.
However, in some systems locking is a side effect of concurrency control, and you achieve the same results without specifying FOR UPDATE explicitly.
To solve this problem, Thread 1 should SELECT id FROM rooms FOR UPDATE, thereby preventing Thread 2 from deleting from rooms until Thread 1 is done. Is that correct?
This depends on the concurrency control your database system is using.
MyISAM in MySQL (and several other old systems) does lock the whole table for the duration of a query.
In SQL Server, SELECT queries place shared locks on the records / pages / tables they have examined, while DML queries place update locks (which later get promoted to exclusive or demoted to shared locks). Exclusive locks are incompatible with shared locks, so either the SELECT or the DELETE query will block until the other session commits.
In databases which use MVCC (like Oracle, PostgreSQL, MySQL with InnoDB), a DML query creates a copy of the record (in one way or another), and generally readers do not block writers and vice versa. For these databases, a SELECT FOR UPDATE would come in handy: it would block either the SELECT or the DELETE query until the other session commits, just as SQL Server does.
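To make the MVCC case concrete, here is a sketch of how the two threads might interleave when Thread 1 uses SELECT FOR UPDATE. The rooms and tags tables and the id column come from the question; the room_id column and the literal id 1 are assumptions made up for this illustration.

```sql
-- Thread 1
START TRANSACTION;
SELECT id FROM rooms FOR UPDATE;        -- locks every row it returns

-- Thread 2 (concurrently)
START TRANSACTION;
DELETE FROM tags WHERE room_id = 1;     -- tags rows are not locked, so this proceeds
DELETE FROM rooms WHERE id = 1;         -- waits: the row is locked by Thread 1

-- Thread 1
SELECT * FROM tags WHERE room_id = 1;   -- still sees the tags: Thread 2 has not committed
COMMIT;                                 -- releases the locks on rooms

-- Thread 2
-- The pending DELETE FROM rooms now completes.
COMMIT;
```

Without FOR UPDATE in Thread 1, an MVCC database would let Thread 2 delete and commit without waiting.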
When should one use REPEATABLE_READ transaction isolation versus READ_COMMITTED with SELECT ... FOR UPDATE?
Generally, REPEATABLE READ does not forbid phantom rows (rows that appeared or disappeared in another transaction, rather than being modified).
In Oracle and earlier PostgreSQL versions, REPEATABLE READ is actually a synonym for SERIALIZABLE. Basically, this means that the transaction does not see changes made after it has started. So in this setup, the last Thread 1 query will return the room as if it has never been deleted (which may or may not be what you wanted). If you don't want to show the rooms after they have been deleted, you should lock the rows with SELECT FOR UPDATE.
In InnoDB, REPEATABLE READ and SERIALIZABLE are different things: readers in SERIALIZABLE mode set next-key locks on the records they evaluate, effectively preventing concurrent DML on them. So you don't need a SELECT FOR UPDATE in SERIALIZABLE mode, but you do need it in REPEATABLE READ or READ COMMITTED.
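The two InnoDB options might look roughly like this. It is a sketch in MySQL syntax, reusing the rooms table from the question; the choice of SET SESSION rather than a per-transaction setting is only for illustration.

```sql
-- Option 1: SERIALIZABLE, where a plain SELECT takes locks by itself
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT id FROM rooms;               -- concurrent DML on these rows must wait
COMMIT;

-- Option 2: REPEATABLE READ (or READ COMMITTED) with an explicit lock
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT id FROM rooms FOR UPDATE;    -- without FOR UPDATE, readers would not block writers
COMMIT;
```

Option 2 is the one that carries over to other database systems, since Option 1 relies on how InnoDB happens to implement SERIALIZABLE.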
Note that the standard on isolation modes does prescribe that you don't see certain quirks in your queries but does not define how (with locking, with MVCC, or otherwise).
When I say "you don't need SELECT FOR UPDATE" I really should have added "because of side effects of a certain database engine's implementation".