MySQL Update query - Will the 'where' condition respected on race condition and row locking? (php, PDO, MySQL, InnoDB)
The where condition will be respected during a race situation, but you must be careful how you check to see who won the race.
Consider the following demonstration of how this works and why you have to be careful.
First, setup some minimal tables.
CREATE TABLE table1 (
`id` TINYINT UNSIGNED NOT NULL PRIMARY KEY,
`locked` TINYINT UNSIGNED NOT NULL,
`updated_by_connection_id` TINYINT UNSIGNED DEFAULT NULL
) ENGINE = InnoDB;
CREATE TABLE table2 (
`id` TINYINT UNSIGNED NOT NULL PRIMARY KEY
) ENGINE = InnoDB;
INSERT INTO table1
(`id`,`locked`)
VALUES
(1,0);
id
plays the role of id
in your table, updated_by_connection_id
acts like assignedPhone
, and locked
like reservationCompleted
.
Now lets start the race test. You should have 2 commandline/terminal windows open, connected to mysql and using the database where you have created these tables.
Connection 1
start transaction;
Connection 2
start transaction;
Connection 1
UPDATE table1
SET locked = 1,
updated_by_connection_id = 1
WHERE id = 1
AND locked = 0;
Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0
Connection 2
UPDATE table1
SET locked = 1,
updated_by_connection_id = 2
WHERE id = 1
AND locked = 0;
Connection 2 is now waiting
Connection 1
SELECT * FROM table1 WHERE id = 1;
+----+--------+--------------------------+ | id | locked | updated_by_connection_id | +----+--------+--------------------------+ | 1 | 1 | 1 | +----+--------+--------------------------+
commit;
At this point, connection 2 is released to continue and outputs the following:
Connection 2
Query OK, 0 rows affected (23.25 sec) Rows matched: 0 Changed: 0 Warnings: 0
SELECT * FROM table1 WHERE id = 1;
+----+--------+--------------------------+ | id | locked | updated_by_connection_id | +----+--------+--------------------------+ | 1 | 1 | 1 | +----+--------+--------------------------+
commit;
Everything looks fine. We see that yes, the WHERE clause was respected in a race situation.
The reason I said you had to be careful though, is because in a real application things are not always this simple. You MAY have other actions going on within the transaction, and that can actually change the results.
Let's reset the database with the following:
delete from table1;
INSERT INTO table1
(`id`,`locked`)
VALUES
(1,0);
And now, consider this situation, where a SELECT is performed before the UPDATE.
Connection 1
start transaction;
SELECT * FROM table2;
Empty set (0.00 sec)
Connection 2
start transaction;
SELECT * FROM table2;
Empty set (0.00 sec)
Connection 1
UPDATE table1
SET locked = 1,
updated_by_connection_id = 1
WHERE id = 1
AND locked = 0;
Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0
Connection 2
UPDATE table1
SET locked = 1,
updated_by_connection_id = 2
WHERE id = 1
AND locked = 0;
Connection 2 is now waiting
Connection 1
SELECT * FROM table1 WHERE id = 1;
+----+--------+--------------------------+ | id | locked | updated_by_connection_id | +----+--------+--------------------------+ | 1 | 1 | 1 | +----+--------+--------------------------+ 1 row in set (0.00 sec)
SELECT * FROM table1 WHERE id = 1 FOR UPDATE;
+----+--------+--------------------------+ | id | locked | updated_by_connection_id | +----+--------+--------------------------+ | 1 | 1 | 1 | +----+--------+--------------------------+ 1 row in set (0.00 sec)
commit;
At this point, connection 2 is released to continue and outputs the following:
Query OK, 0 rows affected (20.47 sec) Rows matched: 0 Changed: 0 Warnings: 0
Ok, let's see who won:
Connection 2
SELECT * FROM table1 WHERE id = 1;
+----+--------+--------------------------+ | id | locked | updated_by_connection_id | +----+--------+--------------------------+ | 1 | 0 | NULL | +----+--------+--------------------------+
Wait, what? Why is locked
0 and updated_by_connection_id
NULL??
This is the being careful I mentioned. The culprit is actually due to the fact that we did a select at the beginning. To get the correct result, we could run the following:
SELECT * FROM table1 WHERE id = 1 FOR UPDATE;
+----+--------+--------------------------+ | id | locked | updated_by_connection_id | +----+--------+--------------------------+ | 1 | 1 | 1 | +----+--------+--------------------------+
commit;
By using SELECT ... FOR UPDATE we can get the right result. This can be very confusing (as it was for me, originally), as a SELECT and a SELECT ... FOR UPDATE are giving two different results.
The reason this happens is because of the default isolation level READ-REPEATABLE
. When the first SELECT is made, right after the start transaction;
, a snapshot is created. All future non-updating reads will be done from that snapshot.
Therefore, if you just naively SELECT after you do the update, it will pull the information from that original snapshot, which is before the row has been updated. By doing a SELECT ... FOR UPDATE you force it to get the correct information.
However, again, in a real application this could be an issue. Say, for example, your request is wrapped in a transaction, and after performing the update you want to output some information. Collecting and output that information may be handled by separate, reusable code, that you DON'T want to litter with FOR UPDATE clauses "just in case." That would lead to lots of frustration due to unnecessary locking.
Instead, you'll want to take a different track. You have many options here.
One, is to make sure you commit the transaction after the UPDATE has completed. In most case, this is probably the best, simplest choice.
Another option is to not try using SELECT to determine the result. Instead, you may be able to read the rows affected, and use that (1 row updated vs 0 rows update) to determine if the UPDATE was a success.
Another option, and one that I use frequently, as I like to keep a single request (like an HTTP request) fully wrapped in a single transaction, is to make sure that the first statement executed in a transaction is either the UPDATE or a SELECT ... FOR UPDATE. That will cause the snapshot to NOT be taken until the connection is allowed to proceed.
Let's reset our test database again and see how this works.
delete from table1;
INSERT INTO table1
(`id`,`locked`)
VALUES
(1,0);
Connection 1
start transaction;
SELECT * FROM table1 WHERE id = 1 FOR UPDATE;
+----+--------+--------------------------+ | id | locked | updated_by_connection_id | +----+--------+--------------------------+ | 1 | 0 | NULL | +----+--------+--------------------------+
Connection 2
start transaction;
SELECT * FROM table1 WHERE id = 1 FOR UPDATE;
Connection 2 is now waiting.
Connection 1
UPDATE table1
SET locked = 1,
updated_by_connection_id = 1
WHERE id = 1
AND locked = 0;
Query OK, 1 row affected (0.01 sec) Rows matched: 1 Changed: 1 Warnings: 0
SELECT * FROM table1 WHERE id = 1;
+----+--------+--------------------------+ | id | locked | updated_by_connection_id | +----+--------+--------------------------+ | 1 | 1 | 1 | +----+--------+--------------------------+
SELECT * FROM table1 WHERE id = 1 FOR UPDATE;
+----+--------+--------------------------+ | id | locked | updated_by_connection_id | +----+--------+--------------------------+ | 1 | 1 | 1 | +----+--------+--------------------------+
commit;
Connection 2 is now released.
Connection 2
+----+--------+--------------------------+
| id | locked | updated_by_connection_id |
+----+--------+--------------------------+
| 1 | 1 | 1 |
+----+--------+--------------------------+
Here you could actually have your server side code check the results of this SELECT and know it is accurate, and not even continue with the next steps. But, for completeness, I'll finish as before.
UPDATE table1
SET locked = 1,
updated_by_connection_id = 2
WHERE id = 1
AND locked = 0;
Query OK, 0 rows affected (0.00 sec) Rows matched: 0 Changed: 0 Warnings: 0
SELECT * FROM table1 WHERE id = 1;
+----+--------+--------------------------+ | id | locked | updated_by_connection_id | +----+--------+--------------------------+ | 1 | 1 | 1 | +----+--------+--------------------------+
SELECT * FROM table1 WHERE id = 1 FOR UPDATE;
+----+--------+--------------------------+ | id | locked | updated_by_connection_id | +----+--------+--------------------------+ | 1 | 1 | 1 | +----+--------+--------------------------+
commit;
Now you can see that in Connection 2 the SELECT and SELECT ... FOR UPDATE give the same result. This is because the snapshot that the SELECT reads from was not created until after Connection 1 had been committed.
So, back to your original question: Yes, the WHERE clause is checked by the UPDATE statement, in all cases. However, you have to be careful with any SELECTs you may be doing, to avoid incorrectly determining the result of that UPDATE.
(Yes another option is to change the transaction isolation level. However, I don't really have experience with that and any gotchya's that might exist, so I'm not going to go into it.)
The answer is: it will validate the WHERE
condition before updating the data.
Well, I have to say that this is a very interesting question. I've never thought about such a question before, and it enforces me to have a better understanding of how it works inside MySQL. Thank you!
How I get the answer:
I did my test for this situation at first. I know it should work like this even before I did my test, but I just didn't understand why.
Why:
Finally, I found something useful in the Index Condition Pushdown section.
This is how it works inside MySQL:
MySQL Server
↑ ↑
↓ ↓
Storage Engine(InnoDB here)
- MySQL Server parses SQL statements coming from external applications.
- MySQL Server tells InnoDB to retrieve rows for it.
- InnoDB locates rows(Index, Locking), then return them to MySQL Server.
- MySQL Server evaluates
WHERE
conditions for rows. - Some other things ...
As you can see, locking occurs inside InnoDB, and MySQL Server evaluates the WHERE
condition after obtaining the rows. For your situation, the row(id = 5) is locked by the first UPDATE
, and the second UPDATE
gets stuck when fetching the same row. And the evaluation for the second UPDATE
's WHERE
condition occurs after obtaining the lock for the row.
What's more, if you have created an index on id
, Index Condition Pushdown will take place in your query.