SQL Multiple Updates vs single Update performance
The single UPDATE is faster.
I tested this with MySQL 5.1.73:
CREATE TABLE test_random (
val char(40) NOT NULL default '',
num int NOT NULL default '0',
KEY val (val)
) ENGINE=MyISAM;
INSERT INTO test_random (val, num) VALUES
('MXZJBXUGNFOZMMQMYZEMLKZZKTCIGEU',889),
('ZTEBMDHOJGYBYEOPZIIPPJQQOKXMTKU',351),
... [200K records total inserted] ...
('ADLDYZQHDEZMYBHORKGJYMIOVUETQCM',786);
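The sample rows above look like 31-character uppercase strings paired with a small integer. A minimal Python sketch for generating similar test data (the helpers `random_row` and `insert_statement` are hypothetical, not part of the original test):

```python
import random
import string

def random_row():
    """Generate one (val, num) pair shaped like the sample data:
    a 31-character uppercase string and a small random integer."""
    val = "".join(random.choice(string.ascii_uppercase) for _ in range(31))
    num = random.randint(0, 999)
    return val, num

def insert_statement(n):
    """Build a single multi-row INSERT for n random rows."""
    rows = [random_row() for _ in range(n)]
    values = ",\n".join("('%s',%d)" % row for row in rows)
    return "INSERT INTO test_random (val, num) VALUES\n%s;" % values

print(insert_statement(3))
```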
Then here is random-update-single.sql:
UPDATE test_random SET num=1 WHERE val IN (
'PXTUKCZMRFZDTWUPULAPENPNQCSPFQJ',
'GDIMLSCDRSNCMUNUZLQIDFZSEELNZLR',
... [100K records] ...
'ADLDYZQHDEZMYBHORKGJYMIOVUETQCM');
And here is random-update-multiple.sql:
UPDATE test_random SET num=2 WHERE val='PXTUKCZMRFZDTWUPULAPENPNQCSPFQJ';
UPDATE test_random SET num=2 WHERE val='GDIMLSCDRSNCMUNUZLQIDFZSEELNZLR';
... [100K records] ...
UPDATE test_random SET num=2 WHERE val='ADLDYZQHDEZMYBHORKGJYMIOVUETQCM';
Here is the result:
> time mysql -uroot test < random-update-single.sql
0.075u 0.009s 0:01.78 3.9% 0+0k 0+0io 0pf+0w
> time mysql -uroot test < random-update-single.sql
0.074u 0.009s 0:01.76 3.9% 0+0k 0+0io 0pf+0w
> time mysql -uroot test < random-update-single.sql
0.069u 0.013s 0:01.57 4.4% 0+0k 0+0io 0pf+0w
> time mysql -uroot test < random-update-multiple.sql
1.746u 1.515s 0:11.14 29.1% 0+0k 0+0io 0pf+0w
> time mysql -uroot test < random-update-multiple.sql
2.183u 2.150s 0:14.83 29.1% 0+0k 0+0io 0pf+0w
> time mysql -uroot test < random-update-multiple.sql
1.961u 1.949s 0:13.96 27.9% 0+0k 0+0io 0pf+0w
That is, the multiple-UPDATE approach turned out to be roughly eight times slower than the single UPDATE (about 1.7 s vs 13–15 s wall-clock).
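The same comparison can be reproduced in miniature with SQLite from Python. This is a stand-in for MySQL, so the absolute numbers will differ, but the shape of the test (one `IN (...)` statement vs thousands of single-row statements) is the same:

```python
import random
import sqlite3
import string
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_random (val TEXT NOT NULL, num INTEGER NOT NULL)")
conn.execute("CREATE INDEX idx_val ON test_random (val)")

# Populate with random rows shaped like the original test data.
vals = ["".join(random.choices(string.ascii_uppercase, k=31)) for _ in range(20000)]
conn.executemany("INSERT INTO test_random VALUES (?, 0)", [(v,) for v in vals])
targets = random.sample(vals, 10000)

# Approach 1: one UPDATE with an IN (...) list of literals.
t0 = time.perf_counter()
in_list = ",".join("'%s'" % v for v in targets)  # values are A-Z only, safe to inline
conn.execute("UPDATE test_random SET num=1 WHERE val IN (%s)" % in_list)
single = time.perf_counter() - t0

# Approach 2: one UPDATE statement per value.
t0 = time.perf_counter()
for v in targets:
    conn.execute("UPDATE test_random SET num=2 WHERE val=?", (v,))
multiple = time.perf_counter() - t0

print("single: %.3fs  multiple: %.3fs" % (single, multiple))
```

Both approaches end in the same final state; only the cost differs, which is exactly the "what vs how" point made below.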
SQL is supposed to be a declarative language: it does not require the user to say "how" to get the result, only "what" the desired result is. So in principle I would use the IN() construct, as this is the most concise (from a logical viewpoint) way to ask for the results, and let the DBMS (any DBMS!) decide what's best.
The single UPDATE will typically be faster, since the whole thing is normally done in a single commit. In my experience 10,000 rows shouldn't be a problem, but depending on your platform, generating one huge SQL statement like that might not be the best approach.
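A large part of the per-statement cost is the per-statement commit. If you do have to issue many individual UPDATEs, wrapping them in one explicit transaction usually recovers most of the difference. A sketch using SQLite from Python (the table and row names are made up for illustration; any DBMS with BEGIN/COMMIT behaves similarly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val TEXT PRIMARY KEY, num INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, 0)",
                 [("row%d" % i,) for i in range(1000)])
conn.commit()

# Many UPDATE statements, but only one commit: all of them run
# inside a single transaction instead of paying a flush per statement.
with conn:  # opens a transaction, commits on clean exit
    for i in range(1000):
        conn.execute("UPDATE t SET num=? WHERE val=?", (1, "row%d" % i))
```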
What determines whether a server is going live? Which platform are you on? Could you put the values in a table, pass it to a procedure as a parameter (SQL Server table-valued parameters), and use it directly in the UPDATE query?
Ultimately, you can only tell by looking at the execution plan and benchmarking it yourself.
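On the execution-plan point: every DBMS exposes some form of EXPLAIN. A quick check with SQLite from Python (MySQL's `EXPLAIN` output looks different, and the index name `idx_val` here is made up, but it answers the same question — is the `IN (...)` lookup using the index?):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_random (val TEXT NOT NULL, num INTEGER NOT NULL)")
conn.execute("CREATE INDEX idx_val ON test_random (val)")

# Ask the planner how it would execute the IN (...) update.
plan = conn.execute(
    "EXPLAIN QUERY PLAN UPDATE test_random SET num=1 WHERE val IN ('AAA','BBB')"
).fetchall()
for row in plan:
    print(row)
```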