SQLAlchemy update multiple rows in one transaction
As an extension to @Martijn's excellent answer i've done it in a way that allows you to increment rather than set the value:
# List of items to increment [id, increment value]
increments = [
[1,1],
[2,5],
[3,8]
]
# Extract ids
ids = [v[0] for v in increments]
# Create payload
payload = {
v[0]:Table.col2 + v[1] for v in increments
}
# Get the product
query(Table) \
.filter(Products.col1.in_(ids)) \
.update({
Table.col2: case(
payload,
value=Table.col1
)
})
Yes, updating a larger number of rows with a single bulk UPDATE
statement will be a lot faster than using individual UPDATE
s on each and every object. An IN
filter would only help you limit what rows are updated, but you still need to tell the database what value to use for the col2
updates.
You can use a CASE ... WHEN ... THEN
construct for that, with the case()
function:
from sqlalchemy.sql import case
query(MyTable).filter(
MyTable.col1.in_(payload)
).update({
MyTable.col2: case(
payload,
value=MyTable.col1,
)
}, synchronize_session=False)
The above a) selects rows where the col1
value is a key in the payload
dictionary, then b) updates the col2
column value using a CASE
statement that picks values from that same dictionary to update that column based on matching col1
against the keys.
With payload
set to {'x': 'y', 'a': 'b', 'c': 'd'}
, the above executes the following query (give or take the exact order of WHEN
clauses and values in the IN
test):
UPDATE mytable
SET
col2=CASE mytable.col1
WHEN 'x' THEN 'y'
WHEN 'a' THEN 'b'
WHEN 'c' THEN 'd'
END
WHERE
mytable.col1 IN ('x', 'a', 'c')
I set synchronize_session
to False
there, as updating all possible cached MyTable
instances at once is perhaps not the best idea when updating a large number of rows. Your other options are 'evaluate'
and 'fetch'
.
We can't use the default
'evaluate'
(which would find existing objects in the session that match thewhere
clause, to update in-place), because SQLAlchemy currently doesn't know how to process anIN
filter (you get anUnevaluatableError
exception).If you do use
'fetch'
then all instances ofMyTable
cached in the session that were affected are updated with new values forcol2
(as mapped by their primary key).
Note that a commit would expire the session anyway, so you'd only want to use 'fetch'
if you need to do some more work with the updated rows before you can commit the current transaction.
See the Query.update()
documentation for more information on what synchronize_session
options you have.
Another solution that I found before the chosen answer which also works fast would be:
# payload = {'x': 'y', 'a': 'b', 'c': 'd'}
all_rows = query(MyTable).filter(
MyTable.col1.in_(payload)
)
for row in all_rows:
row.col2=payload[row.col1]
This would do aSELECT
though, which will add a couple of seconds, but is helpful to someone who finds it more convenient having the full row, also giving more flexibility.