Google BQ - how to upsert existing data in tables?
BigQuery is by design append-only preferred. That means that you better let duplicate rows from the same entity in the table and write your queries to always read most recent row.
Updating rows as you know in transactional tables possible with limitations. Your project can make up to 1,500 table operations per table per day. That's very limited and their purpose is totally different. 1 operation can touch multiple rows, but still 1500 operation per table per day. So if you want individual updates to rows, that's not working out as it limits to 1500 rows per day.
Since BQ is used as data lake, you should just stream new rows every time the user eg: updates their profile. You will end up having from 20 saves 20 rows for the same user. Later you can rematerilize your table to have unique rows by removing duplicate data.
See the most question for the later: BigQuery - DELETE statement to remove duplicates
BigQuery now supports MERGE
, which can combine both an INSERT
and UPDATE
in one atomic operation i.e. UPSERT
.
Using Mikhail's example tables, it would look like:
MERGE merge_example.table_data T
USING merge_example.table_changes S
ON T.id = S.id
WHEN MATCHED THEN
UPDATE SET value = s.value
WHEN NOT MATCHED THEN
INSERT (id, value) VALUES(id, value)
See here.
BigQuery does not support UPSERT
directly, but if you really need it - you can use UPDATE
and INSERT
one after another to achieve the same. See below simplified example
Assume you have two tables as below - one that holds your data (yourproject.yourdadtaset.table_data
) and another (yourproject.yourdadtaset.table_changes
) that contains your changes that you want to apply to first table
table_data
table_changes
Now below queries run one after another do the trick:
Update Query:
#standardSQL
UPDATE `yourproject.yourdadtaset.table_data` t
SET t.value = s.value
FROM `yourproject.yourdadtaset.table_changes` s
WHERE t.id = s.id
result will be
And now - INSERT Query
#standardSQL
INSERT `yourproject.yourdadtaset.table_data` (id, value)
SELECT id, value
FROM `yourproject.yourdadtaset.table_changes`
WHERE NOT id IN (SELECT id FROM `yourproject.yourdadtaset.table_data`)
with result as (and we are done here)
Hope above example simple and clear, so you can apply it in your case