Google BQ - how to upsert existing data in tables?

BigQuery is by design append-only preferred. That means that you better let duplicate rows from the same entity in the table and write your queries to always read most recent row.

Updating rows as you know in transactional tables possible with limitations. Your project can make up to 1,500 table operations per table per day. That's very limited and their purpose is totally different. 1 operation can touch multiple rows, but still 1500 operation per table per day. So if you want individual updates to rows, that's not working out as it limits to 1500 rows per day.

Since BQ is used as data lake, you should just stream new rows every time the user eg: updates their profile. You will end up having from 20 saves 20 rows for the same user. Later you can rematerilize your table to have unique rows by removing duplicate data.

See the most question for the later: BigQuery - DELETE statement to remove duplicates


BigQuery now supports MERGE, which can combine both an INSERT and UPDATE in one atomic operation i.e. UPSERT.

Using Mikhail's example tables, it would look like:

MERGE merge_example.table_data T
USING merge_example.table_changes S
ON T.id = S.id
WHEN MATCHED THEN
  UPDATE SET value = s.value
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES(id, value)

enter image description here

See here.


BigQuery does not support UPSERT directly, but if you really need it - you can use UPDATE and INSERT one after another to achieve the same. See below simplified example

Assume you have two tables as below - one that holds your data (yourproject.yourdadtaset.table_data) and another (yourproject.yourdadtaset.table_changes) that contains your changes that you want to apply to first table

table_data
enter image description here

table_changes
enter image description here

Now below queries run one after another do the trick:

Update Query:

#standardSQL
UPDATE `yourproject.yourdadtaset.table_data` t
SET t.value = s.value
FROM `yourproject.yourdadtaset.table_changes` s
WHERE t.id = s.id

result will be
enter image description here

And now - INSERT Query

#standardSQL
INSERT `yourproject.yourdadtaset.table_data` (id, value)
SELECT id, value
FROM `yourproject.yourdadtaset.table_changes`
WHERE NOT id IN (SELECT id FROM `yourproject.yourdadtaset.table_data`)

with result as (and we are done here)
enter image description here

Hope above example simple and clear, so you can apply it in your case