PostgreSQL - insert rows based on select from another table, and update an FK in that table with the newly inserted rows
There are several ways to solve the problem.
1. temporarily add a column
As others mentioned, the straight-forward way is to temporarily add a column reminder_id
to the dateset
. Populate it with original IDs
from reminder
table. Use it to join reminder
with the dateset
table. Drop the temporary column.
2. when start is unique
If values of the start
column is unique it is possible to do it without extra column by joining reminder
table with the dateset
table on the start
column.
INSERT INTO dateset (start)
SELECT start FROM reminder;
WITH
CTE_Joined
AS
(
SELECT
reminder.id AS reminder_id
,reminder.dateset_id AS old_dateset_id
,dateset.id AS new_dateset_id
FROM
reminder
INNER JOIN dateset ON dateset.start = reminder.start
)
UPDATE CTE_Joined
SET old_dateset_id = new_dateset_id
;
3. when start is not unique
It is possible to do it without temporary column even in this case. The main idea is the following. Let's have a look at this example:
We have two rows in reminder
with the same start
value and IDs 3 and 7:
reminder
id start dateset_id
3 2015-01-01 NULL
7 2015-01-01 NULL
After we insert them into the dateset
, there will be new IDs generated, for example, 1 and 2:
dateset
id start
1 2015-01-01
2 2015-01-01
It doesn't really matter how we link these two rows. The end result could be
reminder
id start dateset_id
3 2015-01-01 1
7 2015-01-01 2
or
reminder
id start dateset_id
3 2015-01-01 2
7 2015-01-01 1
Both of these variants are correct. Which brings us to the following solution.
Simply insert all rows first.
INSERT INTO dateset (start)
SELECT start FROM reminder;
Match/join two tables on start
column knowing that it is not unique. "Make it" unique by adding ROW_NUMBER
and joining by two columns. It is possible to make the query shorter, but I spelled out each step explicitly:
WITH
CTE_reminder_rn
AS
(
SELECT
id
,start
,dateset_id
,ROW_NUMBER() OVER (PARTITION BY start ORDER BY id) AS rn
FROM reminder
)
,CTE_dateset_rn
AS
(
SELECT
id
,start
,ROW_NUMBER() OVER (PARTITION BY start ORDER BY id) AS rn
FROM dateset
)
,CTE_Joined
AS
(
SELECT
CTE_reminder_rn.id AS reminder_id
,CTE_reminder_rn.dateset_id AS old_dateset_id
,CTE_dateset_rn.id AS new_dateset_id
FROM
CTE_reminder_rn
INNER JOIN CTE_dateset_rn ON
CTE_dateset_rn.start = CTE_reminder_rn.start AND
CTE_dateset_rn.rn = CTE_reminder_rn.rn
)
UPDATE CTE_Joined
SET old_dateset_id = new_dateset_id
;
I hope it is clear from the code what it does, especially when you compare it to the simpler version without ROW_NUMBER
. Obviously, the complex solution will work even if start
is unique, but it is not as efficient, as a simple solution.
This solution assumes that dateset
is empty before this process.
Update based on changes in Postgres:
Using INSERT RETURNING in subqueries is, according to the documentation, supported, for Postgres versions 9.1 and after. The hypothetical DML subquery in the original answer should work for Postgres >= 9.1:
UPDATE reminder SET dateset_id = (
INSERT INTO dateset (start)
VALUES (reminder.start)
RETURNING dateset.id));
Original answer:
Here's another way of doing it, distinct from the 3 ways Vladimir suggested so far.
A temporary function will let you read the id of the new rows created as well as other values in the query:
--minimal demonstration schema
CREATE TABLE dateset (
id SERIAL PRIMARY KEY,
start TIMESTAMP
-- other things here...
);
CREATE TABLE reminder (
id SERIAL PRIMARY KEY,
start TIMESTAMP,
dateset_id INTEGER REFERENCES dateset(id)
-- other things here...
);
--pre-migration data
INSERT INTO reminder (start) VALUES ('2014-02-14'), ('2014-09-06'), ('1984-01-01'), ('2014-02-14');
--all at once
BEGIN;
CREATE FUNCTION insertreturning(ts TIMESTAMP) RETURNS INTEGER AS $$
INSERT INTO dateset (start)
VALUES (ts)
RETURNING dateset.id;
$$ LANGUAGE SQL;
UPDATE reminder SET dateset_id = insertreturning(reminder.start);
DROP FUNCTION insertreturning(TIMESTAMP);
ALTER TABLE reminder DROP COLUMN start;
END;
This approach to the problem suggested itself after I realized that writing INSERT ... RETURNING
as a subquery would solve the issue; although INSERT
s are not allowed as subqueries, calls to functions certainly are.
Intriguingly, this suggests that DML subqueries that return values might be broadly useful. If they were possible, we would just write:
UPDATE reminder SET dateset_id = (
INSERT INTO dateset (start)
VALUES (reminder.start)
RETURNING dateset.id));
You can only return columns using RETURNING from the INSERT-part, not from the selected table. So, if you are willing to add a column reminder_id to your dateset-table,
ALTER TABLE dateset ADD COLUMN reminder_id integer;
the following statement will work:
WITH inserted_datesets AS (
INSERT INTO dateset (start, reminder_id)
SELECT start, id FROM reminder
RETURNING reminder_id, id AS dateset_id
)
UPDATE reminder
SET dateset_id = ids.dateset_id
FROM inserted_datesets AS ids
WHERE id = reminder_id
Only if the values of the column start in reminders are all unique, the following 2 statements will work as well:
INSERT INTO dateset(start) SELECT start FROM reminder;
UPDATE reminder SET dateset_id = (SELECT id FROM dateset WHERE start=reminder.start);