How do I insert a row which contains a foreign key?
Your syntax is almost good, needs some parenthesis around the subqueries and it will work:
INSERT INTO bar (description, foo_id) VALUES
( 'testing', (SELECT id from foo WHERE type='blue') ),
( 'another row', (SELECT id from foo WHERE type='red' ) );
Tested at DB-Fiddle
Another way, with shorter syntax if you have a lot of values to insert:
WITH ins (description, type) AS
( VALUES
( 'more testing', 'blue') ,
( 'yet another row', 'green' )
)
INSERT INTO bar
(description, foo_id)
SELECT
ins.description, foo.id
FROM
foo JOIN ins
ON ins.type = foo.type ;
Plain INSERT
INSERT INTO bar (description, foo_id)
SELECT val.description, f.id
FROM (
VALUES
(text 'testing', text 'blue') -- explicit type declaration; see below
, ('another row' , 'red' )
, ('new row1' , 'purple') -- purple does not exist in foo, yet
, ('new row2' , 'purple')
) val (description, type)
LEFT JOIN foo f USING (type);
LEFT [OUTER] JOIN
instead of[INNER] JOIN
means that rows fromval
all rows are kept, even when no match is found infoo
. Instead,NULL
is entered forfoo_id
(which raises an exception if the column is definedNOT NULL
).The
VALUES
expression in the subquery does the same as @ypercube's CTE. Common Table Expressions offer additional features and are easier to read in big queries, but they also pose as optimization barriers (up to Postgres 12). Subqueries are typically a bit faster when none of the above is needed.You may need explicit type casts. Since the
VALUES
expression is not directly attached to a table (like inINSERT ... VALUES ...
), types cannot be derived and default data types are used unless typed explicitly. This may not work in all cases. It's enough to do it in the first row, the rest falls in line.
INSERT
missing FK rows at the same time
To create missing entries in foo
on the fly, in a single SQL statement, CTEs are instrumental:
WITH sel AS (
SELECT val.description, val.type, f.id AS foo_id
FROM (
VALUES
(text 'testing', text 'blue')
, ('another row', 'red' )
, ('new row1' , 'purple')
, ('new row2' , 'purple')
) val (description, type)
LEFT JOIN foo f USING (type)
)
, ins AS (
INSERT INTO foo (type)
SELECT DISTINCT type FROM sel WHERE foo_id IS NULL
RETURNING id AS foo_id, type
)
INSERT INTO bar (description, foo_id)
SELECT sel.description, COALESCE(sel.foo_id, ins.foo_id)
FROM sel
LEFT JOIN ins USING (type);
Old sqlfiddle for Postgres 9.6 - works the same in 9.1. Also see new fiddle below!
Note the two additional rows to insert. Both are purple, which does not exist in foo
, yet. Two rows to illustrate the need for DISTINCT
in the first INSERT
statement.
Step-by-step explanation
The 1st CTE
sel
provides multiple rows of input data. The subqueryval
with theVALUES
expression can be replaced with a table or subquery as source. ImmediatelyLEFT JOIN
tofoo
to append thefoo_id
for pre-existingtype
rows. All other rows getfoo_id IS NULL
this way.The 2nd CTE
ins
inserts distinct new types (foo_id IS NULL
) intofoo
, and returns the newly generatedfoo_id
- together with thetype
to join back to insert rows.The final outer
INSERT
can now insert afoo_id
for every row: either the type pre-existed, or it was inserted in step 2.
Strictly speaking, both inserts happen "in parallel", but since this is a single statement, default FOREIGN KEY
constraints will not complain. Referential integrity is enforced at the end of the statement by default.
There is a tiny race condition if you run multiple of these queries concurrently. See:
- Atomic UPDATE .. SELECT in Postgres
- Is SELECT or INSERT in a function prone to race conditions?
- How do I implement insert-if-not-found for transactions at serializable isolation level?
Really only happens under heavy concurrent load, if ever. In comparison to caching solutions like advertised in another answer, the chance is super-tiny.
Function for repeated use
Create an SQL function that takes an array of composite type as parameter and use unnest(param)
in place of the VALUES
expression.
Or, if the syntax for such an array seems too messy, use a comma-separated string as parameter _param
. For instance of the form:
'description1,type1;description2,type2;description3,type3'
Then use this to replace the VALUES
expression in above statement:
SELECT split_part(x, ',', 1) AS description
split_part(x, ',', 2) AS type
FROM unnest(string_to_array(_param, ';')) x;
Function with UPSERT in Postgres 9.5 or later
Create a custom row type for parameter passing. We could do without it, but it's simpler:
CREATE TYPE foobar AS (description text, type text);
Function:
CREATE OR REPLACE FUNCTION f_insert_foobar(VARIADIC _val foobar[])
RETURNS void
LANGUAGE sql AS
$func$
WITH val AS (SELECT * FROM unnest(_val)) -- well-known row type
, typ AS (
SELECT v.type, f.id -- id NOT NULL where type already exists
FROM (SELECT DISTINCT type FROM val) v -- DISTINCT!
LEFT JOIN foo f USING (type) -- assuming no concurrent update/delete on foo
-- else you might lock rows here.
)
, ins AS (
INSERT INTO foo AS f (type)
SELECT type
FROM typ
WHERE id IS NULL
ON CONFLICT (type) DO UPDATE -- RARE cases of concurrent inserts
SET type = EXCLUDED.type -- overwrite to make visible
RETURNING f.type, f.id
)
INSERT INTO bar AS b (description, foo_id)
SELECT v.description, COALESCE(t.id, i.id) -- assuming most types pre-exist
FROM val v
LEFT JOIN typ t USING (type) -- already existed
LEFT JOIN ins i USING (type) -- newly inserted
ON CONFLICT (description) DO UPDATE -- description already exists
SET foo_id = EXCLUDED.foo_id -- real UPSERT this time
WHERE b.foo_id <> EXCLUDED.foo_id; -- only if actually changed
$func$;
Call:
SELECT f_insert_foobar(
'(testing,blue)'
, '(another row,red)'
, '(new row1,purple)'
, '(new row2,green)'
, '("with,comma",green)' -- added to demonstrate row syntax
);
db<>fiddle here
Fast and rock-solid for environments with concurrent transactions.
In addition to the queries above, this function ...
... applies
SELECT
orINSERT
onfoo
: Anytype
that doesn't exist in the FK table, yet, is inserted. Assuming most types pre-exist.... applies
INSERT
orUPDATE
(true "UPSERT") onbar
: If thedescription
already exists, itstype
is updated - but only if it actually changes. See:- How do I (or can I) SELECT DISTINCT on multiple columns?
... passes values as well-known row types with a
VARIADIC
function parameter. Note the default maximum of 100 function parameters! See:- Select rows such that names match elements of input array for pgsql function
There are many other ways to pass multiple rows ...
Related:
- How to use RETURNING with ON CONFLICT in PostgreSQL?
- How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
Lookup. You basically need the foo id's to insert them into bar.
Not postgres specific, btw. (and you did not tag it like that) - this is generally how SQL works. No shortcuts here.
Application wise, though, you may have a cache of foo items in memory. My tables often have up to 3 unique fields:
- Id (integer or something) that is the table level primary key.
- Identifier, which is a GUID that is used as stable ID application level wise (and may be exposed to the customer in URL's etc.)
- Code - a string that may be there and has to be unique if it is there (sql server: filtered unique index on not null). That is a customer set identifier.
Example:
- Account (in a trading application) -> Id is a int used for foreign keys. -> Identifier is a Guid and used in the web portals etc. - always accepted. -> Code is manually set. Rule: once set it does not change.
Obviously when you want to link something to an account - you first must, technically, get the Id - but given both Identifier and Code never change once they are there, a positive cache in memory kan stop most lookups from hitting the database.