How can I import a JSON file into PostgreSQL?
For near-big-data volumes, the most efficient way to import JSON from a file without any external tool appears to be not loading a single JSON document, but rather a single-column CSV: a list of one-line JSON objects:
data.json.csv:
{"id": 23635,"name": "Jerry Green","comment": "Imported from facebook."}
{"id": 23636,"name": "John Wayne","comment": "Imported from facebook."}
Then, in psql:
create table t ( j jsonb );
\copy t from 'd:\path\data.json.csv'
One record per JSON line will be added to table t.
\copy from was made for CSV and as such loads data line by line. Reading one JSON object per line, rather than a single JSON array that has to be split afterwards, needs no intermediate table and achieves high throughput.
On top of that, you are less likely to hit the maximum input line-size limitation that arises when the input JSON file is too big.
I would therefore first convert the input into a single-column CSV and then import it using the copy command.
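Once the rows are in t, the individual attributes can be extracted with the ->> operator. A minimal sketch, assuming a hypothetical target table person matching the sample rows above:

create table person ( id integer primary key, name text, comment text );

insert into person (id, name, comment)
select (j ->> 'id')::int,   -- ->> returns text, so cast the id back to integer
       j ->> 'name',
       j ->> 'comment'
from t;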
You can feed the JSON into a SQL statement that extracts the information and inserts it into the table. If the JSON attributes have exactly the same names as the table columns, you can do something like this:
with customer_json (doc) as (
   values
    ('[
      {
        "id": 23635,
        "name": "Jerry Green",
        "comment": "Imported from facebook."
      },
      {
        "id": 23636,
        "name": "John Wayne",
        "comment": "Imported from facebook."
      }
    ]'::json)
)
insert into customer (id, name, comment)
select p.*
from customer_json l
  cross join lateral json_populate_recordset(null::customer, doc) as p
on conflict (id) do update
  set name = excluded.name,
      comment = excluded.comment;
New customers will be inserted, existing ones will be updated. The "magic" part is the json_populate_recordset(null::customer, doc)
which generates a relational representation of the JSON objects.
The above assumes a table definition like this:
create table customer
(
  id      integer primary key,
  name    text not null,
  comment text
);
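To see what json_populate_recordset produces before inserting anything, you can run it on its own. A quick sketch reusing the sample array from above:

select p.*
from json_populate_recordset(null::customer,
       '[{"id": 23635, "name": "Jerry Green", "comment": "Imported from facebook."},
         {"id": 23636, "name": "John Wayne", "comment": "Imported from facebook."}]'::json) as p;
-- returns two rows with columns id, name and comment, typed like the customer table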
If the data is provided as a file, you need to first put that file into some table in the database. Something like this:
create unlogged table customer_import (doc json);
Then upload the file into a single row of that table, e.g. using the \copy command in psql (or whatever your SQL client offers):
\copy customer_import from 'customers.json' ....
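After the load it's worth checking that the whole document ended up in a single row. A small sanity check, assuming the file contains one JSON array:

select count(*)               as rows_loaded,  -- should be 1
       json_array_length(doc) as customers     -- number of objects in the array
from customer_import;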
Then you can use the above statement, just remove the CTE and use the staging table:
insert into customer (id, name, comment)
select p.*
from customer_import l
  cross join lateral json_populate_recordset(null::customer, doc) as p
on conflict (id) do update
  set name = excluded.name,
      comment = excluded.comment;
It turns out there's an easy way to import a multi-line JSON object into a JSON column in a Postgres database using the command-line psql tool, without needing to explicitly embed the JSON in the SQL statement. The technique is documented in the PostgreSQL docs, but it's a bit hidden.
The trick is to load the JSON into a psql variable using backticks. For example, given a multi-line JSON file in /tmp/test.json such as:
{
  "dog": "cat",
  "frog": "frat"
}
We can load it into a temporary table with the following psql commands:
sql> \set content `cat /tmp/test.json`
sql> create temp table t ( j jsonb );
sql> insert into t values (:'content');
sql> select * from t;
which gives the result:
j
────────────────────────────────
{"dog": "cat", "frog": "frat"}
(1 row)
You can also perform operations on the data directly:
sql> select :'content'::jsonb -> 'dog';
?column?
──────────
"cat"
(1 row)
Under the covers this is just embedding the JSON in the SQL, but it's a lot neater to let psql perform the interpolation itself.
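The same variable can also be fed straight into a real table. A minimal sketch that combines this trick with json_populate_recordset from the earlier answer, assuming a hypothetical file /tmp/customers.json holding a JSON array and a customer table like the one defined above:

sql> \set content `cat /tmp/customers.json`
sql> insert into customer (id, name, comment)
     select p.*
     from json_populate_recordset(null::customer, :'content'::json) as p
     on conflict (id) do update
       set name = excluded.name,
           comment = excluded.comment;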