How to ignore errors with psql \copy meta-command
Workaround: remove the reported errant line using sed and run \copy again
Later versions of Postgres (including Postgres 13) report the line number of the error. You can then remove that line with sed and run \copy again, e.g.:
#!/bin/bash
bad_line_number=5  # assuming line 5 is the bad line
# delete that line and write the cleaned file to filtered.csv
sed "${bad_line_number}d" < input.csv > filtered.csv
[per the comment from @Botond_Balázs]
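If there are many bad lines, the remove-and-retry cycle can be automated. Here is a minimal sketch, assuming the names my_database and my_table, a pipe-delimited CSV, and GNU sed; it parses the failing line number out of psql's CONTEXT message and retries until \copy succeeds:

#!/bin/bash
csv=input.csv
while true; do
    # a failed \copy rolls back entirely, so retrying the whole file is safe
    err=$(psql my_database -c "\copy my_table FROM '$csv' DELIMITER '|' CSV" 2>&1) && break
    # psql reports errors like: CONTEXT:  COPY my_table, line 5, column ...
    line=$(printf '%s\n' "$err" | sed -n 's/.*COPY my_table, line \([0-9]*\).*/\1/p' | head -n1)
    if [ -z "$line" ]; then
        printf '%s\n' "$err" >&2   # not a line-level error; give up
        exit 1
    fi
    echo "Removing bad line $line: $(sed -n "${line}p" "$csv")" >&2
    sed -i "${line}d" "$csv"       # GNU sed; use: sed -i '' on BSD/macOS
done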
Here's one solution -- import the batch file one line at a time. Performance will be much slower, but it may be sufficient for your scenario:
#!/bin/bash
input_file=./my_input.csv
tmp_file=/tmp/one-line.csv

# Note: server-side COPY reads $tmp_file on the database host; if the client
# and server are different machines, use \copy instead.
while IFS= read -r input_line; do
    echo "$input_line" > "$tmp_file"
    psql my_database \
        -c "COPY my_table FROM '$tmp_file' DELIMITER '|' CSV;"
done < "$input_file"
Additionally, you could modify the script to capture psql's stdout/stderr and exit status; if the exit status is non-zero, echo $input_line and the captured output to stderr and/or append them to a log file.
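A minimal sketch of that error capture, replacing the psql call inside the loop above (failed_rows.log is an assumed file name):

output=$(psql my_database \
    -c "COPY my_table FROM '$tmp_file' DELIMITER '|' CSV;" 2>&1)
if [ $? -ne 0 ]; then
    # report the offending row plus psql's message, and keep both for later review
    echo "FAILED: $input_line" >&2
    printf '%s\n%s\n' "$input_line" "$output" >> failed_rows.log
fi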
It is too bad that after 25 years Postgres still has no ignore-errors option for the COPY command. In this era of big data you get a lot of dirty records, and fixing every outlier can be very costly for a project. I had to work around it this way:
- Copy the original table and call it dummy_original_table
- Create a trigger function like this (it will be attached to the dummy table):
CREATE OR REPLACE FUNCTION on_insert_in_original_table() RETURNS trigger AS $$
DECLARE
    v_rec RECORD;
BEGIN
    -- we use the trigger to prevent 'duplicate index' errors by returning NULL on duplicates
    SELECT * FROM original_table WHERE primary_key = NEW.primary_key INTO v_rec;
    IF FOUND THEN
        RETURN NULL;
    END IF;
    BEGIN
        INSERT INTO original_table(datum, primary_key) VALUES (NEW.datum, NEW.primary_key)
        ON CONFLICT DO NOTHING;
    EXCEPTION
        WHEN OTHERS THEN
            NULL;  -- swallow any other error for this row
    END;
    RETURN NULL;  -- never insert into the dummy table itself
END;
$$ LANGUAGE plpgsql;
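The answer omits the statement that attaches the function to the dummy table as a BEFORE INSERT trigger; a minimal sketch (the trigger name is assumed):

CREATE TRIGGER redirect_insert
BEFORE INSERT ON dummy_original_table
FOR EACH ROW EXECUTE FUNCTION on_insert_in_original_table();
-- on Postgres 10 and older, use EXECUTE PROCEDURE instead of EXECUTE FUNCTION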
- Run a copy into the dummy table. No record will be inserted there, but all of them will be inserted into original_table:

psql dbname -c "\copy dummy_original_table(datum,primary_key) FROM '/home/user/data.csv' DELIMITER E'\t'"
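To confirm the redirect worked, a quick check (expected: the dummy stays empty while the surviving rows land in the real table):

SELECT count(*) FROM dummy_original_table;  -- expect 0
SELECT count(*) FROM original_table;        -- rows accepted by the triggered insert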
You cannot skip the errors without skipping the whole command, up to and including Postgres 14. There is currently no more sophisticated error handling. \copy is just a wrapper around SQL COPY that channels results through psql. The manual for COPY:
COPY stops operation at the first error. This should not lead to problems in the event of a COPY TO, but the target table will already have received earlier rows in a COPY FROM. These rows will not be visible or accessible, but they still occupy disk space. This might amount to a considerable amount of wasted disk space if the failure happened well into a large copy operation. You might wish to invoke VACUUM to recover the wasted space.
Bold emphasis mine. And:
COPY FROM will raise an error if any line of the input file contains more or fewer columns than are expected.
COPY is an extremely fast way to import / export data. Sophisticated checks and error handling would slow it down. There was an attempt to add error logging to COPY in Postgres 9.0, but it was never committed.
Solution
Fix your input file instead. If you have one or more additional columns in your input file and the file is otherwise consistent, you might add dummy columns to your table isa and drop those afterwards. Or (cleaner with production tables) import to a temporary staging table and INSERT selected columns (or expressions) into your target table isa from there.
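A minimal sketch of the staging approach in a psql session; the column names and the dummy column extra_col for the surplus field are assumed:

-- throwaway staging table with text columns (forgiving of dirty input)
CREATE TEMP TABLE isa_staging (col1 text, col2 text, extra_col text);

-- client-side import into the staging table
\copy isa_staging(col1, col2, extra_col) FROM 'input.csv' DELIMITER ',' CSV

-- move only the wanted columns, casting or cleaning as needed
INSERT INTO isa (col1, col2)
SELECT col1, col2
FROM   isa_staging;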
Related answers with detailed instructions:
- How to update selected rows with values from a CSV file in Postgres?
- COPY command: copy only specific columns from csv