Check if PDF files are corrupted using command line on Linux
You can try doing it with pdfinfo (here on Fedora, in the poppler-utils package). pdfinfo reads information about the PDF file from its dictionary, so if it can find that, the file should be OK:
for f in *.pdf; do
    if ! pdfinfo "$f" &> /dev/null; then
        echo "$f is broken"
    fi
done
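If the loop is going to run from a script or a cron job, it can help to wrap it in a function that returns a non-zero status when anything is broken. This is only a sketch: report_broken_pdfs is a name made up for this example, and the variables are plain globals so that it also runs in shells without `local`.

```shell
# Report every *.pdf in a directory that pdfinfo cannot read.
# report_broken_pdfs is a hypothetical helper, not part of poppler-utils.
report_broken_pdfs() {
    dir=${1:-.}
    status=0
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue    # the glob matched nothing
        if ! pdfinfo "$f" > /dev/null 2>&1; then
            echo "$f is broken"
            status=1
        fi
    done
    return "$status"
}
```

Called as `report_broken_pdfs ~/Documents || echo "some PDFs need attention"`, it prints the broken files and lets the caller react to the exit status.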
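# Try to extract text from every PDF below the current directory.
find . -iname '*.pdf' | while IFS= read -r f
do
    if pdftotext "$f" - &> /dev/null; then
        echo "$f was ok"
    else
        # Rename unreadable files so they are easy to spot later.
        mv "$f" "$f.broken"
        echo "$f is broken"
    fi
done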
My tool of choice for checking PDFs is qpdf. qpdf has a --check argument that does a good job of finding problems in PDFs.
Check a single PDF with qpdf:
qpdf --check test_file.pdf
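The exit status is what makes --check useful in scripts. As far as I know from the qpdf manual, 0 means no errors or warnings, 2 means errors, and 3 means warnings without errors. A minimal sketch that turns this into a one-line verdict (check_pdf is a helper name invented here, not something qpdf ships):

```shell
# Classify one PDF by the exit status of `qpdf --check`.
# check_pdf is a hypothetical helper, not part of qpdf.
check_pdf() {
    qpdf --check "$1" > /dev/null 2>&1
    case $? in
        0) echo "$1: OK" ;;          # no errors or warnings
        3) echo "$1: WARNINGS" ;;    # warnings only, file is still usable
        *) echo "$1: FAILED" ;;      # errors (status 2) or anything unexpected
    esac
}
```

For example, `check_pdf test_file.pdf` prints a single line such as `test_file.pdf: OK`.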
Check all PDFs in a directory with qpdf:
find ./directory_to_scan/ -type f -iname '*.pdf' \( -exec sh -c 'qpdf --check "{}" > /dev/null && echo "{}": OK' \; -o -exec echo "{}": FAILED \; \)
Command explanation:
find ./directory_to_scan/ -type f -iname '*.pdf'
Find all files with a '.pdf' extension.
-exec sh -c 'qpdf --check "{}" > /dev/null && echo "{}": OK' \;
Execute qpdf for each file found and send its standard output to /dev/null. Also print the filename followed by ': OK' if the return status of qpdf is 0 (i.e. no errors).
-o -exec echo "{}": FAILED \; \)
This gets executed when the first -exec fails, that is, when qpdf returns a non-zero status: print the filename followed by ': FAILED'.
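One caveat with the command above: because "{}" is pasted into the sh -c string, the shell parses the filename as code, so names containing quotes or $ can break it. Printing "{}": FAILED also relies on find substituting {} inside a larger argument, which GNU find does but POSIX does not require. A possible variant that passes the filenames as positional parameters instead is sketched below. scan_pdfs is a made-up name, and unlike the original this version also discards qpdf's error messages on stderr.

```shell
# Check every PDF below a directory, passing filenames as arguments
# instead of embedding {} in the shell code.
# scan_pdfs is a hypothetical helper name.
scan_pdfs() {
    find "$1" -type f -iname '*.pdf' -exec sh -c '
        for f do
            if qpdf --check "$f" > /dev/null 2>&1; then
                echo "$f: OK"
            else
                echo "$f: FAILED"
            fi
        done
    ' sh {} +
}
```

Usage would be `scan_pdfs ./directory_to_scan/`. The `{} +` form hands many files to each sh invocation, so it also starts fewer processes than `\;`.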
Where to get qpdf:
qpdf has both Linux and Windows binaries available at: https://github.com/qpdf/qpdf/releases. You could also use your package manager of choice to get it. For example, on Ubuntu you can install qpdf using apt with the command:
apt install qpdf
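Before running any of the checks in a script, it may be worth confirming that the tool is actually installed. A small sketch using the standard `command -v` builtin; require_tool is a hypothetical helper, and the package name is passed in by the caller because it does not always match the command name (pdfinfo lives in poppler-utils, for instance):

```shell
# Fail early with a hint if a required command is missing.
# require_tool is a hypothetical helper; $1 is the command, $2 the package name.
require_tool() {
    if ! command -v "$1" > /dev/null 2>&1; then
        echo "$1 not found; try installing the $2 package" >&2
        return 1
    fi
}
```

A script could then start with `require_tool qpdf qpdf || exit 1`.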