How can I grep in PDF files?
Install the package pdfgrep, then use the command:
find /path -iname '*.pdf' -exec pdfgrep pattern {} +
——————
Simplest way to do that:
pdfgrep 'pattern' *.pdf
pdfgrep 'pattern' file.pdf
If you have poppler-utils installed (default on Ubuntu Desktop), you could "convert" the PDF to text on the fly and pipe it to grep:
pdftotext my.pdf - | grep 'pattern'
The - makes pdftotext write to standard output, so this won't create a .txt file.
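With several PDFs, the plain pipe no longer tells you which file a match came from. Here is a rough sketch of a loop that keeps the filename. It assumes GNU grep (for -H and --label), and grep_pdfs is just a made-up helper name, not something poppler-utils ships:

```shell
# Search every PDF directly inside a directory with pdftotext + grep.
# grep_pdfs is a hypothetical helper name; adjust to taste.
grep_pdfs() {
    pattern=$1
    dir=${2:-.}
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue   # skip when the glob matches nothing
        # "-" sends the extracted text to stdout; --label makes grep print
        # the PDF name instead of "(standard input)" (GNU grep options).
        pdftotext "$f" - 2>/dev/null | grep -H --label="$f" -e "$pattern"
    done
}

# Usage: grep_pdfs 'pattern' /path/to/pdfs
```

This is not recursive; for a whole directory tree, pdfgrep or the find variant is probably less hassle.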
pdfgrep was written for exactly this purpose and is available in Ubuntu.
It tries to be mostly compatible with grep and thus provides "the power of grep", only specialized for PDFs. That includes common grep options, such as --recursive, --ignore-case or --color.
In contrast to pdftotext | grep, pdfgrep can output the page number of a match efficiently and is generally faster when it doesn't have to search the whole document (e.g. with --max-count or --quiet).
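To make those options a bit more concrete, here is a small sketch of two wrappers. The function names pdf_search_tree and pdf_contains are invented for illustration; the flags themselves (--recursive, --ignore-case, --page-number, --quiet) are regular pdfgrep options:

```shell
# Case-insensitive search through a directory tree, printing the page
# number of each match. The directory defaults to the current one.
pdf_search_tree() {
    pdfgrep --recursive --ignore-case --page-number "$1" "${2:-.}"
}

# Succeed if the pattern occurs anywhere in the file, printing nothing.
# With --quiet, pdfgrep can stop as soon as it finds the first match.
pdf_contains() {
    pdfgrep --quiet "$1" "$2"
}

# Usage:
#   pdf_search_tree 'pattern' ~/Documents
#   pdf_contains 'pattern' file.pdf && echo "found"
```

Like grep, pdfgrep exits with status 0 when at least one match is found, which is what makes the second wrapper usable in an if or && chain.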
The basic usage is:
pdfgrep PATTERN FILE...
where PATTERN is your search string and FILE is a list of filenames (or wildcards in a shell).
See the man page for more information.