How to search in PDFs using regular expressions?
Several options:
- Agent Ransack (the top answer to "Best way to *confidently* search files and contents in Windows without using an indexing service?"). It is free (Lite) and supports PDF files, as its release notes confirm.
- DnGrep, which is free and open-source software. Unfortunately, it is currently available only on Windows (a feature request for other platforms has been opened).
- PowerGREP, a commercial product.
As you said, the obvious alternative is to convert the PDFs to text. For bulk processing, a programmer could set this up with the Python package PDFMiner. Agent Ransack itself uses `pdftotext` from the Xpdf project, and you can use it too.
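Below is a minimal Python sketch of that convert-then-grep approach. It assumes pdfminer.six (the maintained fork of PDFMiner, which provides `pdfminer.high_level.extract_text`) is installed, and otherwise falls back to the `pdftotext` command (Xpdf or Poppler) if it is on the PATH. The function names `pdf_to_text`, `grep_text` and `grep_pdfs` are hypothetical, chosen only for illustration.

```python
import re
import shutil
import subprocess
import sys
from pathlib import Path


def pdf_to_text(pdf_path):
    """Return the text of a PDF as a string.

    Assumption: tries pdfminer.six first (pip install pdfminer.six); if it is
    not installed, falls back to the pdftotext command-line tool.
    """
    try:
        from pdfminer.high_level import extract_text
    except ImportError:
        extract_text = None
    if extract_text is not None:
        return extract_text(str(pdf_path))
    if shutil.which("pdftotext"):
        # An output file name of "-" makes pdftotext write to stdout.
        result = subprocess.run(
            ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"],
            capture_output=True,
            check=True,
        )
        return result.stdout.decode("utf-8", errors="replace")
    raise RuntimeError("Neither pdfminer.six nor pdftotext is available")


def grep_text(text, pattern, flags=0):
    """Return (line_number, line) pairs for lines of text that match the regex."""
    regex = re.compile(pattern, flags)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]


def grep_pdfs(root, pattern, flags=0):
    """Recursively search every *.pdf under root; yield (path, line_number, line)."""
    for pdf_path in sorted(Path(root).rglob("*.pdf")):
        try:
            text = pdf_to_text(pdf_path)
        except Exception as exc:  # e.g. damaged or encrypted PDFs
            print(f"skipping {pdf_path}: {exc}", file=sys.stderr)
            continue
        for number, line in grep_text(text, pattern, flags):
            yield pdf_path, number, line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python grep_pdfs.py ROOT_DIR REGEX
    for path, number, line in grep_pdfs(sys.argv[1], sys.argv[2]):
        print(f"{path}:{number}: {line}")
```

The regex is applied line by line to the extracted text, much like grep, so a pattern that spans a line break in the PDF layout will not match; you could instead run `re.finditer` over the whole text. Scanned PDFs without a text layer yield no text at all and would need OCR first.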