How to find out why text in a PDF is not searchable (and how to make it searchable)

  • It may use a custom font encoding that maps character codes to glyphs in a way that is incompatible with established encodings such as ASCII or Unicode.

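  • It may render characters individually and out of sequence, so the extracted text does not form the words you search for.

  • Its characters may have been flattened to paths (vector outlines), which leaves no text to search.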
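
As a rough way to tell these cases apart, you can extract the text of a page with whatever tool you have (for example `pdftotext`) and look at what comes out. The sketch below is only a heuristic under stated assumptions: the function name `diagnose_extracted_text` and the 0.3 threshold are hypothetical choices, not part of any PDF library. It treats empty output as a sign that there is no text layer, and a high share of private-use, control, or replacement characters as a sign of a custom font encoding. It cannot detect out-of-sequence rendering, nor a custom encoding that happens to map to ordinary but wrong letters.

```python
import unicodedata


def diagnose_extracted_text(text: str, suspicious_threshold: float = 0.3) -> str:
    """Guess why a PDF page is not searchable from its extracted text.

    `text` is assumed to be the output of some text extractor for one page.
    The threshold is an arbitrary illustrative value, not a standard.
    """
    # Whitespace carries no information about the encoding, so ignore it.
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return "no text layer (glyphs may be paths, or the page may be an image)"

    # Co = private use, Cc = control, Cn = unassigned; U+FFFD is the
    # replacement character that extractors often emit for unmappable codes.
    suspicious = sum(
        1
        for ch in visible
        if unicodedata.category(ch) in {"Co", "Cc", "Cn"} or ch == "\ufffd"
    )
    if suspicious / len(visible) >= suspicious_threshold:
        return "text layer present, but it looks like a custom font encoding"
    return "text layer looks usable"


if __name__ == "__main__":
    print(diagnose_extracted_text("Hello world"))
    print(diagnose_extracted_text("\ue001\ue002\ue003"))
    print(diagnose_extracted_text(""))
```

If the result says the text layer looks usable but searching still fails, comparing the extracted text with what you see on the page is the next thing to check, since characters drawn out of order will extract as scrambled words.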

See the Stack Overflow questions "How do you debug PDF files?" and the now-deleted "PDF Font encoding — why can't I copy text from a PDF?".

To make the text searchable, the best option is usually to go back to the original source (e.g. a Word document) and use a different process to produce the PDF. Alternatively, you could render your current PDF as a bitmap and run OCR on it, but this tends to be tedious and often produces poor results.
