How to programmatically search a PDF document in C#

A few libraries can do this. Check out the CodeProject article at http://www.codeproject.com/KB/cs/PDFToText.aspx and iTextSharp at http://itextsharp.sourceforge.net/

It takes a little effort, but it's possible.


You can use the Docotic.Pdf library to search for text in PDF files.

Here is some sample code:

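using System;
using BitMiracle.Docotic.Pdf;

// Prints the number of every page whose text contains the search string.
static void searchForText(string path, string text)
{
    using (PdfDocument pdf = new PdfDocument(path))
    {
        for (int i = 0; i < pdf.Pages.Count; i++)
        {
            // Extract the page's plain text and search it, ignoring case.
            string pageText = pdf.Pages[i].GetText();
            int index = pageText.IndexOf(text, 0, StringComparison.CurrentCultureIgnoreCase);
            if (index != -1)
                // Pages is zero-based, so report a one-based page number.
                Console.WriteLine("'{0}' found on page {1}", text, i + 1);
        }
    }
}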

The library can also extract formatted and plain text from the whole document or any document page.
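
The sample only reports whether a page contains the search text. If every occurrence is needed, the matching step can be separated from the PDF library. The following is a minimal sketch, assuming each page's text has already been extracted into a string (for example with GetText()). The names PageTextSearch and FindOccurrences are hypothetical and not part of Docotic.Pdf or any other library. It uses an ordinal, case-insensitive comparison instead of the culture-sensitive one in the sample, so that a match always has the same length as the search term.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of any PDF library.
public static class PageTextSearch
{
    // Returns one entry per occurrence of `term`:
    //   Key   = one-based page number
    //   Value = zero-based character offset within that page's text
    public static List<KeyValuePair<int, int>> FindOccurrences(
        IList<string> pageTexts, string term)
    {
        var hits = new List<KeyValuePair<int, int>>();

        // IndexOf("") matches at every position, so treat an empty term as "no matches".
        if (string.IsNullOrEmpty(term))
            return hits;

        for (int i = 0; i < pageTexts.Count; i++)
        {
            string pageText = pageTexts[i] ?? string.Empty;
            int index = pageText.IndexOf(term, StringComparison.OrdinalIgnoreCase);
            while (index != -1)
            {
                hits.Add(new KeyValuePair<int, int>(i + 1, index));

                // Continue after the current match. With an ordinal comparison the
                // match length equals term.Length, so the start index stays in range.
                index = pageText.IndexOf(
                    term, index + term.Length, StringComparison.OrdinalIgnoreCase);
            }
        }

        return hits;
    }
}
```

To use it with the sample above, collect pdf.Pages[i].GetText() for each page into a list and pass that list to FindOccurrences. Keep in mind that text extracted from a PDF may contain line breaks or hyphenation inside a phrase, so multi-word searches can miss matches unless the text is normalized first.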

Disclaimer: I work for Bit Miracle, vendor of the library.

Tags: C#, .Net, Pdf, Search