Cloud Vision API - PDF OCR

On April 6, 2018, support for PDF and TIFF files in document text detection was added to Google Cloud Vision API (see Release Notes).

According to the documentation:

  • The Vision API can detect and transcribe text from PDF and TIFF files stored in Google Cloud Storage.

  • Document text detection from PDF and TIFF must be requested using the asyncBatchAnnotate function, which performs an asynchronous request and provides its status using the operations resources.

  • Output from a PDF/TIFF request is written to a JSON file created in the specified Google Cloud Storage bucket.


Example:

1) Upload a file to your Google Cloud Storage bucket


2) Make a POST request to perform PDF/TIFF document text detection

Request:

POST https://vision.googleapis.com/v1p2beta1/files:asyncBatchAnnotate
Authorization: Bearer <your access token>

{
  "requests":[
    {
      "inputConfig": {
        "gcsSource": {
          "uri": "gs://<your bucket name>/input.pdf"
        },
        "mimeType": "application/pdf"
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ],
      "outputConfig": {
        "gcsDestination": {
          "uri": "gs://<your bucket name>/output/"
        },
        "batchSize": 1
      }
    }
  ]
}

Response:

{
  "name": "operations/9b1f9d773d216406"
}
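
For anyone scripting step 2, here is a minimal sketch that builds the same request body in Python. The function name `build_async_request` and the bucket name `my-bucket` are made up for illustration. Only the JSON field names come from the request shown above.

```python
import json


def build_async_request(bucket, input_name, output_prefix,
                        mime_type="application/pdf", batch_size=1):
    """Return the files:asyncBatchAnnotate request body as a dict."""
    return {
        "requests": [
            {
                # The file to read; it must already be in the bucket.
                "inputConfig": {
                    "gcsSource": {"uri": f"gs://{bucket}/{input_name}"},
                    "mimeType": mime_type,
                },
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                # Where the JSON results are written.
                "outputConfig": {
                    "gcsDestination": {"uri": f"gs://{bucket}/{output_prefix}"},
                    "batchSize": batch_size,
                },
            }
        ]
    }


if __name__ == "__main__":
    # "my-bucket" is a placeholder bucket name, not a real one.
    body = build_async_request("my-bucket", "input.pdf", "output/")
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the POST body with whatever HTTP client you prefer, together with the Authorization header from the request above.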

3) Make a GET request to check if document text detection is done

Request:

GET https://vision.googleapis.com/v1/operations/9b1f9d773d216406
Authorization: Bearer <your access token>

Response:

{
    "name": "operations/9b1f9d773d216406",
    "metadata": {
        "@type": "type.googleapis.com/google.cloud.vision.v1p2beta1.OperationMetadata",
        "state": "RUNNING",
        "updateTime": "2018-06-17T20:18:09.117787733Z"
    },
    "done": true,
    "response": {
        "@type": "type.googleapis.com/google.cloud.vision.v1p2beta1.AsyncBatchAnnotateFilesResponse",
        "responses": [
            {
                "outputConfig": {
                    "gcsDestination": {
                        "uri": "gs://<your bucket name>/output/"
                    },
                    "batchSize": 1
                }
            }
        ]
    }
}
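
Step 3 usually has to be repeated until "done" is true, so a small polling loop may help. The sketch below uses only the Python standard library. The names `fetch_operation` and `wait_until_done`, the 5 second interval and the 60 attempt limit are my own assumptions, not anything prescribed by the API. The check for an "error" field assumes the usual long-running operation layout, where a finished operation carries either "response" or "error".

```python
import json
import time
import urllib.request

# Same base URL as the GET request in step 3.
OPERATIONS_URL = "https://vision.googleapis.com/v1/"


def fetch_operation(name, token):
    """GET the operation resource and return the parsed JSON body."""
    request = urllib.request.Request(
        OPERATIONS_URL + name,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_until_done(name, token, fetch=fetch_operation,
                    interval=5.0, max_attempts=60):
    """Poll until the operation reports done, then return its JSON."""
    for _ in range(max_attempts):
        operation = fetch(name, token)
        if operation.get("done"):
            # Assumption: a failed operation reports an "error" field.
            if "error" in operation:
                raise RuntimeError(operation["error"])
            return operation
        time.sleep(interval)
    raise TimeoutError(f"{name} did not finish after {max_attempts} polls")
```

Call it as `wait_until_done("operations/9b1f9d773d216406", "<your access token>")`. The `fetch` parameter is there only so the loop can be tried without network access.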

4) Check the results in the specified Google Cloud Storage folder

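Once an output file from step 4 has been downloaded, the recognized text can be pulled out with a few lines of Python. This is only a sketch: the answer above does not show the output file, so the layout assumed here (a top-level "responses" list whose items carry "fullTextAnnotation.text" and "context.pageNumber") should be checked against your own output. `extract_pages` is a made-up helper name.

```python
import json


def extract_pages(output_json):
    """Return (page_number, text) pairs from one output JSON document."""
    document = json.loads(output_json)
    pages = []
    for response in document.get("responses", []):
        # Assumed layout: one entry per page, with the page number
        # under "context" and the text under "fullTextAnnotation".
        page_number = response.get("context", {}).get("pageNumber")
        text = response.get("fullTextAnnotation", {}).get("text", "")
        pages.append((page_number, text))
    return pages


if __name__ == "__main__":
    # Hand-written sample in the assumed layout, not real API output.
    sample = json.dumps({
        "responses": [
            {
                "fullTextAnnotation": {"text": "Hello PDF"},
                "context": {"pageNumber": 1},
            }
        ]
    })
    for number, text in extract_pages(sample):
        print(number, text)
```

Pages without any detected text come back with an empty string, so the caller can tell them apart from missing pages.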


https://cloud.google.com/vision/docs/pdf

I know this question is old, but Google Cloud Vision now supports PDF!


In 2016, PDF and TIFF formats were not supported by Cloud Vision.

At that time, the accepted formats were (taken from the docs):

  • JPEG
  • PNG8
  • PNG24
  • GIF
  • Animated GIF (first frame only)
  • BMP
  • WEBP
  • RAW
  • ICO

PDF and TIFF support has since been added.

Docs for JPG:

https://cloud.google.com/vision/docs/ocr

Docs for PDF:

https://cloud.google.com/vision/docs/pdf