Cloud Vision API - PDF OCR
On April 6, 2018, support for PDF and TIFF files in document text detection was added to Google Cloud Vision API (see Release Notes).
According to the documentation:
The Vision API can detect and transcribe text from PDF and TIFF files stored in Google Cloud Storage.
Document text detection from PDF and TIFF must be requested using the asyncBatchAnnotate function, which performs an asynchronous request and provides its status using the operations resources.
Output from a PDF/TIFF request is written to a JSON file created in the specified Google Cloud Storage bucket.
Example:
1) Upload a file to your Google Cloud Storage bucket
2) Make a POST request to perform PDF/TIFF document text detection
Request:
POST https://vision.googleapis.com/v1p2beta1/files:asyncBatchAnnotate
Authorization: Bearer <your access token>
{
  "requests": [
    {
      "inputConfig": {
        "gcsSource": {
          "uri": "gs://<your bucket name>/input.pdf"
        },
        "mimeType": "application/pdf"
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ],
      "outputConfig": {
        "gcsDestination": {
          "uri": "gs://<your bucket name>/output/"
        },
        "batchSize": 1
      }
    }
  ]
}
Response:
{
  "name": "operations/9b1f9d773d216406"
}
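For anyone scripting step 2, here is a minimal Python sketch of the same POST using only the standard library. The names build_request_body and start_ocr are my own (hypothetical), and the Content-Type header is an assumption on my part since it is not shown in the request above. The access token is assumed to be obtained elsewhere.

```python
import json
import urllib.request

ENDPOINT = "https://vision.googleapis.com/v1p2beta1/files:asyncBatchAnnotate"


def build_request_body(input_uri, output_uri, mime_type="application/pdf", batch_size=1):
    """Return the JSON body for a files:asyncBatchAnnotate call (hypothetical helper)."""
    return {
        "requests": [
            {
                "inputConfig": {
                    "gcsSource": {"uri": input_uri},
                    "mimeType": mime_type,
                },
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                "outputConfig": {
                    "gcsDestination": {"uri": output_uri},
                    "batchSize": batch_size,
                },
            }
        ]
    }


def start_ocr(access_token, input_uri, output_uri):
    """POST the request and return the operation name from the response."""
    body = json.dumps(build_request_body(input_uri, output_uri)).encode("utf-8")
    request = urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + access_token,
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["name"]
```

Calling start_ocr(token, "gs://<your bucket name>/input.pdf", "gs://<your bucket name>/output/") should return an operation name like the one in the response above, which is what the status check needs.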
3) Make a GET request to check if document text detection is done
Request:
GET https://vision.googleapis.com/v1/operations/9b1f9d773d216406
Authorization: Bearer <your access token>
Response:
{
  "name": "operations/9b1f9d773d216406",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vision.v1p2beta1.OperationMetadata",
    "state": "DONE",
    "updateTime": "2018-06-17T20:18:09.117787733Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.vision.v1p2beta1.AsyncBatchAnnotateFilesResponse",
    "responses": [
      {
        "outputConfig": {
          "gcsDestination": {
            "uri": "gs://<your bucket name>/output/"
          },
          "batchSize": 1
        }
      }
    ]
  }
}
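Since the request is asynchronous, you usually have to repeat that GET until "done" is true. A rough polling sketch in Python follows; fetch_operation and wait_for_operation are hypothetical names, and the interval and attempt limit are arbitrary values I picked, not anything the API prescribes.

```python
import json
import time
import urllib.request


def fetch_operation(access_token, operation_name):
    """GET the operation resource and return it as a dict."""
    request = urllib.request.Request(
        "https://vision.googleapis.com/v1/" + operation_name,
        headers={"Authorization": "Bearer " + access_token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_operation(fetch, interval_seconds=5.0, max_attempts=60):
    """Call fetch() until the operation reports done, then return it."""
    for _ in range(max_attempts):
        operation = fetch()
        if operation.get("done"):
            # A finished operation carries either "response" or "error".
            if "error" in operation:
                raise RuntimeError(operation["error"])
            return operation
        time.sleep(interval_seconds)
    raise TimeoutError("operation did not finish in time")
```

Usage would look like wait_for_operation(lambda: fetch_operation(token, "operations/9b1f9d773d216406")). Passing the fetch step in as a callable keeps the loop easy to try out without touching the network.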
4) Check the results in the specified Google Cloud Storage folder
https://cloud.google.com/vision/docs/pdf
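Once you have downloaded one of the output JSON files from the bucket, pulling the text out could look like the sketch below. Note the assumptions: I am assuming each output file has a top-level "responses" list whose items carry "fullTextAnnotation" with a "text" field and "context" with a "pageNumber" field, which is not shown in the answer above, so check it against your own output. extract_page_texts is a hypothetical helper name.

```python
import json


def extract_page_texts(output_json):
    """Return a list of (page_number, text) pairs from one output file."""
    document = json.loads(output_json)
    pages = []
    for response in document.get("responses", []):
        # Assumed layout: text lives under fullTextAnnotation.text.
        text = response.get("fullTextAnnotation", {}).get("text", "")
        # Assumed layout: the page number lives under context.pageNumber.
        page_number = response.get("context", {}).get("pageNumber")
        pages.append((page_number, text))
    return pages
```

With "batchSize": 1 as in the request, each output file would hold a single page, so you would call this once per file and concatenate the results in page order.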
I know this question is old, but Google Vision has now released support for PDF!
In 2016, the PDF and TIFF formats were not supported by Cloud Vision.
The accepted formats were (taken from the doc):
- JPEG
- PNG8
- PNG24
- GIF
- Animated GIF (first frame only)
- BMP
- WEBP
- RAW
- ICO
But PDF and TIFF have now been added.
Docs for JPG: https://cloud.google.com/vision/docs/ocr
Docs for PDF: https://cloud.google.com/vision/docs/pdf