Timeout issues on Google Cloud Functions and AWS Lambda
I'm only familiar with AWS, so I'll answer based on that. I use AWS Lambda whenever I can, except when the running time exceeds 15 minutes, which is Lambda's maximum timeout. In that case, I use AWS Batch (AWS Batch – Run Batch Computing Jobs on AWS).
You can also use AWS Fargate, but you'll have to configure a cluster and a Docker image.
EDIT 1:
Batch can be sent events via API Gateway like you would to Lambda I assume?
I've never triggered a Batch job directly from API Gateway (I don't know if that's possible). I've always used API Gateway to trigger a Lambda function, which in turn submits the Batch job (check out this workflow, please, to get a better idea).
Also, you may use AWS CloudWatch Events to trigger an AWS Batch job. For instance, if you upload the file to S3 before transcription, you can trigger the AWS Batch job from S3 events (check out this step by step, please).
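As a rough sketch of the "Lambda submits the Batch job" idea, here is what a Python Lambda handler reacting to an S3 upload might look like. The job name, queue name, job definition name, and environment variable names are all hypothetical placeholders; only the boto3 submit_job call and the S3 event layout are standard.

```python
import urllib.parse


def build_batch_job_request(s3_event, job_queue, job_definition):
    """Turn an S3 "object created" event into keyword arguments for submit_job."""
    record = s3_event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    return {
        "jobName": "transcription-job",  # hypothetical name
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Pass the uploaded file's location to the container as env variables.
        "containerOverrides": {
            "environment": [
                {"name": "INPUT_BUCKET", "value": bucket},
                {"name": "INPUT_KEY", "value": key},
            ]
        },
    }


def handler(event, context):
    import boto3  # bundled with the Lambda Python runtime

    request = build_batch_job_request(
        event,
        job_queue="my-job-queue",            # hypothetical
        job_definition="my-job-definition",  # hypothetical
    )
    response = boto3.client("batch").submit_job(**request)
    # The Lambda returns immediately; the job itself can run past 15 minutes.
    return {"jobId": response["jobId"]}
```

The same handler could sit behind API Gateway instead of S3; in that case you would read the file location from the request body rather than from the S3 record.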
How simple is it to convert a zipped Lambda function to an AWS Fargate image?
It's not so difficult if you know about Docker, AWS ECR and ECS clusters.
First, you need to create a Docker image with your source code (check out this step by step, please). Basically, you'll unzip your code, copy it into the Docker image, run npm install, and set the command to run in a Dockerfile.
After that, create an AWS ECR repository and push your Docker image to it.
Create an AWS ECS cluster.
Create an ECS task definition that uses the Fargate launch type.
Finally, run the task via Lambda.
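For the last step, a minimal sketch of a Lambda handler that starts the Fargate task with boto3's run_task could look like this. The cluster name, task definition name, container name, subnet ID, and environment variable are hypothetical; you'd replace them with whatever you created in the previous steps.

```python
def build_run_task_request(cluster, task_definition, container_name, subnets, env):
    """Build keyword arguments for ecs.run_task with the Fargate launch type."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        # Fargate tasks use awsvpc networking, so subnets must be given.
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                # Assumption: the task runs in a public subnet and needs a
                # public IP to pull the image from ECR.
                "assignPublicIp": "ENABLED",
            }
        },
        # Per-run parameters are passed as environment variable overrides.
        "overrides": {
            "containerOverrides": [
                {
                    "name": container_name,
                    "environment": [
                        {"name": k, "value": v} for k, v in sorted(env.items())
                    ],
                }
            ]
        },
    }


def handler(event, context):
    import boto3  # bundled with the Lambda Python runtime

    request = build_run_task_request(
        cluster="my-cluster",             # hypothetical
        task_definition="my-task",        # hypothetical
        container_name="my-container",    # hypothetical
        subnets=["subnet-0123456789abcdef0"],  # hypothetical
        env={"INPUT_KEY": event.get("key", "")},
    )
    response = boto3.client("ecs").run_task(**request)
    return {"taskArns": [task["taskArn"] for task in response["tasks"]]}
```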
If you don't have experience with Docker and AWS Fargate, AWS Batch is easier to implement.
You might take a look at Architecture for using Cloud Pub/Sub for long-running tasks and at Cloud Speech-to-Text, both part of Google's solutions.
The first link explains the architecture and workflow for using Cloud Pub/Sub as a queuing system for processing potentially long-running tasks (with automatic transcription of audio files as the example).
Cloud Speech-to-Text enables easy integration of Google speech recognition technologies into developer applications: you send audio and receive a text transcription from the Speech-to-Text API service.
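To give an idea of the queuing side, here is a small sketch of how a function might enqueue a transcription task on a Pub/Sub topic with the google-cloud-pubsub client. The message fields (audio_uri, language_code) are made up for illustration and are not taken from the linked architecture; a separate worker subscribed to the topic would do the actual long-running work.

```python
import json


def build_task_message(audio_uri, language_code="en-US"):
    """Serialize a transcription task as a Pub/Sub payload (bytes)."""
    # Hypothetical message shape; use whatever fields your worker expects.
    payload = {"audio_uri": audio_uri, "language_code": language_code}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def enqueue_transcription(project_id, topic_id, audio_uri):
    """Publish the task and return the message ID assigned by Pub/Sub."""
    from google.cloud import pubsub_v1  # pip install google-cloud-pubsub

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, data=build_task_message(audio_uri))
    # Blocks only until Pub/Sub acknowledges the message, not until the
    # transcription itself is done.
    return future.result()
```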
Use asynchronous requests instead of synchronous ones. Your current workflow involves requesting a speech-to-text task and then blocking until the API responds. On top of the timeout problem you've outlined there's a problem of extra costs: you end up paying for Lambda invocation time for however long the speech-to-text API spends processing your request, i.e., you're paying for Lambda doing essentially nothing.
In an asynchronous workflow you send a request and the API responds right away with an operation identifier. The Lambda function can terminate at this point while the API continues processing your task in the background. Now you can either use the operation identifier to poll for task completion (using a scheduled Cloud Function, for example) or, in the case of AWS Polly, use the SnsTopicArn or OutputS3BucketName properties to fire off another Lambda upon task completion.
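A minimal sketch of the Polly variant, assuming two Lambda functions: one that starts the task and exits, and one subscribed to the SNS topic. The bucket name, topic ARN, and voice are hypothetical, and I'm assuming the notification body is JSON, so check the Polly docs for the exact fields it carries.

```python
import json


def build_synthesis_request(text, bucket, sns_topic_arn):
    """Build keyword arguments for polly.start_speech_synthesis_task."""
    return {
        "Text": text,
        "OutputFormat": "mp3",
        "VoiceId": "Joanna",           # hypothetical choice of voice
        "OutputS3BucketName": bucket,  # the result is written here
        "SnsTopicArn": sns_topic_arn,  # notified when the task finishes
    }


def start_handler(event, context):
    """First Lambda: start the task and return without waiting for it."""
    import boto3  # bundled with the Lambda Python runtime

    request = build_synthesis_request(
        event["text"],
        bucket="my-output-bucket",  # hypothetical
        sns_topic_arn="arn:aws:sns:us-east-1:123456789012:polly-done",  # hypothetical
    )
    task = boto3.client("polly").start_speech_synthesis_task(**request)["SynthesisTask"]
    return {"taskId": task["TaskId"]}


def completion_handler(event, context):
    """Second Lambda: triggered by SNS once the task has finished."""
    # SNS delivers the notification body as a string in Records[].Sns.Message.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message
```

This way you pay for two short invocations instead of one long one that sits idle while the API works.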
Check the AWS and GCP API docs on asynchronous requests for more info. Also, AWS provides more in-depth docs on asynchronous audio processing here and here.