In Kubernetes, how can I scale a Deployment to zero when idle

I ended up implementing a custom solution:

Compared to Knative, it's more of a "toy" project, fairly small, and has no dependency on istio. It's been working well for my use case, though I do not recommend others use it without being willing to adopt the code as your own.

Actually Kubernetes supports the scaling to zero only by means of an API call, since the Horizontal Pod Autoscaler does support scaling down to 1 replica only.

Anyway there are a few Operator which allow you to overtake that limitation by intercepting the requests coming to your pods or by inspecting some metrics.

You can take a look at Knative or Keda. They enable your application to be serverless and they do so in different ways.

Knative, by means of Istio intercept the requests and if there's an active pod serving them, it redirects the incoming request to that one, otherwise it trigger a scaling.

By contrast, Keda best fits event-driven architecture, because it is able to inspect predefined metrics, such as lag, queue lenght or custom metrics (collected from Prometheus, for example) and trigger the scaling.

Both support scale to zero in case predefined conditions are met in a equally predefined window.

Hope it helped.