Cluster autoscaler not downscaling

Answering myself for visibility.

The problem is that the CA never considers removing a node unless all the requirements mentioned in the FAQ are met at the same time. So let's say I have 100 nodes with 51% CPU requests each; the CA still won't consider downscaling any of them.

One solution would be to raise the utilization threshold the CA checks against, which is currently 50%. Unfortunately that is not supported by GKE; see the answer from Google support quoted by @GalloCedrone:

Moreover I know that this value might sound too low and someone could be interested to keep as well a 85% or 90% to avoid your scenario. Currently there is a feature request open to give the user the possibility to modify the flag "--scale-down-utilization-threshold", but it is not implemented yet.
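For completeness, on a self-managed cluster-autoscaler (not the GKE-managed one) the flag can be passed directly to the autoscaler container. A minimal sketch of the relevant part of such a Deployment, where the image tag and node-group bounds are placeholders:

```yaml
# Fragment of a self-managed cluster-autoscaler Deployment spec (not applicable to GKE's managed CA).
containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/cluster-autoscaler:v1.2.2        # placeholder version
    command:
      - ./cluster-autoscaler
      - --cloud-provider=gce
      - --nodes=1:10:gke-name-cluster-default-pool     # min:max:node-group, placeholder values
      - --scale-down-utilization-threshold=0.75        # consider nodes under 75% for scale-down instead of 50%
```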

The workaround I found is to decrease the CPU requests of the pods (e.g. 100m instead of 300m) and let the Horizontal Pod Autoscaler (HPA) create more replicas on demand. This works fine for me, but if your application is not suited to many small instances you are out of luck. Perhaps a cron job that cordons a node when total utilization is low?
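As an illustration, a sketch of that workaround (the names, image, replica bounds and the 70% target are made up): a Deployment requesting only 100m of CPU per pod, plus an HPA that adds replicas when CPU usage grows.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                                   # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: gcr.io/my-project/my-app:1.0    # placeholder image
          resources:
            requests:
              cpu: 100m                          # instead of 300m, so nodes pack more pods
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
```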


I agree that according to the [Documentation][1] it seems that "gke-name-cluster-default-pool" could be safely deleted, given these conditions:

  • The sum of CPU and memory requests of all pods running on this node is smaller than 50% of the node's allocatable.
  • All pods running on the node (except those that run on all nodes by default, like manifest-run pods or pods created by DaemonSets) can be moved to other nodes.
  • It doesn't have the scale-down disabled annotation (see the sketch after this list).

Therefore the CA should remove the node after it has been considered unneeded for 10 minutes.
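For reference, the scale-down disabled annotation from the last condition lives on the Node object itself; a minimal sketch, where the node name is a placeholder:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: gke-name-cluster-default-pool-xxxx   # placeholder node name
  annotations:
    # If present, the CA will never consider this node for scale-down.
    cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
```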

However, checking the [Documentation][2] I found:

What types of pods can prevent CA from removing a node?

[...] Kube-system pods that are not run on the node by default [...]

heapster-v1.5.2--- is running on the node, and it is a Kube-system pod that is not run on the node by default, so it prevents the CA from removing the node.
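The same FAQ section notes that kube-system pods stop blocking scale-down once they have a PodDisruptionBudget. A sketch of what that could look like for heapster (the label selector is an assumption, check the labels on your pod first; newer clusters use policy/v1):

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: heapster-pdb
  namespace: kube-system
spec:
  maxUnavailable: 1              # lets the CA evict the heapster pod so the node can be drained
  selector:
    matchLabels:
      k8s-app: heapster          # assumed label; verify with: kubectl get pods -n kube-system --show-labels
```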

I will update the answer if I discover more interesting information.

UPDATE

The fact that the node is the last one in the zone is not an issue.

To prove it I tested on a brand new cluster with 3 nodes, each one in a different zone; one of them had no workload apart from "kube-proxy" and "fluentd", and it was correctly deleted even though that brought the size of the zone to zero.

  [1]: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md
  [2]: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node