Setup Cloud NAT for public GKE clusters
Using google's Cloud NAT with public GKE clusters works!
First a cloud NAT gateway and router needs to be setup using a reserved external IP.
Once that's done the ip-masq-agent
configuration needs to be changed to not masquerade the pod IPs for the external IPs that are the target of requests from inside the cluster. Changing the configuration is done in the nonMasqueradeCidrs
list in the ConfigMap for the ip-masq-agent.
The way this works is that for every outgoing requests to an IP in the nonMasqueradeCidrs
list IP masquerading is not done. So the requests does not seem to originate from the node IP but from the pod IP. This internal IP is then automatically NATed by the Cloud NAT gateway/router. The result is that the request seems to originate from the (stable) IP of the cloud NAT router.
Sources:
- https://rajathithanrajasekar.medium.com/google-cloud-public-gke-clusters-egress-traffic-via-cloud-nat-for-ip-whitelisting-7fdc5656284a
- https://cloud.google.com/kubernetes-engine/docs/how-to/ip-masquerade-agent
"Unfortunately, this is not currently the case. While Cloud NAT is still in Beta, certain settings are not fully in place and thus the pods are still using SNAT even with IP aliasing. Because of the SNAT to the node's IP, the pods will not use Cloud NAT."
Indeed, as Patrick W says above, it's not currently working as documented. I tried as well, and spoke with folks on the GCP Slack group in the Kubernetes Engine channel. They also confirmed in testing that it only works with a GKE private cluster. We haven't started playing with private clusters yet. I can't find solid documentation on this simple question: If I create a private cluster, can I still have public K8S services (aka load balancers) in that cluster? All of the docs about private GKE clusters indicate you don't want any external traffic coming in, but we're running production Internet-facing services on our GKE clusters.
I filed a ticket with GCP support about the Cloud NAT issue, and here's what they said:
"I have been reviewing your configuration and the reason that Cloud NAT is not working is because your cluster is not private. To use Cloud NAT with GKE you have to create a private cluster. In the non-private cluster the public IP addresses of the cluster are used for communication between the master and the nodes. That’s why GKE is not taking into consideration the Cloud NAT configuration you have. Creating a private cluster will allow you to combine Cloud NAT and GKE.
I understand this is not very clear from our documentation and I have reported this to be clarified and explained exactly how it is supposed to work."
I responded asking them to please make it work as documented, rather than changing their documentation. I'm waiting for an update from them...
The idea here is that if you use native VPC (Ip alias) for the cluster, your pods will not use SNAT when routing out of the cluster. With no SNAT, the pods will not use the node's external IP and thus should use the Cloud NAT.
Unfortunately, this is not currently the case. While Cloud NAT is still in Beta, certain settings are not fully in place and thus the pods are still using SNAT even with IP aliasing. Because of the SNAT to the node's IP, the pods will not use Cloud NAT.
This being said, why not use a private cluster? It's more secure and will work with Cloud NAT. You can't SSH directly into a node, but A) you can create a bastion VM instance in your project that can SSH using the internal IP flag and B) you generally do not need to SSH into the node on most occassions.