Why an iptables NAT does not happen in the network namespace separated transparent proxy setup?
I have to assume that the transparent proxy is acting as a router, at least for ICMP, so will route back the ICMP echo where it came from (veth0).
Finding the problem
When reproducing your setup and witnessing your problem, I added a TRACE on the host using iptables (legacy, which might have slight differences with iptables-nft's version) like this (I also forced the creation of the filter table (iptables -S
) to have it in the traces):
iptables -t raw -A PREROUTING -j TRACE
And a single ping shows in kernel logs (hint, if host isn't the actual initial host: sysctl -w net.netfilter.nf_log_all_netns=1
):
TRACE: raw:PREROUTING:policy:2 IN=client-veth0 OUT= MAC=66:f2:08:79:d0:df:be:1e:05:c1:c1:4b:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1
TRACE: nat:PREROUTING:policy:1 IN=client-veth0 OUT= MAC=66:f2:08:79:d0:df:be:1e:05:c1:c1:4b:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1
TRACE: filter:FORWARD:policy:1 IN=client-veth0 OUT=proxy-veth0 MAC=66:f2:08:79:d0:df:be:1e:05:c1:c1:4b:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1
TRACE: nat:POSTROUTING:policy:2 IN=client-veth0 OUT=proxy-veth0 MAC=66:f2:08:79:d0:df:be:1e:05:c1:c1:4b:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1
TRACE: raw:PREROUTING:policy:2 IN=proxy-veth0 OUT= MAC=16:c9:3c:d4:ad:8c:8a:84:06:5d:88:e2:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1
TRACE: filter:FORWARD:policy:1 IN=proxy-veth0 OUT=enp4s0 MAC=16:c9:3c:d4:ad:8c:8a:84:06:5d:88:e2:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=61 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1
At the same time, having conntrack -E
running on the host shows the matching:
# conntrack -E
[NEW] icmp 1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=3508 [UNREPLIED] src=8.8.8.8 dst=10.10.1.1 type=0 code=0 id=3508
What happened:
- conntrack (which handles NAT) doesn't care about routes (eg: there's no interface in the conntrack database), only about addresses,
- the nat table will only see packets in NEW states,
- the time when conntrack added a NEW entry in its database was when the packet was routed from client-veth0 to proxy-veth0: not matching the POSTROUTING rule,
- the second round when routing from proxy-veth0 to enp4s0 the packet matched an entry in conntrack and the nat table was not called again,
- packet leaves to Internet non-NATed.
Since this conntrack's limitation hindered some use cases in the past, like yours, an additional feature was added:
conntrack zones
A zone is simply a numerical identifier associated with a network device that is incorporated into the various hashes and used to distinguish entries in addition to the connection tuples.
[...]
This is mainly useful when connecting multiple private networks using the same addresses (which unfortunately happens occasionally) to pass the packets through a set of veth devices and SNAT each network to a unique address, after which they can pass through the "main" zone and be handled like regular non-clashing packets and/or have NAT applied a second time based f.i. on the outgoing interface.
It allows to sort-of duplicate the conntrack facility, including NAT handling, but has to be done manually and match the problem: here the routing topology.
So here the client <-> proxy traffic, in conntrack's point of view, must be split from other traffic.
I would have preferred to also split the proxy <-> Internet traffic from the generic host traffic, but this is too difficult, because the raw table, where zones must be assigned to a packet, sees only the non-de-NATed traffic, so Internet replies will all arrive with destination 172.16.202.30). Anyway There's no duplicated flow here between both like with the client <-> proxy flow, so that's not really needed.
zone 0 (0 means no special zone): generic host traffic along with proxy <-> Internet traffic.
Nothing special to do, this is the default.
zone 1: client <-> proxy traffic. The
CT --zone
target is used. The value here is chosen arbitrarily and not needed anywhere else for this case.iptables -t raw -A PREROUTING -i client-veth0 -j CT --zone 1 iptables -t raw -A PREROUTING -i proxy-veth0 -d 10.10.1.0/24 -j CT --zone 1
The correct results (I merged both tools' outputs) are now:
TRACE: raw:PREROUTING:rule:2 IN=client-veth0 OUT= MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1
TRACE: raw:PREROUTING:policy:4 IN=client-veth0 OUT= MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1
TRACE: nat:PREROUTING:policy:1 IN=client-veth0 OUT= MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1
TRACE: filter:FORWARD:policy:1 IN=client-veth0 OUT=proxy-veth0 MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1
TRACE: nat:POSTROUTING:policy:2 IN=client-veth0 OUT=proxy-veth0 MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1
[NEW] icmp 1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=4079 [UNREPLIED] src=8.8.8.8 dst=10.10.1.1 type=0 code=0 id=4079 zone=1
TRACE: raw:PREROUTING:policy:4 IN=proxy-veth0 OUT= MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1
TRACE: nat:PREROUTING:policy:1 IN=proxy-veth0 OUT= MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1
TRACE: filter:FORWARD:policy:1 IN=proxy-veth0 OUT=enp4s0 MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=61 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1
TRACE: nat:POSTROUTING:rule:1 IN=proxy-veth0 OUT=enp4s0 MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=61 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1
[NEW] icmp 1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=4079 [UNREPLIED] src=8.8.8.8 dst=172.16.202.30 type=0 code=0 id=4079
TRACE: raw:PREROUTING:policy:4 IN=enp4s0 OUT= MAC=5e:e8:0c:bf:96:d9:b2:e7:bc:df:1f:8e:08:00 SRC=8.8.8.8 DST=172.16.202.30 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1
TRACE: filter:FORWARD:policy:1 IN=enp4s0 OUT=proxy-veth0 MAC=5e:e8:0c:bf:96:d9:b2:e7:bc:df:1f:8e:08:00 SRC=8.8.8.8 DST=10.10.1.1 LEN=84 TOS=0x00 PREC=0x00 TTL=61 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1
[UPDATE] icmp 1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=4079 src=8.8.8.8 dst=172.16.202.30 type=0 code=0 id=4079
TRACE: raw:PREROUTING:rule:3 IN=proxy-veth0 OUT= MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=8.8.8.8 DST=10.10.1.1 LEN=84 TOS=0x00 PREC=0x00 TTL=60 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1
TRACE: raw:PREROUTING:policy:4 IN=proxy-veth0 OUT= MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=8.8.8.8 DST=10.10.1.1 LEN=84 TOS=0x00 PREC=0x00 TTL=60 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1
TRACE: filter:FORWARD:policy:1 IN=proxy-veth0 OUT=client-veth0 MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=8.8.8.8 DST=10.10.1.1 LEN=84 TOS=0x00 PREC=0x00 TTL=59 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1
[UPDATE] icmp 1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=4079 src=8.8.8.8 dst=10.10.1.1 type=0 code=0 id=4079 zone=1
Here a single first packet from a new flow triggers twice iptables' nat table, the first time with no effect. Actually conntrack considers there are two flows, because the first flow has the additional attribute zone=1
.