Deep Dive into Ambient Mesh - Traffic Path

Ambient Mesh has been out for a while now, and quite a few articles have described its usage and architecture. This article walks through the data-plane traffic path in ambient mode in detail, to help you fully understand how the ambient data plane is implemented.

Before reading this article, please read the introduction to Ambient Mesh (https://istio.io/latest/blog/2022/introducing-ambient-mesh/) to understand its basic architecture.

So that you can read along and practice at the same time, the environment used in this article is deployed following the Ambient getting-started guide: https://istio.io/latest/blog/2022/get-started-ambient/.

01

From the moment the request is initiated

To explore the traffic path, we first analyze how two services access each other in ambient mode (L4 mode only, on different nodes).

After ambient mode is enabled in the default namespace, all services gain mesh governance capabilities.

Our analysis starts with this command: kubectl exec deploy/sleep -- curl -s http://productpage:9080/ | head -n1

In sidecar mode, Istio intercepts traffic through iptables: when curl is executed in the sleep Pod, iptables forwards the outbound traffic to port 15001 of the sidecar for processing. In ambient mode, however, there is no sidecar in the pod, and the pod does not need to be restarted when ambient mode is turned on, so how is its request guaranteed to be processed by the ztunnel?
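For comparison, here is a minimal sketch of what sidecar-mode outbound interception looks like. It is not the full rule set installed by istio-init (chain layout and exclusions are simplified), just enough to show the idea of redirecting the application's outbound TCP to the in-pod Envoy:

```bash
# Simplified sketch of sidecar-mode outbound interception (not the complete
# istio-init rule set; chain layout and exclusions are abbreviated).
iptables -t nat -N ISTIO_OUTPUT
iptables -t nat -A OUTPUT -p tcp -j ISTIO_OUTPUT
# Let traffic from the proxy itself (running as UID 1337) bypass interception.
iptables -t nat -A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
# Redirect everything else to the Envoy sidecar's outbound listener on 15001.
iptables -t nat -A ISTIO_OUTPUT -p tcp -j REDIRECT --to-ports 15001
```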

02

Egress traffic interception

To understand how egress traffic is intercepted, we can first look at the components deployed in the istio-system namespace:

kebe@pc $ kubectl -n istio-system get po
NAME                                   READY   STATUS    RESTARTS   AGE
istio-cni-node-5rh5z                   1/1     Running   0          20h
istio-cni-node-qsvsz                   1/1     Running   0          20h
istio-cni-node-wdffp                   1/1     Running   0          20h
istio-ingressgateway-5cfcb57bd-kx9hx   1/1     Running   0          20h
istiod-6b84499b75-ncmn7                1/1     Running   0          20h
ztunnel-nptf6                          1/1     Running   0          20h
ztunnel-vxv4b                          1/1     Running   0          20h
ztunnel-xkz4s                          1/1     Running   0          20h

istio-cni becomes a default component in ambient mode. In sidecar mode, istio-cni is mainly a CNI plugin used to avoid the permission escalation that comes from having the istio-init container set up iptables rules. In ambient mode, however, a sidecar is in theory no longer needed, so why is istio-cni still required?

We can take a look at its logs:

kebe@pc $ kubectl -n istio-system logs istio-cni-node-qsvsz
...
2022-10-12T07:34:33.224957Z  info  ambient  Adding route for reviews-v1-6494d87c7b-zrpks/default: [table 100 10.244.1.4/32 via 192.168.126.2 dev istioin src 10.244.1.1]
2022-10-12T07:34:33.226054Z  info  ambient  Adding pod 'reviews-v2-79857b95b-m4q2g/default' (0ff78312-3a13-4a02-b39d-644bfb91e861) to ipset
2022-10-12T07:34:33.228305Z  info  ambient  Adding route for reviews-v2-79857b95b-m4q2g/default: [table 100 10.244.1.5/32 via 192.168.126.2 dev istioin src 10.244.1.1]
2022-10-12T07:34:33.229967Z  info  ambient  Adding pod 'reviews-v3-75f494fccb-92nq5/default' (e41edf7c-a347-45cb-a144-97492faa77bf) to ipset
2022-10-12T07:34:33.232236Z  info  ambient  Adding route for reviews-v3-75f494fccb-92nq5/default: [table 100 10.244.1.6/32 via 192.168.126.2 dev istioin src 10.244.1.1]

We can see that for Pods in ambient mode, istio-cni does two things (a rough shell equivalent is sketched after this list):

  1. Adds the Pod's IP to an ipset

  2. Adds a routing rule to table 100 (used later)
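Based on the log above, the effect of these two steps is roughly equivalent to the following commands, shown here for the reviews-v1 Pod (a hedged sketch; istio-cni performs these operations programmatically rather than by running shell commands):

```bash
# Rough shell equivalent of what istio-cni does for each ambient Pod (illustrative only).
# 1. Record the Pod IP in the node-level ipset of ambient Pods.
ipset add ztunnel-pods-ips 10.244.1.4

# 2. Add a route in table 100 that steers traffic destined for the Pod
#    through the istioin tunnel device towards the ztunnel.
ip route add table 100 10.244.1.4/32 via 192.168.126.2 dev istioin src 10.244.1.1
```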

We can check the contents of the ipset on the node where the Pod is located (note that a kind cluster is used here, so we need docker exec to enter the node first):

kebe@pc $ docker exec -it ambient-worker2 bash
root@ambient-worker2:/# ipset list
Name: ztunnel-pods-ips
Type: hash:ip
Revision: 0
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 520
References: 1
Number of entries: 5
Members:
10.244.1.5
10.244.1.7
10.244.1.8
10.244.1.4
10.244.1.6

We find that the node hosting the Pod has an ipset storing a number of IPs, and these IPs are Pod IPs:

kebe@pc $ kubectl get po -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP           NODE              NOMINATED NODE   READINESS GATES
details-v1-76778d6644-wn4d2       1/1     Running   0          20h   10.244.1.9   ambient-worker2   <none>           <none>
notsleep-6d6c8669b5-pngxg         1/1     Running   0          20h   10.244.2.5   ambient-worker    <none>           <none>
productpage-v1-7c548b785b-w9zl6   1/1     Running   0          20h   10.244.1.7   ambient-worker2   <none>           <none>
ratings-v1-85c74b6cb4-57m52       1/1     Running   0          20h   10.244.1.8   ambient-worker2   <none>           <none>
reviews-v1-6494d87c7b-zrpks       1/1     Running   0          20h   10.244.1.4   ambient-worker2   <none>           <none>
reviews-v2-79857b95b-m4q2g        1/1     Running   0          20h   10.244.1.5   ambient-worker2   <none>           <none>
reviews-v3-75f494fccb-92nq5       1/1     Running   0          20h   10.244.1.6   ambient-worker2   <none>           <none>
sleep-7b85956664-z6qh7            1/1     Running   0          20h   10.244.2.4   ambient-worker    <none>           <none>

Therefore, this ipset holds the IPs of all ambient-mode Pods on the current node.

So where is this ipset used?

We can look at the iptables rules and find:

root@ambient-worker2:/# iptables-save
*mangle
...
-A POSTROUTING -j ztunnel-POSTROUTING
...
-A ztunnel-PREROUTING -p tcp -m set --match-set ztunnel-pods-ips src -j MARK --set-xmark 0x100/0x100

From this we know that when an ambient-mode Pod on the node (i.e. one whose IP is in the ztunnel-pods-ips ipset) initiates a request, its connection is marked with 0x100/0x100.

Generally, such a mark is used for policy routing. Let's take a look at the routing rules:

root@ambient-worker2:/# ip rule
0:      from all lookup local
100:    from all fwmark 0x200/0x200 goto 32766
101:    from all fwmark 0x100/0x100 lookup 101
102:    from all fwmark 0x40/0x40 lookup 102
103:    from all lookup 100
32766:  from all lookup main
32767:  from all lookup default

We can see that traffic marked with 0x100/0x100 is looked up in routing table 101. Let's view that table:

root@ambient-worker2:/# ip r show table 101
default via 192.168.127.2 dev istioout
10.244.1.2 dev veth5db63c11 scope link

We can see that the default gateway has been replaced with 192.168.127.2, and the traffic goes out through the istioout interface.
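Putting the mark, the ip rule, and table 101 together, the node-side egress steering can be condensed into roughly the following commands (a sketch reconstructed from the outputs above, not necessarily the exact commands the components execute):

```bash
# Sketch of the node-side egress steering reconstructed from the outputs above.
# Mark TCP connections originating from ambient Pods on this node.
iptables -t mangle -A ztunnel-PREROUTING -p tcp \
  -m set --match-set ztunnel-pods-ips src -j MARK --set-xmark 0x100/0x100

# Marked traffic is looked up in routing table 101 ...
ip rule add priority 101 fwmark 0x100/0x100 lookup 101

# ... whose default route points at the ztunnel end of the istioout geneve tunnel.
ip route add default via 192.168.127.2 dev istioout table 101
```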

But there is a problem here: the IP 192.168.127.2 is not a Node IP, a Pod IP, or a Cluster IP, and the istioout interface does not exist by default, so who created it and assigned this IP? Since the traffic ultimately needs to reach the ztunnel, let's look at the ztunnel's configuration to see if we can find the answer.

kebe@pc $ kubectl -n istio-system get po ztunnel-vxv4b -o yaml
apiVersion: v1
kind: Pod
metadata:
  ...
  name: ztunnel-vxv4b
  namespace: istio-system
  ...
spec:
  ...
  initContainers:
  - command:
      ...
      OUTBOUND_TUN=istioout
      ...
      OUTBOUND_TUN_IP=192.168.127.1
      ZTUNNEL_OUTBOUND_TUN_IP=192.168.127.2

      ip link add name p$INBOUND_TUN type geneve id 1000 remote $HOST_IP
      ip addr add $ZTUNNEL_INBOUND_TUN_IP/$TUN_PREFIX dev p$INBOUND_TUN
      ip link add name p$OUTBOUND_TUN type geneve id 1001 remote $HOST_IP
      ip addr add $ZTUNNEL_OUTBOUND_TUN_IP/$TUN_PREFIX dev p$OUTBOUND_TUN
      ip link set p$INBOUND_TUN up
      ip link set p$OUTBOUND_TUN up
      ...

As shown above, the ztunnel is responsible for creating the istioout interface, and we can see the corresponding interface on the node:

root@ambient-worker2:/# ip a
11: istioout: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 0a:ea:4e:e0:8d:26 brd ff:ff:ff:ff:ff:ff
    inet 192.168.127.1/30 brd 192.168.127.3 scope global istioout
       valid_lft forever preferred_lft forever
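The host-side istioout device is one end of a geneve tunnel whose remote endpoint is the ztunnel Pod. Conceptually, creating it looks roughly like the pod-side init script above, mirrored onto the node; $ZTUNNEL_IP below is a placeholder for the local ztunnel Pod's IP, and the actual setup is performed by the ambient components rather than by hand:

```bash
# Illustrative sketch of the host-side tunnel device (mirrors the pod-side
# init script; $ZTUNNEL_IP is a placeholder for the local ztunnel Pod IP).
ip link add name istioout type geneve id 1001 remote $ZTUNNEL_IP
ip addr add 192.168.127.1/30 dev istioout
ip link set istioout up
```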

So where is the gateway IP 192.168.127.2? It is assigned inside the ztunnel Pod:

kebe@pc $ kubectl -n istio-system exec -it ztunnel-nptf6 -- ip a
Defaulted container "istio-proxy" out of: istio-proxy, istio-init (init)
2: eth0@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 46:8a:46:72:1d:3b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.2.3/24 brd 10.244.2.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::448a:46ff:fe72:1d3b/64 scope link
       valid_lft forever preferred_lft forever
4: pistioout: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether c2:d0:18:20:3b:97 brd ff:ff:ff:ff:ff:ff
    inet 192.168.127.2/30 scope global pistioout
       valid_lft forever preferred_lft forever
    inet6 fe80::c0d0:18ff:fe20:3b97/64 scope link
       valid_lft forever preferred_lft forever

We can now see that the traffic will reach the ztunnel, but so far nothing else has been done to it; it has simply been routed into the ztunnel Pod. So how does the Envoy inside the ztunnel actually process it?

Let's continue looking at the ztunnel's configuration. Its init container writes a number of iptables rules; we can exec into the ztunnel to see them:

kebe@pc $ kubectl -n istio-system exec -it ztunnel-nptf6 -- iptables-save
Defaulted container "istio-proxy" out of: istio-proxy, istio-init (init)
...
*mangle
-A PREROUTING -i pistioout -p tcp -j TPROXY --on-port 15001 --on-ip 127.0.0.1 --tproxy-mark 0x400/0xfff
...
COMMIT

It is clear that when traffic enters the ztunnel through pistioout, TPROXY hands it to port 15001, which is the port where Envoy actually listens to process Pod egress traffic. If you are not familiar with TPROXY, please look up the relevant material yourself; this article will not go into the details.
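In short, TPROXY lets a listener accept connections for arbitrary destination addresses without rewriting them, but it only works together with a policy route that delivers the marked packets to the local stack. Inside the ztunnel's network namespace this pattern generally looks something like the following (a generic TPROXY sketch; the ztunnel's actual rules and table numbers may differ):

```bash
# Generic TPROXY pattern (illustrative; the ztunnel's real rules may differ).
# Divert TCP arriving from the tunnel device to the local proxy port 15001
# while keeping the original destination address intact.
iptables -t mangle -A PREROUTING -i pistioout -p tcp \
  -j TPROXY --on-port 15001 --on-ip 127.0.0.1 --tproxy-mark 0x400/0xfff

# Packets carrying the TPROXY mark must be routed to the local stack so the
# transparent listener can accept them.
ip rule add fwmark 0x400/0xfff lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
```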

So, to sum up, when a Pod is in ambient mode, its egress traffic path is roughly as follows:

  1. A process in the Pod initiates the traffic.

  2. The traffic passes through the node's network and is marked by the node's iptables.

  3. The node's routing table forwards the traffic to the ztunnel Pod on the current node.

  4. When the traffic arrives at the ztunnel, iptables uses TPROXY to hand it transparently to Envoy's port 15001 in that Pod for processing.

So far, we can see that in ambient mode, the handling of Pod egress traffic is relatively complex and the path is relatively long, unlike sidecar mode, where traffic forwarding is completed directly inside the Pod.
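If you want to confirm this path on your own cluster, watching the tunnel devices while re-running the curl command from earlier is a quick check. The commands below assume the kind-based setup used in this article and that tcpdump is available on the node image and inside the ztunnel container, which may not be the case by default:

```bash
# On the node running the sleep Pod: egress traffic heading for the ztunnel
# should appear on the istioout geneve device.
docker exec -it ambient-worker tcpdump -ni istioout tcp

# Inside the ztunnel Pod on the same node: the same connections should show up
# arriving on pistioout before Envoy handles them on port 15001.
kubectl -n istio-system exec -it ztunnel-nptf6 -- tcpdump -ni pistioout tcp
```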

03

Ingress traffic interception

With the above experience, it is not hard to guess that in ambient mode traffic interception is mainly done with mark-based routing plus TPROXY, and that ingress traffic should work in a similar way.

Let's use the simplest analysis method. When a process on the node, or a program on another host, accesses a Pod on the current node, the traffic passes through the host's routing table. Let's check the routing information used when accessing productpage-v1-7c548b785b-w9zl6 (10.244.1.7):

root@ambient-worker2:/# ip r get 10.244.1.7
10.244.1.7 via 192.168.126.2 dev istioin table 100 src 10.244.1.1 uid 0
    cache

We can see that traffic to 10.244.1.7 is routed to 192.168.126.2 via the istioin device; this is exactly the rule istio-cni added above.

Likewise, the IP 192.168.126.2 belongs to the ztunnel:

kebe@pc $ kubectl -n istio-system exec -it ztunnel-nptf6 -- ip a
Defaulted container "istio-proxy" out of: istio-proxy, istio-init (init)
2: eth0@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 46:8a:46:72:1d:3b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.2.3/24 brd 10.244.2.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::448a:46ff:fe72:1d3b/64 scope link
       valid_lft forever preferred_lft forever
3: pistioin: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 7e:b2:e6:f9:a4:92 brd ff:ff:ff:ff:ff:ff
    inet 192.168.126.2/30 scope global pistioin
       valid_lft forever preferred_lft forever
    inet6 fe80::7cb2:e6ff:fef9:a492/64 scope link
       valid_lft forever preferred_lft forever

Following the same analysis method, let's look at the iptables rules:

kebe@pc $ kubectl -n istio-system exec -it ztunnel-nptf6 -- iptables-save
...
-A PREROUTING -i pistioin -p tcp -m tcp --dport 15008 -j TPROXY --on-port 15008 --on-ip 127.0.0.1 --tproxy-mark 0x400/0xfff
-A PREROUTING -i pistioin -p tcp -j TPROXY --on-port 15006 --on-ip 127.0.0.1 --tproxy-mark 0x400/0xfff
...

So when the Pod IP and Pod port are accessed directly from the node, the traffic is forwarded to port 15006 of the ztunnel, which is the port where Istio processes ingress traffic.

As for traffic whose destination port is 15008, that is the port the ztunnel uses for Layer 4 traffic tunneling; we'll talk about it later.
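To confirm which ports the Envoy inside the ztunnel is actually listening on (15001, 15006 and 15008), you can query its admin interface. This assumes the usual istio-proxy admin endpoint on 127.0.0.1:15000 and that curl is available in the image:

```bash
# List the listeners of the Envoy running inside the ztunnel Pod
# (assumes the standard istio-proxy admin endpoint on 127.0.0.1:15000).
kubectl -n istio-system exec -it ztunnel-nptf6 -c istio-proxy -- \
  curl -s 127.0.0.1:15000/listeners
```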

04

Envoy's own traffic processing

We know that in sidecar mode, Envoy runs in the same network namespace as the application containers, and all of the application's traffic has to be intercepted to ensure complete control over it. Is that still necessary in ambient mode?

The answer is no. Since Envoy has been moved into a separate Pod, the traffic Envoy itself sends does not need any special treatment. In other words, for the ztunnel we only need to deal with incoming traffic, which is why the rules inside the ztunnel look relatively simple.

05

To be continued

Above we mainly analyzed how Pod traffic is intercepted in ambient mode; we have not yet covered how Layer 7 traffic is processed, nor the specific implementation details of the ztunnel. In a later article, we will analyze the detailed processing path of traffic inside the ztunnel and the waypoint proxy.


Author of this article

Kebe Liu

"DaoCloud Daoke" service grid technical expert, member of the Istio community steering committee, contributor to Istio and Kubernetes code, initiator of the Merbridge open source project


Origin: blog.csdn.net/DaoCloud_daoke/article/details/127498148