Thoughts triggered by a "quarrel" caused by a network failure

Authors: Zheng Mingquan, Yu Kai

What was the quarrel about?

"What are you talking about? What does it have to do with the local network card that my K8s ecs node cannot access the address of clb..." An angry tone came from the other end of the phone, and both ends of the phone were silent at this time. After a while, the sweet announcement from the subway lady interrupted the silence just now, "You must wear a mask all the way on the subway, and the next stop is West Lake Cultural Square...".

The pod needs to access the 443 listener of the CLB, but when the access comes from inside the cluster (that is, from a K8s node or a pod in the cluster), it fails with Connection refused:

(screenshot)

So I took a look at the customer's setup, which is as follows:

(screenshot)

What exactly is the symptom?

From a node or pod inside the cluster, 192.168.1.200:443 cannot be reached, while 192.168.1.200:80 works fine. Meanwhile, the ECS 192.168.3.100 outside the cluster can access both 192.168.1.200:443 and 192.168.1.200:80 normally.

Further analysis

The CLB1 IP 192.168.1.200 is bound to the kube-ipvs0 interface of the K8s nodes. This is a dummy interface (refer to "dummy interface"). Because SVC1 is of type LoadBalancer and also reuses CLB1, and its associated endpoint is POD1 192.168.1.101:80, accessing 192.168.1.200:80 works: kube-proxy creates the IPVS rule for SVC1 and mounts a reachable backend behind it. Accessing 192.168.1.200:443 from inside the cluster fails, however, because once the CLB IP is bound to the dummy interface, traffic to it never leaves the node to reach CLB1, and since there is no IPVS rule for port 443, the connection is simply refused.

If there is no IPVS rule on the node (IPVS rules take precedence over local listeners) yet the port is still reachable, check whether some local process is listening on 0.0.0.0:443. In that case every interface IP answers on port 443, but what you reach is that local service, not the real CLB backend.

(screenshot)
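To confirm this on a node, a minimal check might look like the following (192.168.1.200 is the CLB IP from this example; the exact output will of course differ per cluster):

# Is the CLB IP bound to the dummy interface? If so, traffic to it never leaves the node.
ip addr show kube-ipvs0 | grep 192.168.1.200

# Which ports has kube-proxy programmed for this IP? Expect an entry for :80 but none for :443.
ipvsadm -Ln | grep -A 3 '192.168.1.200'

# If :443 still answers, check whether some local process is listening on 0.0.0.0:443 instead.
ss -lntp | grep ':443'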

Is there a way to solve it?

The most recommended approach

The best approach is to split the traffic: use two separate CLBs, one for access from inside the cluster and one for access from outside.

The Alibaba Cloud SVC annotation approach

SVC1 can use the annotation service.beta.kubernetes.io/alibaba-cloud-loadbalancer-hostname as a placeholder, so that the CLB IP is not bound to the kube-ipvs0 interface; in-cluster access to the CLB IP then really goes out to the CLB. Note, however, that if the listener protocol is TCP or UDP, accessing the CLB IP from inside the cluster still runs into the loopback problem. For details, see The client cannot access the load balancing CLB [1].

This annotation is supported only by CCM v2.3.0 and above. For details, refer to Configure traditional load balancing CLB through Annotation [2].

(screenshot)

Demo:

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-hostname: "${your_service_hostname}"
  name: nginx-svc
  namespace: default
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer
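A rough way to apply and verify this, assuming the manifest above is saved as nginx-svc.yaml (the hostname value is whatever placeholder you choose):

kubectl apply -f nginx-svc.yaml

# EXTERNAL-IP should now show the placeholder hostname instead of the CLB IP,
# which is why kube-proxy no longer binds the CLB IP to kube-ipvs0.
kubectl get svc nginx-svc -n default

# On a node, the CLB IP should no longer appear on the dummy interface,
# so in-cluster access to the CLB IP actually goes out to the CLB.
ip addr show kube-ipvs0 | grep 192.168.1.200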

Does ExternalTrafficPolicy affect access from inside the cluster?

We all know that K8s NodePort and LoadBalancer services can tune the external traffic policy. So how do we explain the statement in the figure that "when the external policy is Local or Cluster, the IPVS rules created on the cluster nodes differ", and what happens when the nodePort or CLB IP is accessed from inside the cluster?

(screenshot)

The following cases all assume that the svc internalTrafficPolicy is Cluster (the default). The ServiceInternalTrafficPolicy feature is enabled by default since K8s 1.22; for details, refer to service-traffic-policy [3].
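For reference, a quick way to check and switch the traffic policies of a Service (nginx-svc is the demo Service from above; patching a live Service re-programs the IPVS rules on every node):

# Show the current policies; the default for both is Cluster.
kubectl get svc nginx-svc -o jsonpath='{.spec.externalTrafficPolicy}{"\n"}{.spec.internalTrafficPolicy}{"\n"}'

# Switch the external policy to Local to preserve the client source IP.
kubectl patch svc nginx-svc -p '{"spec":{"externalTrafficPolicy":"Local"}}'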

For details on the data paths of Alibaba Cloud container networking under different CNIs, you can refer to the related series of articles.

Here we only discuss how the behavior of the IPVS Local traffic policy changed between Kubernetes 1.22 and 1.24.

Changes in Kubernetes 1.24 IPVS

The following takes the IPVS mode of kube-proxy as an example:

  • When externalTrafficPolicy is Cluster (the default), the nodePort/CLB IP in the IPVS rules mounts all Endpoint IPs as backends. In-cluster access then loses the source IP, because the node performs a layer of SNAT.
  • When externalTrafficPolicy is Local:
    • If the node has an Endpoint of the service, the nodePort/CLB IP in the IPVS rules mounts only that node's own Endpoint IPs, and in-cluster access preserves the source IP.
    • If the node has no Endpoint of the service:
      • Before 1.24 the backend list is empty, and in-cluster access is refused.
      • From 1.24 on, the nodePort/CLB IP in the IPVS rules mounts all Endpoint IPs, and in-cluster access loses the source IP because the node performs a layer of SNAT. The community adjusted how backends are mounted for the Local policy; for details, refer to the community PR [4]. A quick way to observe the difference is sketched after this list.
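A minimal sketch of how to observe this (30479 is the nodePort used in the examples below; run it on a node with a local endpoint and on one without, before and after 1.24, and compare):

# List the IPVS virtual server for the nodePort and its real servers (backends).
# Local policy, node WITH a local endpoint: only that node's pod IPs appear.
# Local policy, node WITHOUT a local endpoint: an empty backend list (or no rule
# at all) before 1.24, and all endpoint IPs from 1.24 onwards.
ipvsadm -Ln | grep -A 5 ':30479'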

Accessing the SLB from outside the cluster

When the SLB is accessed from outside the cluster, CCM mounts only the nodes that actually have endpoints (Local policy) as SLB backends. The behavior is the same as in Kubernetes before 1.24, so it is not elaborated here; refer to the links above.

Accessing a NodePort from outside the cluster

Kubernetes versions prior to 1.24

  • Accessing the NodePort of a node that has an Endpoint works, and the source IP is preserved

Nginx is distributed on cn-hongkong.10.0.4.174 and cn-hongkong.10.0.2.84 nodes.

(screenshot)

From the external node 10.0.3.72, port 30479 of cn-hongkong.10.0.2.84, a node hosting a backend pod, can be reached.

(screenshot)

There are related IPVS rules on the cn-hongkong.10.0.2.84 node, but they mount only the backend pod IP that lives on this node.

(screenshot)

The conntrack table explains why this works: on the cn-hongkong.10.0.2.84 node the connection is DNAT'ed, and the reply finally comes from the pod nginx-7d6877d777-tzbf7 (10.0.2.87), which also lives on cn-hongkong.10.0.2.84. All the relevant translations happen on this one node, so the TCP (layer 4) connection is established successfully.

(screenshot)
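A rough way to reproduce this kind of check on the node whose NodePort was hit (30479 is the nodePort from this example; conntrack comes from the conntrack-tools package):

# Conntrack entries for the nodePort: the original direction keeps the external
# client's source IP, and the reply direction shows the pod IP after DNAT.
conntrack -L 2>/dev/null | grep 30479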

  • Accessing the NodePort of a node without an Endpoint fails, because there is no relevant ipvs forwarding rule on the node

From the external node 10.0.3.72, port 30479 of cn-hongkong.10.0.0.140, a node without a backend pod, cannot be reached.

(screenshot)

On the cn-hongkong.10.0.0.140 node there is no relevant IPVS forwarding rule, so DNAT cannot be performed and the access fails.

(screenshot)

Kubernetes 1.24 and later

Accessing the NodePort of a node that has an Endpoint works, and the source IP is preserved.

Accessing the NodePort of a node without an Endpoint:

  • terway ENIIP or host network: unreachable

Nginx is distributed on cn-hongkong.10.0.2.77 and cn-hongkong.10.0.0.171 nodes.

(screenshot)

From the external node 10.0.3.72, accessing port 30745 of cn-hongkong.10.0.5.168, a node with no backend pod, fails.

(screenshot)

There are related IPVS rules on the cn-hongkong.10.0.5.168 node, and all backend pod IPs are mounted in them.

(screenshot)

The conntrack table shows why this fails: on the cn-hongkong.10.0.5.168 node the connection is DNAT'ed, and the reply finally comes from nginx-79fc6bc6d-8vctc (10.0.2.78) on node cn-hongkong.10.0.2.77 straight back to the source. When the source receives that reply, it does not match its own five-tuple and is discarded, so the three-way handshake cannot complete and the connection fails.

(screenshot)
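If you want to watch this failure live, a hedged sketch is to capture on the node hosting the backend pod (10.0.3.72 is the external client from this example): the SYN arrives still carrying the client's address because nothing SNATs it, and the SYN-ACK goes straight back to that client instead of back through the NodePort node, so the client drops it.

# On the backend pod's node: SYNs arrive from the real client IP, and the
# SYN-ACKs are sent directly to that client, bypassing the NodePort node.
tcpdump -nn -i any 'host 10.0.3.72 and tcp port 80'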

  • flannel network: reachable, but the source IP is not preserved

Nginx is distributed in cn-hongkong.10.0.2.86.

(screenshot)

Accessing port 31218 of cn-hongkong.10.0.4.176 from outside succeeds.

(screenshot)

On cn-hongkong.10.0.4.176 the conntrack table records src 10.0.3.72, a DNAT to 172.16.160.135, and expects the reply to come back to port 58825 of 10.0.4.176.

(screenshot)

The node where the backend ep lives is cn-hongkong.10.0.2.86, and its conntrack table records src 10.0.4.176 and sport 58825. So the source IP seen by the application pod is 10.0.4.176, and the real source IP is lost.

(screenshot)
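One simple way to confirm the lost source IP is to look at what the application itself records. With the default nginx access-log format the first field is the client address ($remote_addr), so it should show the node IP 10.0.4.176 rather than the real client (this assumes the Deployment is named nginx, as the pod names suggest):

# The remote address in the access log is expected to be the node IP
# (10.0.4.176 in this example), not the real client IP.
kubectl logs deploy/nginx --tail=5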

Accessing the SLB or NodePort from inside the cluster

Kubernetes versions prior to 1.24

  • Access on a node that has an Endpoint works, and the source IP is preserved

Nginx is distributed on the ap-southeast-1.192.168.100.209 and ap-southeast-1.192.168.100.208 nodes; the ap-southeast-1.192.168.100.210 node has no Nginx pod.

(screenshot)

From a node in the cluster (node 209 in this example), accessing NodePort 31565 of ap-southeast-1.192.168.100.209, a node hosting a backend pod, succeeds.

(screenshot)

Accessing port 80 of the SLB 8.222.252.252 from ap-southeast-1.192.168.100.209, the node hosting a backend pod, also succeeds.

(screenshot)

The ap-southeast-1.192.168.100.209 node has IPVS rules for both the NodePort and the SLB, but they mount only the backend pod IP that lives on this node.

(screenshot)

The conntrack table explains why this works: on the ap-southeast-1.192.168.100.209 node the connection is DNAT'ed, and the reply finally comes from the pod nginx-7d6877d777-2wh4s (192.168.100.222) on the same ap-southeast-1.192.168.100.209 node. All the relevant translations happen on this one node, so the TCP (layer 4) connection is established successfully.

(screenshot)

  • Access on a node without an Endpoint fails, because there is no relevant IPVS forwarding rule on the node

From a node in the cluster (node 210 in this example), accessing NodePort 31565 or the SLB on ap-southeast-1.192.168.100.210, a node with no backend pod, fails.

This further confirms that in-cluster access to the SLB associated with the svc never leaves the node: even if the SLB has other listeners, access to those other ports of the SLB is also rejected.

(screenshot)

On the ap-southeast-1.192.168.100.210 node there is no relevant IPVS forwarding rule, so DNAT cannot be performed and the access fails.

(screenshot)

Kubernetes 1.24 and later

  • Access on a node that has an Endpoint works, and the source IP is preserved

This is the same as the pre-1.24 in-cluster access described above; refer to that description.

  • Access on a node without an Endpoint:

Nginx is distributed on the cn-hongkong.10.0.2.77 and cn-hongkong.10.0.0.171 nodes, so the test is done on cn-hongkong.10.0.4.141, a node without Nginx.

(screenshot)

There are the following situations:

  • terway, or the backend is hostNetwork
    • A node accesses the service through the NodePort (the source IP is the ECS IP, so no SNAT is needed); the source IP cannot be preserved

You can see that the IPVS rule for NodePort 10.0.4.141:30745 on the node without an Endpoint mounts all of Nginx's backend pods: nginx-79fc6bc6d-8vctc (10.0.2.78) and nginx-79fc6bc6d-j587w (10.0.0.172).

(screenshot)

The node itself can access NodePort 30745/TCP of cn-hongkong.10.0.4.141, the node without a backend pod.

(screenshot)

The conntrack table shows that on the cn-hongkong.10.0.4.141 node the connection is DNAT'ed, and the backend Nginx pod nginx-79fc6bc6d-8vctc (10.0.2.78) finally replies to the source.

(screenshot)

However, the conntrack table on cn-hongkong.10.0.2.77, the node where nginx-79fc6bc6d-8vctc (10.0.2.78) lives, records that 10.0.4.141 accesses 10.0.2.78 and expects 10.0.2.78 to reply directly to port 39530 of 10.0.4.141.

(screenshot)

When a node in the cluster that does have an endpoint accesses NodePort 32292 of ap-southeast-1.192.168.100.131, a node without the backend pod, the access fails. This is consistent with the access from outside the cluster on Kubernetes 1.24 and later described above; refer to that description.

    • The node cannot access the SLB IP (the source IP is the SLB IP itself, and nothing performs SNAT)

You can see that the IPVS rule for the SLB IP on the node without an Endpoint likewise mounts all of Nginx's backend pods: nginx-79fc6bc6d-8vctc (10.0.2.78) and nginx-79fc6bc6d-j587w (10.0.0.172).

(screenshot)

Accessing the SLB 47.243.247.219 from a node without an Endpoint times out.

(screenshot)

The conntrack table for this access shows that the backend pod is expected to reply to the SLB IP. Because the SLB IP is already held as a placeholder on the node's kube-ipvs0 interface, it is used as the connection's source IP and no SNAT is performed, so the reply never comes back and the access fails.

(screenshot)
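A quick check that matches this explanation (47.243.247.219 is the SLB IP from this example): the SLB IP is a local address on this node, so it is chosen as the source IP and nothing rewrites it.

# The SLB IP sits on the dummy interface of this very node ...
ip addr show kube-ipvs0 | grep 47.243.247.219
# ... so the kernel treats it as local and uses it as the source address.
ip route get 47.243.247.219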

  • flannel, with an ordinary pod as the backend: reachable, but the source IP is not preserved

Nginx is distributed in cn-hongkong.10.0.2.86.

(screenshot)

Accessing SLB 47.242.86.39 from cn-hongkong.10.0.4.176 is successful.

(screenshot)

The conntrack table on the cn-hongkong.10.0.4.176 node shows that both src and dst are 47.242.86.39, while the expected reply is from the nginx pod 172.16.160.135 back to port 54988 of 10.0.4.176; in other words, 47.242.86.39 is SNAT'ed to 10.0.4.176.

(screenshot)

The node where the backend ep lives is cn-hongkong.10.0.2.86, and its conntrack table records src 10.0.4.176 and sport 54988. So the source IP seen by the application pod is 10.0.4.176, and the real source IP is lost.

(screenshot)

Related Links:

[1] The client cannot access the load balancing CLB

https://help.aliyun.com/document_detail/55206.htm

[2] Configure traditional load balancing CLB through Annotation

https://www.yuque.com/r/goto?url=https%3A%2F%2Fhelp.aliyun.com%2Fzh%2Fack%2Fack-managed-and-ack-dedicated%2Fuser-guide%2Fadd-annotations-to-the-yaml-file-of-a-service-to-configure-clb-instances

[3] service-traffic-policy

https://kubernetes.io/zh-cn/docs/concepts/services-networking/service-traffic-policy/

[4] Community PR

https://github.com/kubernetes/kubernetes/pull/97081/commits/61085a75899a820b5eebfa71801e17423c1ca4da

Origin: blog.csdn.net/alisystemsoftware/article/details/132338038