Authors: Zheng Mingquan, Yu Kai
Why the argument, and what is it about?
"What are you talking about? What does the local network card have to do with my K8s ECS node being unable to reach the CLB address..." An angry voice came from the other end of the phone, and then both ends fell silent. After a while, the sweet subway announcement broke the silence: "Please wear a mask for the entire ride; the next stop is West Lake Cultural Square...".
The pod needs to access the CLB's 443 listener, but accessing it from inside the cluster (that is, from a K8s node or pod in the cluster) returns Connection refused:
So I took a look at the customer's setup, which is as follows:
What exactly is the symptom?
192.168.1.200:443 cannot be reached from a node or pod in the cluster, while 192.168.1.200:80 works normally. Meanwhile, the ECS 192.168.3.100 outside the cluster can reach both 192.168.1.200:443 and 192.168.1.200:80.
A further look at the analysis
CLB1's IP 192.168.1.200 is bound to the kube-ipvs0 interface of the K8s nodes. This is a dummy interface (see dummy interface). Because SVC1 is of type LoadBalancer and reuses this CLB1, its associated endpoint is POD1 at 192.168.1.101:80. That explains why accessing 192.168.1.200:80 works: kube-proxy creates an IPVS rule from SVC1's configuration and mounts a reachable backend behind it. Accessing 192.168.1.200:443 from inside the cluster fails, however, because once the IP is bound to the dummy interface, traffic to it never leaves the node to reach the real CLB1, and since there is no IPVS rule for port 443, the connection is refused directly.
If, in this situation, a node has no IPVS rule for the port (IPVS rules take precedence over local listeners) but the access still succeeds, check whether a local service is listening on 0.0.0.0:443. If so, any local interface IP plus port 443 is reachable, but what answers is that local service, not the real CLB backend.
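As a rough mental model, the decision on a node can be sketched like this (a simplified illustration, not kube-proxy's actual code; the function and rule names are ours):

```python
# Simplified model of how a node handles traffic to a CLB IP that is bound
# to the kube-ipvs0 dummy interface. Illustrative only.

def resolve(dst_ip, dst_port, ipvs_rules, local_listen_ports):
    """ipvs_rules maps (ip, port) -> list of backend 'ip:port' strings;
    local_listen_ports is the set of ports a local process binds on 0.0.0.0."""
    # The CLB IP is local (on kube-ipvs0), so packets never leave the node
    # to reach the real CLB; only these local paths remain.
    if (dst_ip, dst_port) in ipvs_rules:
        # An IPVS virtual service matches first and DNATs to a backend pod.
        return ("ipvs", ipvs_rules[(dst_ip, dst_port)])
    if dst_port in local_listen_ports:
        # No IPVS rule, but a local 0.0.0.0 listener answers instead --
        # the reply comes from that local service, not the real CLB backend.
        return ("local-socket", None)
    # No IPVS rule and no listener: connection refused.
    return ("refused", None)

rules = {("192.168.1.200", 80): ["192.168.1.101:80"]}  # SVC1 defines only :80
print(resolve("192.168.1.200", 80, rules, set()))    # -> ('ipvs', ['192.168.1.101:80'])
print(resolve("192.168.1.200", 443, rules, set()))   # -> ('refused', None)
print(resolve("192.168.1.200", 443, rules, {443}))   # -> ('local-socket', None)
```

The last call shows the misleading case from the previous paragraph: port 443 "works", but only because a local process answered.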
Is there a way to solve it?
The most recommended way
The best approach is to split the traffic: use two separate CLBs, one for services accessed from inside the cluster and one for those accessed from outside.
The Alibaba Cloud Service annotation method
SVC1 uses the annotation service.beta.kubernetes.io/alibaba-cloud-loadbalancer-hostname as a placeholder, so that the CLB's IP is not bound to the kube-ipvs0 interface. Access to the CLB IP from inside the cluster then actually leaves the cluster and reaches the CLB. Note, however, that if the listener protocol is TCP or UDP, accessing the CLB IP from inside the cluster runs into the loopback access problem; for details, see Client cannot access load balancing CLB [1].
This annotation requires CCM v2.3.0 or later. For details, refer to Configure traditional load balancing CLB through Annotation [2].
demo:
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-hostname: "${your_service_hostname}"
  name: nginx-svc
  namespace: default
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer
Does ExternalTrafficPolicy affect access from inside the cluster?
We all know that the NodePort and LoadBalancer types of K8s Services can adjust the external traffic policy. So how do we explain the statement in the figure that "with an external policy of Local/Cluster, the IPVS rules created on the cluster nodes differ", and what happens when a nodePort or CLB IP is accessed from inside the cluster?
The cases below all assume the Service's internalTrafficPolicy is Cluster or left at the default. The ServiceInternalTrafficPolicy feature is enabled by default since K8s 1.22; for details, refer to service-traffic-policy [3].
For details on the data links of Alibaba Cloud containers under different network CNI conditions, you can refer to the following articles:
- Panoramic Analysis of Alibaba Cloud Container Network Data Links (1) - Flannel
- Panoramic Analysis of Alibaba Cloud Container Network Data Links (2) - Terway ENI
- Panoramic Analysis of Alibaba Cloud Container Network Data Links (3) - Terway ENIIP
- Panoramic Analysis of Alibaba Cloud Container Network Data Links (4) - Terway IPVLAN+EBPF
- Panoramic Analysis of Alibaba Cloud Container Network Data Links (5) - Terway ENI-Trunking
- Panoramic Analysis of Alibaba Cloud Container Network Data Links (6) - ASM Istio
Here we only discuss the behavior change of the IPVS mode Local traffic policy in Kubernetes from 1.22 to 1.24.
Changes in Kubernetes 1.24 IPVS
The following takes kube-proxy's IPVS mode as an example:
- When externalTrafficPolicy is Cluster or left at the default, the IPVS rule for the nodePort/CLB IP mounts all endpoint IPs as backends. In-cluster access then loses the client source IP, because the node performs a layer of SNAT.
- When externalTrafficPolicy is Local:
  - If the node has an endpoint for the service, the IPVS rule for the nodePort/CLB IP mounts only that node's endpoint IPs, and in-cluster access preserves the client source IP.
  - If the node has no endpoint for the service:
    - Before 1.24, the backend list is empty, and in-cluster access is refused.
    - From 1.24 on, the IPVS rule for the nodePort/CLB IP mounts all endpoint IPs, and in-cluster access loses the client source IP because the node performs a layer of SNAT. The community adjusted the backend mounting strategy of the Local policy; for details, refer to the community PR [4].
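The branching above can be summarized in a small model (a simplified sketch, not kube-proxy's real implementation; the function and parameter names are ours):

```python
# Sketch of which backends kube-proxy (IPVS mode) mounts for a nodePort/CLB IP
# rule, reflecting the Local-policy change introduced around Kubernetes 1.24.
# Illustrative only.

def ipvs_backends(external_policy, node_endpoints, all_endpoints, k8s_minor):
    if external_policy == "Cluster":
        # All endpoints are mounted; the node SNATs, so the source IP is lost.
        return all_endpoints
    # external_policy == "Local"
    if node_endpoints:
        # Only this node's endpoints are mounted; the source IP is preserved.
        return node_endpoints
    if k8s_minor >= 24:
        # Since 1.24: a node with no local endpoint still mounts all endpoints
        # (and SNATs for in-cluster access).
        return all_endpoints
    # Before 1.24: empty backend list, so in-cluster access is refused.
    return []

all_eps = ["10.0.2.78:80", "10.0.0.172:80"]
print(ipvs_backends("Local", [], all_eps, 23))  # -> [] (access refused)
print(ipvs_backends("Local", [], all_eps, 24))  # -> all endpoints (SNATed)
```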
Access SLB outside the cluster
When the SLB is accessed from outside the cluster, the CCM mounts only the nodes that actually host endpoints for a Local-type service. The situation is the same as in Kubernetes before 1.24 and is not elaborated here; refer to the link above.
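The CCM side can be sketched the same way (a simplified assumption-based model, not actual CCM code; the function name is ours, and the Cluster-mode branch reflects the usual cloud-provider behavior of registering all nodes):

```python
# Sketch: which nodes the cloud controller manager registers as CLB/SLB
# backends, depending on externalTrafficPolicy. Hypothetical helper.

def clb_backend_nodes(external_policy, nodes_with_endpoints, all_nodes):
    if external_policy == "Local":
        # Only nodes actually hosting an endpoint are registered, so external
        # traffic lands on a node that can serve it locally (source IP kept).
        return nodes_with_endpoints
    # Cluster mode: every node is registered; kube-proxy forwards (with SNAT).
    return all_nodes

nodes = ["10.0.4.174", "10.0.2.84", "10.0.0.140"]
print(clb_backend_nodes("Local", ["10.0.4.174", "10.0.2.84"], nodes))
```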
Access NodePort outside the cluster
Kubernetes versions before 1.24
- Accessing the NodePort of a node that has an endpoint works, and the source IP is preserved.
Nginx pods are distributed on the cn-hongkong.10.0.4.174 and cn-hongkong.10.0.2.84 nodes.
From the external node 10.0.3.72, port 30479 on cn-hongkong.10.0.2.84, a node hosting a backend pod, is reachable.
The cn-hongkong.10.0.2.84 node has the related IPVS rules, but only with the backend pod IPs of that node.
The conntrack table confirms this: on the cn-hongkong.10.0.2.84 node the connection is DNATed, and pod nginx-7d6877d777-tzbf7 (10.0.2.87) on the cn-hongkong.10.0.2.84 node sends the reply. All the relevant address translations happen on this one node, so the TCP layer-4 connection is established successfully.
- Accessing the NodePort of a node without an endpoint fails, because that node has no relevant IPVS forwarding rule.
From the external node 10.0.3.72, port 30479 on cn-hongkong.10.0.0.140, a node where no backend pod is located, cannot be reached.
Inspecting the cn-hongkong.10.0.0.140 node shows no relevant IPVS forwarding rule, so DNAT cannot be performed and the access fails.
Kubernetes 1.24 and later
Accessing the NodePort of a node that has an endpoint works, and the source IP is preserved.
Accessing the NodePort of a node without an endpoint:
- terway ENIIP, or host network: unreachable
Nginx pods are distributed on the cn-hongkong.10.0.2.77 and cn-hongkong.10.0.0.171 nodes.
From the external node 10.0.3.72, accessing port 30745 on cn-hongkong.10.0.5.168, a node without a backend pod, fails.
The cn-hongkong.10.0.5.168 node does have the related IPVS rules, and all backend pod IPs are added to them.
The conntrack table shows why: on the cn-hongkong.10.0.5.168 node the connection is DNATed, and pod nginx-79fc6bc6d-8vctc (10.0.2.78) on the cn-hongkong.10.0.2.77 node sends the reply directly. When the client receives this reply, it matches none of its own five-tuples and is discarded, so the three-way handshake fails and the connection cannot be established.
- flannel network: reachable, but the source IP cannot be preserved
Nginx is deployed on cn-hongkong.10.0.2.86.
Accessing port 31218 on cn-hongkong.10.0.4.176 from outside succeeds.
cn-hongkong.10.0.4.176 records the source as 10.0.3.72, DNATs the destination to 172.16.160.135, and expects the reply to come back to port 58825 of 10.0.4.176.
On cn-hongkong.10.0.2.86, the node hosting the backend endpoint, the conntrack table records the source as 10.0.4.176 with source port 58825. The application pod therefore records 10.0.4.176 as the source IP, and the real source IP is lost.
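The two outcomes above (terway ENIIP fails the handshake, flannel succeeds but loses the source IP) come down to whether the NodePort node SNATs. A toy model, with illustrative names and the addresses from the examples:

```python
# Toy model of the NodePort datapath on a node without a local endpoint.
# With SNAT (flannel), replies flow back through the NodePort node; without
# SNAT (terway ENIIP), the pod replies from its own IP and the client's
# five-tuple check fails. Illustrative only, not real kernel behavior.

def nodeport_datapath(client, node, nodeport, pod, do_snat):
    expected_peer = (node, nodeport)  # what the client's socket expects
    if do_snat:
        # flannel: source rewritten to the node, so the reply returns via the
        # node, which reverses the DNAT -- but the pod saw the node as client.
        reply_peer, pod_sees = expected_peer, node
    else:
        # terway ENIIP: the pod replies directly from its own address.
        reply_peer, pod_sees = (pod, 80), client
    return {
        "handshake_ok": reply_peer == expected_peer,
        "pod_sees_client_as": pod_sees,
    }

print(nodeport_datapath("10.0.3.72", "10.0.5.168", 30745, "10.0.2.78", False))
# terway ENIIP case: handshake fails
print(nodeport_datapath("10.0.3.72", "10.0.4.176", 31218, "172.16.160.135", True))
# flannel case: handshake succeeds, but the pod logs the node IP as the client
```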
Access SLB or NodePort in the cluster
Kubernetes versions before 1.24
- On a node with an endpoint, access works and the source IP is preserved.
Nginx pods are distributed on the ap-southeast-1.192.168.100.209 and ap-southeast-1.192.168.100.208 nodes; the ap-southeast-1.192.168.100.210 node has no Nginx pod.
From any node in the cluster (node 209 in this example), accessing NodePort 31565 on ap-southeast-1.192.168.100.209, a node hosting a backend pod, works.
Accessing port 80 of SLB 8.222.252.252 from ap-southeast-1.192.168.100.209, a node hosting a backend pod, also works.
The ap-southeast-1.192.168.100.209 node has IPVS rules for both the NodePort and the SLB, but only with the backend pod IPs of that node.
The conntrack table confirms this: on the ap-southeast-1.192.168.100.209 node the connection is DNATed, and pod nginx-7d6877d777-2wh4s (192.168.100.222) on the same node sends the reply. All relevant address translations happen on this node, so the TCP layer-4 connection is established successfully.
- On a node without an endpoint, access fails, because that node has no relevant IPVS forwarding rule.
From any node in the cluster (node 210 in this example), accessing NodePort 31565 or the SLB via ap-southeast-1.192.168.100.210, a node without a backend pod, fails.
This further confirms that in-cluster access to the SLB associated with the Service never leaves the node: even though the SLB has other listeners, access to the SLB's other ports is refused as well.
Inspecting the ap-southeast-1.192.168.100.210 node shows no relevant IPVS forwarding rule, so DNAT cannot be performed and the access fails.
Kubernetes 1.24 and later
- On a node with an endpoint, access works and the source IP is preserved.
This is the same as the pre-1.24 in-cluster access described above; refer to that description.
- On a node without an endpoint:
Nginx pods are distributed on the cn-hongkong.10.0.2.77 and cn-hongkong.10.0.0.171 nodes, so the tests are done on cn-hongkong.10.0.4.141, a node without Nginx.
There are the following situations:
- terway, or the backend is hostNetwork:
  - A node accesses the NodePort (the source IP is the ECS IP, so no SNAT is needed); the source IP cannot be preserved.
All of Nginx's backend pods, nginx-79fc6bc6d-8vctc (10.0.2.78) and nginx-79fc6bc6d-j587w (10.0.0.172), are added to the IPVS rule for NodePort 10.0.4.141:30745 on the node without an endpoint.
A node in the cluster then accesses NodePort 30745/TCP on cn-hongkong.10.0.4.141, a node without a backend pod.
The conntrack table shows that on the cn-hongkong.10.0.4.141 node the connection is DNATed, with backend Nginx pod nginx-79fc6bc6d-8vctc (10.0.2.78) expected to send the reply.
However, the conntrack table on cn-hongkong.10.0.2.77, the node hosting nginx-79fc6bc6d-8vctc (10.0.2.78), records 10.0.4.141 accessing 10.0.2.78 and expects 10.0.2.78 to reply directly to port 39530 of 10.0.4.141.
A node in the cluster that has an endpoint accessing NodePort 32292 on ap-southeast-1.192.168.100.131, a node without a backend pod, also fails. This is consistent with the external access behavior of Kubernetes 1.24 and later described above; refer to that description.
  - A node cannot access the SLB IP (the source IP becomes the SLB IP, and nothing performs SNAT).
All of Nginx's backend pods, nginx-79fc6bc6d-8vctc (10.0.2.78) and nginx-79fc6bc6d-j587w (10.0.0.172), are added to the IPVS rule for the SLB IP on the node without an endpoint.
Accessing SLB 47.243.247.219 from a node without an endpoint does time out.
The conntrack table for this access on the node without an endpoint shows that the backend pod is expected to reply to the SLB IP. But the SLB IP is already bound to the kube-ipvs0 dummy interface on the node, and no SNAT is performed, so the reply cannot find its way back and the access fails.
- flannel, with an ordinary pod as backend: reachable, but the source IP cannot be preserved
Nginx is deployed on cn-hongkong.10.0.2.86.
Accessing SLB 47.242.86.39 from cn-hongkong.10.0.4.176 succeeds.
The conntrack table on the cn-hongkong.10.0.4.176 node shows both src and dst as 47.242.86.39, but the expected reply is from nginx pod 172.16.160.135 to port 54988 of 10.0.4.176; 47.242.86.39 is SNATed to 10.0.4.176.
On cn-hongkong.10.0.2.86, the node hosting the backend endpoint, the conntrack table records the source as 10.0.4.176 with source port 54988. The application pod therefore records 10.0.4.176 as the source IP, and the real source IP is lost.
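Looking back at the terway SLB timeout above, the failure comes down to source-address selection: the node picks the locally bound SLB IP as its source, and nothing rewrites it. A toy model (illustrative names, simplified):

```python
# Sketch of the broken return path: the node uses the SLB IP as its source
# address (the IP is locally bound on kube-ipvs0) and no SNAT rewrites it,
# so the backend pod replies toward the SLB IP instead of the node.

def reply_returns_to_origin(packet_src, origin_node_ip):
    # The pod simply replies to the source address it saw; the reply only
    # comes home if that source was the originating node's routable address.
    return packet_src == origin_node_ip

# No SNAT: the source stays the SLB IP, so the reply never reaches the node.
print(reply_returns_to_origin("47.243.247.219", "10.0.4.141"))  # False -> timeout
# If the source were SNATed to the node IP, the reply would come back.
print(reply_returns_to_origin("10.0.4.141", "10.0.4.141"))      # True
```

This is also why the flannel case above works: its SNAT step rewrites the source to a node address, at the cost of losing the client IP.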
Related Links:
[1] The client cannot access the load balancing CLB
https://help.aliyun.com/document_detail/55206.htm
[2] Configure traditional load balancing CLB through Annotation
[3] service-traffic-policy
https://kubernetes.io/zh-cn/docs/concepts/services-networking/service-traffic-policy/
[4] Community PR
https://github.com/kubernetes/kubernetes/pull/97081/commits/61085a75899a820b5eebfa71801e17423c1ca4da