Changing the Kubernetes Service working mode from iptables to IPVS

Using IPVS brings clear performance advantages in large clusters, but it has prerequisites; if they are not met, kube-proxy automatically falls back to iptables mode. The environment here starts out on CentOS 7.9 with kernel 3.10.0, and the kernel is upgraded to 4.9 so that IPVS is fully supported.

1. Overview of SVC

1.1 Overview of Proxy Mode

In a Kubernetes cluster, every node runs a kube-proxy process. kube-proxy is responsible for implementing a form of VIP (virtual IP) for Services of any type other than ExternalName.

In Kubernetes v1.0, the proxy ran entirely in userspace. Kubernetes v1.1 added the iptables proxy, but it was not yet the default mode of operation. Since Kubernetes v1.2, the iptables proxy has been the default. The ipvs proxy was added in Kubernetes v1.8 and graduated to GA in Kubernetes v1.11.

(Figure: Kubernetes Service discovery - proxy modes)

1.2 Advantages of IPVS

Why choose IPVS for Kubernetes?

As the use of Kubernetes grows, the scalability of its resources becomes increasingly important. In particular, the scalability of services is critical to the adoption of Kubernetes by developers/companies running large workloads.

Kube-proxy is the building block of service routing; it relies on the battle-hardened iptables to implement core Service types such as ClusterIP and NodePort. However, iptables struggles to scale to thousands of Services, because it was designed purely for firewalling and works on sequentially evaluated kernel rule lists.

Although Kubernetes already supports 5000 nodes as of v1.6, kube-proxy with iptables is actually a bottleneck for scaling a cluster to 5000 nodes. For example, with NodePort Services in a 5000-node cluster, if we have 2000 Services and each Service has 10 pods, that produces at least 20000 iptables records on every worker node, which can keep the kernel very busy.

On the other hand, using IPVS-based intra-cluster service load balancing can help a lot in this case. IPVS is designed for load balancing and uses a more efficient data structure (hash table) that allows for almost unlimited scaling.
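To get a rough feel for this on a single node, the two counts below can be compared once IPVS is in place (illustrative only; ipvsadm is installed later in section 3.1):

# iptables-save | wc -l            # number of iptables rules currently programmed on this node
# ipvsadm -Ln | wc -l              # number of IPVS virtual/real server entries after the switch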

IPVS also supports multiple scheduling algorithms:

Type  Description
rr    round robin
lc    least connections
dh    destination hashing
sh    source hashing
sed   shortest expected delay
nq    never queue
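A minimal sketch of how the scheduler could be chosen, assuming kube-proxy is configured through the kube-proxy ConfigMap as in section 3.2 (the ipvs.scheduler field of KubeProxyConfiguration; an empty value means rr):

# kubectl edit configmap kube-proxy -n kube-system
...
    ipvs:
      scheduler: "sed"                  # pick one of the algorithms above; "" defaults to rr
...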

2. Upgrade the kernel

If the kernel version is already greater than 4.1, this step can be skipped, because such kernels fully support IPVS. Taking my current kernel version as the example, switching the Service mode from iptables to ipvs can cause the following problems:

  • ipvs points to the wrong back-end endpoints; it only becomes correct after the svc is deleted and recreated
  • when the number of replicas changes, ipvs does not pick up the change immediately
  • accessing a svc directly by its domain name fails to resolve
# System information of the starting environment
# uname -r 
3.10.0-1160.41.1.el7.x86_64
# cat /etc/redhat-release 
CentOS Linux release 7.9.2009 (Core)
# After switching to ipvs on the 3.10 kernel, kube-proxy reports errors like the following

# kubectl -n kube-system logs kube-proxy-6pp2d
E0415 09:46:55.397466       1 proxier.go:1192] Failed to sync endpoint for service: 172.16.0.82:30080/TCP, err: parseIP Error ip=[172 16 100 7 0 0 0 0 0 0 0 0 0 0 0 0]
E0415 09:46:55.398639       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[172 16 100 7 0 0 0 0 0 0 0 0 0 0 0 0]
E0415 09:46:55.398658       1 proxier.go:1533] Failed to sync endpoint for service: 10.10.1.35:30080/TCP, err: parseIP Error ip=[172 16 100 7 0 0 0 0 0 0 0 0 0 0 0 0]
E0415 09:46:55.398751       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[172 16 100 7 0 0 0 0 0 0 0 0 0 0 0 0]
...

2.1 Kernel package download

1) Download from Alibaba Cloud

Alibaba Cloud Mirror Warehouse
kernel-4.9

2.2 Kernel package installation

# Upgrading the kernel puts a heavy CPU load on the system, so it is recommended to upgrade the nodes one at a time

# yum install kernel-4.9.215-36.el7.x86_64.rpm -y
# init 6                              # reboot so the new kernel takes effect

# uname -r
4.9.215-36.el7.x86_64
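After each node has been upgraded and rebooted, the kernel version of all nodes can also be confirmed from the control plane (see the KERNEL-VERSION column):

# kubectl get nodes -o wide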

3. Change mode

3.1 Enable ipvs

Check whether the kernel has already loaded the ipvs modules; if not, load them and install the user-space tools.

# lsmod | grep -e ip_vs
ip_vs_sh               16384  0 
ip_vs_wrr              16384  0 
ip_vs_rr               16384  9 
ip_vs                 147456  15 ip_vs_wrr,ip_vs_rr,ip_vs_sh
nf_conntrack          106496  8 ip_vs,nf_conntrack_proto_sctp,nf_conntrack_ipv4,nf_conntrack_netlink,nf_nat_masquerade_ipv4,xt_conntrack,nf_nat_ipv4,nf_nat
libcrc32c              16384  2 ip_vs,xfs

# lsmod |grep nf_conntrack_ipv4
nf_conntrack_ipv4      16384  8 
nf_defrag_ipv4         16384  1 nf_conntrack_ipv4
nf_conntrack          106496  8 ip_vs,nf_conntrack_proto_sctp,nf_conntrack_ipv4,nf_conntrack_netlink,nf_nat_masquerade_ipv4,xt_conntrack,nf_nat_ipv4,nf_na

# cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF

# chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4

# yum -y install ipset ipvsadm
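Note: on kernels 4.19 and later, nf_conntrack_ipv4 has been merged into nf_conntrack, so the last modprobe above would fail there. A variant of the module file for such kernels (an assumption about a newer target kernel; not needed for the 4.9 kernel used here):

# cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack          # replaces nf_conntrack_ipv4 on kernels >= 4.19
EOF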

3.2 Change the svc mode

# kubectl edit configmap kube-proxy -n kube-system
...
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: "ipvs"                        # change "" to ipvs
    nodePortAddresses: null
    oomScoreAdj: null
...

configmap/kube-proxy edited
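
If an interactive edit is not desirable, the same change can be scripted (a sketch; it simply rewrites the empty mode field of the ConfigMap and re-applies it):

# kubectl -n kube-system get configmap kube-proxy -o yaml | sed 's/mode: ""/mode: "ipvs"/' | kubectl apply -f -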

--
# Delete the existing kube-proxy pods; only after they are recreated (effectively a restart) does the mode change to ipvs
# kubectl get pods -n kube-system|grep kube-proxy
kube-proxy-6pp2d                          1/1     Running   1          8h
kube-proxy-mcfp6                          1/1     Running   1          8h
kube-proxy-qmdhp                          1/1     Running   1          8h

# kubectl delete pod/kube-proxy-6pp2d -n kube-system
pod "kube-proxy-6pp2d" deleted
# kubectl delete pod/kube-proxy-mcfp6 -n kube-system
pod "kube-proxy-mcfp6" deleted
# kubectl delete pod/kube-proxy-qmdhp -n kube-system
pod "kube-proxy-qmdhp" deleted
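Deleting the pods one by one as above works; alternatively, all kube-proxy pods can be restarted in one command, assuming a kubeadm-style deployment where the DaemonSet labels its pods with k8s-app=kube-proxy:

# kubectl -n kube-system delete pods -l k8s-app=kube-proxy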

3.3 Verification

# kubectl -n kube-system logs kube-proxy-qmdhp|grep ipvs
I0415 10:48:49.984477       1 server_others.go:259] Using ipvs Proxier.

# Seeing the message above in the kube-proxy pod log means ipvs is enabled
# The test setup is a deployment with 2 nginx replicas and one svc pointing at the nginx pods

# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  127.0.0.1:30080 rr
  -> 172.16.100.8:80              Masq    1      0          0         
  -> 172.16.100.197:80            Masq    1      0          0         
TCP  172.16.0.1:443 rr
  -> 10.10.1.35:6443              Masq    1      4          0         
TCP  172.16.0.10:53 rr
  -> 172.16.100.9:53              Masq    1      0          0         
  -> 172.16.100.10:53             Masq    1      0          0         
TCP  172.16.0.10:9153 rr
  -> 172.16.100.9:9153            Masq    1      0          0         
  -> 172.16.100.10:9153           Masq    1      0          0               
...
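Another quick check, assuming the kube-proxy metrics endpoint is reachable on the node (it listens on 127.0.0.1:10249 by default):

# curl 127.0.0.1:10249/proxyMode        # should print: ipvs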

Note: ipvs support is not yet fully stable, so use it with caution. Also, the --masquerade-all option is incompatible with Calico security policy control (Calico requires that this option not be enabled when network policy restrictions are in place), so decide whether to use it as appropriate.
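To check whether masquerade-all is enabled in the current kube-proxy configuration, grepping the ConfigMap is one option (a sketch; the field appears as masqueradeAll under the iptables section of KubeProxyConfiguration):

# kubectl -n kube-system get configmap kube-proxy -o yaml | grep masqueradeAll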

Reference:

https://blog.csdn.net/weixin_43936969/article/details/106175580
https://blog.csdn.net/qq_25854057/article/details/122469765
https://kubernetes.io/zh/blog/2018/07/09/ipvs-based-in-cluster-load-balancing-deep-dive/
