How Kubernetes Service load balancing is implemented

 Wang Xigang  360 Cloud Computing


A Kubernetes Service is an abstraction over a set of Pods that share the same label (it can be loosely thought of as the load balancer inside the cluster). Clients inside and outside the cluster can reach those Pods through the Service. However, there are several Service types. Which scenarios each type is suited for, and how kube-proxy implements Service load balancing, are the focus of this article.


1. How Service and kube-proxy work in a Kubernetes cluster

Before introducing Service and kube-proxy, let's talk about their role in a Kubernetes cluster.

[Figure: Service, kube-proxy, and backend Pods in a Kubernetes cluster]

Let us analyze the above picture:

1. The kube-proxy instance running on each node watches Service and Endpoints objects in real time.

When a user creates a Service with a label selector in the Kubernetes cluster, an Endpoints object with the same name is created at the same time to store the IPs of the Pods selected by that Service. Their relationship is shown in the following figure:

[Figure: a Service and its same-named Endpoints object]
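As a rough illustration (the Service name nginx-service and all of the IPs below are assumptions, not taken from this article), the Service and its same-named Endpoints object can be inspected with kubectl:

```
# List the Service and the Endpoints object that shares its name
# (nginx-service and the addresses are made up for illustration).
kubectl get service nginx-service
# NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
# nginx-service   ClusterIP   10.96.120.15   <none>        80/TCP    1m

kubectl get endpoints nginx-service
# NAME            ENDPOINTS                     AGE
# nginx-service   10.244.1.5:80,10.244.2.7:80   1m
```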

2. When a kube-proxy running on a node detects a change to Services or Endpoints, it updates the corresponding iptables or IPVS rules on that node, so that clients can later reach the backends of the Service through the Service's ClusterIP.

3. Once kube-proxy has installed the required rules, a client on a node or in a Pod inside the cluster can access the ClusterIP; the packets are routed and forwarded by the iptables or IPVS rules and finally reach a real backend Pod.

How kube-proxy sets up the iptables and IPVS rules is covered later in this article. First, let's look at the usage scenarios of the different Service types.

2. Service types

Kubernetes currently supports the following Service types. As each type is introduced, its typical usage scenario is described.

ClusterIP

A ClusterIP Service is the default Service type in a Kubernetes cluster. It can only be used for communication inside the cluster, not for access from outside.

The structure of the ClusterIP Service type is shown in the figure below:


[Figure: ClusterIP Service]
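A minimal ClusterIP Service might look like the following sketch (the nginx selector and the ports are assumptions for illustration):

```
# Minimal ClusterIP Service; name, selector and ports are illustrative.
kind: Service
apiVersion: v1
metadata:
  name: nginx-service
spec:
  type: ClusterIP        # can be omitted, ClusterIP is the default
  selector:
    app: nginx
  ports:
  - name: http
    protocol: TCP
    port: 80             # port exposed on the ClusterIP
    targetPort: 80       # port on the backend Pods
```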

NodePort

If you need to access services inside the cluster from outside the cluster, you can use this type of Service. A NodePort Service opens a specific port on every node in the cluster; traffic sent to that port on any node is forwarded to the Service and from there to the real backend.

The structure of the NodePort Service type is shown in the figure below:


[Figure: NodePort Service]
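A NodePort Service differs mainly in the type and the node port. A sketch (the nodePort value 30080 and the selector are assumptions; if nodePort is omitted, Kubernetes picks one from the node-port range):

```
# Illustrative NodePort Service.
kind: Service
apiVersion: v1
metadata:
  name: nginx-nodeport
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 30080      # opened on every node, reachable as <NodeIP>:30080
```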

LoadBalancer

A LoadBalancer Service is usually used together with a cloud provider's load balancer to expose services inside the cluster to the external network. The cloud provider's load balancer assigns an IP address to the user, and traffic sent to that IP is forwarded to your Service.

The structure of the LoadBalancer Service type is shown in the following figure:

[Figure: LoadBalancer Service]

Ingress

Ingress is not actually a type of Service, but it can sit in front of multiple Services and act as the entry point for the cluster's internal services.
Ingress can do many different things, for example routing requests to different Services based on the request path or host.

The structure of Ingress is shown in the figure below:

[Figure: Ingress routing to multiple Services]
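A sketch of an Ingress that routes requests to two different Services by path (the host name and the backend Service names are assumptions; an ingress controller must be running in the cluster for this to take effect):

```
# Illustrative Ingress; host and backend Service names are made up.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /foo
        pathType: Prefix
        backend:
          service:
            name: foo-service
            port:
              number: 80
      - path: /bar
        pathType: Prefix
        backend:
          service:
            name: bar-service
            port:
              number: 80
```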

3. Service discovery

Service currently supports two service discovery mechanisms: environment variables and DNS. Of the two, DNS is the recommended one.

Environment variables

When a Pod is created, kubelet injects environment variables for every Service that already exists in the cluster into the Pod. Note, however, that Pods created before a Service exists will not have that Service's environment variables. For this reason, in normal use it is recommended to do service discovery between services through DNS.
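For illustration, assuming a Service named nginx-service with ClusterIP 10.96.120.15 already exists (both are made up), a Pod created afterwards would see environment variables along these lines:

```
# Inside a Pod created after the Service (values are illustrative):
env | grep NGINX_SERVICE
# NGINX_SERVICE_SERVICE_HOST=10.96.120.15
# NGINX_SERVICE_SERVICE_PORT=80
# NGINX_SERVICE_PORT=tcp://10.96.120.15:80
# NGINX_SERVICE_PORT_80_TCP_ADDR=10.96.120.15
# NGINX_SERVICE_PORT_80_TCP_PORT=80
```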

DNS

A CoreDNS service can be deployed in the cluster (older Kubernetes clusters used kube-dns), so that Pods inside the cluster can reach the various services through DNS names.

Current Kubernetes clusters use CoreDNS as the default DNS service. The main reasons are that CoreDNS is simple, is flexible to extend through plugins, and is not completely tied to Kubernetes.
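As a quick illustration (the service name, namespace and resolved addresses are assumptions), any Pod can resolve a Service by its cluster DNS name:

```
# Resolve a Service from inside a Pod (names and IPs are illustrative).
nslookup nginx-service.default.svc.cluster.local
# Server:    10.96.0.10
# Address:   10.96.0.10#53
# Name:      nginx-service.default.svc.cluster.local
# Address:   10.96.120.15
```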

4. Service load balancing

At the beginning of this article I introduced how Service and kube-proxy cooperate in a cluster to achieve service load balancing, and kube-proxy plays the key role. As a controller, kube-proxy acts as the hub between Kubernetes and Netfilter in the Linux kernel: it watches for changes to the cluster's Service and Endpoints objects and, depending on the mode it runs in (iptables or IPVS), programs the corresponding kernel rules to implement routing and forwarding. Next, we introduce how kube-proxy implements Service load balancing in the iptables mode and in the IPVS mode.

Iptables for load balancing

iptables is a user-space program that configures Netfilter rules to build a firewall in the Linux kernel. Netfilter is the Linux kernel's packet-handling framework; it provides a complete set of hooks that make packet filtering, network address translation (NAT) and protocol-based connection tracking possible. The position of Netfilter in the kernel is shown below.


[Figure: the position of Netfilter in the Linux kernel network stack]

Next, I will introduce how kube-proxy uses Iptables for load balancing. The matching process of data packets in Iptables is shown in the following figure:


[Figure: packet traversal through the iptables tables and chains]

In iptables mode, kube-proxy creates a series of custom chains hanging off the PREROUTING and POSTROUTING chains of the NAT table on each node. The main custom chains are the KUBE-SERVICES chain, the KUBE-POSTROUTING chain, and, for each Service, a KUBE-SVC-XXXXXX chain and KUBE-SEP-XXXXXX chains. Packets flowing through the node are then DNATed and SNATed by these custom chains to implement routing, load balancing and address translation, as shown in the following figure:


[Figure: DNAT and SNAT through the custom KUBE-* chains]

In kube-proxy's iptables mode, a client request packet is matched against the rules in the following order:

1. PREROUTING chain or OUTPUT chain: a Pod inside the cluster that accesses the Service through its ClusterIP goes through the OUTPUT chain, while a host outside the cluster that accesses the Service through a NodePort goes through the PREROUTING chain. Both chains jump to the KUBE-SERVICES chain.


2. KUBE-SERVICES chain: every port exposed by every Service corresponds to a rule in the KUBE-SERVICES chain. When the number of Services reaches a certain scale, this chain becomes very long, and iptables matching is a linear search, so lookups take a long time; the time complexity is O(n).

3. KUBE-SVC-XXXXXX chain: the hash suffix is generated from the Service's virtual IP. One of the rules in this chain is matched with a certain probability; the statistic module assigns a weight to each backend, which is what achieves the load balancing. Each KUBE-SEP-XXXXXX chain represents one concrete Pod behind the Service (its hash suffix is generated from the backend Pod's actual IP).

4. KUBE-SEP-XXXXXX chain: DNAT is performed here, rewriting the destination IP of the packet to the IP of the backend Pod.

5. POSTROUTING chain: the packet jumps to the KUBE-POSTROUTING chain.

6. KUBE-POSTROUTING chain: SNAT (MASQUERADE) is performed on packets that were marked earlier. A sketch of what these rules look like is given after this list.
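To make the traversal above concrete, here is a hedged sketch of what the relevant NAT-table rules might look like for a ClusterIP Service with two backends (the chain-name hashes, the ClusterIP 10.96.120.15 and the Pod IPs are made up; the real rules are generated by kube-proxy and carry additional comments and marks):

```
# Excerpt in iptables-save format; all addresses and hashes are illustrative.
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING

# One rule per exposed Service port; matching is a linear scan of this chain.
-A KUBE-SERVICES -d 10.96.120.15/32 -p tcp -m tcp --dport 80 -j KUBE-SVC-XXXXXXXXXXXXXXXX

# The statistic module spreads traffic across the endpoint chains.
-A KUBE-SVC-XXXXXXXXXXXXXXXX -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-AAAAAAAAAAAAAAAA
-A KUBE-SVC-XXXXXXXXXXXXXXXX -j KUBE-SEP-BBBBBBBBBBBBBBBB

# Each endpoint chain DNATs the packet to one backend Pod.
-A KUBE-SEP-AAAAAAAAAAAAAAAA -p tcp -m tcp -j DNAT --to-destination 10.244.1.5:80
-A KUBE-SEP-BBBBBBBBBBBBBBBB -p tcp -m tcp -j DNAT --to-destination 10.244.2.7:80

# Marked packets are SNATed (masqueraded) on the way out.
-A KUBE-POSTROUTING -m mark --mark 0x4000/0x4000 -j MASQUERADE
```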

With the rules above in place, load balancing based on iptables is achieved. However, iptables has some problems when used for load balancing:

  • Linear rule-matching latency: the KUBE-SERVICES chain is followed by long KUBE-SVC-* chains; to reach a given service, the rules must be traversed one by one until a match is found, so the time complexity is O(n).

  • Rule-update latency: updates are non-incremental. The current iptables state must first be dumped with iptables-save, some rules are then modified, and the whole set is written back to the kernel with iptables-restore. When the number of rules is large, this process becomes very slow.

  • Scalability: when there are a large number of iptables rule chains in the system, adding or deleting rules can hit the kernel lock, and the only option is to wait.

  • Availability: when a service is scaled out or in, refreshing the iptables rules causes connections to be dropped and the service to be briefly unavailable.

To solve these problems with iptables, engineers from Huawei's open-source team contributed the IPVS mode to the community. Next, I will introduce how IPVS achieves load balancing.

IPVS for load balancing

IPVS is part of the LVS project. It is a layer-4 load balancer that runs in the Linux kernel and has exceptional performance: with a tuned kernel it can easily handle more than 100,000 forwarding requests per second.

IPVS has the following characteristics:

  • The load-balancing implementation of LVS, working at the transport layer (L4).

  • Like iptables, it is based on Netfilter, but it uses hash tables for lookups.

  • Supports the TCP, UDP and SCTP protocols, and both IPv4 and IPv6.

  • Supports multiple load-balancing algorithms:

    • rr: round-robin

    • lc: least connection

    • dh: destination hashing

    • sh: source hashing

    • sed: shortest expected delay

    • nq: never queue

  • Supports session persistence (session affinity).

The working principle of LVS is shown in the figure below:


[Figure: how LVS/IPVS processes a packet in the kernel]

1. When a client request reaches the kernel space of the load balancer, it first hits the PREROUTING chain.
2. When the kernel finds that the destination address of the packet is the local machine, it sends the packet to the INPUT chain.
3. When the packet reaches the INPUT chain, it is first checked by IPVS. If the destination address and port of the packet are not in the IPVS rules, the packet is passed on to user space.
4. If the destination address and port are in the IPVS rules, the destination address of the packet is rewritten (DNAT) to the backend server chosen by the scheduling algorithm, and the packet is sent to the POSTROUTING chain.
5. Finally, the packet is sent to the backend server via the POSTROUTING chain.

LVS has three main working modes: NAT, DR, and Tunnel. In kube-proxy, IPVS works in NAT mode, so the following mainly introduces NAT mode.

Continuing with the figure above, in NAT mode:

1. The client sends a request to the front-end load balancer. The source address of the request packet is CIP (client IP) and the destination address is VIP (the load balancer's front-end address).

2. After receiving the packet, the load balancer finds that the destination address exists in its rules, so it rewrites the destination address of the packet to the RIP of a backend real server and sends the packet out according to the scheduling policy.

3. When the packet arrives at the real server, the destination address is its own, so it handles the request and returns the response to LVS.

4. LVS then rewrites the source address of the response packet to its own address and sends it back to the client.
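The address rewriting in NAT mode can be summarized as follows (the CIP, VIP and RIP values are made up for illustration):

```
# NAT-mode address rewriting; all addresses are illustrative.
client   -> director : src=CIP 192.168.0.10     dst=VIP 10.0.0.100:80
director -> real srv : src=CIP 192.168.0.10     dst=RIP 172.16.0.5:80   (destination rewritten, DNAT)
real srv -> director : src=RIP 172.16.0.5:80    dst=CIP 192.168.0.10
director -> client   : src=VIP 10.0.0.100:80    dst=CIP 192.168.0.10    (source rewritten back)
```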


After introducing the basic working principle, let's look at how to use the IPVS mode in kube-proxy for load balancing. First, kube-proxy must be started with the following parameters:

```
--proxy-mode=ipvs            # set kube-proxy to IPVS mode
--ipvs-scheduler=rr          # IPVS load-balancing algorithm, the default is rr
--ipvs-min-sync-period=5s    # minimum interval for refreshing IPVS rules
--ipvs-sync-period=30s       # maximum interval for refreshing IPVS rules
```

After setting these parameters, restart the kube-proxy service. When a ClusterIP-type Service is created, kube-proxy in IPVS mode will do the following things:

  • Create a virtual network interface on the node, by default kube-ipvs0

  • Bind the Service's IP address to the virtual interface kube-ipvs0

  • Create an IPVS virtual server for each Service IP address, as sketched below
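On a node this can be observed roughly as follows (the Service IP 10.96.120.15 and the Pod IPs are made up; the exact output depends on the ipvsadm and kernel versions):

```
# The Service's ClusterIP is bound to the dummy interface kube-ipvs0.
ip addr show kube-ipvs0
# ...
#     inet 10.96.120.15/32 scope global kube-ipvs0

# One IPVS virtual server per Service IP:port, with the Pods as real servers.
ipvsadm -Ln
# Prot LocalAddress:Port Scheduler Flags
#   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
# TCP  10.96.120.15:80 rr
#   -> 10.244.1.5:80                Masq    1      0          0
#   -> 10.244.2.7:80                Masq    1      0          0
```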

IPVS also supports session persistence. When creating the Service object, set service.spec.sessionAffinity to ClientIP (the default is None) and set service.spec.sessionAffinityConfig.clientIP.timeoutSeconds to the desired duration (the default is 10800 seconds).

The following is a concrete example of a Service with session persistence enabled:

```
kind: Service
apiVersion: v1
metadata:
  name: nginx-service
spec:
  type: ClusterIP
  selector:
    app: nginx
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 50
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
```

Then you can use ipvsadm -L to check whether session persistence was set up successfully. In this way, kube-proxy achieves load balancing through the IPVS mode.
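For example (the addresses are again made up), a virtual server with session persistence should show the configured timeout:

```
ipvsadm -Ln
# TCP  10.96.120.15:80 rr persistent 50
#   -> 10.244.1.5:80                Masq    1      0          0
#   -> 10.244.2.7:80                Masq    1      0          0
```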

5. Summary

kube-proxy implements Service load balancing using either iptables or IPVS. With the iptables implementation, because of the nature of iptables itself, rule updates are non-incremental: the current rules must first be dumped with iptables-save, updated in memory, and then written back to the kernel with iptables-restore. In addition, rule matching is a linear search, so it takes a long time and has O(n) time complexity. With the IPVS implementation, connection handling has O(1) time complexity, so the cost of establishing a connection is basically independent of the number of services in the cluster. As the number of services keeps growing, the performance advantage of IPVS becomes apparent.

related articles

  • https://zhuanlan.zhihu.com/p/37230013

  • https://zhuanlan.zhihu.com/p/39909011

  • https://www.projectcalico.org/comparing-kube-proxy-modes-iptables-or-ipvs/

  • https://medium.com/google-cloud/kubernetes-nodeport-vs-loadbalancer-vs-ingress-when-should-i-use-what-922f010849e0

  • https://wiki.archlinux.org/index.php/Iptables_(%E7%AE%80%E4%BD%93%E4%B8%AD%E6%96%87)

  • https://kubernetes.io/docs/concepts/services-networking/service/

  • http://blog.chinaunix.net/uid-23069658-id-3160506.html

  • https://tonydeng.github.io/sdn-handbook/linux/loadbalance.html

  • https://www.josedomingo.org/pledin/2018/11/recursos-de-kubernetes-services/

