Designing a Kubernetes network solution: lessons from Meitu's container optimization practice

Introduction: This article discusses the design of network solutions for Kubernetes environments, drawing on Meitu's practical experience of running containers in production, including real problems encountered online. It is a useful reference for architects who are migrating to Kubernetes.

Li Lianrong is a senior system R&D engineer at Meitu. He built a long-lived connection service operating at the tens-of-millions scale, has been building Meitu's containerized services from scratch, and led the delivery of Meitu's container network solution. He has extensive knowledge of networking and storage.

Currently, our Kubernetes cluster uses Calico as its basic network solution.

Challenges of choosing the Calico network solution

Calico is a routing-based (BGP) SDN that implements cross-host container communication through routing and forwarding. Calico treats each node as a virtual "router" and allocates it an independent virtual network segment. This router provides routing services for the containers on that node.

For more information about the Calico project, please refer to https://www.projectcalico.org/. The following uses a concrete network as an example to introduce the design and the difficulties involved.
[Figure: example network topology]

Using the figure above as an example: if node 192.168.1.2 is allocated the virtual network segment 10.233.1.0/24 and runs a container with the address 10.233.1.2, the node has the following routing entry:

10.233.1.2 0.0.0.0 255.255.255.255 UH 0 0 0 cali814214d5913

When the physical machine receives an IP packet whose destination address is 10.233.1.2, it forwards the packet to the interface cali814214d5913. This interface is one end of a veth pair that connects the host to the container with the IP address 10.233.1.2, so that container receives the packet.

When container 10.233.2.2 on node 192.168.1.3 sends an IP packet to 10.233.1.2, node 192.168.1.3 needs to know the IP of the physical node that hosts 10.233.1.2, which requires the following routing rule:

10.233.1.0/24 192.168.1.2 eth0

Calico uses BGP to learn routing rules between nodes. Node 192.168.1.2 establishes a BGP neighbor relationship with node 192.168.1.3, and node 192.168.1.3 learns the routing rule above through BGP. When container 10.233.2.2 sends an IP packet to 10.233.1.2, the packet is forwarded to node 192.168.1.2 according to that rule, and node 192.168.1.2 then forwards it to 10.233.1.2. This is how cross-host container communication is achieved.
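The forwarding decision described above is a longest-prefix match over each node's routing table. The following is a minimal Python sketch using the addresses from the example. The `Route` type and the `lookup` helper are hypothetical names chosen for illustration. They are not part of Calico or the kernel.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    destination: ipaddress.IPv4Network
    next_hop: Optional[str]  # None means the destination is directly attached
    device: str


def lookup(table, dst):
    """Return the most specific route that contains dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in table if addr in r.destination]
    return max(matches, key=lambda r: r.destination.prefixlen, default=None)


# Routing table of node 192.168.1.3: the /24 route learned through BGP.
node_1_3 = [Route(ipaddress.ip_network("10.233.1.0/24"), "192.168.1.2", "eth0")]

# Routing table of node 192.168.1.2: the host route to the local container.
node_1_2 = [Route(ipaddress.ip_network("10.233.1.2/32"), None, "cali814214d5913")]

first_hop = lookup(node_1_3, "10.233.1.2")
second_hop = lookup(node_1_2, "10.233.1.2")
print(first_hop.next_hop, first_hop.device)  # 192.168.1.2 eth0
print(second_hop.device)                     # cali814214d5913
```

The first lookup sends the packet to the peer node over eth0, and the second delivers it to the container's veth interface.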

However, the nodes that host the two containers may be in different subnets. For example, container 10.233.3.2 runs on node 192.168.2.2, which is in a different subnet from node 192.168.1.2. In this case, the following route cannot be added on 192.168.2.2:

10.233.1.0/24 192.168.1.2 eth0

This is because the physical machines 192.168.2.2 and 192.168.1.2 cannot reach each other at the link layer. To solve this problem, Calico chose IPIP, which encapsulates the IP packets of the virtual network inside IP packets of the physical network for transmission. After IPIP is enabled, a corresponding virtual network interface appears on the node, usually tunl0, and node 192.168.2.2 can add the following routing rule:

10.233.1.0/24 192.168.1.2 tunl0

The difference from the previous route is that the interface is changed to tunl0. When 10.233.3.2 sends an IP packet to 10.233.1.2, its node forwards the packet to the tunl0 interface, where it is taken over by the IPIP driver. The IPIP driver encapsulates each IP packet inside an IP packet on the physical network (the destination address is the next-hop address, that is, 192.168.1.2, and the payload is the original IP packet sent by the container).

Since the destination address of the IP data packet is the node 192.168.1.2, it can be forwarded through the physical gateway. The IPIP service running on the node 192.168.1.2 receives the physical network IP data packet and takes out the payload, and then forwards it to the corresponding container according to the routing rules on the node 192.168.1.2, thus realizing the container's cross-subnet communication.
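The encapsulation and decapsulation steps can be illustrated with a small Python sketch that builds the packets by hand. It is a simplified model and not the kernel's IPIP driver. The function names are hypothetical, the headers carry no options, and the inner packet uses protocol number 253 (reserved for experimentation) as a stand-in for real TCP or UDP traffic.

```python
import socket
import struct

IPPROTO_IPIP = 4        # IANA protocol number for IPv4-in-IPv4
PROTO_EXPERIMENT = 253  # reserved for experimentation; placeholder payload type
HEADER_FORMAT = "!BBHHHBBH4s4s"  # 20-byte IPv4 header without options


def checksum(header: bytes) -> int:
    """Internet checksum over a header of even length."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int) -> bytes:
    """Build a minimal IPv4 header (version 4, IHL 5, TTL 64)."""
    fields = [0x45, 0, 20 + payload_len, 0, 0, 64, proto, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    fields[7] = checksum(struct.pack(HEADER_FORMAT, *fields))
    return struct.pack(HEADER_FORMAT, *fields)


def encapsulate(inner: bytes, node_src: str, node_dst: str) -> bytes:
    """Wrap a container packet in an outer packet addressed to the peer node."""
    return ipv4_header(node_src, node_dst, IPPROTO_IPIP, len(inner)) + inner


def decapsulate(outer: bytes) -> bytes:
    """Strip the outer header and return the original container packet."""
    header_len = (outer[0] & 0x0F) * 4
    if outer[9] != IPPROTO_IPIP:
        raise ValueError("not an IPIP packet")
    return outer[header_len:]


data = b"hello"
inner = ipv4_header("10.233.3.2", "10.233.1.2", PROTO_EXPERIMENT, len(data)) + data
outer = encapsulate(inner, "192.168.2.2", "192.168.1.2")
print(len(inner), len(outer))  # 25 45
```

The outer packet is exactly 20 bytes longer than the inner one. Its destination is a physical node address, which is why an ordinary physical gateway can route it.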

Problems with the Calico network solution

From the way Calico works, the following problems are apparent:

  • With IPIP, IP packets are nested inside IP packets, and the extra encapsulation and decapsulation add performance overhead.
  • With IPIP, the nested IP header reduces the effective MTU, which also lowers the achievable bandwidth utilization.
  • Since nodes outside the cluster cannot learn routing information within the cluster, they cannot directly access the containers in the cluster.
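The MTU cost in the second point can be made concrete with a little arithmetic. The sketch below assumes a 1500-byte MTU on the physical network and IPv4 and TCP headers without options. These values are illustrative assumptions and were not measured on Meitu's network.

```python
ETHERNET_MTU = 1500  # assumed MTU of the physical network, in bytes
IPV4_HEADER = 20     # IPv4 header without options
TCP_HEADER = 20      # TCP header without options


def tcp_payload_per_packet(mtu: int, tunnel_overhead: int = 0) -> int:
    """Largest TCP payload that fits in one packet without fragmentation."""
    return mtu - tunnel_overhead - IPV4_HEADER - TCP_HEADER


direct = tcp_payload_per_packet(ETHERNET_MTU)                 # 1460
tunneled = tcp_payload_per_packet(ETHERNET_MTU, IPV4_HEADER)  # 1440

print(f"effective MTU inside the tunnel: {ETHERNET_MTU - IPV4_HEADER}")
print(f"TCP payload per packet: {direct} -> {tunneled} bytes")
```

Every tunneled packet gives up 20 bytes to the outer IPv4 header, in addition to the CPU cost of encapsulation.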

As described above, Calico chose IPIP to solve cross-subnet container communication, and it is precisely the introduction of IPIP that causes this series of performance problems. So why did Calico choose the IPIP protocol?

To understand this, let's look at how a traditional physical network handles communication between subnets. Taking the above figure as an example, the steps for the physical machine 192.168.2.3 to access the physical machine 192.168.1.2 are as follows:

  • The physical machine 192.168.2.3 detects that the target IP is in a different subnet from itself, so it sends the packet to its gateway 192.168.2.1 through the default routing rule.
  • Through a routing protocol, the physical gateway 192.168.2.1 knows that the gateway 192.168.1.1 can deliver IP packets to the physical machine 192.168.1.2, so it forwards the packet to the gateway 192.168.1.1.
  • The physical gateway 192.168.1.1 then forwards the IP packet to the physical machine 192.168.1.2.

Let's look at the Calico network again. If there is no IPIP, the steps for container 10.233.4.2 to access container 10.233.1.2 are as follows:

  • The host 192.168.2.3 detects that the target IP is in a different subnet from its own, so it forwards the IP packet with the target address 10.233.1.2 to its gateway 192.168.2.1.
  • The gateway 192.168.2.1 has no matching routing rule, so it drops the IP packet and returns a destination-unreachable error.

If IPIP is introduced, the steps for container 10.233.4.2 to access container 10.233.1.2 are as follows:

  • The host 192.168.2.3 matches the routing rule "10.233.1.0/24 192.168.1.2 tunl0".
  • The host 192.168.2.3 forwards the IP packet sent by the container towards the physical machine 192.168.1.2 through the tunl0 interface.
  • The IPIP driver wraps the IP packet sent by the container (destination address 10.233.1.2) as the payload of a physical network IP packet (destination address 192.168.1.2).
  • The host 192.168.2.3 sends the physical network IP packet to the host 192.168.1.2 in the same way as any other physical network traffic.
  • After the host 192.168.1.2 receives the IP packet, the IPIP driver unpacks it and forwards the payload, which is itself an IP packet, to the container 10.233.1.2.

Therefore, under the premise that the physical gateway cannot provide routing services for the virtual network, Calico chose to use IPIP to solve the cross-subnet communication of the container.
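The two walk-throughs reduce to one rule: a route whose next hop is on the same subnet can use the physical interface directly, and otherwise it has to go through tunl0. The following is a minimal Python sketch of that rule. It assumes /24 physical subnets (the article does not state the mask), and `same_subnet` and `route_for` are hypothetical helper names.

```python
import ipaddress


def same_subnet(local_interface: str, peer: str) -> bool:
    """True if peer is on-link for an interface given as 'address/prefix'."""
    network = ipaddress.ip_interface(local_interface).network
    return ipaddress.ip_address(peer) in network


def route_for(block: str, local_interface: str, peer_node: str) -> str:
    """Pick the outgoing device for the route to a peer node's container block."""
    device = "eth0" if same_subnet(local_interface, peer_node) else "tunl0"
    return f"{block} {peer_node} {device}"


# Node 192.168.1.3 shares a subnet with 192.168.1.2: plain routing works.
print(route_for("10.233.1.0/24", "192.168.1.3/24", "192.168.1.2"))
# Node 192.168.2.3 does not: the packet must be tunneled.
print(route_for("10.233.1.0/24", "192.168.2.3/24", "192.168.1.2"))
```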

If the physical gateway could somehow learn the routing rules of the Calico virtual network, it could provide routing services for the virtual network, and Calico could achieve cross-subnet container communication without introducing IPIP. So why didn't Calico choose this approach?

This is because Calico’s main application scenario is public cloud, which has the following characteristics:

  • Most public cloud vendors provide a stable, large layer-2 environment in which all hosts can work in the same subnet, so there is no cross-subnet communication problem.
  • Not all public cloud vendors provide an interface for BGP route learning. If the cloud vendor does not provide a BGP interface, Calico cannot synchronize virtual network routing rules to the public cloud.

Therefore, under this premise, Calico's choice of IPIP is very reasonable.

For private cloud scenarios:

  • Not all private cloud environments provide a large layer-2 network, so the demand for cross-subnet communication in Calico is very strong.
  • All hardware (including gateways) is under the operator's control. As long as the physical gateway supports the BGP protocol, Calico can synchronize routing rules to the physical gateway.

Therefore, for private cloud scenarios, it is a better choice to solve cross-subnet communication by letting Calico synchronize virtual network routing rules to the physical gateway.

Performance improvement solutions

In fact, Calico's documentation mentions a way to synchronize routing rules to physical gateways. For details, please refer to the following link:

https://docs.projectcalico.org/v2.6/usage/external-connectivity

This article follows that design and synchronizes the virtual network routing rules to the physical gateway. There are two ways to establish BGP neighbors between the virtual network and the physical network:

  • Solution 1: Each node establishes a BGP neighbor relationship with its physical gateway
  • Solution 2: A centralized component establishes BGP neighbor relationships with the physical gateways

The two solutions are analyzed below.

Solution 1

As shown in the figure below, Solution 1 requires each SDN node to establish a BGP neighbor relationship with its physical gateway. Each SDN node acts as the gateway for its local containers, so if the Calico service running on the physical machine can establish a BGP session with the physical gateway and synchronize its own routing rules to it, the physical gateway learns the routing rules of the virtual network.
[Figure: network structure of Solution 1]

The SDN service running on the node can realize the logic of automatically establishing BGP neighbors through coding or scripting, but this also requires the support of the physical gateway. If the physical gateway does not support the automatic establishment of BGP neighbors, it is meaningless to implement it only on the SDN side.

Traditional BGP routers require each BGP neighbor to be configured individually, which is unacceptable for containerized clusters. Such clusters are usually large, and their nodes change often through scale-out, scale-in, and machine failure. If operations staff had to configure BGP neighbors manually every time a node changed, the operational cost would be huge and mistakes could easily affect the stability of the cluster. The gateway therefore needs to support the automatic establishment of BGP neighbors.

To realize the automatic establishment of BGP neighbors, you need to use a router that supports the Dynamic Neighbors function. Traditional routers need to specify a clear IP address when configuring BGP neighbors. A router that supports Dynamic Neighbors can specify an IP network segment, and the router can automatically accept requests for establishing neighbors initiated by BGP devices in the specified network segment.

Before deploying the cluster, configure Dynamic Neighbors on the routers. When cluster nodes are added or changed, the SDN service running on each node actively establishes a BGP neighbor relationship with its gateway and synchronizes the node's routing information to it. For example, the SDN service running on the physical machine 192.168.1.2 automatically establishes a BGP neighbor relationship (eBGP) with the physical gateway 192.168.1.1 and synchronizes the following routing rule to it:

10.233.1.0/24 192.168.1.2 eth0

After the physical gateway 192.168.1.1 learns this rule, it propagates the rule to the physical gateway 192.168.2.1 through BGP or another routing protocol, so the entire physical network learns it. When a container in the cluster sends data to the 10.233.1.0/24 network segment, its host sends the packet directly to the physical gateway. The physical gateway forwards the packet to the target host according to the virtual network routes it has learned, and the target host finally forwards it to the corresponding container.
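The Dynamic Neighbors behaviour can be modelled in a few lines of Python. This is a toy model of the idea only. It does not speak BGP, the `Gateway` class and its methods are hypothetical, and the listen range 192.168.1.0/24 is an assumed value for the example.

```python
import ipaddress


class Gateway:
    """Toy physical gateway that accepts BGP peers from a configured range."""

    def __init__(self, listen_range: str):
        self.listen_range = ipaddress.ip_network(listen_range)
        self.peers = set()
        self.routes = {}  # virtual network prefix -> next-hop node

    def open_session(self, peer_ip: str) -> bool:
        """Accept the peer only if its address is inside the listen range."""
        if ipaddress.ip_address(peer_ip) not in self.listen_range:
            return False
        self.peers.add(peer_ip)
        return True

    def receive_update(self, peer_ip: str, prefix: str) -> None:
        """Learn a route announced by an established peer."""
        if peer_ip not in self.peers:
            raise PermissionError(f"{peer_ip} is not an established peer")
        self.routes[ipaddress.ip_network(prefix)] = peer_ip


gateway = Gateway("192.168.1.0/24")

# A new node joins: no per-neighbor configuration is needed on the gateway.
assert gateway.open_session("192.168.1.2")
gateway.receive_update("192.168.1.2", "10.233.1.0/24")

# A device outside the configured range is refused.
assert not gateway.open_session("192.168.9.9")
print(gateway.routes)
```

The gateway is configured once with a range, and nodes that come and go inside that range need no further changes on the router.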

Solution 2

Solution 2 introduces a centralized component, which this article calls the BGP Speaker. Its network structure is as follows:
[Figure: network structure of Solution 2]

The BGP Speaker runs in the SDN cluster. It is responsible for collecting routing information in the SDN cluster and synchronizing it to the physical gateway through BGP. Since this module can collect all routing information in the SDN cluster, it is no longer necessary for each SDN node to establish BGP neighbors with the physical gateway separately, and it no longer depends on the Dynamic Neighbors function of the physical gateway.

As shown in the figure below, BGP Speaker is mainly divided into two parts: observer and publisher.

  • Observer: Collects the routing information of the SDN cluster.
  • Publisher: Synchronizes the routing information collected by the observer to the physical gateway.

[Figure: structure of the BGP Speaker]

Calico stores information about the SDN cluster (including configuration and the virtual network segment of each node) in etcd, and etcd supports watch. The observer can therefore obtain Calico cluster node information in real time through watch. The observer is mainly concerned with the virtual network segment allocated to each SDN node, and it generates the corresponding routing rule from that information. For example, if node 192.168.1.2 is allocated the virtual network segment 10.233.1.0/24, the observer generates the following routing rule:

10.233.1.0/24 192.168.1.2 eth0
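The observer's job can be sketched in Python as a small state machine driven by watch events. The event shape used here, a `(kind, node, cidr)` tuple, is a hypothetical simplification. It is not the real etcd watch API or Calico's data model, both of which would need to be adapted to.

```python
import ipaddress


class Observer:
    """Tracks the virtual network segment of each node and derives routes."""

    def __init__(self):
        self.blocks = {}  # node IP -> virtual network segment

    def handle(self, event):
        """Apply one hypothetical watch event: ('put' | 'delete', node, cidr)."""
        kind, node, cidr = event
        if kind == "put":
            self.blocks[node] = ipaddress.ip_network(cidr)
        elif kind == "delete":
            self.blocks.pop(node, None)

    def routes(self):
        """Render the current state as 'prefix next-hop device' rules."""
        return [f"{block} {node} eth0"
                for node, block in sorted(self.blocks.items())]


observer = Observer()
for event in [
    ("put", "192.168.1.2", "10.233.1.0/24"),
    ("put", "192.168.1.3", "10.233.2.0/24"),
    ("delete", "192.168.1.3", None),  # the node was removed from the cluster
]:
    observer.handle(event)

print(observer.routes())  # ['10.233.1.0/24 192.168.1.2 eth0']
```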

The publisher implements the BGP protocol. It establishes a BGP neighbor relationship (eBGP) with the physical gateway, and its main job is to synchronize the routing information collected by the observer to the physical gateway. Because the publisher synchronizes the routing information of many different nodes, it effectively acts as a BGP Route Reflector from the physical gateway's point of view.

gobgp is an open source library that provides a complete implementation of the BGP protocol and can serve as the base library for the publisher. BGP is also bidirectional: the publisher synchronizes the routing information it holds to the physical router, and the physical router synchronizes its routing information back to the publisher. Since the SDN does not need to know the routing information of the physical network, the publisher can filter that routing information out.
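The publisher's two responsibilities, exporting SDN routes and discarding routes received from the gateway, can be sketched as follows. This is plain Python and does not use gobgp. The `Publisher` class is hypothetical, and the announce/withdraw bookkeeping is an assumption about how such a component might track changes. The article does not describe this detail.

```python
class Publisher:
    """Toy publisher: exports SDN routes and ignores everything it receives."""

    def __init__(self):
        self.advertised = set()
        self.ignored = 0  # routes from the physical network that were dropped

    def sync(self, desired_routes):
        """Return the (announce, withdraw) lists needed to reach the desired set."""
        desired = set(desired_routes)
        announce = sorted(desired - self.advertised)
        withdraw = sorted(self.advertised - desired)
        self.advertised = desired
        return announce, withdraw

    def on_route_from_gateway(self, prefix):
        """Import policy: the SDN does not need physical routes, so drop them."""
        self.ignored += 1
        return None


publisher = Publisher()
print(publisher.sync(["10.233.1.0/24 192.168.1.2", "10.233.2.0/24 192.168.1.3"]))
print(publisher.sync(["10.233.1.0/24 192.168.1.2"]))  # second route is withdrawn
publisher.on_route_from_gateway("192.168.2.0/24")
print(publisher.ignored)  # 1
```

Because the publisher can rebuild `advertised` from the observer's output at any time, it holds no state that has to survive a restart.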

Because the BGP Speaker is deployed centrally, its availability directly affects the stability of the SDN cluster. The BGP Speaker obtains cluster information directly from the etcd storage of the SDN cluster and does not need to persist any data (including the generated routing information), so it is stateless. High availability can therefore be achieved simply by deploying multiple BGP Speakers in the SDN cluster.

The BGP Speakers are not aware of each other and operate independently. Depending on the topology of the physical network, they can also be placed in different racks or data centers so that the SDN cluster keeps working normally when a rack or data center fails.

Security risks

Both solutions achieve interoperability between the physical network and the virtual network. In both, however, the virtual network automatically synchronizes routing rules to the physical gateway through BGP. If the virtual network generates wrong routing rules, or rules that conflict with the physical network, the physical network that hosts the cluster will be affected. If multiple SDN clusters are deployed in the same physical network and their network segments conflict, the routing rules of the virtual networks will also become inconsistent, which affects the stability of the SDN.

Therefore, some measures need to be taken to ensure that there is no conflict between SDN and between SDN and the physical network.

Ensure that SDN does not conflict with the physical network

Most of the current physical gateways support route filtering, which means that when the gateway learns new routing rules through BGP, it can filter according to certain rules. Only the routing rules that meet the requirements will be synchronized to its own routing table. When deploying a containerized cluster, you should plan the SDN network address first, and ensure that the SDN network address does not conflict with the physical network.

When configuring the physical gateway, you can set filter rules so that the SDN cannot modify any routing rule outside its own network segment (for example, restrict the SDN to updating only routing rules whose destination falls within the 10.233.0.0/16 network segment). This ensures that the SDN does not affect the stability of the physical network.
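The filter rule amounts to a subnet containment check on every prefix the gateway learns from the SDN. The following is a minimal Python sketch of that check using the 10.233.0.0/16 range from the example. The `accept` function is a hypothetical name, and a real gateway would express the same rule in its own route-policy syntax.

```python
import ipaddress

SDN_RANGE = ipaddress.ip_network("10.233.0.0/16")


def accept(prefix: str, allowed: ipaddress.IPv4Network = SDN_RANGE) -> bool:
    """Accept a learned route only if it lies entirely inside the SDN range."""
    return ipaddress.ip_network(prefix).subnet_of(allowed)


print(accept("10.233.1.0/24"))   # True: inside the SDN range
print(accept("192.168.1.0/24"))  # False: a physical network prefix
print(accept("10.0.0.0/8"))      # False: a supernet that would shadow other routes
print(accept("0.0.0.0/0"))       # False: the default route
```

Rejecting supernets matters as much as rejecting unrelated prefixes, because an overly broad announcement would redirect traffic that does not belong to the SDN.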

Ensure that there is no conflict between SDNs

After the physical network and the virtual network are interconnected, the routing information of every SDN cluster deployed in the same physical network is synchronized to the physical gateway. If the network addresses of different SDNs overlap, conflicts occur and the SDN networks cannot work stably. Therefore, when deploying SDN clusters, each cluster must be assigned a different network address range (one that also does not conflict with the physical network). When configuring the gateway, restrict each SDN so that it can only synchronize routing rules within its own address range.
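An address plan can be checked for such conflicts before any cluster is deployed. The sketch below is a minimal Python example. The cluster names and all ranges other than 10.233.0.0/16 are made-up values for illustration, and `find_conflicts` is a hypothetical helper.

```python
import ipaddress
from itertools import combinations


def find_conflicts(plan: dict) -> list:
    """Return every pair of names whose address ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    return [(a, b) for a, b in combinations(sorted(nets), 2)
            if nets[a].overlaps(nets[b])]


plan = {
    "physical": "192.168.0.0/16",  # assumed range of the physical network
    "sdn-a": "10.233.0.0/16",
    "sdn-b": "10.234.0.0/16",
    "sdn-c": "10.233.128.0/17",    # mistakenly carved out of sdn-a's range
}

print(find_conflicts(plan))  # [('sdn-a', 'sdn-c')]
```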

Summary

The root cause of Calico's performance problems is that the physical network and the virtual network cannot interoperate. This article presents two solutions that connect the two networks and so remove the performance problems caused by Calico's use of IPIP and NAT. They also introduce certain security risks, but these can be eliminated through additional safeguards that keep both the physical network and the SDN network running stably.

This article was written by Li Lianrong. Please credit the source when reprinting. Original technical and architecture practice articles are welcome and can be submitted through the "contact us" menu of the official account.

Origin blog.51cto.com/14977574/2546958