With so many CNIs, how to choose? | Container Networking Series Issue 1


As cloud-native technology is adopted, Kubernetes is becoming the operating system of the cloud-native era, and technical innovation around Kubernetes keeps emerging. Container networking is a good example: nearly 30 CNIs are registered on the official Kubernetes website.

However, as enterprises run more and more services on Kubernetes, their requirements for the container network keep rising, and some have even gone beyond the scope of CNI. How to reduce the networking friction enterprises encounter when building private container cloud platforms has become an important and urgent problem.

In this issue, we focus on two questions: what are the typical scenarios for CNIs, and how should a CNI be selected?


1. Introduction to CNI

CNI (Container Network Interface) defines a set of specifications for configuring container network interfaces and assigning IP addresses. CNI is concerned only with connecting a container to the network and releasing the allocated network resources when the container is deleted, so it is widely supported and the specification is easy to implement.
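To make the specification concrete, here is a minimal CNI network configuration of the kind the kubelet reads from /etc/cni/net.d/. It uses the standard "bridge" reference plugin with host-local IPAM; all names and addresses are illustrative, not tied to any particular CNI discussed here.

```shell
# Write an illustrative CNI network configuration. The "bridge" and
# "host-local" plugins are standard CNI reference plugins; the network
# name, bridge name, and subnet below are made-up example values.
cat > 10-demo.conf <<'EOF'
{
  "cniVersion": "0.4.0",
  "name": "demo-net",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/24",
    "gateway": "10.244.0.1"
  }
}
EOF

# The "type" field names the plugin binary the runtime will execute.
grep '"type"' 10-demo.conf
```

The runtime looks up the binary named by each `type` field under the CNI plugin directory (conventionally /opt/cni/bin) and invokes it for every Pod sandbox.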

With the rapid development of Kubernetes, many excellent open-source CNIs have emerged, such as Calico, Flannel, and Cilium. Each CNI has its own characteristics and target scenarios, as well as its own shortcomings, so choosing the right CNI for each scenario matters. Sometimes, to meet more complex network requirements, multiple CNIs must be used in combination, which makes the network model more complex and greatly increases the maintenance burden.

For a more detailed explanation of CNI, refer to the official CNI specification and documentation.

 

1.1 Comparison of mainstream CNIs

To sum up, among open-source CNIs, Calico, Flannel, and Cilium each have their strengths, but all fall somewhat short on enterprise features. Calico and Flannel cover the basic functions and have the more active communities; the two can also be combined to complement each other, though combining them raises maintenance costs. Cilium implements an independent eBPF-based data plane and goes deep into network security and service forwarding, but it is harder to operate and maintain. Fabric, by contrast, is more comprehensive in functionality, with excellent performance and stability and relatively simple operation and maintenance.

 

2. Introduction to Fabric

Fabric is a CNI plug-in developed by Boyun that aims to provide a container network management platform adaptable to a variety of scenarios: powerful, high-performance, stable, reliable, and easy to use.

Fabric supports underlay/overlay modes, IPv4/IPv6 single-stack and dual-stack, multiple networks and multiple NICs per container, cluster federation, EIP, QoS, NetworkPolicy, PodSecurity, Windows, and other features.

In addition, to improve operation and maintenance efficiency, a companion debug tool has been developed with traffic-tracking and cache-analysis capabilities. Fabric supports the Linux and Windows operating systems and CPU architectures such as ARM and x86.

From Fabric 2.5 onwards, features such as eBPF, smart NICs, traffic analysis, and seamless upgrades will also be integrated to improve data-plane performance, business stability, and fine-grained O&M debugging capabilities.

 

2.1 Overall Architecture

Figure 1 Fabric Underlay Architecture

 

Figure 2 Fabric Overlay Architecture

 

Fabric consists of four core components:

  • ovs: a mature, stable software switch that attaches container interfaces and handles data-plane forwarding and security-policy enforcement

  • ovs-controller: controls the ovs data plane, generating the default flow tables and delivering dynamic flow tables

  • fabric-ctl: collects, caches, and manages Pod information in the cluster and drives other core functions

  • fabric binaries: the standard Kubernetes CNI binary responsible for configuring container network interfaces
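As a standard Kubernetes CNI binary, the fabric plugin is invoked through the contract defined by the CNI specification: the runtime sets `CNI_*` environment variables and pipes the network configuration to the plugin's stdin, and the plugin prints a result JSON on stdout. The sketch below demonstrates that contract with a stand-in script, since the real fabric binary is not available here.

```shell
# Create a stand-in "plugin" that just reports what it received. A real
# plugin (such as fabric) would configure the interface inside $CNI_NETNS
# and print an IPAM result; this script only illustrates the calling shape.
cat > fake-plugin.sh <<'EOF'
#!/bin/sh
echo "{\"cniVersion\":\"0.4.0\",\"command\":\"$CNI_COMMAND\",\"ifname\":\"$CNI_IFNAME\"}"
EOF
chmod +x fake-plugin.sh

# Invoke it the way a container runtime would (per the CNI spec):
# config on stdin, parameters in CNI_* environment variables.
echo '{"cniVersion":"0.4.0","name":"demo-net","type":"fake-plugin"}' |
  CNI_COMMAND=ADD \
  CNI_CONTAINERID=abc123 \
  CNI_NETNS=/var/run/netns/demo \
  CNI_IFNAME=eth0 \
  CNI_PATH=/opt/cni/bin \
  ./fake-plugin.sh
# prints {"cniVersion":"0.4.0","command":"ADD","ifname":"eth0"}
```

On Pod deletion the runtime repeats the call with `CNI_COMMAND=DEL`, which is when the plugin releases the allocated network resources.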

 

2.2 Design Concept

Fabric has always followed four design principles:

  1. Micro-segmentation design: a highly stable control plane with fast convergence, high performance, and low latency

  2. Simple and stable: fully distributed deployment with strong scalability, distributed control, no single point of failure, and automated operation and maintenance

  3. Security isolation: tenant isolation, NetworkPolicy, PodSecurity, and tunnel encryption

  4. Rich functionality: fixed IP/MAC addresses, QoS, egress IP, cluster federation, and network monitoring

 

2.3 Development History

Fabric has released six stable versions since the end of 2018 and has been running stably for years in the production environments of more than ten financial customers.

Figure 3 Fabric development process

 

3. General scenarios and selection suggestions

Kubernetes supports mixed deployment. In many production environments, to make the most of cluster resources, the Kubernetes master nodes run on small virtual machines while the compute nodes run on physical machines or large virtual machines. A single cluster may even contain nodes with heterogeneous CPU architectures or different operating systems (Linux/Windows) at the same time. Whichever Fabric mode is selected, its core functions are fully supported, such as multiple networks, multiple IPs, QoS, tenant isolation, and NetworkPolicy.
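Tenant isolation and NetworkPolicy here refer to the standard Kubernetes NetworkPolicy API, which a conforming CNI enforces. As an illustration, the following policy (all names are made up) restricts Pods in a tenant namespace to ingress from the same namespace only:

```shell
# Write a minimal tenant-isolation NetworkPolicy. "tenant-a" and the
# policy name are illustrative; the API itself is standard Kubernetes.
cat > tenant-a-isolation.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: tenant-a
spec:
  podSelector: {}           # applies to every Pod in tenant-a
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}   # allow only Pods from the same namespace
EOF

# kubectl apply -f tenant-a-isolation.yaml   # requires a running cluster
```

Because the empty `podSelector` selects all Pods in the namespace, any ingress not matched by the `from` clause (i.e. traffic from other namespaces or from outside the cluster) is dropped by the CNI's data plane.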

 

3.1 Underlay dual network card mode

This mode satisfies, in the simplest way, the need for hosts both inside and outside the cluster to reach in-cluster services. At the same time, all cluster traffic can be tracked on external switches, routers, firewalls, and other network devices, giving network operations staff flow-control and auditing capabilities. This mode also preserves service-network performance to the greatest extent, delivering close to the maximum bandwidth the service network provides. However, it consumes real IP addresses from the business network, which limits cluster scale to a certain extent.

 

3.1.1 Environmental Requirements

  1. Each node has at least two NICs, used for the management network and the service network respectively.

  2. The management network and the service network must not overlap, and service-network IPs must not be occupied by machines outside the cluster.

  3. The uplink port of the service NIC is configured as access or trunk as required, and promiscuous mode is enabled on the NIC.
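Requirement 3 can be sketched as follows. The NIC name is an assumption (here `eth1`); the real commands need root on the actual node, so this sketch defaults to printing the commands rather than executing them.

```shell
# Enable promiscuous mode on the service NIC. SERVICE_NIC is an assumed
# example name; set DRY_RUN=0 and run as root to actually apply it.
SERVICE_NIC=${SERVICE_NIC:-eth1}

# run: echo the command in dry-run mode (the default), execute it otherwise.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run ip link set "$SERVICE_NIC" promisc on   # enable promiscuous mode
run ip link set "$SERVICE_NIC" up           # ensure the NIC is up
run ip -d link show "$SERVICE_NIC"          # verify: look for the PROMISC flag
```

The access/trunk setting of the uplink port is configured on the physical switch, not on the node, so it is not shown here.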

 

3.1.2 Applicable scenarios

(1) Machines outside the cluster need to access in-cluster services directly, for example a service registry deployed outside the cluster

(2) In-cluster services have high network performance requirements

(3) Business traffic in the cluster must be controllable by security devices such as firewalls

(4) Small and medium-sized clusters: Pods consume real IP addresses, which are relatively precious in this scenario, so an overly large cluster may exhaust the business network's IP address pool

 

3.1.3 Suggestions for Service Network Planning

Because the underlay mode consumes existing IP resources, service network planning is very important. If a single service network is too large, the broadcast domain may become too large and switch performance will degrade. Therefore, when planning the service network, divide it into multiple Class C (/24) subnets, with different subnets assigned to different tenants; this also guarantees isolation between tenants.
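As a worked example of that planning step, the sketch below carves an assumed service block (10.10.0.0/22, a made-up example) into /24 subnets, one per tenant. It uses plain shell arithmetic and assumes the block is /16 or longer and does not cross a second-octet boundary.

```shell
# list_c_subnets <base-ip> <prefix-len>
# Print the /24 ("Class C") subnets contained in an IPv4 block.
# Assumes 16 <= prefix-len <= 24 and that the block stays within
# one second-octet range (sufficient for typical service networks).
list_c_subnets() {
  base=$1; prefix=$2
  # Split the dotted quad into octets.
  oldIFS=$IFS; IFS=.; set -- $base; IFS=$oldIFS
  o1=$1; o2=$2; o3=$3
  count=$(( 1 << (24 - prefix) ))   # number of /24s in the block
  i=0
  while [ "$i" -lt "$count" ]; do
    echo "$o1.$o2.$(( o3 + i )).0/24"
    i=$(( i + 1 ))
  done
}

list_c_subnets 10.10.0.0 22
# prints:
# 10.10.0.0/24
# 10.10.1.0/24
# 10.10.2.0/24
# 10.10.3.0/24
```

Each printed /24 can then be bound to one tenant, keeping every tenant's broadcast domain small and its address range disjoint from the others.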

 

3.2 Overlay mode

This mode is easy to deploy and has few environmental dependencies. It is largely indifferent to node placement: as long as the nodes in the cluster can reach each other, overlay mode can in principle be used. In addition, this mode consumes only virtual IP addresses and can therefore support large business clusters.

 

3.2.1 Environmental Requirements

(1) Each node has at least one NIC, and all nodes can reach each other

(2) If a node has multiple NICs, the management NIC must have the smallest interface index

(3) The corresponding tunnel-protocol ports are open between nodes; for example, VXLAN requires UDP port 4789 and Geneve requires UDP port 6081
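Requirement 3 maps to one firewall rule per node. The helper below only prints the iptables command for the chosen tunnel protocol (4789/UDP for VXLAN, 6081/UDP for Geneve, as stated above); applying it requires root on each node, and an equivalent rule in firewalld or a cloud security group works just as well.

```shell
# tunnel_rules vxlan|geneve
# Print the iptables rule needed to admit the tunnel traffic between nodes.
tunnel_rules() {
  case $1 in
    vxlan)  port=4789 ;;   # IANA-assigned VXLAN UDP port
    geneve) port=6081 ;;   # IANA-assigned Geneve UDP port
    *) echo "unknown tunnel protocol: $1" >&2; return 1 ;;
  esac
  echo "iptables -A INPUT -p udp --dport $port -j ACCEPT"
}

tunnel_rules vxlan
tunnel_rules geneve
```

Run the printed command as root on every node (or add the matching permanent rule in your firewall manager) before bringing up the overlay.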

 

3.2.2 Applicable scenarios

(1) Machines outside the cluster do not need to access cluster services directly

(2) Cluster services do not have very high network performance requirements

(3) A private address pool is used, supporting large-scale clusters

(4) Dependence on the external network is minimal: apart from opening the corresponding tunnel-protocol ports, no other configuration is required

 

3.2.3 Overlay Federation

To enable business Pods to communicate across Fabric clusters, Fabric has supported cluster federation since version 2.3. Compared with the federation solutions offered by the open-source community, Fabric federation does not need to establish multiple tunnels: cross-cluster Pod communication works on the same principle as cross-node Pod communication within a single cluster. It requires no extra gateway forwarding and uses a single layer of tunnels, with no additional network loss.

 

4. VPC scenarios and model selection suggestions

From a network perspective, infrastructure can be divided into three tiers. Most business systems sit in the second and third tiers, and network policies need to be configured both within and between tiers.

 

4.1 VPC-based multi-cluster management

Depending on business needs, we can build multiple clusters, using underlay or overlay deployment modes as required. Virtual machines can communicate directly with Pods in an underlay cluster, while overlay clusters in the same VPC can use federation for cross-cluster business communication.

 

5. Future Outlook

Fabric today offers rich, powerful functionality and can be applied in a wide variety of environments. Going forward, we will continue to optimize control-plane and data-plane performance and to provide richer networking, monitoring, analysis, and O&M capabilities, aiming to build a more stable and powerful enterprise-grade container network platform.
