A deep dive into the Kubernetes network model and network communications

Kubernetes defines a simple and consistent network model based on a flat network structure. Pods can communicate efficiently without mapping host ports to container ports and without extra components to forward traffic. This model also makes it easy to migrate applications from virtual machines or physical hosts to Kubernetes-managed pods.

This article explores the Kubernetes network model in depth and explains how containers and pods communicate. The implementation of the network model will be introduced in a later article.

(Figure: network-iceberg)

Kubernetes network model

The model defines:

  • Each pod has its own IP address, which is reachable within the cluster.
  • All containers in a pod share the pod's IP address (and MAC address), and the containers can communicate with each other via localhost
  • Pods can use the pod IP address to communicate with other pods on any node in the cluster without NAT
  • Kubernetes components can communicate with each other and with pods
  • Network isolation can be achieved through network policies

Several relevant components are mentioned in the definition above:

  • Pod: A pod in Kubernetes is similar to a virtual machine with a unique IP address; the containers in a pod share the pod's network and storage.
  • Container: A pod is a collection of containers that share the same network namespace. A container in a pod is like a process on a virtual machine, and these processes can communicate with each other over localhost; each container has its own independent file system, CPU, memory, and process space. Containers are created as part of a pod.
  • Node: Pods run on nodes, and a cluster contains one or more nodes. Each pod's network namespace is connected to the node's root namespace so that traffic can flow between them.

Network namespaces have come up several times now, but how do they actually work?

How network namespaces work

Let's create a pod in the Kubernetes distribution k3s. This pod has two containers: a curl container that sends requests and an httpbin container that provides a web service.

Although a distribution is used here, k3s follows the standard Kubernetes network model, so it does not prevent us from understanding the model.

apiVersion: v1
kind: Pod
metadata:
  name: multi-container-pod
spec:
  containers:
  - image: curlimages/curl
    name: curl
    command: ["sleep", "365d"]
  - image: kennethreitz/httpbin
    name: httpbin
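
Assuming the manifest above is saved as multi-container-pod.yaml (the file name is just for illustration), the pod can be created and located like this:

kubectl apply -f multi-container-pod.yaml

# show which node the pod was scheduled to and which pod IP it was given
kubectl get pod multi-container-pod -o wide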

Log in to the node and list the network namespaces on the host with lsns -t net. The httpbin process does not show up; instead there is a namespace whose command is /pause. This pause process is actually an invisible sandbox container that exists in every pod. The role of the sandbox container will be introduced in the next article, Container Network and CNI.

lsns -t net
        NS TYPE NPROCS    PID USER     NETNSID NSFS                                                COMMAND
4026531992 net     126      1 root  unassigned                                                     /lib/systemd/systemd --system --deserialize 31
4026532247 net       1  83224 uuidd unassigned                                                     /usr/sbin/uuidd --socket-activation
4026532317 net       4 129820 65535          0 /run/netns/cni-607c5530-b6d8-ba57-420e-a467d7b10c56 /pause

Since each container has an independent process space, we switch the command to list the PID namespaces instead:

lsns -t pid
        NS TYPE NPROCS    PID USER            COMMAND
4026531836 pid     127      1 root            /lib/systemd/systemd --system --deserialize 31
4026532387 pid       1 129820 65535           /pause
4026532389 pid       1 129855 systemd-network sleep 365d
4026532391 pid       2 129889 root            /usr/bin/python3 /usr/local/bin/gunicorn -b 0.0.0.0:80 httpbin:app -k gevent

The network namespace that process 129889 belongs to can be found from its PID:

ip netns identify 129889
cni-607c5530-b6d8-ba57-420e-a467d7b10c56

You can then execute commands in that namespace with ip netns exec:

ip netns exec cni-607c5530-b6d8-ba57-420e-a467d7b10c56 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether f2:c8:17:b6:5f:e5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.42.1.14/24 brd 10.42.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f0c8:17ff:feb6:5fe5/64 scope link
       valid_lft forever preferred_lft forever

The results show that the pod's IP address 10.42.1.14 is bound to the eth0 interface, and that eth0 is paired with the host interface whose index is 17 (eth0@if17).

On the node, check interface 17: veth7912056b is a virtual ethernet interface (veth device) in the host's root namespace. It is one end of the tunnel connecting the pod network and the node network; its peer is the eth0 interface inside the pod's namespace.

ip link | grep -A1 ^17
17: veth7912056b@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default
    link/ether d6:5e:54:7f:df:af brd ff:ff:ff:ff:ff:ff link-netns cni-607c5530-b6d8-ba57-420e-a467d7b10c56

The output above also shows that the veth interface is attached to the cni0 network bridge (master cni0).

Bridges work at the data link layer (Layer 2 of the OSI model) and can connect multiple networks (including multiple network segments). When a request reaches the bridge, the bridge asks all connected interfaces (here, pods are connected to the bridge through their veth interfaces) whether they own the destination IP address of the request. If an interface responds, the bridge records the mapping (IP -> veth) and forwards the data.
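
As a rough way to observe this on the node (output varies per cluster), you can list the interfaces attached to cni0 and the forwarding entries the bridge has learned:

# list all veth interfaces attached to the cni0 bridge
ip link show master cni0

# show the forwarding entries (learned MAC -> port mappings) for the bridge
bridge fdb show br cni0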

What if no interface responds? The specific behavior depends on the implementation of each network plugin. I plan to introduce commonly used network plugins such as Calico, Flannel, and Cilium in subsequent articles.

Next, let’s take a look at how network communication is completed in Kubernetes. There are several types:

  • Communication between containers in the same pod
  • Communication between pods on the same node
  • Communication between pods on different nodes

How Kubernetes networking works

Communication between containers within the same pod

Communication between containers in the same pod is the simplest case. These containers share the same network namespace, and every network namespace has a lo loopback interface, so the containers can communicate over localhost.
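
Using the multi-container pod created earlier, this can be checked from the curl container by calling httpbin (which listens on port 80) over localhost:

# call the httpbin container from the curl container in the same pod
kubectl exec multi-container-pod -c curl -- curl -s http://localhost/get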

(Figure: network-within-pod)

Communication between pods on the same node

When we run the curl and httpbin containers in two separate pods, the two pods may be scheduled onto the same node. The outgoing request from curl reaches the pod's eth0 interface according to the routing table inside the container, and then travels through the veth1 tunnel connected to eth0 into the node's root network namespace.

veth1 is connected through the cni0 bridge to the virtual ethernet interfaces (vethX) of the other pods on the node. The bridge asks all connected interfaces whether they own the destination IP address of the request (here 10.42.1.9). After receiving a response, the bridge records the mapping (10.42.1.9 => veth0) and forwards the data. The data finally enters the httpbin pod through the veth0 tunnel.
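
One way to confirm this path on the node is to reuse the ip netns technique from earlier (the namespace name below is the one found above; yours will differ, and the exact routes depend on the network plugin):

# routing table inside the pod: traffic leaves via eth0 toward the cni0 bridge address
ip netns exec cni-607c5530-b6d8-ba57-420e-a467d7b10c56 ip route

# on the node: the local pod subnet (10.42.1.0/24 here) is reachable directly via cni0
ip route | grep cni0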

(Figure: network-within-node)

Communication between pods on different nodes

Communication between pods across nodes is more complicated, and different network plugins handle it differently. Here we pick an easy-to-understand approach to explain it briefly.

The first half of the process is similar to communication between pods on the same node. When the request reaches the bridge, the bridge asks which interface owns the IP but receives no response. The request then falls back to the host's routing process and rises to the cluster level.

At the cluster level there is a routing table that stores the pod IP segment of each node. A pod network segment (Pod CIDR) is allocated to a node when it joins the cluster; for example, the default cluster CIDR in k3s is 10.42.0.0/16, and the nodes are assigned the segments 10.42.0.0/24, 10.42.1.0/24, 10.42.2.0/24, and so on. The node that owns the requested IP can be determined from these Pod CIDRs, and the request is then forwarded to that node.
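
With a default k3s cluster this can be inspected roughly as follows (the grep pattern assumes the 10.42.0.0/16 cluster CIDR mentioned above):

# Pod CIDR assigned to each node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'

# on a node: routes for other nodes' pod subnets point toward those nodes (via the overlay)
ip route | grep 10.42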

(Figure: cross-node)

Summary

Now you should have a preliminary understanding of Kubernetes network communication.

The entire communication process requires the cooperation of various components, such as the pod's network namespace, the pod's ethernet interface eth0, the virtual ethernet interfaces vethX, the cni0 network bridge, and so on. Some of these components correspond one-to-one to pods and share a pod's lifecycle. Although they could be created, associated, and deleted manually, pods are ephemeral resources that are frequently created and destroyed, so that much manual work is unrealistic.

In practice, these tasks are delegated by the container runtime to a network plugin, and the network plugin follows the CNI (Container Network Interface) specification.

What do network plugins do? (A rough manual equivalent of these steps is sketched after the list.)

  • Create a network namespace for the pod (its containers)
  • Create interfaces
  • Create veth pairs
  • Set up networking inside the namespace
  • Set up static routes
  • Configure the ethernet bridge
  • Assign IP addresses
  • Create NAT rules
  • ...
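
As a minimal sketch of what those steps look like when done by hand, assume a node that already has a cni0 bridge and the 10.42.1.0/24 pod subnet from the examples above; all names and addresses here are illustrative, not what a real plugin would choose:

# illustrative only: roughly what a bridge-type plugin does for a new pod
ip netns add demo-pod                                            # create a network namespace for the "pod"
ip link add veth-host type veth peer name eth0 netns demo-pod    # create a veth pair, one end inside the namespace
ip link set veth-host master cni0 up                             # attach the host end to the cni0 bridge
ip netns exec demo-pod ip addr add 10.42.1.99/24 dev eth0        # assign a pod IP (assumed free in this subnet)
ip netns exec demo-pod ip link set eth0 up
ip netns exec demo-pod ip link set lo up
ip netns exec demo-pod ip route add default via 10.42.1.1        # static default route via the bridge address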
