k8s Network Model Analysis

Today we take a brief look at the Kubernetes network model, based on an analysis of flannel in VXLAN mode plus kube-proxy in iptables mode.

1. Docker

Let's first analyze the Docker-level network model. We know that containers rely on the kernel's namespace mechanism to isolate resources, and the network namespace is one of them. So how do containers on the same node communicate with each other? Docker's approach is to bridge virtual network interfaces onto a virtual bridge. The details are explained below.

First, each container lives in its own network namespace by default, which means it only has its own isolated localhost network (or perhaps no network devices at all? to be confirmed) and cannot communicate with the outside. To solve this, Docker creates a veth pair. A veth pair always comes in twos and can be understood as a pair of ports: all data that enters one end comes out the other. Docker then puts one end of the veth pair into the container's namespace and attaches the other end to a virtual bridge, which is in fact the docker0 virtual NIC on the host. We can observe this with the following command:

[wlh@meizu storage]$ brctl show
bridge name    bridge id        STP enabled    interfaces
docker0        8000.02422551422b    no        veth43dc241
                                              veth551eae5
                                              veth844b02c
                                              vethd06364a
                                              vethe95e44c

From the output above you can see that each container's veth device is bridged onto docker0. Traffic between two containers on the same node therefore flows along the path vethA-1 -> vethA-2 -> docker0 -> vethB-2 -> vethB-1.
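
To make the mechanism concrete, the commands below reproduce by hand roughly what Docker does for each container. This is only a minimal sketch: the namespace and interface names (demo-ns, veth-host, veth-ctr) are made up for illustration, it assumes a Linux host with iproute2 and an existing docker0 bridge, and the 172.17.0.x addresses assume Docker's default bridge subnet (on the flannel setup described later, the subnet would be the node's 172.30.x.0/24 range).

# Rough, hand-rolled equivalent of Docker's per-container network setup (run as root).
ip netns add demo-ns                                  # namespace standing in for a container
ip link add veth-host type veth peer name veth-ctr    # create the veth pair
ip link set veth-ctr netns demo-ns                    # one end goes into the "container"
ip link set veth-host master docker0                  # the other end is bridged onto docker0
ip link set veth-host up

# Address the container end and bring it up (addresses assume the default docker0 subnet).
ip netns exec demo-ns ip addr add 172.17.0.100/16 dev veth-ctr
ip netns exec demo-ns ip link set veth-ctr up
ip netns exec demo-ns ip link set lo up

# Traffic now flows veth-ctr -> veth-host -> docker0, e.g. to the bridge itself:
ip netns exec demo-ns ping -c 1 172.17.0.1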

2. Flannel

Docker enables communication between containers on the same node. So how do containers on different nodes of a Kubernetes cluster communicate? This requires support from a third-party plugin, and there are multiple overlay network solutions; here we introduce one of the simpler ones, flannel. flannel currently supports three operating modes: vxlan, udp, and host-gw. The udp mode is similar to vxlan, except that in udp mode the flannel process encapsulates packets in user space, whereas in vxlan mode the kernel handles the encapsulation, so udp is slower. udp mode is therefore not recommended for production and is mainly useful for debugging. host-gw mode requires that every node has a direct route to every other node (see the relevant documentation for details). Here we use vxlan mode to explain how flannel works.

While running, flannel synchronizes data from etcd (the store used by Kubernetes), including the operating mode in use and the subnets of the nodes in the cluster. For example, on my machine the data stored in etcd looks like this:

 1 [wlh@xiaomi xuexi]$ etcdctl ls /kube-fujitsu/network
 2 /kube-fujitsu/network/config
 3 /kube-fujitsu/network/subnets
 4 
 5 [wlh@xiaomi xuexi]$ etcdctl get /kube-fujitsu/network/config
 6 {"Network":"172.30.0.0/16","SubnetLen":24,"Backend":{"Type":"vxlan"}}
 7 
 8 [wlh@xiaomi xuexi]$ etcdctl ls /kube-fujitsu/network/subnets
 9 /kube-fujitsu/network/subnets/172.30.20.0-24
10 /kube-fujitsu/network/subnets/172.30.44.0-24
11 /kube-fujitsu/network/subnets/172.30.83.0-24
12 
13 [wlh@xiaomi xuexi]$ etcdctl get /kube-fujitsu/network/subnets/172.30.83.0-24
14 {"PublicIP":"10.167.226.38","BackendType":"vxlan","BackendData":{"VtepMAC":"b6:c7:0f:7f:66:a7"}}

 

Line 6 here shows the subnet of the entire cluster, 172.30.0.0/16, while lines 9-11 represent the three nodes: each node that joins the cluster gets a subnet allocated to it out of 172.30.0.0/16. The flannel process on each node reads these configurations from etcd and then modifies the startup parameters of the docker daemon on its node, adding a flag such as --bip=172.30.20.1/24, so that all containers started by docker on that node get addresses within this subnet. With these settings, the IP addresses of all containers in the cluster are guaranteed not to overlap.
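
On a typical flannel installation, the flanneld daemon writes the subnet it was allocated into an environment file (commonly /run/flannel/subnet.env), and a helper script turns that file into docker daemon options. The sketch below shows roughly what this looks like for the meizu node, using the values from the etcd output above; the exact file path, variable names, and the way dockerd is wrapped can differ between installations.

$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=172.30.0.0/16
FLANNEL_SUBNET=172.30.20.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=false

# docker is then (re)started with matching options, effectively:
$ dockerd --bip=172.30.20.1/24 --mtu=1450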

Having solved the problem of duplicate container IP addresses, the next issue is cross-node container communication. In vxlan mode, flannel creates a virtual network device on each node called flannel.1, whose MAC address is the VtepMAC shown in the etcd output above. The node's routing table is also modified, as shown below:

[wlh@meizu storage]$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
... (other entries omitted)
172.30.20.0     0.0.0.0         255.255.255.0   U     0      0        0 docker0
172.30.44.0     172.30.44.0     255.255.255.0   UG    0      0        0 flannel.1
172.30.83.0     172.30.83.0     255.255.255.0   UG    0      0        0 flannel.1

Here you can see that traffic destined for 172.30.20.0/24 is sent to docker0; these are in fact the containers on this host. Traffic for containers on the other nodes is routed to the flannel.1 device. flannel receives this data on flannel.1, adds the flannel-defined headers, and sends it out through the node's physical NIC. The encapsulated packet is UDP, its destination address is the physical address of the node where the target container lives, and the default port is 8472 (in udp mode the default port is 8285). In other words, even vxlan mode ultimately sends UDP packets; the difference is that in vxlan mode the encapsulation is done in kernel space, while in udp mode it is done in user space. On the host of the target container, port 8472 is listened on, the flannel headers are stripped off, and the packet is handed to docker0; the container that receives the packet from docker0 sees ordinary traffic and is unaware of all this underlying processing.
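
A few commands are handy for seeing this in action on a node (eth0 below is assumed to be the node's physical uplink; substitute your own interface name):

# VXLAN details of the flannel device: VNI, UDP destination port, local VTEP address.
$ ip -d link show flannel.1

# The encapsulated overlay traffic as it leaves the physical NIC.
$ sudo tcpdump -ni eth0 udp port 8472

# Which remote VTEP (node IP) each peer MAC (the VtepMAC stored in etcd) maps to.
$ bridge fdb show dev flannel.1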

3. kube-proxy

We know that Kubernetes has the concept of a Service, which has its own IP address. So how is access to a Service distributed to its backend pods? This work is done by kube-proxy, which has three operating modes: userspace (older), iptables (faster), and ipvs (experimental at the time of writing). The userspace mode is the early model; it essentially uses kube-proxy itself as a proxy: all traffic to a Service is forwarded to the kube-proxy component, which then redistributes the requests to pods. This obviously becomes a bottleneck for a large cluster. The iptables mode implements request distribution by modifying iptables rules. I have not looked into ipvs mode.
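
As an aside, you can ask a running kube-proxy which proxier it actually selected; it serves this on its metrics port, which is 10249 by default (adjust the port if your deployment changed it):

$ curl -s http://localhost:10249/proxyMode
iptables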

Below, the iptables mode is illustrated with a concrete example. First, create the Deployment and Service listed below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  selector:
    matchLabels:
      name: nginx1
  replicas: 3
  template:
    metadata:
      labels:
        name: nginx1
    spec:
      nodeName: meizu
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    name: nginx1
spec:
  ports:
  - port: 4432
    targetPort: 80
  selector:
    name: nginx1

[wlh@xiaomi xuexi]$ kubectl get pod -o wide|grep nginx
nginx-cb648c7f5-c8h26       1/1     Running   0          24m    172.30.20.7   meizu    <none>           <none>
nginx-cb648c7f5-pptl9       1/1     Running   0          40m    172.30.20.6   meizu    <none>           <none>
nginx-cb648c7f5-zbsvz       1/1     Running   0          24m    172.30.20.8   meizu    <none>           <none>

[wlh@xiaomi xuexi]$ kubectl get svc -o wide
nginx        ClusterIP   10.254.40.119   <none>        4432/TCP   38m    name=nginx1
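
Before looking at the iptables rules, a quick sanity check from any node in the cluster (the ClusterIP is only routable inside the cluster, so this will not work from an external machine):

$ curl -s -o /dev/null -w '%{http_code}\n' http://10.254.40.119:4432
200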

Here a Service is created that exposes a simple nginx service on port 4432. After these resources are created, kube-proxy adds the following rules to the iptables NAT table on the node:

[wlh@meizu storage]$ sudo iptables-save|grep nginx
-A KUBE-SERVICES ! -s 10.254.0.0/16 -d 10.254.40.119/32 -p tcp -m comment --comment "default/nginx: cluster IP" -m tcp --dport 4432 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.254.40.119/32 -p tcp -m comment --comment "default/nginx: cluster IP" -m tcp --dport 4432 -j KUBE-SVC-4N57TFCL4MD7ZTDA
[wlh@meizu storage]$ sudo iptables-save|grep 0x4000/0x4000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT

 

The first rule applies a mark: every packet destined for 10.254.40.119:4432 (the nginx Service), except packets whose source IP is within 10.254.0.0/16, is tagged with the 0x4000 mark. In subsequent processing, packets carrying this mark are MASQUERADEd in the POSTROUTING chain of the nat table, which is effectively SNAT: the packet's source IP is rewritten to the address of the host's physical NIC before it leaves the machine. If the container's IP address were used directly as the source, the physical network would obviously not recognize it.
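
The second KUBE-SERVICES rule sends the same traffic to the per-service chain KUBE-SVC-4N57TFCL4MD7ZTDA, and that chain is where the actual load balancing across the three pods happens. The rules below are an illustrative sketch of what such a chain typically looks like in iptables mode, not output captured from this cluster: the KUBE-SEP chain names are generated hashes and will differ everywhere, but the structure (random selection with decreasing probability, then a DNAT to one pod IP) is how kube-proxy builds it; the pod IP shown matches the kubectl output above.

# Illustrative sketch only; the KUBE-SEP chain hashes are placeholders.
-A KUBE-SVC-4N57TFCL4MD7ZTDA -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-AAAAAAAAAAAAAAAA
-A KUBE-SVC-4N57TFCL4MD7ZTDA -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-BBBBBBBBBBBBBBBB
-A KUBE-SVC-4N57TFCL4MD7ZTDA -j KUBE-SEP-CCCCCCCCCCCCCCCC

# Each KUBE-SEP chain DNATs to one backend pod, e.g.:
-A KUBE-SEP-AAAAAAAAAAAAAAAA -p tcp -m tcp -j DNAT --to-destination 172.30.20.6:80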

 


Origin www.cnblogs.com/elnino/p/11369760.html