Calico network communication principles revealed

Calico is a pure Layer 3 data center networking solution. It integrates seamlessly with IaaS cloud platforms such as OpenStack and provides controllable IP communication between VMs, containers, and bare-metal hosts. Why is it called a pure Layer 3 solution? Because every packet finds its host or container through an ordinary route lookup, and those routes are synchronized over BGP to all other machines or data centers, which interconnects the whole network.

In short, Calico creates a set of veth pairs on the host. One end of each pair stays on the host and the other end sits in the container's network namespace. A few routes configured in the container and on the host then complete the connection.

1. The Calico network model revealed

Let's work through a concrete example to understand how Calico networking works. Pick any node in the k8s cluster as the test node, enter a container on it (container A), and look at container A's IP address:

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if771: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1440 qdisc noqueue state UP
    link/ether 66:fb:34:db:c9:b4 brd ff:ff:ff:ff:ff:ff
    inet 172.17.8.2/32 scope global eth0
       valid_lft forever preferred_lft forever

The container has been given a /32 host address, which means container A is treated as a single-host network of its own.

Now look at the container's default route:

$ ip route
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link

Here is the puzzle: the routing table shows that 169.254.1.1 is the container's default gateway, yet no network interface anywhere carries this IP address. So what is going on?

Don't panic. Recall that when a packet's destination is not a local address, the kernel looks up the routing table and finds the gateway, obtains the gateway's MAC address through ARP, and then sends the packet with the gateway's MAC as the destination MAC of the frame. The gateway's IP address never appears in any header of the packet. In other words, nobody cares what the gateway's IP address actually is, as long as a MAC address can be found for it and something answers the ARP request.

With that in mind we can continue. Use the ip neigh command to check the local ARP cache:

$ ip neigh
169.254.1.1 dev eth0 lladdr ee:ee:ee:ee:ee:ee REACHABLE
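
If you want to repeat this check on other containers, the two lookups can be scripted. The sketch below is only an illustration, not something Calico ships: the helper names gateway_of and mac_of are made up here, and both simply parse text in the format printed by ip route get and ip neigh.

```shell
# gateway_of: read `ip route get <dst>` output on stdin and print the
# next-hop ("via") address, if the route has one.
gateway_of() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "via") { print $(i + 1); exit } }'
}

# mac_of IP: read `ip neigh` output on stdin and print the link-layer
# address cached for IP (prints nothing if IP is not in the cache).
mac_of() {
  awk -v ip="$1" '$1 == ip {
    for (i = 2; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit }
  }'
}

# Hypothetical usage inside the container (needs iproute2):
#   gw=$(ip route get 10.96.0.1 | gateway_of)
#   ip neigh | mac_of "$gw"
```

Keeping the parsing separate from the ip calls means the helpers can also be run against saved command output.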

This MAC address must have been forced in by Calico, and something even answers ARP for it. But how is that implemented?

Recall the normal case first: the kernel broadcasts an ARP request asking the whole Layer 2 network who owns 169.254.1.1, and the device that owns the address replies with its own MAC address. Here the situation is awkward: neither the container nor the host has this IP address, and even the host-side interface calicba2f87f6bb carries the meaningless MAC address ee:ee:ee:ee:ee:ee. By that logic the container and the host network should not be able to communicate at all. So how does Calico do it?

I won't keep you in suspense: Calico uses the proxy ARP feature of the network interface. Proxy ARP is a variant of ARP. When an ARP request targets an address on another network, the gateway device that receives the request replies to the requester with its own MAC address. For example:

In the figure, the PC sends an ARP request for the MAC address of the server 8.8.8.8. The router (gateway) receives the request and checks the target. Because 8.8.8.8 is not on the local segment (it is on another network), the router replies to the PC with the MAC address of its own interface. From then on, whenever the PC sends traffic to the server, it uses the router's MAC (MAC254 in the figure) as the destination MAC.

So Calico is essentially using proxy ARP to tell a "white lie". Let's confirm it.

Look at the host's interface and routing information:

$ ip addr
...
771: calicba2f87f6bb@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 14
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
...

$ ip route 
...
172.17.8.2 dev calicba2f87f6bb scope link
...

Check whether proxy ARP is enabled:

$ cat /proc/sys/net/ipv4/conf/calicba2f87f6bb/proxy_arp
1
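
To check every interface at once instead of one file at a time, a small loop over the same /proc directory works. This is a sketch, and list_proxy_arp is a name made up for this article. The optional directory argument is an assumption added so the function can be pointed at a copy of the settings.

```shell
# list_proxy_arp [CONF_DIR]: print "<interface> <0|1>" for each interface
# that has a proxy_arp setting. CONF_DIR defaults to the kernel's
# per-interface IPv4 settings directory.
# The body runs in a subshell so its variables do not leak to the caller.
list_proxy_arp() (
  conf_dir="${1:-/proc/sys/net/ipv4/conf}"
  for f in "$conf_dir"/*/proxy_arp; do
    [ -r "$f" ] || continue
    iface=$(basename "$(dirname "$f")")
    printf '%s %s\n' "$iface" "$(cat "$f")"
  done
)

# Hypothetical usage on the host, showing only the Calico interfaces:
#   list_proxy_arp | grep '^cali'
```

On a Calico node you would expect each cali interface to report 1, as calicba2f87f6bb does above.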

If you are still not convinced, you can verify it with a tcpdump packet capture:

$ tcpdump -i calicba2f87f6bb -e -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on calicba2f87f6bb, link-type EN10MB (Ethernet), capture size 262144 bytes


14:27:13.565539 ee:ee:ee:ee:ee:ee > 0a:58:ac:1c:ce:12, ethertype IPv4 (0x0800), length 4191: 10.96.0.1.443 > 172.17.8.2.36180: Flags [P.], seq 403862039:403866164, ack 2023703985, win 990, options [nop,nop,TS val 331780572 ecr 603755526], length 4125
14:27:13.565613 0a:58:ac:1c:ce:12 > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 66: 172.17.8.2.36180 > 10.96.0.1.443: Flags [.], ack 4125, win 2465, options [nop,nop,TS val 603758497 ecr 331780572], length 0
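
The capture above shows ordinary IPv4 traffic. To watch the ARP exchange itself, one option is to restrict the capture to ARP frames and then make the container resolve its gateway again. This is a sketch under assumptions: capture_arp is a name made up here, tcpdump must be installed, and it has to run as root.

```shell
# capture_arp IFACE [COUNT]: capture COUNT ARP frames (default 4) on IFACE,
# printing link-layer headers (-e) and skipping name resolution (-nn).
capture_arp() {
  if [ -z "$1" ]; then
    echo "usage: capture_arp IFACE [COUNT]" >&2
    return 2
  fi
  tcpdump -i "$1" -e -nn -c "${2:-4}" arp
}

# On the host (as root):
#   capture_arp calicba2f87f6bb
# In the container, to force a fresh ARP lookup for the gateway:
#   ip neigh flush dev eth0
#   ping -c 1 -W 1 10.96.0.1   # any off-host destination; no reply needed
```

A request for 169.254.1.1 answered from ee:ee:ee:ee:ee:ee would confirm that the host side of the veth pair is replying on behalf of the gateway address.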

To sum up:

  1. Calico cleverly steers all workload traffic to the special gateway 169.254.1.1, which brings it to the host's calixxx interface, so that all Layer 2 and Layer 3 traffic ends up being forwarded at Layer 3.
  2. ARP replies come from proxy ARP on the host, so ARP broadcasts never leave the host. This suppresses broadcast storms and avoids ARP table bloat.

2. Simulating the network

Now that we understand how Calico networking works, we can simulate it by hand to verify it. The architecture is shown in the figure:

Run the following commands on Host0:

$ ip link add veth0 type veth peer name eth0
$ ip netns add ns0
$ ip link set eth0 netns ns0
$ ip netns exec ns0 ip a add 10.20.1.2/24 dev eth0
$ ip netns exec ns0 ip link set eth0 up
$ ip netns exec ns0 ip route add 169.254.1.1 dev eth0 scope link
$ ip netns exec ns0 ip route add default via 169.254.1.1 dev eth0
$ ip link set veth0 up
$ ip route add 10.20.1.2 dev veth0 scope link
$ ip route add 10.20.1.3 via 192.168.1.16 dev ens192
$ echo 1 > /proc/sys/net/ipv4/conf/veth0/proxy_arp

Run the following commands on Host1:

$ ip link add veth0 type veth peer name eth0
$ ip netns add ns1
$ ip link set eth0 netns ns1
$ ip netns exec ns1 ip a add 10.20.1.3/24 dev eth0
$ ip netns exec ns1 ip link set eth0 up
$ ip netns exec ns1 ip route add 169.254.1.1 dev eth0 scope link
$ ip netns exec ns1 ip route add default via 169.254.1.1 dev eth0
$ ip link set veth0 up
$ ip route add 10.20.1.3 dev veth0 scope link
$ ip route add 10.20.1.2 via 192.168.1.32 dev ens192
$ echo 1 > /proc/sys/net/ipv4/conf/veth0/proxy_arp
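
These commands rely on each host forwarding packets between veth0 and ens192, which Linux only does when IPv4 forwarding is enabled. The walkthrough does not show that setting, so the check below is an added assumption about the lab hosts, and forwarding_enabled is a hypothetical helper name.

```shell
# forwarding_enabled [FILE]: succeed if the IPv4 forwarding switch reads 1.
# FILE defaults to the kernel's global setting.
forwarding_enabled() {
  [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}" 2>/dev/null)" = "1" ]
}

# On each host (as root), enable forwarding only if it is currently off:
#   forwarding_enabled || sysctl -w net.ipv4.ip_forward=1
```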

Test network connectivity:

# Host0
$ ip netns exec ns0 ping 10.20.1.3
PING 10.20.1.3 (10.20.1.3) 56(84) bytes of data.
64 bytes from 10.20.1.3: icmp_seq=1 ttl=62 time=0.303 ms
64 bytes from 10.20.1.3: icmp_seq=2 ttl=62 time=0.334 ms

The experiment works!

The forwarding process in detail:

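  1. Every packet leaving the ns0 network namespace is sent toward the virtual IP address 169.254.1.1, so ns0 sends an ARP request for it.
  2. Proxy ARP is enabled on Host0's veth end, so when it receives the ARP request it replies to ns0 with its own MAC address.
  3. ns0 sends the IP packet whose destination address is ns1's IP.
  4. Because the gateway is an address like 169.254.1.1, the host treats this as Layer 3 routed forwarding. It matches the local route 10.20.1.3 via 192.168.1.16 dev ens192 and sends the packet to the peer, Host1. If BGP were configured, the route's proto field would show bird here.
  5. When Host1 receives the packet for 10.20.1.3, it matches the local route 10.20.1.3 dev veth0 scope link and forwards the packet to veth0, from where it reaches ns1.
  6. The return path works the same way.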

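This experiment gives a clear picture of how Calico forwards data. First, every ns is given a special route, and proxy ARP on the veth turns everything that leaves the ns into Layer 3 routed forwarding. The host's routing table then does the rest. This approach handles Layer 2 and Layer 3 forwarding between namespaces on the same host as well as forwarding across hosts.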
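
When you are done experimenting, the lab setup can be removed again. The sketch below is an addition to the walkthrough: the names run and teardown_lab and the DRY_RUN switch are made up for illustration, and the arguments in the usage lines simply repeat the namespace names and addresses used in the experiment.

```shell
# run: execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# teardown_lab NS REMOTE_IP: undo the manual setup on one host.
#   NS         namespace created on this host (ns0 on Host0, ns1 on Host1)
#   REMOTE_IP  address of the namespace on the other host
teardown_lab() {
  # Remove the static route that pointed at the peer host.
  run ip route del "$2"
  # Deleting the namespace destroys its eth0, and with it the veth0 peer
  # and the host route that was attached to veth0.
  run ip netns del "$1"
}

# Host0 (as root): teardown_lab ns0 10.20.1.3
# Host1 (as root): teardown_lab ns1 10.20.1.2
```

Setting DRY_RUN=1 first prints the commands without running them, which is a cheap way to check the arguments before touching the routing table.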



Origin www.cnblogs.com/ryanyangcs/p/11273040.html