Fun Talk about Network Protocols - Lecture 28 | Cloud Network Isolation with GRE and VXLAN: Even Living in a Shared Community, Privacy Must Be Protected

This blog series is based on the GeekTime course "Fun Talk about Network Protocols".

For isolation inside a cloud platform, the strategy we have relied on so far is VLAN. But as we have also said, the VLAN ID is only 12 bits long, for a total of 4096 VLANs. That seemed like plenty when the protocol was designed, but today it is clearly not enough. What can we do?

One way is to modify the protocol. This is usually not feasible: once a protocol has become a standard, the software running on thousands upon thousands of devices follows that rule. If we change it now, who is going to inform all of those programs one by one? Clearly, that is an impossible project.

The other way is to extend it. We add an extra header on top of the original packet format, containing an ID large enough to distinguish tenants, while keeping the outer packet format as traditional as possible so it stays compatible with existing equipment. Wherever we need to distinguish tenants, we deploy special software that understands this special packet format.

This idea is very similar to the tunnel concept we discussed in Lecture 22. Remember the story of the road trip that crosses to Hainan Island by ferry? In that lecture the extra header was mainly used for encryption; the header we need now is used to distinguish tenants.

The network built from the underlying physical devices is called the Underlay network, and the network built on top of it for virtual machines and the cloud is called the Overlay network. An Overlay network is a virtualized network implemented on top of the physical network. In this lecture we focus on two Overlay network technologies.

GRE

The first technology is GRE, short for Generic Routing Encapsulation, an IP-over-IP tunneling technology. It wraps an IP packet inside a GRE packet and adds an outer IP header: the packet is encapsulated at one end of the tunnel, transmitted along the path, and decapsulated at the other end. You can think of the tunnel as a virtual, point-to-point connection.
[Figure: GRE header format]
As the figure shows, the first 32 bits of the GRE header are mandatory and the rest are optional. A few flag bits at the front indicate which optional fields are present. Among the optional fields is a very important one, the key field: a 32-bit value that usually stores a Tunnel ID used to distinguish tenants. 32 bits is far more than any cloud platform will ever need.

The format shown below it is a GRE header variant designed specifically for network virtualization, called NVGRE. It carries a 24-bit virtual network ID, which is also entirely sufficient.

In addition, GRE needs a place where packets are encapsulated and decapsulated. This is usually a router, or a Linux machine with routing capability.

GRE tunnel transmission process

With a GRE tunnel, transmission works as shown in the figure below. There are two network segments and two routers that communicate through a GRE tunnel. Once the tunnel is established, each side gains an extra tunnel port used for encapsulation and decapsulation.
[Figure: GRE tunnel between two routers connecting 192.168.1.0/24 and 192.168.2.0/24]

  1. Host A sits on the left-hand network with IP address 192.168.1.102 and wants to reach host B, which sits on the right-hand network with IP address 192.168.2.115. So it sends a packet with source address 192.168.1.102 and destination address 192.168.2.115. Since the destination is on a different network segment, the default route applies, and the packet is sent to the default gateway 192.168.1.1, the router on the left.
  2. According to the routing table on the left router, traffic to the 192.168.2.0/24 segment should go through the GRE tunnel, entering it via the tunnel port Tunnel0.
  3. At the tunnel endpoint the packet is encapsulated: a GRE header is added outside the inner IP header (for NVGRE, the GRE header is added outside the inner MAC header), and then an outer IP header carrying the routers' external addresses is added, with source IP 172.17.10.10 and destination IP 172.16.11.10. The packet is then sent out of the physical port E1 onto the public network.
  4. On the public network the packet travels hop by hop from router to router, forwarded entirely according to the outer public IP addresses.
  5. When the packet reaches the peer router, it arrives at the peer's Tunnel0, where decapsulation begins: the outer IP header is removed, and the inner packet is forwarded out of port E3 to server B according to the routing table.
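
To make this concrete, here is a minimal sketch of how the left router could be configured if it were a Linux box using iproute2. The interface names and the tunnel's inner address are assumptions for illustration; only the addresses from the figure above come from the text.

# Sketch of the left router (assumed names: gre1 plays the role of Tunnel0).
ip tunnel add gre1 mode gre local 172.17.10.10 remote 172.16.11.10 ttl 255
ip link set gre1 up
ip addr add 10.0.0.1/30 dev gre1              # assumed point-to-point address on the tunnel
# Route the remote segment into the tunnel; GRE encapsulation and decapsulation
# happen automatically at the two tunnel endpoints.
ip route add 192.168.2.0/24 dev gre1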

GRE technology shortcomings

From how GRE works, we can see that it solves the problem of too few VLAN IDs by tunneling. However, GRE itself still has some shortcomings.

The first is the number of tunnels. GRE is a point-to-point tunnel: with three networks, a tunnel must be established between every pair of them. As the number of networks grows, the number of tunnels grows quadratically (n networks need n(n-1)/2 tunnels).
[Figure: full mesh of GRE tunnels between networks]
Second, GRE does not support multicast, so when a virtual machine in a network sends a broadcast frame, GRE has to replicate it to every node that has a tunnel to this node, one tunnel at a time.

Another problem is that many firewalls and Layer 3 network devices still cannot parse GRE, so they cannot properly filter or load-balance GRE-encapsulated packets.

VXLAN

The second Overlay technology is VXLAN. Unlike GRE, which works at Layer 3, VXLAN encapsulates starting at Layer 2: it wraps a VXLAN header around the original Ethernet frame. The VXLAN header contains a 24-bit VXLAN ID, which is plenty. Outside the VXLAN header, a UDP header, an IP header, and an outer MAC header are added.
[Figure: VXLAN encapsulation: outer MAC, outer IP, UDP, VXLAN header, inner Ethernet frame]
Like GRE, VXLAN needs a place where packets are encapsulated and decapsulated. The component that performs this role is called a VTEP (VXLAN Tunnel Endpoint).

A VTEP acts like the steward of the virtual machine network. Each physical machine can run one VTEP. Every virtual machine registers with this VTEP steward when it starts, so each VTEP knows which virtual machines are registered with it. When virtual machines communicate across VTEPs, the VTEPs act as their agents, performing the encapsulation and decapsulation.

Unlike GRE's end-to-end tunnels, VXLAN is not point-to-point: it supports using multicast to locate the target machine rather than always going from one fixed end to the other.

When a VTEP starts, it joins a multicast group via the IGMP protocol, much like joining a mailing list or a WeChat group: every message sent to the group is received by all members. And whenever a virtual machine on the physical machine starts, the VTEP learns that a new VM has come online and belongs to it.
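
As a rough sketch of what a VTEP might look like on a Linux host using the kernel's VXLAN driver (the interface name, multicast group, and underlying device here are assumptions for illustration, not values from the text):

# Assumed names: a VXLAN interface with VNI 101 that joins multicast group 239.1.1.1 via eth0.
# The kernel joins the group through IGMP; VXLAN uses UDP destination port 4789.
ip link add vxlan101 type vxlan id 101 group 239.1.1.1 dev eth0 dstport 4789
ip link set vxlan101 up
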
[Figure: three physical machines, each with a VTEP; virtual machines 1, 2, and 3 share VXLAN ID 101]
As shown in the figure, virtual machines 1, 2, and 3 belong to the same tenant in the cloud, so they are assigned the same VXLAN ID, 101. From the cloud console you can see their IP addresses, so you try to ping virtual machine 2 from virtual machine 1.

Virtual machine 1 finds that it does not know the MAC address of virtual machine 2, so it cannot send the packet yet and instead sends an ARP broadcast.
[Figure: virtual machine 1 sends an ARP request, which VTEP1 encapsulates in VXLAN and multicasts]
When the ARP request reaches VTEP1, VTEP1 realizes: one of my virtual machines wants to reach a virtual machine that is not under my management, and it needs a MAC address that I do not know either. What now?

Then VTEP1 remembers: didn't I join a WeChat group? I can @everyone in the group and ask who owns virtual machine 2. So VTEP1 encapsulates the ARP request in VXLAN and multicasts it.

Naturally, VTEP2 and VTEP3 in the group both receive the message, unpack the VXLAN packet, and see that there is an ARP request inside.

VTEP3 broadcasts it locally for a while and nobody answers, so virtual machine 2 is not under its management.

VTEP2 broadcasts it locally and virtual machine 2 replies: virtual machine 2 is mine, and here is its MAC address. Through this exchange VTEP2 also learns something: virtual machine 1 is behind VTEP1, so to reach virtual machine 1 in the future, go to VTEP1.
[Figure: VTEP2 returns the ARP reply to VTEP1 in a unicast VXLAN packet]
VTEP2 encapsulates the ARP reply in VXLAN. This time there is no need to multicast; it is sent directly back to VTEP1.

VTEP1 unpacks the VXLAN packet, sees that it is an ARP reply, and passes it to virtual machine 1. Through this exchange VTEP1 also learns: virtual machine 2 is behind VTEP2, so to reach virtual machine 2 later, go to VTEP2.

Virtual machine 1's ARP request has been answered and it now knows virtual machine 2's MAC address, so it can send the actual packet.
[Figure: unicast traffic between virtual machine 1 and virtual machine 2 through VTEP1 and VTEP2]
The packet from virtual machine 1 to virtual machine 2 arrives at VTEP1, which of course remembers what it just learned: to reach virtual machine 2, go to VTEP2. So it encapsulates the packet and sends it out.

When the packet reaches VTEP2, VTEP2 removes the VXLAN encapsulation and forwards it to virtual machine 2.

When the reply from virtual machine 2 reaches VTEP2, it likewise remembers what it just learned: to reach virtual machine 1, go to VTEP1. So the reply is encapsulated in VXLAN, with the IP addresses of VTEP2 and VTEP1 in the outer header, and sent out.

When the packet reaches VTEP1, VTEP1 removes the VXLAN encapsulation and forwards it to virtual machine 1.
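
If the VTEP were the Linux kernel VXLAN interface sketched earlier, the MAC-to-VTEP mappings learned during this exchange would land in its forwarding database and could be inspected; the interface name and the sample entry below are illustrative assumptions.

bridge fdb show dev vxlan101
# An entry of roughly this shape would mean: to reach this MAC, send the
# VXLAN-encapsulated packet to the VTEP at the assumed address 10.10.10.2 (VTEP2):
# 52:54:00:12:34:56 dev vxlan101 dst 10.10.10.2 self
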
[Figure: complete VXLAN communication flow between virtual machine 1 and virtual machine 2]
With GRE and VXLAN we can overcome the VLAN limitation in cloud computing. But how do we integrate these technologies into the cloud platform?

How to integrate GRE and VXLAN technologies into the cloud platform?

Remember the story where we moved everything in your dorm room onto a single physical machine?
[Figure: virtual machines, router, and DHCP server connected to the virtual switch br0 on one physical machine]
The virtual machines are your computers; the router and DHCP server play the role of the home router, or the dorm leader's computer. The external port connects to the Internet, and all the computers connect through internal ports to a switch, br0. When a virtual machine wants to reach the Internet, it goes through br0 to the router, which NATs the request and forwards it to the public network.

Then something unfortunate happens: the dorm has a falling-out and you are split across three rooms. Mapping this onto the picture above, your dorm leader, that is, the router, sits on one physical machine, while the other roommates, that is, the virtual machines, sit on two other physical machines. The once-complete br0 is cut apart, one isolated segment per room.
[Figure: the bridge split across three physical machines, reconnected by tunnels between the br1 switches]
But only the dorm leader has a public network port to reach the Internet, so you secretly dig tunnels between the three rooms and connect their br0 switches with cables running through those tunnels. To the other roommates' computers and the dorm leader's computer it looks as if they are all still attached to the same br0; in reality the traffic is forwarded through the cables in your tunnels.

Why is another virtual switch, br1, needed? Splitting the connection between the virtual machine side and the physical machine side into two layers, with br1 in between, means the tunnel in the middle can be dug in different ways, for example with GRE or with VXLAN.

Once OpenvSwitch is used as the virtual switch, we can take advantage of OpenvSwitch's Tunnel function and Flow function.

Tunnel function

OpenvSwitch supports three types of tunnels: GRE, VXLAN, and IPsec_GRE. When OpenvSwitch is used, the virtual switch itself acts as the endpoint for GRE or VXLAN encapsulation.
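
For illustration, tunnel ports on an OpenvSwitch bridge are created with ovs-vsctl. The bridge name, port names, and peer address below are placeholders, not commands from the original text:

# Sketch: add a GRE or VXLAN tunnel port to br1, pointing at a peer physical machine (assumed IP).
ovs-vsctl add-port br1 gre0 -- set interface gre0 type=gre options:remote_ip=10.10.10.2
ovs-vsctl add-port br1 vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=10.10.10.2 options:key=flow
# key=flow lets the flow table set and match the Tunnel ID per packet (set_tunnel / tun_id),
# which is exactly what the flow rules below rely on.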

Let's simulate the following network topology to see how the tunnel works.
[Figure: three physical machines, each with two virtual machines belonging to two tenants, connected through br0 and br1 with GRE tunnels between the br1 switches]
Each of the three physical machines hosts two virtual machines belonging to two different tenants, so they are given different VLAN tags and cannot talk to each other locally. But the same tenant's virtual machines on different physical machines should be able to communicate, so they are connected together through the GRE tunnels.
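
As a small illustration of the per-tenant VLAN tags on the local side, each virtual machine's tap device could be attached with its tenant's tag; the port and bridge names here are assumptions:

# Sketch: attach each VM's tap device to the local bridge with the tenant's VLAN tag.
ovs-vsctl add-port br0 tap-vm-a tag=1    # tenant A's virtual machine
ovs-vsctl add-port br0 tap-vm-b tag=2    # tenant B's virtual machine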

Flow function

Next, all the Flow Table rules are configured on br1. Each br1 has three ports (network cards): port 1 faces the inside of the machine, while ports 2 and 3 face the outside.

Let's look at the Flow Table design in detail.
[Figure: Flow Table pipeline on br1 — Table 0 classifies traffic; Tables 1, 2, 20, and 21 handle outgoing packets; Tables 3 and 10 handle incoming packets]

  1. Table 0 is the entry point for all traffic. All traffic entering br1 is split into two kinds: traffic coming into this physical machine and traffic going out of it.
    Traffic arriving from port 1 is outgoing traffic; it is all handed to Table 1.
ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=1 in_port=1 actions=resubmit(,1)"

Traffic arriving from ports 2 and 3 is traffic coming into this physical machine; it is all handed to Table 3.

ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=1 in_port=2 actions=resubmit(,3)"
ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=1 in_port=3 actions=resubmit(,3)"

If there is no match, it is discarded by default.

ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=0 actions=drop"
  2. Table 1 processes all outgoing packets. There are two cases: unicast and multicast.
    Unicast is handed to Table 20.
ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=1 table=1 dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)"

For multicast, it is handled by Table 21.

ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=1 table=1 dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,21)"
  3. Table 2 follows Table 1: if a packet is neither unicast nor multicast, it is dropped by default.
ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=0 table=2 actions=drop"
  4. Table 3 processes all incoming packets; its job is to convert the Tunnel ID into a VLAN ID.
    If no Tunnel ID matches, the packet is dropped by default.
ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=0 table=3 actions=drop"

If a Tunnel ID matches, it is converted to the corresponding VLAN ID, and processing jumps to Table 10.

ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=1 table=3 tun_id=0x1 actions=mod_vlan_vid:1,resubmit(,10)"
ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=1 table=3 tun_id=0x2 actions=mod_vlan_vid:2,resubmit(,10)"
  5. For incoming packets, Table 10 performs MAC address learning, which is exactly what a Layer 2 switch should do. After learning, the packet is sent out of port 1.
ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=1 table=10 actions=learn(table=20,priority=1,hard_timeout=300,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1"

Table 10 learns MAC addresses, and the learning results are written into Table 20, which is called the MAC learning table.

NXM_OF_VLAN_TCI is the VLAN tag. In the MAC learning table, every entry applies only to one particular VLAN; different VLANs have separate learning entries, and each learned entry records which VLAN it is for.

NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[] means that the source MAC address of the current packet is stored as dl_dst in the learned entry. This is because a switch learns from incoming packets: when a MAC address comes in from a certain port, the switch should remember that future packets destined for that MAC must go out through that port. So the source MAC address becomes the destination MAC address of the learned rule, because the learned rule is used for sending.

load:0->NXM_OF_VLAN_TCI[] means that when Table 20 later sends a packet out of the physical machine, the VLAN tag is set to 0; so after learning, Table 20 contains actions=strip_vlan.

load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[] means that when Table 20 sends a packet out of the physical machine, the Tunnel ID is set to the same value the packet came in with; so after learning, Table 20 contains set_tunnel.

output:NXM_OF_IN_PORT[] records which port to send to. For example, if the packet came in from port 2, then after learning, Table 20 contains output:2.
[Figure: the MAC learning rule on the left and the resulting learned entry in Table 20 on the right]
So, as the figure shows, the MAC learning rule on the left produces learned entries like the one on the right, and those entries are placed into Table 20.
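
As an illustration of that result (an assumption based on the rule above, not an entry from the original text), the learned entry for a MAC address seen on VLAN 1 via port 2 would behave as if someone had added the following flow by hand; the MAC address is a made-up placeholder:

ovs-ofctl add-flow br1 "table=20 priority=1 hard_timeout=300 dl_vlan=1 dl_dst=fa:16:3e:aa:bb:cc actions=strip_vlan,set_tunnel:0x1,output:2"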

  6. Table 20 is the MAC address learning table. If a matching entry exists, the packet is handled according to that entry; if there is no match, no MAC learning has happened yet for this destination, so the packet has to be broadcast, and it is handed to Table 21.
ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=0 table=20 actions=resubmit(,21)"
  7. Table 21 handles multicast packets.
    If no VLAN ID matches, the packet is dropped by default.
ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=0 table=21 actions=drop"

If a VLAN ID matches, it is converted into the corresponding Tunnel ID, and the packet is sent out of both port 2 and port 3 to simulate multicast.

ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=1 table=21 dl_vlan=1 actions=strip_vlan,set_tunnel:0x1,output:2,output:3"
ovs-ofctl add-flow br1 "hard_timeout=0 idle_timeout=0 priority=1 table=21 dl_vlan=2 actions=strip_vlan,set_tunnel:0x2,output:2,output:3"
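
Once all of these rules are in place, the pipeline can be checked end to end; a small sketch, assuming the bridge is named br1 as above:

# Dump the whole flow pipeline, a single table, or per-table statistics to verify the rules.
ovs-ofctl dump-flows br1
ovs-ofctl dump-flows br1 table=20
ovs-ofctl dump-tables br1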

Summary

Well, that is it for this lecture. Let's summarize.

  • To isolate different tenants' networks and get around the limited number of VLANs, we use the Overlay approach; GRE and VXLAN are the common choices.
  • GRE is a point-to-point tunnel mode, while VXLAN supports multicast. Both require encapsulation and decapsulation at a tunnel endpoint to achieve communication across physical machines.
  • OpenvSwitch can act as that tunnel endpoint, using flow table rules to isolate the virtual machine network and translate between it and the physical network.

Finally, I will leave you two thinking questions.

  1. Although VXLAN can use multicast, if there are many virtual machines, broadcast storms can still be a serious problem in the Overlay network. Can you think of a way to solve this?
  2. Clouds based on virtual machines are rather complex, and a virtual machine's network card goes through many layers of translation before reaching the physical network. There is a cloud model more lightweight than virtual machines. Do you know what it is?

Origin blog.csdn.net/aha_jasper/article/details/105575763