Fun Talk about Network Protocols, Lecture 21 | Data Center: I'm a Property Developer, Building Villas on My Own Land

This blog series is based on the Geek Time course "Fun Talk about Network Protocols".


Whether you are reading news, placing orders, watching videos, or downloading files, the final destination is a data center. We have learned a great deal about network protocols and networking so far. Are you curious what a data center actually looks like?

Data Center

The data center is a hodgepodge: almost all the knowledge we have learned so far comes into play here.

When we discussed the office network, we saw that an office has many computers, and to reach the external network their traffic must pass through something called a gateway, which is often a router.

A data center also contains many computers, but they are not like the laptops or desktops in our office. Inside the data center are servers, mounted in shelves called racks.

The entrance and exit of the data center are also routers. Because they sit at the border of the data center, like the border of a country, they are called border routers. For high availability, there are usually multiple border routers.

At home we generally connect to only one operator's network. For high availability, so that service can continue through another operator when one operator has a problem, the border routers of a data center connect to multiple operators' networks.

Since these are routers, they need to run a routing protocol. In routing terms, a data center is often an autonomous system (AS). Machines in the data center need to access websites outside, and they also provide services to the outside world; in both directions, internal and external routing information is exchanged through the BGP protocol. This is the multi-line BGP we often hear about.

If the data center is very simple and has only a few machines, then, just like at home or in a dorm, all the servers can connect directly to the router. But a data center usually has a great many machines. When one rack is filled with servers, a switch is needed to connect them so they can communicate with each other.

These switches are usually placed at the top of the rack, so they are called TOR (Top Of Rack) switches. This layer of switches is called the access layer (Access Layer). Note that this access layer is not the same concept as the access layer of an application described earlier.
When one rack is not enough, multiple racks are needed, and another layer of switches is required to connect the racks together. These switches need higher performance and greater bandwidth; they are called aggregation layer switches (Aggregation Layer).

Every connection in the data center must consider high availability. The first thing to consider: if a machine has only one network card, connected by one cable to the TOR switch, then if that card breaks or the cable is accidentally unplugged, the machine goes offline. So at least two network cards with two cables should be plugged into the TOR switch, and the two network cards must work as if they were a single card. This is commonly called network card bonding.

This requires both the server and the switch to support the LACP (Link Aggregation Control Protocol). Through LACP they negotiate with each other, aggregating multiple network cards into one logical card and multiple cables into one logical cable. Traffic can then be load-balanced across the links, or the links can act as active and standby for high availability.
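To keep the frames of one flow in order, a bond typically hashes each flow onto one member link. A minimal sketch, similar in spirit to the layer-2 transmit hash used by Linux bonding (the function name and MAC addresses here are illustrative, not a real driver API):

```python
def pick_bond_link(src_mac: str, dst_mac: str, num_links: int) -> int:
    """Pick one member link of the bond for a flow by XOR-hashing the
    source and destination MAC addresses. All frames of the same flow
    hash to the same link, so frame ordering within a flow is preserved."""
    s = int(src_mac.replace(":", ""), 16)
    d = int(dst_mac.replace(":", ""), 16)
    return (s ^ d) % num_links

# Two bonded links: every flow deterministically lands on link 0 or 1.
link = pick_bond_link("00:1a:2b:3c:4d:5e", "00:aa:bb:cc:dd:ee", 2)
```

Different flows spread across both links, which is how the bond gets both load balancing and failover from the same pair of cables.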
The network cards are now highly available, but the switch can still fail. If a rack has only one switch and it goes down, the entire rack loses connectivity. So the TOR switch also needs high availability, and the links between the access layer and the aggregation layer must be redundant rather than single lines.

The most traditional method is to deploy two access switches and two aggregation switches. Each server connects to both access switches, and each access switch connects to both aggregation switches. Of course, this forms a loop, so STP must be enabled to break the ring. But then the two aggregation switches can only work as one active and one standby; as we learned with STP, only one path will carry traffic.
Switches support a technology called stacking, so another method is to combine multiple switches into one logical switch. The server connects to multiple access switches via multiple cables, and each access switch in turn connects via multiple cables to multiple aggregation switches, which are joined into one logical switch through a vendor's private stacking protocol. This creates an active-active connection.
Because bandwidth demands are higher and a failure here has a wider impact, stacking two switches may not be enough; there may be more, for example four switches stacked into one logical switch.

The aggregation layer connects a large number of computing nodes into a cluster. Within this cluster, the servers communicate with each other over Layer 2. This area is often called a POD (Point Of Delivery), and sometimes an availability zone (Availability Zone).

When there are too many nodes for one availability zone, multiple availability zones must be connected together. The switches connecting multiple availability zones are called core switches.
Core switches require higher throughput and stricter high availability. They certainly need stacking, but stacking alone often cannot meet the throughput requirement, so multiple groups of core switches must be deployed. For high availability, the core and aggregation switches are fully interconnected.

But then the old problem returns: what about loops?

One way is to put different availability zones on different Layer 2 networks, allocating a different network segment to each. The aggregation and core layers then communicate over Layer 3. Since the Layer 2 networks are not in the same broadcast domain, there is no Layer 2 loop problem. A Layer 3 loop is not a problem, because the routing protocol simply selects the best path. Why can Layer 3 tolerate loops while Layer 2 cannot? Recall what happens in a Layer 2 loop: Ethernet frames carry no TTL, so a broadcast frame circulates forever and causes a broadcast storm, whereas an IP packet's TTL eventually expires and the looping packet is dropped.
As shown in the figure, the interior routing protocol OSPF finds the best paths between the core layer and the aggregation layer, and ECMP (equal-cost multi-path) routing load-balances across multiple paths, providing both throughput and high availability.
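The essence of ECMP is that a router hashes each flow's identifier to pick one of several equal-cost next hops, so packets of one flow stay on one path while different flows spread across all paths. A minimal sketch (the IP addresses and switch names below are illustrative):

```python
import zlib

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, next_hops):
    """Hash the flow's 4-tuple so every packet of a flow takes the same
    equal-cost path, while different flows spread across all paths."""
    key = f"{src_ip}-{dst_ip}-{src_port}-{dst_port}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

# Four equal-cost uplinks from an aggregation switch toward the core.
paths = ["core-sw-1", "core-sw-2", "core-sw-3", "core-sw-4"]
hop = ecmp_next_hop("10.0.1.5", "10.0.9.7", 51324, 443, paths)
```

If one next hop fails, it is removed from the list and flows simply rehash onto the remaining paths, which is where the high availability comes from.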

However, as the number of machines in the data center grows, especially with cloud computing and big data, clusters become very large, and they all demand to sit in one Layer 2 network. This requires the Layer 2 interconnection to rise from the aggregation layer up to the core layer; that is, everything below the core is Layer 2 interconnected, all in one broadcast domain. This is often called the large Layer 2.
If the east-west traffic of the large Layer 2 is not heavy and the core switches are few, stacking is enough. But if the traffic is heavy and stacking no longer suffices, multiple groups of core switches must be deployed and fully interconnected with the aggregation layer. Stacking only solves the loop problem within one core switch group; the full interconnection between groups needs another mechanism.

If that mechanism is STP, deploying multiple core groups cannot scale east-west traffic, because only one group will still carry traffic.

So Layer 2 introduced TRILL (Transparent Interconnection of Lots of Links), a protocol for transparently interconnecting many links. The basic idea: since loops are a problem at Layer 2 but not at Layer 3, simulate Layer 3's routing capability at Layer 2.

A switch running the TRILL protocol is called an RBridge, a bridge device with routing and forwarding capabilities; but its routing is based on MAC addresses, not IP.

RBridges run a link-state protocol. Remember this kind of routing protocol? Through it, each RBridge learns the topology of the entire Layer 2 network and knows via which bridge each MAC address should be reached; it can also compute shortest paths and use equal-cost paths for load balancing and high availability.
The TRILL protocol adds its own header, plus an outer MAC header, outside the original MAC header. The Ingress RBridge in the TRILL header is like the source IP address in an IP header, and the Egress RBridge is like the destination IP address. These two addresses are end-to-end and do not change during forwarding. The outer MAC carries the next-hop bridge, just as the next hop in IP routing is also expressed with a MAC address.

In the process shown in the figure, a packet is sent from host A to host B, passing through RBridge 1, RBridge 2, RBridge X, and so on, until it reaches RBridge 3. The packet received by RBridge 2 has two layers. The inner layer contains the traditional MAC addresses of host A and host B, plus the inner VLAN.

Outside that, a TRILL header is added, stating that the packet entered at RBridge 1 and must exit at RBridge 3, and carrying a hop count analogous to the TTL in a Layer 3 IP header. Outside that again is the outer MAC header: destination MAC RBridge 2, source MAC RBridge 1, plus the outer VLAN.

When RBridge 2 receives this packet, it first checks whether the destination MAC is its own. If so, it checks whether it is the Egress RBridge, that is, the last hop. If not, it looks up the next hop, RBridge X, much like a route lookup, and forwards the packet.

In the packet sent on by RBridge 2, the inner information is unchanged, and the TRILL header still says the packet entered at RBridge 1 and must exit at RBridge 3, but the hop count is decreased by 1. The outer destination MAC becomes RBridge X, and the outer source MAC becomes RBridge 2.

Forwarding continues like this until RBridge 3, which strips the outer headers and delivers the inner packet to host B.

Doesn't this process look very much like IP routing?
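The walkthrough above can be sketched as a small simulation. The RBridge names, the topology, and the next-hop table are illustrative assumptions, not part of the TRILL specification; what the sketch shows is the invariant: the inner frame never changes, while each RBridge rewrites only the outer MAC header and decrements the hop count, exactly like IP routing.

```python
# Next-hop table each RBridge would compute from the link-state protocol:
# "to reach egress RB3, forward to ...". The topology is illustrative.
NEXT_HOP = {
    "RB1": {"RB3": "RB2"},
    "RB2": {"RB3": "RBX"},
    "RBX": {"RB3": "RB3"},
}

def forward(packet, current_rb):
    """Forward a TRILL-encapsulated packet one hop at a time until it
    reaches the Egress RBridge, which strips the outer headers."""
    trill = packet["trill"]
    if current_rb == trill["egress"]:
        return packet["inner"]            # last hop: decapsulate
    if trill["hops"] == 0:
        raise RuntimeError("hop count exhausted, packet dropped")
    nh = NEXT_HOP[current_rb][trill["egress"]]
    trill["hops"] -= 1                    # like decrementing the IP TTL
    packet["outer"] = {"src": current_rb, "dst": nh}
    return forward(packet, nh)

pkt = {
    "inner": {"src": "host-A", "dst": "host-B", "vlan": 10},
    "trill": {"ingress": "RB1", "egress": "RB3", "hops": 8},
    "outer": {"src": "RB1", "dst": "RB2"},
}
# Start where the text's walkthrough does: the packet arriving at RBridge 2.
delivered = forward(pkt, "RB2")
```

After the run, the inner frame is delivered to host B unchanged, and the hop count has been decremented once per intermediate RBridge.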

Broadcast packets in the large Layer 2 are handled by a distribution tree technique. We know that STP takes a graph with loops and removes edges to form a single tree; a distribution tree takes the same graph and forms multiple trees. Different trees carry different VLANs: some broadcast packets flood along VLAN A's tree and some along VLAN B's, achieving load balancing and high availability.
Outside the core switches sit the border routers. At this point, the hierarchy from the servers to the data center boundary is clear.

Attached to the core switches there are often security devices, such as intrusion detection and DDoS protection; these form the barrier that protects the entire data center from outside attacks. There are often also load balancers at the core switches, whose principle was covered in a previous chapter.

In some data centers there is also a storage network connecting SAN and NAS storage devices. Newer cloud computing setups, however, do not use traditional SAN and NAS; instead they use software-defined storage deployed on x86 machines. Such storage nodes are themselves servers and can share a rack with compute nodes, so no separate storage network is needed.

So the network of the entire data center looks like the figure below.
This is the classic three-tier network architecture. The "three" here does not refer to Layer 3 of the IP stack, but to the three tiers: the access layer, the aggregation layer, and the core layer. This model works well for external requests to internal applications. That kind of traffic flows from outside to inside or inside to outside; in the figure above, from top to bottom or bottom to top, that is, north to south, so it is called north-south traffic.

However, with the development of cloud computing and big data, there is more and more interaction between nodes. For example, big data computation often copies data between nodes, and this traffic crosses the switches from left to right or right to left, that is, west to east or east to west, so it is called east-west traffic.

To handle east-west traffic, the spine-leaf network evolved.

  • Leaf switches connect directly to the physical servers. The L2/L3 boundary sits at the leaf switch; above the leaf switches is a Layer 3 network.
  • Spine switches are equivalent to core switches. Between spine and leaf, ECMP dynamically selects among multiple paths. The spine switches simply provide a flexible Layer 3 routing fabric for the leaf switches. North-south traffic does not exit directly from the spine switches; it goes through switches deployed in parallel with the leaf switches, which connect to the border routers.
    The traditional three-tier architecture is vertical, while the spine-leaf architecture is flat, which makes horizontal scaling much easier.
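Why the flat fabric scales so well can be seen by enumerating paths: in a full-mesh spine-leaf fabric, any two leaves are exactly two hops apart, and the number of equal-cost paths between them equals the number of spines. A minimal sketch with illustrative switch names:

```python
# Illustrative fabric: every leaf connects to every spine, so any two
# leaves are exactly two hops apart (leaf -> spine -> leaf).
spines = ["spine-1", "spine-2", "spine-3", "spine-4"]
leaves = ["leaf-1", "leaf-2", "leaf-3"]

def paths_between(src_leaf, dst_leaf):
    """Enumerate all equal-cost leaf-to-leaf paths; ECMP spreads
    flows across all of them."""
    return [(src_leaf, s, dst_leaf) for s in spines]

routes = paths_between("leaf-1", "leaf-3")
# Adding one more spine adds one more path for *every* leaf pair --
# horizontal scaling of east-west bandwidth without redesigning the fabric.
```

Compare this with the three-tier design, where adding east-west capacity means upgrading the aggregation and core boxes vertically.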

Summary

Well, that's it for the complex data center. To summarize, remember these three key points.

  • The data center is organized in three tiers. Servers connect to the access layer, then the aggregation layer, then the core layer, with border routers and security devices at the outermost edge.
  • Every link in the data center requires high availability: servers need network card bonding, switches need stacking, Layer 3 devices can use equal-cost routing, and Layer 2 devices can use TRILL.
  • With the development of cloud computing and big data, east-west traffic has become far more significant than north-south traffic, so the architecture evolved into the spine-leaf network.

Finally, two questions for you:

  1. For a data center, high availability is essential, and every device must consider it. What about high availability across data centers? Do you know how that is achieved?
  2. Browsing news, shopping, downloading, and watching videos are all ordinary users accessing data center resources over the public network. How should IT administrators access the data center?


Origin blog.csdn.net/aha_jasper/article/details/105575506