SDN Practice for Alibaba Cloud's Overlay Network: Architecture Design and Product Implementation

Abstract: This article describes Alibaba Cloud's SDN practice for its cloud overlay network, covering the overlay architecture, the challenges encountered in building it, and the products built on top of it.

        At the recent Open Network Summit in Los Angeles, Dr. Cheng Gang, a senior technical expert at Alibaba Cloud Network, gave a talk titled "SDN Practice for Cloud Overlay Network: From Infrastructure to Products", which described in detail the design and productization of Alibaba Cloud's SDN-based overlay architecture. The following is a translation of the talk, which I would like to share with you.

 

Problem background:

        Alibaba Cloud has grown very rapidly over the past two years, with annual revenue growing 104%. In fiscal year 2018 its quarterly revenue exceeded 3 billion yuan and its domestic market share reached 47.6%, making it a world-leading public cloud service provider. Alibaba Cloud's data centers span major countries on every continent, so it can provide cloud services to users all over the world.

        Alibaba Cloud Network is responsible for providing reliable, efficient, and secure network services for this fast-growing behemoth, and its most stressful moment each year is Taobao's Double 11 shopping festival. Alibaba Cloud provides the elastic services behind Double 11, and Alibaba Cloud Network is responsible for carrying the enormous traffic the event generates. A few figures give a sense of that traffic: Double 11 peaked at 325,000 orders per second and 256,000 payments per second, and total sales for the single day reached 168.2 billion yuan, more than the total sales of the week-long 2017 US Cyber Week shopping event. Under such traffic pressure, Alibaba Cloud Network needs a reliable, efficient, elastic, and secure architecture, both to handle current workloads and to prepare for the future.


Alibaba Cloud Network Architecture:

        As shown in the figure below, the Alibaba Cloud network consists of three main layers: the underlying physical layer and its SDN controller, the overlay network layer and its SDN controller, and the application layer, each made up of finer-grained modules. Alibaba Cloud's physical network is one of the largest SDN networks in the world and can be subdivided into components such as the DC network, the Alibaba metropolitan area network, and the Alibaba backbone network. The physical-layer controller manages the physical network and exposes APIs to the overlay layer, shielding it from the details of the various physical networks; the physical network is therefore not described further in this article. The overlay network layer consists of a data plane built from network components such as gateways (Gateway), the Alibaba Virtual Switch (AVS), and the load balancer (SLB: Server Load Balancer), together with the overlay control plane and the overlay management plane. The overlay controller manages each network component, provides a unified API to the products and services of the application layer, and calls the physical-layer controller to carry out network functions. The application layer consists mainly of Alibaba Cloud Network's products, such as CEN, and resource management and scheduling systems (such as Fuxi), which use and operate the Alibaba Cloud network by calling the overlay controller's northbound interface.
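        The layering above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Alibaba Cloud's code: the classes `PhysicalController` and `OverlayController` and the methods `reserve_path`, `create_vpc`, and `connect_vpcs` are all hypothetical. It shows the one property the text emphasizes: an application calls only the overlay controller's northbound API, and the overlay controller in turn calls the physical-layer controller, which returns an opaque handle instead of exposing physical topology.

```python
from dataclasses import dataclass, field


class PhysicalController:
    """Hypothetical physical-layer controller hiding DC/metro/backbone details."""

    def __init__(self) -> None:
        self.reserved: list[tuple[str, str]] = []

    def reserve_path(self, src_region: str, dst_region: str) -> str:
        # The overlay layer receives only an opaque path ID, never the topology.
        self.reserved.append((src_region, dst_region))
        return f"path-{len(self.reserved)}"


@dataclass
class OverlayController:
    """Hypothetical overlay controller exposing a unified northbound API."""

    physical: PhysicalController
    vpcs: dict[str, str] = field(default_factory=dict)  # VPC ID -> region

    def create_vpc(self, vpc_id: str, region: str) -> None:
        self.vpcs[vpc_id] = region

    def connect_vpcs(self, vpc_a: str, vpc_b: str) -> str:
        # Northbound call from an application-layer product; transport
        # between regions is delegated to the physical-layer controller.
        return self.physical.reserve_path(self.vpcs[vpc_a], self.vpcs[vpc_b])


# Usage: an application-layer product only ever talks to `overlay`.
overlay = OverlayController(PhysicalController())
overlay.create_vpc("vpc-a", "region-1")
overlay.create_vpc("vpc-b", "region-2")
print(overlay.connect_vpcs("vpc-a", "vpc-b"))  # path-1
```

        Keeping the physical controller behind the overlay controller means products such as CEN never depend on physical topology, which matches the text's point that the physical layer's details are shielded from the layers above.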


       Next comes one of the focuses of this article: the architecture and components of the overlay network layer.

Alibaba Cloud Overlay Network Layer Architecture, Components and Challenges

       The overlay layer of Alibaba Cloud Network is an SDN-based architecture that can be divided into three planes: the data plane, the control plane, and the management plane.

       The overlay data plane is made up of network components including SLB, AVS, Gateway, and hybrid Gateway. To improve the forwarding performance of these components, the Alibaba Cloud network team has introduced many data plane acceleration techniques, such as user-space protocol stacks.

       The overlay control plane is a hierarchical system of controllers, designed mainly to keep the control plane scalable. Each host runs a host controller, each region has a region controller, and at the top sits a global controller. The host controller obtains the configuration of the local data plane components from the region controller; the region controller manages and schedules the overlay network within its region; and the global controller coordinates and schedules overlay network resources across regions, in particular the management and scheduling of global traffic.

       The overlay management plane is a separate entity. It collects data from the logs, databases, and other records of the other two planes to support on-site recovery and exception debugging, and it can also provide automated network management through data analysis and learning.
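       The pull relationship between host and region controllers can be illustrated with the sketch below. It assumes, for illustration only, that a region controller keeps a per-host forwarding table (VM IP to host IP) with a single region-wide version counter, and that a host controller re-fetches its table only when that version changes. The class names, the table format, and the versioning scheme are hypothetical, and the global controller is reduced to a registry of regions.

```python
class RegionController:
    """Hypothetical region controller: source of truth for hosts in one region."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.version = 0
        self.tables: dict[str, dict[str, str]] = {}  # host ID -> {VM IP: host IP}

    def set_mapping(self, host_id: str, vm_ip: str, host_ip: str) -> None:
        self.tables.setdefault(host_id, {})[vm_ip] = host_ip
        self.version += 1  # any change in the region bumps the version

    def fetch(self, host_id: str) -> tuple[int, dict[str, str]]:
        return self.version, dict(self.tables.get(host_id, {}))


class HostController:
    """Hypothetical host controller: pulls its data-plane config from its region."""

    def __init__(self, host_id: str, region: RegionController) -> None:
        self.host_id = host_id
        self.region = region
        self.version = -1
        self.table: dict[str, str] = {}

    def sync(self) -> bool:
        version, table = self.region.fetch(self.host_id)
        if version == self.version:
            return False  # region unchanged since the last pull; skip reprogramming
        self.version, self.table = version, table
        return True


class GlobalController:
    """Hypothetical global controller: tracks regions for cross-region scheduling."""

    def __init__(self) -> None:
        self.regions: dict[str, RegionController] = {}

    def add_region(self, name: str) -> RegionController:
        self.regions[name] = RegionController(name)
        return self.regions[name]


# Usage
global_ctl = GlobalController()
region = global_ctl.add_region("region-1")
host = HostController("host-1", region)
region.set_mapping("host-1", "10.0.0.5", "192.168.1.10")
print(host.sync(), host.sync())  # True False
print(host.table)  # {'10.0.0.5': '192.168.1.10'}
```

       A single region-wide version is the simplest possible change signal; a production system would more plausibly track per-host versions or send incremental updates, but that detail is not described in the talk.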


       The following is a detailed introduction to the Overlay network control plane:

 

Control plane design of the overlay network:

       To support Alibaba Cloud's rapid expansion, reliable network services, and automated network management and control, the overlay network layer must be highly scalable, highly available, and intelligent. The design choices made for each of these goals are described below.

       Scalability design: The underlying physical network of Alibaba Cloud Network has been expanding rapidly, so the overlay architecture must scale accordingly. The design requires each region controller to support millions of virtual machines and each VPC to support 100,000 virtual machines, a difficult goal to reach. In the early stage, when the numbers of virtual machines and VPCs were still modest, the overlay control plane could push configuration directly to the host controllers. As cluster scale grew rapidly, however, a buffer layer was added to the overlay control plane. The underlying data plane can learn some network information on its own from the buffer layer, and the control plane can offload some of its functions to the buffer layer, reducing both the functionality and the load the control plane itself must carry.
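       The effect of the buffer layer can be seen in a small simulation. In this sketch, which is an assumption about how such a layer might behave rather than a description of Alibaba Cloud's implementation, the data plane on each host learns a VM's location on first use by asking a hypothetical `BufferLayer`, which in turn queries the control plane only on its own cache miss. The `queries` counter shows how many lookups actually reach the control plane.

```python
from typing import Optional


class ControlPlane:
    """Hypothetical central store of VM-IP -> host-IP mappings."""

    def __init__(self, mappings: dict[str, str]) -> None:
        self.mappings = mappings
        self.queries = 0  # lookups that reached the control plane

    def lookup(self, vm_ip: str) -> Optional[str]:
        self.queries += 1
        return self.mappings.get(vm_ip)


class BufferLayer:
    """Hypothetical buffer layer answering data-plane misses from its cache."""

    def __init__(self, control: ControlPlane) -> None:
        self.control = control
        self.cache: dict[str, str] = {}

    def resolve(self, vm_ip: str) -> Optional[str]:
        if vm_ip not in self.cache:
            location = self.control.lookup(vm_ip)
            if location is None:
                return None  # unknown VM: do not cache a negative result
            self.cache[vm_ip] = location
        return self.cache[vm_ip]

    def invalidate(self, vm_ip: str) -> None:
        # Called by the control plane when a VM moves (e.g. live migration);
        # host-level tables would need invalidating too, omitted in this sketch.
        self.cache.pop(vm_ip, None)


class HostDataPlane:
    """Hypothetical per-host forwarding table that self-learns on a miss."""

    def __init__(self, buffer: BufferLayer) -> None:
        self.buffer = buffer
        self.table: dict[str, str] = {}

    def next_hop(self, dst_vm_ip: str) -> Optional[str]:
        if dst_vm_ip not in self.table:
            location = self.buffer.resolve(dst_vm_ip)
            if location is None:
                return None
            self.table[dst_vm_ip] = location
        return self.table[dst_vm_ip]


control = ControlPlane({"10.0.0.5": "192.168.1.10"})
buffer = BufferLayer(control)
hosts = [HostDataPlane(buffer) for _ in range(3)]
print([h.next_hop("10.0.0.5") for h in hosts])
print(control.queries)  # 1
```

       Three hosts resolve the same destination, but only the first miss reaches the control plane; this is the load reduction the buffer layer is meant to provide as the number of hosts grows.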


       High availability design: To keep the Alibaba Cloud network highly available, functional components must be redundant. For example, if a region consists of three availability zones (AZ: Availability Zone), then to guarantee the availability of that region's gateways and SLBs, Alibaba Cloud Network requires these components to be deployed in all three AZs as backups for one another. If the components in one AZ become unavailable, those in the other AZs immediately take over, preserving overall availability. Controllers are likewise deployed as multiple replicas following similar high-availability rules.
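       The cross-AZ redundancy rule can be expressed as a simple replica-selection policy. The sketch below assumes one replica per AZ and a prefer-local-AZ, fail-over-to-any-healthy-AZ policy; the AZ names, the `Replica` type, and the selection policy are illustrative assumptions, since the talk states only that components in the surviving AZs take over.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    az: str
    healthy: bool = True


def deploy(component: str, azs: list[str]) -> list[Replica]:
    """Place one replica of `component` in every AZ of the region."""
    return [Replica(f"{component}-{az}", az) for az in azs]


def mark_az_down(replicas: list[Replica], az: str) -> None:
    """Mark every replica in a failed AZ as unhealthy."""
    for replica in replicas:
        if replica.az == az:
            replica.healthy = False


def pick(replicas: list[Replica], preferred_az: str) -> Replica:
    """Prefer a healthy replica in the caller's AZ, else any healthy one."""
    healthy = [r for r in replicas if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy replica in any AZ")
    for replica in healthy:
        if replica.az == preferred_az:
            return replica
    return healthy[0]


gateways = deploy("gateway", ["az-a", "az-b", "az-c"])
print(pick(gateways, "az-a").name)  # gateway-az-a
mark_az_down(gateways, "az-a")
print(pick(gateways, "az-a").name)  # gateway-az-b
```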


        Intelligent management capability: The management system of Alibaba Cloud Network collects, in real time, all logs and records from every plane, from the underlying physical network up to each component of the overlay network. As a result, Alibaba Cloud Network can not only detect anomalies in the physical and overlay layers promptly, but is also expected to locate their causes quickly with the help of machine learning and, where possible, respond automatically.
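        As a deliberately simple stand-in for the machine-learning models mentioned above, the sketch below flags anomalies in a real-time metric with a rolling z-score test. This is a swapped-in technique, not the method Alibaba Cloud uses; the metric (packet drops per second from a hypothetical AVS counter), the window size, and the threshold are all assumptions.

```python
import statistics


def detect_anomalies(
    samples: list[float], window: int = 5, threshold: float = 3.0
) -> list[int]:
    """Return indices whose value deviates from the trailing window by more
    than `threshold` population standard deviations."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            # A perfectly flat history: any change at all is suspicious.
            if samples[i] != mean:
                anomalies.append(i)
            continue
        if abs(samples[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical per-second packet-drop counts from one virtual switch.
drops = [2, 3, 2, 4, 3, 2, 3, 250, 3, 2]
print(detect_anomalies(drops))  # [7]
```

        Note that the spike inflates the window statistics for the next few samples, so values right after it are not flagged; real systems typically use more robust baselines, which is one reason the talk points toward learned models.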


        The component design requirements of the overlay network include the following. AVS must deliver high throughput, ultra-low latency, hot upgrade, and live migration. The gateway must deliver high scalability, hot upgrade, a capacity of more than 5 million connections per second, and packet processing of 30 million packets per second. The control plane must support one million VPCs per region and one hundred thousand virtual machines per VPC, and must be able to manage one hundred thousand routers within 3 seconds.
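        These targets translate into per-second and per-item budgets with simple arithmetic. The helper functions below are illustrative; the figure of 16 parallel forwarding pipelines is a hypothetical assumption used only to show how the per-packet budget scales with parallelism, not a published Alibaba Cloud number.

```python
def required_rate(items: int, window_s: float) -> float:
    """Items that must be handled per second to finish `items` in `window_s`."""
    return items / window_s


def per_item_budget_ns(items_per_second: float, pipelines: int = 1) -> float:
    """Nanoseconds available per item when `pipelines` share the load evenly."""
    return pipelines * 1e9 / items_per_second


# Figures from the requirements above.
print(round(required_rate(100_000, 3)))              # 33333 routers per second
print(round(per_item_budget_ns(30_000_000), 1))      # 33.3 ns per packet, one pipeline
print(round(per_item_budget_ns(30_000_000, 16), 1))  # 533.3 ns per packet, 16 pipelines (hypothetical)
print(per_item_budget_ns(5_000_000))                 # 200.0 ns per new connection, one pipeline
```

        A budget of roughly 33 ns per packet on a single pipeline is far too tight for general-purpose software, which helps explain why the data plane relies on parallelism and acceleration techniques such as user-space protocol stacks.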

        SDN helps the overlay network layer become scalable, easy to manage, and easy to develop; on that foundation, Alibaba Cloud can build a high-availability architecture, add intelligent management, and improve overall network performance.

Alibaba Cloud Application Layer Products

        Building on the services of the physical layer and the overlay network layer, Alibaba Cloud Network offers many products. The most representative of them, CEN (Cloud Enterprise Network), is introduced below.


CEN:

       As cloud service providers expand their services, the number of entities a cloud network must connect keeps growing. The things to be connected are still user networks, the Internet, and the cloud network, but the demands on the number of connected entities, the connection methods, and the connection quality keep rising. For Alibaba Cloud Network, the first-generation network only needed to connect cloud services to the Internet. Higher requirements soon emerged, leading to the second-generation network, in which the VPC serves as the network entity on the cloud and high-speed channels between VPCs interconnect these entities. This still did not meet users' needs, because a user may operate multiple networks that use different connection methods. Alibaba Cloud Network therefore designed and implemented the third-generation network, CEN, on top of the overlay architecture; it provides universal connectivity among today's network entities together with intelligent network management and control.

        CEN is also implemented on SDN. Take the figure below as an example: suppose a user has built IDCs in multiple regions and, driven by application requirements, also uses VPCs on Alibaba Cloud in various regions. The user naturally needs to interconnect all of these IDCs and VPCs. The traditional approach may take several months and considerable capital, and afterwards the network must still be maintained and operated. With CEN, the user's IDCs, through the leased lines already connected to VPCs, can directly interconnect all network entities across multiple regions. This takes only a few simple settings and also saves on subsequent maintenance and upgrade costs.
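        A rough way to see why CEN simplifies interconnection is to count connections. The sketch below assumes, as a simplification, that the traditional approach builds and maintains one dedicated connection per pair of entities (a full mesh), whereas with CEN each IDC or VPC attaches once; real deployments may use partial meshes or hub routers, so the numbers are illustrative only, and the 4-IDC, 6-VPC footprint is hypothetical.

```python
def full_mesh_links(entities: int) -> int:
    """Pairwise connections needed to interconnect every entity directly."""
    return entities * (entities - 1) // 2


def cen_attachments(entities: int) -> int:
    """Attachments needed when every entity connects once to a shared network."""
    return entities


idcs, vpcs = 4, 6  # hypothetical user footprint across several regions
n = idcs + vpcs
print(full_mesh_links(n), cen_attachments(n))  # 45 10
```

        The gap grows quadratically: at 20 entities it is 190 pairwise connections versus 20 attachments.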


        However, Alibaba Cloud Network faced many challenges in designing and implementing CEN. First, user IDCs still want to access CEN through traditional network protocols, which means exchanging routing information via BGP; yet for a global network such as CEN, traditional BGP is not a very suitable choice. Second, the data plane components of the overlay network underlying CEN differ in type and function and therefore show clear gaps in performance and throughput. For example, because SLB and Gateway are implemented as software on standard servers, they allow high-availability deployment and rapid development of new features, but their throughput falls short of the standard switches in an IDC. So when users require high-bandwidth services, even if the edge switches can provide such services, the pressure on SLB or Gateway becomes too great. Finally, because CEN connects all network entities, its high availability must be guaranteed.

        Architecture of CEN: The architecture of CEN is shown in the figure below. As a product built on the overlay network, CEN follows the overlay architecture. To solve the problem of route synchronization, CEN places lower-level controllers at user access points; these controllers control the BGP sessions through which routes are exchanged with users, making the routes entering and leaving CEN controllable. CEN also introduces regional and global controllers, deployed for high availability, to ensure that user traffic on CEN is forwarded correctly and controllably. In the data plane, the components of the overlay network layer ensure that user data is transmitted correctly and efficiently. The management plane works like that of the overlay network layer: it tracks logs and status in real time, providing data and a platform for intelligent network management.
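        The talk says only that the access-point controllers make the routes entering and leaving CEN controllable. One common way to achieve that, assumed here for illustration, is to filter the prefixes a user IDC advertises over BGP against an allowed prefix list, reject anything that overlaps address space already in use (such as VPC CIDRs), and enforce a maximum-prefix limit. The function name and parameters are hypothetical; the sketch uses only Python's standard `ipaddress` module and does not speak BGP itself.

```python
import ipaddress


def filter_advertised_routes(
    advertised: list[str],
    allowed: list[str],
    reserved: list[str],
    max_prefixes: int,
) -> tuple[list[str], list[str]]:
    """Split a user's BGP-advertised prefixes into accepted and rejected lists.

    A prefix is accepted if it falls inside an allowed supernet and does not
    overlap any reserved range (for example, CIDRs already used by VPCs).
    """
    allowed_nets = [ipaddress.ip_network(p) for p in allowed]
    reserved_nets = [ipaddress.ip_network(p) for p in reserved]
    accepted: list[str] = []
    rejected: list[str] = []
    for prefix in advertised:
        net = ipaddress.ip_network(prefix)
        inside = any(
            net.version == a.version and net.subnet_of(a) for a in allowed_nets
        )
        clashes = any(
            net.version == r.version and net.overlaps(r) for r in reserved_nets
        )
        (accepted if inside and not clashes else rejected).append(prefix)
    if len(accepted) > max_prefixes:
        # Analogous to a BGP maximum-prefix limit: refuse the whole update.
        raise ValueError(f"{len(accepted)} prefixes exceeds limit {max_prefixes}")
    return accepted, rejected


accepted, rejected = filter_advertised_routes(
    advertised=["10.1.0.0/16", "10.2.3.0/24", "192.168.0.0/16", "172.16.5.0/24"],
    allowed=["10.0.0.0/8", "172.16.0.0/12"],
    reserved=["10.2.0.0/16"],
    max_prefixes=100,
)
print(accepted)  # ['10.1.0.0/16', '172.16.5.0/24']
print(rejected)  # ['10.2.3.0/24', '192.168.0.0/16']
```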


        The functions of the CEN control plane also evolve as new technologies are adopted. For example, in the figure below, as Alibaba Cloud Network improves the performance of the overlay network layer's data plane components, the CEN controller system can offload part of its control functions to the lower-level controllers, which then carry out small-scale path optimization and performance improvement.
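        The talk does not say which path-optimization algorithm the lower-level controllers run. As one plausible example, the sketch below uses Dijkstra's shortest-path algorithm over measured link latencies to choose a path from a user access point to a VPC; the node names, the latency values (in milliseconds), and the choice of Dijkstra itself are assumptions for illustration.

```python
import heapq


def shortest_path(
    graph: dict[str, dict[str, float]], src: str, dst: str
) -> tuple[float, list[str]]:
    """Dijkstra's algorithm: return (total latency, node path) from src to dst."""
    dist = {src: 0.0}
    prev: dict[str, str] = {}
    heap = [(0.0, src)]
    visited: set[str] = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry for an already-settled node
        visited.add(node)
        if node == dst:
            break
        for neighbor, latency in graph.get(node, {}).items():
            candidate = d + latency
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if dst not in dist:
        return float("inf"), []  # dst unreachable from src
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


# Hypothetical latencies (ms) between an access point, two gateways and a VPC.
links = {
    "access-point": {"gw-1": 2.0, "gw-2": 1.0},
    "gw-1": {"vpc": 1.0},
    "gw-2": {"vpc": 3.0},
}
print(shortest_path(links, "access-point", "vpc"))  # (3.0, ['access-point', 'gw-1', 'vpc'])
```

        Running such a computation in a lower-level controller keeps it close to fresh local measurements, which is consistent with the text's point that small-scale optimization can be offloaded from the central CEN controllers.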


Summary:

       Alibaba Cloud's overlay network layer is built on SDN technology, which significantly improves scalability and high availability, supports rapid iteration of network technologies, and introduces network autonomy. The overlay network can also support defense against malicious attacks.
