Tencent's high-performance distributed routing technology presented at APNet, the Asia-Pacific Workshop on Networking

The Asia-Pacific Workshop on Networking (APNet) brings together leading researchers in computer networks and systems from the Asia-Pacific region to share their latest results and discuss current research frontiers. The two-day 4th Asia-Pacific Workshop on Networking (APNet'20) was held online on August 3-4, 2020. Tencent, Alibaba, Microsoft, Nvidia, Cisco, ByteDance, and others shared their latest achievements in the networking field. Among them, Lu Jianchao, an architect in the Tencent TEG Network Platform Department, gave a talk titled "Scalable and Flexible Routing Service for Tencent Cloud Access Network".

https://conferences.sigcomm.org/events/apnet2020/index.html

With the spread of cloud computing, 5G, and AI, more and more customers are deploying their services on Tencent Cloud, which has seen explosive growth in recent years. The rapid growth in customer connection requests and the massive volume of end-user visits pose new challenges for Tencent Cloud's access network.

On August 3, 2020, at APNet, Tencent presented in detail the architecture and design philosophy of its Software Defined Router (SDR), and explained how SDR uses a software-defined approach to deliver flexibility, scalability, and high availability at cloud network scale across different access scenarios.

Tencent's access network serves three main scenarios:

1. Dedicated line access. By deploying dedicated line gateways at access points, large enterprise customers' own data centers can connect to Tencent's network at a nearby location, with high bandwidth, low latency, and high security.

2. VPN access. Corporate branches reach Tencent Cloud over the Internet and access cloud resources at low cost.

3. End user access. Tencent deploys TIX (Tencent Internet Exchange) infrastructure in regional cores and POPs around the world to provide efficient channels for global end users to access resources on Tencent Cloud.

In the early days, when Tencent Cloud was still relatively small, Tencent's access network interconnected with external networks mainly through traditional commercial routers and switches.

With the rapid development of Tencent Cloud in recent years, new challenges continue to emerge, including:

1. Routing tables on the order of ten million entries and forwarding performance on the order of 10 Tbps, with each dimension able to scale horizontally on demand

2. Network features must iterate quickly to meet the interconnection and scheduling requirements of different access scenarios

3. Network Capex and Opex need continuous optimization

Because traditional commercial network equipment was not designed for cloud networks, it increasingly struggles with these challenges at cloud network scale. The main problems are:

1. Software and hardware are tied to a single vendor and tightly coupled, so feature iteration cycles are long

2. Performance and capacity cannot be expanded flexibly on demand

3. High cost

Clearly, cloud network scale calls for a new system architecture. To this end, the Network Platform Department redesigned the network service architecture for cloud network scale around four design principles: high scalability, high flexibility, high reliability, and strong operability.

The new architecture, which we call the Software Defined Router (SDR), is built on one core idea: strip complex network functions and features out of the network hardware and move them onto general-purpose x86 servers. It targets cloud network requirements and defines the cloud network router through software programming.

Under the new architecture system, the overall network function is divided into overlay network and underlay network.

The overlay network is further divided into four major functional components: Data Plane, Routing Plane, Control Plane, and Orchestrator. Each component is deployed in its own server cluster and can be designed, maintained, and upgraded independently according to its own characteristics and needs. The software-programming model also greatly improves network flexibility: development and iteration are about 10x faster than with the traditional network, so diverse customer needs can be met faster and better.

The underlay network is built from low-cost box switches. It only needs to provide a simple IP backbone function that connects the internal components and the external networks, and it is completely unaware of overlay services.

SDR's internal components efficiently synchronize routing, ARP, and static configuration information through a distributed message queue. EA switches are deployed at the edge to interconnect with external networks. An EA switch works at L2, provides multiple types of interconnection ports (GE/10GE/100GE) to the external network, and implements L2 network isolation. The Data Plane performs high-performance packet forwarding through a self-developed user-space protocol stack. The Routing Plane exchanges routing information with external routers and with internal routing components over BGP. The Orchestrator and the Control Plane act as Global Controller and Local Controller respectively, synchronizing global or regional configuration, management, and operation and maintenance information. In addition, to synchronize the massive volume of dynamic flow table information among SDR's internal components efficiently, the Control Plane provides high-performance distributed message channels and distributed storage services.
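
The synchronization pattern described above can be pictured as publish/subscribe. The following is a minimal sketch under stated assumptions: an in-process bus stands in for the distributed message queue, and the class names, topic names, and message fields are hypothetical rather than taken from the talk.

```python
from collections import defaultdict


class MessageBus:
    """In-process stand-in for a distributed message queue (hypothetical)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of the topic.
        for callback in self.subscribers[topic]:
            callback(message)


class DataPlaneNode:
    """Hypothetical forwarding node that mirrors routes and ARP entries."""

    def __init__(self, bus):
        self.fib = {}  # prefix -> next hop
        self.arp = {}  # IP address -> MAC address
        bus.subscribe("route", self.on_route)
        bus.subscribe("arp", self.on_arp)

    def on_route(self, msg):
        if msg["op"] == "add":
            self.fib[msg["prefix"]] = msg["next_hop"]
        else:
            self.fib.pop(msg["prefix"], None)

    def on_arp(self, msg):
        self.arp[msg["ip"]] = msg["mac"]


bus = MessageBus()
nodes = [DataPlaneNode(bus) for _ in range(3)]
bus.publish("route", {"op": "add", "prefix": "203.0.113.0/24", "next_hop": "192.0.2.1"})
bus.publish("arp", {"ip": "192.0.2.1", "mac": "02:00:00:00:00:01"})
```

After the two publishes, every node holds the same route and ARP entry, which is the property the message queue is there to provide. A real deployment would also need ordering, retries, and persistence, none of which this sketch models.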

As the connector between internal and external networks, SDR under the new architecture is deployed at Access Sites around the world. It fully bridges the traditional network and the cloud network and interconnects external networks with Tencent's internal network efficiently, which greatly simplifies the integration and interoperability of intelligent network services.

SDR's inherently software-defined nature makes it a substantial upgrade over traditional networks in flexibility, scalability, reliability, and operability.

In terms of flexibility, traffic leaving for the external network can be scheduled in a fine-grained and flexible way through SDR's Flex rules, according to the needs of different customers and services. Traffic entering from the external network can be scheduled at the granularity of individual host routes (/32 for IPv4, /128 for IPv6) thanks to the very large routing table, which supports flexible migration and disaster recovery of internal gateways or services.
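
The ingress case can be illustrated with longest-prefix matching: a more specific host route overrides the covering aggregate, so one service address can be moved without touching its neighbors. This is a minimal sketch using Python's standard `ipaddress` module. The gateway names and prefixes are hypothetical, and a linear scan stands in for the trie or hardware table a real router would use.

```python
import ipaddress


class RouteTable:
    """Toy longest-prefix-match table (linear scan, for illustration only)."""

    def __init__(self):
        self.routes = {}  # ip_network -> next-hop name

    def add(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, address):
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        # The most specific (longest) matching prefix wins.
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RouteTable()
table.add("203.0.113.0/24", "gateway-a")      # hypothetical aggregate route
before = table.lookup("203.0.113.10")         # served by gateway-a

table.add("203.0.113.10/32", "gateway-b")     # host route after migrating one address
after = table.lookup("203.0.113.10")          # now served by gateway-b
neighbor = table.lookup("203.0.113.11")       # still served by gateway-a
```

Withdrawing the /32 would send the address back to `gateway-a`, which is how the same mechanism can serve both migration and disaster recovery.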

Security services such as firewall (FW) and DDoS protection use SDR's Flex rules to redirect attack or abnormal traffic on demand and to return the cleaned traffic. Deploying VXLAN between the FW service and SDR allows the FW service to be deployed either locally or remotely. For DDoS protection, SDR supports both software-based large-capacity forwarding tables and hardware-based ultra-high-bandwidth forwarding.

To further improve performance, SDR introduced the Tencent Smart Switch (TSS) for hardware acceleration. TSS is a programmable switch developed by Tencent that provides Tbps-level wire-speed forwarding in hardware and microsecond-level latency. TSS is defined as the offloading component of the Data Plane: it customizes ASIC packet-processing behavior through a programmable language to form a general flow-based and LPM-based pipeline. Working together with the Data Plane and Control Plane, it provides hardware acceleration for different business scenarios.
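
One way to picture offloading is a hardware fast path backed by a software slow path. The sketch below is an illustration under assumptions, not the TSS design: the class names, the five-tuple flow key, the install-on-miss policy, and the least-recently-used eviction are all hypothetical.

```python
from collections import OrderedDict


class SoftwareDataPlane:
    """Stand-in for the x86 Data Plane that holds the full policy (hypothetical)."""

    def __init__(self, policy):
        self.policy = policy  # destination IP -> action

    def decide(self, flow):
        dst_ip = flow[1]
        return self.policy.get(dst_ip, "drop")


class OffloadSwitch:
    """Stand-in for a programmable switch with a bounded exact-match flow table."""

    def __init__(self, slow_path, capacity):
        self.slow_path = slow_path
        self.capacity = capacity
        self.flows = OrderedDict()  # flow key -> action, oldest first
        self.hits = 0
        self.misses = 0

    def forward(self, flow):
        if flow in self.flows:
            # Fast path: the flow is already installed in the switch.
            self.hits += 1
            self.flows.move_to_end(flow)
            return self.flows[flow]
        # Slow path: ask software, then install the flow for later packets.
        self.misses += 1
        action = self.slow_path.decide(flow)
        self.flows[flow] = action
        if len(self.flows) > self.capacity:
            self.flows.popitem(last=False)  # evict the least recently used flow
        return action


slow_path = SoftwareDataPlane({"203.0.113.10": "forward:port-1"})
switch = OffloadSwitch(slow_path, capacity=2)
flow = ("198.51.100.7", "203.0.113.10", 6, 40000, 443)  # src, dst, proto, sport, dport
first = switch.forward(flow)   # miss: decided in software, then installed
second = switch.forward(flow)  # hit: served from the switch table
```

Only the first packet of a flow pays the software cost in this model. The bounded table reflects the fact that hardware tables are smaller than the software tables they accelerate.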

SDR currently provides 10 Tbps of forwarding capacity, a routing table on the order of 10 million entries, and end-to-end route updates at a rate of 100k per second.
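
For a rough sense of scale, these figures imply how long a complete table refresh would take. The calculation below is a back-of-the-envelope estimate that assumes each update carries exactly one route and that the update rate is sustained throughout, neither of which the talk states.

```python
ROUTES = 10_000_000           # routing table size: 10 million entries
UPDATES_PER_SECOND = 100_000  # end-to-end route update rate: 100k per second

# Assumption: one route per update, rate sustained for the whole refresh.
full_refresh_seconds = ROUTES / UPDATES_PER_SECOND
print(f"full table refresh: {full_refresh_seconds:.0f} s")  # prints "full table refresh: 100 s"
```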

In terms of reliability, the redundant architecture means that a single point of failure in any component does not affect the system. Because the components are fully decoupled, the forwarding plane provides Non-Stop Forwarding (NSF): it keeps processing packets normally when other components fail. On the routing plane, SDR further splits the BGP function into a BGP speaker unit and a BGP route computation unit, deployed in different clusters. The BGP speaker unit is deployed at per-peer granularity, which allows per-peer upgrades and rapid failure recovery and provides Non-Stop Routing (NSR).
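
The value of separating the speaker from route computation can be shown with a small model. This is a sketch under assumptions: the class names are hypothetical, best-path selection is reduced to shortest AS path (real BGP uses a longer decision process), and keeping learned routes in the computation unit while a speaker is replaced is an assumption of the sketch rather than a documented SDR behavior.

```python
class RouteComputation:
    """Hypothetical route computation unit: stores per-peer routes, picks best paths."""

    def __init__(self):
        self.rib_in = {}  # (peer, prefix) -> AS path

    def learn(self, peer, prefix, as_path):
        self.rib_in[(peer, prefix)] = as_path

    def best(self, prefix):
        # Simplification: shortest AS path wins; ties fall back to peer name.
        candidates = [(len(path), peer)
                      for (peer, p), path in self.rib_in.items() if p == prefix]
        return min(candidates)[1] if candidates else None


class BgpSpeaker:
    """Hypothetical per-peer speaker: owns one session and relays its updates."""

    def __init__(self, peer, computation, version=1):
        self.peer = peer
        self.computation = computation
        self.version = version

    def receive_update(self, prefix, as_path):
        self.computation.learn(self.peer, prefix, as_path)


computation = RouteComputation()
speakers = {peer: BgpSpeaker(peer, computation) for peer in ("peer-a", "peer-b")}
speakers["peer-a"].receive_update("203.0.113.0/24", [64500, 64510])
speakers["peer-b"].receive_update("203.0.113.0/24", [64501])
best_before = computation.best("203.0.113.0/24")

# Upgrade only peer-a's speaker by replacing it with a new instance.
untouched = speakers["peer-b"]
speakers["peer-a"] = BgpSpeaker("peer-a", computation, version=2)
best_after = computation.best("203.0.113.0/24")
```

Replacing one speaker leaves the other peer's speaker object and the computed best path unchanged, which is the isolation that per-peer deployment is meant to provide.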

In terms of operation and maintenance, SDR works with the Real-time Monitoring and Operating System (RMOS) to detect faults in real time and isolate them quickly at the cluster, server, and core levels. SDR health is also monitored in real time through rich logs and alarm information. In addition, SDR provides one-click isolation, fast route convergence, and cross-domain disaster recovery, keeping customer services uninterrupted from the perspective of the entire network.

In the future, SDR will provide end-to-end, real-time network quality detection and analysis at different network levels. Using real-time network quality measured along different dimensions, SDR will implement dynamic, multi-dimensional, and fine-grained traffic scheduling strategies. SDR will also integrate network simulation and network verification platforms to further improve the reliability and operability of the entire network.

In summary, for cloud-scale networks, SDR uses software and hardware decoupling, function decoupling, and software definition to build a new access network for Tencent Cloud that is highly flexible, scalable, and operable, at low cost.

Origin blog.csdn.net/Tencent_TEG/article/details/108138253