Tencent Cloud Microservice Platform TSF: A Major Upgrade to Remote Multi-Active Unitization Capabilities

Introduction

The 2023 Tencent Global Digital Ecosystem Conference was held on September 7-8 and concluded successfully. More than 40 special sessions showcased Tencent's latest technologies, core products, and solutions.

In the special session on microservices and message queues, Zhang Zhen, Product Manager of Tencent Cloud Microservice Platform TSF, gave a talk titled "Tencent Cloud Microservice Platform TSF's Remote Multi-Active Unitization Capability Major Upgrade". This article reviews in detail the best practices for unitization with Tencent Cloud microservices.

Overview of unitized architecture

What is unitization

In a traditional service-oriented architecture, services are layered. Each layer uses a different partitioning algorithm and has a different number of nodes, and upper-layer nodes select lower-layer nodes at random. Because of this uncertainty, a call from an upper-layer node to a lower-layer node may cross data centers or regions. Cross-region calls are very costly: latency has to be dealt with, and data synchronization has to be guaranteed. Both are technically challenging.

A different approach is to design the call path in advance so that each request follows the planned path, which addresses the challenges above. Unitized architecture follows this design. Under a unitized architecture, the access layer, service layer, and data layer use the same partitioning algorithm, which logically binds computing resources to data resources and ultimately forms standardized processing units.
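The idea of one shared partitioning rule can be illustrated with a minimal Python sketch. The function names, the unit count, and the CRC32-modulo rule are assumptions made for illustration; they are not TSF's actual algorithm.

```python
import zlib

UNIT_COUNT = 4  # assumed number of units, for illustration only


def unit_for(customer_id: str, unit_count: int = UNIT_COUNT) -> int:
    """Map a customer ID to a unit index with a stable hash (hypothetical rule)."""
    return zlib.crc32(customer_id.encode("utf-8")) % unit_count


# Every layer calls the same function, so a request and its data
# are always resolved to the same unit.
def gateway_target(customer_id: str) -> str:
    return f"gateway-unit-{unit_for(customer_id)}"


def service_target(customer_id: str) -> str:
    return f"service-unit-{unit_for(customer_id)}"


def database_target(customer_id: str) -> str:
    return f"db-unit-{unit_for(customer_id)}"
```

Because the three layers share `unit_for`, a request for a given customer never needs to leave its unit to find its service instances or its data.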

Characteristics and types of units

After understanding what unitization is, let’s look at the characteristics and types of units.

Characteristics of the unit

1. Each unit includes a set of computing resources and a set of data resources, logically associated by the same rule. For example, they all carry the same label.

2. Depending on the scale of the business, a system may plan multiple units. A common number is 4-12, though there may be more.

3. In principle, a unit contains only one set of data resources.

Types of units

In principle, units are classified by the services they carry. Units that carry ingress traffic are gateway or access units, and units that process business logic are called business units. Businesses that can be split into units, own their data, and can complete all their work without relying on other businesses run in standard business units. Businesses that cannot be split and are read-heavy run in local technical units. In addition, a system usually has some configuration-type businesses that many units depend on and that cannot be split. These are placed in the global unit.

As this shows, adopting a unitized architecture is not easy. It requires a series of architectural planning and business transformation efforts.

So what are the benefits of unitization? That is the question of its value.
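The classification rules above can be written down as a small decision function. This is only a sketch: the `ServiceProfile` fields and the fallback to the global unit for anything not matched earlier are simplifying assumptions, not part of the original description.

```python
from dataclasses import dataclass
from enum import Enum


class UnitType(Enum):
    ACCESS = "access unit"
    STANDARD = "standard business unit"
    LOCAL = "local technical unit"
    GLOBAL = "global unit"


@dataclass
class ServiceProfile:
    """Hypothetical description of a service, used only for this sketch."""
    handles_ingress: bool = False
    splittable: bool = False
    owns_data: bool = False
    self_contained: bool = False
    read_heavy: bool = False


def classify(profile: ServiceProfile) -> UnitType:
    if profile.handles_ingress:
        return UnitType.ACCESS
    if profile.splittable and profile.owns_data and profile.self_contained:
        return UnitType.STANDARD
    if not profile.splittable and profile.read_heavy:
        return UnitType.LOCAL
    # Assumption: everything else (e.g. shared configuration services
    # that many units depend on) goes to the global unit.
    return UnitType.GLOBAL
```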

The value of unitization

Generally speaking, as the business grows and the architecture becomes more complex, four problems gradually become prominent: the number of database connections, standardized expansion, cross-data-center stability and performance, and remote multi-active deployment.

Database connection problem

Let's first look at the number of database connections. In a cloud-native context, applications can easily scale horizontally, but every added instance opens its own connections to the database. As business volume grows, the database's connection limit often becomes the bottleneck for cluster expansion. When a distributed database is not used, unitization can solve this problem. Once data resources and computing resources are bound together, each unit has a fixed set of data resources and a fixed set of computing resources. The number of database connections can then be controlled per unit, and the distributed system as a whole is expanded by adding units.
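A short worked calculation shows the effect. All numbers here (instance count, pool size, unit count) are made up for illustration and do not come from the talk.

```python
def connections(instances: int, pool_size: int) -> int:
    """Connections a database sees: one pool per application instance."""
    return instances * pool_size


# Hypothetical cluster: 400 instances, 20 pooled connections each, 8 units.
TOTAL_INSTANCES = 400
POOL_SIZE = 20
UNITS = 8

# Without unitization, every instance connects to the same database.
shared_db_connections = connections(TOTAL_INSTANCES, POOL_SIZE)

# With unitization, each unit's database only sees that unit's instances.
per_unit_db_connections = connections(TOTAL_INSTANCES // UNITS, POOL_SIZE)

print(shared_db_connections, per_unit_db_connections)  # 8000 1000
```

Adding a ninth unit adds another database with its own connection budget instead of pushing the shared database closer to its limit.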

Distributed operation and maintenance and expansion issues

The second issue is distributed operation and maintenance and capacity expansion. Distributed systems are generally expanded through monitoring alarms and manual intervention. The timing and size of an expansion are judged from experience and may be neither accurate nor timely. With a unitized architecture, standardized expansion at the unit level yields a neat, uniform architecture and standardized operation and maintenance actions. Expansion can also be planned in advance from the business volume a unit can carry, so that operators know what to expect both before and during an operation.
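Planning expansion from per-unit business volume reduces to simple arithmetic. The sketch below assumes capacity is measured in customers per unit; the function names and figures are hypothetical.

```python
def units_needed(expected_customers: int, customers_per_unit: int) -> int:
    """Smallest number of units that can carry the expected customer count."""
    if customers_per_unit <= 0:
        raise ValueError("customers_per_unit must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (expected_customers + customers_per_unit - 1) // customers_per_unit


def units_to_add(current_units: int, expected_customers: int,
                 customers_per_unit: int) -> int:
    """How many whole units to provision ahead of the expected growth."""
    return max(0, units_needed(expected_customers, customers_per_unit) - current_units)


# Hypothetical plan: 4 units today, 10 million customers expected,
# each unit sized for 1.5 million customers.
print(units_to_add(4, 10_000_000, 1_500_000))  # 3
```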

Cross-data-center performance issues

The third issue is cross-data-center performance. In a microservice cluster, applications are usually stateless, so traffic is distributed indiscriminately. When the access-layer gateway distributes traffic to the service layer, cross-center access occurs, which greatly affects system stability and performance. With a unitized architecture, the closed-loop nature of unit traffic solves this problem well.

Remote multi-active issues

The last issue is remote multi-active deployment. As the architecture evolves toward remote multi-active, the stability and performance issues above are greatly amplified across sites. Unitization is therefore also an important way to achieve remote multi-active.

Next, let’s take a look at the overall solution provided by Tencent Cloud for unitization.

Unitized architecture solutions

Tencent Cloud unit design concept

Through extensive practical experience, Tencent Cloud has summarized and refined the Tencent Cloud unit architecture, which is divided from top to bottom into the access layer, application layer, data layer and facility layer.

The access layer receives, identifies, and forwards traffic. Identified traffic is forwarded to the corresponding unit in the application layer for processing. Units are split along the customer dimension. In most scenarios, a single-customer transaction is processed in a closed loop within one unit, and only a small number of cross-customer transactions are processed across units.

Unitizing an entire system is a complex systems-engineering effort that requires comprehensive design at every level. Tencent's products provide the atomic capabilities for unitization, and ISV partners build service encapsulation on top of them. Tencent's unitized design therefore consistently follows the principles of open, lightweight, and flexible delivery.
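As a rough illustration of splitting units by customer, the sketch below assigns customers to units by customer-number range and checks whether a transaction stays inside one unit. The range boundaries and function names are assumptions for this example only.

```python
import bisect
from typing import Iterable, Set

# Hypothetical split: unit i serves customer numbers below UPPER_BOUNDS[i]
# and at or above the previous bound.
UPPER_BOUNDS = [250_000, 500_000, 750_000, 1_000_000]


def unit_of(customer_no: int) -> int:
    if not 0 <= customer_no < UPPER_BOUNDS[-1]:
        raise ValueError(f"customer number out of range: {customer_no}")
    return bisect.bisect_right(UPPER_BOUNDS, customer_no)


def involved_units(customers: Iterable[int]) -> Set[int]:
    return {unit_of(c) for c in customers}


def is_closed_loop(customers: Iterable[int]) -> bool:
    """True when every customer in the transaction lives in the same unit."""
    return len(involved_units(customers)) == 1
```

A single-customer transaction is always closed-loop under this scheme; a transfer between customers 100 and 600,000 touches units 0 and 2 and has to be handled as a cross-unit transaction.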

Tencent Cloud application unit architecture

Looking at the core system as a whole, applications can be divided into different units according to business logic. At the top is the access unit (ADU) of the access layer, which handles inbound and outbound traffic. The application layer is divided into SDU, LDU, and RDU according to how divisible the services are. At the bottom is the public component unit, GDU. When transforming a business into units, the units can be defined and split according to this rule.

Access unit (ADU): Responsible for access and outgoing capabilities.

Standard processing unit (SDU): responsible for business processing capabilities.

Local Unit (LDU): Provides single-AZ shared service capabilities.

City-wide unit (RDU): Provides city-wide shared service capabilities.

Global Unit (GDU): Global type service in the remote multi-active architecture.

Introduction to Tencent Cloud Microservices Platform (TSF)

With the overall concept of unitization in mind, let's look at the best practices of unitization at the microservice level.

TSF: an out-of-the-box microservices platform

Tencent Cloud Microservice Platform (Tencent Service Framework, TSF) is a commercial PaaS platform compatible with microservice architectures such as Spring Cloud and Service Mesh. It provides one-stop management of the full microservice life cycle, data-driven operation support, and multi-dimensional application and service governance. Its features include migration without code changes, rapid business deployment, fine-grained service governance, fast problem location and troubleshooting, and lightweight operation and maintenance.

TSF core characteristics and values - standardization and diversification

Standardized usage

TSF provides standardized service access specifications, unified standard operation experience, unified registration and configuration center services, and standardized deployment management, which can bring customers a consistent experience in access, operation, and configuration.

Technology diversification

TSF is compatible with mainstream frameworks such as Spring Cloud, Dubbo, Service Mesh, and gRPC.

Full stack of capabilities

The TSF microservice platform has a complete permission management and control system, provides diverse service governance capabilities, and is built on an autonomous, controllable technology stack. It supports performance tuning and operations troubleshooting, and its all-round service governance and full-life-cycle management capabilities meet what users need from a microservice platform.

In recent years, more and more customers have upgraded their architectures, moving from monolithic applications to microservices and migrating from off-cloud to cloud. Products that cover the core requirements of development, testing, release, production, and other stages have become indispensable. TSF provides comprehensive management capabilities at each of these stages and is the first choice of many customers purchasing a one-stop microservice platform.

TSF has long been exploring high availability and unitization. The team is committed to continuously optimizing the underlying architecture to keep the platform stable and reliable, and it keeps researching and practicing high availability in order to offer standardized product capabilities and a more stable, reliable experience.

TSF unitized product capabilities

TSF grows with its customers and provides a matching solution at each stage of evolution, from a single center, to same-city multi-active, to three centers in two places, to remote multi-active. Now that large financial institutions are experimenting with remote multi-active unitized architectures, TSF has released its remote multi-active unitization capabilities to support users in exploring and practicing unitized architecture.

The key stages are briefly introduced below.

Same-city multi-active

TSF has long supported same-city multi-active, which is also the high availability requirement of most microservice customers.

What is same-city multi-active?

Same-city multi-active is a multi-active solution under a cloud-native architecture. A region usually has multiple availability zones, for example availability zones A and B. The same services are deployed in both zones, which run active-active within the city and back each other up.

The same-city active-active architecture is relatively easy to implement, but it can only cope with the failure of a data center. When an entire region goes down, the service is still unavailable. That is when three centers in two places are needed.
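One way to picture two availability zones backing each other up is an instance selector that falls back to the other zone when the caller's zone has no healthy instance. Preferring the caller's own zone is an assumption of this sketch, and the `Instance` type is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instance:
    address: str
    zone: str
    healthy: bool = True


def candidates(instances: List[Instance], caller_zone: str) -> List[Instance]:
    """Healthy instances in the caller's zone, else healthy instances anywhere."""
    healthy = [i for i in instances if i.healthy]
    local = [i for i in healthy if i.zone == caller_zone]
    # If the local zone is down, the other zone takes over as backup.
    return local or healthy
```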

Three centers in two places (unitized)

Generally speaking, there are two ways to achieve three centers in two places: remote disaster recovery and unitization. Customers currently use both architectures. The difference is that the unitization model also brings some of the advantages of unitization, such as unit grayscale and whole-unit expansion. However, because the remote site is used only for disaster recovery, idle resources are a serious problem.

Remote multi-active

The challenge of remote multi-active is to put idle remote resources to work, which raises the problems of data synchronization and access latency in remote scenarios. Unitization is the best way to solve them. Under remote multi-active unitization, services in multiple locations all serve business traffic. Because unit traffic is closed-loop, the probability of remote access is greatly reduced and data does not need to be fully synchronized in real time, which makes cross-region, highly available, unitized multi-active disaster recovery possible.

TSF unitized product capability matrix

In order to help our customers use unitized architecture, TSF has released corresponding product capabilities, covering the three core scenarios of unitized management, high availability disaster recovery, and efficient operation and maintenance.

Before implementing unitization, you first need to plan the architecture, decide on the number of units, and bring in the unitized products: the unitized gateway at the access layer, the microservice platform and message queues at the application layer, and databases at the data layer. During implementation, unitization rules are configured and pushed to the individual components. If unit grayscale is required, grayscale units and grayscale rules can also be configured.

Beyond these basic capabilities, disaster recovery and observability are also key parts of building a unitized system. For disaster recovery, TSF provides capabilities such as disaster recovery unit configuration, one-click disaster recovery switching, and disaster recovery drills. For operation and maintenance, it supports unified unit monitoring and alarming, and it monitors and traces cross-unit and cross-data-center requests.

Below, let’s look at some core design points.

● Unitized routing

The client request carries a label and is routed to the unitized gateway, which calculates the target unit from the label. TSF identifies whether the call is within a unit or across units and then forwards the request to the corresponding unit. For the same service, the call order is: first within the unit, then within the same center, and finally a center in the same city.

● Unit grayscale
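The routing steps can be sketched as follows. The label table, the `Instance` fields, and the function names are hypothetical and only illustrate the order of preference described above; they are not TSF's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical label-to-unit table consulted by the gateway.
LABEL_TO_UNIT: Dict[str, str] = {"east": "unit-1", "south": "unit-2"}


@dataclass(frozen=True)
class Instance:
    address: str
    unit: str
    center: str
    city: str


def target_unit(label: str) -> str:
    """Gateway step: compute the target unit from the request label."""
    return LABEL_TO_UNIT[label]


def is_cross_unit(caller_unit: str, label: str) -> bool:
    return target_unit(label) != caller_unit


def pick_instance(instances: List[Instance], caller: Instance) -> Optional[Instance]:
    """Apply the call order: same unit, then same center, then same city."""
    tiers = (
        lambda i: i.unit == caller.unit,
        lambda i: i.center == caller.center,
        lambda i: i.city == caller.city,
    )
    for matches in tiers:
        for inst in instances:
            if matches(inst):
                return inst
    return None
```

`pick_instance` returns the first instance of the most preferred tier that has any match; a real selector would also load-balance within the tier.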

The above is the main process of unitized routing. Next, let’s look at the implementation of unit grayscale. In the standard microservice architecture, grayscale release mainly relies on framework capabilities. Through version isolation between applications, traffic can be sent to designated application version clusters according to version tags. Through traffic proportion control, a part of the requests in the production traffic can be processed by the grayscale application cluster.

In a unitized scenario, you can first set up one or two grayscale units and then define the grayscale dimension. Common choices are grayscale by customer number or by customer label. Before the gateway performs the unit routing calculation, it queries the grayscale table. If the request's features match a grayscale rule, the request is routed directly to the unit defined in the table, that is, forwarded to the corresponding grayscale unit, which completes the unit grayscale.
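The lookup order (grayscale table first, normal unit calculation second) might look like this in outline. The table contents, unit names, and hashing rule are illustrative assumptions.

```python
import zlib
from typing import Dict, List

# Hypothetical grayscale table keyed by customer number.
GRAY_TABLE: Dict[str, str] = {"customer-1001": "gray-unit-1"}
NORMAL_UNITS: List[str] = ["unit-1", "unit-2", "unit-3", "unit-4"]


def normal_unit(customer_id: str) -> str:
    """Regular unit calculation (assumed here to be a stable hash)."""
    index = zlib.crc32(customer_id.encode("utf-8")) % len(NORMAL_UNITS)
    return NORMAL_UNITS[index]


def route(customer_id: str) -> str:
    # 1. Query the grayscale table before any unit calculation.
    gray_unit = GRAY_TABLE.get(customer_id)
    if gray_unit is not None:
        return gray_unit
    # 2. No grayscale rule matched: fall through to normal unit routing.
    return normal_unit(customer_id)
```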

● Unitized disaster recovery

Next, let’s look at unitized disaster recovery.

In a remote multi-active scenario, when a unit fails, its traffic must be switched to other units as quickly as possible to keep the service running. A mutual backup unit table can be configured during architecture planning to establish the backup relationships between units. When a unit fails, its status is updated and routing is adjusted so that traffic goes to the backup unit. During the switchover, the database promotes the replica in the backup unit to primary so that it can serve requests.

For example, as shown in the figure, when unit 1 fails, its traffic is suspended and the mutual backup unit for unit 1 is looked up (unit 5). Once the database primary/standby switchover completes, the global route is updated and the traffic is forwarded to unit 5. At this point unit 5 carries unit 1's business traffic in addition to its own. When unit 1 recovers from the fault, a switchback adjusts the gateway-layer routing to point to unit 1 again. This completes unit switching and switchback during disaster recovery.
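The switchover and switchback sequence can be sketched as a small state machine. The `UnitRouter` class and the `promote_replica` callback are hypothetical; the callback stands in for whatever mechanism performs the database primary/standby switch.

```python
from typing import Callable, Dict, Set


class UnitRouter:
    def __init__(self, backup_of: Dict[str, str]) -> None:
        self.backup_of = backup_of          # mutual backup unit table
        self.route: Dict[str, str] = {}     # overrides in the global route
        self.suspended: Set[str] = set()    # units whose traffic is paused

    def fail_over(self, unit: str, promote_replica: Callable[[str], None]) -> str:
        self.suspended.add(unit)            # 1. suspend the failed unit's traffic
        backup = self.backup_of[unit]       # 2. look up its mutual backup unit
        promote_replica(backup)             # 3. wait for the database switchover
        self.route[unit] = backup           # 4. update the global route
        self.suspended.discard(unit)        # 5. resume traffic, now to the backup
        return backup

    def switch_back(self, unit: str) -> None:
        """After recovery, point the unit's traffic at itself again."""
        self.route.pop(unit, None)

    def resolve(self, unit: str) -> str:
        if unit in self.suspended:
            raise RuntimeError(f"traffic for {unit} is suspended")
        return self.route.get(unit, unit)
```

With `UnitRouter({"unit-1": "unit-5", "unit-5": "unit-1"})`, calling `fail_over("unit-1", ...)` makes `resolve("unit-1")` return `"unit-5"` until `switch_back("unit-1")` is called.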

Summary

This article starts from unitization. It first introduces the concept and value of unitization and Tencent's overall solution for it. It then turns to microservices, introducing Tencent's microservice platform TSF and the TSF product capabilities that support unitization, with a focus on three core scenarios: unit routing, unit grayscale, and disaster recovery switching.

As the old verse goes, "Mount a fine steed and gallop ahead; come, let me lead the way." The upgrade of the remote multi-active unitization capability of Tencent Cloud Microservice Platform TSF is only a small step in the product's development. The microservices team hopes to keep polishing the product to meet more customer needs.