Exploration and practice of distributed full-link grayscale release

Author|Gu Xin
Source| Alibaba Cloud Native Official Account

In the era of Internet finance, financial products and service models continue to innovate, and the demand for financial system capacity has increased sharply. To raise operation and maintenance standards and improve service continuity, the Industrial and Commercial Bank of China (hereinafter referred to as ICBC) started technical pre-research for its distributed architecture transformation in 2014. After in-depth research on open source microservice frameworks and technology selection, ICBC decided to independently develop and build a distributed service platform based on open source Dubbo. Combining financial scenarios, ICBC has made more than 30 customizations to Dubbo's core capabilities, such as service registration and discovery, to support ultra-large-scale business scenarios with more than 700,000 providers in a single registry. As the core capability of the distributed system, distributed services help ICBC's application architecture move toward a distributed, service-oriented model and carry the core banking system of the future open platform.

In a distributed system, call links are complex, the technical threshold is high, and implementation is difficult, so full-link grayscale release has gradually become a hard problem for financial technology. ICBC has always been at the forefront of the industry in building distributed systems. It is actively exploring distributed full-link grayscale release and is committed to providing full-link grayscale release capabilities across applications and services under a distributed architecture.

Industry traditional grayscale release

Grayscale release in the broad sense is an effective means of reducing release risk in the industry. It is usually implemented in one of several ways: blue-green deployment, rolling release, and grayscale release in the narrow sense.

1. Blue-green release

Blue-green deployment means running two versions of an application at the same time. As shown in Figure 1, during a blue-green deployment the original version keeps serving while a complete set of the new version is deployed alongside it. Once the new version is running normally, traffic is switched to it. However, blue-green deployment requires both sets to run at the same time during the upgrade, so the hardware requirement is twice the daily requirement.

1.png
Figure 1 Blue-green deployment

2. Rolling release

A rolling release (rolling upgrade) does not start all new-version instances at once. Instead, one new-version instance is started, then one old-version instance is stopped, and so on until the upgrade is complete. However, a rolling upgrade carries risk. Once it starts, traffic flows directly to the new-version instances that have already been launched, even though the new version is not necessarily usable and may still need further testing to confirm. During the rolling upgrade the whole system is in a very unstable state, and if a problem is found, it is difficult to determine whether the new version or the old version caused it.

2.png
Figure 2 Rolling release
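
The rolling procedure can be sketched as a loop that replaces one instance at a time. This is only a minimal illustration: `Instance`, `rolling_upgrade`, and the `health_check` callback are hypothetical names chosen for this example and do not come from any deployment tool or from ICBC's platform.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    name: str
    version: str


def rolling_upgrade(
    instances: List[Instance],
    new_version: str,
    health_check: Callable[[Instance], bool],
) -> bool:
    """Replace instances one at a time; stop at the first unhealthy replacement."""
    for i, old in enumerate(instances):
        candidate = Instance(old.name, new_version)  # start one new-version instance
        if not health_check(candidate):
            # The old instance stays in place. Earlier instances are already
            # upgraded, so the cluster is left running mixed versions.
            return False
        instances[i] = candidate  # stop the old instance, keep the new one
    return True


# Hypothetical cluster in which the replacement for node-2 fails its health check.
cluster = [Instance(f"node-{n}", "v1") for n in range(3)]
ok = rolling_upgrade(cluster, "v2", health_check=lambda inst: inst.name != "node-2")
mixed_versions = [inst.version for inst in cluster]
```

When the roll stops partway, the cluster serves traffic from both versions at once, which is the state in which a fault is hard to attribute to either version.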

3. Grayscale release

A grayscale release starts the new version of the application first but does not switch traffic to it immediately. Instead, testers run online tests against the new version. If no problems are found, a small amount of user traffic is directed to the new version, its running status is observed, and runtime data is collected. Comparing the new and old versions at this stage is called an A/B test. After confirming that the new version works well, more traffic is gradually directed to it. During this period, the number of running server replicas of the new and old versions can be adjusted continuously so that the new version can withstand growing traffic pressure. Once 100% of the traffic has been switched to the new version, the remaining old-version services are shut down and the grayscale release is complete. If problems are found in the new version during the grayscale period, traffic should be switched back to the old version immediately to keep the negative impact to a minimum.

3.png
Figure 3 Grayscale release
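
The gradual shift of traffic and the immediate rollback described above can be sketched as follows. The ramp stages, the error-rate threshold, and the function names are assumptions made for illustration. Hashing the user ID into a stable bucket is one common way to split traffic and is not necessarily the method ICBC uses.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def pick_version(user_id: str, gray_percent: int) -> str:
    """Send the user to the new version if their bucket falls below gray_percent."""
    return "new" if bucket(user_id) < gray_percent else "old"


def next_percent(current: int, error_rate: float, max_error_rate: float = 0.01) -> int:
    """Advance along the ramp, or roll back to 0 if the new version misbehaves."""
    ramp = [0, 1, 5, 25, 50, 100]  # assumed stages, not taken from the article
    if error_rate > max_error_rate:
        return 0  # switch all traffic back to the old version
    later = [p for p in ramp if p > current]
    return later[0] if later else 100
```

Because each user always lands in the same bucket, a user who is already on the new version stays on it as the percentage grows, which keeps the observed behavior consistent during the grayscale period.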

ICBC's exploration of enterprise-level full-link grayscale release capabilities

ICBC started its IT architecture transformation project in 2015. The distributed system now covers more than 100 key applications and runs tens of thousands of distributed service nodes. The average daily service call volume exceeds 6 billion and the transaction peak exceeds 100,000 TPS, giving the cluster a processing capacity comparable to the performance capacity of the host. As of 2019, ICBC projects implemented grayscale release mainly through rolling upgrades, blue-green releases, and business switching.

With the transformation of the IT architecture, the underlying architecture and platform systems behind the services of the distributed system are becoming more and more complex, and the uncertainties of production operation have increased significantly compared with the mainframe. This places higher requirements on the stable operation of the production system. ICBC introduced support for distributed full-link grayscale release in the first half of 2020, aiming to form a unified grayscale release specification for key product lines, key applications, and public support platforms in complex distributed scenarios, and to give each product line technical support for full-link grayscale release.

1. Facing diverse financial business scenarios, build enterprise-level full-link grayscale capabilities

ICBC currently has nearly 1 billion accounts and handles nearly 200 million payment and settlement transactions through multiple channels every day, which requires extremely high system availability. Across its different product lines, there is an urgent need for end-to-end full-link grayscale release to reduce the risk of version releases. ICBC's full-link grayscale release capability works by coloring business traffic and combining multiple components, such as soft load balancing, gateways, and service frameworks, to route the colored traffic by label. It supports full-link grayscale routing across applications and nodes, and it establishes an operation and maintenance monitoring system and a management and control mechanism for grayscale release.

4.png
Figure 4 Full-link gray-scale flow of ICBC

2. Traffic label-level gray-scale routing capability to control financial business scenarios

Full-link grayscale release uses label routing. The soft load balancer and the service framework compare the label carried by the colored traffic with the label of each grayscale environment node, so that colored traffic flows only within the grayscale environment that has the matching label.

1) Grayscale traffic distribution of soft load

By identifying the grayscale label in the traffic, the soft load balancer routes grayscale traffic to the grayscale environment with the corresponding label. This is the first-level distribution of grayscale traffic.

5.png
Figure 5 Soft load grayscale routing
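
A minimal sketch of this first-level distribution is shown below. The header name `x-gray-tag`, the pool names, and the function `select_upstream` are hypothetical; the article does not say how ICBC's soft load balancer carries or reads the label. Sending requests with an unknown label to the normal pool is also an assumption of this sketch.

```python
from typing import Dict, Mapping

GRAY_HEADER = "x-gray-tag"  # hypothetical header name carrying the grayscale label

# Hypothetical upstream pools: the empty label stands for the normal environment.
UPSTREAMS: Dict[str, str] = {
    "": "prod-pool",
    "gray-a": "gray-a-pool",
    "gray-b": "gray-b-pool",
}


def select_upstream(headers: Mapping[str, str]) -> str:
    """Return the upstream pool whose label matches the label on the request."""
    tag = headers.get(GRAY_HEADER, "")
    # Assumption: a request with an unknown label is served by the normal pool.
    return UPSTREAMS.get(tag, UPSTREAMS[""])
```

A request with `{"x-gray-tag": "gray-a"}` is sent to `gray-a-pool`, while a request without the header stays in `prod-pool`.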

2) Service framework gray-scale routing

After grayscale request traffic reaches the service nodes of the business layer, subsequent traffic is managed by the service framework and passed along over the RPC (Dubbo) protocol. The label routing layer of the service framework automatically identifies whether a request carries a grayscale traffic identifier, filters for the providers in the matching grayscale environment, and forwards the request to them.

6.png
Figure 6 Service framework gray routing
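
The filtering step of the label routing layer can be sketched as follows. `Provider` and `filter_by_tag` are hypothetical names for this example and are not Dubbo classes. The sketch assumes that a request without a label is matched only against normal, unlabeled nodes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Provider:
    address: str
    tag: Optional[str] = None  # None marks a normal (non-grayscale) node


def filter_by_tag(providers: List[Provider], request_tag: Optional[str]) -> List[Provider]:
    """Keep only the providers whose label equals the label carried by the request."""
    return [p for p in providers if p.tag == request_tag]


# Hypothetical provider list as a consumer might see it after service discovery.
providers = [
    Provider("10.0.0.1:20880"),
    Provider("10.0.0.2:20880"),
    Provider("10.0.0.3:20880", tag="gray-a"),
]
gray_targets = filter_by_tag(providers, "gray-a")
normal_targets = filter_by_tag(providers, None)
```

After filtering, the usual load balancing of the framework chooses one provider from the remaining list. An empty list means that no node carries the requested label.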

3) Transparent transmission of gray label link

At the business service layer, the service framework is responsible for passing the grayscale label along. Dubbo provides an elegant implicit parameter mechanism for conveniently passing marking and control information between upstream and downstream services in a way that is transparent to business code. Using this mechanism, the ICBC microservice framework treats the grayscale label as an implicit parameter and automatically sets it on the request when the consumer initiates a call. The grayscale identifier carried by the traffic is therefore passed down layer by layer along the link, which achieves the full-link grayscale release capability.

7.png
Figure 7 Transparent transmission of the grayscale identifier

4) Gray downgrade guarantees the safe execution of business transactions

When none of the service nodes available for a call in the link has a grayscale identifier that matches the identifier of the grayscale request, the request is processed by a normal node in that environment, and the grayscale identifier continues to be passed downstream. This ensures high availability of the system and prevents transactions from failing when traffic cannot find a node with the corresponding identifier.

8.png
Figure 8 Grayscale degradation
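
The degradation rule can be sketched as a fallback in the provider selection step. `Provider`, `select_with_degradation`, and `call_chain` are hypothetical names for this example, and the sketch assumes every hop has at least one normal node.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Provider:
    address: str
    tag: Optional[str] = None  # None marks a normal (non-grayscale) node


def select_with_degradation(providers: List[Provider], request_tag: Optional[str]) -> List[Provider]:
    """Prefer nodes with the matching label; otherwise degrade to normal nodes."""
    if request_tag is not None:
        matched = [p for p in providers if p.tag == request_tag]
        if matched:
            return matched
    return [p for p in providers if p.tag is None]


def call_chain(hops: List[List[Provider]], request_tag: Optional[str]) -> List[str]:
    """Pick one provider per hop; the label itself is never dropped or rewritten."""
    return [select_with_degradation(providers, request_tag)[0].address for providers in hops]


hops = [
    [Provider("order-1"), Provider("order-2")],               # no grayscale node at this hop
    [Provider("pay-1"), Provider("pay-gray", tag="gray-a")],  # a grayscale node exists here
]
path = call_chain(hops, "gray-a")
```

The first hop degrades to a normal node, but because the label keeps travelling with the request, the second hop still reaches the grayscale node.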

3. Summary

At present, ICBC has established a unified full-link grayscale release standard. It reduces the labor cost and difficulty of building a grayscale environment for each application, improves R&D efficiency, and ultimately delivers a consistent grayscale release capability across applications and services. Full-link grayscale release is already in use in more than 20 applications, including the aggregate payment and mobile banking business lines.

Future outlook

As its IT architecture transformation advances, ICBC will continue to build a financial information system with the dual cores of host and platform to ensure the stable operation of financial services and support the rapid growth of high-frequency business. Guided by the construction concept of "openness, high capacity, easy expansion, controllable cost, safety and stability, and convenient R&D", ICBC will actively promote technological innovation and management and control upgrades in distributed full-link grayscale release, covering the bank's core transaction link scenarios. It will continue to improve the full-link grayscale release model, reduce application access costs, and improve the compatibility and adaptability of the components involved, so as to handle complex distributed financial transaction scenarios and provide strong support for the construction of smart banks.

Origin blog.51cto.com/13778063/2587658