Full-link grayscale practice based on cloud-native gateway

Author: Ni Haifeng (Hai Er)

Foreword

As an enterprise grows, a traditional monolithic application can no longer support the business, and iteration speed struggles to keep pace with business growth. At this point the enterprise typically splits the application system into microservices to reduce coupling between business modules, improve the efficiency of development iterations, and make development more agile.

The original vision of the microservice architecture is to improve iteration efficiency by reducing the granularity of the system. In practice, however, especially once the number of services grows, the efficiency problems it introduces may outweigh the architectural dividends the microservice architecture itself brings.

Publishing challenges under the microservice architecture

After the system is split into microservices, one of the business goals is to achieve high-frequency delivery by reducing service granularity. In practice, however, complete decoupling of upstream and downstream services exists almost only in an ideal state. A common situation is that frequent releases of microservice changes cause a significant loss of business traffic, so R&D personnel have to release during off-peak hours at night. During the release, the upstream and downstream teams involved must also be on standby throughout, so that problems found during the release can be fixed immediately, which greatly reduces the happiness of R&D personnel.

How to build safe-production capabilities (grayscale release, observability, and rollback) that satisfy the demand for rapid iteration and small-step verification while the business grows quickly is a problem enterprises must face as they deepen their use of microservices.

This article will focus on the overall scheme of full-link flow control for Spring Cloud microservice applications in the Alibaba Cloud EDAS ACK environment. Through the full-link traffic control function, a traffic control environment can be quickly created to route traffic with certain characteristics to the target version of the application.

Principles of Grayscale Publishing Practices

Under the microservice architecture, the key to grayscale publishing lies in three elements: layering, isolation, and compatibility. On top of these, the business must be observable. Layering is the preparation done before designing a grayscale release plan, while isolation and compatibility are the main means of achieving grayscale.

There are two ways to realize traffic isolation in full-link grayscale scenarios: isolation based on the physical environment and isolation based on the logical environment.

Isolation based on the physical environment requires building a network-isolated, resource-independent environment for the services that need grayscale, and deploying the grayscale version of the services in it. Because the formal environment is isolated from the grayscale environment, it cannot access services in the grayscale environment. Therefore, even services and components whose versions have not changed must also be deployed in the grayscale environment. In terms of implementation principle, the common blue-green deployment is one such technique. However, when there are many online services, a solution based on physical isolation is relatively inflexible and causes a large number of redundant nodes and additional resource overhead.

The core of the logical environment isolation solution is to color the traffic. As a request is forwarded along the call chain, the gateway, the middleware, and the microservices identify the colored grayscale traffic and dynamically forward the request to the corresponding grayscale version, making the decision dynamically based on rules. When a version changes, the forwarding rules of the call chain therefore change in real time as well. Compared with building a physically isolated grayscale environment, adjusting policies dynamically at the logical level saves considerable resources and operation and maintenance costs, and helps developers implement more complex full-link control scenarios.

Label routing divides one or more service providers into the same group through labels, thereby restricting traffic to only flow in specified groups and achieving the purpose of traffic isolation. Label routing can be used as the capability basis for scenarios such as multi-version development and testing, multi-version traffic isolation of the same application, and A/B Testing. In fact, there are many other usage scenarios for label routing, such as full-link flow control, same-AZ priority, full-link stress testing, disaster recovery and multi-active, etc.
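The grouping idea behind label routing can be illustrated with a minimal Python sketch. The `Instance` type, the `route_by_label` function, and the sample addresses are hypothetical names chosen for illustration; they are not part of EDAS or MSE. Instances are registered with an optional label, and a request is restricted to the group whose label matches its own.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instance:
    """A service provider instance registered with an optional label."""
    address: str
    tag: Optional[str] = None  # None means unlabeled (baseline)


def route_by_label(instances: List[Instance], request_tag: Optional[str]) -> List[Instance]:
    """Keep only the instances whose label equals the label carried by the request."""
    return [inst for inst in instances if inst.tag == request_tag]


# Hypothetical provider list: one baseline instance and two gray instances.
providers = [
    Instance("10.0.0.1:8080"),
    Instance("10.0.0.2:8080", tag="gray"),
    Instance("10.0.0.3:8080", tag="gray"),
]
gray_group = route_by_label(providers, "gray")
base_group = route_by_label(providers, None)
```

Traffic tagged `gray` only ever sees the two gray instances, while untagged traffic stays within the baseline group, which is the isolation property described above.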

image.png

Finally, in engineering practice, not every component can achieve fine-grained traffic control through isolation. For stateful components such as databases, for example, it is impractical in terms of both cost and technology to rebuild a new set of databases and tables for every release and then apply the new version's SQL scripts after synchronizing the data. Compatibility between the old and new versions in such scenarios therefore becomes a necessary precondition.

Architecture analysis

The back-end technology stack of this project is Spring Cloud Alibaba, and it uses a complete set of Alibaba Cloud cloud-native best practices, including EDAS and the MSE cloud-native gateway. The front-end application is developed with Vue and consists entirely of static resources. As the application architecture shows, the static resources are served externally by Nginx acting as the HTTP server.

image

In this architecture design, the business requirements are as follows:

  1. Traffic from the front end to the back end can be routed at fine granularity according to different complex rules (such as the city or UserID in the header). When the grayscale version of a downstream service is abnormal or does not exist, the request can be downgraded to the baseline service.

  2. Online traffic can be released to grayscale randomly according to a given percentage.

  3. Messages in the message queue can be marked and consumed by the corresponding consumers.

  4. Code modifications should be as close to zero as possible.

  5. Messages in the message queue must be grouped by grayscale and consumed by the corresponding Consumer.

  6. Problems with grayscale traffic must be observable.

EDAS flow control

EDAS is a cloud-native PaaS platform for application hosting and microservice management. It provides full-stack solutions for application development, deployment, monitoring, operation and maintenance, and supports microservice operating environments such as Spring Cloud and Apache Dubbo.

image.png

On the EDAS platform, users can quickly deploy applications to various underlying server clusters through WAR packages, JAR packages, or images, and can easily deploy both baseline and grayscale versions of applications. EDAS also connects seamlessly to the service governance capabilities of MSE, so that advanced features such as zero code intrusion, lossless application online and offline, canary release, and full-link traffic control are available without installing an additional Agent.

The MSE cloud-native gateway is a new-generation gateway launched by Alibaba Cloud. It combines the traditional traffic gateway and the microservice gateway, reducing resource costs while providing refined traffic management capabilities. It supports multiple service discovery methods, including ACK container services, Nacos, Eureka, fixed addresses, and FaaS; it supports multiple authentication and login methods to quickly build a line of security defense; and it provides a comprehensive, multi-perspective monitoring system covering metrics monitoring, log analysis, and link tracing.

image.png

By combining the EDAS microservice governance capabilities with the cloud-native gateway, multiple logical environments can easily be set up to achieve full-link grayscale. EDAS extends the functionality of the development framework at compile time based on bytecode enhancement technology. The solution is transparent to the business: full-link grayscale governance is obtained without modifying a single line of business code.

Full link traffic grayscale

Traffic entry: The entry point for traffic into the microservice system, which in this scenario is the MSE cloud-native gateway.

Swimlane: A set of isolated environments defined for the same version of applications. Only request traffic that meets the flow control routing rules is routed to the tagged applications in the corresponding swimlane. An application can belong to multiple swimlanes, and a swimlane can contain multiple applications, so applications and swimlanes have a many-to-many relationship.

Baseline environment: Unmarked applications belong to the baseline stable version of the application, that is, a stable online environment.

Traffic fallback: The services deployed in a swimlane do not have to match the baseline environment exactly. When a service that the call chain depends on is not present in the swimlane, the traffic falls back to the baseline environment for that hop and, when necessary, is routed back to the swimlane matching its label further down the chain.
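The fallback behaviour can be sketched in a few lines of Python. This is only an illustration under assumptions: the `REGISTRY` layout, the service names A, B, C, and the functions `pick_group` and `trace_call_chain` are hypothetical and do not reflect how EDAS or MSE store or resolve routes. The point it shows is that the label travels with the request, so a hop that has no swimlane instance is served by the baseline and a later hop can return to the swimlane.

```python
from typing import Dict, List, Optional, Tuple

BASELINE = "baseline"

# service name -> group name -> instance addresses (hypothetical layout)
Registry = Dict[str, Dict[str, List[str]]]


def pick_group(registry: Registry, service: str, tag: Optional[str]) -> str:
    """Prefer the swimlane group for this hop; fall back to the baseline if it is absent or empty."""
    groups = registry.get(service, {})
    if tag and groups.get(tag):
        return tag
    return BASELINE


def trace_call_chain(registry: Registry, chain: List[str], tag: Optional[str]) -> List[Tuple[str, str]]:
    """Resolve every hop of a call chain; the tag is carried unchanged along the chain."""
    return [(service, pick_group(registry, service, tag)) for service in chain]


# Hypothetical deployment: service B has no gray version.
REGISTRY: Registry = {
    "A": {"baseline": ["a-1"], "gray": ["a-gray-1"]},
    "B": {"baseline": ["b-1"]},
    "C": {"baseline": ["c-1"], "gray": ["c-gray-1"]},
}
hops = trace_call_chain(REGISTRY, ["A", "B", "C"], "gray")
```

Here `hops` resolves A to the gray group, B to the baseline, and C back to the gray group.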

Lane Group: A collection of swim lanes. The role of the lane group is mainly to distinguish between different teams or different scenarios.

Ingress application: mark the traffic that conforms to the flow control rules with the corresponding gray mark, and make the traffic go to the corresponding application version in the downstream application. Since the actual usage scenario in this case is MSE cloud-native gateway + EDAS, its marking capabilities are all concentrated on the MSE cloud-native gateway.

image.png

As can be seen from the figure above, the user has created swimlane A and swimlane B, which involve two applications, the trading center and the commodity center, and are tagged with label 1 and label 2 respectively. Swimlane A receives 30% of the online traffic, swimlane B receives 20%, and the baseline environment (that is, the unmarked environment) receives 50%.
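A percentage split like this one can be modelled as a weighted choice over cumulative ranges. The following Python sketch assumes the 30/20/50 split from the example; the lane names and the functions `choose_lane` and `random_lane` are hypothetical, and the gateway's actual selection algorithm is not described in this article.

```python
import random
from typing import List, Tuple

# Weights taken from the example: lane A 30%, lane B 20%, baseline 50%.
LANE_WEIGHTS: List[Tuple[str, int]] = [("lane-A", 30), ("lane-B", 20), ("baseline", 50)]


def choose_lane(weights: List[Tuple[str, int]], point: int) -> str:
    """Map an integer point in [0, total weight) to a lane using cumulative weights."""
    upper = 0
    for lane, weight in weights:
        upper += weight
        if point < upper:
            return lane
    raise ValueError("point is outside the total weight")


def random_lane(weights: List[Tuple[str, int]]) -> str:
    """Pick a lane at random in proportion to its weight."""
    total = sum(weight for _, weight in weights)
    return choose_lane(weights, random.randrange(total))


# Enumerating every point shows the exact proportions of the split.
counts = {lane: 0 for lane, _ in LANE_WEIGHTS}
for p in range(100):
    counts[choose_lane(LANE_WEIGHTS, p)] += 1
```

Separating the deterministic mapping (`choose_lane`) from the random draw (`random_lane`) makes the split easy to verify without relying on sampling.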

image.png

The grayscale version of an application is identified by configuring the annotation alicloud.service.tag: gray on its Deployment, and the instance registers with the registry carrying that tag. With the full-link swimlane enabled on the grayscale version (traffic passing through a tagged machine is colored automatically), grayscale traffic automatically receives the x-mse-tag: gray tag, and the extended consumer-side routing capability forwards traffic carrying the grayscale tag to the target grayscale application.

Create lane groups and lanes

When creating a swimlane, you need to select the entry type. Currently, only the entry application deployed in EDAS is supported as the entry application of the swimlane. You need to add the baseline version and grayscale version involved in the swimlane to the applications involved in the swimlane group.

image.png

Create diversion lanes, define lane names, and configure lane flow control rules.

image

After the creation is complete, a new lane is generated. Click the lane name and record the lane label value. In this example it is f2bb906.

image

Grayscale application deployment

Clone the applications into grayscale versions through EDAS, selecting all the applications that need grayscale.

image

Rename all grayscale applications. The naming rule is to add -gray after the baseline application name to distinguish it. Click OK and wait for the application cloning to complete.

image.png

When you see that all applications (including grayscale version applications) are running and the number of instances is normal, you can proceed to the next step.

image.png

Add applications to the lane group and lane

Go back to the full-link traffic control page, find the previously created swimlane group, click Edit, add the baseline and grayscale applications to the swimlane group, and then click OK.

image.png

image.png

Add a grayscale application to the created grayscale swimlane, find the grayscale swimlane, click Edit, add a swimlane application, and select the gray version of the application.

image.png

image.png

MSE Cloud Native Gateway Routing Configuration

In the left menu bar of the EDAS console, select Traffic Management - Application Routing - MSE Gateway Routing, and click Create Route.

image.png

Define the route name, select the MSE gateway instance, configure the associated domain name, matching rules, request method, etc. The route here is to the baseline application, and there is no need to configure the request header (that is, no need to match grayscale rules).

image.png

Select the EDAS registration center as the service source, and fill in the configuration of the baseline application a as the target. Note: If there is no optional service after selecting the application, you need to check the agent status on the k8s cluster.

image.png

Create MSE cloud-native gateway grayscale routing

Create another MSE gateway route for the grayscale version application, define the route name, select the MSE gateway, add the request header, key=tag, value=gray, and select the a application of the grayscale application version in the target service.

image

image

Add a policy configuration to the grayscale route and add a Header rule, where the header key is x-mse-tag and the header value is the swimlane label recorded when the lane was created (f2bb906 in this example). After adding the rule, turn on the status switch.

image

After the addition is complete, the corresponding base and gray routes will be published and launched.

image.png

After completing the routing policy configuration, set the fallback target service to the baseline service for the next-hop service route corresponding to the gateway.

image.png

Message grayscale

In RocketMQ, a subscription relationship (Subscription) is the set of rules and state configuration that governs how consumers obtain and process messages. The subscription relationship is registered dynamically with the server by the consumer group, and in subsequent message delivery, message matching and consumption-progress maintenance follow the filtering rules it defines. RocketMQ designs subscription relationships at the granularity of consumer group plus topic, so one subscription relationship refers to the subscription of a given consumer group to a given topic.

  1. Subscriptions of different consumer groups to the same topic are independent of each other.

image.png

  2. Subscriptions of the same consumer group to different topics are also independent of each other.

image.png

In the RocketMQ domain model, a message is initialized by the producer and sent to the RocketMQ server. Messages are stored in the specified queue of the topic in the order in which they arrive at the server, and consumers then obtain and consume messages from RocketMQ according to the specified subscription relationship.

In real business scenarios, messages under one topic are often processed by several downstream business parties. Each downstream has different processing logic and cares only about the subset of messages its own logic needs. For such scenarios, RocketMQ subscription relationships support message matching and consumption-progress maintenance based on the filter rules defined in the subscription. When consuming, a consumer can choose which messages in the topic to receive; setting consumption filter rules efficiently selects the set of messages the consumer needs, and different acceptance ranges can be set flexibly for different business scenarios.

Message filtering is mainly implemented through the following key processes:

image.png

  • Producer: The producer pre-sets some attributes and tags for the message when initializing the message, and specifies the filtering target for subsequent consumption.
  • Consumer: In the initialization and subsequent consumption process, the consumer reports to the server which messages of the specified topic need to be subscribed to by calling the subscription relationship registration interface, that is, the filter condition.
  • Server: When a consumer fetches messages, dynamic filter computation is triggered on the server. The RocketMQ server evaluates the filter expression reported by the consumer and delivers the messages that match to the consumer.

RocketMQ supports two filtering methods: Tag filtering and SQL attribute filtering. With Tag filtering, a message can carry only one string-type tag, so it suits simple filtering scenarios. With SQL attribute filtering, the producer sets key-value pairs as attributes on the message and the consumer sets a filter expression in SQL92 syntax that can filter on multiple attributes, so it suits more complex filtering scenarios. In addition, the Tag is itself a system attribute of the message, so SQL filtering is also compatible with Tag filtering; in SQL syntax the attribute name of the Tag is TAGS.
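The difference between the two filtering methods can be sketched in Python. This is a simplified model of the matching semantics, not RocketMQ code: `tag_matches` mimics a tag subscription expression such as `TagA || TagB` (with `*` accepting everything), and `attribute_filter` is a hand-written stand-in for an SQL92 expression like `region = 'hangzhou' AND price > 10`. The attribute names `region` and `price` are hypothetical.

```python
from typing import Dict, Optional


def tag_matches(expression: str, message_tag: Optional[str]) -> bool:
    """Tag filtering: '*' accepts every message, otherwise the single message tag
    must equal one of the tags separated by '||'."""
    if expression.strip() == "*":
        return True
    wanted = {part.strip() for part in expression.split("||")}
    return message_tag in wanted


def attribute_filter(props: Dict[str, str]) -> bool:
    """Stand-in for the SQL92 expression: region = 'hangzhou' AND price > 10.
    Several attributes of the same message are combined in one condition."""
    return props.get("region") == "hangzhou" and float(props.get("price", "0")) > 10
```

A tag expression can only test the one tag a message carries, while the attribute condition combines several key-value pairs, which is why the latter suits more complex scenarios.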

Having introduced the RocketMQ subscription model, let's look at the implementation plan for message grayscale. The essence of grayscale release is environment isolation: the normal and grayscale versions are distinguished by different consumer groups. Without changing code or application configuration, the agent in the service governance layer handles the message processing logic from sending to subscription. The overall idea is as follows:

image.png

1. Create the corresponding consumer group: the user first needs to pre-create a corresponding consumer group with the suffix _grayID for each existing consumer group. Example: the group corresponding to serviceA_group is serviceA_group_grayID. (Note: the name must follow the _grayID convention; otherwise, after obtaining the grayscale identifier, the agent cannot associate the service with the consumer group, or it directly reports an error that the consumer group does not exist.)

2. Message coloring: through the user-defined grayscale environment grouping, the corresponding service is colored and marked with an environment identifier. When the service sends messages to the Topic as usual, the service's agent intercepts each message, attaches the custom attribute by calling putUserProperty("gray_tag", ...) with the grayscale identifier as the value, and then puts the message into the Topic.

3. Dynamically establish a subscription relationship: when a consumer is added to the corresponding environment, it will be assigned to the corresponding consumer group based on the environment identifier. And dynamically establish a subscription relationship with Topic based on the consumer group.

4. Message filtering: Because the producer pushes messages from both the grayscale environment and the production environment to the same topic, each consumer group must consume based on message filtering for its own environment. Technically, RocketMQ can satisfy this with either Tags or message attributes. The scheme adopts custom message attributes mainly for engineering reasons: a message supports only one string-type Tag, and in some business scenarios the Tag may already be occupied by the application's own logic, which makes the outcome uncontrollable. With custom message attributes, the agent on the consumer side implements message filtering based on SQL92 syntax, which is more fault tolerant, so custom message attributes are recommended for message grouping.
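The four steps above can be sketched as a small Python model. It is an illustration under assumptions, not the agent's implementation: the function names are hypothetical, messages are represented as plain dictionaries of user properties, and the filter functions stand in for the SQL92 conditions the agent would register.

```python
from typing import Dict, Iterable, Optional

GRAY_PROPERTY = "gray_tag"  # attribute name used in the coloring step


def gray_group_name(base_group: str, gray_id: str) -> str:
    """Step 1: the gray consumer group is the base group name plus '_<grayID>'."""
    return f"{base_group}_{gray_id}"


def color_message(properties: Dict[str, str], gray_id: Optional[str]) -> Dict[str, str]:
    """Step 2: attach the gray identifier as a user property; baseline messages stay unmarked."""
    colored = dict(properties)
    if gray_id:
        colored[GRAY_PROPERTY] = gray_id
    return colored


def gray_consumer_accepts(properties: Dict[str, str], gray_id: str) -> bool:
    """Step 4, gray side: roughly the condition gray_tag = '<grayID>' (assumed expression)."""
    return properties.get(GRAY_PROPERTY) == gray_id


def base_consumer_accepts(properties: Dict[str, str], excluded: Iterable[str]) -> bool:
    """Step 4, baseline side: accept everything except messages marked with an excluded gray ID."""
    return properties.get(GRAY_PROPERTY) not in set(excluded)


gray_message = color_message({"orderId": "1"}, "e396fee")
base_message = color_message({"orderId": "2"}, None)
```

With these definitions, a message colored with `e396fee` is accepted only by the consumer in `gray_group_name("MyConsumerGroup", "e396fee")`, while the unmarked message is accepted only by the baseline consumer.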

The main advantage of this solution is that there is no high cost of use for managers and users, and there is no need to frequently modify the code and corresponding configuration information during the actual completion of the grayscale release process.

Message grayscale configuration

In the lane list, you can find the baseline and grayscale lanes of the created application, click the lane name to view the details of the lane, and get the lane label.

image

image

As shown in the figure above, the lane label of dev1 is 9e4be42 and the lane label of dev2 is e396fee. Log in to the RocketMQ console and create grayscale consumer groups based on these swimlane labels: MyConsumerGroup_e396fee and MyConsumerGroup_9e4be42.

image

Select Application Management on the EDAS console to enter the application list. Find the message producers and consumers that need grayscale flow control (application B, application B-dev1, application B-dev2, application C, application C-dev1, application C-dev2), open the application details page of each, and add the following environment variables when deploying.

All applications configure the environment variable:

profiler.micro.service.mq.gray.enable=true

image

Baseline application C additionally configures the excluded-tags environment variable:

profiler.micro.service.mq.gray.enable=true
profiler.micro.service.mq.gray.cunsumer.base.excluded.tags=9e4be42,e396fee

image

Note: The value of profiler.micro.service.mq.gray.cunsumer.base.excluded.tags is obtained by joining the lane labels obtained above with commas (","). If there is only one grayscale environment, no comma is needed. Adjust the value to match the actual swimlanes.
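As a small illustration of how the value is assembled, the Python sketch below builds the environment variables for the baseline consumer from a list of lane labels. The helper `base_consumer_env` is hypothetical; the variable names and label values are the ones used in this article.

```python
from typing import Dict, List

ENABLE_KEY = "profiler.micro.service.mq.gray.enable"
EXCLUDED_KEY = "profiler.micro.service.mq.gray.cunsumer.base.excluded.tags"


def base_consumer_env(lane_labels: List[str]) -> Dict[str, str]:
    """Build the environment variables for the baseline consumer application."""
    env = {ENABLE_KEY: "true"}
    if lane_labels:
        # A single label needs no separator; several labels are joined with commas.
        env[EXCLUDED_KEY] = ",".join(lane_labels)
    return env


two_lanes = base_consumer_env(["9e4be42", "e396fee"])
one_lane = base_consumer_env(["9e4be42"])
```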

Message Service Subscription Relationship Check

Log in to the message service RocketMQ console to view the subscription relationship of the consumer group, as shown in the following figure:

image

The default consumer group MyConsumerGroup is subscribed to by baseline application C and filters out messages that carry a grayscale label.

image.png

image.png

The grayscale consumer groups MyConsumerGroup_e396fee and MyConsumerGroup_9e4be42 only subscribe to messages with corresponding grayscale labels.

Realize front-end grayscale based on client request

The usual way to implement a front-end grayscale strategy based on the client request IP is to configure it in Nginx. When user traffic reaches Nginx, the keywords in http_x_forwarded_for of the request are checked, and the request is directed to a different front-end version according to the value, as shown below:

image.png

The Nginx configuration code is as follows:

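set $canary_flag main;         # define the directory variable
set $flag_page 0;              # initial value of the grayscale condition
# Check the source IP and update the grayscale condition when it matches
if ($http_x_forwarded_for ~ "(xxx.xxx.xxx.01|xxx.xxx.xxx.02|xxx.xxx.xxx.03)") {
   set $flag_page "${flag_page}1";
}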

In real projects, grayscale based on http_x_forwarded_for requires enumerating the request IPs, so it targets specific clients. It mainly serves the scenario in which developers and testers verify the business in the production environment after a release is completed.
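The same decision can be expressed as a short Python sketch, which may make the logic easier to test in isolation. It is a simplification under assumptions: the addresses come from the documentation range 203.0.113.0/24, the directory name `gray` is hypothetical (`main` is the default used in the Nginx snippet), and the client IP is taken to be the first entry of X-Forwarded-For, whereas the Nginx regular expression matches anywhere in the header.

```python
from typing import Optional, Set

# Hypothetical allowlist of tester IPs (documentation address range).
GRAY_IPS: Set[str] = {"203.0.113.1", "203.0.113.2", "203.0.113.3"}


def choose_frontend_dir(x_forwarded_for: Optional[str], gray_ips: Set[str] = GRAY_IPS) -> str:
    """Return the front-end directory to serve: 'gray' for allowlisted clients, else 'main'."""
    hops = [hop.strip() for hop in (x_forwarded_for or "").split(",") if hop.strip()]
    client_ip = hops[0] if hops else ""
    return "gray" if client_ip in gray_ips else "main"
```

For example, a request with `X-Forwarded-For: 203.0.113.2, 10.0.0.5` would be served from the gray directory, while a request without the header falls through to `main`.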

The IP-based grayscale strategy also has limitations. In production it clearly cannot cover scenarios that need real user traffic at any time for business verification. A more general approach is to evaluate grayscale rules based on city and province information. In this solution design, the front-end site uses Alibaba Cloud CDN for static acceleration, and the edge script of Alibaba Cloud CDN converts the client's request IP address into the corresponding city information.

EdgeScript (ES for short) is a toolbox that can quickly implement CDN custom configuration. When the standard configuration on the CDN console cannot meet business needs, you can try to use EdgeScript to implement simple programming.

image

Edge scripts have built-in variables that CDN nodes can recognize and simple conditional statements, and provide a large number of functions packaged by Alibaba Cloud CDN that users can call directly. With simple variable checks and calls to ready-made functions, most customized configuration requirements, such as authentication, caching, rate limiting, and adding or removing request headers, can be met. This effectively addresses the problems that customized configuration requirements cannot be implemented and that business changes are not agile enough.

image

The execution position of the edge script is shown in the figure. When the client request reaches the CDN node, the node gateway will process the request according to the standard configuration and edge script rules set on the console. Using the standard configuration on the CDN console as a reference, edge scripts can choose to take effect at the very beginning or at the end of request processing.

image

Traffic is marked based on provinces and cities, and the city where the requesting IP is located can be identified through the IP address database. The current link backend API uses Alibaba Cloud CDN.

image.png

The EdgeScript rule code is as follows:

cou = $ali_hook_ip_country_en
pro = $ali_hook_ip_region_en
city = $ali_hook_ip_city_en
regiont = concat(city,',',pro,',',cou)
region = lower(regiont)
add_req_header('X-Client-Ip-City',region)
add_rsp_header('X-Client-Ip-City',region)

On Nginx, check the value of X-Client-Ip-City. When the grayscale rule is met, add the grayscale header to the request so that it matches the grayscale rule of the corresponding gateway route.

Finally, in the MSE cloud-native gateway console, select the corresponding MSE cloud-native gateway, enter the routing management-routing configuration to find the corresponding route and edit it.

image.png

The request header (Header) is set to: x-client-ip-city, regular-expression match, .*guangdong.* , as shown in the figure below:

image.png
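To make the header flow concrete, here is a minimal Python sketch of the two halves: building the region value the way the EdgeScript rule does (city, province, country joined with commas and lowercased), and checking it against a regular expression on the gateway side. The pattern `.*guangdong.*` is an assumption about the rule's exact form, the sample location values are hypothetical, and the function names are illustrative only.

```python
import re
from typing import Dict

# Assumed form of the gateway rule: any header value that contains "guangdong".
GRAY_RULE = re.compile(r".*guangdong.*")


def client_region(city: str, province: str, country: str) -> str:
    """Mirror the EdgeScript rule: concat(city, ',', pro, ',', cou) followed by lower()."""
    return ",".join([city, province, country]).lower()


def is_gray_request(headers: Dict[str, str]) -> bool:
    """Return True when the x-client-ip-city header matches the grayscale rule."""
    value = headers.get("x-client-ip-city", "")
    return GRAY_RULE.fullmatch(value) is not None


region = client_region("Shenzhen", "Guangdong", "China")  # hypothetical lookup result
```

Lowercasing on the CDN side means the gateway rule only has to deal with one spelling of each province name.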

Grayscale traffic observation and alarm

In the full-link grayscale release scenario, two application versions coexist in the production environment, which makes operation and maintenance more complex. To identify traffic escape as early as possible, grayscale traffic must be observable and escapes must trigger alarms. When unexpected traffic requests occur, developers can quickly analyze the escape through the monitoring view and then notify the responsible team so the alarm is handled in time.

Based on the above-mentioned observable requirements, when the traffic passes through the MSE cloud-native gateway, the corresponding Header is configured in the routing configuration to meet the observable requirements for all traffic directions.

image

The specific judgment rule is: if there is grayscale traffic in the base route, or the grayscale route matches the baseline traffic, then it is considered that there is traffic escape.
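The judgment rule translates directly into a small predicate. The Python sketch below assumes each observed request is described by the route that served it (`base` or `gray`) and by the grayscale tag it carried, if any; the function name and the record shape are hypothetical and not part of the MSE gateway.

```python
from typing import Optional


def is_traffic_escape(route: str, request_tag: Optional[str]) -> bool:
    """Flag an escape when gray traffic hits the base route or baseline traffic hits the gray route."""
    if route == "base":
        return bool(request_tag)   # gray-tagged request served by the baseline route
    if route == "gray":
        return not request_tag     # untagged request served by the gray route
    raise ValueError(f"unknown route: {route}")
```

A monitoring job could apply this predicate to gateway access records and raise an alarm whenever it returns `True`.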

Summary

This article introduced two schemes, based on physical environment isolation and logical environment isolation. It analyzed the logical isolation scheme in detail, covered the technical points involved, presented an implementation based on EDAS and the MSE cloud-native gateway, and gave the related product configuration examples.

MSE provides microservices with a rich set of capabilities, including the cloud-native gateway, registry and configuration center, and microservice governance. From the application perspective, EDAS deeply integrates each atomic capability of MSE and offers a best-practice reference for microservice application management. Everyone is welcome to try it out and share feedback.

Origin my.oschina.net/u/3874284/blog/10087338