Traffic protection in practice on the cloud-native gateway

Background

In a distributed system architecture, each request passes through many layers of processing: from the ingress gateway to the web server, through calls between services, and finally to the caches, databases, and other storage that the services access. In the traffic protection system shown in the figure below, we usually follow the traffic funnel principle. Each layer of the traffic path needs targeted traffic protection and fault-tolerance measures to keep the service stable. At the same time, traffic protection should be moved as far forward as possible. For example, control of some HTTP request traffic can be moved up to the gateway layer so that part of the traffic is handled early, which keeps excess traffic from reaching the backend, putting pressure on it, and wasting resources. Traffic protection on the gateway side is therefore essential.

In traditional traffic gateway scenarios, access control for traffic is a very common requirement. For example, in nginx, limit_req is the most common rate limiting configuration, and Envoy supports local and global rate limiting modes, but both have limitations. In richness of features, neither matches common open source rate limiting components such as Sentinel and Hystrix. In real usage scenarios they are also of limited practical use; for example, neither supports cluster rate limiting without performance loss.

The traffic protection feature of the cloud-native gateway is built on the Sentinel kernel, with some enhancements and modifications. Sentinel takes traffic and fault tolerance as its starting point and helps ensure the stability of services and gateways along multiple dimensions, including flow control, isolation of unstable calls, circuit breaking and degradation, hotspot traffic protection, adaptive system protection, and cluster flow control, while also providing second-level traffic monitoring and analysis. Its commercial products are widely used in Alibaba's internal e-commerce businesses such as Taobao and Tmall, and have also seen extensive use in Internet finance, online education, gaming, live streaming, and at large government bodies and central state-owned enterprises.

As a next-generation cloud gateway that integrates security, traffic, and microservices, the cloud-native gateway was positioned from the outset for use in all scenarios. Traffic protection is therefore a required capability. Its traffic protection has the following advantages:

  • It offers traffic protection features as rich as those of popular traffic protection projects such as Sentinel and Hystrix, and it continues to be iterated on.
  • It natively supports amortized cluster flow control, so users do not need to care about the number of gateway nodes or upstream service nodes.
  • It provides accompanying second-level monitoring with rich traffic metrics such as QPS, rejected QPS, error QPS, RT, and concurrency. It also supports viewing historical data, which makes it easy to observe first and then configure protection rules.
  • Traffic protection rules take effect within seconds of being configured, with no further waiting.

Introduction to the Sentinel traffic model

As shown in the figure below, traffic protection means setting a suitable barrier policy for each kind of traffic. Traffic is observed at the barrier, and once it is determined that traffic should not pass, it is intercepted promptly, which protects the gateway and the backend upstream services.

The cloud-native gateway currently supports three traffic protection capabilities: QPS rate limiting, concurrency control, and circuit breaking. This article explains the effect and applicable scenarios of each.

  • QPS rate limiting

This is the most common traffic protection scenario. As the name suggests, it limits the traffic of a route so that requests can reach the gateway only up to a certain rate, which prevents a traffic surge on one route from bringing down backend services. The cloud-native gateway supports route-level rate limiting and natively supports amortized cluster flow control. Users do not need to care about the number of gateway nodes or backend service nodes; configuring a single overall threshold is enough to rate-limit a route as a whole.
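To make the idea of an amortized overall threshold concrete, here is a minimal Python sketch. It assumes the overall threshold is simply split evenly across gateway nodes and that each node counts requests in fixed one-second windows; the names per_node_threshold and FixedWindowLimiter are hypothetical, and the gateway's real mechanism is not described in this article and may differ.

```python
import math


def per_node_threshold(total_qps: int, node_count: int) -> int:
    """Split an overall route threshold evenly across nodes (assumed policy)."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return math.ceil(total_qps / node_count)


class FixedWindowLimiter:
    """Admit at most `limit` requests per one-second window on a single node."""

    def __init__(self, limit: int):
        self.limit = limit
        self.window_start = None  # the whole second the current window covers
        self.count = 0

    def allow(self, now: float) -> bool:
        second = int(now)
        if second != self.window_start:
            # A new second has begun, so start counting from zero again.
            self.window_start = second
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # the caller would return the configured rejection


# Overall threshold of 5000 QPS shared by 4 hypothetical gateway nodes.
limiter = FixedWindowLimiter(per_node_threshold(total_qps=5000, node_count=4))
admitted = sum(limiter.allow(now=100.0) for _ in range(2000))
```

With these assumed numbers each node admits 1250 requests in a given second and rejects the rest, so the four nodes together stay at the overall threshold without the user configuring anything per node.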

  • Concurrency control

Concurrency control maintains a concurrency value in real time (the maximum parallelism of the route's traffic within one second, that is, the number of unfinished requests). Once the next request would exceed the configured threshold, it is intercepted. Unlike QPS rate limiting, this protects key resources even at low QPS: slow calls that keep accumulating could otherwise hold resources such as the backend upstream service's thread pool or database resources for a long time, which causes failures in the upstream service and makes it unavailable. As with QPS rate limiting, the cloud-native gateway natively supports amortized cluster concurrency limiting: configuring a single overall concurrency threshold is enough to control concurrency on a route as a whole.
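The in-flight counting described above can be sketched in a few lines of Python. This is an illustration for a single process only; the class name ConcurrencyLimiter and its methods are hypothetical and are not the gateway's API.

```python
import threading


class ConcurrencyLimiter:
    """Reject a request when too many earlier requests are still unfinished."""

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self.in_flight = 0  # number of requests admitted but not yet finished
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Call when a request arrives; False means intercept the request."""
        with self._lock:
            if self.in_flight >= self.max_in_flight:
                return False
            self.in_flight += 1
            return True

    def release(self) -> None:
        """Call when an admitted request completes, whether it succeeded or not."""
        with self._lock:
            if self.in_flight > 0:
                self.in_flight -= 1


limiter = ConcurrencyLimiter(max_in_flight=2)
results = [limiter.try_acquire() for _ in range(3)]  # third request is rejected
limiter.release()                                    # one slow call finishes
after_release = limiter.try_acquire()                # capacity is available again
```

Note that the limit depends only on how many requests are unfinished, not on how fast they arrive, which is why it still helps when QPS is low but calls are slow.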

  • Circuit breaking

This feature appears in Sentinel, Hystrix, and other rate limiting projects. As the name implies, circuit breaking means that when the traffic of a route is in an abnormal state, that traffic is cut off promptly so that the upstream services related to the route can keep running efficiently and stably, unaffected by the abnormal traffic of one route.

The mechanism corresponds to the circuit breaker model. When calls have been unstable (usually errors or slow calls) beyond a certain degree (usually measured as a ratio rather than an absolute count), the circuit breaker opens (OPEN) and all requests fall back. After a period of time it enters the recovery probing phase (HALF-OPEN), lets a certain number of requests through, and uses their outcome to judge whether the downstream service has recovered. If these requests are stable, calls are restored (CLOSED); otherwise the breaker returns to the open state. The figure below shows the principle.
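As a rough illustration of these state transitions, the following Python sketch models a breaker with the three states named above. It is simplified on purpose: the decision to open is left to the caller, and only a single probe request is let through in HALF-OPEN, whereas the text speaks of a certain number of requests. The names CircuitBreaker, trip, allow, and on_probe_result are hypothetical.

```python
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF-OPEN"


class CircuitBreaker:
    """Three-state breaker; the trip decision itself is made by the caller."""

    def __init__(self, open_seconds: float):
        self.open_seconds = open_seconds  # how long to stay OPEN before probing
        self.state = State.CLOSED
        self.opened_at = 0.0

    def trip(self, now: float) -> None:
        """Open the breaker, e.g. when the slow-call or error ratio is too high."""
        self.state = State.OPEN
        self.opened_at = now

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may pass."""
        if self.state is State.OPEN and now - self.opened_at >= self.open_seconds:
            self.state = State.HALF_OPEN
            return True  # this request is the single probe
        return self.state is State.CLOSED

    def on_probe_result(self, success: bool, now: float) -> None:
        """Close on a healthy probe, otherwise reopen and restart the timer."""
        if self.state is not State.HALF_OPEN:
            return
        if success:
            self.state = State.CLOSED
        else:
            self.trip(now)


breaker = CircuitBreaker(open_seconds=10.0)
breaker.trip(now=0.0)
blocked = breaker.allow(now=5.0)   # still OPEN, so the request falls back
probe = breaker.allow(now=10.0)    # OPEN -> HALF-OPEN, the probe passes
breaker.on_probe_result(success=True, now=10.2)  # HALF-OPEN -> CLOSED
```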

In addition, the cloud-native gateway's traffic protection is built on Sentinel's millisecond-level sliding-window statistics. The traffic protection console of the cloud-native gateway therefore comes with second-level monitoring, which supports a workflow of finding problems through observation and then creating protection rules, so that traffic protection rules on the cloud-native gateway can be created more effectively.
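For readers unfamiliar with sliding-window statistics, the following Python sketch shows the general technique: the window is divided into small buckets, stale buckets are reset lazily, and the total is the sum of the buckets that still fall inside the window. It is not Sentinel's actual code; the class name, the one-second window, and the two 500 ms buckets are assumptions chosen for illustration.

```python
class SlidingWindowCounter:
    """Count events over the last `window_ms` using a ring of time buckets."""

    def __init__(self, window_ms: int = 1000, buckets: int = 2):
        self.window_ms = window_ms
        self.bucket_ms = window_ms // buckets
        self.starts = [-1] * buckets  # start time of each bucket, -1 if unused
        self.counts = [0] * buckets

    def _current_bucket(self, now_ms: int) -> int:
        idx = (now_ms // self.bucket_ms) % len(self.counts)
        start = now_ms - now_ms % self.bucket_ms
        if self.starts[idx] != start:
            # The slot holds data from an older period: reuse it for this one.
            self.starts[idx] = start
            self.counts[idx] = 0
        return idx

    def add(self, now_ms: int, n: int = 1) -> None:
        self.counts[self._current_bucket(now_ms)] += n

    def total(self, now_ms: int) -> int:
        # Only buckets that started less than one window ago are counted.
        return sum(
            count
            for start, count in zip(self.starts, self.counts)
            if start >= 0 and now_ms - start < self.window_ms
        )


counter = SlidingWindowCounter()
counter.add(100)
counter.add(600)
first_total = counter.total(900)    # both events are inside the window
counter.add(1100)                   # reuses the slot that held the event at 100
second_total = counter.total(1100)  # events at 600 and 1100 remain
```

Smaller buckets give a smoother, more accurate view of the last second at the cost of more slots to maintain.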

How to implement traffic protection on the cloud native gateway

Next, we will put the three traffic protection features above into practice on the cloud-native gateway.

QPS rate limiting

First, open the route configuration page of the cloud-native gateway instance and select the "Limit" option under "Policy Configuration". Then manually inject about 10,000 QPS of traffic into this route. The accompanying second-level monitoring shows the QPS of this route over a 5-minute period.

Below the second-level monitoring are three configuration items: flow control rules, concurrency rules, and circuit breaker rules. First, configure a flow control rule. The specific parameters are shown in the figure below:

Turn on the switch and click Save to add the QPS flow control policy. The policy means that once the total QPS of the route's traffic reaches 5000, further requests arriving within the statistics window are rejected with an HTTP response whose status code is 429 and whose body is the following JSON text:

{
  "context": "just for test"
}
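For illustration only, the rejection described above could be represented as in the Python sketch below. The status code and JSON body come from the example rule; the Content-Type header and the function name build_rejection are assumptions, since the article does not show the response headers.

```python
import json

REJECT_STATUS = 429                          # status code configured in the rule
REJECT_BODY = {"context": "just for test"}   # body configured in the rule


def build_rejection():
    """Return (status, headers, body) for a rate-limited request."""
    headers = {"Content-Type": "application/json"}  # assumed header
    return REJECT_STATUS, headers, json.dumps(REJECT_BODY)


status, headers, body = build_rejection()
```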

Now go back to the second-level monitoring, where you can see the following graph:

Concurrency control

Concurrency rules are configured in a similar way, except that the controlled value changes from QPS to the number of concurrent requests. Reference configuration parameters are as follows:

 

Go back and view the second-level monitoring, and you can see the following results:

 

Circuit breaking

Circuit breaker rules are relatively complex to configure. For the meaning of each parameter, see the descriptions on the configuration page. A reference configuration is as follows:

 

This rule means that within a 20-second statistical window, the proportion of slow calls is evaluated after the fifth request. As soon as the proportion exceeds 20%, the traffic of this route is cut off immediately. A slow call is a request whose RT exceeds 1 ms. After the configuration is complete, the monitoring looks as shown in the following figure:

 

The example above only demonstrates the effect. In a real production environment, parameters such as the slow call ratio and the circuit breaking duration must be defined with more care; otherwise the entire backend service may become unavailable. Circuit breaking is a high-risk traffic protection feature. Besides the slow call ratio, the circuit breaker condition can also be based on the error ratio, where an error is an HTTP call that returns a 5XX status.
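The slow-call condition of the demo rule can be written as a small check, sketched here in Python. The function name and parameter names are hypothetical, and it assumes that "after the fifth request" means at least five requests must have been seen in the current window before the ratio is evaluated.

```python
def should_trip(rts_ms, min_requests=5, slow_rt_ms=1.0, max_slow_ratio=0.2):
    """Decide whether to open the breaker.

    rts_ms holds the response times (ms) of the requests seen in the
    current statistical window (20 seconds in the demo rule).
    """
    if len(rts_ms) < min_requests:
        return False  # too few samples to judge (assumed interpretation)
    slow = sum(1 for rt in rts_ms if rt > slow_rt_ms)
    return slow / len(rts_ms) > max_slow_ratio


too_few = should_trip([0.5, 3.0, 4.0, 5.0])        # only 4 requests so far
at_limit = should_trip([0.5, 0.6, 3.0, 0.4, 0.7])  # exactly 20% slow
over = should_trip([0.5, 2.0, 3.0, 0.4, 0.7])      # 40% slow
```

In this sketch only the last case opens the breaker, because the ratio must exceed the threshold rather than merely reach it. An error-ratio rule would follow the same shape, counting 5XX responses instead of slow calls.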

Summary

This article describes in detail how to implement traffic protection on the cloud-native gateway, including which protection rules to configure in different scenarios, and walks through the steps for using them. Along the way you can see the advantages of the cloud-native gateway's traffic protection over the rate limiting of other gateway products. Traffic protection is one of the core features of a cloud gateway, and we will continue to strengthen it. You are welcome to follow updates on the MSE microservice engine product on the Alibaba Cloud official website.

Author: graffiti

This article is the original content of Alibaba Cloud, and shall not be reproduced without permission

Origin my.oschina.net/yunqi/blog/10101078