[Distributed] - Rate limiting, circuit breaking, and degradation

Rate limiting

Rate limiting, as the name suggests, limits the traffic entering a system. It prevents users, including malicious ones such as crawlers or DDoS attackers, from consuming too many system resources, and it prevents the system from crashing under excessive traffic and damaging its running resources, keeping resource usage under effective control.

In a distributed system, nodes call one another. If a node in the call chain goes down, the entire link becomes unavailable, which can cascade into an avalanche that takes the whole system offline. To preserve high availability, common countermeasures include: timeout-and-return, where a caller whose call times out returns immediately instead of waiting in vain; the bulkhead pattern, which caps the number of threads each business can use, so that when many threads are blocked on one service, subsequent threads simply cannot call that service and therefore never block; circuit breaking and degradation, which trip and degrade the failing service; and rate limiting.
The first three are reactive measures taken after a node failure is detected, whereas rate limiting is more of a preventive measure.

Rate limiting rules

  1. QPS and connection count: limit the queries per second or the number of concurrent connections
  2. Transfer rate: limit the rate at which users can download or access certain resources
  3. Black and white lists: add IPs to black or white lists to implement IP-level traffic control

Rate limiting solutions

In distributed systems, there are two mainstream rate limiting solutions:

  1. Gateway rate limiting: apply rate limiting at the entry point of all traffic, usually the gateway
  2. Middleware rate limiting: deploy the rate limiting service on a separate server; other nodes fetch the limiting state from it and then decide whether to accept or reject traffic

Rate limiting algorithms

Counter-based rate limiting

Counter-based rate limiting is probably the simplest rate limiting algorithm. Suppose the system may process at most 100 requests at the same time: maintain a counter, increment it when a request is accepted, and decrement it when processing completes. Each time a request arrives, first check whether the counter has reached the threshold; if so, reject the request.

Depending on whether the system is a single machine or distributed, this can be subdivided into single-machine and distributed rate limiting. For single-machine rate limiting, a Java atomic integer can serve as the counter; for distributed rate limiting, Redis's INCR command can be used.
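A minimal single-machine sketch of this idea, using AtomicInteger as the counter (class and method names are illustrative, not a standard API):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Counting limiter: allow at most `limit` requests in flight at the same time.
public class CountingLimiter {
    private final AtomicInteger inFlight = new AtomicInteger(0);
    private final int limit;

    public CountingLimiter(int limit) {
        this.limit = limit;
    }

    // Call before handling a request; returns false if the request must be rejected.
    public boolean tryAcquire() {
        while (true) {
            int current = inFlight.get();
            if (current >= limit) {
                return false; // threshold reached, reject
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true; // slot acquired
            }
            // CAS lost a race with another thread; retry
        }
    }

    // Call when request processing finishes.
    public void release() {
        inFlight.decrementAndGet();
    }
}
```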

The drawback of counter-based rate limiting is that it only considers the traffic threshold, not the burstiness of traffic. The pressure of 100 requests spread over one hour is completely different from 100 requests arriving within 1 second; the burst in the latter puts enormous pressure on the system. The amount of traffic within a given time interval therefore also needs limiting, which introduces the time window, starting with the simplest variant: the fixed window.

Fixed window

Compared with plain counter-based limiting, the fixed window algorithm adds the concept of a time window. Within each fixed time window, if the counter is below the request threshold, the request is allowed and the counter is incremented; otherwise the request is rejected. At the end of each time window, the counter is reset and the next window begins.
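A minimal fixed-window sketch (names are illustrative; a real implementation would also worry about clock granularity):

```java
// Fixed-window limiter: at most `limit` requests per `windowMillis` window.
public class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private long windowStart = System.currentTimeMillis();
    private int count = 0;

    public FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            windowStart = now; // enter the next window
            count = 0;         // reset the counter
        }
        if (count >= limit) {
            return false; // quota for this window is exhausted
        }
        count++;
        return true;
    }
}
```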

This algorithm does ensure that the number of requests within each window stays below the threshold, but it cannot enforce the limit across the boundary between two adjacent windows. For example, suppose each window is 1 s long with a threshold of 100 requests per window. If 51 requests arrive in the second half of window i, the counter is reset on entering window i + 1, and another 51 requests arrive in the first half of window i + 1, then the algorithm allows all of them. Yet within the 1 s span formed by the second half of window i and the first half of window i + 1, 102 requests were admitted, violating the intended threshold.

Therefore the time window cannot be fixed; a sliding window is needed.

Sliding window

A sliding window does not record the starting boundary of each window. Instead, when a request arrives, the window length (say 1 s) is subtracted from its timestamp to obtain the window boundary dynamically, and the number of requests within that window determines whether the request is allowed or rejected. The algorithm therefore has to store the arrival timestamp of every request, and at the same time evict timestamps older than one window length. If many requests fall within one window, this incurs a certain memory overhead.
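A minimal sliding-window-log sketch, storing arrival timestamps in a deque exactly as described above (names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window-log limiter: at most `limit` requests in any `windowMillis` span.
public class SlidingWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Evict timestamps that fall outside the window ending at `now`.
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= limit) {
            return false; // too many requests within the last window length
        }
        timestamps.addLast(now);
        return true;
    }
}
```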

The problems with this approach are:

  1. The computer's clock is constrained by hardware, and time calculations may carry errors, so very strict timeliness requirements cannot be met;
  2. It only enforces the limit over one fixed window length and cannot restrain bursts concentrated in a much shorter interval, because the system's time precision may be insufficient to guarantee smooth traffic. Moreover, there are sometimes multiple limits at once, e.g. no more than 100 requests in 1 s and, to withstand high concurrency, no more than 5 requests in 10 ms; a single time window cannot express both. Of course, several windows can be maintained and counted simultaneously to enforce multiple limits.

Leaky bucket algorithm

To solve the traffic smoothness problem caused by bursts, we introduce a leaky bucket. When a request arrives, it is not immediately accepted or rejected; instead it is placed in the bucket first. If the number of requests stored in the bucket has reached the bucket's capacity, subsequent requests are rejected. The bucket then releases requests at a fixed interval to be handled by the backend service.

In this algorithm, no matter how fast requests are generated, the backend service processes them at a fixed rate, which smooths the traffic. This closely resembles a message queue's peak shaving and valley filling, and it is another instance of the classic computing adage that every problem can be solved by adding an intermediate layer.
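A minimal leaky bucket sketch, assuming the "bucket" is a bounded queue drained at a fixed interval by a scheduler (the names and the Runnable-based request model are illustrative assumptions):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Leaky bucket: requests queue in a bounded bucket and drain at a fixed rate.
public class LeakyBucket {
    private final BlockingQueue<Runnable> bucket;

    public LeakyBucket(int capacity, long drainIntervalMillis) {
        this.bucket = new ArrayBlockingQueue<>(capacity);
        // For this sketch, the drainer thread is never shut down.
        ScheduledExecutorService drainer = Executors.newSingleThreadScheduledExecutor();
        // Release one request per interval, regardless of the arrival rate.
        drainer.scheduleAtFixedRate(() -> {
            Runnable request = bucket.poll();
            if (request != null) {
                request.run(); // hand the request to the backend service
            }
        }, drainIntervalMillis, drainIntervalMillis, TimeUnit.MILLISECONDS);
    }

    // Returns false (reject) when the bucket is already full.
    public boolean offer(Runnable request) {
        return bucket.offer(request);
    }
}
```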

But absolute traffic smoothing is not necessarily a good thing. As long as the system keeps running steadily, some burst requests are acceptable, because processing them as soon as possible improves the user experience. This is what the token bucket algorithm is for.

Token bucket algorithm

The token bucket algorithm also uses a bucket, but what goes into it is not requests but tokens. Tokens are added to the bucket at a fixed interval; once the number of tokens exceeds the bucket's capacity, further tokens are discarded. When a request arrives, it must first obtain a token from the bucket: if it succeeds, the request is processed, otherwise it is rejected. The idea is much like a semaphore, which limits how many parties can access a resource simultaneously.

When a burst of requests arrives and the bucket holds enough tokens, the burst can obtain tokens immediately and be processed, unlike the leaky bucket algorithm, which can only ever process at a fixed rate. The token bucket algorithm therefore handles burst traffic better.
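A minimal token bucket sketch, refilling lazily on each acquire rather than with a background timer (names are illustrative; the bucket deliberately starts empty here, which illustrates the cold-start problem discussed next):

```java
// Token bucket: tokens refill at a fixed rate up to `capacity`;
// each request takes one token, so bursts up to `capacity` are allowed.
public class TokenBucket {
    private final long capacity;
    private final double refillPerMillis;
    private double tokens;
    private long lastRefill = System.currentTimeMillis();

    public TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerMillis = tokensPerSecond / 1000.0;
        this.tokens = 0; // starts empty, as described in the text
    }

    public synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false; // no token available, reject
    }

    // Credit tokens for the time elapsed since the last refill, capped at capacity.
    private void refill() {
        long now = System.currentTimeMillis();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerMillis);
        lastRefill = now;
    }
}
```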

One issue with the token bucket algorithm is that when the system first starts, the bucket holds no tokens, so the initial requests cannot obtain tokens and cannot be processed, even though a freshly started system should have ample resources to process them. The solution is to warm up the token bucket by depositing some tokens in advance, ensuring that requests arriving right after startup can be handled in time.

Related components

Hystrix

Hystrix is a fault-tolerance component used within the Spring Cloud framework; it can limit traffic using semaphores and thread pools.

Nginx

Nginx's excellent proxying, routing, and load balancing make it the first choice among gateways, and it also provides rate limiting, so Nginx is a natural option for gateway rate limiting. Nginx offers two limiting mechanisms: limiting the request rate and limiting the number of concurrent connections. The rate limiting uses the leaky bucket algorithm.

Sentinel

Sentinel is Alibaba's open-source, comprehensive service fault-tolerance solution, supporting rate limiting, circuit breaking, degradation, and more.

The rate limiting threshold for a resource can be set in terms of QPS or the number of concurrent threads.

The default limiting mode is direct mode, and the default limiting effect is fast fail. Other limiting modes are supported as well, such as association mode and link mode.
In association mode, a resource can be configured with an associated resource: when the associated resource reaches its limiting threshold, the resource itself is limited. This suits scenarios where one resource should yield to guarantee the availability of another, more important resource. For example, an order service has one interface for reading order information and one for writing it. Under high concurrency both compete for system resources, and we may want to prioritize the write interface. We can enable association mode on the read interface and set its associated resource to the write interface; then as requests to the write interface increase, the read interface is limited, steering system resources toward writes and guaranteeing the write interface's availability.
Link mode limits traffic by the entry point of a specified call link. Different resource interfaces may share an entry point or have different ones; limiting one entry point does not affect requests arriving through other entry points.

Besides fast fail, the available limiting effects include Warm Up and queueing.
Warm Up addresses the situation where a system that has long sat at a low water level suddenly receives a traffic surge; lifting it straight to a high water level may crush it. A cold start with a warm-up phase avoids overwhelming the cold system, and it is based on the token bucket algorithm. Concretely, if a resource uses the Warm Up effect, the effective threshold starts at the configured threshold divided by a cold factor (3 by default), i.e. only one third of the configured value, and is then gradually raised to the full threshold over the warm-up period, completing the cold start.
Queueing strictly limits the rate at which requests pass through the system, letting them through at a uniform speed; it corresponds to the leaky bucket algorithm.
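As a concrete illustration, here is a minimal sketch using Sentinel's core API to define a QPS flow rule with the Warm Up effect and to guard a resource; the resource name "readOrder", the threshold, and the warm-up period are made-up example values:

```java
import java.util.Collections;

import com.alibaba.csp.sentinel.Entry;
import com.alibaba.csp.sentinel.SphU;
import com.alibaba.csp.sentinel.slots.block.BlockException;
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

public class SentinelDemo {
    public static void main(String[] args) {
        // Define a flow rule: at most 100 QPS for the resource "readOrder".
        FlowRule rule = new FlowRule();
        rule.setResource("readOrder");                                  // hypothetical resource name
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);                     // threshold measured in QPS
        rule.setCount(100);
        rule.setControlBehavior(RuleConstant.CONTROL_BEHAVIOR_WARM_UP); // Warm Up effect
        rule.setWarmUpPeriodSec(10);                                    // example warm-up period
        FlowRuleManager.loadRules(Collections.singletonList(rule));

        // Guard business logic with the resource entry.
        Entry entry = null;
        try {
            entry = SphU.entry("readOrder");
            // ... business logic for reading order information ...
        } catch (BlockException e) {
            // The request was rate limited; reject or fall back here.
        } finally {
            if (entry != null) {
                entry.exit();
            }
        }
    }
}
```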

Circuit breaking

In electrical circuits, a fuse protects the circuit: when the current is too large, the fuse blows to prevent damage to devices. Circuit breakers in application systems work the same way. With service circuit breaking, the caller accesses the service through a circuit breaker acting as a proxy. The breaker continuously observes whether the called service's responses are successes or failures, and when the number of failures exceeds the configured threshold, the breaker opens, so requests no longer reach the service. This prevents the caller from blocking on calls to a failed service.

Circuit breaker states

A circuit breaker has three states; a minimal sketch follows the list:

  1. CLOSED: the default state. The breaker observes that the failure rate of requests to the proxied service is below the threshold and considers the service healthy.
  2. OPEN: the breaker observes that the failure rate has reached the threshold, concludes the proxied service is faulty, and opens the switch so that requests no longer reach the service but fail fast instead.
  3. HALF OPEN: after the breaker has been open for a while, access to the proxied service needs to be probed for recovery. The breaker switches to half-open and sends requests to the proxied service to check whether it has recovered. If recovery is confirmed, the breaker transitions to CLOSED; otherwise it returns to OPEN.
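A minimal sketch of such a three-state breaker (the consecutive-failure threshold and timing policy are simplified assumptions; production breakers usually track failure rates over a rolling window):

```java
// Circuit breaker with the three states described above.
public class CircuitBreaker {
    private enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;    // consecutive failures that open the breaker
    private final long openDurationMillis; // how long to stay OPEN before probing
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    public CircuitBreaker(int failureThreshold, long openDurationMillis) {
        this.failureThreshold = failureThreshold;
        this.openDurationMillis = openDurationMillis;
    }

    // Ask before each call: may we forward the request to the downstream service?
    public synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= openDurationMillis) {
                state = State.HALF_OPEN; // allow probe requests through
                return true;
            }
            return false; // fail fast while OPEN
        }
        return true; // CLOSED or HALF_OPEN
    }

    // Report call outcomes so the breaker can change state.
    public synchronized void recordSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    public synchronized void recordFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;
            openedAt = System.currentTimeMillis();
        }
    }
}
```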

Issues to consider

  1. How long should the open duration be, i.e. after how long should the breaker switch to HALF OPEN and retry?
  2. Different exceptions may require different post-trip handling logic.
  3. Log request failures so they can be monitored.
  4. It is not necessary to wait out the full open duration before retrying; proactive probing is an option. For example, for trips caused by conditions that may recover quickly, such as connection timeouts, an asynchronous thread can probe the network (e.g. with telnet) and switch to HALF OPEN as soon as connectivity is restored.
  5. Provide an override interface so that operations staff can close the breaker manually.
  6. Retries can replay previously failed requests, but make sure the business semantics allow this.

Degradation

After a service is tripped by the circuit breaker, subsequent requests are usually routed through a preconfigured handler, and that handler is the degradation logic. More generally, under high concurrency, to keep important core business running normally, non-core, non-critical business is no longer allowed its normal share of resources and is degraded, freeing system resources for the core business.
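A hypothetical sketch combining the CircuitBreaker above with degradation: when the breaker is open or the call fails, the caller falls back to preconfigured degradation logic (all names here are illustrative):

```java
// Caller that degrades gracefully when the downstream order service is broken.
public class OrderClient {
    private final CircuitBreaker breaker = new CircuitBreaker(5, 10_000);

    public String getOrder(String orderId) {
        if (breaker.allowRequest()) {
            try {
                String result = remoteOrderService(orderId);
                breaker.recordSuccess();
                return result;
            } catch (Exception e) {
                breaker.recordFailure();
            }
        }
        // Degradation logic: return a preconfigured fallback
        // instead of blocking on the broken service.
        return "order temporarily unavailable";
    }

    private String remoteOrderService(String orderId) {
        // Placeholder for the real remote call.
        throw new UnsupportedOperationException("remote call not implemented in this sketch");
    }
}
```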


Origin: blog.csdn.net/Pacifica_/article/details/128281081