2024 Java "Eight-Part Essay" - 1 - Rate Limiting

sidebar: heading
title: Introduction to Rate-Limiting Algorithms
category: Practical Experience
tag:
  - Concurrency
head:
  - - meta
    - name: keywords
      content: rate-limiting algorithm, token bucket algorithm, leaky bucket algorithm, time window algorithm, queue method
  - - meta
    - name: description
      content: A summary of common Java interview questions, so that no "eight-part essay" is hard to memorize!

Rate-Limiting Algorithms

In most cases we do not need to implement a rate-limiting system ourselves. In real applications, however, rate limiting is a subtle protection mechanism with many details, especially under heavy traffic. Understanding the algorithm behind the rate limiter you use will help you exploit it fully for your business needs and avoid some of the problems that can arise from using it blindly.

Token Bucket Algorithm

The token bucket algorithm uses a container (the "bucket") into which a separate component continuously adds tokens. A token can be a simple number, character, or combination, or even just a count. Each request entering the system must first obtain a token from the bucket; only requests holding a token may reach the back-end system. When the bucket is empty, requests are rejected; when the bucket is full, newly added tokens are discarded. The architecture of the token bucket algorithm is shown in Figure 1:

[Figure 1: Token bucket algorithm architecture]

The implementation logic of the token bucket algorithm is as follows:

First, a threshold is defined for the number of requests within a time window, such as 1,000 requests per day or 5 requests per second. The minimum granularity of a rate-limiting system is generally one second; anything finer tends to become inaccurate or unstable for implementation and performance reasons. Assuming N requests are allowed within T seconds, the token-adding component adds N tokens to the bucket every T seconds.

Second, the token bucket has a maximum capacity M. When the token-adding component finds that the bucket already holds M tokens, any surplus tokens are discarded. In the rate-limiting system, M can be thought of as the maximum instantaneous traffic the system allows, but not the maximum sustained traffic. For example, suppose the bucket holds at most 100 tokens and 10 new tokens are added every second. With a full bucket, a sudden burst of 100 TPS can be absorbed; but 100 TPS sustained for two consecutive seconds cannot, because tokens are added at only 10 per second and the refill rate cannot keep up with the consumption rate.

Therefore, for any rate-limiting system based on the token bucket algorithm, we will notice that its configuration requires two parameters:

  • Average threshold (rate or average)

  • Peak threshold (burst or peak)

Readers should note that the peak threshold of the token bucket algorithm has a specific meaning: it does not denote the maximum traffic the rate-limiting system allows. That wording might suggest that any traffic below the threshold is safe, which is not the case; whenever traffic stays above the refill rate for a period of time, problems will occur.

On the other hand, it is not easy to calculate the maximum traffic a token bucket rate limiter can support, because the traffic it can absorb at any moment depends on how traffic has varied over the whole preceding period, that is, on the accumulated token stock, not merely on an instantaneous token count.

Finally, when a request asks for a token, the bucket picks one (which one does not matter), hands it out, and removes it from the bucket. Note that the bucket performs no other action at this point; in particular, it never actively asks the token-adding component for new tokens.
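To make the two parameters concrete, here is a minimal, single-threaded Java sketch of the logic described above; the class and member names are illustrative and not taken from any particular library:

```java
// A minimal token bucket sketch: tokens are added at a fixed rate and the
// bucket never holds more than `burst` tokens. Not thread-safe; for
// illustration only.
public class TokenBucket {
    private final double ratePerSecond; // average threshold: tokens added per second
    private final double burst;         // peak threshold: maximum tokens the bucket holds
    private double tokens;              // current token stock
    private long lastRefillNanos;

    public TokenBucket(double ratePerSecond, double burst) {
        this.ratePerSecond = ratePerSecond;
        this.burst = burst;
        this.tokens = burst;            // start full, allowing an initial burst
        this.lastRefillNanos = System.nanoTime();
    }

    public boolean tryAcquire() {
        refill();
        if (tokens >= 1.0) {
            tokens -= 1.0;              // hand out one token and remove it from the bucket
            return true;
        }
        return false;                   // bucket empty: reject the request
    }

    private void refill() {
        long now = System.nanoTime();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        // add tokens at the fixed rate, discarding any overflow beyond `burst`
        tokens = Math.min(burst, tokens + elapsedSeconds * ratePerSecond);
        lastRefillNanos = now;
    }
}
```

With ratePerSecond = 10 and burst = 100, this bucket absorbs a one-off spike of 100 requests but sustains only 10 per second afterwards, matching the example above.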

The token bucket algorithm has a variant built on the same idea but working in the opposite direction: the leaky bucket algorithm. It is a refinement of the token bucket and is widely used in commercial applications.

The basic idea of the leaky bucket algorithm is to treat requests as water poured into a bucket with a hole in the bottom, from which water leaks at a constant rate. Every incoming request first enters the bucket and then flows out slowly through the hole to the back-end service. The bucket has a fixed size; when the inflow exceeds its capacity, the excess requests are discarded.

The architecture of the leaky bucket algorithm is shown in the figure:

[Figure 2: Leaky bucket algorithm architecture]

The implementation logic of the leaky bucket algorithm is as follows:

  • First, there is a container that stores requests. The container has a fixed size M, and all incoming requests are placed into it first.

  • The container runs a forwarding loop at a fixed rate, sending N requests to the back end every T seconds.

  • When the number of requests in the container reaches M, all new requests are rejected.

Similarly, configuring a leaky bucket requires the same two values: average (rate) and peak (burst). Here, however, the average describes the rate at which requests leak out, and the peak describes how many requests the bucket can hold.
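As a rough illustration, and under the assumption that a request can be modeled as a Runnable, here is a minimal leaky bucket sketch in Java; the names are illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A minimal leaky bucket sketch: requests queue up in a bounded bucket and a
// single drain thread forwards them at a constant rate.
public class LeakyBucket {
    private final BlockingQueue<Runnable> bucket;

    public LeakyBucket(int capacity, long drainIntervalMillis) {
        this.bucket = new ArrayBlockingQueue<>(capacity); // peak: at most M requests stored
        Thread drain = new Thread(() -> {
            try {
                while (true) {
                    Runnable request = bucket.take();  // wait for a queued request
                    request.run();                     // hand it to the back-end service
                    Thread.sleep(drainIntervalMillis); // constant leak rate
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        drain.setDaemon(true);
        drain.start();
    }

    // offer() returns false when the bucket is full, i.e. the request is rejected
    public boolean submit(Runnable request) {
        return bucket.offer(request);
    }
}
```

With capacity 100 and a drain interval of 100 ms, this bucket tolerates a burst of 100 queued requests while the back end only ever sees about 10 requests per second.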

Note: the leaky bucket algorithm and the idea of buffering are not the same thing!

Although both place requests in a container, the leaky bucket algorithm and buffering serve different purposes and must not be confused. Their differences are as follows:

  • In the leaky bucket algorithm, requests stored in the bucket are leaked to the back-end business server at a constant rate; with buffering, requests placed in the buffer are not sent on until the back-end server is idle.

  • In the leaky bucket algorithm, the requests in the bucket are expected to be processed; they are a commitment the system has declared to the outside world and should not be lost. With buffering, the requests placed in the buffer are merely an optimization for unexpected load, and there is no hard guarantee that they will be processed.

The leaky bucket algorithm and the token bucket algorithm are very close in spirit, but their implementations work in opposite directions. They have the following similarities and differences:

  • The token bucket algorithm replenishes, at a fixed rate, the number of requests (tokens) that may be forwarded, while the leaky bucket algorithm forwards the requests themselves at a fixed rate;

  • The token bucket algorithm limits traffic by a budget (the tokens available), while the leaky bucket algorithm limits it by the actual number of requests held;

  • Both algorithms can absorb a burst of traffic to some extent. However, during a burst the token bucket algorithm passes the spike straight through to the business server, whereas with the leaky bucket algorithm the business server sees only the same constant rate.

Therefore, from the comparison above, the leaky bucket algorithm is slightly better than the token bucket algorithm in this respect: it smooths traffic, whereas the token bucket algorithm faithfully passes traffic fluctuations on to the business server.

The leaky bucket algorithm is widely used, for example in Nginx and in distributed rate limiters such as the rate-limiting functions built on Redis. It is currently one of the most popular algorithms in the industry.

Time Window Algorithm

The time window algorithm is a relatively simple and basic rate-limiting algorithm. Because it is comparatively coarse, it is not suitable for large websites with big traffic fluctuations or for scenarios requiring fine-grained traffic control.

Time window algorithms can be divided into two types based on the way to determine the time window:

  • Fixed time window algorithm

  • Sliding time window algorithm

The fixed time window algorithm is the simplest. A reader new to rate limiting who was asked to quickly design and implement a rate limiter would most likely arrive at this algorithm first. It caps the number of requests within a fixed period: while the threshold has not been reached, requests pass; once it is reached, requests are rejected. The steps are as follows (a minimal sketch appears after the list):

  • First, determine a starting time point, usually the time when the system starts.

  • Starting from that point, set a maximum value M according to your needs, begin accepting requests, and count them from 0.

  • Within the time period T, once the request count reaches M, all further requests are rejected.

  • When the time period T has elapsed, the count is reset.
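Here is a minimal, single-threaded Java sketch of these steps; all names are illustrative:

```java
// A minimal fixed time window sketch: at most `maxRequests` requests are
// accepted per window of `windowMillis`; the counter resets when a new
// window begins. Not thread-safe; for illustration only.
public class FixedWindowLimiter {
    private final long windowMillis;   // the period T
    private final int maxRequests;     // the maximum value M
    private long windowStart;          // starting time point of the current window
    private int count;                 // requests counted so far in this window

    public FixedWindowLimiter(long windowMillis, int maxRequests) {
        this.windowMillis = windowMillis;
        this.maxRequests = maxRequests;
        this.windowStart = System.currentTimeMillis();
    }

    public boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMillis) {
            windowStart = now; // the period T has elapsed: reset the count
            count = 0;
        }
        if (count < maxRequests) {
            count++;
            return true;
        }
        return false; // threshold reached: reject all remaining requests
    }
}
```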

The idea of the fixed time window algorithm is simple, but it has a logical flaw, which makes it unsuitable for services with large traffic fluctuations or fine-grained traffic control requirements. Consider the following example:

Assume that our time period T is 1 second and the maximum request count is 10. In the first second, the requests are distributed as follows: 1 request at the 500th millisecond and 9 requests at the 800th millisecond, as shown in Figure 3:

[Figure 3: Request distribution within the first second]

This is a reasonable distribution of requests for the first second.

Then, within the first 200 milliseconds of the second second (that is, at the 1200th millisecond overall), another 10 requests arrive, as shown in Figure 4:

[Figure 4: Request distribution within the second second]

Viewed alone, the second second is also reasonable, but when the two periods are viewed together, the problem appears, as shown in Figure 5:

[Figure 5: The two seconds viewed together]

From the 500th to the 1200th millisecond, the back-end server received 20 requests within just 700 milliseconds, which clearly violates our intention of allowing at most 10 requests per second. Traffic far above the expected level reaching the back-end server can have unpredictable consequences. The fixed window algorithm was therefore improved into a time window algorithm that ensures no period of length T, wherever it starts, exceeds the request threshold: the sliding time window algorithm.

The sliding time window algorithm requires that when a request enters the system, we look back over the past period T, count the requests within it, and then decide whether to accept the current one. The algorithm therefore needs to record the arrival time of each request within the period T. The logic is shown in Figure 6 (a minimal sketch appears after the steps below):

[Figure 6: Sliding time window logic]

The explanation is as follows:

1. Determine a starting time point, usually the time when the system starts, and record it as the start of the time window. Then create an empty list to record the timestamps of the requests that arrive within the window.

2. When a request arrives, compare its timestamp against the window start plus the period T (the window spans from the start point to start + T):

  • If it falls within the window, count the requests already recorded in the current window:

    • If the threshold is exceeded, reject the request.

    • If not, add the request's timestamp to the record and hand the request over to the back-end business server.

  • If it falls outside the window, delete the oldest timestamp from the record, advance the window start to the next-oldest recorded time, then return to step 2 and check again whether the timestamp now falls within the window.
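A minimal Java sketch of this logic follows. It uses a common equivalent formulation, keeping a deque of timestamps and evicting entries older than T instead of explicitly moving the window start; the effect is the same, in that any look-back period of length T holds at most M records. Names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A minimal sliding time window sketch: timestamps of accepted requests are
// kept in a deque, entries older than T are evicted, and a new request is
// accepted only if fewer than M remain.
public class SlidingWindowLimiter {
    private final long windowMillis; // the look-back period T
    private final int maxRequests;   // the threshold M
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLimiter(long windowMillis, int maxRequests) {
        this.windowMillis = windowMillis;
        this.maxRequests = maxRequests;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // slide the window: drop records that fall outside [now - T, now]
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < maxRequests) {
            timestamps.addLast(now); // record this request and let it through
            return true;
        }
        return false; // the past period T already holds M requests: reject
    }
}
```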

Although the sliding time window is an improvement, it is still poor at handling a large burst of requests within a short period. The token bucket and leaky bucket algorithms, which let you specify both an average rate and a maximum instantaneous rate, offer more precise control than the time window algorithms.

The time window algorithm can be improved by running multiple windows simultaneously. For example, you can combine a 1-second window limited to 10 requests with a 500-millisecond window limited to 5 requests; running both at once gives more precise control, as sketched below.
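Building on the SlidingWindowLimiter sketched above, a hypothetical composition could look like this:

```java
// Hypothetical composition of the SlidingWindowLimiter above: a request is
// admitted only if every configured window still has room.
public class MultiWindowLimiter {
    private final SlidingWindowLimiter[] limiters = {
        new SlidingWindowLimiter(1000, 10), // 10 requests per 1 second
        new SlidingWindowLimiter(500, 5)    //  5 requests per 500 ms
    };

    public boolean tryAcquire() {
        for (SlidingWindowLimiter limiter : limiters) {
            if (!limiter.tryAcquire()) {
                return false; // any window at its threshold rejects the request
            }
        }
        return true;
    }
}
```

Note that in this naive form a request rejected by the second window still consumes a slot in the first; a more careful implementation would check all windows before recording the request in any of them.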

Queue Method

The queue method is very similar to the leaky bucket algorithm: both place requests in a holding area from which the business server extracts them. The difference is that the queue method uses a completely independent external system rather than a component of the rate-limiting system itself. The architecture of the queue method is shown in Figure 7:

[Figure 7: Queue method architecture]

Compared with the leaky bucket algorithm, the advantages of the queue method are as follows:

  • The business logic layer decides how fast requests are consumed. The rate-limiting system, i.e. the queue, no longer needs traffic settings (what T, N, or M should be, etc.); it only needs to retain the submitted requests, while the business server fully controls message pulling and decides its own consumption speed according to its own condition, which is far more flexible.

  • The business logic layer is completely shielded, and more consumers can be added to drain the requests. This approach hides the business server entirely behind the queue, which absorbs all the traffic and thus also offers better protection against malicious traffic attacks.

  • The queue can be a more robust, mature service which, although more complex than a rate limiter, withstands far greater traffic. For example, if the business server uses a hosted message queue such as those offered by Alibaba Cloud or AWS, it does not need to worry about scaling the queue side, as long as the requests have no strict real-time requirements; and because the pull frequency and processing speed are freely chosen, the pressure to scale the business server itself is also modest.

However, the biggest limitation of the queue method is that the server cannot respond to the client directly, so it only suits use cases where the client asks the business server to perform a task without needing a reply. Any service that must return a substantive response to the client cannot use it. For example, if the business server provides a message-sending service, this model works well; but if the client is requesting, say, user information, this method is entirely unfeasible.
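To illustrate the division of responsibilities only, here is a minimal in-process stand-in written with a JDK BlockingQueue; a real deployment would use an external message queue service, and all names here are illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A minimal in-process stand-in for the queue method: the queue only holds
// fire-and-forget tasks, while the consumer (the business server) sets the
// pace at which they are processed.
public class QueueMethodDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);

        // Producer side: clients enqueue tasks without waiting for a reply;
        // offer() fails fast when the queue is full.
        for (int i = 0; i < 20; i++) {
            queue.offer("task-" + i);
        }

        // Consumer side: the business server pulls at whatever speed suits
        // it -- here, one task every 200 ms.
        while (!queue.isEmpty()) {
            String task = queue.take();
            System.out.println("processing " + task);
            Thread.sleep(200);
        }
    }
}
```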

This article is excerpted from "In-depth explanation of large-scale website architecture design"


Origin: blog.csdn.net/weixin_45737584/article/details/134301978