Ten Minutes to Understand Java Rate Limiting and Common Solutions

Basic concepts of rate limiting

QPS and connection count control

Transmission rate

Blacklists and whitelists

Distributed environment

Common rate-limiting algorithms

Token bucket algorithm

Leaky bucket algorithm

Sliding window

Common rate-limiting solutions

Nginx rate limiting

Middleware rate limiting

Rate-limiting components

Legitimacy checks

Guava rate limiting

Gateway-layer rate limiting

Rate-limiting design from an architectural perspective

Specific means of implementing rate limiting

Tomcat rate limiting

Basic concepts of rate limiting

For general rate-limiting scenarios, two dimensions of information are involved:

Time: the limit is defined over a time range or at a time point, what we usually call the "time window", for example a per-minute or per-second window.
Resources: the limit is placed on available resources, for example a maximum number of requests or a maximum number of available connections.

Combining these two dimensions, rate limiting means restricting access to resources within a certain time window, for example allowing at most 100 requests per second. In real scenarios, however, we never set just one rule; multiple rules usually work together. The main types of rules are as follows:

QPS and connection count control

For limits on connection count and QPS, we can set a limit per IP, or a limit per server.


In a real environment, multi-dimensional rules are usually combined: for example, limit a single IP to fewer than 10 requests per second and fewer than 5 connections, then cap each machine at a maximum of 1000 QPS and 200 connections. Going further, a server group or an entire data center can be treated as a whole and given higher-level rules. All of these rules work together to control the traffic.

Transmission rate

Everyone is familiar with "transfer rate", such as the download speed of a resource. Some websites apply fairly fine-grained rate limiting here: for example, an ordinary registered user downloads at 100 KB/s, while a paying member downloads at 10 MB/s. Behind this is rate-limiting logic based on user groups or user tags.

Blacklists and whitelists

Blacklists and whitelists are very common means of limiting and admitting traffic in large enterprise applications, and they often change dynamically. For example, if an IP visits too frequently over a period of time and is identified by the system as a bot or as a traffic attack, that IP is added to the blacklist to restrict its access to system resources, commonly known as "blocking the IP".

The crawlers we often see, such as those scraping images from Zhihu or time-sharing stock data from brokerage systems, must implement IP rotation to avoid being blacklisted.

Sometimes we also find that a company's network cannot access large public websites such as 12306. This happens because some companies share a single outgoing IP address; when the visit volume gets too high, that IP is recognized by the other side's system and added to the blacklist. Users of home broadband will know that most network operators assign users to different outgoing IP segments, or dynamically change a user's IP address from time to time.

A whitelist is easier to understand: it is like an imperial gold medal that lets you pass freely through all the rate-limiting rules. For example, some e-commerce companies add the accounts of very large sellers to a whitelist, because such sellers often have their own operations systems and need to integrate with the company's IT systems for bulk operations such as product publishing and restocking.

Distributed environment

The distributed case differs from the single-machine scenario: it treats all servers in the distributed environment as a whole. For example, for IP-based limiting, if we limit one IP to at most 10 visits per second, then no matter which machine in the cluster a request from that IP lands on, it is subject to the same rule.

We should therefore store the rate-limit state in some "centralized" component that can see the access counts of all machines in the cluster. Currently there are two mainstream approaches:

Gateway-layer rate limiting: apply the rate-limiting rules at the entry point of all traffic.
Middleware rate limiting: store the rate-limit state in a middleware in the distributed environment (such as a Redis cache); every node fetches the current traffic statistics from there and decides whether to reject or admit the request.
Sentinel: the Spring Cloud ecosystem's component tailored for microservices, covering distributed rate limiting, circuit breaking, degradation, and so on.
Common rate-limiting algorithms

Token bucket algorithm

The Token Bucket algorithm is currently the most widely used rate-limiting algorithm. As the name suggests, it has two key roles:

Token: a request that obtains a token is processed; other requests are either queued or discarded directly.
Bucket: the place where tokens are held; all requests obtain their tokens from this bucket.

The algorithm mainly involves two processes:

Token generation

This process involves the token generator and the token bucket. As mentioned earlier, the token bucket is the place where tokens are held; since it is a bucket, it must have a capacity, which means the number of tokens the bucket can hold is a fixed value.

The token generator adds tokens to the bucket at a predetermined rate, for example 100 per second or 50 per minute. Note that issuance is uniform: the 50 tokens are not all issued at the start of the time window but are issued at a uniform rate across the window.

Think of the token generator as a faucet: if the bucket below is already full, the water (tokens) simply overflows. The same holds when issuing tokens: the token bucket's capacity is limited, and once it is filled to capacity, newly issued tokens are discarded.

Token acquisition

After each access request arrives, it must obtain a token before executing the subsequent logic. If tokens are scarce and access requests are many, some requests will naturally fail to obtain a token. At this point we can set up a "buffer queue" to temporarily hold these surplus requests.

The buffer queue is optional; not every program that applies the token bucket algorithm implements a queue. When a buffer queue exists, requests that have not yet obtained a token wait in the queue until new tokens are generated, at which point a request is taken from the head of the queue and matched with a token.

When the queue is full, further access requests are discarded. In practice we can also add extra behavior to this queue, such as setting a time-to-live for queued requests, or replacing the queue with a PriorityQueue that orders requests by some priority instead of first-in-first-out.
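To make the two processes above concrete, here is a minimal single-machine token bucket sketch in Java (the class and method names are illustrative, not from any library). It refills the bucket lazily on each acquisition attempt instead of running a separate generator thread, and it omits the optional buffer queue.

```java
// Minimal token bucket: tokens are refilled lazily based on elapsed time.
public class SimpleTokenBucket {
    private final long capacity;        // maximum number of tokens the bucket can hold
    private final double refillPerNano; // tokens added per nanosecond
    private double tokens;              // current number of tokens
    private long lastRefillNanos;

    public SimpleTokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    // Try to take one token; returns false if the request should be queued or rejected.
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Refill at a uniform rate, but never beyond the bucket capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillPerNano);
        lastRefillNanos = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```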

Leaky bucket algorithm

Leaky Bucket is another bucket-based rate-limiting algorithm, often mentioned in the same breath as the token bucket. So what is the difference between a leaky bucket and a token bucket?

The first half of the leaky bucket algorithm is similar to the token bucket, but the object being put into the bucket is different: the token bucket holds tokens, while the leaky bucket holds the data packets of access requests. Likewise, if the bucket is full, newly arriving packets are discarded.

The second half of the leaky bucket algorithm is its distinctive feature: packets always flow out of the bucket at a constant rate. For example, if I set up a leaky bucket that can hold 100 packets and the outflow rate is one packet per second, then no matter how fast packets flow into the bucket, and no matter how many packets are in it, the leaky bucket guarantees that packets are always processed at that constant rate of one per second.
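As a rough sketch (class names are illustrative), a leaky bucket can be modeled in Java as a bounded queue drained by a single scheduler at a fixed interval, so the backend never sees more than one request per interval:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Minimal leaky bucket: incoming requests fill the bucket, a scheduler drains it at a constant rate.
public class SimpleLeakyBucket {
    private final BlockingQueue<Runnable> bucket;
    private final ScheduledExecutorService drainer = Executors.newSingleThreadScheduledExecutor();

    public SimpleLeakyBucket(int capacity, long drainIntervalMillis) {
        this.bucket = new ArrayBlockingQueue<>(capacity);
        // Take one request out of the bucket per interval, no matter how fast requests arrive.
        drainer.scheduleAtFixedRate(() -> {
            Runnable request = bucket.poll();
            if (request != null) {
                request.run();
            }
        }, drainIntervalMillis, drainIntervalMillis, TimeUnit.MILLISECONDS);
    }

    // Returns false (the request is discarded) when the bucket is already full.
    public boolean offer(Runnable request) {
        return bucket.offer(request);
    }
}
```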

Leaky bucket vs. token bucket

From their respective characteristics it is easy to see that both algorithms have a "constant" rate and an "indeterminate" rate. The token bucket creates tokens at a constant rate, while the rate at which requests take tokens is not fixed: as long as tokens remain, requests are served, and when they run out, requests wait. The leaky bucket processes requests at a constant rate, while the rate at which requests flow into the bucket is not fixed.

From these two characteristics, the leaky bucket by its nature does not produce burst traffic: even if 1000 requests arrive per second, the rate at which it hands requests to the backend service stays constant. The token bucket is different: it can "pre-store" a certain number of tokens, so when burst traffic arrives, those tokens can all be consumed in a short time. Its burst-handling efficiency is higher than the leaky bucket's, but the pressure passed on to the backend system increases correspondingly.

Sliding window

For example, suppose there are 5 user visits in the 1st second and 10 user visits in the 5th second; then the number of visits in the 0-5s time window is 15. If our interface caps the window at 20 visits, then when time reaches the 6th second, the count in the window becomes 10, because the 1-second grid holding the first 5 visits has slid out of the window, so up to 20 - 10 = 10 more visits can be accepted in the 6th second.

The sliding window is essentially a counter algorithm. It has a notable property: the longer the window span, the smoother the rate-limiting effect. For example, if the window is only two seconds long and all requests are concentrated in the first second, then when the window slides forward by one second, the count in the window changes drastically. Lengthening the window reduces the chance of this happening.
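A minimal sliding-window counter sketch in Java (names are illustrative) matching the example above: the window is split into one-second buckets, buckets that have slid out are dropped, and a request is admitted only while the window total stays below the limit.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding window counter with 1-second buckets.
public class SlidingWindowLimiter {
    private static class Bucket {
        final long second;
        int count;
        Bucket(long second) { this.second = second; }
    }

    private final int windowSeconds;
    private final int limit;
    private final Deque<Bucket> buckets = new ArrayDeque<>();
    private int total = 0;

    public SlidingWindowLimiter(int windowSeconds, int limit) {
        this.windowSeconds = windowSeconds;
        this.limit = limit;
    }

    public synchronized boolean tryAcquire() {
        long nowSecond = System.currentTimeMillis() / 1000;
        // Drop the 1-second grids that have slid out of the window.
        while (!buckets.isEmpty() && buckets.peekFirst().second <= nowSecond - windowSeconds) {
            total -= buckets.pollFirst().count;
        }
        if (total >= limit) {
            return false; // window is full, reject
        }
        if (buckets.isEmpty() || buckets.peekLast().second != nowSecond) {
            buckets.addLast(new Bucket(nowSecond));
        }
        buckets.peekLast().count++;
        total++;
        return true;
    }
}
```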

Common rate-limiting solutions

Legitimacy checks

For example, CAPTCHAs, IP blacklists, and so on; these measures effectively block malicious attacks and crawler scraping.

Guava rate limiting

In the rate-limiting field, Guava provides several supporting classes, headed by RateLimiter, under its concurrency module, but their scope is limited to the "current" server. In other words, Guava's rate limiting is single-machine rate limiting: it can do nothing across machines or JVM processes. For example, suppose I have two servers [Server 1, Server 2], each running a login service, and I want to keep the combined traffic of the two machines within 20 requests per second. With Guava alone, I can only limit each machine independently to at most 10 per second.

Although Guava is not a solution for distributed rate limiting, as a simple and lightweight client-side rate-limiting component it is very well suited to illustrating rate-limiting algorithms.
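For illustration, a minimal use of Guava's RateLimiter might look like this (assuming the com.google.guava:guava dependency is on the classpath; the service class is made up for the example). It limits a single JVM to about 10 permits per second, matching the per-machine split described above.

```java
import com.google.common.util.concurrent.RateLimiter;

public class LoginService {
    // RateLimiter follows the token bucket idea: 10 permits are issued per second.
    private final RateLimiter rateLimiter = RateLimiter.create(10.0);

    public String login(String user) {
        // tryAcquire() returns immediately: true if a permit was available, false otherwise.
        if (!rateLimiter.tryAcquire()) {
            return "too many requests, please retry later";
        }
        // ... real login logic would go here ...
        return "ok";
    }
}
```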

Gateway-layer rate limiting

The service gateway, as the first checkpoint on the whole distributed call chain, takes on all user access requests, so the gateway layer is a good place to apply rate limiting. The top-to-bottom path is roughly as follows:

User traffic is forwarded from the gateway layer to the backend service.
The backend service takes over the traffic and queries the cache for data.
If the data is not in the cache, the request goes on to the database.

Traffic decreases layer by layer from top to bottom: the gateway layer gathers the largest and densest volume of user requests, followed by the backend services.

Then, after the backend service's validation logic clears out some of the erroneous requests, the remaining requests fall on the cache. If the data is not in the cache, the database at the bottom of the funnel is queried, so the number of requests at the database level is the smallest. (Compared with other components, the database is usually the weakest link in terms of concurrency. Even the heavily modified MySQL used inside Alibaba cannot match components such as Redis or Kafka in single-machine concurrency.)

At present, mainstream gateway layers include Nginx on the software side, and gateway components such as Spring Cloud Gateway and Zuul.

Nginx rate limiting

In a system architecture, Nginx's proxying and routing/forwarding is a very important gateway-layer capability. Thanks to its inherent light weight and excellent design, Nginx has become the first choice for many companies. Viewed from the gateway layer, Nginx can serve as the front-most gateway absorbing most of the network traffic, so using it for rate limiting is also a good choice, and it provides configuration for the common rate-limiting policies.

Nginx provides two rate-limiting methods: one controls the rate, the other controls the number of concurrent connections.

Controlling the rate

We use limit_req_zone to limit the number of requests per unit of time, that is, the rate.

Because Nginx's rate-limit accounting is millisecond-based, setting the rate to 2r/s effectively means a single IP is allowed only 1 request within each 500ms, and the next request is allowed only from 501ms onward.
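A typical configuration looks roughly like this (the zone name, zone size, and protected location are illustrative):

```nginx
# One 10MB shared zone keyed by client IP, allowing 2 requests per second.
limit_req_zone $binary_remote_addr zone=mylimit:10m rate=2r/s;

server {
    location /login/ {
        limit_req zone=mylimit;
        proxy_pass http://login_backend;
    }
}
```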

Optimized version

Although the rate control above is very precise, it is too harsh for a production environment. In reality we want to control the total number of visits per IP per unit of time, rather than being accurate to the millisecond as above. We can use the burst keyword to enable this.

burst=4 means each IP is allowed at most 4 burst requests.
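For example (same illustrative zone as above):

```nginx
limit_req_zone $binary_remote_addr zone=mylimit:10m rate=2r/s;

server {
    location /login/ {
        # burst=4: up to 4 extra requests per IP are queued instead of being rejected outright.
        limit_req zone=mylimit burst=4;
        proxy_pass http://login_backend;
    }
}
```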

Controlling the number of concurrent connections

Use limit_conn_zone and limit_conn to control the number of concurrent connections.

Here, limit_conn perip 10 means a single IP can hold at most 10 connections at the same time; limit_conn perserver 100 means the server can handle at most 100 concurrent connections in total.
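A sketch of the corresponding configuration (the zone sizes are illustrative; the zone names perip and perserver match the description above):

```nginx
# Track connections per client IP and per virtual server.
limit_conn_zone $binary_remote_addr zone=perip:10m;
limit_conn_zone $server_name zone=perserver:10m;

server {
    limit_conn perip 10;      # at most 10 concurrent connections per IP
    limit_conn perserver 100; # at most 100 concurrent connections for the whole server
}
```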

Note: a connection is counted only after its request header has been processed by the backend.
Middleware rate limiting

For a distributed environment, what we need is essentially a central place to store the rate-limit data. For example, if I want to control an interface's access rate to 100 requests per second, I need to store the number of requests received in the current second somewhere that every node in the cluster can reach. So which technology can store this temporary data?

As everyone will have guessed, the answer is Redis. Using Redis's key-expiration feature, we can easily set the time span of the limit (for example, 10 requests per second, or 10 requests per 10 seconds). Redis also has a special skill, script programming: we can write the rate-limiting logic into a Lua script and run it inside Redis, so the rate-limiting burden is moved entirely out of the service layer. At the same time, Redis's strong concurrency performance and highly available cluster architecture can support rate-limit queries from very large clusters. [Redis + Lua]
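As a hedged sketch of the [Redis + Lua] idea (using the Jedis client; the key naming and the script are illustrative, not a standard library API): the Lua script increments a per-window counter and sets its expiry atomically, so every node in the cluster shares the same count.

```java
import redis.clients.jedis.Jedis;

import java.util.Arrays;
import java.util.Collections;

public class RedisRateLimiter {
    // Atomically INCR the window counter, set its expiry on first use,
    // and return 1 if the request is allowed or 0 if the limit is exceeded.
    private static final String SCRIPT =
            "local current = redis.call('incr', KEYS[1]) " +
            "if current == 1 then redis.call('expire', KEYS[1], ARGV[1]) end " +
            "if current > tonumber(ARGV[2]) then return 0 end " +
            "return 1";

    private final Jedis jedis = new Jedis("localhost", 6379);

    public boolean tryAcquire(String api, int windowSeconds, int limit) {
        // One counter key per time window, shared by every node in the cluster.
        String key = "rate:" + api + ":" + (System.currentTimeMillis() / 1000 / windowSeconds);
        Object result = jedis.eval(SCRIPT,
                Collections.singletonList(key),
                Arrays.asList(String.valueOf(windowSeconds), String.valueOf(limit)));
        return Long.valueOf(1L).equals(result);
    }
}
```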

Rate-limiting components

Besides the methods described above, some open-source components provide similar functionality, and Sentinel is a good choice. Sentinel is an open-source component from Alibaba and is included in the Spring Cloud Alibaba component library. It provides a rich rate-limiting API and a visual management console, making it easy for us to manage rate limits.
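A minimal Sentinel usage sketch (the resource name and threshold are made up for the example; in a Spring Cloud Alibaba project, rules are more commonly managed through the dashboard and annotations):

```java
import com.alibaba.csp.sentinel.Entry;
import com.alibaba.csp.sentinel.SphU;
import com.alibaba.csp.sentinel.slots.block.BlockException;
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

import java.util.Collections;

public class SentinelDemo {
    public static void main(String[] args) {
        // Flow rule: allow at most 20 QPS for the resource named "detailPage".
        FlowRule rule = new FlowRule();
        rule.setResource("detailPage");
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        rule.setCount(20);
        FlowRuleManager.loadRules(Collections.singletonList(rule));

        Entry entry = null;
        try {
            // Entering the resource succeeds only while the flow rule is not exceeded.
            entry = SphU.entry("detailPage");
            System.out.println("request served");
        } catch (BlockException e) {
            System.out.println("request rejected by the rate limit");
        } finally {
            if (entry != null) {
                entry.exit();
            }
        }
    }
}
```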

Rate-limiting design from an architectural perspective
In real projects, more than one rate-limiting method is used; several methods often work in concert, so that the overall strategy is layered and resources are used as fully as possible. In this process, the design of the rate-limiting strategy can follow the funnel model mentioned above: loose at the top, tight at the bottom, with the rules for each layer of the funnel designed to keep the components at that layer highly available.

Take a project I actually worked on as an example: we built an API for the product detail page. Traffic from the Mobile Taobao app is first routed through Ali's MTOP gateway, where our rate limiting is relatively loose; once a request reaches the product detail service in the backend through the gateway, a set of middleware and rate-limiting components applies finer-grained control on the service.

Specific means of implementing rate limiting
1) Tomcat: use maxThreads to implement rate limiting.

2) Nginx: limit_req_zone and burst implement rate limiting.

3) Nginx: the limit_conn_zone and limit_conn directives control the total number of concurrent connections.

4) The time-window algorithm can be implemented with a Redis sorted set.

5) The leaky bucket algorithm can be implemented with Redis-Cell.

6) The token bucket algorithm can be implemented with Google's Guava package.

Note that the Redis-based schemes can be used in distributed systems, while Guava-based rate limiting only applies to a single machine. If server-side rate limiting feels troublesome, you can use container-level rate limiting directly (Nginx or Tomcat) without changing any code, provided it meets the business requirements of the project.
Tomcat rate limiting

In Tomcat 8.5, the maximum number of threads is configured in conf/server.xml. maxThreads is Tomcat's maximum thread count; when the request concurrency exceeds this value, requests are queued for execution, which achieves the goal of rate limiting.
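For example, the HTTP Connector in conf/server.xml might look roughly like this (the port and timeout values are illustrative):

```xml
<!-- maxThreads caps the request-processing threads; acceptCount is the queue
     length for connections waiting once all threads are busy. -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="150"
           acceptCount="100"
           redirectPort="8443" />
```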

Notes:

The value of maxThreads can be increased appropriately; the Tomcat 8.5 default is 150. But bigger is not always better: it depends on the specific server configuration. Note that each thread consumes roughly 1 MB of JVM memory for its thread stack, and the more threads there are, the heavier the GC burden.
Finally, note that the operating system limits the number of threads per process: on Windows a process is not allowed more than roughly 2000 threads, and on Linux roughly 1000.


Reprinted from: blog.csdn.net/weixin_40379712/article/details/129829669