Analysis of the Guava RateLimiter rate limiting principle

Rate limiting is one of the three tools for protecting high-concurrency systems; the other two are caching and degradation. Rate limiting is used in many scenarios to limit concurrency and request volume, such as flash sales, or to protect your own system and downstream systems from being overwhelmed by a surge of traffic.

The purpose of rate limiting is to protect the system by limiting the rate of concurrent access/requests, or the number of requests within a time window. Once the limit is reached, the system can deny service or perform traffic shaping.

Commonly used rate limiting methods and scenarios include: limiting the total number of concurrent resources (such as database connection pools and thread pools); limiting instantaneous concurrency (such as nginx's limit_conn module, which caps the number of concurrent connections at any instant, something Java's Semaphore can also do); and limiting the average rate within a time window (such as Guava's RateLimiter and nginx's limit_req module, which limit the average rate per second). Other examples include limiting the call rate of remote interfaces and limiting the MQ consumption rate. Rate limiting can also be driven by the number of network connections, network traffic, CPU or memory load, and so on.

For example, if we need to limit concurrent calls to a method to no more than 100 (that is, 100 calls executing at the same time), we can use a Semaphore. But if we want to limit the average number of calls to a method over a period of time to no more than 100, we need RateLimiter. The sketch below contrasts the two.
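A minimal sketch of the difference, assuming Guava is on the classpath (the class name and the limit of 100 are illustrative): Semaphore caps how many calls run at the same time, while RateLimiter caps how many calls may start per second.

import java.util.concurrent.Semaphore;
import com.google.common.util.concurrent.RateLimiter;

public class LimitExamples {
  // At most 100 calls executing concurrently; the 101st blocks until a permit is released.
  private final Semaphore semaphore = new Semaphore(100);
  // At most 100 calls started per second, on average.
  private final RateLimiter rateLimiter = RateLimiter.create(100.0);

  public void concurrencyLimited() throws InterruptedException {
    semaphore.acquire();
    try {
      // ... do the work ...
    } finally {
      semaphore.release();
    }
  }

  public void rateLimited() {
    rateLimiter.acquire(); // blocks until a token is available
    // ... do the work ...
  }
}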

Basic rate limiting algorithms

Let's first look at two basic algorithms related to rate limiting: the leaky bucket algorithm and the token bucket algorithm.

[Figure: the leaky bucket algorithm, drawn as a funnel]

As the figure above suggests, the bucket works like a funnel: the water flowing in is the incoming traffic, and the water flowing out is the requests our system processes. When the incoming traffic is too heavy, water accumulates in the funnel, and if there is too much of it, it overflows.

The leaky bucket algorithm is often implemented with a queue. When a request arrives and the queue is not full, the request is put into the queue, and a processor takes requests from the head of the queue at a fixed rate. If the request volume is large, the queue fills up and new requests are discarded, as in the sketch below.
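A minimal leaky bucket sketch under these assumptions: a bounded queue holds pending requests, and a single-threaded scheduler drains one request every 200 ms (i.e., 5 per second). The class name, capacity, and rate are illustrative, not taken from any library.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LeakyBucket {
  private final BlockingQueue<Runnable> bucket = new ArrayBlockingQueue<>(100); // bucket capacity
  private final ScheduledExecutorService drain = Executors.newSingleThreadScheduledExecutor();

  public LeakyBucket() {
    // Leak at a fixed rate: one request every 200 ms, no matter how fast requests arrive.
    drain.scheduleAtFixedRate(() -> {
      Runnable task = bucket.poll();
      if (task != null) {
        task.run();
      }
    }, 0, 200, TimeUnit.MILLISECONDS);
  }

  public boolean submit(Runnable task) {
    // offer() fails immediately when the bucket is full: the request "overflows" and is rejected.
    return bucket.offer(task);
  }
}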

[Figure: the token bucket algorithm]

The token bucket algorithm uses a bucket of fixed capacity that stores tokens, with tokens added to the bucket at a fixed rate. The number of tokens in the bucket has an upper limit; once it is exceeded, newly added tokens are discarded. When traffic or a network request arrives, it must first obtain a token: if one is available, the request is processed immediately and one token is removed from the bucket; if not, the request is rate-limited and is either discarded outright or left waiting in a buffer. A minimal sketch follows.
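A minimal token bucket sketch, assuming the refill is computed lazily from elapsed time (the same trick Guava uses internally) instead of by a background thread; the class name and numbers are illustrative.

public class TokenBucket {
  private final double capacity;        // maximum number of tokens the bucket can hold
  private final double refillPerSecond; // tokens added per second
  private double tokens;
  private long lastRefillNanos = System.nanoTime();

  public TokenBucket(double capacity, double refillPerSecond) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
  }

  public synchronized boolean tryAcquire(int permits) {
    refill();
    if (tokens >= permits) {
      tokens -= permits; // enough tokens: consume them and let the request through
      return true;
    }
    return false; // not enough tokens: the caller is rate-limited
  }

  private void refill() {
    long now = System.nanoTime();
    double elapsedSeconds = (now - lastRefillNanos) / 1e9;
    // Add tokens for the elapsed time, but never beyond the bucket's capacity.
    tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
    lastRefillNanos = now;
  }
}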


Comparison between token bucket and leaky bucket:

A token bucket adds tokens to the bucket at a fixed rate, and whether a request is processed depends on whether there are enough tokens in the bucket; when the token count drops to zero, new requests are rejected. A leaky bucket drains at a constant, fixed rate while the incoming request rate is arbitrary; when the number of queued requests reaches the bucket's capacity, new requests are rejected.

The token bucket limits the average inflow rate while allowing bursts: requests are processed as long as tokens remain, and a single request may take 3 or 4 tokens at once. The leaky bucket limits the outflow to a constant rate, a fixed value such as exactly 1 per tick rather than 1 this time and 2 the next, thereby smoothing a bursty inflow rate.

The token bucket allows a certain degree of burst, while the main purpose of the leaky bucket is to smooth the outflow rate.

Guava RateLimiter

Guava is an excellent open source project in the Java ecosystem. It contains core libraries that Google uses in its Java projects, covering collections, caching, concurrency, common annotations, string processing, and many very useful utilities for I/O operations.

Guava's RateLimiter provides two implementations of the token bucket algorithm: smooth burst rate limiting (SmoothBursty) and smooth warm-up rate limiting (SmoothWarmingUp).

[Figure: RateLimiter class diagram]

The class diagram of RateLimiter is shown above. RateLimiter is the entry class; it provides two sets of factory methods that create its two subclasses. This follows the advice in "Effective Java" to prefer static factory methods over constructors; after all, the book's author is also the main maintainer of the Guava library, so the two are best studied together. The public factory methods eventually call the following two internal functions to create the two subclasses of RateLimiter.

static RateLimiter create(SleepingStopwatch stopwatch, double permitsPerSecond) {
  RateLimiter rateLimiter = new SmoothBursty(stopwatch, 1.0 /* maxBurstSeconds */);
  rateLimiter.setRate(permitsPerSecond);
  return rateLimiter;
}

static RateLimiter create(
    SleepingStopwatch stopwatch, double permitsPerSecond, long warmupPeriod, TimeUnit unit,
    double coldFactor) {
  RateLimiter rateLimiter = new SmoothWarmingUp(stopwatch, warmupPeriod, unit, coldFactor);
  rateLimiter.setRate(permitsPerSecond);
  return rateLimiter;
}
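From application code you do not call these internal overloads directly; you go through the public factories, which delegate to them:

// Creates a SmoothBursty limiter that issues 5 permits per second.
RateLimiter bursty = RateLimiter.create(5.0);

// Creates a SmoothWarmingUp limiter: 2 permits per second, reached after a 3-second warm-up.
RateLimiter warmingUp = RateLimiter.create(2.0, 3, TimeUnit.SECONDS);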

Smooth burst rate limiting

Use RateLimiter's static method to create a limiter and set the number of tokens issued per second to 5. The returned RateLimiter object guarantees that no more than 5 tokens are handed out within 1 second, issuing them at a fixed rate to achieve smooth output.

public void testSmoothBursty() {

  RateLimiter r = RateLimiter.create(5);
  while (true) {
    System.out.println("get 1 tokens: " + r.acquire() + "s");
  }
  /**
   * output: roughly one call every 0.2 seconds, consistent with issuing 5 tokens per second.
   * get 1 tokens: 0.0s
   * get 1 tokens: 0.182014s
   * get 1 tokens: 0.188464s
   * get 1 tokens: 0.198072s
   * get 1 tokens: 0.196048s
   * get 1 tokens: 0.197538s
   * get 1 tokens: 0.196049s
   */
}

RateLimiter accumulates tokens according to the token bucket algorithm. If tokens are requested at a relatively low frequency, they are obtained immediately without any waiting.

public void testSmoothBursty2() {

  RateLimiter r = RateLimiter.create(2);
  while (true) {
    System.out.println("get 1 tokens: " + r.acquire(1) + "s");
    try {
      Thread.sleep(2000);
    } catch (Exception e) {}
    System.out.println("get 1 tokens: " + r.acquire(1) + "s");
    System.out.println("get 1 tokens: " + r.acquire(1) + "s");
    System.out.println("get 1 tokens: " + r.acquire(1) + "s");
    System.out.println("end");
    /**
     * output:
     * get 1 tokens: 0.0s
     * get 1 tokens: 0.0s
     * get 1 tokens: 0.0s
     * get 1 tokens: 0.0s
     * end
     * get 1 tokens: 0.499796s
     * get 1 tokens: 0.0s
     * get 1 tokens: 0.0s
     * get 1 tokens: 0.0s
     */
  }
}

Because it accumulates tokens, RateLimiter can absorb sudden bursts of traffic. In the code below, one request asks for 5 tokens at once, and since tokens have already accumulated in the bucket, the request is answered quickly.

When not enough tokens have been issued yet, RateLimiter applies lagged processing: the wait required for one request's tokens is borne by the next request, which waits on behalf of the previous one.

public void testSmoothBursty3() {

  RateLimiter r = RateLimiter.create(5);
  while (true) {
    System.out.println("get 5 tokens: " + r.acquire(5) + "s");
    System.out.println("get 1 tokens: " + r.acquire(1) + "s");
    System.out.println("get 1 tokens: " + r.acquire(1) + "s");
    System.out.println("get 1 tokens: " + r.acquire(1) + "s");
    System.out.println("end");
    /**
     * output:
     * get 5 tokens: 0.0s
     * get 1 tokens: 0.996766s  lag effect: waits for the previous request
     * get 1 tokens: 0.194007s
     * get 1 tokens: 0.196267s
     * end
     * get 5 tokens: 0.195756s
     * get 1 tokens: 0.995625s  lag effect: waits for the previous request
     * get 1 tokens: 0.194603s
     * get 1 tokens: 0.196866s
     */
  }
}
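Since acquire() blocks, callers that would rather reject a request than wait can use RateLimiter's non-blocking tryAcquire() overloads instead; a small usage sketch:

RateLimiter r = RateLimiter.create(5);

// Returns immediately: true if a permit was available, false otherwise.
if (r.tryAcquire()) {
  // handle the request
}

// Waits at most 100 ms for a permit before giving up.
if (r.tryAcquire(1, 100, TimeUnit.MILLISECONDS)) {
  // handle the request
}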

Smooth warm-up rate limiting

RateLimiter's SmoothWarmingUp is a smooth rate limiter with a warm-up period. After it starts, the token issuing frequency rises gradually during the warm-up period until it reaches the configured rate.

For example, the code below creates a limiter with an average token rate of 2 per second and a warm-up period of 3 seconds. Because the warm-up time is set to 3 seconds, the bucket does not start out issuing a token every 0.5 seconds; instead, the interval decreases along a smooth, linearly descending slope, reaching the configured rate within 3 seconds and holding that fixed rate from then on. This is useful when a system has just started and needs some time to "warm up": for example, right after startup some hotspot data may not yet be loaded into redis, so the system cannot handle full traffic yet.

public void testSmoothwarmingUp() {
  RateLimiter r = RateLimiter.create(2, 3, TimeUnit.SECONDS);
  while (true) {
    System.out.println("get 1 tokens: " + r.acquire(1) + "s");
    System.out.println("get 1 tokens: " + r.acquire(1) + "s");
    System.out.println("get 1 tokens: " + r.acquire(1) + "s");
    System.out.println("get 1 tokens: " + r.acquire(1) + "s");
    System.out.println("end");
    /**
     * output:
     * get 1 tokens: 0.0s
     * get 1 tokens: 1.329289s
     * get 1 tokens: 0.994375s
     * get 1 tokens: 0.662888s  the three waits above add up to roughly 3 seconds
     * end
     * get 1 tokens: 0.49764s   normal rate: one token every 0.5 seconds
     * get 1 tokens: 0.497828s
     * get 1 tokens: 0.49449s
     * get 1 tokens: 0.497522s
     */
  }
}

Source code analysis

Having seen the basic usage of RateLimiter, let's look at how it is implemented, starting with the meaning of several important member variables in SmoothRateLimiter.java.

[Figure: the key member variables of SmoothRateLimiter and SmoothWarmingUp, with explanatory comments]
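The original screenshots are not reproduced here; for reference, the key fields look roughly like this in the Guava source (names are from Guava, comments are paraphrased and may vary by version):

// SmoothRateLimiter (abridged)
double storedPermits;              // tokens currently stored and not yet consumed
double maxPermits;                 // maximum number of tokens the bucket may store
double stableIntervalMicros;       // interval between two permits at the stable rate, e.g. 200,000us for 5 permits/s
private long nextFreeTicketMicros; // earliest time at which the next request can be granted

// SmoothWarmingUp adds (abridged)
private final long warmupPeriodMicros; // length of the warm-up period
private double thresholdPermits;       // stored-permits boundary between the warm-up slope and the stable zone
private double slope;                  // slope of the line from the stable interval up to the cold interval
private double coldFactor;             // cold interval = stableIntervalMicros * coldFactor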

The relevant code of SmoothWarmingUp is shown below, with the logic explained in the comments. In SmoothWarmingUp, the waiting time is computed as the area of the trapezoid or rectangle in the figure above. As mentioned earlier, smooth warm-up rate limiting forms a smooth, linearly decreasing slope as the issuing frequency increases: the point on the trapezoid's hypotenuse keeps sliding down, so the trapezoid's area, and therefore the waiting time, keeps getting smaller.

long storedPermitsToWaitTime(double storedPermits, double permitsToTake) {
  /**
   * The portion of the stored permits that lies above the threshold.
   */
  double availablePermitsAboveThreshold = storedPermits - thresholdPermits;
  long micros = 0;
  /**
   * If the number of stored tokens exceeds thresholdPermits...
   */
  if (availablePermitsAboveThreshold > 0.0) {
    /**
     * The number of tokens to the right of the threshold that will be consumed.
     */
    double permitsAboveThresholdToTake = min(availablePermitsAboveThreshold, permitsToTake);
    /**
     * The area of the trapezoid:
     *
     *   (top + bottom) * height / 2
     *
     * The height is permitsAboveThresholdToTake, the number of tokens consumed to the right of the threshold.
     * The bottom (longer side) is permitsToTime(availablePermitsAboveThreshold).
     * The top (shorter side) is permitsToTime(availablePermitsAboveThreshold - permitsAboveThresholdToTake).
     */
    micros = (long) (permitsAboveThresholdToTake
        * (permitsToTime(availablePermitsAboveThreshold)
            + permitsToTime(availablePermitsAboveThreshold - permitsAboveThresholdToTake)) / 2.0);
    /**
     * Subtract the tokens already taken to the right of the threshold.
     */
    permitsToTake -= permitsAboveThresholdToTake;
  }
  /**
   * The area in the stable zone is simply length times width.
   */
  micros += (stableIntervalMicros * permitsToTake);
  return micros;
}

double coolDownIntervalMicros() {
  /**
   * One token is added every warmupPeriodMicros / maxPermits microseconds, so over the
   * whole warm-up period exactly maxPermits tokens are added.
   */
  return warmupPeriodMicros / maxPermits;
}
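coolDownIntervalMicros() comes into play when stored permits are lazily refilled. For context, here is a paraphrased sketch of SmoothRateLimiter's resync logic from the Guava source (details may vary between Guava versions):

void resync(long nowMicros) {
  if (nowMicros > nextFreeTicketMicros) {
    // Tokens accumulated while idle: the elapsed time divided by the cool-down interval,
    // capped at maxPermits.
    double newPermits = (nowMicros - nextFreeTicketMicros) / coolDownIntervalMicros();
    storedPermits = min(maxPermits, storedPermits + newPermits);
    nextFreeTicketMicros = nowMicros;
  }
}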


Origin blog.csdn.net/cjc000/article/details/91890130