Rate Limiting: Principles and Practice of the Counter, Leaky Bucket, and Token Bucket Algorithms (the most complete in history)

Rate Limiting

Rate limiting is a common interview topic, especially at large tech companies and in senior (high-P) interviews.



Why rate limit?

Simply put:

Rate limiting is used in many scenarios to cap concurrency and request volume: flash sales, protecting your own system and downstream systems from being overwhelmed by a traffic surge, and so on.

Take Weibo as an example. When a celebrity announces a relationship, traffic may jump from the usual 500,000 visits to 5 million, while the system is provisioned for at most 2 million. Rate limiting rules must then kick in to keep the system usable, so that the server does not crash and leave every request unserved.

Reference maps

System architecture knowledge map

https://www.processon.com/view/link/60fb9421637689719d246739

Architecture of a seckill (flash-sale) system

https://www.processon.com/view/link/61148c2b1e08536191d8f92f

The idea behind rate limiting

Admit as many users as possible while keeping the system available; everyone else waits in a queue or receives a friendly notice. Users already inside the system can then work normally, and the system is protected from an avalanche.

Where in daily life do we see flow limiting?

For example, there is a national scenic spot near me. On ordinary days few people visit, but around May 1st or the Spring Festival it is packed, and the park managers enforce a series of policies to limit the flow of visitors. Why limit the flow?

If the park can hold 10,000 people and 30,000 squeeze in, the crowding ruins everyone's experience, and if an accident happens the park may have to close entirely, becoming unavailable to all. Rate limiting in software follows the same logic.

Rate Limiting Algorithms

There are many rate limiting algorithms; the three most common are the counter, the leaky bucket, and the token bucket, explained one by one below.

Also note the difference between limiting (excess requests are rejected) and rate shaping (all requests are eventually processed, just more slowly); which one you need depends on the business scenario.

(1) Counter:

Within a time window, the number of requests processed is capped at a fixed maximum; the excess is not processed.

(2) Leaky bucket:

The bucket size and the processing (outflow) rate are fixed, but the arrival rate of requests is not; when a burst brings too many requests, the excess is discarded.

(3) Token bucket:

The bucket size and the token-generation rate are fixed, but the rate at which tokens are consumed (i.e., at which requests arrive) is not, so some bursts can be absorbed. Each request takes a token from the bucket; if no token is available, the request is discarded.

Counter Algorithm

Counter rate limiting, defined:

Within a time window, the number of requests processed is capped at a fixed maximum; the excess is not processed.

Simple and crude: a fixed-size thread pool, a fixed-size database connection pool, Nginx's connection limit, and so on all belong to the counter algorithm.
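A minimal sketch of this counter idea (capping concurrency with a fixed count) using a JDK Semaphore; the limit of 10 is an arbitrary example value, not from the original text:

import java.util.concurrent.Semaphore;

public class ConcurrencyCapDemo {
    // at most 10 requests may be in flight at once (example value)
    private static final Semaphore PERMITS = new Semaphore(10);

    public static boolean handleRequest(Runnable work) {
        if (!PERMITS.tryAcquire()) {
            return false;          // counter exhausted: reject immediately
        }
        try {
            work.run();            // do the real work
            return true;
        } finally {
            PERMITS.release();     // free the slot for the next request
        }
    }
}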

The counter algorithm is the simplest of the rate limiting algorithms.

For example, suppose we stipulate that interface A may be accessed at most 100 times per minute.

We can then do the following:

  • At the start, set a counter. Each incoming request increments it by 1. If the counter exceeds 100 while the interval between this request and the first request is still within 1 minute, there are too many requests and access is denied;
  • If the interval between this request and the first request is greater than 1 minute, and the counter is still within the limit, reset the counter. As simple and crude as that.


Counter Rate Limiter Implementation

package com.crazymaker.springcloud.ratelimit;

import lombok.extern.slf4j.Slf4j;
import org.junit.Test;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// counter-based rate limiting
@Slf4j
public class CounterLimiter
{

    // start time of the current window
    private static long startTime = System.currentTimeMillis();
    // window length in ms
    private static long interval = 1000;
    // max requests allowed per window (here: per second)
    private static long maxCount = 2;
    // the counter
    private static AtomicLong accumulator = new AtomicLong();

    // count the request and decide whether the limit is exceeded
    private static long tryAcquire(long taskId, int turn)
    {
        long nowTime = System.currentTimeMillis();
        // still inside the current time window
        if (nowTime < startTime + interval)
        {
            long count = accumulator.incrementAndGet();

            if (count <= maxCount)
            {
                return count;
            } else
            {
                return -count;
            }
        } else
        {
            // outside the time window: start a new one
            synchronized (CounterLimiter.class)
            {
                log.info("new window started, taskId {}, turn {}..", taskId, turn);
                // check again to avoid resetting twice
                if (nowTime > startTime + interval)
                {
                    accumulator.set(0);
                    startTime = nowTime;
                }
            }
            return 0;
        }
    }

    // thread pool for the multithreaded test
    private ExecutorService pool = Executors.newFixedThreadPool(10);

    @Test
    public void testLimit()
    {
        // number of rejected calls
        AtomicInteger limited = new AtomicInteger(0);
        // number of threads
        final int threads = 2;
        // rounds per thread
        final int turns = 20;
        // synchronizer
        CountDownLatch countDownLatch = new CountDownLatch(threads);
        long start = System.currentTimeMillis();
        for (int i = 0; i < threads; i++)
        {
            pool.submit(() ->
            {
                try
                {
                    for (int j = 0; j < turns; j++)
                    {
                        long taskId = Thread.currentThread().getId();
                        long index = tryAcquire(taskId, j);
                        if (index <= 0)
                        {
                            // accumulate the rejected calls
                            limited.getAndIncrement();
                        }
                        Thread.sleep(200);
                    }
                } catch (Exception e)
                {
                    e.printStackTrace();
                }
                // signal that this thread has finished
                countDownLatch.countDown();
            });
        }
        try
        {
            countDownLatch.await();
        } catch (InterruptedException e)
        {
            e.printStackTrace();
        }
        float time = (System.currentTimeMillis() - start) / 1000F;
        // print the statistics
        log.info("rejected: " + limited.get() +
                ", passed: " + (threads * turns - limited.get()));
        log.info("rejected ratio: " + (float) limited.get() / (float) (threads * turns));
        log.info("elapsed seconds: " + time);
    }

}

A Serious Problem with Counter Rate Limiting

Although the algorithm is simple, it has a fatal flaw: the window-boundary (critical) problem. Look at the figure below:

From the figure we can see that if a malicious user sends 100 requests in an instant at 0:59 and another 100 in an instant at 1:00, then within that one second the user has actually sent 200 requests.

What we stipulated was at most 100 requests per minute (the planned throughput), i.e., roughly 1.7 requests per second on average. By bursting right at the reset boundary of the time window, a user can instantly exceed our rate limit.

Users can exploit this loophole in the algorithm to overwhelm our application in an instant.


Leaky Bucket Algorithm

The basic principle of leaky bucket rate limiting is this: water (requests) enters the leaky bucket through the inlet, and the bucket drains (releases requests) at a fixed rate. When water flows in faster than it drains, the total volume in the bucket exceeds its capacity and the excess overflows, i.e., the request is rejected, as shown in the figure.
The general leaky bucket rules are as follows:
(1) Water (client requests) flows into the bucket at an arbitrary rate.
(2) The bucket's capacity is fixed, and its drain (release) rate is fixed.
(3) The capacity never changes; if processing is too slow, the water level exceeds the capacity and later drops overflow, meaning those requests are rejected.

Leaky Bucket Algorithm Principle

The idea of the leaky bucket algorithm is very simple:

Water (requests) first enters the bucket, and the bucket drains at a fixed rate; when water flows in too fast, it simply overflows the bucket.

The leaky bucket algorithm can thus forcibly cap the data transmission rate.


The leaky bucket algorithm can be pictured as pouring in and leaking out: water flows into the bucket at any rate and leaks out at a fixed rate, and whatever exceeds the bucket's capacity is discarded. Because the capacity is constant, the overall rate is guaranteed.

Peak shaving: when a flood of traffic arrives, the excess overflows, so the limiter keeps the service available.

Buffering: requests do not hit the server directly, which relieves the pressure on it.

The drain (consumption) rate is fixed because the computing capacity behind it is fixed.

Leaky Bucket Algorithm Implementation

package com.crazymaker.springcloud.ratelimit;

import lombok.extern.slf4j.Slf4j;
import org.junit.Test;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// leaky bucket rate limiting
@Slf4j
public class LeakBucketLimiter {

    // time of the last leak calculation
    private static long lastOutTime = System.currentTimeMillis();
    // leak (outflow) rate: 2 per second
    private static int leakRate = 2;

    // bucket capacity
    private static int capacity = 2;

    // current amount of water in the bucket
    private static AtomicInteger water = new AtomicInteger(0);

    // return value:
    // false - not limited
    // true  - limited
    public static synchronized boolean isLimit(long taskId, int turn) {
        // if the bucket is empty, take the current time as the leak start time
        if (water.get() == 0) {
            lastOutTime = System.currentTimeMillis();
            water.addAndGet(1);
            return false;
        }
        // leak water according to the elapsed time
        int waterLeaked = ((int) ((System.currentTimeMillis() - lastOutTime) / 1000)) * leakRate;
        // compute the remaining water
        int waterLeft = water.get() - waterLeaked;
        water.set(Math.max(0, waterLeft));
        // update the leak timestamp
        lastOutTime = System.currentTimeMillis();
        // try to add water; if the bucket is not yet full, let the request pass
        if ((water.get()) < capacity) {
            water.addAndGet(1);
            return false;
        } else {
            // bucket full: refuse to add water, i.e., limit the request
            return true;
        }

    }

    // thread pool for the multithreaded test
    private ExecutorService pool = Executors.newFixedThreadPool(10);

    @Test
    public void testLimit() {
        // number of rejected calls
        AtomicInteger limited = new AtomicInteger(0);
        // number of threads
        final int threads = 2;
        // rounds per thread
        final int turns = 20;
        // synchronizer
        CountDownLatch countDownLatch = new CountDownLatch(threads);
        long start = System.currentTimeMillis();
        for (int i = 0; i < threads; i++) {
            pool.submit(() ->
            {
                try {
                    for (int j = 0; j < turns; j++) {
                        long taskId = Thread.currentThread().getId();
                        boolean intercepted = isLimit(taskId, j);
                        if (intercepted) {
                            // accumulate the rejected calls
                            limited.getAndIncrement();
                        }
                        Thread.sleep(200);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
                // signal that this thread has finished
                countDownLatch.countDown();
            });
        }
        try {
            countDownLatch.await();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        float time = (System.currentTimeMillis() - start) / 1000F;
        // print the statistics
        log.info("rejected: " + limited.get() +
                ", passed: " + (threads * turns - limited.get()));
        log.info("rejected ratio: " + (float) limited.get() / (float) (threads * turns));
        log.info("elapsed seconds: " + time);
    }
}

Problems with the Leaky Bucket

The bucket's outflow rate is fixed, that is, requests are released at a fixed rate.

As the saying copied around the Internet goes:

The leaky bucket cannot effectively absorb burst traffic, but it can smooth bursts out (traffic shaping).

The practical problem:

Because the outflow rate is fixed, the leaky bucket cannot flexibly exploit improvements in backend capacity. For example, if dynamic scale-out raises backend capacity from 1,000 QPS to 10,000 QPS, the leaky bucket has no way to take advantage of it.

Token Bucket Rate Limiting

The token bucket algorithm generates tokens at a configured rate and puts them into a token bucket. Each request must take a token; if none is available, the request is rejected.
When a new request arrives, it takes one token from the bucket; if the bucket is empty, service is refused. The number of tokens is capped, and it is strongly tied to time and the issuing rate: the more time passes, the more tokens are added to the bucket. If tokens are issued faster than they are consumed, the bucket fills up until tokens occupy its whole capacity, as shown in the figure.

The general token bucket rules are as follows:
(1) The inlet puts tokens into the bucket at a fixed rate.
(2) The bucket's capacity is fixed, but the release rate is not: as long as tokens remain in the bucket, an arriving request obtains one and is released.
(3) If tokens are issued more slowly than requests arrive, the bucket runs dry and requests are rejected.

In a word, the token issuing rate is configurable, which makes bursts of egress traffic easy to handle.

Token Bucket Algorithm

The token bucket is similar to the leaky bucket; the difference is that the bucket holds tokens, and a request is served only after obtaining one. Take a school cafeteria as an analogy: queuing at a serving window is like the leaky bucket algorithm; a crowd gathers at the window and is served at a fixed rate. If too many people arrive and the cafeteria cannot hold them, some must stand outside and get no service at all, which is the overflow; those who overflow can keep queuing and retry. So what is the problem?

Suppose something urgent now happens, say a group that genuinely must be served quickly. With the leaky bucket algorithm they still have to queue slowly, which does not meet the need. Many application scenarios require not only limiting the average transmission rate but also allowing a certain degree of bursting. Here the leaky bucket falls short and the token bucket fits better. As the figure shows, the token bucket works like this: the system puts tokens into the bucket at a constant rate; a request must first take a token from the bucket before being processed, and when the bucket has no tokens left, service is refused.


Token Bucket Algorithm Implementation

package com.crazymaker.springcloud.ratelimit;

import lombok.extern.slf4j.Slf4j;
import org.junit.Test;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// token bucket rate limiting
@Slf4j
public class TokenBucketLimiter {

    // time of the last token issue
    public long lastTime = System.currentTimeMillis();
    // bucket capacity
    public int capacity = 2;
    // token generation rate per second
    public int rate = 2;
    // current number of tokens
    public AtomicInteger tokens = new AtomicInteger(0);

    // return value:
    // false - not limited
    // true  - limited
    public synchronized boolean isLimited(long taskId, int applyCount) {
        long now = System.currentTimeMillis();
        // elapsed time in ms
        long gap = now - lastTime;

        // tokens generated during the elapsed time
        int reverse_permits = (int) (gap * rate / 1000);
        int all_permits = tokens.get() + reverse_permits;
        // current token count, capped at the bucket capacity
        tokens.set(Math.min(capacity, all_permits));
        log.info("tokens {} capacity {} gap {} ", tokens, capacity, gap);

        if (tokens.get() < applyCount) {
            // not enough tokens: reject
            // log.info("limited.." + taskId + ", applyCount: " + applyCount);
            return true;
        } else {
            // enough tokens: take them
            tokens.getAndAdd(-applyCount);
            lastTime = now;

            // log.info("tokens left.." + tokens);
            return false;
        }

    }

    // thread pool for the multithreaded test
    private ExecutorService pool = Executors.newFixedThreadPool(10);

    @Test
    public void testLimit() {
        // number of rejected calls
        AtomicInteger limited = new AtomicInteger(0);
        // number of threads
        final int threads = 2;
        // rounds per thread
        final int turns = 20;

        // synchronizer
        CountDownLatch countDownLatch = new CountDownLatch(threads);
        long start = System.currentTimeMillis();
        for (int i = 0; i < threads; i++) {
            pool.submit(() ->
            {
                try {
                    for (int j = 0; j < turns; j++) {
                        long taskId = Thread.currentThread().getId();
                        boolean intercepted = isLimited(taskId, 1);
                        if (intercepted) {
                            // accumulate the rejected calls
                            limited.getAndIncrement();
                        }

                        Thread.sleep(200);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
                // signal that this thread has finished
                countDownLatch.countDown();
            });
        }
        try {
            countDownLatch.await();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        float time = (System.currentTimeMillis() - start) / 1000F;
        // print the statistics
        log.info("rejected: " + limited.get() +
                ", passed: " + (threads * turns - limited.get()));
        log.info("rejected ratio: " + (float) limited.get() / (float) (threads * turns));
        log.info("elapsed seconds: " + time);
    }
}

Benefits of the Token Bucket

One benefit of the token bucket is that it easily accommodates bursts of egress traffic (and improvements in backend capacity).

For example, the token issuing rate can be changed: the algorithm then adds tokens at the new rate, so the burst of egress traffic can be served.

Guava RateLimiter

Guava is an excellent open-source project in the Java world. It bundles the core libraries Google uses in its Java projects: collections, caching, concurrency, common annotations, string processing, I/O, and many other highly useful utilities. Guava's RateLimiter provides two token bucket implementations: SmoothBursty and SmoothWarmingUp.

The RateLimiter class diagram is shown above.
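As a quick, hedged sketch of the SmoothBursty flavor (the rates below are arbitrary example values, not from the original text):

import com.google.common.util.concurrent.RateLimiter;

public class GuavaRateLimiterDemo {
    public static void main(String[] args) {
        // SmoothBursty bucket issuing 2 permits per second (example value)
        RateLimiter limiter = RateLimiter.create(2.0);

        for (int i = 0; i < 5; i++) {
            double waited = limiter.acquire();   // blocks until a permit is available
            System.out.println("got permit after waiting " + waited + "s");
        }

        // non-blocking attempt: returns immediately when no permit is available
        boolean ok = limiter.tryAcquire();
        System.out.println("tryAcquire: " + ok);

        // the issuing rate can be raised at runtime, e.g. after backend scale-out
        limiter.setRate(4.0);
    }
}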

Nginx Leaky Bucket Rate Limiting

A simple demonstration of Nginx rate limiting

Each of the zones below allows one request per key every ten seconds (rate=6r/m, i.e., 6 requests per minute):

  limit_req_zone  $arg_sku_id  zone=skuzone:10m      rate=6r/m;
  limit_req_zone  $http_user_id  zone=userzone:10m      rate=6r/m;
  limit_req_zone  $binary_remote_addr  zone=perip:10m      rate=6r/m;
  limit_req_zone  $server_name        zone=perserver:1m   rate=6r/m;

The first rule limits by a request parameter (sku_id); in each rule, the named variable serves as the key under which Nginx counts requests for rate limiting.

Define the rate-limit shared-memory zones in the http block:

  limit_req_zone  $arg_sku_id  zone=skuzone:10m      rate=6r/m;
  limit_req_zone  $http_user_id  zone=userzone:10m      rate=6r/m;
  limit_req_zone  $binary_remote_addr  zone=perip:10m      rate=6r/m;
  limit_req_zone  $server_name        zone=perserver:1m   rate=10r/s;

Use the zone in a location block, as follows:

#  rate limit by sku id
    location  = /ratelimit/sku {
      limit_req  zone=skuzone;
      echo "normal response";
}

Test

[root@cdh1 ~]# /vagrant/LuaDemoProject/sh/linux/openresty-restart.sh
shell dir is: /vagrant/LuaDemoProject/sh/linux
Shutting down openrestry/nginx:  pid is 13479 13485
Shutting down  succeeded!
OPENRESTRY_PATH:/usr/local/openresty
PROJECT_PATH:/vagrant/LuaDemoProject/src
nginx: [alert] lua_code_cache is off; this will hurt performance in /vagrant/LuaDemoProject/src/conf/nginx-seckill.conf:90
openrestry/nginx starting succeeded!
pid is 14197


[root@cdh1 ~]# curl  http://cdh1/ratelimit/sku?sku_id=1
normal response
[root@cdh1 ~]#  curl  http://cdh1/ratelimit/sku?sku_id=1
normal response
[root@cdh1 ~]#  curl  http://cdh1/ratelimit/sku?sku_id=1
degraded response after rate limiting
[root@cdh1 ~]#  curl  http://cdh1/ratelimit/sku?sku_id=1
degraded response after rate limiting
[root@cdh1 ~]#  curl  http://cdh1/ratelimit/sku?sku_id=1
degraded response after rate limiting
[root@cdh1 ~]#  curl  http://cdh1/ratelimit/sku?sku_id=1
degraded response after rate limiting
[root@cdh1 ~]#  curl  http://cdh1/ratelimit/sku?sku_id=1
degraded response after rate limiting
[root@cdh1 ~]#  curl  http://cdh1/ratelimit/sku?sku_id=1
normal response

Taking the key from a request header

1. Nginx can read non-standard, user-defined headers, but underscore support for headers must be enabled under http or server:

underscores_in_headers on;

2. For example, if we define a custom header X-Real-IP, a second (downstream) Nginx reads it as:

$http_x_real_ip; (all lowercase, with an http_ prefix)

 underscores_in_headers on;

  limit_req_zone  $http_user_id  zone=userzone:10m      rate=6r/m;
  server {
    listen       80 default;
    server_name  nginx.server *.nginx.server;
    default_type 'text/html';
    charset utf-8;

#  rate limit by user id
    location  = /ratelimit/demo {
      limit_req  zone=userzone;
      echo "normal response";
    }

    location = /50x.html{
      echo "degraded response after rate limiting";
    }

    error_page 502 503 =200 /50x.html;

  }

Test

[root@cdh1 ~]# curl -H "USER-ID:1" http://cdh1/ratelimit/demo
normal response
[root@cdh1 ~]# curl -H "USER-ID:1" http://cdh1/ratelimit/demo
degraded response after rate limiting
[root@cdh1 ~]# curl -H "USER-ID:1" http://cdh1/ratelimit/demo
degraded response after rate limiting
[root@cdh1 ~]# curl -H "USER-ID:1" http://cdh1/ratelimit/demo
degraded response after rate limiting
[root@cdh1 ~]# curl -H "USER-ID:1" http://cdh1/ratelimit/demo
degraded response after rate limiting
[root@cdh1 ~]# curl -H "USER-ID:1" http://cdh1/ratelimit/demo
degraded response after rate limiting
[root@cdh1 ~]# curl -H "USER-ID:1" http://cdh1/ratelimit/demo
degraded response after rate limiting
[root@cdh1 ~]# curl -H "USER_ID:2" http://cdh1/ratelimit/demo
normal response
[root@cdh1 ~]# curl -H "USER_ID:2" http://cdh1/ratelimit/demo
degraded response after rate limiting
[root@cdh1 ~]#
[root@cdh1 ~]# curl -H "USER_ID:2" http://cdh1/ratelimit/demo
degraded response after rate limiting
[root@cdh1 ~]# curl -H "USER-ID:3" http://cdh1/ratelimit/demo
normal response
[root@cdh1 ~]# curl -H "USER-ID:3" http://cdh1/ratelimit/demo
degraded response after rate limiting

Three variants of Nginx leaky bucket limiting: the burst and nodelay parameters in detail

The zone below allows one request per key every six seconds (rate=10r/m):

limit_req_zone  $arg_user_id  zone=req_zone:10m      rate=10r/m;

Leaky bucket limiting without a buffer queue

limit_req zone=req_zone;

  • Requests are processed strictly at the rate configured for req_zone
  • Anything beyond that rate's capacity is dropped immediately
  • Incoming requests see no added delay

Suppose 10 requests are submitted within 1 second: of the 10 requests, 9 fail and get a 503 straight away.

Checking /var/log/nginx/access.log confirms that only one request succeeded; the others were all answered with 503, i.e., the server rejected them.

Leaky bucket limiting with a buffer queue

limit_req zone=req_zone burst=5;

  • Requests are processed at the rate configured for req_zone
  • In addition, a buffer queue of size 5 is set up, and queued requests wait to be drained slowly
  • Requests beyond the burst queue length plus the rate's capacity are dropped
  • Incoming requests may therefore experience delay

Suppose 10 requests are submitted within 1 second: after receiving the 10 concurrent requests, the server processes 1 request immediately and puts 5 into the burst buffer queue; the requests beyond burst+1 are dropped outright, i.e., 4 requests are discarded. The 5 buffered requests are then processed one every 6 s.

Then check /var/log/nginx/access.log:

Leaky bucket limiting with instantaneous burst capacity

limit_req zone=req_zone burst=5 nodelay;

With nodelay set, Nginx can absorb (burst + rate) requests instantaneously. Once the number of requests exceeds (burst + rate), it returns 503 directly; requests within that peak never have to wait.

Suppose 10 requests are submitted within 1 second: the server handles 6 of them (peak capacity: burst plus the 1 request the rate allows in this interval) and answers the remaining 4 with 503 directly. If another 10 requests are sent in the following second, the server rejects all 10 with 503.

Then check /var/log/nginx/access.log:

Again, within 1 s the server processed 6 requests (peak capacity: burst + the normal processing rate) and returned 503 for the remaining 4.

However, the total quota still equals rate × time: once the quota is used up, new requests are accepted only after fresh quota accrues. Processing 5 extra requests at once consumes 30 s of quota (one request per 6 s × 5 = 30 s). Since the rate grants one request every 6 s, the next request cannot be processed until 30 s later; if 10 requests are sent to the server at that point, nine get 503 and one gets 200.


Distributed Rate Limiting Components

Why?

Nginx's rate limiting directives are only effective within a single instance's shared memory zone, while the external seckill gateways in production are usually deployed on multiple nodes, so a distributed rate limiting component is needed.

A high-performance distributed limiter can be built with Redis + Lua; JD.com's flash-sale system uses Redis + Lua for exactly this. And whether the gateway is an external Nginx gateway or an internal Zuul gateway, the Redis + Lua limiter can be used.

In theory, access-layer rate limiting has several dimensions:

(1) Per-user limiting: a user may submit only one request within a given period; for example, the client IP or the user ID serves as the limiter key.

(2) Per-product limiting: for one flash-sale product, only a certain number of requests may enter within a given period; the seckill product ID serves as the key.

When to use Nginx limiting:

Per-user limiting can be done in Nginx, because keeping user IDs in Nginx's limiter memory is more efficient than keeping them as Redis keys.

When to use Redis + Lua distributed limiting:

Per-product limiting is better done in Redis: it avoids creating a huge number of keys just to count visits, and it can cap the total number of seckill requests across all access-layer nodes.

Redis + Lua Distributed Rate Limiting Component

--- This script runs inside Redis, not inside Nginx

--- Method: acquire tokens
--- -1 failed
--- 1 success
--- @param key    rate limiter key
--- @param apply  number of tokens requested
local function acquire(key, apply)
    local times = redis.call('TIME');
    -- times[1] seconds   -- times[2] microseconds
    local curr_mill_second = times[1] * 1000000 + times[2];
    curr_mill_second = curr_mill_second / 1000;

    local cacheInfo = redis.pcall("HMGET", key, "last_mill_second", "curr_permits", "max_permits", "rate")
    --- local variable: time of the previous acquisition
    local last_mill_second = cacheInfo[1];
    --- local variable: previous token count
    local curr_permits = tonumber(cacheInfo[2]);
    --- local variable: bucket capacity
    local max_permits = tonumber(cacheInfo[3]);
    --- local variable: token issuing rate
    local rate = cacheInfo[4];
    --- local variable: tokens available this time
    local local_curr_permits = 0;

    if (type(last_mill_second) ~= 'boolean' and last_mill_second ~= nil) then
        -- tokens generated during the elapsed time
        local reverse_permits = math.floor(((curr_mill_second - last_mill_second) / 1000) * rate);
        -- total tokens
        local expect_curr_permits = reverse_permits + curr_permits;
        -- tokens actually available, capped at the bucket capacity
        local_curr_permits = math.min(expect_curr_permits, max_permits);
    else
        -- first acquisition
        redis.pcall("HSET", key, "last_mill_second", curr_mill_second)
        local_curr_permits = max_permits;
    end

    local result = -1;
    -- enough tokens to satisfy the request
    if (local_curr_permits - apply >= 0) then
        -- save the remaining tokens
        redis.pcall("HSET", key, "curr_permits", local_curr_permits - apply);
        -- save the timestamp for the next acquisition
        redis.pcall("HSET", key, "last_mill_second", curr_mill_second)
        -- token acquisition succeeded
        result = 1;
    else
        -- token acquisition failed
        result = -1;
    end
    return result
end
--eg
-- /usr/local/redis/bin/redis-cli  -a 123456  --eval   /vagrant/LuaDemoProject/src/luaScript/redis/rate_limiter.lua key , acquire 1  1

-- get the SHA-encoded handle of the script
-- /usr/local/redis/bin/redis-cli  -a 123456  script load "$(cat  /vagrant/LuaDemoProject/src/luaScript/redis/rate_limiter.lua)"
-- /usr/local/redis/bin/redis-cli  -a 123456  script exists  "cf43613f172388c34a1130a760fc699a5ee6f2a9"

-- /usr/local/redis/bin/redis-cli -a 123456  evalsha   "cf43613f172388c34a1130a760fc699a5ee6f2a9" 1 "rate_limiter:seckill:1"  init 1  1
-- /usr/local/redis/bin/redis-cli -a 123456  evalsha   "cf43613f172388c34a1130a760fc699a5ee6f2a9" 1 "rate_limiter:seckill:1"  acquire 1

--local rateLimiterSha = "e4e49e4c7b23f0bf7a2bfee73e8a01629e33324b";

--- Method: initialize the rate limiter key
--- 1 success
--- @param key key
--- @param max_permits  bucket capacity
--- @param rate  token issuing rate
local function init(key, max_permits, rate)
    local rate_limit_info = redis.pcall("HMGET", key, "last_mill_second", "curr_permits", "max_permits", "rate")
    local org_max_permits = tonumber(rate_limit_info[3])
    local org_rate = rate_limit_info[4]

    if (org_max_permits == nil) or (rate ~= org_rate or max_permits ~= org_max_permits) then
        redis.pcall("HMSET", key, "max_permits", max_permits, "rate", rate, "curr_permits", max_permits)
    end
    return 1;
end
--eg
-- /usr/local/redis/bin/redis-cli -a 123456 --eval   /vagrant/LuaDemoProject/src/luaScript/redis/rate_limiter.lua key , init 1  1
-- /usr/local/redis/bin/redis-cli -a 123456 --eval   /vagrant/LuaDemoProject/src/luaScript/redis/rate_limiter.lua  "rate_limiter:seckill:1"  , init 1  1


--- Method: delete the rate limiter key
local function delete(key)
    redis.pcall("DEL", key)
    return 1;
end
--eg
-- /usr/local/redis/bin/redis-cli  --eval   /vagrant/LuaDemoProject/src/luaScript/redis/rate_limiter.lua key , delete


local key = KEYS[1]
local method = ARGV[1]
if method == 'acquire' then
    return acquire(key, ARGV[2], ARGV[3])
elseif method == 'init' then
    return init(key, ARGV[2], ARGV[3])
elseif method == 'delete' then
    return delete(key)
else
    --ignore
end

In Redis, to avoid wasting network bandwidth by repeatedly sending the script body, the SCRIPT LOAD command caches the script and returns a hash code that serves as the script's call handle.

Each subsequent call then only needs to send the hash code (via EVALSHA).

Distributed Token Bucket Limiting in Practice

With Redis + Lua, here is the simple one-ticket example used in practice:

Tokens are put into the bucket at a rate of 1 per second, and the bucket holds at most 2 tokens, so the system allows sustained processing of 1 request per second.

Alternatively, once the bucket has filled up with its 2 tokens (after 2 idle seconds), it can absorb a burst of 2 requests at once while keeping the system stable.
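Below is a minimal, hedged sketch of calling the rate_limiter.lua script shown earlier from Java with Jedis; the host, port, key name, and script path are example values, and the init/acquire arguments follow the script's dispatcher (init takes max_permits and rate, acquire takes the requested token count):

import java.nio.file.Files;
import java.nio.file.Paths;
import redis.clients.jedis.Jedis;

public class RedisTokenBucketDemo {
    public static void main(String[] args) throws Exception {
        // connection details are example values
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            // load rate_limiter.lua once; Redis returns its SHA-1 handle
            String script = new String(Files.readAllBytes(Paths.get("rate_limiter.lua")));
            String sha = jedis.scriptLoad(script);

            // init: capacity (max_permits) = 2, rate = 1 token/second, as in the example above
            Object initResult = jedis.evalsha(sha, 1, "rate_limiter:seckill:1", "init", "2", "1");

            // acquire: ask for 1 token; the script returns 1 on success, -1 on failure
            Object acquired = jedis.evalsha(sha, 1, "rate_limiter:seckill:1", "acquire", "1");
            System.out.println("init=" + initResult + ", acquire=" + acquired);
        }
    }
}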

Per-Product Rate Limiting

With per-product limiting on a flash-sale item, when the product's traffic far exceeds the allotted volume, the excess requests are simply dropped at random.

Nginx's token bucket script getToken_access_limit.lua executes in the access phase of the request. It does not itself implement the core limiting logic; it only invokes the rate_limiter.lua script cached inside Redis to perform the limiting.

The relationship between the getToken_access_limit.lua script and the rate_limiter.lua script is shown in Figure 10-17.


Figure 10-17 Relationship between getToken_access_limit.lua script and rate_limiter.lua script

When is the rate_limiter.lua script loaded into Redis?

Like the seckill script itself, it is loaded and cached in Redis when the Java program launches the seckill activity.

Another very important point: after loading the script, the Java program takes the resulting SHA-1 code and caches it in Redis under a well-known key (specifically "lua:sha1:rate_limiter"), so that Nginx's getToken_access_limit.lua script can fetch it and use it when calling EVALSHA.

Note: with a Redis cluster, each node needs to cache its own copy of the script data.

/**
 * With a Redis cluster, each node must cache its own copy of the script data.
 * @param slotKey the key used to locate the corresponding slot
 */
public void storeScript(String slotKey) {
    if (StringUtils.isEmpty(unlockSha1) || !jedisCluster.scriptExists(unlockSha1, slotKey)) {
        // Redis caches the script and returns a hash code that can be used for later calls
        unlockSha1 = jedisCluster.scriptLoad(DISTRIBUTE_LOCK_SCRIPT_UNLOCK_VAL, slotKey);
    }
}

Common Rate Limiting Components

Redisson's distributed rate limiter combines the token bucket idea with a fixed time window: the trySetRate method sets the bucket size, the Redis key-expiration mechanism implements the time window, and the number of requests allowed through within that window is thereby controlled.
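A minimal, hedged sketch of Redisson's RRateLimiter; the address, limiter name, and rates are example values:

import org.redisson.Redisson;
import org.redisson.api.RRateLimiter;
import org.redisson.api.RateIntervalUnit;
import org.redisson.api.RateType;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonRateLimiterDemo {
    public static void main(String[] args) {
        // connection details are example values
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);

        RRateLimiter limiter = redisson.getRateLimiter("seckill:limiter");
        // allow 2 permits per second, shared across all client instances
        limiter.trySetRate(RateType.OVERALL, 2, 1, RateIntervalUnit.SECONDS);

        // non-blocking attempt to take one permit
        if (limiter.tryAcquire(1)) {
            System.out.println("request allowed");
        } else {
            System.out.println("request rejected");
        }

        redisson.shutdown();
    }
}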

Spring Cloud Gateway integrates a Redis-based rate limiter, but it operates at the gateway layer.
