Rate limiting protection strategies in high-concurrency architectures

When the concurrent traffic hitting a system is too high, the system may be overwhelmed and the entire service may become unavailable.

The general solution for this scenario is: once traffic exceeds a given threshold, we refuse to serve the excess, so that the service itself does not go down.

Of course, although rate limiting protects the system from being overwhelmed, it is frustrating for the users whose requests are rejected. Rate limiting is therefore a lossy solution, but a lossy service is far better than a service that is not available at all.


The role of rate limiting

Besides the scenario above, rate limiting can also guard against malicious request traffic and malicious attacks.

The basic principle of rate limiting is to protect the system by limiting the rate of concurrent access, or the number of requests within a time window. Once the limit is reached, the system can deny service (redirect to an error page, or report that the resource is unavailable), queue or wait (flash sales, order placement), or degrade (return fallback or default data, e.g. a product detail page that shows stock as "available" by default).

Common rate limits in Internet companies include: limiting the total number of concurrent connections (e.g. database connection pools, thread pools), limiting instantaneous concurrency (e.g. nginx's limit_conn module, which caps the number of simultaneous connections), and limiting the average rate within a time window (e.g. Guava's RateLimiter and nginx's limit_req module, which limit the average rate per second). Others limit the call rate of remote interfaces or the consumption rate of an MQ. You can also rate limit based on the number of network connections, network traffic, or CPU or memory load.

Combined with caching, rate limiting lets a system cope with high concurrency: instantaneous traffic spikes no longer threaten to hang the system or trigger an avalanche, and the worst case is a degraded service rather than no service. However, rate limits need to be evaluated carefully; applied indiscriminately, they cause strange failures for normal traffic, leading to a poor user experience and user churn.

Common rate limiting algorithms

Sliding window

Both the sender and the receiver maintain a sequence of data frames, called a window. The sender's window size is determined by the receiver; the goal is to control the sending speed so that the receiver's buffer does not overflow, and controlling the flow also helps avoid network congestion. In the figure below, frames 4, 5 and 6 have been sent but their ACKs have not yet been received, while frames 7, 8 and 9 are waiting to be sent. The sender's window size is 6, as advertised by the receiver. If the sender now receives the ACK for frame 4, the left edge of the window shrinks to the right and the right edge expands to the right: the window "slides" forward, and frame 10 can also be sent.


Sliding window demo
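The sliding-window idea above comes from TCP flow control, but the same mechanism is commonly applied to request rate limiting. Below is a minimal sketch (the class name `SlidingWindowLimiter` and its API are my own assumptions, not from the article): request timestamps are kept in a queue, stale ones are dropped as the window slides forward, and a request is allowed only while fewer than `limit` timestamps remain inside the window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SlidingWindowLimiter {
    private final int limit;            // max requests allowed per window
    private final long windowMillis;    // window length in milliseconds
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Slide the window: drop timestamps that have fallen out of it.
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < limit) {
            timestamps.addLast(now);    // record this request
            return true;
        }
        return false;                   // window is full: reject
    }
}
```

With `limit = 3` and a 1-second window, the first three calls to `tryAcquire()` return true and a fourth call made immediately afterwards returns false.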

Leaky bucket (controls the transmission rate)

The idea of the leaky bucket algorithm: water is continuously poured into the bucket, and no matter how fast or slow it comes in, it leaks out at a fixed rate. If the bucket is full, the water overflows.

The bucket itself leaks water at a constant rate, while water enters from above at a varying pace. As long as the bucket is not full, incoming water can be added; once it is full, no more can enter. A full bucket is the key trigger condition of the algorithm (i.e. the condition under which traffic is judged abnormal).

Once the bucket is full, there are two common ways to handle the incoming water:

  1. Temporarily block the incoming water, wait until some water has leaked out of the bucket, and then let it through.

  2. Discard the overflowing water directly.

Features

  1. The leak rate is fixed.

  2. Even during an input burst (a sudden surge of incoming water), the leak rate stays fixed.

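The description above can be sketched in code as follows (a hypothetical `LeakyBucket` class, not from the article). This version takes the second handling option: water that would overflow the bucket is simply rejected.

```java
public class LeakyBucket {
    private final long capacity;          // maximum amount of water the bucket holds
    private final double leakRatePerMs;   // fixed leak rate (units per millisecond)
    private double water = 0;             // current water level
    private long lastLeakTime = System.currentTimeMillis();

    public LeakyBucket(long capacity, double leakRatePerSecond) {
        this.capacity = capacity;
        this.leakRatePerMs = leakRatePerSecond / 1000.0;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Leak at the constant rate, regardless of how bursty the input is.
        water = Math.max(0, water - (now - lastLeakTime) * leakRatePerMs);
        lastLeakTime = now;
        if (water + 1 <= capacity) {
            water += 1;        // this request's "water" fits in the bucket
            return true;
        }
        return false;          // bucket full: overflow, reject the request
    }
}
```

With `capacity = 2` and a leak rate of 1 per second, two back-to-back calls to `tryAcquire()` succeed and a third immediate call is rejected until enough water has leaked out.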

Token bucket (can handle burst traffic)

The token bucket algorithm is one of the most commonly used algorithms for network traffic shaping (Traffic Shaping) and rate limiting (Rate Limiting). Typically it is used to control the amount of data sent onto the network while still allowing bursts.

A token bucket is a bucket holding a fixed capacity of tokens, with tokens added to it at a fixed rate. The algorithm actually consists of three parts: two flows and one bucket, namely the token flow, the data flow and the token bucket.

Token flow and token bucket

The system generates tokens at a certain rate and places them in the token bucket. You can think of the bucket as a buffer (which could be implemented with a queue); when the buffer is full, newly generated tokens are discarded. Two variables matter here:

The first is the token generation rate, usually called rate. For example, setting rate = 2 means 2 tokens are generated per second, i.e. one token every 1/2 second.

The second is the size of the token bucket, usually called burst. For example, setting burst = 10 means the bucket can hold at most 10 tokens.

Data flow

The data flow is the real traffic entering the system. For an HTTP interface that is called twice per second on average, the rate is 2 requests/s.

Three situations can occur:

The data rate equals the token rate. Every arriving packet or request matches a token and passes through the queue without delay.

The data rate is less than the token rate. Packets or requests passing through the queue consume only some of the tokens; the remainder accumulate in the bucket until it is full, and those accumulated tokens can later be spent on burst requests.

The data rate is greater than the token rate. The tokens in the bucket are quickly exhausted, interrupting service for a time; if packets or requests keep arriving, they will be dropped or rejected.

In the earlier example, with rate = 2 and burst = 10, the system can absorb a burst of 10 requests/s while sustaining an average of 2 requests/s. The last of the three situations is the core of the algorithm. It is precise, trivial to implement, and puts negligible pressure on the server, which is why it is so widely used and well worth learning.


Features

  1. Tokens can accumulate: the maximum number of tokens in the bucket is b, the most that can be saved up.

  2. Burst traffic is allowed: the bucket can accumulate n tokens (0 <= n <= b); if n burst requests arrive at the same moment, all n can be processed at once.
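Putting the rate and burst variables together, the algorithm can be sketched as follows (the `TokenBucket` class is illustrative, not from the article): tokens are refilled lazily at the fixed rate, capped at burst, and each request consumes one token.

```java
public class TokenBucket {
    private final long burst;        // bucket capacity (max tokens that can accumulate)
    private final double ratePerMs;  // token generation rate (tokens per millisecond)
    private double tokens;           // current token count
    private long lastRefill = System.currentTimeMillis();

    public TokenBucket(long burst, double ratePerSecond) {
        this.burst = burst;
        this.ratePerMs = ratePerSecond / 1000.0;
        this.tokens = burst;         // start full so an initial burst can be absorbed
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Refill at the fixed rate since the last call, capped at the bucket size.
        tokens = Math.min(burst, tokens + (now - lastRefill) * ratePerMs);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;             // spend one token for this request
            return true;
        }
        return false;                // no tokens left: reject
    }
}
```

With rate = 2 and burst = 10 as in the article's example, 10 simultaneous requests all succeed (the accumulated burst), after which requests are admitted at roughly 2 per second.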

Rate limiting algorithms in practice

Semaphore

Semaphore is commonly used for rate limiting. In the following scenario, 20 client requests are simulated; to reduce access pressure, we limit the request traffic with a Semaphore.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class SemaphoreTest {

    public static void main(String[] args) {
        // Thread pool
        ExecutorService exec = Executors.newCachedThreadPool();
        // Only 5 threads may access at the same time
        final Semaphore semp = new Semaphore(5);
        // Simulate 20 client requests
        for (int index = 0; index < 20; index++) {
            final int NO = index;
            Runnable run = new Runnable() {
                public void run() {
                    try {
                        // Acquire a permit
                        semp.acquire();
                        System.out.println("Accessing: " + NO);
                        Thread.sleep((long) (Math.random() * 10000));
                        // Release the permit when done
                        semp.release();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            };
            exec.execute(run);
        }
        // Shut down the thread pool
        exec.shutdown();
    }
}

Guava's RateLimiter implementation

Guava's RateLimiter has two implementations: SmoothBursty and SmoothWarmingUp.

SmoothBursty is an implementation based on the token bucket algorithm, for example:

RateLimiter rateLimiter = RateLimiter.create(permitsPerSecond); // create a bursty instance

rateLimiter.acquire(); // acquire 1 permit; blocks until a token becomes available when there are not enough tokens

  1. Import the jar package

    <dependency>
       <groupId>com.google.guava</groupId>
       <artifactId>guava</artifactId>
       <version>23.0</version>
    </dependency>
  2. Write the test code

    import com.google.common.util.concurrent.RateLimiter;

    import java.io.IOException;
    import java.util.Random;
    import java.util.concurrent.CountDownLatch;

    public class PayService {

        RateLimiter rateLimiter = RateLimiter.create(10); // qps=10

        public void doRequest(String threadName) {
            if (rateLimiter.tryAcquire()) {
                System.out.println(threadName + ": payment succeeded");
            } else {
                System.out.println(threadName + ": too many people paying right now, please try again later");
            }
        }

        public static void main(String[] args) throws IOException {
            PayService payService = new PayService();
            CountDownLatch latch = new CountDownLatch(1);
            Random random = new Random(10);
            for (int i = 0; i < 20; i++) {
                int finalI = i;
                new Thread(() -> {
                    try {
                        latch.await();
                        int sleepTime = random.nextInt(1000);
                        Thread.sleep(sleepTime);
                        payService.doRequest("t-" + finalI);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }).start();
            }
            latch.countDown();
            System.in.read();
        }
    }

The next article will analyze Sentinel, Alibaba's open-source rate limiting framework!

Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit the source (Mic带你学架构!) when reprinting. If this article helped you, please follow and like; your support is what keeps me writing. You are also welcome to follow the WeChat public account of the same name for more technical content!


Origin juejin.im/post/7085224656013590565