Talking about rate-limiting techniques for high-concurrency systems

Original link: http://www.jianshu.com/p/2596e559db5c

  • Recently I have been studying stress testing from the client side. Even once the client-side issues of stress-test threads, ports, and so on are overcome, a server that cannot handle the incoming network requests will suffer a serious outage. There are three sharp tools for protecting a high-concurrency system: caching, degradation, and rate limiting.

Rate Limiting

  • The purpose of rate limiting is to protect the system by capping the rate of concurrent access/requests, or the number of requests within a time window. Once the limit is reached, the service can deny requests (redirect to an error page or report that the resource is unavailable), queue or wait (e.g., flash sales, comments, order placement), or degrade (return fallback or default data, e.g., a product detail page showing inventory as in stock by default).

  • Common rate limits in high-concurrency systems include: limiting the total number of concurrent connections (e.g., database connection pools, thread pools), limiting instantaneous concurrency (e.g., nginx's limit_conn module, which caps instantaneous concurrent connections), and limiting the average rate within a time window (e.g., Guava's RateLimiter and nginx's limit_req module, which caps the average rate per second); others include limiting the call rate of remote interfaces and the consumption rate of MQ. You can also limit based on the number of network connections, network traffic, CPU or memory load, and so on.

  • With caching as the silver bullet in front and rate limiting behind it, we can cope with the high-concurrency traffic of the 618 and Double Eleven promotions. Rate limiting is a powerful lever for high-concurrency problems: there is no need to worry that instantaneous traffic will hang the system or trigger an avalanche; at worst the service is degraded, never fully down. But the limits must be evaluated carefully and not applied indiscriminately, or normal traffic will hit strange problems and users will complain.

  • In practice, don't get too hung up on the algorithms: some rate-limiting algorithms are implemented identically and only described differently. Choose the concrete technique according to the actual scenario rather than blindly hunting for the best mode; a cat that catches the mouse is a good cat, black or white.

  • Because many people have asked how rate limiting is actually done at work, this article introduces the various approaches in detail. Let's study the techniques one by one: rate-limiting algorithms, application-level rate limiting, distributed rate limiting, and access-layer rate limiting.

Rate-Limiting Algorithms

Common rate-limiting algorithms are the token bucket and the leaky bucket; a counter can also serve as a crude rate limiter.

Token Bucket Algorithm

  • The token bucket algorithm keeps a bucket that stores a fixed capacity of tokens, with tokens added to the bucket at a fixed rate. It is described as follows (a minimal code sketch follows the list):
    • Assume the limit is 2r/s: a token is added to the bucket at a fixed rate of one every 500 milliseconds;
    • The bucket stores at most b tokens; when it is full, newly added tokens are discarded or rejected;
    • When a packet of n bytes arrives, n tokens are removed from the bucket and the packet is sent to the network;
    • If fewer than n tokens remain in the bucket, none are removed and the packet is throttled (either discarded or buffered).
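
To make the idea concrete, here is a minimal token-bucket sketch in Java; the class and its names are our own illustration, not from any library, and refilling is computed lazily from the elapsed time:

public class TokenBucket {
    private final long capacity;     // b: the most tokens the bucket holds
    private final double refillRate; // tokens added per second (e.g., 2 for 2r/s)
    private double tokens;
    private long lastRefillNanos = System.nanoTime();

    public TokenBucket(long capacity, double refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;
    }

    // try to take n tokens; false means the packet/request must be throttled
    public synchronized boolean tryAcquire(int n) {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) / 1e9 * refillRate);
        lastRefillNanos = now;
        if (tokens < n) {
            return false; // not enough tokens: drop or buffer the packet
        }
        tokens -= n;
        return true;
    }
}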


Leaky Bucket Algorithm

  • The leaky bucket, as a meter, can be used for traffic shaping and traffic policing. It is described as follows (a matching sketch follows the list):
    • A leaky bucket with a fixed capacity leaks water drops at a constant fixed rate;
    • If the bucket is empty, no drops flow out;
    • Water may flow into the bucket at any rate;
    • If the incoming water exceeds the bucket's capacity, it overflows (is discarded), and the bucket's capacity stays unchanged.
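
A matching leaky-bucket-as-a-meter sketch in Java (again our own illustrative code); note that it mirrors the token bucket: the water level drains at a constant rate, and an arrival that would overflow is rejected:

public class LeakyBucket {
    private final double capacity; // bucket size: how much backlog is tolerated
    private final double leakRate; // drops leaked per second (the constant outflow rate)
    private double water;
    private long lastLeakNanos = System.nanoTime();

    public LeakyBucket(double capacity, double leakRate) {
        this.capacity = capacity;
        this.leakRate = leakRate;
    }

    // one arriving request is one drop; false means it overflows and is discarded
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        water = Math.max(0, water - (now - lastLeakNanos) / 1e9 * leakRate);
        lastLeakNanos = now;
        if (water + 1 > capacity) {
            return false; // overflow: discard the request
        }
        water += 1;
        return true;
    }
}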


Comparison of token bucket and leaky bucket:

  • The token bucket adds tokens to the bucket at a fixed rate, and whether a request is processed depends on whether the bucket holds enough tokens; when the token count drops to zero, new requests are rejected;
  • The leaky bucket lets requests flow out at a constant fixed rate while the inflow rate is arbitrary; when the backlog reaches the bucket capacity, new requests are rejected;
  • The token bucket limits the average inflow rate (it allows bursts: requests are processed as long as tokens remain, and taking 3 or 4 tokens at once is supported), permitting a degree of burst traffic;
  • The leaky bucket limits a constant outflow rate (i.e., the outflow rate is a fixed value, say always 1, rather than 1 this time and 2 the next), thereby smoothing a bursty inflow rate;
  • The token bucket allows a degree of burst, while the leaky bucket's main purpose is to smooth the inflow rate;
  • The two algorithms can be implemented identically, just in opposite directions; with the same parameters the limiting effect is the same.
  • In addition, we sometimes use a counter, mainly to cap a total concurrency, e.g., a database connection pool, a thread pool, or the concurrency of a flash sale: whenever the global total of requests (or the total within a period) exceeds the configured threshold, requests are limited. This is simple, crude total-count limiting, not average-rate limiting.

That covers the basic algorithms; next, let's look at application-level rate limiting.

Application-Level Rate Limiting

Limiting Total Concurrency/Connections/Requests

  • Any application system has a limit on concurrency/requests, i.e., there is always a TPS/QPS threshold; beyond it the system stops responding to user requests or responds very slowly. It is therefore best to add overload protection so a flood of requests cannot knock the system over.
    If you have used Tomcat, one of its Connector configurations has the following parameters:
    • acceptCount: when all of Tomcat's threads are busy handling requests, new connections are queued; if the queue is full, connections are refused;
    • maxConnections: the maximum number of connections at any instant; connections beyond that wait in the queue;
    • maxThreads: the maximum number of threads Tomcat can start to process requests; if the request volume stays far above this, the system may grind to a halt.
      See the official documentation for detailed configuration; an illustrative snippet follows. MySQL (e.g., max_connections) and Redis (e.g., tcp-backlog) have similar configurations limiting the number of connections.
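
For illustration only, such a Connector might look like the snippet below; the values are made-up examples, not recommendations:

<!-- illustrative server.xml Connector; tune the values for your workload -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="200"
           maxConnections="10000"
           acceptCount="100"
           connectionTimeout="20000" />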

Limiting the Total Number of Resources

  • If a resource is scarce (e.g., database connections, threads) and several systems may use it, the application's usage needs to be limited. Pooling techniques can cap the total resource count: connection pools, thread pools. For example, if each application is allocated 100 database connections, this application can use at most 100 resources; beyond that it can wait or throw an exception. A sketch of this idea follows.
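
As a sketch of the pooling idea, a java.util.concurrent.Semaphore can cap a scarce resource directly; the 100 permits below match the example above and are purely illustrative:

import java.util.concurrent.Semaphore;

public class ResourceLimiter {
    // illustrative cap: 100 permits, as in the 100-connection example above
    private static final Semaphore PERMITS = new Semaphore(100);

    public static void withResource(Runnable work) throws InterruptedException {
        PERMITS.acquire(); // wait for a free permit; tryAcquire() would fail fast instead
        try {
            work.run(); // use the scarce resource
        } finally {
            PERMITS.release(); // always return the permit
        }
    }
}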

Limiting the Total Concurrency/Requests of an Interface

  • If an interface may see bursts of access but you worry that too much traffic will crash it (as in a flash-purchase business), you need to limit that interface's total concurrency/request count. Because the granularity is fairly fine, a threshold can be set for each interface. Java's AtomicLong can be used (note the early return, so a rejected request is not still processed):
    try {
        if (atomic.incrementAndGet() > limit) {
            // reject the request
            return;
        }
        // handle the request
    } finally {
        atomic.decrementAndGet();
    }
  • This suits limiting services where shedding load is harmless or overload protection is needed, such as flash purchases: past the threshold, either queue the user or tell them the item is sold out, which users find acceptable. Some open platforms likewise cap the trial call volume of an interface with this kind of counter. It is simple, crude limiting with no smoothing, so apply it according to the actual situation.

Limiting an Interface's Requests per Time Window

  • That is, limiting the number of requests within a time window, e.g., an interface/service's requests/calls per second, per minute, or per day. Some basic services are called by many other systems; for instance, the product detail page service calls the basic product service, and a large update volume could knock the basic service over, so we throttle the calls per second/minute. One implementation is shown below:
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

LoadingCache<Long, AtomicLong> counter =
        CacheBuilder.newBuilder()
                .expireAfterWrite(2, TimeUnit.SECONDS)
                .build(new CacheLoader<Long, AtomicLong>() {
                    @Override
                    public AtomicLong load(Long seconds) throws Exception {
                        return new AtomicLong(0);
                    }
                });
long limit = 1000;
while (true) {
    // current second as the counter key
    long currentSeconds = System.currentTimeMillis() / 1000;
    if (counter.get(currentSeconds).incrementAndGet() > limit) {
        System.out.println("throttled: " + currentSeconds);
        continue;
    }
    // business logic
}
  • We use Guava's Cache to store the counters, with the expiry set to 2 seconds (guaranteeing the counter for the current second exists), then take the seconds of the current timestamp as the KEY for counting and limiting. This, too, is simple and crude, but sufficient for the scenario just described.

Smoothly Limiting an Interface's Request Rate

  • None of the previous approaches handles bursts well: instantaneous bursts may all be admitted and cause problems. In some scenarios we therefore need to shape burst requests into processing at an average rate (e.g., at 5r/s, one request is handled every 200 milliseconds, smoothing the rate). Two algorithms fit this scenario: the token bucket and the leaky bucket. The Guava framework provides a token bucket implementation that can be used directly.
  • Guava's RateLimiter provides token bucket implementations: smooth burst limiting (SmoothBursty) and smooth warm-up limiting (SmoothWarmingUp).
SmoothBursty
=================================
RateLimiter limiter = RateLimiter.create(5);
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());

   You will get output similar to the following:
   0.0
   0.198239
   0.196083
   0.200609
   0.199599
   0.19961
  • RateLimiter.create(5) creates a bucket with capacity 5 that adds 5 tokens per second, i.e., one token every 200 milliseconds;
  • limiter.acquire() consumes one token. If the bucket holds enough tokens it succeeds immediately (returns 0); if not, it pauses for a while. For instance, with a 200-millisecond token interval it waits about 200 milliseconds before a token is available (the test above returned 0.198239, roughly a 200-millisecond wait). This implementation shapes a bursty request rate into a fixed one.

Now a burst example:

RateLimiter limiter = RateLimiter.create(5);
System.out.println(limiter.acquire(5));
System.out.println(limiter.acquire(1));
System.out.println(limiter.acquire(1));
System.out.println(limiter.acquire(1));

You will get output similar to the following:
0.0
0.98745
0.183553
0.199909
  • limiter.acquire(5): the bucket capacity is 5 and 5 tokens are added per second; the token bucket algorithm allows a degree of burst, so all 5 tokens can be consumed at once. The next limiter.acquire(1) then has to wait roughly 1 second for the bucket to hold a token again, and the requests after that are shaped to the fixed rate.
RateLimiter limiter = RateLimiter.create(5);
System.out.println(limiter.acquire(10));
System.out.println(limiter.acquire(1));
System.out.println(limiter.acquire(1));

You will get output similar to the following:
0.0
1.997428
0.192273
0.200616
  • Similar to the previous example, 10 requests burst in the first second; the token bucket algorithm also allows this burst (consuming future tokens), but the following limiter.acquire(1) has to wait roughly 2 seconds for a token, and the requests after that are shaped to the fixed rate.

Next, one more burst example:

RateLimiter limiter = RateLimiter.create(2);
System.out.println(limiter.acquire());
Thread.sleep(2000L);
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());

You will get output similar to the following:
0.0
0.0
0.0
0.0
0.499876
0.495799
  • This creates a bucket with capacity 2 that adds 2 tokens per second;
  • The first limiter.acquire() consumes one token, which the bucket can satisfy (returns 0);
  • The thread then sleeps for 2 seconds; the next two limiter.acquire() calls both get tokens, the third also succeeds, and the fourth has to wait 500 milliseconds.
  • Notice that the bucket capacity we configured is 2 (the permitted burst). This is because SmoothBursty has a parameter maxBurstSeconds (default 1s), and burst size (the bucket capacity) = rate × maxBurstSeconds, so the bucket capacity/burst size in this example is 2. The first two calls after the pause consume the burst accumulated earlier, while from the third call on the wait is computed normally. The token bucket algorithm allows tokens unconsumed for a period to accumulate in the bucket for later use, permitting such bursts from future requests.
  • SmoothBursty computes the time to grant the next token from the average rate and the time of the last grant, plus a bucket that stores tokens unused for a while (i.e., the number of tokens that may burst). RateLimiter also provides a tryAcquire method for non-blocking or timeout-bounded token consumption.
  • Because SmoothBursty permits a degree of burst, some worry that a sudden flood could arrive that the system cannot withstand. We therefore also need a tool that smooths the rate, letting the system approach the average fixed rate gradually after a cold start (i.e., the rate starts low and slowly converges on the configured fixed rate). Guava provides SmoothWarmingUp for this; it can be thought of as a leaky bucket algorithm, though it differs in certain special scenarios.
  • SmoothWarmingUp is created with:
    • RateLimiter.create(double permitsPerSecond, long warmupPeriod, TimeUnit unit)
    • permitsPerSecond is the number of tokens added per second, and warmupPeriod is the interval over which the rate transitions from the cold-start rate to the average rate.
    • For example:
RateLimiter limiter = RateLimiter.create(5, 1000, TimeUnit.MILLISECONDS);
for (int i = 1; i < 6; i++) {
    System.out.println(limiter.acquire());
}
Thread.sleep(1000L);
for (int i = 1; i < 6; i++) {
    System.out.println(limiter.acquire());
}

You will get output similar to the following:
0.0
0.51767
0.357814
0.219992
0.199984
0.0
0.360826
0.220166
0.199723
0.199555
  • The rate ramps up along a trapezoid: on a cold start the limiter moves gradually from a lower rate up toward the average rate, then settles there (the interval between permits descends trapezoidally to the average). By tuning the warmupPeriod parameter you can get the smooth fixed rate from the very start.

That covers application-level rate limiting. If the application is deployed on multiple machines, application-level limiting only throttles requests within a single instance and cannot enforce a global limit. For that we need distributed rate limiting and access-layer rate limiting.

Distributed Rate Limiting

  • The key to distributed rate limiting is making the limiting operation atomic. Solutions can be built with redis+lua or nginx+lua, both of which achieve high concurrency and high performance.
  • Let's first use redis+lua to limit an interface's requests within a time window; once that works, it can be adapted to limit total concurrency/requests and total resources. Lua is a programming language in its own right, so complex token bucket or leaky bucket algorithms can also be written in it.

The Lua script in the redis+lua implementation:

local key = KEYS[1]                -- rate-limit key (one per second)
local limit = tonumber(ARGV[1])    -- limit size
local current = tonumber(redis.call("INCRBY", key, "1")) -- request count +1
if current > limit then            -- over the limit
   return 0
elseif current == 1 then           -- set the 2-second expiry only on first access
   redis.call("expire", key, "2")
end
return 1
  • Because the above runs inside a single lua script and Redis uses a single-threaded model, it is thread-safe. One drawback is that the counter keeps incrementing even after the limit has been reached; that can be avoided by rewriting it as follows:
    local key = KEYS[1]                -- rate-limit key (one per second)
    local limit = tonumber(ARGV[1])    -- limit size
    local current = tonumber(redis.call('get', key) or "0")
    if current + 1 > limit then        -- over the limit
      return 0
    else                               -- request count +1, with a 2-second expiry
      redis.call("INCRBY", key, "1")
      redis.call("expire", key, "2")
      return 1
    end
    The Java code that decides whether to throttle:
import java.io.File;
import java.nio.charset.Charset;
import com.google.common.collect.Lists;
import com.google.common.io.Files;
import redis.clients.jedis.Jedis;

public static boolean acquire() throws Exception {
    String luaScript = Files.toString(new File("limit.lua"), Charset.defaultCharset());
    Jedis jedis = new Jedis("192.168.147.52", 6379);
    String key = "ip:" + System.currentTimeMillis() / 1000; // current timestamp truncated to seconds
    String limit = "3"; // limit size
    return (Long) jedis.eval(luaScript, Lists.newArrayList(key), Lists.newArrayList(limit)) == 1;
}
  • Because of a Redis restriction (a Lua script that performs writes may not call nondeterministic reads such as TIME), we cannot obtain the timestamp with TIME inside Redis Lua, so it has to be fetched in the application and passed in. In some extreme cases (machines with skewed clocks) the limiting will be slightly off.

The Lua script for the Nginx+Lua implementation:

local locks = require "resty.lock"
local function acquire()
    local lock = locks:new("locks")
    local elapsed, err = lock:lock("limit_key") -- mutex lock
    local limit_counter = ngx.shared.limit_counter -- counter

    local key = "ip:" .. os.time()
    local limit = 5 -- limit size
    local current = limit_counter:get(key)

    if current ~= nil and current + 1 > limit then -- over the limit
       lock:unlock()
       return 0
    end
    if current == nil then
       limit_counter:set(key, 1, 1) -- first access: set the value to 1 with a 1-second expiry
    else
        limit_counter:incr(key, 1) -- afterwards just increment
    end
    lock:unlock()
    return 1
end
ngx.print(acquire())
  • The implementation uses the lua-resty-lock mutex module to solve the atomicity problem (in real projects, consider lock-acquisition timeouts) and the ngx.shared.DICT shared dictionary as the counter; it returns 0 when throttled, 1 otherwise. Two shared dictionaries must be defined first (one for the locks, one for the counter data):
    http {
      ……
      lua_shared_dict locks 10m;
      lua_shared_dict limit_counter 10m;
    }
  • Some will worry whether redis or nginx can hold up if the application's concurrency is very large. Consider it from several angles: is the traffic really that large; can the distributed limiter be sharded with consistent hashing; can it degrade to application-level limiting when concurrency gets too high. There are plenty of countermeasures to adjust to the actual situation; at JD.com, Redis+Lua is used to throttle flash-purchase traffic, and it has no problem with ordinary traffic volumes.

  • The distributed-limiting scenarios I have run into so far are business-level limits, not traffic-ingress limits; ingress limiting should be done at the access layer, where I generally use Nginx. A minimal sketch follows.
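
As a taste of access-layer limiting, a minimal nginx limit_req configuration might look like the sketch below; the zone name, key, and rates are illustrative, not recommendations:

http {
    # shared 10MB zone keyed by client IP, average rate 10 requests/second
    limit_req_zone $binary_remote_addr zone=req_per_ip:10m rate=10r/s;

    server {
        location / {
            # allow short bursts of up to 20 extra requests without delay;
            # anything beyond that is rejected (503 by default)
            limit_req zone=req_per_ip burst=20 nodelay;
            proxy_pass http://backend;  # assumes an upstream named "backend"
        }
    }
}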
