Author: nick hao

Original link: cnblogs.com/haoxinyue/p/6792309.html

In one of his blog posts, the well-known engineer Kaitao (开涛) wrote that three tools protect a system when building for high concurrency: caching, degradation, and rate limiting. Drawing on the author's own experience, this article introduces the concepts, algorithms, and common implementations of rate limiting.

Cache

Caching is the easiest of the three to understand. In a large, highly concurrent system, a database with no cache in front of it would be overwhelmed within minutes and the system would grind to a halt. A cache not only speeds up access and raises the number of concurrent requests the system can serve, it also protects the database and, with it, the system as a whole. For large read-heavy sites, using a cache comes to mind naturally. In large write-heavy systems, caching often plays an equally important role. Accumulating data and writing it in batches, in-memory queues (producer and consumer), and the HBase write path are all ways of using a cache to raise throughput or protect the system. Even messaging middleware can be thought of as a distributed data cache.

Degradation

Service degradation means that when server load surges, some services and pages are deliberately downgraded according to the current business situation and traffic, freeing server resources so that core tasks keep running normally. Degradation is usually defined in levels, with different handling for different severities of failure. By type of service, requests can be rejected, delayed, or sometimes served on a random basis. By scope, a single feature can be cut, or entire modules. In short, the degradation strategy has to follow business needs. The guiding idea is that an impaired service is still better than no service at all.

Rate limiting

Rate limiting can be regarded as a form of service degradation: it protects the system by restricting its inbound and outbound traffic. The throughput a system can handle can generally be estimated, so to keep the system stable, once traffic reaches that threshold, measures must be taken to limit it. For example: delaying requests, rejecting requests, or rejecting part of them.

Rate limiting algorithms

The common rate limiting algorithms are the counter, the leaky bucket, and the token bucket.

Counter

The counter is the simplest and crudest algorithm. Suppose a service can handle only 100 requests per second. We can set up a 1-second sliding window divided into 10 slots of 100 milliseconds each. The window advances every 100 milliseconds, and on each advance the current total number of service requests is recorded, so only the most recent counts (one per slot boundary) need to be kept in memory. A LinkedList is a suitable data structure. Each time the window advances, check whether the difference between the current request count and the oldest count in the LinkedList exceeds 100; if it does, rate limiting is needed.

Clearly, the more slots the sliding window is divided into, the more smoothly it slides and the more accurate the rate limiting statistics become.

Sample code is as follows:

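import java.util.LinkedList;
import java.util.concurrent.atomic.AtomicLong;

public class Counter
{
    // Total number of service requests; request-handling code is expected to increment it.
    // It could be kept in Redis to count requests across a distributed system.
    private final AtomicLong counter = new AtomicLong();
    // LinkedList holding the counter samples taken at the sliding window's slot boundaries.
    private final LinkedList<Long> ll = new LinkedList<Long>();

    public static void main(String[] args) throws InterruptedException
    {
        Counter counter = new Counter();

        counter.doCheck();
    }

    private void doCheck() throws InterruptedException
    {
        while (true)
        {
            ll.addLast(counter.get());

            // Keep 11 samples so that the oldest and newest are 10 slots (1 second) apart.
            // With only 10 samples they would span 0.9 seconds.
            if (ll.size() > 11)
            {
                ll.removeFirst();
            }

            // Compare the newest and oldest samples: the number of requests in the last second.
            if ((ll.peekLast() - ll.peekFirst()) > 100)
            {
                //To limit rate
            }

            Thread.sleep(100);
        }
    }
}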
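
The loop above samples a shared counter on a timer. As a complementary sketch, the same sliding-window idea can also be checked on every request. The class name SlidingWindowLimiter, its constructor parameters, and the explicit nowMillis argument (used instead of the system clock so the behavior is easy to reason about) are illustrative assumptions, not part of the original post. The window length is assumed to be evenly divisible by the number of slots.

```java
import java.util.Arrays;

/** Sliding-window counter split into fixed-size slots (hypothetical class, for illustration only). */
final class SlidingWindowLimiter {
    private final int limit;         // maximum requests allowed per window
    private final long slotMillis;   // width of one slot
    private final long[] slotStart;  // start time of the slot currently stored at each index
    private final int[] slotCount;   // requests counted in that slot

    SlidingWindowLimiter(int limit, long windowMillis, int slots) {
        this.limit = limit;
        this.slotMillis = windowMillis / slots;
        this.slotStart = new long[slots];
        this.slotCount = new int[slots];
        Arrays.fill(slotStart, -1L);  // -1 marks a slot that has never been used
    }

    /** Returns true if a request arriving at nowMillis fits within the limit. */
    synchronized boolean tryAcquire(long nowMillis) {
        long currentSlotStart = nowMillis - (nowMillis % slotMillis);
        int index = (int) ((nowMillis / slotMillis) % slotStart.length);
        if (slotStart[index] != currentSlotStart) {
            // The slot still holds data from an earlier pass of the window; reuse it.
            slotStart[index] = currentSlotStart;
            slotCount[index] = 0;
        }
        // The window is the current slot plus the (slots - 1) slots before it.
        long windowStart = currentSlotStart - slotMillis * (slotStart.length - 1);
        int total = 0;
        for (int i = 0; i < slotStart.length; i++) {
            if (slotStart[i] >= windowStart) {
                total += slotCount[i];
            }
        }
        if (total >= limit) {
            return false;  // over the limit: the caller should reject or delay the request
        }
        slotCount[index]++;
        return true;
    }
}
```

With new SlidingWindowLimiter(100, 1000, 10), the first 100 calls inside one second return true and further calls return false until the oldest slot slides out of the window.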

Leaky Bucket Algorithm

The leaky bucket is a very common rate limiting algorithm that can be used for traffic shaping and traffic policing. Wikipedia has a schematic diagram that helps in understanding it.

The main ideas of the leaky bucket algorithm are as follows:

A bucket of fixed capacity leaks water drops at a fixed, constant rate;
If the bucket is empty, no drops flow out;
Water drops can flow into the bucket at any rate;
If incoming drops exceed the bucket's capacity, they overflow (are discarded), and the bucket's capacity stays unchanged.

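The leaky bucket algorithm is fairly easy to implement. On a single machine it can be implemented with a queue (in .NET, TPL Dataflow handles this kind of problem well); in a distributed environment, message middleware or Redis are both options.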
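
As a minimal single-machine sketch of the four rules above, the class below tracks the water level in the bucket rather than holding requests in a queue (the "meter" form of the leaky bucket; a queue drained by a fixed-rate consumer would be the queued form). The class name LeakyBucket, its parameters, and the explicit nowMillis argument are assumptions made for illustration.

```java
/** Leaky bucket as a meter: tracks the water level only (hypothetical class, for illustration only). */
final class LeakyBucket {
    private final long capacity;         // fixed bucket capacity, in drops
    private final double leakPerSecond;  // constant outflow rate
    private double water;                // current water level
    private long lastLeakMillis;         // time of the last level update

    LeakyBucket(long capacity, double leakPerSecond, long startMillis) {
        this.capacity = capacity;
        this.leakPerSecond = leakPerSecond;
        this.water = 0.0;
        this.lastLeakMillis = startMillis;
    }

    /** Tries to add one drop (one request) at nowMillis; returns false if it overflows. */
    synchronized boolean tryAdd(long nowMillis) {
        // Drain at the constant rate for the elapsed time; an empty bucket leaks nothing more.
        double leaked = (nowMillis - lastLeakMillis) * leakPerSecond / 1000.0;
        water = Math.max(0.0, water - leaked);
        lastLeakMillis = nowMillis;
        if (water + 1.0 > capacity) {
            return false;  // overflow: the drop is discarded
        }
        water += 1.0;
        return true;
    }
}
```

For example, with a capacity of 5 and a leak rate of 1 per second, five requests arriving together are accepted, a sixth overflows, and one more is accepted a second later.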

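Token Bucket Algorithm

The token bucket algorithm uses a bucket that holds a fixed maximum number of tokens, with tokens added to the bucket at a fixed rate. It can be described by the following points:

Tokens are put into the bucket at a fixed rate, for example 10 per second.
The bucket holds at most b tokens; when it is full, newly added tokens are discarded or rejected.
When a packet of n bytes arrives, n tokens are removed from the bucket and the packet is then sent to the network.
If fewer than n tokens are in the bucket, no tokens are removed and the packet is rate limited (either dropped or held in a buffer).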

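The token bucket algorithm controls the output rate through the rate at which tokens are added, that is, the "to network" rate in the usual token bucket diagram. "To network" can be understood as the message handler: running some piece of business logic or calling an RPC.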
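
The four points above can be sketched as a small class. This is a minimal illustration under stated assumptions, not a production limiter: the name TokenBucket, the choice to start with a full bucket, and the explicit nowMillis argument (instead of reading the system clock) are all assumptions introduced here.

```java
/** Token bucket refilled lazily on each call (hypothetical class, for illustration only). */
final class TokenBucket {
    private final long capacity;           // b: the most tokens the bucket can hold
    private final double refillPerSecond;  // fixed rate at which tokens are added
    private double tokens;                 // tokens currently in the bucket
    private long lastRefillMillis;         // time of the last refill calculation

    TokenBucket(long capacity, double refillPerSecond, long startMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;            // assumption: the bucket starts full
        this.lastRefillMillis = startMillis;
    }

    /** Tries to take n tokens at nowMillis; returns false and removes nothing if fewer than n are available. */
    synchronized boolean tryConsume(int n, long nowMillis) {
        // Add the tokens earned since the last call; tokens beyond the capacity are discarded.
        double added = (nowMillis - lastRefillMillis) * refillPerSecond / 1000.0;
        tokens = Math.min(capacity, tokens + added);
        lastRefillMillis = nowMillis;
        if (tokens < n) {
            return false;  // not enough tokens: the caller drops or buffers the packet
        }
        tokens -= n;
        return true;
    }
}
```

With a capacity of 10 and 10 tokens per second, a burst of 10 is served at once, the next request is refused, and a single token is available again 100 milliseconds later. The capacity is what bounds the size of a burst, while the refill rate bounds the long-run average.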

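Comparing the leaky bucket and the token bucket

A token bucket can control and adjust the data processing rate at run time and can absorb a momentary burst of traffic. Adding tokens more frequently raises the overall processing speed, while taking more tokens per request or issuing tokens more slowly lowers it. A leaky bucket cannot do this, because its outflow rate is fixed and so is the processing speed.

Overall, the token bucket algorithm is the better choice, though it is somewhat more complex to implement.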

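Rate limiting implementations

Guava

Guava is an open-source Google project containing several core libraries that Google's Java projects rely on heavily. Its RateLimiter provides token bucket implementations: smooth bursty limiting (SmoothBursty) and smooth warming-up limiting (SmoothWarmingUp).

1. Normal rate:

Create a rate limiter that adds 2 tokens per second. The returned RateLimiter ensures that no more than 2 tokens are handed out within 1 second and that tokens are added at a fixed rate, which smooths the output.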

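// Requires: import com.google.common.util.concurrent.RateLimiter;
public void test()
{
    /**
     * Create a rate limiter that adds 2 tokens per second, i.e. a rate of 2 messages per second.
     * The returned RateLimiter ensures that no more than 2 tokens are handed out within 1 second
     * and that tokens are added at a fixed rate, which smooths the output.
     */
    RateLimiter r = RateLimiter.create(2);

    while (true)
    {
        /**
         * acquire() takes one token and returns the time spent waiting for it, in seconds.
         * If the bucket has no token, it blocks until one is available.
         * acquire(N) takes several tokens at once.
         */
        System.out.println(r.acquire());
    }
}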

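Running the code above prints a value roughly every 0.5 seconds. Data is processed only after a token has been obtained, which smooths the output of data or the calls to an interface. The return value of acquire() is the time spent waiting for the token. To deal with bursts of traffic, you can set a threshold on this return value and handle each case differently, for example discarding requests that have waited too long.

2. Burst traffic:

A burst can mean suddenly more traffic or suddenly less. Let's first look at an example with more. The rate is the same as in the example above, 2 tokens per second. The following code calls acquire with an explicit argument.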

System.out.println(r.acquire(2));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));

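The output shows the effect of the larger request. Processing more data at once requires more tokens. The code first takes 2 tokens, so the next token becomes available not 0.5 seconds later but 1 second later, after which the normal rate resumes. That is an example of a burst of extra traffic. For a sudden lull with no traffic, consider the following code: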

System.out.println(r.acquire(1));
Thread.sleep(2000);
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));
System.out.println(r.acquire(1));

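In the output, the acquire calls after the pause return without waiting. After the two-second wait, the bucket has accumulated 3 tokens, which can be taken one after another with no delay. Handling bursts this way amounts to keeping the output constant per unit of time. Both examples use SmoothBursty, a subclass of RateLimiter. The other subclass is SmoothWarmingUp, which provides output that ramps up gradually.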

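/**
 * Create a rate limiter that adds 2 tokens per second, i.e. a rate of 2 messages per second.
 * The returned RateLimiter ensures that no more than 2 tokens are handed out within 1 second
 * and that tokens are added at a fixed rate, which smooths the output.
 * The warm-up period is set to 3 seconds.
 */
RateLimiter r = RateLimiter.create(2, 3, TimeUnit.SECONDS);

while (true) {
    /**
     * acquire() takes one token and returns the time spent waiting for it, in seconds.
     * If the bucket has no token, it blocks until one is available.
     * acquire(N) takes several tokens at once.
     */
    System.out.println(r.acquire(1));
    System.out.println(r.acquire(1));
    System.out.println(r.acquire(1));
    System.out.println(r.acquire(1));
}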

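In the output, because the warm-up period is 3 seconds, the token bucket does not hand out a token every 0.5 seconds from the start. Instead the waits decrease along a smooth linear slope, so the frequency rises until it reaches the configured rate within 3 seconds, after which output continues at a fixed rate. In the original screenshot, the 3 waits circled in red add up to almost exactly 3 seconds. This feature suits scenarios where a system that has just started needs some time to "warm up".

Nginx

For rate limiting at the Nginx access layer, two modules that ship with Nginx can be used: the connection limiting module ngx_http_limit_conn_module and the request limiting module ngx_http_limit_req_module, which implements the leaky bucket algorithm.

1. ngx_http_limit_conn_module

We often run into situations such as abnormal server traffic or excessive load. Malicious high-volume access wastes bandwidth, puts pressure on the server, and affects the business, so it is common to limit the number of connections and the concurrency from a single IP. The ngx_http_limit_conn_module module meets this need. It limits the number of connections per value of a defined key, such as connections from the same source IP. Not every connection is counted: a connection is counted only if it has a request being processed whose headers have been fully read.

The limit can be configured by adding the following to the http{} block of nginx.conf: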

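# Limit the number of concurrent connections per client; the zone is named one
limit_conn_zone $binary_remote_addr zone=one:10m;

# Log level used when a connection is limited; the default is error
limit_conn_log_level error;
# Status code returned when a connection is limited; the default is 503
limit_conn_status 503;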

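Then add the following to the server{} block:

# Limit each client to 1 concurrent connection
limit_conn one 1;

Then use ab to simulate concurrent requests: ab -n 5 -c 5 http://10.23.22.239/index.html

The result clearly shows that concurrency is limited: requests beyond the threshold get a 503.

The configuration above limits concurrency per client IP. Concurrency can also be limited per domain name, with a configuration similar to the per-IP one.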

# In the http{} block
limit_conn_zone $server_name zone=perserver:10m;
# In the server{} block
limit_conn perserver 1;
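
To make the idea of limiting concurrent connections per key concrete, here is a rough application-level analogue in Java. It is only a sketch of the concept, not how Nginx implements the module: the class ConnectionLimiter and its methods are hypothetical names, and the key is whatever identifies the client (for example its IP address).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Per-key limit on concurrent connections (hypothetical class, for illustration only). */
final class ConnectionLimiter {
    private final int maxPerKey;  // plays the role of the number in "limit_conn one 1"
    private final ConcurrentHashMap<String, AtomicInteger> active = new ConcurrentHashMap<>();

    ConnectionLimiter(int maxPerKey) {
        this.maxPerKey = maxPerKey;
    }

    /** Returns true if the key may open another connection; false corresponds to answering 503. */
    boolean tryOpen(String key) {
        AtomicInteger count = active.computeIfAbsent(key, k -> new AtomicInteger());
        if (count.incrementAndGet() > maxPerKey) {
            count.decrementAndGet();  // undo the reservation: the key is over its limit
            return false;
        }
        return true;
    }

    /** Must be called once for every successful tryOpen when the connection finishes. */
    void close(String key) {
        AtomicInteger count = active.get(key);
        if (count != null) {
            count.decrementAndGet();
        }
        // Note: idle keys are never evicted in this sketch; a real limiter would bound the map.
    }
}
```

With a limit of 1, a second simultaneous connection from the same key is refused while another key is unaffected, which matches the pattern seen in the ab test above.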

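2. ngx_http_limit_req_module

Above we used ngx_http_limit_conn_module to limit the number of connections. How can the number of requests be limited? That is the job of ngx_http_limit_req_module, which limits the request processing rate per value of a defined key; in particular, it can limit the processing rate of requests from a single IP address. It uses the leaky bucket algorithm: a fixed number of requests is processed per second and excess requests are delayed. If the request rate exceeds the value configured for the zone, request processing is delayed or the request is rejected, so all requests are processed at the defined rate.

Configure in http{}: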

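# The zone is named one and is 10m in size; the average request rate may not exceed one per second.

limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;

Configure in server{}:

# Set the bucket size (burst) for each IP to 5
limit_req zone=one burst=5;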

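The settings above limit request processing for each IP to 1 per second. The server can also queue up to 5 requests per IP; requests beyond those 5 are rejected.

Use ab to simulate a client sending 10 requests at once: ab -n 10 -c 10 http://10.23.22.239/index.html

With the bucket size set to 5 and 10 requests in total, the first request is processed immediately and requests 2 through 6 are placed in the bucket. The bucket is then full, so the remaining 4 requests are rejected. Because nodelay is not set, the queued requests are processed at the configured rate of one per second.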
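
The arithmetic of that test (1 processed at once, 5 queued, 4 rejected) can be reproduced with a small model. This is a simplified sketch of the behavior described above for a single client without nodelay, not Nginx's actual code; the class BurstQueueModel and its method are hypothetical, and time is passed in explicitly.

```java
/** Simplified model of rate plus burst for one client (hypothetical class, for illustration only). */
final class BurstQueueModel {
    private final double ratePerSecond;  // e.g. 1 for rate=1r/s
    private final int burst;             // e.g. 5 for burst=5
    private double excess;               // requests currently ahead of the allowed rate
    private long lastMillis;             // arrival time of the last accepted request
    private boolean first = true;        // true until the first request has been seen

    BurstQueueModel(double ratePerSecond, int burst) {
        this.ratePerSecond = ratePerSecond;
        this.burst = burst;
    }

    /** Returns the delay in milliseconds before the request is processed, or -1 if it is rejected. */
    synchronized long onRequest(long nowMillis) {
        double next;
        if (first) {
            next = 0.0;  // the very first request is processed immediately
            first = false;
        } else {
            // The backlog drains at the configured rate, then this request is added to it.
            double drained = (nowMillis - lastMillis) * ratePerSecond / 1000.0;
            next = Math.max(0.0, excess - drained + 1.0);
            if (next > burst) {
                return -1;  // bucket full: reject and leave the state unchanged
            }
        }
        excess = next;
        lastMillis = nowMillis;
        return Math.round(next * 1000.0 / ratePerSecond);
    }
}
```

Feeding 10 requests at the same instant into new BurstQueueModel(1.0, 5) yields delays of 0, 1000, 2000, 3000, 4000 and 5000 milliseconds for the first six and -1 for the last four.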

