Availability: Rate Limiting

Now that the Internet has become part of society's basic infrastructure, the scenario above is actually not that far away from us, and it no longer looks so extreme either.

The endless stream of marketing gimmicks, one social-media hotspot after another, and the flourishing gray industry of bots and click farms beneath the Internet's waterline all make this a scenario you increasingly need to consider, and to think twice about.

At any moment, a larger-than-expected flood of traffic may pour in and overwhelm your system.

So the purpose of rate limiting is obvious: when the system is not actually down but merely lacks the resources to cope with the volume of requests, we limit traffic (incoming or outgoing) according to preset rules so that the system's finite resources can deliver their maximum service capacity, guaranteeing that the accepted traffic never exceeds the upper limit the system can carry.

How to Do Rate Limiting

From what we have covered so far, we know that rate limiting works best when it "limits" traffic close to the upper bound of the system's processing capacity. So:

1. The first step is to find out where the system's capacity ceiling lies, through "stress testing" or similar means.

2. Next, devise the traffic-intervention strategy: how to set the thresholds, whether to care only about the outcome or also about the smoothness of the process, and so on.

3. Finally, decide how to handle the traffic that has been "intervened". Can it simply be dropped? If not, how should it be processed?

Obtaining the System's Upper Limit

This first step is not the focus of our discussion; in short, it means stress testing the system. The test can be run in a dedicated environment, or you can pick one node out of many as a sample and stress it directly in production, in which case that node naturally has to be isolated from the others.

Generally, we stress test to obtain two numbers: "rate" and "concurrency".

The former is the number of requests that can be processed per unit of time, such as xxx requests/second. The latter is the maximum number of requests the system can handle at the same moment, such as xxx concurrent requests. From these metrics we want to extract the "maximum", the "average", or the "median": the concrete threshold values in the rate-limiting policies that follow are derived from them.

An aside: if you want to go further, indicators such as CPU, memory, and network bandwidth consumption can also serve as reference inputs.
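For an HTTP service, a quick way to get a first reading of both numbers is a load-testing tool such as ApacheBench (just one option; the host, path, and volumes below are placeholders to adapt):

ab -n 100000 -c 200 http://your-host/your-path/

Here -c is the concurrency level being probed and -n the total number of requests; the "Requests per second" line in the report gives the rate.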

Devising Traffic-Intervention Strategies

There are four common strategies. I give them a simple shorthand: "two windows, two buckets".

The two windows are the fixed window and the sliding window; the two buckets are the leaky bucket and the token bucket.

Fixed Window

A fixed window means defining a "fixed" statistical period, such as 1 minute, 30 seconds, or 10 seconds.

Within each period, count the requests received as they accumulate; once the counter reaches the preset threshold, trigger "traffic intervention".

When the next period begins, the counter is cleared and traffic is accepted normally again.


This is the simplest strategy; it takes no more than a few lines of code.

global int totalCount = 0;  // a timer fires once per "fixed period" and resets this to zero.

if (totalCount > LIMIT_THRESHOLD) {
    return; // reject: do not process the request.
}
totalCount++;
// do something...

One thing to note about the fixed window: if incoming requests arrive in a highly concentrated burst, the "limit threshold" you set is effectively the maximum concurrency you have to withstand.

So if concurrency is something you must be careful about, set the "fixed period" here as short as possible, because the value of the "limit threshold" can then be reduced proportionally.

In fact, the threshold can be specified directly from the target concurrency. For example, if the fixed period is 3 seconds, the threshold can be set to "average concurrency × 3".

But however you set it, the fixed window's shortcoming remains: incoming traffic is rarely constant, so whenever the arrival speed fluctuates, either the counter fills up early and every request in the remainder of the period gets "limited",

or the counter never fills, meaning the "limit threshold" was set too high and resources go underused.

The "sliding window" improves on this.

Sliding Window

A sliding window is really a fixed window subdivided further: the original period is cut into finer granularity, for example a 1-minute fixed window sliced into 60 sliding sub-windows of 1 second each.

The statistical time range then shifts forward in step with the passage of time.


At the same time, we can draw a conclusion: if the fixed window's "fixed period" is already very small, a sliding window is pointless. For example,

take a fixed window whose period is 1 second: slicing it down to the millisecond level is possible but not worth the candle, as it brings significant performance and resource costs.

The sliding-window code logic is roughly:

global int[] counterList = new int[SLIDING_WINDOW_COUNT]; // one counter per sub-window.
// A timer fires each time the start of the statistical period moves forward:
// it removes the element at index 0 and appends a fresh, zeroed element at the end.

int sum = sum(counterList);
if (sum > LIMIT_THRESHOLD) {
    return; // reject: do not process the request.
}

int currentIndex = currentTimeInSeconds % SLIDING_WINDOW_COUNT;
counterList[currentIndex]++;
// do something...

Although the sliding window can mitigate the problem, it is still essentially a pre-carved slicing of time, a kind of "prediction", which means it can almost never reach 100% utilization.

The "bucket" models can do better, because a bucket model carries an extra buffer (the bucket itself).

Leaky Bucket

Let's talk about the "leaky bucket" first. The core of the leaky bucket model is a fixed "outlet" rate: however much flows in, the outflow stays at that rate.

If the influx is more than the bucket can hold, "traffic intervention" is applied.


Let's break the whole implementation down:

  1. Control the outflow rate. This can reuse the idea of the two "windows" above: if the current rate is below the threshold, process the request directly; otherwise, instead of processing it directly, put it into the buffer and raise the current water level.

  2. The buffer can be implemented as a brief sleep, or by recording the request into a container for asynchronous retry.

  3. Finally, keep the water level in the bucket from exceeding the maximum. This is simple: a global counter, incremented and decremented.

Seen this way, the essence is: a buffer "shapes" uneven traffic into smooth traffic (the above-average flow is held back and released to fill the below-average periods), thereby maximizing the utilization of the processing resources.

A simplified version of the code:

global int unitSpeed;  // current outflow rate at the outlet; a timer resets it to zero
                       // once per rate-calculation period (e.g., 1 second).
global int waterLevel; // current water level of the buffer.

if (unitSpeed < RATE_THRESHOLD) {
    unitSpeed++;
    // do something...
} else {
    if (waterLevel > WATER_LEVEL_THRESHOLD) {
        return; // reject: do not process the request.
    }

    waterLevel++;
    while (unitSpeed >= RATE_THRESHOLD) {
        sleep(aShortWhile);
    }

    unitSpeed++;
    waterLevel--;
    // do something...
}

The leaky bucket, a better strategy, can already deliver the 100% processing capacity you planned for, as long as the total traffic is sufficient. But even that is not the extreme.

Remember that the environment a program runs in rarely contains that program alone: there are system processes and possibly other user processes.

In other words, the program's processing capacity is subject to interference and will vary. You can estimate the average or median capacity over some period, but you cannot predict the capacity at any specific moment.

Consequently, you are forced to pick a relatively pessimistic figure as the threshold to keep the program from overload.

So, in terms of resource utilization, is there anything better? There is: the "token bucket".

Token Bucket

The core of the token bucket model is a fixed "inlet" rate.

A request must obtain a token before being processed; if it cannot get one, it receives "traffic intervention".

Therefore, when heavy traffic pours in, as long as tokens are generated at a rate greater than or equal to the rate at which requests are processed, the program is running at exactly its capacity.


Let's break down its implementation as well:

  1. Control the rate at which tokens are generated and put into the bucket. In practice this is simply a dedicated thread continually producing tokens.

  2. Keep the number of unclaimed tokens in the bucket from exceeding the maximum water level. As with the "leaky bucket", this is a global counter, incremented and decremented.

A rough simplification of the code (it looks like the "fixed window" logic in reverse):

global int tokenCount = TOKEN_THRESHOLD; // available tokens. A dedicated thread increments
                                         // this at a fixed frequency, never above TOKEN_THRESHOLD.

if (tokenCount == 0) {
    return; // reject: do not process the request.
}
tokenCount--;
// do something...
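To make the "dedicated thread producing tokens" part concrete, here is a minimal thread-safe Java sketch. The class name, the capacity, and the refill scheme (one token per fixed interval) are all choices of this example, not a canonical implementation:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TokenBucket {
    private final int capacity;             // the "token threshold", i.e., bucket capacity
    private final AtomicInteger tokenCount; // available tokens

    public TokenBucket(int capacity, int tokensPerSecond) {
        this.capacity = capacity;
        this.tokenCount = new AtomicInteger(capacity);
        // The dedicated producer: add one token at a fixed interval, capped at capacity.
        ScheduledExecutorService refiller = Executors.newSingleThreadScheduledExecutor();
        refiller.scheduleAtFixedRate(
                () -> tokenCount.updateAndGet(n -> Math.min(n + 1, capacity)),
                0, 1_000_000L / tokensPerSecond, TimeUnit.MICROSECONDS);
    }

    public boolean tryAcquire() {
        // Atomically take one token; refuse the request when the bucket is empty.
        return tokenCount.getAndUpdate(n -> n > 0 ? n - 1 : 0) > 0;
    }
}

On the JVM, Guava's RateLimiter (RateLimiter.create(permitsPerSecond) plus tryAcquire()) is a battle-tested limiter in this same family, if you would rather not hand-roll one.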

As you may have realized, this means the token bucket's capacity is, in theory, the maximum concurrency the program needs to support.

Indeed: if the traffic arriving at a single moment drains all the tokens while the program cannot keep up with processing them, an incident will follow.

So there is no truly perfect strategy, only a suitable one. The ability worth honing is recognizing, scenario by scenario, which strategy fits best.

Best Practices

How should you choose among the four strategies?

First, the fixed window. Generally speaking, unless time is pressing, this option is not recommended; it is too blunt. But as a way to quickly stop the bleeding, it serves as a temporary emergency measure.

Second, the sliding window. This suits scenarios with a high tolerance for abnormal outcomes; after all, compared with the two "buckets" it lacks a buffer. Its virtue is ease of implementation.

Then, the leaky bucket. Personally I consider this the best general-purpose choice. Its resource utilization is not the absolute maximum, but the "lenient in, strict out" approach protects the system while leaving some headroom, which makes it applicable to a wider range of scenarios.

Finally, the token bucket. Use it when you need to squeeze as much performance as possible out of the program (in which case the bucket's maximum capacity will necessarily be at least the program's maximum concurrency) and the incoming traffic does not fluctuate too violently (so the tokens cannot all be taken in an instant, crushing the backend systems).

New Challenges in Distributed Systems


Every upstream system can be regarded as a client of its downstream system.

Thinking back over the earlier content, you may have noticed that the discussion of "rate limiting" never said whether it is done on the client side or the server side; if anything, it seemed to lean on the server side.

But as you know, in a distributed system a server may itself have multiple replicas, serve multiple clients, and even act as a client in turn.

In such an intertwined environment, where do you start with rate limiting?

My approach is to think along "one vertical and one horizontal" axis.

We all know "rate limiting" is a protective measure, so picture it as a shield. And a request is processed in the system along a chain. Just like an army in ancient times: apart from the small portion of shield bearers guarding the commander, the rest all stand on the front line, because the further forward the shield, the wider the range it protects.

What stands at the very front of a distributed system? The access layer.

If your system has an access layer, for example a reverse proxy built with nginx, you can do rate limiting through its ngx_http_limit_conn_module and ngx_http_limit_req_module; this is a mature, proven solution.
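As an illustration, a minimal sketch of such a configuration; the zone names, rate, burst, and connection limits are placeholders to be replaced with figures from your own stress tests:

http {
    # ngx_http_limit_req_module: track clients by IP, averaging 100 requests/second each.
    limit_req_zone $binary_remote_addr zone=per_ip_req:10m rate=100r/s;
    # ngx_http_limit_conn_module: track concurrent connections per IP.
    limit_conn_zone $binary_remote_addr zone=per_ip_conn:10m;

    server {
        location / {
            limit_req zone=per_ip_req burst=20 nodelay;  # queue up to 20 excess requests, reject the rest
            limit_conn per_ip_conn 10;                   # at most 10 concurrent connections per IP
        }
    }
}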

If there is no access layer, the only option is to do it in the application layer, in an AOP style.

However, since the applications are scattered, cost considerations mean you should apply limiting selectively. For example, a ToC application needs it more than a ToB one, a high-frequency cache system more than a low-frequency reporting system, and a web application, which has the Filter mechanism, is easier to retrofit than a plain Service application.
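To illustrate the Filter route, here is a minimal sketch of concurrency limiting in a servlet Filter; the class name, the capacity of 200, and the choice of a 429 response are this example's assumptions:

import java.io.IOException;
import java.util.concurrent.Semaphore;
import javax.servlet.*;
import javax.servlet.http.HttpServletResponse;

public class RateLimitFilter implements Filter {
    // Maximum number of requests allowed to be in flight at once (assumed value).
    private final Semaphore permits = new Semaphore(200);

    @Override
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        if (!permits.tryAcquire()) {
            // Traffic intervention: answer immediately rather than queueing the request.
            ((HttpServletResponse) resp).sendError(429); // Too Many Requests
            return;
        }
        try {
            chain.doFilter(req, resp);
        } finally {
            permits.release();
        }
    }

    @Override public void init(FilterConfig cfg) {}
    @Override public void destroy() {}
}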

So, between applications, should the rate limiting ultimately be done on the client side or the server side?

My personal view is that, in terms of effect, the client-side mode is certainly better than the server-side mode, because in the limited state it saves even the act of establishing a connection. Another potential benefit is that, compared with the centralized server-side mode, it spreads out the pressure on the handful of server programs. But client-side limiting costs more, because it is decentralized: if data must be shared among multiple nodes, that becomes very troublesome.

So my eventual personal advice: if cost is the concern, choose the server-side mode; if effect is the concern, choose the client-side mode. This is not absolute, of course. For example, if most of a server's traffic comes from one particular client, doing the limiting directly in that client would also be an excellent solution.

At the database level, the connection string itself generally carries the concept of a "maximum number of connections", which already acts as a limiter. If you want finer-grained control over database access, that can only be done in a uniformly packaged database access framework.
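For instance, with a pooled data source such as HikariCP (one option among many; any pool with a maximum-size setting behaves the same way), the pool size doubles as a hard cap on concurrent database work:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class DbPool {
    public static HikariDataSource create() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://db-host:3306/app"); // placeholder URL
        config.setMaximumPoolSize(20); // no more than 20 concurrent connections (assumed value)
        return new HikariDataSource(config);
    }
}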

That covers the "vertical"; so what is the "horizontal"?


Whether it is multiple clients, or multiple replicas of one server, each node's performance is bound to differ. How do you set an appropriate threshold for each?

And how do you make a policy change take effect across the cluster's nodes as quickly as possible?

Put simply: introduce a performance monitoring platform and a configuration distribution center.

But doing these well is a sizeable undertaking in its own right; we will expand on it in subsequent content.

Further Reading

Bloom Filter

Cache Tour Series

ActiveMQ


Original: Big Box, "Availability: Rate Limiting"



Source: www.cnblogs.com/wangziqiang123/p/11618286.html