Without warm-up, that's not high concurrency, that's just a flood of concurrent calls!

As we all know, a high-concurrency system has three big hammers: caching, circuit breaking, and rate limiting. But there is a fourth one, often left forgotten and sulking in a corner: warm-up (预热).




I. Examples of the phenomenon

Let me start with two phenomena. They only show up in systems with high concurrency, and they have already caused quite a few incidents.

1. The DB dies the moment it restarts

A DB in a high-concurrency environment was restarted after its process died. Since this happened at the business peak, the upstream load-balancing strategy redistributed traffic, and the freshly started DB instantly took on a third of the load. Its load then shot up wildly until it stopped responding altogether.

The reason: the caches of a newly started DB are not populated yet, so its state is very different from that of normal operation. Even a tenth of the usual traffic can be enough to kill it.

2. Access errors after a service restart

Another common problem: one of my servers runs into trouble, and thanks to load balancing the remaining machines immediately pick up its requests and keep running just fine. But when the service rejoins the cluster, it is hit by a large number of very slow requests and, under heavy traffic, even failures in droves.

The causes can roughly be attributed to:

(1) When the service starts, the JVM is not fully ready and JIT compilation has not kicked in yet.
(2) The resources the application depends on are not ready.
(3) Load balancing triggers a rebalance.


Both problems come down to warm-up not being done properly.

Warm Up means the cold-start / warm-up mode. When a system has been running at a low water mark for a long time and traffic suddenly surges, pulling the system straight up to the high water mark can crush it instantly. With a cold start, traffic is allowed to increase slowly and the threshold is raised gradually over a configured period, giving the cold system time to warm up and keeping it from being overwhelmed.

The curve I want looks like this:

[Figure: traffic ramping up gradually to the full threshold over the warm-up window]

Not like this:

[Figure: traffic hitting the system at full strength right away]
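To make that ramp concrete, here is a minimal sketch in plain Java (the class name and the numbers are my own illustration, not from the article) of a threshold that climbs linearly from a cold value to the full value over a fixed warm-up window:

```java
/** Minimal illustration of a linear warm-up ramp for an admission threshold. */
public class WarmUpLimiter {

    private final long startMillis = System.currentTimeMillis();
    private final long warmUpMillis;   // how long the ramp lasts, e.g. 100_000 ms
    private final double coldQps;      // threshold right after start
    private final double fullQps;      // threshold once fully warmed up

    public WarmUpLimiter(long warmUpMillis, double coldQps, double fullQps) {
        this.warmUpMillis = warmUpMillis;
        this.coldQps = coldQps;
        this.fullQps = fullQps;
    }

    /** Current allowed QPS: grows linearly from coldQps to fullQps, then stays flat. */
    public double allowedQps() {
        long elapsed = System.currentTimeMillis() - startMillis;
        if (elapsed >= warmUpMillis) {
            return fullQps;
        }
        double progress = (double) elapsed / warmUpMillis;   // 0.0 -> 1.0
        return coldQps + (fullQps - coldQps) * progress;
    }

    public static void main(String[] args) {
        WarmUpLimiter limiter = new WarmUpLimiter(100_000, 50, 1000);
        System.out.printf("allowed QPS right now: %.1f%n", limiter.allowedQps());
    }
}
```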


II. Reality is more complicated

This traffic is unpredictable. It is unlike naturally growing traffic, or man-made ***: it goes from nothing to a flood in an instant. Even components that pride themselves on being ultra-fast, such as LMAX's Disruptor, can collapse under a surge that arrives this suddenly.

The most natural layer to hook warm-up into is the gateway. As the diagram shows, node4 is a freshly started node; the load-balancing component built into the gateway can recognize this newly joined instance and ramp traffic up to it gradually, until it can genuinely withstand high-speed traffic.


[Figure: the gateway gradually shifting traffic onto the newly started node4]

If every request went through the gateway, things would be much easier, and components like Sentinel could hook in there. But reality often doesn't meet that condition. For example:


(1) Your application reads the registry directly and distributes traffic inside a client-side component.
(2) Your application goes through a stack of complex middleware and routing rules before finally landing on one particular DB.
(3) Your devices may connect straight to the MQTT server over the MQTT protocol.

If we abstract this a little, we can see that all of this traffic-distribution logic, the gateway included, can be called the client. In other words, all warm-up logic lives on the client side, tightly coupled with load balancing.

III. Solutions

1. Gradual traffic admission

Following the analysis above, the problem can be solved by controlling every client-side call in code.

A simple round-robin-style scheme:

(1) I need to obtain the set of all resources to be called, their start times, the cold-start configuration, and so on.
(2) Assign each resource a weight, say a maximum of 100, with the cold start configured to complete after 100 seconds. If a node is at second 15 of its start-up, the total weight is 100*(n-1)+15.
(3) Distribute requests according to the computed weights (see the sketch after this list); the new node's share grows as time passes until it matches the other nodes.
(4) One extreme case: if my backend has only a single instance, it may never manage to come up at all.
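A minimal sketch of that weighting scheme in plain Java (the node addresses, the weight floor of 1, and the random weighted pick are my own illustrative choices, not from the article):

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/** Sketch of warm-up-aware weighted selection over a set of backend nodes. */
public class WarmUpChooser {

    static final int MAX_WEIGHT = 100;        // weight of a fully warmed node
    static final long WARM_UP_SECONDS = 100;  // cold start finishes after 100 s

    /** A backend instance and the moment it came up. */
    record Node(String address, long startEpochSeconds) {}

    /** Weight grows one point per second until MAX_WEIGHT (floored at 1 so a brand-new node is not starved). */
    static int weightOf(Node node, long nowEpochSeconds) {
        long upSeconds = Math.max(0, nowEpochSeconds - node.startEpochSeconds());
        return (int) Math.min(MAX_WEIGHT, Math.max(1, upSeconds * MAX_WEIGHT / WARM_UP_SECONDS));
    }

    /** Weighted random pick: a node 15 s into warm-up gets roughly 15/(100*(n-1)+15) of the traffic. */
    static Node choose(List<Node> nodes) {
        long now = System.currentTimeMillis() / 1000;
        int total = nodes.stream().mapToInt(n -> weightOf(n, now)).sum();
        int ticket = ThreadLocalRandom.current().nextInt(total);
        for (Node node : nodes) {
            ticket -= weightOf(node, now);
            if (ticket < 0) {
                return node;
            }
        }
        return nodes.get(nodes.size() - 1);   // not reached in practice
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000;
        List<Node> nodes = List.of(
                new Node("10.0.0.1:8080", now - 3600),  // warmed up long ago
                new Node("10.0.0.2:8080", now - 3600),
                new Node("10.0.0.3:8080", now - 15));   // started 15 s ago
        System.out.println("chosen: " + choose(nodes).address());
    }
}
```

A weighted random pick is used here instead of strict round-robin just to keep the sketch short; on average the traffic split is the same.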

Take Spring Cloud as an example: we would need to change the behavior of these components:

(1) Ribbon's load-balancing strategy.
(2) The gateway's load-balancing strategy.

Fortunately, both are basic components, so there is no need to copy code back and forth.
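As a rough illustration of the Ribbon side, here is what a custom IRule applying the same time-based weighting could look like. This is my own sketch, not code from the article: the start times are approximated by when each server is first seen, and a real implementation would also need to handle configuration and server-list churn:

```java
import com.netflix.loadbalancer.ILoadBalancer;
import com.netflix.loadbalancer.IRule;
import com.netflix.loadbalancer.Server;

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

/** Warm-up-aware Ribbon rule: new instances start at weight 1 and ramp up over time. */
public class WarmUpRule implements IRule {

    private static final int MAX_WEIGHT = 100;
    private static final long WARM_UP_MILLIS = 100_000;

    // Ribbon does not tell us when a server started, so remember when we first saw it.
    private final ConcurrentHashMap<String, Long> firstSeen = new ConcurrentHashMap<>();
    private volatile ILoadBalancer lb;

    @Override
    public Server choose(Object key) {
        List<Server> servers = lb.getReachableServers();
        if (servers.isEmpty()) {
            return null;
        }
        long now = System.currentTimeMillis();
        int[] weights = new int[servers.size()];
        int total = 0;
        for (int i = 0; i < servers.size(); i++) {
            weights[i] = weightOf(servers.get(i), now);
            total += weights[i];
        }
        int ticket = ThreadLocalRandom.current().nextInt(total);
        for (int i = 0; i < servers.size(); i++) {
            ticket -= weights[i];
            if (ticket < 0) {
                return servers.get(i);
            }
        }
        return servers.get(servers.size() - 1);
    }

    /** Weight ramps linearly from 1 to MAX_WEIGHT over WARM_UP_MILLIS. */
    private int weightOf(Server server, long now) {
        long seen = firstSeen.computeIfAbsent(server.getHostPort(), k -> now);
        long elapsed = now - seen;
        if (elapsed >= WARM_UP_MILLIS) {
            return MAX_WEIGHT;
        }
        return (int) Math.max(1, elapsed * MAX_WEIGHT / WARM_UP_MILLIS);
    }

    @Override
    public void setLoadBalancer(ILoadBalancer lb) {
        this.lb = lb;
    }

    @Override
    public ILoadBalancer getLoadBalancer() {
        return lb;
    }
}
```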

2. Pre-flight

As the name implies, this means visiting all the interfaces once in advance so that system resources are prepared ahead of time, for example walking every HTTP route and sending a request to each. This approach is only partially effective: some lazily loaded resources do get initialized at this stage, but not all of them. JIT and other optimizations can make the warm-up take a very long time, so a quick once-over only helps up to a point.
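A minimal sketch of this kind of self-preheating, assuming Java 11's built-in HttpClient; the endpoint list and the number of passes are purely illustrative:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;

/** Sketch: touch a list of endpoints a few times at startup to trigger lazy init and JIT. */
public class SelfPreheater {

    public static void main(String[] args) {
        // Illustrative endpoints; a real service would enumerate its own routes.
        List<String> endpoints = List.of(
                "http://localhost:8080/api/users/1",
                "http://localhost:8080/api/orders/recent",
                "http://localhost:8080/actuator/health");

        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();

        for (int round = 0; round < 5; round++) {          // a few passes, not just one
            for (String url : endpoints) {
                HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                        .timeout(Duration.ofSeconds(2))
                        .GET()
                        .build();
                try {
                    HttpResponse<Void> response =
                            client.send(request, HttpResponse.BodyHandlers.discarding());
                    System.out.println(url + " -> " + response.statusCode());
                } catch (Exception e) {
                    // Failures are fine here: the goal is only to exercise the code paths.
                    System.out.println(url + " -> " + e.getMessage());
                }
            }
        }
    }
}
```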

Another example: some DBs, after starting, run a set of deliberately chosen SQL statements so that the hottest, most-needed data gets loaded into the page cache.

3. State preservation

Take a snapshot of the system's state before it dies, then restore it intact on startup.

This process feels a bit magical, because an abnormal shutdown usually leaves the system no chance to say its last words. All we can do is take snapshots of the running system on a timer.

When the node starts, the snapshot is loaded back into memory. This technique is widely used in some memory-centric components.
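As a toy illustration of the timed-snapshot idea (my own sketch, not the original author's code), the hot in-memory state is serialized periodically and reloaded at the next startup:

```java
import java.io.*;
import java.nio.file.*;
import java.util.Map;
import java.util.concurrent.*;

/** Sketch: periodic snapshots of in-memory state, reloaded on the next startup. */
public class SnapshotKeeper {

    private static final Path SNAPSHOT = Paths.get("cache-snapshot.bin"); // illustrative path
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    /** On startup, restore whatever the last snapshot captured. */
    @SuppressWarnings("unchecked")
    void restore() {
        if (!Files.exists(SNAPSHOT)) {
            return;
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(SNAPSHOT))) {
            cache.putAll((Map<String, String>) in.readObject());
        } catch (Exception e) {
            System.err.println("snapshot unreadable, starting cold: " + e.getMessage());
        }
    }

    /** A crash leaves no time for last words, so snapshot on a timer while running. */
    void scheduleSnapshots() {
        scheduler.scheduleAtFixedRate(() -> {
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(SNAPSHOT))) {
                out.writeObject(new java.util.HashMap<>(cache));  // copy for a stable view
            } catch (IOException e) {
                System.err.println("snapshot failed: " + e.getMessage());
            }
        }, 1, 1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) {
        SnapshotKeeper keeper = new SnapshotKeeper();
        keeper.restore();              // warm memory from the last snapshot
        keeper.scheduleSnapshots();    // keep snapshotting while running
        keeper.cache.put("hot-key", "hot-value");
        // In a real service this object lives for the lifetime of the process.
    }
}
```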

IV. The end

Comparing the options, the most feasible approach is still to code the warm-up logic into the client. The work can be painful and drawn out, but the outcome is good.
Of course, you can also go the "take the node out of nginx -> adjust the weights -> reload nginx" route. Sometimes it is very effective, but not always; usually it is reassuring, but not always.
It's all up to you. After all, skipping the foreplay and going straight to the point is just being reckless.




Source: blog.51cto.com/14378044/2415282