HyperMetro and Virtual Machine BUG

Over the past few days I have run into several bugs.
Here is the first one as pseudo code (I am more familiar with Python, so I will write it in Python style):

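try:
    # Route the request to whichever healthy host reports lower memory usage
    if host1.is_healthy and host1.memory_usage < host2.memory_usage:
        host1.handle(request)
    elif host2.is_healthy and host2.memory_usage < host1.memory_usage:
        host2.handle(request)
except Exception:
    raise_alarm("system crashed")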

There is a logic problem here: when host 1 performs worse than host 2, the average performance of the system leans toward host 1's.

In other words: because host 1 is checked first, host 1 gets to handle the request first.



For example: host 1 has an i3 and host 2 has an i9 9900K. Host 1 is still given priority whenever its memory usage is lower.
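A minimal sketch of that situation, assuming a hypothetical `Host` record and made-up numbers (the memory figures and the `relative_speed` score are illustrative, not measured). It only mimics the memory-based rule from the pseudo code:

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    healthy: bool
    memory_usage: float    # fraction of RAM in use, 0.0 to 1.0 (assumed value)
    relative_speed: float  # hypothetical throughput score, higher is faster

def pick_by_memory(hosts):
    """Choose the healthy host with the lowest memory usage; speed is ignored."""
    candidates = [h for h in hosts if h.healthy]
    if not candidates:
        raise RuntimeError("no healthy host")
    return min(candidates, key=lambda h: h.memory_usage)

i3_host = Host("host1-i3", True, 0.30, 1.0)
i9_host = Host("host2-i9", True, 0.45, 4.0)

chosen = pick_by_memory([i3_host, i9_host])
print(chosen.name)  # host1-i3
```

The slower machine wins simply because `relative_speed` never enters the comparison.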

Another example: if host 1 is in an unstable state (say its network or its performance is unstable), the stability of the whole system matches that of the worst host, and the slowest processing speed of the system matches that of the worst-performing host.

The logic only considers load balancing, not performance...
So under weak network conditions, the performance of the system collapses...

Under normal circumstances, the hosts perform similarly. But if someone naive insists on making a poorly performing host an active node, what deteriorates is not so much the average performance as the volatility of the performance.
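To put rough numbers on the volatility point, here is a small sketch with assumed latencies (10 ms for the fast host and 40 ms for the slow one; both values are invented for illustration). It compares sending everything to the fast host with alternating between the two:

```python
import statistics

FAST_MS = 10.0  # assumed latency of the fast host
SLOW_MS = 40.0  # assumed latency of the slow host

def latency_profile(samples):
    """Return (mean, population standard deviation) of response times."""
    return statistics.mean(samples), statistics.pstdev(samples)

all_fast = [FAST_MS] * 100       # every request goes to the fast host
split = [FAST_MS, SLOW_MS] * 50  # requests alternate between the two hosts

print(latency_profile(all_fast))
print(latency_profile(split))
```

With these assumed numbers the mean rises from about 10 ms to about 25 ms, but the spread goes from zero to about 15 ms, which is what users notice as jitter.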

The second is the virtual machine bug. Multiple virtual machines can end up on one CPU core, because the core is chosen automatically while the system is running.
Sometimes the system does not allocate cores like this: cores 1, 2, 3 to A and cores 4, 5, 6 to B. Rather: here are the cores, share them among yourselves; as long as the demand does not exceed the limit, it will be satisfied.
The trouble starts when two virtual machines are not "selecting idle cores" but "selecting cores randomly".
When they land on the same core, the system freezes.
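The difference between the two selection strategies can be sketched as a toy simulation. The function names and the 6-core, 3-VM setup are assumptions for illustration; this is not how any particular hypervisor schedules:

```python
import random

def pick_random_core(num_cores, busy, rng):
    """Pick any core, ignoring which ones are already in use."""
    return rng.randrange(num_cores)

def pick_idle_core(num_cores, busy, rng):
    """Prefer the core with the fewest VMs on it (lowest index on ties)."""
    return min(range(num_cores), key=lambda c: busy[c])

def place_vms(num_vms, num_cores, picker, seed=0):
    """Place VMs one by one and return how many VMs sit on each core."""
    rng = random.Random(seed)
    busy = [0] * num_cores
    for _ in range(num_vms):
        busy[picker(num_cores, busy, rng)] += 1
    return busy

print(place_vms(3, 6, pick_idle_core))    # [1, 1, 1, 0, 0, 0]
print(place_vms(3, 6, pick_random_core))  # depends on the seed; collisions are possible
```

Idle-core selection never stacks two VMs while a free core exists; random selection offers no such guarantee.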

For example, take a CPU with a main frequency of 3000 MHz. Even if it is multi-core, three virtual machines using the same hardware thread at the same time will cause the system to freeze.
The symptoms are: low memory usage, low system usage, and low overall CPU usage, yet the three child threads all run on the same core and system performance is extremely low.
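A back-of-the-envelope version of this, assuming fair time slicing on the shared core and a 6-core CPU (the core count is an assumption; the 3000 figure comes from the example above):

```python
def shared_core_view(core_mhz, total_cores, vms_on_one_core):
    """Return (MHz each VM effectively gets, fraction of the CPU that looks busy)."""
    per_vm_mhz = core_mhz / vms_on_one_core  # the core is split evenly (assumed)
    system_utilisation = 1 / total_cores     # only one core is doing any work
    return per_vm_mhz, system_utilisation

per_vm, util = shared_core_view(3000, 6, 3)
print(per_vm)                # 1000.0
print(round(util * 100, 1))  # 16.7
```

So the monitoring dashboard shows a mostly idle CPU while each virtual machine gets only about a third of one core.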

This is mostly a problem for those who can afford a big CPU. Large companies have their own allocation rules, so they tend to avoid it. But sometimes programmers simply have not considered the CPU allocation rules...

——————————————————————————————
There is another problem: once a service has settled on a port, it no longer checks whether the system behind it is stable.
That is to say: each connection is evaluated only once, when the optimal node is selected. Unless the system actively closes the connection, it stays attached indefinitely.
When the system crashes, the client does not try to reconnect elsewhere; it keeps retrying the pinned connection until it reaches the maximum number of attempts.
(If this is unclear, look at TCP's three-way handshake and four-way teardown. A server that unilaterally stops serving does not thereby close the connection.)
If someone cheerfully configures 200 attempts, one every 0.5 seconds, the system will be stuck for 100 seconds.
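The 100-second figure follows directly from the retry loop. A minimal sketch, assuming a hypothetical `connect` callable that returns True on success (no real sockets or sleeping are involved here):

```python
def pinned_retry(connect, max_attempts, interval_s):
    """Retry the same pinned node; return (connected, seconds spent waiting)."""
    waited = 0.0
    for _ in range(max_attempts):
        if connect():
            return True, waited
        waited += interval_s  # a real client would sleep for interval_s here
    return False, waited

# A crashed server: every attempt fails.
ok, waited = pinned_retry(lambda: False, 200, 0.5)
print(ok, waited)  # False 100.0
```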

If the host has not crashed but the weak network is intermittent, the stability of the system is even worse. The system will not actively choose a better node; it just retries again and again.
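For contrast, here is a rough sketch of what re-selecting a node on each attempt could look like. The node names, the round-robin order, and the `connect` callable are all hypothetical; this is not how the system described above behaves, which is exactly the complaint:

```python
def retry_with_reselection(nodes, connect, max_attempts):
    """Cycle through nodes instead of pinning the first choice."""
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]  # simple round-robin (assumed policy)
        if connect(node):
            return node, attempt + 1
    return None, max_attempts

reachable = {"node-a": False, "node-b": True}  # hypothetical reachability
node, attempts = retry_with_reselection(list(reachable), reachable.get, 20)
print(node, attempts)  # node-b 2
```

A client pinned to `node-a` would burn all 20 attempts; one that moves on succeeds on the second.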

For example: with a connection success rate of 20% and a maximum of 20 attempts, the client retries again and again, causing a large number of service timeouts.
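The cost of those retries can be estimated with a little arithmetic, assuming each attempt succeeds independently with the same probability (a simplification; real weak-network failures tend to come in bursts):

```python
def retry_stats(success_rate, max_attempts):
    """Return (probability that every attempt fails, expected number of attempts)."""
    fail = 1.0 - success_rate
    p_all_fail = fail ** max_attempts
    # Expected attempts for a geometric trial capped at max_attempts
    expected_attempts = (1.0 - p_all_fail) / success_rate
    return p_all_fail, expected_attempts

p_fail, expected = retry_stats(0.20, 20)
print(round(p_fail, 4))    # 0.0115
print(round(expected, 2))  # 4.94
```

Under that assumption a connection needs about five attempts on average, and roughly 1 in 87 connections exhausts all 20 attempts without ever succeeding.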

Origin blog.csdn.net/weixin_45642669/article/details/113543250