A bug in how Sentinel gets CPU utilization in Docker

Sentinel Introduction

Rate limiting, circuit breaking, and degradation are important parts of microservice governance. There are not many open source components for them: Guava can handle simple scenarios, while Hystrix and Sentinel cover complex ones. This post is about Sentinel, an open source product from Alibaba that can be used in production at scale with little custom development. In my experience it has the following advantages:
  • Lightweight: the performance overhead is almost negligible and only becomes slightly noticeable at around 10,000 QPS on a single machine;
  • An out-of-the-box console that can dynamically and flexibly configure rate-limiting and degradation rules, although persisting the rules requires a custom plug-in;
  • Support for single-machine and cluster rate limiting, and non-intrusive integration with many frameworks such as Dubbo, gRPC, Spring MVC, and several reactive gateway frameworks; the latest version even supports rate limiting for Envoy;
  • Rich rate-limiting rules (by QPS, thread count, hotspot parameters, and system adaptive limiting) and equally rich circuit-breaking rules (by response time, exception count, exception ratio, and so on).

The bug in getting CPU utilization in Docker

The classic usage scenario is as follows. When a service consumer calls a provider and the provider is only a weak dependency, a degradation rule based on the exception ratio can be set on the consumer side. For the interfaces that a service provider exposes, a QPS or thread-count limit rule can be set, and system adaptive limiting can then be added as a final safety net. With system adaptive limiting, the system restricts access according to its own state, using system-level indicators such as inbound QPS, total thread count, CPU load, and CPU utilization. It can be described as the last line of defense.

Sentinel has a problem getting CPU utilization in Docker. First, look at the code that obtains CPU utilization:

CPU load and CPU utilization are both obtained through an MXBean. According to the Java documentation, getSystemLoadAverage returns the system load average, and getSystemCpuLoad returns the "normalized" CPU utilization.
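
As a rough illustration of the two calls mentioned above, here is a minimal sketch of reading both values through the platform MXBean. It is not Sentinel's source code; the class name SystemMetricsProbe and its method names are hypothetical. Note that getSystemCpuLoad is declared on com.sun.management.OperatingSystemMXBean, so the sketch assumes a HotSpot-style JDK that provides that interface.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.OperatingSystemMXBean;

// Hypothetical helper, for illustration only (not Sentinel's code).
final class SystemMetricsProbe {
    private SystemMetricsProbe() {}

    // Assumption: the JDK ships the com.sun.management extension interface.
    private static OperatingSystemMXBean osBean() {
        return ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);
    }

    /** System load average over the last minute; negative if unavailable. */
    static double loadAverage() {
        return osBean().getSystemLoadAverage();
    }

    /**
     * "Recent" CPU usage of the whole system, normalized to [0.0, 1.0];
     * negative if unavailable. Newer JDKs deprecate this method in favor
     * of getCpuLoad(), but it is the one discussed in this post.
     */
    static double systemCpuLoad() {
        return osBean().getSystemCpuLoad();
    }

    public static void main(String[] args) {
        System.out.println("load average    = " + loadAverage());
        System.out.println("system CPU load = " + systemCpuLoad());
    }
}
```

Running this inside a container on an older JDK is one way to check whether the reported numbers follow the host or the container.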

[Figure: Java documentation]

On a physical machine or a virtual machine, this code returns the data we want, but that is not necessarily the case in Docker: inside a container, what it returns is the CPU load and CPU utilization of the host machine. So I filed an issue with Sentinel (one of the benefits of using open source products). It did not take long to get a reply suggesting that I use JDK 10, but upgrading the JDK in a production environment is not that simple.

[Figure: GitHub issue]

After a long time, someone finally fixed the problem in code.

Understanding system load

The first time I saw the fix, I was confused, mainly because I was unfamiliar with the definitions of CPU utilization and CPU load. After some reading, I learned that CPU utilization is the CPU time a program consumes divided by the time the program has been running. For example, on a single core, if a Java program runs for 10 seconds and uses 1 second of CPU time, its CPU utilization is 10%. Note that this percentage is not necessarily below 100%, because multiple cores can work in parallel: if a Java program runs for 10 seconds on a 4-core machine and uses 5 seconds of CPU time on each core, the total CPU time is 20 seconds and the CPU utilization is 200%. However, the documentation of OperatingSystemMXBean points out that its value is normalized, that is, the CPU utilization is divided by the number of CPU cores.

CPU load is explained well in Ruan Yifeng's article "Understanding Linux System Load". In short, CPU load is the number of running processes plus the number of processes waiting to run.
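
The arithmetic in the examples above can be written out as a small sketch. The class and method names here are hypothetical and exist only to make the two definitions (raw and normalized utilization) concrete.

```java
// Hypothetical helper that mirrors the worked examples in the text.
final class CpuUtilizationExample {
    private CpuUtilizationExample() {}

    /** Raw utilization: CPU time consumed divided by elapsed running time. */
    static double utilization(double cpuSeconds, double elapsedSeconds) {
        return cpuSeconds / elapsedSeconds;
    }

    /** Normalized utilization: raw utilization divided by the core count. */
    static double normalized(double cpuSeconds, double elapsedSeconds, int cores) {
        return utilization(cpuSeconds, elapsedSeconds) / cores;
    }

    public static void main(String[] args) {
        // Single core: 1 s of CPU time over 10 s of running time.
        System.out.println(utilization(1, 10));        // 0.1, i.e. 10%
        // Four cores: 5 s of CPU time on each core (20 s total) over 10 s.
        System.out.println(utilization(4 * 5, 10));    // 2.0, i.e. 200%
        // The same four-core case after normalization.
        System.out.println(normalized(4 * 5, 10, 4));  // 0.5, i.e. 50%
    }
}
```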

[Figure: Understanding Linux System Load]

Why is CPU load needed in addition to CPU utilization? Because two fully loaded systems both show 100% CPU utilization, and utilization alone cannot tell which of them is under more pressure; CPU load can. CPU load reflects not only the current CPU usage but also hints at future usage.
With CPU utilization and CPU load understood, and with the Java documentation at hand, the meaning of the code becomes clear. On each run it computes how much the JVM uptime and the CPU time consumed by the process have grown since the previous run, divides the CPU-time difference by the uptime difference, and then divides by the number of CPU cores to get the normalized CPU utilization. Sentinel computes these differences on every run in order to obtain a more accurate "instantaneous" CPU utilization rather than a historical average.
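
The delta-based calculation described above might look roughly like the following sketch. It is written from the description in this post and is not Sentinel's actual source; the class name ProcessCpuSampler and its methods are hypothetical. It assumes com.sun.management.OperatingSystemMXBean is available for getProcessCpuTime (nanoseconds), and uses RuntimeMXBean.getUptime (milliseconds) for the JVM running time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import com.sun.management.OperatingSystemMXBean;

// Hypothetical sampler, for illustration only (not Sentinel's code).
final class ProcessCpuSampler {
    private final OperatingSystemMXBean osBean =
            ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);
    private final RuntimeMXBean runtimeBean = ManagementFactory.getRuntimeMXBean();

    // Values seen at the previous sample; the next sample is measured against them.
    private long lastCpuTimeNanos = osBean.getProcessCpuTime();
    private long lastUptimeMillis = runtimeBean.getUptime();

    /** CPU-time delta divided by uptime delta, then by the core count. */
    static double normalizedUsage(long cpuDeltaNanos, long uptimeDeltaMillis, int cores) {
        if (uptimeDeltaMillis <= 0 || cores <= 0) {
            return 0.0; // no time has passed yet, or no usable core count
        }
        double cpuDeltaMillis = cpuDeltaNanos / 1_000_000.0;
        return cpuDeltaMillis / uptimeDeltaMillis / cores;
    }

    /** Normalized CPU usage of this JVM process since the previous call. */
    synchronized double sample() {
        long cpuTimeNanos = osBean.getProcessCpuTime();
        long uptimeMillis = runtimeBean.getUptime();
        int cores = osBean.getAvailableProcessors();

        double usage = normalizedUsage(
                cpuTimeNanos - lastCpuTimeNanos,
                uptimeMillis - lastUptimeMillis,
                cores);

        // Remember this sample so the next call measures only the new interval.
        lastCpuTimeNanos = cpuTimeNanos;
        lastUptimeMillis = uptimeMillis;
        return usage;
    }
}
```

Calling sample() on a fixed schedule (for example once per second) yields a per-interval value instead of an average over the whole lifetime of the JVM.
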
This code has three flaws. First, the number of CPU cores allocated by Docker can only be obtained accurately from JDK 8u131 onward; in earlier versions, OperatingSystemMXBean.getAvailableProcessors() and Runtime.getRuntime().availableProcessors() return the host's core count. In practice, the JDK versions in use are generally newer than this. Second, the code only measures the CPU usage of a single process. If two Java programs run in the same container, each one can only measure the CPU time it consumes itself and has no view of the state of the whole system. In production this is unlikely, because a Docker container generally runs a single process. Third, the final CPU utilization is the larger of the host CPU utilization and the current process CPU utilization. When the container's CPU is limited or pinned, that is, when CPU resources are isolated, these two values can differ greatly, and in that case the host's CPU utilization does not need to be considered.
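
The third flaw can be illustrated with a small sketch. Both methods below are hypothetical: largerOfBoth follows the behavior described above, and isolatedAware shows one possible alternative that ignores the host value when the container's CPU is isolated. The cpuIsolated flag is an assumption for illustration only; how to detect isolation is outside the scope of this sketch.

```java
// Hypothetical illustration of how the final value is chosen.
final class CpuUsageCombiner {
    private CpuUsageCombiner() {}

    /** Behavior described in the text: take the larger of the two values. */
    static double largerOfBoth(double processUsage, double hostUsage) {
        return Math.max(processUsage, hostUsage);
    }

    /**
     * Possible alternative (an assumption, not Sentinel's behavior): when the
     * container's CPU is limited or pinned, use the process value only.
     */
    static double isolatedAware(double processUsage, double hostUsage, boolean cpuIsolated) {
        return cpuIsolated ? processUsage : Math.max(processUsage, hostUsage);
    }

    public static void main(String[] args) {
        // A quiet container (20%) on a busy host (90%).
        System.out.println(largerOfBoth(0.2, 0.9));        // 0.9: the host dominates
        System.out.println(isolatedAware(0.2, 0.9, true)); // 0.2: the host is ignored
    }
}
```

With the first method, a busy host can trigger system adaptive limiting in a container that is itself mostly idle.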





Origin juejin.im/post/5e918f2c6fb9a03c2f4e11e2