Understanding load average and CPU usage (repost)

 

Reprinted from: https://my.oschina.net/laichendong/blog/283799?p=1

In "find the most CPU-consuming Java code in linux", CPU usage and load average were mentioned, but that article did not explain what the two mean or how they relate to each other. They are two important indicators of how much stress a Linux system is under, and both can be viewed with the top command.

To understand these two concepts, we must start with the CPU time slice. Everyone knows that the Windows, Linux, and Mac OS X systems we use today are "multitasking operating systems", which means they can run multiple programs "simultaneously", such as watching a movie while chatting on QQ. In fact, a single CPU core can only do one thing at a time, so how does the operating system achieve "multitasking"? Roughly speaking, it lets multiple processes take turns using the CPU, each for a short period of time. Because this period is so short (roughly 5 ms to 800 ms on Linux, depending on the kernel and scheduler), the human user cannot perceive it, and several programs appear to run at the same time. This short period is what we call a CPU time slice.

CPU usage is the share of CPU time slices occupied by programs. For example, suppose that over some period the movie player takes 30 ms of CPU time, QQ takes another 10 ms, and the CPU then idles for 60 ms. Then the movie player takes another 30 ms, QQ takes 10 ms, and the CPU idles for 60 ms again. If this pattern holds for a while, the CPU usage during that period is about 40%. As a rule of thumb, CPU usage above 75% is considered relatively high. Tip: in top, press the number key 1 to view the usage of each CPU core.

Load average is more complicated. It describes the load on the CPU, but what it measures is not CPU usage. It is a statistic, taken over a period of time, of the number of processes the CPU is running plus the number waiting for the CPU, that is, the length of the CPU run queue.
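The time-slice arithmetic above can be sketched in a few lines. This is only an illustration of the worked example, not how top measures usage; the helper name `cpu_usage` and the millisecond figures are taken from, or invented for, the example in the text.

```python
def cpu_usage(busy_ms, idle_ms):
    """Fraction of a sampling window during which the CPU was doing work.

    Hypothetical helper for illustration only.
    """
    total = busy_ms + idle_ms
    if total == 0:
        raise ValueError("empty sampling window")
    return busy_ms / total


# One 100 ms window from the example: movie 30 ms, QQ 10 ms, idle 60 ms.
movie_ms, qq_ms, idle_ms = 30, 10, 60
usage = cpu_usage(movie_ms + qq_ms, idle_ms)
print(f"CPU usage: {usage:.0%}")  # CPU usage: 40%
```

Repeating the same window any number of times gives the same ratio, which is why the usage stays at about 40% for as long as the pattern holds.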
For example, think of riding a roller coaster at an amusement park. Assume the roller coaster holds 30 people. When fewer than 30 people show up, the load of the roller coaster is < 1; when exactly 30 people show up, load = 1; when more than 30 show up, load > 1. If 45 people want to ride, 30 can board right away and the other 15 must wait. The load of the roller coaster is then 45/30 = 1.5. In other words, a load of 1.5 means the system is running at full capacity and there is additional work waiting equal to 50% of full capacity.

You might object: "I often see a load of around 3 on my machine, but the system runs normally and does not feel three times overloaded." In that case, I would guess your machine has at least 4 cores. On a multi-core system, the load average has to be judged against the number of cores; it can be understood simply as the sum of the loads of the individual cores. With every core at 100% load, a 4-core machine has a load average of 4.

So how high a load is the critical point? There is no absolute value. One widely accepted rule is: load average should be <= CPU cores * 0.7. The problem with this rule is that as the number of cores grows, the 30% held in reserve becomes a larger and larger amount of idle capacity. Another accepted rule is: load average should be <= CPU cores minus 1 to 2. This rule has a drawback too: as the number of cores shrinks, the 1 to 2 cores held back become a larger and larger proportion of the machine. Both rules are clearly debatable.

Whether we use top or uptime, the load is always shown as 3 values. They are the average load over the last 1 minute, 5 minutes, and 15 minutes, respectively.
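As a rough sketch of the rules above, the following Python compares a load value against the core count. The helper names `load_per_core` and `rule_of_thumb_limits` are made up for this example; `os.getloadavg()` and `os.cpu_count()` are standard library calls, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def load_per_core(load, cores):
    """Load relative to capacity, e.g. 45 riders / 30 seats = 1.5."""
    return load / cores


def rule_of_thumb_limits(cores):
    """The two debatable limits from the text (hypothetical helper).

    Returns (cores * 0.7, (cores - 2, cores - 1)), floored at zero.
    """
    ratio_limit = cores * 0.7
    headroom_limit = (max(cores - 2, 0), max(cores - 1, 0))
    return ratio_limit, headroom_limit


def print_current_load():
    """Print the 1, 5 and 15 minute load averages of this machine."""
    cores = os.cpu_count() or 1
    try:
        averages = os.getloadavg()  # Unix only; may raise OSError
    except (AttributeError, OSError):
        print("load average is not available on this platform")
        return
    ratio_limit, _ = rule_of_thumb_limits(cores)
    for minutes, load in zip((1, 5, 15), averages):
        print(f"{minutes:>2} min: load {load:.2f}, "
              f"per core {load_per_core(load, cores):.2f}, "
              f"0.7 rule limit {ratio_limit:.1f}")


if __name__ == "__main__":
    print_current_load()
```

With these helpers, a load of 3 on a 4-core machine works out to 0.75 per core, which is why such a system does not feel overloaded.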
So which one should we look at, and which value should we trust? The reason for giving 3 values is that they are meant to be read together, like a small chart of how the load is changing. For example, if the last minute shows a load of 120% of capacity while the last 5 and 15 minutes show 50%, you probably do not need to worry too much: it is a short spike. But if the load has stayed above 120% across all three values, I'm afraid you should add a machine.

There is another situation: high load but low CPU usage. This is a strange case that many people do not understand. Take the roller coaster again. Suppose 60 people come to ride. Each run lasts 5 minutes, and between runs it takes another 5 minutes for the first batch of 30 people to get off and the second batch of 30 to get on, fasten their safety gear, and so on. In this case the utilization of the roller coaster is only about 50%, yet its load is 2.

The CPU equivalent is this: when there are too many runnable processes (threads), frequent context switching consumes a lot of CPU time, so fewer time slices are spent on actual computation (low CPU usage) while many processes are waiting to run (high load). On Linux, the load average also counts processes in uninterruptible sleep, typically waiting on disk I/O, so heavy I/O wait is another common cause of high load with low CPU usage.
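The roller coaster figures for the high load, low usage case can be checked with a short calculation. This is a sketch of the analogy only; the function names `utilization` and `load` are hypothetical, and the numbers are the ones used in the example above.

```python
def utilization(run_minutes, changeover_minutes):
    """Share of each cycle that the ride spends actually running."""
    return run_minutes / (run_minutes + changeover_minutes)


def load(people_wanting_to_ride, seats):
    """Demand relative to capacity, as in the earlier 45/30 example."""
    return people_wanting_to_ride / seats


# 5 minutes running, 5 minutes swapping riders, 60 people for 30 seats.
ride_utilization = utilization(5, 5)
ride_load = load(60, 30)
print(f"utilization: {ride_utilization:.0%}, load: {ride_load:.1f}")
# utilization: 50%, load: 2.0
```

The two numbers answer different questions: utilization says how much of the time useful work is done, while load says how much work is queued relative to capacity, so one can be low while the other is high.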
