The stress test measures three CPU indicators: CPU Utilization, Load Average, and Context Switch Rate.

A simple explanation of CPU load average under Linux
 
The load average can generally be observed through top or uptime in the following format: 
load average: 0.10, 0.05, 0.58 
 
The three values are the average CPU load over the last 1 minute, 5 minutes, and 15 minutes, respectively.
 
On a single-core CPU, a load average of 1.00 means the CPU is fully loaded.
On a multi-core CPU, the CPU is fully loaded when the load average reaches the number of cores.
With multiple physical CPUs, the system is fully loaded when the load average reaches the total number of cores across all physical CPUs.
 
In short, the number of CPU cores is the baseline against which the load average is judged.
 
So at what load should we start paying attention? The empirical value given by the reference article is the number of CPU cores minus 0.3.
That is, with 4 CPU cores in total, once the 15-minute load average reaches 3.7 we should log in to the system and take a look.
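
As a rough illustration of the rule above, the following Python sketch parses a load average line in the format shown earlier and applies the "cores minus 0.3" threshold. The function names parse_load_average and needs_attention are made up for this example, and the 0.3 margin is only the empirical value quoted from the reference article, not a hard limit.

```python
import re


def parse_load_average(line):
    """Extract the 1-, 5-, and 15-minute load averages from uptime/top output."""
    match = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", line)
    if match is None:
        raise ValueError("no load average found in: %r" % line)
    return tuple(float(value) for value in match.groups())


def needs_attention(load_15min, total_cores, margin=0.3):
    """Rule of thumb from the text: look at the system once the
    15-minute load reaches (total cores - margin)."""
    return load_15min >= total_cores - margin


one, five, fifteen = parse_load_average("load average: 0.10, 0.05, 0.58")
print("15-minute load:", fifteen)
print("needs attention on 4 cores:", needs_attention(fifteen, total_cores=4))
```

On a live Linux system, os.getloadavg() and os.cpu_count() from the standard library could supply the inputs instead of a pasted line; note that os.cpu_count() reports logical processors.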

 

 

CPU Utilization is easy to understand: utilization above 75% (some say 80%) is considered high. This indicator should be read together with Load Average and Context Switch Rate, because high CPU utilization may itself be caused by high values of those two indicators.

 

Load Average is harder to judge. I searched around the Internet and found a few reasonable explanations. In my test with 100 concurrent users, the two values were 77.534% CPU utilization and a Load Average of 6.108: the CPU utilization is relatively high, and the Load Average also seems a bit high. Later I found the following two blog posts. The first, "Understand Load Average and do a stress test", says: "Load Average is the load of the CPU. The information it contains is not the CPU's utilization, but statistics on the total number of processes that the CPU is running or that are waiting for the CPU over a period of time, that is, statistics on the length of the CPU run queue." This is basically how multi-process and multi-threaded programs are scheduled. The second, "Understanding the Load Average of Linux Processors (translation)", boils down to one rule:

 

    Load Average < number of CPUs * number of cores * 0.7

 

For example, for a single 1-core CPU, Load Average should be < 1 * 1 * 0.7 = 0.7; for a single 4-core CPU, Load Average should be < 1 * 4 * 0.7 = 2.8.
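
The 0.7 rule can be written as a small Python sketch. The names load_limit and is_overloaded are hypothetical, and the 4-core machine used in the example is an assumption: the text gives the measured Load Average of 6.108 but not the core count of the test machine.

```python
def load_limit(num_cpus, cores_per_cpu, factor=0.7):
    """Upper bound for a healthy Load Average under the 0.7 rule."""
    return num_cpus * cores_per_cpu * factor


def is_overloaded(load_average, num_cpus, cores_per_cpu):
    """True when the load average has reached or passed the limit."""
    return load_average >= load_limit(num_cpus, cores_per_cpu)


# Limits for the two cases worked out in the text.
print("1 CPU x 1 core:", load_limit(1, 1))
print("1 CPU x 4 cores:", load_limit(1, 4))

# The measured value from the 100-user test, ASSUMING one 4-core CPU.
print("6.108 overloaded on 4 cores:", is_overloaded(6.108, 1, 4))
```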

 

View CPU information: grep 'model name' /proc/cpuinfo
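
To get the counts that the formula needs rather than just the model names, the text of /proc/cpuinfo can be summarized with a short Python sketch. The function name summarize_cpuinfo and the SAMPLE text are invented for illustration. The sketch assumes the x86 layout, where each logical processor has a "processor" line and a "physical id" line; other architectures may omit "physical id".

```python
def summarize_cpuinfo(text):
    """Return (logical processors, physical packages) from /proc/cpuinfo text."""
    logical = 0
    packages = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            packages.add(value.strip())
    return logical, len(packages)


# Made-up two-processor sample; real files contain many more fields.
SAMPLE = """\
processor\t: 0
model name\t: Example CPU
physical id\t: 0

processor\t: 1
model name\t: Example CPU
physical id\t: 0
"""

logical, packages = summarize_cpuinfo(SAMPLE)
print("logical processors:", logical, "physical packages:", packages)
```

On a real machine the same function could be fed the contents of /proc/cpuinfo read with open().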

 

Context Switch Rate is the rate of process (thread) switching. If there are too many switches, the CPU spends its time switching, which also hurts throughput. Section 2 of the article "High-Performance Server Architecture" is about this problem. How much is appropriate? I googled a lot but found no definite answer. Context switches generally come from two sources: interrupts and process (including thread) switching. An interrupt causes a switch, and so does the creation and activation of a process (thread). The CS value is also related to TPS (Transactions Per Second). Assuming each call causes N context switches, we get:

 

     Context Switch Rate = Interrupt Rate + TPS * N

 

CSR minus IR is the process/thread switch rate. If the main process receives a request and hands it to a thread for processing, and the thread returns to the main process when it is done, that is 2 switches. You can also substitute measured values of CSR, IR, and TPS into the formula to get the number of switches caused by each transaction. Therefore, to reduce CSR you have to work on the switches caused by each transaction: CSR only comes down when N comes down. Ideally N = 0, but in any case, if N >= 4 a careful check is warranted. As for the CSR < 5000 threshold mentioned on the Internet, I think the standard should not be a single fixed number.
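
Solving the formula for N gives N = (CSR - IR) / TPS. A minimal Python sketch of that substitution follows; the function name switches_per_transaction and the input numbers are hypothetical and are not measurements from the test described here.

```python
def switches_per_transaction(csr, interrupt_rate, tps):
    """Solve CSR = IR + TPS * N for N (context switches per transaction)."""
    if tps <= 0:
        raise ValueError("TPS must be positive")
    return (csr - interrupt_rate) / tps


# Hypothetical per-second rates, chosen only to show the arithmetic.
n = switches_per_transaction(csr=5000, interrupt_rate=1000, tps=2000)
print("switches per transaction:", n)
print("worth a careful check (N >= 4):", n >= 4)
```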

 

Other information:

These three indicators can be monitored in LoadRunner. On Linux, you can also use vmstat to view r (the run queue length, which Load Average is based on), in (Interrupts), and cs (Context Switches):

# vmstat 1 5

procs -----------memory----------- ---swap-- -----io---- --system- -----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy  id wa
 0  0 244644  29156 415720 2336484    0    0     1    49    2    1  1  0  98  0
 0  0 244644  29140 415720 2336484    0    0     0    28    9  115  0  0  99  1
 0  0 244644  29140 415720 2336484    0    0     0    24   62  256  0  0 100  0
 0  0 244644  29140 415720 2336484    0    0     0     0    5   93  0  0 100  0
 0  0 244644  29140 415720 2336484    0    0     0     0   58  255  0  0 100  0
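
To turn output like the above into numbers for the CSR formula, the columns can be parsed by name. The Python sketch below embeds a copy of the sample rows shown above; the function names parse_vmstat and average are made up for this example. It skips the first data row when averaging, since the first line vmstat prints reports averages since boot rather than a 1-second sample.

```python
VMSTAT_OUTPUT = """\
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy  id wa
 0  0 244644  29156 415720 2336484    0    0     1    49    2    1  1  0  98  0
 0  0 244644  29140 415720 2336484    0    0     0    28    9  115  0  0  99  1
 0  0 244644  29140 415720 2336484    0    0     0    24   62  256  0  0 100  0
 0  0 244644  29140 415720 2336484    0    0     0     0    5   93  0  0 100  0
 0  0 244644  29140 415720 2336484    0    0     0     0   58  255  0  0 100  0
"""


def parse_vmstat(text):
    """Return one dict per vmstat sample, keyed by column name."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    # The column header is the line whose first two fields are "r" and "b".
    header_index = next(i for i, f in enumerate(lines) if f[:2] == ["r", "b"])
    names = lines[header_index]
    rows = []
    for fields in lines[header_index + 1:]:
        # Keep only purely numeric rows with the expected number of columns.
        if len(fields) == len(names) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(names, (int(f) for f in fields))))
    return rows


def average(rows, column):
    """Mean of one column over the given samples."""
    return sum(row[column] for row in rows) / len(rows)


rows = parse_vmstat(VMSTAT_OUTPUT)
steady = rows[1:]  # drop the since-boot averages in the first row
print("average in:", average(steady, "in"))
print("average cs:", average(steady, "cs"))
```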

 

 

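Interrupt Rate includes the timer interrupts the kernel uses for process time slicing. (In Linux 2.6, the system timer interrupts once every 1 millisecond; the timer frequency is expressed by the HZ macro, defined as 1000, i.e. 1000 interrupts per second. The value differs between systems and kernel configurations; 100 and 250 are both in use.)

The kernel's timer frequency can be found with the following command: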

cat /boot/config-`uname -r` | grep '^CONFIG_HZ='

CONFIG_HZ=100

Total timer interrupts per second = number of CPUs * cores per CPU * CONFIG_HZ
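
The lookup and the formula can be combined in a short Python sketch. The names parse_config_hz and timer_interrupts_per_second are hypothetical, the CONFIG_SAMPLE text is a made-up excerpt in the style of a kernel config file, and the 1 CPU with 4 cores is an assumption chosen for the example. On kernels built with tickless options the real number may be lower, so treat the result as an estimate.

```python
def parse_config_hz(config_text):
    """Return the integer value of CONFIG_HZ from kernel config text, or None."""
    for line in config_text.splitlines():
        # Match only "CONFIG_HZ=", not variants such as "CONFIG_HZ_100=y".
        if line.startswith("CONFIG_HZ="):
            return int(line.split("=", 1)[1])
    return None


def timer_interrupts_per_second(num_cpus, cores_per_cpu, hz):
    """Apply the formula: CPUs * cores per CPU * CONFIG_HZ."""
    return num_cpus * cores_per_cpu * hz


# Made-up excerpt; a real file would be /boot/config-<kernel release>.
CONFIG_SAMPLE = """\
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
CONFIG_HZ=100
"""

hz = parse_config_hz(CONFIG_SAMPLE)
print("CONFIG_HZ:", hz)
print("timer interrupts/s (assuming 1 CPU x 4 cores):",
      timer_interrupts_per_second(1, 4, hz))
```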

 

cat /proc/interrupts

          CPU0       CPU1       CPU2       CPU3       
LOC:   97574747   52361843  105207680   69447653   Local timer interrupts
RES:     107368     257510      98635     186294   Rescheduling interrupts
CAL:      14174      14206      14164        194   function call interrupts
TLB:    1007949     853117     992546     591410   TLB shootdowns

This shows the interrupt types and how many times each has occurred.
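
The per-CPU counters can be summed per interrupt type with a small Python sketch. It embeds a copy of the sample rows shown above; the function name parse_interrupts is made up for this example. The counters are cumulative since boot, so a rate would need two snapshots taken a known interval apart and the difference divided by that interval.

```python
INTERRUPTS_SAMPLE = """\
          CPU0       CPU1       CPU2       CPU3
LOC:   97574747   52361843  105207680   69447653   Local timer interrupts
RES:     107368     257510      98635     186294   Rescheduling interrupts
CAL:      14174      14206      14164        194   function call interrupts
TLB:    1007949     853117     992546     591410   TLB shootdowns
"""


def parse_interrupts(text):
    """Return {interrupt label: total count across all CPUs}."""
    lines = [line for line in text.splitlines() if line.strip()]
    num_cpus = len(lines[0].split())  # header row lists CPU0, CPU1, ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        # The first num_cpus fields are counters; the rest is a description.
        counts = [int(f) for f in rest.split()[:num_cpus] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


for label, total in parse_interrupts(INTERRUPTS_SAMPLE).items():
    print(label, total)
```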

 

Postscript:

I found an article on Linux performance monitoring and am posting it here for reference:

http://blog.csdn.net/tianlesoftware/archive/2011/02/21/6198780.aspx

 

https://my.oschina.net/tantexian/blog/648911

