CPU Utilization is easy to understand: the CPU is considered busy when utilization exceeds 75% (some say 80% or higher). This indicator should be read together with Load Average and Context Switch Rate, because a high CPU may be caused by high values of the latter two.
Load Average is harder to interpret. I searched around the Internet and found a few reasonable explanations. In my 100-concurrent-user test the two values were 77.534% and 6.108: CPU utilization was fairly high, and the Load Average also seemed a bit high. Later I found two helpful blog posts. From "Understand Load Average and do a stress test": "Load Average is the load on the CPU. The information it carries is not CPU usage, but a statistic, over a period of time, of the number of processes the CPU is handling plus the number waiting to be handled; in other words, a statistic of the length of the CPU run queue." This matches how multi-process and multi-threaded programs behave. "Understanding Linux CPU Load averages" (a translation) boils it down to one sentence:
Load Average < number of CPUs * number of cores * 0.7
For example, a 1-core CPU should have Load Average < 1 * 1 * 0.7 = 0.7; a 4-core CPU should have Load Average < 1 * 4 * 0.7 = 2.8.
View CPU information: grep 'model name' /proc/cpuinfo
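Putting the rule into a command: a minimal shell sketch (assuming a Linux box where `nproc` is available) that counts logical CPUs and prints the 0.7-per-core Load Average threshold described above:

```shell
# Count logical CPUs (nproc covers sockets * cores * threads)
# and apply the 0.7-per-core rule of thumb.
cores=$(nproc)
threshold=$(awk -v c="$cores" 'BEGIN { printf "%.1f", c * 0.7 }')
echo "logical CPUs: $cores, Load Average should stay below $threshold"
```

Compare the printed threshold against the 1/5/15-minute values reported by `uptime` or `top`.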
Context Switch Rate is the rate of process (and thread) switching. Too many switches keep the CPU busy switching instead of doing useful work, which hurts throughput. Section 2 of the article "High-Performance Server Architecture" discusses this problem. How much is appropriate? I googled a lot, but found no definite answer. Context switches generally come from two sources: interrupts and process (including thread) switches. An interrupt causes a switch, and creating or activating a process (thread) also causes switches. The context switch rate is also related to TPS (Transactions Per Second). Assuming each transaction causes N switches, we get
Context Switch Rate = Interrupt Rate + TPS * N
CSR minus IR gives the process/thread switch rate. If a main process receives a request, hands it to a worker thread, and the thread returns the result to the main process when finished, that accounts for 2 switches. You can also plug measured values of CSR, IR, and TPS into the formula to get the number of switches each transaction causes. To reduce CSR, therefore, you must reduce the switches each transaction induces: only by lowering N can CSR drop. Ideally N = 0; in any case, if N >= 4 it deserves careful inspection. As for the CSR < 5000 threshold mentioned online, I don't think the standard should be that absolute.
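To make the rearranged formula concrete, here is a small shell sketch; the CSR, IR, and TPS numbers below are hypothetical placeholders, not measurements from the test above:

```shell
# Hypothetical values; substitute your own from vmstat (in, cs) and the load-test report (TPS).
CSR=6500   # context switches per second
IR=1000    # interrupts per second
TPS=1200   # transactions per second
# Rearranging CSR = IR + TPS * N gives N = (CSR - IR) / TPS.
awk -v csr="$CSR" -v ir="$IR" -v tps="$TPS" \
    'BEGIN { printf "switches per transaction N = %.2f\n", (csr - ir) / tps }'
```

With these placeholder numbers N comes out around 4.6, which by the N >= 4 rule above would warrant a closer look at how each request is handed between threads.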
Other information:
All three indicators can be monitored in LoadRunner; on Linux you can also use vmstat to view r (the run queue, from which Load Average is computed), in (interrupts), and cs (context switches):
#vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 244644 29156 415720 2336484 0 0 1 49 2 1 1 0 98 0
0 0 244644 29140 415720 2336484 0 0 0 28 9 115 0 0 99 1
0 0 244644 29140 415720 2336484 0 0 0 24 62 256 0 0 100 0
0 0 244644 29140 415720 2336484 0 0 0 0 5 93 0 0 100 0
0 0 244644 29140 415720 2336484 0 0 0 0 58 255 0 0 100 0
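The cs column of that output can also be averaged directly; a sketch assuming the column layout shown above (cs is field 12) and skipping the first sample, which reports averages since boot:

```shell
# Average cs over 1-second samples; NR > 3 skips the two header lines
# and the first sample row (cumulative since boot).
vmstat 1 5 | awk 'NR > 3 { sum += $12; n++ } END { if (n) printf "avg cs/sec = %.0f\n", sum / n }'
```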
The Interrupt Rate includes the kernel's timer interrupts for process time slices. (In Linux 2.6 the system clock interrupts every 1 millisecond; the tick frequency is given by the HZ macro, defined as 1000, i.e. 1000 interrupts per second. This varies with the system and kernel configuration; kernels built with 100 or 250 also exist.)
The kernel's tick frequency can be found with the following command:
cat /boot/config-`uname -r` | grep '^CONFIG_HZ='
CONFIG_HZ=100
Total clock interrupts per second = number of CPUs * cores per CPU * CONFIG_HZ
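That multiplication can be scripted; a sketch assuming /boot/config-$(uname -r) exists (falling back to 1000 ticks/sec when it does not):

```shell
# Read CONFIG_HZ from the running kernel's config; fall back to 1000 if the file is absent.
cfg="/boot/config-$(uname -r)"
hz=$(grep '^CONFIG_HZ=' "$cfg" 2>/dev/null | cut -d= -f2)
hz=${hz:-1000}
cpus=$(nproc)   # logical CPUs = sockets * cores (* hardware threads)
echo "estimated timer interrupts/sec = $((cpus * hz))"
```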
cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
LOC: 97574747 52361843 105207680 69447653 Local timer interrupts
RES: 107368 257510 98635 186294 Rescheduling interrupts
CAL: 14174 14206 14164 194 function call interrupts
TLB: 1007949 853117 992546 591410 TLB shootdowns
This shows the type and count of each interrupt.
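For example, to total one interrupt type across all CPUs, a small awk sketch over /proc/interrupts (here summing the LOC row shown above):

```shell
# Sum the per-CPU counts on the LOC (local timer) row; only numeric fields
# are added, so the trailing description text is ignored.
awk '$1 == "LOC:" { s = 0; for (i = 2; i <= NF; i++) if ($i ~ /^[0-9]+$/) s += $i; print "LOC total:", s }' /proc/interrupts
```

Reading the file twice a few seconds apart and differencing the totals gives the per-type interrupt rate.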
Postscript:
I found an article on Linux performance monitoring; the links are posted here for reference:
http://blog.csdn.net/tianlesoftware/archive/2011/02/21/6198780.aspx
https://my.oschina.net/tantexian/blog/648911