Description of some main indicators of Linux server CPU monitored by LoadRunner

Main indicators of CPU:

CPU Utilization

CPU utilization is the percentage of time the CPU spends doing work; as a rule of thumb, sustained utilization above 75% is considered high.

At any time, the CPU is in one of 7 states:
1. idle, meaning the CPU is idle and waiting for work to be assigned.
2. user, meaning the CPU is running user processes.
3. system, meaning the CPU is doing kernel work.
4. nice, the time the CPU spends on user processes whose priority has been lowered with the nice command, i.e. a positive nice value (note: some tools also count this time under user and system time, so the reported percentages may add up to more than 100%).
5. iowait, the time the CPU spends waiting for I/O operations to complete.
6. irq, the time the CPU spends servicing hardware interrupts.
7. softirq, the time the CPU spends servicing soft interrupts.
We generally use vmstat to look at four of these states: us, sy, id and wa. Combined with the load average, this gives a basic picture of the CPU's condition.

Most performance tools express CPU time as a percentage. When system time is high, a profiler such as oprofile can show where that time is being spent. When iowait is high, you need to look at your I/O devices, such as disks and network cards.
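For example, a minimal way to sample these states with vmstat (the columns under "cpu" are us, sy, id, wa and, on newer versions, st):

# print 5 reports at 2-second intervals; the us/sy/id/wa columns under
# "cpu" correspond to user, system, idle and iowait time
vmstat 2 5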

 

Load Average

Load average is the average number of processes that were in the runnable ("ready") state over the last minute.

How should you understand load? Think of a road with N lanes: if N cars enter, each gets its own lane; if one more car arrives, it has to wait until a lane is freed.
For a CPU, the number of tasks it can process in parallel is "number of CPUs * number of cores". If the load equals number of CPUs * number of cores, the CPU is running exactly at full capacity; if it is even a little higher, something is wrong and some task cannot be assigned to a processor in time. To keep performance healthy, it is best to stay below number of CPUs * number of cores * 0.7.


Load Average measures the load on the CPU: it is a statistic, taken over a period of time, of the number of processes the CPU is currently handling plus the number waiting for the CPU, in other words the length of the CPU's run queue.

The value of Load Average should therefore be less than "number of CPUs * number of cores * 0.7"; anything above that is considered too high.

For example:
a 1-core CPU, Load Average < 1 * 1 * 0.7 = 0.7;
a 4-core CPU, Load Average must be < 1 * 4 * 0.7 = 2.8.
View CPU information: grep 'model name' /proc/cpuinfo
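As a rough sketch (assuming nproc and awk are available; note that nproc counts logical CPUs, which on hyper-threaded machines is more than physical CPUs * cores):

# number of logical processors
nproc
# equivalently, count the processor entries in /proc/cpuinfo
grep -c '^processor' /proc/cpuinfo
# rule-of-thumb ceiling from the text: logical CPUs * 0.7
awk -v n="$(nproc)" 'BEGIN { printf "load should stay below %.1f\n", n * 0.7 }'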

This data also shows up in vmstat: the r column is the length of the run queue, i.e. the runnable processes that feed into the load average.


In addition, top sums the CPU usage across all cores, so on a multi-core machine the reported CPU usage can exceed 100%.
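For reference, in an interactive top session you can toggle between the aggregated view and a per-core breakdown (behaviour assumed for a reasonably recent procps top):

# start top, then press "1" to switch between one combined CPU line and one
# line per core; a busy multi-threaded process can still show %CPU above 100%
top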

In Linux, a process has two states:
1. runnable
2. blocked, waiting for an event to complete
A blocked process may be waiting for data from an I/O operation or for the result of a system call.
A process in the runnable state is waiting for CPU time along with the other runnable processes; it does not get the CPU immediately, and it consumes no CPU time until the Linux scheduler picks it from the run queue as the next process to execute.
Runnable processes waiting for CPU time form a queue called the run queue; the larger the run queue, the longer the wait.
Performance tools usually display the number of runnable processes and the number of blocked processes.
Another very common system metric is the load average. The system load is the sum of the running and runnable processes.
For example, if two processes are running and three are waiting to run (runnable), the system load is five.
The load average is this load averaged over a specified period; the three numbers it reports are the averages over the last 1, 5 and 15 minutes.
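These three averages can be read directly, for example:

# the three numbers after "load average:" are the 1, 5 and 15 minute values
uptime
# /proc/loadavg shows the same three averages, followed by
# runnable/total scheduling entities and the most recently created PID
cat /proc/loadavg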

 

Interrupt Rate


The number of device interrupts per second, i.e. the interrupt requests raised by hardware devices (through their drivers) and received by the CPU.

An interrupt is typically triggered as follows:

When a device has an event that needs the kernel's attention. For example, when a disk controller has fetched a block from disk and the kernel needs to read and use that block, the disk controller triggers an interrupt. For every interrupt the kernel receives, an interrupt handler runs if one is registered for that interrupt; otherwise the interrupt is ignored.
Interrupt handlers have very high priority in the system and execute very quickly.
Often the processing does not actually require such a high priority, which is why soft-interrupt (softirq) handlers also exist.

If there are many interrupts, the kernel will take a lot of time to handle them.

You can check /proc/interrupts to know which CPU the interrupt occurred on.
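For example, to watch the per-CPU interrupt counters change in real time (assuming the watch utility is installed):

# refresh /proc/interrupts every second and highlight the counters that changed
watch -n 1 -d cat /proc/interrupts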

 

The interrupt rate includes the kernel's periodic time-slice (clock) interrupts.
In Linux 2.6 the system clock fires HZ timer interrupts per second; with HZ=1000 that is one interrupt every millisecond, i.e. 1,000 interrupts per second.
Different systems and kernel configurations use different values, including 100 and 250.

The kernel's clock frequency can be found with the following command:

cat /boot/config-`uname -r` | grep '^CONFIG_HZ='

CONFIG_HZ=100

Total clock interrupts per second = number of CPUs * number of cores * CONFIG_HZ
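A small sketch that plugs the numbers in (note that modern tickless kernels, CONFIG_NO_HZ, fire fewer timer interrupts when idle, so this is only an upper estimate):

# read the kernel tick rate and multiply by the number of logical CPUs
HZ=$(grep '^CONFIG_HZ=' /boot/config-$(uname -r) | cut -d= -f2)
echo "expected clock interrupts per second: $(( $(nproc) * HZ ))"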

cat /proc/interrupts shows the type and count of each interrupt:

          CPU0       CPU1       CPU2       CPU3      
LOC:   97574747   52361843  105207680   69447653   Local timer interrupts
RES:     107368     257510      98635     186294   Rescheduling interrupts
CAL:      14174      14206      14164        194   function call interrupts
TLB:    1007949     853117     992546     591410   TLB shootdowns

The in (interrupts) column reported by vmstat is this parameter.


Context Switch Rate

Most modern CPUs can run only one process at a time.
Some CPUs, such as those with hyper-threading technology, can run more than one process simultaneously, but Linux treats such a CPU as multiple single-threaded CPUs.
The Linux kernel constantly switches between different processes, creating the illusion that a single CPU is processing multiple tasks at the same time.
Switching between different processes is called a context switch.
When the system performs a context switch, the CPU saves all of the old process's context and loads all of the new process's context.
The context includes a large amount of per-process state that Linux tracks, in particular resources such as:
which process is executing, what memory it has been allocated, which files it has opened, and so on.
A context switch moves a large amount of information around, so it is relatively expensive.
Keep context switches to a minimum wherever possible.

To minimize context switches, you first need to know how they are generated.
First, kernel scheduling triggers context switches. To ensure that every process gets a fair share of CPU time, the kernel periodically interrupts the running process and, if appropriate, the scheduler starts another process instead of letting the current one continue. Each of these periodic (scheduler) interrupts may trigger a context switch.
The number of scheduler interrupts per second varies across architectures and kernel versions.
A simple way to get the number of interrupts per second is to monitor the /proc/interrupts file, as in the following example:
[root@localhost asm-i386]# cat /proc/interrupts | grep timer ; sleep 10 ; cat /proc/interrupts | grep timer
  0:   24060043   XT-PIC  timer
  0:   24070093   XT-PIC  timer
The change in the timer count over the 10-second interval shows that roughly 1,000 timer interrupts are generated per second ((24070093 - 24060043) / 10 ≈ 1005).
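The same measurement can be scripted; a minimal sketch (it reads the first line matching "timer"; on multi-core systems that may be the legacy timer line, with the LOC row carrying the per-CPU local timer counts instead):

# sample the timer interrupt counter twice, 10 seconds apart,
# and print the average number of timer interrupts per second
t1=$(awk '/timer/ {print $2; exit}' /proc/interrupts)
sleep 10
t2=$(awk '/timer/ {print $2; exit}' /proc/interrupts)
echo $(( (t2 - t1) / 10 ))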
If your context switch rate is much higher than the timer interrupt rate, the context switches are more likely caused by I/O requests or other long-running system calls (such as sleep).
When an application requests an operation that cannot be completed immediately, the kernel starts a context switch:
it saves the state of the requesting process and switches to another runnable process. This keeps the CPU busy.
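If the sysstat package is installed, pidstat can help tell these cases apart: it reports, per process, voluntary context switches (the task blocked, typically on I/O or a sleep) and involuntary ones (the scheduler preempted the task):

# report per-process context switches once per second, 5 reports:
# cswch/s   = voluntary switches (the task blocked waiting for a resource)
# nvcswch/s = non-voluntary switches (the task was preempted)
pidstat -w 1 5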

 

The context switch rate generally has two components:
interrupts and process (including thread) switches. An interrupt causes a switch, and creating or activating a process (or thread) also causes a switch.
The context switch rate is also related to TPS (Transactions Per Second). Assuming each call causes N context switches, we get

     Context Switch Rate = Interrupt Rate + TPS * N

CSR minus IR gives the process/thread switches. If the main process receives a request and hands it to a worker thread, and the thread finishes and returns the result to the main process, that is two switches. You can also plug the measured values of CSR, IR and TPS into the formula to get the number of switches caused by each transaction. To reduce CSR, therefore, you have to work on the switches caused by each transaction: CSR only comes down when N comes down. Ideally N = 0, but in any case, if N >= 4 you should examine it carefully.
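For example, a quick back-of-the-envelope calculation with purely illustrative numbers:

# hypothetical measurements: cs and in from vmstat, TPS from LoadRunner
CSR=3500   # context switches per second
IR=1000    # interrupts per second
TPS=500    # transactions per second
# N = (CSR - IR) / TPS, the switches caused by each transaction
awk -v csr="$CSR" -v ir="$IR" -v tps="$TPS" 'BEGIN { printf "N = %.1f\n", (csr - ir) / tps }'

With these numbers N = 5, which by the rule above (N >= 4) would warrant a closer look.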

The cs (context switch) column in vmstat shows this parameter.



Origin blog.csdn.net/ghj1976/article/details/6129318