Linux performance optimization (10)-CPU performance analysis tool

1. CPU performance indicators

1. CPU usage

CPU usage is the most common performance indicator. It describes non-idle time as a percentage of total CPU time. Depending on the kind of work the CPU is doing, it is broken down into user CPU, system CPU, I/O wait (iowait), soft interrupt (softirq), hard interrupt (irq), and so on.
User CPU usage covers both user-mode usage (user) and low-priority user-mode usage (nice), and represents the percentage of time the CPU runs in user mode. High user CPU usage usually means that an application is busy.
System CPU usage is the percentage of time the CPU runs in kernel mode, excluding interrupts. High system CPU usage means that the kernel is busy.
I/O wait CPU usage (iowait) is the percentage of time spent waiting for I/O. High iowait usually means that I/O between the system and a hardware device is taking a long time.
Soft interrupt and hard interrupt CPU usage are the percentages of time the kernel spends running soft interrupt handlers and hard interrupt handlers, respectively. High interrupt CPU usage usually means that the system is handling a large number of interrupts.
Steal CPU usage (steal) and guest CPU usage (guest) are, respectively, the percentage of CPU time taken by other virtual machines and the percentage of CPU time spent running guest virtual machines.
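
As a rough illustration of how these percentages can be derived, the sketch below parses the aggregate "cpu" line of /proc/stat and computes the share of each state between two samples. The helper names (parse_cpu_line, cpu_percentages, read_cpu_sample) and the sample numbers are made up for this example; tools such as top may compute their figures differently in detail.

```python
# Field order of the aggregate "cpu" line in /proc/stat (values are clock ticks).
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]


def parse_cpu_line(line):
    """Turn a 'cpu ...' line from /proc/stat into a {field: ticks} dict."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer columns; treat the missing ones as 0.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of the sampling interval spent in each state."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    # guest and guest_nice are already counted inside user and nice,
    # so they are left out of the total to avoid double counting.
    total = sum(delta[k] for k in FIELDS if k not in ("guest", "guest_nice"))
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * delta[k] / total for k in FIELDS}


def read_cpu_sample(path="/proc/stat"):
    """Read one live sample (Linux only); the first line is the aggregate."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


# Hypothetical samples taken some seconds apart.
before = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0")
after = parse_cpu_line("cpu  160 0 70 900 60 5 5 0 0 0")
usage = cpu_percentages(before, after)
print({k: round(v, 1) for k, v in usage.items() if v})
```

On a live system, two calls to read_cpu_sample() separated by a short sleep would replace the hard-coded samples.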

2. CPU load average

The load average is the average number of active processes in the system, that is, processes that are running, runnable, or in uninterruptible sleep. It reflects the overall load of the system and is reported as three values: the averages over the past 1 minute, 5 minutes and 15 minutes. Ideally, the load average equals the number of logical CPUs, which means that every CPU is exactly fully utilized. A load average greater than the number of logical CPUs means that the system is overloaded.
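
The comparison against the number of logical CPUs can be sketched as follows. The function names and the "overloaded" / "within capacity" labels are illustrative choices, not part of any tool; os.getloadavg() and os.cpu_count() are standard library calls, and the former is only available on Unix-like systems.

```python
import os


def load_per_cpu(load_avg, n_cpus):
    """Load average normalised by the number of logical CPUs."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return load_avg / n_cpus


def describe_load(load_avg, n_cpus):
    """Apply the rule of thumb: more load than logical CPUs means overload."""
    if load_per_cpu(load_avg, n_cpus) > 1.0:
        return "overloaded"
    return "within capacity"


def current_load_report():
    """Describe the live 1, 5 and 15 minute load averages (Unix only)."""
    n_cpus = os.cpu_count() or 1  # cpu_count() may return None
    return [(load, describe_load(load, n_cpus)) for load in os.getloadavg()]


# Hypothetical machine with 4 logical CPUs.
print(describe_load(6.0, 4))
print(describe_load(2.0, 4))
```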

3. Process context switching

Process context switches are either voluntary, when a process cannot obtain a resource it needs, or involuntary, when the scheduler forcibly preempts it. Context switching is a core mechanism that keeps Linux running normally, but too many context switches spend CPU time on saving and restoring registers, the kernel stack, virtual memory and other state instead of on the process itself. This shortens the time the process actually runs and can become a performance bottleneck.
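
On Linux, the two kinds of context switch are exposed per process in /proc/<pid>/status as voluntary_ctxt_switches and nonvoluntary_ctxt_switches. The following is a minimal sketch of reading them; the function name and the sample text are invented for the example.

```python
def parse_ctxt_switches(status_text):
    """Extract the context switch counters from /proc/<pid>/status text."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            counts[key] = int(value.strip())
    return counts


# Hypothetical excerpt of a status file.
sample = """Name:\tsysbench
State:\tR (running)
voluntary_ctxt_switches:\t120
nonvoluntary_ctxt_switches:\t4500
"""
counts = parse_ctxt_switches(sample)
print(counts)
```

For a live process, the text would come from reading the file /proc/<pid>/status. Sampling it twice and subtracting gives the number of switches in the interval.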

4. CPU cache hit rate

CPU development has far outpaced memory development, so the CPU processes data much faster than memory can deliver it. Whenever the CPU accesses memory, it has to wait for the memory to respond.
The CPU cache sits between the CPU and memory in speed and holds hot memory data. To cope with a growing volume of hot data, the cache is split by size into three levels: L1, L2 and L3. L1 and L2 are usually private to a single core, while L3 is shared among cores. From L1 to L3, size increases and speed decreases. The CPU cache hit rate measures how well the cache is reused: the higher the hit rate, the better the performance.
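
The hit rate itself is simple arithmetic once reference and miss counts are available. The sketch below assumes the counts come from hardware counters, for example the cache-references and cache-misses events reported by perf stat; the function name and the numbers are illustrative.

```python
def cache_hit_rate(references, misses):
    """Fraction of cache references that were served from the cache."""
    if references <= 0:
        raise ValueError("references must be positive")
    return (references - misses) / references


# Hypothetical counter values: 1,000,000 references, 50,000 misses.
rate = cache_hit_rate(1_000_000, 50_000)
print(f"{rate:.1%}")
```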

2. CPU performance analysis tools

You can use tools such as top, vmstat, pidstat, dstat, sar, strace and perf to view CPU performance indicators.
There are many CPU performance indicators, and many of them are correlated. For example, when user CPU usage is high, investigate what the process does in user mode rather than in kernel mode, because user CPU usage reflects only user-mode CPU time, while kernel-mode CPU time is counted as system CPU usage.
The top output shows the various CPU usage figures, zombie processes, the load average and other information. The vmstat output shows the number of context switches, the number of interrupts, and the number of processes in the running and uninterruptible states.
The pidstat output shows per-process user CPU usage, system CPU usage, and voluntary and involuntary context switches.
An increase in a process's user CPU usage in the pidstat output raises the user CPU usage shown by top. So when top shows a problem with user CPU usage, compare it with the pidstat output to see whether a particular process is responsible. Once you have identified the process causing the performance problem, analyze its behavior with process analysis tools: for example, use strace to examine its system calls and perf to examine the functions executed at each level of the call chain.
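
To make the per-process user/system split concrete, the sketch below derives it from the utime and stime fields (fields 14 and 15) of /proc/<pid>/stat. This is only an approximation of the kind of figure pidstat reports, not its actual implementation. The function names and sample lines are invented, and the default of 100 clock ticks per second is an assumption; the real value can be read with os.sysconf("SC_CLK_TCK").

```python
def parse_pid_stat(stat_line):
    """Return (utime, stime) in clock ticks from a /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces or parentheses itself, so split only after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11], rest[12].
    return int(rest[11]), int(rest[12])


def cpu_split(before, after, elapsed_seconds, ticks_per_second=100):
    """User and system CPU percentages over an interval (100 = one full CPU)."""
    scale = 100.0 / (ticks_per_second * elapsed_seconds)
    user = (after[0] - before[0]) * scale
    system = (after[1] - before[1]) * scale
    return user, system


# Hypothetical samples of the same process taken 2 seconds apart.
line_a = "1234 (my proc) S 1 1234 1234 0 -1 4194304 100 0 0 0 250 50 0 0 20 0 1 0"
line_b = "1234 (my proc) S 1 1234 1234 0 -1 4194304 100 0 0 0 350 75 0 0 20 0 1 0"
user_pct, system_pct = cpu_split(parse_pid_stat(line_a),
                                 parse_pid_stat(line_b),
                                 elapsed_seconds=2)
print(user_pct, system_pct)
```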
When the load average in the top output rises, compare it with the numbers of running and uninterruptible processes reported by vmstat to see which kind of process caused the increase. If the number of uninterruptible processes has grown, do I/O analysis with tools such as dstat or sar. If the number of running processes has grown, use top and pidstat to find out which processes are in the running state, then analyze them further with process analysis tools.
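
The decision described above can be sketched as a small function. It reads the procs_running and procs_blocked lines of /proc/stat, which count running and uninterruptible (blocked) processes. The function names, the sample text and the exact tie-breaking rule are assumptions made for this example.

```python
def parse_proc_counts(stat_text):
    """Pick procs_running and procs_blocked out of /proc/stat text."""
    wanted = ("procs_running", "procs_blocked")
    counts = {}
    for line in stat_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in wanted:
            counts[parts[0]] = int(parts[1])
    return counts


def next_step(before, after):
    """Suggest where to look next when the load average has risen."""
    running = after["procs_running"] - before["procs_running"]
    blocked = after["procs_blocked"] - before["procs_blocked"]
    if blocked > 0 and blocked >= running:
        return "io"       # more uninterruptible processes: analyze I/O
    if running > 0:
        return "cpu"      # more running processes: find them with top/pidstat
    return "unclear"


# Hypothetical snapshots taken before and after the load increase.
snap_a = parse_proc_counts("ctxt 1000\nprocs_running 2\nprocs_blocked 0\n")
snap_b = parse_proc_counts("ctxt 9000\nprocs_running 3\nprocs_blocked 6\n")
print(next_step(snap_a, snap_b))
```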
When the soft interrupt CPU usage in the top output rises, check how the counts of each soft interrupt type change in /proc/softirqs to determine which type is responsible. If the cause turns out to be network receive soft interrupts, for example, continue with the network analysis tools sar and tcpdump.
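
Watching how /proc/softirqs changes amounts to taking two snapshots and subtracting them per soft interrupt type. The following sketch shows one way to do that; the function names and the two-CPU sample snapshots are invented for illustration.

```python
def parse_softirqs(text):
    """Sum each soft interrupt type across all CPUs in /proc/softirqs text."""
    totals = {}
    for line in text.splitlines()[1:]:  # the first line is the CPU header
        name, sep, counts = line.partition(":")
        if not sep:
            continue
        totals[name.strip()] = sum(int(c) for c in counts.split())
    return totals


def softirq_deltas(before, after):
    """Per-type increase between two snapshots, largest first."""
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Hypothetical snapshots taken a few seconds apart on a 2-CPU machine.
first = """                    CPU0       CPU1
          HI:          0          0
       TIMER:       1000       1200
      NET_RX:        500        300
"""
second = """                    CPU0       CPU1
          HI:          0          0
       TIMER:       1010       1215
      NET_RX:        900        800
"""
ranking = softirq_deltas(parse_softirqs(first), parse_softirqs(second))
print(ranking)
```

In this made-up case NET_RX grows fastest, which is the situation where the text suggests moving on to sar and tcpdump.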

Origin blog.51cto.com/9291927/2594169