Linux performance optimization (11)-principle of CPU performance optimization

One, CPU context switch

1. CPU context

Linux is a multitasking operating system that can run far more tasks than it has CPUs. Before each task runs, the CPU needs to know where the task is loaded and which instruction to execute first; that is, the system must set up the CPU registers and the Program Counter (PC) in advance. CPU registers are small but extremely fast storage built into the CPU, and the program counter holds the position of the instruction currently being executed or of the next instruction to be executed. The CPU registers and the program counter form the environment that must be in place before the CPU can run any task; this environment is the CPU context.
The CPU context is saved in the kernel and loaded again when the task is rescheduled, so that the task is not affected and resumes from its original state, keeping multitasking running continuously.
A CPU context switch first saves the CPU context (registers and program counter) of the previous task, then loads the context of the new task into the CPU registers and program counter, and finally jumps to the new location the program counter points to and runs the new task.
Depending on the kind of task, CPU context switches are divided into process context switches, thread context switches, and interrupt context switches.

2. Process context switching

Linux divides a process's running space into kernel space and user space according to privilege level, corresponding to Ring 0 and Ring 3 of the CPU. Kernel space (Ring 0) has the highest privilege and can access all resources directly; user space (Ring 3) can only access restricted resources and cannot directly access hardware resources such as memory, so it must trap into the kernel through a system call in order to access privileged resources.
A process can run in both user space and kernel space. When it runs in user space it is in the user mode of the process; when it traps into kernel space it is in the kernel mode of the process.
Switching from user mode to kernel mode is done through a system call, which triggers a CPU context switch (a privileged-mode switch); another context switch happens when the process returns from kernel mode to user mode. Each such switch costs tens of nanoseconds to several microseconds of CPU time. If switches are frequent, CPU time is wasted saving and restoring registers, kernel stacks, virtual memory, and other resources, which drives up the system's average load.

3. Thread context switching

A thread is the basic unit of scheduling, while a process is the basic unit of resource ownership: the kernel schedules threads, and the process provides threads with resources such as virtual memory and global variables. When a process has multiple threads, those threads share the process's virtual memory, global variables, and other resources; shared resources do not need to be touched during a context switch, and only the thread's private data, such as its stack and registers, must be saved. Thread context switching falls into two cases. If the two threads belong to different processes, no resources are shared, and the switch is the same as a process context switch. If the two threads belong to the same process, the shared virtual memory and other resources stay in place, and only the thread's private data, registers, and other non-shared state need to be switched. Switching between threads of the same process therefore consumes fewer resources than switching between processes.

4. Interrupt context switching

To respond to hardware events quickly, interrupt handling suspends the normal scheduling and execution of processes and calls an interrupt handler to respond to the device event. When a process is interrupted, its current state must be saved so that it can resume from that state after the interrupt has been handled. An interrupt context switch does not involve a process's user mode, so even when an interrupt preempts a process that is running in user mode, there is no need to save or restore user-mode resources such as the process's virtual memory and global variables. The interrupt context contains only the state needed by the kernel-mode interrupt service routine: CPU registers, the kernel stack, hardware interrupt parameters, and so on.
On the same CPU, interrupt handling has a higher priority than processes, so an interrupt context switch never happens at the same time as a process context switch. Because interrupts break normal process scheduling and execution, most interrupt handlers are kept short and simple so that they finish as quickly as possible. Interrupt context switches still consume CPU, and too many of them can eat a large amount of CPU time and seriously degrade overall system performance.

5. System call

During a system call, the user-mode instruction position held in the CPU registers must be saved first; then, to execute kernel-mode code, the registers are updated to the location of the kernel-mode instructions and execution jumps into the kernel to run the kernel task; after the system call finishes, the registers are restored to the saved user-mode state and execution switches back to user space so the process can continue. One system call therefore involves two CPU context switches. A system call does not touch user-mode process resources such as virtual memory, and it does not switch processes.
A process context switch means switching from one process to another, whereas the same process runs throughout a system call, so a system call is usually called a privileged-mode switch rather than a context switch, even though CPU context switching is unavoidable during it. Processes are managed and scheduled by the kernel, and a process switch can only happen in kernel mode. A process's context includes not only user-space resources such as virtual memory, the stack, and global variables, but also kernel-space state such as the kernel stack and registers. A process context switch therefore involves one more step than a system call: before the current process's kernel state and CPU registers are saved, its virtual memory and stack must be saved; and after the next process's kernel state is loaded, its virtual memory and user stack must be refreshed.
Process context switching is carried out by the kernel running on the CPU.
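As a rough illustration (a hedged sketch, not part of the original text's tooling), strace and perf can make both effects visible; ls is only an example workload, and both tools are assumed to be installed.
# Count the system calls a command issues; each one implies two user/kernel mode switches.
strace -c -f ls /tmp > /dev/null
# Count the process context switches and CPU migrations the same command incurs.
perf stat -e context-switches,cpu-migrations ls /tmp > /dev/null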
Each context switch takes tens of nanoseconds to several microseconds of CPU time. When process context switches are frequent, the CPU can easily end up spending a large share of its time saving and restoring registers, kernel stacks, virtual memory, and other resources, which shortens the time actually spent running processes.
Linux manages the mapping between virtual memory and physical memory through the TLB (Translation Lookaside Buffer). When virtual memory is updated, the TLB must be flushed as well, and memory access slows down accordingly. On a multi-processor system, the L3 cache is shared by multiple processors, so flushing cached state affects not only the process on the current processor but also processes on the other processors that share the cache.
A context switch is needed only when switching processes, that is, only when a process is scheduled. Linux maintains a ready queue for each CPU, sorts the active processes (those running or waiting for the CPU) by priority and by how long they have waited for the CPU, and then picks the process that needs the CPU most: the one with the highest priority that has waited longest.
The scenarios that trigger process scheduling are as follows:
(1) To ensure that all processes are scheduled fairly, CPU time is divided into time slices that are allocated to processes in turn. When a process's time slice runs out, the system suspends it and switches to another process that is waiting for the CPU.
(2) When a process lacks the system resources it needs (for example, when memory is insufficient), it cannot run until those resources are available; the process is suspended and the system schedules another process.
(3) When a process voluntarily suspends itself, for example by calling the sleep function, scheduling happens again.
(4) When a higher-priority process becomes runnable, the current process is suspended so that the higher-priority process can run.
(5) When a hardware interrupt occurs, the process on the CPU is suspended by the interrupt and the kernel's interrupt service routine runs instead.

6. CPU context switching indicators

The vmstat command reports the relevant system-wide metrics:
cs (context switch): The number of context switches per second.
in (interrupt): The number of interrupts per second.
r (Running or Runnable): The length of the ready queue, that is, the number of processes that are running and waiting for the CPU.
b (Blocked): The number of processes in an uninterruptible sleep state.

vmstat only gives the system-wide context switch statistics; to see the context switches of specific processes, use pidstat.
pidstat -w 5 shows each process's context switching activity every 5 seconds.
cswch: the number of voluntary context switches per second.
nvcswch: the number of involuntary context switches per second.
Voluntary context switches are those in which a process cannot obtain the resources it needs; for example, they occur when system resources such as I/O or memory are insufficient. A rise in voluntary context switches means processes are waiting for resources, and problems such as I/O pressure may exist.
Involuntary context switches are those in which a process is forcibly rescheduled by the system, for example because its time slice has expired. They occur readily when a large number of processes are competing for the CPU. A rise in involuntary context switches means processes are being forcibly scheduled, that is, they are all contending for the CPU.
A rise in the number of interrupts means the CPU is being occupied by interrupt handlers; the specific interrupt types can be analyzed by inspecting the /proc/interrupts file.
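As a minimal sketch of how to check these indicators with what the kernel already exposes (the PID 1234 below is only a placeholder), each command is meant to be run on its own:
# System-wide context switches (cs) and interrupts (in), refreshed every 5 seconds.
vmstat 5
# Per-process voluntary/involuntary context switches, every 5 seconds.
pidstat -w 5
# Raw counters for one process: voluntary_ctxt_switches / nonvoluntary_ctxt_switches.
grep ctxt_switches /proc/1234/status
# Watch how the per-CPU hard interrupt counts change over time.
watch -d cat /proc/interrupts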

Two, Average CPU load

1. Process status

R is the abbreviation of Running or Runnable, which means that the process is running or waiting to run in the ready queue of the CPU.
D is the abbreviation of Disk Sleep, that is, Uninterruptible Sleep, which generally means that the process is interacting with the hardware, and the interaction process is not allowed to be interrupted by other processes or interrupts. Processes in the D state will increase the average load.
Z is the abbreviation of Zombie, which means the zombie process, the process has ended, but the parent process has not reclaimed resources (such as the process descriptor, PID, etc.).
S is the abbreviation of Interruptible Sleep, which means that the process is suspended by the system because it is waiting for an event. When the event that the process is waiting for occurs, it will be awakened and enter the R state.
I is the abbreviation of Idle, that is, the idle state, used for kernel threads in uninterruptible sleep. Unlike D-state processes, processes in the I state do not raise the load average.
T is the abbreviation of Stopped or Traced, indicating that the process is paused or being traced. Sending a SIGSTOP signal to a process puts it into the stopped state; sending SIGCONT afterwards lets it resume running (if the process was started directly from a terminal, the fg command is needed to bring it back to the foreground).
X is the abbreviation of Dead, which means that the process has died, so it will not be displayed in the top or ps command.
When iowait rises, processes are likely stuck in the uninterruptible state for long periods because they cannot get a response from the hardware. The uninterruptible state exists to keep process data consistent with hardware state, and under normal circumstances it ends quickly, so short-lived uninterruptible processes can generally be ignored. But if the system or hardware fails, processes may stay uninterruptible for a long time, and the system may even accumulate a large number of uninterruptible processes.
Normally, when a process creates a child, it should wait for the child to exit via the wait or waitpid system call and reclaim the child's resources; alternatively, since the child sends a SIGCHLD signal to its parent when it exits, the parent can register a SIGCHLD handler and reclaim the resources asynchronously. If the parent does neither, or the child exits so quickly that the parent has not yet handled its state, the child becomes a zombie process. Zombies usually last only a short time: they disappear once the parent reclaims their resources, or, if the parent exits first, once they are adopted and reaped by the init process. But if the parent never handles the child's termination and keeps running, the child remains a zombie indefinitely. A large number of zombie processes will exhaust the available PIDs and prevent new processes from being created, so they must be avoided.
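A quick, hedged way to check whether zombies are piling up, using only standard ps and awk:
# List zombie (Z state) processes together with their parent PID, so the
# parent that fails to reap them can be identified.
ps -eo stat,pid,ppid,comm | awk '$1 ~ /^Z/'
# Count them; a steadily growing number is the warning sign.
ps -eo stat | awk '$1 ~ /^Z/' | wc -l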

2. Introduction to CPU load average

The load average is the average number of processes in the runnable or uninterruptible state per unit time, that is, the average number of active processes; it has no direct relationship with CPU usage.
A runnable state process refers to a process that is using or waiting for the CPU, that is, a process in the R state (Running or Runnable).
An uninterruptible-state process is one that is in a critical kernel-mode flow that must not be interrupted, for example while waiting for an I/O response from a hardware device; this is the D state (Uninterruptible Sleep, also called Disk Sleep).
When a process reads and writes data to the disk, in order to ensure the consistency of the data, the process cannot be interrupted by other processes or interrupts before getting the disk reply. At this time, the process is in an uninterruptible state. If the process is interrupted at this time, the disk data and the process data are likely to be inconsistent.
The uninterruptible state is a protection mechanism the system provides for processes and hardware devices.
The load average is essentially the average number of active processes, so it must be interpreted relative to the number of logical CPUs in the system. Ideally, each CPU runs exactly one process, so every CPU is fully utilized.
The average load index can be viewed using the top or uptime commands.
The three values reported by uptime are the load averages over the last 1, 5, and 15 minutes. These three time windows provide a data source for analyzing the load trend and give a more complete picture of the current system load:
(1) If the 1-minute, 5-minute, and 15-minute values are roughly the same, the system load is stable.
(2) If the 1-minute value is much smaller than the 15-minute value, the load has been decreasing over the last minute, although it was high during the past 15 minutes.
(3) If the 1-minute value is much greater than the 15-minute value, the load has been rising over the last minute; this may be temporary or may keep increasing, so continued observation is needed. Once the 1-minute load average approaches or exceeds the number of logical CPUs, the system is becoming overloaded.
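A minimal sketch of that comparison, reading the load averages from /proc/loadavg and the logical CPU count from nproc:
# The first three fields of /proc/loadavg are the 1-, 5- and 15-minute load averages.
read load1 load5 load15 rest < /proc/loadavg
cpus=$(nproc)
echo "load: 1min=$load1 5min=$load5 15min=$load15 (logical CPUs: $cpus)"
# Flag an overload once the 1-minute average exceeds the number of logical CPUs.
awk -v l="$load1" -v c="$cpus" 'BEGIN { if (l > c) print "1-minute load exceeds CPU count" }'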

Three, CPU usage

1. Introduction to CPU usage

The load average counts processes in the runnable or uninterruptible state per unit time, so it includes not only processes that are using the CPU but also processes waiting for the CPU and processes waiting for I/O.
CPU usage is a statistic of how busy the CPU is per unit time, expressed as a percentage, and it does not necessarily correspond to the load average. For CPU-intensive processes, heavy CPU use raises the load average, and CPU usage rises with it; for I/O-intensive processes, waiting for I/O also raises the load average, but CPU usage is not necessarily high; and when a large number of processes are waiting to be scheduled onto the CPU, the load average rises and CPU usage is relatively high as well.
The load average is a quick indicator of overall system performance: it reflects the system's overall load, but by itself it cannot locate the performance bottleneck.
A high load average may be caused by CPU-intensive processes, but it does not necessarily mean high CPU usage; it may also come from long I/O waits. When a high load average is found, mpstat and pidstat can be used to help analyze and locate the source of the load.
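A hedged sketch of that workflow (assuming the sysstat package that provides mpstat and pidstat is installed; the 5-second, single-sample interval is just an example):
# Overall load trend first.
uptime
# Per-CPU usage breakdown (user, system, iowait, ...), one 5-second sample.
mpstat -P ALL 5 1
# Per-process CPU usage over the same kind of interval, to find the culprit.
pidstat -u 5 1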

2. Definition of CPU usage

CPU usage is defined as the percentage of total CPU time other than idle time:
CPU usage = 1 - (idle time / total CPU time)
The CPU usage calculated from the /proc/stat data is the average CPU usage since booting.
To calculate a meaningful CPU usage figure, performance analysis tools usually take two samples a short interval apart (for example, 3 seconds) and compute the average CPU usage over that interval from the difference:
Average CPU usage = 1 - (idle_2 - idle_1) / (total_2 - total_1)
Linux also provides per-process run statistics in /proc/[pid]/stat. Performance analysis tools calculate average CPU usage over an interval from /proc/stat and /proc/[pid]/stat. Different tools may use different default intervals, so when comparing the results of several tools, be sure to use the same interval; for example, top uses a 3-second interval by default, while ps reports usage over the entire lifetime of the process.
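The following sketch applies the interval formula above to the aggregate cpu line of /proc/stat; it is a simplified illustration (only idle time is treated as non-busy), not a reimplementation of top.
# Sample the aggregate "cpu" line twice, 3 seconds apart, then compute
# usage = 1 - (idle_2 - idle_1) / (total_2 - total_1).
read cpu user1 nice1 sys1 idle1 iowait1 irq1 softirq1 steal1 rest < /proc/stat
total1=$((user1 + nice1 + sys1 + idle1 + iowait1 + irq1 + softirq1 + steal1))
sleep 3
read cpu user2 nice2 sys2 idle2 iowait2 irq2 softirq2 steal2 rest < /proc/stat
total2=$((user2 + nice2 + sys2 + idle2 + iowait2 + irq2 + softirq2 + steal2))
awk -v dt=$((total2 - total1)) -v di=$((idle2 - idle1)) \
    'BEGIN { printf "average CPU usage over 3s: %.1f%%\n", (1 - di / dt) * 100 }'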

3. CPU usage time

As a multitasking operating system, Linux divides each CPU's time into very short time slices and hands them to tasks in turn through the scheduler, creating the illusion that many tasks run at the same time. To keep track of CPU time, Linux triggers a timer interrupt at a predefined tick rate (represented as HZ in the kernel) and uses the global variable jiffies to record the number of ticks since boot; every timer interrupt increments jiffies by 1. The tick rate HZ is a configurable kernel option and can be set to 100, 250, 1000, and so on; different systems may use different values, which can be checked in the kernel config under /boot:
grep 'CONFIG_HZ=' /boot/config-$(uname -r)
HZ is a kernel option and cannot be accessed directly by user-space programs. For the convenience of user space, the kernel also provides a user-space tick rate, USER_HZ, which is fixed at 100. Linux exposes internal system state to user space through the /proc virtual file system, and /proc/stat provides statistics about the system's CPUs and tasks:
cat /proc/stat | grep ^cpu
The first column is the CPU number; the cpu line without a number in the first row is the aggregate over all CPUs. The remaining columns are the cumulative number of ticks the CPU has spent in different states, in units of USER_HZ, that is, 10 ms (1/100 of a second).
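A small sketch of that unit conversion, using the standard getconf utility to read USER_HZ and converting the user-mode tick count of the aggregate cpu line into seconds:
# USER_HZ as seen from user space (normally 100, i.e. one tick = 10 ms).
getconf CLK_TCK
# Convert the cumulative user-mode ticks of the aggregate cpu line to seconds.
awk -v hz="$(getconf CLK_TCK)" '/^cpu / { printf "user time: %.1f s\n", $2 / hz }' /proc/stat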
The /proc/stat fields are as follows:
user (abbreviated us) represents user mode CPU time, not including nice time, but including guest time.
nice (abbreviated ni) represents low-priority user-mode CPU time, that is, the CPU time of processes whose nice value has been adjusted to between 1 and 19. The nice value ranges from -20 to 19; the larger the value, the lower the priority.
system (abbreviated sys) represents kernel-mode CPU time.
idle (abbreviated id) represents idle time, not including time spent waiting for I/O (iowait).
iowait (abbreviated wa) represents CPU time spent waiting for I/O.
irq (abbreviated hi) represents CPU time spent handling hard interrupts.
softirq (abbreviated si) represents CPU time spent handling soft interrupts.
steal (abbreviated st) represents CPU time taken by other virtual machines when the system is running inside a virtual machine.
guest (abbreviated guest) represents time spent running other operating systems through virtualization, that is, CPU time spent running virtual machines.
guest_nice (abbreviated gnice) represents time spent running virtual machines at low priority.
The %Cpu line in top output is the system's CPU usage; by default it shows the average over all CPUs. Press the number 1 to switch to per-CPU usage.
In top's per-process area, each process has a %CPU column showing that process's CPU usage. It is the sum of user-mode and kernel-mode usage, covering the CPU used by the process in user space, the kernel-space CPU consumed by its system calls, and the CPU time spent waiting to run in the ready queue; in virtualized environments it also includes the CPU used to run virtual machines.
Use pidstat to view the specific CPU usage of the process:
%usr: user-mode CPU usage.
%system: kernel-mode CPU usage.
%guest: CPU usage for running virtual machines.
%wait: percentage of time spent waiting to run (waiting for a CPU).
%CPU: total CPU usage.
The Average section at the end reports the average over the sampled data.

4. Abnormal CPU usage

Some otherwise unexplainable spikes in CPU usage may be caused by short-lived processes.
(1) The application directly invokes other binary programs that run only briefly, which makes them hard to catch with tools such as top.
(2) The application keeps crashing and restarting, and resource initialization during startup can consume a lot of CPU.
Short-lived processes can be observed with the execsnoop tool.
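A hedged usage sketch: execsnoop comes from the bcc tool collection (on Debian/Ubuntu the binary is usually installed as execsnoop-bpfcc) and a script of the same name ships with Brendan Gregg's perf-tools, so the exact command name depends on how it was installed.
# Trace every new process as it is exec()ed, including short-lived ones that
# never show up in top; requires root.
sudo execsnoop-bpfcc
# or, with other packagings:
# sudo execsnoop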

Four, Interrupts

1. Introduction to interrupts

An interrupt is an asynchronous event-handling mechanism that improves the system's ability to handle concurrent events. Because an interrupt handler interrupts the normal running of other processes, it needs to run as quickly as possible to minimize the impact on normal scheduling. Moreover, while the CPU is responding to an interrupt it temporarily disables interrupts, so if an interrupt service routine runs for too long, other interrupts cannot be serviced until the current one finishes; in other words, interrupts may be lost.
To solve the problems of long-running interrupt handlers and lost interrupts, Linux splits interrupt handling into two halves. The top half handles the interrupt quickly: it runs with interrupts disabled and does only the work that is tightly coupled to the hardware or time-sensitive. The bottom half defers the remaining work and usually runs as a kernel thread.
The top half handles the hardware request directly, i.e., the hard interrupt, and is characterized by fast execution; the bottom half is triggered by the kernel, i.e., the soft interrupt (softirq), and is characterized by delayed execution. A hard interrupt interrupts the task the CPU is running and executes the interrupt handler immediately; soft interrupts are executed by kernel threads, one per CPU, named ksoftirqd/<CPU number>, so the softirq kernel thread for CPU 0 is ksoftirqd/0. Soft interrupts include not only the bottom halves of hardware interrupt handlers but also some kernel-defined events, such as kernel scheduling and RCU locks (Read-Copy Update, one of the most commonly used locking mechanisms in the Linux kernel).
After a network card receives a packet, it notifies the kernel of the new data through a hard interrupt. The kernel calls the interrupt handler, which reads the data from the card into memory, updates the hardware register state (to indicate that the data has been read), and finally raises a soft interrupt to tell the bottom half to do further processing. When the bottom half is woken by the soft interrupt, it finds the network data in memory, parses and processes it layer by layer through the network protocol stack, and finally delivers it to the application.
Soft interrupts in Linux include several types, such as network receive/transmit, timers, scheduling, and RCU locks. The /proc/softirqs file in the proc file system shows how soft interrupts are running.
Each CPU has its own softirq kernel thread, named ksoftirqd/<CPU number>. When softirq events fire too frequently, these kernel threads consume a lot of CPU and softirqs are not handled in time, causing performance problems such as network receive/transmit latency and slow scheduling. Excessive softirq CPU usage is one of the most common performance problems.
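To confirm that softirq pressure is the culprit, the per-CPU ksoftirqd threads can be checked directly; a minimal sketch:
# One ksoftirqd kernel thread per CPU (ksoftirqd/0, ksoftirqd/1, ...).
# Sustained high %CPU on these threads points at softirq pressure.
ps -eo pid,psr,pcpu,comm | grep ksoftirqd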

2. Interrupt viewing

The proc file system is a communication mechanism between the kernel space and the user space. It can be used to view the data structure of the kernel or dynamically modify the kernel configuration.
/proc/softirqs provides the running status of soft interrupts.
/proc/interrupts provides the operating status of hard interrupts.
The first column is the soft interrupt type; there are 10 types, corresponding to different kinds of work. NET_RX is the network receive soft interrupt and NET_TX is the network transmit soft interrupt. Normally, the cumulative counts of the same interrupt type on different CPUs are of the same order of magnitude and do not differ much.
The values in /proc/softirqs are cumulative counts since the system started. In production, what usually matters is the rate of change, so use watch -d cat /proc/softirqs to observe how the soft interrupt counts change.
Soft interrupts such as TIMER (timer interrupt), NET_RX (network receive), SCHED (kernel scheduling), and RCU (RCU locks) change constantly. For the NET_RX and NET_TX softirq types, use the sar tool to check the network send and receive rates.
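A hedged sketch of that check (sar comes from the sysstat package; the 1-second interval is only an example):
# Network device statistics every second: packets and kilobytes received and
# transmitted per interface (rxpck/s, txpck/s, rxkB/s, txkB/s, ...).
sar -n DEV 1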
The first column is the time of the report.
The second column, IFACE, is the network interface.
The third column, rxpck/s, is the number of network frames received per second (PPS).
The fourth column, txpck/s, is the number of network frames sent per second (PPS).
The fifth column, rxkB/s, is the number of kilobytes received per second, i.e., the receive throughput (KB/s).
The sixth column, txkB/s, is the number of kilobytes sent per second, i.e., the transmit throughput (KB/s).

3. Uninterruptible process

The uninterruptible state means that a process is interacting with hardware; to keep the process's data consistent with the hardware state, the system does not allow such a process to be interrupted by other processes or by interrupts. A process staying in the uninterruptible state for a long time usually indicates an I/O performance problem.

4. Zombie process

A zombie process is one that has exited but whose parent has not reclaimed the resources it occupied. A short-lived zombie state can usually be ignored, but processes that stay zombies for a long time indicate that some application may not be handling the exit of its child processes properly.
