Low CPU usage with high load: cause analysis

Summary of the cause

The cause can be summed up in one sentence: too many processes are waiting for disk I/O to complete, so the queue of waiting processes grows very long while only a few processes are actually running on the CPU. The result is a high load together with low CPU usage.

The rest of this article analyzes the principle in detail. Before looking at why the load is high, it first introduces the related concepts: load, multitasking operating systems, and process scheduling.

What is load

Load is a statistic: over a period of time, it is the sum of the number of processes the CPU is processing and the number of processes waiting for the CPU to process them. In other words, it measures the length of the queue of work for the CPU. The smaller the number, the better; as a rule of thumb, a value above the number of CPU cores × 0.7 is unusual and worth investigating.

Load can be divided into two parts: CPU load and I/O load.

For example, consider a program that performs large-scale scientific calculations. It rarely reads from or writes to disk, but it needs a considerable amount of time to finish. Because the program spends its time on calculation and logical decisions, its speed depends mainly on how fast the CPU can compute. Programs that generate this kind of CPU load are called "compute-intensive programs".

Another class of programs mainly searches for files in a large amount of data stored on disk. The speed of such a search does not depend on the CPU but on the read speed of the disk, that is, on input/output (I/O). The faster the disk, the shorter the search takes. Programs that generate this kind of I/O load are called "I/O-intensive programs".

What is a multitasking operating system

Linux can handle several different tasks at the same time. While multiple tasks run concurrently, limited hardware resources such as the CPU and the disks must be shared among them. To do this, the system switches between the tasks at very short intervals, and this is what is called multitasking.

When few tasks are running, a task rarely has to wait for such a switch. As tasks are added, however, waiting begins: if task A is computing on the CPU and tasks B and C also want to compute, they must wait until the CPU is free. In other words, even a task that is ready to run cannot run until it is its turn. This waiting shows up as delay in the running programs.

The "load average" numbers in uptime output

[root@localhost ~]# uptime
 11:16:38 up  2:06,  4 users,  load average: 0.00, 0.02, 0.05

From left to right, the load average values are the average number of waiting tasks over the past 1 minute, 5 minutes, and 15 minutes. A high load average means that many tasks are waiting to run, so each task waits longer before it actually runs; this delay is what a high load looks like.
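As a rough illustration of reading these numbers, the sketch below pulls the three values out of a line of uptime output and applies the "CPU cores × 0.7" rule of thumb mentioned earlier. The function names are made up for this example, and it assumes the output uses "." as the decimal separator.

```python
import os


def parse_load_averages(uptime_line):
    """Return the 1-, 5- and 15-minute load averages from a line of `uptime` output.

    Assumes the line contains "load average:" followed by three
    comma-separated numbers that use "." as the decimal separator.
    """
    _, _, tail = uptime_line.partition("load average:")
    one, five, fifteen = (float(v) for v in tail.replace(",", " ").split()[:3])
    return one, five, fifteen


def is_load_unusual(load, cores, factor=0.7):
    """Rule of thumb from the text: a load above cores * 0.7 deserves a look."""
    return load > cores * factor


if __name__ == "__main__":
    sample = " 11:16:38 up  2:06,  4 users,  load average: 0.00, 0.02, 0.05"
    one, five, fifteen = parse_load_averages(sample)
    cores = os.cpu_count() or 1
    print(one, five, fifteen, is_load_unusual(one, cores))
```

Comparing the load with the core count matters because the same load value means different things on a 1-core and a 32-core machine.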

Process scheduling

What is process scheduling:

Process scheduling is what some people also refer to as CPU context switching. When the CPU switches to another process, the state of the current process must be saved and the state of the other process restored: the currently running task moves to the ready (or suspended, or interrupted) state, and the selected ready task becomes the current task. Process scheduling therefore includes saving the execution environment of the current task and restoring the execution environment of the task that is about to run.

In the Linux kernel, each process has a management table called the "process descriptor". The process descriptors are kept ordered by descending priority so that processes (tasks) run in a reasonable order. Maintaining this order is the job of the process scheduler.

The scheduler classifies processes by state and manages them accordingly, for example:

  • Waiting to be allocated CPU resources.
  • Waiting for disk input/output to complete.

The process states differ as follows:

State                   Description
Running (runnable)      Can run at any time, as soon as a CPU is free.
Interruptible sleep     A wait that may be long and whose end cannot be predicted, for example waiting for input from a keyboard device.
Uninterruptible sleep   Mainly a wait that is expected to be short, for example waiting for disk I/O. The process is blocked by I/O.
Stopped                 Execution has been suspended in response to a stop (pause) signal.
Zombie                  A child process is cleaned up by its parent. If the child exits and the parent does not clean it up (collect its exit status), the child remains as a zombie.
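On Linux these states are visible as one-letter codes, for example in the third field of /proc/<pid>/stat. The following is a minimal sketch of reading that field; the helper names and the description strings are assumptions for illustration, and only the five states from the table are mapped.

```python
# One-letter state codes for the states listed in the table above.
STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "T": "stopped",
    "Z": "zombie",
}


def state_from_stat_line(stat_line):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST ')' instead of on whitespace.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def describe_state(code):
    """Map a state code to a description; unknown codes become 'other'."""
    return STATE_NAMES.get(code, "other")


if __name__ == "__main__":
    # Reads the state of the current process (Linux only).
    with open("/proc/self/stat") as handle:
        code = state_from_stat_line(handle.read())
    print(code, describe_state(code))
```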

The following example illustrates the process state transition:

Suppose three processes A, B, and C are started at the same time. Right after it is created, each process is in the runnable state: it is ready to start running but is not necessarily running yet. Because the Linux kernel does not distinguish between actually running and being runnable while waiting, both are called the running state below.

  • Process A: running
  • Process B: running
  • Process C: running

The three running processes immediately become candidates for scheduling. Assume that the scheduler gives the CPU to process A.

  • Process A: running
  • Process B: running
  • Process C: running

Process A has been allocated the CPU, so it starts processing, while processes B and C wait for A to give up the CPU. Suppose that after some computation process A needs to read data from disk. Once A has issued the read request, it cannot do any work until the requested data arrives. This state is called "blocked waiting for an I/O operation to complete". Until the I/O completes, process A waits in the uninterruptible sleep state (uninterruptible) and does not use the CPU. The scheduler therefore looks at the computed priorities of processes B and C and gives the CPU to the one with the higher priority. Assume here that process B has a higher priority than process C.

  • Process A: uninterruptible (waiting for disk I/O)
  • Process B: running
  • Process C: running

As soon as process B starts running, it needs to wait for keyboard input from the user. B therefore enters a state of waiting for keyboard input and is blocked as well. As a result, processes A and B are both waiting, and process C runs. Although A and B are both waiting, waiting for disk I/O and waiting for keyboard input are different states: keyboard input is an event that may take indefinitely long to arrive, whereas a disk read is an event that is expected to complete within a short time. The state of each process is now:

  • Process A: uninterruptible (waiting for disk I/O)
  • Process B: interruptible (waiting for keyboard input)
  • Process C: running

Now suppose that, while process C is running, the data requested by process A is transferred from the disk into the device's buffer. The disk then sends an interrupt to the kernel, so the kernel knows that the read has completed and returns process A to the runnable state.

  • Process A: running
  • Process B: interruptible (waiting for keyboard input)
  • Process C: running

After that, process C will also leave the CPU at some point, for example because its CPU time slice is used up, because the task finishes, or because it has to wait for I/O. Once one of these conditions is met, the scheduler can switch from process C to process A.
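The walk-through above can be replayed as a small toy model. This is only an illustration of the sequence of states described in the text, not how the kernel scheduler is implemented; the event list and names are invented for the example.

```python
RUNNING = "running"                   # running or runnable
UNINTERRUPTIBLE = "uninterruptible"   # e.g. waiting for disk I/O
INTERRUPTIBLE = "interruptible"       # e.g. waiting for keyboard input

# (process, new state, reason), in the order the walk-through describes them.
EVENTS = [
    ("A", UNINTERRUPTIBLE, "A requests data from disk"),
    ("B", INTERRUPTIBLE, "B waits for keyboard input"),
    ("A", RUNNING, "disk read for A completes"),
]


def replay(events):
    """Yield (reason, snapshot of all process states) after each event."""
    states = {"A": RUNNING, "B": RUNNING, "C": RUNNING}
    for name, new_state, reason in events:
        states[name] = new_state
        yield reason, dict(states)


if __name__ == "__main__":
    for reason, snapshot in replay(EVENTS):
        print(f"{reason}: {snapshot}")
```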

Meaning of load:

Load represents the "average number of waiting processes". In the state transitions above, every state other than the running state is a waiting state, so do all of these other states count toward the load?

It turns out that only processes in the running state (running) and the uninterruptible state (uninterruptible) are counted. That is, a process contributes to the load value in the following two cases:

  • It needs the CPU right now but has to wait until other processes give up the CPU.
  • It wants to continue processing but has to wait for disk input/output to complete first.
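To make the two cases concrete, the sketch below counts how many processes in a list of state codes would contribute to the load. It assumes the codes come from something like the STAT column of `ps -eo stat`, where only the first letter names the state; the function name is hypothetical. The real load average is a smoothed value over time, so an instantaneous count like this is only a rough indicator.

```python
from collections import Counter

# R = running or runnable, D = uninterruptible sleep.
LOAD_STATES = ("R", "D")


def load_contributors(state_codes):
    """Return (number of processes counting toward load, counts per state).

    Each code may carry extra flags after the first letter (for example
    "Ss" or "R+"); only the first letter identifies the state.
    """
    counts = Counter(code[0] for code in state_codes if code)
    return sum(counts[state] for state in LOAD_STATES), counts


if __name__ == "__main__":
    total, counts = load_contributors(["R+", "Ss", "D", "D<", "S", "Z"])
    print(total, dict(counts))
```

A high count dominated by "D" rather than "R" is the pattern this article is about: many waiting processes, few of them using the CPU.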

The following intuitive scenario explains why only the running and uninterruptible states count toward the load.

Consider processing that occupies the CPU, such as video encoding: if you try to run more work of the same kind at the same time, the system becomes very slow to respond. The same happens when a large amount of data is being read from disk. On the other hand, no matter how many processes are waiting for keyboard input, the system's responsiveness does not suffer.

What scenarios cause low CPU usage and very high load?

The analysis above makes the meaning of load clear. It can be summed up in one sentence: load is the number of processes that have work to do but must wait for the processes ahead of them in the queue to finish. Specifically, there are two cases:

  • Processes waiting to be given the CPU
  • Processes waiting for disk I/O to complete

Low CPU usage with high load therefore means that too many processes are waiting for disk I/O to complete. They make the queue long, which shows up as a high load, while the CPU itself has been given to other tasks or is idle. The specific scenarios are as follows.

Scenario 1: Too many disk read and write requests cause a lot of I/O waits

The CPU works much faster than the disk. When a process running on the CPU needs to access a file on disk, it asks the kernel for the file through a system call, and the kernel fetches the data from the disk. In the meantime the CPU switches to another process or sits idle, and the requesting task moves to the uninterruptible sleep state. When there are too many read and write requests, too many processes end up in uninterruptible sleep, which produces high load with low CPU usage.
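One way to check whether the CPU is mostly idle while waiting for I/O is to compare two samples of the aggregate "cpu" line in /proc/stat. The sketch below is an illustration with hypothetical function names; it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal) and that at least the first five counters are present.

```python
def cpu_fields(stat_cpu_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    parts = stat_cpu_line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in parts[1:]]


def iowait_percent(before_line, after_line):
    """Percentage of CPU time spent idle while waiting for I/O between two samples."""
    before, after = cpu_fields(before_line), cpu_fields(after_line)
    deltas = [new - old for old, new in zip(before, after)]
    # Sum only user..steal (the first 8 fields); guest time is already
    # included in user and nice, so adding it again would double count.
    total = sum(deltas[:8])
    iowait = deltas[4]
    return 100.0 * iowait / total if total else 0.0


if __name__ == "__main__":
    before = "cpu 100 0 50 800 50 0 0 0 0 0"
    after = "cpu 110 0 60 820 110 0 0 0 0 0"
    print(iowait_percent(before, after))
```

In practice the two lines would be read from /proc/stat a few seconds apart; a large iowait share alongside a high load points to this scenario.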

Scenario 2: MySQL statements that run without an index, or deadlocks

MySQL stores its data on disk, so to run a SQL query the data must first be loaded from disk into memory. When the data set is very large and the executed statement cannot use an index, the query scans too many rows of the table and blocks on I/O. A deadlock among statements can block processing in the same way. Either case leaves too many threads in uninterruptible sleep, and the load becomes excessive.

To resolve this, run the SHOW FULL PROCESSLIST command in MySQL to check what the threads are waiting on, then pick out the offending statements and optimize them.
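Once the output of SHOW FULL PROCESSLIST is available, picking out candidates for optimization can be sketched as below. The rows are assumed to be dictionaries keyed by the command's column names (Id, Command, Time, State, Info); how they are fetched from MySQL is left out, and the function name and the 10-second threshold are arbitrary choices for this example.

```python
def _seconds(row):
    """Value of the Time column as an int; missing or NULL counts as 0."""
    return int(row.get("Time") or 0)


def long_running_statements(rows, min_seconds=10):
    """Return rows that have been executing a statement for at least
    min_seconds, slowest first."""
    active = [
        row for row in rows
        if row.get("Info")                   # a statement is attached
        and row.get("Command") != "Sleep"    # skip idle connections
        and _seconds(row) >= min_seconds
    ]
    return sorted(active, key=_seconds, reverse=True)
```

The statements returned first are the ones to examine, for example to check whether they can use an index.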

Scenario 3: External storage is faulty. A common case is that an NFS share is mounted but the NFS server has failed.

For example, suppose the system mounts external storage such as an NFS share and many read and write requests access the files stored on it. If the NFS server fails, those requests can never obtain the resources they ask for, so the processes stay in the uninterruptible state indefinitely, resulting in a high load.
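When this scenario is suspected, a first step is to find out which NFS shares are mounted at all. The sketch below parses text in the format of /proc/mounts; the function name is hypothetical. It deliberately only reads the mount table and does not touch the mount points, because accessing a share whose server is down may itself block.

```python
def nfs_mounts(mounts_text):
    """Return (source, mount point) pairs for NFS entries in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Line format: device mountpoint fstype options dump pass
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            found.append((fields[0], fields[1]))
    return found


if __name__ == "__main__":
    # Linux only: list the NFS shares currently mounted on this machine.
    with open("/proc/mounts") as handle:
        for source, mount_point in nfs_mounts(handle.read()):
            print(source, mount_point)
```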

