Linux performance optimization from the perspective of load average

1. Load average

The load average discussed here is the Linux system load average, which you can view with the top or uptime command.


Load average: the average number of processes, per unit of time over the last 1, 5, and 15 minutes, that are in a runnable or uninterruptible state. In other words, it is the average number of active processes. It is not, as is commonly assumed, the CPU usage per unit of time.

The rest of the top output is not covered here. What do the fields at the start of the uptime output mean?

18:14:31    // current time
up 41 days, 8:24  // how long the system has been running
2 users     // number of users currently logged in
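The same fields can be pulled apart programmatically. The following is a minimal sketch, assuming the common procps layout of the uptime line with "." as the decimal separator; parse_uptime is a hypothetical helper, and the load figures in the sample line are made up for illustration.

```python
import re

# Hypothetical helper: assumes the usual procps `uptime` layout.
UPTIME_RE = re.compile(
    r"^\s*(?P<time>\d{1,2}:\d{2}:\d{2})\s+up\s+(?P<up>.+?),\s+"
    r"(?P<users>\d+)\s+users?,\s+load average:\s+"
    r"(?P<l1>[\d.]+),\s+(?P<l5>[\d.]+),\s+(?P<l15>[\d.]+)\s*$"
)

def parse_uptime(line: str) -> dict:
    """Split one line of `uptime` output into its fields."""
    m = UPTIME_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized uptime output: {line!r}")
    return {
        "current_time": m["time"],
        "uptime": m["up"],
        "users": int(m["users"]),
        # Load averages over the last 1, 5, and 15 minutes.
        "load_average": (float(m["l1"]), float(m["l5"]), float(m["l15"])),
    }

# The first three fields match the example above; the load values are invented.
sample = " 18:14:31 up 41 days,  8:24,  2 users,  load average: 0.63, 0.83, 0.88"
fields = parse_uptime(sample)
print(fields)
```

If only the three load values are needed, Python's os.getloadavg() returns them directly on Linux without any parsing.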

Look at the man page for a detailed description of the uptime command:

DESCRIPTION
       uptime  gives  a  one line display of the following information.  The current time, how long
       the system has been running, how many users are currently logged on,  and  the  system  load
       averages for the past 1, 5, and 15 minutes.

       This is the same information contained in the header line displayed by w(1).

       System  load  averages  is  the average number of processes that are either in a runnable or
       uninterruptable state.  A process in a runnable state is either using the CPU or waiting  to
       use  the CPU.  A process in uninterruptable state is waiting for some I/O access, eg waiting
       for disk.  The averages are taken over the three time intervals.  Load averages are not nor‐
       malized for the number of CPUs in a system, so a load average of 1 means a single CPU system
       is loaded all the time while on a 4 CPU system it means it was idle 75% of the time.

As the man page shows, a runnable process is one that is either using the CPU or waiting for the CPU. These are the processes that the ps command shows in state R (Running or Runnable).

A process in the uninterruptible state is in the middle of a critical kernel-mode operation that must not be interrupted. The most common case is waiting for an I/O response from a hardware device. The ps command shows these processes in state D (Uninterruptible Sleep, also called Disk Sleep). The uninterruptible state is a mechanism that protects both the process and the hardware device.
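To see how many processes are in the R or D state at a given instant, one option is to read the state field of /proc/<pid>/stat. The sketch below is only a rough snapshot under stated assumptions: it looks at processes rather than individual threads and is not the kernel's own load calculation. The helper names and the sample lines are hypothetical.

```python
from collections import Counter
from pathlib import Path

def state_from_stat(stat_line: str) -> str:
    """Return the one-letter state from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' rather than on whitespace.
    """
    _, _, rest = stat_line.rpartition(")")
    return rest.split()[0]

def count_active(states) -> int:
    """Count entries in state R (runnable) or D (uninterruptible sleep)."""
    tally = Counter(states)
    return tally["R"] + tally["D"]

def current_states():
    """Yield the state of every process visible under /proc (Linux only)."""
    for stat in Path("/proc").glob("[0-9]*/stat"):
        try:
            yield state_from_stat(stat.read_text())
        except (OSError, IndexError):
            continue  # the process exited while we were reading it

# Hand-written sample lines, truncated after the state field.
samples = [
    "101 (stress) R 100 101",
    "102 (kworker/0:1 (x)) D 2 0",
    "103 (bash) S 1 103",
]
active = count_active(state_from_stat(s) for s in samples)
print(active)
```

On a live Linux system, count_active(current_states()) gives the instantaneous count; the load average is a smoothed average of such counts over time, so the two numbers will rarely match exactly.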

Since the load average is the average number of active processes, the ideal case is exactly one process running on each CPU, so that every CPU is fully utilized. For example, what does a load average of 2 mean?

  • On a system with only 2 CPUs, all CPUs are exactly fully occupied.
  • On a system with 4 CPUs, the CPUs are 50% idle.
  • On a system with only 1 CPU, half of the processes cannot get CPU time and have to compete for it.
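The three cases in the list come down to dividing the load average by the number of CPUs. A small sketch of that arithmetic follows; describe_load is a hypothetical helper written for this example, not part of any tool.

```python
def describe_load(load: float, cpus: int) -> str:
    """Interpret a load average relative to the number of CPUs."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    ratio = load / cpus
    if ratio < 1:
        return f"{1 - ratio:.0%} idle"
    if ratio == 1:
        return "exactly fully occupied"
    # More active processes than CPUs: the excess share has to wait.
    return f"{(ratio - 1) / ratio:.0%} of active processes waiting for a CPU"

for cpus in (2, 4, 1):
    print(f"load 2 on {cpus} CPU(s): {describe_load(2, cpus)}")
```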

To judge whether a load average is reasonable, you first need to know how many CPUs the system has. You can find out with the command grep 'model name' /proc/cpuinfo | wc -l. When the load average is greater than the number of CPUs, the system is overloaded.

The load average comes as three values, so which one should you look at? All three: they are averages over three different time windows, and together they are the data source for analyzing the load trend. For example, a 1-minute value well above the 15-minute value means the load has been rising recently.

In a real production environment, how high does the load average have to be before it deserves attention? In my opinion, once it exceeds 70% of the number of CPUs, you should start investigating the cause. This is not an absolute rule, though. A more reliable approach is to set up load monitoring and compare against historical data: when the load rises markedly, for example when it doubles, analyze the problem.
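The checks described above can be sketched in a few lines. This is only an illustration: the 70% threshold is the rule of thumb from the text, while the 10% tolerance used to call a trend "steady" and all function names are assumptions made for this example.

```python
def count_cpus(cpuinfo_text: str) -> int:
    """Count 'model name' lines, like grep 'model name' /proc/cpuinfo | wc -l."""
    return sum(1 for line in cpuinfo_text.splitlines() if "model name" in line)

def needs_investigation(load: float, cpus: int, threshold: float = 0.7) -> bool:
    """True when the load average exceeds `threshold` times the CPU count."""
    return load > threshold * cpus

def trend(load1: float, load5: float, load15: float, tolerance: float = 0.1) -> str:
    """Compare the 1-minute value with the 15-minute value.

    load5 is accepted so the three values can be passed as-is; this simple
    sketch does not use it. `tolerance` is an assumed cut-off.
    """
    if load1 > load15 * (1 + tolerance):
        return "rising"
    if load1 < load15 * (1 - tolerance):
        return "falling"
    return "steady"

# Made-up /proc/cpuinfo excerpt for a 2-CPU machine.
SAMPLE_CPUINFO = """\
processor  : 0
model name : Example CPU @ 2.00GHz
processor  : 1
model name : Example CPU @ 2.00GHz
"""

cpus = count_cpus(SAMPLE_CPUINFO)
print(cpus, needs_investigation(1.5, cpus), trend(3.0, 1.5, 0.5))
```

On a real machine the text would come from open("/proc/cpuinfo").read(). Some architectures do not print a "model name" line at all, in which case os.cpu_count() is a simpler way to get the number of CPUs.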

2. Load average != CPU usage

As mentioned above, the load average is the average number of active processes per unit of time, that is, processes in a runnable or uninterruptible state. It therefore includes processes that are using the CPU, processes waiting for the CPU, and processes waiting for I/O. CPU usage, by contrast, measures how busy the CPU is per unit of time, so it cannot simply be equated with the load average. There are three typical cases:

Scenario 1: CPU-intensive processes. Heavy CPU use raises the load average, and here the two metrics agree.

Scenario 2: I/O-intensive processes. Waiting for I/O also raises the load average, but CPU usage is not necessarily high.

Scenario 3: a large number of processes waiting to be scheduled onto a CPU. This also raises the load average, and CPU usage is high as well.
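The three cases can be read as a rough decision rule over load average, CPU usage, and iowait. The sketch below is a heuristic written for this summary, not an established method: the 50% cut-off and the function name are assumptions, and real diagnosis needs the per-process tools introduced in the next section.

```python
def likely_cause(load: float, cpus: int, busy_pct: float, iowait_pct: float,
                 high_pct: float = 50.0) -> str:
    """Guess which scenario explains a high load average.

    busy_pct   -- CPU time spent running code (user + system), in percent
    iowait_pct -- CPU time spent idle while waiting for I/O, in percent
    high_pct   -- assumed cut-off for calling a percentage "high"
    """
    if iowait_pct >= high_pct and busy_pct < high_pct:
        return "io"            # scenario 2: processes are waiting for I/O
    if busy_pct >= high_pct:
        if load > cpus:
            return "contention"  # scenario 3: more runnable processes than CPUs
        return "cpu"           # scenario 1: processes are using the CPU
    return "unclear"

print(likely_cause(2.0, 2, busy_pct=95, iowait_pct=1))
print(likely_cause(2.0, 2, busy_pct=5, iowait_pct=60))
print(likely_cause(8.0, 2, busy_pct=99, iowait_pct=0))
```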

3. Simulated cases

First, two tools: stress and sysstat.

stress: a stress-testing tool for Linux, used here to simulate scenarios in which the load average rises abnormally.

sysstat: a package of commonly used Linux performance tools for monitoring and analyzing system performance. Two of its commands are used here: mpstat and pidstat.

  • mpstat is a commonly used multi-core CPU performance analysis tool. It shows performance metrics for each CPU in real time, as well as the average across all CPUs.
  • pidstat is a commonly used process performance analysis tool. It shows per-process metrics in real time, such as CPU, memory, I/O, and context switches.
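To make the per-CPU percentages less of a black box, here is a minimal sketch of how such figures can be derived from two samples of /proc/stat. It is not mpstat's actual implementation: it assumes the first eight counters on each cpuN line are user, nice, system, idle, iowait, irq, softirq, and steal, and the sample counter values are invented.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_lines(stat_text: str) -> dict:
    """Map 'cpu0', 'cpu1', ... to their first eight tick counters."""
    cpus = {}
    for line in stat_text.splitlines():
        parts = line.split()
        # Skip the aggregate 'cpu' line and everything that is not a CPU line.
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            cpus[parts[0]] = dict(zip(FIELDS, map(int, parts[1:9])))
    return cpus

def usage_between(before: dict, after: dict) -> dict:
    """Per-CPU share of time spent busy, in iowait, and idle between samples."""
    result = {}
    for name, new in after.items():
        old = before[name]
        delta = {f: new.get(f, 0) - old.get(f, 0) for f in FIELDS}
        total = sum(delta.values()) or 1  # avoid dividing by zero
        idle, iowait = delta["idle"], delta["iowait"]
        result[name] = {
            "busy": 100.0 * (total - idle - iowait) / total,
            "iowait": 100.0 * iowait / total,
            "idle": 100.0 * idle / total,
        }
    return result

# Invented samples: cpu0 only ran user code, cpu1 mostly waited for I/O.
BEFORE = "cpu 100 0 50 800 50 0 0 0 0 0\ncpu0 60 0 20 400 20 0 0 0 0 0\ncpu1 40 0 30 400 30 0 0 0 0 0\nintr 12345"
AFTER = "cpu 600 0 50 900 450 0 0 0 0 0\ncpu0 560 0 20 400 20 0 0 0 0 0\ncpu1 40 0 30 500 430 0 0 0 0 0\nintr 23456"

usage = usage_between(parse_cpu_lines(BEFORE), parse_cpu_lines(AFTER))
print(usage)
```

On a live system the two texts would be two reads of /proc/stat taken a few seconds apart, which is roughly what the interval argument of mpstat controls.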

The following simulates the three scenarios above, assuming a system with 2 CPUs and 8 GB of RAM:

Scenario 1, CPU-intensive: in the first terminal, run stress --cpu 1 --timeout 600.

In a second terminal, run watch -d uptime to watch the load average change in real time.

In a third terminal, run mpstat -P ALL 5 to watch how CPU usage changes.

You will find that the load average rises, and that it does so because the usage of one CPU reaches 100%. Finally, run pidstat -u 5 1, which shows clearly that the stress process is the one using 100% of a CPU.
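If stress is not installed, a busy loop produces a similar CPU-bound load. The sketch below is a stand-in written for this summary, not a reimplementation of stress; burn_cpu and cpu_share are hypothetical names. It also measures how much CPU time the loop received per second of wall-clock time, which is close to the figure pidstat -u reports for a single-threaded process.

```python
import math
import random
import time

def burn_cpu(seconds: float) -> int:
    """Spin on a cheap calculation until `seconds` of wall time have passed."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        math.sqrt(random.random())
        iterations += 1
    return iterations

def cpu_share(seconds: float) -> float:
    """Fraction of wall time this process spent on the CPU while burning."""
    wall_start, cpu_start = time.monotonic(), time.process_time()
    burn_cpu(seconds)
    wall = time.monotonic() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall

# Short run for demonstration; use a longer duration to see the load average move.
print(f"CPU share while burning: {cpu_share(0.2):.0%}")
```

On an otherwise idle machine the share should be close to 100%. A value well below that suggests the process had to wait for a CPU.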

Scenario 2, I/O-intensive: run stress -i 1 --timeout 600 or stress-ng -i 1 --hdd 1 --timeout 600.

Then, as before, run watch -d uptime and mpstat -P ALL 5.

You will find that the load average rises because iowait increases on one CPU. Finally, run pidstat -d 5 1, which shows clearly that the stress process is the cause.
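An I/O-bound load can be approximated without stress by writing a file and forcing it to disk in a loop. This sketch is a stand-in in the spirit of the --hdd worker, not a copy of it; io_worker and its parameters are hypothetical. Note that if the temporary directory is a RAM-backed filesystem such as tmpfs, little or no iowait will show up.

```python
import os
import tempfile
import time

def io_worker(seconds: float, chunk_bytes: int = 1 << 20, directory=None) -> int:
    """Repeatedly overwrite a temp file and force each write to disk.

    Returns the total number of bytes written. `directory` selects where the
    temp file lives; None means the system default temporary directory.
    """
    deadline = time.monotonic() + seconds
    chunk = b"\0" * chunk_bytes
    written = 0
    with tempfile.TemporaryFile(dir=directory) as f:
        while time.monotonic() < deadline:
            f.seek(0)
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # block until the data reaches the device
            written += chunk_bytes
    return written

# Short run for demonstration; the temp file is removed automatically.
print(io_worker(0.2, chunk_bytes=4096), "bytes written")
```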

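Scenario 3, many processes competing for the CPU: run stress -c 8 --timeout 600.

Then run watch -d uptime and pidstat -u 5 1.

You can see 8 processes competing for 2 CPUs, and each process spends a large share of its time waiting for a CPU (the %wait column in pidstat on recent sysstat versions).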
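The waiting time follows from simple arithmetic. As a sketch, assume all processes are always runnable, have equal priority, and are scheduled fairly; fair_share is a hypothetical helper for this example.

```python
def fair_share(processes: int, cpus: int) -> float:
    """Upper bound on the fraction of one CPU each process can get."""
    if processes < 1 or cpus < 1:
        raise ValueError("processes and cpus must be at least 1")
    return min(1.0, cpus / processes)

processes, cpus = 8, 2
share = fair_share(processes, cpus)
wait = 1.0 - share
# With every process always runnable, the load average tends toward `processes`.
print(f"each process runs about {share:.0%} of the time and waits about {wait:.0%}")
```

With 8 processes on 2 CPUs, each process gets at most a quarter of a CPU and spends the rest of the time waiting, while the load average climbs toward 8, four times the number of CPUs.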

Taking these three scenarios together, a rise in the load average may be caused by CPU-intensive processes, or it may mean that I/O has become busier.

 

The above is my personal summary of the Geek Time column "Linux Performance Tuning in Practice" by Ni Pengfei.

https://time.geekbang.org/column/intro/140


Origin blog.csdn.net/qq_24436765/article/details/103497306