[Linux performance optimization] Do you know what load average is?

What is load average

First, think about this: when you find that your service is slow, which command do you run first to troubleshoot? I usually start with the top or uptime command to check the load on the system. For example, if I enter top on the command line, it displays the resource usage of each process, much like Task Manager in Windows.

        I enter the uptime command in the command line, and the system will immediately give the following results:

        What do the fields in the uptime output mean? In order, they are the current time, how long the system has been running, the number of users currently logged in, and the load average of the system over the past 1 minute, 5 minutes, and 15 minutes.

22:29                 // current time
up 3 days, 12:24      // how long the system has been running
2 users               // number of users currently logged in
2.44 4.13 4.42        // load average over the past 1, 5, and 15 minutes
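
The three load values can also be read programmatically. The following is a minimal sketch, assuming a Linux system where /proc/loadavg holds the three averages as its first three whitespace-separated fields. The function name parse_loadavg and the trailing fields of the sample string are illustrative only.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg-style text."""
    fields = text.split()
    one, five, fifteen = (float(value) for value in fields[:3])
    return one, five, fifteen


# Sample text reusing the load values from the example above; the last two
# fields are made up for illustration.
sample = "2.44 4.13 4.42 1/234 5678"
print(parse_loadavg(sample))  # (2.44, 4.13, 4.42)
```

On a real machine you could pass in the contents of /proc/loadavg, or call os.getloadavg() from the Python standard library, which returns the same three numbers on Unix-like systems.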

        Some readers will ask: isn't the load average just the CPU usage per unit time? By that reading, the 2.44 above would mean a CPU usage of 244%. That is not the case. The load average is the average number of processes in the runnable or uninterruptible state per unit time, that is, the average number of active processes. It is not directly related to CPU usage.

        Specifically, what are runnable processes and uninterruptible processes?

        Runnable process: a process that is using the CPU or waiting for the CPU, that is, a process in the R state (Running or Runnable) as shown by the ps command.

        Uninterruptible process: a process that is in the middle of a critical kernel-mode operation that must not be interrupted, most commonly waiting for an I/O response from a hardware device. This is the D state shown by the ps command (Uninterruptible Sleep, also called Disk Sleep). For example, when a process reads or writes data on disk, it cannot be interrupted by other processes or by interrupts until the disk replies, in order to keep the data consistent. If the process were interrupted at that point, the data on disk and the data in the process could easily become inconsistent. The uninterruptible state is therefore a protection mechanism that the system provides for processes and hardware devices.

        You can therefore think of the load average simply as the number of active processes per unit time. Strictly speaking, it is an exponentially decaying average of the number of active processes. We do not need to worry about the details of this "exponentially decaying average": it is just a faster way for the system to compute the value, and it is fine to treat it as the average number of active processes.

        Let's take this one step further. When we look at the load average, what is a reasonable value?
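
For readers who are curious anyway, the idea of an exponentially decaying average can be sketched in a few lines. This is an illustration only, not the kernel's implementation: it assumes the active-process count is sampled every 5 seconds and uses floating-point arithmetic, and the names update_load and simulate are hypothetical.

```python
import math


def update_load(load, active, interval=5.0, window=60.0):
    """One step of an exponentially decaying average.

    load     -- previous average
    active   -- number of active processes in the current sample
    interval -- seconds between samples (assumed to be 5 here)
    window   -- averaging window in seconds (60 for the 1-minute value)
    """
    decay = math.exp(-interval / window)
    return load * decay + active * (1.0 - decay)


def simulate(active_counts, window=60.0):
    """Feed a sequence of samples through the average, starting from 0."""
    load = 0.0
    for active in active_counts:
        load = update_load(load, active, window=window)
    return load


# Two active processes for one minute (12 samples of 5 seconds each):
print(round(simulate([2] * 12), 2))   # 1.26
# The same load held for ten minutes converges towards 2:
print(round(simulate([2] * 120), 2))  # 2.0
```

The sketch shows why the 1-minute value lags behind sudden changes: after one minute of constant load it has only covered about 63% of the distance to the true value.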

What is a reasonable load average

        Since the load average represents the number of active processes per unit time, the ideal case is exactly one process running on each CPU, so that every CPU is fully utilized. For example, when the load average is 2, on a dual-core CPU it means that all CPUs are exactly fully occupied. On a quad-core CPU, it means that half of the CPU capacity is idle. On a single-core CPU, it means that half of the processes cannot get CPU time and have to wait.

So how can you know how many CPUs your server has? You can use the following command to check:

$ grep 'model name' /proc/cpuinfo | wc -l
2
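
The same count can be obtained in Python. The sketch below mirrors the grep pipeline by counting lines that contain "model name"; the function name count_cpus and the sample text are illustrative, and real /proc/cpuinfo output contains many more fields.

```python
def count_cpus(cpuinfo_text):
    """Count 'model name' lines, like: grep 'model name' /proc/cpuinfo | wc -l"""
    return sum(1 for line in cpuinfo_text.splitlines() if "model name" in line)


# Shortened, made-up sample in the layout of /proc/cpuinfo.
sample = """processor\t: 0
model name\t: Example CPU
processor\t: 1
model name\t: Example CPU
"""
print(count_cpus(sample))  # 2
```

When the file does not need to be parsed by hand, os.cpu_count() from the standard library reports the number of logical CPUs directly.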

        With the number of CPUs on the server and the load average, we can judge whether the system is overloaded. In general, if the load average is greater than the number of CPUs, the system is overloaded.

        This raises a new question. The example shows three load average values. Which one should we look at? The answer is all three. The averages over three different time intervals give us the data to analyze the trend of the system load, so that we can understand the current load more completely.

        Suppose we see load averages of 1.73, 3.60, and 7.98 on a single-CPU system. Then the system was overloaded by 73% over the past 1 minute, by 260% over the past 5 minutes, and by 698% over the past 15 minutes. Looking at the overall trend, the load on the system is decreasing.
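
The arithmetic behind these percentages can be written out as a small sketch. The function names overload_percent and trend are hypothetical, and the trend rule (strictly rising or falling across the three values) is a simplifying assumption.

```python
def overload_percent(load, cpus):
    """Percentage by which the load exceeds the CPU count (negative means spare capacity)."""
    return round((load - cpus) / cpus * 100)


def trend(one, five, fifteen):
    """Compare the 1-, 5- and 15-minute averages to describe the direction of the load."""
    if one < five < fifteen:
        return "decreasing"
    if one > five > fifteen:
        return "increasing"
    return "mixed"


loads = (1.73, 3.60, 7.98)
print([overload_percent(value, cpus=1) for value in loads])  # [73, 260, 698]
print(trend(*loads))                                         # decreasing
```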

        In a real production environment, you should start analyzing and troubleshooting high load once the load average exceeds 70% of the number of CPUs. When the load is too high, processes may respond slowly, which affects the normal operation of the service. The 70% figure is not absolute. The most recommended approach is to monitor the load average of the system and judge the trend of the load from a longer history of data. When you see an obvious upward trend, for example the load has doubled, that is the time to analyze and investigate.
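
As a rough sketch, the two checks described here (the 70% rule of thumb and the "load has doubled" signal) could look like the following. The function names and the baseline parameter are hypothetical; a real monitoring setup would take the baseline from its own historical data.

```python
def needs_investigation(load, cpus, threshold=0.7):
    """True when the load average exceeds the given fraction of the CPU count."""
    return load > cpus * threshold


def has_doubled(current, baseline):
    """True when the current load is at least twice a historical baseline."""
    return baseline > 0 and current >= 2 * baseline


print(needs_investigation(1.5, cpus=2))        # True: 1.5 is above 70% of 2 CPUs
print(needs_investigation(1.0, cpus=2))        # False
print(has_doubled(current=4.0, baseline=2.0))  # True
```

The threshold is a parameter rather than a constant because, as noted, 70% is only a starting point.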

        

Relationship between load average and CPU usage

        Consider this question. The load average represents the average number of active processes per unit time. Does a high load average necessarily mean high CPU usage?

        Not necessarily. Let's go back to the definition: the load average is the number of processes in the runnable or uninterruptible state per unit time. So it includes not only processes that are using the CPU, but also processes that are waiting for the CPU and processes that are waiting for I/O. CPU usage, on the other hand, measures how busy the CPU is per unit time, and it does not necessarily match the load average. For example:

  • For CPU-intensive processes, using a lot of CPU will lead to a higher load average, where the two are consistent;
  • For I/O-intensive processes, waiting for I/O can also lead to higher load averages, but not necessarily high CPU usage;
  • A large number of processes waiting to be scheduled on the CPU will also raise the load average, and in this case CPU usage will be relatively high as well.
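
One rough way to tell these cases apart is to count how many processes are in the R (runnable) and D (uninterruptible) states. The sketch below works on STAT values such as those in the STAT column of ps; the function names and the "more D than R means I/O" rule are simplifying assumptions, not a definitive diagnosis.

```python
from collections import Counter


def count_active_states(stat_values):
    """Count runnable (R) and uninterruptible (D) processes from ps STAT values."""
    first_letters = Counter(value.strip()[0] for value in stat_values if value.strip())
    return {"R": first_letters.get("R", 0), "D": first_letters.get("D", 0)}


def likely_cause(counts):
    """Very rough guess at what is driving the load (an assumption, not a rule)."""
    if counts["D"] > counts["R"]:
        return "I/O wait"
    if counts["R"] > 0:
        return "CPU demand"
    return "idle"


# Made-up STAT values for illustration.
sample = ["Ss", "R+", "D", "D", "D", "S"]
counts = count_active_states(sample)
print(counts)                # {'R': 1, 'D': 3}
print(likely_cause(counts))  # I/O wait
```

Only the first letter of each value is used, because ps appends modifier characters such as "s" or "+" to the state letter.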

Summary

The load average gives a quick view of the overall performance of the system and reflects its overall load. Specifically, the load average is the number of processes in the runnable or uninterruptible state per unit time. But the load average alone cannot tell us where the bottleneck is, because a high load average may be caused by CPU-intensive processes or by busy I/O.

Linux commands involved in this article

top: It is a commonly used performance analysis tool under Linux, which can display the resource usage status of each process in the system in real time, similar to the task manager of Windows.

ps: It is used to display the status of the current process. The difference from top is that top displays the dynamic changes of resources, and ps displays the instantaneous status.

uptime: It can print how long the system has been running in total and the average load of the system. The information displayed by the uptime command is: the current time, how long the system has been running, how many users are currently logged in, and the average load of the system in the past 1 minute, 5 minutes, and 15 minutes.

Origin blog.csdn.net/zzu_seu/article/details/130918988