An article that explains load average inside and out

# Load average? Not that hard

## Introduction

As developers, one hurdle we cannot avoid is monitoring the systems running on our servers and investigating quickly when something abnormal happens. The load average covered in this article is an important part of that monitoring. Before studying **load average** itself, we need to know the related metrics and commands, so that problems can be examined through those metrics. For that reason, the article moves from the basics to the deeper material:

1. Common related commands
2. Load average
3. Tools
4. Case analysis

## Common related commands

The commonly used commands are `top`, `uptime`, and `htop`.

### top

`top` is a commonly used performance analysis tool. It shows, dynamically and in real time, the overall state of the system and the resource usage of each process. It provides an interactive interface and supports hotkeys.

For details on the `top` command, see https://man.linuxde.net/top, which analyzes it in depth. The site documents many commands, so it is a handy place to look one up when you have forgotten it.

> Note: the site is a collection of tools.

![2019-08-29-23-24-24](https://img2018.cnblogs.com/blog/1778247/201909/1778247-20190902001356194-1188084346.png)

### uptime

`uptime` shows how long the system has been running and its load average. The fields are, in order: the current time, how long the system has been up, the number of logged-in users, and the load average. It can be combined with **watch**.

```linux
ruiqi@ruiqi:~/content$ uptime
 15:24:41 up 3 min,  2 users,  load average: 0.34, 0.48, 0.22
```

### htop

`htop` is a more detailed monitoring tool than `top` and is more convenient to operate. It has the following advantages:

1. It is simpler to operate than `top`.
2. Its interface supports mouse operation by default.
3. The process list can be scrolled horizontally and vertically, so you can see all processes along with their complete command lines.
It offers many more features. `htop` is not installed by default on most Linux distributions, so it has to be installed separately. Its display is similar to that of `top`.

![2019-08-29-23-35-32](https://img2018.cnblogs.com/blog/1778247/201909/1778247-20190902001356758-1048227129.png)

## Load average

### What is load average

Each of the commands above shows a **load average** field. It consists of three values, in order: the averages over the last 1, 5, and 15 minutes.

Some readers may ask: isn't load average just CPU usage? It is not. Why? Increases in load generally fall into three cases:

1. CPU-intensive processes consume a lot of CPU time, which raises the load average.
2. IO-intensive processes have to wait for IO, which also raises the load, but CPU usage in this case is not high.
3. Many processes wait for the CPU: CPU usage is high, too many processes are queued for CPU time, and the load average is very high.

**Those are the situations that raise the load. So what is load average itself?**

Before looking at load average, we need to know the states a process can be in under Linux:

- `TASK_RUNNING`: shown as **R**, the runnable state.
- `TASK_INTERRUPTIBLE`: shown as **S**, interruptible sleep; the process can respond to signals.
- `TASK_UNINTERRUPTIBLE`: shown as **D**, uninterruptible sleep. The process is usually in a kernel code path that must not be interrupted. The uninterruptible state can be seen as a protective mechanism that keeps processes and devices consistent.
- `TASK_STOPPED` or `TASK_TRACED`: shown as **T**, stopped or being traced.
- `TASK_DEAD` - `EXIT_ZOMBIE`: shown as **Z**, exit state; the process is a zombie. It cannot be killed and does not respond to signals.
- `TASK_DEAD` - `EXIT_DEAD`: shown as **X**, exit state; the process is about to be destroyed.

Load average can be understood simply as the average number of processes in the **runnable** and **uninterruptible** states over a period of time.

1. Runnable processes: the `TASK_RUNNING` state described above, that is, processes that are using the CPU or waiting for it.
2. Uninterruptible processes: the `TASK_UNINTERRUPTIBLE` state described above, shown as D.

So load average counts not only running processes but also uninterruptible ones.

### Criteria for judging load average

`top`, `uptime`, and `htop` make it easy to see the current system load. But what is a reasonable value? Because every server or client has a different hardware configuration, no single number separates a good load average from a bad one. What we can say is that **ideally the load average equals the number of CPUs**. So the first step is to determine how many CPUs the machine has, and there are several ways to do that.

#### Number of CPUs

- Read it from a file

```linux
$ grep 'model name' /proc/cpuinfo
model name	: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
model name	: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

# or count the lines with wc -l
$ grep 'model name' /proc/cpuinfo | wc -l
2
```

- top / htop

In `top`, press the `1` hotkey to see how many CPUs there are.

![2019-08-31-12-38-45](https://img2018.cnblogs.com/blog/1778247/201909/1778247-20190902001356970-666420820.png)

In `htop`, the figure shows how many CPUs there are.

![2019-08-31-12-39-28](https://img2018.cnblogs.com/blog/1778247/201909/1778247-20190902001357100-880167808.png)

#### Judging standard

With the CPU count known, go back to **load average**. It has three values, for 1, 5, and 15 minutes. Which of them should we use as the standard?
It is simple: the three values are the load averages over the last 1, 5, and 15 minutes, and together they show the trend of the load. Comparing the values across the different windows makes it easy to see whether the load is rising or falling.

As a rule of thumb, when the load exceeds 70% of what the system can accommodate, whoever monitors the system should pay attention and check whether something is wrong. The 70% figure is only a guideline, and the threshold depends on the machine. For an older server, for example, the load threshold may need to be lowered. When the load is too high, take emergency measures promptly.

## Tools

We cannot experiment on a production environment, so we need to simulate one. Before the case analysis, we therefore need to learn the tools that help build the experimental environment.

### stress

`stress` is a stress testing tool for POSIX systems that generates CPU / memory / IO / disk load.

#### Installing stress

- On Ubuntu

```ubuntu
sudo apt-get install stress
```

- On CentOS

```centos
# CentOS 7
## enable the third-party repository
rpm -ivh http://apt.sw.be/redhat/el7/en/x86_64/rpmforge/RPMS/rpmforge-release-0.5.3-1.el7.rf.x86_64.rpm
## install stress
yum install stress

# CentOS 6
## enable the third-party repository
yum install epel-release
## install stress
yum install stress
```

#### stress parameters

```linux
-?                  show help information
    --version       show the version number
-v  --verbose       show detailed run information
-q  --quiet         do not show run information
-n  --dry-run       show what would be done without running it
-t  --timeout N     stop after N seconds
    --backoff N     wait N microseconds before work starts
-c  --cpu N         spawn N processes, each repeatedly computing the square root of random numbers
-i  --io N          spawn N processes, each repeatedly calling sync(), which writes memory contents to disk
-m  --vm N          spawn N processes, each repeatedly calling malloc() and free()
    --vm-bytes B    number of bytes to malloc per process (default 256MB)
    --vm-hang N     sleep N seconds after allocating memory, rather than allocating and freeing in an endless loop
-d  --hdd N         spawn N processes, each repeatedly calling write() and unlink()
    --hdd-bytes B   number of bytes to write (default 1GB)
    --hdd-noclean   do not unlink the files that random ASCII data was written to

Time units may be s (seconds), m (minutes), h (hours), d (days), y (years).
Size units may be K, M, G.
```

#### stress usage

- Multiple CPU workers

```linux
stress -c 13
```

![2019-08-31-00-57-02](https://img2018.cnblogs.com/blog/1778247/201909/1778247-20190902001357231-1808307360.png)

- Multiple IO workers

```linux
stress --io 5
stress -i 5
```

- Multiple IO and CPU workers, stopping after 1 minute

```linux
stress -c 4 -i 4 --timeout 1m
stress: info: [19613] dispatching hogs: 4 cpu, 4 io, 0 vm, 0 hdd
stress: info: [19613] successful run completed in 60s
```

- Write a file to the local disk

```linux
stress -d 1 --hdd-bytes 1G
```

`stress` can do more than this, but **note that it cannot simulate more complex scenarios**: the CPU pressure it generates is in user mode, and it generates no pressure in kernel mode.
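While a `stress` run like the ones above is active, it can help to sample the load average at a fixed interval and watch it climb. The following is a minimal sketch, assuming a Linux system that exposes `/proc/loadavg`. The helper name `sample_load` and its arguments are illustrative and are not part of `stress`.

```shell
#!/bin/sh
# sample_load COUNT INTERVAL [FILE]
# Hypothetical helper: print the 1-, 5- and 15-minute load averages COUNT times,
# INTERVAL seconds apart. FILE defaults to /proc/loadavg (Linux only).
# The body runs in a subshell, so its variables do not leak into the caller.
sample_load() (
    count="$1"
    interval="$2"
    file="${3:-/proc/loadavg}"
    i=1
    while [ "$i" -le "$count" ]; do
        # The file starts with the three load averages,
        # e.g. "0.34 0.48 0.22 1/123 4567"; the remaining fields go into "rest".
        read -r one five fifteen rest < "$file"
        echo "1m=$one 5m=$five 15m=$fifteen"
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
)

# Example (commented out so that sourcing this file has no side effects):
#   stress -c 4 --timeout 60 &
#   sample_load 12 5
```

Run next to a CPU-bound `stress` job, the 1-minute value should rise first, followed more slowly by the 5- and 15-minute values.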
If you need a more complex stress test, you can use stress-ng instead.

### sysstat

sysstat is a commonly used performance monitoring toolkit that contains several tools, such as `mpstat`, `pidstat`, `iostat`, and `sar`. Some of them are used below.

#### Installation

1. Ubuntu

```linux
sudo apt-get install sysstat
```

2. CentOS

```centos
yum install sysstat
```

#### mpstat

`mpstat` reports CPU usage statistics, averaged over all CPUs:

```linux
~$ mpstat -A    # equivalent to mpstat -u -I ALL -P ALL
Linux 4.15.0-55-generic (ruiqi) 	08/31/2019 	_x86_64_	(2 CPU)

07:15:31 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:15:31 AM  all    0.77    0.01    1.83   17.85    0.00    0.75    0.00    2.89    0.25    0.00    0.08   41.03    0.00   29.01
```

Option description:

- `-P`: with `ALL`, outputs the statistics for each CPU.

```linux
ruiqi@ruiqi:~$ mpstat -P ALL
Linux 4.15.0-55-generic (ruiqi) 	08/31/2019 	_x86_64_	(2 CPU)

07:17:31 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:17:31 AM  all    0.77    0.01    1.83   17.83    0.00    0.21    0.00    0.00    0.00   79.34
07:17:31 AM    0    0.74    0.01    1.42   16.57    0.00    0.42    0.00    0.00    0.00   80.85
07:17:31 AM    1    0.80    0.02    2.25   19.09    0.00    0.01    0.00    0.00    0.00   77.83
```

- N (a number): outputs CPU information every N seconds.
- `-I`: outputs interrupt statistics for each processor.
- `-u`: outputs CPU utilization statistics.

#### pidstat

`pidstat` monitors the processes and threads currently managed by the kernel, and can also report on child processes and threads.

```linux
pidstat -t -p <process ID> 2 3   # output CPU statistics for the specified process every 2 seconds, 3 times
pidstat -p ALL                   # show information for all processes
pidstat -u 5 1                   # show CPU information, one group sampled over 5 seconds
pidstat -d 2                     # output IO statistics every 2 seconds
```

## Case analysis

The examples above showed several scenarios in which the load rises. To help understand them, we build an example and demonstrate.

### Machine environment

The environment is a virtual machine configured as follows:

- dual core
- 2 GB of memory
- Ubuntu 18.04
- stress, top, and htop installed

### CPU-intensive process

A CPU-intensive process consumes a lot of CPU time, which raises the load average. We simulate one with `stress`.

```linux
stress -c 10 --timeout 600
```

Use `top` to view the load average.

![2019-08-31-14-53-58](https://img2018.cnblogs.com/blog/1778247/201909/1778247-20190902001357351-73041736.png)

The figure shows the load average rising gradually. The system has entered a high-load state.

Use `mpstat -P ALL` to display the statistics for all CPUs and check what caused the load to rise.

![2019-08-31-20-37-36](https://img2018.cnblogs.com/blog/1778247/201909/1778247-20190902001357490-713155813.png)

The figure shows CPU usage at essentially 100% and iowait at 0, which means the rise in load average is caused by CPU usage. To see exactly which process is driving CPU usage so high, use `pidstat`.

```linux
pidstat -p ALL                   # show information for all processes
pidstat -u 5 1                   # show CPU information, one group sampled over 5 seconds
pidstat -d 2                     # output IO statistics every 2 seconds
pidstat -t -p <process ID> 2 3   # output CPU statistics for the specified process every 2 seconds, 3 times
```

### Others

IO-intensive processes have to wait for IO, which also raises the load, although CPU usage in that case is not high. A large number of processes competing for the CPU raises the load as well. In both cases, use `mpstat` to observe the load and determine whether it comes from high iowait or from many processes waiting for the CPU, then use `pidstat` to find the corresponding processes and check their state.
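Before reaching for `mpstat` and `pidstat`, the first question is whether the load is actually high for this machine, which means relating it to the CPU count. The following is a minimal sketch of that check, assuming a Linux system with `/proc/loadavg` and `nproc`. The helper names `load_ratio`, `load_status`, and `check_load` are hypothetical, and the default threshold of 0.70 per CPU is only one possible reading of the 70% guideline mentioned earlier.

```shell
#!/bin/sh
# load_ratio LOAD CPUS
# Print LOAD divided by CPUS with two decimals.
load_ratio() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# load_status LOAD CPUS [THRESHOLD]
# Print "high" when the load per CPU exceeds THRESHOLD, otherwise "ok".
# THRESHOLD defaults to 0.70 (an assumption, tune it per machine).
load_status() {
    awk -v load="$1" -v cpus="$2" -v limit="${3:-0.70}" \
        'BEGIN { if (load / cpus > limit) print "high"; else print "ok" }'
}

# check_load
# Read the 1-minute load average and the CPU count from the running system
# and print a one-line summary. Runs in a subshell to keep variables local.
check_load() (
    read -r one rest < /proc/loadavg
    cpus=$(nproc)
    echo "load=$one cpus=$cpus per_cpu=$(load_ratio "$one" "$cpus") status=$(load_status "$one" "$cpus")"
)

# Usage on a Linux host:
#   check_load
```

If the status is `high`, the `mpstat` and `pidstat` steps described above are the natural next move for finding out whether CPU usage or iowait is behind it.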
## Summary

This article explained where load average comes from and, along the way, introduced the stress testing tools stress and stress-ng and the `mpstat` and `pidstat` commands. We use these tools to assist with system monitoring and troubleshooting.

· END ·

Though the road is long, keep walking and you will arrive.

This article was first published on the WeChat public account of the same name, "Fat Qi's Upgrade Road". Reply "1024" there to find out more, and give it a like.

WeChat ID: YoungRUIQ

![Public account](https://img2018.cnblogs.com/blog/1778247/201909/1778247-20190902001358148-748666685.png)


Origin www.cnblogs.com/ruiqi-pang/p/11444374.html