Linux Troubleshooting System Performance Issues - CPU Load

What exactly is a system?

A system has four core elements:

  • CPU (Central Processing Unit)
  • Memory
  • Storage (disk)
  • Network

CPU load issues:

CPU status indicators:

CPU Load: CPU load is a measure of how many processes are using or waiting for the CPU, not a percentage of its capacity. Each process that runs on or waits for the CPU increments the load by 1, and each process that finishes or yields decrements it by 1. The load average is the average number of tasks in the kernel's run queue over a period of time. On Linux it also counts tasks blocked in uninterruptible sleep (typically waiting on disk activity), so it reflects more than CPU time alone.

CPU Utilization (Usage): The share of time during which the CPU is not idle. In other words, it is a measure of how busy the CPU is right now.

CPU load and CPU utilization are two different things. The two numbers often move together, but they are not the same.
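
To make the distinction concrete, here is a minimal sketch that computes utilization from two samples of the aggregate "cpu" line in /proc/stat. The helper name cpu_util_pct is hypothetical, and the sketch assumes the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal), treating idle and iowait as idle time.

```shell
# cpu_util_pct (hypothetical helper): given two aggregate "cpu" lines from
# /proc/stat, print the percentage of time the CPU was not idle between them.
cpu_util_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2..9: user nice system idle iowait irq softirq steal.
            # Guest fields are skipped because they are already part of user.
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            idle = $5 + $6              # idle + iowait
            if (NR == 1) { t0 = total; i0 = idle }
            else         { t1 = total; i1 = idle }
        }
        END {
            dt = t1 - t0
            if (dt <= 0) { print 0; exit }
            printf "%.0f\n", 100 * (dt - (i1 - i0)) / dt
        }'
}

# Live usage on Linux (left commented out so the sketch has no side effects):
# first=$(head -n 1 /proc/stat); sleep 1; second=$(head -n 1 /proc/stat)
# echo "utilization: $(cpu_util_pct "$first" "$second")%"
# echo "load averages: $(cut -d ' ' -f 1-3 /proc/loadavg)"
```

Printing both values side by side makes it easier to spot the two scenarios described next: a high load average with a low utilization percentage, or the reverse.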

Scenario 1 - High CPU load and low CPU utilization:

  • When processes remain blocked in I/O because the disk is busy. For example, even a simple mkdir call adds to the CPU load if it is blocked because the I/O subsystem is busy or stalled.
  • When processes are in uninterruptible sleep. A process typically enters uninterruptible sleep when a driver is waiting for network or disk I/O to complete.
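
One way to check for this scenario is to look for tasks in the D (uninterruptible sleep) state. The sketch below assumes a procps-style ps that accepts the state, pid and comm output columns. The helper names filter_dstate and list_dstate are hypothetical.

```shell
# filter_dstate (hypothetical helper): read "STATE PID COMMAND" rows on
# standard input and keep only rows whose state starts with D.
filter_dstate() {
    awk '$1 ~ /^D/ { print }'
}

# list_dstate (hypothetical helper): apply the filter to the live process
# table. The trailing "=" on each column name suppresses the header line.
list_dstate() {
    ps -eo state=,pid=,comm= | filter_dstate
}
```

Running list_dstate a few times in a row is more informative than a single run: a task that shows up in D state in every sample is a likely contributor to a high load average with low utilization.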

Scenario 2 - High CPU Utilization and Low CPU Load: 

  • When there is one thread/process consuming an entire core and you don't have a ton of processes running or waiting for the CPU.

System Troubleshooting Tools:

There are some tools and instructions that can help you understand and resolve CPU load issues. 

  • uptime
  • ps
  • top

uptime: uptime shows the current time, how long the system has been running, the number of logged-in users, and the load averages over the last 1, 5, and 15 minutes. The last three numbers help you see whether a usage spike is short-term or long-term. Each value is the average number of tasks that were running or waiting for CPU resources during that interval. If the last number stays higher than the number of cores, the problem is sustained and should be addressed. On a single-core system, a value of 1.5 means the CPU was fully occupied and, on average, 0.5 tasks were queued waiting for it.

To interpret these numbers properly, you first need to know how many processors the machine has. You can query the file /proc/cpuinfo for the number of processors.

grep processor /proc/cpuinfo
processor : 0
processor : 1
processor : 2
processor : 3
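
If only the count is needed, grep can produce it directly. This is a small sketch with a hypothetical helper name, count_processors, which reads cpuinfo-formatted text from standard input.

```shell
# count_processors (hypothetical helper): count the "processor" entries in
# cpuinfo-formatted text read from standard input.
count_processors() {
    grep -c '^processor'
}

# Feed it the four lines shown above.
printf 'processor : 0\nprocessor : 1\nprocessor : 2\nprocessor : 3\n' | count_processors   # prints 4
```

On a live system the same helper can be run as count_processors < /proc/cpuinfo. Where GNU coreutils is installed, nproc reports the number of processing units available to the current process.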

The output above was taken from a system with 4 cores, where uptime reported load averages of 2.83, 3.17, and 2.84. Dividing each by the number of cores gives the load per core: about 71% (2.83/4) over the past minute, about 79% (3.17/4) over the past 5 minutes, and 71% (2.84/4) over the past 15 minutes. Because the 1-minute value is lower than the 5-minute value, the load has been falling recently: some processes have finished, or fewer processes are waiting.
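
The per-core arithmetic can be reproduced with a small helper. The name load_per_core_pct is hypothetical. The load values and core count are the ones quoted in the text.

```shell
# load_per_core_pct (hypothetical helper): divide a load average by the
# number of cores and print the result as a whole-number percentage.
load_per_core_pct() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.0f\n", 100 * load / cores }'
}

load_per_core_pct 2.83 4   # 1-minute average  -> 71
load_per_core_pct 3.17 4   # 5-minute average  -> 79
load_per_core_pct 2.84 4   # 15-minute average -> 71
```

A result above 100 means that, on average, more tasks wanted CPU time than there were cores to run them. On a live Linux system the first argument could come from /proc/loadavg, for example "$(cut -d ' ' -f 1 /proc/loadavg)".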

ps: I often use the process status (ps) command to check the state of the processes running on my system and to see whether any of them are being respawned or are stuck in unusual states such as uninterruptible sleep (D) or zombie (Z). (I will explain ps process states in detail in another article.)

ps -afx

top: top provides a rich, self-updating view of process information. It shows the same load averages as uptime, and pressing "1" while top is running adds per-CPU metrics. Since we are troubleshooting performance issues, the %CPU column is the first thing to focus on. By default, processes are sorted by %CPU, so the heaviest consumers appear at the top of the list. Check these processes and terminate them if they are no longer needed.
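
For scripts or logs, a non-interactive snapshot of the heaviest consumers can be easier to work with than the interactive screen. The sketch below is one possible approach, not the only one. It assumes a procps-style ps with the pid, pcpu and comm columns, and the helper names top_cpu and snapshot_top_cpu are hypothetical.

```shell
# top_cpu (hypothetical helper): read "PID %CPU COMMAND" rows on standard
# input and print the N rows with the highest %CPU (default 5).
top_cpu() {
    LC_ALL=C sort -k2,2nr | head -n "${1:-5}"
}

# snapshot_top_cpu (hypothetical helper): apply top_cpu to the live
# process table.
snapshot_top_cpu() {
    ps -eo pid=,pcpu=,comm= | top_cpu "$@"
}
```

For example, snapshot_top_cpu 3 prints the three processes with the highest %CPU. Note that ps reports CPU time averaged over each process's lifetime, while top shows usage since its last refresh, so the two can disagree for long-running processes.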

 

killall <process>

Most of the time, however, you cannot simply kill a process to improve system performance. Instead, you can use the nice command to lower its scheduling priority, so that it gets less CPU time when other processes compete for it. Some scenarios are listed below.

Use nice to adjust the scheduling priority of a program or process in a Linux system: nice runs a command with a modified scheduling priority. By default, new processes typically start with a nice value of 0. Nice values range from -20 (highest priority) to 19 (lowest priority). The higher the nice value, the "friendlier" the process: it is more willing to give up CPU time to other programs or processes. Conversely, the lower the nice value, the more "selfish" the process: it takes a larger share of CPU time when processes compete for it.

Suppose you are running a script mytest.sh. If it is not mission critical, you can give it a higher nice value (lower priority) and run it as follows:

nice -n 10 /home/test_user/mytest.sh

If you want to prioritize it and give it more CPU time, set a negative nice value. This requires root privileges:

nice -n -15 /home/test_user/mytest.sh
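
To confirm that an adjustment took effect, you can let a child process report its own nice value. This sketch relies on the GNU coreutils behavior that nice prints the current niceness when run without arguments. The variable names are illustrative.

```shell
# Niceness of the current shell (usually 0).
base=$(nice)

# Start a child with the nice value raised by 5 and let it report the
# value it actually runs with.
child=$(nice -n 5 nice)

echo "shell niceness: $base, child niceness: $child"
```

The adjustment is relative to the caller's own nice value and is capped at 19. For a process that is already running, renice -n 10 -p <PID> changes the priority without restarting it.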

Use the yes command for stress testing:

You can use the yes command to create a high CPU load for testing. The yes command prints "y" (or any string you supply) repeatedly until you interrupt it. Redirect its output to /dev/null and append an ampersand (&) to run it in the background. Each instance keeps roughly one core busy, so you can start the same command several times to increase the system load. While it runs, use top to view the load and utilization of the system, or run other applications to check their performance and CPU usage under load.

Finally, use the killall command to end all yes processes.

yes > /dev/null &
killall yes
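
On a shared machine, killall yes would also end yes processes started by other sessions of the same user. A more targeted variant, sketched here with hypothetical helper names start_load and stop_load, remembers the PIDs it started and stops only those.

```shell
# PIDs of the yes processes started by start_load.
LOAD_PIDS=""

# start_load N (hypothetical helper): start N background yes processes.
start_load() {
    n=$1
    while [ "$n" -gt 0 ]; do
        yes > /dev/null &
        LOAD_PIDS="$LOAD_PIDS $!"
        n=$((n - 1))
    done
}

# stop_load (hypothetical helper): stop only the processes recorded above.
stop_load() {
    for pid in $LOAD_PIDS; do
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
    done
    LOAD_PIDS=""
}
```

A typical session would be start_load 4 on the 4-core system from the earlier example, then watching top in another terminal, then stop_load. Both helpers must run in the same shell, because the PIDs are kept in a shell variable.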

In this post, I wanted to explain the difference between CPU load and CPU utilization in Linux systems, and show how to use tools such as uptime, ps, and top to monitor and analyze system status. I also showed how to use the yes command for stress testing. I hope this helps you understand and manage Linux systems better. Now I have a question for you: how do you optimize the performance of your Linux system? Please share your experiences and insights below. Looking forward to your reply!

Origin blog.csdn.net/qq_61813593/article/details/130367019