Linux performance tools that programmers must know

Foreword

In day-to-day development we sometimes receive monitoring alarms for a service, such as high CPU or high memory usage, and have to log in to the server to troubleshoot. This post covers the Linux performance tools used for that.

An online troubleshooting simulation

Background: after running smoothly for a while, the service's CPU usage suddenly spikes.

top

The top command shows which process is driving the CPU up, and also lets you check whether the alarm was a false positive.

You can see that the PID is 2816 in the figure, and the CPU usage is very high.

View thread information under the process

Use top -Hp 2816 to list the threads of that process. The figure shows that thread 2825 has very high CPU usage.

Base conversion

Next, convert the decimal thread ID to hexadecimal. Python is a convenient way to do it. Why is this needed?

Because the thread dump produced in the next step identifies each thread by a hexadecimal NID.
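As a minimal sketch of that conversion, Python's built-in hex() is enough. The helper name tid_to_nid is made up for this example; 2825 is the thread ID from the case above.

```python
def tid_to_nid(tid):
    """Return a decimal thread ID in the hexadecimal form used for nid."""
    return hex(tid)


if __name__ == "__main__":
    print(tid_to_nid(2825))  # 0xb09
```

The result can then be searched for in the dump as nid=0xb09.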

Thread dump file

Thread state transitions

In practice, run jstack with the process ID several times. Threads move between states, so several dumps capture more information about what each thread is doing than a single one.

In the figure, one thread has acquired the lock, is running, and has not released it, while another thread keeps waiting for that lock. From here, go to the code and work out why the lock is not being released.
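Looking up the hot thread in a dump can also be scripted. The sketch below assumes the usual jstack layout, where each thread starts with a quoted name line containing nid=0x... and threads are separated by blank lines. The SAMPLE text, thread names, and addresses are invented for illustration and are not output from the case above.

```python
import re

# Hypothetical two-thread excerpt in jstack's layout (invented for this example).
SAMPLE = '''"worker-1" #12 prio=5 os_prio=0 tid=0x00007f3c0c0a1000 nid=0xb09 runnable [0x00007f3bf4e5d000]
   java.lang.Thread.State: RUNNABLE
        at com.example.Demo.spin(Demo.java:21)
        - locked <0x00000000d6f0a1b8> (a java.lang.Object)

"worker-2" #13 prio=5 os_prio=0 tid=0x00007f3c0c0a3000 nid=0xb0a waiting for monitor entry [0x00007f3bf4d5c000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.example.Demo.spin(Demo.java:21)
        - waiting to lock <0x00000000d6f0a1b8> (a java.lang.Object)
'''


def find_thread(dump_text, tid):
    """Return the lines of the dump entry whose nid matches the decimal tid."""
    # \b stops nid=0xb09 from also matching nid=0xb090.
    pattern = re.compile(re.escape("nid=" + hex(tid)) + r"\b")
    # Thread entries are separated by blank lines.
    for block in re.split(r"\n\s*\n", dump_text.strip()):
        lines = block.splitlines()
        if lines and pattern.search(lines[0]):
            return lines
    return []


if __name__ == "__main__":
    for line in find_thread(SAMPLE, 2825):
        print(line)
```

Running the same lookup against several dumps taken a few seconds apart shows whether the thread stays in the same state and the same stack frame.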

The top performance monitoring tool in detail

The case above used top, but top prints a lot of information, so each part is explained in detail here.

top

First line:

Two times are shown: the current system time and how long the machine has been running. [Focus on the running time. Why? It tells you whether the machine was restarted recently, and an unexpected restart can cause a lot of problems.]

The number of users currently logged in to the system. [More information is available through who, w, and history.]

What do the 3 load values mean?

They are the average load of the machine over the last 1, 5, and 15 minutes. How do you judge whether a load value is high? Compare it with the number of CPU cores. For example, on a 4-core machine, a load that stays above 4 means there is more work than the CPUs can run at once, so the load is very heavy. [Press 1 in top to show each CPU separately.]

The above information can also be obtained through the uptime command.
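The comparison between load and core count can be sketched with the standard library. os.getloadavg() and os.cpu_count() are real calls; the helper names and the threshold of 1.0 per core are assumptions that simply encode the rule of thumb above.

```python
import os


def load_per_core(load, cores):
    """Load average divided by the number of CPU cores."""
    return load / cores


def is_overloaded(load, cores, threshold=1.0):
    """True when the load per core exceeds the (assumed) threshold."""
    return load_per_core(load, cores) > threshold


def main():
    cores = os.cpu_count() or 1
    # getloadavg() returns the 1, 5, and 15 minute averages (Unix only).
    for label, load in zip(("1min", "5min", "15min"), os.getloadavg()):
        print(label, round(load_per_core(load, cores), 2), is_overloaded(load, cores))


if __name__ == "__main__":
    main()
```

A 1-minute value well above the 15-minute value suggests the spike is recent; the reverse suggests the load is already falling.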

Second line:

This line shows the total number of tasks and how many are in each state. Pay the most attention to the number of tasks in the zombie state.
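If the zombie count is not zero, the processes can be counted outside top by reading /proc. The following is a rough sketch, assuming a Linux /proc filesystem; the function names are made up for this example. The state letter is the field right after the command name in /proc/<pid>/stat, and Z means zombie.

```python
import os


def parse_state(stat_line):
    """Extract the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' instead of on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_zombies(proc_root="/proc"):
    """Count processes whose state is Z (zombie)."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if parse_state(f.read()) == "Z":
                    count += 1
        except OSError:
            continue  # the process exited while we were scanning
    return count


if __name__ == "__main__" and os.path.isdir("/proc"):
    print("zombies:", count_zombies())
```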

Third line:

This line shows how CPU time is being spent.

US and SY are the percentages of CPU time spent in user space and in kernel (system) space.

NI, or NICE, is the percentage of CPU time spent running user processes whose priority has been adjusted with nice. Normally this value should not be large.

ID means idle. WA is the time the CPU spends waiting for I/O to complete. For example, if the service writes a lot of logs during a burst of traffic, this value will soar, because the writes have to wait for the disk.

HI is time spent on hard (hardware) interrupts, which are generally raised by peripherals. A high HI value may point to a problem with a peripheral at the hardware level. SI is time spent on soft interrupts.

ST means steal. It only matters if the host is a virtual machine: it is the percentage of time the virtual machine was ready to run but the hypervisor gave the physical CPU to something else.
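The same breakdown can be computed from /proc/stat, whose first line holds cumulative counters in the order user, nice, system, idle, iowait, irq, softirq, steal. This is only a sketch of the idea, not how top itself is implemented; the function names are invented, and the one-second sampling interval is an arbitrary choice.

```python
import os
import time

# top's labels, in the order the counters appear on the "cpu" line.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Return the first eight counters from a /proc/stat 'cpu' line."""
    parts = line.split()
    # parts[0] is the label "cpu"; guest counters after steal are skipped
    # because guest time is already included in user time.
    return [int(x) for x in parts[1:9]]


def cpu_percentages(before, after):
    """Percentage of time per category between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}


def read_cpu_line(path="/proc/stat"):
    with open(path) as f:
        return f.readline()  # the first line is the all-CPU total


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_line()
    time.sleep(1)
    print(cpu_percentages(first, read_cpu_line()))
```

The counters are cumulative since boot, which is why two samples are needed: a single reading only gives the average since the machine started.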

 
User space and system space

Fourth and fifth lines:

These lines involve two concepts: buffer and cache.

What is a buffer? It holds data that is waiting to be processed, and mainly smooths over the speed mismatch between two systems, such as memory and disk. A cache, in general, holds results that have already been fetched so they can be reused, such as information loaded from the DB for later queries. In top's memory lines, buffers are kernel buffers for raw disk blocks, and cache is mostly the page cache of file contents.

Swap uses the hard disk as an extension of memory. If swapping in and out is very frequent, it means that physical memory is not enough!
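These figures can also be read from /proc/meminfo. The following sketch assumes the usual "Key:   value kB" layout of that file; parse_meminfo and swap_used_kb are names invented here. Note that the amount of swap in use is only a rough hint: it shows that pages were swapped out at some point, not how often swapping happens right now.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of integers (mostly in kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            # The first field is the number; a trailing "kB" unit is ignored.
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(info):
    """Swap currently in use, in kB."""
    return info["SwapTotal"] - info["SwapFree"]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("buffers kB:", info.get("Buffers"))
    print("cached kB:", info.get("Cached"))
    print("swap used kB:", swap_used_kb(info))
```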

Column descriptions:

PID: process ID; USER: owner of the process; PR: priority; VIRT: virtual memory; RES: resident memory; SHR: shared memory

Note that RES is the physical memory the process actually occupies, not the amount it has requested (that is closer to VIRT). RES also includes memory shared with other processes, so the physical memory used by the current process alone is roughly RES minus SHR.
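The RES minus SHR arithmetic can be reproduced from /proc/<pid>/statm, whose second and third fields are the resident and shared sizes in pages. This is a sketch under that assumption; res_shr_kib is a made-up helper name, and it inspects the script's own process as an example.

```python
import os


def res_shr_kib(statm_line, page_size):
    """Return (RES, SHR, RES - SHR) in KiB from a /proc/<pid>/statm line."""
    fields = statm_line.split()
    resident_pages = int(fields[1])  # pages currently in physical memory
    shared_pages = int(fields[2])    # resident pages that are shared
    kib_per_page = page_size // 1024
    res = resident_pages * kib_per_page
    shr = shared_pages * kib_per_page
    return res, shr, res - shr


if __name__ == "__main__" and os.path.exists("/proc/self/statm"):
    with open("/proc/self/statm") as f:
        line = f.read()
    print(res_shr_kib(line, os.sysconf("SC_PAGE_SIZE")))
```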

Well, that's it. This post ends here.
