What to do when the system has a large number of uninterruptible and zombie processes
Short-lived applications run only briefly, so they are hard to catch in the snapshot views that top and ps give of the system and its processes. To diagnose them you need tools that record events, such as execsnoop or perf top.
As mentioned earlier, CPU utilization comes in several types. Besides user CPU, there is system CPU (kernel-mode work such as context switching), iowait CPU (waiting for a response from the disk), and interrupt CPU (both hardware and software interrupts).
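To make these categories concrete, here is a minimal Python sketch that computes their shares from two snapshots of the aggregate cpu line in /proc/stat. The function names (parse_cpu_line, cpu_percentages, sample_cpu) are illustrative and do not belong to any tool mentioned in this article. The sketch assumes a Linux kernel that exposes at least the first eight counters on that line.

```python
import time

# First eight counters of the aggregate "cpu" line in /proc/stat.
# guest and guest_nice are left out because the kernel already
# accounts for them inside user and nice.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    if len(values) < len(FIELDS):
        raise ValueError("fewer counters than expected on the 'cpu' line")
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Return each category's share (in percent) of the ticks elapsed between two snapshots."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample_cpu(interval_s=1.0, path="/proc/stat"):
    """Take two snapshots interval_s seconds apart and return the percentages (Linux only)."""
    with open(path) as f:
        first = parse_cpu_line(f.readline())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_cpu_line(f.readline())
    return cpu_percentages(first, second)
```

Calling sample_cpu(1.0) on a Linux host returns a dict in which iowait corresponds to the wai column of tools such as top and dstat, and irq and softirq to the hardware and software interrupt time.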
- Process Status
When iowait rises, a process may stay in an uninterruptible state for a long time because it is not getting a response from the hardware. In ps or top, such processes show state D, i.e. uninterruptible sleep (Uninterruptible Sleep).
top and ps are the most commonly used tools for viewing process state. The S column in top shows the state of each process: R, D, Z, S, I, and a few others.
--R is Running or Runnable: the process is on the CPU's run queue, either running or waiting to run
--D is Disk Sleep, i.e. uninterruptible sleep (Uninterruptible Sleep): the process is usually interacting with hardware, and that interaction must not be interrupted by other processes or by interrupts
--Z is Zombie: the process has actually exited, but its parent has not yet reclaimed its resources
--S is Interruptible Sleep: the process is suspended while waiting for an event; when the event occurs, it is woken up and enters the R state
--I is Idle: used for kernel threads in uninterruptible sleep. Note that processes in state D raise the load average, whereas processes in state I do not.
--T or t is Stopped or Traced: the process is paused or being traced. Sending SIGSTOP to a process puts it in the stopped state; sending SIGCONT makes it resume
--X is Dead: the process has died, so you will not see it in top or ps
--In top, press 1 to switch to the per-CPU view
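The state letters listed above can also be read directly from /proc, which is where ps and top obtain them. The following Python sketch counts processes per state. The helper names parse_stat and count_states are made up for illustration, and the code assumes the Linux /proc/<pid>/stat layout (pid, command name in parentheses, state, parent pid, ...).

```python
import os
from collections import Counter


def parse_stat(text):
    """Return (pid, comm, state, ppid) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split around the first '(' and the LAST ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    rest = text[right + 1:].split()
    return pid, comm, rest[0], int(rest[1])


def count_states(proc_root="/proc"):
    """Count processes by state letter (R, S, D, Z, I, T, ...) on a Linux host."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                _, _, state, _ = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited while we were scanning
        counts[state] += 1
    return counts
```

A result such as a large count under D or Z from count_states() is the signal discussed in this article: many uninterruptible or zombie processes.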
If a system or hardware fault occurs, processes may stay in the uninterruptible state for a long time, and the system may even accumulate a large number of uninterruptible processes. When that happens, check whether the system has I/O or other performance problems.
[root@mysqlhq ~]# yum install dstat -y
dstat is a newer performance tool that combines the strengths of vmstat, iostat, and ifstat. It can monitor the system's CPU, disk I/O, memory, and network usage at the same time.
- Uninterruptible state: the process is interacting with hardware. To keep the process data consistent with the hardware, the system does not allow other processes or interrupts to interrupt it. A process that stays uninterruptible for a long time usually indicates an I/O performance problem.
- Zombie process: the process has exited, but its parent has not yet reclaimed the resources held by the child. A short-lived zombie state is usually nothing to worry about, but a process that stays a zombie for a long time deserves attention: the application may not be handling child exits properly.
--1 iowait is too high, and the system load average reaches the number of CPUs
--iowait analysis
[root@mysqlhq ~]# dstat 1 10    ## output 10 samples at 1-second intervals
You did not select any stats, using -cdngy by default.
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  1   0  99   0   0   0| 329k  133k|   0     0 |   0     0 | 202   249
  0   0 100   0   0   0|   0    15k|4086B  842B|   0     0 | 176   225
  0   0 100   0   0   0|   0   206k|3282B  362B|   0     0 | 182   258
  0   0 100   0   0   0|   0  5120B|3341B  362B|   0     0 | 141   174
  0   0 100   0   0   0|   0     0 |2946B  362B|   0     0 | 144   178
  0   0 100   0   0   0|   0    10k|2142B  362B|   0     0 | 151   208
  0   0 100   0   0   0|   0    15k|2640B  362B|   0     0 | 171   213
Look at the read and writ columns. If disk read (read) or write (writ) requests rise together with iowait, the iowait increase is most likely caused by disk reads or writes.
Next, use top to look for processes in state D.
Note the PID of such a process; in this case it is 2171.
## -d shows I/O statistics, -p specifies the process ID; output 3 samples at 1-second intervals
[root@mysqlhq ~]# pidstat -d -p 2171 1 3
Linux 3.10.0-514.ky3.kb3.x86_64 (mysqlhq)  06/11/2019  _x86_64_  (4 CPU)

04:51:13 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
04:51:14 PM  1001      2171      0.00      0.00      0.00       0  zabbix_agentd
04:51:15 PM  1001      2171      0.00      0.00      0.00       0  zabbix_agentd
04:51:16 PM  1001      2171      0.00      0.00      0.00       0  zabbix_agentd
Average:     1001      2171      0.00      0.00      0.00       0  zabbix_agentd
kB_rd/s is the number of KB read per second, kB_wr/s is the number of KB written per second, and iodelay is the I/O delay (in clock ticks). All of them are 0, which means the process is doing no reads or writes at the moment, so process 2171 is not the source of the problem.
Analyze the other processes in state D the same way.
[root@mysqlhq ~]# pidstat -d 1 20    ## output 20 samples at 1-second intervals
Linux 3.10.0-514.ky3.kb3.x86_64 (mysqlhq)  06/11/2019  _x86_64_  (4 CPU)

04:55:37 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
04:55:38 PM  1000      3093      0.00     15.84      0.00       0  mysqld

04:55:38 PM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
04:55:39 PM  1000      3093      0.00     12.00      0.00       0  mysqld
The output shows that the mysqld process is writing to disk, at roughly 12 to 16 KB per second. If this value were large, this process would be the source of the problem.
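The per-process rates that pidstat -d prints can be approximated by hand from the byte counters in /proc/<pid>/io. The Python sketch below takes two snapshots of that file and converts the differences to KB per second. The function names are hypothetical, and the mapping of read_bytes, write_bytes, and cancelled_write_bytes to the kB_rd/s, kB_wr/s, and kB_ccwr/s columns is an assumption based on the column descriptions, not taken from the pidstat source. Reading another user's /proc/<pid>/io normally requires root.

```python
import time


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of integers."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def io_rates_kb(before, after, interval_s):
    """Convert two counter snapshots into KB/s figures similar to pidstat -d."""
    def rate(name):
        return (after[name] - before[name]) / 1024 / interval_s

    return {
        "kB_rd/s": rate("read_bytes"),
        "kB_wr/s": rate("write_bytes"),
        "kB_ccwr/s": rate("cancelled_write_bytes"),
    }


def sample_io(pid, interval_s=1.0):
    """Sample /proc/<pid>/io twice and return the rates (Linux only, may need root)."""
    path = f"/proc/{pid}/io"
    with open(path) as f:
        first = parse_proc_io(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_proc_io(f.read())
    return io_rates_kb(first, second, interval_s)
```

For the example in this article, sample_io(3093) would be the call to try against the mysqld process.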
A process must go through system calls to access the disk, so the next step is to find out which system calls mysqld is making.
strace is the most commonly used tool for tracing a process's system calls.
[root@mysqlhq ~]# strace -p 1000
strace: attach: ptrace(PTRACE_ATTACH, ...): No such process
[root@mysqlhq ~]# strace -p 3093
Process 3093 attached
restart_syscall(<... resuming interrupted call ...>) = 1
fcntl(31, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(31, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
accept(31, {sa_family=AF_INET6, sin6_port=htons(37136), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 133
fcntl(31, F_SETFL, O_RDWR)              = 0
setsockopt(133, SOL_IP, IP_TOS, [8], 4) = 0
setsockopt(133, SOL_TCP, TCP_NODELAY, [1], 4) = 0
If strace fails to attach:
strace -p 6082
strace: attach: ptrace(PTRACE_SEIZE, 6082): Operation not permitted
When you run into this error, first check whether the process is in a normal state.
[root@mysqlhq ~]# ps aux|grep 6082
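Use perf to investigate. Besides perf top, you can record call graphs with perf record -g and then view them with perf report: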
$ perf record -g
$ perf report
The swapper entry in the report is the kernel's scheduling process; you can ignore it.
Then look at the system calls made by the process you are investigating.
- zombie process
To deal with zombie processes, you need to find their root cause, that is, find the parent process and fix the problem there.
How to find the parent process:
# -a shows the command-line arguments
# -p shows PIDs
# -s shows the parent processes of the specified process
[root@mysqlhq ~]# pstree -aps 3093
Once you have found the parent process, fix the problem there.
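When there are many zombies, it can help to group them by parent PID in one pass instead of running pstree per process. The Python sketch below does this by parsing the output of ps -eo pid,ppid,stat,comm. The function names and the PIDs in the usage example are hypothetical, and the code assumes a procps-style ps.

```python
import subprocess


def zombies_by_parent(ps_output):
    """Group zombie PIDs by parent PID from `ps -eo pid,ppid,stat,comm` output."""
    groups = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        # Skip the header line and anything that is not a process row.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, ppid, stat = int(fields[0]), int(fields[1]), fields[2]
        if stat.startswith("Z"):  # Z, Z+, Zs, ... all mean zombie
            groups.setdefault(ppid, []).append(pid)
    return groups


def live_zombies():
    """Run ps on the current host and return {parent_pid: [zombie_pids]}."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return zombies_by_parent(result.stdout)
```

For example, a result of {4000: [4001, 4002]} would mean that process 4000 is the parent that has not reaped two exited children, so 4000 is the process to inspect.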
Summary:
High iowait does not necessarily mean there is an I/O performance bottleneck. When the only processes running on the system are I/O-bound, iowait will be high even though disk reads and writes are nowhere near a bottleneck.
When iowait rises, first use tools such as dstat and pidstat to confirm whether it is a disk I/O problem, then find the processes that are causing the I/O.
Processes waiting on I/O are generally in the uninterruptible state, so the D-state processes found with ps are the main suspects. If such a process has turned into a zombie, strace cannot be used to analyze its system calls directly. In that case, use perf to analyze the system's CPU clock events, which can eventually trace the problem to direct I/O.
For zombie processes, use pstree to identify the parent process, then check how the parent calls wait() / waitpid().
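As an illustration of what a correct waitpid() call in the parent can look like, here is a minimal Python sketch that reaps every child that has already exited without blocking the parent. It is not the code of any application discussed here; reap_children and spawn_exiting_child are names invented for this example, and it assumes a POSIX system where os.fork is available.

```python
import os


def spawn_exiting_child():
    """Fork a child that exits at once; it stays a zombie until the parent reaps it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running parent cleanup code
    return pid


def reap_children():
    """Reap all children that have already exited and return their PIDs."""
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG makes the call return instead of block.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

A parent would typically call reap_children() periodically in its main loop, or from a SIGCHLD handler registered with signal.signal(signal.SIGCHLD, ...). The loop matters because several children may exit before the parent gets a chance to run, and a single waitpid() call reaps only one of them.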