[Debugging] How to find where a program is stuck in Linux | GDB | strace

1. strace -p [process ID]

strace -p 1002297

strace: Process 1002297 attached
futex(0x7fcbb95f3f84, FUTEX_WAIT_PRIVATE, 1, NULL

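You can see that the process is blocked in futex(0x7fcbb95f3f84, FUTEX_WAIT_PRIVATE, 1, NULL. strace prints the call without a closing parenthesis or return value because the call has not returned yet.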

 

Example: using strace to find out why a process is stuck

I recently ran into a process that hung, but the hang could not be reproduced on demand while debugging. It was triggered only under certain conditions after the process had been running for some time. In a case like this, where the live state of the process must be preserved, we can attach to the running process with gdb -p and strace -p.


First, use ps auxf to see how far the process has got:

You can see that execution reaches docker exec -i 178.20.1.229_0115034556 ls and then gets stuck.


Next, use strace to see which system call the process is blocked in:


Here you can see that it is blocked in the read system call. To find out what file descriptor 19 refers to, look in /proc/pid/fd:

Descriptor 19 turns out to be a pipe, so the process is blocked reading from a pipe.
/*****************************************************************************/
The problem later reappeared. Again we first use ps auxf to check the process ID and the step the process has reached. The process ID is 27678, and it is stuck in docker exec:

root 27678 0.3 0.4 512172 16500 Sl python /wns/cloud/app/com_host/main.pyc
root 25011 0.0 0.0 4332 652 S \_ /bin/sh -c docker exec -i mongo_docker_master ls
root 25014 0.0 0.2 136592 10600 Sl         \_ docker exec -i mongo_docker_master ls

Tracing further with strace -p 27678 shows that it is stuck in read on file descriptor 14:

root@localhost:/# strace -p 27678      
Process 27678 attached
read(14,

Then cd into /proc/27678/, where we can view the process status:

root@localhost:/proc/27678# cat status
Name: python
State: S (sleeping)
Tgid: 27678
Ngid: 0
Pid: 27678
PPid: 27677

Look at the process's kernel stack. wchan names the kernel function in which the process is sleeping or waiting:

root@localhost:/proc/27678# cat stack
[<ffffffff811a91ab>] pipe_wait+0x6b/0x90
[<ffffffff811a9c04>] pipe_read+0x344/0x4f0
[<ffffffff811a00bf>] do_sync_read+0x7f/0xb0
[<ff0ff> 0x130
[<ffffffff811a1110>] SyS_read+0x80/0xe0
[<ffffffff818d4c49>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
root@localhost:/proc/27678# cat wchan
pipe_wait
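
The same status and wchan information can be collected from a script, which is handy when the hang has to be caught on a machine you cannot watch interactively. The following is only a minimal sketch: read_proc_state is a hypothetical helper name, not part of any library, and it assumes a Linux /proc filesystem. /proc/<pid>/stack is left out because it normally requires root.

```python
import os


def read_proc_state(pid):
    """Return (name, state, wchan) for a process, read from /proc/<pid>.

    Hypothetical helper for illustration; assumes Linux procfs.
    """
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            # Lines look like "State:\tS (sleeping)".
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    try:
        with open(f"/proc/{pid}/wchan") as f:
            wchan = f.read().strip()
    except OSError:
        wchan = ""  # wchan missing or not readable for this process
    return fields.get("Name", ""), fields.get("State", ""), wchan


if __name__ == "__main__":
    # Inspect the current process; pass another PID (such as 27678) instead.
    print(read_proc_state(os.getpid()))
```

For the hung process above, the state would be reported as S (sleeping) and wchan as pipe_wait, matching the cat output.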

Now, what does descriptor 14 refer to? A pipe:

root@localhost:/proc/27678# ls -l ./fd
total 0
lr-x------ 1 root root 64 Mar 26 17:19 0 -> pipe:[30690124]
l-wx------ 1 root root 64 Mar 26 17:19 1 -> pipe:[30690125]
lrwx------ 1 root root 64 Mar 26 17:19 10 -> socket:[30691732]
lr-x------ 1 root root 64 Mar 26 17:19 11 -> /dev/urandom
lrwx------ 1 root root 64 Mar 26 17:19 12 -> socket:[30719611]
lrwx------ 1 root root 64 Mar 26 17:19 13 -> socket:[30719610]
lr-x------ 1 root root 64 Mar 26 17:19 14 -> pipe:[38483750]
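
Each entry in /proc/<pid>/fd is a symbolic link, so the listing above can also be reproduced with os.listdir and os.readlink. This is a small sketch under the assumption of a Linux /proc filesystem; fd_targets is a hypothetical helper name chosen for this example.

```python
import os


def fd_targets(pid):
    """Map each open descriptor number of a process to its link target.

    Hypothetical helper for illustration; assumes Linux procfs and
    permission to read /proc/<pid>/fd.
    """
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listing and readlink
    return targets


if __name__ == "__main__":
    for fd, target in sorted(fd_targets(os.getpid()).items()):
        print(fd, "->", target)
```

For process 27678, looking up key 14 in the returned dictionary would give the pipe:[38483750] target shown in the ls output.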

We can now confirm that main creates a child process to run the shell command docker exec -i mongo_docker_master ls and communicates with it through a pipe. The parent ends up blocked reading from that pipe.
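
To make the mechanism concrete: a parent that reads a child's output through a pipe blocks until the child writes or closes its end, so a child that never finishes leaves the parent in read indefinitely. The sketch below shows one common way to bound that wait in Python, using the timeout parameter of Popen.communicate. It is an illustration only, not the fix used in the original investigation (which is not described here), and run_with_deadline is a hypothetical name.

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s):
    """Run cmd, capture its stdout through a pipe, give up after timeout_s.

    Returns (finished, output). Hypothetical helper for illustration.
    """
    proc = subprocess.Popen(
        cmd, stdin=subprocess.DEVNULL, stdout=subprocess.PIPE
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return True, out
    except subprocess.TimeoutExpired:
        proc.kill()                  # stop the child that never finished
        out, _ = proc.communicate()  # reap it and collect any partial output
        return False, out


if __name__ == "__main__":
    # A stand-in for a command that hangs: it sleeps and prints nothing.
    hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_deadline(hanging, 0.5))
```

Note that this passes the command as a list. If the command is started through /bin/sh -c, as in the ps output above, killing the shell does not necessarily kill the grandchild, which may keep the write end of the pipe open.
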
We can also use lsof to locate the problem. It shows that FD 14 opened by process 27678 is a pipe. In the FD column, u means the file is open for reading and writing, and r means it is open for reading only:

sangfor ~ # lsof -d 14
COMMAND     PID USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
mongod     1907 root   14u   REG              251,0    36864   130683 /wns/data/mongodb/db/collection-7--588642557116981989.wt
syslog-ng  3446 root   14u  unix 0xffff88012227d800      0t0 40557736 /dev/log
dockerd    4025 root   14u  unix 0xffff8800b8d5d800      0t0    13941 /run/docker/libnetwork/a73bd949b5fbb89c2b8bec3b4ac6af0a948a944958c8b037d9e6c9b324b44331.sock
docker-co  9382 root   14u  0000                0,9        0     9553 anon_inode
docker-co 21204 root   14u  0000                0,9        0     9553 anon_inode
python    27678 root   14r  FIFO                0,8      0t0 38483750 pipe

You can also list only the files opened by process 27678, which again shows that 14 is a pipe:

sangfor ~ # lsof -p 27678
COMMAND   PID USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
python  27678 root    0r  FIFO                0,8      0t0 30690124 pipe
python  27678 root    1w  FIFO                0,8      0t0 30690125 pipe
python  27678 root    2w  FIFO                0,8      0t0 30690126 pipe
python  27678 root    3u  0000                0,9        0     9553 anon_inode
python  27678 root    4u  0000                0,9        0     9553 anon_inode
python  27678 root    5u  pack           30691718      0t0  unknown type=SOCK_RAW
python  27678 root    6w   REG              251,0 76106652   130565 /wns/data/com_host/etc/config/err.log
python  27678 root    7u  IPv4           30691716      0t0      TCP Sangfor:53102->Sangfor:42457 (ESTABLISHED)
python  27678 root    8u  IPv4           30691717      0t0      TCP Sangfor:42457->Sangfor:53102 (ESTABLISHED)
python  27678 root    9u  IPv4           30691731      0t0      TCP db.sdwan:54072->sdwan.io:27017 (ESTABLISHED)
python  27678 root   10u  IPv4           30691732      0t0      TCP db.sdwan:54074->sdwan.io:27017 (ESTABLISHED)
python  27678 root   11r   CHR                1,9      0t0 30690329 /dev/urandom
python  27678 root   12u  IPv4           30719611      0t0      TCP db.sdwan:51404->db.sdwan:37017 (ESTABLISHED)
python  27678 root   13u  IPv4           30719610      0t0      TCP db.sdwan:47124->db.sdwan:27017 (ESTABLISHED)
python  27678 root   14r  FIFO                0,8      0t0 38483750 pipe
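
The NODE column gives the pipe's inode number (38483750 here), and both ends of a pipe share it. A possible next step, not taken in the walkthrough above, is to search for every process that holds that inode, which shows who still has the write end open. The sketch below does this by scanning /proc. pipe_holders is a hypothetical helper name, and without root it only sees processes whose fd directory is readable.

```python
import os


def pipe_holders(inode):
    """Return sorted (pid, fd) pairs whose fd link target is pipe:[inode].

    Hypothetical helper for illustration; assumes Linux procfs.
    """
    target = f"pipe:[{inode}]"
    holders = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or its fd directory is not readable
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") == target:
                    holders.append((int(entry), int(fd)))
            except OSError:
                continue  # descriptor closed while scanning
    return sorted(holders)


if __name__ == "__main__":
    # Demonstrate on a pipe created by this process.
    r, w = os.pipe()
    print(pipe_holders(os.fstat(r).st_ino))
```

If the only holder of the write end is a process that is itself waiting, the reader has no chance of being woken up.
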
————————————————
Original link: https://blog.csdn.net/peng314899581/article/details/79064616


Origin blog.csdn.net/bandaoyu/article/details/114303378