Linux multi-thread debugging (memory usage, endless loop, high CPU usage...)

Article source: http://www.cnblogs.com/cy568searchx/archive/2013/10/28/3391790.html


Your software stops serving requests at some point, and its CPU usage reaches 100% or more. One possible cause is an infinite loop. Assuming that the program contains a latent infinite loop that is triggered only under certain conditions, this article uses an example to show how to locate where the loop occurs.
When a program has an infinite loop somewhere, the usual way to narrow down the problem is to add logging to the suspicious code or to comment it out. This works well for problems that are easy to reproduce, but it is hard to apply to "occasional" failures because they are difficult to reproduce. The debugging process described here targets exactly that case: the problem has already occurred, and we need to preserve the scene, that is, the faulty program must still be running.

1. First, we need to find out which thread has the problem:
Start by looking up the pid of the problematic process, for example:

ovtsvn@ovtsvn:~/MASS4/src/icdn/src$ ps -ef | grep icdn
ovtsvn   11065     1 50 11:57 ?        00:00:07 ./icdn
ovtsvn   11076 10971  0 11:57 pts/2    00:00:00 grep
ovtsvn@ovtsvn:~/MASS4/src/icdn/src$

Then use the top command to view per-thread information (-H shows individual threads):
top -H -p 11065

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11073 ovtsvn    25   0  325m 3980 2236 R  100  0.4   1:40.84 icdn
11065 ovtsvn    18   0  325m 3980 2236 S    0  0.4   0:00.01 icdn
11066 ovtsvn    18   0  325m 3980 2236 S    0  0.4   0:00.00 icdn
11067 ovtsvn    15   0  325m 3980 2236 S    0  0.4   0:00.00 icdn
11068 ovtsvn    15   0  325m 3980 2236 S    0  0.4   0:00.00 icdn
11069 ovtsvn    18   0  325m 3980 2236 S    0  0.4   0:00.00 icdn
11070 ovtsvn    18   0  325m 3980 2236 S    0  0.4   0:00.00 icdn
11071 ovtsvn    22   0  325m 3980 2236 S    0  0.4   0:00.00 icdn
11072 ovtsvn    15   0  325m 3980 2236 R    0  0.4   0:00.00 icdn
 

The output shows that the problematic thread, the one using 100% CPU, has thread ID (LWP) 11073. With -H, the PID column of top shows thread IDs.
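
The same lookup can be done non-interactively, which helps when the result has to go into a script or a log. The following is a minimal sketch, assuming the procps version of ps is installed; the function name hottest_thread is hypothetical and not part of the original article. Note that ps reports CPU usage averaged over the thread's lifetime, so a thread that only recently started spinning may rank lower here than it does in top.

```shell
#!/usr/bin/env bash
# Print the thread ID (LWP) of the thread that has used the most CPU
# in the given process.
# Usage: hottest_thread <pid>
# (hottest_thread is an illustrative name, not an existing tool.)
hottest_thread() {
    local pid="$1"
    # -L prints one row per thread; "lwp=" and "pcpu=" select the thread ID
    # and CPU percentage columns with empty headers, so no header line is
    # printed. Sort by the CPU column, highest first, and keep the thread
    # ID of the top row.
    ps -L -p "$pid" -o lwp= -o pcpu= | sort -k2,2 -nr | head -n 1 | awk '{print $1}'
}
```

For the process in this article, the call would be hottest_thread 11065, and the printed thread ID is the LWP number to look for in gdb.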

2. Next, attach gdb to the target process
Run: gdb icdn 11065
In gdb, list the status of the threads:

(gdb) info threads
  9 Thread 47056948181264 (LWP 11066)  0x00002acc4a3dec91 in nanosleep () from /lib/libc.so.6
  8 Thread 47056956573968 (LWP 11067)  0x00002acc4a406fc2 in select () from /lib/libc.so.6
  7 Thread 47056964966672 (LWP 11068)  0x00002acc4a3dec91 in nanosleep () from /lib/libc.so.6
  6 Thread 47056973359376 (LWP 11069)  0x00002acc4a3dec91 in nanosleep () from /lib/libc.so.6
  5 Thread 47056981752080 (LWP 11070)  0x00002acc4a3dec91 in nanosleep () from /lib/libc.so.6
  4 Thread 47056990144784 (LWP 11071)  0x00002acc4a40e63c in recvfrom () from /lib/libc.so.6
  3 Thread 47057194060048 (LWP 11072)  0x00002acc4a406fc2 in select () from /lib/libc.so.6
  2 Thread 47057226893584 (LWP 11073)  CSendFile::SendFile (this=0x2acc5d4aff40, pathname=@0x2acc5d4afee0) at ../src/csendfile.cpp:101
  1 Thread 47056939784832 (LWP 11065)  0x00002acc4a3dec91 in nanosleep () from /lib/libc.so.6
(gdb)


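gdb has listed the function that each thread is currently executing. We need more information: note the number at the start of the line for LWP 11073, which is the ID gdb assigned to that thread (2 here), and then switch to it: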

(gdb) thread 2
[Switching to thread 2 (Thread 47057226893584 (LWP 11073))]#0  CSendFile::SendFile (this=0x2acc5d4aff40, pathname=@0x2acc5d4afee0) at ../src/csendfile.cpp:101
101             while(1)
(gdb)

Print the backtrace with bt:

(gdb) bt
#0  CSendFile::SendFile (this=0x2acc5d4aff40, pathname=@0x2acc5d4afee0) at ../src/csendfile.cpp:101
#1  0x000000000040592e in CIcdn::TaskThread (pParam=0x7fff617eafe0) at ../src/cicdn.cpp:128
#2  0x00002acc4a90b73a in start_thread () from /lib/libpthread.so.0
#3  0x00002acc4a40d6dd in clone () from /lib/libc.so.6
#4  0x0000000000000000 in ?? ()


Take a look at the code around line 101:

(gdb) l
96      }
97
98      int CSendFile::SendFile(const string& pathname)
99      {
100             int n;
101             while(1)
102             {
103                     n++;
104             }
105             //read file and send

We have now located the code that causes the problem. The loop here is only for demonstration.
Finally, don't forget to detach.

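After you finish debugging the process, run the detach command to make GDB release it, so that the process can continue running. Pressing Enter after detach does not repeat the command. Once detach has executed, the process and GDB are no longer associated, and GDB can attach to another process.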
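
When an interactive session is not practical, for example on a production host, the attach, inspect, and detach steps can be collapsed into one command. The following is a minimal sketch, assuming gdb is installed and the caller has permission to attach to the process; the function name dump_thread_stacks is hypothetical and not part of the original article.

```shell
#!/usr/bin/env bash
# Attach to a running process, print the thread list and a backtrace for
# every thread, then let gdb exit.
# Usage: dump_thread_stacks <pid>
# (dump_thread_stacks is an illustrative name, not an existing tool.)
dump_thread_stacks() {
    local pid="$1"
    # -p attaches to the process, -ex runs one gdb command, and -batch makes
    # gdb exit once the commands have run. On exit gdb detaches from a
    # process it attached to, so the target keeps running.
    gdb -p "$pid" -batch -ex "info threads" -ex "thread apply all bt"
}
```

For the process in this article, the call would be dump_thread_stacks 11065. The target is paused only while gdb collects the stacks, and the thread stuck in the loop is the one whose top frame shows application code rather than a blocking library call.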


Origin blog.csdn.net/gouguofei/article/details/46827805