Tracing a multithreaded deadlock with strace + gdb

I. Introduction to the strace command

1. What strace is

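strace is a debugging tool for Linux that monitors the system calls an application makes. In its simplest form it follows a binary from start to finish and, for each system call made during the process's lifetime, prints one line of text with the system call's name, arguments, and return value. On Linux a process cannot access hardware devices directly, for example to read a file from disk or receive data from the network. It has to switch from user mode to kernel mode and ask the kernel to do the work through a system call. strace traces the system calls a process makes and can report their arguments, return values, elapsed time, call counts, and the number of successes and failures.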
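
As a small illustration of what strace sees, the sketch below reads the start of a file using only libc wrappers around system calls. read_file_head is a hypothetical helper name, not something from the original article.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

// Reads up to cap-1 bytes from path into buf and NUL-terminates it.
// Returns the number of bytes read, or -1 on error.
// Each libc call below enters the kernel through one system call,
// and those kernel entries are what strace prints.
long read_file_head(const char *path, char *buf, size_t cap)
{
	if (cap == 0)
		return -1;

	int fd = open(path, O_RDONLY);      // usually traced as openat(AT_FDCWD, ...)
	if (fd < 0)
		return -1;

	ssize_t n = read(fd, buf, cap - 1); // traced as read(fd, ..., cap-1) = n
	close(fd);                          // traced as close(fd) = 0
	if (n < 0)
		return -1;

	buf[n] = '\0';
	return (long)n;
}
```

Run under strace, a program that calls read_file_head would be expected to show one line each for the open, read, and close calls, with their arguments and return values.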

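2. What strace can do
  • It can filter on specific system calls or groups of system calls.
  • It can profile system call usage by counting how often specific system calls are made, how much time they take, and how many succeed or fail.
  • It traces the signals delivered to the process.
  • It can attach to any running process by PID.
  • It helps debug performance problems: check how often system calls are made and find the time-consuming parts of a program.
  • It shows which files a program reads, which helps track down problems such as loading the wrong configuration file.

3. strace usage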

http://www.man7.org/linux/man-pages/man1/strace.1.html

or

man strace

describes how to use strace.

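  • -c Count the time, number of calls, and number of errors for each system call and print a summary.
  • -d Print strace's own debugging output to standard error.
  • -f Besides the current process, also trace the child processes it creates with fork (and with vfork and clone).
  • -ff When used with -o filename, write each process's trace to filename.pid, where pid is that process's ID.
  • -F In old strace versions, try to follow vfork, which -f did not trace there. Current versions follow vfork with -f and keep -F only as a deprecated synonym.
  • -h Print a brief help message.
  • -i Print the instruction pointer at the time of each system call.
  • -q Suppress the messages about attaching and detaching. These messages are suppressed automatically when the output is redirected to a file.
  • -r Print a relative timestamp on entry to each system call, that is, the time elapsed between the starts of successive system calls. It is the counterpart of -t.
  • -t Prefix each line with the wall-clock time at which the system call was made, with one-second resolution. Useful for seeing how long each part of a program takes.
  • -tt Like -t, but the time includes microseconds.
  • -ttt Prefix each line with the absolute time in the form "seconds since the epoch.microseconds".
  • -T Show the time spent in each system call, in angle brackets at the far right of the output line.
  • -v Verbose mode: print every element or member of array and structure arguments such as argv[], envp[], stat, and termio(s).
  • -V Print the strace version.
  • -x Print non-ASCII strings in hexadecimal.
  • -xx Print all strings in hexadecimal.
  • -a column Set the column in which return values are aligned. The default is 40, so "=" appears in column 40.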
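  • -e expr A qualifying expression that controls what is traced and how.
  • -e trace=set Trace only the listed system calls, for example -e trace=open.
  • -e trace=file Trace only file-related system calls.
  • -e trace=process Trace only system calls related to process management.
  • -e trace=network Trace all network-related system calls.
  • -e trace=signal Trace all signal-related system calls.
  • -e trace=ipc Trace all system calls related to inter-process communication.
  • -e abbrev=set Abbreviate the output for the listed system calls instead of printing every member of large structures. -v is equivalent to abbrev=none. The default is abbrev=all.
  • -e raw=set Print the arguments of the listed system calls raw, in hexadecimal, without decoding them.
  • -e signal=set Trace only the listed signals. The default is all. For example, signal=!SIGIO traces every signal except SIGIO.
  • -e read=set Dump the data read from the listed file descriptors, for example -e read=3,5.
  • -e write=set Dump the data written to the listed file descriptors.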
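  • -E var Remove var from the command's environment.
  • -E var=val Add var=val to the command's environment.
  • -o filename Write strace's output to filename instead of standard error (stderr).
  • -p pid Trace the process with the given pid. Up to 32 -p pid options can be given to trace several processes at once. This option is often used to debug background processes.
  • -s strsize Limit the length of the strings printed on each line (such as the buffer argument of read) to strsize. The default is 32 bytes. File names are always printed in full.
  • -S sortby Sort the histogram printed by -c by the given key. sortby can be time, calls, name, or nothing (default time).
  • -u username Run the traced command with the UID and GID of username.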

II. Debugging a deadlock with strace + gdb

1. Sample program
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <unistd.h>
 
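pthread_mutex_t g_smutex;	// the single lock all worker threads compete for

void *func(void *arg)
{
	int i = 0;

	// lock: the first thread to get here wins, every other thread blocks
	pthread_mutex_lock(&g_smutex);

	// busy loop standing in for work done while holding the lock
	for (i = 0; i < 0x7fffffff; i++)
	{
	}

	// deliberate bug: the thread returns without calling
	// pthread_mutex_unlock(&g_smutex), so the lock is never released
	return NULL;
}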
 
int main()
{
	pthread_t  thread_id_01;
	pthread_t  thread_id_02;
	pthread_t  thread_id_03;
	pthread_t  thread_id_04;
	pthread_t  thread_id_05;
	
	pthread_mutex_init( &g_smutex, NULL );
 
	pthread_create(&thread_id_01, NULL, func, NULL);
	pthread_create(&thread_id_02, NULL, func, NULL);
	pthread_create(&thread_id_03, NULL, func, NULL);
	pthread_create(&thread_id_04, NULL, func, NULL);
	pthread_create(&thread_id_05, NULL, func, NULL);
 
	while(1)
	{
		sleep(0xfff);
	}
	return 0;
}

2. Tracing the program with strace

Compile and run the program

root@ubuntu:/wan/1.FlyInCoding/FlyInCodingVs/dead_lock# g++ dead_lock.cpp -lpthread
root@ubuntu:/wan/1.FlyInCoding/FlyInCodingVs/dead_lock# ./a.out 

Open another terminal

root@ubuntu:~# ps aux -T | grep a.out
root      8822  8822  0.0  0.1  43296   700 pts/0    Sl+  17:29   0:00 ./a.out
root      8822  8823  0.0  0.1  43296   700 pts/0    Sl+  17:29   0:00 ./a.out
root      8822  8825  0.0  0.1  43296   700 pts/0    Sl+  17:29   0:00 ./a.out
root      8822  8826  0.0  0.1  43296   700 pts/0    Sl+  17:29   0:00 ./a.out
root      8822  8827  0.0  0.1  43296   700 pts/0    Sl+  17:29   0:00 ./a.out
root      8836  8836  0.0  0.1   5108   848 pts/6    S+   17:30   0:00 grep --color=auto a.out
root@ubuntu:~# strace -p 8822
strace: Process 8822 attached
restart_syscall(<... resuming interrupted nanosleep ...>
^Cstrace: Process 8822 detached
 <detached ...>
root@ubuntu:~# strace -p 8823
strace: Process 8823 attached
futex(0x804a02c, FUTEX_WAIT_PRIVATE, 2, NULL^Cstrace: Process 8823 detached
 <detached ...>
root@ubuntu:~# strace -p 8825
strace: Process 8825 attached
futex(0x804a02c, FUTEX_WAIT_PRIVATE, 2, NULL^Cstrace: Process 8825 detached
 <detached ...>
root@ubuntu:~# strace -p 8826
strace: Process 8826 attached
futex(0x804a02c, FUTEX_WAIT_PRIVATE, 2, NULLstrace: Process 8826 detached
 <detached ...>
root@ubuntu:~# strace -p 8827
strace: Process 8827 attached
futex(0x804a02c, FUTEX_WAIT_PRIVATE, 2, NULL^Cstrace: Process 8827 detached
 <detached ...>
root@ubuntu:~# 

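strace shows five threads in total. The main thread is sleeping (nanosleep), and the other four are all blocked in futex(..., FUTEX_WAIT_PRIVATE, ...) waiting for the same lock, whose address is 0x804a02c.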
3. Using gdb to inspect the lock further

Find the PID of a.out

root@ubuntu:~# ps aux  | grep a.out  
root      8822  1.2  0.1  43296   700 pts/0    Sl+  17:29   0:04 ./a.out
root      8849  0.0  0.1   5108   800 pts/6    S+   17:35   0:00 grep --color=auto a.out

Attach gdb to the process

root@ubuntu:~# gdb 
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) attach 8822
Attaching to process 8822
[New LWP 8823]
[New LWP 8825]
[New LWP 8826]
[New LWP 8827]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
0xb77ccc31 in __kernel_vsyscall ()
(gdb) info threads 
  Id   Target Id         Frame 
* 1    Thread 0xb75e2700 (LWP 8822) "a.out" 0xb77ccc31 in __kernel_vsyscall ()
  2    Thread 0xb75e1b40 (LWP 8823) "a.out" 0xb77ccc31 in __kernel_vsyscall ()
  3    Thread 0xb65dfb40 (LWP 8825) "a.out" 0xb77ccc31 in __kernel_vsyscall ()
  4    Thread 0xb5ddeb40 (LWP 8826) "a.out" 0xb77ccc31 in __kernel_vsyscall ()
  5    Thread 0xb55ddb40 (LWP 8827) "a.out" 0xb77ccc31 in __kernel_vsyscall ()
(gdb) thread apply all bt

Thread 5 (Thread 0xb55ddb40 (LWP 8827)):
#0  0xb77ccc31 in __kernel_vsyscall ()
#1  0xb77a7d12 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/lowlevellock.S:144
#2  0xb77a189e in __GI___pthread_mutex_lock (mutex=0x804a02c <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3  0x080485a6 in func(void*) ()
#4  0xb779f295 in start_thread (arg=0xb55ddb40) at pthread_create.c:333
#5  0xb76ca0ae in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:114

Thread 4 (Thread 0xb5ddeb40 (LWP 8826)):
#0  0xb77ccc31 in __kernel_vsyscall ()
#1  0xb77a7d12 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/lowlevellock.S:144
#2  0xb77a189e in __GI___pthread_mutex_lock (mutex=0x804a02c <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3  0x080485a6 in func(void*) ()
#4  0xb779f295 in start_thread (arg=0xb5ddeb40) at pthread_create.c:333
#5  0xb76ca0ae in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:114

Thread 3 (Thread 0xb65dfb40 (LWP 8825)):
#0  0xb77ccc31 in __kernel_vsyscall ()
#1  0xb77a7d12 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/lowlevellock.S:144
#2  0xb77a189e in __GI___pthread_mutex_lock (mutex=0x804a02c <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3  0x080485a6 in func(void*) ()
#4  0xb779f295 in start_thread (arg=0xb65dfb40) at pthread_create.c:333
#5  0xb76ca0ae in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:114

Thread 2 (Thread 0xb75e1b40 (LWP 8823)):
#0  0xb77ccc31 in __kernel_vsyscall ()
#1  0xb77a7d12 in __lll_lock_wait () at ../sysdeps/unix/sysv/linux/i386/lowlevellock.S:144
#2  0xb77a189e in __GI___pthread_mutex_lock (mutex=0x804a02c <g_smutex>) at ../nptl/pthread_mutex_lock.c:80
#3  0x080485a6 in func(void*) ()
#4  0xb779f295 in start_thread (arg=0xb75e1b40) at pthread_create.c:333
#5  0xb76ca0ae in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:114

Thread 1 (Thread 0xb75e2700 (LWP 8822)):
#0  0xb77ccc31 in __kernel_vsyscall ()
#1  0xb76933ca in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#2  0xb76932fd in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#3  0x080486a0 in main ()
(gdb) p g_smutex
$1 = 2
(gdb) p *(pthread_mutex_t*)0x804a02c
$2 = {__data = {__lock = 2, __count = 0, __owner = 8824, __kind = 0, __nusers = 1, {__elision_data = {__espins = 0, 
        __elision = 0}, __list = {__next = 0x0}}}, 
  __size = "\002\000\000\000\000\000\000\000x\"\000\000\000\000\000\000\001\000\000\000\000\000\000", __align = 2}
(gdb) 

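The __owner field shows that the lock is held by thread (LWP) 8824. That thread is no longer in the thread list: it has already exited without releasing the lock, so the four remaining worker threads wait forever.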
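
For comparison, here is a minimal sketch of what a repaired thread function could look like, assuming the only problem is the missing unlock. The names func_fixed, run_fixed_workers, g_mutex, and g_done are hypothetical and not part of the original program. The busy loop is replaced by a counter increment so that the effect can be observed.

```c
#include <pthread.h>

#define WORKER_COUNT 5	// same number of threads as the sample program

static pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
static int g_done = 0;	// protected by g_mutex

// Same shape as func() in the sample, but the lock is released before returning.
void *func_fixed(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&g_mutex);
	g_done++;				// stand-in for the work done under the lock
	pthread_mutex_unlock(&g_mutex);		// the call the sample program forgot
	return NULL;
}

// Starts WORKER_COUNT threads, waits for all of them, and returns how many
// passed through the critical section. Returns -1 if a thread cannot be
// created (a sketch: threads already started are not cleaned up in that case).
int run_fixed_workers(void)
{
	pthread_t tids[WORKER_COUNT];
	int i;

	g_done = 0;
	for (i = 0; i < WORKER_COUNT; i++)
		if (pthread_create(&tids[i], NULL, func_fixed, NULL) != 0)
			return -1;
	for (i = 0; i < WORKER_COUNT; i++)
		pthread_join(tids[i], NULL);
	return g_done;
}
```

Called from main, run_fixed_workers() should return 5, and no thread should stay blocked in futex, because each one gets the lock in turn.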

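If the program is compiled without -g, gdb has no type information for g_smutex (p g_smutex prints only 2), so printing the lock requires casting its address to the actual lock type, as in p *(pthread_mutex_t*)0x804a02c.
If it is compiled with -g, p g_smutex prints the structure directly.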
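
The __owner value that gdb prints can also be read from inside a program. The sketch below assumes glibc's NPTL mutex layout on Linux. __data.__owner is an implementation detail rather than part of POSIX, and current_tid and mutex_owner_tid are hypothetical helper names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	// for syscall()
#endif
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

// Kernel thread ID of the caller: the number ps -T and gdb show as the LWP.
long current_tid(void)
{
	return (long)syscall(SYS_gettid);
}

// Returns the TID glibc recorded as the holder of m, or 0 if none is recorded.
// ASSUMPTION: relies on the glibc-internal field __data.__owner, the same
// field shown in the gdb output. It is not portable to other C libraries.
long mutex_owner_tid(const pthread_mutex_t *m)
{
	return (long)m->__data.__owner;
}
```

On glibc with lock elision disabled (the usual default), mutex_owner_tid should return the caller's TID while the caller holds a default mutex, and 0 after the unlock. Because this TID is the LWP number, the __owner = 8824 value in the gdb output can be matched against the thread list.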

4. Checking with valgrind
root@ubuntu:/wan/1.FlyInCoding/FlyInCodingVs/dead_lock# valgrind --tool=helgrind ./a.out 
==8655== Helgrind, a thread error detector
==8655== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==8655== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==8655== Command: ./a.out
==8655== 

^C==8655== 
==8655== Process terminating with default action of signal 2 (SIGINT)
==8655==    at 0x411E3CA: ??? (syscall-template.S:84)
==8655==    by 0x411E2FC: sleep (sleep.c:55)
==8655==    by 0x804869F: main (in /wan/1.FlyInCoding/FlyInCodingVs/dead_lock/a.out)
==8655== ---Thread-Announcement------------------------------------------
==8655== 
==8655== Thread #2 was created
==8655==    at 0x4155091: clone (clone.S:88)
==8655==    by 0x40561AD: create_thread (createthread.c:102)
==8655==    by 0x405799E: pthread_create@@GLIBC_2.1 (pthread_create.c:679)
==8655==    by 0x4031FD2: pthread_create_WRK (hg_intercepts.c:427)
==8655==    by 0x4032BF6: pthread_create@* (hg_intercepts.c:460)
==8655==    by 0x8048603: main (in /wan/1.FlyInCoding/FlyInCodingVs/dead_lock/a.out)
==8655== 
==8655== ----------------------------------------------------------------
==8655== 
==8655== Thread #2: Exiting thread still holds 1 lock
==8655==    at 0x80485AF: func(void*) (in /wan/1.FlyInCoding/FlyInCodingVs/dead_lock/a.out)
==8655==    by 0x403215C: mythread_wrapper (hg_intercepts.c:389)
==8655==    by 0x4057294: start_thread (pthread_create.c:333)
==8655==    by 0x41550AD: clone (clone.S:114)
==8655== 
==8655== 
==8655== For counts of detected and suppressed errors, rerun with: -v
==8655== Use --history-level=approx or =none to gain increased speed, at
==8655== the cost of reduced accuracy of conflicting-access information
==8655== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Reposted from blog.csdn.net/wanxuexiang/article/details/88382808