Analyzing Stack Corruption with gdb: A Worked Example

I. Pinning down the nature and scope of the bug

1. Analyze the core dump with symbols

$ gdb IMActivityServer.symbol core.32530
(gdb) bt
#0  0x0000000000a6951a in ?? ()
#1  0x00000000018c6db8 in ?? ()
#2  0x8f127f1911ab2800 in ?? ()
#3  0x00000000018c6d00 in ?? ()
#4  0x0000000400000004 in ?? ()
#5  0x00000000018c6d88 in ?? ()
#6  0x00000000006ba9bf in ?? ()
#7  0x00000000018a4400 in ?? ()
#8  0x8f127f1911ab2800 in ?? ()
#9  0x00000000018c6d00 in ?? ()
#10 0x00007f518b789010 in ?? ()
#11 0x00000000018c6d00 in ?? ()
#12 0x0000000000693166 in ?? ()
#13 0x0000000000000000 in ?? ()

The backtrace carries no usable information, and the logs show nothing either, so stack corruption is the prime suspect.

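2. Enable stack protection: build with the compiler flag -fstack-protector-all, which inserts protection code into every function, deploy that build, and look at the next crash dump with symbols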
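
To make the idea behind the protection code concrete: the compiler places a guard value (a canary) between a function's local buffers and its saved return address and checks it just before the function returns; a mismatch leads to a call to __stack_chk_fail. The following is only a toy model of that check, not what the compiler actually emits. ToyFrame, kCanary and the helper names are made up for this illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy model of one stack frame: a 16-byte local buffer followed by an
// 8-byte canary. A real frame is laid out by the compiler; this only
// illustrates the check that the protection code performs.
constexpr std::size_t kBufSize = 16;
constexpr std::uint64_t kCanary = 0x0123456789abcd00ULL;  // arbitrary guard value

struct ToyFrame {
    std::array<unsigned char, kBufSize + sizeof(std::uint64_t)> bytes{};
};

// Function entry: store the canary right after the buffer.
void enter(ToyFrame& f) {
    std::memcpy(f.bytes.data() + kBufSize, &kCanary, sizeof kCanary);
}

// Models the bug: copies n bytes into the buffer without checking n
// against kBufSize. The copy is clamped to the toy frame so that the
// model itself stays well defined.
void unchecked_copy(ToyFrame& f, const unsigned char* src, std::size_t n) {
    assert(src != nullptr || n == 0);
    n = std::min(n, f.bytes.size());
    if (n != 0) {
        std::memcpy(f.bytes.data(), src, n);
    }
}

// Function exit: the real protection code aborts if this check fails.
bool canary_intact(const ToyFrame& f) {
    std::uint64_t v = 0;
    std::memcpy(&v, f.bytes.data() + kBufSize, sizeof v);
    return v == kCanary;
}
```

A copy that stays within the 16-byte buffer leaves the canary intact, while a 20-byte copy overwrites part of it. That is why the crash moves to the return of the function that was overrun, where the backtrace is still readable.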

$ gdb IMActivityServer.symbol core.32530
(gdb) bt
#0  0x00007f08a95d7118 in ?? () from /lib64/libgcc_s.so.1
#1  0x00007f08a95d8019 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#2  0x00007f08a93110c6 in backtrace () from /lib64/libc.so.6
#3  0x00007f08a927c334 in __libc_message () from /lib64/libc.so.6
#4  0x00007f08a9314a77 in __fortify_fail () from /lib64/libc.so.6
#5  0x00007f08a9314a40 in __stack_chk_fail () from /lib64/libc.so.6
#6  0x00000000006909a9 in ActivityService::cmdMsgParse (this=<optimized out>, ptNullCmd=<optimized out>, 
    nCmdLen=<optimized out>) at ActivityServer.cpp:930
#7  0x00007f08a75ae1e8 in ?? ()
#8  0x00007f0818000978 in ?? ()
#9  0x00007f0818000a08 in ?? ()
#10 0x00007f0818000970 in ?? ()
#11 0x0000000000000000 in ?? ()

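The backtrace now shows a function name, cmdMsgParse. ActivityServer.cpp:930 in the source is the point where the function returns, so some message handled there must be corrupting the stack. This is the unified message handler, however, and the message volume is very high, so the backtrace alone cannot tell which message is responsible.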

3. Add logging of the message number being processed and ship another build
On several servers the last log line was always:

180531-12:43:51 ActivityServer[17701] ERROR: [ActivityInfoManager.cpp:1293] cmd(51) param(0) len(70) server(0) id(0)

This makes it fairly safe to conclude that message 51 is the trigger.
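
This technique relies on the log line being written, and flushed, before the message is dispatched; otherwise the line for the fatal message may be lost in the crash. A minimal sketch of such a hook follows. MsgHeader, format_cmd_log and log_before_dispatch are hypothetical names, and the format covers only the first three fields of the real log line.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the two-byte message header (cmd, para).
struct MsgHeader {
    std::uint8_t cmd;
    std::uint8_t para;
};

// Formats the identifying fields of a message into one log line.
std::string format_cmd_log(const MsgHeader& h, unsigned len) {
    char line[64];
    const int n = std::snprintf(line, sizeof line, "cmd(%u) param(%u) len(%u)",
                                static_cast<unsigned>(h.cmd),
                                static_cast<unsigned>(h.para), len);
    assert(n > 0 && n < static_cast<int>(sizeof line));
    (void)n;  // silence unused-variable warnings when NDEBUG is defined
    return std::string(line);
}

// Call this first thing in the handler, before any parsing. The flush
// makes sure the line is out before a possible crash.
void log_before_dispatch(const MsgHeader& h, unsigned len) {
    std::fprintf(stderr, "%s\n", format_cmd_log(h, len).c_str());
    std::fflush(stderr);
}
```

If the process dies while handling a message, the last line in the log then identifies that message, which is exactly how message 51 was singled out here.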

II. Analyzing the bug in detail

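Message 51 is a generic forwarding wrapper: the real message is carried inside it and has to be parsed out. The plan is to set a breakpoint in the unified handler
ActivityInfoManager::msgParseTask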

1. First look up the function definition to confirm it is the right one (skip this if you have the source)

(gdb) info func ActivityInfoManager::msgParseTask
All functions matching regular expression "ActivityInfoManager::msgParseTask":

File ActivityInfoManager.cpp:
bool ActivityInfoManager::msgParseTask(Cmd::t_NullCmd const*, unsigned int, void const*);
bool ActivityInfoManager::msgParseTask(unsigned int, Cmd::t_NullCmd const*, unsigned int, void const*);
bool ActivityInfoManager::msgParseTaskByWebTransfer(Cmd::t_NullCmd const*, unsigned int, void const*);
bool ActivityInfoManager::msgParseTaskGM(Cmd::t_NullCmd const*, unsigned int, unsigned int);

2. Break there and inspect the arguments

(gdb) break ActivityInfoManager.cpp:1293
(gdb) c
Continuing.
[Switching to Thread 0x7f83877fe700 (LWP 414)]

Breakpoint 1, ActivityInfoManager::msgParseTask (this=0x7f840c16e010, ptNullCmd=0x7f83877edc30, cmdLen=84, 
    server_param=0x7f83877edc20) at ActivityInfoManager.cpp:1293
1293    ActivityInfoManager.cpp: No such file or directory.
(gdb) info locals
__temp_format__ = <error reading variable __temp_format__ (can't compute CFA for this frame)>
s = 0x7f83877edc20
buffercmd = <error reading variable buffercmd (can't compute CFA for this frame)>
cmd = <optimized out>
pBase = <optimized out>
// Look up the struct definition behind ptNullCmd (skip this if you have the source)
(gdb) info types Cmd::t_NullCmd
All types matching regular expression "Cmd::t_NullCmd":

File ../../common/zNullCmd.h:
Cmd::t_NullCmd;
Cmd::t_NullCmd;
Cmd::t_NullCmd;
Cmd::t_NullCmd;
Cmd::t_NullCmd;

// Show the struct layout (skip this if you have the source)
(gdb) ptype Cmd::t_NullCmd  
type = class Cmd::t_NullCmd {  
  public:  
    union {  
        <no data fields>  
    };  

    void t_NullCmd(BYTE, BYTE);  
}  

// Show the argument contents in detail
(gdb) p *(const Cmd::t_NullCmd *) 0x7f83877edc30
$11 = {{{byCmd = 51 '3', byParam = 0 '\000'}, {cmd = 51 '3', para = 0 '\000'}}}
(gdb) p *(Cmd::Activity::stServerParam*)0x7f83877fdc10
$6 = {asServer = 0 '\000', type = 0, serverid = 0}

// Check the message number
(gdb) p ptNullCmd->cmd
$2 = 33 '!'

3. Set a conditional breakpoint on the argument value

// Clear the old breakpoint
(gdb) clear
Deleted breakpoint 1 
// Set the conditional breakpoint
(gdb) break ActivityInfoManager.cpp:1293 if ptNullCmd->cmd == 51
Breakpoint 2 at 0x6bdd50: file ActivityInfoManager.cpp, line 1293.
(gdb) c
Continuing.

4. Analyze message 51

// Inspect the message the breakpoint stopped on
(gdb) p *(const Cmd::t_NullCmd *) 0x7f83877fdc30
$11 = {{{byCmd = 51 '3', byParam = 0 '\000'}, {cmd = 51 '3', para = 0 '\000'}}}

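// Message 51 is laid out as the equivalent of the struct below
// (the +8 offset used next implies 1-byte packing: 1 + 1 + 4 + 2 = 8)
struct stActivityInCmd{
    BYTE cmd;
    BYTE para;
    DWORD ActId;
    WORD size;
    char data[0];   // zero-length array: the payload starts right after the header
};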
// The inner message lives in data, i.e. at the address of the stActivityInCmd struct + 8 bytes, and it also starts with a Cmd::t_NullCmd header
(gdb) p *(const Cmd::t_NullCmd *) (0x7f83877fdc30+8)
$11 = {{{byCmd = 51 '3', byParam = 12 '\f'}, {cmd = 51 '3', para = 12 '\f'}}}
// So this is message 51 with sub-message number 12; the source shows that this is 'Cmd::Activity::stOpMount'
// Show the full struct contents
(gdb) p *(Cmd::Activity::stOpMount*)0x7f83877fdc38
$10 = {<Cmd::t_NullCmd> = {{{byCmd = 51 '3', byParam = 12 '\f'}, {cmd = 51 '3', para = 12 '\f'}}}, dwReqFunction = 10, 
  szName = "领工资\000sigin_pay_map_.size:[%d]\000", byOpType = 1 '\001', dwIndex = 0, dwUserId = 22684283, wAddType = 32, 
  dwTimeStart = 1527740441, dwTimeEnd = 1528345241, bNeedDelMount = false, wDelType = 0, bExtension = true}
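
The +8 used above can be double-checked in code. The sketch below assumes BYTE, WORD and DWORD are 1, 2 and 4 bytes wide and that the message structs are packed to 1 byte, which is what an 8-byte header implies. PackedInCmd, PaddedInCmd and payload_of are hypothetical names, and the zero-length data member is left out so that the example stays standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using BYTE = std::uint8_t;
using WORD = std::uint16_t;
using DWORD = std::uint32_t;

// Header of the wrapper message with 1-byte packing (assumed).
#pragma pack(push, 1)
struct PackedInCmd {
    BYTE cmd;
    BYTE para;
    DWORD ActId;
    WORD size;
};
#pragma pack(pop)

// The same fields with default alignment, for comparison only.
struct PaddedInCmd {
    BYTE cmd;
    BYTE para;
    DWORD ActId;
    WORD size;
};

static_assert(sizeof(PackedInCmd) == 8, "packed header is 1 + 1 + 4 + 2 bytes");
static_assert(offsetof(PackedInCmd, ActId) == 2, "no padding before ActId when packed");

// Returns the start of the inner message: header address + 8.
const unsigned char* payload_of(const unsigned char* msg) {
    assert(msg != nullptr);
    return msg + sizeof(PackedInCmd);
}
```

With default alignment ActId would typically sit at offset 4 rather than 2, and the payload would no longer start at +8, so the offset worked out in gdb depends on the packing assumption.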

Tracing how Cmd::Activity::stOpMount is handled turns up a stack overwrite:

Cmd::Activity::stActivityInCmd cmd;
...
bcopy(rev->data, cmd.data, rev->size);
...

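The message struct is used without being initialized, and the payload is written straight into data. As the struct definition above shows, data is a zero-length array at the very end of the struct, so for a struct declared on the stack it points just past the end of the object. The copy therefore lands directly on the stack and overwrites whatever was stored there.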

III. Fixing the bug

The fix is simple: initialize the struct before using it.
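
The article does not show the corrected code. As one possible shape for a safe version, and not necessarily the fix that was applied, the sketch below builds the header and the payload in a single buffer that is large enough for both, and rejects payloads that do not fit the 16-bit size field. InCmdHeader and build_in_cmd are hypothetical names, and 1-byte packing is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Header of the wrapper message, assuming 1-byte packing.
#pragma pack(push, 1)
struct InCmdHeader {
    std::uint8_t cmd;
    std::uint8_t para;
    std::uint32_t ActId;
    std::uint16_t size;
};
#pragma pack(pop)

// Builds header + payload in one contiguous buffer, so the payload has
// real storage behind it. Returns an empty vector if the payload does
// not fit into the 16-bit size field.
std::vector<unsigned char> build_in_cmd(std::uint32_t act_id,
                                        const void* payload,
                                        std::size_t payload_len) {
    assert(payload != nullptr || payload_len == 0);
    if (payload_len > UINT16_MAX) {
        return {};
    }
    const InCmdHeader h{51, 0, act_id, static_cast<std::uint16_t>(payload_len)};
    std::vector<unsigned char> buf(sizeof h + payload_len);
    std::memcpy(buf.data(), &h, sizeof h);
    if (payload_len != 0) {
        std::memcpy(buf.data() + sizeof h, payload, payload_len);
    }
    return buf;
}
```

The point of the design is that the copy length is validated first and the destination is sized from it, instead of copying rev->size bytes into a struct that has no room for them.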

IV. Printing logs from gdb

$ gdb -p 28644
// Load the symbols
(gdb) symbol-file IMActivityServer.symbol 
Reading symbols from /home/ztgame/IMTESTVERSION/release/IMActivityServer.symbol...done.
// Turn on logging
(gdb) set logging on
Future logs will be written to gdb.txt.
Copying output to gdb.txt.
// Set a breakpoint
(gdb) break ActivityInfoManager.cpp:1290
Breakpoint 2 at 0x6b70f0: file ActivityInfoManager.cpp, line 1290.
// Import the Python library
(gdb) python import datetime
// Attach a command list to the breakpoint (the argument to commands is the breakpoint number)
(gdb) commands 2
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
// silent: do not print the usual breakpoint message when the breakpoint is hit
>silent
>python gdb.execute("set $now=\"" + datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S') + "\"")
>printf "%s cmd(%u) param(%u) len(%u) srvtype(%u) srvid(%u)\n",$now,ptNullCmd->cmd, ptNullCmd->para,cmdLen,((Cmd::Activity::stServerParam*)server_param)->type,((Cmd::Activity::stServerParam*)server_param)->serverid
>continue
// The command list must be terminated with end
>end
(gdb) c

Open gdb.txt:

2018-05-31 19:45:45 cmd(17) param(1) len(38) srvtype(43) srvid(4301)
2018-05-31 19:45:45 cmd(17) param(1) len(49) srvtype(43) srvid(4301)
2018-05-31 19:45:45 cmd(17) param(1) len(49) srvtype(43) srvid(4301)
2018-05-31 19:45:45 cmd(17) param(1) len(38) srvtype(43) srvid(4301)
2018-05-31 19:45:45 cmd(17) param(1) len(38) srvtype(43) srvid(4301)
2018-05-31 19:45:46 cmd(22) param(14) len(54) srvtype(150) srvid(15000)


Reposted from blog.csdn.net/mergerly/article/details/80523750