Notes on debugging a ceph bug with gdb

Environment: ceph 10.2.3 on 32-bit armv7; ceph was built in a yocto environment.

Problem: while testing ceph on an ARM development board, the mon process crashed whenever the mds process was started.

ceph is compiled with -g by default, so the resulting binaries can be debugged with gdb directly.

Next, debug the mon process with gdb. ceph-mon forks into several processes, so gdb has to be told to follow the child process across fork.

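follow-fork-mode   detach-on-fork   Description

parent             on               Debug only the parent process (GDB default)
child              on               Debug only the child process
parent             off              Debug both processes; gdb follows the parent, the child stays blocked at the fork
child              off              Debug both processes; gdb follows the child, the parent stays blocked at the fork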

1. Start gdb

root@node32:~# gdb
GNU gdb (GDB) 7.12.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-ap-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".

2. Load the ceph-mon binary

(gdb) file ceph-mon
Reading symbols from ceph-mon...done.

3. Set the program arguments

(gdb) set args -i node32
(gdb) 
(gdb) 
(gdb) 
(gdb) show args
Argument list to give program being debugged when it is started is "-i node32".

4. Enable multi-process debugging; gdb will keep the parent process blocked

(gdb) show follow-fork-mode
Debugger response to a program call of fork or vfork is "parent".
(gdb) show detach-on-fork
Whether gdb will detach the child of a fork is on.
(gdb) set follow-fork-mode child
(gdb) set detach-on-fork off
(gdb) 

5. Run the program

(gdb) run
Starting program: /usr/bin/ceph-mon -i node32
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
[New Thread 0xb68d5ce0 (LWP 2036)]
[Thread 0xb68d5ce0 (LWP 2036) exited]
[New process 2037]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
[New Thread 0xb68d5ce0 (LWP 2038)]
Reading symbols from /usr/lib/libtcmalloc.so.4...done.
Reading symbols from /usr/lib/libbz2.so.1...done.
Reading symbols from /lib/libz.so.1...done.
Reading symbols from /usr/lib/libleveldb.so.1...done.
Reading symbols from /usr/lib/libsnappy.so.1...done.
Reading symbols from /usr/lib/libnss3.so...done.
Reading symbols from /usr/lib/libnspr4.so...done.
Reading symbols from /lib/libpthread.so.0...done.
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /usr/lib/libboost_thread.so.1.63.0...done.
Reading symbols from /usr/lib/libboost_random.so.1.63.0...done.
Reading symbols from /lib/librt.so.1...done.
Reading symbols from /usr/lib/libboost_iostreams.so.1.63.0...done.
Reading symbols from /usr/lib/libboost_system.so.1.63.0...done.
Reading symbols from /usr/lib/libstdc++.so.6...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /lib/libgcc_s.so.1...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux-armhf.so.3...done.
Reading symbols from /usr/lib/libunwind.so.8...done.
Reading symbols from /usr/lib/libnssutil3.so...done.
Reading symbols from /usr/lib/libplc4.so...done.
Reading symbols from /usr/lib/libplds4.so...done.
[New Thread 0xb48c5ce0 (LWP 2039)]
[New Thread 0xb40c5ce0 (LWP 2040)]
[New Thread 0xb38c5ce0 (LWP 2041)]
[New Thread 0xb30c5ce0 (LWP 2042)]
[New Thread 0xb28c5ce0 (LWP 2043)]
[New Thread 0xb20c5ce0 (LWP 2044)]
[New Thread 0xb18c5ce0 (LWP 2045)]
[New Thread 0xb10c5ce0 (LWP 2046)]
[New process 2047]

6. List the processes (inferiors) that gdb is tracking

(gdb) info inferiors 
  Num  Description       Executable        
  1    process 2033      /usr/bin/ceph-mon 
  2    process 2037      /usr/bin/ceph-mon 
* 3    <null>            /bin/bash.bash    
(gdb) 

The output shows that inferior 3 is currently selected. It has no process ID, which means it has already exited.

7. Inferiors 1 and 2 are both blocked at this point. Switch to inferior 1 and continue.

(gdb) inferior 1
[Switching to inferior 1 [process 2033] (/usr/bin/ceph-mon)]
[Switching to thread 1.1 (Thread 0xb6ff1010 (LWP 2033))]
#0  0xb6a15648 in __libc_fork () at /usr/src/debug/glibc/2.25-r0/git/sysdeps/nptl/fork.c:139
warning: Source file is more recent than executable.
139	  pid = ARCH_FORK ();
(gdb) where
#0  0xb6a15648 in __libc_fork () at /usr/src/debug/glibc/2.25-r0/git/sysdeps/nptl/fork.c:139
#1  0x7f6eba7c in Preforker::prefork (this=0xbeffeb70, err=...) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/common/Preforker.h:52
#2  0x7f692058 in main (argc=<optimized out>, argv=0x0) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/ceph_mon.cc:500
(gdb) c
Continuing.
[Inferior 1 (process 2033) exited normally]
(gdb) info inferiors 
  Num  Description       Executable        
* 1    <null>            /usr/bin/ceph-mon 
  2    process 2037      /usr/bin/ceph-mon 
(gdb) 

The output shows that inferior 1 has exited as well, which leaves inferior 2 as the process to debug.

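8. Switch to inferior 2 with the inferior 2 command and continue with c. Process 2 then blocks, waiting for a client to send it a message.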

9. Start the mds process on another development board

10. mon receives the message and hits a segmentation fault

[New Thread 0xb08c5ce0 (LWP 2087)]
[New Thread 0xb05c5ce0 (LWP 2088)]

Thread 2.8 "ms_dispatch" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb20c5ce0 (LWP 2044)]
0xb6befe1c in std::local_Rb_tree_decrement (__x=0x7fc14b24 <_ZStL19piecewise_construct>)
    at ../../../../../../../../../../work-shared/gcc-5.4.0-r0/gcc-5.4.0/libstdc++-v3/src/c++98/tree.cc:98
98	../../../../../../../../../../work-shared/gcc-5.4.0-r0/gcc-5.4.0/libstdc++-v3/src/c++98/tree.cc: No such file or directory.
(gdb) 
Continuing.

Thread 2.8 "ms_dispatch" received signal SIGSEGV, Segmentation fault.
raise (sig=sig@entry=11) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51	/usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt 
#0  raise (sig=sig@entry=11) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
#1  0x7f970930 in reraise_fatal (signum=11) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/global/signal_handler.cc:71
#2  handle_fatal_signal (signum=11) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/global/signal_handler.cc:133
#3  <signal handler called>
#4  0xb6befe1c in std::local_Rb_tree_decrement (__x=0x7fc14b24 <_ZStL19piecewise_construct>)
    at ../../../../../../../../../../work-shared/gcc-5.4.0-r0/gcc-5.4.0/libstdc++-v3/src/c++98/tree.cc:98
#5  0x7f7e585c in std::_Rb_tree_iterator<std::pair<mds_gid_t const, unsigned int> >::operator-- (this=<synthetic pointer>) at /usr/include/c++/5.4.0/bits/stl_tree.h:220
#6  std::_Rb_tree<mds_gid_t, std::pair<mds_gid_t const, unsigned int>, std::_Select1st<std::pair<mds_gid_t const, unsigned int> >, std::less<mds_gid_t>, std::allocator<std::pair<mds_gid_t const, unsigned int> > >::_M_get_insert_hint_unique_pos (__k=..., __position=..., this=0x855265dc) at /usr/include/c++/5.4.0/bits/stl_tree.h:1924
#7  std::_Rb_tree<mds_gid_t, std::pair<mds_gid_t const, unsigned int>, std::_Select1st<std::pair<mds_gid_t const, unsigned int> >, std::less<mds_gid_t>, std::allocator<std::pair<mds_gid_t const, unsigned int> > >::_M_emplace_hint_unique<std::piecewise_construct_t const&, std::tuple<mds_gid_t const&>, std::tuple<> >(std::_Rb_tree_const_iterator<std::pair<mds_gid_t const, unsigned int> >, std::piecewise_construct_t const&, std::tuple<mds_gid_t const&>&&, std::tuple<>&&) (this=this@entry=0x855265dc, __pos=...) at /usr/include/c++/5.4.0/bits/stl_tree.h:2174
#8  0x7f9b538c in std::map<mds_gid_t, unsigned int, std::less<mds_gid_t>, std::allocator<std::pair<mds_gid_t const, unsigned int> > >::operator[] (__k=..., this=0x855265dc)
    at /usr/include/c++/5.4.0/bits/stl_map.h:483
#9  FSMap::insert (this=this@entry=0x85526518, new_info=...) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/mds/FSMap.cc:794
#10 0x7f7d4c94 in MDSMonitor::prepare_beacon (this=this@entry=0x85526340, op=...) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/mon/MDSMonitor.cc:549
#11 0x7f7da428 in MDSMonitor::prepare_update (this=this@entry=0x85526340, op=...) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/mon/MDSMonitor.cc:469
#12 0x7f75bd20 in PaxosService::dispatch (this=this@entry=0x85526340, op=...) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/mon/PaxosService.cc:96
#13 0x7f72021c in Monitor::dispatch_op (this=this@entry=0x855bc000, op=...) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/mon/Monitor.cc:3605
#14 0x7f721078 in Monitor::_ms_dispatch (this=this@entry=0x855bc000, m=m@entry=0x855ff980) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/mon/Monitor.cc:3532
#15 0x7f743414 in Monitor::ms_dispatch (this=0x855bc000, m=0x855ff980) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/mon/Monitor.h:905
#16 0x7fb769b4 in Messenger::ms_deliver_dispatch (m=0x855ff980, this=0x855b0b00) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/msg/Messenger.h:584
#17 DispatchQueue::entry (this=0x855b0c80) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/msg/simple/DispatchQueue.cc:185
#18 0x7fa71d24 in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/msg/simple/DispatchQueue.h:103
#19 0xb6d55f28 in start_thread (arg=0xb20c5ce0) at /usr/src/debug/glibc/2.25-r0/git/nptl/pthread_create.c:458
#20 0xb6a49968 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:76 from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

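The backtrace shows where the crash happened: line 98 of work-shared/gcc-5.4.0-r0/gcc-5.4.0/libstdc++-v3/src/c++98/tree.cc raised the segmentation fault, reached through std::map::operator[] called from FSMap::insert (frames 4 to 9).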

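Further debugging showed that the cause was an incompatibility between ceph and the gcc 5.4.0 compiler. Switching the compiler to 6.3.0 fixed this crash, but ceph built with 6.3.0 crashed during pg scrub. After the yocto toolchain was later switched to 4.9.2, neither the pg scrub problem nor the mds crash occurred again.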
                                                                                                                                       

 


Reposted from my.oschina.net/u/2326998/blog/1619351