Remember a ceph bug gdb debugging

Environment: ceph10.2.3 armv7 32-bit, ceph compilation environment is yocto

Problem description: When testing ceph on arm development, when the mds process is started, the mon process will hang.

When ceph is compiled, there will be -g by default, and it can be directly debugged with gdb when it is compiled.

Next, use gdb to debug the mon process, ceph-mon is multi-process, and the sub-thread debugging mode should be turned on when gdb debugs.

  follow-fork-mode  detach-on-fork   说明

parent on debugs only the main process (GDB default)
child on debugs only
the
child process block in fork position

1. Start gdb

root@node32:~# gdb
GNU gdb (GDB) 7.12.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-ap-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".

2. Read the ceph-mon file

(gdb) file ceph-mon
Reading symbols from ceph-mon...done.

3. Set operating parameters

(gdb) set args -i node32
(gdb) 
(gdb) 
(gdb) 
(gdb) show args
Argument list to give program being debugged when it is started is "-i node32".

.4. Turn on multi-process debugging mode, gdb will block the main process

(gdb) show follow-fork-mode
Debugger response to a program call of fork or vfork is "parent".
(gdb) show detach-on-fork
Whether gdb will detach the child of a fork is on.
(gdb) set follow-fork-mode child
(gdb) set detach-on-fork off
(gdb) 

5. Run the program

(gdb) run
Starting program: /usr/bin/ceph-mon -i node32
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
[New Thread 0xb68d5ce0 (LWP 2036)]
[Thread 0xb68d5ce0 (LWP 2036) exited]
[New process 2037]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
[New Thread 0xb68d5ce0 (LWP 2038)]
Reading symbols from /usr/lib/libtcmalloc.so.4...done.
Reading symbols from /usr/lib/libbz2.so.1...done.
Reading symbols from /lib/libz.so.1...done.
Reading symbols from /usr/lib/libleveldb.so.1...done.
Reading symbols from /usr/lib/libsnappy.so.1...done.
Reading symbols from /usr/lib/libnss3.so...done.
Reading symbols from /usr/lib/libnspr4.so...done.
Reading symbols from /lib/libpthread.so.0...done.
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /usr/lib/libboost_thread.so.1.63.0...done.
Reading symbols from /usr/lib/libboost_random.so.1.63.0...done.
Reading symbols from /lib/librt.so.1...done.
Reading symbols from /usr/lib/libboost_iostreams.so.1.63.0...done.
Reading symbols from /usr/lib/libboost_system.so.1.63.0...done.
Reading symbols from /usr/lib/libstdc++.so.6...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /lib/libgcc_s.so.1...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux-armhf.so.3...done.
Reading symbols from /usr/lib/libunwind.so.8...done.
Reading symbols from /usr/lib/libnssutil3.so...done.
Reading symbols from /usr/lib/libplc4.so...done.
Reading symbols from /usr/lib/libplds4.so...done.
[New Thread 0xb48c5ce0 (LWP 2039)]
[New Thread 0xb40c5ce0 (LWP 2040)]
[New Thread 0xb38c5ce0 (LWP 2041)]
[New Thread 0xb30c5ce0 (LWP 2042)]
[New Thread 0xb28c5ce0 (LWP 2043)]
[New Thread 0xb20c5ce0 (LWP 2044)]
[New Thread 0xb18c5ce0 (LWP 2045)]
[New Thread 0xb10c5ce0 (LWP 2046)]
[New process 2047]

6. View running threads

(gdb) info inferiors 
  Num  Description       Executable        
  1    process 2033      /usr/bin/ceph-mon 
  2    process 2037      /usr/bin/ceph-mon 
* 3    <null>            /bin/bash.bash    
(gdb) 

It can be seen that it is now in process No. 3, and now process No. 3 has no process number to indicate that it has exited.

7. Currently process No. 1 and No. 2 are in the blocking state, switch to process No. 1, continue

(gdb) inferior 1
[Switching to inferior 1 [process 2033] (/usr/bin/ceph-mon)]
[Switching to thread 1.1 (Thread 0xb6ff1010 (LWP 2033))]
#0  0xb6a15648 in __libc_fork () at /usr/src/debug/glibc/2.25-r0/git/sysdeps/nptl/fork.c:139
warning: Source file is more recent than executable.
139	  pid = ARCH_FORK ();
(gdb) where
#0  0xb6a15648 in __libc_fork () at /usr/src/debug/glibc/2.25-r0/git/sysdeps/nptl/fork.c:139
#1  0x7f6eba7c in Preforker::prefork (this=0xbeffeb70, err=...) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/common/Preforker.h:52
#2  0x7f692058 in main (argc=<optimized out>, argv=0x0) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/ceph_mon.cc:500
(gdb) c
Continuing.
[Inferior 1 (process 2033) exited normally]
(gdb) info inferiors 
  Num  Description       Executable        
* 1    <null>            /usr/bin/ceph-mon 
  2    process 2037      /usr/bin/ceph-mon 
(gdb) 

It can be seen that process No. 1 also exits. switch to process 2

8. Switch to process No. 2 and continue, process 2 blocks, waiting for the client to send a message

9. Start the mds process on another development board

10. mon receives the message and segfaults

[New Thread 0xb08c5ce0 (LWP 2087)]
[New Thread 0xb05c5ce0 (LWP 2088)]

Thread 2.8 "ms_dispatch" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb20c5ce0 (LWP 2044)]
0xb6befe1c in std::local_Rb_tree_decrement (__x=0x7fc14b24 <_ZStL19piecewise_construct>)
    at ../../../../../../../../../../work-shared/gcc-5.4.0-r0/gcc-5.4.0/libstdc++-v3/src/c++98/tree.cc:98
98	../../../../../../../../../../work-shared/gcc-5.4.0-r0/gcc-5.4.0/libstdc++-v3/src/c++98/tree.cc: No such file or directory.
(gdb) 
Continuing.

Thread 2.8 "ms_dispatch" received signal SIGSEGV, Segmentation fault.
raise (sig=sig@entry=11) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51	/usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt 
#0  raise (sig=sig@entry=11) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
#1  0x7f970930 in reraise_fatal (signum=11) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/global/signal_handler.cc:71
#2  handle_fatal_signal (signum=11) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/global/signal_handler.cc:133
#3  <signal handler called>
#4  0xb6befe1c in std::local_Rb_tree_decrement (__x=0x7fc14b24 <_ZStL19piecewise_construct>)
    at ../../../../../../../../../../work-shared/gcc-5.4.0-r0/gcc-5.4.0/libstdc++-v3/src/c++98/tree.cc:98
#5  0x7f7e585c in std::_Rb_tree_iterator<std::pair<mds_gid_t const, unsigned int> >::operator-- (this=<synthetic pointer>) at /usr/include/c++/5.4.0/bits/stl_tree.h:220
#6  std::_Rb_tree<mds_gid_t, std::pair<mds_gid_t const, unsigned int>, std::_Select1st<std::pair<mds_gid_t const, unsigned int> >, std::less<mds_gid_t>, std::allocator<std::pair<mds_gid_t const, unsigned int> > >::_M_get_insert_hint_unique_pos (__k=..., __position=..., this=0x855265dc) at /usr/include/c++/5.4.0/bits/stl_tree.h:1924
#7  std::_Rb_tree<mds_gid_t, std::pair<mds_gid_t const, unsigned int>, std::_Select1st<std::pair<mds_gid_t const, unsigned int> >, std::less<mds_gid_t>, std::allocator<std::pair<mds_gid_t const, unsigned int> > >::_M_emplace_hint_unique<std::piecewise_construct_t const&, std::tuple<mds_gid_t const&>, std::tuple<> >(std::_Rb_tree_const_iterator<std::pair<mds_gid_t const, unsigned int> >, std::piecewise_construct_t const&, std::tuple<mds_gid_t const&>&&, std::tuple<>&&) (this=this@entry=0x855265dc, __pos=...) at /usr/include/c++/5.4.0/bits/stl_tree.h:2174
#8  0x7f9b538c in std::map<mds_gid_t, unsigned int, std::less<mds_gid_t>, std::allocator<std::pair<mds_gid_t const, unsigned int> > >::operator[] (__k=..., this=0x855265dc)
    at /usr/include/c++/5.4.0/bits/stl_map.h:483
#9  FSMap::insert (this=this@entry=0x85526518, new_info=...) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/mds/FSMap.cc:794
#10 0x7f7d4c94 in MDSMonitor::prepare_beacon (this=this@entry=0x85526340, op=...) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/mon/MDSMonitor.cc:549
#11 0x7f7da428 in MDSMonitor::prepare_update (this=this@entry=0x85526340, op=...) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/mon/MDSMonitor.cc:469
#12 0x7f75bd20 in PaxosService::dispatch (this=this@entry=0x85526340, op=...) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/mon/PaxosService.cc:96
#13 0x7f72021c in Monitor::dispatch_op (this=this@entry=0x855bc000, op=...) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/mon/Monitor.cc:3605
#14 0x7f721078 in Monitor::_ms_dispatch (this=this@entry=0x855bc000, m=m@entry=0x855ff980) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/mon/Monitor.cc:3532
#15 0x7f743414 in Monitor::ms_dispatch (this=0x855bc000, m=0x855ff980) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/mon/Monitor.h:905
#16 0x7fb769b4 in Messenger::ms_deliver_dispatch (m=0x855ff980, this=0x855b0b00) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/msg/Messenger.h:584
#17 DispatchQueue::entry (this=0x855b0c80) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/msg/simple/DispatchQueue.cc:185
#18 0x7fa71d24 in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-src/10.2.3-r0/git/src/msg/simple/DispatchQueue.h:103
#19 0xb6d55f28 in start_thread (arg=0xb20c5ce0) at /usr/src/debug/glibc/2.25-r0/git/nptl/pthread_create.c:458
#20 0xb6a49968 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:76 from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

The cause of the error can be seen in the work-shared/gcc-5.4.0-r0/gcc-5.4.0/libstdc++-v3/src/c++98/tree.cc file, which caused a segmentation fault on line 98.

After debugging, it was found that because the ceph and gcc version 5.4.0 compilers did not match, the problem was solved by replacing the compiler with 6.3.0. When the compiler 6.3.0 compiled ceph, the pg scrub was crash, and the yocto toolchain was replaced with 4.9.2. No more pg scrub issues and mds crash issues
                                                                                                                                       

 

 

 

 

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=324932724&siteId=291194637