A core file gives us first-hand information about the state of mysqld at the moment it crashed, which is very valuable for analyzing the cause of the crash. In production, however, a database often uses tens or even hundreds of GB of memory, and because the core file contains the entire memory image of mysqld, it can be very large. Enabling core dumps in production therefore means accounting for disk space and for a longer restart time: when mysqld crashes, mysqld_safe restarts it, but the restart has to wait until the core dump has finished, and writing hundreds of GB of memory to disk takes a long time (in testing, dumping a 700 MB core file took a few seconds). Even so, a core file offers one more way to analyze a problem, so it is worth exploring.
Environment
Modify the MySQL configuration
In the my.cnf configuration file, enable core-file:
    [mysqld]
    core-file
Note that the option must be written as a bare core-file. Writing core-file = ON or core_file = ON prevents mysqld from starting.
Then log in to mysql and check the core_file variable to confirm that the setting has taken effect:
    mysql> show global variables like 'core_file';
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | core_file     | ON    |
    +---------------+-------+
    1 row in set (0.01 sec)
Check the system core file size limit for the mysqld process
    [root@stg-p2pbusiness-mysql-01 ~]# cat /proc/`pidof mysqld`/limits
    Limit                     Soft Limit           Hard Limit           Units
    Max cpu time              unlimited            unlimited            seconds
    Max file size             unlimited            unlimited            bytes
    Max data size             unlimited            unlimited            bytes
    Max stack size            10485760             unlimited            bytes
    Max core file size        unlimited            unlimited            bytes
    Max resident set          unlimited            unlimited            bytes
    Max processes             31405                31405                processes
    Max open files            10000                10000                files
    Max locked memory         65536                65536                bytes
    Max address space         unlimited            unlimited            bytes
    Max file locks            unlimited            unlimited            locks
    Max pending signals       31405                31405                signals
    Max msgqueue size         819200               819200               bytes
    Max nice priority         0                    0
    Max realtime priority     0                    0
    Max realtime timeout      unlimited            unlimited            us
With core-file enabled, we found that Max core file size is unlimited, and that setting core-file-size in the [mysqld_safe] section does not restrict the size of the core file.
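To check just this one value instead of reading the whole table, something like the following could be used. It is a small sketch that assumes the usual `/proc/<pid>/limits` column layout (limit name, soft limit, hard limit, units); the function name `core_soft_limit` is made up for illustration.

```shell
# Print the soft "Max core file size" limit from a /proc/<pid>/limits style file.
# Assumes the limit name is four words, so the soft limit is the fifth field.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "$1"
}

# Example (needs a running mysqld):
#   core_soft_limit "/proc/$(pidof mysqld)/limits"
```

A result of `0` would mean the process cannot write a core file at all, whatever the MySQL configuration says.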
We usually start mysqld through mysqld_safe, and mysqld changes its user/group at startup. In that case suid_dumpable must be set to 1 so that the system generates a core dump for the mysqld process (this takes effect after mysqld is restarted).
    echo 1 > /proc/sys/fs/suid_dumpable
On a volume with plenty of disk space, create a directory to hold the core files and set its permissions to 777 to prevent write failures, then change the system parameter core_pattern to point to the new directory. core_uses_pid defaults to 1, so the core file is named core.pid.
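The original commands for this step are not shown, so the following is only a sketch of what they might look like. The directory `/data/corefiles` and the helper names `core_pattern_for` and `setup_core_dir` are assumptions for illustration; `setup_core_dir` has to be run as root.

```shell
# Hypothetical location; pick a volume with enough free space for a full dump.
COREDIR=${COREDIR:-/data/corefiles}

# Build the core_pattern value for a directory. With core_uses_pid=1 and
# no %p in the pattern, the kernel appends ".<pid>" to the file name.
core_pattern_for() {
    printf '%s/core\n' "${1%/}"
}

# Create the directory, open up its permissions, and point core_pattern at it.
setup_core_dir() {
    mkdir -p "$COREDIR" &&
    chmod 777 "$COREDIR" &&
    core_pattern_for "$COREDIR" > /proc/sys/kernel/core_pattern &&
    cat /proc/sys/kernel/core_uses_pid   # expected to print 1
}
```

Writing to `/proc/sys/kernel/core_pattern` does not survive a reboot. How the crash in the next step was triggered is not stated in the original; on a test machine, sending the signal by hand with `kill -SEGV "$(pidof mysqld)"` is one way to provoke a dump.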
After the crash we can see Segmentation fault (core dumped), and mysqld_safe then restarts mysqld. Looking at the core file, it is a little over 700 MB:
    [root@stg-p2pbusiness-mysql-01 corefiles]# ls -lh
    total 704M
    -rw------- 1 mysql mysql 7.0G Jul 22 17:29 core.14609
    [root@stg-p2pbusiness-mysql-01 corefiles]# du -sh *
    704M core.14609
This file is a sparse file with many holes, so ls and du report different sizes: du shows the disk space actually used, while ls shows the logical file size. Checking the memory actually used by mysqld with top, the resident size (RES) of about 700 MB is close to what du reports, and the virtual size (VIRT) of about 7 GB is close to what ls reports.
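The ls/du difference is easy to reproduce on a scratch file. This is a minimal sketch assuming GNU coreutils (`truncate`, `stat -c`) and a filesystem that supports holes; `sparse_demo` is a hypothetical name.

```shell
# Create a 100 MiB file that is one big hole, then compare the two sizes.
sparse_demo() {
    f=$(mktemp) || return 1
    truncate -s 100M "$f"        # sets the logical size without writing data
    apparent=$(stat -c %s "$f")  # logical size in bytes, what ls -l shows
    # allocated blocks times block unit = bytes on disk, what du is based on
    allocated=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))
    rm -f "$f"
    echo "$apparent $allocated"
}

# Example:
#   sparse_demo    # prints "<logical bytes> <allocated bytes>"
```

The first number is the full logical size, while the second stays far smaller because no data blocks were ever written.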
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    14691 mysql     20   0 7797m 714m  10m S  0.0  8.9   0:01.36 mysqld
Now take a closer look at the core file
    [root@stg-p2pbusiness-mysql-01 corefiles]# file core.14609
    core.14609: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/mysqld --basedir=/usr --datadir=/dbfiles/mysql_home/data --plugin-dir', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/mysqld', platform: 'x86_64'
    (gdb) info thread
      40 Thread 0x7fb234894700 (LWP 14622) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      39 Thread 0x7fb23a8d9700 (LWP 14652) 0x00007fb3db997585 in sigwait () from /lib64/libpthread.so.0
      38 Thread 0x7fb2121fc700 (LWP 14653) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      37 Thread 0x7fb233e93700 (LWP 14623) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      36 Thread 0x7fb2401b2700 (LWP 14641) 0x00007fb3db993a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      35 Thread 0x7fb22d088700 (LWP 14634) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      34 Thread 0x7fb23028d700 (LWP 14629) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      33 Thread 0x7fb232a91700 (LWP 14625) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      32 Thread 0x7fb22840b700 (LWP 14656) 0x00007fb3db993a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      31 Thread 0x7fb232090700 (LWP 14626) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      30 Thread 0x7fb23bbab700 (LWP 14648) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      29 Thread 0x7fb22f88c700 (LWP 14630) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      28 Thread 0x7fb236697700 (LWP 14619) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      27 Thread 0x7fb23c5ac700 (LWP 14647) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      26 Thread 0x7fb23d9ae700 (LWP 14645) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      25 Thread 0x7fb230c8e700 (LWP 14628) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      24 Thread 0x7fb212bfd700 (LWP 14651) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      23 Thread 0x7fb2135fe700 (LWP 14650) 0x00007fb3db993a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      22 Thread 0x7fb213fff700 (LWP 14649) 0x00007fb3db993a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      21 Thread 0x7fb203fff700 (LWP 14663) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      20 Thread 0x7fb23edb0700 (LWP 14643) 0x00007fb3db99700d in nanosleep () from /lib64/libpthread.so.0
      19 Thread 0x7fb240bb3700 (LWP 14640) 0x00007fb3db993a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      18 Thread 0x7fb22a884700 (LWP 14638) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      17 Thread 0x7fb22da89700 (LWP 14633) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      16 Thread 0x7fb23e3af700 (LWP 14644) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      15 Thread 0x7fb23cfad700 (LWP 14646) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      14 Thread 0x7fb22e48a700 (LWP 14632) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      13 Thread 0x7fb22c687700 (LWP 14635) 0x00007fb3db993a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      12 Thread 0x7fb22ee8b700 (LWP 14631) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      11 Thread 0x7fb23168f700 (LWP 14627) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      10 Thread 0x7fb235295700 (LWP 14621) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      9 Thread 0x7fb22b285700 (LWP 14637) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      8 Thread 0x7fb233492700 (LWP 14624) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      7 Thread 0x7fb23f7b1700 (LWP 14642) 0x00007fb3db993a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      6 Thread 0x7fb235c96700 (LWP 14620) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      5 Thread 0x7fb237098700 (LWP 14618) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      4 Thread 0x7fb237a99700 (LWP 14617) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
      3 Thread 0x7fb22bc86700 (LWP 14636) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    ---Type <return> to continue, or q <return> to quit---
    * 2 Thread 0x7fb3d25f1700 (LWP 14610) 0x00007fb3da4386c7 in sigwaitinfo () from /lib64/libc.so.6
      1 Thread 0x7fb3dbdb7720 (LWP 14609) 0x00007fb3db99497c in pthread_kill () from /lib64/libpthread.so.0
We can switch to a particular thread with thread n, and then use the bt command to view that thread's stack:
    (gdb) thread 1
    [Switching to thread 1 (Thread 0x7fb3dbdb7720 (LWP 14609))]#0 0x00007fb3db99497c in pthread_kill () from /lib64/libpthread.so.0
    (gdb) bt
    #0 0x00007fb3db99497c in pthread_kill () from /lib64/libpthread.so.0
    #1 0x00000000007d26d4 in handle_fatal_signal (sig=11) at /export/home/pb2/build/sb_0-27500212-1520171533.24/rpm/BUILD/mysql-5.7.22/mysql-5.7.22/sql/signal_handler.cc:220
    #2 <signal handler called>
    #3 0x00007fb3da4e4383 in poll () from /lib64/libc.so.6
    #4 0x0000000000dedda8 in Mysqld_socket_listener::listen_for_connection_event (this=0x3614ad0) at /export/home/pb2/build/sb_0-27500212-1520171533.24/rpm/BUILD/mysql-5.7.22/mysql-5.7.22/sql/conn_handler/socket_connection.cc:852
    #5 0x00000000007cd0b9 in connection_event_loop (argc=53, argv=0x34e7338) at /export/home/pb2/build/sb_0-27500212-1520171533.24/rpm/BUILD/mysql-5.7.22/mysql-5.7.22/sql/conn_handler/connection_acceptor.h:66
    #6 mysqld_main (argc=53, argv=0x34e7338) at /export/home/pb2/build/sb_0-27500212-1520171533.24/rpm/BUILD/mysql-5.7.22/mysql-5.7.22/sql/mysqld.cc:5132
    #7 0x00007fb3da423d1d in __libc_start_main () from /lib64/libc.so.6
    #8 0x00000000007c2699 in _start ()
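The backtraces of all threads can also be collected without an interactive session. The sketch below assumes gdb is installed and that the core was produced by the binary passed as the first argument; `dump_backtraces` is a hypothetical helper name, not part of the original setup.

```shell
# Print the backtrace of every thread in a core file, non-interactively.
# $1: path to the mysqld binary that produced the core, $2: path to the core file.
dump_backtraces() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "usage: dump_backtraces <mysqld-binary> <core-file>" >&2
        return 2
    fi
    # -batch exits after running the -ex commands;
    # "thread apply all bt" runs bt in each thread in turn.
    gdb -batch -ex "thread apply all bt" "$1" "$2"
}

# Example:
#   dump_backtraces /usr/sbin/mysqld core.14609 > backtraces.txt
```

Redirecting the output to a file makes it easier to search 40 stacks than paging through them at the gdb prompt.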
The mysqld error log records the same stack at the time of the crash:
    key_buffer_size=8388608
    read_buffer_size=131072
    max_used_connections=2
    max_threads=300
    thread_count=1
    connection_count=1
    It is possible that mysqld could use up to
    key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2508204 K bytes of memory
    Hope that's ok; if not, decrease some variables in the equation.

    Thread pointer: 0x0
    Attempting backtrace. You can use the following information to find out
    where mysqld died. If you see no messages after this, something went
    terribly wrong...
    stack_bottom = 0 thread_stack 0x40000
    /usr/sbin/mysqld(my_print_stacktrace+0x35)[0xf4fe15]
    /usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x7d2774]
    /lib64/libpthread.so.0(+0xf7e0)[0x7fb3db9977e0]
    /lib64/libc.so.6(__poll+0x53)[0x7fb3da4e4383]
    /usr/sbin/mysqld(_ZN22Mysqld_socket_listener27listen_for_connection_eventEv+0x38)[0xdedda8]
    /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x1819)[0x7cd0b9]
    /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fb3da423d1d]
    /usr/sbin/mysqld[0x7c2699]
    The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
    information that should help you find out what is causing the crash.
    Writing a core file
In some cases mysqld cannot resolve the function names in the stack, and only hexadecimal numbers are printed:
    mysqld got signal 11;
    Attempting backtrace. You can use the following information to find out
    where mysqld died. If you see no messages after this, something went
    terribly wrong...
    stack_bottom = 0x41fd0110 thread_stack 0x40000
    [0x9da402]
    [0x6648e9]
    [0x7f1a5af000f0]
    [0x7f1a5a10f0f2]
    [0x7412cb]
    [0x688354]
    [0x688494]
    [0x67a170]
    [0x67f0ad]
    [0x67fdf8]
    [0x6811b6]
    [0x66e05e]
In this case the resolve_stack_dump tool can help analyze the call stack. The official MySQL documentation describes it in detail (MySQL 5.7 manual, section 28.5.1.5).
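As a rough sketch of that workflow: resolve_stack_dump takes a symbol file (generated with `nm -n`) and a file of bare addresses, one per line. The helper below, whose name `extract_stack_addresses` and the file paths in the example are assumptions for illustration, pulls the bracketed addresses out of an error log excerpt.

```shell
# Turn "[0x9da402]"-style entries from the error log into bare addresses,
# one per line, as input for resolve_stack_dump.
extract_stack_addresses() {
    grep -oE '\[0x[0-9a-fA-F]+\]' "$1" | sed 's/^\[//; s/\]$//'
}

# Example (paths are placeholders):
#   nm -n /usr/sbin/mysqld > /tmp/mysqld.sym
#   extract_stack_addresses /tmp/crash.log > /tmp/mysqld.stack
#   resolve_stack_dump -s /tmp/mysqld.sym -n /tmp/mysqld.stack
```

The symbol file has to come from the same mysqld binary that crashed, otherwise the addresses resolve to the wrong functions.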