MySQL Core File Dump Analysis in Practice

A core file gives us a first-hand snapshot of mysqld at the moment it crashed, which makes it very valuable for analyzing why mysqld crashed. In a production environment, however, the database often uses tens or even hundreds of GB of memory, and because the core file contains all of mysqld's memory, it can be very large. Before enabling core file dumps in production you therefore have to consider disk space as well as restart time: when mysqld crashes, mysqld_safe restarts it, but the restart has to wait until the core dump is complete, and writing hundreds of GB of memory to disk takes correspondingly long (in our test, dumping a 700 MB core file took a few seconds). Still, a core file offers one more way to analyze a problem, so it is worth exploring.
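
As a rough planning aid, the restart delay can be put into numbers. The sketch below assumes the dump writes about as much data as mysqld's resident memory and that the disk sustains a fixed sequential write rate. The function name and the 200 MB/s figure are illustrative assumptions, not measurements.

```shell
# Rough estimate of how long a core dump delays the restart.
# Assumption: the dump writes roughly the resident memory (in MB)
# at a steady sequential write rate (in MB/s).
estimate_dump_seconds() {
    rss_mb=$1
    write_mb_per_s=$2
    # Integer division, rounded up.
    echo $(( (rss_mb + write_mb_per_s - 1) / write_mb_per_s ))
}

# Hypothetical numbers: 700 MB and 100 GB of resident memory at 200 MB/s.
estimate_dump_seconds 700 200       # prints 4
estimate_dump_seconds 102400 200    # prints 512
```

The same figure (resident memory) is also a reasonable lower bound for the free disk space the dump directory needs.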

Environment

Modify the MySQL configuration

In the my.cnf configuration file, enable core-file:

[mysqld]
core-file

Note that the option must be written as a bare core-file. Writing core-file = ON or core_file = ON prevents mysqld from starting.

Then log in to mysql and check the core_file variable to confirm that the setting has taken effect:

mysql> show global variables like 'core_file';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| core_file     | ON    |
+---------------+-------+
1 row in set (0.01 sec)

View the system's core file size limit for the mysqld process

[root@stg-p2pbusiness-mysql-01 ~]# cat /proc/`pidof mysqld`/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             31405                31405                processes
Max open files            10000                10000                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       31405                31405                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
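
To check just the core file limit rather than reading the whole table, the relevant row can be picked out with awk. This is a small sketch: the function name core_limit is made up for illustration, and it reads the limits text from standard input so it can also be tried on saved output.

```shell
# Print the soft and hard "Max core file size" limits from
# /proc/<pid>/limits text supplied on standard input.
core_limit() {
    # "Max core file size" is four words, so the soft limit is field 5
    # and the hard limit is field 6.
    awk '/^Max core file size/ { print "soft=" $5, "hard=" $6 }'
}

# Usage against a running server (assumes a single mysqld process):
# core_limit < /proc/`pidof mysqld`/limits
```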

With core-file enabled, we found that Max core file size is unlimited, and that the size of the core file cannot be restricted by setting core-file-size under [mysqld_safe]:

[mysqld_safe]
core-file-size=1024 # 1024 * 512 bytes
open-files-limit=10000

Modify system parameters

We usually start mysqld through mysqld_safe, and mysqld switches user/group at startup. In that case suid_dumpable must be set to 1 so that the system will produce a core dump for the mysqld process (mysqld must be restarted for the change to take effect).

echo 1 > /proc/sys/fs/suid_dumpable

On a partition with plenty of disk space, create a directory to hold core files and set its permissions to 777 so that writes do not fail, then point the system parameter core_pattern at the new directory. core_uses_pid defaults to 1, which means the core file is named core.pid, where pid is the process ID.

mkdir corefiles
chmod 777 corefiles
echo "/dbfiles/corefiles/core" > /proc/sys/kernel/core_pattern
echo "1" > /proc/sys/kernel/core_uses_pid
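
How these two settings combine into a file name can be sketched as a small function. It models only the case used here, as described in the Linux core(5) manual page: when core_uses_pid is nonzero and core_pattern contains no %p, the kernel appends .PID to the name. The function name is hypothetical, and patterns containing % specifiers are deliberately not handled.

```shell
# Predict the core file path for a pattern without % specifiers.
core_file_path() {
    pattern=$1      # contents of /proc/sys/kernel/core_pattern
    uses_pid=$2     # contents of /proc/sys/kernel/core_uses_pid
    pid=$3          # PID of the crashing process
    case $pattern in
        *%*) echo "pattern with % specifiers is not handled" >&2; return 1 ;;
    esac
    if [ "$uses_pid" -ne 0 ]; then
        echo "$pattern.$pid"
    else
        echo "$pattern"
    fi
}

core_file_path /dbfiles/corefiles/core 1 14609   # prints /dbfiles/corefiles/core.14609
```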

Use kill -11 `pidof mysqld` or kill -SIGSEGV `pidof mysqld` to simulate a segmentation fault and make mysqld crash.

[root@stg-p2pbusiness-mysql-01 corefiles]# kill -11 `pidof mysqld`
[root@stg-p2pbusiness-mysql-01 corefiles]# /usr/bin/mysqld_safe: line 198: 14609 Segmentation fault (core dumped) nohup /usr/sbin/mysqld --basedir=/usr --datadir=/dbfiles/mysql_home/data --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=stg-p2pbusiness-mysql-01.localhost.localdomain.err --open-files-limit=10000 --pid-file=/var/run/mysqld/mysqld.pid --socket=/dbfiles/mysql_home/data/mysql.sock --port=3306 < /dev/null > /dev/null 2>&1

The output shows Segmentation fault (core dumped), after which mysqld_safe restarts mysqld. Looking at the core file, it occupies a little over 700 MB on disk.

[root@stg-p2pbusiness-mysql-01 corefiles]# ls -lh
total 704M
-rw------- 1 mysql mysql 7.0G Jul 22 17:29 core.14609
[root@stg-p2pbusiness-mysql-01 corefiles]# du -sh *
704M core.14609

The core file is a sparse file containing many holes, which is why ls and du report different sizes: du shows the disk space actually allocated, while ls shows the logical file size. Checking mysqld's memory usage with top, the resident memory (RES, about 700 MB) is close to what du reports, while the virtual memory (VIRT, about 7 GB) is close to what ls reports.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14691 mysql     20   0 7797m 714m  10m S  0.0  8.9   0:01.36 mysqld
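
The gap between logical and allocated size is easy to reproduce without crashing anything. The following sketch creates a 100 MiB sparse file in a temporary location by seeking past the end without writing data, then compares the two sizes. The variable names are illustrative, and the allocated figure depends on the filesystem.

```shell
# Create a 100 MiB sparse file: seek past the end without writing any data.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1 count=0 seek=$((100 * 1024 * 1024)) 2>/dev/null

logical_bytes=$(( $(wc -c < "$f") ))               # logical size, as ls -l reports
allocated_kib=$(du -k "$f" | awk '{ print $1 }')   # allocated size, as du reports

echo "logical:   $logical_bytes bytes"
echo "allocated: $allocated_kib KiB"
rm -f "$f"
```

On a filesystem that supports sparse files, the allocated figure should be far below the 102400 KiB logical size.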

A closer look at the core file

[root@stg-p2pbusiness-mysql-01 corefiles]# file core.14609
core.14609: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/mysqld --basedir=/usr --datadir=/dbfiles/mysql_home/data --plugin-dir', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/mysqld', platform: 'x86_64'

[root@stg-p2pbusiness-mysql-01 corefiles]# stat core.14609
File: `core.14609'
Size: 7504252928 Blocks: 1440384 IO Block: 4096 regular file
Device: fd02h/64770d Inode: 3670018 Links: 1
Access: (0600/-rw-------) Uid: ( 498/ mysql) Gid: ( 498/ mysql)
Access: 2018-07-22 17:34:15.689005465 +0800
Modify: 2018-07-22 17:29:56.449005465 +0800
Change: 2018-07-22 17:29:56.449005465 +0800
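
Both sizes can be read off the stat output above. Assuming Blocks is counted in 512-byte units (the usual unit for stat on Linux), the allocated size is Blocks times 512, while Size is the logical length:

```shell
# Values copied from the stat output of core.14609.
size_bytes=7504252928
blocks=1440384

# Assumption: stat reports Blocks in 512-byte units.
allocated_mib=$(( blocks * 512 / 1024 / 1024 ))
logical_mib=$(( size_bytes / 1024 / 1024 ))

echo "allocated: ${allocated_mib} MiB"   # prints allocated: 703 MiB
echo "logical:   ${logical_mib} MiB"     # prints logical:   7156 MiB
```

These agree with the 704M shown by du and the 7.0G shown by ls, both of which round up.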

Load the core file with GDB and inspect the stacks (this requires the debuginfo package matching the installed MySQL version: debuginfo-install mysql-community-server-5.7.22-1.el6.x86_64)

gdb mysqld core.14609

Use info thread to list mysqld's internal threads as they were at the time of the dump. Note that the number in (LWP 14622) is the thread_os_id.

(gdb) info thread
40 Thread 0x7fb234894700 (LWP 14622) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
39 Thread 0x7fb23a8d9700 (LWP 14652) 0x00007fb3db997585 in sigwait () from /lib64/libpthread.so.0
38 Thread 0x7fb2121fc700 (LWP 14653) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
37 Thread 0x7fb233e93700 (LWP 14623) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
36 Thread 0x7fb2401b2700 (LWP 14641) 0x00007fb3db993a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
35 Thread 0x7fb22d088700 (LWP 14634) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
34 Thread 0x7fb23028d700 (LWP 14629) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
33 Thread 0x7fb232a91700 (LWP 14625) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
32 Thread 0x7fb22840b700 (LWP 14656) 0x00007fb3db993a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
31 Thread 0x7fb232090700 (LWP 14626) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
30 Thread 0x7fb23bbab700 (LWP 14648) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
29 Thread 0x7fb22f88c700 (LWP 14630) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
28 Thread 0x7fb236697700 (LWP 14619) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
27 Thread 0x7fb23c5ac700 (LWP 14647) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
26 Thread 0x7fb23d9ae700 (LWP 14645) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
25 Thread 0x7fb230c8e700 (LWP 14628) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
24 Thread 0x7fb212bfd700 (LWP 14651) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
23 Thread 0x7fb2135fe700 (LWP 14650) 0x00007fb3db993a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
22 Thread 0x7fb213fff700 (LWP 14649) 0x00007fb3db993a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
21 Thread 0x7fb203fff700 (LWP 14663) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
20 Thread 0x7fb23edb0700 (LWP 14643) 0x00007fb3db99700d in nanosleep () from /lib64/libpthread.so.0
19 Thread 0x7fb240bb3700 (LWP 14640) 0x00007fb3db993a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
18 Thread 0x7fb22a884700 (LWP 14638) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
17 Thread 0x7fb22da89700 (LWP 14633) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
16 Thread 0x7fb23e3af700 (LWP 14644) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
15 Thread 0x7fb23cfad700 (LWP 14646) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
14 Thread 0x7fb22e48a700 (LWP 14632) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
13 Thread 0x7fb22c687700 (LWP 14635) 0x00007fb3db993a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
12 Thread 0x7fb22ee8b700 (LWP 14631) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
11 Thread 0x7fb23168f700 (LWP 14627) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
10 Thread 0x7fb235295700 (LWP 14621) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
9 Thread 0x7fb22b285700 (LWP 14637) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
8 Thread 0x7fb233492700 (LWP 14624) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
7 Thread 0x7fb23f7b1700 (LWP 14642) 0x00007fb3db993a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
6 Thread 0x7fb235c96700 (LWP 14620) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
5 Thread 0x7fb237098700 (LWP 14618) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
4 Thread 0x7fb237a99700 (LWP 14617) 0x00007fb3db787614 in ?? () from /lib64/libaio.so.1
3 Thread 0x7fb22bc86700 (LWP 14636) 0x00007fb3db99368c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
* 2 Thread 0x7fb3d25f1700 (LWP 14610) 0x00007fb3da4386c7 in sigwaitinfo () from /lib64/libc.so.6
1 Thread 0x7fb3dbdb7720 (LWP 14609) 0x00007fb3db99497c in pthread_kill () from /lib64/libpthread.so.0
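
With dozens of threads, it can help to group them by the function each one is currently in. The following sketch assumes the info thread listing has been saved to a text file (one way to do that with gdb's batch mode is shown in the comments) and counts threads per function name. The function name summarize_threads and the file name threads.txt are made up for illustration.

```shell
# Count threads per innermost function from saved "info thread" output (stdin).
summarize_threads() {
    # The function name is the word after " in "; "??" means no symbol was found.
    sed -n 's/.*(LWP [0-9]*) 0x[0-9a-f]* in \([^ ]*\) .*/\1/p' |
        sort | uniq -c | sort -rn
}

# One way to capture the listing non-interactively:
# gdb -batch -ex "info threads" mysqld core.14609 > threads.txt
# summarize_threads < threads.txt
```

In the listing above, most threads sit in libaio or in pthread_cond_wait / pthread_cond_timedwait, which suggests idle background threads. Only thread 1, in pthread_kill, is handling the signal.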

We can switch to a particular thread with thread n and then use the bt command to view that thread's stack:

(gdb) thread 1
[Switching to thread 1 (Thread 0x7fb3dbdb7720 (LWP 14609))]#0 0x00007fb3db99497c in pthread_kill () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007fb3db99497c in pthread_kill () from /lib64/libpthread.so.0
#1 0x00000000007d26d4 in handle_fatal_signal (sig=11) at /export/home/pb2/build/sb_0-27500212-1520171533.24/rpm/BUILD/mysql-5.7.22/mysql-5.7.22/sql/signal_handler.cc:220
#2 <signal handler called>
#3 0x00007fb3da4e4383 in poll () from /lib64/libc.so.6
#4 0x0000000000dedda8 in Mysqld_socket_listener::listen_for_connection_event (this=0x3614ad0)
at /export/home/pb2/build/sb_0-27500212-1520171533.24/rpm/BUILD/mysql-5.7.22/mysql-5.7.22/sql/conn_handler/socket_connection.cc:852
#5 0x00000000007cd0b9 in connection_event_loop (argc=53, argv=0x34e7338)
at /export/home/pb2/build/sb_0-27500212-1520171533.24/rpm/BUILD/mysql-5.7.22/mysql-5.7.22/sql/conn_handler/connection_acceptor.h:66
#6 mysqld_main (argc=53, argv=0x34e7338) at /export/home/pb2/build/sb_0-27500212-1520171533.24/rpm/BUILD/mysql-5.7.22/mysql-5.7.22/sql/mysqld.cc:5132
#7 0x00007fb3da423d1d in __libc_start_main () from /lib64/libc.so.6
#8 0x00000000007c2699 in _start ()

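Some of these stacks match the stack that MySQL writes to the error log when it crashes, only in more detail. The mysqld error log prints only the stack of the thread that caused the crash, which is more targeted, but the resolved function information is rather vague. The mysqld error log: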

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=2
max_threads=300
thread_count=1
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2508204 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0xf4fe15]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x7d2774]
/lib64/libpthread.so.0(+0xf7e0)[0x7fb3db9977e0]
/lib64/libc.so.6(__poll+0x53)[0x7fb3da4e4383]
/usr/sbin/mysqld(_ZN22Mysqld_socket_listener27listen_for_connection_eventEv+0x38)[0xdedda8]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x1819)[0x7cd0b9]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7fb3da423d1d]
/usr/sbin/mysqld[0x7c2699]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file

In some cases mysqld cannot resolve function information in the stack, and only hexadecimal numbers are printed:

mysqld got signal 11;
Attempting backtrace. You can use the following information
to find out where mysqld died. If you see no messages after
this, something went terribly wrong...
stack_bottom = 0x41fd0110 thread_stack 0x40000
[0x9da402]
[0x6648e9]
[0x7f1a5af000f0]
[0x7f1a5a10f0f2]
[0x7412cb]
[0x688354]
[0x688494]
[0x67a170]
[0x67f0ad]
[0x67fdf8]
[0x6811b6]
[0x66e05e]

In this case the resolve_stack_dump tool can help analyze the call stack. The official MySQL documentation describes it in detail (MySQL 5.7 documentation, section 28.5.1.5).
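
As a sketch of that workflow: resolve_stack_dump takes a symbols file (produced with nm) and a file of numeric addresses. The helper below, whose name is made up for illustration, pulls the bracketed addresses out of an error-log excerpt on standard input. The file names in the commented commands are placeholders.

```shell
# Extract bare stack addresses such as 0x9da402 from lines like "[0x9da402]".
extract_stack_addresses() {
    sed -n 's/^\[\(0x[0-9a-f]*\)]$/\1/p'
}

# Typical use (paths are placeholders; the mysqld binary must be the one that crashed):
# extract_stack_addresses < mysqld.err > /tmp/mysqld.stack
# nm -n /usr/sbin/mysqld > /tmp/mysqld.sym
# resolve_stack_dump -s /tmp/mysqld.sym -n /tmp/mysqld.stack
```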

Original link: https://www.dazhuanlan.com/2019/08/17/5d576c3a91393/

Origin www.cnblogs.com/chinatrump/p/11417278.html