How to Find Which Process Is Killing mysqld With SIGKILL or SIGTERM on Linux

To determine which process is sending the signal to mysqld, it is necessary to trace the signals through the Linux kernel. Two options to do this are:

  1. the audit log (auditd)
  2. systemtap

Each of these methods will be discussed in the following sections.

Audit Log

The audit log is simple to set up, but it does not provide fine-grained control over which processes and signals are monitored: signals to all processes are included. The log can therefore become quite noisy, so it is recommended to disable the monitoring as soon as the offending process has been identified. The steps are:

1. Configure auditd to monitor signals. This can be done at runtime or through the auditd configuration file (/etc/audit/audit.rules). As the added log output is fairly noisy (it logs all signals, even kill -0, i.e. checking whether a process is alive) and the change is made in order to debug a single issue, it is usually preferable to make the change at runtime. You do this with the command:

 auditctl -a exit,always -F arch=b64 -S kill -k audit_kill

2. Wait for mysqld to be killed or shut down by the signal.

3. Stop auditd from logging signal calls again. The simplest way is to restart it (if you added the rule in the configuration file, you will need to remove it first):

# service auditd restart
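
If restarting auditd is not convenient, the runtime rule can also be checked and removed directly. The following is a minimal sketch, assuming the rule was added at runtime exactly as in step 1 and that the commands are run as root; the function names are made up for this example:

```shell
# Hypothetical wrappers around auditctl; run them as root.

# List the loaded rules that carry the audit_kill key.
list_kill_rule() {
  auditctl -l -k audit_kill
}

# Delete the rule again. auditctl only deletes a rule when the list, action,
# syscall and fields match the ones used when the rule was added.
remove_kill_rule() {
  auditctl -d exit,always -F arch=b64 -S kill -k audit_kill
}
```

Calling list_kill_rule after remove_kill_rule should no longer show the kill rule.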

The log file (usually /var/log/audit/audit.log) should now have an event similar to:

type=SYSCALL msg=audit(1450214919.813:148): arch=c000003e syscall=62 success=yes exit=0 a0=f60 a1=9 a2=7f736e706980 a3=0 items=0 ppid=3649 pid=3997 auid=500 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts3 ses=1 comm="mykill" exe="/opt/bin/mykill" subj=user_u:system_r:unconfined_t:s0 key="audit_kill"
type=OBJ_PID msg=audit(1450214919.813:148): opid=3936 oauid=500 ouid=102 oses=1 obj=user_u:system_r:mysqld_t:s0 ocomm="mysqld"

The important parts are:

General:

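msg=audit(1450214919.813:148): the timestamp of the event, followed by the serial number of the event (148). The timestamp is in Unix epoch format (seconds since 1 January 1970 at midnight UTC). You can, for example, use the FROM_UNIXTIME() function in MySQL to convert it into a normal date (shown in the time zone of the MySQL session):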

mysql> SELECT FROM_UNIXTIME(1450214919);
+---------------------------+
| FROM_UNIXTIME(1450214919) |
+---------------------------+
| 2015-12-16 08:28:39       |
+---------------------------+
1 row in set (0.05 sec)
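
If no MySQL session is at hand, the same conversion can be done in the shell. This is a minimal sketch, assuming GNU date (the -d @SECONDS form is a GNU coreutils extension); the helper name epoch_to_utc is made up for this example. It prints UTC, so the result differs from the FROM_UNIXTIME() output above by the time zone offset of the MySQL session.

```shell
# Hypothetical helper: convert an audit timestamp to a UTC date.
# Assumes GNU date; "-d @N" is not available in BSD date.
epoch_to_utc() {
  # Drop the milliseconds and serial number: "1450214919.813:148" -> "1450214919"
  date -u -d "@${1%%.*}" '+%Y-%m-%d %H:%M:%S UTC'
}

epoch_to_utc 1450214919.813:148
```

For the event above this prints 2015-12-15 21:28:39 UTC, which is the same instant as 2015-12-16 08:28:39 in a UTC+11 time zone.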

type=SYSCALL

Information about the trigger of the syscall.

syscall=62: the system call is kill, i.e. a signal was sent:

# ausyscall 62
kill

a1=9: means the signal is SIGKILL. The arguments a0 to a3 are logged in hexadecimal, so a SIGTERM signal (number 15) shows up as a1=f.
comm="mykill" exe="/opt/bin/mykill": is the process that sent the signal. This is what you are interested in.
key="audit_kill": is the "-k audit_kill" option from the auditctl command. It just tells you that the event was triggered by the rule we added.
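
auditd writes the syscall arguments a0 to a3 in hexadecimal, which is easy to overlook for small values such as a1=9. A minimal sketch of decoding them in the shell follows; the helper name hex_to_dec is made up for this example, and kill -l is used to map a signal number to its name:

```shell
# Hypothetical helper: convert a hexadecimal audit argument to decimal.
hex_to_dec() {
  printf '%d\n' "0x$1"
}

hex_to_dec f60              # a0: the PID passed to kill(), matches opid=3936
hex_to_dec 9                # a1: the signal number
kill -l "$(hex_to_dec f)"   # name of the signal logged as a1=f
```

The three commands print 3936, 9 and TERM.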

type=OBJ_PID

Information about the target of the syscall.

opid=3936: is the process id (as in what you see in top or the ps output) of the process receiving the signal.
ouid=102: the userid of the user executing the process (as in the id from /etc/passwd).
ocomm="mysqld": the name of the process.

So you need to look for an event with type=SYSCALL with a1=9 (or a1=f, hexadecimal for 15, if the signal is SIGTERM) and key="audit_kill" where the following object has ocomm="mysqld".
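
On a busy system, finding that pair of records by eye can be tedious. The search can be sketched with awk as below. This is only an illustration based on the raw record format shown above: the function name find_mysqld_killers and the default log path are assumptions, and values containing spaces (for example an exe path with a space) are not handled.

```shell
# Hypothetical helper: print who sent SIGKILL or SIGTERM to mysqld, based on
# the raw audit records shown above. The default log path is an assumption.
find_mysqld_killers() {
  awk '
    # Return the value of a name=value field from a record, without quotes.
    function field(rec, name,    n, i, parts, val) {
      n = split(rec, parts, " ")
      for (i = 1; i <= n; i++) {
        if (index(parts[i], name "=") == 1) {
          val = substr(parts[i], length(name) + 2)
          gsub(/"/, "", val)
          return val
        }
      }
      return ""
    }
    # Remember the most recent kill call that was logged by the audit_kill rule.
    /^type=SYSCALL/ && /key="audit_kill"/ {
      sig = field($0, "a1")            # hexadecimal: 9 = SIGKILL, f = SIGTERM
      event = (sig == "9" || sig == "f") ? field($0, "msg") : ""
      sender = field($0, "exe")
      spid = field($0, "pid")
      next
    }
    # The OBJ_PID record of the same event has the same msg=audit(...) value.
    /^type=OBJ_PID/ && event != "" && field($0, "msg") == event && field($0, "ocomm") == "mysqld" {
      printf "%s signal=%s sender=%s (pid %s) target_pid=%s\n", event, (sig == "9" ? "SIGKILL" : "SIGTERM"), sender, spid, field($0, "opid")
    }
  ' "${1:-/var/log/audit/audit.log}"
}
```

Run as root, find_mysqld_killers reads the default log; a different file can be passed as the first argument. The ausearch tool that ships with auditd can do a similar search, for example ausearch -k audit_kill -i, which also interprets the numeric fields.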

systemtap

systemtap requires a script that specifies what should be monitored and what should be done with the information available. This makes it more complex to use, but also allows much greater flexibility. An example script that monitors SIGKILL and SIGTERM signals sent to the mysqld process is:

#! /usr/bin/env stap
#
# This systemtap script will monitor for SIGKILL and SIGTERM signals sent to
# a process named "mysqld".
#

probe signal.send {
  if (
    (sig_name == "SIGKILL" || sig_name == "SIGTERM")
    && pid_name == "mysqld"
  ) {
    printf("%10d   %-34s   %-10s   %5d   %-7s   %s\n",
      gettimeofday_s(), tz_ctime(gettimeofday_s()),
      pid_name, sig_pid, sig_name, execname());
  }
}

probe begin {
  printf("systemtap script started at: %s\n\n", tz_ctime(gettimeofday_s()));
  printf("%50s%-18s\n",
    "",  "Signaled Process");
  printf("%-10s   %-34s   %-10s   %5s   %-7s   %s\n",
    "Epoch", "Time of Signal", "Name", "PID", "Signal", "Signaling Process Name");
  printf("---------------------------------------------------------------");
  printf("---------------------------------------------------------------");
  printf("\n");
}

probe end {
  printf("\n");
}

Note: The above script is meant as an example. Please test it on a test system before using it in production.

Save the script to a file (the following assumes the file name is mysqld_kill_or_term.stp). The usage is:

# stap mysqld_kill_or_term.stp
systemtap script started at: Fri Dec 18 13:35:44 2015 AEDT

                                                  Signaled Process
Epoch        Time of Signal                       Name           PID   Signal    Signaling Process Name
------------------------------------------------------------------------------------------------------------------------------
1450406150   Fri Dec 18 13:35:50 2015 AEDT        mysqld       21578   SIGKILL   mykill
1450406161   Fri Dec 18 13:36:01 2015 AEDT        mysqld       21942   SIGKILL   mykill
1450406171   Fri Dec 18 13:36:11 2015 AEDT        mysqld       22045   SIGTERM   mykill
^C

Reposted from blog.csdn.net/CaspianSea/article/details/126497211