Finding the source of signals on Linux with strace, auditd, or systemtap

Linux and UNIX®-like operating systems commonly use signals to communicate between processes. The command-line kill utility is the best-known way to send one. By default, WebSphere Application Server on Linux and UNIX responds to kill -3 by producing a javacore, and to kill -11 by creating a system core and exiting. Many other signals can be sent and acted on.

In some cases, a signal unexpectedly reaches a WebSphere Application Server and we need to determine which process and user sent it. In most cases the strace command can do this for kill -3, but kill -9 and kill -11 are not usually reported.

The strace utility is fairly universal, and starting it with this line will generally find the source of kill -3 and similar signals:

strace -tt -o /tmp/traceit -p <pid> &

 

This results in volumes of output that do include the source of most signals:

strace -tt -o /tmp/traceit -p <pid> &
16:08:45.388961 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
16:08:45.389113 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=21398, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---

 

16:09:01.210200 --- SIGTTOU {si_signo=SIGTTOU, si_code=SI_USER, si_pid=829, si_uid=1000} ---

In case you do not recognize SIGTTOU, use kill -l to list the signals in your environment:

 kill -l

 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

 

The length of this list may be a surprise if you have never used anything but 3, 9, and 11. Here kill -22 is SIGTTOU, and the process ID and user ID of the sender are listed. Unfortunately, most of the time strace does not show the sender of kill -9 and kill -11, as they are not trapped, and all you get is this line:

+++ killed by SIGKILL +++
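When the trace file is large, the signal lines that name a sender can be filtered out of it. The following is only a sketch: the function name signal_senders and the temporary sample file are made up for illustration, and it assumes the strace -tt line layout shown earlier. It keeps only signals whose si_code is SI_USER or SI_TKILL (sent with kill or tkill/tgkill), so entries such as SIGCHLD from an exiting child are skipped.

```shell
#!/bin/sh
# Sketch: list the sender pid/uid for user-sent signals in strace -tt output.
# signal_senders is a hypothetical helper name, not part of strace.
signal_senders() {
    # Keep only signal lines generated by kill(2) or tkill/tgkill(2) ...
    grep -E 'si_code=SI_(USER|TKILL)' "$1" |
        # ... and reduce them to: time, signal name, sender pid, sender uid.
        sed -n 's/^\([0-9:.]*\) --- \(SIG[A-Z0-9+-]*\) {.*si_pid=\([0-9]*\), si_uid=\([0-9]*\).*/\1 \2 sender_pid=\3 sender_uid=\4/p'
}

# Sample input copied from the trace output above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
16:08:45.388961 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
16:08:45.389113 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=21398, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
16:09:01.210200 --- SIGTTOU {si_signo=SIGTTOU, si_code=SI_USER, si_pid=829, si_uid=1000} ---
EOF

signal_senders "$sample"
rm -f "$sample"
```

With the real trace you would run signal_senders /tmp/traceit. If the full trace is too large to keep, adding -e trace=none to the strace command line should suppress the system call lines while still reporting signals.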

 

Two tools are not usually installed or active on Linux by default, but they offer so much functionality that they should be. Both are included in the repositories for the RHEL, SUSE, and Fedora distributions and are installed like any other software package with the usual Linux install tools. Because they work at the system level, root or elevated access rights are needed. The install process is quite simple, however, and the functionality is worthwhile.

 

AUDIT

Auditd is a daemon (service) that, as the name implies, produces audit logs of system-level activity. It is installed from the usual repository as the audit package and is configured in /etc/audit/auditd.conf; the rules are in /etc/audit/audit.rules.

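Example entry for kill signal logging (newer audit releases deprecate the entry list in favor of exit, so there the rule may need to start with -a exit,always):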

-a entry,always -F arch=b64 -S kill -k kill_signals

Then the command: service auditd start

will log all signals in /var/log/audit/audit.log with a key of kill_signals, which you can search for with your favorite editor or with ausearch -k kill_signals

Of course, this example captures all signals and is quite verbose. The usual output will look like this:

time->Wed Jun  3 16:34:08 2015
type=SYSCALL msg=audit(1433363648.091:6342): arch=c000003e syscall=62 success=no exit=-3 a0=1e06 a1=0 a2=1e06 a3=fffffffffffffff0 items=0 ppid=10044 pid=10140 auid=500 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm=4174746163682041504920696E6974 exe="/opt/ibm/WebSphere/AppServer/java/jre/bin/java" subj=unconfined_u:unconfined_r:unconfined_java_t:s0-s0:c0.c1023 key="kill_signals"
----
time->Wed Jun  3 16:34:08 2015
type=OBJ_PID msg=audit(1433363648.130:6343): opid=27307 oauid=-1 ouid=0 oses=-1 obj=system_u:system_r:initrc_t:s0 ocomm="symcfgd"
type=SYSCALL msg=audit(1433363648.130:6343): arch=c000003e syscall=62 success=yes exit=0 a0=6aab a1=12 a2=f a3=50d items=0 ppid=1 pid=27214 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="sav-limitcpu" exe="/usr/bin/sav-limitcpu" subj=system_u:system_r:initrc_t:s0 key="kill_signals"
----
time->Wed Jun  3 16:34:08 2015
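The a0 and a1 fields in a SYSCALL record are the first two arguments of kill, that is, the target pid and the signal number, and audit logs them in hexadecimal. As a rough sketch (decode_kill_record is a hypothetical helper name, and the record is a shortened copy of the second entry above), they can be decoded like this:

```shell
#!/bin/sh
# Sketch: decode sender pid, target pid and signal number from one
# audit SYSCALL record for kill(2). decode_kill_record is a made-up name.
decode_kill_record() {
    # a0 = target pid (hex), a1 = signal number (hex), pid = sending process.
    a0=$(printf '%s\n' "$1" | sed -n 's/.* a0=\([0-9a-f]*\) .*/\1/p')
    a1=$(printf '%s\n' "$1" | sed -n 's/.* a1=\([0-9a-f]*\) .*/\1/p')
    sender=$(printf '%s\n' "$1" | sed -n 's/.* pid=\([0-9]*\) .*/\1/p')
    # printf converts the 0x-prefixed hex values to decimal.
    printf 'sender_pid=%s target_pid=%d signal=%d\n' "$sender" "0x$a0" "0x$a1"
}

# Shortened copy of the second record shown above.
rec='type=SYSCALL msg=audit(1433363648.130:6343): arch=c000003e syscall=62 success=yes exit=0 a0=6aab a1=12 a2=f a3=50d items=0 ppid=1 pid=27214 auid=4294967295 uid=0 comm="sav-limitcpu" key="kill_signals"'
decode_kill_record "$rec"
```

Here a0=6aab decodes to 27307, which matches the opid in the OBJ_PID line, and a1=12 decodes to 18, which is SIGCONT in the kill -l list. In the first record a1=0 is signal 0, which only checks whether the target process exists and delivers nothing. This sketch does not handle negative pid arguments such as -1.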

 

Stop the logging with the service auditd stop command. For more information, see this article from Red Hat: How to use audit to monitor a specific SYSCALL
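Because the rule shown earlier logs every kill call, it may help to match on the signal number as well. The sketch below only prints a candidate rule line; kill_rule_for is a hypothetical helper, and the rule itself is an assumption that your audit version accepts an -F a1= filter on the second syscall argument and the exit list (which newer releases use in place of entry). Check it against the auditctl man page on your system before relying on it.

```shell
#!/bin/sh
# Sketch: print an audit rule that logs only one signal number.
# kill_rule_for is a made-up helper; the rule syntax is an assumption
# to be verified against your installed audit version.
kill_rule_for() {
    # a1 is the second argument of kill(2), the signal number.
    printf '%s\n' "-a always,exit -F arch=b64 -S kill -F a1=$1 -k kill_signals"
}

# Rule for SIGKILL (9); append the printed line to /etc/audit/audit.rules
# as root, then restart auditd.
kill_rule_for 9
```

This covers only the kill system call on 64-bit processes. Signals sent with tkill or tgkill, or by 32-bit processes, would need additional rules.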

 

SystemTap

This tool is more complex and more flexible than the audit tool. It provides probes and taps that are written in a scripting language that is remarkably C-like. In that regard it is similar to DTrace on Solaris, and like DTrace it offers many probes for looking at performance and memory as well as network activity. It too is easily installed (for example, on RHEL, yum install systemtap does it). Root access is required, or you may be a member of a group with the necessary privileges. The good news is that it comes with a set of taps that perform a comprehensive set of tracing. These live in /usr/share/systemtap.

The basic command:

  stap sigkill.stp

gets very verbose even on lab systems, but the same script can be filtered. An example to trace kill commands for a specific pid and a specific signal:

stap sigkill.stp -x <pid> SIGKILL

which logs:

SIGKILL was sent to java (pid:<pid>) by bash uid:0

when tested with a kill sent from the command line.

 

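So you do need the script sigkill.stp, which was created by Red Hat. As listed, it does not reference target() or a script argument, so it appears to report every SIGKILL on the system rather than only those for the -x pid. It looks like this: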

#! /usr/bin/env stap
# sigkill.stp
# Copyright (C) 2007 Red Hat, Inc., Eugene Teo <eteo@redhat.com>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.
#
# /usr/share/systemtap/tapset/signal.stp:
# [...]
# probe signal.send = _signal.send.*
# {
#     sig=$sig
#     sig_name = _signal_name($sig)
#     sig_pid = task_pid(task)
#     pid_name = task_execname(task)
# [...]
probe signal.send {
  if (sig_name == "SIGKILL")
    printf("%s was sent to %s (pid:%d) by %s uid:%d\n",
           sig_name, pid_name, sig_pid, execname(), uid())
}

 

To summarize, these tools can track down most signals (strace) or all of them (audit and SystemTap). A very useful reference for SystemTap is:
Red Hat Enterprise Linux 6 SystemTap Beginners Guide, Introduction to SystemTap

 

https://www.ibm.com/developerworks/community/blogs/aimsupport/entry/Finding_the_source_of_signals_on_Linux_with_strace_auditd_or_Systemtap?lang=en

Original article: https://www.cnblogs.com/DataArt/p/10176473.html