Development environment: using the magic SysRq key

SysRq related links (written in detail):
   https://www.ibm.com/developerworks/cn/linux/l-cn-sysrq/

 

What is SysRq?

    Have you ever had a server that you could not log in to, either over SSH or from a local terminal (tty)?
    In that situation, did you do anything other than press the power or reset button?
    Did you consider that it might be possible to recover from it?
    Did you think about collecting more information to locate the cause of the hang?
        -- This situation can be called an "interruptible system hang".
    In other words, the system has stopped responding to most normal services for some reason, but it can still respond to keyboard interrupt requests (on a serial console, these are actually serial port interrupts).

    SysRq is known as the "magic key combination". It is a debugging facility built into the Linux kernel: you press the keys and the kernel responds.
    It is defined as a set of key combinations, and it is called "magic" because it can carry out a set of predefined system operations even when the system hangs and most services no longer respond.
    As long as the kernel is not completely locked up, these key combinations work no matter what the kernel is doing.
    With them you can reboot a hung server while keeping the data on disk safe, avoiding data loss and a lengthy file system check after the reboot.
    You can also collect system information such as memory usage, CPU task handling, and process state, and sometimes even recover an unresponsive server without rebooting it.

 

How to enable SysRq?

    Enable "CONFIG_MAGIC_SYSRQ" in the kernel's "make menuconfig".
    Once a kernel built with SysRq support is running, "/proc/sys/kernel/sysrq" controls which functions may be invoked through the SysRq key.
    The possible values of "/proc/sys/kernel/sysrq" are listed below (the original text is in Linux.xx/Documentation/sysrq.txt):

    Here is the list of possible values in /proc/sys/kernel/sysrq:
    0 - disable sysrq completely
    1 - enable all functions of sysrq
    >1 - bitmask of allowed sysrq functions (see below for detailed function description):
          2 =   0x2 - enable control of console logging level
          4 =   0x4 - enable control of keyboard (SAK, unraw)
          8 =   0x8 - enable debugging dumps of processes etc.
         16 =  0x10 - enable sync command
         32 =  0x20 - enable remount read-only
         64 =  0x40 - enable signalling of processes (term, kill, oom-kill)
        128 =  0x80 - allow reboot/poweroff
        256 = 0x100 - allow nicing of all RT tasks
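
    To make the bitmask concrete, here is a minimal shell sketch that decodes a value into the function groups listed above. The helper name sysrq_decode is made up for this illustration and is not part of any tool; the bit meanings are taken directly from the list.

```shell
# Decode a kernel.sysrq value into the function groups it enables.
# sysrq_decode is a hypothetical helper; it only prints, it changes nothing.
sysrq_decode() (
    value=$1
    case $value in
        0) echo "sysrq disabled"; exit 0 ;;
        1) echo "all functions enabled"; exit 0 ;;
    esac
    for entry in \
        "2:console logging level" \
        "4:keyboard (SAK, unraw)" \
        "8:debugging dumps" \
        "16:sync" \
        "32:remount read-only" \
        "64:signalling of processes" \
        "128:reboot/poweroff" \
        "256:nicing of RT tasks"
    do
        bit=${entry%%:*}
        name=${entry#*:}
        # Print the group when its bit is set in the mask.
        if [ $(( value & bit )) -ne 0 ]; then
            echo "$bit - $name"
        fi
    done
)

sysrq_decode 176
```

    For 176 (16 + 32 + 128) this prints the sync, remount read-only, and reboot/poweroff groups.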

    You can set the value in the SysRq file with the following command:

        echo "number" >/proc/sys/kernel/sysrq

    To enable or disable SysRq permanently, set it in /etc/sysctl.conf:

        kernel.sysrq = 1 (enable SysRq)
        kernel.sysrq = 0 (disable SysRq)

    kernel.sysrq also accepts enabling parameters other than 0 and 1. Please refer to the sysrq kernel documentation for details.
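
    As a rough illustration of such a value, the sketch below ORs together bits from the list above to build a restricted mask. sysrq_mask is a hypothetical helper, and allowing only sync, remount read-only and reboot is just an example policy.

```shell
# Build a kernel.sysrq bitmask from individual bit values.
# sysrq_mask is a hypothetical helper used only for this illustration.
sysrq_mask() (
    mask=0
    for bit in "$@"; do
        mask=$(( mask | bit ))
    done
    echo "$mask"
)

# sync (16) + remount read-only (32) + reboot/poweroff (128)
echo "kernel.sysrq = $(sysrq_mask 16 32 128)"
```

    The printed line, kernel.sysrq = 176, can be placed in /etc/sysctl.conf; on most systems "sysctl -p" should reload that file without a reboot.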

 

How to use SysRq?

    How you invoke SysRq depends on the platform. The original text is as follows:


On x86   - You press the key combo 'ALT-SysRq-<command key>'. Note - Some
           keyboards may not have a key labeled 'SysRq'. The 'SysRq' key is
           also known as the 'Print Screen' key. Also some keyboards cannot
           handle so many keys being pressed at the same time, so you might
           have better luck with "press Alt", "press SysRq", "release SysRq",
           "press <command key>", release everything.

On SPARC - You press 'ALT-STOP-<command key>', I believe.

On the serial console (PC style standard serial ports only) -
           You send a BREAK, then within 5 seconds a command key. Sending
           BREAK twice is interpreted as a normal BREAK.

On PowerPC - Press 'ALT - Print Screen (or F13) - <command key>,
             Print Screen (or F13) - <command key> may suffice.

On other - If you know of the key combos for other architectures, please
           let me know so I can add them to this section.

On all -  write a character to /proc/sysrq-trigger.  e.g.:

        echo t > /proc/sysrq-trigger

 

sysrq-trigger node debugging

    Enabling CONFIG_MAGIC_SYSRQ in the kernel configuration also creates a /proc/sysrq-trigger node after boot, which can be used for debugging.
    Note that the value of "/proc/sys/kernel/sysrq" only affects invocation through the keyboard.
    Through the "/proc/sysrq-trigger" node, a user with administrative privileges can invoke any operation.
    Some of the functions:

        echo m > /proc/sysrq-trigger    # dump memory allocation information
        echo t > /proc/sysrq-trigger    # dump current task state information
        echo p > /proc/sysrq-trigger    # dump current CPU registers and flags
        echo c > /proc/sysrq-trigger    # trigger a NULL pointer panic, deliberately crashing the system
        echo s > /proc/sysrq-trigger    # immediately sync all mounted file systems
        echo u > /proc/sysrq-trigger    # immediately remount all file systems read-only
        echo w > /proc/sysrq-trigger    # dump tasks in uninterruptible (blocked) state
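
    The commands above can be wrapped in a small helper that rejects malformed keys before anything reaches the kernel. This is only a sketch: sysrq_send and the SYSRQ_TRIGGER override are names invented here so the helper can be tried against a scratch file instead of the real node.

```shell
# Write one SysRq command key to the trigger node.
# SYSRQ_TRIGGER is a hypothetical override for experiments; when it is
# unset the helper targets the real /proc/sysrq-trigger.
sysrq_send() (
    key=$1
    target=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
    # Accept exactly one letter or digit, as used by the keys in this article.
    case $key in
        [a-z0-9]) ;;
        *) echo "sysrq_send: invalid key '$key'" >&2; exit 2 ;;
    esac
    if [ ! -w "$target" ]; then
        echo "sysrq_send: cannot write to $target (need root?)" >&2
        exit 1
    fi
    printf '%s\n' "$key" > "$target"
)
```

    Typical use as root would be "sysrq_send m" followed by "dmesg | tail" to read the dump.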

 

What are the "commands" for magic keys?

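    Command keys and what they do:

    0-9 set the kernel log level for console output
    b reboot the system immediately
    c crash the system with a NULL pointer dereference (a crash dump is taken if configured)
    d show all locks that are currently held
    e send SIGTERM to all processes except init, so that they can exit on their own
    f trigger the OOM killer (out of memory) manually
    g used by kgdb (the kernel debugger)
    h print help
    i send SIGKILL to all processes except init, killing them forcibly
    k Secure Access Key (SAK): kill all programs on the current virtual console
    l show a stack backtrace for all active CPUs
    m show memory usage (dump current memory information to the console)
    n make real-time (RT) tasks nice-able
    o power off
    p dump CPU register information
    q display all active high-resolution timers and clock sources
    r set the keyboard to ASCII (XLATE) mode, so that key presses bypass the X server and reach the kernel
    s sync buffered data to disk
    t dump the list of current tasks and their information to the console
    u remount all file systems read-only
    v dump Voyager SMP processor information
    w dump the list of blocked ("D" state) tasks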

 

What can we do with SysRq?

    1. "k" - The Secure Access Key (SAK) is useful when you want to be sure that no Trojan horse program is running on the console, waiting to grab your password when you try to log in.
        It kills all programs on the given console, so you can be sure that the login prompt you see really comes from init and not from a Trojan.
    2. "b" - Reboot (b) is good when the system cannot be shut down. But you should sync (s) and unmount (u) first.
    3. "s" - Sync (s) is great when your system is locked up: it lets you sync your disks, which lessens the chance of data loss and fscking.
        Note that the sync has not taken place until "OK" and "Done" appear on the screen. (If the kernel is in really serious trouble, you may never get the OK or Done message...)
    4. "u" - Unmount (u) is useful in basically the same way as sync (s). I usually sync (s), unmount (u), then reboot (b) when my system locks up.
        It has saved me many an fsck. Again, the unmount (remount read-only) has not taken place until you see the "OK" and "Done" messages on the screen.
    5. "0-9" - The log levels "0"-"9" are useful when the console is being flooded with kernel messages you do not want to see.
        Choosing "0" prevents all but the most urgent kernel messages from reaching your console.
        (They are still logged if syslogd/klogd is running, though.)
    6. "e", "i" - Term (e) and kill (i) are useful if you have some kind of runaway process
        that you cannot kill any other way, especially if it is spawning other processes.
    7. "m" - Memory information (m) is useful when the system is stuck and you want to check its current memory usage.
    8. "t" - The process list (t) prints all current tasks and their information. Use it to find abnormal processes.
    9. "p" - The CPU registers (p) are also useful. When the system is stuck, the registers show where the CPU is currently executing.
    10. "REISUB" - Reboot the system safely. The recommended way to use the REISUB sequence is:
        R - 1 second - E - 30 seconds - I - 10 seconds - S - 5 seconds - U - 5 seconds - B, rather than pressing all six keys in one go.
        Think of a normal reboot: it does not complete in an instant either.
        R - Set the keyboard to ASCII mode
        E - Send SIGTERM to all processes except init
        I - Send SIGKILL to all processes except init
        S - Sync disk buffers
        U - Remount file systems read-only
        B - Reboot the system immediately
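
    The tail of this sequence (S, U, B) can also be driven from a root shell through /proc/sysrq-trigger. The sketch below is an assumption-laden illustration, not a tested recovery tool: sysrq_sub, SYSRQ_TRIGGER, SUB_DRY_RUN and SUB_SCALE are names invented here. E and I are left out on purpose, because they would signal the very shell that runs the script.

```shell
# Emergency "S-U-B" tail of the REISUB sequence via the trigger node.
# Hypothetical knobs: SYSRQ_TRIGGER (target file), SUB_DRY_RUN=1 (print
# only), SUB_SCALE (multiplier for the pauses; 0 disables them).
sysrq_sub() (
    target=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
    scale=${SUB_SCALE:-1}
    # key:pause pairs, using the pauses recommended in the sequence above.
    for step in s:5 u:5 b:0; do
        key=${step%%:*}
        pause=$(( ${step#*:} * scale ))
        if [ "${SUB_DRY_RUN:-0}" = 1 ]; then
            echo "would send '$key', then wait ${pause}s"
        else
            printf '%s\n' "$key" > "$target" || exit 1
            sleep "$pause"
        fi
    done
)

# Dry run: shows the plan without touching the system.
SUB_DRY_RUN=1 sysrq_sub
```

    Without SUB_DRY_RUN=1, and run as root against the real node, the last step reboots the machine immediately.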
    
    

I pressed SysRq, but nothing happened. Why?

    Some keyboards generate a key code for SysRq that differs from the predefined value 99 (see KEY_SYSRQ in include/linux/input.h), or have no SysRq key at all.
    In these cases, the kernel documentation suggests running 'showkey -s' to find a suitable scancode sequence and using 'setkeycodes <sequence> 99' to map it to the usual SysRq code.

I want to add an event to the SysRq module. What should I do?

    I don't know how, because I haven't done it myself...
    According to the kernel documentation, the original text is as follows:


In order to register a basic function with the table, you must first include
the header 'include/linux/sysrq.h', this will define everything else you need.
Next, you must create a sysrq_key_op struct, and populate it with A) the key
handler function you will use, B) a help_msg string, that will print when SysRQ
prints help, and C) an action_msg string, that will print right before your
handler is called. Your handler must conform to the prototype in 'sysrq.h'.

After the sysrq_key_op is created, you can call the kernel function
register_sysrq_key(int key, struct sysrq_key_op *op_p); this will
register the operation pointed to by 'op_p' at table key 'key',
if that slot in the table is blank. At module unload time, you must call
the function unregister_sysrq_key(int key, struct sysrq_key_op *op_p), which
will remove the key op pointed to by 'op_p' from the key 'key', if and only if
it is currently registered in that slot. This is in case the slot has been
overwritten since you registered it.

The Magic SysRQ system works by registering key operations against a key op
lookup table, which is defined in 'drivers/tty/sysrq.c'. This key table has
a number of operations registered into it at compile time, but is mutable,
and 2 functions are exported for interface to it:
    register_sysrq_key and unregister_sysrq_key.
Of course, never ever leave an invalid pointer in the table. I.e., when
your module that called register_sysrq_key() exits, it must call
unregister_sysrq_key() to clean up the sysrq key table entry that it used.
Null pointers in the table are always safe. :)

If for some reason you feel the need to call the handle_sysrq function from
within a function called by handle_sysrq, you must be aware that you are in
a lock (you are also in an interrupt handler, which means don't sleep!), so
you must call __handle_sysrq_nolock instead.

 

When I press a SysRq key combination, only the header appears on the console?

    SysRq output is subject to the same console log level as all other console output.
    This means that if the kernel was booted with "quiet", as is common on distribution kernels,
    the output may not appear on the actual console, even though it is in the dmesg buffer
    and is accessible via the dmesg command and to consumers of /proc/kmsg. As a specific exception,
    the header line of the SysRq command is passed to all console consumers, as if the current log level were the maximum.
    If only the header is printed, the kernel log level is almost certainly too low.
    If you need the output on the console, temporarily raise the console log level with alt-sysrq-8 or:
        echo 8 > /proc/sysrq-trigger
    Remember to restore the log level to normal after triggering the SysRq command you are interested in.
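
    The save-and-restore step can be sketched as below. It assumes that the first of the four numbers in /proc/sys/kernel/printk is the current console log level; console_loglevel is a hypothetical helper name, and the commented-out lines show the intended use rather than something that runs here.

```shell
# Return the console log level: the first of the four numbers found in
# /proc/sys/kernel/printk. The text is passed in as an argument so the
# parsing can be tried on any machine.
console_loglevel() {
    printf '%s\n' "$1" | awk '{ print $1 }'
}

console_loglevel "4 4 1 7"

# Intended use on a real system, as root (left commented out on purpose):
#   old=$(console_loglevel "$(cat /proc/sys/kernel/printk)")
#   echo 8 > /proc/sysrq-trigger         # show everything on the console
#   echo t > /proc/sysrq-trigger         # the command you are interested in
#   echo "$old" > /proc/sysrq-trigger    # restore; only valid for levels 0-9
```

    Restoring through the trigger node works only while the saved level is a single digit, since the log level keys are "0"-"9".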

 

I have more questions, who can I ask?

    Don't ask me, ask them!
    Just ask them on the linux-kernel mailing list:
        linux-kernel@vger.kernel.org

    
(Ps: "h"-view the magic key help)

sysrq: SysRq : HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) 
memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) 
show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) 
poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) 
show-task-states(t) unmount(u) show-blocked-tasks(w)

(Ps: "m"-view current memory usage)

sysrq: SysRq : Show Memory
Mem-Info:
active_anon:34588 inactive_anon:1508 isolated_anon:0
 active_file:0 inactive_file:0 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 slab_reclaimable:774 slab_unreclaimable:10682
 mapped:2297 shmem:33632 pagetables:218 bounce:0
 free:42392 free_pcp:88 free_cma:3934
Node 0 active_anon:138352kB inactive_anon:6032kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:9188kB dirty:0kB writeback:0kB shmem:134528kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? yes
Normal free:169568kB min:8192kB low:10240kB high:12288kB active_anon:138352kB inactive_anon:6032kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:393216kB managed:380740kB mlocked:0kB slab_reclaimable:3096kB slab_unreclaimable:42728kB kernel_stack:968kB pagetables:872kB bounce:0kB free_pcp:352kB local_pcp:128kB free_cma:15736kB
lowmem_reserve[]: 0 0 0
Normal: 4*4kB (MEC) 2*8kB (ME) 2*16kB (EC) 3*32kB (MEC) 1*64kB (C) 1*128kB (E) 3*256kB (MEC) 3*512kB (UME) 1*1024kB (C) 3*2048kB (UMC) 39*4096kB (MC) = 169568kB
33632 total pagecache pages
98304 pages RAM
0 pages HighMem/MovableOnly
3119 pages reserved
4096 pages cma reserved
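
One way to sanity-check a dump like this is to add up the per-order terms of the "Normal:" line and compare the result with the total printed after the "=" sign. The sketch below does that with awk; buddy_total_kb is a hypothetical helper name, and the input line is copied from the output above.

```shell
# Sum the "count*sizekB" terms of a free-list line from the "m" dump.
# buddy_total_kb is a hypothetical helper used only for this illustration.
buddy_total_kb() {
    printf '%s\n' "$1" | awk '{
        total = 0
        for (i = 1; i <= NF; i++) {
            # Keep only tokens shaped like 39*4096kB; skip labels such as (MEC).
            if ($i ~ /^[0-9]+\*[0-9]+kB$/) {
                split($i, part, "[*]")
                total += part[1] * part[2]
            }
        }
        print total
    }'
}

buddy_total_kb "Normal: 4*4kB (MEC) 2*8kB (ME) 2*16kB (EC) 3*32kB (MEC) 1*64kB (C) 1*128kB (E) 3*256kB (MEC) 3*512kB (UME) 1*1024kB (C) 3*2048kB (UMC) 39*4096kB (MC) = 169568kB"
```

The sum is 169568, matching both the "= 169568kB" total and "free:169568kB" in the Normal zone line. The same figure follows from "free:42392" pages at 4 kB per page, the page size implied by 98304 pages of RAM against present:393216kB.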

(Ps: "l"-view the stack backtrace of all active CPUs)

sysrq: SysRq : Show backtrace of all active CPUs
NMI backtrace for cpu 1
CPU: 1 PID: 0 Comm: swapper/1 Tainted: P           O    4.9.37 #396
Hardware name: Generic DT based system
[<c010f8b8>] (unwind_backtrace) from [<c010b4e0>] (show_stack+0x10/0x14)
[<c010b4e0>] (show_stack) from [<c03807bc>] (dump_stack+0x84/0x98)
[<c03807bc>] (dump_stack) from [<c0383f54>] (nmi_cpu_backtrace+0xc0/0xc4)
[<c0383f54>] (nmi_cpu_backtrace) from [<c038403c>] (nmi_trigger_cpumask_backtrace+0xe4/0x12c)
[<c038403c>] (nmi_trigger_cpumask_backtrace) from [<c03fca14>] (__handle_sysrq+0x120/0x174)
[<c03fca14>] (__handle_sysrq) from [<c040fd60>] (pl011_fifo_to_tty+0x19c/0x1f4)
[<c040fd60>] (pl011_fifo_to_tty) from [<c04106e4>] (pl011_int+0x254/0x44c)
[<c04106e4>] (pl011_int) from [<c015f7fc>] (__handle_irq_event_percpu+0x9c/0x124)
[<c015f7fc>] (__handle_irq_event_percpu) from [<c015f8a0>] (handle_irq_event_percpu+0x1c/0x58)
[<c015f8a0>] (handle_irq_event_percpu) from [<c015f920>] (handle_irq_event+0x44/0x68)
[<c015f920>] (handle_irq_event) from [<c0162d6c>] (handle_fasteoi_irq+0xb4/0x194)
[<c0162d6c>] (handle_fasteoi_irq) from [<c015eb28>] (generic_handle_irq+0x24/0x34)
[<c015eb28>] (generic_handle_irq) from [<c015f04c>] (__handle_domain_irq+0x5c/0xb4)
[<c015f04c>] (__handle_domain_irq) from [<c0101438>] (gic_handle_irq+0x48/0x8c)
[<c0101438>] (gic_handle_irq) from [<c010bfcc>] (__irq_svc+0x6c/0x90)
Exception stack(0xd6877f98 to 0xd6877fe0)
7f80:                                                       00000000 023343d6
7fa0: d6cd62e8 c0115200 d6876000 c0a02fe4 00000002 c0a0304c c0a0d44a 410fd034
7fc0: 00000000 00000000 00000000 d6877fe8 c0108680 c0108684 60000013 ffffffff
[<c010bfcc>] (__irq_svc) from [<c0108684>] (arch_cpu_idle+0x38/0x3c)
[<c0108684>] (arch_cpu_idle) from [<c01528dc>] (cpu_startup_entry+0xc8/0x13c)
[<c01528dc>] (cpu_startup_entry) from [<22101528>] (0x22101528)
Sending NMI from CPU 1 to CPUs 0:

(Ps: "p"-print out CPU register information)

sysrq: SysRq : Show Regs
CPU: 1 PID: 0 Comm: swapper/1 Tainted: P           O    4.9.37 #396
Hardware name: Generic DT based system
task: d6853440 task.stack: d6876000
PC is at arch_cpu_idle+0x38/0x3c
LR is at arch_cpu_idle+0x34/0x3c
pc : [<c0108684>]    lr : [<c0108680>]    psr: 60000013
sp : d6877fe8  ip : 00000000  fp : 00000000
r10: 00000000  r9 : 410fd034  r8 : c0a0d44a
r7 : c0a0304c  r6 : 00000002  r5 : c0a02fe4  r4 : d6876000
r3 : c0115200  r2 : d6cd62e8  r1 : 023348a2  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5383d  Table: 230fc06a  DAC: 00000051
CPU: 1 PID: 0 Comm: swapper/1 Tainted: P           O    4.9.37 #396
Hardware name: Generic DT based system
[<c010f8b8>] (unwind_backtrace) from [<c010b4e0>] (show_stack+0x10/0x14)
[<c010b4e0>] (show_stack) from [<c03807bc>] (dump_stack+0x84/0x98)
[<c03807bc>] (dump_stack) from [<c03fca14>] (__handle_sysrq+0x120/0x174)
[<c03fca14>] (__handle_sysrq) from [<c040fd60>] (pl011_fifo_to_tty+0x19c/0x1f4)
[<c040fd60>] (pl011_fifo_to_tty) from [<c04106e4>] (pl011_int+0x254/0x44c)
[<c04106e4>] (pl011_int) from [<c015f7fc>] (__handle_irq_event_percpu+0x9c/0x124)
[<c015f7fc>] (__handle_irq_event_percpu) from [<c015f8a0>] (handle_irq_event_percpu+0x1c/0x58)
[<c015f8a0>] (handle_irq_event_percpu) from [<c015f920>] (handle_irq_event+0x44/0x68)
[<c015f920>] (handle_irq_event) from [<c0162d6c>] (handle_fasteoi_irq+0xb4/0x194)
[<c0162d6c>] (handle_fasteoi_irq) from [<c015eb28>] (generic_handle_irq+0x24/0x34)
[<c015eb28>] (generic_handle_irq) from [<c015f04c>] (__handle_domain_irq+0x5c/0xb4)
[<c015f04c>] (__handle_domain_irq) from [<c0101438>] (gic_handle_irq+0x48/0x8c)
[<c0101438>] (gic_handle_irq) from [<c010bfcc>] (__irq_svc+0x6c/0x90)
Exception stack(0xd6877f98 to 0xd6877fe0)
7f80:                                                       00000000 023348a2
7fa0: d6cd62e8 c0115200 d6876000 c0a02fe4 00000002 c0a0304c c0a0d44a 410fd034
7fc0: 00000000 00000000 00000000 d6877fe8 c0108680 c0108684 60000013 ffffffff
[<c010bfcc>] (__irq_svc) from [<c0108684>] (arch_cpu_idle+0x38/0x3c)
[<c0108684>] (arch_cpu_idle) from [<c01528dc>] (cpu_startup_entry+0xc8/0x13c)
[<c01528dc>] (cpu_startup_entry) from [<22101528>] (0x22101528)

 

 

 

 

 

 

Origin blog.csdn.net/Ivan804638781/article/details/97921188