Analysis of an infinite loop in a user-mode process

1. Problem phenomenon

The business process (a multi-threaded user-mode program) hangs, the system stops responding, and the system log shows no anomalies. Judging from the process's kernel-mode stacks, all of its threads appear to be stuck in the following kernel-mode flow:
[root@vmc116 ~]# cat /proc/27007/task/11825/stack
[] retint_careful+0x14/0x32
[<ffffffffffffffff>] 0xffffffffffffffff

2. Problem analysis

2.1. Kernel stack analysis

From the kernel stacks, all threads are blocked at retint_careful, which lies on the interrupt-return path. The relevant (assembly) code is as follows:
entry_64.S:

ret_from_intr:
    DISABLE_INTERRUPTS(CLBR_NONE)
    TRACE_IRQS_OFF
    decl PER_CPU_VAR(irq_count)

    /* Restore saved previous stack */
    popq %rsi
    CFI_DEF_CFA rsi,SS+8-RBP    /* reg/off reset after def_cfa_expr */
    leaq ARGOFFSET-RBP(%rsi), %rsp
    CFI_DEF_CFA_REGISTER    rsp
    CFI_ADJUST_CFA_OFFSET    RBP-ARGOFFSET
...
retint_careful:    /* interrupt return */
    CFI_RESTORE_STATE
    bt    $TIF_NEED_RESCHED,%edx
    jnc   retint_signal
    TRACE_IRQS_ON
    ENABLE_INTERRUPTS(CLBR_NONE)
    pushq_cfi %rdi
    SCHEDULE_USER    /* scheduling point */
    popq_cfi %rdi
    GET_THREAD_INFO(%rcx)
    DISABLE_INTERRUPTS(CLBR_NONE)
    TRACE_IRQS_OFF
    jmp retint_check

This is the flow of returning from an interrupt that fired while the process was running in user mode. Combining the offset retint_careful+0x14/0x32 with the disassembly confirms that the blocking point is actually SCHEDULE_USER, i.e. a call to schedule(): on its way back from the interrupt, the kernel finds TIF_NEED_RESCHED set on the thread and reschedules right here.
One question: why is there no stack frame for schedule() visible in the stack?
Because schedule() is invoked directly from assembly, without the usual stack-frame push and context-saving operations, no frame for it shows up when the stack is unwound.

2.2. Run-state analysis

The output of top shows that the relevant threads are in fact in the R state, the CPUs are almost completely saturated, and most of the time is spent in user mode:

[root@vmc116 ~]# top
top - 09:42:23 up 16 days,  2:21, 23 users,  load average: 84.08, 84.30, 83.62
Tasks: 1037 total,  85 running, 952 sleeping,   0 stopped,   0 zombie
Cpu(s): 97.6%us,  2.2%sy,  0.2%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32878852k total, 32315464k used,   563388k free,   374152k buffers
Swap: 35110904k total,    38644k used, 35072260k free, 28852536k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                     
27074 root      20   0 5316m 163m  14m R 10.2  0.5 321:06.17 z_itask_templat                                                                                                                
27084 root      20   0 5316m 163m  14m R 10.2  0.5 296:23.37 z_itask_templat                                                                                                                   
27085 root      20   0 5316m 163m  14m R 10.2  0.5 337:57.26 z_itask_templat                                                                                                                   
27095 root      20   0 5316m 163m  14m R 10.2  0.5 327:31.93 z_itask_templat                                                                                                                   
27102 root      20   0 5316m 163m  14m R 10.2  0.5 306:49.44 z_itask_templat                                                                                                                   
27113 root      20   0 5316m 163m  14m R 10.2  0.5 310:47.41 z_itask_templat                                                                                                                   
25730 root      20   0 5316m 163m  14m R 10.2  0.5 283:03.37 z_itask_templat                                                                                                                   
30069 root      20   0 5316m 163m  14m R 10.2  0.5 283:49.67 z_itask_templat                                                                                                                   
13938 root      20   0 5316m 163m  14m R 10.2  0.5 261:24.46 z_itask_templat                                                                                                                   
16326 root      20   0 5316m 163m  14m R 10.2  0.5 150:24.53 z_itask_templat                                                                                                                   
 6795 root      20   0 5316m 163m  14m R 10.2  0.5 100:26.77 z_itask_templat                                                                                                                   
27063 root      20   0 5316m 163m  14m R  9.9  0.5 337:18.77 z_itask_templat                                                                                                                   
27065 root      20   0 5316m 163m  14m R  9.9  0.5 314:24.17 z_itask_templat                                                                                                                   
27068 root      20   0 5316m 163m  14m R  9.9  0.5 336:32.78 z_itask_templat                                                                                                                   
27069 root      20   0 5316m 163m  14m R  9.9  0.5 338:55.08 z_itask_templat                                                                                                                   
27072 root      20   0 5316m 163m  14m R  9.9  0.5 306:46.08 z_itask_templat                                                                                                                   
27075 root      20   0 5316m 163m  14m R  9.9  0.5 316:49.51 z_itask_templat                                                                                                                   
...

2.3. Process scheduling information

From the scheduling information of related threads:
[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15681811525768 129628804592612 3557465
[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15682016493013 129630684625241 3557509
[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15682843570331 129638127548315 3557686
[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15683323640217 129642447477861 3557793
[root@vmc116 ~]# cat /proc/27007/task/11825/schedstat
15683698477621 129645817640726 3557875
The scheduling statistics of the threads keep increasing, which means they are continually being scheduled and run. Combined with the fact that their state is always R, it is very likely that they are stuck in a user-mode infinite loop (or a non-sleeping, busy-waiting deadlock).

This raises another question: why does top show each thread at only about 10% CPU, rather than the 100% usually seen for a process in an infinite loop?
Because there are many threads and they all have the same priority, CFS distributes the time slices evenly among them, so no single thread monopolizes a CPU. The result is that the threads are scheduled in turn and together consume all of the CPU time.
Another question: why doesn't the kernel detect a softlockup in this situation?
Because the business process runs at a normal priority, it does not prevent the watchdog kernel thread (a highest-priority real-time thread) from being scheduled, so no softlockup is triggered.
Another question: why, every time we look at a thread's stack, is it blocked at retint_careful and never anywhere else?
Because interrupt return is where rescheduling happens (setting other cases aside), and sampling a thread's stack itself depends on scheduling: the cat command reading the stack can only run once the target thread has been scheduled off the CPU, and for these busy-looping threads that happens precisely on interrupt return. So whenever we sample the stack, the blocking point we see is retint_careful.

2.4. User-mode analysis

From the above analysis we suspected a user-mode infinite loop (a real deadlock can be ruled out here: if it were a deadlock, some thread would have to be blocked in kernel mode waiting on the lock). For comparison, the kernel stack of a thread blocked on a mutex looks like this:

Detaching from program: /sbc/wq/some_func/share_mem_mutex, process 29758
[root@localhost ~]# cat /proc/29758/stack
[<ffffffff810a4179>] futex_wait_queue_me+0xb9/0xf0
[<ffffffff810a53a8>] futex_wait+0x1f8/0x390
[<ffffffff810a6c11>] do_futex+0x121/0xb10
[<ffffffff810a767b>] sys_futex+0x7b/0x170
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

At this point the way to confirm it in user mode is clear:
deploy the debug information, attach gdb to the relevant process, examine the stacks, and analyze them together with the code logic.
It was finally confirmed that the problem was indeed an infinite loop in the user-mode process.

Origin: blog.csdn.net/wangquan1992/article/details/108508701