Java Production Troubleshooting: A Linux Kernel Bug Causes a JVM Deadlock and Apparently Hung Threads

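Java cannot fully escape the operating system. For one, the JVM itself is implemented in C/C++; for another, a Java process still runs on top of the operating system and the hardware. Some problems therefore originate at the OS level, and the incident below comes from a real production case.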

What happened:

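A JVM deadlock leaves threads unusable, and the application then spawns many more threads in a short burst. Adding threads does not help, because the lock they all need is stuck. Eventually file handles are exhausted, new TCP connections from clients can no longer be established, and the service stops accepting requests (a denial of service). From the outside it looks as if the threads have hung. Oddly, the process recovers as soon as jstack is run against it.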

How the problem escalates:

futex.c bug -> JVM deadlock -> more threads spawned -> thread limit reached -> no thread available for new requests -> denial of service

Root cause:

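In the Linux kernel, the switch statement in get_futex_key_refs() matched no case for private futexes, so no memory barrier was executed on that path. This allows a race in which the waking thread sees no waiters and skips the wake-up while the waiting thread goes to sleep, so a thread blocked on a user-space lock, such as one inside the JVM, is never woken and the lock appears stuck. Running jstack against the process usually brings it back, but you cannot keep doing that in production; the problem has to be fixed at its root.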

Fixes:

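Option 1: work around it higher up the stack by replacing the middleware library involved, for example httpclient (assuming that is what triggers the problem in your case).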

Option 2: fix it at the lower layer by patching the Linux kernel or upgrading to a newer, stable kernel version that includes the fix.

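A memory barrier (also called a memory fence or barrier instruction) is a kind of synchronization instruction. It acts as a synchronization point in the CPU's or compiler's memory accesses: all reads and writes before that point must complete before any access after it may begin. Most modern processors execute instructions out of order to improve performance, which is what makes memory barriers necessary.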

For more on memory barriers, see: User-space RCU: Memory-barrier menagerie https://lwn.net/Articles/573436/

First, look at kernel/futex.c in the linux-2.6.33.1 source tree.

Then look at the fix commit in Linus Torvalds's tree:

https://github.com/torvalds/linux/commit/76835b0ebf8a7fe85beb03c75121419a7dec52f0

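The fix clearly adds a default branch to this switch. Before the fix there was no default, and the resulting missing barrier is what led to the deadlock. With the fix applied, the function reads: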

/*
 * Take a reference to the resource addressed by a key.
 * Can be called while holding spinlocks.
 *
 */
static void get_futex_key_refs(union futex_key *key)
{
	if (!key->both.ptr)
		return;

	switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
	case FUT_OFF_INODE:
		ihold(key->shared.inode); /* implies MB (B) */
		break;
	case FUT_OFF_MMSHARED:
		futex_get_mm(key); /* implies MB (B) */
		break;
	default:
		smp_mb(); /* explicit MB (B) */
	}
}

Fixed in v3.18 (commit message):

futex: Ensure get_futex_key_refs() always implies a barrier

Commit b0c29f7 (futexes: Avoid taking the hb->lock if there's
nothing to wake up) changes the futex code to avoid taking a lock when
there are no waiters. This code has been subsequently fixed in commit
11d4616 (futex: revert back to the explicit waiter counting code).
Both the original commit and the fix-up rely on get_futex_key_refs() to
always imply a barrier.

However, for private futexes, none of the cases in the switch statement
of get_futex_key_refs() would be hit and the function completes without
a memory barrier as required before checking the "waiters" in
futex_wake() -> hb_waiters_pending(). The consequence is a race with a
thread waiting on a futex on another CPU, allowing the waker thread to
read "waiters == 0" while the waiter thread to have read "futex_val ==
locked" (in kernel).

Without this fix, the problem (user space deadlocks) can be seen with
Android bionic's mutex implementation on an arm64 multi-cluster system.

Signed-off-by: Catalin Marinas
Reported-by: Matteo Franchin
Fixes: b0c29f7 (futexes: Avoid taking the hb->lock if there's nothing to wake up)
Acked-by: Davidlohr Bueso
Tested-by: Mike Galbraith
Cc: Darren Hart
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Paul E. McKenney
Signed-off-by: Linus Torvalds