JVM jstack命令用法及相关知识小结

https://www.jianshu.com/p/99562e9391b5

至少学到了一个 print "%x" 100 的玩法

前言

很久没有写过与JVM相关的文章了。今天搬砖有点累，不太想啃源码，写一篇实用性比较强的吧。

在日常工作中，我们如果遇到JVM方面的问题，一般是采用各种现成的工具辅助定位解决，如VisualVM、JProfiler、Eclipse MAT、Arthas等。但是，我们也有必要了解JVM原生提供的那些命令行工具，本文就针对常用的jstack命令做个简单的总结，顺便聊一些与Java线程与并发相关的周边知识。今后有时间的话，也会将jmap、jhat、jstat等命令的用法总结出来。

jstack简介

jstack命令用来生成JVM中的线程快照（thread dump），其中包含有每个线程的方法调用栈以及其状态、锁信息等。其用法说明如下所示。

~ jstack -h
Usage:
    jstack [-l] <pid>
        (to connect to running process)
    jstack -F [-m] [-l] <pid>
        (to connect to a hung process)
    jstack [-m] [-l] <executable> <core>
        (to connect to a core file)
    jstack [-m] [-l] [server_id@]<remote server IP or hostname>
        (to connect to a remote debug server)

Options:
    -F  to force a thread dump. Use when jstack <pid> does not respond (process is hung)
    -m  to print both java and native frames (mixed mode)
    -l  long listing. Prints additional information about locks
    -h or -help to print this help message

说明一下三个参数的含义：

-F：如果正常执行jstack命令没有响应（比如进程hung住了），可以加上此参数强制执行thread dump。
-m：除了打印Java的方法调用栈之外，还会输出native方法的栈帧。
-l：打印与锁有关的附加信息。使用此参数会导致JVM停止时间变长，在生产环境需慎用。

jstack是在线程级别定位JVM问题的利器，但前提是得读懂thread dump，我们举例说明。

线程快照

下面是对一个正常运行的Spark Streaming作业执行jstack命令产生的线程快照的一部分。

~ jstack 18747
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.172-b11 mixed mode):

"SparkUI-1510250" #1510250 daemon prio=5 os_prio=0 tid=0x00007f9a6c00c800 nid=0x45cc waiting on condition [0x00007f9ce86e7000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000c0420db8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at org.spark_project.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
        at org.spark_project.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:563)
        at org.spark_project.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:48)
        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
        at java.lang.Thread.run(Thread.java:748)

"shuffle-server-6-7" #190 daemon prio=5 os_prio=0 tid=0x00007f9b44009000 nid=0x4d80 runnable [0x00007f9ce99f8000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
        - locked <0x00000000c58e6498> (a io.netty.channel.nio.SelectedSelectionKeySet)
        - locked <0x00000000c59c1528> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000c59c1450> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
        at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        at java.lang.Thread.run(Thread.java:748)

"SparkUI-84-acceptor-3@3b331d23-ServerConnector@9826a7d{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}" #84 daemon prio=3 os_prio=0 tid=0x00007f9d7decc800 nid=0x4500 waiting for monitor entry [0x00007f9d112c8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:234)
        - waiting to lock <0x00000000c045e868> (a java.lang.Object)
        at org.spark_project.jetty.server.ServerConnector.accept(ServerConnector.java:373)
        at org.spark_project.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:593)
        at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
        at java.lang.Thread.run(Thread.java:748)

"org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" #25 daemon prio=5 os_prio=0 tid=0x00007f9d7d8da000 nid=0x44c2 in Object.wait() [0x00007f9d19dc0000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
        - locked <0x00000000c0031c98> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
        at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3213)
        at java.lang.Thread.run(Thread.java:748)

"RecurringTimer - JobGenerator" #120 daemon prio=5 os_prio=0 tid=0x00007f9b04045000 nid=0x4cc3 waiting on condition [0x00007f9d10cd4000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.spark.util.SystemClock.waitTillTime(Clock.scala:63)
        at org.apache.spark.streaming.util.RecurringTimer.triggerActionForNextInterval(RecurringTimer.scala:93)
        at org.apache.spark.streaming.util.RecurringTimer.org$apache$spark$streaming$util$RecurringTimer$$loop(RecurringTimer.scala:106)
        at org.apache.spark.streaming.util.RecurringTimer$$anon$1.run(RecurringTimer.scala:29)

在每个线程的快照的第一行，包含有线程名、是否为守护线程、优先级、线程ID等信息，第二行则是线程状态，下面就是方法调用栈了。下图是Java线程状态转换的示意，老生常谈。

jstack线程快照中的状态与图示相同，只是没有NEW状态而已。我们逐一进行分析，在分析之前，先放出Java管程对象ObjectMonitor的简图。看官也可以通过我之前写的这篇文章来了解管程。

RUNNABLE

线程正在运行。如果在其调用栈中看到locked <地址>的提示，表示该方法正持有锁，即该线程位于Owner区内。

BLOCKED

线程处于阻塞状态，即正在等待锁被其他线程释放。在其调用栈的栈顶方法会看到waiting to lock <地址>的提示，表示该方法试图持有锁，线程正在Entry Set区等待。

WAITING

线程处于无限等待的状态。又分为两种情况：

on object monitor：线程已经获得锁，调用了不带超时参数的Object.wait()/Thread.join()方法，线程进入管程的Wait Set区。在其调用栈中会看到locked <地址>的提示。
parking：调用了LockSupport.park()方法，线程直接进入挂起状态（park是Unsafe提供的低级原语）。在其调用栈的栈顶方法会看到parking to wait for <地址>的提示。

TIMED_WAITING

线程处于有限等待的状态。它分为三种情况，除了与WAITING相同的on object monitor（获得锁并调用带超时的Object.wait()/Thread.join()方法）和parking（调用带超时的LockSupport.parkNanos()/parkUntil()方法）之外，还有一种sleep，即通过Thread.sleep()使线程睡眠。

通过分析线程快照的状态和调用栈，可以让我们快速地定位造成Java程序表现异常的症结，如死锁、热锁（很多线程竞争同一块临界区造成大量BLOCKED）、高CPU占用、I/O长时间阻塞（注意此时线程状态可能是RUNNABLE）等。下面举两个具体的例子。

用jstack诊断死锁

死锁（deadlock）是操作系统理论中的基础概念，即在并发环境下，一个或多个线程在等待资源，但该资源又被其他进程所占用的困局。死锁的四个必要条件是：

互斥（mutual exclusion）
不可抢占（no preemption）
持有并等待（hold and wait）
循环等待（circular wait）

下面我们用Java造一个死锁，并用jstack来诊断它。

public class DeadlockExample {
  private static final Object lock1 = new Object(); private static final Object lock2 = new Object(); public static void main(String[] args) throws Exception { new Thread(() -> { for (int i = 0; i < 100; i++) { synchronized (lock1) { System.out.println("thread1 synchronized lock1"); synchronized (lock2) { System.out.println("thread1 synchronized lock2"); } } } }, "thread1").start(); new Thread(() -> { for (int i = 0; i < 100; i++) { synchronized (lock2) { System.out.println("thread2 synchronized lock2"); synchronized (lock1) { System.out.println("thread2 synchronized lock1"); } } } }, "thread2").start(); } }

运行之，发现只输出了几句就停止了。

thread1 synchronized lock1
thread1 synchronized lock2
thread1 synchronized lock1
thread1 synchronized lock2
thread1 synchronized lock1 thread2 synchronized lock2

用jstack打印线程快照，节选部分如下。

"thread2" #20 prio=5 os_prio=31 tid=0x00007fad74020800 nid=0x6203 waiting for monitor entry [0x0000700006364000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at me.lmagics.DeadlockExample.lambda$main$1(DeadlockExample.java:28)
    - waiting to lock <0x00000007157d2a58> (a java.lang.Object)
    - locked <0x00000007157d2a68> (a java.lang.Object)
    at me.lmagics.DeadlockExample$$Lambda$2/501263526.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

"thread1" #19 prio=5 os_prio=31 tid=0x00007fad7401c000 nid=0x9903 waiting for monitor entry [0x0000700006261000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at me.lmagics.DeadlockExample.lambda$main$0(DeadlockExample.java:17)
    - waiting to lock <0x00000007157d2a68> (a java.lang.Object)
    - locked <0x00000007157d2a58> (a java.lang.Object)
    at me.lmagics.DeadlockExample$$Lambda$1/1394438858.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

Found one Java-level deadlock:
=============================
"thread2":
  waiting to lock monitor 0x00007fad65004168 (object 0x00000007157d2a58, a java.lang.Object),
  which is held by "thread1"
"thread1":
  waiting to lock monitor 0x00007fad650056b8 (object 0x00000007157d2a68, a java.lang.Object),
  which is held by "thread2"

Java stack information for the threads listed above:
===================================================
"thread2":
    at me.lmagics.DeadlockExample.lambda$main$1(DeadlockExample.java:28)
    - waiting to lock <0x00000007157d2a58> (a java.lang.Object)
    - locked <0x00000007157d2a68> (a java.lang.Object)
    at me.lmagics.DeadlockExample$$Lambda$2/501263526.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)
"thread1":
    at me.lmagics.DeadlockExample.lambda$main$0(DeadlockExample.java:17)
    - waiting to lock <0x00000007157d2a68> (a java.lang.Object)
    - locked <0x00000007157d2a58> (a java.lang.Object)
    at me.lmagics.DeadlockExample$$Lambda$1/1394438858.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.

可见，我们不仅能够发现两个线程都处于BLOCKED状态，并且jstack还直接给出了死锁的详细信息，方便我们修改代码消除死锁。

用jstack诊断高CPU占用

接下来造一个有死循环的程序，模拟异常的CPU占用。

public class InfiniteLoopExample {
  private static final Object lock = new Object(); static class InfiniteLoopRunnable implements Runnable { @Override public void run() { synchronized (lock) { long l = 0; while (true) { l++; } } } } public static void main(String[] args) throws Exception { new Thread(new InfiniteLoopRunnable(), "thread1").start(); new Thread(new InfiniteLoopRunnable(), "thread2").start(); } }

运行该程序，用jps命令找出该Java进程的PID，然后利用top -Hp <PID>命令找出吃CPU最多的那个线程。

使用jstack导出线程快照到文件中。由于线程ID是十六进制表示的，所以我们要将线程ID转换成十六进制再grep。

这下我们就可以定位到异常代码的位置，并进行修改了。

[转帖]JVM jstack命令用法及相关知识小结