1. The jstack command
- Deadlock: `Deadlock` (watch closely)
- Running: `Runnable`
- Waiting for a resource: `Waiting on condition` (watch closely)
- Waiting to acquire a monitor: `Waiting on monitor entry` (watch closely)
- Suspended: `Suspended`
- Waiting on an object: `Object.wait()` or `TIMED_WAITING`
- Blocked: `Blocked` (watch closely)
- Stopped: `Parked`
We will start by analyzing the first example, then go through what the different thread states mean and what to watch for, and finally add two more cases.
The states have the following meanings:
- Deadlock: deadlocked threads. This generally means that multiple threads each hold a resource the others need and wait on one another, so that none of them can proceed and the resources are never released.
- Runnable: the thread is executing. It holds resources and is handling a request; it may be executing SQL against the database, performing a file operation, or converting data types.
- Waiting on condition: the thread is waiting for a resource or for some condition to occur. The specific reason has to be worked out from the stack trace.
  - If the stack trace clearly points to application code, the thread is waiting for a resource. This is usually the case when a large number of resources are being read and access to them is guarded by a lock, so the thread enters a wait state until the resource can be read.
  - Alternatively, the thread may simply be waiting on other threads.
  - If you find a large number of threads in Waiting on condition and their stack traces show they are waiting on network reads and writes, this can be a sign of a network bottleneck: network congestion is preventing the threads from making progress. Two situations are possible:
    - the network is extremely busy, almost all bandwidth is consumed, and large amounts of data are still waiting to be read or written;
    - the network may even be idle, but routing or similar problems keep packets from arriving normally.
  - Another common case of Waiting on condition is a sleeping thread, which wakes up when its sleep time expires.
- Blocked: the thread is blocked. During execution the current thread has waited a long time for a resource it needs without obtaining it, and the container's thread manager marks it as blocked; you can think of it as a thread whose wait for a resource has timed out.
- Waiting for monitor entry and in Object.wait(): in Java, the Monitor is the primary means of achieving mutual exclusion and cooperation between threads. It can be seen as the lock of an object or of a Class, and every object has exactly one Monitor. As Figure 1 below shows, at any moment a Monitor can be owned by only one thread, the "Active Thread"; all other threads are "Waiting Threads", queued in one of two sets, the "Entry Set" and the "Wait Set". A thread waiting in the "Entry Set" shows the state "Waiting for monitor entry", while a thread waiting in the "Wait Set" shows "in Object.wait()" (see the sketch after Figure 1).
Figure 1: A Java Monitor
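To make the two sets concrete, here is a minimal sketch, with class and thread names of our own choosing, in which one thread waits inside the monitor, one owns it, and one blocks trying to enter it. Running `jstack` against this program shows "in Object.wait()" for the first and "waiting for monitor entry" for the last:

```java
public class MonitorDemo {
    private static final Object LOCK = new Object();

    public static void main(String[] args) throws InterruptedException {
        // Wait Set: acquires the monitor, then releases it inside wait();
        // jstack reports this thread as "in Object.wait()" (WAITING).
        new Thread(() -> {
            synchronized (LOCK) {
                try { LOCK.wait(); } catch (InterruptedException ignored) { }
            }
        }, "wait-set").start();
        Thread.sleep(100); // let it enter wait() and release the monitor

        // Active Thread: now owns the monitor and never lets go of it.
        new Thread(() -> {
            synchronized (LOCK) {
                try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) { }
            }
        }, "owner").start();
        Thread.sleep(100); // let it grab the monitor

        // Entry Set: blocks trying to enter the monitor;
        // jstack reports this thread as "waiting for monitor entry" (BLOCKED).
        new Thread(() -> {
            synchronized (LOCK) { /* never reached while "owner" holds the lock */ }
        }, "entry-set").start();
    }
}
```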
2) " Waiting for condition Condition ON " needs stack " Parking to the wait for <0x00000000acd84de8> ( A java.util.concurrent.SynchronousQueue $ TransferStack ) binds" look. First of all, this thread is definitely waiting for the occurrence of certain conditions, to wake up to themselves. Secondly, SynchronousQueue not a queue, but the mechanism of transfer of information between threads, when we put an element into the SynchronousQueue must have another thread is waiting to accept the transfer of tasks, so it is waiting for the conditions in this thread.
Comprehensive example 3: in Object.wait() and TIMED_WAITING
- Multiple threads are using more than 100% CPU, and with the `jstack` command we can see that these threads are mainly garbage-collection threads.
- Monitoring the GC situation with the `jstat` command shows that the number of Full GCs is very high and keeps growing.
Take `nid=0xa` as an example: here `nid` is the id of the corresponding operating-system thread.
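Since OS-level tools such as top report thread ids in decimal while jstack prints `nid` in hexadecimal, matching the two requires a base conversion; a trivial sketch (the class name and example id are ours):

```java
public class NidHex {
    public static void main(String[] args) {
        int osThreadId = 10; // decimal thread id as reported by the OS
        // jstack shows the same thread as nid=0xa
        System.out.println("nid=0x" + Integer.toHexString(osThreadId));
    }
}
```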
```
root@8d36124607a0:/# jstat -gcutil 9 1000 10
  S0     S1     E      O      M     CCS    YGC    YGCT    FGC    FGCT     GCT
  0.00   0.00   0.00  75.07  59.09  59.60   3259   0.919   6517   7.715    8.635
  0.00   0.00   0.00   0.08  59.09  59.60   3306   0.930   6611   7.822    8.752
  0.00   0.00   0.00   0.08  59.09  59.60   3351   0.943   6701   7.924    8.867
  0.00   0.00   0.00   0.08  59.09  59.60   3397   0.955   6793   8.029    8.984
```
After analysis with the MAT tool, we can basically determine which objects are the main memory consumers, then find where those objects are created and deal with them. Here the biggest consumer is `PrintStream`, but its memory consumption is only 12.2%. In other words, it is not enough to cause this many Full GCs, so we have to consider the other possibility: our code, or a third-party dependency, explicitly calling `System.gc()`. In that case the GC log settles the question, because it prints the cause of each GC:
```
[Full GC (System.gc()) [Tenured: 262546K->262546K(349568K), 0.0014879 secs] 262546K->262546K(506816K), [Metaspace: 3109K->3109K(1056768K)], 0.0015151 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
[GC (Allocation Failure) [DefNew: 2795K->0K(157248K), 0.0001504 secs][Tenured: 262546K->402K(349568K), 0.0012949 secs] 265342K->402K(506816K), [Metaspace: 3109K->3109K(1056768K)], 0.0014699 secs] [Times: user=0.00
```
For example, here the first Full GC was caused by an explicit `System.gc()` call, while the second GC was initiated by the JVM itself (an allocation failure). In summary, when Full GC happens too often, there are mainly two causes:
- The code obtains a large number of objects at once, exhausting memory; in this case you can use the Eclipse MAT tool to see which objects occupy the most memory;
- Memory consumption is not high, but Full GC still happens frequently; in this case explicit `System.gc()` calls are probably producing the excessive GCs, and adding `-XX:+DisableExplicitGC` makes the JVM ignore explicit GC requests, as the sketch after this list shows.
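A minimal sketch of the second cause (the loop is our own invention): with default flags each iteration normally triggers a Full GC tagged `(System.gc())` in the GC log, while running the same program with `-XX:+DisableExplicitGC` turns the calls into no-ops:

```java
public class ExplicitGcDemo {
    public static void main(String[] args) throws InterruptedException {
        // Run with -verbose:gc to see log lines like the
        // "Full GC (System.gc())" entry above, one per iteration.
        // Run with -XX:+DisableExplicitGC and these calls do nothing.
        for (int i = 0; i < 10; i++) {
            System.gc();
            Thread.sleep(1000);
        }
    }
}
```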
A typical example of this next case is an interface that intermittently takes 2-3s to return. This is a rather troublesome situation because, in general, it does not consume much CPU and its memory footprint is not high either, which means we cannot track the problem down with the two methods above. Moreover, since the slow responses only appear from time to time, even after we capture the thread stacks with the `jstack` command we cannot tell which thread is the one executing the time-consuming operation.
For this kind of intermittent but seriously slow interface, our approach to locating the problem is basically as follows. First find the interface, then use a load-testing tool to keep increasing the pressure on it. If some point inside the interface is time-consuming, then because the request rate is very high, most threads will eventually block at that point; once many threads share the same stack trace, we can basically pinpoint the slow code in the interface. Below is the thread-stack log, obtained under load testing, of code containing a time-consuming blocking operation:
"http-nio-8080-exec-2" #29 daemon prio=5 os_prio=31 tid=0x00007fd08cb26000 nid=0x9603 waiting on condition [0x00007000031d5000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at java.lang.Thread.sleep(Thread.java:340) at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:386) at com.aibaobei.user.controller.UserController.detail(UserController.java:18) "http-nio-8080-exec-3" #30 daemon prio=5 os_prio=31 tid=0x00007fd08cb27000 nid=0x6203 waiting on condition [0x00007000032d8000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at java.lang.Thread.sleep(Thread.java:340) at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:386) at com.aibaobei.user.controller.UserController.detail(UserController.java:18) "http-nio-8080-exec-4" #31 daemon prio=5 os_prio=31 tid=0x00007fd08d0fa000 nid=0x6403 waiting on condition [0x00007000033db000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at java.lang.Thread.sleep(Thread.java:340) at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:386) at com.aibaobei.user.controller.UserController.detail(UserController.java:18)
From the log above we can see that multiple threads are all blocked at line 18 of `UserController`, which tells us that this is the blocking point and thus the reason the interface is slow.
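For illustration, here is a hypothetical reconstruction of such a controller. The `http-nio-8080-exec-*` thread names suggest an embedded Tomcat, e.g. a Spring Boot application, and the stack traces show `detail` sitting in `TimeUnit.sleep`, which here stands in for whatever slow operation the real code performs (the route and the sleep duration are assumptions):

```java
import java.util.concurrent.TimeUnit;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class UserController {

    @GetMapping("/user/detail")
    public String detail() throws InterruptedException {
        // The blocking point every load-test thread piles up on
        // (UserController.java:18 in the stack traces above).
        TimeUnit.SECONDS.sleep(2);
        return "detail";
    }
}
```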
The next situation is fairly rare, but it can happen, and because it has a certain "non-reproducibility" it is very hard to discover during troubleshooting. I once ran into a case like this. The scenario: we were using a `CountDownLatch`, which only wakes the main thread once every parallel task has completed. We used the `CountDownLatch` to control multiple threads, each of which connected to a user's gmail mailbox and exported its data. One of the threads connected to the mailbox, but the connection was hung by the server, so that thread kept waiting for the server's response. In the end our main thread and the remaining threads were all left in the `WAITING` state.
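Below is a minimal sketch of that pattern (the names, counts, and simulated hang are our own): one worker never reaches `countDown()`, so `await()` keeps the main thread in `WAITING` forever:

```java
import java.util.concurrent.CountDownLatch;

public class ExportDemo {
    public static void main(String[] args) throws InterruptedException {
        int workers = 3;
        CountDownLatch latch = new CountDownLatch(workers);

        for (int i = 0; i < workers; i++) {
            final int id = i;
            new Thread(() -> {
                try {
                    // Imagine connecting to a mailbox and exporting data here.
                    // If the server hangs this connection, countDown() is
                    // never reached and the latch never opens.
                    if (id == 0) {
                        Thread.sleep(Long.MAX_VALUE); // the hung connection
                    }
                    latch.countDown();
                } catch (InterruptedException ignored) { }
            }, "export-" + i).start();
        }

        latch.await(); // main thread stays WAITING forever because of export-0
        System.out.println("all exports finished"); // never printed
    }
}
```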
Readers who have looked at jstack logs will know the difficulty with such a problem: under normal circumstances, most threads in a production JVM are in the `TIMED_WAITING` state, and the problematic threads here are in exactly the same state, which makes it very easy to confuse our judgment. The way to untangle this is mainly as follows:
- Use grep on the jstack log to find all threads in the `TIMED_WAITING` state and export them to a file, say a1.log. Here is an example of such an exported file:
"Attach Listener" #13 daemon prio=9 os_prio=31 tid=0x00007fe690064000 nid=0xd07 waiting on condition [0x0000000000000000] "DestroyJavaVM" #12 prio=5 os_prio=31 tid=0x00007fe690066000 nid=0x2603 waiting on condition [0x0000000000000000] "Thread-0" #11 prio=5 os_prio=31 tid=0x00007fe690065000 nid=0x5a03 waiting on condition [0x0000700003ad4000] "C1 CompilerThread3" #9 daemon prio=9 os_prio=31 tid=0x00007fe68c00a000 nid=0xa903 waiting on condition [0x0000000000000000]
- Wait a while, say 10s, then grep the jstack log again and export the result to another file, say a2.log, which looks like this:
"DestroyJavaVM" #12 prio=5 os_prio=31 tid=0x00007fe690066000 nid=0x2603 waiting on condition [0x0000000000000000] "Thread-0" #11 prio=5 os_prio=31 tid=0x00007fe690065000 nid=0x5a03 waiting on condition [0x0000700003ad4000] "VM Periodic Task Thread" os_prio=31 tid=0x00007fe68d114000 nid=0xa803 waiting on condition
- Repeat step 2 until you have exported 3 or 4 files, then compare them and pick out the user threads that are present in all of them. These are almost certainly the problematic threads stuck in a waiting state, because a normal request thread would not still be waiting after 20-30s.
- Having narrowed it down to these threads, we can examine their stack traces further. If a thread is supposed to be waiting, for example an idle thread in a user-created thread pool, its stack trace will contain no user-defined classes, so such threads can be excluded. The remaining threads can basically be confirmed as the problematic ones we are looking for, and from their stack traces we can work out exactly where in the code they are left waiting.
Note that when deciding whether a thread is a user thread, the thread name at the front of the entry helps: framework threads generally follow very regular naming conventions, so we can tell directly from the name that a thread belongs to some framework, and such threads can basically be ruled out. The remainder, such as `Thread-0` above, plus any custom thread names we recognize, are the ones we need to investigate.
After troubleshooting in this way, we can basically conclude that `Thread-0` here is the thread we are looking for, and by examining its stack trace we can find exactly where it is kept waiting. In the example below, it is line 8 of SyncTask that puts the thread into the waiting state.
"Thread-0" #11 prio=5 os_prio=31 tid=0x00007f9de08c7000 nid=0x5603 waiting on condition [0x0000700001f89000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) at com.aibaobei.chapter2.eg4.SyncTask.lambda$main$0(SyncTask.java:8) at com.aibaobei.chapter2.eg4.SyncTask$$Lambda$1/1791741888.run(Unknown Source) at java.lang.Thread.run(Thread.java:748)
5. Deadlock
Deadlocks are basically easy to spot, because `jstack` detects them for us and prints the details of the deadlocked threads in its log. Here is an example `jstack` log from a program that deadlocked.
At the bottom of the jstack log, jstack has directly analyzed the deadlock for us: the log states that a deadlock exists and gives the stack of each deadlocked thread. Here two user threads are each waiting for the other to release a lock, and the blocked position is line 5 of ConnectTask; with that we can go straight to that spot in the code, analyze it, and find the cause of the deadlock.
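To close, here is a minimal sketch of the lock-ordering bug behind such a log (the `ConnectTask` name comes from the log; the body is our reconstruction): two threads take the same two locks in opposite order, and each ends up holding one lock while waiting for the other:

```java
public class ConnectTask {
    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    public static void main(String[] args) {
        // Opposite acquisition order across the two threads: a classic deadlock.
        new Thread(() -> lock(LOCK_A, LOCK_B), "Thread-0").start();
        new Thread(() -> lock(LOCK_B, LOCK_A), "Thread-1").start();
    }

    private static void lock(Object first, Object second) {
        synchronized (first) {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            synchronized (second) { // both threads block here, each holding "first"
                System.out.println("never reached");
            }
        }
    }
}
```

Running `jstack` against a program like this produces a deadlock report of the kind discussed above, naming both threads and the locks they are waiting on.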