jstack - Detect deadlocks, waits, CPU time

jstack

jstack is used to detect deadlocks

Tracing Java processes with jstack

Full analysis of virtual machine stack

Locating an infinite loop with jstack, the stack trace tool of the Java virtual machine

 

 

 

Dump log analysis tool

"IBM Thread and Monitor Dump Analyzer for Java" download link: https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=2245aa39-fa5c-4475-b891-14c205f7333c

 

jstack generates a thread snapshot of a Java virtual machine at the current moment. A thread snapshot is the collection of method stacks being executed by each thread in that JVM. The main purpose of a thread snapshot is to locate the cause of a long thread pause, such as a deadlock between threads, an infinite loop, or a long wait caused by a request to an external resource.
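To make the idea of a thread snapshot concrete, here is a minimal sketch that builds a similar listing from inside a running JVM with the standard `Thread.getAllStackTraces()` call. The class name `ThreadSnapshot` and the output layout are assumptions made for illustration; this is not how jstack itself is implemented, and it shows less detail than a real dump (no lock information, no native thread IDs).

```java
import java.util.Map;

// Hypothetical helper class: prints a jstack-like snapshot of the current JVM.
class ThreadSnapshot {

    // Builds one text block per live thread: a header line followed by its stack frames.
    static String snapshot() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            out.append('"').append(t.getName()).append('"')
               .append(" daemon=").append(t.isDaemon())
               .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Unlike jstack, this only works when the code can still run inside the target process, so it complements the external tool rather than replacing it.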

When a thread is paused, jstack shows the call stack of every thread, so you can see what an unresponsive thread is doing in the background or which resource it is waiting for. If a Java program crashes and produces a core file, jstack can extract the Java stack and native stack information from that core file, which makes it easier to see how the program crashed and where the problem occurred. jstack can also attach to a running Java program and show its Java and native stacks at that moment. It is especially useful when the running program appears to be hung.

   -F: Force a thread dump when the normal request gets no response

   -l: In addition to the stacks, show additional information about locks

   -m: When native methods are called, also show the C/C++ stack

Command format: jstack [option] vmid

 

 

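Syntax of the jstack command: jstack <pid>. Use jps to look up the Java process ID. Two points to note:

1. Different Java virtual machines create thread dumps in different ways and in different file formats, and the dump content also differs between JVM versions.

2. In practice, a single dump is often not enough to confirm a problem. Take three dumps; only when every dump points to the same problem can the problem be considered typical.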
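The "take several dumps and compare them" advice can be sketched in code. The example below is an in-process approximation that uses the standard `ThreadMXBean.dumpAllThreads` call instead of running jstack three times. The class name `RepeatedDumps`, the method names, and the one-second interval are assumptions chosen for illustration. A thread that sits at the same top frame in every sample is only a candidate for inspection: idle pool threads and the sampling thread itself will show up as well.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: samples the current JVM several times and reports
// the threads that sit at the same position in every sample.
class RepeatedDumps {

    // Describes where a thread currently is: its state plus its top stack frame.
    static String position(ThreadInfo info) {
        StackTraceElement[] stack = info.getStackTrace();
        String top = stack.length > 0 ? stack[0].toString() : "<no frames>";
        return info.getThreadState() + " at " + top;
    }

    // Takes `samples` dumps, `intervalMillis` apart, and returns
    // "name#id" -> position for threads whose position never changed.
    static Map<String, String> unchangedThreads(int samples, long intervalMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, String> unchanged = null;
        Map<Long, String> names = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            Map<Long, String> current = new HashMap<>();
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                current.put(info.getThreadId(), position(info));
                names.put(info.getThreadId(), info.getThreadName());
            }
            if (unchanged == null) {
                unchanged = current;                       // first sample: everything is a candidate
            } else {
                // Drop threads that moved, changed state, or disappeared.
                unchanged.entrySet().removeIf(e -> !e.getValue().equals(current.get(e.getKey())));
            }
            if (i < samples - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();    // stop sampling early
                    break;
                }
            }
        }
        Map<String, String> report = new TreeMap<>();
        if (unchanged != null) {
            unchanged.forEach((id, pos) -> report.put(names.get(id) + "#" + id, pos));
        }
        return report;
    }

    public static void main(String[] args) {
        unchangedThreads(3, 1000).forEach((thread, pos) -> System.out.println(thread + "  " + pos));
    }
}
```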

 

 

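I. Thread states in the jstack dump log file

 

1. Thread states worth watching in a dump file

Deadlock (pay close attention)

Runnable: executing

Waiting on condition: waiting for a resource (pay close attention)

Waiting for monitor entry: waiting to acquire a monitor (pay close attention)

Suspended: paused

Object.wait() or TIMED_WAITING: waiting on an object

Blocked (pay close attention)

Parked: stopped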

 

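2. Meaning of the thread states in a dump file, and points to note

 

Deadlock: deadlocked threads. This generally means several threads hold resources that the others need, so they all wait forever and nothing is released.

 

Runnable: the thread is generally executing. It holds resources and is processing a request: it may be sending SQL to the database, operating on a file, or converting data types, for example.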

 

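Waiting on condition: the thread is waiting for some condition to occur. The specific cause can be worked out from the stack trace. The most common case is a thread waiting for network reads or writes: when no network data is ready to read, the thread stays in this waiting state, and once data is ready it is reactivated to read and process it. Before Java introduced New IO (NIO), every network connection had a dedicated thread handling its reads and writes. Even with nothing to read or write, that thread stayed blocked in the read or write operation, which could waste resources and put pressure on the operating system's thread scheduler. NIO adopted a new mechanism that improves the performance and scalability of server programs written with it.

 

If a large number of threads are in Waiting on condition and their stacks show that they are waiting for network reads or writes, this may be a sign of a network bottleneck: the threads cannot proceed because the network is blocked. One possibility is that the network is very busy and almost all bandwidth is consumed while a large amount of data still waits to be read or written. Another is that the network is idle but packets cannot arrive normally because of routing or similar problems. So combine the dump with system performance tools. For example, use netstat to count the packets sent per unit of time and check whether the count clearly exceeds the bandwidth limit of the network; check CPU utilization and see whether system-mode CPU time is high relative to user-mode CPU time; if the program runs on Solaris 10, use dtrace to inspect system calls and see whether the number or running time of read/write system calls is far ahead of the rest. All of these point to a network bottleneck caused by limited bandwidth. Another common cause of Waiting on condition is a thread that is in sleep; it is woken up when the sleep time elapses.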

 

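Blocked: the thread is blocked. During execution the thread has waited a long time for a resource it needs without obtaining it, and the container's thread manager has marked it as blocked. It can be understood as a thread that timed out while waiting for a resource.

 

Waiting for monitor entry and in Object.wait(): the monitor is Java's main mechanism for mutual exclusion and cooperation between threads. It can be viewed as the lock of an object or class. Every object has one, and only one, monitor.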
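The states described above can be reproduced deliberately. The following minimal sketch starts four daemon threads and reads back their `java.lang.Thread.State` values, which are what jstack prints on the `java.lang.Thread.State:` line of each thread. The class name `ThreadStatesDemo` and the thread names are hypothetical. Under normal conditions the sleeping thread should report TIMED_WAITING, the thread in `Object.wait()` and the parked thread WAITING, and the thread waiting for monitor entry BLOCKED.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.LockSupport;

// Hypothetical demo: puts threads into the states discussed above and reads them back.
class ThreadStatesDemo {

    static Thread daemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true);              // never keeps the JVM alive
        t.start();
        return t;
    }

    // Polls (for up to about 5 s) until the thread reports the wanted state.
    static Thread.State awaitState(Thread t, Thread.State wanted) {
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (t.getState() != wanted && System.nanoTime() < deadline) {
            LockSupport.parkNanos(1_000_000L);
        }
        return t.getState();
    }

    static Map<String, Thread.State> observe() {
        Object held = new Object();
        Object waitedOn = new Object();
        Map<String, Thread.State> seen = new LinkedHashMap<>();

        // Thread.sleep: jstack shows "waiting on condition".
        Thread sleeper = daemon("sleeper", () -> {
            try { Thread.sleep(60_000); } catch (InterruptedException e) { /* exit */ }
        });
        // Object.wait(): jstack shows "in Object.wait()".
        Thread waiter = daemon("waiter", () -> {
            synchronized (waitedOn) {
                try { while (true) waitedOn.wait(); } catch (InterruptedException e) { /* exit */ }
            }
        });
        // LockSupport.park: jstack shows "waiting on condition" (parking).
        Thread parker = daemon("parker", () -> {
            while (!Thread.currentThread().isInterrupted()) LockSupport.park();
        });
        // Monitor held by this thread: jstack shows "waiting for monitor entry".
        synchronized (held) {
            Thread blocked = daemon("blocked", () -> { synchronized (held) { /* enter, leave */ } });
            seen.put("blocked", awaitState(blocked, Thread.State.BLOCKED));
        }
        seen.put("sleeper", awaitState(sleeper, Thread.State.TIMED_WAITING));
        seen.put("waiter", awaitState(waiter, Thread.State.WAITING));
        seen.put("parker", awaitState(parker, Thread.State.WAITING));

        for (Thread t : new Thread[] {sleeper, waiter, parker}) t.interrupt();  // clean up
        return seen;
    }

    public static void main(String[] args) {
        observe().forEach((name, state) -> System.out.println(name + " -> " + state));
    }
}
```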

 

 

II. Deadlock example:

public class DeadThread implements Runnable {

    private final Object monitor_A = new Object();

    private final Object monitor_B = new Object();

    public void method_A() {                  // locks monitor_A first, then monitor_B
        synchronized (monitor_A) {
            synchronized (monitor_B) {
                System.out.println(Thread.currentThread().getName() + " invoke method A");
            }
        }
    }

    public void method_B() {                  // locks monitor_B first, then monitor_A (opposite order)
        synchronized (monitor_B) {
            synchronized (monitor_A) {
                System.out.println(Thread.currentThread().getName() + " invoke method B");
            }
        }
    }

    public void run() {
        while (true) {                        // loop until the two threads deadlock
            method_A();
            method_B();
        }
    }

    public static void main(String[] args) {
        DeadThread t1 = new DeadThread();     // both threads share the same two monitors
        Thread ta = new Thread(t1, "A");
        Thread tb = new Thread(t1, "B");

        ta.start();
        tb.start();
    }
}

Result:

"B" prio=10 tid=0x0898d000 nid=0x269a waiting for monitor entry [0x8baa2000]  
   java.lang.Thread.State: BLOCKED (on object monitor)  
    at org.marshal.DeadThread.method_A(DeadThread.java:11)  
    - waiting to lock <0xaa4d6f88> (a java.lang.Object)  
    - locked <0xaa4d6f80> (a java.lang.Object)  
    at org.marshal.DeadThread.run(DeadThread.java:28)  
    at java.lang.Thread.run(Thread.java:636)  
  
"A" prio=10 tid=0x0898b800 nid=0x2699 waiting for monitor entry [0x8baf3000]  
   java.lang.Thread.State: BLOCKED (on object monitor)  
    at org.marshal.DeadThread.method_B(DeadThread.java:19)  
    - waiting to lock <0xaa4d6f80> (a java.lang.Object)  
    - locked <0xaa4d6f88> (a java.lang.Object)  
    at org.marshal.DeadThread.run(DeadThread.java:29)  
    at java.lang.Thread.run(Thread.java:636)  
	
Found one Java-level deadlock:  
=============================  
"B":  
  waiting to lock monitor 0x089615d8 (object 0xaa4d6f88, a java.lang.Object),  
  which is held by "A"  
"A":  
  waiting to lock monitor 0x08962258 (object 0xaa4d6f80, a java.lang.Object),  
  which is held by "B"  
  
Java stack information for the threads listed above:  
===================================================  
"B":  
    at org.marshal.DeadThread.method_A(DeadThread.java:11)  
    - waiting to lock <0xaa4d6f88> (a java.lang.Object)  
    - locked <0xaa4d6f80> (a java.lang.Object)  
    at org.marshal.DeadThread.run(DeadThread.java:28)  
    at java.lang.Thread.run(Thread.java:636)  
"A":  
    at org.marshal.DeadThread.method_B(DeadThread.java:19)  
    - waiting to lock <0xaa4d6f80> (a java.lang.Object)  
    - locked <0xaa4d6f88> (a java.lang.Object)  
    at org.marshal.DeadThread.run(DeadThread.java:29)  
    at java.lang.Thread.run(Thread.java:636)  
  
Found 1 deadlock.  

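The result shows at a glance that a deadlock was found: thread A is waiting for thread B while thread B is waiting for thread A. The dump also records each thread's stack and source line numbers, which let us inspect the corresponding code blocks to find and fix the problem.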
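A similar deadlock check is available from inside the JVM through the standard `ThreadMXBean.findDeadlockedThreads()` call. The sketch below is an illustration under assumptions: the class name `DeadlockProbe`, the report wording, and the latch that forces the two threads to collide are all made up for this example, and the output format is not the one jstack prints.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.LockSupport;

// Hypothetical demo: creates a two-thread monitor deadlock and detects it in-process.
class DeadlockProbe {

    // Starts daemon threads "A" and "B" that take the same two monitors in opposite order.
    static void startDeadlock() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch holdingFirst = new CountDownLatch(2);
        startDaemon("A", a, b, holdingFirst);
        startDaemon("B", b, a, holdingFirst);
    }

    static void startDaemon(String name, Object first, Object second, CountDownLatch holdingFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                holdingFirst.countDown();
                try {
                    holdingFirst.await();       // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {         // never acquired: the other thread holds it
                    System.out.println(name + " got both locks");
                }
            }
        }, name);
        t.setDaemon(true);                      // lets the JVM exit despite the deadlock
        t.start();
    }

    // Returns one line per deadlocked thread, or "" if no deadlock is found.
    static String report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) return "";
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            if (info == null) continue;
            out.append('"').append(info.getThreadName()).append("\" is waiting to lock ")
               .append(info.getLockName()).append(", held by \"")
               .append(info.getLockOwnerName()).append("\"\n");
        }
        return out.toString();
    }

    // Polls report() until it finds a deadlock or the timeout passes.
    static String awaitReport(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        String r = report();
        while (r.isEmpty() && System.nanoTime() < deadline) {
            LockSupport.parkNanos(10_000_000L);
            r = report();
        }
        return r;
    }

    public static void main(String[] args) {
        startDeadlock();
        System.out.print(awaitReport(5_000));
    }
}
```

A check like this could run from a monitoring thread, but it can only report a deadlock. Untangling it still means reading the stacks, as with the jstack output above.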

The same approach also reveals where blocking or slow access occurs, so optimization can start in exactly the right place.
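Once the stack trace has pointed at the offending code, one common remedy for a lock-order deadlock like the one in the DeadThread example is to make every code path take the monitors in the same order. The sketch below shows that idea; it is one possible fix, not the original author's. The class name `OrderedLocks`, the `calls` counter, and the bounded loop are assumptions added so that the program terminates and its result can be checked.

```java
// Hypothetical rewrite of the deadlock example: both methods lock A before B.
class OrderedLocks implements Runnable {

    static final int ROUNDS = 10_000;

    private final Object monitorA = new Object();
    private final Object monitorB = new Object();
    private int calls;                       // guarded by monitorB

    void methodA() {
        synchronized (monitorA) {
            synchronized (monitorB) {
                calls++;
            }
        }
    }

    void methodB() {
        synchronized (monitorA) {            // same order as methodA, so no cycle can form
            synchronized (monitorB) {
                calls++;
            }
        }
    }

    @Override
    public void run() {
        for (int i = 0; i < ROUNDS; i++) {
            methodA();
            methodB();
        }
    }

    // Runs two threads over one shared instance and returns the total number of calls.
    static int runTwoThreads() {
        OrderedLocks shared = new OrderedLocks();
        Thread ta = new Thread(shared, "A");
        Thread tb = new Thread(shared, "B");
        ta.start();
        tb.start();
        try {
            ta.join();
            tb.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the workers", e);
        }
        return shared.calls;                 // safe to read: join() orders it after the workers
    }

    public static void main(String[] args) {
        System.out.println("completed calls: " + runTwoThreads());
    }
}
```

With a consistent order, a thread that holds monitorB always already holds monitorA, so the circular wait shown in the dump cannot occur and both threads run to completion.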

 

 

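III. Find the Java thread that consumes the most CPU in a Java process and locate its stack, using the commands ps, top, printf, jstack, and grep

Finding the code location of high CPU usage by a Java program on Linux

Troubleshooting an infinite loop in a Java application, or finding the threads that consume the program's resources

 

Procedure:

Source: Detailed usage of the VM performance tuning and monitoring tools jps, jstack, jmap, jhat, jstat, and hprof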


 

Step one is to find the Java process ID. The Java application on the server is named mrf-center:

root@ubuntu:/# ps -ef | grep mrf-center | grep -v grep

root     21711     1  1 14:47 pts/3    00:02:10 java -jar mrf-center.jar

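The process ID is 21711. Step two is to find the thread inside that process that consumes the most CPU. Any of the following can be used:

1) ps -Lfp pid

2) ps -mp pid -o THREAD,tid,time

3) top -Hp pid

Using the third, the output is as follows: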

 

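The TIME column is the CPU time consumed by each Java thread. The thread with the most CPU time has thread ID 21742. Running

printf "%x\n" 21742

gives 54ee as the hexadecimal value of 21742, which is used below.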
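The decimal-to-hexadecimal step can also be done in Java, which is handy when the matching is scripted. The sketch below converts a decimal thread ID into the form that appears after `nid=` in the jstack output shown in this article, and parses it back out of a thread header line. The class name `ThreadIdHex` and its method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for matching OS thread IDs against jstack "nid" values.
class ThreadIdHex {

    private static final Pattern NID = Pattern.compile("nid=0x([0-9a-fA-F]+)");

    // Decimal thread ID (as shown by top -Hp) -> hexadecimal form such as "0x54ee".
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    // Extracts the decimal thread ID from a jstack thread header line, or -1 if absent.
    static long nidOf(String jstackHeaderLine) {
        Matcher m = NID.matcher(jstackHeaderLine);
        return m.find() ? Long.parseLong(m.group(1), 16) : -1;
    }

    public static void main(String[] args) {
        System.out.println(toNid(21742));   // prints 0x54ee
    }
}
```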

 

Next, it is finally jstack's turn. It prints the stack information of process 21711, and grep then filters the output by the hexadecimal thread ID:

root@ubuntu:/# jstack 21711 | grep 54ee

"PollIntervalRetrySchedulerThread" prio=10 tid=0x00007f950043e000 nid=0x54ee in Object.wait()

You can see that the CPU is consumed in Object.wait() of the PollIntervalRetrySchedulerThread class. I searched my code and located the following:

// Idle wait
getLog().info("Thread [" + getName() + "] is idle waiting...");
schedulerThreadState = PollTaskSchedulerThreadState.IdleWaiting;
long now = System.currentTimeMillis();
long waitTime = now + getIdleWaitTime();
long timeUntilContinue = waitTime - now;
synchronized(sigLock) {
  try {
    if(!halted.get()) {
      sigLock.wait(timeUntilContinue);
    }
  } 
  catch (InterruptedException ignore) {
  }
}

 

This is the idle-wait code of the polling task; the sigLock.wait(timeUntilContinue) call above corresponds to the Object.wait() seen earlier.
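As an in-process complement to the ps/top step used earlier, the standard `ThreadMXBean.getThreadCpuTime` call reports accumulated CPU time per thread, comparable to the TIME column. The sketch below lists threads by that value, highest first. The class name `CpuByThread` and the method `topThreads` are hypothetical, and the sketch assumes the JVM supports and has enabled thread CPU time measurement; otherwise it returns an empty list.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks the threads of the current JVM by accumulated CPU time.
class CpuByThread {

    // Returns up to `limit` entries of (thread name, CPU time in nanoseconds), highest first.
    static List<Map.Entry<String, Long>> topThreads(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Map.Entry<String, Long>> rows = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return rows;                                  // measurement not available
        }
        for (long id : mx.getAllThreadIds()) {
            long nanos = mx.getThreadCpuTime(id);         // -1 if the thread has already died
            ThreadInfo info = mx.getThreadInfo(id);       // null if the thread has already died
            if (nanos >= 0 && info != null) {
                rows.add(new AbstractMap.SimpleEntry<>(info.getThreadName(), nanos));
            }
        }
        rows.sort((x, y) -> Long.compare(y.getValue(), x.getValue()));
        return rows.size() > limit ? new ArrayList<>(rows.subList(0, limit)) : rows;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> row : topThreads(5)) {
            System.out.printf("%-40s %d ms%n", row.getKey(), row.getValue() / 1_000_000);
        }
    }
}
```

Accumulated CPU time says how much a thread has used so far, not what it is doing right now, so a thread currently in Object.wait() can still top the list.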

 

 

 

 

 
