Java Application Monitoring (4): Troubleshooting Routines for Production Problems

tags: java, troubleshooting, monitor


In one sentence: this article explains in detail how to troubleshoot common problems in production Java applications, such as high CPU usage, out-of-memory errors, and excessive disk IO.

1 Introduction

Once a Java application is running in production, problems are inevitable. Broadly, they fall into four categories:

  • (1) CPU-related issues
  • (2) memory-related issues
  • (3) disk IO-related issues
  • (4) business code issues

Knowing how to monitor and troubleshoot these problems in production is an essential skill for Java developers. Using the Java command-line tools introduced in the previous articles, the following sections describe a troubleshooting routine for each category.

2 CPU troubleshooting routine

When the system becomes sluggish or the application responds slowly, the first thing to check is CPU usage, because the usual culprit is a process that consumes too much CPU. In a Java application, CPU usage comes mainly from running threads, so you need to examine the threads and their states, which is what the command-line tool jstack is for. The routine for troubleshooting high CPU usage is as follows:

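# (1) Find the ID (PID) of the process with high CPU usage
top -c

# (2) Check the startup parameters of this process
ps -ef | grep PID
# or
jinfo -flags PID

# (3) Print the thread stacks to a file
jstack -l PID > PID.dump

# (4) Find the IDs (TID) of the busiest threads in the process
top -H -p PID

# (5) Convert the TID to hexadecimal (jstack prints it as nid=0x...)
printf "%x\n" TID

# (6) Look up the thread in the stack dump using the hex TID
# - open PID.dump in a text editor, or
# - run: grep -A 20 "nid=0x<hex TID>" PID.dump
# - interpret the stack together with the thread state (RUNNABLE, BLOCKED, WAITING, ...)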

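Steps (4) to (6) come down to matching a decimal TID from top -H against the nid field of the jstack output. The Java sketch below does that lookup on a saved dump file. It assumes the HotSpot thread-header format (a line that starts with the quoted thread name and contains a nid= token). JDK 8 prints nid in hexadecimal; since newer JDK releases may print it in decimal, the sketch accepts both forms. NidLocator is a hypothetical helper name, not part of any JDK tool, and the returned block stops at the first blank line, so the "Locked ownable synchronizers" section printed by jstack -l is not included.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Sketch: find the stack of the thread that top -H -p PID reported as busy.
final class NidLocator {

    // jstack on JDK 8 prints the native thread id in hex, e.g. TID 12345 -> "nid=0x3039".
    static String toNid(long tid) {
        return "nid=0x" + Long.toHexString(tid);
    }

    // Returns the thread block whose header carries the given TID, from the header
    // line up to the first blank line, or "" if no thread matches.
    static String findThread(List<String> dumpLines, long tid) {
        String hexNid = toNid(tid);
        String decimalNid = "nid=" + tid; // assumed decimal format of some newer JDKs
        StringBuilder block = new StringBuilder();
        boolean inBlock = false;
        for (String line : dumpLines) {
            if (!inBlock && line.startsWith("\"") && hasToken(line, hexNid, decimalNid)) {
                inBlock = true;
            }
            if (inBlock) {
                if (line.trim().isEmpty()) {
                    break; // a blank line ends the thread's stack
                }
                block.append(line).append('\n');
            }
        }
        return block.toString();
    }

    // Whole-token comparison so that "nid=0x3039" does not match "nid=0x30390".
    private static boolean hasToken(String line, String a, String b) {
        for (String token : line.split("\\s+")) {
            if (token.equals(a) || token.equals(b)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java NidLocator PID.dump TID   (TID in decimal, as shown by top -H)
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.print(findThread(lines, Long.parseLong(args[1])));
    }
}
```

Running it once per busy TID, on two or three dumps taken a few seconds apart, shows whether the thread stays in the same code.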

For more on jstack and thread states, see the previous article, "Java Application Monitoring (3): Have You Mastered These Command-Line Tools?"
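The same per-thread CPU view can also be sampled from inside the JVM through the standard ThreadMXBean API, which helps when you cannot run top or jstack on the host. The sketch below (HotThreads is a hypothetical name) reads each thread's CPU time twice and ranks threads by the CPU consumed in between. Keep in mind that ThreadMXBean reports Java thread IDs, which are not the OS-level TIDs shown by top -H or the nid values in a jstack dump, so use the thread name to connect the two views.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: rank live threads by CPU time consumed during a sampling interval.
final class HotThreads {

    static final class Sample {
        final long id;        // Java thread id (NOT the OS TID / jstack nid)
        final String name;
        final long cpuNanos;  // CPU time used during the interval

        Sample(long id, String name, long cpuNanos) {
            this.id = id;
            this.name = name;
            this.cpuNanos = cpuNanos;
        }
    }

    static List<Sample> sample(long intervalMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time is not supported");
        }
        mx.setThreadCpuTimeEnabled(true);

        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id)); // -1 if the thread is no longer alive
        }
        Thread.sleep(intervalMillis);

        List<Sample> samples = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            Long start = before.get(id);
            long end = mx.getThreadCpuTime(id);
            ThreadInfo info = mx.getThreadInfo(id);
            if (start == null || start < 0 || end < 0 || info == null) {
                continue; // thread started or ended during the interval
            }
            samples.add(new Sample(id, info.getThreadName(), end - start));
        }
        samples.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos)); // busiest first
        return samples;
    }

    public static void main(String[] args) throws InterruptedException {
        for (Sample s : sample(1000)) {
            System.out.printf("%-40s id=%-4d cpu=%.1f ms%n", s.name, s.id, s.cpuNanos / 1e6);
        }
    }
}
```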

3 Memory troubleshooting routine

Memory problems mostly show up as OOM (OutOfMemoryError) while the Java application is running. It is therefore advisable to add several parameters when starting the application: -Xloggc:file -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=logs/heapdump.hprof -XX:ErrorFile=logs/java_error_%p.log. With these set, when an OOM occurs the JVM writes a heap dump that you can analyze to find the cause (-Xloggc is the JDK 8 flag; JDK 9 and later use -Xlog:gc*:file=... instead). The Java command-line tools for memory problems are mainly jmap and jstat, and the routine for troubleshooting OOM is as follows:

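# (1) Find the PID of the Java application
jps -lvm
# or
top -c

# (2) Check the startup parameters of this process (especially -Xms, -Xmx, etc.)
ps -ef | grep PID
# or
jinfo -flags PID

# (3) Check heap configuration and usage (JDK 8; on JDK 9+ use: jhsdb jmap --heap --pid PID)
jmap -heap PID

# (4) Find the classes whose instances take the most memory (:live triggers a full GC first)
jmap -histo:live PID

# (5) Dump the heap to a file for analysis with a heap analysis tool
jmap -dump:format=b,file=./heap.hprof PID

# (6) Watch GC activity, printing once per second
jstat -gc PID 1000

# (7) Analyze the OOM and GC behavior using the error messages in the logs and the heap dump
# - memory allocation too small: increase the memory settings appropriately
# - objects created frequently and never released: optimize the code
# - young GC too frequent: check whether -Xmn, -XX:SurvivorRatio, etc. are set sensibly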
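Step (5) can also be performed from inside the application through the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, which can additionally report whether the OOM-related flags recommended above are actually in effect. A minimal sketch, assuming a HotSpot JVM; HeapDumper is a hypothetical name, the target file must not exist yet, and its name should end in .hprof, which recent JDKs require.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Sketch: take a heap dump and check the OOM-related flags from inside a HotSpot JVM.
final class HeapDumper {

    private static HotSpotDiagnosticMXBean diagnostics() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    // Current value of a HotSpot flag, e.g. "HeapDumpOnOutOfMemoryError" -> "true" or "false".
    static String flag(String name) {
        return diagnostics().getVMOption(name).getValue();
    }

    // Roughly what jmap -dump:live,format=b,file=... does when live is true:
    // only reachable objects are written. The file must not exist yet.
    static void dump(String path, boolean live) throws IOException {
        diagnostics().dumpHeap(path, live);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("HeapDumpOnOutOfMemoryError=" + flag("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath=" + flag("HeapDumpPath"));
        dump(args.length > 0 ? args[0] : "heap.hprof", true);
    }
}
```

Exposing dump() only behind an authenticated admin operation is advisable, since a heap dump pauses the application and may contain sensitive data.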

The official documentation (https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks002.html) describes OOM errors in detail and divides them into the following categories:

  • java.lang.OutOfMemoryError: Java heap space: the heap has reached the maximum set by -Xmx and a new object cannot be allocated. Often this can be fixed simply by increasing -Xmx; if the cause is a memory leak, however, a larger heap only delays the error.
  • java.lang.OutOfMemoryError: GC Overhead limit exceeded: the garbage collector is running almost all the time and the Java program is making very slow progress (roughly, more than 98% of the time goes to GC while less than 2% of the heap is recovered). This usually means the live data barely fits in the heap, leaving little free space for new allocations. Consider increasing the heap size; the check itself can be turned off with -XX:-UseGCOverheadLimit.
  • java.lang.OutOfMemoryError: Requested array size exceeds VM limit: the application tried to allocate an array larger than the heap, for example a 512 MB array when the maximum heap size is 256 MB. Consider increasing the heap size or fixing the code that requests the array.
  • java.lang.OutOfMemoryError: Metaspace: the native memory needed for class metadata exceeds MaxMetaspaceSize. Consider increasing -XX:MaxMetaspaceSize.
  • java.lang.OutOfMemoryError: request size bytes for reason. Out of swap space?: an allocation from the native heap failed and the native heap may be close to exhaustion. Check the fatal error log to find the cause.
  • java.lang.OutOfMemoryError: Compressed class space: on 64-bit platforms with compressed class pointers enabled (-XX:+UseCompressedClassPointers, on by default), the space for class metadata is fixed at CompressedClassSpaceSize, and this error means that space is exhausted. Consider increasing -XX:CompressedClassSpaceSize.
  • java.lang.OutOfMemoryError: reason stack_trace_with_native_method: a native method hit an allocation failure, that is, the failure was detected in Java Native Interface (JNI) or native code rather than in JVM code. Use the printed stack trace to find the native code involved.
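Several of these errors, GC Overhead limit exceeded in particular, are preceded by a period in which GC takes up a growing share of the running time. Besides jstat -gc, that trend can be sampled in-process with the standard GarbageCollectorMXBean and MemoryMXBean APIs. The sketch below (GcWatch is a hypothetical name) prints heap usage and the share of wall-clock time spent in GC once per second. The collection time is the JVM's approximate accumulated value summed over all collectors, so treat the percentage as a rough indicator rather than an exact figure.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Sketch: a tiny in-process counterpart of `jstat -gc PID 1000`.
final class GcWatch {

    // {total collection count, total collection time in ms} over all collectors.
    static long[] totals() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0, gc.getCollectionCount()); // -1 means "undefined"
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    // Share of an interval spent in GC, computed from two totals() snapshots.
    static double gcFraction(long[] before, long[] after, long intervalMillis) {
        return intervalMillis <= 0 ? 0.0 : (double) (after[1] - before[1]) / intervalMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        int rounds = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        long[] prev = totals();
        for (int i = 0; i < rounds; i++) {
            Thread.sleep(1000);
            long[] now = totals();
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.printf("heap used=%dM committed=%dM  gcs=+%d  gc time=%.1f%%%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20,
                    now[0] - prev[0], 100 * gcFraction(prev, now, 1000));
            prev = now;
        }
    }
}
```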

4 Disk IO troubleshooting routine

A running Java application writes logs and performs other disk reads and writes, which can cause a variety of problems: running out of disk space (for example, because of excessive log output), slow disk reads and writes, too-frequent IO, and so on. In general, these can be isolated with the following routine:

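# (1) Check disk capacity
df -h

# (2) Check file and directory sizes
ls -l   # or simply ll
du -h --max-depth=1

# (3) Check IO activity and find the processes/threads doing heavy reads and writes
iotop -d 1              # refresh every second; lists threads with their TIDs by default
# or
iostat -d -x -k 1       # print every second; per-device statistics, not per-process

# (4) Use jstack to print the thread stacks and inspect the IO-related code
# (convert a TID from iotop to hex as in the CPU routine and look for its nid)

# (5) To test raw disk write speed (especially on virtual machines), use dd
# Example: measure pure write speed on the directory where the data volume is mounted
# (writes about 8 GB; conv=fdatasync flushes to disk so the page cache does not inflate the result;
#  delete test.iso afterwards)
dd if=/dev/zero of=/data-volume-dir/test.iso bs=8k count=1000000 conv=fdatasync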
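Steps (1) and (2) can be approximated in Java with java.nio.file, for example to let an application warn before its log directory fills the disk. The sketch below (DiskUsage is a hypothetical name) reports the total and usable space of the file store that holds a path, like one row of df, and the size of each direct child of a directory, like du --max-depth=1. It sums apparent file sizes (comparable to du --apparent-size) rather than allocated disk blocks, and it does not follow symbolic links.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Sketch: df/du-style checks with java.nio.file.
final class DiskUsage {

    // Like one row of `df`: {total bytes, usable bytes} of the file store containing path.
    static long[] space(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        return new long[] {store.getTotalSpace(), store.getUsableSpace()};
    }

    // Like `du --max-depth=1 --apparent-size`: bytes under each direct child of dir.
    static Map<String, Long> childSizes(Path dir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (Stream<Path> children = Files.list(dir)) {
            for (Path child : (Iterable<Path>) children::iterator) {
                sizes.put(child.getFileName().toString(), sizeOf(child));
            }
        }
        return sizes;
    }

    // Sums regular files without following symlinks. An unreadable directory
    // makes Files.walk throw UncheckedIOException.
    private static long sizeOf(Path path) throws IOException {
        try (Stream<Path> files = Files.walk(path)) {
            return files.filter(p -> Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS))
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            return 0L; // file vanished meanwhile, e.g. a rotated log
                        }
                    })
                    .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long[] s = space(dir);
        System.out.printf("total=%dG usable=%dG%n", s[0] >> 30, s[1] >> 30);
        childSizes(dir).forEach((name, bytes) -> System.out.printf("%10dM  %s%n", bytes >> 20, name));
    }
}
```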

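The dd test in step (5) can be imitated in Java when dd is not available, or when you want the write speed as seen from the JVM. The sketch below (WriteSpeed is a hypothetical name) writes zero-filled blocks through a FileChannel and calls force() before stopping the clock, which plays a role similar to dd's conv=fdatasync so that the page cache does not inflate the result. It reports MiB/s (1024 × 1024 bytes), whereas dd reports decimal MB/s, so the numbers are not directly identical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Sketch: a Java analogue of `dd if=/dev/zero of=FILE bs=8k count=N conv=fdatasync`.
final class WriteSpeed {

    // Writes count zero-filled blocks of blockSize bytes and returns the rate in MiB/s.
    static double measure(Path file, int blockSize, int count) throws IOException {
        ByteBuffer block = ByteBuffer.allocateDirect(blockSize); // contents start as zeros
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < count; i++) {
                block.clear(); // reset position so the same zeros are written again
                while (block.hasRemaining()) {
                    ch.write(block);
                }
            }
            ch.force(false); // flush file data to the device before stopping the clock
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return (double) blockSize * count / (1024 * 1024) / seconds;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java WriteSpeed /data-volume-dir/test.bin  (8 KiB x 100,000, about 780 MiB)
        Path file = Paths.get(args[0]);
        try {
            System.out.printf("%.1f MiB/s%n", measure(file, 8 * 1024, 100_000));
        } finally {
            Files.deleteIfExists(file); // do not leave the test file on the volume
        }
    }
}
```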

5 Business code troubleshooting routine

Business problems lie at the level of code logic. Investigating them mainly means reading the log output and checking whether the method logic is correct. The usual routine is as follows:

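# (1) Follow the log output in real time (last 100 lines, then new lines as they arrive)
tail -fn 100 log_file

# (2) Locate the problem by keywords in the log output
grep keyWord log_file        # lines containing the keyword
grep -C n keyWord log_file   # plus n lines before and after each match

# (3) Analyze log files with a GUI text editor (Notepad++, Sublime Text; for very large files, e.g. EmEditor)

# (4) Use online diagnostic tools such as BTrace or Arthas to inspect method arguments, return values, exceptions, etc. directly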
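Step (2) relies on grep, which handles large files well. When the same search has to run inside a Java tool, the keyword-plus-context lookup can be done by streaming the file line by line so that a large log is never loaded into memory at once. The sketch below (LogContext is a hypothetical name) mirrors grep -n -C n: each matching line is returned with up to n lines before and after it, prefixed with its line number, and overlapping windows are merged. Unlike grep, it prints no "--" separators between groups and marks context lines with ":" as well.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch: a streaming counterpart of `grep -n -C n keyWord log_file`.
final class LogContext {

    static List<String> search(BufferedReader reader, String keyword, int n) throws IOException {
        List<String> out = new ArrayList<>();
        Deque<String> before = new ArrayDeque<>(); // up to n unprinted lines preceding the current one
        int after = 0;                             // context lines still owed after the last match
        long lineNo = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            lineNo++;
            String numbered = lineNo + ":" + line;
            if (line.contains(keyword)) {
                out.addAll(before); // leading context
                before.clear();
                out.add(numbered);
                after = n;
            } else if (after > 0) {
                out.add(numbered);  // trailing context
                after--;
            } else {
                before.addLast(numbered);
                if (before.size() > n) {
                    before.removeFirst();
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LogContext log_file keyWord n
        // (decoding is strict UTF-8: invalid bytes raise MalformedInputException)
        try (BufferedReader r = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            search(r, args[1], Integer.parseInt(args[2])).forEach(System.out::println);
        }
    }
}
```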

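Step (1)'s tail -f can likewise be imitated by remembering a byte offset and periodically reading whatever has been appended since. The sketch below (LogFollower is a hypothetical name) returns only complete lines, keeps a trailing partial line for the next poll, and starts over from the beginning if the file shrinks (for example, when it is truncated in place). It does not detect rotation by rename, which tail -F handles, and it starts either at the end or at the beginning of the file rather than at the last 100 lines.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: follow a growing log file, similar in spirit to `tail -f log_file`.
final class LogFollower {
    private final String path;
    private long position; // byte offset of the first unconsumed byte

    LogFollower(String path, boolean fromEnd) throws IOException {
        this.path = path;
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            this.position = fromEnd ? f.length() : 0;
        }
    }

    // Returns complete lines appended since the last call. Assumes less than 2 GB
    // was appended between polls, since the new bytes are read into one array.
    List<String> poll() throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            if (f.length() < position) {
                position = 0; // file was truncated in place: start over
            }
            f.seek(position);
            byte[] buf = new byte[(int) (f.length() - position)];
            f.readFully(buf);
            int start = 0;
            for (int i = 0; i < buf.length; i++) {
                if (buf[i] == '\n') {
                    lines.add(new String(buf, start, i - start, StandardCharsets.UTF_8));
                    start = i + 1;
                }
            }
            position += start; // advance only past complete lines
        }
        return lines;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Usage: java LogFollower log_file
        LogFollower follower = new LogFollower(args[0], true);
        while (true) {
            for (String line : follower.poll()) {
                System.out.println(line);
            }
            Thread.sleep(500);
        }
    }
}
```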

Online diagnostic tools for Java, including BTrace and Arthas, will be introduced in later articles in this series.

6 Summary

This article divides the problems that Java applications encounter in production into four categories: (1) CPU issues, (2) memory-related issues, (3) disk IO-related issues, and (4) business code issues. For each category, following a fixed routine and combining the Java command-line tools with online diagnostic tools makes it straightforward to troubleshoot a Java application. I hope this is helpful to Java developers.


Source: juejin.im/post/5d64023e5188255d51426fb0