[Repost] 90% of people will run into performance problems — how can you locate them quickly with one line of code?


Alimei's note: In "How should you answer performance optimization questions to impress an Alibaba interviewer?", the previous article introduced the common bottleneck points of application performance and several indicators that give a first impression of whether something is abnormal.

 

Today, Chai-kwong builds on the indicators listed previously and shares some common tuning and analysis ideas: how to pick out, from the many abnormal performance indicators, the core one; how to locate the performance bottleneck from it; and finally how to tune. The article is organized along the lines of code, CPU, memory, network and disk, and for each specific optimization point it summarizes a reusable "routine" so the ideas are easy to carry over into practice.

 

1. Code-related
When a performance problem occurs, the first thing to do is to check whether it is related to the business code — not by trying to solve the problem through reading code, but by using the code and the logs to rule out low-level mistakes in the business code. The best place to optimize performance is inside the application itself.
For example, look at the business logs and check whether they contain a large number of errors; most performance problems at the application layer and framework layer leave clues in the logs (for instance, an unreasonable log level that makes the application write logs frantically in production). Furthermore, check the business logic for common problems such as unreasonable use of for loops, NPEs, regular expressions and mathematical calculations — many of them can be fixed with a simple code change. A quick first pass over the logs is sketched below.
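A minimal sketch, assuming the application log lives at /var/log/app/app.log (the path and the log-level keywords are placeholders — adjust them to your log format):
# Count entries per log level to spot error storms or an overly chatty log level
grep -ohE "ERROR|WARN|INFO|DEBUG" /var/log/app/app.log | sort | uniq -c | sort -nr
# Check whether the log files themselves are growing abnormally fast
du -sh /var/log/app/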
Do not immediately equate performance optimization with caching, asynchronous processing, JVM tuning and other advanced techniques. Complex problems may have simple solutions, and the 80/20 rule still holds in the field of performance optimization. Of course, understanding some basic "pitfall-prone code patterns" speeds up the analysis: many bottlenecks that you would otherwise approach from the CPU, memory or JVM angle may show up right here in the code.
Here are some high-frequency coding points that are likely to cause performance problems.
1) Regular expressions are CPU-hungry (greedy matching, for example, may cause backtracking). Be careful with String.split(), replaceAll() and similar methods, and always precompile regular expressions.
2) On older JDKs (Java 1.6 and earlier), String.intern() may cause a method-area (permanent generation) memory overflow. On newer JDKs, if the string pool is set too small while the number of interned strings is too large, interning also carries a large performance overhead.
3) When logging exceptions, if the stack information is already clear, consider skipping the detailed stack output — constructing an exception stack is costly. Note: when the same exception is thrown repeatedly at the same location, the JIT will optimize it into a precompiled exception of the matching type thrown directly, and the stack trace will no longer appear.
4) Avoid unnecessary boxing and unboxing between wrapper types and primitive types, and try to keep them consistent; frequent autoboxing can hurt performance badly.
5) Use the Stream API selectively. For complex and parallel operations the Stream API is recommended — it simplifies the code while exploiting multi-core CPUs; for simple operations or on single-core CPUs, explicit iteration is recommended instead.
6) Create thread pools manually with ThreadPoolExecutor according to the business scenario, specifying the number of threads and the queue size for the tasks at hand, to avoid the risk of resource exhaustion; naming threads uniformly also makes later troubleshooting easier.
7) Choose concurrent containers according to the business scenario. For Map-type containers: if strong data consistency is required, use Hashtable or "Map + lock"; if reads greatly outnumber writes, use CopyOnWriteArrayList; for small amounts of data with no strong consistency requirement and infrequent changes, use ConcurrentHashMap; for large amounts of data with frequent reads and writes and no strong consistency requirement, use ConcurrentSkipListMap.
8) Typical lock optimizations include reducing lock granularity, coarsening locks in loops, and shortening lock hold time (for example by choosing read-write locks). Also consider the JDK's optimized concurrency classes, such as using LongAdder instead of AtomicLong for counting in scenarios with low consistency requirements, and ThreadLocalRandom instead of Random.
Besides the points above, there are many more code-level optimizations that are not listed here. Looking at these points, some common optimization ideas can be extracted, for example:
  1. Trade space for time: use memory or disk in exchange for the more valuable CPU or network, for example by using caches;
  2. Trade time for space: sacrifice some CPU or memory to save scarcer network or storage resources, for example by splitting one large network transfer into several smaller ones;
  3. Others, such as parallelization, asynchronous processing, pooling, and so on.
2. CPU-related
As mentioned earlier, pay more attention to CPU load; high CPU utilization alone is usually not a problem, whereas CPU load is the key basis for judging whether the system's computing resources are healthy.
2.1 High CPU utilization && high load average
This situation is common in CPU-intensive applications, where a large number of threads are in the runnable state and there is little I/O. Common scenarios that consume a lot of CPU are:
  1. Regular expression operations
  2. Mathematical computation
  3. Serialization / deserialization
  4. Reflection
  5. Infinite loops or an unreasonably large number of loop iterations
  6. Defects in basic or third-party components
The general idea for troubleshooting high CPU usage: print the thread stacks several times (more than five) with jstack; the threads consuming the most CPU can usually be located from the stacks. Alternatively, use profiling (event-based sampling or instrumentation) to produce an on-CPU flame graph of the application over a period of time, which quickly pinpoints the problem. A minimal manual workflow is sketched below.
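A minimal sketch of that manual jstack workflow, assuming $pid is the Java process ID (the thread ID 12345 and its hex form 0x3039 are placeholders):
top -Hp $pid                           # 1. find the thread (TID) using the most CPU
printf '%x\n' 12345                    # 2. convert that TID to hex
jstack $pid | grep -A 20 'nid=0x3039'  # 3. locate that thread's stack by its hex nid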
There is another possibility: frequent GC (including Young GC, Old GC and Full GC) can also raise both CPU utilization and load. The troubleshooting idea: use jstat -gcutil to continuously print the GC counts and times of the current application. Load increases caused by frequent GC are usually accompanied by a shortage of available memory; use free or top to check the available memory on the machine.
Could high CPU utilization be caused by a bottleneck in the CPU itself? It is possible. vmstat shows a more detailed breakdown of CPU utilization. High user-mode utilization (us) means user-mode processes are consuming a lot of CPU; if this value stays above 50% for a long time, focus on the application's own performance problems. High kernel-mode utilization (sy) means the kernel is taking a lot of CPU, so focus on kernel threads and system calls. If us + sy exceeds 80%, the CPU may simply be insufficient. A quick check is sketched below.
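A minimal sketch of the quick checks above, assuming $pid is the Java process ID:
vmstat 1 5                  # us/sy breakdown and the run queue (r column)
jstat -gcutil $pid 1000 10  # if the YGC/FGC counts climb quickly, frequent GC may be behind the CPU load
free -m                     # is available memory running low?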

2.2 Low CPU utilization && high load average
If CPU utilization is not high, the application is not busy computing but doing something else. Low CPU utilization with a high load average is common for I/O-intensive processes. This is easy to understand: the load average counts both processes in the R state and processes in the D state; setting aside the former, what remains are the processes in the D state (the D state is usually caused by waiting for I/O, such as disk I/O or network I/O).
Troubleshooting && verification: use vmstat 1 to periodically print system resources and watch the %wa (iowait) column, which is the percentage of CPU time spent waiting on disk I/O. If it exceeds 30%, disk I/O wait is severe; this may be caused by heavy random disk access or direct disk access (bypassing the system cache), or the disk itself may be the bottleneck. Cross-check with the output of iostat or dstat, as sketched below: if a large increase in disk read requests is observed alongside the high %wa, the problem is likely caused by disk reads.
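A minimal sketch of that cross-check:
vmstat 1     # watch the wa (iowait) column and the b column (tasks blocked in uninterruptible sleep)
iostat -x 1  # per-device r/s, w/s, await and %util confirm whether a disk is the bottleneck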
Furthermore, long network requests (network I/O), such as MySQL slow queries or fetching data over slow RPC interfaces, also raise the CPU load average. Troubleshooting this usually requires analyzing the application's own trace logs, the instrumentation points of the middleware, and the downstream dependencies together.
2.3 Frequent CPU context switches
First use vmstat to look at the system-wide context-switch count, then use pidstat to observe the process's voluntary context switches (cswch) and involuntary context switches (nvcswch). Voluntary context switches happen because of thread state transitions inside the application, for example calls to sleep(), join(), wait(), or the use of synchronized or Lock constructs. Involuntary context switches happen because a thread's time slice is exhausted, or because the scheduler preempts it for a higher-priority thread.
A high number of voluntary context switches means the CPU is waiting to acquire resources, i.e. system resources such as I/O or memory are insufficient. A high number of involuntary context switches is likely caused by too many threads inside the application competing fiercely for CPU time slices, forcing the system to schedule frequently; this can be corroborated with jstack by counting the number of threads and their state distribution. A quick check is sketched below.
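A minimal sketch of the context-switch checks, assuming $pid is the application process ID:
vmstat 1               # cs column: system-wide context switches per second
pidstat -w -p $pid 1   # cswch/s (voluntary) vs. nvcswch/s (involuntary) for this process
pidstat -wt -p $pid 1  # add -t to break the numbers down per thread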

3. Memory-related
As mentioned earlier, memory is divided into system memory and process memory (including the memory of Java application processes). The vast majority of memory problems we meet fall on the process memory; bottlenecks caused by system resources themselves are relatively rare. For a Java process, its built-in memory management automatically solves two problems — how to allocate memory to objects and how to reclaim the memory allocated to them — and its core is the garbage collection mechanism.
Although garbage collection effectively prevents memory leaks and keeps memory usage efficient, it is not a panacea: unreasonable parameter configuration and code logic can still cause a range of memory problems. In addition, the early garbage collectors were not very good in either functionality or collection efficiency, and depended heavily on GC parameters tuned by experienced developers. For example, a badly chosen maximum heap size may lead to heap overflow, heap thrashing and other problems.
Let's look at the analysis ideas for a few common memory problems.
3.1 Insufficient system memory
Java applications usually have memory watermark monitoring for single machines or clusters. If the memory utilization of a single machine exceeds 95%, or the memory utilization of a cluster exceeds 80%, there may be a potential memory problem (note: the memory here refers to system memory).
Except for a few extreme cases, high system memory usage is most likely caused by the Java application itself. With the top command we can see the actual memory footprint of the Java process: RES represents the resident memory used by the process and VIRT its virtual memory, and the relationship is VIRT > RES > the heap size actually used by the Java application. Besides heap memory, the overall memory footprint of a Java process also includes the method area / metaspace, the JIT code cache and other parts, roughly as follows:
Java application memory footprint = Heap + Code Cache + Metaspace + Symbol tables + Thread stacks + Direct buffers (off-heap memory) + JVM structures (other memory used by the JVM itself) + Mapped files (memory-mapped files) + Native libraries + ...
For the memory footprint of a Java process, the output of jstat -gc shows the current usage of the heap regions and the metaspace. Off-heap memory usage statistics can be obtained with NMT (Native Memory Tracking, introduced in HotSpot for Java 8). The memory used by thread stacks is easily overlooked; although thread stacks use a lazy-loading model and do not allocate the full -Xss size up front, too many threads still lead to unnecessary memory consumption — the jstackmem script can be used to tally the overall memory occupied by threads. A minimal sketch of the NMT workflow follows.
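A minimal sketch of the NMT workflow mentioned above (app.jar is a placeholder; NMT must be enabled at startup and adds a small overhead):
java -XX:NativeMemoryTracking=summary -jar app.jar  # enable NMT when starting the application
jcmd $pid VM.native_memory summary                  # off-heap breakdown: metaspace, code cache, thread stacks, ...
jstat -gc $pid 1000                                 # heap and metaspace capacity/usage, once per second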
The idea for troubleshooting insufficient system memory:
  1. First use free to check the currently available memory, then use vmstat to see the specific memory usage and its growth trend; at this stage the process consuming the most memory can usually be identified;
  2. Analyze the cache/buffer memory usage. If this value barely changes over time, it can be ignored; if the cache/buffer size keeps rising, tools such as pcstat, cachetop and slabtop can be used to analyze exactly what is occupying it;
  3. After excluding the influence of the cache/buffer, if system memory is still found to be growing, there is very likely a memory leak (a minimal sketch of the first two steps follows this list).
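A minimal sketch of the first two steps:
free -h                            # available vs. used memory, plus buff/cache
vmstat 1                           # watch the free/buff/cache columns for a growth trend
ps aux --sort=-rss | head -10      # processes holding the most resident memory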
3.2 Java memory overflow
A memory overflow happens when the application creates a new object instance but the memory it requires is larger than the space available in the heap. There are several types of memory overflow; in most cases you will see the OutOfMemoryError keyword in the error log. Common types and the corresponding analysis ideas are as follows:
1) java.lang.OutOfMemoryError: Java heap space. Cause: the heap (young and old generation) can no longer allocate objects; some object references are held for a long time and never released, so the garbage collector cannot reclaim them; or a large number of Finalizer objects are used that are not reclaimed in the same GC cycle; and so on. A heap overflow is usually caused by a memory leak; if a leak is ruled out, the heap size can be increased appropriately.
2) java.lang.OutOfMemoryError: GC overhead limit exceeded. Cause: the garbage collector spends more than 98% of the time on garbage collection but reclaims less than 2% of the heap, usually because of a memory leak or a heap that is too small.
3) java.lang.OutOfMemoryError: Metaspace or java.lang.OutOfMemoryError: PermGen space. Troubleshooting idea: check whether classes are dynamically loaded but never unloaded, whether a large number of string constants are interned, and whether the permanent generation / metaspace is simply set too small.
4) java.lang.OutOfMemoryError: unable to create new native thread. Cause: when expanding the stack space for a new thread, the virtual machine cannot obtain enough memory. It may help to reduce the size of each thread stack and the overall number of threads in the application. In addition, the total number of processes/threads that can be created is also limited by the system's free memory and by operating-system limits, which should be checked carefully. Note: this is different from StackOverflowError, which is caused by method calls nested too deeply so that the memory allocated to the stack cannot hold a new stack frame.
In addition, there are swap-partition overflows, native method stack overflows, and array-allocation OutOfMemoryError types; they are not very common and are not covered here. A sketch of startup flags that preserve the scene of an OOM follows.
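A minimal sketch of standard HotSpot startup flags that dump the heap automatically when an OOM happens (the heap sizes, dump path and app.jar are placeholders):
java -Xms4g -Xmx4g \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/tmp/heapdump.hprof \
     -jar app.jar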
3.3 Java memory leaks
Java memory leaks are a developer's nightmare. Unlike a memory overflow, which is blunt and leaves a scene that is relatively easy to find, a memory leak shows itself as: after the application has been running for some time, memory utilization gets higher and higher and responses get slower and slower, until the process finally appears to hang.
Java memory leaks may lead to insufficient system memory, a frozen process, OOM and so on. The troubleshooting approaches boil down to two:
  1. Periodically print heap object statistics with jmap and locate the objects whose count and size keep growing;
  2. Profile the application with a profiler and look for memory-allocation hot spots.

In addition, when the heap keeps growing, it is recommended to dump a snapshot of the heap memory and base further analysis on it. Although a snapshot is only an instantaneous value, it still has real diagnostic value. A minimal sketch is shown below.
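A minimal sketch of both jmap approaches, assuming $pid is the Java process ID (the dump path is a placeholder):
jmap -histo:live $pid | head -20                    # top object classes by count/size; repeat periodically and diff
jmap -dump:live,format=b,file=/tmp/heap.hprof $pid  # full heap snapshot for offline analysis (e.g. with MAT)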
3.4 Garbage collection
GC (garbage collection) metrics are an important yardstick for judging whether a Java process's memory usage is healthy. The core garbage-collection indicators are the frequency and count of GC pauses (including MinorGC and MajorGC), plus the details of the memory reclaimed each time; the former can be obtained directly with jstat, the latter requires analyzing the GC logs. Note that the FGC/FGCT columns in the jstat output count the GC pauses (i.e. Stop-the-World pauses) that occur during old-generation collections; for the CMS collector, for example, each old-generation collection increases this value by 2, because the initial-mark and re-mark phases are both Stop-the-World.
When is GC tuning needed? That depends on the application's situation, such as its response-time requirements, throughput requirements, and system resource constraints. Some empirical criteria: the GC frequency and cost rise significantly, the average GC pause exceeds 500 ms, Full GC runs more often than once per minute, and so on. If the GC behavior matches any of the above, it is time to tune GC.
Since garbage collectors vary widely and applications differ, tuning strategies also differ somewhat; here are a few common GC tuning strategies.
1) Choose the right garbage collector. Choose reasonably according to the application's latency and throughput requirements and the characteristics of each garbage collector. It is recommended to replace CMS with G1: G1's performance is being optimized steadily, and on machines with 8 GB of memory or less it is catching up with or even surpassing CMS in every respect. G1's parameters are also easier to adjust, while CMS has too many parameters and is prone to space fragmentation and high CPU consumption, which is why it is currently being deprecated. The new ZGC collector introduced in Java 11 keeps the marking and collection phases basically concurrent and is worth looking forward to.
2) Set a reasonable heap size. Do not set the heap too large; it is recommended not to exceed 75% of system memory, so that system memory is not exhausted. Keep the maximum heap size equal to the initial heap size to avoid heap thrashing. Sizing the young generation is more delicate: many adjustments to GC frequency and pause time are really adjustments of the young generation, including the ratio of young to old generation and the ratio of Eden to the Survivor spaces; these settings also need to take the promotion age of objects into account, so there is quite a lot to consider. If the G1 garbage collector is used, there is much less to worry about in young-generation sizing, since the adaptive policy determines the collection set (CSet) of each collection. Adjusting the young generation is the core of GC tuning and relies heavily on experience, but in general: a high Young GC frequency means the young generation is too small (or the Eden/Survivor configuration is unreasonable), while a long Young GC pause means the young generation is too large — broadly speaking there is little more to it than these two directions.
3) Reduce the frequency of Full GC. If Full GC or old-generation GC is frequent, it is very likely that a memory leak is causing objects to be held long-term; analyzing a dumped memory snapshot usually locates the problem quickly. In addition, an inappropriate ratio of young to old generation that causes objects to be allocated directly in the old generation may also lead to Full GC; in that case the business code and the memory snapshot need to be analyzed together. Furthermore, configuring GC parameters can give us much of the key information needed for tuning, for example -XX:+PrintGCApplicationStoppedTime, -XX:+PrintSafepointStatistics and -XX:+PrintTenuringDistribution, which respectively show the GC pause distribution, safepoint statistics, and the age distribution of promoted objects; -XX:+PrintFlagsFinal shows the GC parameters that are finally in effect; and so on. A minimal sketch of a startup line combining these flags follows.
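A minimal sketch of a JDK 8 startup line combining the diagnostic flags named above with basic GC logging (app.jar, the paths and the heap sizes are placeholders; JDK 9+ replaces these flags with unified -Xlog:gc* logging):
java -Xms4g -Xmx4g \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/gc.log \
     -XX:+PrintGCApplicationStoppedTime \
     -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 \
     -XX:+PrintTenuringDistribution \
     -XX:+PrintFlagsFinal \
     -jar app.jar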

 

4. Disk I/O and network I/O
4.1 Troubleshooting ideas for disk I/O:
  1. Use the usual tools to print disk-related metrics, commonly %wa (iowait) and %util, and judge from them whether disk I/O is abnormal — for example, a high %util indicates heavy I/O activity;
  2. Use pidstat to drill down to the specific process and pay attention to the amount and rate of data read and written;
  3. Use lsof + the process ID to see the list of files the suspicious process has open (including directories, block devices, dynamic libraries, network sockets, etc.); combined with the business code this usually locates the source of the I/O; if a deeper analysis is needed, tools such as perf can also be used to trace the I/O source.
Note that a higher %wa (iowait) does not necessarily mean a disk I/O bottleneck; it only represents the percentage of CPU time taken by I/O operations, and it is normal if the main activity of the processes at that time is I/O. A minimal sketch of the steps above follows.
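A minimal sketch of steps 1-3 above, assuming $pid is the suspect process:
iostat -dx 1          # per-device r/s, w/s, await and %util
pidstat -d -p $pid 1  # kB_rd/s and kB_wr/s for the process
lsof -p $pid          # files, sockets and devices the process has open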
4.2 Possible causes of network I/O bottlenecks:
  1. The objects being transferred are too large, which may lead to slow requests and frequent GC;

  2. The network I/O model is chosen badly, leading to low overall application QPS and long response times;

  3. The RPC thread pool is configured badly. Use jstack to count the thread-state distribution; if many threads are in the TIMED_WAITING or WAITING state, pay close attention. Example: if the database connection pool is too small, the thread stacks will show many threads competing for a connection-pool lock;

  4. The RPC call timeout is configured badly, causing many request failures;

A snapshot of a Java application's thread stacks is useful for more than the badly configured thread pools mentioned above; in other scenarios, such as CPU spikes or slow application responses, the thread stacks are also a good place to start, as sketched below.
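A minimal sketch: take several thread dumps a few seconds apart and compare them (the per-state thread count in section 5 below does the same thing in more detail):
for i in 1 2 3; do jstack $pid > /tmp/jstack.$i; sleep 5; done
grep -c "java.lang.Thread.State: WAITING" /tmp/jstack.1   # quick count of WAITING threads in one dump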


5. Useful command lines
This section gives a number of commands that help locate performance problems quickly.
1) View the current network connections of the system, counted by TCP state
netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
2) View the Top 50 object classes in the heap (for locating memory leaks)
jmap -histo:live $pid | sort -n -r -k 2 | head -n 50
3) List the top 10 processes by memory / CPU usage
# Memory
ps axo %mem,pid,euser,cmd | sort -nr | head -10
# CPU
ps -aeo pcpu,user,pid,cmd | sort -nr | head -10
4) Show the system's overall CPU utilization and idle percentage
grep "cpu " /proc/stat | awk -F ' ' '{total = $2 + $3 + $4 + $5} END {print "idle \t used\n" $5*100/total "% " $2*100/total "%"}'

5) Count threads by thread state (enhanced version)
jstack $pid | grep java.lang.Thread.State:|sort|uniq -c | awk '{sum+=$1; split($0,a,":");gsub(/^[ \t]+|[ \t]+$/, "", a[2]);printf "%s: %s\n", a[2], $1}; END {printf "TOTAL: %s",sum}';

 

6) View the stack traces of the Top 10 threads consuming the most CPU on the machine

The show-busy-java-threads script is recommended; it can quickly troubleshoot Java CPU performance problems (when top shows high user-mode CPU). It automatically finds out which threads in which Java processes are consuming a lot of CPU and prints their stack traces so the cause can be determined. The script is already in use in Alibaba's online operations environment. Link: https://github.com/oldratlee/useful-scripts/.
7) Generate a flame graph (requires perf, perf-map-agent and FlameGraph):
# 1. Collect the application's runtime stacks and symbol table (sample for 30 seconds at 99 events per second)
sudo perf record -F 99 -p $pid -g -- sleep 30; ./jmaps
# 2. Use perf script to generate the analysis result; the resulting flamegraph.svg file is the flame graph
sudo perf script | ./pkgsplit-perf.pl | grep java | ./flamegraph.pl > flamegraph.svg
8) List the top 10 processes by swap usage
for file in /proc/*/status ; do awk '/VmSwap|Name|^Pid/{printf $2 " " $3}END{ print ""}' $file; done | sort -k 3 -n -r | head -10
9) JVM memory usage and garbage-collection statistics
# Show the cause of the last garbage collection, or of the one currently in progress
jstat -gccause $pid
# Show the capacity and usage of each generation
jstat -gccapacity $pid
# Show the capacity and usage of the young generation
jstat -gcnewcapacity $pid
# Show the capacity of the old generation
jstat -gcoldcapacity $pid
# Show garbage-collection statistics (output every second, continuously)
jstat -gcutil $pid 1000
10) Some other everyday commands
# Quickly kill all Java processes
ps aux | grep java | awk '{ print $2 }' | xargs kill -9
# Find the 10 largest files under the / directory
find / -type f -print0 | xargs -0 du -h | sort -rh | head -n 10

6. Summary

Performance optimization is a large field, and each small point in it could be expanded into dozens of articles. Besides what has been described above, application performance optimization also includes front-end optimization, architectural optimization (going distributed, using caches, etc.), data-storage optimization and code design optimization (such as applying design patterns). Due to limited space, these are not covered here one by one; the content of this article is only meant to serve as a starting point. At the same time, what is written here is just part of my own experience and knowledge, and is not necessarily all correct — corrections and additions are welcome.
Performance optimization is comprehensive work. You need to keep practicing, learn the tools, learn from real-world experience, and gradually form a tuning methodology of your own.
In addition, although performance optimization is important, do not invest too much effort in it too early (improving the design and the code is, of course, always worthwhile) — premature optimization is the root of all evil. On the one hand, optimization done ahead of time may not fit rapidly changing business requirements and may even get in the way of new requirements and new features; on the other hand, premature optimization increases the application's complexity and reduces its maintainability. When to optimize, and to what extent, is a question that requires weighing many trade-offs.
References:
[1] https://github.com/superhj1987/awesome-scripts
[2] https://github.com/jvm-profiling-tools/perf-map-agent
[3] https://github.com/brendangregg/FlameGraph
[4] https://github.com/apangin/jstackmem/blob/master/jstackmem.py
