[Repost] The ultimate practical guide to Java application performance tuning

I came across a good and very comprehensive article online, so I am reposting it here:
Java application performance optimization is a perennial topic. Typical performance problems include slow page responses, interface timeouts, high server load, low concurrency, and frequent database deadlocks. Especially now that the "rough, fierce, fast" style of Internet development is popular, growing traffic and increasingly bloated code bring all kinds of performance problems to the surface.

Java applications have many potential performance bottlenecks: system factors such as disk, memory, and network I/O, as well as the Java application code, JVM GC, the database, and caches. Based on personal experience, I divide Java performance optimization into four layers: the application layer, the database layer, the framework layer, and the JVM layer, as shown in Figure 1.

Figure 1. Layered model of Java performance optimization

Optimization gets harder with each layer, and each layer involves different knowledge and different problems. For example, the application layer requires understanding the code logic and locating the offending line of code through the Java thread stack; the database layer requires analyzing SQL and locating deadlocks; the framework layer requires reading the framework's source code and understanding its mechanisms; and the JVM layer requires a solid understanding of the GC types and how they work, and of what the various JVM parameters do.

For Java performance tuning there are two basic analysis approaches: on-site analysis and post-mortem analysis.

On-site analysis preserves the scene and then uses diagnostic tools to analyze and locate the problem. It has a larger impact on production, so it is unsuitable for some scenarios, particularly those involving critical online business.

Post-mortem analysis collects as much on-site data as possible, restores service immediately, and then analyzes the collected data and reproduces the problem afterwards. Below, we start with performance diagnostic tools and then share several cases and practices.

One, performance diagnostic tools

Performance diagnosis comes in two kinds: diagnosing systems and code already known to have performance problems, and performance-testing a system before it goes live to determine whether it meets the performance requirements for launch.

This article addresses the former. The latter can be done with load-testing tools (for example, JMeter) and is outside the scope of this article.

For Java applications, performance diagnostic tools fall into two levels: the OS level and the Java application level (including application code diagnosis and GC diagnosis).

OS diagnosis
OS diagnosis focuses on three areas: CPU, memory, and I/O.

Two, CPU diagnosis

For the CPU, the main metrics are load average, CPU utilization, and context switches.

The top command shows the system load average and CPU utilization. Figure 2 shows the state of one system as seen through top.

Figure 2. top command example

The load average has three numbers, 63.66, 58.39, and 57.18, which are the machine's load over the past 1, 5, and 15 minutes. As a rule of thumb, the system is working normally if the value is below 0.7 × the number of CPU cores; if it exceeds that, and especially if it reaches four or five times the number of CPU cores, the system load is clearly too high.

In Figure 2 the 15-minute load is 57.18 and the 1-minute load is 63.66 on a 16-core system, which means the system has a load problem that is still getting worse, and the specific cause needs to be located.
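
The rule of thumb above can be written down as a small check. This is a minimal sketch and not part of the original article: the names LoadCheck and isHealthy are hypothetical, and the 0.7 factor is the empirical threshold quoted above rather than a hard limit. The main method also reads the live 1-minute load through the standard java.lang.management API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper illustrating the "load < 0.7 * cores" rule of thumb.
class LoadCheck {
    // Empirical factor quoted in the text; an assumption, not a hard limit.
    static final double HEALTHY_FACTOR = 0.7;

    // Returns true when the load average is below 0.7 * number of CPU cores.
    static boolean isHealthy(double loadAverage, int cpuCores) {
        return loadAverage < HEALTHY_FACTOR * cpuCores;
    }

    public static void main(String[] args) {
        // Values from Figure 2: 1-minute load 63.66 on a 16-core machine.
        // The threshold is 0.7 * 16 = 11.2, so this prints false.
        System.out.println(isHealthy(63.66, 16));

        // The same check against the live system.
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // negative when unavailable
        int cores = os.getAvailableProcessors();
        if (load >= 0) {
            System.out.println("load=" + load + " cores=" + cores
                    + " healthy=" + isHealthy(load, cores));
        }
    }
}
```

For the system in Figure 2 the threshold is 11.2, so a load of 63.66 is more than five times over it.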

The vmstat command shows the number of CPU context switches, as shown in Figure 3:

Figure 3. vmstat command example

Context switches occur mainly in the following situations:

1) the time slice is used up and the CPU schedules the next task normally;

2) the task is preempted by a higher-priority task;

3) the running task blocks on I/O, so it is suspended and the CPU switches to the next task;

4) user code voluntarily suspends the current task and yields the CPU;

5) the task competes for a resource under multitasking and is suspended because it fails to acquire it;

6) a hardware interrupt occurs.

Java thread context switches come mainly from contention for shared resources. Locking a single object rarely becomes a system bottleneck unless the lock granularity is too coarse. However, in a frequently accessed block of code that locks several objects in succession, a large number of context switches can occur and become a system bottleneck.

For example, our system once used log4j 1.x to write a large volume of logs under high concurrency; context switches became frequent, many threads blocked, and system throughput dropped sharply. The relevant code is shown in Listing 1. Upgrading to log4j 2.x resolved the problem.

Listing 1. Synchronized appender loop in log4j 1.x

    for (Category c = this; c != null; c = c.parent) {
        // Protected against simultaneous call to addAppender, removeAppender,…
        synchronized (c) {
            if (c.aai != null) {
                writes += c.aai.appendLoopOnAppenders(event);
            }
        }
    }

Three, memory

From the operating system's perspective, the concern with memory is whether the application process has enough of it. Use the free -m command to check memory usage.

The top command shows a process's virtual memory (VIRT) and physical memory (RES). From the formula VIRT = SWAP + RES you can estimate how much swap a specific application is using. Heavy swap use hurts Java application performance, so set swappiness as low as possible.

For Java applications, using too much swap degrades performance because disk is, after all, much slower than memory.

Four, I/O

I/O covers disk I/O and network I/O; in general, disk is the more likely I/O bottleneck. Use iostat to check disk reads and writes, and use the CPU's I/O wait to see whether disk I/O is normal.

If disk I/O stays high, the disk is either too slow or faulty and has become the performance bottleneck; the application needs to be optimized or the disk replaced.

Besides the common top, ps, vmstat, and iostat commands, there are other Linux tools for diagnosing system problems, such as mpstat, tcpdump, netstat, pidstat, and sar. Brendan Gregg has summarized the Linux performance diagnostic tools by device type, as shown in Figure 4, for reference.

Figure 4. Linux performance observation tools

Five, Java application diagnosis and tools

Application code problems are a relatively easy class of performance problems to solve. If application-level monitoring and alerting has already pinpointed the faulty function or code, you can go straight to it; otherwise, use top + jstack to find the problematic thread stack and trace it to the offending code. For more complex code with heavier logic, you can also locate the slow section by printing performance logs with a Stopwatch.
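
The article does not say which Stopwatch implementation was used; several common libraries ship one. The sketch below is therefore only a stand-in built on System.nanoTime, with hypothetical names (StepTimer, time, report), to show how timing named steps can narrow down a slow section of code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a library Stopwatch: records elapsed time per named step.
class StepTimer {
    private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();

    // Runs the step, records how long it took, and returns this timer for chaining.
    StepTimer time(String step, Runnable body) {
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            elapsedNanos.merge(step, System.nanoTime() - start, Long::sum);
        }
        return this;
    }

    // Elapsed time of one step in milliseconds; 0 for a step that never ran.
    long elapsedMillis(String step) {
        return TimeUnit.NANOSECONDS.toMillis(elapsedNanos.getOrDefault(step, 0L));
    }

    // One "step=Nms" entry per step, suitable for a performance log line.
    String report() {
        StringBuilder sb = new StringBuilder();
        elapsedNanos.forEach((step, nanos) ->
                sb.append(step).append('=')
                  .append(TimeUnit.NANOSECONDS.toMillis(nanos)).append("ms "));
        return sb.toString().trim();
    }

    // Simulates work in the demo below.
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        StepTimer timer = new StepTimer();
        timer.time("loadData", () -> sleep(30))
             .time("render", () -> sleep(10));
        System.out.println(timer.report());
    }
}
```

Logging one such line per request makes it easy to see which step dominates before reaching for a profiler.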

Common Java application diagnosis covers threads, stacks, GC, and so on.

jstack

jstack is usually used together with top: locate the Java process and its threads with top -H -p pid, then export the thread stacks with jstack -l pid. Because thread stacks are transient, you need several dumps; three dumps about 5 s apart is usually enough. Convert the Java thread ID found by top to hexadecimal to get the nid in the Java thread stack, and you can find the stack of the problematic thread.

Figure 5. Finding long-running Java threads with top -H -p

In Figure 5, thread 24985 has been running for a long time and may be the problem. Converted to hexadecimal it is 0x6199; searching the Java thread dump for that value locates the corresponding thread stack and the problem, as shown in Figure 6.
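
The decimal-to-hexadecimal step can be done with any calculator; as a minimal sketch, here it is in Java. The class and method names (ThreadIdHex, toNid) are hypothetical.

```java
// Hypothetical helper: converts the decimal thread ID shown by top -H
// into the hexadecimal form used for nid in jstack output.
class ThreadIdHex {
    static String toNid(long osThreadId) {
        return "0x" + Long.toHexString(osThreadId);
    }

    public static void main(String[] args) {
        System.out.println(toNid(24985)); // prints 0x6199
    }
}
```

Searching the jstack output for nid=0x6199 then leads to the stack of the thread that top reported.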

Figure 6. Viewing thread stacks with jstack

JProfiler

JProfiler is a powerful tool that can analyze CPU, heap, and memory, as shown in Figure 7. Combined with a load-testing tool, it can also sample and report how long code takes to run.

Figure 7. Memory analysis with JProfiler

Six, GC diagnosis

Java GC frees programmers from the risks of managing memory by hand, but the application pauses it causes are a new problem to solve. The JDK provides a range of tools for locating GC problems; the more commonly used ones are jstat and jmap, along with third-party tools such as MAT.

jstat

The jstat command prints detailed GC information, including Young GC and Full GC counts and heap information. The command format is

jstat -gcxxx -t pid, where -gcxxx stands for one of the -gc options (for example -gcutil), as shown in Figure 8.

Figure 8. jstat command example
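
jstat observes the JVM from outside. As a complementary sketch, not taken from the original article, the same kind of counters can also be read from inside the application through the standard GarbageCollectorMXBean API. The class name GcStats is hypothetical, and the collector names in the output depend on the JVM and the GC algorithm in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads per-collector GC counts and accumulated times in-process.
class GcStats {
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are -1 when the collector does not report them.
            lines.add(gc.getName()
                    + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        snapshot().forEach(System.out::println);
    }
}
```

Taking two snapshots some time apart and comparing the counts gives a rough GC frequency without attaching an external tool.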

jmap

jmap prints Java heap information for a process: jmap -heap pid. Use jmap -dump:file=xxx pid to dump the heap to a file, which can then be analyzed further with other tools.

MAT

MAT is a Java heap analysis tool. It provides intuitive diagnostic reports, has built-in OQL for SQL-like queries over the heap, and is powerful: outgoing references and incoming references let you trace object references.

Figure 9. MAT example

Figure 9 is an example of using MAT. MAT shows object size in two columns, Shallow size and Retained size. The former is the memory occupied by the object itself, excluding the objects it references. The latter is the sum of the Shallow sizes of the object itself and the objects it references directly or indirectly, that is, the amount of memory GC frees once the object is collected. In general, the latter is the one to watch.

For Java applications with large heaps (tens of GB), opening the dump in MAT requires a lot of memory.

A local development machine usually has too little memory to open it. The recommendation is to install a graphical environment and MAT on a non-production server and open the dump remotely. Alternatively, run the MAT command on the server to generate the heap index and copy the index to the local machine, although the heap information visible this way is limited.

To diagnose GC problems, we recommend adding -XX:+PrintGCDateStamps to the JVM arguments. Commonly used GC parameters are shown in Figure 10.

Figure 10. Common GC parameters

**For Java applications, top + jstack + jmap + MAT can locate most application and memory problems, and can be called an essential toolkit.** Sometimes diagnosing a Java application requires OS-level information as well, in which case more comprehensive tools such as Zabbix (which integrates OS and JVM monitoring) can be used. In distributed environments, infrastructure such as distributed tracing systems also provides strong support for application performance diagnosis.

Seven, performance optimization practice

Having introduced some commonly used performance diagnostic tools, we now draw on our own Java application tuning practice and share cases from the JVM layer, the application code layer, and the database layer.

JVM tuning: the pain of GC
When a system on the XX business platform was refactored, RMI was chosen as the internal remote call protocol. After the system went live, the service began to stop responding periodically, with pauses ranging from a few seconds to tens of seconds. The GC log showed that a Full GC occurred every hour after the service started. Because the system's heap was configured to be large, each Full GC paused the application for a long time, which had a large impact on real-time online services.

Analysis showed that the periodic Full GC had not occurred before the refactoring, so the RMI framework was suspected. Public documentation revealed that RMI's DGC (Distributed Garbage Collection) starts a daemon thread that periodically performs a Full GC to reclaim remote objects. Listing 2 shows the daemon thread's code.

Listing 2. DGC daemon thread source code

    private static class Daemon extends Thread {
        public void run() {
            for (;;) {
                //…
                long d = maxObjectInspectionAge();
                // l: the latency target (defined in the elided code)
                if (d >= l) {
                    System.gc(); // explicit request for a full collection
                    d = 0;
                }
                //…
            }
        }
    }

Once the problem was located, solving it was relatively easy. One option is to add the -XX:+DisableExplicitGC parameter, which disables explicit GC calls from the application. However, for a system that uses NIO this carries a risk of off-heap memory overflow, because NIO relies on System.gc() to reclaim direct buffers.
The other option is to increase the -Dsun.rmi.dgc.server.gcInterval and -Dsun.rmi.dgc.client.gcInterval parameters to lengthen the Full GC interval, and also to add -XX:+ExplicitGCInvokesConcurrent, which turns a fully stop-the-world Full GC into a concurrent GC cycle. This reduces application pause time and does not affect NIO applications.
As Figure 11 shows, the number of Full GCs dropped markedly after March, once the adjustment was in place.
Figure 11. Full GC monitoring statistics
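
After changing startup options it is worth confirming that the running JVM actually received them. The sketch below is an illustration rather than part of the original case: it uses the standard RuntimeMXBean.getInputArguments() call, and the class and method names (GcFlagCheck, hasOption) are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: checks which of the GC-related options discussed above
// were passed to the running JVM.
class GcFlagCheck {
    // True when any JVM argument starts with the given prefix.
    static boolean hasOption(List<String> jvmArgs, String prefix) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String option : Arrays.asList(
                "-XX:+DisableExplicitGC",
                "-XX:+ExplicitGCInvokesConcurrent",
                "-Dsun.rmi.dgc.server.gcInterval=",
                "-Dsun.rmi.dgc.client.gcInterval=")) {
            System.out.println(option + " set: " + hasOption(jvmArgs, option));
        }
    }
}
```

Printing this once at startup leaves a record in the log of which GC options each deployment really ran with.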

GC tuning remains very necessary for applications with high concurrency and heavy data interaction, especially because the default JVM parameters usually do not meet business needs and dedicated tuning is required. There is plenty of public material on reading GC logs, so this article does not go into it.

GC tuning has three basic goals: reduce GC frequency, by enlarging the heap and by creating fewer unnecessary objects; reduce GC pause time, by shrinking the heap or using the CMS GC algorithm; and avoid Full GC, by adjusting the CMS trigger ratio, avoiding promotion failure and concurrent mode failure (allocate more space to the old generation and increase the number of GC threads to speed up collection), and reducing the creation of large objects.

Application-layer tuning: sniffing out code smells
Tuning at the application layer means analyzing the root causes of inefficient code, and it is undoubtedly one of the best ways to improve Java application performance.

After a routine release of one commercial advertising system (load-balanced with Nginx), the load on several machines rose sharply and CPU utilization was quickly maxed out. We rolled back the release as an emergency measure and preserved the scene on one of the servers with jmap and jstack.

Figure 12. Analyzing the preserved heap dump with MAT

Figure 12 shows the preserved scene. MAT's analysis of the dump data showed that the most numerous in-memory objects were byte[] and java.util.HashMap$Entry, and that the java.util.HashMap$Entry objects contained a circular reference. The preliminary conclusion was that an infinite loop may have occurred during a HashMap put (in the figure, the next references of java.util.HashMap$Entry 0x2add6d992cb8 and 0x2add6d992ce8 form a cycle).

Consulting the relevant documentation confirmed that this is a typical case of incorrect concurrent use ( http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6423457 ). In short, HashMap is not thread-safe: when multiple threads put concurrently, resizing the internal array can turn one of HashMap's internal linked lists into a ring, resulting in an infinite loop.

The biggest change in this release was caching website data in memory to improve system performance, combined with lazy loading, as shown in Listing 3.

Listing 3. Website Data lazy loading code

    private boolean isResetDomains() {
        if (CollectionUtils.isEmpty(domainMap)) {
            // Fetch website details from the remote HTTP interface
            List<UnionDomain> newDomains = unionDomainHttpClient
                    .queryAllUnionDomain();
            if (CollectionUtils.isEmpty(domainMap)) {
                domainMap = new HashMap<Long, UnionDomain>();
                for (UnionDomain domain : newDomains) {
                    if (domain != null) {
                        domainMap.put(domain.getSubdomainId(), domain);
                    }
                }
            }
            return true;
        }
        return false;
    }

As you can see, domainMap is a static shared resource of type HashMap; under multithreading its internal linked list can form a ring, causing an infinite loop.

Front-end Nginx connection and access logs showed that after the system restarted, Nginx had accumulated a large backlog of user requests. When the Resin container started, these requests flooded into the application and many of them triggered the website data request and initialization at the same time, which produced the HashMap concurrency problem. Once the fault was located, the fix was relatively simple. The main options are:

(1) use ConcurrentHashMap or a synchronized block to solve the concurrency problem;

(2) load the website cache fully before the system starts, removing lazy loading;

(3) replace the local cache with a distributed cache.
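
Option (1) could look roughly like the following. This is a sketch under stated assumptions, not the fix the authors actually shipped: DomainCache and its loader are hypothetical names, String stands in for UnionDomain so the example is self-contained, and the supplier stands in for the remote HTTP query. The map is built completely before it is published, and only one thread performs the load.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical thread-safe variant of the lazy-loading cache in Listing 3.
class DomainCache {
    // volatile: readers see the map only after it has been fully populated.
    private volatile Map<Long, String> domainMap;
    private final Supplier<Map<Long, String>> loader; // stands in for the remote query

    DomainCache(Supplier<Map<Long, String>> loader) {
        this.loader = loader;
    }

    String get(Long subdomainId) {
        Map<Long, String> map = domainMap;
        if (map == null) {
            synchronized (this) {            // only one thread performs the load
                map = domainMap;
                if (map == null) {
                    map = new ConcurrentHashMap<>(loader.get());
                    domainMap = map;         // publish the finished map
                }
            }
        }
        return map.get(subdomainId);
    }

    // Demo: many threads hit an empty cache at once; returns how often the loader ran.
    static int concurrentLoadCount(int threads) {
        AtomicInteger loads = new AtomicInteger();
        DomainCache cache = new DomainCache(() -> {
            loads.incrementAndGet();
            return Collections.singletonMap(1L, "example.com");
        });
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> cache.get(1L));
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return loads.get();
    }

    public static void main(String[] args) {
        System.out.println(concurrentLoadCount(16)); // the loader runs once
    }
}
```

Compared with Listing 3, no thread can ever observe a half-filled map, and a burst of requests after a restart triggers a single remote query instead of one per request.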

For locating bad code, beyond code review in the conventional sense, tools such as MAT can quickly pinpoint system performance bottlenecks to a certain extent. But in cases tied to specific scenarios or business data, you need code review, performance testing tools, data simulation, or even traffic diversion from production to finally confirm the source of a performance problem. Below are some characteristics of bad code that we have summarized, for reference:

(1) poor code readability and no basic coding standards;

(2) too many objects created or overly large objects created, memory leaks, and so on;

(3) too many I/O stream operations, or streams that are never closed;

(4) too many database operations, or transactions that are too long;

(5) synchronization used in the wrong scenarios;

(6) time-consuming operations inside loop iterations.

Database-layer tuning: the deadlock nightmare
For most Java applications, interacting with a database is very common, and especially for applications with strict data consistency requirements such as OLTP, database performance directly affects the performance of the whole application. As the platform through which advertisers publish and deliver ads, the Sogou business platform has high requirements for the timeliness and consistency of its material, and we have accumulated some experience in optimizing relational databases.

For the advertising material database, the high frequency of operations (especially through batch material tools) very easily causes database deadlocks. One typical scenario is adjusting the bid on advertising material. Customers tend to adjust bids frequently, which indirectly puts heavy load on the database and also raises the likelihood of deadlock. The following case from a Sogou business platform advertising system illustrates bid adjustment on advertising material.

One day, traffic to a commercial advertising system surged, which raised system load and caused frequent database deadlocks. The deadlock statements are shown in Figure 13.

Figure 13. Deadlock statements

The groupdomain table has three single-column indexes, idx_groupdomain_accountid (accountid), idx_groupdomain_groupid (groupid), and PRIMARY (groupdomainid), and uses the MySQL InnoDB engine.

This scenario occurs when a group bid is updated. The entities involved are the group, the group industry (groupindus table), and the group website (groupdomain table).

When a group bid is updated, if the group industry bid uses the group bid (indicated by the isusegroupprice flag) and the group website bid uses the group industry bid (indicated by the isuseindusprice flag), then the group website bids must be updated as well. Because each group can have up to 3,000 websites, updating a group bid keeps the related records locked for a long time.

From the deadlock above, both transaction 1 and transaction 2 chose the single-column index idx_groupdomain_accountid. Under the locking behavior of the MySQL InnoDB engine, a transaction uses only one index, and once a secondary index has been locked it also tries to lock the primary key index. Further analysis shows that transaction 1 is requesting a lock on the secondary index idx_groupdomain_accountid held by transaction 2 (lock range "space id 5726 page no 8658 n bits 824 index"), while transaction 2 has already locked that secondary index ("space id 5726 page no 8658 n bits 824 index") and is waiting for a lock on the primary key index PRIMARY. Because transaction 2 waited too long or held its lock for a long time without releasing it, transaction 1 was eventually rolled back.

Tracing that day's access logs showed that a customer had used a script to issue a large number of group bid modifications, leaving many transactions waiting in a cycle for the previous transaction to release its lock on the primary key PRIMARY index. The root of the problem is actually the limited use the MySQL InnoDB engine makes of indexes; on an Oracle database this problem is less pronounced.

The natural solution is to have a single transaction lock as few records as possible, which greatly reduces the probability of deadlock. In the end we used a composite index on (accountid, groupid), which reduced the number of records locked by a single transaction and also isolated the records of promotion groups under different plans from one another, thereby lowering the probability of this kind of deadlock.

Generally speaking, we approach database-layer tuning from the following angles:

(1) optimization at the SQL statement level: slow SQL analysis, index analysis and tuning, transaction splitting, and so on;

(2) optimization at the database configuration level: for example field design, cache size adjustment, disk I/O and other database parameter tuning, data defragmentation, and so on;

(3) optimization at the database structure level: vertical splitting, horizontal splitting, and so on;

(4) choosing a suitable database engine or type for each scenario, for example considering the introduction of NoSQL.

Eight, summary and recommendations

Performance tuning likewise follows the 80/20 rule: 80% of performance problems are caused by 20% of the code, so optimizing the critical code yields twice the result for half the effort. At the same time, optimize on demand; over-optimization may introduce more problems. Java performance optimization requires not only understanding the system architecture and application code but also paying attention to the JVM layer and even the underlying operating system. In summary, consider the following main points:

1) Tuning of basic performance

Basic performance here refers to upgrades and optimization at the hardware or operating system level, such as network tuning, operating system version upgrades, and hardware optimization. For example, using F5 and introducing SSD drives, or the NIO improvements in newer Linux versions, can greatly boost application performance;

2) Database performance optimization

This includes common measures such as transaction splitting, index tuning, SQL optimization, and the introduction of NoSQL. For example, introducing asynchronous processing when splitting transactions and settling for eventual consistency, or introducing various NoSQL databases for specific scenarios, can greatly relieve the shortcomings of traditional databases under high concurrency;

3) Application architecture optimization

Introduce new computing or storage frameworks and use their new features to remove the original cluster's computing performance bottlenecks; or introduce distributed strategies to scale computing and storage horizontally, including precomputation and other typical space-for-time approaches. All of these can reduce system load to some extent;

4) Optimization at the business level

Technology is not the only way to improve system performance. In many scenarios with performance problems, a large part of the cause is actually a particular business scenario; if the business can work around or adjust it, that is often the most effective approach.

[Reposted from] https://mp.weixin.qq.com/s/wK7Yb_f_AY9miElZGEBQJQ
