JVM series (X): How to Optimize Java GC (translation)

Reposted from: https://www.cnblogs.com/ityouknow/p/7653129.html

This article is a translation by CrowHawk, originally published as "How to Optimize Java GC (translation)". It is a classic on Java GC tuning.

Sangmin Lee published "How to Tune Java Garbage Collection" on the Cubrid blog as the third article in the "Become a Java GC Expert" series. The author is Korean, and the article was written before the release of JDK 1.8, so it is slightly outdated in places, but the content as a whole is still very valuable. The translator had read earlier translations of the article, found many errors and stiff or unclear passages in them, and therefore decided to produce a new translation to share.

This is the third article in the "Become a Java GC Expert" series. In the first article, "Understanding Java GC", we learned how the different GC algorithms work, how GC operates, the concepts of the young generation and the old generation, the five GC types available in JDK 7, and the performance impact of each.

In the second article, "How to Monitor Java GC", the author explained how the JVM performs GC in real time, how to monitor GC, and which tools make that process faster and more efficient.

In this third article, the author introduces some of the best GC tuning options, based on real production cases. We assume that you have understood the first two articles in the series, so to get the most out of this one, I suggest reading them carefully first.

 

Is GC optimization necessary?

Or, more precisely, is GC optimization necessary for Java-based services? The answer is no: in some cases GC optimization can be skipped for Java-based services, provided that the running Java system has the following options or behavior:

  • Memory size has been specified with the -Xms and -Xmx options
  • It runs in server mode (the -server option)
  • No timeout logs or similar error logs remain in the system

In other words, if you have not set the memory size manually, or if the system prints too many timeout logs, then you need to optimize the system's GC.

But always keep one thing in mind: GC tuning is the last task to be done.

Now think about the fundamental reason for GC optimization. The garbage collector's job is to clear objects created in Java. The number of objects it has to clear and the number of GCs it has to perform both depend on the number of objects that have been created. Therefore, to make your system perform well under GC, you first need to reduce the number of objects created.

As the saying goes, "Rome was not built in a day." We should start with the following small details when coding; otherwise, trivial pieces of bad code accumulate and make the GC's work heavy and hard to manage:

  • Use StringBuilder or StringBuffer instead of String
  • Minimize log output
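To make the first point concrete, here is a minimal sketch comparing the two styles. The class and method names are made up for illustration. Both methods return the same text, but the String version produces a new String object on every loop iteration, while the StringBuilder version appends into one growing buffer.

```java
final class ConcatDemo {
    // Builds "0,1,2,...,n-1" with repeated String concatenation.
    // String is immutable, so every += produces a new String object
    // that becomes garbage on the next iteration.
    static String joinWithString(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                result += ",";
            }
            result += i;
        }
        return result;
    }

    // Produces the same text, but appends into a single reusable buffer.
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithString(5));  // 0,1,2,3,4
        System.out.println(joinWithBuilder(5)); // 0,1,2,3,4
    }
}
```

How much garbage the first version really creates depends on the JDK version and on what the JIT compiler can optimize away, so treat this as an illustration of the habit rather than a benchmark.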

Even so, some cases are beyond our control. XML and JSON parsing often uses the most memory. Even if we use String as little as possible and log as little as possible, a large amount of temporary memory (roughly 10-100 MB) is still used to parse an XML or JSON document, and it is very difficult to give up XML and JSON. For now, just be aware that this process can take up a lot of memory.

If the application's memory usage has improved after several rounds of such optimization, you can start optimizing the GC.

The author summarizes the purposes of GC optimization as two:

  1. Minimize the number of objects promoted to the old generation
  2. Reduce Full GC execution time

 

Minimizing the number of objects promoted to the old generation

Except for the G1 collector, which can be used in JDK 7 and later, all the GCs provided by the Oracle JVM are generational. In generational GC, an object is created in the Eden area and then moves between the Survivor areas, and the objects that still remain after that are promoted to the old generation. Some objects are so large that they are passed to the old generation directly after being created in Eden. GC in the old generation takes considerably longer than GC in the young generation, so reducing the number of objects that enter the old generation can significantly reduce the frequency of Full GC. You might think that reducing the number of promoted objects means keeping them in the young generation, but that is not the case. What you can adjust instead is the size of the young generation.

 

Reducing Full GC time

Full GC takes much longer than Minor GC, so if a Full GC takes too long (more than 1 s), timeout errors are likely.

  • If you shrink the old generation to reduce Full GC time, you may get an OutOfMemoryError or more frequent Full GCs.
  • Conversely, if you enlarge the old generation to reduce Full GC frequency, each Full GC may take longer.

Therefore, you need to set the size of the old generation to a "proper" value.

 

GC options that affect performance

As I mentioned at the end of the first article, "Understanding Java GC", do not think, "Someone got good performance from the GC options he set, so why don't we just copy them?" The size and lifetime of the objects created differ from one Web service to another.

As a simple example, suppose one task runs under conditions A, B, C, D and E, and another identical task runs under only A and B. Which one finishes faster? Common sense says the latter.

The same holds for Java GC options: setting many options does not speed up GC, and may even make it slower. The basic principle of GC tuning is to apply different GC options to two or more servers, compare their performance, and then apply the options that have proven to improve performance or reduce GC time to the production servers.

The table below lists the memory-size-related GC options that affect performance.

Table 1: JVM options to consider for GC optimization
Type               Option              Description
Heap size          -Xms                Heap size when the JVM starts
                   -Xmx                Maximum heap size
Young generation   -XX:NewRatio        Ratio of the young generation to the old generation
                   -XX:NewSize         Size of the young generation
                   -XX:SurvivorRatio   Ratio of the Eden area to the Survivor areas

The options I use most often when tuning GC are -Xms, -Xmx, and -XX:NewRatio. -Xms and -Xmx are always required, so how NewRatio is set has a major impact on GC performance.
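Before comparing servers, it can help to confirm which sizing flags a JVM actually received. The following is a minimal sketch using the standard java.lang.management API. The class name HeapSettings and its helper methods are made up for illustration, and the memory pool names it prints depend on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;
import java.util.stream.Collectors;

final class HeapSettings {
    // Returns the heap-sizing flags this JVM was started with, if any.
    static List<String> sizingFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                .filter(a -> a.startsWith("-Xms") || a.startsWith("-Xmx")
                        || a.startsWith("-XX:NewRatio") || a.startsWith("-XX:NewSize")
                        || a.startsWith("-XX:SurvivorRatio"))
                .collect(Collectors.toList());
    }

    // Upper bound on the heap as reported by the runtime, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Sizing flags: " + sizingFlags());
        System.out.println("Max heap: " + maxHeapMb() + " MB");
        // Lists every memory pool (heap and non-heap) known to this JVM.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the pool has no defined maximum.
            System.out.println(pool.getName() + ": committed="
                    + usage.getCommitted() / 1024 + " KB, max=" + usage.getMax() + " bytes");
        }
    }
}
```

Running it with, for example, `java -Xms1g -Xmx1g -XX:NewRatio=3 HeapSettings.java` (JDK 11 or later for single-file launch) should list those three flags first.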

Some people may ask how to set the size of the permanent generation. You can set it with the -XX:PermSize and -XX:MaxPermSize options, but remember that you only need to do so when an OutOfMemoryError occurs because of the permanent generation size.

Another factor that affects GC performance is the type of garbage collector. The following table lists the options for each GC type (based on JDK 6.0):

Table 2: Options by GC type
GC type                  Option                                     Remark
Serial GC                -XX:+UseSerialGC
Parallel GC              -XX:+UseParallelGC
                         -XX:ParallelGCThreads=value
Parallel Compacting GC   -XX:+UseParallelOldGC
CMS GC                   -XX:+UseConcMarkSweepGC
                         -XX:+UseParNewGC
                         -XX:+CMSParallelRemarkEnabled
                         -XX:CMSInitiatingOccupancyFraction=value
                         -XX:+UseCMSInitiatingOccupancyOnly
G1                       -XX:+UnlockExperimentalVMOptions           In JDK 6, both options must be used together
                         -XX:+UseG1GC

Except for G1, the GC type is switched by setting the option in the first row of each GC type in the table. The most common and least intrusive type is Serial GC, which is optimized for client systems.

Many options affect GC performance, but the ones described above have the most significant effect. Remember that setting too many options does not necessarily improve GC performance.

 

GC optimization process

The process of optimizing GC is much like any general performance-improvement process. The following is the process I use:

1. Monitor GC status

You need to monitor the GC status of the running system to see how GC behaves. For details, see the second article of the series, "How to Monitor Java GC".

2. Analyze the monitoring results and decide whether GC optimization is needed

After checking the GC status, analyze the monitoring results and decide whether GC optimization is needed. If the analysis shows that GC takes only 0.1 to 0.3 seconds, there is no need to spend time on GC optimization. But if GC takes 1 to 3 seconds, or even more than 10 seconds, GC optimization is necessary.

However, if you have allocated about 10 GB of memory to Java and cannot reduce it, there is little that GC optimization can do. Before tuning, think about why you need to allocate so much memory. If you allocate 1 GB or 2 GB and an OutOfMemoryError occurs, you should take a heap dump to find and eliminate the cause.

Note:

A heap dump is a file that captures memory, used to inspect the Java objects and data in memory. It can be created with the jmap command included in the JDK. While the file is being created, the Java process is suspended, so do not create it while the system is serving traffic.

You can find detailed instructions on heap dumps on the Internet. Korean readers can refer directly to the book I published last year: "The Story of Troubleshooting for Java Developers and System Operators" (Sangmin Lee, Hanbit Media, 2011, 416 pages).

3. Set the GC type and memory size

If you decide to optimize GC, you need to choose a GC type and set the memory size. If you have several servers, set different GC options on each one, as described above, and compare the differences.

4. Analyze the results

Start collecting data after setting the GC options, and analyze the results only after collecting for at least 24 hours. If you are lucky, you will find the best GC options for the system right away. If not, analyze the logs, check how memory is allocated, and keep adjusting the GC type and memory size until you find the best options for the system.

5. If the results are satisfactory, apply the options to all servers and finish

If the GC optimization results are satisfactory, apply the options to all servers and stop tuning.

The following sections describe the work done in each step.

 

Monitoring GC status and analyzing the results

The best way to check the GC status of a running Web Application Server (WAS) is the jstat command. The author already introduced jstat in "How to Monitor Java GC", so this article focuses on which data to check.

The following example shows the state of a JVM whose GC has not yet been optimized (it is not a production server, though).

$ jstat -gcutil 21719 1s
   S0     S1      E      O      P    YGC     YGCT    FGC    FGCT      GCT
48.66   0.00  48.10  49.70  77.45   3428  172.623      3  59.050  231.673
48.66   0.00  48.10  49.70  77.45   3428  172.623      3  59.050  231.673

Look at YGC (the number of young generation GCs from application start to the time of sampling) and YGCT (the time in seconds spent in young generation GC over the same period). Dividing YGCT by YGC shows that each young generation GC takes about 50 ms on average. That is a small value, so we need not worry about the young generation GC's effect on performance.

Now look at FGC (the number of Full GCs from application start to the time of sampling) and FGCT (the time in seconds spent in Full GC over the same period). Dividing FGCT by FGC shows that each Full GC takes 19.68 s on average. Perhaps each of the three Full GCs took 19.68 s; perhaps two of them took only 1 s each and the third took about 57 s. In either case, GC optimization is necessary.
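The two divisions can be checked with a few lines of code. This is a minimal sketch that parses the data row from the jstat -gcutil output above. The class name JstatAverages is made up for illustration, and the column positions assume the ten-column layout shown in this article (S0 S1 E O P YGC YGCT FGC FGCT GCT).

```java
import java.util.Locale;

final class JstatAverages {
    // Average time per collection in seconds: total time divided by count.
    static double averageSeconds(double totalSeconds, long count) {
        return count == 0 ? 0.0 : totalSeconds / count;
    }

    public static void main(String[] args) {
        // Columns: S0 S1 E O P YGC YGCT FGC FGCT GCT
        String row = "48.66 0.00 48.10 49.70 77.45 3428 172.623 3 59.050 231.673";
        String[] f = row.trim().split("\\s+");
        long ygc = Long.parseLong(f[5]);
        double ygct = Double.parseDouble(f[6]);
        long fgc = Long.parseLong(f[7]);
        double fgct = Double.parseDouble(f[8]);

        // Prints "avg Young GC: 50 ms"
        System.out.println(String.format(Locale.ROOT,
                "avg Young GC: %.0f ms", averageSeconds(ygct, ygc) * 1000));
        // Prints "avg Full GC: 19.68 s"
        System.out.println(String.format(Locale.ROOT,
                "avg Full GC: %.2f s", averageSeconds(fgct, fgc)));
    }
}
```

Keep in mind that these are averages since JVM start, so a single very long pause can hide behind them.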

The jstat command makes it easy to check GC status, but the best way to analyze GC is to generate logs with the -verbosegc option. I explained how to analyze these logs in the previous article. HPJMeter is my favorite tool for analyzing -verbosegc logs. It is easy to use, and it shows the distribution of GC execution times and the frequency of GC at a glance.
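If neither jstat nor a log file is at hand, the cumulative counts and times have a rough in-process counterpart in the standard java.lang.management API. The following is a minimal sketch, not a replacement for -verbosegc logs. The class name GcCounters is made up for illustration, and the collector names it prints depend on the GC type in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcCounters {
    // Sums the collection counts of all collectors in this JVM.
    // getCollectionCount() returns -1 when the count is undefined,
    // so only positive values are added.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is the approximate accumulated time in milliseconds.
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("Total collections: " + totalCollections());
    }
}
```

Like jstat, this only gives totals since JVM start; per-event pause times still require the GC log.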

If GC execution meets all of the following conditions, GC optimization is not needed:

  • Minor GC is fast (less than 50 ms)
  • Minor GC is not frequent (about once every 10 s)
  • Full GC is fast (less than 1 s)
  • Full GC is not frequent (about once every 10 min)

The values in parentheses are not absolute; they vary with the state of the service. Some services may require Full GC to finish in under 0.9 s, while others can accept a looser limit. Therefore, decide whether GC optimization is needed by the criteria appropriate to each service.
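Because the limits differ from service to service, a check like this is easier to reuse when the thresholds are parameters rather than constants. The following is a minimal sketch, assuming a hypothetical GcThresholds class (not part of any library) whose default values are simply the numbers from the list above.

```java
final class GcThresholds {
    final double maxMinorPauseMs;     // Minor GC should be faster than this
    final double minMinorIntervalSec; // Minor GC should be at least this far apart
    final double maxFullPauseSec;     // Full GC should be faster than this
    final double minFullIntervalMin;  // Full GC should be at least this far apart

    GcThresholds(double maxMinorPauseMs, double minMinorIntervalSec,
                 double maxFullPauseSec, double minFullIntervalMin) {
        this.maxMinorPauseMs = maxMinorPauseMs;
        this.minMinorIntervalSec = minMinorIntervalSec;
        this.maxFullPauseSec = maxFullPauseSec;
        this.minFullIntervalMin = minFullIntervalMin;
    }

    // The example values from the article: 50 ms, 10 s, 1 s, 10 min.
    static GcThresholds articleDefaults() {
        return new GcThresholds(50, 10, 1, 10);
    }

    // True only if all four conditions hold, i.e. no GC tuning is needed.
    boolean isHealthy(double minorPauseMs, double minorIntervalSec,
                      double fullPauseSec, double fullIntervalMin) {
        return minorPauseMs < maxMinorPauseMs
                && minorIntervalSec >= minMinorIntervalSec
                && fullPauseSec < maxFullPauseSec
                && fullIntervalMin >= minFullIntervalMin;
    }

    public static void main(String[] args) {
        GcThresholds limits = articleDefaults();
        // Hypothetical measurements: 37 ms Minor GC every 12 s, 0.8 s Full GC every 15 min.
        System.out.println(limits.isHealthy(37, 12, 0.8, 15));   // true
        // Same service, but Full GC now takes 1.389 s on average.
        System.out.println(limits.isHealthy(37, 12, 1.389, 15)); // false
    }
}
```

A service with a stricter requirement would construct its own instance, for example `new GcThresholds(50, 10, 0.9, 10)`.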

When checking GC status, do not look only at Minor GC and Full GC times; also pay attention to the number of GC executions. If the young generation is too small, Minor GC runs very frequently (sometimes once per second or more). In addition, more objects are promoted to the old generation, which increases Full GC frequency. Therefore, run jstat with the -gccapacity option to check how much space each area occupies.

 

Setting the GC type and memory size

Setting the GC type

The Oracle JVM has five GC types, but before JDK 7 you could choose only among Parallel GC, Parallel Compacting GC, and CMS GC. There is no fixed rule or principle for deciding which one to use.

So how do we choose? The best way is to try all three. One thing is clear, though: CMS GC is usually faster than the parallel GCs (because it is a concurrent collector). If that were always true, we could simply pick CMS GC. But CMS GC is not always faster: when a concurrent mode failure occurs, it is slower than the parallel GCs.

Concurrent mode failure

Now let's take a deeper look at concurrent mode failure.

The biggest difference between the parallel GCs and CMS GC is compaction: the parallel GCs use a mark-compact algorithm, while CMS GC uses a mark-sweep algorithm (for details, see the translator's article "GC algorithm and memory allocation strategy"). The compact step removes memory fragmentation by moving objects, closing the gaps between allocated regions.

With a parallel GC, compaction is performed every time a Full GC runs, which takes a long time. After the Full GC, however, memory allocation is faster, because free memory is contiguous and can be handed out sequentially.

CMS GC, by contrast, does not compact, so it runs faster. But because memory is not compacted, many free gaps are left scattered through it, much like a disk before defragmentation, and there may be no contiguous space large enough for a large object. For example, the old generation may have 300 MB free in total and still be unable to store a single 10 MB object contiguously. In that case, a "concurrent mode failure" warning is reported and compaction is performed. With CMS GC, however, this compaction takes much longer than it does with the parallel GCs, which leads to another problem. For a detailed description of concurrent mode failure, see "Understanding CMS GC Logs", written by an Oracle engineer.
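The 300 MB example can be reproduced with a toy model. The sketch below represents the old generation's free space as a list of chunk sizes. It is only an illustration of fragmentation, not how HotSpot actually manages its free lists, and the class name FragmentationDemo and the 5 MB chunk size are assumptions made for the example.

```java
import java.util.Arrays;

final class FragmentationDemo {
    // An allocation succeeds only if a single free chunk is large enough.
    static boolean canAllocate(int[] freeChunksMb, int requestMb) {
        for (int chunk : freeChunksMb) {
            if (chunk >= requestMb) {
                return true;
            }
        }
        return false;
    }

    // Compaction slides live objects together, so all free space
    // ends up as one contiguous chunk.
    static int[] compact(int[] freeChunksMb) {
        return new int[] { Arrays.stream(freeChunksMb).sum() };
    }

    public static void main(String[] args) {
        // 60 scattered gaps of 5 MB each: 300 MB free in total.
        int[] free = new int[60];
        Arrays.fill(free, 5);

        System.out.println(canAllocate(free, 10));          // false
        System.out.println(canAllocate(compact(free), 10)); // true
    }
}
```

The total free space is the same before and after compaction; only its layout changes, which is why the allocation that failed at first succeeds afterwards.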

In short, you need to choose the GC type that best suits your system.

Every system has a GC type that suits it best, and you have to find it. If you have six servers, I suggest setting the same options on each pair of servers, adding the -verbosegc option, and then analyzing the results.

Setting the memory size

The relationship between memory size, the number of GC runs, and GC execution time is as follows:

Large memory space

  • The number of GC runs decreases
  • GC execution time increases

Small memory space

  • The number of GC runs increases
  • GC execution time decreases

There is no standard answer for how large memory should be. If server resources are sufficient and Full GC completes within 1 s, setting memory to 10 GB is fine. But most servers are not in that state: with 10 GB of memory, Full GC takes 10 to 30 s, the exact time depending on the size of the objects.

So how should we set the memory size? I usually recommend 500 MB, but that does not mean you should set the WAS memory with -Xms500m and -Xmx500m. Look at the state before GC optimization: if 300 MB of memory remains in use after a Full GC, then 1 GB is a good choice (300 MB (default usage) + 500 MB (minimum for the old generation) + 200 MB (free memory)). In other words, you should reserve at least 500 MB for the old generation. So if you have three servers, you can set their memory to 1 GB, 1.5 GB, and 2 GB respectively, and then check the results.

In theory, GC speed should rank 1 GB > 1.5 GB > 2 GB, with 1 GB the fastest. But there is no guarantee that Full GC will take 1 s with 1 GB of memory and 2 s with 2 GB; the actual time depends on server performance and object size. Therefore, the best approach is to set up as many measurement configurations as possible and monitor them.

When setting the memory size, there is one more option to set: NewRatio. NewRatio is the ratio of the old generation to the young generation. With -XX:NewRatio=1, young generation : old generation = 1:1, so for a 1 GB heap, young : old = 500 MB : 500 MB. With NewRatio=2, young : old = 1:2. Thus, the larger the NewRatio value, the larger the old generation and the smaller the young generation.
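The arithmetic behind NewRatio is a one-line division: the young generation gets one part of the heap and the old generation gets NewRatio parts. The following is a minimal sketch with a made-up class name. It uses 1 GB = 1024 MB, so the numbers differ slightly from the rounded 500 MB above, and a real JVM additionally aligns the sizes, so treat the results as approximations.

```java
final class NewRatioSplit {
    // With -XX:NewRatio=n, old : young = n : 1, so young = heap / (n + 1).
    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    // The old generation gets whatever the young generation does not.
    static long oldMb(long heapMb, int newRatio) {
        return heapMb - youngMb(heapMb, newRatio);
    }

    public static void main(String[] args) {
        long heapMb = 1024;
        for (int ratio = 1; ratio <= 4; ratio++) {
            System.out.println("NewRatio=" + ratio
                    + ": young=" + youngMb(heapMb, ratio) + " MB"
                    + ", old=" + oldMb(heapMb, ratio) + " MB");
        }
        // NewRatio=1: young=512 MB, old=512 MB
        // NewRatio=2: young=341 MB, old=683 MB
        // NewRatio=3: young=256 MB, old=768 MB
        // NewRatio=4: young=204 MB, old=820 MB
    }
}
```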

You might think that setting NewRatio to 1 would be the best choice, but that is not so. In the author's experience, the overall GC state is better when NewRatio is set to 2 or 3.

What is the fastest way to complete GC optimization? Comparing the results of performance tests. To set different options on each server and monitor them, it is best to look at the data after one or two days. When you tune GC through performance tests, however, you must make sure each test runs under the same load and in the same environment. Controlling the load precisely is hard even for professional performance testers and takes a lot of preparation. It is therefore more convenient to apply the options directly to the running service and wait for the results, even though that takes longer.

Analyzing GC optimization results

After setting the GC options and the -verbosegc option, use the tail command to make sure the log is being generated correctly. If an option is set incorrectly or no log is generated, your time is wasted. If the log is fine, let it accumulate for one or two days and then check the results. The easiest way is to copy the log from the server to your local PC and analyze it with HPJMeter.

When analyzing the results, pay attention to the following points (the order of priority reflects the author's own experience; I think the most important factor in choosing GC options is Full GC execution time):

  • Time taken by a single Full GC
  • Time taken by a single Minor GC
  • Interval between Full GCs
  • Interval between Minor GCs
  • Total time taken by Full GC
  • Total time taken by Minor GC
  • Total time taken by all GC
  • Number of Full GC executions
  • Number of Minor GC executions
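Most of the figures in this list are simple aggregates over GC events. The following is a minimal sketch of how they can be computed once the events have been extracted from a log. The Event type, the class name GcSummary, and the sample numbers are all made up for illustration, and the events are assumed to be sorted by start time. The same methods apply to a list of Minor GC events.

```java
import java.util.Arrays;
import java.util.List;

final class GcSummary {
    // One GC event: start time (seconds since JVM start) and pause length (seconds).
    static final class Event {
        final double startSec;
        final double pauseSec;

        Event(double startSec, double pauseSec) {
            this.startSec = startSec;
            this.pauseSec = pauseSec;
        }
    }

    // Total time spent in the given events.
    static double totalPause(List<Event> events) {
        double sum = 0;
        for (Event e : events) {
            sum += e.pauseSec;
        }
        return sum;
    }

    // Longest single event, or 0 for an empty list.
    static double maxPause(List<Event> events) {
        double max = 0;
        for (Event e : events) {
            max = Math.max(max, e.pauseSec);
        }
        return max;
    }

    // Mean gap between consecutive start times; 0 with fewer than two events.
    static double averageInterval(List<Event> events) {
        if (events.size() < 2) {
            return 0;
        }
        double span = events.get(events.size() - 1).startSec - events.get(0).startSec;
        return span / (events.size() - 1);
    }

    public static void main(String[] args) {
        // Hypothetical Full GC events: at 0 s, 600 s and 1800 s.
        List<Event> fullGcs = Arrays.asList(
                new Event(0, 1.0), new Event(600, 2.0), new Event(1800, 1.5));
        System.out.println("count=" + fullGcs.size());
        System.out.println("total=" + totalPause(fullGcs) + " s");
        System.out.println("max=" + maxPause(fullGcs) + " s");
        System.out.println("avg interval=" + averageInterval(fullGcs) + " s");
    }
}
```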

Finding the best GC options at the first attempt is a matter of luck, and most of the time we are not that lucky. Be careful when tuning GC, because trying to finish all the optimization in one go may result in an OutOfMemoryError.

 

Optimization examples

So far, we have covered GC optimization in theory. Now it is time to put the theory into practice; a few examples will give us a better understanding of GC optimization.

Example 1

The following example is GC optimization for Service S. In the recently developed Service S, Full GC was taking too long.

Here is the output of jstat -gcutil:

   S0     S1      E      O      P    YGC     YGCT    FGC    FGCT      GCT
12.16   0.00   5.18  63.78  20.32     54    2.047      5   6.946    8.993

The values in the left-hand columns, up to the Perm area (P), are not important for the initial GC optimization; the values from YGC onward are what matter.

The average time of one Minor GC and one Full GC is shown in the following table:

Table 3: Average execution time of Minor GC and Full GC for Service S
GC type    GC executions   GC execution time   Average
Minor GC   54              2.047 s             37 ms
Full GC    5               6.946 s             1.389 s

37 ms is not bad for Minor GC, but 1.389 s for Full GC means that timeouts may occur frequently if a GC happens in a system whose database timeout is set to 1 s.

First, check how memory is used before starting GC optimization. Use the jstat -gccapacity command to check memory usage. The result on the author's server was as follows:

NGCMN NGCMX NGC S0C S1C EC OGCMN OGCMX OGC OC PGCMN PGCMX PGC PC YGC FGC
212992.0 212992.0 212992.0 21248.0 21248.0 170496.0 1884160.0 1884160.0 1884160.0 1884160.0 262144.0 262144.0 262144.0 262144.0 54 5

The key values are as follows:

  • Young generation size: 212,992 KB
  • Old generation size: 1,884,160 KB

Thus, excluding the permanent generation, the allocated memory totals 2 GB, and young generation : old generation = 1:9. To get more detailed results than jstat provides, the -verbosegc option was added to collect logs, and three servers were set up as follows (no other options were changed):

  • NewRatio = 2
  • NewRatio = 3
  • NewRatio = 4

A day later I checked the GC logs. Fortunately, no Full GC had occurred on any server after NewRatio was set.

Why? Because most objects are reclaimed soon after they are created: they are collected in the young generation and never promoted to the old generation.

In this situation, there is no need to change other options; just choose the most suitable NewRatio value. How do we determine the best NewRatio value? By analyzing the average Minor GC response time for each NewRatio value.

The average Minor GC response time for each setting was as follows:

  • NewRatio=2: 45 ms
  • NewRatio=3: 34 ms
  • NewRatio=4: 30 ms

Based on GC time, NewRatio=4 is the best value (even though the young generation is smallest at NewRatio=4). After the GC options were applied, no Full GC occurred on the server.

For reference, the following is the result of running jstat -gcutil after the service had been running for some time:

   S0     S1      E      O      P    YGC     YGCT    FGC    FGCT      GCT
 8.61   0.00  30.67  24.62  22.38   2424   30.219      0   0.000   30.219

You might think GC was infrequent because the server received few requests. In fact, although no Full GC was performed, Minor GC was executed 2,424 times.

Example 2

This is an example for Service A. Through the company's internal application performance management (APM) system, we found that the JVM was pausing for a long time (more than 8 seconds). We looked for the cause of the pauses and found that Full GC was taking too long, so we decided to optimize GC.

At the start of GC optimization, we added the -verbosegc option. The result is shown below:

Figure 1: GC stop-the-world (STW) time before optimization

The figure is one of the graphs generated by HPJMeter. The horizontal axis shows the time since the JVM started, and the vertical axis shows the duration of each GC. The green dots, labeled CMS, are Full GC results, and the blue dots, labeled Parallel Scavenge, are Minor GC results.

I said earlier that CMS GC is the fastest GC, but the results above show that some CMS collections took up to 15 s. What caused this? Remember what I said before: CMS gets significantly slower when compaction is performed. In addition, the service's memory was set with -Xms1g and -Xmx4g, and the memory actually allocated was 4 GB.

So I changed the GC type from CMS GC to Parallel GC, changed the memory size to 2 GB, and set NewRatio to 3. The result of jstat -gcutil after a few hours was as follows:

   S0     S1      E      O      P    YGC     YGCT    FGC    FGCT      GCT
 0.00  30.48   3.31  26.54  37.01    226   11.131      4  11.758   22.890

Full GC became faster, about 3 s each, a clear improvement over 15 s. But 3 s is still not fast enough, so the author created the following six cases:

  • Case 1: -XX:+UseParallelGC -Xms1536m -Xmx1536m -XX:NewRatio=2
  • Case 2: -XX:+UseParallelGC -Xms1536m -Xmx1536m -XX:NewRatio=3
  • Case 3: -XX:+UseParallelGC -Xms1g -Xmx1g -XX:NewRatio=3
  • Case 4: -XX:+UseParallelOldGC -Xms1536m -Xmx1536m -XX:NewRatio=2
  • Case 5: -XX:+UseParallelOldGC -Xms1536m -Xmx1536m -XX:NewRatio=3
  • Case 6: -XX:+UseParallelOldGC -Xms1g -Xmx1g -XX:NewRatio=3

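Which of these cases is fastest? The results show that the smaller the memory, the better the result. The figure below shows the result for Case 6, which performed best: its slowest response time was only 1.7 s, and the average response time was kept within 1 s.

Figure 2: Duration graph for Case 6

Based on the result above, the GC options were changed to those of Case 6. However, this caused an OutOfMemoryError every night. The exact cause is hard to explain in detail; in short, a batch program appeared to be causing a memory leak, and we are working on the related problems.

It is very dangerous to analyze GC logs for only a short time and then roll the resulting options out to all servers. Remember: GC optimization can be carried out without failure only when you carefully analyze both the behavior of the service and the GC logs.

We have looked at two GC optimization examples to see how GC optimization is carried out. As mentioned above, the GC options set in the examples can be applied unchanged to other servers only if those servers have the same CPU, operating system, and JDK version and run the same service. Beyond that, do not copy the options I used to your own application; they may not work equally well on your machines.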

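Summary

The author did not take a heap dump and analyze memory in detail, but tuned GC based on experience. Analyzing memory precisely can yield better optimization results, but that kind of analysis is generally suitable only when memory usage is relatively stable. If the service is heavily overloaded and uses a large amount of memory, it is advisable to tune GC based on prior experience.

The author has set G1 GC options on some services and run performance tests, but has not yet applied them to production. G1 GC showed faster speed than any other GC type, but you have to upgrade to JDK 7. In addition, its stability cannot be guaranteed yet; nobody knows whether a fatal error will occur at run time, so G1 GC is not yet suitable for deployment.

Once JDK 7 becomes truly stable (which is not to say it is unstable now) and WAS is optimized for JDK 7, G1 GC may finally work as expected, and on that day we may no longer need GC optimization.

For more details on GC optimization, look up the related material on Slideshare.com. I strongly recommend "Everything I Ever Learned About JVM Performance Tuning @Twitter" by Attila Szegedi, a Twitter engineer; please take the time to read it.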


Origin www.cnblogs.com/sharpest/p/10965502.html