JVM Monitoring and Tuning







1. JVM memory model and garbage collection algorithms
1. According to the Java Virtual Machine specification, the JVM divides memory into:
New (young generation)
Tenured (old generation)
Perm (permanent generation)
  Among them, New and Tenured belong to the heap; heap memory is allocated from the memory specified by the JVM startup parameter (e.g. -Xmx3G). Perm does not belong to the heap and is allocated directly by the virtual machine, but its size can be adjusted with parameters such as -XX:PermSize and -XX:MaxPermSize.

Young generation (New): newly allocated objects are placed in the young generation.
Old generation (Tenured): objects in the young generation that survive garbage collection are copied into the old generation.
Permanent generation (Perm): stores Class and Method meta information; its size depends on the scale of the project and the number of classes and methods. Generally 128M is enough, and the rule of thumb is to reserve 30% headroom.
New is divided into several parts:
Eden: Eden is used to store the objects just allocated by the JVM.
Survivor1
Survivor2: the two Survivor spaces are the same size. Objects in Eden that survive a garbage collection are copied back and forth between the two Survivors; when a certain condition is met, such as the number of copies, they are promoted to Tenured. In effect, the Survivors just extend an object's stay in the young generation, increasing its chance of being garbage collected there.
2. Garbage collection algorithms
  Garbage collection algorithms fall into three categories, all based on the mark-sweep (or mark-copy) algorithm:
Serial (single-threaded)
Parallel
Concurrent
  The JVM selects a suitable collection algorithm for each memory generation according to the machine's hardware configuration. For example, on a machine with more than one core it selects a parallel algorithm for the young generation; for the details of the selection, refer to the JVM tuning documentation.
  A brief explanation: a parallel algorithm uses multiple threads for garbage collection and pauses the program during collection, while a concurrent algorithm also collects with multiple threads but does not stop the application while doing so. Concurrent algorithms are therefore suitable for highly interactive programs. In our observation, the concurrent collector tends to shrink the young generation and in effect relies on a large old generation, which gives it lower throughput than the parallel collector.

  Another question is: when does garbage collection take place?
When the young generation fills up, an ordinary GC is triggered, which collects only the young generation. To be precise, a "full" young generation means Eden is full; a full Survivor does not trigger a GC.
When the old (Tenured) generation fills up, a Full GC is triggered, which collects the young and old generations together. A Full GC also unloads Class and Method meta information.
  Another question is: when is OutOfMemoryError thrown? Not simply when memory is exhausted, but when both of the following hold:
98% of the time is spent on memory reclamation, and
each reclamation recovers less than 2% of the memory.
  Meeting these two conditions triggers OutOfMemoryError, which leaves the system a small window to do some operations before going down, such as manually printing a heap dump.

2. Memory leaks and solutions
1. Symptoms before the system crashed:
Each garbage collection took longer and longer, growing from about 10ms to about 50ms, and Full GC time grew from about 0.5s to 4-5s.
Full GCs became more and more frequent, eventually less than 1 minute apart at the worst.
The old generation's memory usage grew and grew, and no memory was released in the old generation after each Full GC.
Eventually the system could no longer respond to new requests and gradually approached the OutOfMemoryError threshold.

2. Generate a heap dump file
Generate the current heap information through the JMX MBean; the result is a 3G hprof file (the size of the entire heap). If JMX is not enabled, the file can be generated with the Java jmap command.
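The dump can also be triggered programmatically through the platform's diagnostic MBean. A minimal sketch, assuming a HotSpot JVM (the HeapDumper class name is mine; HotSpotDiagnosticMXBean.dumpHeap is the real platform API):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    /**
     * Writes an hprof snapshot of the current JVM heap to filePath.
     * The path must end in ".hprof" and the file must not already exist.
     * live = true dumps only reachable objects (forces a GC first).
     */
    public static void dump(String filePath, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(filePath, live);
    }
}
```

From outside the process, jmap -dump:format=b,file=heap.hprof <pid> produces the same kind of file.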

3. Analyze the dump file
The next question is how to open the 3G heap file. Obviously an ordinary Windows machine does not have that much memory; a well-configured Linux machine is required (with the help of X-Window, the graphics on Linux can be displayed on Windows). We considered the following tools for opening the file:
Visual VM
IBM HeapAnalyzer
The Hprof tool that ships with the JDK
To ensure a reasonable loading speed with these tools, it is recommended to set their maximum memory to 6G. In practice, none of them could visualize the memory leak: Visual VM could show object sizes but not call stacks, and HeapAnalyzer could show call stacks but could not open a 3G file correctly. So we chose Eclipse's dedicated static memory analysis tool: MAT.

4. Analyze the memory leak
With MAT we can clearly see which objects are suspected of leaking, which objects occupy the most space, and the reference relationships between objects. In this case, many JbpmContext instances were held in ThreadLocals; investigation showed that the JBPM Context was not being closed.
In addition, through MAT or JMX we can also analyze thread state and observe which object a thread is blocked on, to locate the system's bottleneck.
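The leak pattern found here can be sketched as follows (JbpmContextHolder and its methods are illustrative names, not the actual JBPM API):

```java
// The pattern behind this leak: a per-thread context cached in a ThreadLocal.
// On pooled worker threads (e.g. Tomcat's), the thread never dies, so unless
// remove() is called the context stays strongly reachable forever and piles
// up in the old generation.
public class JbpmContextHolder {
    private static final ThreadLocal<Object> CONTEXT = new ThreadLocal<>();

    public static void open(Object ctx) { CONTEXT.set(ctx); }

    public static Object current() { return CONTEXT.get(); }

    // The missing step in the leaking code: clear the slot when done.
    public static void close() { CONTEXT.remove(); }
}
```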

5. Reviewing the questions
   Q: Why did garbage collection take longer and longer before the crash?
   A: Per the memory model and collection algorithm, garbage collection has two parts: marking and sweeping (copying). The marking part takes a fixed time for a fixed heap size; the copying part varies, because each collection leaves behind some unreclaimable memory, which increases the amount to copy and prolongs the collection. Garbage collection time can therefore itself serve as evidence of a memory leak.
   Q: Why did the number of Full GCs keep increasing?
   A: The accumulating memory gradually exhausted the old generation, leaving no space to allocate new objects, which led to frequent garbage collections.
   Q: Why did the memory occupied by the old generation keep growing?
   A: Because the young generation's memory could not be reclaimed, more and more of it was copied into the old generation.

3. Performance tuning
Besides the memory leak above, the CPU load stayed low; for a 16G, 64-bit Linux server that is a serious waste of resources. With the CPU underutilized and occasional user reports of requests taking too long, we realized the program and the JVM had to be tuned, from the following aspects:
Thread pool: solve the problem of long user response times
Connection pool
JVM startup parameters: adjust each generation's share of memory and the garbage collection algorithm to improve throughput
Program algorithms: improve the program's logic and algorithms to improve performance
  1. Java thread pool (java.util.concurrent.ThreadPoolExecutor)
    Most JVM 6 applications use the thread pool that comes with the JDK. The reason for describing this mature thread pool at length is that its behavior is a little different from what we might imagine. The Java thread pool has several important configuration parameters:
corePoolSize: the number of core threads (effectively the minimum thread count)
maximumPoolSize: the maximum number of threads; tasks beyond this number are rejected, and users can customize the handling through the RejectedExecutionHandler interface
keepAliveTime: how long an idle thread stays alive
workQueue: the work queue, which holds tasks waiting to be executed
    The Java thread pool requires a Queue parameter (workQueue) to hold submitted tasks, and for different choices of Queue the pool behaves completely differently:
SynchronousQueue: a waiting queue with no capacity; one thread's insert must wait for another thread's remove. With this Queue, the thread pool allocates a new thread for each task.
LinkedBlockingQueue: an unbounded queue. With this Queue, the pool ignores the maximumPoolSize parameter and handles all tasks with only corePoolSize threads; unhandled tasks queue up in the LinkedBlockingQueue.
ArrayBlockingQueue: a bounded queue. Under the combined effect of a bounded queue and maximumPoolSize, the program becomes hard to tune: a large Queue with a small maximumPoolSize leads to low CPU load, while with a small Queue and a large pool, the Queue never plays the role it should.
    In fact our requirement is very simple: we want the thread pool to work like a connection pool, with a configurable minimum and maximum number of threads. When minimum < tasks < maximum, new threads should be allocated for processing; when tasks > maximum, tasks should wait for an idle thread.
    However, the thread pool is designed so that tasks are placed in the Queue first; only when the Queue cannot take a task is a new thread considered; and if the Queue is full and no new thread can be spawned, the task is rejected. The design amounts to "enqueue first, execute later", "spawn only when enqueueing fails", and "reject rather than wait". So, depending on the Queue parameter, blindly increasing maximumPoolSize does not improve throughput.
    Of course, to achieve our goal we have to wrap the thread pool. Fortunately ThreadPoolExecutor exposes enough customization hooks to help us do so. Our wrapping approach:
Use SynchronousQueue as the work queue, so that maximumPoolSize takes effect and threads cannot be allocated without limit, while throughput can still be raised by increasing maximumPoolSize.
Customize a RejectedExecutionHandler to deal with the case where the thread count exceeds maximumPoolSize: check at intervals whether the pool can execute new tasks, and if so put the rejected task back into the pool; the check interval depends on keepAliveTime.
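The wrapping described above can be sketched like this (BoundedRetryPool is an illustrative name; the retry-in-the-rejection-handler idea follows the text, but the details are my own, not the author's actual class):

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedRetryPool {
    /**
     * SynchronousQueue has no capacity, so maximumPoolSize actually takes
     * effect: each submitted task either meets an idle worker or spawns a
     * new thread, up to max. Beyond max, the rejection handler waits for an
     * idle worker instead of discarding the task, re-offering the task at
     * keepAliveMs intervals.
     */
    public static ThreadPoolExecutor create(int core, int max, long keepAliveMs) {
        return new ThreadPoolExecutor(core, max, keepAliveMs, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),
                (task, executor) -> {
                    try {
                        // offer() on a SynchronousQueue succeeds only when a
                        // worker thread is waiting to take a task.
                        while (!executor.isShutdown()
                                && !executor.getQueue().offer(task, keepAliveMs, TimeUnit.MILLISECONDS)) {
                            // keep waiting for a free worker
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
    }
}
```

With this wrapper, a burst of tasks beyond max blocks the submitter briefly instead of throwing RejectedExecutionException.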
  2. Connection pool (org.apache.commons.dbcp.BasicDataSource)
    When using org.apache.commons.dbcp.BasicDataSource with the default configuration, under heavy traffic we observed through JMX that many Tomcat threads were blocked on the lock of the Apache ObjectPool used by BasicDataSource. The direct cause was that the connection pool's maximum connection count was set too small: the default BasicDataSource configuration uses a maximum of only 8 connections.
    We also observed another problem: when the system goes unused for a long time, say 2 days, MySQL on the DB server drops all connections, leaving unusable connections cached in the pool. To solve these problems we studied BasicDataSource thoroughly and found some tuning points:
MySQL supports 100 connections by default, so each connection pool should be configured according to the number of machines in the cluster; here it was set to 60.
initialSize: the number of connections opened up front and kept open
minEvictableIdleTimeMillis: the idle time for each connection, after which the connection is closed
timeBetweenEvictionRunsMillis: the run period of the background thread that detects expired connections
maxActive: the maximum number of connections that can be allocated
maxIdle: the maximum number of idle connections. When a connection is returned and the idle count already exceeds maxIdle, the connection is closed immediately. Only connections between initialSize and maxIdle are periodically checked for expiry. This parameter mainly improves throughput during traffic peaks.
    How is initialSize maintained? Reading the code shows that BasicDataSource closes all expired connections and then opens initialSize new ones. Together with minEvictableIdleTimeMillis and timeBetweenEvictionRunsMillis, this ensures that the initialSize connections are all refreshed after they expire, which prevents the problem of MySQL dropping connections that see no activity for a long time.
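Pulling these knobs together, a BasicDataSource setup might look like the sketch below (the driver, URL, credentials, and every value except the 60-connection cap discussed above are placeholder assumptions, not the author's actual configuration):

```java
import org.apache.commons.dbcp.BasicDataSource;

public class PoolConfig {
    public static BasicDataSource create() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("com.mysql.jdbc.Driver");    // placeholder
        ds.setUrl("jdbc:mysql://db-host:3306/app");        // placeholder
        ds.setUsername("app");                             // placeholder
        ds.setPassword("secret");                          // placeholder
        ds.setInitialSize(5);            // connections opened up front and kept open
        ds.setMaxActive(60);             // per-node cap, well under MySQL's default 100
        ds.setMaxIdle(20);               // surplus idle connections are closed on return
        ds.setMinEvictableIdleTimeMillis(10 * 60 * 1000);  // close connections idle > 10 min
        ds.setTimeBetweenEvictionRunsMillis(60 * 1000);    // evictor thread runs every minute
        return ds;
    }
}
```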
  3. JVM parameters
    The JVM startup parameters include settings related to memory and garbage collection. By default the JVM works well without any of them, but well-configured servers and specific applications must be tuned carefully to get the best performance. Through these settings we hope to achieve some goals:
the time of each GC is short enough,
the number of minor GCs is low enough,
the interval between Full GCs is long enough.
  The first two goals are contradictory: to keep GC time short a smaller heap is needed, but to keep the number of GCs low a larger heap must be guaranteed, so we can only strike a balance.
   (1) Setting the JVM heap: the minimum and maximum can be limited with -Xms and -Xmx. To prevent the garbage collector from shrinking and growing the heap between the two values, we usually set them to the same value.
   (2) The young and old generations split the heap according to a default ratio (1:2). You can adjust the split with the NewRatio parameter, or set the young generation's absolute size with -XX:NewSize and -XX:MaxNewSize. As with the whole heap, to prevent the young generation from shrinking, we usually set -XX:NewSize and -XX:MaxNewSize to the same value.
   (3) How big should the young and old generations be? There is no ready answer to this question, otherwise there would be no tuning. Let's look at the effects of changing their sizes:
A larger young generation necessarily means a smaller old generation. A large young generation lengthens the interval between ordinary GCs but increases the time of each GC; a small old generation leads to more frequent Full GCs.
A smaller young generation necessarily means a larger old generation. A small young generation leads to frequent ordinary GCs, but each GC takes less time; a large old generation reduces the frequency of Full GCs.
How to choose depends on the distribution of object lifetimes in the application: if the application has many short-lived temporary objects, choose a larger young generation; if long-lived objects are relatively numerous, enlarge the old generation appropriately. However, many applications have no such obvious characteristics; then consider two points: (A) on the principle of having as few Full GCs as possible, let the old generation cache long-lived objects as much as possible (the JVM's default ratio, 1:2, reflects this); (B) observe the application for a while to see how much old-generation memory it occupies at peak, then enlarge the young generation as far as Full GC behavior allows, for example to a ratio of 1:1, but leave the old generation at least 1/3 of headroom.
  (4) On better-configured machines (multi-core, large memory), a parallel collection algorithm can be selected for the old generation: -XX:+UseParallelOldGC (the default is Serial).
  (5) Thread stack setting: each thread opens a 1M stack by default, used to store stack frames, call arguments, local variables, and so on. For most applications this default is too large; 256K is generally enough. In theory, with total memory unchanged, reducing each thread's stack allows more threads to be created, but in practice this is also limited by the operating system.
  (6) The following parameters can be used to print heap dump and GC information:
-XX:HeapDumpPath
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Xloggc:/usr/aaa/dump/heap_trace.txt
    The following parameter controls printing heap information on OutOfMemoryError:
-XX:+HeapDumpOnOutOfMemoryError
Take a look at a Java parameter configuration used at one point (server: Linux 64-bit, 8 cores × 16G):

JAVA_OPTS="$JAVA_OPTS -server -Xms3G -Xmx3G -Xss256k -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseParallelOldGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/aaa/dump -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/usr/aaa/dump/heap_trace.txt -XX:NewSize=1G -XX:MaxNewSize=1G"
In observation, this configuration proved very stable: each ordinary GC takes about 10ms, and Full GC basically does not occur, or only once after a very long time.
By analyzing the dump file, it turned out that a Full GC occurred every hour. After repeated verification: as long as the JMX service is enabled in the JVM, JMX performs a Full GC every hour to clean up references; see the attached documentation for details.
4. Program algorithm tuning: not the focus this time.

References:
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
Source: http://blog.csdn.net/chen77716/article/details/5695893

============================================================
Tuning Method
Everything is for this final step: tuning. Before tuning, we need to keep the following principles in mind:

1. Most Java applications do not need GC tuning on the server;
2. Most Java applications with GC problems have them not because of wrong parameter settings, but because of code problems;
3. Before an application goes live, consider setting the JVM parameters to the optimal (most suitable) values;
4. Reduce the number of objects created;
5. Reduce the use of global variables and large objects;
6. GC tuning is a last resort;
7. In practice, analyzing the GC situation to optimize the code helps far more than tweaking GC parameters;

GC tuning has two purposes (http://www.360doc.com/content/13/0305/10/15643_269388816.shtml):
1. Reduce the number of objects promoted to the old generation to a minimum;
2. Reduce the execution time of Full GC;

To achieve these purposes, you generally need to do the following:
1. Reduce the use of global variables and large objects;
2. Adjust the size of the young generation to the most suitable value;
3. Set the size of the old generation to the most suitable value;
4. Select a suitable GC collector;

Of the above four methods, "suitable" appears several times. What counts as suitable? Generally, refer to the suggestions in the "collector pairings" and "starting memory allocations" sections above. But these suggestions are not a cure-all; they must be adapted to your machine and application. In practice, you can give two machines different GC parameters, compare them, and keep the parameters that genuinely improve performance or reduce GC time.

Real proficiency in GC tuning comes from the hands-on experience of repeated GC monitoring and tuning. The general steps of monitoring and tuning are:
1. Monitor the GC state
Use the various JVM tools to view the current logs, analyze the current JVM parameter settings, analyze the current heap snapshot and GC logs and, based on the actual occupancy of each memory region and the GC execution times, decide whether to optimize;

2. Analyze the results and decide whether optimization is needed
If the parameters are set reasonably, the system has no timeout logs, GC is infrequent, and GC time is low, there is no need for GC tuning; if GC time exceeds 1-3 seconds, or GC is frequent, tuning is a must;
Note: if the following indicators are all met, GC tuning is generally not needed:
   Minor GC execution time under 50ms;
   Minor GC not frequent, about once every 10 seconds;
   Full GC execution time under 1s;
   Full GC not frequent, no more often than once every 10 minutes;
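The rule-of-thumb thresholds above can be encoded directly (the numbers come from this text, not from any official guideline; GcHealth is an illustrative name):

```java
public class GcHealth {
    /**
     * True if any of the four indicators above is violated:
     * Minor GC >= 50ms, Minor GC more often than every 10s,
     * Full GC >= 1s, or Full GC more often than every 10 minutes.
     */
    public static boolean needsTuning(double minorGcMs, double minorIntervalSec,
                                      double fullGcSec, double fullIntervalMin) {
        return minorGcMs >= 50 || minorIntervalSec < 10
            || fullGcSec >= 1 || fullIntervalMin < 10;
    }
}
```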

3. Adjust the GC type and memory allocation
If the memory allocation is too large or too small, or the GC collector in use is relatively slow, adjust these parameters first: pick one or a few machines for a beta test, compare the performance of the optimized and unoptimized machines, and make a targeted final choice;
4. Analyze and adjust continuously
Through continuous trial and error, find the most suitable parameters;
5. Apply the parameters everywhere
Once the most suitable parameters are found, apply them to all servers and follow up.


Tuning examples
The content above is just talk on paper; let's illustrate with some real examples:
Example 1:
The author found an exception yesterday on some dev and test machines: java.lang.OutOfMemoryError: GC overhead limit exceeded. This exception means that GC is spending too much time to free too little space, for which there are generally two causes: 1. the heap is too small; 2. there is an infinite loop or a large object;
The author first ruled out the second cause, because the same application also runs in production; if the code had that problem, it would have crashed long ago. So the suspicion fell on this machine's heap being set too small;
Checking with ps -ef | grep "java" showed that:


the application's heap was set to only 768m, while the machine has 2g of memory, only this Java application runs on it, and nothing else needs much memory; moreover, this application is fairly large and needs a lot of memory;
From this the author judged that only the sizes of the heap's regions needed changing, so they were changed as follows:


Tracking the application afterwards, the exception did not reappear;

Example 2: (http://www.360doc.com/content/13/0305/10/15643_269388816.shtml)
A service system often froze; analysis found that the Full GC time was too long:
jstat -gcutil:
S0     S1    E     O      P      YGC  YGCT   FGC  FGCT   GCT
12.16  0.00  5.18  63.78  20.32  54   2.047  5    6.946  8.993
Analyzing the data above: Young GC ran 54 times taking 2.047 seconds, about 37ms per Young GC, which is in the normal range; Full GC ran 5 times taking 6.946 seconds, averaging 1.389s each. The problem the data shows is that Full GC takes too long. Analysis of the system's parameters found NewRatio=9, that is, a young-to-old generation ratio of 1:9, which is the cause of the problem:
1. The young generation is too small, so objects enter the old generation prematurely, triggering Full GCs there;
2. The old generation is large, so each Full GC takes a long time;
The fix was to adjust NewRatio to 4; afterwards Full GC no longer occurred and only Young GC ran. The point is to have objects cleaned up in the young generation without entering the old generation (this approach helps some applications, but not necessarily all).
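The per-collection averages quoted above come straight from the jstat counters; as a sanity check (GcStats is an illustrative helper, not part of the original writeup):

```java
public class GcStats {
    /**
     * Average pause in milliseconds, from jstat's cumulative collection
     * count (YGC or FGC column) and cumulative seconds (YGCT or FGCT).
     */
    public static double avgPauseMs(long count, double totalSeconds) {
        return count == 0 ? 0.0 : totalSeconds * 1000.0 / count;
    }
}
```

Plugging in the numbers: avgPauseMs(54, 2.047) ≈ 37.9ms per Young GC, avgPauseMs(5, 6.946) ≈ 1389ms per Full GC, matching the figures in the text.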

Example 3:
During performance testing of an application, high memory usage and frequent Full GCs were found. Dump the memory with sudo -u admin -H jmap -dump:format=b,file=filename.hprof pid to generate a dump file, then analyze it with the MAT plugin under Eclipse, which showed:


As can be seen from the figure, one thread had a problem: a large number of objects referenced by its LinkedBlockingQueue were never released, so the thread held up to 378m of memory. The developers were notified to optimize the code and release the related objects.
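A common guard against this class of problem (a sketch under my own naming, not the application's actual code) is to give the queue a capacity, so producers fail fast instead of letting pending objects pile up behind a slow consumer:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BoundedBuffer {
    // LinkedBlockingQueue is unbounded by default; giving it a capacity caps
    // how much memory a slow consumer thread can pin through its queue.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);

    /** Rejects the message when the buffer is full, instead of growing forever. */
    public boolean tryPublish(String msg) {
        return queue.offer(msg);
    }

    public String take() throws InterruptedException {
        return queue.take();
    }

    public int backlog() {
        return queue.size();
    }
}
```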
