GC performance tuning, principles and practice: this one article is enough!

Foreword

This article covers the basic principles and theory of GC, along with approaches and methods for GC tuning, based on HotSpot JDK 1.8. After reading it, you should know how to investigate and resolve GC problems that arise in a production system.

Main text

The contents of this article are as follows:

  • GC fundamentals: tuning goals, GC event classification, JVM memory allocation strategy, GC log analysis, and so on.
  • CMS principles and tuning.
  • G1 principles and tuning.
  • GC troubleshooting and solutions.

1. GC Fundamentals

1.1. GC tuning goals

In most cases, GC tuning of a Java program focuses on two goals:

  • Responsiveness: how quickly a program or system responds to a request.

    For example, the response time of a user's order query. Systems with strict responsiveness requirements cannot accept long pauses, so tuning focuses on responding within a short time.

  • Throughput: the maximum amount of work the application completes in a given period.

    For example, the number of tasks a batch system completes per hour. When optimizing for throughput, longer GC pauses are acceptable, because high-throughput applications care more about finishing the work as quickly as possible than about responding quickly to user requests.

In GC tuning, GC pause time affects the application's response time, and the CPU consumed by GC threads affects its throughput.
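To make the two goals concrete, here is a minimal sketch that derives both figures from a list of pause durations. It is a simplification: it counts only stop-the-world pause time and ignores the CPU used by concurrent GC threads. The class and method names are hypothetical, not part of any JDK API.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of any JDK API.
final class GcGoalMetrics {
    /** Fraction of the observation window that was NOT spent in GC pauses. */
    static double throughput(long[] pauseMillis, long windowMillis) {
        long totalPause = Arrays.stream(pauseMillis).sum();
        return 1.0 - (double) totalPause / windowMillis;
    }

    /** Longest single pause, the figure a latency-sensitive service cares about. */
    static long maxPause(long[] pauseMillis) {
        return Arrays.stream(pauseMillis).max().orElse(0L);
    }
}
```

A batch system would mostly watch the first number, while an order query service would mostly watch the second.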

1.2. GC generational collection algorithm

Modern garbage collectors almost all use generational collection. The main idea is to divide the Java heap logically into two parts, the young generation and the old generation, and to apply different collection strategies to objects with different lifetimes and sizes.

1.2.1. Young Generation

The young generation is also called the new generation. Most objects are created there, and many of them have very short lifetimes. After each young generation collection (also called Young GC, Minor GC or YGC) only a few objects survive, so a copying algorithm is used: the collection completes at the cost of only a small number of copy operations.

The young generation is divided into three areas: one Eden area and two Survivor areas (S0 and S1, also called From Survivor and To Survivor). Most objects are created in Eden.

When Eden is full, the objects that are still alive are copied to one of the two Survivor areas. When that Survivor area is full, the live objects in it that do not yet qualify for promotion to the old generation are copied to the other Survivor area. Each time an object is copied its age increases by 1, and once its age reaches the threshold it is promoted to the old generation.

1.2.2. Old Generation

Objects that are still alive after N collections in the young generation are placed in the old generation, where the survival rate is high. Old generation collection typically uses a "mark-compact" algorithm.

1.3. GC event classification

Garbage collections are classified by the area they reclaim:

  • Young GC
  • Old GC
  • Full GC
  • Mixed GC

1.3.1. Young GC

A garbage collection of the young generation is called a Young GC (also known as Minor GC). A Young GC is always triggered when the JVM cannot allocate space in the young generation for a new object, for example when Eden is full. The higher the allocation rate of new objects, the more frequent Young GC becomes.

Every Young GC is a full stop (Stop-The-World) that suspends all application threads, but its pause is almost negligible compared with the pause caused by an old generation GC.

1.3.2. Old GC/Full GC/Mixed GC

Old GC: a GC event that cleans only the old generation space. Only the concurrent collection of CMS works in this mode.

Full GC: a GC event that cleans the whole heap, including the young generation, the old generation and Metaspace.

Mixed GC: a GC that cleans the whole young generation plus part of the old generation. Only G1 has this mode.

1.4. GC log analysis

The GC log is an important tool: it accurately records the time and result of every GC. Analyzing it helps tune heap and GC settings, or improve the application's object allocation pattern.

Enable it with the following JVM startup parameters:

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps  -XX:+PrintGCTimeStamps

[Figures: annotated examples of a typical Young GC log entry and a typical Full GC log entry]

Two free graphical GC log analysis tools are recommended:

  • GCViewer: download the jar and run it directly
  • gceasy: a web tool; upload the GC log and analyze it online

1.5. Memory allocation strategy

The automatic memory management that Java provides comes down to two problems: allocating memory for objects and reclaiming it. Garbage collection was described above; here are the most common memory allocation strategies:

1.5.1. Objects are allocated in Eden first

In most cases, objects are allocated in the Eden area of the young generation. When Eden does not have enough space for an allocation, the virtual machine starts a Young GC.

1.5.2. Large objects go directly into the old generation

The JVM provides an object size threshold parameter (-XX:PretenureSizeThreshold; the default value is 0, which means every object is first allocated in Eden regardless of size).

Objects larger than this threshold are allocated directly in the old generation, which avoids copying large objects back and forth between Eden and the two Survivor areas.
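The threshold rule can be written as a tiny decision function. This is an illustrative model only, not HotSpot's actual allocation path; the class, enum and method names are made up for the example.

```java
// Illustrative model only; HotSpot's real allocation path is more involved.
final class PretenureSketch {
    enum Space { EDEN, OLD }

    /**
     * thresholdBytes mirrors -XX:PretenureSizeThreshold; 0 means "no threshold",
     * so every object starts in Eden.
     */
    static Space chooseSpace(long objectBytes, long thresholdBytes) {
        if (thresholdBytes > 0 && objectBytes > thresholdBytes) {
            return Space.OLD;   // too large: skip Eden and the Survivor copies
        }
        return Space.EDEN;
    }
}
```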

1.5.3. Long-lived objects enter the old generation

Each time an object survives a garbage collection, its age increases by 1. Objects whose age exceeds the threshold parameter (-XX:MaxTenuringThreshold, default 15) are promoted to the old generation.

1.5.4. Space allocation guarantee

Before a Young GC, the JVM needs to estimate whether the old generation can hold the objects that the Young GC will promote into it, in order to decide whether to trigger an old generation collection early. This estimate is based on the space allocation guarantee policy.

If the Young GC succeeds (the promoted objects fit in the old generation), the guarantee has succeeded and no Full GC is needed, which is better for performance.

If it fails, a "promotion failed" error occurs: the guarantee has failed and a Full GC is required.

1.5.5. Dynamic age determination

An object in the young generation can be promoted to the old generation before its age reaches the threshold (set by the MaxTenuringThreshold parameter).

If, after a Young GC, the total size of all objects of the same age in the Survivor space exceeds half of one Survivor space (S0 or S1), that Survivor area is about to be unable to hold the surviving young generation objects.

In that case, objects of that age or older can go directly into the old generation without waiting to reach MaxTenuringThreshold.

In addition, if during a Young GC the S0 or S1 area cannot hold the live objects that do not yet qualify for promotion, those objects go directly into the old generation. This should be avoided.
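The dynamic age rule can be sketched as follows. One assumption to flag: instead of looking at a single age in isolation, this sketch accumulates sizes from the youngest age upward and compares the running total with a target share of one Survivor space, which is my understanding of how HotSpot implements it (with -XX:TargetSurvivorRatio, default 50, as the share). The class and parameter names are hypothetical.

```java
// Simplified sketch of a dynamic tenuring threshold calculation.
final class DynamicAgeSketch {
    /**
     * @param bytesByAge       bytesByAge[a] = bytes of surviving objects with age a (index 0 unused)
     * @param survivorCapacity capacity of one Survivor space in bytes
     * @param targetRatioPct   assumed to play the role of -XX:TargetSurvivorRatio
     * @param maxTenuring      mirrors -XX:MaxTenuringThreshold
     * @return the age at or above which objects are promoted in the next Young GC
     */
    static int tenuringThreshold(long[] bytesByAge, long survivorCapacity,
                                 int targetRatioPct, int maxTenuring) {
        long desired = survivorCapacity * targetRatioPct / 100;
        long total = 0;
        int age = 1;
        while (age < bytesByAge.length) {
            total += bytesByAge[age];
            if (total > desired) {
                break;   // objects of this age and older no longer fit the target
            }
            age++;
        }
        return Math.min(age, maxTenuring);
    }
}
```

With a 100-byte Survivor space and 30 bytes of objects at each of ages 1 to 3, the running total passes the 50-byte target at age 2, so objects aged 2 or more are promoted well before reaching MaxTenuringThreshold.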

2. CMS principles and tuning

2.1. Terminology

2.1.1. Reachability analysis algorithm

Reachability analysis determines whether an object is still alive. The basic idea is to start from a set of objects called "GC Roots" (common GC Roots include the system class loader, objects on the stacks of active threads, and so on) and search downward along object references. The path traversed is called a reference chain. When no reference chain connects an object to any GC Root, the object is no longer alive.
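A minimal sketch of this search over a toy object graph, where objects are represented by string ids and the reference map is supplied by the caller. Both are assumptions made for illustration; a real collector walks actual heap objects.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy object graph: each key is an object id, each value lists the ids it references.
final class ReachabilitySketch {
    static Set<String> reachable(Map<String, List<String>> refs, List<String> gcRoots) {
        Set<String> marked = new HashSet<>(gcRoots);
        Deque<String> pending = new ArrayDeque<>(gcRoots);
        while (!pending.isEmpty()) {
            String current = pending.poll();
            for (String target : refs.getOrDefault(current, Collections.emptyList())) {
                if (marked.add(target)) {   // first visit: mark it and keep tracing
                    pending.add(target);
                }
            }
        }
        return marked;   // anything not in this set is garbage
    }
}
```

Note that an object which references live objects but is itself not referenced from any root stays unmarked: reachability only follows references outward from the roots.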

2.1.2. Stop The World

While the GC analyzes object reference relationships, it must stop all executing Java threads so that the references no longer change and the analysis is accurate. This pause event is called Stop The World (STW).

2.1.3. Safepoint

Safepoints are special positions in the executing code. When a thread reaches one, the state of the virtual machine is safe, so if a GC is needed the thread can be suspended at that position.

HotSpot uses voluntary suspension: running threads poll a flag to check whether they should pause, and suspend themselves if so.

2.2. Introduction to the CMS algorithm

CMS (Concurrent Mark Sweep) is a concurrent garbage collector based on the mark-sweep algorithm. It collects only the old generation.

While the CMS collector works, GC threads run concurrently with user threads as far as possible, in order to reduce STW time.

Enable the CMS garbage collector with the following command-line parameter:

-XX:+UseConcMarkSweepGC

It is worth adding that CMS GC here refers to old generation GC, while Full GC refers to a GC event over the whole heap, including the young generation, the old generation and Metaspace. The two are different.

2.3. Young generation garbage collection

The young generation collectors that can be paired with CMS are the Serial collector and the ParNew collector.

Both collectors use a mark-copy algorithm and trigger an STW event that stops all application threads. The difference is that Serial runs single-threaded while ParNew runs multi-threaded.

2.4. Old generation garbage collection

CMS GC aims for the shortest pause by minimizing STW time. It can be divided into seven phases:

2.4.1. Initial Mark

The goal of the initial mark phase is to mark the live old generation objects that are directly referenced by GC Roots or referenced by live objects in the young generation. It triggers the first STW event.

This phase is multi-threaded (single-threaded in JDK 7 and earlier, parallel from JDK 8 on; adjustable with the CMSParallelInitialMarkEnabled parameter).

2.4.2. Concurrent Mark

In the concurrent mark phase, GC threads run concurrently with application threads. Starting from the live objects marked in phase 1, the collector recursively marks every object reachable from them.

2.4.3. Concurrent Preclean

In the concurrent preclean phase, GC threads also run concurrently with application threads. Because phase 2 ran concurrently with the application, some reference relationships may have changed in the meantime.

Through card marking, the old generation space is logically divided in advance into areas of equal size (cards).

If a reference in an area changes, the JVM marks that area as a dirty card. In this phase the dirty cards are found, their reference relationships are refreshed, and the dirty marks are cleared.
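Card marking can be sketched as a flag array indexed by address. The 512-byte card size used here is an assumption for illustration, and the class and method names are hypothetical; the real write barrier is emitted by the JIT compiler rather than called as a method.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal card table model. The 512-byte card size is an assumption for illustration.
final class CardTableSketch {
    static final int CARD_SHIFT = 9;   // 2^9 = 512 bytes per card
    private final boolean[] dirty;

    CardTableSketch(long oldGenBytes) {
        this.dirty = new boolean[(int) ((oldGenBytes + (1L << CARD_SHIFT) - 1) >> CARD_SHIFT)];
    }

    /** Write barrier: called when a reference field at this old generation offset is updated. */
    void onReferenceWrite(long offset) {
        dirty[(int) (offset >> CARD_SHIFT)] = true;
    }

    /** Preclean step: return the dirty card indexes to rescan and clear their marks. */
    List<Integer> drainDirtyCards() {
        List<Integer> toRescan = new ArrayList<>();
        for (int i = 0; i < dirty.length; i++) {
            if (dirty[i]) {
                toRescan.add(i);
                dirty[i] = false;
            }
        }
        return toRescan;
    }
}
```

The point of the design is that the collector rescans only the cards touched since marking began, instead of the whole old generation.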

2.4.4. Concurrent Abortable Preclean

The concurrent abortable preclean phase does not stop application threads either. It tries to do as much work as possible ahead of the Final Remark phase, which is STW, so as to shorten that pause.

This phase loops over two tasks: marking reachable old generation objects, and scanning the objects in dirty card areas. The loop terminates when one of the following holds:

  • The loop count reaches its limit
  • The loop's execution time reaches a threshold
  • Young generation memory usage reaches a threshold

2.4.5. Final Remark

This is the second (and last) STW phase of the GC event. Its goal is to finish marking all live objects in the old generation. This phase will:

  • Traverse young generation objects and re-mark
  • Re-mark starting from GC Roots
  • Traverse the old generation's dirty cards and re-mark

2.4.6. Concurrent Sweep

The concurrent sweep phase runs concurrently with the application, with no STW pause, and removes garbage objects based on the marking results.

2.4.7. Concurrent Reset

The concurrent reset phase runs concurrently with the application and resets the CMS algorithm's internal data in preparation for the next GC cycle.

2.5. Common CMS problems

2.5.1. The Final Remark pause is too long

About 80% of CMS GC pause time is spent in the Final Remark phase. When this pause is too long, a common cause is invalid references from the young generation into the old generation: the loop in the preceding concurrent abortable preclean phase did not finish within its time threshold, so no Young GC was triggered in time to clean up those invalid references.

This can be addressed by adding the parameter -XX:+CMSScavengeBeforeRemark.

It triggers a Young GC before the Final Remark, which reduces the invalid references from the young generation to the old generation and shortens the Final Remark pause.

However, if a Young GC was already triggered in the previous phase (concurrent abortable preclean), this triggers another Young GC.

2.5.2. Concurrent mode failure & promotion failure

Concurrent mode failure: while CMS is running a collection, a young generation collection occurs and the old generation does not have enough space to hold the promoted objects. The CMS collection then degenerates into a single-threaded Full GC: all application threads are suspended and all invalid objects in the old generation are reclaimed.

Promotion failure: during a young generation collection, the old generation has enough total space for the promoted objects, but the free space is fragmented, so promotion fails and a single-threaded Full GC with compaction is triggered.

Both concurrent mode failure and promotion failure cause long pauses. Common remedies are:

  • Lower the threshold that triggers CMS GC

    That is, lower the value of -XX:CMSInitiatingOccupancyFraction so that CMS GC starts earlier and enough space is guaranteed

  • Increase the number of CMS threads with the parameter -XX:ConcGCThreads

  • Increase the old generation space

  • Let objects be reclaimed in the young generation as far as possible, so that they do not enter the old generation
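Why these remedies help can be seen with a back-of-the-envelope model: a concurrent cycle is safe only if the free space left when it starts exceeds what gets promoted while it runs. The constant promotion rate and the method names below are assumptions made for illustration; a real system should take these figures from GC logs.

```java
// Back-of-the-envelope model; the constant promotion rate is an assumption.
final class CmsHeadroomSketch {
    /** Free old generation bytes left at the moment a CMS cycle starts. */
    static long headroomBytes(long oldGenCapacity, int initiatingOccupancyPct) {
        return oldGenCapacity * (100 - initiatingOccupancyPct) / 100;
    }

    /**
     * True if objects promoted while the concurrent cycle runs would overflow the
     * headroom, which is the situation that leads to a concurrent mode failure.
     */
    static boolean risksConcurrentModeFailure(long oldGenCapacity, int initiatingOccupancyPct,
                                              long promotedBytesPerSecond, double cycleSeconds) {
        double promotedDuringCycle = promotedBytesPerSecond * cycleSeconds;
        return promotedDuringCycle > headroomBytes(oldGenCapacity, initiatingOccupancyPct);
    }
}
```

In this model, a lower occupancy fraction or a larger old generation increases the headroom, more concurrent threads shorten the cycle, and keeping objects in the young generation lowers the promotion rate.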

2.5.3. Memory fragmentation issues

CMS GC is normally based on the mark-sweep algorithm and does not compact, so memory becomes more and more fragmented and eventually needs compaction.

The following common scenarios trigger memory compaction:

  • A Young GC of the young generation suffers a promotion guarantee failure (promotion failed)
  • The program actively calls System.gc()

The CMSFullGCsBeforeCompaction parameter sets how many Full GCs must occur before one triggers compaction.

The default value is 0, which means every Full GC triggers compaction. The compacting algorithm is the single-threaded Serial Old algorithm mentioned above, whose pause (STW) is very long, so the time spent compacting should be reduced as far as possible.

3. G1 principles and tuning

3.1. Introduction to G1

G1 (Garbage-First) is a server-oriented garbage collector that collects both the young generation and the old generation. It mainly targets machines with multi-core processors and large memory.

G1's main design goal is a predictable and configurable STW pause time.

3.2. G1 heap space division

3.2.1. Region

To keep pauses short when collecting a large heap, G1 divides the heap into many Regions of equal size. Each small Region may be an Eden, Survivor or Old area, but at any given time it belongs to only one generation.

Logically, all Eden and Survivor Regions together form the young generation, and all Old Regions together form the old generation. How many Regions each generation occupies is controlled automatically by G1 and changes constantly.

3.2.2. Humongous objects

An object larger than half a Region is considered a humongous object. It is allocated directly in humongous regions in the old generation.

A humongous area is a set of contiguous Regions. Each Region holds at most one humongous object, and one humongous object may span several Regions.
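The arithmetic behind humongous allocation is simple enough to sketch. The class and method names are hypothetical, and the "larger than half a Region" test follows the rule as stated in this article.

```java
// Illustrative arithmetic only; all sizes are in bytes.
final class HumongousSketch {
    /** Follows the rule stated above: larger than half a Region counts as humongous. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Number of contiguous Regions a humongous object occupies (rounded up). */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    /** Space left unused at the end of the last Region the object occupies. */
    static long wastedBytes(long objectBytes, long regionBytes) {
        return regionsNeeded(objectBytes, regionBytes) * regionBytes - objectBytes;
    }
}
```

For example, with 1 MB Regions a 600 KB object is humongous and leaves 424 KB of its Region unused, while with 2 MB Regions the same object is an ordinary allocation.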

Dividing the G1 heap into Regions matters for two reasons:

  • A GC does not have to process the whole heap each time; it handles only some Regions, which makes GC on large heaps practical.
  • G1 computes the collection value of each Region, including the time needed to collect it and the space that can be reclaimed, so that it reclaims as many garbage objects as possible in a limited time and keeps the collection pause within the configured target. This is the origin of the name G1: Garbage-First.

3.3. G1 modes of operation

For the young and old generations, G1 offers two GC modes, Young GC and Mixed GC. Both cause Stop The World.

3.3.1. Young GC

When the young generation runs out of space, G1 triggers a Young GC to reclaim it.

Young GC mainly collects the Eden area and is triggered when Eden runs out of space. It is based on generational collection and the copying algorithm, and every Young GC selects all young generation Regions.

Young GC also calculates the space Eden and Survivor will need next time and dynamically adjusts the number of Regions the young generation occupies, to control Young GC overhead.

3.3.2. Mixed GC

A Mixed GC is triggered when old generation occupancy reaches a threshold. It selects all young generation Regions plus several old generation Regions with a high collection payoff, chosen from the statistics gathered by global concurrent marking (described below).

Within the user-specified overhead target, it collects the old generation Regions with the highest payoff possible. The cost of a Mixed GC is controlled by which old generation Regions are selected and how many.
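The "highest payoff within a budget" idea can be sketched as a greedy selection. This is not G1's actual policy code: the payoff formula (reclaimable bytes per millisecond of predicted cost), the Region fields and all names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Greedy sketch of "garbage first" selection; the cost model is hypothetical.
final class CollectionSetSketch {
    static final class Region {
        final String name;
        final long reclaimableBytes;     // garbage found by concurrent marking
        final double evacuationMillis;   // predicted cost of collecting this Region

        Region(String name, long reclaimableBytes, double evacuationMillis) {
            this.name = name;
            this.reclaimableBytes = reclaimableBytes;
            this.evacuationMillis = evacuationMillis;
        }

        double payoff() {
            return reclaimableBytes / evacuationMillis;
        }
    }

    /** Pick old Regions with the best payoff until the pause budget is used up. */
    static List<Region> choose(List<Region> candidates, double budgetMillis) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Double.compare(b.payoff(), a.payoff()));   // best payoff first
        List<Region> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.evacuationMillis <= budgetMillis) {
                chosen.add(r);
                spent += r.evacuationMillis;
            }
        }
        return chosen;
    }
}
```

A smaller budget yields a shorter pause but reclaims less old generation space per Mixed GC, which is the trade-off the pause target controls.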

3.4. Global concurrent marking

Global concurrent marking mainly computes, for Mixed GC, which Regions have a high collection payoff. It is divided into five phases:

3.4.1. Initial Mark

All application threads are suspended (STW), and objects directly reachable from GC Roots (objects on native stacks, global objects, JNI objects) are marked in parallel.

When the trigger condition is reached, G1 does not start the concurrent marking cycle immediately. It waits for the next young generation collection and completes the initial mark during that collection's STW period. This approach is called piggybacking.

3.4.2. Root Region Scan

After the initial mark pause, the young generation collection has also finished copying objects to Survivor, and application threads start running again.

At this point, to keep the marking algorithm correct, G1 must find which of the objects newly copied to the Survivor partitions hold references to old generation objects, and mark these objects as roots.

This process is called root region scanning, and the Survivor partitions being scanned are called root regions.

Root region scanning must finish before the next young generation collection starts (the concurrent marking that follows may be interrupted by several young generation collections), because every GC produces a new set of live objects.

3.4.3. Concurrent Marking

Marking threads run concurrently with application threads and record liveness information for the objects in each Region of the heap. This step may be interrupted by a new Young GC.

All marking must be completed before the heap fills up. If concurrent marking takes a long time, several young generation collections may occur while it runs.

3.4.4. Remark

As in CMS, all application threads are suspended (STW). While they are briefly stopped, G1 finishes the marking process: it marks the objects that changed during concurrent marking and any live objects not yet marked, and completes the liveness data calculation.

3.4.5. Cleanup

This phase prepares for the upcoming evacuation phase and also performs all the wrap-up calculations needed for the next marking cycle:

  • Tidy up and update each Region's RSet (Remembered Set, a HashMap structure that records which old generation objects point into this Region; the key is a reference to the object that points into this Region, and the value is the specific Card area that points into this Region. With the RSet, the liveness of objects in a Region can be determined without scanning the whole heap)
  • Reclaim Regions that contain no live objects
  • Compute the set of old generation partitions with a high collection payoff (based on free space and the pause target)

3.5. G1 tuning notes

3.5.1. Full GC problems

G1's normal processing flow has no Full GC. A Full GC appears only when garbage collection cannot keep up (or when one is triggered explicitly). G1's Full GC is a single-threaded Serial Old GC that causes a very long STW, so avoiding Full GC is a key point of tuning.

Common causes are:

  • The program actively calls System.gc()
  • The old generation fills up during global concurrent marking (concurrent mode failure)
  • The old generation fills up during a Mixed GC (promotion failure)
  • During a Young GC, neither the Survivor space nor the old generation has enough room for the live objects

As with CMS, common remedies are:

  • Increase the -XX:ConcGCThreads=n option to raise the number of concurrent marking threads, or raise the number of parallel threads used during STW with -XX:ParallelGCThreads=n.
  • Lower -XX:InitiatingHeapOccupancyPercent to start the marking cycle earlier.
  • Increase the reserved memory with -XX:G1ReservePercent=n. The default value is 10, meaning 10% of the heap is kept in reserve; when the Survivor area lacks space for newly promoted objects, G1 tries to use this reserved memory.

3.5.2. Humongous object allocation

Each Region in a humongous area holds one humongous object, and the remaining space is no longer used, which fragments the space. When G1 has no suitable space to allocate a humongous object, it starts a serial Full GC to free space.

Increase the Region size with -XX:G1HeapRegionSize so that a considerable share of humongous objects are no longer humongous and are allocated in the ordinary way.

3.5.3. Do not set the size of the young generation

The reason is that G1 adjusts the young generation dynamically to try to meet the target pause time. Setting the size explicitly overrides this and disables pause time control.

3.5.4. Setting the average response time

Use the application's average response time as a reference when setting MaxGCPauseMillis. The JVM will try to meet this target, and perhaps 90% or more of requests will respond within it, but that does not mean every request will. Setting the value too small leads to frequent GC.

4. Tuning methods and ideas

How do you analyze the GC behavior of a system's JVM and optimize it sensibly?

The core idea of GC optimization is to have objects allocated and reclaimed in the young generation as far as possible, and to keep too many objects from entering the old generation and causing frequent old generation collections, while giving the system enough memory to reduce the number of young generation collections. System analysis and optimization both revolve around this idea.

4.1. Analyzing the system's running state

Analyze how the system is running:

  • The number of requests per second, how many objects each request creates, and how much memory they occupy.
  • How often Young GC is triggered, and the rate at which objects enter the old generation.
  • Old generation memory usage, how often Full GC is triggered, why it is triggered, and why a Full GC takes long.

Common tools are:

4.1.1. jstat

jstat is a command-line tool bundled with the JVM that reports memory allocation rate, GC counts and GC time. The common command format is:

jstat -gc <pid> <interval in ms> <count>

[Figure: meaning of each column in the jstat -gc output]

For example, jstat -gc 32683 1000 10 reports on the process with pid 32683 once per second, 10 times in total.
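Similar counters can also be read from inside the running JVM through the standard java.lang.management API, which is handy when shell access to the host is not available. The sketch below is one possible way to do it; the class name is made up, and the collector names it prints depend on which GC the JVM was started with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcCountersSketch {
    /** One line per collector: name, number of collections, accumulated time in ms. */
    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(" count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append(System.lineSeparator());
        }
        return sb.toString();
    }
}
```

Calling GcCountersSketch.summary() at two points in time and subtracting the counts gives roughly what the YGC and FGC columns of jstat show over the same interval.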

4.1.2. jmap

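jmap is also a command-line tool bundled with the JVM. It shows how objects are distributed in the system at runtime. The common command formats are:

# Print class names, instance counts and the memory each class occupies,
# sorted by memory size in descending order
jmap -histo <pid>

# Generate a heap dump snapshot: export a binary file named dump.hprof to the current directory,
# which can be analyzed with the graphical Eclipse MAT tool
jmap -dump:live,format=b,file=dump.hprof <pid>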

4.1.3. jinfo

jinfo shows the extended parameters of a running Java application, including Java system properties and JVM command-line flags. The command format is:

jinfo <pid> 

4.1.4. Other GC tools

  • Monitoring and alerting systems: Zabbix, Prometheus, Open-Falcon
  • Real-time memory monitoring tool bundled with the JDK: VisualVM
  • Off-heap memory monitoring: the Buffer Pools plugin for Java VisualVM, google perf tools, the Java NMT (Native Memory Tracking) tool
  • GC log analysis: GCViewer, gceasy
  • GC parameter checking and optimization: xxfox.perfma.com/

4.2. GC optimization cases

4.2.1. Frequent Full GC on a data analysis platform

The platform mainly runs scheduled statistical analysis of user behavior in the App and supports exporting reports. It uses the CMS GC algorithm.

Data analysts found that pages often stuttered when they used the system. The jstat command showed that after each Young GC about 10% of the surviving objects entered the old generation.

The cause was a Survivor space that was set too small: the objects surviving each Young GC did not fit in the Survivor area and entered the old generation early.

The fix was to enlarge the Survivor area so that it can hold the objects surviving a Young GC. Objects then go through several Young GCs in the Survivor area and enter the old generation only after reaching the age threshold.

After the adjustment, only a few hundred KB of surviving objects enter the old generation after each Young GC in steady state, and Full GC frequency dropped sharply.
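A figure like "a few hundred KB per Young GC" can be estimated from two rows of jstat -gc output, using the OU column (old generation used, in KB) and the YGC column (Young GC count). The helper below is a hypothetical sketch and assumes no old generation collection ran between the two samples; otherwise the difference in OU understates the promotion.

```java
// Hypothetical helper: the two samples stand for two rows of `jstat -gc` output.
final class PromotionRateSketch {
    /**
     * Average KB promoted to the old generation per Young GC between two samples.
     * ouKb is the OU column (old generation used, KB); ygc is the YGC column.
     * Assumes no old generation collection ran between the samples.
     */
    static double promotedKbPerYoungGc(double ouKbBefore, long ygcBefore,
                                       double ouKbAfter, long ygcAfter) {
        long youngGcs = ygcAfter - ygcBefore;
        if (youngGcs <= 0) {
            throw new IllegalArgumentException("no Young GC between the two samples");
        }
        return (ouKbAfter - ouKbBefore) / youngGcs;
    }
}
```

Comparing this value before and after a Survivor size change is a quick way to check whether the change had the intended effect.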

4.2.2. OOM in a business integration gateway

The gateway mainly consumes Kafka data, processes it and forwards it to another Kafka queue. After running for a few hours the system hit an OOM, and after a restart it hit an OOM again a few hours later.

A heap dump exported with jmap and analyzed in Eclipse MAT revealed the cause: the code logged the data of one business Kafka topic asynchronously. That topic carries a large volume of data, so a large number of objects piled up in memory waiting to be logged, which caused the OOM.

4.2.3. Frequent long Full GCs in an authentication system

The system provides various account authentication services. Monitoring on the Zabbix platform showed that the system frequently ran long Full GCs, during which its services were unavailable, and that the old generation was usually not full when they were triggered. Investigation showed that business code was calling System.gc().

Summary

There is no shortcut for GC problems, and troubleshooting performance in production is never simple. Beyond mastering the principles and tools introduced in this article, we also need to keep accumulating experience to truly reach optimal performance.

For reasons of space, the use of common GC parameters is not covered here. Descriptions can be cloned from GitHub:

https://github.com/caison/caison-blog-demo

Reprinted from: Chen Caihua (caison), core technology development engineer at Akulaku, interested in distributed systems, online troubleshooting and architecture design.

Origin juejin.im/post/5d8c5a5de51d4578323d51bd