[Grocery Store] Java JVM: GC and Memory Allocation Strategies Behind Java's Wall

Between Java and C++ stands a high wall built from dynamic memory allocation and garbage collection: people outside the wall want to get in, and people inside want to get out. - "In-Depth Understanding of the Java Virtual Machine"

Foreword

The previous post covered one half of the wall; this one looks at the other half: GC.

Why study GC and memory allocation strategies? When we have to troubleshoot memory overflows and memory leaks, or when garbage collection becomes the bottleneck that keeps a system from reaching higher concurrency, we need to monitor and tune these "automated" mechanisms.

The program counter, the virtual machine stack, and the native method stack are created and destroyed together with their thread, so their memory needs little attention. The Java heap and the method area are different: we only know which objects will be created while the program is running, so allocation and reclamation of this memory are dynamic. This is the memory that garbage collection is concerned with.

Is the object dead?

How does the GC decide that an object is no longer in use, so that its memory can be reclaimed? There are two main methods.

Reference counting algorithm

Each object has a counter. Whenever a new reference to the object is created, the counter is incremented by one; whenever a reference goes away, it is decremented by one. Once the counter reaches 0, the object is no longer in use and can be reclaimed. Mainstream Java virtual machines do not use this method, however, because it cannot handle circular references. Suppose object A references object B and object B references object A, and nothing else references either of them. Both counters stay at 1, so neither object is ever reclaimed, even though both are logically unused and merely occupy memory.
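The circular-reference problem can be reproduced with a toy model. The sketch below is not how any JVM works; RcObject, setField, and cycleCounts are hypothetical names, and the counters are maintained by hand purely to show why the count never reaches 0.

```java
// Toy model of reference counting. All names here are hypothetical.
class RefCountSketch {
    static class RcObject {
        final String name;
        int refCount = 0;     // number of references currently pointing at this object
        RcObject field;       // a single outgoing reference

        RcObject(String name) { this.name = name; }

        void setField(RcObject target) {
            if (field != null) field.refCount--;    // the old reference goes away
            field = target;
            if (target != null) target.refCount++;  // a new reference appears
        }
    }

    // Builds the A <-> B cycle, drops the external references,
    // and returns the two counters that remain.
    static int[] cycleCounts() {
        RcObject a = new RcObject("A");
        RcObject b = new RcObject("B");
        a.refCount++;      // a local variable refers to A
        b.refCount++;      // a local variable refers to B
        a.setField(b);     // A -> B
        b.setField(a);     // B -> A
        a.refCount--;      // the local variables are dropped
        b.refCount--;
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        System.out.println("A=" + counts[0] + ", B=" + counts[1]); // prints "A=1, B=1"
    }
}
```

Both counters end at 1 although nothing outside the pair can reach either object, so a pure reference-counting collector would never free them.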

Reachability analysis algorithm

Mainstream virtual machines use this algorithm to decide whether an object is alive (still in use). The basic idea is to take a set of objects called "GC Roots" as starting points and search downward from them; the path the search travels is called a reference chain (the search follows references, not the objects themselves). When no reference chain connects an object to any GC Root, the object is considered unusable. In the classic figure from the book, Object5, Object6, and Object7 are linked to one another but cannot be reached from the GC Roots, so all of them can be reclaimed.

The objects that can serve as GC Roots include the following:

  1. Objects referenced from the virtual machine stack (the local variable table in a stack frame).
  2. Objects referenced by static class fields in the method area.
  3. Objects referenced by constants in the method area.
  4. Objects referenced by JNI (native methods) in the native method stack.
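Reachability analysis is essentially a graph traversal. The following minimal sketch assumes the object graph is given as a map from an object name to the names it references; the class, the method, and the sample graph are all hypothetical and only loosely follow the Object5/Object6/Object7 example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: objects are plain names, references are map entries.
class ReachabilitySketch {
    // Returns every object reachable from the roots by following references.
    static Set<String> reachable(List<String> roots, Map<String, List<String>> refs) {
        Set<String> live = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (live.add(obj)) {   // first visit: follow its outgoing references
                pending.addAll(refs.getOrDefault(obj, List.of()));
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
            "root", List.of("obj1"),
            "obj1", List.of("obj2", "obj3"),
            "obj5", List.of("obj6", "obj7"),   // linked to each other...
            "obj6", List.of("obj5"));          // ...but not to any root
        Set<String> live = reachable(List.of("root"), refs);
        System.out.println("obj5 reachable: " + live.contains("obj5")); // false
    }
}
```

Anything missing from the returned set has no reference chain to a root. Unlike reference counting, the cycle between obj5 and obj6 does not keep them alive.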

Reference types

Java's original definition of a reference was very traditional: if the value stored in a reference-type variable is the starting address of another block of memory, that variable is said to hold a reference to that memory. Under this definition an object is either referenced or not, which leaves no way to describe objects that are worth keeping but not essential. The definition was therefore extended to cover objects that stay in memory while there is enough space but may be discarded if memory is still tight after a garbage collection. Many caching features fit this description.

References are therefore divided into four types.

  1. Strong references: the most common kind, for example a reference to a newly created object that is reachable from the GC Roots. As long as a strong reference exists, the GC will never reclaim the object.
  2. Soft references: describe objects that are still useful but not essential. Before an out-of-memory error is thrown, these objects are included in a second round of collection. If that collection still does not free enough memory, the out-of-memory error is thrown.
  3. Weak references: also describe non-essential objects, but are weaker than soft references. An object associated only with weak references survives only until the next garbage collection; it is reclaimed then regardless of whether memory is sufficient.
  4. Phantom references (also called ghost references): the weakest kind. Whether an object has a phantom reference has no effect on its lifetime, and an object instance cannot be obtained through a phantom reference. The only purpose of associating a phantom reference with an object is to receive a system notification when the object is reclaimed by the collector.
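The four kinds map onto classes in java.lang.ref. The sketch below only shows the parts that are deterministic: while a strong reference is held, soft and weak references still resolve, and a phantom reference never returns its referent. It deliberately does not try to force a collection, because when the GC actually clears a reference depends on the JVM. The class and method names are hypothetical.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.Arrays;

class ReferenceTypesSketch {
    static boolean[] observe() {
        Object strong = new Object();   // strong reference
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        boolean phantomHidesReferent = phantom.get() == null; // get() is always null
        boolean nothingEnqueuedYet = queue.poll() == null;    // the referent is still alive
        boolean softResolves = soft.get() == strong;          // kept alive by "strong"
        boolean weakResolves = weak.get() == strong;
        return new boolean[] { phantomHidesReferent, nothingEnqueuedYet, softResolves, weakResolves };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(observe())); // prints "[true, true, true, true]"
    }
}
```

The ReferenceQueue is where the notification mentioned for phantom references would arrive: once the collector has dealt with the referent, the PhantomReference object itself is placed on the queue.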

The role of finalize()

An object found unreachable by reachability analysis is not reclaimed immediately; it must be marked at least twice. After the first mark the object is screened on whether it needs to run finalize(). If the object does not override finalize(), or its finalize() has already been called by the virtual machine (finalize() runs at most once per object), the virtual machine treats it as "not necessary to execute".

If the object overrides finalize() and the method has not been run yet, the object is placed in a queue called F-Queue, to be executed later by a low-priority Finalizer thread. Because that thread's priority is very low, code that wants to observe the result has to sleep briefly to give it time to run. The virtual machine only triggers finalize(); it does not necessarily wait for the method to finish before going on with garbage collection, since a finalize() method may take a very long time.

An overridden finalize() gives the object one chance to save itself (only one, since finalize() runs at most once): it just has to re-attach itself (using the this keyword) to any object on a reference chain, and it becomes reachable again.

In practice there is rarely a business need for finalize(). Everything it can do, try-finally can do better: the cleanup runs promptly and is easier to control.
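As an illustration of the try-finally alternative, the sketch below releases a resource at a known point instead of waiting for the Finalizer thread. Resource is a hypothetical class; the second method uses try-with-resources, which is the same idea with less code.

```java
class CleanupSketch {
    // Hypothetical resource that must be released explicitly.
    static class Resource implements AutoCloseable {
        boolean closed = false;

        void use() {
            if (closed) throw new IllegalStateException("already closed");
        }

        @Override
        public void close() { closed = true; }
    }

    static boolean tryFinally() {
        Resource r = new Resource();
        try {
            r.use();
        } finally {
            r.close();   // runs right here, whether or not use() throws
        }
        return r.closed;
    }

    static boolean tryWithResources() {
        Resource r = new Resource();
        try (Resource managed = r) {   // close() is called when the block exits
            managed.use();
        }
        return r.closed;
    }

    public static void main(String[] args) {
        System.out.println(tryFinally() + " " + tryWithResources()); // prints "true true"
    }
}
```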

Reclaiming the method area

This part is not the focus: in the now-popular JDK 1.8 the permanent generation that used to implement the method area is gone (it was replaced by Metaspace), and garbage collection in this area yields very little anyway. It is enough to know that two kinds of content are reclaimed here: obsolete constants and useless classes.

Obsolete constants are easy to understand: a string such as "abc" that nothing references any longer can be identified with the reachability algorithm. A class is considered useless only if it meets all three of the following conditions:

  1. All instances of the class have been reclaimed; that is, no instance of the class exists in the Java heap.
  2. The ClassLoader that loaded the class has been reclaimed.
  3. The java.lang.Class object for the class is not referenced anywhere, so the class cannot be accessed through reflection anywhere.

Garbage collection algorithm

Mark-sweep algorithm

The most basic algorithm is "mark-sweep". It has two phases, "mark" and "sweep": first mark every object that needs to be reclaimed, then reclaim all marked objects in one pass once marking is complete. Mark-sweep has two shortcomings. The first is efficiency: neither phase is very efficient. The second is space: sweeping leaves a large number of fragments, so free memory is no longer physically contiguous. When a larger object has to be allocated later, there may be no contiguous block big enough, and another garbage collection is easily triggered.
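The fragmentation problem is easy to see on a toy heap. In the sketch below the heap is an array of slots, and the mark phase is assumed to have already produced the set of surviving objects; all names are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;

// Toy heap: one array slot per object, null means free.
class MarkSweepSketch {
    // Sweep phase: free every occupied slot whose object did not survive marking.
    static void sweep(String[] heap, Set<String> live) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !live.contains(heap[i])) {
                heap[i] = null;
            }
        }
    }

    static int totalFree(String[] heap) {
        int free = 0;
        for (String slot : heap) {
            if (slot == null) free++;
        }
        return free;
    }

    // Size of the largest contiguous block of free slots.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e", "f" };
        sweep(heap, Set.of("b", "d", "f"));
        System.out.println(Arrays.toString(heap)); // [null, b, null, d, null, f]
        System.out.println(totalFree(heap) + " free, largest block " + largestFreeRun(heap)); // 3 free, largest block 1
    }
}
```

Three slots are free after the sweep, yet an object that needs two adjacent slots still cannot be allocated.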

Copying algorithm

The copying algorithm splits memory into two halves and uses only one of them at a time. When that half is used up, the surviving objects are copied to the other half and the used half is cleaned out in one go. Every collection therefore works on a whole half at once, so fragmentation is not a concern, and allocating memory only requires moving the top-of-heap pointer forward in order.
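A minimal sketch of the idea, assuming a toy heap of string slots and a live set supplied by the caller. The field and method names are hypothetical, and a real collector copies objects by tracing references rather than by scanning slots.

```java
import java.util.Arrays;
import java.util.Set;

class CopyingSketch {
    String[] active;    // half currently used for allocation
    String[] reserve;   // empty half kept for the next collection
    int top = 0;        // bump pointer: next free slot in the active half

    CopyingSketch(int halfSize) {
        active = new String[halfSize];
        reserve = new String[halfSize];
    }

    // Allocation only moves the pointer; returns false when the half is full.
    boolean allocate(String obj) {
        if (top == active.length) return false;
        active[top++] = obj;
        return true;
    }

    // Copy survivors into the reserve half, wipe the old half, swap the halves.
    void collect(Set<String> live) {
        int newTop = 0;
        for (int i = 0; i < top; i++) {
            if (active[i] != null && live.contains(active[i])) {
                reserve[newTop++] = active[i];
            }
        }
        Arrays.fill(active, null);   // the used half is cleaned out in one go
        String[] tmp = active;
        active = reserve;
        reserve = tmp;
        top = newTop;
    }

    public static void main(String[] args) {
        CopyingSketch heap = new CopyingSketch(4);
        for (String s : new String[] { "a", "b", "c", "d" }) heap.allocate(s);
        heap.collect(Set.of("b", "d"));
        System.out.println(Arrays.toString(heap.active) + " top=" + heap.top); // [b, d, null, null] top=2
    }
}
```

Survivors end up packed at the start of the new half, so the free space is contiguous. The price is that half of the memory is always idle.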

Today's commercial virtual machines basically all use this algorithm to collect the young generation. The young generation is not split in half, though: it is divided into a larger Eden space and two smaller Survivor spaces, with a typical Eden-to-Survivor size ratio of 8:1. Eden and one Survivor are in use at any time while the other Survivor is held in reserve, so the ratio of usable to reserved space in the young generation is 9:1. When the objects that survive a GC do not fit in the Survivor space, the old generation has to act as allocation guarantor and take the overflow.

Mark-compact algorithm

"Mark-compact" first marks the unreachable objects, then moves all surviving objects toward one end of the space, and finally clears the memory beyond the end boundary directly. The free space that remains is physically contiguous.
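On the same kind of toy heap used for illustration (an array of slots, with the set of survivors assumed to come from the mark phase), compaction can be sketched as follows; the names are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;

class MarkCompactSketch {
    // Slides survivors toward index 0 and clears everything past the boundary.
    // Returns the boundary, i.e. the index of the first free slot.
    static int compact(String[] heap, Set<String> live) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live.contains(heap[i])) {
                heap[boundary++] = heap[i];   // move the survivor down
            }
        }
        for (int i = boundary; i < heap.length; i++) {
            heap[i] = null;                   // clear memory outside the boundary
        }
        return boundary;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e", "f" };
        int boundary = compact(heap, Set.of("b", "d", "f"));
        System.out.println(Arrays.toString(heap) + " boundary=" + boundary); // [b, d, f, null, null, null] boundary=3
    }
}
```

The same survivors that would leave three isolated holes after a plain sweep now leave one free block of three slots.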

Generational collection algorithm

Generational collection means choosing a different collection algorithm for each memory space according to its actual characteristics. In general the young generation uses the copying algorithm, and the old generation uses the mark-compact algorithm.

HotSpot algorithm implementation

Root enumeration

The memory a JVM manages can be very large, while object references are small and scattered. To keep a GC from spending too much time looking for them, the virtual machine needs a quick way to get at object references. HotSpot uses a data structure called OopMap to record where these references are so that they can be located rapidly. For particular positions in a method's code, an OopMap record states which offsets hold references at that position, so the references can be found directly.

Safe point

OopMap makes it possible to find all references quickly, but generating an OopMap record for every instruction is out of the question, since the memory cost would be very high. Records are added only at certain positions, and these positions are called safe points. Safe points are chosen by asking whether a position lets the program run for a long time. The clearest sign of "long-running" is the reuse of instruction sequences, as in method calls, loop jumps, and exception jumps. A second problem is how to get every thread to a safe point when a GC is about to start. The whole program has to pause so that no new garbage is produced during the GC, which would leave the collection incomplete. All threads therefore have to reach a safe point before garbage collection starts. There are two mechanisms for this: preemptive interruption and active interruption.

Preemptive interruption: when a GC occurs, all threads are interrupted first. If some thread is found not to be at a safe point, it is resumed and allowed to run to a safe point, where it is interrupted again. Garbage collection then proceeds.

Active interruption: the GC does not act on threads directly; it simply sets a flag that every thread polls. When a thread reaches a safe point, polls the flag, and finds the interrupt state set, it suspends itself. Once all threads are suspended, the GC runs.

Safe region

A GC has to wait for threads to reach a safe point, but some threads cannot get to one, for example a thread that is in sleep(), and the GC cannot reasonably wait for every such thread to wake up. So in addition to safe points there is the concept of a safe region. A safe region is a stretch of code in which reference relationships do not change, so starting a GC anywhere inside it is safe. While a thread is executing inside a safe region, a GC may proceed freely. When the thread is about to leave the safe region, it checks whether a GC is in progress: if not, it simply leaves; if so, it waits until the GC has finished before leaving.
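Active interruption can be imitated at the application level with a volatile flag and two latches. This is only an analogy for the mechanism described above, not how HotSpot implements polling; every name in the sketch is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class SafepointPollSketch {
    static volatile boolean gcRequested;   // the flag every worker polls
    static volatile boolean stop;

    // Starts the workers, requests a pause, and returns how many threads
    // had parked themselves when the simulated GC ran.
    static int run(int workers) {
        gcRequested = false;
        stop = false;
        CountDownLatch parked = new CountDownLatch(workers);
        CountDownLatch resume = new CountDownLatch(1);
        AtomicInteger parkedCount = new AtomicInteger();

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                while (!stop) {
                    // ... one unit of application work would go here ...
                    // Loop back-edge: this is the polling point (safe point).
                    if (gcRequested) {
                        parkedCount.incrementAndGet();
                        parked.countDown();
                        try {
                            resume.await();   // suspend until the GC is done
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                    }
                }
            });
            threads[i].setDaemon(true);
            threads[i].start();
        }

        int seen;
        try {
            gcRequested = true;            // the collector only sets a flag
            parked.await();                // wait until every thread has suspended itself
            seen = parkedCount.get();      // the "GC" runs here, all workers stopped
            gcRequested = false;
            stop = true;
            resume.countDown();            // let the workers continue
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while coordinating the pause", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run(3) + " threads parked"); // prints "3 threads parked"
    }
}
```

No thread is stopped from outside: each worker notices the flag at its own polling point and parks itself, which is the defining property of the active approach.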

Garbage collectors

A few simple garbage collectors

Serial collector: a single-threaded collector. When it starts collecting, all other threads are paused. It is used for the young generation.

ParNew collector: the multithreaded version of Serial. Apart from Serial, ParNew is the only collector that can work together with the CMS collector. With a small number of CPUs, ParNew performs no better than Serial, because switching between threads has a cost of its own. This collector is also used for the young generation.

Parallel Scavenge collector: used for the young generation. This collector focuses on throughput, that is, the ratio of time spent running user code to the total of user-code time plus garbage collection time. It can adjust its parameters dynamically to provide the most suitable pause time or the maximum throughput.

Serial Old collector: a single-threaded collector for the old generation.

Parallel Old collector: a multithreaded collector for the old generation.
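To see which collectors a running JVM is actually using, the standard java.lang.management API can be queried. The sketch below only lists the names that the JVM reports; which names appear depends on the JVM version and on the GC flags it was started with, so no particular output is assumed. The class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class CollectorNamesSketch {
    // Returns the names of the collectors registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() may return -1 if the count is undefined for a collector
            System.out.println(bean.getName() + ": " + bean.getCollectionCount() + " collections");
        }
    }
}
```

A JVM typically reports one entry for the young-generation collector and one for the old-generation collector, which matches the pairing of collectors described above.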

CMS collector

CMS is a collector whose goal is the shortest possible pause time. A large share of today's Java applications run on the server side of Internet sites or B/S systems. Such applications care especially about service response time and want pauses to be as short as possible.

CMS works in four steps: initial mark, concurrent mark, remark, and concurrent sweep. The initial mark and the remark both require all user threads to be paused. The concurrent mark runs alongside user threads, so references may change while it runs; the remark exists to correct the marks for those changes.

CMS has three significant drawbacks:

  1. The CMS collector is very sensitive to CPU resources and performs poorly when few CPUs are available.
  2. CMS has to start collecting early. Because part of the space must be left for user threads that keep running concurrently, it cannot wait until the old generation is 100% full before starting a GC. The trigger threshold is now 92% occupancy.
  3. It is based on the "mark-sweep" algorithm, so it produces a lot of fragmentation.

G1 collector

G1 is a garbage collector aimed at server applications. It can manage both the young generation and the old generation, makes good use of parallelism and concurrency, and compacts space as it collects. Another very good feature is predictable pauses: the user can specify that within any time slice of M milliseconds, the time spent on garbage collection must not exceed N milliseconds.

G1 divides the Java heap into multiple Regions (the young and old generations still exist as concepts) and manages them itself, so garbage collection can be carried out Region by Region. In the background it maintains a priority list and collects the Regions at the top of that list first.

To handle references that cross Regions, for example an object in Region A referencing an object in Region B, each Region maintains a Remembered Set that records this information.

Memory allocation and reclamation strategies

Objects are mainly allocated in the Eden area of the young generation, or in a TLAB (a thread-private allocation buffer); in a few cases they may also be allocated directly in the old generation. The details depend on which garbage collector is used and how its parameters are set. Here are a few common memory allocation rules.

Objects are allocated in Eden first

If Eden does not have enough space for an allocation, a young-generation GC is triggered. As mentioned earlier, besides Eden there are two Survivor areas, with a memory ratio of about 8:1:1, so the ratio of usable to reserved space in the young generation is 9:1. An old-generation GC also takes roughly 10 times as long as a young-generation GC.

Large objects go directly into the old generation

Normally a newly created object lives in the young generation and is moved to the old generation only after surviving a certain number of young-generation GCs (15 by default). Larger objects, such as very long strings or very large arrays, go directly into the old generation instead, which avoids copying them back and forth across many young-generation GCs.

Long-lived objects move into the old generation

After surviving a certain number of young-generation GCs (MaxTenuringThreshold, 15 by default), an object is moved into the old generation.

Dynamic object age determination

If the total size of all objects of the same age (the number of GCs survived) in the Survivor space is greater than half of the Survivor space, objects of that age or older go directly into the old generation without waiting to reach the MaxTenuringThreshold age.
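The rule can be written down as a small function. This sketch follows the rule exactly as worded above; HotSpot's own calculation may differ in detail, and the array layout, parameter names, and sample numbers are assumptions made for illustration.

```java
class DynamicAgeSketch {
    // sizeByAge[n] = total bytes of Survivor objects whose age is n (index 0 unused).
    // Returns the age from which objects are promoted in this collection.
    static int tenuringThreshold(long[] sizeByAge, long survivorCapacity, int maxTenuringThreshold) {
        for (int age = 1; age < sizeByAge.length && age < maxTenuringThreshold; age++) {
            if (sizeByAge[age] > survivorCapacity / 2) {
                return age;   // objects of this age or older are promoted early
            }
        }
        return maxTenuringThreshold;   // no age dominates: the normal limit applies
    }

    public static void main(String[] args) {
        long[] sizeByAge = { 0, 10, 60, 5 };   // age 2 alone fills more than half of 100
        System.out.println(tenuringThreshold(sizeByAge, 100, 15)); // prints 2
    }
}
```

With a Survivor capacity of 100 and 60 units of age-2 objects, everything aged 2 or more is promoted at once instead of waiting for age 15.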

Space guarantee mechanism

The space guarantee mechanism comes into play during a young-generation GC: if the Survivor space is too small to hold the surviving objects from Eden, the old generation acts as guarantor and takes the objects that do not fit.

Before a young-generation GC, the virtual machine checks whether the largest contiguous free space in the old generation is greater than the total size of all objects in the young generation. If it is, the GC is safe. If not, the virtual machine checks whether the space guarantee is enabled. If it is enabled, the virtual machine goes on to check whether the largest contiguous free space in the old generation is greater than the average size of objects promoted to the old generation in previous collections. If it is greater, a young-generation GC is attempted even though it is risky; if it is not greater, a full GC is triggered instead.

Why is this a risk? Because the old generation cannot be sure it has room for the overflow from the next young-generation GC; it is guaranteeing something that has not happened yet. Whether the overflow will fit has to be judged from experience: the average amount promoted in previous young-generation GCs is compared with the free space remaining in the old generation, on the assumption that past frequency approximates future probability.
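The check sequence can be condensed into one decision function. This is a sketch of the logic as described in this section only; Decision, decide, and the parameter names are hypothetical, and the case where the guarantee is disabled is assumed to fall through to a full GC.

```java
class PromotionGuaranteeSketch {
    enum Decision { MINOR_GC, RISKY_MINOR_GC, FULL_GC }

    static Decision decide(long oldMaxContiguousFree, long youngUsed,
                           long avgPromoted, boolean guaranteeEnabled) {
        if (oldMaxContiguousFree > youngUsed) {
            return Decision.MINOR_GC;         // safe even if every object survives
        }
        if (guaranteeEnabled && oldMaxContiguousFree > avgPromoted) {
            return Decision.RISKY_MINOR_GC;   // bet on the historical average
        }
        return Decision.FULL_GC;              // not enough room to take the risk
    }

    public static void main(String[] args) {
        System.out.println(decide(100, 80, 30, true)); // MINOR_GC
        System.out.println(decide(50, 80, 30, true));  // RISKY_MINOR_GC
        System.out.println(decide(20, 80, 30, true));  // FULL_GC
    }
}
```

The middle case is the gamble: 50 units of free space cannot hold all 80 units of young-generation data, but it is more than the 30 units promoted on average, so a young-generation GC is attempted.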

Origin juejin.im/post/5d738b84e51d4561c83e7cc2