JVM data areas and garbage collection (reading notes on "In-Depth Understanding of the Java Virtual Machine")


Foreword

Zhou Zhiming's book "In-Depth Understanding of the Java Virtual Machine" (hereinafter "the book") is essential reading for Java engineers who want to advance. I recently read the first two parts of the book, that is, the first five chapters, and got a lot out of them. So I wanted to write an article that summarizes those five chapters in my own words, based on my own understanding.

Although this is a summary, I still strongly recommend reading the original. The original has no filler that I need to boil down; every passage taught me something. I cannot remember everything the book says, so I only record things and string them together according to my own line of thought. Again, I recommend that everyone read the original.

Automatic memory management

The book says this several times:

Between Java and C++ there is a high wall built from dynamic memory allocation and garbage collection. People outside the wall want to get in, and people inside the wall want to get out.

C/C++ programmers have absolute control over the memory of every object, but managing it is tedious. Java programmers do not have to handle allocation themselves because the JVM does it dynamically, but memory leaks are harder to troubleshoot when they do occur.

Allocating memory dynamically for new objects, and reclaiming objects that are no longer needed at the right time: that is the JVM's automatic memory management.

Runtime data area

While executing Java code, the JVM divides the memory assigned to it into several areas to make it easier to manage. The classic layout of the runtime data areas is shown below:

[Figure: JVM runtime data areas]

Program counter

The program counter is a small, thread-private memory area. It can be seen as the line-number indicator of the bytecode that the current thread is executing.

VM stack

The virtual machine stack is thread-private memory. Each method creates a "stack frame" when it is executed, which stores the local variable table, the operand stack, dynamic linking, the method exit, and other information. You can think of the virtual machine stack as storing the extra information a method needs at run time. Pushing and popping a stack frame corresponds to the start and end of one method execution.
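To make the push and pop of stack frames a bit more tangible, here is a minimal sketch. The class name StackFrameDemo and the method framesAtDepth are made up for illustration. It only relies on Throwable.getStackTrace(), which reports one element per frame on the current thread's stack (the specification allows a virtual machine to omit frames in some circumstances, so treat the numbers as what I would expect on HotSpot).

```java
public class StackFrameDemo {

    // Recurse n times, then report how many frames are on this thread's stack.
    static int framesAtDepth(int n) {
        if (n == 0) {
            return new Throwable().getStackTrace().length;
        }
        return framesAtDepth(n - 1); // each call pushes one more stack frame
    }

    public static void main(String[] args) {
        int base = framesAtDepth(0);
        int deeper = framesAtDepth(5);
        // Five extra calls should mean five extra frames while they are running.
        System.out.println("extra frames: " + (deeper - base));
    }
}
```

Once each call returns, its frame is popped again, which is why the second measurement starts from the same base.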

Native method stacks

If we understand the VM stack above as "what is recorded in order to execute Java methods", then the native method stack records the same kind of information for native methods. The virtual machine specification is not strict about this area, so implementations differ between virtual machines. The well-known HotSpot virtual machine merges the VM stack and the native method stack.

Heap

The heap is the largest piece of memory managed by the JVM and the main work area of the garbage collector. Its sole purpose is to store object instances. Different virtual machines partition the heap differently in order to carry out garbage collection; the partitioning is described in detail in the garbage collection part below.

Method area

The method area is also shared between threads. It stores information about loaded classes, constants, static variables, code compiled by the just-in-time compiler, and so on.

It has a better-known name, the "permanent generation". HotSpot implemented the method area as a permanent generation (in JDK 7 and earlier; JDK 8 replaced it with Metaspace) so that it did not have to implement garbage collection separately for the method area. Whether that was a good decision is beyond what a beginner like me can judge, but it is worth understanding why it is called the permanent generation: garbage collection of the content stored here is relatively inefficient (constants and static variables rarely need to be reclaimed), so once data enters this area it is treated as if it will stay there permanently.

The method area also contains a separate part, the runtime constant pool. When a class is loaded, its symbolic references and literals enter this pool. New constants can also be placed into the pool at run time, for example through the String.intern() method.
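A small sketch of String.intern() adding to and reading from the pool at run time. The class name InternDemo and the helper build() are only for illustration; String.intern() and StringBuilder are standard library calls.

```java
public class InternDemo {

    // Builds the text at run time, so the result is a new object outside the pool.
    static String build() {
        return new StringBuilder("jvm-").append("notes").toString();
    }

    public static void main(String[] args) {
        String literal = "jvm-notes"; // a literal, placed in the pool when the class is loaded
        String built = build();

        System.out.println(built.equals(literal));     // same characters
        System.out.println(built == literal);          // false: two different objects
        System.out.println(built.intern() == literal); // true: intern() returns the pooled instance
    }
}
```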

Direct Memory

Direct memory does not appear in the figure of the JVM runtime data areas; it is an additional memory area. NIO, introduced in JDK 1.4, can allocate off-heap memory directly through native methods, which improves performance.

The size of this area is not limited by the memory allocated to the virtual machine's heap, but it is still limited by the physical memory of the machine. So when an OutOfMemoryError occurs and the code makes heavy use of NIO, consider the possibility that it is direct memory that overflowed.
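For reference, this is roughly what allocating direct memory through NIO looks like. The class and method names are mine; ByteBuffer.allocate, ByteBuffer.allocateDirect and isDirect() are the standard java.nio calls. As far as I know, HotSpot also lets you cap this area with -XX:MaxDirectMemorySize.

```java
import java.nio.ByteBuffer;

public class DirectMemoryDemo {

    // Allocates a buffer whose storage lives outside the Java heap.
    static ByteBuffer allocateOffHeap(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer onHeap = ByteBuffer.allocate(1024); // backed by a byte[] on the heap
        ByteBuffer offHeap = allocateOffHeap(1024);    // backed by native memory

        System.out.println(onHeap.isDirect());  // false
        System.out.println(offHeap.isDirect()); // true

        offHeap.putInt(0, 42);                  // reads and writes work the same way
        System.out.println(offHeap.getInt(0));  // 42
    }
}
```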

Memory Allocation

How the virtual machine creates an object

When it comes to object creation, you may think of that classic interview question: a subclass and a parent class, several static blocks, several ordinary methods, several constructors, and you are asked in what order they print.

Do not get me wrong, that is not what matters here. The process the virtual machine goes through to create an object is far more involved. It can be summarized as follows:

  1. When the virtual machine encounters the new keyword, it first checks whether the class can be located in the constant pool and whether the class has already been loaded. If not, the class is loaded first.
  2. The size of the memory to acquire is determined and the memory is obtained in one of two ways: bump-the-pointer or a free list. Bump-the-pointer: if memory is regular, with used space on one side and free space on the other, allocation only needs to move the pointer by the size of the object. Free list: if memory is not regular and used and free blocks are interleaved, the JVM must maintain a list that records which blocks are available. Which method is used depends on the garbage collector: collectors that compact memory leave it regular, so bump-the-pointer can be used.
  3. After the space is allocated, all of the memory (not including the object header) is initialized to zero values.
  4. The virtual machine sets the object's information, such as which class the object belongs to, the class metadata, the hash code, and the GC generational age.
  5. Finally the constructor is executed to set the value of each field.
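Steps 3 and 5 can be observed from plain Java. In this sketch (all class and field names are hypothetical), the parent constructor calls an overridden method before the child's field initializer has run, so the method sees the zero value from step 3 rather than the value assigned later during construction.

```java
public class ZeroInitDemo {

    static class Base {
        Base() {
            observe(); // runs before Child's field initializers
        }
        void observe() { }
    }

    static class Child extends Base {
        int value = 42;        // assigned only after Base() has returned
        int valueSeenBySuper;  // no initializer, so it keeps what observe() stores
        boolean observeCalled; // no initializer, so it keeps what observe() stores

        @Override
        void observe() {
            observeCalled = true;
            valueSeenBySuper = value; // still the zero value at this point
        }
    }

    public static void main(String[] args) {
        Child c = new Child();
        System.out.println(c.observeCalled + " " + c.valueSeenBySuper + " " + c.value); // true 0 42
    }
}
```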

There is also a concurrency problem. If, in step 2, a single pointer marks the boundary between used and free memory, then with objects being created so frequently, concurrent allocations are bound to collide. The virtual machine has two main ways to solve this:

  1. CAS with retry on failure.
  2. TLAB (thread-local allocation buffer). Each thread first requests a small piece of heap memory and then allocates inside that piece, so synchronization is needed only when a TLAB is requested, which improves the ability to handle concurrency.
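The combination of bump-the-pointer allocation and "CAS with retry" can be sketched in a few lines. This is only a toy model under my own assumptions: BumpPointerAllocator is a made-up class that hands out offsets into an imaginary memory block, and AtomicLong.compareAndSet stands in for the CAS the virtual machine performs internally.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BumpPointerAllocator {

    private final long capacity;
    private final AtomicLong top = new AtomicLong(0); // boundary between used and free memory

    public BumpPointerAllocator(long capacity) {
        this.capacity = capacity;
    }

    /** Returns the start offset of the new block, or -1 if there is not enough space. */
    public long allocate(long size) {
        while (true) {
            long start = top.get();
            long end = start + size;
            if (end > capacity) {
                return -1;
            }
            if (top.compareAndSet(start, end)) {
                return start; // we moved the pointer, so the block is ours
            }
            // Another thread moved the pointer first: retry with the new value.
        }
    }

    public long used() {
        return top.get();
    }

    public static void main(String[] args) {
        BumpPointerAllocator heap = new BumpPointerAllocator(64);
        System.out.println(heap.allocate(16));  // 0
        System.out.println(heap.allocate(8));   // 16
        System.out.println(heap.allocate(100)); // -1
    }
}
```

A TLAB follows the same idea one level up: a thread would use something like allocate once to reserve a large block, then bump a private, unsynchronized pointer inside it.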

What information does a created object contain?

In HotSpot, an object in memory consists of an object header, instance data, and alignment padding.

  • Object header: contains two parts, the object's own runtime data (hash code, GC age, and so on) and the type pointer (which indicates which class the object is an instance of).
  • Instance data: the fields we define in code.
  • Alignment padding: not always present. When the size of the object is not an integer multiple of 8 bytes, padding is added to align it.
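The padding rule is plain arithmetic: round the object size up to the next multiple of 8. A minimal sketch follows; the 12-byte header used in main is only an assumed example value, and real header sizes depend on the virtual machine and its settings.

```java
public class AlignmentDemo {

    static final int ALIGNMENT = 8; // objects are aligned to 8 bytes

    // Rounds size up to the next multiple of ALIGNMENT.
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) & ~(long) (ALIGNMENT - 1);
    }

    // Number of padding bytes needed after the header and the instance data.
    static long padding(long size) {
        return alignUp(size) - size;
    }

    public static void main(String[] args) {
        long header = 12; // assumed header size, for illustration only
        System.out.println(padding(header + 4)); // header + one int: 16 bytes, padding 0
        System.out.println(padding(header + 8)); // header + one long: 20 bytes, padding 4
    }
}
```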

How objects are allocated memory

In practice, how objects are allocated depends heavily on the choice of garbage collector and on the virtual machine's startup parameters, so you cannot say with certainty that "XXX is allocated in XXX". But there are some general rules.

Objects are allocated in Eden first

In most cases, objects are first allocated in the Eden area. When Eden does not have enough space, the virtual machine performs a Minor GC (young generation GC).

Large objects go directly into the old generation

Large objects (the VM provides the parameter -XX:PretenureSizeThreshold to adjust the large-object threshold) are allocated directly in the old generation. The young generation uses a copying algorithm, so a large object allocated in the young generation could cause a lot of memory copying between Eden and the two Survivor areas, which hurts garbage collection efficiency.

Long-lived objects enter the old generation

Each object has an age counter. When an object born in Eden survives a Minor GC and stays in the young generation, its age becomes 1, and each further Minor GC it survives adds 1. When its age reaches 15 (the default, adjustable with -XX:MaxTenuringThreshold), it is promoted to the old generation.

Dynamic age determination

Reaching the maximum age is not the only way to be promoted to the old generation. When the total size of all objects of the same age in the Survivor space is greater than half of the Survivor space, the virtual machine treats that age as a more suitable threshold and moves every object whose age is greater than or equal to it into the old generation.
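The rule can be written down as a small function. This sketch follows the rule exactly as worded above; the real HotSpot computation may differ in detail, and the class name, method name and parameters are all made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class DynamicAgeDemo {

    /**
     * bytesByAge maps an object age to the total size of Survivor objects of that age.
     * Returns the age from which objects are promoted.
     */
    static int tenuringThreshold(Map<Integer, Long> bytesByAge, long survivorBytes, int maxThreshold) {
        Map<Integer, Long> sorted = new TreeMap<>(bytesByAge); // visit ages in ascending order
        for (Map.Entry<Integer, Long> entry : sorted.entrySet()) {
            if (entry.getKey() >= maxThreshold) {
                break;
            }
            if (entry.getValue() > survivorBytes / 2) {
                return entry.getKey(); // this age alone fills more than half of Survivor
            }
        }
        return maxThreshold; // fall back to the fixed limit (15 by default)
    }

    public static void main(String[] args) {
        Map<Integer, Long> bytesByAge = new TreeMap<>();
        bytesByAge.put(1, 20L);
        bytesByAge.put(2, 10L);
        bytesByAge.put(3, 60L);
        // Survivor holds 100 bytes; age 3 takes 60 of them, more than half.
        System.out.println(tenuringThreshold(bytesByAge, 100L, 15)); // 3
    }
}
```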

Allocation guarantee

Before a Minor GC takes place, the virtual machine checks whether the old generation can act as guarantor (whether the largest contiguous free space in the old generation is greater than the total size of all objects in the young generation). If so, the guarantee holds and the Minor GC proceeds.

If not, the virtual machine checks whether the HandlePromotionFailure setting allows taking the risk. If it does, the Minor GC proceeds; otherwise a Full GC is performed instead. If the risk does not pay off, a Full GC follows to free up enough space in the old generation.
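The decision described above is a two-step check, sketched here as a pure function. Everything in this block (class, enum, method and parameter names) is hypothetical and only mirrors the wording of the two paragraphs; it is not how HotSpot structures the code.

```java
public class AllocationGuaranteeDemo {

    enum Action { MINOR_GC, FULL_GC }

    static Action beforeMinorGc(long oldMaxContiguousFree, long youngTotalBytes,
                                boolean handlePromotionFailure) {
        if (oldMaxContiguousFree > youngTotalBytes) {
            return Action.MINOR_GC; // the old generation can absorb everything: safe
        }
        // Not guaranteed. Take the risk only if the setting allows it;
        // a risky Minor GC that fails is still followed by a Full GC.
        return handlePromotionFailure ? Action.MINOR_GC : Action.FULL_GC;
    }

    public static void main(String[] args) {
        System.out.println(beforeMinorGc(200, 100, false)); // MINOR_GC
        System.out.println(beforeMinorGc(50, 100, true));   // MINOR_GC (risky)
        System.out.println(beforeMinorGc(50, 100, false));  // FULL_GC
    }
}
```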

Garbage Collection

When it comes to garbage collection, most of us can say a few scattered things about it, because the JVM is so widely used that, besides Java developers, developers of many other JVM-based languages also need to understand it. But have we ever organized that knowledge systematically?

Garbage collection means releasing memory that is no longer used so that the rest of the program can use it. That raises three questions:

  1. Which memory should be reclaimed?
  2. When is it reclaimed?
  3. How is it reclaimed?

Let's go through them one at a time.

Which memory should be reclaimed?

Of course, the objects that are dead and will never be used again.

How do we determine that an object will never be used again?

Reference counting

The first approach is reference counting. The idea is to give every object a counter: whenever some place takes a reference to the object, the counter is incremented by 1, and whenever a reference is released, the counter is decremented by 1. When the counter reaches 0, the object can no longer be referenced.

This algorithm is simple to implement and quick to evaluate, but mainstream JVM implementations do not use it because it has a fatal flaw: it cannot handle circular references.

When two objects reference each other and nothing else references them, they should be reclaimed, but both counters are still 1, so they can never be reclaimed.
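To see the counters get stuck at 1, here is a hand-rolled simulation. It is not how any JVM works; Counted and its refCount field are invented purely to model the bookkeeping a reference-counting collector would do.

```java
public class RefCountDemo {

    static class Counted {
        int refCount;  // how many references currently point at this object
        Counted field; // one outgoing reference

        void setField(Counted target) {
            if (field != null) {
                field.refCount--;
            }
            field = target;
            if (target != null) {
                target.refCount++;
            }
        }
    }

    // Builds a two-object cycle, drops the outside references, returns both counters.
    static int[] countsAfterDroppingRoots() {
        Counted a = new Counted();
        a.refCount++; // referenced by local variable a
        Counted b = new Counted();
        b.refCount++; // referenced by local variable b

        a.setField(b); // b is now referenced twice
        b.setField(a); // a is now referenced twice

        a.refCount--;  // models "a = null"
        b.refCount--;  // models "b = null"
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingRoots();
        System.out.println(counts[0] + " " + counts[1]); // 1 1: neither counter ever reaches 0
    }
}
```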

We can test whether the JVM behaves this way with the following code:

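public class ReferenceCountTest {

    // Each instance carries 1 MB so that reclaiming it is visible in the GC log.
    private final byte[] payload = new byte[1024 * 1024];
    public ReferenceCountTest reference;

    public static void main(String[] args) {

        ReferenceCountTest a = new ReferenceCountTest();
        ReferenceCountTest b = new ReferenceCountTest();

        // Make the two objects refer to each other.
        a.reference = b;
        b.reference = a;

        // Drop the only outside references; just the cycle remains.
        a = null;
        b = null;

        // Request a collection. Pure reference counting could never free the pair.
        System.gc();

    }
}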

Run it with the JVM parameter -XX:+PrintGC. The output of my run was: [GC (System.gc()) 7057K->2294K(125952K), 0.0024641 secs]. As you can see, the memory was reclaimed, which means the HotSpot virtual machine I am using does not rely on reference counting to decide whether an object is still alive.

Reachability analysis

The basic idea of this algorithm is to start from a set of objects called GC Roots and search downward along the chains of references. When no path connects an object to any GC Root, the object is considered reclaimable.

[Figure: reachability analysis starting from the GC Roots]

In the figure above, object5, object6 and object7 reference each other, but because they cannot be reached from the GC Roots they are still dead objects.

In Java, GC Roots generally include the following:

  • Objects referenced from the local variable tables of stack frames in the virtual machine stack
  • Objects referenced by constants
  • Objects referenced by static fields
  • Objects referenced by native methods in the native method stack
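Reachability analysis is essentially a graph traversal. The sketch below models objects as strings and references as a map from an object to the objects it points at; the names (ReachabilityDemo, sampleGraph, "root", "obj1" and so on) are invented for the example and only loosely follow the figure.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReachabilityDemo {

    // Returns every object that can be reached from the given roots.
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (!seen.add(obj)) {
                continue; // already visited, which also keeps cycles from looping forever
            }
            for (String next : refs.getOrDefault(obj, Collections.<String>emptyList())) {
                work.push(next);
            }
        }
        return seen;
    }

    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", Arrays.asList("obj1"));
        refs.put("obj1", Arrays.asList("obj2"));
        refs.put("obj5", Arrays.asList("obj6")); // obj5, obj6 and obj7 form a cycle
        refs.put("obj6", Arrays.asList("obj7"));
        refs.put("obj7", Arrays.asList("obj5"));
        return refs;
    }

    public static void main(String[] args) {
        Set<String> live = reachable(sampleGraph(), Arrays.asList("root"));
        System.out.println(live.contains("obj2")); // true
        System.out.println(live.contains("obj5")); // false: the cycle is unreachable, so it is garbage
    }
}
```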

When is it reclaimed?

This question is actually more complicated, and JVM implementations differ, so we will use HotSpot as a rough example.

First of all, we need to know that garbage collection requires a "Stop The World" pause: if the whole JVM is not suspended, there is no way to determine which memory needs to be reclaimed at a given moment. It is like your mother cleaning your room. She will send you out first, because if you keep making a mess, there is no way to finish cleaning.

In all current JVM implementations, the root enumeration step (the starting point for deciding which memory needs to be reclaimed) requires a pause. All we can do is keep GC pauses as short as possible to reduce the impact on the system.

GC needs to "Stop The World", but a running program cannot stop at just any instant to accommodate the GC.

So when a GC pause is needed, the threads are given a little time to run to the nearest "safepoint". In addition, to handle threads that are already suspended when GC happens, there is an extension of the safepoint concept, the safe region. When a thread enters a safe region it hangs up a sign that tells everyone: until I take this sign down, GC does not need to ask me. When the thread wants to leave the safe region, it must first check whether it is safe to leave.

How is it reclaimed?

Different virtual machines implement this differently, and the same virtual machine may handle different heap regions differently, but in general the underlying ideas are the following algorithms.

Mark-Sweep

The most basic algorithm is Mark-Sweep. As the name suggests, it first marks all objects that need to be reclaimed and then reclaims them all at once, as shown below.

[Figure: Mark-Sweep algorithm]

Its advantage is that the idea is simple and easy to implement. It has two main disadvantages:

1. It is not very efficient.
2. As the post-collection state in the figure shows, objects are removed in place, so the free memory is not contiguous and is full of fragments. When a large object needs to be allocated later and no contiguous space is big enough, another GC is triggered early.

The algorithms that follow are mainly improvements on Mark-Sweep.
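The fragmentation problem is easy to reproduce in a toy model. Here the heap is just an array of slots (null means free) and the marked set is assumed to have been produced by a marking phase already; none of these names come from a real virtual machine.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MarkSweepDemo {

    // Sweep phase: free every slot whose object was not marked as live.
    static void sweep(String[] heap, Set<String> marked) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !marked.contains(heap[i])) {
                heap[i] = null;
            }
        }
    }

    static int freeSlots(String[] heap) {
        int free = 0;
        for (String slot : heap) {
            if (slot == null) {
                free++;
            }
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e", "f" };
        Set<String> marked = new HashSet<>(Arrays.asList("a", "c", "e"));
        sweep(heap, marked);
        // Three slots are free, but no two of them are adjacent.
        System.out.println(freeSlots(heap) + " free, largest run " + largestFreeRun(heap)); // 3 free, largest run 1
    }
}
```

An object that needs two adjacent slots cannot be placed here even though half of the heap is free, which is exactly the second disadvantage.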

Copying algorithm

To solve the problems above, the "copying" algorithm appeared. It divides memory into two halves of equal size and uses only one of them at a time. When that half is used up, the live objects are copied to the other half and the used half is reclaimed as a whole. Fragmentation no longer has to be considered during reclamation or allocation and efficiency improves greatly, but the cost is that only half of the memory can ever be used, which is too expensive.

The copying algorithm works as shown below:

[Figure: copying algorithm]
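In the same toy model as a slot array (null means free), the copying step looks like this. The from and to arrays stand for the two halves of memory, and the marked set is assumed to come from a marking phase; all names are illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CopyingDemo {

    // Copies live objects into a fresh "to" half and reclaims the whole "from" half.
    static String[] copyLive(String[] from, Set<String> marked) {
        String[] to = new String[from.length];
        int next = 0; // bump pointer in the "to" half
        for (String obj : from) {
            if (obj != null && marked.contains(obj)) {
                to[next++] = obj;
            }
        }
        Arrays.fill(from, null); // the used half is freed in one go
        return to;
    }

    public static void main(String[] args) {
        String[] from = { "a", "b", "c", "d", "e", "f" };
        Set<String> marked = new HashSet<>(Arrays.asList("a", "c", "e"));
        String[] to = copyLive(from, marked);
        System.out.println(Arrays.toString(to));   // [a, c, e, null, null, null]
        System.out.println(Arrays.toString(from)); // [null, null, null, null, null, null]
    }
}
```

The survivors end up packed together, so the free space is contiguous, but two arrays of the same size are needed to hold one array's worth of objects.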

Modern commercial virtual machines basically all use this algorithm to collect the young generation, because young generation collections are frequent and have higher efficiency requirements.

The copying algorithm has been improved, though. Statistically, 98% of young generation objects die shortly after they are born, so memory does not need to be split 1:1. Instead it is divided into three spaces, Eden:Survivor1:Survivor2 = 8:1:1 (the ratio is adjustable). Eden and one Survivor area are used at a time. When a garbage collection is needed, the live objects are copied to the other Survivor area, and then Eden and the Survivor area that was in use are reclaimed together. Compared with the plain copying algorithm, up to 90% of the space can be used at any time, so much less is wasted.

However, the Survivor size is only an estimate. We cannot guarantee that fewer than 10% of objects survive every collection, so the old generation is needed as an allocation guarantee: if the Survivor area is not big enough, the objects are stored in the old generation instead.
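The 90% figure follows directly from the ratio. A quick sketch of the arithmetic, where survivorRatio means "Eden is this many times the size of one Survivor area" (on HotSpot this corresponds to -XX:SurvivorRatio, default 8); the class and method names are mine.

```java
public class SurvivorRatioDemo {

    // Fraction of the young generation usable at one time: Eden plus one Survivor area.
    static double usableFraction(int survivorRatio) {
        return (survivorRatio + 1.0) / (survivorRatio + 2.0);
    }

    // Size of Eden for a young generation of the given size.
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes * survivorRatio / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long young = 10L * 1024 * 1024;             // a 10 MB young generation
        System.out.println(usableFraction(8));      // 0.9
        System.out.println(edenBytes(young, 8));    // 8388608 (8 MB)
        System.out.println(1.0 / 2.0);              // 0.5 for the plain 1:1 split
    }
}
```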

Mark-Compact

The copying algorithm is reliable when the object survival rate is low, but when the survival rate is high (in the extreme case, 100% of objects survive a GC) it is not efficient. So HotSpot uses another algorithm in the old generation: Mark-Compact.

Mark-Compact starts with the same marking phase as Mark-Sweep, but instead of removing objects in place it moves the surviving objects neatly toward one end and then reclaims all memory beyond the boundary. A schematic follows:

[Figure: Mark-Compact algorithm]
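Continuing the toy slot-array model (null means free, the marked set is assumed to be given), compaction slides the survivors to the front of the same array and clears everything past the boundary. Names are illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MarkCompactDemo {

    // Slides live objects to the start of the heap and returns the boundary index.
    static int compact(String[] heap, Set<String> marked) {
        int next = 0; // next slot to fill; never ahead of the slot being read
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked.contains(heap[i])) {
                heap[next++] = heap[i];
            }
        }
        Arrays.fill(heap, next, heap.length, null); // reclaim everything beyond the boundary
        return next;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e", "f" };
        Set<String> marked = new HashSet<>(Arrays.asList("a", "c", "e"));
        int boundary = compact(heap, marked);
        System.out.println(boundary + " " + Arrays.toString(heap)); // 3 [a, c, e, null, null, null]
    }
}
```

Unlike copying, no second half is needed; the price is moving objects inside the region that is being collected.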

Generational collection algorithm

The algorithms above already mentioned the young generation and the old generation. These concepts exist because current virtual machines use generational collection.

The main purpose of generations is to divide memory into regions according to how long objects live, storing objects with different life cycles separately, so that each region can use the garbage collection algorithm that suits its characteristics and memory is reclaimed more efficiently.

For example, objects in the young generation have a low survival rate, so the copying algorithm can be used: each collection copies only a small number of objects and is efficient.

Objects in the old generation have a high survival rate, and there is no other space that can act as their allocation guarantee, so Mark-Compact or Mark-Sweep has to be used.

So in HotSpot, the whole Java heap looks roughly like this (the default ratio of the young generation to the old generation is 1:2):

[Figure: HotSpot Java heap layout by generation]

Garbage collectors

Serial Collector

This is the most basic and oldest garbage collector. It collects with a single thread and is still the JVM's default young generation collector in Client mode.

Its collection process is shown below:

[Figure: Serial collector collection process]

ParNew

ParNew is the multi-threaded version of the Serial collector. Apart from using multiple threads for garbage collection, it behaves exactly like the Serial collector. The figure below shows its collection process:

[Figure: ParNew collector collection process]

Parallel Scavenge collector

By definition this collector is very similar to ParNew, but its main goal is to improve system throughput. Its collection process is similar to ParNew's.

Serial Old

Serial Old is the old generation version of the Serial collector and uses the Mark-Compact algorithm. Its collection process is the same as Serial's.

Parallel Old

This is the old generation version of the Parallel Scavenge collector. It collects with multiple threads using the Mark-Compact algorithm.

Its collection process is the same as that of the Parallel Scavenge collector.

CMS collector

Concurrent Mark Sweep (CMS) is a collector whose goal is the shortest possible pause time. Its collection process is more complicated and is divided into four steps:

  • Initial mark
  • Concurrent mark
  • Remark
  • Concurrent sweep

Its collection process is as follows:

[Figure: CMS collector collection process]

G1 collector

G1 is a more advanced collector developed later. Its collection roughly consists of the following steps:

  • Initial mark
  • Concurrent mark
  • Final mark
  • Screening and reclamation

Its collection process is as follows:

[Figure: G1 collector collection process]

Summary

Garbage collectors cannot be combined arbitrarily. The chart below shows which ones can work together:

[Figure: which garbage collectors can be paired]

The introduction to the garbage collectors here is fairly brief, mainly because a garbage collector is a very complex thing. Fortunately it is well encapsulated, so we do not need to know all of that complexity; most of the time it is enough to use the latest stable result of the research.

Still, it is worth having an overview, so that when you sense that the bottleneck is the garbage collector, you have the basics and the ability to study it in detail.

Reference article

In-Depth Understanding of the Java Virtual Machine (Zhou Zhiming)


Finish.



ChangeLog

2019-08-11 completed

All of the above is my own understanding and thinking. If anything is wrong, corrections are welcome in the comments section.

Reprints are welcome; please credit the author and keep the link to the original.

Contact E-mail: [email protected]

For more study notes, see my personal blog: Huyan ten


Origin juejin.im/post/5d50c0b0f265da03ee6a4987