JVM (2): JVM memory model and garbage collection mechanism


While executing a Java program, the Java virtual machine divides the memory it manages into several different data areas. The previous chapter covered the runtime data areas in detail. Of these, the heap and the method area receive the most attention, so the memory design also focuses on these two (note that both are shared by all threads). The virtual machine stack, the native method stack, and the program counter, in contrast, are all thread-private.

diagram

image-20221003101736014

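One is the non-heap area and the other is the heap area. The heap is divided into two parts: the Old area and the Young area. By default, Old:Young = 2:1. The Young area is in turn divided into two parts: the Eden area and the Survivor area (S0 + S1), with Eden:S0:S1 = 8:1:1. S0 and S1 are the same size and are also called From and To.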
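The ratios above translate into concrete sizes with simple arithmetic. The following is a minimal sketch, assuming a hypothetical 3000 MB heap; `HeapLayout` and `split` are illustrative names, and the two ratio parameters mirror the HotSpot options `-XX:NewRatio` and `-XX:SurvivorRatio`. A real JVM aligns and adjusts these sizes, so the numbers are approximate.

```java
// Sketch: derive generation sizes from the ratios described above.
// Real JVMs align and adjust these sizes, so treat the numbers as approximate.
class HeapLayout {
    // Returns {old, young, eden, s0, s1} in the same unit as heapSize.
    static long[] split(long heapSize, int newRatio, int survivorRatio) {
        long young = heapSize / (newRatio + 1);       // Old:Young = newRatio:1
        long old = heapSize - young;
        long survivor = young / (survivorRatio + 2);  // Eden:S0:S1 = survivorRatio:1:1
        long eden = young - 2 * survivor;
        return new long[] {old, young, eden, survivor, survivor};
    }

    public static void main(String[] args) {
        long[] p = split(3000, 2, 8); // hypothetical 3000 MB heap
        System.out.println("Old=" + p[0] + " Young=" + p[1]
                + " Eden=" + p[2] + " S0=" + p[3] + " S1=" + p[4]);
    }
}
```

With a 3000 MB heap this gives Old = 2000, Young = 1000, Eden = 800, and S0 = S1 = 100.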

From the previous introduction to the heap, we know that objects and arrays are generally allocated in the heap. The question is: with so many areas in the heap, in which one is a new object created?

object creation area

In general, newly created objects will be allocated to the Eden area, and some special large objects will be directly allocated to the Old area.

For example, objects A, B, C, and so on are created in the Eden area, but the memory space in the Eden area is limited, say 100M. Once the 100M is used up or a set threshold is reached, the Eden memory space has to be cleaned up, that is, garbage collected (Garbage Collection). We call such a GC a Minor GC; Minor GC refers to GC in the Young area.

After the GC, some objects have been cleaned up and some may still be alive. The surviving objects are copied to the Survivor area, and the Eden area is then cleared.

Detailed Explanation of the Survivor Area

As the diagram shows, Survivor is divided into two pieces, S0 and S1, which can also be called From and To. At any given time, only one of S0 and S1 holds data; the other is empty.

Continuing from the GC above: suppose that at the start only the Eden area and the From area contain objects, and the To area is empty. When a GC runs, the age of each surviving object in the From area is incremented by 1. All surviving objects in the Eden area are copied to the To area, while the surviving objects in the From area have two possible destinations.

If the age of the object reaches the previously set age threshold, the object will be moved to the Old area, and the objects that do not reach the threshold will be copied to the To area. At this time, the Eden area and the From area have been emptied (the objects that have been GC must be gone, and the objects that have not been GC have their own places).

At this time, From and To exchange roles, the previous From becomes To, and the previous To becomes From.

That is to say, it is necessary to ensure that the Survivor area named To is empty anyway.

Minor GC repeats this process. If the To area fills up during a collection, the objects that do not fit are copied directly to the old generation (Old area).
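The From/To hand-off described above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not how HotSpot is implemented: `MinorGcSimulation`, `Obj`, the hand-set `alive` flag, and a survivor capacity counted in objects are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a Minor GC; liveness is set by hand instead of being computed.
class MinorGcSimulation {
    static final class Obj {
        final String name;
        int age = 0;
        boolean alive = true;
        Obj(String name) { this.name = name; }
    }

    final int ageThreshold;      // age at which an object is promoted
    final int survivorCapacity;  // capacity of one Survivor space, in objects
    List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    MinorGcSimulation(int ageThreshold, int survivorCapacity) {
        this.ageThreshold = ageThreshold;
        this.survivorCapacity = survivorCapacity;
    }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);             // new objects start in Eden
        return o;
    }

    void minorGc() {
        List<Obj> candidates = new ArrayList<>(from);
        candidates.addAll(eden);
        for (Obj o : candidates) {
            if (!o.alive) continue;   // dead objects are simply not copied
            o.age++;
            if (o.age >= ageThreshold || to.size() >= survivorCapacity) {
                old.add(o);           // promoted: old enough, or To is full
            } else {
                to.add(o);
            }
        }
        eden.clear();
        from.clear();
        List<Obj> tmp = from;         // swap roles: To becomes From,
        from = to;                    // and the new To is empty again
        to = tmp;
    }
}
```

Calling `minorGc()` three times on a heap created with `new MinorGcSimulation(3, 10)` moves a long-lived object from Eden through the Survivor spaces into `old`, while an object whose `alive` flag was cleared disappears at the first collection.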

Detailed Explanation of the Old Area

From the analysis above, the Old area generally holds objects that are relatively old, or large objects that exceed a size threshold. GC also takes place in the Old area, where it is called Major GC. An object in the Old area stays there until it is no longer reachable; a Major GC then reclaims it.

Understanding of object life cycle

I am an ordinary Java object. I was born in the Eden area, where I also met a little brother who looked a lot like me, and we played in the Eden area for quite a long time. One day there were too many people in the Eden area, so I was forced to move to the "From" area of the Survivor area. From then on I started to drift, sometimes in the "From" area of Survivor and sometimes in the "To" area, without a fixed home. When I turned 18, my father said I was an adult and it was time to go out into the world, so I went to the Old area. There are a lot of people in the old generation, all of them quite old, and I got to know many of them. I lived in the old generation for 20 years (one year older with each GC) and then died, and finally I was recycled as garbage.

Common questions

  • How to understand Minor/Major/Full GC

    Minor GC: new generation GC

    Major GC: Old generation GC

    Full GC: new generation GC + old generation GC

  • Why is the Survivor area needed? Can't it just be Eden?

    Without Survivor, every Minor GC in the Eden area would send the surviving objects straight to the old generation. The old generation would then fill up quickly and trigger a Major GC (because a Major GC is usually accompanied by a Minor GC, this can also be regarded as triggering a Full GC). The old generation is much larger than the new generation, so a Full GC takes much longer than a Minor GC. Frequent Full GCs are time-consuming and hurt the execution and response speed of large programs.

    You might say: then increase or decrease the space of the old generation. If the old generation is enlarged, it can hold more surviving objects before it fills up, so Full GCs happen less often, but each Full GC takes longer because there is more space to process. If the old generation is reduced, each Full GC takes less time, but the old generation fills up with surviving objects sooner, so Full GCs happen more often.

    Therefore, the point of Survivor is to reduce the number of objects sent to the old generation and thereby reduce how often Full GC occurs. Survivor's pre-screening guarantees that only objects that survive repeated Minor GCs in the new generation (16 in this description; the threshold is configurable with -XX:MaxTenuringThreshold) are sent to the old generation.

  • Why do we need two Survivor zones?

    The biggest benefit is that it solves fragmentation. To see why a single Survivor area does not work, let's simulate the process with only one Survivor area (we already know from above that a Survivor area is needed):

    Newly created objects are in Eden. Once Eden is full, a Minor GC is triggered and the surviving objects in Eden are moved to the Survivor area. The problem comes the next time Eden fills up. At this Minor GC, both Eden and Survivor hold some surviving objects. If the survivors from Eden are simply placed into the Survivor area, the memory occupied by the two groups of objects is clearly not contiguous, which leads to memory fragmentation. With two Survivor areas, the surviving objects of Eden and From are copied together into To, so there is always one Survivor space that is empty and another, non-empty one that is contiguous and free of fragments.

  • Why is Eden:s1:s2 in the new generation 8:1:1?

    The reasoning is statistical: a Minor GC should reclaim memory once usage exceeds 98%. In practice we cannot really leave only 2% in reserve; in other words, waiting until usage reaches 98% is a little late. About 10% of the memory should be kept in reserve instead.

    In this way, usable memory in the new generation : memory reserved for the copy algorithm = 9:1. A collection reclaims the data in the Eden area and the From area, that is, Eden and From together make up the 9. In addition, the JVM requires the two Survivor areas s1 and s2 to be equal in size, so Eden:s1:s2 = 8:1:1.

    Summary:

    1. Usable memory in the new generation : memory reserved for the copy algorithm = 9:1
    2. Eden : s1 within the usable memory = 8:1
    3. That is, Eden:s1:s2 = 8:1:1 in the new generation

Garbage Collection

As mentioned earlier, garbage collection takes place in heap memory: Minor GC in the Young area, Major GC in the Old area, and Full GC across both the Young area and the Old area. But how is an object determined to be garbage? Does it need to be reclaimed? How is it reclaimed? These questions deserve a closer look.

Because Java manages memory and collects garbage automatically, it is hard to troubleshoot and fix problems when they occur unless we understand how garbage collection works. The automatic garbage collection mechanism finds the objects in the Java heap, classifies them into objects still in use and objects no longer in use, and then removes the unused objects from the heap.

Garbage collection in the different runtime data areas

The program counter, the virtual machine stack, and the native method stack are created and destroyed together with their thread; stack frames are pushed and popped in an orderly way as methods are entered and exited. How much memory each stack frame needs is basically known once the class structure is determined (the JIT compiler performs some optimizations at run time, but in the conceptual model discussed in this chapter it can be treated as known at compile time). Memory allocation and reclamation in these areas are therefore deterministic, and there is little need to think about reclamation there: when the method or the thread ends, the memory is reclaimed with it. The Java heap and the method area are different. The implementations of an interface may need different amounts of memory, and so may the different branches of a method. Only at run time do we know which objects will be created, so allocation and reclamation of this memory are dynamic, and this is the memory the garbage collector focuses on.

Reclaiming the method area

Many people think that the method area (the permanent generation in the HotSpot virtual machine) has no garbage collection. The Java Virtual Machine Specification does say that a virtual machine is not required to implement garbage collection in the method area, and collecting the method area generally has low "cost-effectiveness": in the heap, especially in the new generation, one garbage collection of a typical application can generally reclaim 70% to 95% of the space, while the permanent generation yields far less.

Permanent generation garbage collection mainly reclaims two things: obsolete constants and useless classes. Reclaiming obsolete constants is very similar to reclaiming objects in the Java heap. Take a literal in the constant pool as an example: suppose the string "abc" has entered the constant pool, but no String object in the system currently has the value "abc", that is, no String object references the "abc" constant in the pool and nothing else references this literal. If memory reclamation happens at this point and it is needed, the "abc" constant is removed from the constant pool. Symbolic references to other classes (interfaces), methods, and fields in the constant pool are handled similarly.

How to determine if an object is garbage?

To perform garbage collection, you must first know what kind of objects are garbage.

reference counting

For an object, as long as the application holds a reference to it, the object is not garbage. If no pointer references an object, it is garbage. The drawback is that if A and B hold references to each other, neither can ever be reclaimed.
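The circular-reference problem can be made concrete with a hand-rolled counter. This is a minimal sketch for illustration only: the JVM does not use reference counting, and `RefCountDemo`, `Node`, and `pointTo` are hypothetical names.

```java
// Toy reference counting: two objects that point at each other never reach zero.
class RefCountDemo {
    static final class Node {
        int refCount = 0;
        Node ref; // the one outgoing reference this toy object can hold

        void pointTo(Node target) {
            if (ref != null) ref.refCount--;
            ref = target;
            if (target != null) target.refCount++;
        }
    }

    // Builds A <-> B, drops the external references, returns the final counts.
    static int[] countsAfterDroppingExternalRefs() {
        Node a = new Node();
        Node b = new Node();
        a.refCount++;   // external reference, e.g. from a local variable
        b.refCount++;
        a.pointTo(b);
        b.pointTo(a);
        a.refCount--;   // the external references go away
        b.refCount--;
        return new int[] {a.refCount, b.refCount};
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingExternalRefs();
        System.out.println("A=" + counts[0] + " B=" + counts[1]);
    }
}
```

Both counts end at 1 even though nothing outside the pair can reach A or B any more, so a pure reference-counting collector would never free them.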

reachability analysis

Starting from objects known as GC Roots, follow references downward and check whether an object is reachable. An object that cannot be reached from any GC Root is garbage.

Can be used as GC Roots: class loaders, Threads, the local variable table of the virtual machine stack, static members, constant references, variables in the native method stack, etc.

In other words: objects referenced from the virtual machine stack (the local variable table in a stack frame), objects referenced by static class fields in the method area, objects referenced by constants in the method area, and objects referenced by JNI (that is, native methods) in the native method stack.
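Reachability analysis is essentially a graph traversal that starts at the roots. The sketch below models objects as names in a map, which is an assumption made purely for illustration; `ReachabilityDemo`, `reachable`, and `demo` are hypothetical names. It needs Java 9 or later because of `List.of` and `Map.of`.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachabilityDemo {
    // refs maps an object name to the names of the objects it references.
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (marked.add(obj)) { // first visit: follow its references
                for (String next : refs.getOrDefault(obj, List.of())) {
                    pending.push(next);
                }
            }
        }
        return marked;
    }

    static Set<String> demo() {
        Map<String, List<String>> refs = Map.of(
                "A", List.of("B"),
                "C", List.of("D"),
                "D", List.of("C")); // C and D only reference each other
        return reachable(refs, List.of("A")); // A is referenced by a GC Root
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Only A and B are marked. C and D reference each other but are unreachable from the root, so unlike reference counting this approach correctly treats them as garbage.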

garbage collection algorithm

After being able to determine that an object is garbage, the next thing to consider is recycling. How to recycle? There must be a corresponding algorithm. The following introduces common garbage collection algorithms.

Mark-clear (Mark-Sweep)

  • Mark

    Find the objects in memory that need to be reclaimed and mark them

    This phase scans all objects in the heap to determine which ones need to be reclaimed, which is time-consuming

image-20221005133212600

  • Clear

    Clear the objects that are marked to be recycled, and release the corresponding memory space

    image-20221005133340231

Shortcomings

  1. Both marking and clearing are time-consuming, so efficiency is low
  2. A large number of discontinuous memory fragments are produced. With too much fragmentation, the program may later fail to find enough contiguous memory for a large object and be forced to trigger another garbage collection early.
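The fragmentation problem is easy to see on a toy heap. In this sketch the heap is a `char` array with one slot per object and `.` for a free slot, and the mark bits are passed in as input rather than computed; `MarkSweepDemo` and its methods are hypothetical names.

```java
// Toy mark-clear (mark-sweep): clearing frees slots in place, so free space ends up scattered.
class MarkSweepDemo {
    // Clear phase: free every slot whose object was not marked as live.
    static char[] sweep(char[] heap, boolean[] marked) {
        char[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (!marked[i]) result[i] = '.';
        }
        return result;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(char[] heap) {
        int best = 0;
        int run = 0;
        for (char slot : heap) {
            run = (slot == '.') ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        char[] heap = "ABCDEF".toCharArray();
        boolean[] marked = {true, false, true, false, true, false};
        char[] after = sweep(heap, marked);
        System.out.println(new String(after) + " largest free run = " + largestFreeRun(after));
    }
}
```

After the clear phase the heap is `A.C.E.`: three slots are free, but the largest contiguous run is a single slot, so an object that needs two adjacent slots cannot be allocated.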

Copying

Divide the memory into two equal areas, and only use one of them at a time, as shown in the following figure:

image-20221005134106371

When one piece of memory is used up, copy the surviving objects to the other piece, and then clear the used memory space in one go.

image-20221005134222427

Shortcomings

  1. Low space utilization: half of the memory is always held in reserve
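A minimal sketch of the copying step, using a toy heap where each `char` is one object and `.` is a free slot. Liveness is given as input, and `CopyingDemo` and `copyLive` are hypothetical names.

```java
import java.util.Arrays;

// Toy copying collector: live objects are copied into a second, empty half.
class CopyingDemo {
    // Copies the live objects of 'from' into a fresh half and returns that half.
    static char[] copyLive(char[] from, boolean[] live) {
        char[] to = new char[from.length];
        Arrays.fill(to, '.');
        int next = 0; // next free slot in the target half
        for (int i = 0; i < from.length; i++) {
            if (live[i]) to[next++] = from[i];
        }
        return to;
    }

    public static void main(String[] args) {
        boolean[] live = {true, false, true, false, true, false};
        System.out.println(new String(copyLive("ABCDEF".toCharArray(), live)));
    }
}
```

The survivors end up packed at the start of the new half (`ACE...`), so there is no fragmentation, but a second half of the same size has to exist, which is the space cost noted above.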

Mark-Compact (Mark-Compact)

The copy collection algorithm has to perform more copy operations when the object survival rate is high, and its efficiency drops. More importantly, if you don't want to waste 50% of the space, you need additional space as an allocation guarantee to handle the extreme case where 100% of the objects in the used memory are alive. For this reason the old generation generally cannot use this algorithm directly.

The marking process is the same as in the "mark-clear" algorithm, but the next step does not clean up the recyclable objects directly. Instead, all surviving objects are moved toward one end, and then the memory beyond the end boundary is cleaned up directly.

image-20221005171000865

All surviving objects are moved to one end, and the memory outside the boundary is cleaned up.

image-20221005171046935
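The compaction step can be sketched on the same kind of toy heap (one `char` per object, `.` for a free slot). Liveness is again supplied as input, and `MarkCompactDemo` and `compact` are hypothetical names.

```java
// Toy mark-compact: slide live objects toward the start, then clear the rest.
class MarkCompactDemo {
    // Compacts the heap in place and returns the boundary index:
    // slots before it hold live objects, slots from it onward are free.
    static int compact(char[] heap, boolean[] live) {
        int free = 0; // next slot to fill at the compacted end
        for (int i = 0; i < heap.length; i++) {
            if (live[i]) heap[free++] = heap[i];
        }
        for (int i = free; i < heap.length; i++) {
            heap[i] = '.'; // everything past the boundary is reclaimed
        }
        return free;
    }

    public static void main(String[] args) {
        char[] heap = "ABCDEF".toCharArray();
        boolean[] live = {true, false, true, false, true, false};
        int boundary = compact(heap, live);
        System.out.println(new String(heap) + " boundary = " + boundary);
    }
}
```

The result has the same shape as with the copying algorithm (`ACE...`), but it is produced inside the same memory, without a reserved second half. The price is that objects have to be moved in place.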

Generational Collection Algorithm

The 3 garbage collection algorithms are introduced above, so which one should be used in the heap memory?

Generational Collection Algorithm

To make garbage collection more efficient, the JVM divides memory into several blocks according to object lifetimes, splitting the heap into the new generation and the old generation. The most appropriate collection algorithm can then be chosen according to the characteristics of each generation.

  • In the new generation, each collection finds that a large number of objects have died and only a few survive. The copy algorithm is therefore used: the collection completes at the cost of copying only a small number of surviving objects.
  • In the old generation, the object survival rate is high and there is no additional space to serve as an allocation guarantee, so the "mark-clear" or "mark-compact" algorithm must be used.

**Young area:** Copy algorithm (after allocation, objects tend to have a short life cycle, so copying in the Young area is efficient)

**Old area:** Mark-clear or mark-compact (objects in the Old area live a long time, so copying them back and forth is unnecessary; it is better to mark and then clean up)

garbage collector

If the collection algorithm is the methodology of memory recovery, then the garbage collector is the specific implementation of memory recovery.

image-20221005174628962

Serial collector

The Serial collector is the most basic collector and has the longest history. It used to be (before JDK 1.3.1) the only choice for new generation collection in the virtual machine. It is a single-threaded collector, which means not only that it uses a single CPU or a single collection thread to do the garbage collection work but, more importantly, that it must suspend all other threads while it collects.

Advantages: Simple and efficient, with high single-threaded collection efficiency

Disadvantage: the collection process needs to suspend all threads

Algorithm: Copy Algorithm

Scope of application: new generation

Application: the default new generation collector in Client mode

image-20221005194844979

ParNew collector

This collector can be understood as a multi-threaded version of the Serial collector

Advantages: When there are multiple CPUs, it is more efficient than Serial.

Disadvantages: The collection process suspends all application threads, and the efficiency is worse than Serial when a single CPU is used.

Algorithm: Copy Algorithm

Scope of application: new generation

Application: The preferred new generation collector in a virtual machine running in Server mode

image-20221005195259685

Parallel Scavenge Collector

The Parallel Scavenge collector is a new generation collector. It also uses the copy algorithm and is a parallel, multi-threaded collector. It looks the same as ParNew, but Parallel Scavenge pays more attention to system throughput.

Throughput = time to run user code / (time to run user code + garbage collection time)

For example, if the virtual machine runs for 100 minutes in total, and the garbage collection takes 1 minute, throughput = (100-1) / 100 = 99%.

If the throughput is greater, it means that the garbage collection time is shorter, and the user code can make full use of CPU resources to complete the calculation tasks of the program as soon as possible.

Options: -XX:MaxGCPauseMillis sets the target for the maximum garbage collection pause time

-XX:GCTimeRatio directly sets the throughput target.
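The throughput formula and the meaning of the ratio option can be checked with a few lines of arithmetic. This sketch assumes the HotSpot definition that `-XX:GCTimeRatio=N` targets a garbage collection share of 1 / (1 + N) of total time; `ThroughputDemo` and its method names are hypothetical.

```java
// Sketch of the throughput formula and the GC time share implied by a time ratio.
class ThroughputDemo {
    // Throughput = user time / (user time + GC time), in any consistent unit.
    static double throughput(double userTime, double gcTime) {
        return userTime / (userTime + gcTime);
    }

    // Share of total time allowed for GC when the time ratio is n: 1 / (1 + n).
    static double gcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        System.out.println("throughput = " + throughput(99, 1)); // 99 min user code, 1 min GC
        System.out.println("GC share for ratio 99 = " + gcFraction(99));
        System.out.println("GC share for ratio 19 = " + gcFraction(19));
    }
}
```

A ratio of 99 corresponds to the 99% throughput example above (1% of the time spent in GC), and a ratio of 19 would allow 5%.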

Serial Old Collector

The Serial Old collector is the old generation version of the Serial collector. It is also a single-threaded collector. The difference is that it uses the "mark-compact" algorithm; its operation is otherwise the same as the Serial collector.

image-20221005200323891

Parallel Old Collector

The Parallel Old collector is the old generation version of the Parallel Scavenge collector. It uses multiple threads and the "mark-compact" algorithm for garbage collection, and it is throughput-first.

CMS collector

The CMS (Concurrent Mark Sweep) collector is a collector whose goal is the shortest pause time. It uses the "mark-clear" algorithm, and the whole process is divided into 4 steps:

  1. Initial mark (CMS initial mark): marks the objects directly reachable from GC Roots
  2. Concurrent mark (CMS concurrent mark): performs GC Roots tracing
  3. Remark (CMS remark): corrects the marks for objects changed by the user program during concurrent marking
  4. Concurrent sweep (CMS concurrent sweep)

Because concurrent marking and concurrent sweeping, the longest phases of the process, run alongside the user threads, the memory reclamation of the CMS collector as a whole can be regarded as executing concurrently with the user threads.

**Advantages:** Concurrent collection, low pauses

**Disadvantages:** Produces a lot of space fragmentation; the concurrent phases reduce throughput

image-20221005202515020

G1 collector

The G1 collector is officially used as a commercial collector in JDK 7. Compared with the previous collectors, G1 has the following characteristics

  1. Parallel and concurrent
  2. Generational collection (the concept of generations is still retained)
  3. Space compaction (as a whole it behaves like the "mark-compact" algorithm, so it does not produce space fragmentation)
  4. Predictable pauses (where it goes beyond CMS is that it lets users specify that, within a time segment of M milliseconds, the time spent on garbage collection must not exceed N milliseconds)

When the G1 collector is used, the memory layout of the Java heap is very different from that of the other collectors. G1 divides the entire Java heap into multiple independent regions (Regions) of equal size. The concepts of the new generation and the old generation are retained, but the two are no longer physically isolated: each is a collection of Regions (which need not be contiguous).

The working process can be divided into the following steps:

  1. Initial marking (Initial Marking): marks the objects directly reachable from GC Roots and updates the value of TAMS (Top at Mark Start); user threads must be suspended
  2. Concurrent marking (Concurrent Marking): performs reachability analysis from GC Roots to find live objects, running concurrently with user threads
  3. Final marking (Final Marking): corrects the marks that changed because user programs kept running during concurrent marking; user threads must be suspended
  4. Screening and reclamation (Live Data Counting and Evacuation): ranks each Region by reclamation value and cost, and draws up a reclamation plan based on the user's expected GC pause time

image-20221005204734004
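The idea behind the last step, ranking Regions by value and picking what fits into the pause budget, can be sketched with a simple greedy selection. This is a deliberately simplified illustration and not G1's actual algorithm; `RegionSelectionDemo`, `Region`, the cost estimates, and the figures in `demo` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Greedy sketch: reclaim the most valuable regions that fit into the pause budget.
class RegionSelectionDemo {
    static final class Region {
        final String name;
        final long reclaimableBytes; // space expected to be freed
        final double costMs;         // estimated time to collect this region

        Region(String name, long reclaimableBytes, double costMs) {
            this.name = name;
            this.reclaimableBytes = reclaimableBytes;
            this.costMs = costMs;
        }

        double value() { return reclaimableBytes / costMs; } // bytes freed per ms
    }

    static List<Region> plan(List<Region> regions, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(regions);
        Comparator<Region> byValue = Comparator.comparingDouble(Region::value);
        sorted.sort(byValue.reversed()); // most valuable first
        List<Region> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.costMs <= pauseBudgetMs) { // skip regions that do not fit
                chosen.add(r);
                spent += r.costMs;
            }
        }
        return chosen;
    }

    // Three made-up regions and a 6 ms budget; returns the chosen names in order.
    static String demo() {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("R1", 800, 4));
        regions.add(new Region("R2", 900, 9));
        regions.add(new Region("R3", 300, 1));
        List<String> names = new ArrayList<>();
        for (Region r : plan(regions, 6)) names.add(r.name);
        return String.join(",", names);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With these made-up figures the plan is R3 followed by R1. R2 would free the most space in absolute terms, but its estimated cost of 9 ms does not fit into the 6 ms budget.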

garbage collector classification

  • Serial collectors: Serial and Serial Old

    There is only one garbage collection thread, and user threads are suspended while it runs. Suitable for embedded devices with relatively little memory

  • Parallel collectors [throughput first]: Parallel Scavenge and Parallel Old

    Multiple garbage collection threads work in parallel, but user threads still wait while they run. Suitable for scenarios with little interaction, such as scientific computing and background processing

  • Concurrent collectors [pause time first]: CMS and G1

    User threads and garbage collection threads run at the same time (not necessarily in parallel; they may alternate), so the garbage collection threads do not stop the user threads for most of their work. Suitable for scenarios with response-time requirements, such as Web applications

Common questions

  • Throughput and Pause Time

    • Pause time -> the time during which the application's execution is interrupted while the garbage collector collects garbage
    • Throughput -> time to run user code / (time to run user code + garbage collection time)

    The shorter the pause time, the better suited the collector is to programs that interact with users; good response speed improves the user experience.

    High throughput uses CPU time efficiently and completes the program's computing tasks as soon as possible. It mainly suits background tasks that need little interaction.

    These two indicators are also the criteria for judging how well a garbage collector performs. Tuning likewise means observing these two values and finding a balance between them.

  • How to choose the right garbage collector

    • First adjust the heap size and let the server choose a collector by itself
    • If the memory is less than 100M, use the serial collector
    • If there is a single core and no pause time requirement, use the serial collector or let the JVM choose
    • If pause times longer than 1 second are acceptable, use a parallel collector or let the JVM choose
    • If response time matters most and pauses must not exceed 1 second, use a concurrent collector
  • G1 collector

    Available since JDK 7, mature in JDK 8, and the default garbage collector since JDK 9. It covers both the new and the old generation

    When is the G1 collector worth using?

    1. More than 50% of the heap is occupied by live objects
    2. The rate of object allocation and promotion varies widely
    3. Garbage collection pauses are long
  • How to enable the required garbage collector

    1. serial

      -XX:+UseSerialGC

      -XX:+UseSerialOldGC

    2. Parallel (throughput priority)

      -XX:+UseParallelGC

      -XX:+UseParallelOldGC

    3. Concurrent collector (response time priority)

      -XX:+UseConcMarkSweepGC

      -XX:+UseG1GC
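To confirm which collectors a JVM actually ended up with after setting one of these options, the standard `java.lang.management` API can be queried at run time. A minimal sketch follows; `ActiveCollectors` is a hypothetical class name, and the reported names depend on the JVM and on the options used.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the garbage collectors that are active in the running JVM.
class ActiveCollectors {
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Running the class once with `-XX:+UseSerialGC` and once with `-XX:+UseG1GC` and comparing the printed names shows which new generation and old generation collectors each option selects.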

Origin blog.csdn.net/Hong_pro/article/details/127177379