Garbage collector and memory allocation strategy

Garbage collection

  Reference counting algorithm

    Add a reference counter to each object. Whenever the object is referenced from somewhere, the counter is incremented by 1; whenever a reference becomes invalid, it is decremented by 1. An object whose counter is 0 at any moment can no longer be used. However, reference counting cannot handle circular references between objects: two objects that refer only to each other keep each other's counters above 0 even when nothing else can reach them.
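    The following is a minimal sketch of a hypothetical reference-counting heap. RefCountDemo, RcObject, retain, release and link are invented names, and mainstream JVMs do not manage their heaps this way. It shows an acyclic chain being freed as soon as its last reference goes away, and a two-object cycle whose counters never drop to 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reference-counting heap, for illustration only. */
public class RefCountDemo {
    static final class RcObject {
        final String name;
        int refCount = 0;
        final List<RcObject> fields = new ArrayList<>();   // outgoing references
        RcObject(String name) { this.name = name; }
    }

    final List<String> freed = new ArrayList<>();

    /** A new reference to o was stored somewhere. */
    void retain(RcObject o) { o.refCount++; }

    /** A reference to o became invalid; free o (and release its fields) when the count hits 0. */
    void release(RcObject o) {
        if (--o.refCount == 0) {
            freed.add(o.name);
            for (RcObject child : o.fields) release(child);
            o.fields.clear();
        }
    }

    /** Stores a reference to `to` in a field of `from`. */
    void link(RcObject from, RcObject to) {
        from.fields.add(to);
        retain(to);
    }

    public static void main(String[] args) {
        RefCountDemo heap = new RefCountDemo();

        // Acyclic: local variable -> a -> b
        RcObject a = new RcObject("a"), b = new RcObject("b");
        heap.retain(a);                 // the local variable holds a
        heap.link(a, b);
        heap.release(a);                // the local variable goes out of scope
        System.out.println("freed: " + heap.freed);                   // freed: [a, b]

        // Cycle: local variables -> x <-> y
        RcObject x = new RcObject("x"), y = new RcObject("y");
        heap.retain(x);
        heap.retain(y);
        heap.link(x, y);
        heap.link(y, x);
        heap.release(x);
        heap.release(y);                // x and y are now unreachable...
        System.out.println("x=" + x.refCount + ", y=" + y.refCount);  // x=1, y=1: never freed
    }
}
```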

  Reachability Analysis Algorithms

    This algorithm takes a set of objects called GC Roots as starting points and searches downward from them; the path traversed by the search is called a reference chain. When no reference chain connects an object to the GC Roots (in graph terms, the object is unreachable from the GC Roots), the object is proven to be unusable. In the Java language, objects that can serve as GC Roots include: objects referenced from the virtual machine stack (the local variable tables of stack frames); objects referenced by class static fields in the method area; objects referenced by constants in the method area; and objects referenced by JNI (that is, by native methods) in the native method stack.
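    A minimal sketch of the marking step, assuming the heap is modelled as a hypothetical map from each object name to the names it references. Everything reachable from the roots is live; the x/y cycle that defeated reference counting is correctly identified as garbage because no root path reaches it.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ReachabilityDemo {
    /** Returns every object reachable from the GC roots; everything else is garbage. */
    static Set<String> markFromRoots(Map<String, List<String>> graph, Collection<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String node = work.pop();
            if (!marked.add(node)) continue;            // already visited
            for (String ref : graph.getOrDefault(node, Collections.emptyList())) {
                work.push(ref);                          // follow the reference chain
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("stackLocal", Arrays.asList("order"));
        graph.put("order", Arrays.asList("customer"));
        graph.put("x", Arrays.asList("y"));              // x and y reference each other,
        graph.put("y", Arrays.asList("x"));              // but no root reaches them
        Set<String> live = markFromRoots(graph, Arrays.asList("stackLocal"));
        Set<String> garbage = new TreeSet<>(graph.keySet());
        garbage.removeAll(live);
        System.out.println("live=" + new TreeSet<>(live) + " garbage=" + garbage);
    }
}
```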

 

  Java divides references into four types, from strongest to weakest: Strong Reference, Soft Reference, Weak Reference and Phantom Reference.

    Strong references are ubiquitous in program code, such as the reference created by Object obj = new Object(). As long as a strong reference still exists, the garbage collector will never reclaim the referenced object.

    Soft references describe objects that are useful but not essential. Objects associated only with soft references are included in the collection scope for a second collection before the system would otherwise run out of memory. If that collection still does not free enough memory, an out-of-memory exception is thrown.

    Weak references also describe non-essential objects, but they are weaker than soft references: an object associated only with weak references survives only until the next garbage collection. When the garbage collector runs, such objects are reclaimed regardless of whether memory is currently sufficient.

    Phantom references, also called ghost references, are the weakest kind of reference. Whether an object has a phantom reference has no effect at all on its lifetime, and an object instance cannot be obtained through a phantom reference. The only purpose of associating a phantom reference with an object is to receive a system notification when the collector reclaims that object.
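    The four kinds map onto the java.lang.ref classes SoftReference, WeakReference and PhantomReference (a strong reference is just an ordinary variable). This small sketch shows the guarantees that hold deterministically. While the object is strongly reachable, the soft and weak referents are still there, and PhantomReference.get() always returns null. System.gc() is only a hint, so the last line may print either value.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceKindsDemo {
    public static void main(String[] args) {
        Object strong = new Object();                                   // strong reference
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();           // receives the notification
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        // `strong` is used after each get(), so the object is strongly reachable here.
        System.out.println("soft.get() == strong: " + (soft.get() == strong));   // true
        System.out.println("weak.get() == strong: " + (weak.get() == strong));   // true
        System.out.println("phantom.get(): " + phantom.get());                    // always null

        strong = null;   // drop the only strong reference
        System.gc();     // a hint only; a collection is not guaranteed to run
        // If a GC did run, the weak referent is gone; a soft referent normally survives
        // unless memory is about to run out.
        System.out.println("weak cleared: " + (weak.get() == null));
    }
}
```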

 

  An object found unreachable by reachability analysis is not necessarily doomed; at that point it is only on "probation". Declaring an object truly dead requires at least two marking passes. If reachability analysis finds no reference chain connecting the object to the GC Roots, the object is marked for the first time and screened. The screening condition is whether it is necessary to execute the object's finalize() method. If the object does not override finalize(), or the virtual machine has already called its finalize(), both cases are treated as "unnecessary to execute". If the object is judged to need finalize(), it is placed in a queue called the F-Queue, and a low-priority Finalizer thread created automatically by the virtual machine executes the method later. The GC then marks the objects in the F-Queue a second time. If, inside finalize(), an object manages to re-associate itself with any object on a reference chain (for example, by assigning this to a class variable or to a field of a reachable object), it is removed from the "about to be collected" set during the second marking. The system executes an object's finalize() method at most once: if the object faces collection a second time, finalize() is not executed again and the object is reclaimed.
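    The decision logic of the two marking passes can be modelled deterministically. The sketch below is a hypothetical simulation, not HotSpot code. SimObject, gcCycle and the rescuesItself flag are invented names, and "rescue" stands for a finalize() that stores this somewhere reachable. Object.finalize() itself has been deprecated since JDK 9, so this is for understanding only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Hypothetical model of the two marking passes; not how HotSpot is implemented. */
public class TwoPhaseMarkingSim {
    static final class SimObject {
        final String name;
        final boolean overridesFinalize;
        final boolean rescuesItself;   // assumption: its finalize() stores `this` somewhere reachable
        boolean finalizeRun = false;
        SimObject(String name, boolean overridesFinalize, boolean rescuesItself) {
            this.name = name;
            this.overridesFinalize = overridesFinalize;
            this.rescuesItself = rescuesItself;
        }
    }

    /** One GC cycle over objects found unreachable; returns the names actually reclaimed. */
    static List<String> gcCycle(List<SimObject> unreachable, Set<SimObject> roots) {
        List<String> reclaimed = new ArrayList<>();
        Queue<SimObject> fQueue = new ArrayDeque<>();
        // First marking: screen on whether finalize() still needs to run.
        for (SimObject o : unreachable) {
            if (!o.overridesFinalize || o.finalizeRun) {
                reclaimed.add(o.name);             // "unnecessary to execute": dies now
            } else {
                fQueue.add(o);                     // handed to the Finalizer thread
            }
        }
        // Finalizer thread: each finalize() runs at most once.
        for (SimObject o : fQueue) {
            o.finalizeRun = true;
            if (o.rescuesItself) roots.add(o);     // re-attached to a reference chain
        }
        // Second marking: F-Queue objects that are still unreachable are reclaimed.
        for (SimObject o : fQueue) {
            if (!roots.contains(o)) reclaimed.add(o.name);
        }
        return reclaimed;
    }

    public static void main(String[] args) {
        Set<SimObject> roots = new HashSet<>();
        SimObject plain = new SimObject("plain", false, false);
        SimObject saved = new SimObject("saved", true, true);
        System.out.println("cycle 1: " + gcCycle(Arrays.asList(plain, saved), roots));  // [plain]
        roots.remove(saved);                       // the rescuing reference goes away again
        System.out.println("cycle 2: " + gcCycle(Arrays.asList(saved), roots));         // [saved]
    }
}
```

    The second cycle shows the "only once" rule. Because finalizeRun is already true, the object skips the F-Queue and is reclaimed immediately.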

  

  Mark-Sweep algorithm

    The mark-sweep algorithm has two phases, mark and sweep. First, all objects that need to be reclaimed are marked; then, once marking is complete, all marked objects are reclaimed in one pass. The algorithm has two shortcomings. First, neither marking nor sweeping is very efficient. Second, sweeping leaves behind a large number of non-contiguous memory fragments. With too much fragmentation, a later request to allocate a large object may fail to find enough contiguous memory, forcing another garbage collection to be triggered early.
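    A minimal sketch on a hypothetical slot-based heap (one object per slot, heap[i] == null meaning free). It makes the fragmentation problem concrete. After the sweep, half the heap is free, but no two free slots are adjacent, so a two-slot object could still not be allocated.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical fixed-size heap: one slot per object, no compaction. */
public class MarkSweepSim {
    // heap[i] == null: slot i is free; otherwise heap[i] lists the slots object i references.
    static void collect(int[][] heap, int[] roots) {
        boolean[] marked = new boolean[heap.length];
        Deque<Integer> stack = new ArrayDeque<>();
        for (int r : roots) stack.push(r);
        // Mark phase: trace everything reachable from the roots.
        while (!stack.isEmpty()) {
            int i = stack.pop();
            if (marked[i] || heap[i] == null) continue;
            marked[i] = true;
            for (int ref : heap[i]) stack.push(ref);
        }
        // Sweep phase: free every unmarked slot in place; live objects are not moved.
        for (int i = 0; i < heap.length; i++) {
            if (!marked[i]) heap[i] = null;
        }
    }

    static int freeSlots(int[][] heap) {
        int n = 0;
        for (int[] cell : heap) if (cell == null) n++;
        return n;
    }

    /** Largest run of adjacent free slots: the biggest object that could still be allocated. */
    static int largestFreeRun(int[][] heap) {
        int best = 0, run = 0;
        for (int[] cell : heap) {
            run = (cell == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        // Slot 0 is a root; 0 -> 2 -> 4 are live, slots 1, 3 and 5 are garbage.
        int[][] heap = { {2}, {}, {4}, {}, {}, {} };
        collect(heap, new int[] {0});
        System.out.println(freeSlots(heap) + " free slots, largest contiguous run " + largestFreeRun(heap));
        // prints "3 free slots, largest contiguous run 1"
    }
}
```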

  Copying algorithm

    The copying algorithm divides the available memory by capacity into two equal-sized halves and uses only one half at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is then cleared in a single pass. Because each collection reclaims an entire half at once, allocation never has to deal with complications such as memory fragmentation. It only needs to move the heap's top pointer and allocate memory sequentially, which is simple to implement and efficient to run. The cost is that only half of the memory is usable at any time. Today's commercial virtual machines use this algorithm to collect the young generation. Research by IBM found that the vast majority of young-generation objects die shortly after creation, so the memory does not need to be split 1:1. Instead, HotSpot divides the young generation into one larger Eden space and two smaller Survivor spaces, and uses Eden plus one Survivor space at a time. During collection, the surviving objects in Eden and that Survivor space are copied in one pass to the other Survivor space, and then Eden and the just-used Survivor space are cleared. By default HotSpot sizes Eden and each Survivor space at a ratio of 8:1, so 90% of the young generation (Eden plus one Survivor) is usable and only 10% is held in reserve. When the Survivor space is too small to hold all the survivors, other memory (the old generation) must provide an allocation guarantee.
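    A minimal sketch of a copying collection on a hypothetical heap where addresses are array indices. It uses Cheney-style breadth-first copying with forwarding addresses, one common way to implement the copying algorithm; the text does not specify HotSpot's exact traversal. Live objects end up packed at the start of the to-space, allocation there is a simple append (the bump pointer), and references are rewritten to the new addresses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical semispace heap: addresses are array indices. */
public class SemiSpaceCopySim {
    static final class Obj {
        final String label;
        final int[] refs;   // addresses of referenced objects
        Obj(String label, int... refs) { this.label = label; this.refs = refs; }
    }

    /** Copies everything reachable from roots into a fresh to-space; roots are rewritten in place. */
    static List<Obj> collect(Obj[] fromSpace, int[] roots) {
        List<Obj> toSpace = new ArrayList<>();        // new address == list index
        int[] forward = new int[fromSpace.length];    // forwarding addresses
        Arrays.fill(forward, -1);                     // -1: not copied yet
        for (int i = 0; i < roots.length; i++) {
            roots[i] = copy(fromSpace, toSpace, forward, roots[i]);
        }
        // Objects at index >= scan still hold from-space addresses; fix them up in order.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Obj o = toSpace.get(scan);
            int[] newRefs = new int[o.refs.length];
            for (int k = 0; k < o.refs.length; k++) {
                newRefs[k] = copy(fromSpace, toSpace, forward, o.refs[k]);
            }
            toSpace.set(scan, new Obj(o.label, newRefs));
        }
        return toSpace;   // the whole from-space can now be discarded in one step
    }

    private static int copy(Obj[] from, List<Obj> to, int[] forward, int addr) {
        if (forward[addr] == -1) {
            forward[addr] = to.size();   // bump-pointer allocation at the current top
            to.add(from[addr]);
        }
        return forward[addr];
    }

    public static void main(String[] args) {
        // A -> C -> E are live; B and D are garbage (D points at A, which does not keep D alive).
        Obj[] from = { new Obj("A", 2), new Obj("B"), new Obj("C", 4), new Obj("D", 0), new Obj("E") };
        int[] roots = {0};
        List<Obj> to = collect(from, roots);
        for (int addr = 0; addr < to.size(); addr++) {
            System.out.println(addr + ": " + to.get(addr).label + " -> " + Arrays.toString(to.get(addr).refs));
        }
    }
}
```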

  Generational Collection algorithm

    The Java heap is divided into a young generation and an old generation. In the young generation, most objects are found dead at each garbage collection and only a few survive, so the copying algorithm is used: the collection completes at the small cost of copying the few surviving objects. In the old generation, objects have a high survival rate and there is no extra space to guarantee their allocation, so the "mark-sweep" or "mark-compact" algorithm must be used.

  

  HotSpot algorithm implementation

    HotSpot uses a set of data structures called OopMaps to know where object references are stored. When a class is loaded, HotSpot computes what type of data sits at which offset within the object. During JIT compilation, it also records, at specific locations, which stack slots and registers hold references. The GC can then read this information directly when it scans. These specific locations are called safepoints: program execution cannot stop for GC just anywhere, only when it reaches a safepoint. Safepoints are chosen according to whether an instruction has the characteristic of "allowing the program to run for a long time". The most obvious feature of long-running execution is the reuse of instruction sequences, such as method calls, loop back-jumps and exception jumps, so instructions with these functions generate safepoints. There are two schemes for getting all threads to run to the nearest safepoint and stop: preemptive suspension and voluntary suspension. Preemptive suspension requires no cooperation from the code a thread is executing. When a GC occurs, all threads are interrupted first, and any thread found interrupted somewhere other than a safepoint is resumed so that it can run on to a safepoint. Few virtual machine implementations today use preemptive suspension to pause threads in response to GC events. The idea of voluntary suspension is that when the GC needs to suspend threads, it does not operate on them directly; it simply sets a flag. Each thread actively polls the flag as it executes and suspends itself when it finds the flag set to true. The places where the flag is polled coincide with the safepoints.
    Safepoints only handle threads that are executing. A thread that is not executing (for example, one that is sleeping or blocked) cannot respond to the JVM's interrupt request and run to a safepoint, so a safe region (Safe Region) is needed. A safe region is a stretch of code in which reference relationships do not change, so starting a GC anywhere inside it is safe. When a thread enters code in a Safe Region, it first marks itself as having entered the Safe Region. If the JVM initiates a GC during this period, it does not need to wait for threads in the Safe Region state. When the thread is about to leave the Safe Region, it checks whether the system has completed root enumeration. If so, the thread continues executing; otherwise it must wait until it receives a signal that it is safe to leave the Safe Region.
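    Voluntary suspension can be illustrated with a hypothetical Java-level simulation. HotSpot's real poll is emitted inside compiled code and works differently; SafepointPollingSim, workerLoop and stopTheWorld are invented names. The "GC" only sets a volatile flag. Each worker polls it at its loop back-edge, which plays the role of a safepoint, and suspends itself there. Once every worker has checked in, the shared counter stops changing, so the pause observes a frozen world.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of voluntary suspension at safepoints. */
public class SafepointPollingSim {
    private volatile boolean safepointRequested = false;
    private volatile CountDownLatch arrived;          // counts workers parked at a safepoint
    private final Object lock = new Object();
    final AtomicLong work = new AtomicLong();         // stands in for the mutator's heap writes

    void workerLoop() {
        while (!Thread.currentThread().isInterrupted()) {
            work.incrementAndGet();                   // "user code"
            if (safepointRequested) parkAtSafepoint(); // poll at the loop back-edge
        }
    }

    private void parkAtSafepoint() {
        synchronized (lock) {
            arrived.countDown();                      // report: this thread has stopped
            while (safepointRequested) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    /** Brings all workers to a safepoint, checks the world is frozen, then resumes them. */
    boolean stopTheWorld(int workers) throws InterruptedException {
        arrived = new CountDownLatch(workers);        // published before the flag is set
        safepointRequested = true;                    // the GC never touches the threads directly
        arrived.await();                              // every worker has suspended itself
        long before = work.get();
        Thread.sleep(20);                             // pretend to enumerate GC roots
        boolean frozen = work.get() == before;
        synchronized (lock) {
            safepointRequested = false;
            lock.notifyAll();                         // let the workers continue
        }
        return frozen;
    }

    public static void main(String[] args) throws InterruptedException {
        SafepointPollingSim sim = new SafepointPollingSim();
        Thread[] workers = new Thread[3];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(sim::workerLoop);
            workers[i].setDaemon(true);
            workers[i].start();
        }
        Thread.sleep(50);
        System.out.println("world frozen during pause: " + sim.stopTheWorld(workers.length));
        for (Thread t : workers) t.interrupt();
    }
}
```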

 

Reclaiming the method area

  Garbage collection in the method area is much less cost-effective than in the heap. Garbage collection in the permanent generation mainly reclaims two things: abandoned constants and useless classes. A constant is abandoned if no object in the current system references that constant in the constant pool. A class counts as useless only if all three of the following conditions are met: all instances of the class have been reclaimed, that is, no instance of the class exists in the Java heap; the ClassLoader that loaded the class has been reclaimed; and the java.lang.Class object corresponding to the class is not referenced anywhere, so the class's methods cannot be accessed through reflection from anywhere.

 

Garbage collectors

  

  Serial

    Serial is a single-threaded collector: while it collects garbage, it must suspend all other worker threads until it finishes. In a single-CPU environment Serial has no thread-interaction overhead, so it is very efficient.

 

  ParNew

    ParNew is a multithreaded version of the Serial collector. Apart from the Serial collector, it is currently the only young-generation collector that can work with the CMS collector.

 

 

  Parallel Scavenge Collector

    Parallel Scavenge is a young-generation collector that also uses the copying algorithm and is also a parallel, multithreaded collector. What sets it apart is its focus. Collectors such as CMS aim to shorten the pause of user threads during garbage collection as much as possible, whereas Parallel Scavenge aims to achieve a controllable throughput. Throughput is the ratio of the CPU time spent running user code to the total CPU time consumed (running user code + garbage collection time). The Parallel Scavenge collector provides two parameters for precise control of throughput: -XX:MaxGCPauseMillis, which controls the maximum garbage collection pause time, and -XX:GCTimeRatio, which sets the throughput directly. MaxGCPauseMillis accepts a number of milliseconds greater than 0, and the collector tries to keep the time spent on each collection within that value. However, shorter GC pauses are bought at the expense of throughput and young-generation space. GCTimeRatio accepts an integer N greater than 0 and less than 100, meaning that garbage collection may take at most 1/(1+N) of the total time. Once the -XX:+UseAdaptiveSizePolicy switch is turned on, there is no need to set the young-generation size, the ratio of Eden to Survivor spaces, the age at which objects are promoted to the old generation, and similar details by hand. The virtual machine collects performance-monitoring information based on how the system is currently running and dynamically adjusts these parameters to provide the most suitable pause time or the maximum throughput. This approach is known as the GC adaptive sizing policy (GC Ergonomics).
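    The two throughput definitions above reduce to simple arithmetic. This sketch works through them with illustrative numbers: a run of 100 minutes with 1 minute of GC, and GCTimeRatio values of 99 and 19. The method names are hypothetical.

```java
/** Worked numbers for throughput and -XX:GCTimeRatio, following the definitions in the text. */
public class ThroughputMath {
    /** Throughput = user-code time / (user-code time + GC time). */
    static double throughput(double userTime, double gcTime) {
        return userTime / (userTime + gcTime);
    }

    /** With -XX:GCTimeRatio=N, GC may take at most 1 / (1 + N) of the total time. */
    static double maxGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // 100 minutes in total, of which 1 minute was spent in GC:
        System.out.printf("throughput = %.2f%%%n", 100 * throughput(99, 1));
        System.out.printf("GCTimeRatio=99 -> max GC share %.2f%%%n", 100 * maxGcFraction(99));
        System.out.printf("GCTimeRatio=19 -> max GC share %.2f%%%n", 100 * maxGcFraction(19));
    }
}
```

    The results are 99%, 1% and 5% respectively. The default of 99 corresponds to the "1% of GC time" quoted in the parameter summary.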

 

  Serial Old collector

    Serial Old is the old-generation version of the Serial collector. It is also single-threaded and uses the "mark-compact" algorithm. It is mainly used by virtual machines running in Client mode. In Server mode, Serial Old has two main purposes. It is paired with the Parallel Scavenge collector in JDK 1.5 and earlier, and it serves as the fallback for the CMS collector, used when a Concurrent Mode Failure occurs.

  

  Parallel Old collector

    Parallel Old is the old-generation version of the Parallel Scavenge collector, using multiple threads and the "mark-compact" algorithm.

 

  CMS collector

    The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pause time. CMS is based on the "mark-sweep" algorithm, and its operation is divided into four steps: initial mark (CMS initial mark), concurrent mark (CMS concurrent mark), remark (CMS remark) and concurrent sweep (CMS concurrent sweep). The initial mark and remark steps still require "Stop The World". The initial mark only marks objects directly reachable from the GC Roots, which is very fast. The concurrent mark phase performs GC Roots Tracing. The remark phase corrects the marking records of objects whose marks changed because the user program kept running during concurrent marking. Its pause is generally slightly longer than the initial mark but much shorter than concurrent marking. The two longest phases, concurrent mark and concurrent sweep, let collector threads work alongside user threads, so overall the CMS collector's memory reclamation runs concurrently with the user threads.

    CMS has three obvious drawbacks. First, it is very sensitive to CPU resources. In the concurrent phases it does not pause user threads, but it occupies some of the threads (CPU resources), which slows the application down and lowers total throughput. By default CMS starts (number of CPUs + 3) / 4 collection threads. That is, with 4 or more CPUs, the concurrent collection threads take no less than 25% of the CPU resources, and this share falls as the number of CPUs grows. With fewer than 4 CPUs, however, CMS can have a large impact on the user program. To cope with this, the virtual machine provides a CMS variant called "Incremental Concurrent Mark Sweep" (i-CMS). Its idea is the same as the preemptive scheduling that single-CPU PC operating systems used to simulate multitasking. During concurrent marking and sweeping, GC threads and user threads run alternately, minimizing the time the GC threads have the CPU to themselves. i-CMS has since been deprecated and its use is no longer recommended.

    Second, the CMS collector cannot handle floating garbage, and a "Concurrent Mode Failure" may occur that leads to another Full GC. Because user threads keep running during the concurrent sweep phase, new garbage is continuously produced. Since this garbage appears after marking has finished, CMS cannot clean it up in the current collection and must leave it for the next GC. This garbage is called "floating garbage". Also, because user threads must keep running during collection, enough memory has to be reserved for them.

    Third, because CMS is based on the "mark-sweep" algorithm, a large amount of space fragmentation is left behind after a collection. When there is too much fragmentation, no contiguous space can be found for a large object, which triggers a Full GC early. The CMS collector provides the -XX:+UseCMSCompactAtFullCollection switch, which merges and defragments memory when CMS is about to perform a Full GC. This compaction cannot run concurrently, so fragmentation goes away but the pause becomes longer. -XX:CMSFullGCsBeforeCompaction sets how many Full GCs without compaction are performed before one Full GC with compaction.
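    The default thread-count formula quoted above, evaluated with Java integer division, shows why small machines suffer. With 1 or 2 CPUs, a single CMS thread already takes 100% or 50% of the CPU during the concurrent phases. From 4 CPUs upward the share never drops below 25%. The method name is hypothetical.

```java
/** Default CMS concurrent-thread count as stated in the text: (CPUs + 3) / 4, integer division. */
public class CmsThreads {
    static int defaultConcGcThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 2, 4, 5, 8, 16}) {
            int gc = defaultConcGcThreads(cpus);
            System.out.printf("%2d CPUs -> %d GC thread(s), %.1f%% of CPU%n", cpus, gc, 100.0 * gc / cpus);
        }
    }
}
```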

  

  

  G1 collector

    G1 (Garbage-First) is a garbage collector aimed at server-side applications. G1 has the following characteristics:

      Parallelism and concurrency: G1 makes full use of the hardware in multi-CPU, multi-core environments, using multiple CPUs to shorten Stop-The-World pauses.

      Generational collection: although G1 can manage the entire GC heap on its own without cooperation from other collectors, it still handles newly created objects differently from old objects that have survived for a while and through multiple GCs.

      Space integration: unlike CMS with its "mark-sweep" algorithm, G1 is based on a "mark-compact" algorithm as a whole and on a "copying" algorithm locally (between two Regions). Either way, neither algorithm produces memory fragmentation during operation, so G1 provides regular, usable free memory after collection. This benefits long-running programs: allocating a large object will not trigger the next GC early because no contiguous free space can be found.

      Predictable pauses: G1 builds a predictable pause-time model that lets the user explicitly specify that, within a time slice of M milliseconds, no more than N milliseconds may be spent on garbage collection.

    G1 divides the entire Java heap into multiple independent Regions of equal size. Although it keeps the concepts of young generation and old generation, they are no longer physically separated; each is a set of (not necessarily contiguous) Regions. G1 avoids collecting the whole Java heap at once. It tracks how valuable the accumulated garbage in each Region is to collect (the space reclaimed versus the empirically estimated time needed) and maintains a priority list in the background. Each time, within the allowed collection time, it preferentially collects the Regions with the highest value.

    Regions cannot be treated in isolation, however. An object allocated in one Region can be referenced not only by objects in the same Region but by any object anywhere in the Java heap, so determining whether an object is alive would seem to require scanning the whole heap. For references between Regions in G1, as for references between the young and old generations in other collectors, the JVM uses Remembered Sets to avoid full-heap scans. Each Region in G1 has its own Remembered Set. When the JVM detects that the program is writing a reference-type field, a Write Barrier briefly intercepts the write and checks whether the referenced object lies in a different Region from the object holding the reference. If so, the reference information is recorded, via the CardTable, in the Remembered Set of the Region containing the referenced object. During collection, adding the Remembered Sets to the enumeration range of the GC roots ensures that nothing is missed, without scanning the entire heap.
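    A minimal sketch of the write-barrier bookkeeping just described, under hypothetical assumptions. Addresses are plain longs, and Regions are 1 KiB and cards 512 bytes, far smaller than real G1 sizes. The class and method names are invented. A cross-Region store records the card holding the field in the target Region's Remembered Set. Collecting that Region then only needs to scan those cards in addition to the normal roots.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of G1-style Remembered Sets maintained by a write barrier. */
public class RememberedSetSim {
    static final long REGION_SIZE = 1024;   // illustrative only
    static final long CARD_SIZE = 512;      // illustrative only

    // region index -> cards (in other regions) that hold references into that region
    private final Map<Long, Set<Long>> rememberedSets = new HashMap<>();

    static long regionOf(long addr) { return addr / REGION_SIZE; }
    static long cardOf(long addr) { return addr / CARD_SIZE; }

    /** Write barrier for the store "field at fieldAddr = reference to targetAddr". */
    void writeReference(long fieldAddr, long targetAddr) {
        // ... the actual store would happen here ...
        if (regionOf(fieldAddr) != regionOf(targetAddr)) {             // cross-region reference?
            rememberedSets.computeIfAbsent(regionOf(targetAddr), r -> new HashSet<>())
                          .add(cardOf(fieldAddr));                      // record in the TARGET's set
        }
    }

    /** Extra roots to scan when collecting only this region, instead of the whole heap. */
    Set<Long> extraRootCards(long region) {
        return rememberedSets.getOrDefault(region, Collections.emptySet());
    }

    public static void main(String[] args) {
        RememberedSetSim heap = new RememberedSetSim();
        heap.writeReference(100, 200);    // region 0 -> region 0: nothing recorded
        heap.writeReference(1500, 300);   // region 1 -> region 0: card 2 recorded for region 0
        heap.writeReference(5000, 2100);  // region 4 -> region 2: card 9 recorded for region 2
        System.out.println("region 0 extra roots: " + heap.extraRootCards(0));
        System.out.println("region 2 extra roots: " + heap.extraRootCards(2));
    }
}
```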

    Leaving aside the work of maintaining the Remembered Sets, the G1 collector's operation can be roughly divided into the following steps: Initial Marking, Concurrent Marking, Final Marking, and Live Data Counting and Evacuation. The initial marking phase only marks objects directly reachable from the GC Roots and updates the value of TAMS (Next Top at Mark Start), so that when the user program runs concurrently in the next phase, new objects are created in the correct available Regions. This phase pauses threads, but only briefly. The concurrent marking phase performs reachability analysis on the heap starting from the GC Roots to find the surviving objects. It takes a long time but can run concurrently with the user program. The final marking phase corrects the marking records that changed because the user program kept running during concurrent marking. The virtual machine records object changes during that period in the thread's Remembered Set Logs, and the final marking phase merges the Remembered Set Logs data into the Remembered Set. This phase must pause threads but can be executed in parallel. Finally, the live data counting and evacuation phase first sorts the Regions by collection value and cost, then makes a collection plan based on the GC pause time the user expects.
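    The last step, sorting Regions by value and cost and then planning within the expected pause time, can be sketched as a greedy choice. This is a simplified hypothetical model, not G1's actual policy; for example, real G1 always collects all young Regions. Here "value" is taken to be reclaimable bytes per estimated millisecond, and Regions are taken in order of value while they still fit in the pause budget.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of choosing Regions to evacuate within a pause-time budget. */
public class CollectionSetChooser {
    static final class Region {
        final int id;
        final long reclaimableBytes;   // space freed by evacuating this region
        final double estimatedMs;      // predicted time to evacuate it
        Region(int id, long reclaimableBytes, double estimatedMs) {
            this.id = id;
            this.reclaimableBytes = reclaimableBytes;
            this.estimatedMs = estimatedMs;
        }
        double value() { return reclaimableBytes / estimatedMs; }   // bytes reclaimed per ms
    }

    static List<Integer> choose(List<Region> regions, double pauseBudgetMs) {
        List<Region> ranked = new ArrayList<>(regions);
        ranked.sort((a, b) -> Double.compare(b.value(), a.value()));   // most valuable first
        List<Integer> chosen = new ArrayList<>();
        double used = 0;
        for (Region r : ranked) {
            if (used + r.estimatedMs <= pauseBudgetMs) {   // skip regions that would exceed the budget
                chosen.add(r.id);
                used += r.estimatedMs;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region(0, 100, 10),   // value 10
                new Region(1, 120, 40),   // value 3
                new Region(2, 90, 5),     // value 18
                new Region(3, 50, 25));   // value 2
        System.out.println("collect regions " + choose(regions, 20));   // [2, 0]
    }
}
```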

 

  

  Summary of Garbage Collector Parameters

    UseSerialGC: Default when the virtual machine runs in Client mode. When this switch is on, the Serial + Serial Old collector combination is used for memory reclamation.
    UseParNewGC: When this switch is on, the ParNew + Serial Old collector combination is used for memory reclamation.
    UseConcMarkSweepGC: When this switch is on, the ParNew + CMS + Serial Old collector combination is used for memory reclamation. Serial Old serves as the fallback collector when CMS hits a Concurrent Mode Failure.
    UseParallelGC: Default when the virtual machine runs in Server mode. When this switch is on, the Parallel Scavenge + Serial Old (PS MarkSweep) collector combination is used for memory reclamation.
    SurvivorRatio: Ratio of the capacity of the Eden space to one Survivor space in the young generation. The default is 8, meaning Eden : Survivor = 8 : 1.
    PretenureSizeThreshold: Size above which objects are allocated directly in the old generation. Once set, objects larger than this value go straight to the old generation.
    MaxTenuringThreshold: Age at which objects are promoted to the old generation. An object's age increases by 1 each time it survives a Minor GC; once it exceeds this value, the object enters the old generation.
    UseAdaptiveSizePolicy: Dynamically adjusts the sizes of the regions of the Java heap and the age at which objects enter the old generation.
    HandlePromotionFailure: Whether to allow the allocation guarantee to fail, that is, the extreme case in which the old generation's remaining space cannot hold the objects in Eden and the Survivor space if all of them survive.
    ParallelGCThreads: Number of threads used for memory reclamation during parallel GC.
    GCTimeRatio: Throughput goal N: GC may take at most 1/(1+N) of the total time. The default is 99, which allows GC to take 1% of the time. Only takes effect with the Parallel Scavenge collector.
    MaxGCPauseMillis: Maximum GC pause time. Only takes effect with the Parallel Scavenge collector.
    CMSInitiatingOccupancyFraction: Old-generation occupancy at which the CMS collector starts a garbage collection. The default is 68%. Only takes effect with the CMS collector.
    UseCMSCompactAtFullCollection: Whether the CMS collector performs memory defragmentation after completing a garbage collection. Only takes effect with the CMS collector.
    CMSFullGCsBeforeCompaction: Number of garbage collections after which the CMS collector performs a memory defragmentation. Only takes effect with the CMS collector.
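    One way to see which collector combination a flag actually selected is to ask the running JVM through the standard java.lang.management API. The class below is a small sketch; GcBeansDemo and collectorNames are hypothetical names. Run it as, for example, java -XX:+UseSerialGC GcBeansDemo or java -XX:+UseParallelGC GcBeansDemo. HotSpot typically reports one bean per generation, such as "Copy"/"MarkSweepCompact" for Serial or "PS Scavenge"/"PS MarkSweep" for Parallel Scavenge. The exact names, and which flags are still accepted, depend on the JDK version; CMS, for example, was removed in JDK 14.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcBeansDemo {
    /** Names of the collectors the running JVM reports (typically one per generation). */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```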


  Young-generation GC (Minor GC): a garbage collection that occurs in the young generation.

  Old-generation GC (Major GC/Full GC): a garbage collection that occurs in the old generation. A Major GC is often accompanied by at least one Minor GC, and a Major GC is generally more than 10 times slower than a Minor GC.

 

  Memory allocation and reclamation strategy

    The automatic memory management advocated by the Java technology system ultimately comes down to automatically solving two problems: allocating memory to objects and reclaiming the memory allocated to them. Objects are mainly allocated in the Eden space of the young generation. If thread-local allocation buffers (TLAB) are enabled, they are allocated first in the TLAB of the allocating thread. In a few cases objects may also be allocated directly in the old generation.

    Objects are allocated preferentially in Eden: In most cases, objects are allocated in the new generation Eden area. When the Eden area does not have enough space for allocation, the virtual machine will initiate a Minor GC. The virtual machine provides the collector log parameter -XX:+PrintGCDetails, which tells the virtual machine to print the memory reclamation log when garbage collection occurs, and output the current allocation of memory areas when the process exits.

    Large objects go directly into the old generation: large objects are Java objects that need a large amount of contiguous memory, such as very long strings and large arrays. The virtual machine provides the -XX:PretenureSizeThreshold parameter so that objects larger than this value are allocated directly in the old generation.
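    The allocation rules so far can be summarized as a routing decision for a single request. This is a hypothetical simplification: the real allocation path also refills TLABs, retries after GC and so on. The names are invented, and a threshold of 0 is taken to mean PretenureSizeThreshold is not set. Sizes are in bytes.

```java
/** Hypothetical routing of one allocation request, following the order described above. */
public class AllocationPathSim {
    enum Target { OLD_GENERATION, TLAB, EDEN, MINOR_GC_FIRST }

    static Target route(long size, long tlabFree, long edenFree, long pretenureSizeThreshold) {
        if (pretenureSizeThreshold > 0 && size > pretenureSizeThreshold) {
            return Target.OLD_GENERATION;              // large object: straight to the old generation
        }
        if (size <= tlabFree) return Target.TLAB;      // fast path: bump this thread's TLAB pointer
        if (size <= edenFree) return Target.EDEN;      // shared Eden space
        return Target.MINOR_GC_FIRST;                  // Eden is full: a Minor GC is triggered first
    }

    public static void main(String[] args) {
        long threshold = 3 * 1024 * 1024;              // e.g. -XX:PretenureSizeThreshold=3145728
        System.out.println(route(64, 4096, 1 << 20, threshold));        // TLAB
        System.out.println(route(8192, 4096, 1 << 20, threshold));      // EDEN
        System.out.println(route(2 << 20, 4096, 1 << 20, threshold));   // MINOR_GC_FIRST
        System.out.println(route(4 << 20, 4096, 1 << 20, threshold));   // OLD_GENERATION
    }
}
```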

    Long-lived objects enter the old generation: the virtual machine keeps an age counter for each object. If an object born in Eden survives its first Minor GC and can be accommodated by a Survivor space, it is moved to the Survivor space and its age is set to 1. Each time the object survives another Minor GC in a Survivor space, its age increases by 1. When its age reaches a certain threshold (15 by default), it is promoted to the old generation. The promotion age threshold can be set with -XX:MaxTenuringThreshold.

    Dynamic object age determination: the virtual machine does not always require an object's age to reach MaxTenuringThreshold before promoting it to the old generation. If the total size of all objects of the same age in a Survivor space is greater than half of that Survivor space, objects whose age is greater than or equal to that age can enter the old generation directly, without waiting for the age required by MaxTenuringThreshold.
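    A minimal sketch of the two promotion rules as stated above, assuming a hypothetical array bytesByAge where bytesByAge[a] is the total size of Survivor objects of age a. As far as we know, HotSpot's actual calculation differs in detail: it accumulates sizes across ages and compares against a TargetSurvivorRatio share of the Survivor space. This sketch follows the text's wording instead.

```java
/** Hypothetical sketch of the promotion-age rules; bytesByAge[0] is unused. */
public class TenuringSim {
    static int effectiveThreshold(long[] bytesByAge, long survivorCapacity, int maxTenuringThreshold) {
        for (int age = 1; age < bytesByAge.length && age < maxTenuringThreshold; age++) {
            // Dynamic age rule: objects of one age fill more than half the Survivor space.
            if (bytesByAge[age] > survivorCapacity / 2) return age;
        }
        return maxTenuringThreshold;   // otherwise wait for MaxTenuringThreshold (15 by default)
    }

    /** An object is promoted once its age reaches the effective threshold. */
    static boolean promote(int age, int threshold) {
        return age >= threshold;
    }

    public static void main(String[] args) {
        long survivor = 1000;
        long[] calm = {0, 100, 200, 50};
        long[] burst = {0, 100, 600, 50};
        System.out.println(effectiveThreshold(calm, survivor, 15));    // 15
        System.out.println(effectiveThreshold(burst, survivor, 15));   // 2: ages >= 2 are promoted
    }
}
```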

    Space allocation guarantee: before a Minor GC, the virtual machine first checks whether the largest contiguous free space in the old generation is greater than the total size of all objects in the young generation. If so, the Minor GC is guaranteed to be safe. If not, the virtual machine checks whether the HandlePromotionFailure setting allows the guarantee to fail. If it does, the virtual machine further checks whether the largest contiguous free space in the old generation is greater than the average size of objects promoted to the old generation in previous collections. If so, it attempts a Minor GC, even though that Minor GC is risky. If the space is smaller, or HandlePromotionFailure does not allow the risk, a Full GC is performed instead. The risk arises because the young generation uses the copying algorithm, but to keep memory utilization high, only one Survivor space serves as the rotating reserve. So when a large number of objects survive a Minor GC, the old generation must provide the allocation guarantee, and objects that the Survivor space cannot hold go directly into the old generation. Comparing against the historical average is still only a probabilistic estimate. If the number of objects surviving one Minor GC jumps far above the average, a Handle Promotion Failure can still occur, and a Full GC then has to be initiated after all.
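    The check described in this paragraph is a three-way decision. The sketch below encodes it directly; the enum and method names are hypothetical, and all sizes are in bytes. A MINOR_GC_RISKY outcome can still end in a promotion failure and a subsequent Full GC, as the paragraph explains.

```java
/** Hypothetical sketch of the pre-Minor-GC allocation-guarantee check. Sizes in bytes. */
public class PromotionGuaranteeSim {
    enum Decision { MINOR_GC_SAFE, MINOR_GC_RISKY, FULL_GC }

    static Decision beforeMinorGc(long oldMaxContiguousFree, long youngObjectsTotal,
                                  boolean handlePromotionFailure, long avgPromotedBefore) {
        if (oldMaxContiguousFree > youngObjectsTotal) {
            return Decision.MINOR_GC_SAFE;     // even if everything survives, it fits
        }
        if (handlePromotionFailure && oldMaxContiguousFree > avgPromotedBefore) {
            return Decision.MINOR_GC_RISKY;    // bet on the historical average
        }
        return Decision.FULL_GC;               // risk not allowed, or the average says it will not fit
    }

    public static void main(String[] args) {
        System.out.println(beforeMinorGc(500, 400, false, 100));   // MINOR_GC_SAFE
        System.out.println(beforeMinorGc(300, 400, true, 100));    // MINOR_GC_RISKY
        System.out.println(beforeMinorGc(300, 400, true, 350));    // FULL_GC
        System.out.println(beforeMinorGc(300, 400, false, 100));   // FULL_GC
    }
}
```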

Origin http://43.154.161.224:23101/article/api/json?id=324894348&siteId=291194637