Java Garbage Collection Algorithms in Detail

I. Introduction

  Some time ago I read through the book "In-depth understanding of the Java Virtual Machine" and gained a basic understanding of the material, so I decided to write a series of blog posts on the JVM; this is the second. This post covers the garbage collection algorithms the JVM uses.


II. Main Text

 2.1 What is garbage collection

  Before formally introducing the garbage collection algorithms, let us first define garbage collection. "Garbage" here mainly means objects that will no longer be used. It can also include other things, such as classes and constants that are no longer in use, but objects are the main case, so the algorithms below are described in terms of collecting objects. Garbage collection therefore means clearing objects (or classes and constants) in memory that are no longer used, freeing up memory space.

  The JVM memory model is divided into five parts. The sole purpose of the heap is to store objects, and almost all objects are stored there. To make garbage collection easier, the heap is generally divided into two parts:

  • Young generation: stores objects with short life cycles. Because objects in this region live only briefly, garbage collection happens frequently here, and each collection can generally release a large amount of space;
  • Old generation: stores objects with long life cycles. Objects that survive long enough in the young generation are moved here (this is not the only way an object enters the old generation), so objects stored here generally live longer; garbage collection happens less often in this region and releases less space each time;

  The JVM's garbage collection algorithms are discussed below.


 2.2 How to identify garbage

  The first step of garbage collection is to find the garbage (using objects as the main example), that is, objects that can no longer be used. When can an object no longer be used? Simply put, when there are no references to it, we naturally cannot use it. For example, look at the following code:

public static void main(String[] args) {
    Object a = new Object(); // a refers to a newly created object
    a = null;                // nothing refers to the object any more
}

  In the code above, I create an object and point the variable a at it, then assign null to a. What happens? It is not hard to see that we can no longer use the object: it is lost, because no variable lets us reach it, yet it is still in memory. The memory it occupies is now wasted, and we want it reclaimed. So when an object has no more references pointing to it, we can regard it as garbage.

(1) Reference counting

  Reference counting identifies unused objects by counting references. We record the number of references to each object: when a new variable references the object, its count is incremented by 1; when a reference becomes invalid, its count is decremented by 1; when the count reaches 0, the object is garbage and can be collected. Note that if the member variables of a garbage object reference other objects, then when the garbage object is released, those references also become invalid, so the counts of the referenced objects are decremented as well.
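
  The bookkeeping described above can be sketched with a toy model. This is only an illustration, not how any real JVM works: the class CountedObject and its methods retain, release and addField are hypothetical names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of reference counting; every name here is hypothetical.
final class CountedObject {
    final String name;
    int refCount = 0;
    boolean freed = false;
    final List<CountedObject> fields = new ArrayList<>(); // outgoing references

    CountedObject(String name) {
        this.name = name;
    }

    // A new reference to this object is created: count + 1.
    void retain() {
        refCount++;
    }

    // A reference to this object becomes invalid: count - 1.
    void release() {
        refCount--;
        if (refCount == 0) {
            freed = true;
            // Releasing this object invalidates the references held in its fields.
            for (CountedObject child : fields) {
                child.release();
            }
            fields.clear();
        }
    }

    // Store a reference to target in one of this object's fields.
    void addField(CountedObject target) {
        target.retain();
        fields.add(target);
    }
}

final class RefCountDemo {
    public static void main(String[] args) {
        CountedObject a = new CountedObject("a");
        CountedObject b = new CountedObject("b");
        a.retain();     // a local variable refers to a
        a.addField(b);  // a's field refers to b
        a.release();    // the local variable is cleared
        System.out.println(a.freed + " " + b.freed); // prints "true true"
    }
}
```

  Releasing a cascades to b because the only reference to b lived in a field of a.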

  This algorithm is simple and efficient, but mainstream Java virtual machines do not use it, because it has a major drawback: it has difficulty handling circular references. What is a circular reference? Look at the following code:

public class Main {
    
    private Object obj;
    
    public static void main(String[] args) {
        Main m1 = new Main();
        Main m2 = new Main();

        // circular reference
        m1.obj = m2;
        m2.obj = m1;

        m1 = null;
        m2 = null;
    }
}

  The code above creates two objects, m1 and m2, each with a field obj. The obj of m1 points to m2, and the obj of m2 points to m1. References among objects that form a ring like this are a circular reference. This is a problem for a garbage collector that uses reference counting: at the end of the code above, m1 and m2 are set to null, so the two objects they pointed to can no longer be used, but because the two objects reference each other, their reference counts are not 0, and the collector will not identify them as garbage. Because of this problem, Java garbage collectors basically do not use this algorithm.
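
  To make the counts concrete, here is a minimal sketch that tracks them by hand for the same m1/m2 scenario. The Counted class and its refCount field are hypothetical and exist only for this illustration; a real JVM keeps no such field.

```java
// Manual reference counts for the m1/m2 example; all names are hypothetical.
final class CycleCountDemo {
    static final class Counted {
        int refCount = 0;
        Counted obj; // mirrors the obj field of Main in the example above
    }

    // Returns the counts of both objects after the local variables are cleared.
    static int[] countsAfterDroppingLocals() {
        Counted m1 = new Counted();
        m1.refCount++;              // local variable m1
        Counted m2 = new Counted();
        m2.refCount++;              // local variable m2

        m1.obj = m2;
        m2.refCount++;              // m1.obj -> m2
        m2.obj = m1;
        m1.refCount++;              // m2.obj -> m1

        // Equivalent of "m1 = null; m2 = null;": each object loses one reference.
        m1.refCount--;
        m2.refCount--;
        return new int[] { m1.refCount, m2.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingLocals();
        System.out.println(counts[0] + " " + counts[1]); // prints "1 1"
    }
}
```

  Neither count reaches 0, so a pure reference-counting collector would never reclaim the pair.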

(2) Reachability analysis

  Reachability analysis is the main method Java garbage collection uses to identify unused objects. Starting from the root objects, it traverses references recursively using DFS or BFS; any object that the traversal cannot reach is no longer in use and can be collected. A root is a reference-type variable that we can use directly, such as:

  • Method parameters or local variables;
  • Static members of a class;
  • Constants in the code;

  Compared with reference counting, this method is more complicated and less efficient, but it solves the circular reference problem, so it is the method mainly used in Java garbage collection.
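
  The traversal can be sketched as a breadth-first search over a small object graph. This is a simplified model under stated assumptions: the Obj class, its refs list and the reachable method are hypothetical names, and the roots are passed in explicitly instead of being discovered from stacks and static fields.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy reachability analysis; all names are hypothetical.
final class ReachabilityDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // outgoing references
        Obj(String name) {
            this.name = name;
        }
    }

    // BFS from the roots; everything not in the returned set is garbage.
    static Set<Obj> reachable(List<Obj> roots) {
        Set<Obj> seen = new HashSet<>(); // Obj uses identity equals/hashCode
        Deque<Obj> queue = new ArrayDeque<>();
        for (Obj root : roots) {
            if (seen.add(root)) {
                queue.add(root);
            }
        }
        while (!queue.isEmpty()) {
            Obj current = queue.poll();
            for (Obj next : current.refs) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Obj root = new Obj("root");
        Obj live = new Obj("live");
        Obj m1 = new Obj("m1");
        Obj m2 = new Obj("m2");
        root.refs.add(live);
        m1.refs.add(m2); // m1 and m2 reference each other,
        m2.refs.add(m1); // but no root leads to either of them
        Set<Obj> alive = reachable(Arrays.asList(root));
        System.out.println(alive.contains(live) + " " + alive.contains(m1)); // prints "true false"
    }
}
```

  The circular pair m1/m2 is never visited, so it is correctly classified as garbage, which is exactly the case reference counting could not handle.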


 2.3 How to release garbage

  Releasing garbage means clearing the objects that are no longer needed and freeing the memory they occupy so that it can be reused. Three approaches are introduced here:

  • Mark-sweep algorithm;
  • Copying algorithm;
  • Mark-compact algorithm;

  The three algorithms are combined according to the specific situation to get the best results. They are introduced one by one below.


(1) Mark-sweep algorithm (Mark-Sweep)

  Mark-sweep is the most basic of the three algorithms above, because its principle is very simple. As the name suggests, the algorithm has two steps: (1) mark; (2) sweep.

  • Mark: marking relies on the reachability analysis described above. The objects are traversed with that algorithm, and every unreachable object is marked as garbage to await collection;
  • Sweep: this step is very simple; the memory occupied by the garbage objects is released directly;

  This algorithm has two problems:

  1. Low efficiency: both marking and sweeping are relatively inefficient. Sweeping is slow because the entire memory space has to be scanned and the memory of garbage objects released one object at a time;
  2. Collection with this algorithm leaves a lot of memory fragmentation. There may be plenty of free memory in total but no large contiguous block, so a large object cannot be allocated, which triggers another garbage collection;

  [Figure: memory before and after a mark-sweep collection. After the collection, a large amount of memory fragmentation remains.]
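
  The fragmentation problem can be illustrated with a toy heap of equally sized slots. This is a sketch under simplifying assumptions: the boolean arrays occupied and marked, and the helper methods, are hypothetical and stand in for a real heap and real mark bits.

```java
// Toy mark-sweep on a heap of fixed-size slots; all names are hypothetical.
final class MarkSweepDemo {
    // Sweep: a slot stays occupied only if its object was marked reachable.
    static boolean[] sweep(boolean[] occupied, boolean[] marked) {
        boolean[] after = new boolean[occupied.length];
        for (int i = 0; i < occupied.length; i++) {
            after[i] = occupied[i] && marked[i];
        }
        return after;
    }

    // Total number of free slots.
    static int totalFree(boolean[] occupied) {
        int free = 0;
        for (boolean slot : occupied) {
            if (!slot) {
                free++;
            }
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(boolean[] occupied) {
        int best = 0;
        int run = 0;
        for (boolean slot : occupied) {
            run = slot ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] occupied = { true, true, true, true, true, true, true, true };
        boolean[] marked   = { true, false, true, false, true, false, true, false };
        boolean[] after = sweep(occupied, marked);
        System.out.println(totalFree(after) + " " + largestFreeRun(after)); // prints "4 1"
    }
}
```

  Half of the heap is free after the sweep, yet an object needing two adjacent slots still cannot be allocated.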


(2) Copying algorithm (Copying)

  To address the low efficiency and memory fragmentation, a new algorithm was proposed: the copying algorithm. Its principle is to divide memory into two regions of equal size, one used to store objects and one kept in reserve. When the region storing objects has no room for a new allocation, all objects that are still alive are copied to the reserved region, and then all memory in the region currently in use is released at once. The live objects end up in the formerly reserved region and the garbage objects are released. The region that was just cleared becomes the new reserved region, and the former reserved region becomes the region in use; the two spaces alternate in this way.

  As mentioned earlier, the heap is divided into the young generation and the old generation. In the young generation, each collection releases a large number of objects and only a few survive, so only a small portion of objects has to be copied to the reserved region, which means copying is not too time-consuming. In addition, releasing all the memory of the region in use at once is much more efficient than releasing it piece by piece. The copied objects are also placed neatly next to one another in the other region, so there is no memory fragmentation and allocating space stays simple. For these reasons, the copying algorithm is much more efficient than mark-sweep.

  [Figure: illustration of the copying algorithm]
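
  A minimal sketch of the two-region scheme follows. It is a toy model, not a real collector: the class SemiSpaceDemo, its from and to arrays and the bump pointer top are hypothetical, objects are represented by their names, and the set of live objects is supplied by the caller instead of being computed by reachability analysis.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy copying collector with two equal halves; all names are hypothetical.
final class SemiSpaceDemo {
    String[] from; // half currently used for allocation
    String[] to;   // reserved half
    int top = 0;   // next free slot in from

    SemiSpaceDemo(int halfSize) {
        from = new String[halfSize];
        to = new String[halfSize];
    }

    // Allocation just advances a pointer; returns false when the half is full.
    boolean allocate(String obj) {
        if (top == from.length) {
            return false;
        }
        from[top++] = obj;
        return true;
    }

    // Copy live objects compactly into the reserved half, then swap the halves.
    void collect(Set<String> live) {
        int next = 0;
        for (int i = 0; i < top; i++) {
            if (live.contains(from[i])) {
                to[next++] = from[i];
            }
        }
        Arrays.fill(from, null); // release the whole used half at once
        String[] tmp = from;
        from = to;
        to = tmp;
        top = next;
    }

    public static void main(String[] args) {
        SemiSpaceDemo heap = new SemiSpaceDemo(4);
        for (String name : new String[] { "a", "b", "c", "d" }) {
            heap.allocate(name);
        }
        heap.collect(new HashSet<>(Arrays.asList("b", "d")));
        heap.allocate("e");
        System.out.println(Arrays.toString(heap.from)); // prints "[b, d, e, null]"
    }
}
```

  The survivors b and d sit next to each other after the copy, so the free space is one contiguous block and the next allocation is a simple pointer bump.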

  There is a problem, however: the copying algorithm splits memory into two equal halves, which means half of the space is always unusable, and that is too wasteful. The division of space therefore needs some improvement. Research by IBM shows that 98% of objects live only a very short time, so there is no need to reserve half of the space for copying. In practical implementations, the space is divided into three regions: one larger Eden space and two smaller Survivor spaces. New objects are first allocated in Eden. When Eden has no room for an allocation, garbage collection is triggered: the live objects in Eden are copied into one of the Survivor spaces, and Eden is then emptied. When Eden next runs out of room and triggers a collection, the live objects in Eden, together with the live objects in the Survivor space filled last time, are copied into the other Survivor space, and then Eden and the previously used Survivor space are emptied. In other words, the two Survivor spaces are used alternately to hold the objects that survive each collection. In a concrete implementation, the ratio of the three spaces is 8:1:1, which means only 10% of the space is unusable.
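
  The 8:1:1 arithmetic can be worked through in a few lines. This is only a sketch: the method layout and its parameter survivorRatio are names chosen for this example (HotSpot exposes a similarly named -XX:SurvivorRatio option), and the 100 MB young generation is an arbitrary assumed size.

```java
// Eden/Survivor sizing for a given ratio; all names are hypothetical.
final class YoungGenLayoutDemo {
    // Splits the young generation into Eden and two equal Survivor spaces,
    // with Eden : Survivor = survivorRatio : 1. Returns {eden, survivor}.
    static long[] layout(long youngGenBytes, int survivorRatio) {
        long survivor = youngGenBytes / (survivorRatio + 2);
        long eden = youngGenBytes - 2 * survivor;
        return new long[] { eden, survivor };
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long young = 100 * mb; // assumed size, for illustration only
        long[] sizes = layout(young, 8);
        long eden = sizes[0];
        long survivor = sizes[1];
        // Eden plus one Survivor space is usable; the other Survivor is reserved.
        long usablePercent = (eden + survivor) * 100 / young;
        System.out.println(eden / mb + " " + survivor / mb + " " + usablePercent); // prints "80 10 90"
    }
}
```

  With a ratio of 8, Eden takes 80 MB and each Survivor space 10 MB, so 90% of the young generation is available for objects at any time, compared with 50% for the plain two-halves scheme.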

  As can be seen, this algorithm is very efficient when most objects are short-lived, but it no longer applies if most objects are long-lived, so it is generally used only in the young generation. One question remains with the three-region scheme: if a large number of objects are still alive after a collection, what happens when a Survivor space is too small to hold them? Another space is needed as a guarantee: when this happens, the objects are placed in that other space, called the guarantee space. It is like taking out a bank loan with a guarantor: when the borrower cannot repay, the guarantor pays. Since the algorithm above is used in the young generation, the guarantee space is in fact the old generation. The old generation provides the guarantee for the algorithm, but in most cases the Survivor space is enough.
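
  The guarantee idea can be sketched as follows. This is a deliberately simplified model: the method minorGc and its parameters are hypothetical, capacity is counted in objects rather than bytes, and a real JVM decides what to move to the old generation with more elaborate rules than "whatever does not fit".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the old generation acting as guarantee space; names are hypothetical.
final class PromotionGuaranteeDemo {
    // Copies live objects into the Survivor space; overflow goes to the old generation.
    static void minorGc(List<String> liveObjects, int survivorCapacity,
                        List<String> survivor, List<String> oldGen) {
        for (String obj : liveObjects) {
            if (survivor.size() < survivorCapacity) {
                survivor.add(obj);
            } else {
                oldGen.add(obj); // Survivor is full: the guarantee space takes over
            }
        }
    }

    public static void main(String[] args) {
        List<String> survivor = new ArrayList<>();
        List<String> oldGen = new ArrayList<>();
        minorGc(Arrays.asList("a", "b", "c", "d"), 2, survivor, oldGen);
        System.out.println(survivor + " " + oldGen); // prints "[a, b] [c, d]"
    }
}
```

  When the survivors fit, the old generation is not touched at all, which matches the remark that the Survivor space is enough in most cases.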


(3) Mark-compact algorithm (Mark-Compact)

  Because objects in the old generation generally live a long time, the copying algorithm above is not suitable for the old generation. The mark-compact algorithm was therefore proposed to fit the characteristics of the old generation; note that the second step here is compact, not sweep as in the first algorithm. This algorithm also has two steps, mark and compact. The mark step is the same as in the first algorithm; the key is the compact step. Compacting means moving the objects still alive in memory toward one end until they sit next to one another, neatly arranged, and then clearing all memory outside that part directly. The benefit of mark-compact is that it solves the memory fragmentation problem.

  [Figure: illustration of the mark-compact algorithm]
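
  The compact step can be sketched on a toy heap of slots. As with the earlier sketches, this is an illustration under simplifying assumptions: the heap array, the marked flags and the compact method are hypothetical, and a real collector would also have to update every reference to a moved object.

```java
import java.util.Arrays;

// Toy compaction of a slot-based heap; all names are hypothetical.
final class MarkCompactDemo {
    // Slides marked (live) objects toward index 0 and clears everything after them.
    // Returns the number of live objects, which is also the first free index.
    static int compact(String[] heap, boolean[] marked) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                heap[next++] = heap[i]; // next <= i, so no live object is overwritten
            }
        }
        Arrays.fill(heap, next, heap.length, null);
        return next;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e", "f" };
        boolean[] marked = { true, false, true, false, false, true };
        int firstFree = compact(heap, marked);
        System.out.println(firstFree + " " + Arrays.toString(heap)); // prints "3 [a, c, f, null, null, null]"
    }
}
```

  Unlike the mark-sweep result, all free slots now form one contiguous block, and no second region of memory was needed to achieve it.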


(4) Generational collection algorithm

  Generational collection is not a new idea but a combination of the three algorithms above. As mentioned earlier, to make garbage collection easier, the heap is generally divided into two parts, the young generation and the old generation.

  • Young generation: objects in this region are short-lived, and each collection can reclaim most of the memory, so the copying algorithm is suitable, with the old generation serving as the guarantee space for the algorithm;
  • Old generation: each collection can release only a small part of the space. With the copying algorithm, every collection would have to copy a large number of objects and would need larger Survivor spaces, so copying is not suitable; the old generation generally uses the mark-sweep or mark-compact algorithm;


III. Summary

  The above is a fairly detailed description of the JVM's garbage collection algorithms; I hope this post gives you a deeper understanding of the topic. In the end, though, the content above is only theory. I will follow up with another post on how the JVM actually allocates and releases objects, as the third post in this JVM series.


IV. References

  • "In-depth understanding of the Java Virtual Machine"

Origin www.cnblogs.com/tuyang1129/p/12508216.html