The garbage collection mechanism of the JVM explained in detail

What is garbage collection?

Taken literally, garbage collection means finding garbage objects and throwing them away. In practice it works the other way around: garbage collection tracks down the objects that are still in use and marks everything else as garbage. With that in mind, we describe the automatic garbage collection mechanism of the Java Virtual Machine in detail.

Before going straight to the JVM's garbage collection mechanism, let's start with some basic properties, concepts, and approaches of garbage collection.

Disclaimer: This article mainly discusses garbage collection in Oracle HotSpot and OpenJDK. Other JVMs, such as JRockit or IBM J9, may use different mechanisms for garbage collection.

Manual memory management

Before introducing modern automatic garbage collection, let's go back to the days when you had to allocate memory for data manually and then reclaim it manually as well. In those days, if you forgot to free memory you had allocated, that memory could not be reused: it stayed claimed but was never used again. Such a scenario is called a memory leak.

The following is an example of manual memory management written in C:

#include <stdlib.h>

int send_request() {
    size_t n = read_size();
    int *elements = malloc(n * sizeof(int));

    if(read_elements(n, elements) < n) {
        // early return: elements is never freed, so this path leaks
        return -1;
    }

    // …

    free(elements);
    return 0;
}

As the code shows, it is easy to forget to release memory, and the result is a memory leak. Once a leak occurs, the only way to find its cause is to comb through the code. A better approach is a mechanism that automatically reclaims memory that is no longer used, which reduces the chance of human error. This automatic memory reclamation is called Garbage Collection (GC for short).

Smart pointers

One of the most straightforward ways to automate memory reclamation is to use destructors. For example, we can use a vector in C++: when the vector goes out of scope, its destructor is called automatically and reclaims the memory:

#include <vector>
using std::vector;

int send_request() {
    size_t n = read_size();
    vector<int> elements(n);

    if(read_elements(elements.size(), elements.data()) < n) {
        // no explicit cleanup: the vector's destructor releases the memory
        return -1;
    }

    return 0;
}
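
To make the destructor call visible, here is a minimal sketch that counts live objects. The Buffer type, its live counter, and the fail parameter are hypothetical names introduced only for this illustration; they are not part of the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tracing type: counts how many instances are currently alive.
struct Buffer {
    static int live;                  // number of Buffer objects not yet destroyed
    std::vector<int> data;
    explicit Buffer(std::size_t n) : data(n) { ++live; }
    ~Buffer() { --live; }             // runs automatically when the object leaves scope
    Buffer(const Buffer&) = delete;   // no copies, to keep the counter simple
    Buffer& operator=(const Buffer&) = delete;
};
int Buffer::live = 0;

// Returns -1 on the simulated failure path and 0 otherwise.
// Neither path contains any explicit cleanup code.
int send_request(bool fail) {
    Buffer elements(16);
    assert(Buffer::live == 1);
    if (fail) {
        return -1;   // the destructor still runs on this early return
    }
    return 0;        // and on the normal return
}
```

After either call, Buffer::live is back to 0, which is exactly the guarantee the manual C version could not give on its early return.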

However, in more complex scenarios, especially when an object is shared across multiple threads, destructors alone are not enough. The simplest remedy is reference counting: for each object, we keep track of how many references currently point to it. When that count drops to 0, the memory occupied by the object can be reclaimed. A well-known implementation is C++'s shared pointers:

int send_request() {
    size_t n = read_size();
    auto elements = make_shared<vector<int>>();

    // read elements

    store_in_cache(elements);

    // process elements further

    return 0;
}

To avoid reading the elements again the next time the function is called, we may want to put them into a cache. In that case we cannot let the vector be destroyed when it goes out of scope, because the cache still needs it. Therefore we use shared_ptr instead, which keeps track of the number of references to the vector. The count is incremented each time the pointer is copied and decremented each time a copy is destroyed. When the count reaches 0, the shared_ptr deletes the underlying vector.
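
The counting can be observed directly through use_count(). The following is a small sketch under stated assumptions: the cache is reduced to a single hypothetical global slot (g_cache), and store_in_cache and clear_cache are stand-ins written for this illustration, not the real cache from the example.

```cpp
#include <cassert>
#include <memory>
#include <vector>

using Elements = std::shared_ptr<std::vector<int>>;

// Hypothetical cache: one global slot standing in for a real cache.
Elements g_cache;

void store_in_cache(const Elements& e) { g_cache = e; }  // the copy adds one owner
void clear_cache() { g_cache.reset(); }                  // dropping it removes one owner

// Returns the owner count seen while both the local variable
// and the cache refer to the same vector.
long send_request() {
    auto elements = std::make_shared<std::vector<int>>(3);
    assert(elements.use_count() == 1);   // only the local variable so far
    store_in_cache(elements);
    return elements.use_count();         // local variable + cache
}
```

When send_request returns, the local pointer is destroyed and the count drops to 1, so the vector survives in the cache. It is deleted only once clear_cache releases the last reference.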

Automated memory management

In the C++ code above, we still had to state explicitly where we wanted memory management to be taken care of. But what if every object behaved this way? That would make our work much easier, because as developers we would no longer have to think about cleaning up after ourselves. At runtime, objects that are no longer in use are detected automatically and their memory is freed. In other words, the garbage is collected automatically. The first garbage collector was created for Lisp in 1959, and the technology has kept improving ever since.

Reference counting

Many languages, such as Perl, PHP, and Python, apply the idea behind C++ shared pointers to all objects as their garbage collection mechanism. The following figure illustrates it:

The green cloud indicates that the objects it points to are still being used by the program. These may be local variables in the currently executing method, static variables, or something else. The details differ from language to language, but they are not the point here.

The blue objects are the live objects in memory, and the circled number on each one is its current reference count. The gray objects are those that are no longer referenced from any object that is explicitly in use (that is, from the objects directly referenced by the green cloud). The gray objects are therefore garbage and can be reclaimed by the garbage collector.
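
The bookkeeping behind those counts can be sketched by hand. This is a deliberately simplified illustration, not how any of the languages above implement it; Object, retain, release, g_freed, and demo are hypothetical names chosen for this sketch.

```cpp
#include <cassert>

// Hypothetical object header: every managed object carries its own counter.
struct Object {
    int ref_count = 0;
};

int g_freed = 0;  // how many objects have been reclaimed (for illustration only)

// Called whenever a new reference to obj is created.
void retain(Object* obj) { ++obj->ref_count; }

// Called whenever a reference to obj goes away; frees the object at zero.
void release(Object* obj) {
    if (--obj->ref_count == 0) {
        delete obj;
        ++g_freed;
    }
}

// Creates one object, takes two references to it, then drops both.
// Returns how many objects were reclaimed during the call.
int demo() {
    const int before = g_freed;
    Object* obj = new Object;
    retain(obj);    // first reference, count is 1
    retain(obj);    // second reference, count is 2
    release(obj);   // count back to 1, the object stays alive
    assert(g_freed == before);
    release(obj);   // count reaches 0, the object is deleted
    return g_freed - before;
}
```

The object is reclaimed at the exact moment its last reference disappears, with no separate collection pass.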

This approach looks really good, but it has a fatal flaw: it cannot handle detached cycles. The objects in such a cycle are no longer reachable from the program, yet because they reference each other, their reference counts never drop to zero. As shown below:

The red cycle in the figure is garbage that the application no longer uses, but because its reference counts are not zero, it is never reclaimed, which results in a memory leak.

There are several ways to solve this problem, such as using special "weak" references, or running a separate algorithm that collects cycles. Perl, Python, and PHP, mentioned above, all handle cycles in one of these ways. We will not describe their implementations in detail here; the following chapters look closely at the mechanism of the JVM.
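
The same situation can be reproduced with C++ shared pointers. The sketch below is an illustration only: Node, its live counter, and make_cycle_and_drop are hypothetical names, and the weak_ptr named observer exists purely so the demo can clean up after itself once the leak has been measured.

```cpp
#include <cassert>
#include <memory>

struct Node {
    static int live;               // number of Node objects not yet destroyed
    std::shared_ptr<Node> next;    // strong (counted) reference
    Node() { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

// Builds the loop a -> b -> a, drops all outside references,
// and returns how many nodes are still alive afterwards.
int make_cycle_and_drop() {
    std::weak_ptr<Node> observer;  // does not contribute to any count
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->next = a;               // closes the loop; each node now has count 2
        observer = a;
    }                              // a and b leave scope; each count drops to 1, not 0
    const int stranded = Node::live;

    // Cleanup for the demo only: break the loop by hand so nothing really leaks.
    if (auto a = observer.lock()) {
        a->next.reset();
    }
    return stranded;
}
```

Without the manual cleanup at the end, both nodes would stay allocated for the rest of the program even though nothing can reach them.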
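
As a rough illustration of the weak reference approach, again in C++ rather than in any of the languages named above, one direction of the loop can be made weak so that it does not count. Node, its live counter, and make_loop_and_drop are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <memory>

struct Node {
    static int live;               // number of Node objects not yet destroyed
    std::shared_ptr<Node> next;    // strong reference: keeps the target alive
    std::weak_ptr<Node> prev;      // weak reference: does not add to the count
    Node() { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

// Links two nodes in both directions, drops the outside references,
// and returns how many nodes are still alive afterwards.
int make_loop_and_drop() {
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;               // b now has count 2
        b->prev = a;               // a keeps count 1, the back link is weak
    }                              // counts reach 0, so both nodes are destroyed
    return Node::live;
}
```

Because the back link is weak, no node keeps its own owner alive, and plain reference counting reclaims both of them.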
