The beauty of Java: JVM memory management and garbage collection

Garbage collection comes up in many Java interviews, and any discussion of it inevitably involves the JVM's memory management mechanism. C and C++ programmers have long mocked the execution efficiency of Java, and the criticism has some basis: Java often does trail well-written native code in raw execution speed. On the one hand, Java's thoroughly object-oriented design favors development efficiency over execution efficiency. On the other hand, the Java language makes an attractive promise to the programmer: you do not need to manage memory yourself, because the JVM's garbage collector (GC) reclaims it automatically.

In practice, the promise comes with caveats:

1. The garbage collector does not run on demand whenever the programmer asks for it.

2. The garbage collector does not always reclaim memory promptly, even when the program needs more memory.

3. Programmers cannot directly control garbage collection.

Given these facts, all we can do when writing programs is arrange memory use sensibly according to the rules the garbage collector follows, and that requires a thorough understanding of the JVM's memory management mechanism. This chapter of the series The beauty of Java [evolution from rookie to expert] covers JVM memory management and garbage collection. After working through it, readers should have a basic understanding of the JVM.

 

1. The structure of JVM memory

The Java virtual machine divides the memory it manages into several areas. Each area has its own purpose and responsibilities and, depending on its characteristics, is handled by a different algorithm during garbage collection. Overall there are five parts:

Program Counter Register, JVM Stacks, Native Method Stacks, Heap, Method Area

[Figure: the JVM runtime data areas]

1. Program Counter Register

This is a relatively small piece of memory that programmers cannot manipulate directly. Its role is to record the position of the bytecode instruction that the current thread is executing while the JVM interprets a bytecode file (.class). Despite the name, it is a logical area defined by the JVM specification, not necessarily a CPU register; it is a conceptual model, and individual JVMs implement it in different ways. When the bytecode interpreter works, it selects the next instruction to execute by changing the value of the program counter. Basic functions such as branching, looping, and jumping all rely on this area.

Multithreading is the other reason the counter exists. Java multithreading is achieved by switching between threads in turn, and at any moment a core executes instructions from only one thread. Each thread therefore needs its own counter to record how far it has executed, so that it can resume from the right place when it is switched back in. The program counter is thus thread-private memory. If the thread is executing a Java method, the counter holds the address of the current bytecode instruction; if it is executing a native method, the counter's value is undefined. This is the only memory area for which the Java virtual machine specification does not define any OutOfMemoryError condition.

2. Java virtual machine stack (JVM Stacks)

The JVM stack is the "stack" we mean when we loosely divide memory into heap and stack. Like the program counter, it is private to each thread and has the same life cycle as the thread. Each time a method is invoked, the JVM creates a stack frame that stores the local variable table, the operand stack, dynamic linking information, the method's return address, and so on. Calling a method pushes its frame onto the stack, and returning from the method pops it. The local variable table holds values of the primitive types (boolean, byte, char, and so on) and references (which point to objects in memory). One consequence is that the space a local variable table needs is determined at compile time and does not change while the method runs. Two errors can occur in this area: StackOverflowError, when a thread needs a deeper stack than the JVM allows, and OutOfMemoryError, when the stack could grow but not enough memory is available.
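The StackOverflowError case is easy to provoke. The following is a minimal sketch, not taken from the original article: the class and method names are made up for illustration, and the depth it reports depends on the JVM, the platform, and the -Xss setting.

```java
class StackDepthDemo {
    private static int depth = 0;

    // Every call pushes a new stack frame. With no base case,
    // the thread's stack eventually fills up.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many frames were pushed before the JVM threw StackOverflowError.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames have already been popped by the time control arrives here.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Stack overflowed after " + measureDepth() + " frames");
    }
}
```

Running it with a different thread stack size, for example `java -Xss256k StackDepthDemo.java`, should change the reported depth.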

3. Native Method Stacks

As the name suggests, the native method stack serves the native methods that Java code calls. Object, the root of the Java class hierarchy, declares several native methods, such as hashCode() and wait(). Native methods usually rely on the operating system to do their work, but the JVM still needs some conventions for managing their execution. Implementations are free to handle this area in different ways. In Sun's HotSpot JVM, which most of us use, the native method stack and the JVM stack are one and the same.

4. Heap

Heap memory is the most important memory area and the one most worth studying, because Java performance tuning is mainly about this part of memory. All object instances and arrays are allocated on the heap. As JIT compilation matures and brings techniques such as stack allocation and scalar replacement, that statement is becoming a little too absolute and may need to be qualified one day, but for now it is essentially true. The heap is the main area managed by the garbage collector, so it is covered in detail in the garbage collection section below; only the concept is introduced here. The heap size is controlled by two options: -Xms sets the minimum heap size requested when the JVM starts, and -Xmx sets the maximum heap size the JVM may request. On 32-bit systems the heap is limited to about 2 GB; on 64-bit systems there is no such practical limit.
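To see the effect of -Xms and -Xmx, a program can ask the Runtime class for the current heap figures. This is a small illustrative sketch (the class name is made up); the numbers it prints depend on the JVM and its defaults.

```java
class HeapSizeDemo {
    // Returns { max, total, free } in bytes as reported by the running JVM.
    static long[] heapSnapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.maxMemory(), rt.totalMemory(), rt.freeMemory() };
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] s = heapSnapshot();
        System.out.println("max heap (bounded by -Xmx): " + s[0] / mb + " MB");
        System.out.println("currently reserved heap:    " + s[1] / mb + " MB");
        System.out.println("free within reserved heap:  " + s[2] / mb + " MB");
    }
}
```

Starting it as `java -Xms64m -Xmx256m HeapSizeDemo.java` should show a maximum close to 256 MB.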

5. Method Area

The method area is a memory area shared by all threads. It stores class information, constants, static variables, and similar data for the classes the JVM has loaded. In HotSpot the method area has traditionally been implemented as the permanent generation (the permanent generation, along with the young and old generations, is described in detail in the GC section). The Java virtual machine specification describes the method area as a logical part of the heap, yet it is treated separately from the ordinary heap. Garbage collection in the method area is tricky, and even Sun's HotSpot VM does not handle it perfectly. One important concept belongs here: the runtime constant pool. It mainly holds the literals (simply put, constant values) and symbolic references generated at compile time. Most constants are fixed at compile time, but not all: constants can also enter the pool at run time, for example through the native method intern() of the String class. <About intern(), please see another article: http://blog.csdn.net/zhangerqing/article/details/8093919 >
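The behavior of intern() can be checked with reference comparisons. The sketch below is illustrative only (the class name is made up): it compares a string literal, which is already in the constant pool, with a string object created at run time.

```java
class InternDemo {
    // Returns { built == literal, built.intern() == literal, built.equals(literal) }.
    static boolean[] compare() {
        String literal = "jvm";            // compile-time constant, placed in the pool
        String built = new String("jvm");  // a separate object created at run time
        return new boolean[] {
            built == literal,              // false: two different objects
            built.intern() == literal,     // true: intern() returns the pooled instance
            built.equals(literal)          // true: same characters
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("same object before intern(): " + r[0]);
        System.out.println("same object after intern():  " + r[1]);
        System.out.println("equal contents:              " + r[2]);
    }
}
```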

One memory area that lies outside the JVM-managed areas is worth adding here: direct memory. JDK 1.4 introduced the NIO classes, which provide an I/O model based on channels and buffers. NIO can use native libraries to allocate off-heap memory directly, which is what we call direct memory, and in some scenarios this improves program performance.
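In NIO, direct memory is obtained through ByteBuffer.allocateDirect. A minimal sketch, with a made-up class name, that contrasts a direct buffer with an ordinary heap buffer:

```java
import java.nio.ByteBuffer;

class DirectBufferDemo {
    // Allocates a buffer whose storage lives outside the Java heap.
    static ByteBuffer allocateOffHeap(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer direct = allocateOffHeap(1024);
        ByteBuffer onHeap = ByteBuffer.allocate(1024); // backed by a byte[] on the heap

        direct.putInt(0, 42); // reads and writes use the same API for both kinds
        System.out.println("direct buffer? " + direct.isDirect());
        System.out.println("heap buffer direct? " + onHeap.isDirect());
        System.out.println("value read back: " + direct.getInt(0));
    }
}
```

Because this memory is not part of the heap, it is not limited by -Xmx, which is why heavy use of direct buffers can exhaust memory even when the heap looks healthy.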

2. Garbage collection

There is a good saying: between Java and C++ stands a wall built of memory allocation and garbage collection technology. People outside the wall want to get in, and people inside want to get out. The meaning is left for readers to work out. Broadly speaking, C and C++ programmers sometimes suffer from memory leaks and find memory management a headache, while Java programmers envy C++ programmers for being able to control everything themselves instead of feeling helpless about memory management. And that is how things stand: as Java programmers we can hardly control how the JVM reclaims memory. We can only adapt to its principles and try to improve program performance within them. The following four questions explain Java garbage collection (Garbage Collection, GC):

1. Why garbage collection?

As a program runs, the objects, variables, and other data it creates occupy more and more memory. If garbage is not collected in time, performance inevitably degrades, and the program may even fail with errors caused by insufficient available memory.

2. Which "garbage" needs to be recycled?

Of the five areas introduced above, three need no garbage collection: the program counter, the JVM stack, and the native method stack. Their life cycles are tied to their thread, so the memory they occupy is released automatically when the thread is destroyed. Only the method area and the heap need GC. As for which objects qualify, in brief: an object that is no longer referenced from anywhere can be reclaimed. Put plainly, an object that no longer serves any purpose can be collected as garbage.

3. When does garbage collection take place?

Consider first the classic reference counting algorithm. Each object carries a reference counter that is incremented by 1 whenever a new reference to the object is created and decremented by 1 whenever a reference is lost. Once the counter reaches 0, the object is considered reclaimable. The algorithm has an obvious flaw: when two objects refer to each other but are otherwise unused, they ought to be collected, yet their counters never reach 0, so they never qualify. Because it cannot clean up such cycles, Sun's JVM does not use reference counting for garbage collection. Instead it uses the so-called root search algorithm.

The basic idea is to start from a set of objects called GC Roots and search downward along their references. If no path connects an object to any GC Root, the object is no longer referenced and can be garbage collected. (In practice there is one more step. An object that has become unreachable is not yet completely "dead". If its class overrides finalize() and the system has not yet called it for this object, the system calls finalize() once so that the object can do its final work. If, during that call, the object re-attaches itself to any object that is reachable from the GC Roots, it is "reborn"; if not, it is reclaimed for good.) Take Object5, Object6, and Object7 in the figure: they may still refer to one another, but none of them can be reached from the GC Roots, so all three can be collected. This solves the problem that reference counting cannot.
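The root search can be modeled as a graph traversal. The sketch below is a simplified illustration, not HotSpot's implementation: objects are plain strings, references are map entries, and the sample graph is an assumption that reuses the names Object5, Object6, and Object7 for an unreachable cycle.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachabilityDemo {
    // Breadth-first search from the roots; everything not visited is garbage.
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String target : refs.getOrDefault(current, List.of())) {
                if (seen.add(target)) {
                    queue.add(target);
                }
            }
        }
        return seen;
    }

    // Hypothetical object graph: one root, four live objects, one dead cycle.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("GCRoot", List.of("Object1"));
        refs.put("Object1", List.of("Object2", "Object3"));
        refs.put("Object2", List.of("Object4"));
        refs.put("Object5", List.of("Object6"));
        refs.put("Object6", List.of("Object7"));
        refs.put("Object7", List.of("Object5")); // cycle with no path from the root
        return refs;
    }

    public static void main(String[] args) {
        Set<String> live = reachable(sampleGraph(), List.of("GCRoot"));
        System.out.println("Object4 live? " + live.contains("Object4"));
        System.out.println("Object5 live? " + live.contains("Object5"));
    }
}
```

A reference counter would see a count of 1 on each object in the cycle and keep all three; the traversal never visits them, so they are collectable.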

A supplementary note on references: since JDK 1.2 the notion of a reference has been extended into four kinds: strong, soft, weak, and phantom. Objects held through each kind are treated differently during GC:

    a> Strong Reference. This is the ordinary reference you get when you create an object with new. An object is never reclaimed as long as a strong reference to it is still reachable.

    b> Soft Reference. An object held only through soft references may be reclaimed. While JVM memory is not tight, the object is left alone; when memory runs short, it is reclaimed. This raises a question: if a softly referenced object may be reclaimed, why not reclaim it right away? The answer lies in caching. Cached objects are often dispensable at the moment, yet if they stay in memory and are needed again, nothing has to be reallocated. Holding such objects through soft references is convenient and improves program performance.

    c> Weak Reference. An object held only through weak references is always reclaimed: whenever a GC runs, it is cleaned up regardless of whether memory is tight.

    d> Phantom Reference. This is the weakest kind and has no effect on an object's lifetime: the JVM ignores it when deciding whether to reclaim the object. Its only purpose is tracking, that is, letting the program learn that the object has been collected, which complements the finalize() mechanism.
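The reference kinds correspond to classes in java.lang.ref. The sketch below (class and method names are made up) shows the two properties that are deterministic: a weak reference still returns its referent while a strong reference exists, and PhantomReference.get() always returns null. What happens after System.gc() is only a request to the JVM, so main prints the result without promising a particular value.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsDemo {
    // While a strong reference exists, the weak reference still sees the object.
    static boolean weakSeesReferentWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        return weak.get() == strong;
    }

    // A phantom reference never hands out its referent; it is for tracking only.
    static boolean phantomGetIsNull() {
        Object strong = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);
        return phantom.get() == null;
    }

    public static void main(String[] args) {
        Object strong = new Object();
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);

        strong = null; // only the soft and weak references remain
        System.gc();   // a request; the JVM may or may not collect now

        System.out.println("soft after GC request: " + soft.get());
        System.out.println("weak after GC request: " + weak.get());
    }
}
```

On a typical run with plenty of free memory, the soft reference tends to survive the request while the weak one is cleared, but neither outcome is guaranteed.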

Finally, which classes can be reclaimed? Only useless classes. A class counts as useless when all of the following hold:

   1> All instance objects of this class have been recycled.

   2> The ClassLoader that loaded the class has been recycled.

   3> The reflection class java.lang.Class object corresponding to this class is not referenced anywhere.

4. How to perform garbage collection?

This part mainly introduces garbage collection algorithms. In HotSpot, the memory subject to GC is divided into three parts: the new generation, the old generation, and the permanent generation. The three have different characteristics and therefore use different GC algorithms. The new generation suits short-lived objects that are created and destroyed frequently, while the old generation suits objects with relatively long life cycles. The permanent generation is how Sun HotSpot implements the method area (some JVMs have no permanent generation at all). First, the concepts and characteristics of the three generations:

New generation: New Generation or Young Generation. It is divided into an Eden area and a Survivor area, and the Survivor area is further split into two parts of equal size, FromSpace and ToSpace. Newly created objects are allocated in the new generation. When Eden runs out of space, the surviving objects are moved to a Survivor space. The size of the new generation is set with -Xmn, and -XX:SurvivorRatio controls the ratio of Eden to Survivor.

Old generation: Old Generation. It holds objects that are still alive after multiple garbage collections in the young generation, such as cache objects. Its size is the -Xmx value minus the -Xmn value.

Permanent generation: Permanent Generation. In Sun's JVM it implements the method area, although most other JVMs have no such generation. It mainly stores constants and class information. By default its minimum size is 16 MB and its maximum is 64 MB; both can be changed with -XX:PermSize and -XX:MaxPermSize. (HotSpot removed the permanent generation in JDK 8 and replaced it with Metaspace, so these two options apply only to older releases.)
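The generations of a running JVM can be inspected through the standard java.lang.management API. This is a small sketch with a made-up class name; the pool names it prints depend on the JVM version and the collector in use, so treat the output as informational.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolsDemo {
    // Lists each memory pool as "TYPE: name", for example a heap pool for Eden.
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getType() + ": " + pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : poolNames()) {
            System.out.println(name);
        }
    }
}
```

Running it with different -Xmn or -XX:SurvivorRatio values and printing pool.getUsage() as well is one way to watch how the options described above change the layout.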

Common GC algorithms:

Mark-sweep algorithm (Mark-Sweep)

This is the most basic GC algorithm. It works in two phases, as the name says: first it marks the objects that need to be reclaimed, then it sweeps the heap and reclaims every marked object. The algorithm is not very efficient, and it leaves memory fragmented after the sweep: a large object that needs contiguous space may not fit until the heap has been defragmented. The algorithms below improve on it.

Copying algorithm (Copying)

As mentioned earlier, new generation memory is divided into three parts: the Eden area and two Survivor areas. Sun's JVM generally sizes Eden and each Survivor area at a ratio of 8:1 and keeps one Survivor area empty. During garbage collection, the objects that are still alive are copied into the empty Survivor area, and then Eden and the Survivor area that was in use are cleared completely. This raises a question: what if the empty Survivor area is not large enough to hold the survivors? In that case the collector borrows memory from the old generation, and the objects that do not fit are moved there. This algorithm is used for the young generation.
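The copying step can be modeled in a few lines. The following is a toy simulation under stated assumptions, not how HotSpot is implemented: objects are strings, liveness is supplied by the caller, and the Survivor capacity is counted in objects rather than bytes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class CopyingDemo {
    final List<String> eden = new ArrayList<>();
    List<String> from = new ArrayList<>();
    List<String> to = new ArrayList<>();
    final List<String> old = new ArrayList<>();
    private final int survivorCapacity;

    CopyingDemo(int survivorCapacity) {
        this.survivorCapacity = survivorCapacity;
    }

    // Copies live objects out of Eden and FromSpace, then clears both.
    void minorGc(Set<String> live) {
        List<String> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (String obj : candidates) {
            if (!live.contains(obj)) {
                continue;                 // dead objects are simply not copied
            }
            if (to.size() < survivorCapacity) {
                to.add(obj);              // normal case: copy into the empty Survivor space
            } else {
                old.add(obj);             // Survivor space full: move to the old generation
            }
        }
        eden.clear();
        from.clear();
        List<String> tmp = from;          // the two Survivor spaces swap roles
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        CopyingDemo heap = new CopyingDemo(2);
        heap.eden.addAll(List.of("a", "b", "c", "d"));
        heap.minorGc(Set.of("a", "b", "d"));
        System.out.println("from=" + heap.from + " old=" + heap.old + " eden=" + heap.eden);
    }
}
```

The cost of a collection is proportional to the number of survivors, not to the amount of garbage, which is why copying fits a generation where most objects die young.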

Mark-compact (also called mark-compress) algorithm (Mark-Compact)

The marking phase is the same as in mark-sweep. The difference comes afterwards: instead of sweeping in place, the collector moves all surviving objects together so that they occupy contiguous memory, and then reclaims everything beyond the boundary. This algorithm is used for the old generation and, in HotSpot, the permanent generation.
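The difference between sweeping and compacting shows up clearly on a toy heap. In this sketch, which is an illustration rather than a real collector, the heap is an array of slots, null means a free slot, and the mark bits are supplied by the caller. For clarity compact() writes into a fresh array, whereas a real collector slides objects within the same memory.

```java
import java.util.Arrays;

class SweepVsCompactDemo {
    // Mark-sweep: free the unmarked slots where they are, leaving holes.
    static String[] sweep(String[] heap, boolean[] marked) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (!marked[i]) {
                result[i] = null;
            }
        }
        return result;
    }

    // Mark-compact: slide the marked objects to the front, free space follows.
    static String[] compact(String[] heap, boolean[] marked) {
        String[] result = new String[heap.length];
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                result[next++] = heap[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] heap = { "A", "B", "C", "D", "E", "F" };
        boolean[] marked = { true, false, true, false, false, true }; // A, C, F survive

        System.out.println("after sweep:   " + Arrays.toString(sweep(heap, marked)));
        System.out.println("after compact: " + Arrays.toString(compact(heap, marked)));
    }
}
```

After the sweep, the three free slots are split into a hole of one and a hole of two, so an object needing three contiguous slots would not fit; after compaction the same three free slots are contiguous.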

Common garbage collectors: 

Each JVM implements the algorithms above in its own way. Let's look at some common garbage collectors:

We first introduce three widely used garbage collectors: Serial GC (SerialGC), Parallel GC (ParNew), and Parallel Scavenge (Parallel Scavenge).

1. Serial GC. This is the most basic and oldest collector, yet it is still widely used. It collects with a single thread and, more importantly, it must suspend all application threads while it collects (Stop The World). For some applications that is unacceptable, but look at it this way: as long as the pause can be kept within N milliseconds, most applications can live with it, and in practice the collector does not disappoint. A pause of a few tens of milliseconds is entirely acceptable for a client application. The collector suits applications that run on a single CPU, use a small new generation, and do not have strict pause time requirements. It is the default GC in client mode and can be selected explicitly with -XX:+UseSerialGC.

2. ParNew GC. It is basically the same as Serial GC; the essential difference is that it collects with multiple threads to improve efficiency. That makes it usable on the server side (Server), and because it can work together with CMS GC, it is a reasonable choice for servers.

3. Parallel Scavenge GC. The entire scan-and-copy process runs on multiple threads, which suits applications that run on multiple CPUs and need short pause times. The number of GC threads can be set with -XX:ParallelGCThreads, for example -XX:ParallelGCThreads=4 for four threads.

[Table: collector combinations]


4. CMS (Concurrent Mark Sweep) collector. This collector aims for the shortest possible pause time, addressing the pause problem of Serial GC. Typical B/S (browser/server) applications, with their high concurrency and responsiveness requirements, are well suited to it. CMS is based on the mark-sweep algorithm, and a collection cycle has roughly four steps:

Initial mark (CMS initial mark), concurrent mark (CMS concurrent mark), remark (CMS remark), concurrent sweep (CMS concurrent sweep).

Of these, the initial mark and remark steps still have to stop the user threads. The initial mark only marks the objects directly reachable from the GC Roots, so it is very fast. The concurrent mark step runs the root search from the GC Roots and determines which objects are alive. The remark step corrects the marks of objects whose state changed because the user program kept running during concurrent marking; its pause is somewhat longer than that of the initial mark but much shorter than the concurrent mark step. Because the collector thread works alongside the user threads during the two longest steps, concurrent mark and concurrent sweep, the CMS collection as a whole runs concurrently with the user threads.

Advantages of the CMS collector: concurrent collection and low pauses. But CMS is far from perfect.

The CMS collector has three major disadvantages:

a> The CMS collector is very sensitive to CPU resources. During the concurrent phases it does not pause the user threads, but it does take up CPU resources, which slows the application down and reduces total throughput. By default CMS starts (number of CPUs + 3) / 4 collection threads.

b> The CMS collector cannot deal with floating garbage, and a "Concurrent Mode Failure" may occur, leading to another Full GC. Because the user threads keep running during CMS's concurrent sweep, the program keeps producing new garbage. That garbage appears after marking has finished, so CMS cannot handle it in the current collection and has to leave it for the next GC. This is what "floating garbage" means. For the same reason, namely that user threads must keep running during collection, enough memory has to be kept in reserve for them. CMS therefore cannot wait until the old generation is almost full, as other collectors do; it must start early and leave headroom for the program during the concurrent collection. By default, CMS is triggered when 68% of the old generation is in use. The parameter -XX:CMSInitiatingOccupancyFraction can raise this percentage, which reduces the number of collections and improves performance. If the memory reserved while CMS is running cannot meet the program's needs, a "Concurrent Mode Failure" occurs, and the virtual machine falls back to its backup plan: it temporarily switches to the Serial Old collector and redoes the old generation collection, which means a long pause. So setting -XX:CMSInitiatingOccupancyFraction too high makes "Concurrent Mode Failure" more likely and hurts performance instead.

c> Finally, CMS is based on the mark-sweep algorithm, which leaves a large amount of fragmentation after a collection. Too much fragmentation makes object allocation troublesome: for a large object there may be no contiguous block big enough, and a Full GC has to be triggered early. To address this, CMS provides the switch -XX:+UseCMSCompactAtFullCollection, which adds a defragmentation pass after a Full GC. The parameter -XX:CMSFullGCsBeforeCompaction sets how many Full GCs run without compaction before one with compaction follows.

5. G1 collector. It improves on the CMS collector in several ways. First, it is based on the mark-compact algorithm, so it does not cause memory fragmentation. Second, it controls pauses more precisely. The details are not covered here.

6. Serial Old. Serial Old is the old generation version of the Serial collector. It also collects with a single thread and uses the mark-compact algorithm. It is used mainly by virtual machines in Client mode.

7. Parallel Old. Parallel Old is the old generation version of the Parallel Scavenge collector. It uses multiple threads and the mark-compact algorithm.

8. The RTSJ garbage collector, used for Java real-time programming, will be introduced later.
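To check which of these collectors a given JVM is actually running with, the standard management API exposes one bean per active collector. A minimal sketch with a made-up class name; the bean names differ between JVM versions and collector choices.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectorsDemo {
    // Returns the names of the collectors the running JVM has enabled.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Starting the program with `-XX:+UseSerialGC` and again without it is a quick way to confirm that the option took effect.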

3. Java program performance optimization

gc() call

Calling the gc method suggests that the Java virtual machine make an effort to reclaim unused objects so that the memory they occupy can quickly be reused. It is only a request, and no collection is guaranteed. When control returns from the call, the virtual machine has made its best effort to reclaim space from all discarded objects. Calling System.gc() is equivalent to calling Runtime.getRuntime().gc().

finalize() calls and overriding

GC can only reclaim memory allocated on the Java heap (in pure Java, every object is created with new on the heap). It cannot reclaim memory allocated outside the Java heap, which can happen with JNI: for example, Java calls a C program, and the C program allocates memory with malloc(). The GC knows nothing about such memory, so releasing it has to rely on finalize(). When Java calls a non-Java method (C or C++, say) and that code calls C's malloc(), the memory is not released until free() is called, because free() is a C function. The GC cannot do this work, so you need to call free() through a native method from inside finalize().
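Since finalize() runs at an unpredictable time, and has been deprecated since Java 9, a common complement is to give the wrapper an explicit close() and use try-with-resources. The sketch below swaps in that technique; it is not the approach the article describes. NativeBlock is a hypothetical class, and the comments mark where real JNI calls to malloc() and free() would go.

```java
class NativeBlock implements AutoCloseable {
    private boolean released = false;

    NativeBlock(int bytes) {
        // Hypothetical: a real implementation would call a native method
        // here that allocates `bytes` with malloc() and keeps the address.
    }

    boolean isReleased() {
        return released;
    }

    @Override
    public void close() {
        if (!released) {
            released = true;
            // Hypothetical: a real implementation would call a native
            // method here that passes the stored address to free().
        }
    }
}

class NativeBlockDemo {
    // Uses the block and reports whether it was released on leaving the try.
    static boolean useAndRelease() {
        NativeBlock block = new NativeBlock(1024);
        try (NativeBlock b = block) {
            // ... work with the native memory through b ...
        }
        return block.isReleased();
    }

    public static void main(String[] args) {
        System.out.println("released after try block: " + useAndRelease());
    }
}
```

The release happens at a known point, when the try block exits, instead of whenever the collector gets around to finalizing the object.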

Good programming habits

(1) Avoid creating objects inside a loop body, even if each object takes up little memory.
(2) Make objects eligible for garbage collection as early as possible.
(3) Do not use an overly deep inheritance hierarchy.
(4) Prefer local variables: accessing them is faster than accessing the fields of a class.
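Habit (1) is easiest to see with string building. In this illustrative sketch (the names are made up), both methods return the same text, but the first creates a new String on every iteration while the second reuses one StringBuilder.

```java
class LoopAllocationDemo {
    // Each iteration allocates a new String, and the old one becomes garbage.
    static String joinAllocating(String[] parts) {
        String result = "";
        for (String part : parts) {
            result = result + part + ",";
        }
        return result;
    }

    // One builder is created outside the loop and reused on every iteration.
    static String joinReusing(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = { "a", "b", "c" };
        System.out.println(joinAllocating(parts));
        System.out.println(joinReusing(parts));
    }
}
```

For three elements the difference is negligible; it matters when the loop runs thousands of times and every discarded intermediate object adds work for the young generation collector.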

This section will be constantly updated!

4. Frequently Asked Questions

1. Memory overflow

The memory your program asks the Java virtual machine to allocate exceeds what the system can provide. The demand cannot be met, so an overflow occurs.

2. Memory leak

You ask the system for memory (new) but do not give it back after use (delete, in C++ terms). The memory can no longer be reached or reused by your program, yet it remains allocated. As this unusable memory accumulates, less and less is left for the system to hand out to the programs that need it. That is a leak. In Java, the typical cause is objects that are no longer needed but remain reachable, for example through a long-lived collection. If the leak continues, the program gradually runs out of memory and ends in a memory overflow.
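A typical Java leak looks like the sketch below. The class, the field, and the method are all made up for illustration: each call adds a buffer to a static list and never removes it, so every buffer stays reachable from a GC root and can never be collected.

```java
import java.util.ArrayList;
import java.util.List;

class LeakDemo {
    // A static collection lives as long as the class does, and so does its content.
    static final List<byte[]> CACHE = new ArrayList<>();

    // Simulates handling one request: the buffer is only needed during the call,
    // but the reference stored in CACHE keeps it alive afterwards.
    static void handleRequest() {
        byte[] buffer = new byte[1024];
        CACHE.add(buffer); // never removed: this is the leak
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleRequest();
        }
        System.out.println("buffers still reachable: " + CACHE.size());
    }
}
```

Run long enough with a small -Xmx, a program like this ends with the OutOfMemoryError described under memory overflow. The fix is to remove entries when they are no longer needed, or to hold them through soft or weak references.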


Reprinted in ( http://blog.csdn.net/zhangerqing ), thank you!

 
