JVM study notes - in-depth understanding of the whole process of JVM object allocation, layout and access in the Java heap

Note: The reference book is "In-depth Understanding of Java Virtual Machine: JVM Advanced Features and Best Practices, 2nd Edition" (Zhou Zhiming).

The previous note, "JVM Study Notes - Java Memory Area", introduced the runtime data areas of the Java virtual machine. Knowing what is stored in memory, we now need to look at further details of that data: how objects are created, how they are laid out, and how they are accessed. The discussion is limited to the widely used HotSpot virtual machine and takes the most commonly used memory area, the Java heap, as the example, walking through the whole process of object allocation, layout and access in HotSpot's Java heap.

1.1 Object Creation

The objects discussed in this article are limited to ordinary Java objects, excluding arrays and Class objects.
When the virtual machine encounters a new instruction, it first checks whether the operand of the instruction can locate a symbolic reference to a class in the constant pool, and whether the class represented by that symbolic reference has been loaded, resolved and initialized. If not, the corresponding class loading process must be performed first.
After the class check passes, the virtual machine allocates memory for the new object.
The size of the memory required by the object is fully determined once the class has been loaded (how it is determined is discussed in 1.2 below), so allocating space for the object amounts to carving a block of known size out of the Java heap.
There are two ways to allocate memory in the Java heap, depending on whether the heap is regular:

  1. Bump the Pointer: If the memory in the Java heap is completely regular, with all used memory on one side, all free memory on the other, and a pointer in between marking the boundary, then allocating memory simply means moving that pointer toward the free space by a distance equal to the size of the object.
  2. Free List: If the memory in the Java heap is not regular, and used and free memory are interleaved, a simple pointer bump is impossible. The virtual machine must maintain a list recording which memory blocks are available. At allocation time it finds a block in the list that is large enough for the object instance and updates the records in the list.
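
The two strategies above can be sketched in a few lines of Java. This is only a toy model, not HotSpot code: "addresses" are plain long offsets, the class and method names are made up for illustration, and the free list uses a first-fit search, which is an assumption (the text does not say which search policy a collector uses).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: offsets stand in for addresses; all names are hypothetical.
class AllocationSketch {

    /** Bump-the-pointer: used memory on one side, free memory on the other. */
    static final class BumpAllocator {
        private long top;          // boundary between used and free memory
        private final long end;    // end of the region

        BumpAllocator(long start, long end) { this.top = start; this.end = end; }

        /** Returns the start offset of the new object, or -1 if the region is full. */
        long allocate(long size) {
            if (end - top < size) return -1;
            long result = top;
            top += size;           // allocation is just moving the pointer
            return result;
        }
    }

    /** Free list: free blocks are scattered, so they are tracked explicitly. */
    static final class FreeListAllocator {
        private static final class Block {
            long start, size;
            Block(long start, long size) { this.start = start; this.size = size; }
        }
        private final List<Block> free = new ArrayList<>();

        void addFreeBlock(long start, long size) { free.add(new Block(start, size)); }

        /** First-fit search (an assumed policy); returns -1 if no block is large enough. */
        long allocate(long size) {
            for (int i = 0; i < free.size(); i++) {
                Block b = free.get(i);
                if (b.size >= size) {
                    long result = b.start;
                    b.start += size;                  // carve the object out of the block
                    b.size -= size;
                    if (b.size == 0) free.remove(i);  // update the record on the list
                    return result;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        BumpAllocator bump = new BumpAllocator(0, 64);
        System.out.println(bump.allocate(16));  // 0
        System.out.println(bump.allocate(24));  // 16

        FreeListAllocator list = new FreeListAllocator();
        list.addFreeBlock(100, 8);
        list.addFreeBlock(200, 32);
        System.out.println(list.allocate(16));  // 200 (the first block is too small)
        System.out.println(list.allocate(8));   // 100
    }
}
```

The bump allocator does constant work per allocation, while the free list has to search, which is one reason a compacted heap makes allocation cheaper.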

Which allocation method is chosen depends on whether the Java heap is regular, and that in turn depends on whether the garbage collector in use performs compaction. Therefore, with collectors that include a compaction phase, such as Serial and ParNew, the system uses bump-the-pointer allocation, while with collectors based on the Mark-Sweep algorithm, such as CMS, a free list is usually used.
Besides how to divide up the available space, there is another problem to consider. Object creation is extremely frequent in the virtual machine, and even just modifying the location a pointer points to is not thread-safe under concurrency: while memory is being allocated for object A and the pointer has not yet been updated, object B may use the original pointer to allocate memory at the same time.
There are two solutions to this problem:

  1. Synchronize the action of allocating memory space. In practice, the virtual machine uses CAS (Compare-And-Swap) combined with retry on failure to guarantee the atomicity of the update.
  2. Partition memory allocation into separate spaces by thread: each thread is pre-allocated a small piece of memory in the Java heap, called a Thread Local Allocation Buffer (TLAB). A thread that needs memory allocates it from its own TLAB, and synchronization is required only when the TLAB is used up and a new TLAB has to be allocated.
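
Both solutions can be illustrated with a small Java model. It is a sketch under stated assumptions rather than the virtual machine's real implementation: SharedRegion and ThreadLocalBuffer are hypothetical names, offsets stand in for addresses, and the buffer refill here reuses the CAS loop instead of a lock.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model only: offsets stand in for addresses; this is not HotSpot code.
class ConcurrentAllocationSketch {

    /** Solution 1: CAS on the shared allocation pointer, retrying on failure. */
    static final class SharedRegion {
        private final AtomicLong top = new AtomicLong(0);
        private final long end;

        SharedRegion(long end) { this.end = end; }

        long allocate(long size) {
            while (true) {
                long oldTop = top.get();
                long newTop = oldTop + size;
                if (newTop > end) return -1;               // region exhausted
                if (top.compareAndSet(oldTop, newTop)) {   // a lost race simply retries
                    return oldTop;
                }
            }
        }

        long used() { return top.get(); }
    }

    /** Solution 2: a per-thread buffer, refilled from the shared region when used up. */
    static final class ThreadLocalBuffer {
        private final SharedRegion shared;
        private final long bufferSize;
        private long top = 0, end = 0;   // empty until the first refill

        ThreadLocalBuffer(SharedRegion shared, long bufferSize) {
            this.shared = shared;
            this.bufferSize = bufferSize;
        }

        long allocate(long size) {
            if (size > bufferSize) return shared.allocate(size);  // too big for a buffer
            if (end - top < size) {                        // buffer used up
                long start = shared.allocate(bufferSize);  // the only synchronized step
                if (start < 0) return -1;
                top = start;                               // any leftover space is discarded
                end = start + bufferSize;
            }
            long result = top;
            top += size;                                   // thread-private bump, no CAS
            return result;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedRegion region = new SharedRegion(1 << 20);
        Runnable work = () -> {
            ThreadLocalBuffer tlab = new ThreadLocalBuffer(region, 1024);
            for (int i = 0; i < 1000; i++) tlab.allocate(16);
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        // Each thread needs 16 buffers of 1024 bytes to hold 1000 objects of 16 bytes.
        System.out.println(region.used());  // 32768
    }
}
```

In this model each thread touches the shared pointer only 16 times for 1000 allocations, which is the point of the per-thread buffer.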

After the memory allocation is completed, the virtual machine initializes the allocated memory space to zero values (excluding the object header). If a TLAB is used, this step can instead be performed earlier, when the TLAB is allocated. This step guarantees that the instance fields of the object can be used in Java code without being assigned initial values: the program reads the zero value corresponding to each field's data type.
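
The effect of this zeroing is visible from ordinary Java code. A minimal example, with a made-up class name and field names, showing the zero value read from fields that were never assigned:

```java
// Instance fields with no initializer: each reads as the zero value of its type.
class ZeroValueDemo {
    int count;
    long total;
    double ratio;
    boolean ready;
    char letter;
    String name;

    public static void main(String[] args) {
        ZeroValueDemo d = new ZeroValueDemo();
        System.out.println(d.count);         // 0
        System.out.println(d.total);         // 0
        System.out.println(d.ratio);         // 0.0
        System.out.println(d.ready);         // false
        System.out.println((int) d.letter);  // 0, i.e. '\u0000'
        System.out.println(d.name);          // null
    }
}
```
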
Next, the virtual machine makes the necessary settings for the object, such as which class the object is an instance of, how to find the class's metadata, the object's hash code, and the object's GC generational age. This information is stored in the object's Object Header (described in 1.2 below). Depending on the current running state of the virtual machine, such as whether biased locking is enabled, the object header is set in different ways.
Once the above work is done, a new object has been created from the virtual machine's point of view. From the Java program's point of view, however, object creation has only just begun: the <init> method (the instance initialization method, invoked when an object of a class is instantiated) has not yet been executed, and all fields are still zero. In general, therefore, the new instruction is followed by execution of the <init> method, which initializes the object as the programmer intends, and only then is a truly usable object fully constructed.
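
The gap between "memory allocated and zeroed" and "<init> finished" can be observed from Java. In the sketch below (class names are made up), a superclass constructor calls an overridden method purely to peek at the half-initialized object. This is for illustration only and not a recommended pattern.

```java
// The subclass fields are assigned inside Derived's <init>, which runs its
// field initializers only after the superclass constructor has returned.
class InitOrderDemo {
    static class Base {
        final String seenDuringBaseInit;

        Base() {
            seenDuringBaseInit = describe();  // runs before Derived's field initializers
        }

        String describe() { return "base"; }
    }

    static class Derived extends Base {
        int size = 42;           // still 0 while Base() is running
        String label = "ready";  // still null while Base() is running

        @Override
        String describe() { return size + "/" + label; }
    }

    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println(d.seenDuringBaseInit);  // 0/null
        System.out.println(d.describe());          // 42/ready
    }
}
```
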

1.2 Memory layout of objects

In the HotSpot virtual machine, the layout of objects stored in memory can be divided into three areas: object header (Header), instance data (Instance Data), and alignment padding (Padding).
The object header of the HotSpot virtual machine contains two parts. The first part stores the runtime data of the object itself, such as the hash code, GC generational age, lock status flag, locks held by threads, biased thread ID, and biased timestamp. This part is 32 bits long on a 32-bit virtual machine and 64 bits long on a 64-bit virtual machine (with compressed pointers disabled), and is officially called the "Mark Word". An object needs to store a lot of runtime data, more than a 32-bit or 64-bit bitmap structure can record. However, the object header is an extra storage cost unrelated to the data the object itself defines, so for the sake of the virtual machine's space efficiency the Mark Word is designed as a non-fixed data structure that packs as much information as possible into a very small space, reusing its own storage according to the state of the object.
For example, in a 32-bit HotSpot virtual machine, if the object is in the unlocked state, 25 bits of the Mark Word's 32 bits store the object hash code, 4 bits store the object's generational age, 2 bits store the lock flag, and 1 bit is fixed at 0. The storage content in the object's other states is shown in the figure below:
[Figure: Object header Mark Word]
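
To make the unlocked-state layout concrete, here is a small Java sketch that packs and unpacks a 32-bit value with those field widths. The bit order (hash in the high bits, lock flag in the low bits), the helper names, and the sample flag value 01 are assumptions for illustration. The code does not read a real object header.

```java
// Assumed bit positions, high to low: hash:25 | age:4 | fixed 0:1 | lock flag:2
class MarkWordSketch {
    static final int LOCK_BITS = 2, FIXED_BITS = 1, AGE_BITS = 4, HASH_BITS = 25;
    static final int AGE_SHIFT = LOCK_BITS + FIXED_BITS;  // 3
    static final int HASH_SHIFT = AGE_SHIFT + AGE_BITS;   // 7

    /** Builds a 32-bit word from the three fields; the fixed bit is left at 0. */
    static int pack(int hash, int age, int lockFlag) {
        return ((hash & ((1 << HASH_BITS) - 1)) << HASH_SHIFT)
             | ((age & ((1 << AGE_BITS) - 1)) << AGE_SHIFT)
             | (lockFlag & ((1 << LOCK_BITS) - 1));
    }

    static int hash(int mark)     { return mark >>> HASH_SHIFT; }
    static int age(int mark)      { return (mark >>> AGE_SHIFT) & ((1 << AGE_BITS) - 1); }
    static int lockFlag(int mark) { return mark & ((1 << LOCK_BITS) - 1); }

    public static void main(String[] args) {
        int mark = pack(0x1ABCDEF, 3, 0b01);  // sample values only
        System.out.println(Integer.toHexString(hash(mark)));  // 1abcdef
        System.out.println(age(mark));                         // 3
        System.out.println(lockFlag(mark));                    // 1
    }
}
```

With only 4 bits available, the generational age in this layout cannot exceed 15.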
The other part of the object header is the type pointer, that is, the object's pointer to its class metadata. The virtual machine uses this pointer to determine which class the object is an instance of. If the object is a Java array, the object header must also hold a piece of data recording the array length, because the virtual machine can determine the size of an ordinary Java object from its metadata, but cannot determine the size of an array from the array's metadata.
Next comes the instance data, which is the information the object actually stores: the contents of the fields of the various types defined in the program code. Fields are recorded whether they are inherited from a parent class or defined in the subclass. The storage order of this part is affected by the virtual machine's allocation strategy parameter (FieldsAllocationStyle) and by the order in which the fields are defined in the Java source code.
The third part, alignment padding, does not necessarily exist and has no special meaning. It merely acts as a placeholder. The automatic memory management system of the HotSpot virtual machine requires that the starting address of an object be an integer multiple of 8 bytes, in other words, that the size of an object be an integer multiple of 8 bytes. The object header is exactly a multiple of 8 bytes, so when the instance data part is not aligned, alignment padding is needed to fill it out.
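
The padding arithmetic is simple enough to write down. The following sketch rounds an object size up to a multiple of 8 bytes. The header and field sizes in main are hypothetical inputs chosen for the example, not values measured from a running virtual machine.

```java
class AlignmentSketch {
    static final int ALIGNMENT = 8;  // must be a power of two for the mask trick below

    /** Rounds size up to the next multiple of ALIGNMENT. */
    static int alignUp(int size) {
        return (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1);
    }

    /** Number of padding bytes needed after the header and the instance data. */
    static int padding(int headerBytes, int instanceDataBytes) {
        int raw = headerBytes + instanceDataBytes;
        return alignUp(raw) - raw;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: an 8-byte header plus one 4-byte int field.
        System.out.println(alignUp(8 + 4));  // 16
        System.out.println(padding(8, 4));   // 4
        // An 8-byte header plus two 4-byte int fields is already aligned.
        System.out.println(padding(8, 8));   // 0
    }
}
```
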

1.3 Access positioning of objects

Objects are created in order to be used. A Java program operates on a specific object in the heap through the reference data on the stack. The Java Virtual Machine Specification defines the reference type only as a reference to an object. It does not define how that reference should locate and access the object's position in the heap, so the object access method also depends on the virtual machine implementation. The mainstream access methods today are handles and direct pointers.

  1. If handle access is used, a block of memory in the Java heap is set aside as a handle pool. The reference stores the address of the object's handle, and the handle contains the addresses of the object's instance data and of its type data.
    [Figure: accessing an object through a handle]
  2. If direct pointer access is used, the layout of the Java heap object must account for how to place the information needed to access the type data, and the reference stores the object address directly.
    [Figure: accessing an object through a direct pointer]

Each of the two access methods has its own advantages. The biggest advantage of handle access is that the reference stores a stable handle address: when an object is moved (a very common occurrence during garbage collection), only the instance data pointer in the handle changes, and the reference itself does not need to be modified.
The biggest advantage of direct pointer access is speed: it saves the cost of one pointer indirection. Because object access is extremely frequent in Java, this overhead adds up to a considerable execution cost.
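
The trade-off between the two access methods can be modelled with two arrays. This is a deliberately simplified sketch: the array index plays the role of an address, a String stands in for the instance data, the type data pointer is left out, and all names are hypothetical.

```java
// Toy model: heap index = "address"; the handle pool maps handle -> address.
class AccessSketch {
    static final String[] heap = new String[16];
    static final int[] handlePool = new int[4];

    /** Two hops: reference -> handle -> instance data. */
    static String readViaHandle(int handle) { return heap[handlePool[handle]]; }

    /** One hop: reference -> instance data. */
    static String readDirect(int address) { return heap[address]; }

    /** Simulates the collector moving an object; only the handle is patched. */
    static void moveWithHandle(int handle, int newAddress) {
        int old = handlePool[handle];
        heap[newAddress] = heap[old];
        heap[old] = null;
        handlePool[handle] = newAddress;
    }

    public static void main(String[] args) {
        heap[3] = "point(1,2)";
        handlePool[0] = 3;

        int handleRef = 0;  // reference under handle access: stores the handle
        int directRef = 3;  // reference under direct pointer access: stores the address

        System.out.println(readViaHandle(handleRef));  // point(1,2)
        System.out.println(readDirect(directRef));     // point(1,2)

        moveWithHandle(0, 9);                          // the "collector" moves the object
        System.out.println(readViaHandle(handleRef));  // point(1,2), reference unchanged
        System.out.println(readDirect(directRef));     // null, the stale address is now empty
        directRef = 9;                                 // a direct reference must itself be updated
        System.out.println(readDirect(directRef));     // point(1,2)
    }
}
```

The extra array lookup in readViaHandle is the cost of the stable reference, and the need to rewrite directRef after the move is the cost of the faster access path.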

The above is an in-depth analysis of the whole process of JVM object allocation, layout and access in the Java heap.

