Chapter 10 Java Object Layout

Object memory layout

Objects are generally stored in heap memory. An object instance can be divided into three parts: the object header, the instance data, and alignment padding.

  • Object header: holds two kinds of information, the object mark (Mark Word) and the class metadata pointer.
    1. The Mark Word stores the object's own runtime data, such as its hash code, GC generational age, lock state flags, the lock held by a thread, and the biased thread ID.
    2. The type pointer points from the object to its type metadata. The Java virtual machine uses this pointer to determine which class the object is an instance of.
  • Instance data: stores the object's actual field values.
  • Alignment padding: exists only for byte alignment and is not always present.

img

Problem introduction

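public class Demo01 {

    public static void main(String[] args) {
        // How much memory does a new object occupy, and where is that recorded?
        Object o = new Object();

        // Prints a hash code such as 356573597; where is this hash code recorded?
        System.out.println(o.hashCode());

        synchronized (o) {
            // Where is the lock information recorded?
        }

        // Requests a garbage collection. An object that survives 15 collections is
        // promoted from the young generation to the old generation; where is that count recorded?
        System.gc();
    }
}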

  • The answers to all of these questions are recorded in the object mark (Mark Word) of the object header
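
The hash code question can be probed from plain Java. The sketch below (the class name IdentityHashDemo is made up for illustration) checks that the default hashCode() of a plain Object matches System.identityHashCode() and stays the same after a GC request, which is what one would expect if the value is recorded with the object once it has been computed.

```java
class IdentityHashDemo {
    /** True if the default hashCode() equals the identity hash and is unchanged after a GC request. */
    static boolean hashIsStable() {
        Object o = new Object();
        int first = o.hashCode();   // the first call computes the identity hash lazily
        System.gc();                // only a request; the collector may or may not move the object
        return first == o.hashCode() && first == System.identityHashCode(o);
    }

    public static void main(String[] args) {
        System.out.println(hashIsStable());
    }
}
```

The check says nothing about where the value lives; it only shows that the value must be remembered somewhere, since it cannot be recomputed from an address that may change.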

Object header

The official HotSpot documentation describes the object header (picture below). It is a common format shared by Java objects and the virtual machine's internal objects, and it consists of two machine words. In addition, if the object is a Java array, the header must also record the array length: the virtual machine can determine the size of an ordinary Java object from its class metadata, but the metadata of an array type does not reveal how many elements a particular array has.

img

The documentation says the object header consists of two words. What are they? The same HotSpot document defines two further terms: mark word and klass pointer.

Mark Word

The Mark Word stores the object's own runtime data, such as the hash code, GC generational age, lock state flags, the lock held by a thread, the biased thread ID, and the bias timestamp.

The Mark Word is 32 bits long in a 32-bit JVM and 64 bits long in a 64-bit JVM. Because the runtime data an object may need to record exceeds what a fixed 32-bit or 64-bit structure can hold, the Mark Word is designed, for space efficiency, as a dynamically defined data structure: it packs as much data as possible into a very small space and reuses the same bits for different purposes depending on the object's state.

This is how it is stored in the 32-bit JVM

img

  • 32-bit JVMs are rare today, so a quick look is enough; the 64-bit layout is the one that matters

This is how it is stored in a 64-bit JVM

img

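In a 64-bit JVM, the Mark Word occupies 8 bytes. Without compressed class pointers the type pointer also occupies 8 bytes, for a 16-byte header. With compressed class pointers (normally enabled by default on 64-bit HotSpot) the type pointer shrinks to 4 bytes and the header to 12 bytes.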

  • Lock flag (lock): the lowest 2 bits of the Mark Word, which distinguish the lock state.
    • 01 means unlocked or biased
    • 00 means lightweight lock
    • 10 means heavyweight lock
    • 11 means the object is marked for garbage collection
  • biased_lock: whether the object is biased. Unlocked and biased objects share the lock flag 01, so this extra 1-bit flag is needed to tell them apart.
  • Generational age (age): the number of garbage collections the object has survived. When it reaches the threshold, the object is promoted to the old generation. An object that is currently locked is still in use and will not be collected.
  • Object hash code (hash): computed lazily at run time, on the first call to System.identityHashCode(), and then stored here. Once the object is locked, these bits are needed for lock data and can no longer hold the 31-bit hash. In the heavyweight state the hash code is moved to the Monitor; in the lightweight state it lives in the lock record on the owning thread's stack. Biased locking and the hash code are mutually exclusive: an object whose hash code has been requested cannot be biased. synchronized is explained in detail later.
  • Thread ID of the biased lock (JavaThread): in biased mode, when a thread holds the object, this field is set to that thread's ID. On later entries by the same thread, no lock acquisition is attempted again.
  • Epoch: the bias timestamp. It is checked during the CAS that acquires a biased lock to determine whether the recorded bias is still valid.
  • ptr_to_lock_record: in the lightweight state, a pointer to the lock record on the stack. The JVM uses atomic operations instead of OS mutexes, a technique called lightweight locking: it installs a pointer to the lock record in the object's header word with a CAS operation.
  • ptr_to_heavyweight_monitor: in the heavyweight state, a pointer to the object's Monitor. The JVM stores the pointer to the Monitor in this field.
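
To make the bit fields concrete, here is a minimal sketch that decodes a raw 64-bit Mark Word value. It assumes the classic 64-bit HotSpot layout for an unlocked object (from high to low bits: unused 25, hash 31, unused 1, age 4, biased_lock 1, lock 2). The class MarkWordDecoder and the sample value are made up for illustration; the code does not read a real object header.

```java
class MarkWordDecoder {
    // Assumed layout, high to low bits:
    // unused:25 | hash:31 | unused:1 | age:4 | biased_lock:1 | lock:2
    static int lockBits(long mark)      { return (int) (mark & 0b11); }
    static boolean biasedBit(long mark) { return ((mark >>> 2) & 1) == 1; }
    static int age(long mark)           { return (int) ((mark >>> 3) & 0xF); }
    static int hash(long mark)          { return (int) ((mark >>> 8) & 0x7FFF_FFFFL); }

    /** Maps the lock flag (plus the biased_lock bit for 01) to a state name. */
    static String state(long mark) {
        switch (lockBits(mark)) {
            case 0b00: return "lightweight";
            case 0b10: return "heavyweight";
            case 0b11: return "gc-marked";
            default:   return biasedBit(mark) ? "biased" : "unlocked";
        }
    }

    public static void main(String[] args) {
        // Hypothetical unlocked Mark Word: hash 356573597, age 3, biased_lock 0, lock 01
        long mark = (356573597L << 8) | (3L << 3) | 0b001L;
        System.out.println(state(mark) + " age=" + age(mark) + " hash=" + hash(mark));
    }
}
```

Because the age field is only 4 bits wide, the largest age it can hold is 15, which matches the 15 collections mentioned in the opening example.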

Klass Pointer

The Klass Pointer is the type pointer: the object's pointer to its class metadata. The virtual machine uses it to determine which class the object is an instance of. HotSpot locates objects through direct pointers.

Object access positioning
Objects are created in order to be used. A Java program operates on objects in the heap through reference data on the stack. The Java Virtual Machine Specification only states that a reference type is a reference to an object; it does not define how that reference locates and accesses the object in the heap, so the access method is implementation dependent. The two mainstream approaches are handles and direct pointers.
With handle access, a region of the Java heap is set aside as a handle pool. The reference stores the address of the object's handle, and the handle holds the addresses of the object's instance data and type data, as shown in Figure 1.

img
Figure 1 Accessing objects through handles

With direct pointer access, the reference stores the object's address directly, so the layout of the object in the Java heap must itself provide a way to reach the type data, as shown in Figure 2.

img
Figure 2 Accessing objects through direct pointers

Each approach has its advantages. The biggest advantage of handles is that the reference stores a stable handle address: when an object is moved (which is very common during garbage collection), only the instance data pointer inside the handle changes, and the reference itself does not need to be modified.
The biggest advantage of direct pointers is speed: they save one pointer dereference. Because object access is so frequent in Java, that saving adds up to a considerable reduction in execution cost. As the object memory layout described earlier shows, HotSpot uses the second approach, direct pointers. Across software development as a whole, however, access through handles is also very common in other languages and frameworks.
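
The trade-off can be illustrated with a toy model. In the sketch below, an array stands in for the heap and array indexes stand in for addresses; the names ObjectAccessDemo, heap, and handlePool are made up for illustration and have nothing to do with HotSpot's real data structures.

```java
class ObjectAccessDemo {
    static final String[] heap = new String[8];   // toy heap: the index plays the role of an address
    static final int[] handlePool = new int[4];   // handle index -> current address of the object

    /** Handle access: reference -> handle -> object (two hops). */
    static String readViaHandle(int handle) { return heap[handlePool[handle]]; }

    /** Direct access: reference -> object (one hop). */
    static String readDirect(int address) { return heap[address]; }

    /** Simulates the collector moving an object; only the handle entry is rewritten. */
    static int move(int handle, int newAddress) {
        int oldAddress = handlePool[handle];
        heap[newAddress] = heap[oldAddress];
        heap[oldAddress] = null;
        handlePool[handle] = newAddress;
        return newAddress;
    }

    public static void main(String[] args) {
        heap[1] = "instance data";
        handlePool[0] = 1;
        int handleRef = 0;   // a handle-style reference stores the handle
        int directRef = 1;   // a direct reference stores the address itself

        int newAddress = move(0, 5);
        System.out.println(readViaHandle(handleRef)); // handleRef is unchanged and still valid
        System.out.println(readDirect(directRef));    // stale: the old slot is now empty (null)
        directRef = newAddress;                       // a direct reference must itself be updated
        System.out.println(readDirect(directRef));
    }
}
```

The handle reference survives the move untouched but pays for an extra lookup on every read, while the direct reference reads in one step but has to be rewritten whenever the object moves.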

Instance data

If the object has fields, their values are stored here; if it has none, this part is empty. Each field occupies a number of bytes that depends on its type: for example, a boolean occupies 1 byte, an int occupies 4 bytes, and a reference (oop, ordinary object pointer) occupies 4 bytes with compressed oops or 8 bytes without.
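
As a rough illustration, the sketch below adds up the per-type sizes of a class's instance fields using reflection. The names FieldSizeDemo and Sample are made up, and the reference size is passed in as an assumption (4 bytes with compressed oops, 8 without). The result ignores inherited fields, field reordering, and padding, so it is not the size the JVM actually allocates.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class FieldSizeDemo {
    /** Hypothetical class with one field of several kinds; the static field is not instance data. */
    static class Sample {
        boolean flag;
        int count;
        long total;
        Object ref;
        static int ignored;
    }

    /** Size in bytes of one field of the given type. */
    static int sizeOf(Class<?> type, int referenceSize) {
        if (type == boolean.class || type == byte.class) return 1;
        if (type == char.class || type == short.class) return 2;
        if (type == int.class || type == float.class) return 4;
        if (type == long.class || type == double.class) return 8;
        return referenceSize; // any non-primitive type is a reference
    }

    /** Sums the instance fields declared directly in clazz. */
    static int instanceDataBytes(Class<?> clazz, int referenceSize) {
        int total = 0;
        for (Field f : clazz.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                total += sizeOf(f.getType(), referenceSize);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(instanceDataBytes(Sample.class, 4)); // 1 + 4 + 8 + 4
        System.out.println(instanceDataBytes(Sample.class, 8)); // 1 + 4 + 8 + 8
    }
}
```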

Alignment padding

An object may or may not contain alignment padding. By default, the start address of every object in the Java heap must be a multiple of 8 bytes. If the object header plus instance data does not add up to a multiple of 8 bytes, padding is appended to fill the gap; if it already does, no padding is needed.

In other words, the total size of every object must be a multiple of 8 bytes, and alignment padding makes up any shortfall left by the header and instance data.
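
The padding arithmetic is simple enough to write down. This is a minimal sketch assuming 8-byte alignment and a 12-byte header (8-byte Mark Word plus a 4-byte compressed type pointer); the class name ObjectSizeDemo and its helper methods are made up for illustration.

```java
class ObjectSizeDemo {
    static final int ALIGNMENT = 8; // default object alignment in bytes

    /** Rounds size up to the next multiple of ALIGNMENT. */
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) & ~(long) (ALIGNMENT - 1);
    }

    /** Padding bytes needed after the header and the instance data. */
    static long padding(long headerBytes, long instanceBytes) {
        long raw = headerBytes + instanceBytes;
        return alignUp(raw) - raw;
    }

    public static void main(String[] args) {
        long header = 12; // assumption: 8-byte Mark Word + 4-byte compressed type pointer
        System.out.println(alignUp(header + 0) + " bytes, padding " + padding(header, 0)); // no fields
        System.out.println(alignUp(header + 4) + " bytes, padding " + padding(header, 4)); // one int
        System.out.println(alignUp(header + 8) + " bytes, padding " + padding(header, 8)); // one long
    }
}
```

Under these assumptions an object with no fields needs 4 bytes of padding to reach 16 bytes, while an object with a single int field fills 16 bytes exactly and needs none.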

Why align? One reason is to keep each field within a single CPU cache line. If fields are not aligned, a field can straddle two cache lines: reading it may require loading two cache lines, and writing it dirties both. Either case hurts execution efficiency. Ultimately, padding exists so that the hardware can address memory efficiently.
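
The cache-line argument can be checked with a little arithmetic. The sketch below assumes a 64-byte cache line, which is common but hardware dependent; the class CacheLineDemo and the method crossesLine are made up for illustration.

```java
class CacheLineDemo {
    static final int CACHE_LINE = 64; // assumed line size in bytes; the real value is hardware dependent

    /** True if the bytes [address, address + size) fall in more than one cache line. */
    static boolean crossesLine(long address, int size) {
        return address / CACHE_LINE != (address + size - 1) / CACHE_LINE;
    }

    public static void main(String[] args) {
        System.out.println(crossesLine(56, 8)); // bytes 56..63 stay inside the first line
        System.out.println(crossesLine(60, 8)); // bytes 60..67 straddle two lines
    }
}
```

An 8-byte field that starts on a multiple of 8 can never straddle a 64-byte line, whereas a misaligned one can.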

Object creation

Here we consider only the new keyword (not cloning, deserialization, or reflection) and ordinary objects (not Class objects or arrays).

1) Check whether the class has been loaded

When the Java virtual machine encounters a new bytecode instruction:

  • It first checks whether the instruction's operand can locate a symbolic reference to a class in the constant pool, and whether the class that reference denotes has been loaded, resolved, and initialized. If not, the class loading process runs first.

2) Allocate memory

The amount of memory an object needs is fully determined once class loading completes. The allocation method depends on whether the heap's free memory is contiguous (regular):

  • If the heap is regular, allocation uses bump-the-pointer (pointer collision)
  • If the heap is irregular, a free list is needed to track available blocks and carry out the allocation
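
The two strategies can be sketched as toy allocators over plain numeric addresses. The classes BumpAllocator and FreeListAllocator are made up for illustration, and the free list uses a simple first-fit search, which is an assumption; real collectors are considerably more elaborate.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class AllocationDemo {
    /** Bump-the-pointer: free memory is one contiguous region [top, end). */
    static class BumpAllocator {
        private long top;
        private final long end;

        BumpAllocator(long start, long end) { this.top = start; this.end = end; }

        /** Returns the start address of the new block, or -1 if the region is full. */
        long allocate(long size) {
            if (top + size > end) return -1;
            long address = top;
            top += size; // allocating is just moving the pointer
            return address;
        }
    }

    /** Free list: free memory is scattered, so the allocator records each free block. */
    static class FreeListAllocator {
        private final List<long[]> free = new ArrayList<>(); // each entry is {address, length}

        void addFreeBlock(long address, long length) { free.add(new long[] {address, length}); }

        /** First fit: takes the front of the first block that is large enough, or returns -1. */
        long allocate(long size) {
            for (Iterator<long[]> it = free.iterator(); it.hasNext(); ) {
                long[] block = it.next();
                if (block[1] >= size) {
                    long address = block[0];
                    block[0] += size;
                    block[1] -= size;
                    if (block[1] == 0) it.remove();
                    return address;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        BumpAllocator bump = new BumpAllocator(0, 32);
        System.out.println(bump.allocate(16) + " " + bump.allocate(16) + " " + bump.allocate(8));

        FreeListAllocator list = new FreeListAllocator();
        list.addFreeBlock(0, 8);
        list.addFreeBlock(64, 24);
        System.out.println(list.allocate(16) + " " + list.allocate(8));
    }
}
```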

About the safety of memory allocation

Object creation is extremely frequent in the JVM, and even just moving a pointer is not thread-safe under concurrency. For example, while memory is being allocated for object A and the pointer has not yet been updated, object B may be allocated using the original pointer value. There are two solutions:

  • Use CAS with retry on failure to make the pointer update atomic
  • Partition allocation by thread: each thread pre-allocates a small region of the Java heap, called a thread-local allocation buffer (TLAB), and allocates from its own buffer. Synchronization is needed only when the buffer is used up and a new one must be allocated.
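
Both ideas can be sketched on top of a shared bump pointer. In the toy code below, SharedHeap updates the pointer with a CAS retry loop, and Tlab grabs a chunk from the shared heap and then allocates from it without any synchronization. All names, the 64-byte chunk size, and the use of CAS (rather than a lock) for the refill are assumptions made for illustration, not HotSpot's implementation.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class ConcurrentAllocationDemo {
    /** Shared bump pointer, updated with CAS and retried on failure. */
    static class SharedHeap {
        private final AtomicLong top = new AtomicLong(0);
        private final long end;

        SharedHeap(long end) { this.end = end; }

        long allocate(long size) {
            while (true) {
                long old = top.get();
                long next = old + size;
                if (next > end) return -1;                     // heap exhausted
                if (top.compareAndSet(old, next)) return old;  // on failure another thread won; retry
            }
        }
    }

    /** Per-thread buffer: refills from the shared heap, then bumps a private pointer. */
    static class Tlab {
        private final SharedHeap heap;
        private final long chunkSize;
        private long top;
        private long end; // top == end == 0 means the buffer is empty

        Tlab(SharedHeap heap, long chunkSize) { this.heap = heap; this.chunkSize = chunkSize; }

        long allocate(long size) {
            if (size > chunkSize) return heap.allocate(size);  // too large for a buffer
            if (top + size > end) {                            // the only step that touches shared state
                long chunk = heap.allocate(chunkSize);
                if (chunk < 0) return -1;
                top = chunk;
                end = chunk + chunkSize;
            }
            long address = top;
            top += size;
            return address;
        }
    }

    /** Runs several threads, each with its own Tlab, and counts the distinct addresses handed out. */
    static int distinctAddresses(int threads, int perThread) {
        SharedHeap heap = new SharedHeap((long) threads * (perThread + 8) * 8);
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                Tlab tlab = new Tlab(heap, 64);
                for (int j = 0; j < perThread; j++) seen.add(tlab.allocate(8));
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctAddresses(4, 1000));
    }
}
```

If two threads were ever given the same address, the count of distinct addresses would fall below the number of allocations, so the count gives a quick sanity check on the scheme.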

3) Make necessary settings for the object

  • The allocated memory (excluding the object header) is initialized to zero values. This guarantees that the object's instance fields can be used in Java code without being explicitly assigned.
  • The JVM then fills in the object header: which class the object is an instance of (how to find the class metadata), the object's hash code (actually deferred until hashCode is first called), the object's GC generational age, and so on.
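
The effect of zero-initialization is visible from Java code. In this small sketch (the classes DefaultValueDemo and Fields are made up for illustration), none of the fields is ever assigned, yet each can be read immediately.

```java
class DefaultValueDemo {
    /** No field has an initializer and there is no constructor body. */
    static class Fields {
        int i;
        long l;
        boolean b;
        double d;
        char c;
        Object ref;
    }

    /** Formats the field values; the char is shown as its numeric code. */
    static String describe(Fields f) {
        return f.i + " " + f.l + " " + f.b + " " + f.d + " " + (int) f.c + " " + f.ref;
    }

    public static void main(String[] args) {
        System.out.println(describe(new Fields())); // prints "0 0 false 0.0 0 null"
    }
}
```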

4) Execute the constructor

  • After the steps above, a new object exists from the JVM's point of view, but from the Java program's point of view its creation has only just begun. The constructor must still run to initialize the object as the programmer intends; only then is the object fully constructed.

Summary

The instantiation process of simple class objects

1. Load the class into the method area;

2. Allocate space on the stack and declare the reference variable P;

3. Allocate space in heap memory and assign the object's address;

4. Within the object's space, give the fields their default values, then apply the explicit initializers of the member variables;

5. Push the constructor onto the stack and run it;

6. When initialization completes, assign the heap address to the reference variable, and pop the constructor off the stack.

The instantiation process of subclass objects

1. Load the parent class and then the subclass into the method area;

2. Allocate space on the stack and declare the reference variable P;

3. Allocate space in heap memory and assign the object's address;

4. Within the object's space, give the fields (including those inherited from the parent class) their default values;

5. Push the subclass constructor onto the stack;

6. Explicitly initialize the parent class's fields;

7. Push the parent class constructor onto the stack, run it, and pop it;

8. Explicitly initialize the subclass's fields;

9. When initialization completes, assign the heap address to the reference variable P, and pop the subclass constructor off the stack.
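
The order of the subclass steps can be observed with a short program. In this sketch (the classes InitOrderDemo, Parent, and Child are made up for illustration), each field initializer and constructor body records an event, and the parent constructor peeks at the child's field through an overridden method to show that it still holds its default value at that point.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    /** Records an event and returns the value, so it can be used inside a field initializer. */
    static int log(String event, int value) {
        LOG.add(event);
        return value;
    }

    static class Parent {
        int p = log("parent field", 1);

        Parent() { log("parent constructor, child field seen as " + peek(), 0); }

        int peek() { return -1; }
    }

    static class Child extends Parent {
        int c = log("child field", 2);

        Child() { log("child constructor", 0); }

        @Override
        int peek() { return c; } // called from Parent() before c's initializer has run
    }

    /** Creates one Child and returns the recorded events in order. */
    static List<String> trace() {
        LOG.clear();
        new Child();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

The trace lists the parent's field initializer first, then the parent constructor (which sees the child's field as 0, its default value), then the child's field initializer, and finally the child constructor.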


Origin blog.csdn.net/qq_50985215/article/details/131510914