How objects are stored in heap space

1. Object creation and memory allocation

When the virtual machine encounters a new bytecode instruction, it first checks whether the class named by the instruction has been loaded. If not, it runs the class loading process first.

After the class loading check passes, the virtual machine will allocate memory for the new object.

The amount of memory an object needs is fully determined once class loading completes, so allocating space for the object amounts to carving a block of that size out of the heap.

1. How to allocate memory from the heap

There are two ways to allocate memory from the Java heap:

Pointer collision (also known as bump-the-pointer): The heap is perfectly regular, with used memory on one side and free memory on the other, separated by a pointer that marks the boundary. Allocating memory only requires moving the pointer toward the free side by the size of the object.
Free list: Used and free memory are interleaved, so the virtual machine must maintain a list recording which memory blocks are available. On allocation, it picks a large enough block from the list and updates the list.
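
The two strategies above can be sketched as a toy model. This is only an illustration, not how HotSpot implements them: the class names, the use of plain long offsets as addresses, and the first-fit search in the free list are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: addresses are plain offsets into an imaginary heap.
class AllocationSketch {

    /** Pointer collision: used memory on one side, free memory on the other. */
    static final class BumpAllocator {
        private long top;        // boundary between used and free space
        private final long end;  // end of the region

        BumpAllocator(long start, long end) {
            this.top = start;
            this.end = end;
        }

        /** Returns the start offset of the new block, or -1 if the region is full. */
        long allocate(long size) {
            if (end - top < size) return -1;
            long result = top;
            top += size;         // allocating is just moving the pointer
            return result;
        }
    }

    /** Free list: the allocator records which blocks are available. */
    static final class FreeListAllocator {
        private static final class Block {
            long start, size;
            Block(long start, long size) { this.start = start; this.size = size; }
        }

        private final List<Block> free = new ArrayList<>();

        void addFreeBlock(long start, long size) {
            free.add(new Block(start, size));
        }

        /** First-fit search (an assumption); returns -1 if no block is large enough. */
        long allocate(long size) {
            for (int i = 0; i < free.size(); i++) {
                Block b = free.get(i);
                if (b.size >= size) {
                    long result = b.start;
                    b.start += size;   // shrink the block from the front
                    b.size -= size;
                    if (b.size == 0) free.remove(i);
                    return result;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        BumpAllocator bump = new BumpAllocator(0, 64);
        System.out.println(bump.allocate(16)); // prints 0
        System.out.println(bump.allocate(16)); // prints 16

        FreeListAllocator list = new FreeListAllocator();
        list.addFreeBlock(0, 8);
        list.addFreeBlock(32, 24);
        System.out.println(list.allocate(16)); // prints 32: the 8-byte block is too small
    }
}
```

The bump allocator does constant work per allocation, while the free list has to search, which is why a regular heap makes allocation cheaper.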

Which method the virtual machine chooses depends on whether the Java heap is regular, and that in turn depends on whether the garbage collector in use is able to compact memory.

With collectors that compact, such as Serial and ParNew, pointer collision is used, which is simple and efficient.
With CMS, a collector based on the sweep algorithm, a free list is in theory required to allocate memory.

2. How to keep memory allocation thread-safe

Object creation is extremely frequent in the virtual machine, so allocation must be thread-safe. Without protection, thread A may be in the middle of allocating memory, with the pointer not yet updated, when thread B allocates from the same original pointer value.

Virtual machines use two approaches to ensure thread safety:

Synchronize the allocation itself. (Specifically, CAS, a form of optimistic locking, combined with retry on failure keeps the update atomic.)
Give each thread its own space to allocate in.

Each thread pre-allocates a small piece of memory in the Eden area of the heap as a "thread-local allocation buffer" (TLAB). A thread allocates in its own TLAB first and falls back to synchronized allocation only when the TLAB runs out.
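
Both approaches can be combined in a small model. The sketch below is a simplification under stated assumptions, not HotSpot code: `SharedRegion`, `Tlab`, the 1024-byte buffer size, and the rule that oversized requests bypass the buffer are all hypothetical choices for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model only: addresses are plain offsets into an imaginary Eden.
class ConcurrentAllocationSketch {

    /** Shared region: threads race on one pointer, resolved by CAS plus retry. */
    static final class SharedRegion {
        private final AtomicLong top = new AtomicLong(0);
        private final long end;

        SharedRegion(long end) { this.end = end; }

        long allocate(long size) {
            while (true) {
                long oldTop = top.get();
                long newTop = oldTop + size;
                if (newTop > end) return -1;              // region exhausted
                if (top.compareAndSet(oldTop, newTop)) {  // optimistic update
                    return oldTop;
                }
                // Another thread moved the pointer first: retry with fresh state.
            }
        }

        long used() { return top.get(); }
    }

    /** Per-thread buffer: bump allocation inside it needs no synchronization. */
    static final class Tlab {
        private final SharedRegion shared;
        private final long tlabSize;
        private long top = 0, end = 0;  // empty until the first refill

        Tlab(SharedRegion shared, long tlabSize) {
            this.shared = shared;
            this.tlabSize = tlabSize;
        }

        long allocate(long size) {
            if (size > tlabSize) return shared.allocate(size); // too big for a buffer
            if (end - top < size) {
                long start = shared.allocate(tlabSize); // only the refill is synchronized
                if (start < 0) return -1;
                top = start;
                end = start + tlabSize;
            }
            long result = top;
            top += size;
            return result;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedRegion eden = new SharedRegion(1 << 20);
        Runnable worker = () -> {
            Tlab tlab = new Tlab(eden, 1024);
            for (int i = 0; i < 1000; i++) tlab.allocate(16);
        };
        Thread a = new Thread(worker), b = new Thread(worker);
        a.start(); b.start();
        a.join(); b.join();
        // Each thread needed 16 buffers of 1024 bytes for its 1000 allocations.
        System.out.println(eden.used()); // prints 32768
    }
}
```

In this model the two threads perform 2000 allocations but touch the shared pointer only 32 times, which is the point of giving each thread its own buffer.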

Whether the virtual machine enables TLAB is set through this parameter:

-XX:+/-UseTLAB

3. Work after memory allocation is completed

Once memory is allocated, the virtual machine initializes the allocated space, everything except the object header, to zero values, and then fills in the object header.

At this point, from the virtual machine's perspective, the object has been created. But it is still a blank object: the constructor in the Java code has not yet run, so the fields do not yet hold the values the program intends.

Only after the new instruction has executed and the <init>() method (the constructor) has run is a usable Java object fully constructed.
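
The gap between "memory zeroed" and "constructor finished" can be observed from plain Java. In the sketch below (class and field names are made up for the example), the parent constructor calls an overridden method before the subclass's field initializers have run, so it sees the zero values the virtual machine wrote during allocation.

```java
class ZeroValueDemo {

    static class Base {
        final String seenByBase;

        Base() {
            // Runs before Child's field initializers and constructor body.
            seenByBase = describe();
        }

        String describe() { return "base"; }
    }

    static class Child extends Base {
        int count = 42;
        String name = "ready";

        @Override
        String describe() { return count + "/" + name; }
    }

    public static void main(String[] args) {
        Child c = new Child();
        System.out.println(c.seenByBase); // prints "0/null": fields still hold zero values
        System.out.println(c.describe()); // prints "42/ready": construction has finished
    }
}
```

Calling overridable methods from a constructor is generally discouraged for exactly this reason; it is done here only to expose the intermediate state.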

4. Summarize the creation process of Java objects

The creation process of Java objects:

Class loading check, memory allocation, zero-value initialization, object header setup, constructor execution

When the new instruction is encountered, first check whether its operand resolves to a symbolic reference to a class in the constant pool.
If it does, check whether that class has already been loaded and initialized.
If not, the class has not been loaded yet, so the class loading process runs first.
Once the class loading check passes, the memory size that an object of this class needs is known, and the JVM allocates memory for the object.
- Allocating memory involves three details:
  - Two methods of memory allocation: pointer collision and free list
  - Thread safety of memory allocation: CAS (optimistic locking) + retry on failure
  - For small objects, the thread allocates first in its own thread-local allocation buffer (TLAB) in the heap. Large objects can optionally be placed directly in the old generation.
Initialize the allocated memory, everything except the object header, to zero values, so that the object's fields have their default values.
Fill in the object header: the object's type, its GC generation age, and whether biased locking is enabled. The hash code is computed lazily, on the first call.
Execute the object's constructor.

Then a usable Java object is obtained.

5. Basic strategies for object memory allocation

Overall:

New objects are allocated first in the Eden area
Large objects enter the old generation directly
Long-lived objects will enter the old generation

1. New objects are allocated first in the Eden area.

If there is insufficient space in the Eden area, a Minor GC will be triggered.

If, during a new generation GC, so many objects survive that the Survivor area cannot hold them, they are copied to the old generation through the allocation guarantee mechanism.
If the old generation does not have enough space either, a Full GC is triggered, which is time-consuming.

2. Why do large objects directly enter the old generation?

Large objects are objects that need a large amount of contiguous memory (such as long strings and large arrays).

There are two reasons:

The Eden area may not have enough contiguous space left for the object, so a GC has to be triggered early. The larger the object, the more likely this is.
In a later new generation GC, if the large object survives, the Survivor area may not be able to hold it, and it will end up in the old generation anyway through the allocation guarantee mechanism. So it can simply be placed in the old generation from the start.

There is one parameter:

-XX:PretenureSizeThreshold

Objects larger than this value (in bytes) are allocated directly in the old generation. The default is 0, which means no object is placed directly in the old generation because of its size.
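
A rough way to observe this is to allocate a single array above the threshold and inspect the GC log. The class name, the 4 MB size, and the flags suggested in the comment are illustrative assumptions. The parameter is usually described as taking effect only with the Serial and ParNew collectors, so the suggested launch selects Serial explicitly, and `-Xlog:gc*` assumes JDK 9 or later.

```java
// Suggested launch (an assumption, adjust for your JDK):
//   javac LargeObjectDemo.java
//   java -Xms20m -Xmx20m -Xmn10m -XX:+UseSerialGC \
//        -XX:PretenureSizeThreshold=3145728 -Xlog:gc* LargeObjectDemo
// With a 3 MB threshold, the 4 MB array below should be reported in the
// tenured (old) generation rather than in the new generation.
class LargeObjectDemo {
    static final int ONE_MB = 1024 * 1024;

    /** Allocates one contiguous array of the given size in megabytes. */
    static byte[] allocateLarge(int megabytes) {
        return new byte[megabytes * ONE_MB];
    }

    public static void main(String[] args) {
        byte[] large = allocateLarge(4);
        System.out.println("allocated " + large.length + " bytes"); // prints "allocated 4194304 bytes"
    }
}
```

The value has to be given as a plain byte count here (3145728 is 3 MB).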

3. Objects that survive for a long time will enter the old generation.

Each object stores a generational age in its object header. Every time the object survives a new generation GC, its age is incremented by 1.

When the age exceeds the threshold, the object is promoted to the old generation.

The threshold can be adjusted with this parameter; the default is 15:

-XX:MaxTenuringThreshold

HotSpot also applies a dynamic age determination mechanism here, which is covered in the notes on generational collection theory.

2. Object memory layout

In the HotSpot virtual machine, the storage layout of objects in heap memory can be divided into three parts:

Object header (Header)
Instance Data
Alignment padding

1. Object header

The object header part contains two types of information:

Mark word: stores the object's own runtime data, such as the hash code, GC generation age, lock state flags, the lock held by a thread, the biased thread ID, and the bias timestamp.
Klass word: the type pointer, that is, the object's pointer to its type metadata. The JVM uses this pointer to determine which class the object is an instance of.

If the object is an array, the object header must also store the array length; otherwise the size of the array object cannot be determined.

2. Instance data

Stores the object's actual data: the fields defined in the code, both those declared in the class itself and those inherited from parent classes. The storage order is affected by two factors:

The order in which the fields are declared in the source code
The virtual machine's field allocation strategy

Under the default allocation strategy, fields of the same width are stored together. Subject to that rule, fields defined in a parent class come before those of the subclass.

If HotSpot's CompactFields option is enabled, narrower fields of the subclass may be inserted into gaps among the parent class's fields, saving a little space.

-XX:+CompactFields (enabled by default)

3. Alignment padding

This part is only a placeholder and carries no meaning of its own.

The automatic memory management system of the HotSpot virtual machine requires that every object's starting address be a multiple of 8 bytes, which means the size of every object must be a multiple of 8 bytes.

The object header is designed to be exactly 8 bytes or 16 bytes, but the length of the instance data carries no such guarantee. When the header plus the instance data does not add up to a multiple of 8 bytes, alignment padding fills the remainder.
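
The padding arithmetic can be worked through in a few lines. This is a sketch of the rounding rule only: the helper names are hypothetical, and the 16-byte header used in `main` is an assumed example value, since actual header sizes depend on the platform and JVM settings.

```java
class ObjectSizeSketch {
    static final int ALIGNMENT = 8; // object sizes are multiples of 8 bytes

    /** Rounds size up to the next multiple of ALIGNMENT. */
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1L);
    }

    /** Bytes of alignment padding needed after the header and instance data. */
    static long padding(long headerBytes, long instanceBytes) {
        long raw = headerBytes + instanceBytes;
        return alignUp(raw) - raw;
    }

    public static void main(String[] args) {
        // Assumed 16-byte header plus one int field (4 bytes).
        System.out.println(alignUp(16 + 4)); // prints 24
        System.out.println(padding(16, 4));  // prints 4
        // Assumed 16-byte header plus one long field (8 bytes): already aligned.
        System.out.println(padding(16, 8));  // prints 0
    }
}
```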

3. Object access positioning

The access positioning method of an object refers to how a reference on the stack locates the object on the heap.

Java programs operate on specific objects in the heap through reference data on the stack. The reference type is defined only as a reference to an object, without specifying how it should be implemented.

Therefore, the virtual machine is free to implement object access however it likes. There are two mainstream methods:

Use handle access:
- A block of memory in the Java heap is set aside as a handle pool
- The reference stores the address of the object's handle
- The handle holds the actual addresses of the object's "instance data" and "type data"
Use direct pointer access:
- The reference stores the object's address directly

Both methods have their own advantages:

The biggest advantage of handle access is that the reference holds a stable handle address. When an object is moved (which happens routinely during garbage collection), only the instance data pointer in the handle changes; the reference itself does not need to be modified.
The advantage of direct pointer access is speed: it saves the cost of one pointer indirection. Because object access is so frequent, that saving adds up, and this is the approach HotSpot uses.
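
The trade-off can be illustrated with a toy model. Everything here is an assumption made for the example: `ToyHeap` uses maps in place of real memory, addresses are plain integers, and the handle tracks only the instance data address (a real handle would also hold the type data address).

```java
import java.util.HashMap;
import java.util.Map;

class AccessSketch {

    static final class ToyHeap {
        private final Map<Integer, String> memory = new HashMap<>();      // address -> instance data
        private final Map<Integer, Integer> handlePool = new HashMap<>(); // handle -> current address
        private int nextHandle = 1;

        /** Stores data at the given address and returns a handle for it. */
        int allocateWithHandle(int address, String data) {
            memory.put(address, data);
            int handle = nextHandle++;
            handlePool.put(handle, address);
            return handle;
        }

        /** Handle access: two lookups (handle -> address -> data). */
        String readViaHandle(int handle) {
            return memory.get(handlePool.get(handle));
        }

        /** Direct access: the reference is the address itself, one lookup. */
        String readDirect(int address) {
            return memory.get(address);
        }

        /** The collector moves an object; only the handle entry is patched. */
        void move(int handle, int newAddress) {
            int oldAddress = handlePool.get(handle);
            memory.put(newAddress, memory.remove(oldAddress));
            handlePool.put(handle, newAddress);
        }
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        int handleRef = heap.allocateWithHandle(100, "point(1,2)");
        int directRef = 100; // a direct reference would store the address itself

        heap.move(handleRef, 200);

        System.out.println(heap.readViaHandle(handleRef)); // prints "point(1,2)"
        System.out.println(heap.readDirect(directRef));    // prints "null": the old address is stale
        System.out.println(heap.readDirect(200));          // prints "point(1,2)"
    }
}
```

In the model the handle reference survives the move unchanged, while a direct reference would have to be rewritten by the collector to point at the new address. In exchange, every direct read skips one lookup.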

Which access method is actually used also depends on the type of garbage collector.