Object layout parsing in the JVM

Object memory layout

The Java Virtual Machine Specification does not mandate how objects are laid out in memory. In the HotSpot virtual machine, a Java object on the heap is divided into three parts: the object header, the instance data, and the alignment padding.

Object header

The object header of the HotSpot virtual machine mainly contains two pieces of information (three for an array object, which also stores the array length). The object header occupies 8 bytes on a 32-bit system. On a 64-bit system it occupies 12 bytes with compressed pointers enabled and 16 bytes without.

Mark Word: mainly stores the hash code, the GC generational age, the lock state flag, a pointer to the lock held by a thread, the biased thread ID, and the bias timestamp (epoch).
Type pointer (klass pointer): the pointer from the object to its class metadata (the class information stored in the method area). The virtual machine uses this pointer to determine which class the object is an instance of.
Array length: if the object is an array, the object header also holds a field that records the length of the array (not considered here).

Instance data

The instance data is the information the object actually stores, that is, the contents of the fields of each type defined in the program code. The memory usage of the primitive types is as follows: boolean and byte take 1 byte, short and char take 2 bytes, int and float take 4 bytes, and long and double take 8 bytes.

A reference occupies 4 bytes on a 64-bit system with compressed pointers enabled, and 8 bytes without.

Alignment padding

Alignment padding does not necessarily exist and has no special meaning; it only acts as a placeholder. The automatic memory management system of the HotSpot VM requires the starting address of every object to be a multiple of 8 bytes, which means the size of every object must also be a multiple of 8 bytes. When the object header plus the instance data does not add up to a multiple of 8 bytes, the remainder is filled with alignment padding.
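The rounding rule above can be sketched in a few lines of Java. This is only an illustration of the arithmetic, not HotSpot's actual code: the class name ObjectSizeSketch and its methods are made up for this example, and the 8-byte alignment is assumed to be the default (it can be changed with -XX:ObjectAlignmentInBytes).

```java
class ObjectSizeSketch {
    // Assumed default HotSpot object alignment in bytes; must be a power of two.
    static final int ALIGNMENT = 8;

    /** Rounds size up to the next multiple of ALIGNMENT. */
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) & -ALIGNMENT;
    }

    /** Padding bytes needed after a header of headerBytes and fields of fieldBytes. */
    static long padding(long headerBytes, long fieldBytes) {
        long raw = headerBytes + fieldBytes;
        return alignUp(raw) - raw;
    }

    public static void main(String[] args) {
        System.out.println(alignUp(12));    // 12-byte header, no fields: 16
        System.out.println(padding(12, 1)); // header plus one boolean: 3
        System.out.println(padding(12, 4)); // header plus one int: 0
    }
}
```

With a 12-byte header (64-bit, compressed pointers), an object without fields needs 4 bytes of padding, while a single int field fills the gap exactly.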

Objects in HotSpot

HotSpot defines the object header as: openjdk.java.net/groups/hots…

Common structure at the beginning of every GC-managed heap object. (Every oop points to an object header.) Includes fundamental information about the heap object's layout, type, GC state, synchronization state, and identity hash code. Consists of two words. In arrays it is immediately followed by a length field. Note that both Java objects and VM-internal objects have a common object header format.

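Therefore, the object header of the HotSpot virtual machine mainly consists of two parts: the Mark Word and the type pointer (klass pointer).

The size of the Mark Word in the 64-bit HotSpot virtual machine is well documented in the comments of markOop.hpp: it is 64 bits. The klass pointer is 32 bits when compressed pointers are enabled.

PS: 1 byte = 8 bits.

Those comments can be summarized as the following table (64-bit Mark Word, fields listed from the high bits to the low bits):

Unlocked: unused:25 | identity hash:31 | unused:1 | age:4 | biased_lock:1 (0) | lock:2 (01)
Biased lock: thread ID:54 | epoch:2 | unused:1 | age:4 | biased_lock:1 (1) | lock:2 (01)
Lightweight lock: pointer to lock record:62 | lock:2 (00)
Heavyweight lock: pointer to monitor:62 | lock:2 (10)
GC mark: lock:2 (11)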

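PS: When dealing with concurrency, we usually rely on a lock such as synchronized to guarantee thread safety, and that lock is really the lock state flag in the object header. So this article not only explains what the object header is, it also lays the groundwork for understanding how synchronized is implemented.

As the table above shows, the Java object header takes different forms depending on the state of the object. There are three main states: unlocked, locked, and GC-marked. In other words, locking in Java means locking an object, that is, changing the state of its object header. If the lock is acquired successfully, the thread enters the synchronized block.

Locks in Java come in several kinds, though. The table above shows three lock states: biased, lightweight, and heavyweight. Their costs differ greatly, and only carefully designed code makes good use of them. So how do these three locks work? To answer that, we first need to study the object header.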

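Analyzing object layout with JOL

How much memory does an object actually occupy in the JVM? For example, how many bytes does Object obj = new Object() take?

We can use the JOL (Java Object Layout) tool to view the internal layout of a newly created Java object and to see how many bytes an ordinary Java object occupies.

Note: all of the following tests run with compressed pointers enabled, which is the default.

Add the dependency: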

<dependency>
    <groupId>org.openjdk.jol</groupId>
    <artifactId>jol-core</artifactId>
    <version>0.10</version>
</dependency>

1. Create a class A that contains no instance data

public class A {

}

2. Test

import org.openjdk.jol.info.ClassLayout;

public class JOLTest {

    static A a = new A();

    public static void main(String[] args) {
        // Print the header, fields, and padding of the instance
        System.out.println(ClassLayout.parseInstance(a).toPrintable());
    }
}

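Output:

The output shows that the object is 16 bytes in size: the object header takes 12 bytes, and the other 4 bytes are alignment padding (on a 64-bit virtual machine the size of an object must be a multiple of 8). Because the object has no fields, its instance data takes 0 bytes. This raises two questions:

What is instance data?
What is stored in the 12 bytes of the object header?

First question: instance data is the field data of the object created with new. Let's add a boolean field to A. A boolean occupies 1 byte in memory, and the default value of a boolean field is false: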

public class A {
    boolean b;
}

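The test output is as follows:

Even though we added a boolean field, the whole object is still 16 bytes: 12 bytes of object header, 1 byte for the boolean field b (the instance data), and the remaining 3 bytes of alignment padding. If you declare an int field instead, it takes exactly 4 bytes and no padding is needed. We can therefore conclude that an object's layout consists of three parts: the object header, the instance data, and the alignment padding.

Second question: what is stored in the 12 bytes of the object header? Before answering, we need to understand big-endian and little-endian byte order.

What are big-endian and little-endian?

From the definitions above we already know that 8 bytes of the object header are the Mark Word and 4 bytes are the klass pointer.

The Mark Word contains a lock state flag. A freshly created object is unlocked by default, so with no lock and no hash computed, the Mark Word is laid out, from the high bits to the low bits, as 25 unused bits, 31 hash bits, 1 unused bit, 4 age bits, 1 biased-lock bit (0), and 2 lock bits (01). The first 25 bits should therefore be unused.

But the actual output shows that the first 8 bits are clearly already in use: 00000001

Is this a JVM bug? Of course not. Computer memory is byte-addressed: every address corresponds to one byte, and one byte is 8 bits. Data types, in C for example, occupy different numbers of bytes, which raises the question of how to store a multi-byte value. This is where big-endian and little-endian storage come from.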

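Big-endian: the high-order byte is stored at the low address and the low-order byte at the high address.
Little-endian: the high-order byte is stored at the high address and the low-order byte at the low address.

Here is what high/low bytes and high/low addresses mean:

High and low bytes: in decimal, the leftmost digit is the most significant and the rightmost is the least significant, and the same holds in any other base. For example, in 0x12345678 the bytes from most to least significant are 0x12, 0x34, 0x56, and 0x78.

High and low addresses: from low to high, 0x0000001 -> 0x0000002 -> ... -> 0x0000092

In the output here, the values map onto the layout from back to front: the storage is little-endian, so what we print comes out in exactly the reverse order.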
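To make the two byte orders concrete, here is a small sketch that uses the standard java.nio.ByteBuffer and ByteOrder classes to show how the example value 0x12345678 would sit in memory under each order. The class name EndianSketch and the helper bytesInMemory are names chosen for this illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianSketch {
    /** Returns the 4 bytes of value in memory order, lowest address first, as hex. */
    static String bytesInMemory(int value, ByteOrder order) {
        byte[] bytes = ByteBuffer.allocate(4).order(order).putInt(value).array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Big-endian: the high-order byte comes first
        System.out.println(bytesInMemory(0x12345678, ByteOrder.BIG_ENDIAN));    // 12 34 56 78
        // Little-endian: the low-order byte comes first
        System.out.println(bytesInMemory(0x12345678, ByteOrder.LITTLE_ENDIAN)); // 78 56 34 12
        // Byte order of the machine the JVM is running on
        System.out.println(ByteOrder.nativeOrder());
    }
}
```

On a typical x86-64 machine the last line prints LITTLE_ENDIAN, which is why the lowest byte of the Mark Word shows up first in a memory dump.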

The output above is for an object whose hash has not been computed. If we compute the hash:

import org.openjdk.jol.info.ClassLayout;

public class JOLTest {

    static A a = new A();

    public static void main(String[] args) {
        System.out.println("---hashcode before---");
        System.out.println(ClassLayout.parseInstance(a).toPrintable());
        // Convert to hexadecimal for easier comparison
        System.out.println("hashcode of the object: " + Integer.toHexString(a.hashCode()));
        System.out.println("   ");
        System.out.println("----hashcode after---");
        // Layout of object a after the hashcode has been computed
        System.out.println(ClassLayout.parseInstance(a).toPrintable());
    }
}

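The test output is as follows:

Comparing the two dumps: before hashCode() is called, the 56 bits in bytes 2 through 8 of the object header have no value, and after it is called they do. Why bytes 2 through 8 rather than 1 through 7? Because of little-endian storage. The 8 bits of the first byte hold the generational age, the biased-lock flag, and the lock state. In the unlocked state they are, from high to low: 1 unused bit, 4 age bits, 1 biased-lock bit, and 2 lock bits. This layout changes as the state of the object changes.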
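The following sketch shows how the hash could be read back out of the 8 Mark Word bytes as they appear in a memory dump. It assumes a little-endian machine and the 64-bit unlocked layout in which the 31-bit identity hash starts at bit 8. The class MarkWordHashSketch, its methods, and the sample bytes are hypothetical and were not taken from a real run.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class MarkWordHashSketch {
    /** Reassembles the 8 header bytes (memory order, little-endian) into one 64-bit word. */
    static long toMarkWord(byte[] headerBytes) {
        return ByteBuffer.wrap(headerBytes).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    /** Extracts the 31-bit identity hash, assumed to start at bit 8 of an unlocked Mark Word. */
    static int identityHash(long markWord) {
        return (int) ((markWord >>> 8) & 0x7FFFFFFFL);
    }

    public static void main(String[] args) {
        // Hypothetical dump: byte 1 holds age/bias/lock bits, the following bytes hold the hash
        byte[] headerBytes = {0x01, 0x78, 0x56, 0x34, 0x12, 0x00, 0x00, 0x00};
        long markWord = toMarkWord(headerBytes);
        System.out.println(Long.toHexString(markWord));                  // 1234567801
        System.out.println(Integer.toHexString(identityHash(markWord))); // 12345678
    }
}
```

Read in memory order the hash bytes look reversed (78 56 34 12), but once the word is reassembled with the low byte first, the hash matches what Integer.toHexString(a.hashCode()) would print.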

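Lock inflation

An object's state does not stay fixed. As contention for the lock grows, the lock can be upgraded from a biased lock to a lightweight lock, and then to a heavyweight lock.

Note: here we only look at the results and do not analyze the implementation, so the why and how of the upgrade are left to a later article.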
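When reading the dumps in the following sections, the lock state can be told from the lowest three bits of the Mark Word. The sketch below encodes that mapping as documented in HotSpot's markOop.hpp (01 with bias bit 0 is unlocked, 01 with bias bit 1 is biased, 00 is lightweight, 10 is heavyweight, 11 is a GC mark). The class LockStateSketch and the sample values are made up for illustration.

```java
class LockStateSketch {
    /** Classifies a 64-bit Mark Word by its 2 lock bits and its biased-lock bit. */
    static String lockState(long markWord) {
        int lockBits = (int) (markWord & 0b11);
        boolean biased = ((markWord >>> 2) & 1) == 1;
        switch (lockBits) {
            case 0b01:
                return biased ? "biased" : "unlocked";
            case 0b00:
                return "lightweight locked";
            case 0b10:
                return "heavyweight locked";
            default:
                return "marked for GC";
        }
    }

    public static void main(String[] args) {
        System.out.println(lockState(0x01L));               // unlocked
        System.out.println(lockState(0x05L));               // biased
        // Hypothetical pointer values; only the low bits matter here
        System.out.println(lockState(0x000070000e8d1968L)); // lightweight locked
        System.out.println(lockState(0x00007f8a1c00a5caL)); // heavyweight locked
    }
}
```

Remember that on a little-endian machine these low bits sit in the first byte of the dump.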

Lock-free state

The lock-free state is the situation analyzed above.

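Biased lock

import org.openjdk.jol.info.ClassLayout;

/**
 * Demonstrates a biased lock.
 * Biased locking is enabled by default since JDK 6, but with a startup delay:
 * objects created right after the program starts are not biasable,
 * only objects created a few seconds later are.
 * The delay can be turned off with a VM option:
 * VM: -XX:BiasedLockingStartupDelay=0
 */
public class BiasedLockTest {

    static A a = new A();

    public static void main(String[] args) {
        synchronized (a) {
            System.out.println(ClassLayout.parseInstance(a).toPrintable());
        }
    }
}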

Test Results:

Lightweight lock

import org.openjdk.jol.info.ClassLayout;

public class LightWeightLockTest {

    static A a = new A();

    public static void main(String[] args) {
        System.out.println("before lock");
        System.out.println(ClassLayout.parseInstance(a).toPrintable());
        synchronized (a) {
            System.out.println("locking");
            System.out.println(ClassLayout.parseInstance(a).toPrintable());
        }
        System.out.println("after lock");
        System.out.println(ClassLayout.parseInstance(a).toPrintable());
    }
}

Test Results:

Heavyweight lock

import org.openjdk.jol.info.ClassLayout;

public class HeavyWeightLockTest {

    static A a = new A();

    public static void main(String[] args) throws Exception {
        System.out.println("before lock");
        System.out.println(ClassLayout.parseInstance(a).toPrintable());
        Thread t1 = new Thread() {
            @Override
            public void run() {
                synchronized (a) {
                    try {
                        Thread.sleep(5000);
                        System.out.println("name:" + Thread.currentThread().getName());
                        System.out.println(ClassLayout.parseInstance(a).toPrintable());
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }
        };

        t1.setName("t1");
        t1.start();
        System.out.println("name:t1");
        System.out.println(ClassLayout.parseInstance(a).toPrintable()); // lightweight lock
        lock();
    }

    /**
     * Contention for the resource: mutex, heavyweight lock
     */
    public static void lock() {
        synchronized (a) { // t1 holds the lock, the main thread contends for it
            System.out.println("name:" + Thread.currentThread().getName());
            System.out.println(ClassLayout.parseInstance(a).toPrintable());
        }
    }
}

Test Results:

Summary

This article mainly covers the information stored in the object header. It also touches on lock inflation, but not in depth. Lock inflation is essentially how synchronized is implemented, which will be discussed in a later article.