[java] Java magic class: Unsafe application analysis


1. Overview

Reprinted from: Java Magic Class: Unsafe Application Analysis. I am reprinting this article because it covers different ground from my earlier posts on Unsafe, listed below, and the two complement each other.

[High Concurrency] JUC underlying tool class Unsafe

[java] unsafe of java

2. Preface

Unsafe is a class in the sun.misc package. It mainly provides methods for low-level, unsafe operations, such as accessing system memory directly and managing memory manually. These methods play an important role in improving Java's runtime efficiency and in strengthening the language's ability to operate on underlying resources. However, because Unsafe gives Java the ability to manipulate memory much like a C pointer does, it also increases the risk of pointer-related bugs. Excessive or incorrect use of Unsafe raises the probability of program errors and makes a safe language like Java no longer "safe", so Unsafe must be used with caution.

Note: This article introduces sun.misc.Unsafe public API functions and related application scenarios.

3. Basic introduction

As the Unsafe source code below shows, the Unsafe class is a singleton and provides the static method getUnsafe to obtain the instance. The call is legal only when the class that calls getUnsafe was loaded by the bootstrap class loader; otherwise a SecurityException is thrown.

public final class Unsafe {
  // Singleton instance
  private static final Unsafe theUnsafe;

  private Unsafe() {
  }

  @CallerSensitive
  public static Unsafe getUnsafe() {
    Class var0 = Reflection.getCallerClass();
    // Legal only when the caller was loaded by the bootstrap class loader
    if (!VM.isSystemDomainLoader(var0.getClassLoader())) {
      throw new SecurityException("Unsafe");
    } else {
      return theUnsafe;
    }
  }
}

So if you want to use this class, how do you get its instance? There are two possible solutions as follows.

First, work within the restriction on getUnsafe: use the Java command-line option -Xbootclasspath/a to append the jar containing class A (the class that calls the Unsafe methods) to the default bootstrap path. A is then loaded by the bootstrap class loader and can safely obtain the Unsafe instance through Unsafe.getUnsafe.

java -Xbootclasspath/a:${path}   // path is the jar containing the class that calls the Unsafe methods

Second, obtain the singleton object theUnsafe through reflection.

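// Requires: import java.lang.reflect.Field; import sun.misc.Unsafe;
private static Unsafe reflectGetUnsafe() {
    try {
      // theUnsafe is the private static singleton field shown above
      Field field = Unsafe.class.getDeclaredField("theUnsafe");
      field.setAccessible(true);
      return (Unsafe) field.get(null);
    } catch (Exception e) {
      // log is assumed to be a logger declared in the enclosing class
      log.error(e.getMessage(), e);
      return null;
    }
}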

4. Function introduction

[Figure: overview of the Unsafe API categories]
As shown in the figure above, the APIs provided by Unsafe can be roughly divided into memory operations, CAS, Class-related, object operations, thread scheduling, system information, memory barriers, array operations, and so on. The sections below describe the related methods and their application scenarios in detail.

4.1 Memory Operations

This part mainly covers allocating, copying, and releasing off-heap memory, and reading or writing values at a given address.

// Allocate memory; comparable to C's malloc
public native long allocateMemory(long bytes);
// Resize a previously allocated block
public native long reallocateMemory(long address, long bytes);
// Free memory
public native void freeMemory(long address);
// Fill the given memory block with a byte value
public native void setMemory(Object o, long offset, long bytes, byte value);
// Copy memory
public native void copyMemory(Object srcBase, long srcOffset, Object destBase, long destOffset, long bytes);
// Read the value at the given offset, ignoring access modifiers. Similar methods: getInt, getDouble, getLong, getChar, etc.
public native Object getObject(Object o, long offset);
// Write a value at the given offset, ignoring access modifiers. Similar methods: putInt, putDouble, putLong, putChar, etc.
public native void putObject(Object o, long offset, Object x);
// Read a byte at the given address (the result is defined only if the address was allocated by allocateMemory)
public native byte getByte(long address);
// Write a byte at the given address (the result is defined only if the address was allocated by allocateMemory)
public native void putByte(long address, byte x);

Usually, the objects we create in Java live in heap memory. The heap is the part of the Java process's memory that the JVM controls, and objects in it follow the JVM's memory management: the garbage collector manages the heap as a whole. Off-heap memory, by contrast, lies outside the JVM's control. In Java, operations on off-heap memory rely on the native methods that Unsafe provides for this purpose.
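The allocate, write, read, and free cycle described above can be sketched as follows. This is a minimal sketch, not the article's code: the class name OffHeapDemo and the method sumOffHeap are made up for illustration, the instance is obtained with the reflection approach from section 3, and it assumes a JDK where sun.misc.Unsafe still exposes these methods (newer JDKs deprecate them and may print warnings).

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical demo class; OffHeapDemo and sumOffHeap are illustrative names. */
final class OffHeapDemo {
    private static final Unsafe UNSAFE = loadUnsafe();

    /** Writes 0..count-1 as longs into off-heap memory, then sums them back. */
    static long sumOffHeap(int count) {
        long bytes = (long) count * Long.BYTES;
        long base = UNSAFE.allocateMemory(bytes);     // like malloc: contents are undefined
        try {
            UNSAFE.setMemory(base, bytes, (byte) 0);  // zero the block explicitly
            for (int i = 0; i < count; i++) {
                UNSAFE.putLong(base + (long) i * Long.BYTES, i);
            }
            long sum = 0;
            for (int i = 0; i < count; i++) {
                sum += UNSAFE.getLong(base + (long) i * Long.BYTES);
            }
            return sum;
        } finally {
            UNSAFE.freeMemory(base);                  // the GC never frees this block for us
        }
    }

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOffHeap(10)); // 45
    }
}
```

The try/finally matters here: nothing else will release the block if an exception escapes between allocateMemory and freeMemory.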

4.1.1 Reasons for using off-heap memory

Improvements to garbage collection pauses. Since off-heap memory is directly managed by the operating system rather than the JVM, when we use off-heap memory, we can keep the size of on-heap memory small. This reduces the impact of collection pauses on applications during GC.

Improve the performance of program I/O operations. Usually, during the I/O communication process, there is a data copy operation from the on-heap memory to the off-heap memory. For temporary data that requires frequent inter-memory data copying and has a short life cycle, it is recommended to store it in the off-heap memory.

4.1.2 Typical applications

DirectByteBuffer is an important class that Java uses to implement off-heap memory. It is typically used as a buffer pool in communication code and is widely used in NIO frameworks such as Netty and MINA. DirectByteBuffer creates, uses, and destroys its off-heap memory through the off-heap memory API provided by Unsafe.
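From application code, a DirectByteBuffer is normally obtained through the public ByteBuffer.allocateDirect factory rather than by touching Unsafe directly. A small sketch, with the class name DirectBufferDemo chosen only for illustration:

```java
import java.nio.ByteBuffer;

/** Hypothetical demo class showing the public entry point to direct buffers. */
final class DirectBufferDemo {
    /** Writes a value into a direct (off-heap) buffer and reads it back. */
    static int roundTrip(int value) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(16); // backed by off-heap memory
        buffer.putInt(value);
        buffer.flip();                                     // switch from writing to reading
        return buffer.getInt();
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        ByteBuffer heap = ByteBuffer.allocate(16);
        System.out.println(direct.isDirect()); // true
        System.out.println(heap.isDirect());   // false
        System.out.println(roundTrip(42));     // 42
    }
}
```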

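The following figure shows the DirectByteBuffer constructor. When a DirectByteBuffer is created, memory is allocated with Unsafe.allocateMemory and initialized with Unsafe.setMemory. A Cleaner object is then built to track garbage collection of the DirectByteBuffer, so that when the DirectByteBuffer is garbage collected, the off-heap memory it allocated is released along with it.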

A small question of my own here: the off-heap memory I allocate is not managed by the JVM, so could it be overwritten by other programs? No. The block comes from the native allocator inside the JVM process's own virtual address space, and the operating system isolates each process's address space, so other programs can neither allocate over it nor write into it.

[Figure: DirectByteBuffer constructor]
I still have another question, though. If I create a large number of DirectByteBuffers, there will be a large number of Cleaner objects. Suppose the JVM suddenly crashes and the cleanup never runs: is the off-heap memory leaked? It is not: when the process exits, the operating system reclaims all of the process's memory, off-heap blocks included.

So how does building a Cleaner object to track garbage collection lead to the off-heap memory being released?

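Cleaner extends PhantomReference, one of Java's four reference types. (As is well known, the referent cannot be obtained through a phantom reference, and an object that is referenced only by phantom references can be reclaimed at any GC.) PhantomReference is usually combined with a ReferenceQueue so that the program can be notified, and can clean up resources, when the referent is garbage collected. As shown in the figure below, when an object referenced by a Cleaner is about to be reclaimed, the garbage collector places the reference on the pending list of the Reference class, where it waits for the Reference-Handler to process it. Reference-Handler is a daemon thread with the highest priority. It loops over the references on the pending list and runs the Cleaner's clean method to do the cleanup.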

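[Figure: Reference-Handler processing the pending list]
So when a DirectByteBuffer is referenced only by its Cleaner (that is, by a phantom reference), it can be reclaimed at any GC. When the DirectByteBuffer instance is reclaimed, the Reference-Handler thread calls the Cleaner's clean method, which releases the off-heap memory through the Deallocator that was passed in when the Cleaner was created.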

[Figure]
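The Cleaner plus Deallocator pattern can be sketched outside the JDK as well. The sketch below swaps in the public java.lang.ref.Cleaner (Java 9 and later) for the internal sun.misc.Cleaner that DirectByteBuffer uses in JDK 8, so it illustrates the idea rather than the JDK's implementation. TrackedOffHeapBuffer, its Deallocator, and the freed flag are hypothetical names added for the demonstration.

```java
import java.lang.ref.Cleaner;
import java.lang.reflect.Field;
import java.util.concurrent.atomic.AtomicBoolean;
import sun.misc.Unsafe;

/** Hypothetical off-heap buffer whose memory is released by a cleaning action. */
final class TrackedOffHeapBuffer implements AutoCloseable {
    private static final Unsafe UNSAFE = loadUnsafe();
    private static final Cleaner CLEANER = Cleaner.create();

    /** Holds only the address, never the buffer, so the buffer can become unreachable. */
    private static final class Deallocator implements Runnable {
        private final long address;
        private final AtomicBoolean freed;

        Deallocator(long address, AtomicBoolean freed) {
            this.address = address;
            this.freed = freed;
        }

        @Override
        public void run() {
            UNSAFE.freeMemory(address);
            freed.set(true); // demo-only flag so callers can observe the release
        }
    }

    private final AtomicBoolean freed = new AtomicBoolean(false);
    private final Cleaner.Cleanable cleanable;

    TrackedOffHeapBuffer(long bytes) {
        long address = UNSAFE.allocateMemory(bytes);
        UNSAFE.setMemory(address, bytes, (byte) 0);
        // The action runs when this buffer becomes phantom reachable, or on close().
        cleanable = CLEANER.register(this, new Deallocator(address, freed));
    }

    boolean isFreed() {
        return freed.get();
    }

    /** Explicit release; the Cleaner runs the action at most once. */
    @Override
    public void close() {
        cleanable.clean();
    }

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }
}
```

Calling close() twice is harmless because the cleaning action is invoked at most once; if close() is never called, the same action runs after the buffer is garbage collected.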

4.2 CAS related

As the annotated source below shows, this part consists of the methods for CAS operations.

/**
  * CAS
  * @param o         the object containing the field to modify
  * @param offset    the offset of the field within the object
  * @param expected  the expected value
  * @param update    the new value
  * @return          true | false
  */
public final native boolean compareAndSwapObject(Object o, long offset, Object expected, Object update);

public final native boolean compareAndSwapInt(Object o, long offset, int expected, int update);

public final native boolean compareAndSwapLong(Object o, long offset, long expected, long update);

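What is CAS? Compare-and-swap is a technique commonly used to implement concurrent algorithms. A CAS operation takes three operands: a memory location, the expected old value, and the new value. When the CAS executes, the value at the memory location is compared with the expected old value. If they match, the processor atomically updates the location to the new value; otherwise it does nothing. On x86, CAS is a single atomic CPU instruction (cmpxchg), so it cannot leave the data in an inconsistent state, and the CAS methods provided by Unsafe (such as compareAndSwapXXX) are ultimately implemented with the cmpxchg instruction.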
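The three operands map directly onto the arguments of compareAndSwapInt. A minimal sketch, assuming a JDK whose sun.misc.Unsafe still offers objectFieldOffset and compareAndSwapInt; the class CasCell is a hypothetical name, not something from the JDK or the article:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical single-int cell updated through Unsafe CAS. */
final class CasCell {
    private static final Unsafe UNSAFE = loadUnsafe();
    private static final long VALUE_OFFSET = valueOffset();

    private volatile int value;

    CasCell(int initial) {
        value = initial;
    }

    int get() {
        return value;
    }

    /** Atomically sets value to update if it currently equals expected. */
    boolean compareAndSet(int expected, int update) {
        // memory location = (this, VALUE_OFFSET); expected old value; new value
        return UNSAFE.compareAndSwapInt(this, VALUE_OFFSET, expected, update);
    }

    private static long valueOffset() {
        try {
            return UNSAFE.objectFieldOffset(CasCell.class.getDeclaredField("value"));
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    public static void main(String[] args) {
        CasCell cell = new CasCell(5);
        System.out.println(cell.compareAndSet(5, 6)); // true: value was 5
        System.out.println(cell.compareAndSet(5, 7)); // false: value is now 6
        System.out.println(cell.get());               // 6
    }
}
```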

4.2.1 Typical applications

CAS is used very widely in the implementation of the java.util.concurrent.atomic classes, Java AQS, ConcurrentHashMap, and so on. As shown in the figure below, in the implementation of AtomicInteger the static field valueOffset is the memory offset of the field value. It is obtained in the static initializer, when AtomicInteger is initialized, by calling Unsafe's objectFieldOffset method. In the thread-safe methods that AtomicInteger provides, valueOffset is used to locate the memory address of value inside the AtomicInteger object, so that value can be updated atomically with CAS.

[Figure: AtomicInteger source]
The following figure shows the memory layout of an AtomicInteger object before and after an increment. Given the object's base address baseAddress="0x110000", adding valueOffset gives the memory address of value, valueAddress="0x11000c". The atomic update is then performed through CAS: if it succeeds the method returns, otherwise it keeps retrying until the update succeeds.

[Figure: memory layout of an AtomicInteger before and after the increment]
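The retry-until-success loop can be written against the public AtomicInteger API. This is a sketch of the pattern, not the JDK's own getAndIncrement source; CasRetryDemo and runWorkers are illustrative names.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo of the read, compute, CAS, retry loop. */
final class CasRetryDemo {
    /** Increments the counter with an explicit CAS loop and returns the previous value. */
    static int getAndIncrement(AtomicInteger counter) {
        for (;;) {
            int current = counter.get();
            if (counter.compareAndSet(current, current + 1)) {
                return current;
            }
            // Another thread changed the value in between: re-read and try again.
        }
    }

    /** Runs several threads that each increment perThread times; returns the final count. */
    static int runWorkers(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    getAndIncrement(counter);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 10_000)); // 40000: no increment is lost
    }
}
```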

4.3 Thread Scheduling

This part includes methods such as thread suspension, recovery, and lock mechanism.

// Unblock a thread
public native void unpark(Object thread);
// Block the current thread
public native void park(boolean isAbsolute, long time);
// Acquire the object's monitor lock (reentrant)
@Deprecated
public native void monitorEnter(Object o);
// Release the object's monitor lock
@Deprecated
public native void monitorExit(Object o);
// Try to acquire the object's monitor lock
@Deprecated
public native boolean tryMonitorEnter(Object o);

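As the source above shows, the park and unpark methods implement thread suspension and resumption. A thread is suspended by calling park; after the call, the thread stays blocked until a condition such as a timeout or an interrupt occurs. unpark ends the suspension of a parked thread so that it resumes running.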

4.3.1 Typical applications

AbstractQueuedSynchronizer, the core class of Java's lock and synchronizer framework, blocks and wakes threads by calling LockSupport.park() and LockSupport.unpark(). LockSupport's park and unpark methods are in turn implemented by calling Unsafe's park and unpark.
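A small sketch of the park and unpark handshake, written against the public LockSupport wrapper rather than Unsafe itself. ParkDemo and parkAndRelease are illustrative names, and the 5-second join timeout is an arbitrary safety margin for the demo.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical demo: one thread parks until another releases it. */
final class ParkDemo {
    /** Starts a worker that parks until released; returns whether it finished. */
    static boolean parkAndRelease() {
        AtomicBoolean released = new AtomicBoolean(false);
        AtomicBoolean finished = new AtomicBoolean(false);

        Thread worker = new Thread(() -> {
            // park may return spuriously, so always re-check the condition in a loop.
            while (!released.get()) {
                LockSupport.park();
            }
            finished.set(true);
        });
        worker.start();

        released.set(true);          // publish the condition first
        LockSupport.unpark(worker);  // then give the worker a permit
        try {
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println(parkAndRelease());
    }
}
```

The order on the releasing side is deliberate: because unpark only hands out a permit, setting the flag first guarantees the worker sees the new state whether it parks before or after the unpark call.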

4.4 Class related

This part mainly provides methods related to the operation of Class and its static fields, including static field memory location, defining classes, defining anonymous classes, checking & ensuring initialization, etc.

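// Returns the memory offset of the given static field; the value is unique and fixed for a given field
public native long staticFieldOffset(Field f);
// Returns the base object through which the given static field is accessed
public native Object staticFieldBase(Field f);
// Tells whether a class still needs to be initialized; typically used before reading a static field of the class (if the class is not initialized, neither are its static fields). Returns false only if a call to ensureClassInitialized would have no effect.
public native boolean shouldBeInitialized(Class<?> c);
// Ensures the given class has been initialized; typically used before reading a static field of the class (if the class is not initialized, neither are its static fields).
public native void ensureClassInitialized(Class<?> c);
// Defines a class, skipping all JVM security checks. By default the ClassLoader and ProtectionDomain instances come from the caller
public native Class<?> defineClass(String name, byte[] b, int off, int len, ClassLoader loader, ProtectionDomain protectionDomain);
// Defines an anonymous class
public native Class<?> defineAnonymousClass(Class<?> hostClass, byte[] data, Object[] cpPatches);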

4.4.1 Typical applications

Starting with Java 8, the JDK uses invokedynamic together with VM Anonymous Classes to implement lambda expressions at the Java language level.

  • invokedynamic: a virtual machine instruction introduced in Java 7 to support dynamic languages running on the JVM. It resolves the method referenced by the call site dynamically at run time and then executes it. The dispatch logic of the invokedynamic instruction is determined by a bootstrap method set by the user.
  • VM Anonymous Class: this can be regarded as a template mechanism. When a program dynamically generates many classes that share the same structure and differ only in a few constants, you can first create a template class containing constant placeholders and then, when defining a concrete class, use the Unsafe.defineAnonymousClass method to fill in the placeholders and produce the concrete anonymous class. The generated anonymous class is not attached to any ClassLoader: as soon as the class has no live instances and there is no strong reference to its Class object, it can be reclaimed by the GC. Compared with anonymous inner classes at the Java language level, a VM Anonymous Class therefore does not need to be loaded by a ClassLoader and is easier to reclaim.

In the implementation of lambda expressions, the invokedynamic instruction calls the bootstrap method to produce the call site. During this process, bytecode is generated dynamically with ASM, Unsafe's defineAnonymousClass method is used to define an anonymous class that implements the corresponding functional interface, the anonymous class is instantiated, and a call site bound to the method handle of the functional method in that anonymous class is returned. The call site can then be used to invoke the logic defined by the lambda expression. The Test class shown in the figure below serves as an example.

[Figure: the Test class]
The decompiled class file of the Test class is shown in Figure 1 below (with the parts irrelevant to this article removed). It shows the instructions of the main method, the bootstrap method (BootstrapMethods) invoked by the invokedynamic instruction, and the static method lambda$main$0 (which implements the string-printing logic of the lambda expression). While the bootstrap method executes, Unsafe.defineAnonymousClass generates an anonymous class that implements the Consumer interface, as shown in Figure 2 below. Its accept method implements the logic of the lambda expression by calling the static method lambda$main$0 in the Test class. Executing the statement consumer.accept("lambda") therefore actually calls the accept method of the anonymous class shown in Figure 2.

[Figures 1 and 2: decompiled Test class and the generated anonymous class]
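Since the original figures are missing, here is a hedged reconstruction of the kind of class the text describes, plus a way to observe the pieces at run time. LambdaDemo, appender, and lambdaBodyMethods are names invented for this sketch. The exact name of the runtime class, and whether it is created through defineAnonymousClass or a newer hidden-class mechanism, depends on the JDK version, so the sketch only prints it.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for the article's Test class. */
final class LambdaDemo {
    /** Returns a lambda; the compiler moves its body into a generated static method. */
    static Consumer<StringBuilder> appender(String text) {
        return sb -> sb.append(text);
    }

    /** Names of the compiler-generated methods that hold lambda bodies in this class. */
    static List<String> lambdaBodyMethods() {
        List<String> names = new ArrayList<>();
        for (Method method : LambdaDemo.class.getDeclaredMethods()) {
            if (method.getName().startsWith("lambda$")) {
                names.add(method.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Consumer<String> consumer = s -> System.out.println(s);
        consumer.accept("lambda");
        // The class implementing Consumer is generated at run time by the bootstrap method.
        System.out.println(consumer.getClass().getName());
        // The lambda bodies themselves live in this class as generated methods.
        System.out.println(lambdaBodyMethods());
    }
}
```

Running javap -p -v on the compiled class shows the same structure the article walks through: the invokedynamic instruction in main, the BootstrapMethods attribute, and the generated lambda body methods.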

4.5 Object Operations

This part mainly covers operations on object fields and an unconventional way of instantiating objects.

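// Returns the offset of the given instance field relative to the start of the object in memory
public native long objectFieldOffset(Field f);
// Reads the value at the given offset of the given object. Similar methods: getInt, getDouble, getLong, getChar, etc.
public native Object getObject(Object o, long offset);
// Writes a value at the given offset of the given object. Similar methods: putInt, putDouble, putLong, putChar, etc.
public native void putObject(Object o, long offset, Object x);
// Reads a reference from the given offset of the object, with volatile load semantics
public native Object getObjectVolatile(Object o, long offset);
// Stores a reference at the given offset of the object, with volatile store semantics
public native void putObjectVolatile(Object o, long offset, Object x);
// Ordered, lazy version of putObjectVolatile; does not guarantee that the store is immediately visible to other threads. Only useful when the field is declared volatile
public native void putOrderedObject(Object o, long offset, Object x);
// Creates an object while bypassing constructors and initialization code
public native Object allocateInstance(Class<?> cls) throws InstantiationException;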

4.5.1 Typical applications

  • Conventional instantiation: the usual way to create an object is, in essence, the new mechanism. One property of new is that when a class provides only constructors with parameters and declares no explicit no-argument constructor, one of those constructors must be used, and the matching number of arguments must be passed to complete the instantiation.

  • Unconventional instantiation: Unsafe provides the allocateInstance method, which creates an instance from the Class object alone, without invoking constructors, initialization code, or JVM security checks. It also bypasses access-modifier checks: even if the constructor is private, the class can still be instantiated this way. Because of this, allocateInstance is used in java.lang.invoke, Objenesis (which provides object creation that bypasses class constructors), and Gson (during deserialization).

As shown in the figure below, during Gson deserialization, if the class has a default constructor, it is invoked through reflection to create the instance. Otherwise the instance is constructed through UnsafeAllocator, which calls Unsafe's allocateInstance, so that deserialization still works when the target class has no default constructor.

[Figure: Gson instance creation through UnsafeAllocator]
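The effect of skipping the constructor is easy to observe. In the sketch below, AllocateInstanceDemo and its nested Guarded class are hypothetical; this is not Gson's UnsafeAllocator, only the same underlying call.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical demo of allocateInstance; names are illustrative. */
final class AllocateInstanceDemo {
    private static final Unsafe UNSAFE = loadUnsafe();

    /** A class whose only constructor is private and always throws. */
    static final class Guarded {
        int value = 7; // field initializers are compiled into the constructor

        private Guarded() {
            throw new IllegalStateException("constructor must not run");
        }
    }

    /** Creates a Guarded instance without running its constructor or field initializers. */
    static Guarded allocate() {
        try {
            return (Guarded) UNSAFE.allocateInstance(Guarded.class);
        } catch (InstantiationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    public static void main(String[] args) {
        Guarded guarded = allocate();
        System.out.println(guarded.value); // 0: the initializer "value = 7" never ran
    }
}
```

This is also why objects created this way need care: every field starts at its zero value, whatever invariants the constructor would normally establish.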

4.6 Array correlation

This part introduces the two array-related methods, arrayBaseOffset and arrayIndexScale. Used together, they locate the position in memory of each element of an array.

// Returns the offset of the first element of the array
public native int arrayBaseOffset(Class<?> arrayClass);
// Returns the size occupied by one element of the array
public native int arrayIndexScale(Class<?> arrayClass);

4.6.1 Typical applications

These two array-related methods have a typical application in AtomicIntegerArray in the java.util.concurrent.atomic package (which provides atomic operations on each element of an int array). As shown in the AtomicIntegerArray source in Figure 1 below, Unsafe's arrayBaseOffset and arrayIndexScale are used to obtain the offset of the first array element, base, and the per-element size factor, scale. All subsequent atomic operations rely on these two values to locate elements in the array. As shown in Figure 2 below, the getAndAdd method obtains the offset of an array element through the checkedByteOffset method and then performs the atomic operation through CAS.

[Figures 1 and 2: AtomicIntegerArray source]
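The element-location arithmetic can be sketched as follows. ArrayOffsetDemo is a hypothetical class that mirrors the structure described above (base, scale, a checked offset computation, a CAS loop), but it is not the JDK source; in particular it multiplies by scale, whereas AtomicIntegerArray in JDK 8 derives a shift from scale.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical demo: locating int[] elements by base offset and index scale. */
final class ArrayOffsetDemo {
    private static final Unsafe UNSAFE = loadUnsafe();
    private static final int BASE = UNSAFE.arrayBaseOffset(int[].class);
    private static final int SCALE = UNSAFE.arrayIndexScale(int[].class);

    static int scale() {
        return SCALE;
    }

    /** Byte offset of element index inside the array object, with a bounds check. */
    private static long checkedByteOffset(int[] array, int index) {
        if (index < 0 || index >= array.length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return BASE + (long) index * SCALE;
    }

    /** Volatile read of one element. */
    static int get(int[] array, int index) {
        return UNSAFE.getIntVolatile(array, checkedByteOffset(array, index));
    }

    /** Atomically adds delta to one element and returns the previous value. */
    static int getAndAdd(int[] array, int index, int delta) {
        long offset = checkedByteOffset(array, index);
        for (;;) {
            int current = UNSAFE.getIntVolatile(array, offset);
            if (UNSAFE.compareAndSwapInt(array, offset, current, current + delta)) {
                return current;
            }
        }
    }

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }
}
```

The bounds check is not optional: Unsafe performs none of its own, so an unchecked index would read or overwrite whatever lies next to the array in memory.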

4.7 Memory Barriers

Introduced in Java 8, these methods define memory barriers (also called memory fences or barrier instructions). A memory barrier is a kind of synchronization barrier instruction: a synchronization point in the CPU's or compiler's memory accesses, such that all reads and writes before the point must complete before the operations after it can begin. They are used to prevent reordering of code.

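// Memory barrier that prevents reordering of loads: loads before the barrier cannot be moved after it, and loads after it cannot be moved before it
public native void loadFence();
// Memory barrier that prevents reordering of stores: stores before the barrier cannot be moved after it, and stores after it cannot be moved before it
public native void storeFence();
// Memory barrier that prevents reordering of both loads and stores
public native void fullFence();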

4.7.1 Typical applications

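Java 8 introduced a new locking mechanism, StampedLock, which can be seen as an improved read-write lock. StampedLock provides an optimistic read lock. This optimistic read is similar to a lock-free operation and never blocks a writer thread from acquiring the write lock, which relieves writer "starvation" in read-heavy workloads. Because the optimistic read lock does not block writers from acquiring the write lock, the data may be inconsistent when a thread loads the shared variables from main memory into its working memory. So when using StampedLock's optimistic read lock, you need to follow the pattern in the use case shown in the figure below to keep the data consistent.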

The use case above is a coordinate Point class with a move method that moves the point and a distanceFromOrigin method that computes the distance from the point to the origin. In distanceFromOrigin, an optimistic read stamp is first obtained through tryOptimisticRead; the coordinates (x, y) are then loaded from main memory; finally the lock state is verified through StampedLock's validate method, to determine whether another thread modified the values in main memory through move while (x, y) was being loaded into the thread's working memory. If validate returns true, (x, y) was not modified and can be used in the subsequent calculation. Otherwise a pessimistic read lock must be acquired, the latest (x, y) loaded from main memory again, and the distance computed from that. The lock-state verification is critical: it determines whether the lock state has changed, and therefore whether the values previously copied into the thread's working memory are inconsistent with main memory.
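The figure with the use case is missing, so the sketch below reconstructs it along the lines of the Point example in the StampedLock API documentation. The step numbers in the comments are an assumption about how the original figure was labeled.

```java
import java.util.concurrent.locks.StampedLock;

/** Reconstruction of the Point use case; step labels are assumed, not from the source. */
final class Point {
    private double x;
    private double y;
    private final StampedLock lock = new StampedLock();

    /** Moves the point under the exclusive write lock. */
    void move(double deltaX, double deltaY) {
        long stamp = lock.writeLock();
        try {
            x += deltaX;
            y += deltaY;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    /** Reads x and y optimistically, falling back to a read lock if a writer intervened. */
    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead();  // step 1: optimistic stamp, never blocks
        double currentX = x;                    // step 2: load the fields into locals
        double currentY = y;
        if (!lock.validate(stamp)) {            // step 3: did a write happen since step 1?
            stamp = lock.readLock();            // step 4: fall back to a pessimistic read lock
            try {
                currentX = x;
                currentY = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.sqrt(currentX * currentX + currentY * currentY);
    }

    public static void main(String[] args) {
        Point point = new Point();
        point.move(3.0, 4.0);
        System.out.println(point.distanceFromOrigin()); // 5.0
    }
}
```

Copying x and y into locals before calling validate is the essential part of the pattern: only the values read between the stamp and the successful validation are known to be consistent.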

The following figure shows the source of the StampedLock.validate method. It verifies the lock state with bit operations and comparisons between the stamp and the relevant constants. Before the verification logic, a load barrier is inserted through Unsafe's loadFence method. This prevents step ② in the use case above from being reordered with the lock-state check inside StampedLock.validate, which would make the verification inaccurate.

[Figure: StampedLock.validate source]

4.8 System related

This section contains two methods for obtaining information about the system.

// Returns the size of a native pointer: 4 (32-bit systems) or 8 (64-bit systems).
public native int addressSize();
// Returns the size of a memory page; the value is a power of two.
public native int pageSize();

4.8.1 Typical applications

The code snippet shown in the figure below is a static method in the utility class Bits under java.nio that calculates the number of memory pages needed for a requested amount of memory. It relies on Unsafe's pageSize method to obtain the system's memory page size for the subsequent calculation.

[Figure: page count calculation in java.nio.Bits]
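The calculation amounts to a ceiling division by the page size. The sketch below is an illustration of that idea, not the java.nio.Bits source; PageMath and its methods are hypothetical names.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical helper: how many memory pages a request of a given size needs. */
final class PageMath {
    private static final Unsafe UNSAFE = loadUnsafe();

    /** Size of one memory page in bytes, as reported by Unsafe. */
    static int pageSize() {
        return UNSAFE.pageSize();
    }

    /** Number of pages needed to hold size bytes, rounding up. */
    static long pageCount(long size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        long page = pageSize();
        return (size + page - 1) / page;
    }

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pageSize());
        System.out.println(pageCount(pageSize() + 1L)); // 2: one byte over a page needs two pages
    }
}
```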

5. Epilogue

This article provides a basic introduction to the usage and application scenarios of sun.misc.Unsafe in Java. We can see that Unsafe provides many convenient and interesting API methods. Even so, because Unsafe contains a large number of methods for autonomously operating memory, if used improperly, it will bring many uncontrollable disasters to the program. Therefore, we need to be cautious about its use.

Complete interpretation of off-heap memory in JVM source code analysis


Origin blog.csdn.net/qq_21383435/article/details/127339791