Java Magic. Part 4: sun.misc.Unsafe

Java is a safe programming language that prevents programmers from making a lot of stupid mistakes, most of which relate to memory management. But there is a way to make such mistakes intentionally: the Unsafe class.

 

This article is a quick overview of the sun.misc.Unsafe public API and a few interesting cases of its usage.

 

Unsafe instantiation

 

Before using it, we need to obtain an instance of Unsafe. There is no simple way to do it like Unsafe unsafe = new Unsafe(), because the Unsafe class has a private constructor. It also has a static getUnsafe() method, but if you naively call Unsafe.getUnsafe() you will probably get a SecurityException. This method is available only to trusted code.

 

public static Unsafe getUnsafe() {
    Class cc = sun.reflect.Reflection.getCallerClass(2);
    if (cc.getClassLoader() != null)
        throw new SecurityException("Unsafe");
    return theUnsafe;
}
 

This is how Java decides whether code is trusted: it simply checks that the calling class was loaded by the primary (bootstrap) class loader, for which getClassLoader() returns null.

 

We can make our code "trusted". Use the -Xbootclasspath option when running your program and specify the path to the system classes plus your own class that uses Unsafe.

 

java -Xbootclasspath:/usr/jdk1.7.0/jre/lib/rt.jar:. com.mishadoff.magic.UnsafeClient

But it's too hard.


 

The Unsafe class holds its own instance in a field called theUnsafe, which is marked as private. We can steal that field via Java reflection.

 

Field f = Unsafe.class.getDeclaredField("theUnsafe");
f.setAccessible(true);
Unsafe unsafe = (Unsafe) f.get(null);
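Later snippets call a getUnsafe() helper that is never spelled out. A minimal sketch of such a helper, assuming a JDK where sun.misc.Unsafe is still reachable through reflection; the class name UnsafeHolder is made up for illustration and is not part of any library:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical helper class; the name is an assumption, not from the article.
final class UnsafeHolder {
    private static final Unsafe UNSAFE = load();

    private UnsafeHolder() {}

    // Reads the private static field "theUnsafe" once and caches the result.
    private static Unsafe load() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null); // static field, so the receiver is null
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    static Unsafe getUnsafe() {
        return UNSAFE;
    }
}
```

Wrapping the checked reflection exceptions keeps call sites short, which is why snippets can write getUnsafe() without a try block.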

 

Note: Ignore your IDE. For example, Eclipse shows the error "Access restriction..." but if you run the code, everything works just fine. If the error is annoying, downgrade errors on Unsafe usage to warnings in:

 

Preferences -> Java -> Compiler -> Errors/Warnings ->
Deprecated and restricted API -> Forbidden reference -> Warning

Unsafe API

The sun.misc.Unsafe class consists of 105 methods. They fall into a few groups of important methods for manipulating various entities. Here are some of them:

Info. Just returns some low-level memory information.

  • addressSize
  • pageSize

 

Objects. Provides methods for manipulating objects and their fields.

  • allocateInstance
  • objectFieldOffset 

Classes. Provides methods for manipulating classes and static fields.

  • staticFieldOffset
  • defineClass
  • defineAnonymousClass
  • ensureClassInitialized

Arrays. Array manipulation.

  • arrayBaseOffset
  • arrayIndexScale

Synchronization. Low-level primitives for synchronization.

  • monitorEnter
  • tryMonitorEnter
  • monitorExit
  • compareAndSwapInt
  • putOrderedInt

Memory. Direct memory access methods.

  • allocateMemory
  • copyMemory
  • freeMemory
  • getAddress
  • getInt
  • putInt

Interesting use cases

Avoid initialization

 

The allocateInstance method can be useful when you need to skip the object initialization phase, bypass security checks in a constructor, or create an instance of a class that has no public constructor. Consider the following class:

class A {
    private long a; // not initialized value

    public A() {
        this.a = 1; // initialization
    }

    public long a() { return this.a; }
}

 

 

Instantiating it using the constructor, reflection, and Unsafe gives different results.

A o1 = new A(); // constructor
o1.a(); // returns 1

A o2 = A.class.newInstance(); // reflection
o2.a(); // returns 1

A o3 = (A) unsafe.allocateInstance(A.class); // unsafe
o3.a(); // returns 0
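A self-contained version of the comparison, as a sketch. The wrapper class AllocateDemo and the method allocateWithoutConstructor are names invented here; the nested class A mirrors the one above:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class; A mirrors the class from the article.
final class AllocateDemo {
    static class A {
        private long a;            // default value is 0
        public A() { this.a = 1; } // the constructor sets it to 1
        public long a() { return this.a; }
    }

    static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates an A without running its constructor or field initializers.
    static A allocateWithoutConstructor() {
        try {
            return (A) unsafe().allocateInstance(A.class);
        } catch (InstantiationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new A().a());                      // 1
        System.out.println(allocateWithoutConstructor().a()); // 0
    }
}
```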

 

 

Just think what happens to all your Singletons.

 

Memory corruption

This one is familiar to every C programmer. By the way, it is a common technique for bypassing security.

Consider a simple class that checks access rules:

class Guard {
    private int ACCESS_ALLOWED = 1;

    public boolean giveAccess() {
        return 42 == ACCESS_ALLOWED;
    }
}

 

The client code is very secure and calls giveAccess() to check access rules. Unfortunately for clients, it always returns false. Only privileged users can somehow change the value of the ACCESS_ALLOWED constant and get access.

 

In fact, that's not true. Here is code that demonstrates it:

Guard guard = new Guard();
guard.giveAccess();   // false, no access

// bypass
Unsafe unsafe = getUnsafe();
Field f = guard.getClass().getDeclaredField("ACCESS_ALLOWED");
unsafe.putInt(guard, unsafe.objectFieldOffset(f), 42); // memory corruption

guard.giveAccess(); // true, access granted 
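The bypass can be packaged as a runnable sketch. GuardDemo and grantAccess are hypothetical names; Guard is copied from above. This assumes a JDK where objectFieldOffset and putInt on sun.misc.Unsafe are still usable (recent releases deprecate them and may print warnings):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class; Guard mirrors the class from the article.
final class GuardDemo {
    static class Guard {
        private int ACCESS_ALLOWED = 1;

        public boolean giveAccess() {
            return 42 == ACCESS_ALLOWED;
        }
    }

    static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Overwrites the private field in place; setAccessible is never called on it.
    static void grantAccess(Guard guard) {
        try {
            Unsafe u = unsafe();
            Field f = Guard.class.getDeclaredField("ACCESS_ALLOWED");
            u.putInt(guard, u.objectFieldOffset(f), 42);
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Guard guard = new Guard();
        System.out.println(guard.giveAccess()); // false
        grantAccess(guard);
        System.out.println(guard.giveAccess()); // true
    }
}
```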

 

Now all clients will get unlimited access.


 

Actually, the same functionality can be achieved with reflection. What is interesting, though, is that we can modify any object, even ones that we hold no reference to.

 

For example, suppose there is another Guard object in memory located right after the current guard object. We can modify its ACCESS_ALLOWED field with the following code:

unsafe.putInt(guard, 16 + unsafe.objectFieldOffset(f), 42); // memory corruption

 

Note that we didn't use any reference to this object. 16 is the size of a Guard object on a 32-bit architecture. We can calculate it manually or use the sizeOf method, which we define... right now.

 

sizeOf

Using the objectFieldOffset method we can implement a C-style sizeof function. This implementation returns the shallow size of an object:

public static long sizeOf(Object o) {
    Unsafe u = getUnsafe();
    HashSet<Field> fields = new HashSet<Field>();
    Class c = o.getClass();
    while (c != Object.class) {
        for (Field f : c.getDeclaredFields()) {
            if ((f.getModifiers() & Modifier.STATIC) == 0) {
                fields.add(f);
            }
        }
        c = c.getSuperclass();
    }

    // get offset
    long maxSize = 0;
    for (Field f : fields) {
        long offset = u.objectFieldOffset(f);
        if (offset > maxSize) {
            maxSize = offset;
        }
    }

    return ((maxSize/8) + 1) * 8;   // padding
}

  

 

The algorithm is as follows: go through all non-static fields, including those of all superclasses, get the offset of each field, find the maximum, and add padding. I probably missed something, but the idea is clear.

 

A much simpler sizeOf can be achieved if we just read the size value from the class struct for this object, which is located at offset 12 in a 32-bit JVM 1.7.

public static long sizeOf(Object object){
    return getUnsafe().getAddress(
        normalize(getUnsafe().getInt(object, 4L)) + 12L);
}

 

normalize is a method for casting a signed int to an unsigned long, for correct address usage.

private static long normalize(int value) {
    if(value >= 0) return value;
    return (~0L >>> 32) & value;
}
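The same conversion can be written as a single mask, and on Java 8 or later it matches Integer.toUnsignedLong. A small sketch (NormalizeDemo is a made-up class name) that shows what happens to a negative int:

```java
// Hypothetical demo class; normalize() keeps the article's name and contract.
final class NormalizeDemo {
    // Keeps the low 32 bits, so a negative int becomes a positive long.
    static long normalize(int value) {
        return value & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        System.out.println(normalize(-1)); // 4294967295
        System.out.println(normalize(42)); // 42
        System.out.println(normalize(-1) == Integer.toUnsignedLong(-1)); // true
    }
}
```

Without this step, a 32-bit address with the top bit set would be sign-extended into a negative long and point at the wrong location.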

 

Awesome, this method returns the same result as our previous sizeof function.

 

In fact, for a good, safe and accurate sizeof function it is better to use the java.lang.instrument package, but it requires specifying the agent option in your JVM.

 

 

Shallow copy

 

Having implemented a way to calculate shallow object size, we can simply add a function that copies objects. The standard solution requires modifying your code with Cloneable, or you can implement a custom copy function in your object, but it won't be a general-purpose function.

 

Shallow copy:

static Object shallowCopy(Object obj) {
    long size = sizeOf(obj);
    long start = toAddress(obj);
    long address = getUnsafe().allocateMemory(size);
    getUnsafe().copyMemory(start, address, size);
    return fromAddress(address);
}

 

toAddress and fromAddress convert an object to its address in memory and vice versa.

static long toAddress(Object obj) {
    Object[] array = new Object[] {obj};
    long baseOffset = getUnsafe().arrayBaseOffset(Object[].class);
    return normalize(getUnsafe().getInt(array, baseOffset));
}

static Object fromAddress(long address) {
    Object[] array = new Object[] {null};
    long baseOffset = getUnsafe().arrayBaseOffset(Object[].class);
    getUnsafe().putLong(array, baseOffset, address);
    return array[0];
}

 

This copy function can be used to copy an object of any type; its size is calculated dynamically. Note that after copying you need to cast the object to the specific type.

 

 

Hide Password

 

One more interesting use of direct memory access in Unsafe is removing unwanted objects from memory.

 

Most APIs for retrieving a user's password have a signature that uses byte[] or char[]. Why arrays?

 

It is purely for security reasons, because we can nullify the array elements once we no longer need them. If we retrieve the password as a String, it is stored as an object in memory, and nullifying that string only drops the reference. The object stays in memory until the GC decides to perform cleanup.
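For comparison, wiping a char[] needs no tricks at all. A minimal sketch, with PasswordWipe and wipe as illustrative names:

```java
import java.util.Arrays;

// Hypothetical helper showing why char[] is preferred for secrets.
final class PasswordWipe {
    // Overwrites every element so the secret no longer sits in this array.
    static void wipe(char[] secret) {
        Arrays.fill(secret, '\0');
    }

    public static void main(String[] args) {
        char[] password = {'s', '3', 'c', 'r', 'e', 't'};
        try {
            System.out.println("password length: " + password.length);
        } finally {
            wipe(password); // runs even if the code above throws
        }
        System.out.println(password[0] == '\0'); // true
    }
}
```

A String offers no equivalent, because its contents cannot be changed through the public API.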

 

This trick creates a fake String object of the same size and overwrites the original one in memory:

String password = new String("l00k@myHor$e");
String fake = new String(password.replaceAll(".", "?"));
System.out.println(password); // l00k@myHor$e
System.out.println(fake); // ????????????

getUnsafe().copyMemory(
          fake, 0L, null, toAddress(password), sizeOf(password));

System.out.println(password); // ????????????
System.out.println(fake); // ????????????

 

Feel safe.


 

UPDATE: That way is not really safe. For real safety we need to nullify the backing char array via reflection:

Field stringValue = String.class.getDeclaredField("value");
stringValue.setAccessible(true);
char[] mem = (char[]) stringValue.get(password);
for (int i=0; i < mem.length; i++) {
  mem[i] = '?';
}

 

Thanks to Peter Verhas for pointing that out.

 

 

Multiple Inheritance

 

There is no multiple inheritance in Java.

 

Correct, except that we can cast any type to any other type, if we want.

long intClassAddress = normalize(getUnsafe().getInt(new Integer(0), 4L));
long strClassAddress = normalize(getUnsafe().getInt("", 4L));
getUnsafe().putAddress(intClassAddress + 36, strClassAddress);

 

This snippet adds the String class to Integer's superclasses, so we can cast without a runtime exception.

(String) (Object) (new Integer(666))

 

One catch: we must first cast to Object, to cheat the compiler.

 

Dynamic classes

 

We can create classes at runtime, for example from a compiled .class file. To do that, read the class contents into a byte array and pass it to the defineClass method.

byte[] classContents = getClassContent();
Class c = getUnsafe().defineClass(
              null, classContents, 0, classContents.length);
c.getMethod("a").invoke(c.newInstance(), null); // 1

And reading from the file is defined as:

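private static byte[] getClassContent() throws Exception {
    File f = new File("/home/mishadoff/tmp/A.class");
    byte[] content = new byte[(int) f.length()];
    // try-with-resources closes the stream even if reading fails
    try (FileInputStream input = new FileInputStream(f)) {
        int off = 0;
        // read() may return fewer bytes than requested, so loop until full
        while (off < content.length) {
            int n = input.read(content, off, content.length - off);
            if (n < 0) break; // unexpected end of file
            off += n;
        }
    }
    return content;
}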

This can be useful when you must create classes dynamically, for example proxies or aspects for existing code.

 

Throw an Exception

Don't like checked exceptions? Not a problem.


getUnsafe().throwException(new IOException());

This method throws a checked exception, but your code is not forced to catch or rethrow it, just like a runtime exception.
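A sketch of what this looks like from the caller's side. SneakyThrowDemo, readConfig and capture are invented names, and the example assumes throwException is still present on sun.misc.Unsafe in the JDK at hand:

```java
import java.io.IOException;
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class for throwing a checked exception undeclared.
final class SneakyThrowDemo {
    static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Note: no "throws IOException" clause, yet an IOException escapes.
    static void readConfig() {
        unsafe().throwException(new IOException("disk on fire"));
    }

    // Returns whatever escaped readConfig(), or null if nothing did.
    static Throwable capture() {
        try {
            readConfig();
            return null;
        } catch (Throwable t) {
            // "catch (IOException e)" would not compile here, because the
            // compiler believes readConfig() cannot throw it.
            return t;
        }
    }

    public static void main(String[] args) {
        System.out.println(capture()); // java.io.IOException: disk on fire
    }
}
```

The catch has to name a broader type, which is the price of hiding the exception from the compiler.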

Big Arrays

As you know, Integer.MAX_VALUE is the maximum size of a Java array. Using direct memory allocation we can create arrays whose size is limited only by available memory.

Here is the SuperArray implementation:

class SuperArray {
    private final static int BYTE = 1;

    private long size;
    private long address;

    public SuperArray(long size) {
        this.size = size;
        address = getUnsafe().allocateMemory(size * BYTE);
    }

    public void set(long i, byte value) {
        getUnsafe().putByte(address + i * BYTE, value);
    }

    public int get(long idx) {
        return getUnsafe().getByte(address + idx * BYTE);
    }

    public long size() {
        return size;
    }
}

And sample usage:

long SUPER_SIZE = (long)Integer.MAX_VALUE * 2;
SuperArray array = new SuperArray(SUPER_SIZE);
System.out.println("Array size:" + array.size()); // 4294967294
int sum = 0;
for (int i = 0; i < 100; i++) {
    array.set((long)Integer.MAX_VALUE + i, (byte)3);
    sum += array.get((long)Integer.MAX_VALUE + i);
}
System.out.println("Sum of 100 elements:" + sum);  // 300

In fact, this technique uses off-heap memory and is partially available in the java.nio package.

Memory allocated this way is not located in the heap and is not under GC management, so take care of it using Unsafe.freeMemory(). It also does not perform any boundary checks, so any illegal access may crash the JVM.
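One way to address both concerns is to add explicit bounds checks and release the memory in close(). The sketch below is a hypothetical variant (SafeSuperArray and demo are made-up names) and uses a small size so that it runs anywhere:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical variant of SuperArray that frees its memory and checks bounds.
final class SafeSuperArray implements AutoCloseable {
    private static final Unsafe UNSAFE = loadUnsafe();

    private final long size;
    private long address; // 0 once the memory has been released

    SafeSuperArray(long size) {
        this.size = size;
        this.address = UNSAFE.allocateMemory(size);
        UNSAFE.setMemory(address, size, (byte) 0); // allocateMemory does not zero
    }

    void set(long i, byte value) { UNSAFE.putByte(checked(i), value); }

    byte get(long i) { return UNSAFE.getByte(checked(i)); }

    long size() { return size; }

    // Rejects out-of-range indexes and any use after close().
    private long checked(long i) {
        if (address == 0) throw new IllegalStateException("already closed");
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
        return address + i;
    }

    @Override
    public void close() {
        if (address != 0) {
            UNSAFE.freeMemory(address);
            address = 0;
        }
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes 3 into 100 slots and sums them back; memory is freed on exit.
    static int demo() {
        int sum = 0;
        try (SafeSuperArray array = new SafeSuperArray(1024)) {
            for (int i = 0; i < 100; i++) {
                array.set(i, (byte) 3);
                sum += array.get(i);
            }
        }
        return sum;
    }
}
```

The checks cost a few comparisons per access, which is the trade-off against a possible JVM crash.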

It can be useful for math computations, where code operates on large arrays of data. It can also be interesting for real-time programmers, for whom GC pauses on large arrays can break timing limits.

 

 

Concurrency

 

And a few words about concurrency with Unsafe. The compareAndSwap methods are atomic and can be used to implement high-performance lock-free data structures.

For example, consider the problem of incrementing a value in a shared object from many threads.

 

First we define a simple Counter interface:

interface Counter {
    void increment();
    long getCounter();
}

Then we define a worker thread, CounterClient, that uses Counter:

class CounterClient implements Runnable {
    private Counter c;
    private int num;

    public CounterClient(Counter c, int num) {
        this.c = c;
        this.num = num;
    }

    @Override
    public void run() {
        for (int i = 0; i < num; i++) {
            c.increment();
        }
    }
}

And this is the testing code:

int NUM_OF_THREADS = 1000;
int NUM_OF_INCREMENTS = 100000;
ExecutorService service = Executors.newFixedThreadPool(NUM_OF_THREADS);
Counter counter = ... // creating instance of specific counter
long before = System.currentTimeMillis();
for (int i = 0; i < NUM_OF_THREADS; i++) {
    service.submit(new CounterClient(counter, NUM_OF_INCREMENTS));
}
service.shutdown();
service.awaitTermination(1, TimeUnit.MINUTES);
long after = System.currentTimeMillis();
System.out.println("Counter result: " + counter.getCounter());
System.out.println("Time passed in ms:" + (after - before));

The first implementation is an unsynchronized counter:

class StupidCounter implements Counter {
    private long counter = 0;

    @Override
    public void increment() {
        counter++;
    }

    @Override
    public long getCounter() {
        return counter;
    }
}

 

It works fast, but there is no thread coordination at all, so the result is inaccurate. Second attempt: add the easiest Java-style synchronization:

class SyncCounter implements Counter {
    private long counter = 0;

    @Override
    public synchronized void increment() {
        counter++;
    }

    @Override
    public long getCounter() {
        return counter;
    }
}

 

Radical synchronization always works. But the timings are awful. Let's try ReentrantReadWriteLock:

class LockCounter implements Counter {
    private long counter = 0;
    private WriteLock lock = new ReentrantReadWriteLock().writeLock();

    @Override
    public void increment() {
        lock.lock();
        try {
            counter++;
        } finally {
            lock.unlock(); // release even if the body throws
        }
    }

    @Override
    public long getCounter() {
        return counter;
    }
}

 

Still correct, and timings are better. What about atomics?


class AtomicCounter implements Counter {
    AtomicLong counter = new AtomicLong(0);

    @Override
    public void increment() {
        counter.incrementAndGet();
    }

    @Override
    public long getCounter() {
        return counter.get();
    }
}

 

AtomicCounter is even better. Finally, let's try the Unsafe primitive compareAndSwapLong to see whether using it is really a privilege.

class CASCounter implements Counter {
    private volatile long counter = 0;
    private Unsafe unsafe;
    private long offset;

    public CASCounter() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        this.unsafe = (Unsafe) f.get(null);
        this.offset = unsafe.objectFieldOffset(CASCounter.class.getDeclaredField("counter"));
    }

    @Override
    public void increment() {
        long before = counter;
        while (!unsafe.compareAndSwapLong(this, offset, before, before + 1)) {
            before = counter;
        }
    }

    @Override
    public long getCounter() {
        return counter;
    }
}

 

Hmm, seems equal to atomics. Maybe atomics use Unsafe? (YES)

 

In fact this example is simple enough, but it shows some of the power of Unsafe.

As I said, the CAS primitive can be used to implement lock-free data structures. The intuition behind this is simple:

 

  • Have some state
  • Create a copy of it
  • Modify it
  • Perform CAS
  • Repeat if it fails
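The steps above can be sketched with a tiny lock-free stack. Note the swap: this sketch uses AtomicReference.compareAndSet from java.util.concurrent.atomic rather than raw Unsafe calls, and LockFreeStack is a hypothetical class written for illustration:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lock-free stack following the snapshot/modify/CAS/retry recipe.
final class LockFreeStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    // "Have some state": the current top of the stack (null when empty).
    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    void push(T value) {
        while (true) {
            Node<T> current = top.get();                  // take a snapshot
            Node<T> updated = new Node<>(value, current); // build the modified state
            if (top.compareAndSet(current, updated)) {    // perform CAS
                return;
            }
            // another thread changed top first: repeat with a fresh snapshot
        }
    }

    // Returns null when the stack is empty.
    T pop() {
        while (true) {
            Node<T> current = top.get();
            if (current == null) {
                return null;
            }
            if (top.compareAndSet(current, current.next)) {
                return current.value;
            }
        }
    }
}
```

Nodes are immutable, so a failed CAS never leaves a half-updated structure behind; the loser simply retries.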

 

Actually, in reality it is harder than you might imagine. There are a lot of problems, like the ABA problem, instruction reordering, etc.

If you are really interested, you can refer to the awesome presentation about a lock-free HashMap.

 

UPDATE: Added the volatile keyword to the counter variable to avoid the risk of an infinite loop.

 

Note: when tested on a 64-bit JVM, the CAS counter was not as fast as the synchronized one, possibly because the text above used a 32-bit JVM.

 

 

English original: http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/


Reposted from weigang-gao.iteye.com/blog/2324538