After learning the ASM tree API, you will no longer be afraid of hooks

Background

After reading this chapter, you will be able to use ASM's tree API to hook anonymous threads, and you will pick up ASM-related background knowledge along the way. Many people are familiar with ASM instrumentation, but most of them stop at the core API. Many instrumentation libraries on the market are written with the tree API, because its simple and clear style is increasingly the choice of open source libraries. (ASM provides two sets of APIs: core and tree.)

Introduction to ASM

ASM is a tool for reading, modifying, and generating bytecode. In daily development we pull in many libraries, or the project itself grows large. When we want to change one behavior everywhere (for example, to address privacy compliance issues), making the change by hand across the whole codebase is error-prone. A tool that can edit the compiled class files makes this follow-up work much easier.

This chapter focuses on the tree API, and "ASM" below refers to tree API operations. For an introduction to the core API, see the author's article Spider.

Class file

From a binary point of view, a class file is divided into several parts: the magic number, the minor and major version, the constant pool, the access flags, this class, the super class, the interfaces, the fields, the methods, and the attributes. ASM abstracts this structure further: in the tree API, a class file is represented by the ClassNode class.

A ClassNode describes a class through the following fields: version (class file version), access (access flags, such as public), name (class name), signature (generic signature), superName (parent class), interfaces (implemented interfaces), fields (declared fields), and methods (declared methods). So if we want to modify a class, we modify the corresponding ClassNode.

Fields

Fields are also a very important part of a class, and the bytecode defines a dedicated structure for them.

ASM abstracts a field as a FieldNode.

A FieldNode describes a field through the following: access (access flags, as for a class, such as private), name (field name), desc (type descriptor), signature (generic signature), and value (the field's initial constant value, if any).

Methods

Compared with fields, the method structure is more complex. A field is a single entry, whereas a method may consist of many instructions, and executing a method also relies on the local variable table and the operand stack. In ASM, a method is abstracted as: method = method header + method body.

  • Method header: identifies the basic attributes of a method, including access (access flags), name (method name), desc (method descriptor), signature (generic signature), and exceptions (exceptions the method declares it can throw).

  • Method body: the collection of the method's instructions and related data, mainly instructions (the method's instruction list), tryCatchBlocks (the try/catch blocks), maxStack (maximum depth of the operand stack), and maxLocals (maximum size of the local variable table).

The instructions field is an InsnList object, ASM's abstraction of the method's instruction list, which is explained next.
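The desc field of a method header holds a JVM method descriptor, and it helps to see a few concrete ones. The following is a minimal sketch that uses only the JDK's java.lang.invoke.MethodType (not ASM) to print descriptors for three Thread constructor shapes. The class name DescriptorDemo and the helper descriptorOf are made up for this illustration.

```java
import java.lang.invoke.MethodType;

// Illustration only: DescriptorDemo and descriptorOf are hypothetical names.
class DescriptorDemo {
    // Builds the JVM method descriptor for the given return and parameter types.
    static String descriptorOf(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Constructors return void at the bytecode level, hence the trailing "V".
        System.out.println(descriptorOf(void.class));                               // ()V
        System.out.println(descriptorOf(void.class, Runnable.class));               // (Ljava/lang/Runnable;)V
        System.out.println(descriptorOf(void.class, Runnable.class, String.class)); // (Ljava/lang/Runnable;Ljava/lang/String;)V
    }
}
```

Object types appear as L followed by the internal name and a semicolon, which is the same form the desc strings take later in this article.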

InsnList

public class InsnList implements Iterable<AbstractInsnNode> {
    private int size;
    private AbstractInsnNode firstInsn;
    private AbstractInsnNode lastInsn;
    AbstractInsnNode[] cache;
    ...

The main fields are firstInsn and lastInsn, which represent the first and last instructions of the method's instruction list. Each instruction is abstracted as a subclass of AbstractInsnNode, which defines the most basic information of an instruction. AbstractInsnNode has many subclasses.

Here we look at the one we use most, MethodInsnNode.

public class MethodInsnNode extends AbstractInsnNode {

  /**
   * The internal name of the method's owner class (see {@link
   * org.objectweb.asm.Type#getInternalName()}).
   *
   * <p>For methods of arrays, e.g., {@code clone()}, the array type descriptor.
   */
  public String owner;

  /** The method's name. */
  public String name;

  /** The method's descriptor (see {@link org.objectweb.asm.Type}). */
  public String desc;

  /** Whether the method's owner class is an interface. */
  public boolean itf;

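This is the most basic definition of an ordinary method instruction: owner (the class that owns the method being invoked), name (method name), desc (method descriptor), and so on. The other instruction nodes have a similar structure, and this one is the focus of the hands-on part below.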

Signature

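Finally, let's introduce this curious thing. While reading the descriptions above, you may have wondered: I described this as the generic signature, so how does it differ from desc (the method descriptor)? And it appears not only on methods but also on fields and classes.

The Signature attribute was added to the class file specification after the release of JDK 1.5. It is an optional, fixed-length attribute that can appear in the attribute tables of class, field, and method structures. What happened in JDK 1.5? Support for generics. Versions before 1.5 still had to be supported, so the Java designers settled on a compromise, type erasure: generic information (type variables, parameterized types) is erased after compilation to stay compatible with the older versions. That led to another problem: the erased generic information is sometimes exactly what we need. So the Signature attribute was introduced to store it, so that it can be retrieved later, for example through reflection at runtime. In practice, this value is null for most methods and fields. Generic information is stored in Signature only when a generic declaration is involved.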
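The difference between the erased type and the generic information kept in Signature can be observed with plain reflection. The following is a small sketch using only the JDK. The classes SignatureDemo and Holder and their fields are hypothetical and exist only for this illustration.

```java
import java.lang.reflect.Field;
import java.util.List;

// Illustration only: SignatureDemo and Holder are hypothetical names.
class SignatureDemo {
    static class Holder {
        List<String> names; // generic declaration, so generic information is recorded
        String plain;       // no generics involved
    }

    private static Field field(String name) {
        try {
            return Holder.class.getDeclaredField(name);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The erased type, corresponding to what the field descriptor (desc) carries.
    static String erasedType(String name) {
        return field(name).getType().getName();
    }

    // The generic type, recovered by reflection from the stored generic information.
    static String genericType(String name) {
        return field(name).getGenericType().getTypeName();
    }

    public static void main(String[] args) {
        System.out.println(erasedType("names"));  // java.util.List
        System.out.println(genericType("names")); // java.util.List<java.lang.String>
        System.out.println(genericType("plain")); // java.lang.String
    }
}
```

For the non-generic field both views agree, which matches the observation that the signature is null for most fields and methods.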

Hands-on

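With the theory in place, it's time to practice, so that this is not just talk. Take thread optimization as an example. In a work project, especially an old one, there may be many non-standard thread creations, such as calling new Thread() directly. Threads created this way get a default name, and here we call them "anonymous threads". That does not mean the thread has no name, but that its name looks like "Thread-1" and carries no useful information. This gets in the way of later thread maintenance, and over time many such anonymous threads may accumulate, possibly leading to an OOM crash on thread creation. So our goal is to give these threads a "name": the name of the caller.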
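To make the problem concrete, here is a minimal sketch, using only the JDK, that compares the default name of a thread with the name of one created through the Thread(String) constructor. The class ThreadNameDemo, its helpers, and the caller name com/example/Caller are made up for this illustration.

```java
// Illustration only: ThreadNameDemo and "com/example/Caller" are hypothetical names.
class ThreadNameDemo {
    // A thread created without a name gets a generated one such as "Thread-0".
    static String defaultName() {
        return new Thread().getName();
    }

    // A thread created through Thread(String) carries the given name.
    static String nameOf(String callerName) {
        return new Thread(callerName).getName();
    }

    public static void main(String[] args) {
        System.out.println(defaultName());                // "Thread-" followed by a number
        System.out.println(nameOf("com/example/Caller")); // com/example/Caller
    }
}
```

The number in the default name depends on how many unnamed threads were created earlier, so it says nothing about where the thread came from.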

Fixing "anonymous" threads

To do that, we need to understand how Thread is constructed. Thread has many constructors. Here are a couple of examples.

public Thread(String name) {
    init(null, null, name, 0);
}
public Thread(ThreadGroup group, String name) {
    init(group, null, name, 0);
}

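As you can see, several Thread constructors take name, the thread's name, as their last parameter. So our hook point is this: if, wherever a Thread is constructed, we can make the call go to a constructor that takes a name, we achieve our goal. Now let's look at the bytecode of a plain new Thread().

So how do we turn new Thread() into new Thread(name)? It's simple: we only need to change the <init> invocation into the form that takes the extra argument. How? By changing desc, the method descriptor, because a method invocation is matched by its descriptor. We just append a String parameter to the end of the parameter list.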

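// node is a MethodInsnNode
// r is the index of the closing ')' of the parameter list
def r = node.desc.lastIndexOf(')')
def desc =
        "${node.desc.substring(0, r)}Ljava/lang/String;${node.desc.substring(r)}"
node.desc = desc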
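The descriptor rewrite is pure string manipulation, so it can be tried out without ASM. The following is a small sketch in plain Java that mirrors the substring logic. The class DescriptorRewrite and the method appendStringParam are hypothetical names, and the argument check is an addition for the illustration.

```java
// Illustration only: DescriptorRewrite and appendStringParam are hypothetical names.
class DescriptorRewrite {
    // Inserts a String parameter just before the closing ')' of a method descriptor.
    static String appendStringParam(String desc) {
        int r = desc.lastIndexOf(')');
        if (r < 0) {
            throw new IllegalArgumentException("not a method descriptor: " + desc);
        }
        return desc.substring(0, r) + "Ljava/lang/String;" + desc.substring(r);
    }

    public static void main(String[] args) {
        System.out.println(appendStringParam("()V"));                     // (Ljava/lang/String;)V
        System.out.println(appendStringParam("(Ljava/lang/Runnable;)V")); // (Ljava/lang/Runnable;Ljava/lang/String;)V
    }
}
```

The two results correspond to the Thread(String) and Thread(Runnable, String) constructors.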

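Are we done? Not yet. We have only added a parameter to the method descriptor, which does not mean the call will actually run that way, because the String argument in the parameter list has not been pushed onto the operand stack. So we construct a String argument and push it onto the operand stack, and the instruction for that is ldc. ASM provides the LdcInsnNode class for it: we create an instance and pass a string to its constructor. Here we pass in the name of the class that contains the call (the caller's name), which achieves exactly what we want. Now that we have the instruction, where do we insert it?

The target is clear: insert it right before the <init> invocation. ASM provides the insertBefore method, a convenient way to insert an instruction before a given instruction.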

method.instructions.insertBefore(
        node,
        new LdcInsnNode(klass.name)
)

Let's look at the bytecode after the insertion.
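The effect on the instruction sequence can also be followed as text. The following sketch is a simulation in plain Java, not ASM code: instructions are represented as strings, and InsertBeforeDemo, insertNameBeforeInit, and the caller name com/example/Caller are all made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: plain strings stand in for instruction nodes.
class InsertBeforeDemo {
    static final String OLD_INIT = "invokespecial java/lang/Thread.<init> ()V";
    static final String NEW_INIT = "invokespecial java/lang/Thread.<init> (Ljava/lang/String;)V";

    // Pushes the caller name right before the constructor call and switches
    // the call to the descriptor that takes a String.
    static List<String> insertNameBeforeInit(List<String> insns, String callerName) {
        List<String> out = new ArrayList<>();
        for (String insn : insns) {
            if (insn.equals(OLD_INIT)) {
                out.add("ldc \"" + callerName + "\"");
                out.add(NEW_INIT);
            } else {
                out.add(insn);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Instruction shape of: Thread t = new Thread();
        List<String> before = Arrays.asList("new java/lang/Thread", "dup", OLD_INIT, "astore_1");
        for (String insn : insertNameBeforeInit(before, "com/example/Caller")) {
            System.out.println(insn);
        }
    }
}
```

The ldc has to come after new and dup and directly before invokespecial, so that the string is on top of the operand stack when the constructor is invoked.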

Of course, we usually run ASM code in the Transform stage that Android provides (newer versions of AGP have changed this, but the general workflow is the same). To avoid touching classes more than necessary during the transform, we should filter out irrelevant instructions as early as possible. For example, since we only handle new Thread, we can skip instructions whose opcode is not Opcodes.INVOKESPECIAL. Calls that are not <init> (that is, not constructor calls) or whose owner is not the Thread class can also be filtered out early, without taking part in the change.

Here is the complete code (the code to run in the Transform).

static void transform(ClassNode klass) {
    println("ThreadTransformUtils")
    // Only Thread is handled here
    klass.methods?.forEach { methodNode ->
        methodNode.instructions.each {
            // Only INVOKESPECIAL instructions (used for constructor calls) go further
            if (it.opcode == Opcodes.INVOKESPECIAL) {
                transformInvokeSpecial((MethodInsnNode) it, klass, methodNode)
            }
        }
    }

}


private static void transformInvokeSpecial(MethodInsnNode node, ClassNode klass, MethodNode method) {
    // Return early unless this is a constructor call on Thread
    if (node.name != "<init>" || node.owner != THREAD) {
        return
    }
    println("transformInvokeSpecial")
    transformThreadInvokeSpecial(node, klass, method)

}

private static void transformThreadInvokeSpecial(
        MethodInsnNode node,
        ClassNode klass,
        MethodNode method
) {
    switch (node.desc) {
    // Thread()
        case "()V":
            // Thread(Runnable)
        case "(Ljava/lang/Runnable;)V":
            method.instructions.insertBefore(
                    node,
                    new LdcInsnNode(klass.name)
            )
            def r = node.desc.lastIndexOf(')')
            def desc =
                    "${node.desc.substring(0, r)}Ljava/lang/String;${node.desc.substring(r)}"
            // println(" + $SHADOW_THREAD.makeThreadName(Ljava/lang/String;Ljava/lang/String;) => ${this.owner}.${this.name}${this.desc}: ${klass.name}.${method.name}${method.desc}")
            println(" * ${node.owner}.${node.name}${node.desc} => ${node.owner}.${node.name}$desc: ${klass.name}.${method.name}${method.desc}")
            node.desc = desc
            break
    }


}

Conclusion

By now you should understand how to use the ASM tree API and how to apply it in practice. I hope this helps!

Origin juejin.im/post/7121643784638562317