Android Reverse Engineering (2): The Smali Instruction Set and Smali Files Explained

Instruction Set

  • Features

    • 1 Operands are written destination first, then source.

    • 2 Depending on the size and type of the data it operates on, a bytecode may carry a name suffix to disambiguate it:

        ● Bytecodes for ordinary 32-bit types carry no suffix.

        ● Bytecodes for ordinary 64-bit types carry the -wide suffix.

        ● Bytecodes for special types carry a suffix naming the type, one of -boolean, -byte, -char, -short, -int, -long, -float, -double, -object, -string, -class, or -void.

    • 3 Bytecodes that share a name but differ in instruction layout or options carry an opcode suffix, separated from the main name by a slash "/".

    • 4 In the instruction descriptions, each letter in a width value stands for 4 bits.

    For example: move-wide/from16 vAA, vBBBB

    In this instruction, move is the base bytecode and names the basic operation. wide is the name suffix and marks the data width of the operation (64 bits). from16 is the opcode suffix and marks the source as a 16-bit register reference. vAA is the destination register; it always comes before the source and ranges over v0~v255. vBBBB is the source register and ranges over v0~v65535. Most instructions in the instruction set take registers as destination or source operands. A single letter A/B/C/D/E/F/G/H denotes a 4-bit value, which can name registers v0~v15; AA/BB/.../HH denotes an 8-bit value; AAAA/BBBB/.../HHHH denotes a 16-bit value.
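The naming scheme described above can be illustrated with a small Java sketch that splits a mnemonic into its base bytecode, name suffix, and opcode suffix. The class name MnemonicParts and the method split are hypothetical names chosen for this illustration; the split is deliberately naive (it cuts at the first "-" and the first "/"), so it only suits simple mnemonics such as the one in the example, not names like array-length or int-to-long.

```java
import java.util.Arrays;

final class MnemonicParts {
    // Splits a mnemonic such as "move-wide/from16" into
    // {base bytecode, name suffix, opcode suffix}. Missing parts are "".
    static String[] split(String mnemonic) {
        String name = mnemonic;
        String opcodeSuffix = "";
        int slash = mnemonic.indexOf('/');
        if (slash >= 0) {
            name = mnemonic.substring(0, slash);
            opcodeSuffix = mnemonic.substring(slash + 1);
        }
        String base = name;
        String nameSuffix = "";
        int dash = name.indexOf('-');
        if (dash >= 0) {
            base = name.substring(0, dash);
            nameSuffix = name.substring(dash + 1);
        }
        return new String[] {base, nameSuffix, opcodeSuffix};
    }

    public static void main(String[] args) {
        // prints [move, wide, from16]
        System.out.println(Arrays.toString(split("move-wide/from16")));
        // prints [const, , 4]
        System.out.println(Arrays.toString(split("const/4")));
    }
}
```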

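  • Data manipulation instructions

    The data manipulation instruction is move. Its prototype is "move destination, source", and it takes different suffixes depending on the size and type of the data. For example:

      - "move vA, vB": assigns the value of register vB to register vA. The source and destination register indices are both 4 bits.
      - "move/from16 vAA, vBBBB": assigns the value of register vBBBB to register vAA. The source register index is 16 bits and the destination register index is 8 bits.
      - "move/16 vAAAA, vBBBB": assigns the value of register vBBBB to register vAAAA. The source and destination register indices are both 16 bits.
      - "move-wide vA, vB": assigns one register pair (a 64-bit value) to another. The source and destination register indices are both 4 bits.
      - "move-object vA, vB": assigns the object reference in register vB to register vA. Both register indices are 4 bits.
      - "move-result vAA": assigns the single-word (32-bit) non-object result of the most recent invoke (method call) instruction to register vAA.
      - "move-result-wide vAA": assigns the double-word (64-bit) non-object result of the most recent invoke instruction to register pair vAA.
      - "move-result-object vAA": assigns the object result of the most recent invoke instruction to register vAA.
      - "move-exception vAA": saves the exception that was just caught at runtime into register vAA.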
    
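  • Data definition instructions

    Data definition instructions define the constants, strings, classes, and other data used in a program. Their base bytecode is const.

      - const/4 vA, #+B: sign-extends the 4-bit literal to 32 bits and assigns it to register vA
      - const/16 vAA, #+BBBB: sign-extends the 16-bit literal to 32 bits and assigns it to register vAA
      - const vAA, #+BBBBBBBB: assigns the 32-bit literal to register vAA
      - const/high16 vAA, #+BBBB0000: zero-extends the 16-bit literal on the right to 32 bits and assigns it to register vAA
      - const-wide/16 vAA, #+BBBB: sign-extends the 16-bit literal to 64 bits and assigns it to register pair vAA
      - const-wide vAA, #+BBBBBBBBBBBBBBBB: assigns the 64-bit literal to register pair vAA
      - const-wide/high16 vAA, #+BBBB000000000000: zero-extends the 16-bit literal on the right to 64 bits and assigns it to register pair vAA
      - const-string vAA, string@BBBB: obtains the string identified by the string index and assigns its reference to register vAA
      - const-string/jumbo vAA, string@BBBBBBBB: obtains the string identified by the (larger) string index and assigns its reference to register vAA
      - const-class vAA, type@BBBB: obtains the class identified by the type index and assigns its reference to register vAA
      - const-class/jumbo vAAAA, type@BBBBBBBB: obtains the class identified by the given type index and assigns its reference to register vAAAA (the opcode occupies two bytes with the value 0x00ff; the instruction was added in Android 4.0 and is not listed in the current Dalvik bytecode reference)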
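To make "sign-extend" and "zero-extend on the right" concrete, here is a minimal Java sketch of how the literal in a few const variants becomes the final register value. The class ConstLiterals and its method names are hypothetical, and each method assumes its argument already fits in the stated number of bits. The /high16 forms are a compact way to encode values whose low bits are all zero, such as the float 1.0f (0x3F800000).

```java
final class ConstLiterals {
    // const/4: sign-extend a 4-bit literal to 32 bits.
    static int const4(int nibble) {
        return (nibble << 28) >> 28;
    }

    // const/16: sign-extend a 16-bit literal to 32 bits.
    static int const16(int bits) {
        return (short) bits;
    }

    // const/high16: the 16-bit literal fills the top half, low 16 bits are 0.
    static int constHigh16(int bits) {
        return bits << 16;
    }

    // const-wide/high16: the 16-bit literal fills the top 16 of 64 bits.
    static long constWideHigh16(int bits) {
        return ((long) bits) << 48;
    }

    public static void main(String[] args) {
        System.out.println(const4(0x7));   // prints 7
        System.out.println(const4(0xF));   // prints -1
        System.out.println(const16(0xFFFE)); // prints -2
        // 0x3F80 << 16 is the bit pattern of 1.0f; prints 1.0
        System.out.println(Float.intBitsToFloat(constHigh16(0x3F80)));
        // 0x3FF0 << 48 is the bit pattern of 1.0d; prints 1.0
        System.out.println(Double.longBitsToDouble(constWideHigh16(0x3FF0)));
    }
}
```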
    
  • Return instructions

    A return instruction is the last instruction executed in a method. The base bytecode is return, and there are four return instructions:

      - "return-void": returns nothing
      - "return vAA": returns a 32-bit non-object value
      - "return-wide vAA": returns a 64-bit non-object value
      - "return-object vAA": returns an object reference
    
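  • Array manipulation instructions

    Array operations include reading the length of an array, creating an array, filling an array, and reading and writing array elements.

      - array-length vA, vB: gets the length of the array in register vB and assigns it to register vA. The length is the number of entries in the array.
      - new-array vA, vB, type@CCCC: constructs an array of the given type (type@CCCC) and size (vB) and assigns its reference to register vA.
      - new-array/jumbo vAAAA, vBBBB, type@CCCCCCCC: same as the previous instruction, but with wider register and type indices (added in Android 4.0).
      - filled-new-array {vC, vD, vE, vF, vG}, type@BBBB: constructs an array of the given type (type@BBBB) and size (vA) and fills it with the supplied contents. vA is used implicitly: it gives both the array size and the number of arguments. vC~vG are the argument registers.
      - filled-new-array/range {vCCCC, ... ,vNNNN}, type@BBBB: same as the previous instruction, but the argument registers are given as a range through the range opcode suffix. vCCCC is the first argument register and N = A + C - 1.
      - filled-new-array/jumbo {vCCCC, ... ,vNNNN}, type@BBBBBBBB: same as the previous instruction, but with wider register and type indices (added in Android 4.0).
      - fill-array-data vAA, +BBBBBBBB: fills an array with the given data. vAA holds the array reference, which must refer to an array of primitives; +BBBBBBBB is the offset to the data table that accompanies the instruction.
      - arrayop vAA, vBB, vCC: reads or writes an element of the array in register vBB. Register vCC holds the element index, and register vAA holds the value read or the value to store. Reads use the aget family and writes use the aput family; the suffix depends on the element type. The instructions are aget, aget-wide, aget-object, aget-boolean, aget-byte, aget-char, aget-short, aput, aput-wide, aput-object, aput-boolean, aput-byte, aput-char, and aput-short.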
    
  • Data conversion instructions

    Data conversion instructions convert a value of one type to another. Their format is unop vA, vB: the vB register or register pair holds the value to convert, and the result is stored in the vA register or register pair.

      neg-int	negates an int (two's complement)
      not-int	takes the bitwise complement of an int
      neg-long	negates a long (two's complement)
      not-long	takes the bitwise complement of a long
      neg-float	negates a float
      neg-double	negates a double
      int-to-long	converts an int to a long
      int-to-float	converts an int to a float
      int-to-double	converts an int to a double
      long-to-int	converts a long to an int
      long-to-float	converts a long to a float
      long-to-double	converts a long to a double
      float-to-int	converts a float to an int
      float-to-long	converts a float to a long
      float-to-double	converts a float to a double
      double-to-int	converts a double to an int
      double-to-long	converts a double to a long
      double-to-float	converts a double to a float
      int-to-byte	converts an int to a byte
      int-to-char	converts an int to a char
      int-to-short	converts an int to a short
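Since Java casts are what these instructions implement, plain Java casts give a feel for the narrowing and floating-point cases. The sketch below is an illustration under that assumption; the class Conversions and its method names are hypothetical, and which instruction a given compiler emits for each cast is not shown here.

```java
final class Conversions {
    // Corresponds to int-to-byte: keep the low 8 bits, sign-extended.
    static int intToByte(int v) {
        return (byte) v;
    }

    // Corresponds to int-to-char: keep the low 16 bits, zero-extended.
    static int intToChar(int v) {
        return (char) v;
    }

    // Corresponds to int-to-short: keep the low 16 bits, sign-extended.
    static int intToShort(int v) {
        return (short) v;
    }

    // Corresponds to float-to-int: truncates toward zero,
    // clamps out-of-range values, and maps NaN to 0.
    static int floatToInt(float f) {
        return (int) f;
    }

    public static void main(String[] args) {
        System.out.println(intToByte(0x1FF));     // prints -1
        System.out.println(intToChar(-1));        // prints 65535
        System.out.println(intToShort(0x18000));  // prints -32768
        System.out.println(floatToInt(-2.9f));    // prints -2
        System.out.println(floatToInt(Float.NaN)); // prints 0
        System.out.println(floatToInt(1e20f));    // prints 2147483647
    }
}
```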
    
  • Data operation instructions

    Data operation instructions comprise arithmetic and logical instructions. Arithmetic instructions perform addition, subtraction, multiplication, division, remainder, and shifts on numeric values; logical instructions perform AND, OR, NOT, and XOR. They come in four forms (operands may be registers or register pairs; the descriptions below use single registers):

      binop vAA,vBB,vCC	operates on registers vBB and vCC and stores the result in register vAA
      binop/2addr vA,vB	operates on registers vA and vB and stores the result in register vA
      binop/lit16 vA,vB,#+CCCC	operates on register vB and the constant CCCC and stores the result in register vA
      binop/lit8 vAA,vBB,#+CC	operates on register vBB and the constant CC and stores the result in register vAA
    

    The last three forms add the opcode suffixes /2addr, /lit16, and /lit8 to the first form. In all four forms the base bytecode is followed by a data type suffix; for example, -int and -long indicate int and long operands respectively. The first form covers the following operations:

      add-type     adds vBB and vCC (vBB + vCC)
      sub-type     subtracts vCC from vBB (vBB - vCC)
      mul-type     multiplies vBB by vCC (vBB * vCC)
      div-type     divides vBB by vCC (vBB / vCC)
      rem-type     takes the remainder of vBB divided by vCC (vBB % vCC)
    
      and-type     bitwise AND of vBB and vCC (vBB & vCC)
      or-type      bitwise OR of vBB and vCC (vBB | vCC)
      xor-type     bitwise XOR of vBB and vCC (vBB ^ vCC)
    
      shl-type     shifts vBB (signed) left by vCC bits (vBB << vCC)
      shr-type     shifts vBB (signed) right by vCC bits (vBB >> vCC)
      ushr-type    shifts vBB (unsigned) right by vCC bits (vBB >>> vCC)
      The -type after the base bytecode can be -int, -long, -float, or -double for the arithmetic operations; the bitwise and shift operations exist only for -int and -long. The other three forms are analogous.
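The difference between shr and ushr, and the sign behaviour of rem, is easy to get wrong when reading decompiled code. The following Java sketch uses the Java operators that correspond to the -int variants; the class Shifts and its method names are hypothetical. As in Java, the Dalvik reference defines the int shifts to use only the low 5 bits of the shift distance.

```java
final class Shifts {
    // shl-int: shift left; only the low 5 bits of c are used.
    static int shlInt(int b, int c) {
        return b << c;
    }

    // shr-int: arithmetic (sign-preserving) right shift.
    static int shrInt(int b, int c) {
        return b >> c;
    }

    // ushr-int: logical (zero-filling) right shift.
    static int ushrInt(int b, int c) {
        return b >>> c;
    }

    // rem-int: the remainder takes the sign of the dividend.
    static int remInt(int b, int c) {
        return b % c;
    }

    public static void main(String[] args) {
        System.out.println(shrInt(-16, 2));  // prints -4
        System.out.println(ushrInt(-16, 2)); // prints 1073741820
        System.out.println(shlInt(1, 33));   // prints 2 (33 & 0x1f == 1)
        System.out.println(remInt(-7, 2));   // prints -1
    }
}
```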
    
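  • Object manipulation instructions

    These instructions operate on object instances, for example to create an object or check its type.

      - new-instance vAA, type@BBBB: constructs a new instance of the given type and assigns its reference to register vAA. The type must not be an array class.
      - instance-of vA, vB, type@CCCC: tests whether the object reference in register vB can be cast to the given type. If it can, vA is set to 1; otherwise vA is set to 0.
      - check-cast vAA, type@BBBB: casts the object reference in register vAA to the given type. On success the reference stays in vAA; otherwise a ClassCastException is thrown.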
    
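  • Jump instructions

    Jump instructions transfer control from the current address to a given offset. The Dalvik instruction set has three kinds of jump: unconditional jump (goto), branch jump (switch), and conditional jump (if).

      goto +AA	jumps unconditionally to the given offset. The offset AA must not be 0.
      goto/16 +AAAA	jumps unconditionally to the given offset. The offset AAAA must not be 0.
      goto/32 +AAAAAAAA	jumps unconditionally to the given offset.
      packed-switch vAA,+BBBBBBBB	branch jump. Register vAA holds the value being switched on; BBBBBBBB is the offset to a table in packed-switch-payload format, whose case values increase consecutively.
      sparse-switch vAA,+BBBBBBBB	branch jump. Register vAA holds the value being switched on; BBBBBBBB is the offset to a table in sparse-switch-payload format, whose case values follow no regular pattern.
    
      if-test vA,vB,+CCCC	conditional jump. Compares registers vA and vB and jumps to the offset CCCC if the condition holds. The offset CCCC must not be 0. The if-test instructions are:
           ● if-eq jumps if vA equals vB. In Java syntax: if(vA == vB)
           ● if-ne jumps if vA does not equal vB. In Java syntax: if(vA != vB)
           ● if-lt jumps if vA is less than vB. In Java syntax: if(vA < vB)
           ● if-le jumps if vA is less than or equal to vB. In Java syntax: if(vA <= vB)
           ● if-gt jumps if vA is greater than vB. In Java syntax: if(vA > vB)
           ● if-ge jumps if vA is greater than or equal to vB. In Java syntax: if(vA >= vB)
    
      if-testz vAA,+BBBB	conditional jump. Compares register vAA with 0 and jumps to the offset BBBB if the condition holds. The offset BBBB must not be 0. The if-testz instructions are:
           ● if-eqz jumps if vAA is 0. In Java syntax: if(vAA == 0)
           ● if-nez jumps if vAA is not 0. In Java syntax: if(vAA != 0)
           ● if-ltz jumps if vAA is less than 0. In Java syntax: if(vAA < 0)
           ● if-lez jumps if vAA is less than or equal to 0. In Java syntax: if(vAA <= 0)
           ● if-gtz jumps if vAA is greater than 0. In Java syntax: if(vAA > 0)
           ● if-gez jumps if vAA is greater than or equal to 0. In Java syntax: if(vAA >= 0)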
    
  • Comparison instructions

    Comparison instructions compare the values in two registers or register pairs. The basic format is cmpkind-type vAA, vBB, vCC, where type is the type of the data compared (-long, -float, or -double) and kind selects one of three variants: cmpl, cmpg, and cmp. In every variant vAA receives 0 if vBB equals vCC, 1 if vBB is greater than vCC, and -1 if vBB is less than vCC. cmpl (compare less) and cmpg (compare greater) are the floating-point variants and differ only when an operand is NaN: cmpl then yields -1 and cmpg yields 1. cmp is used only for long values, where NaN cannot occur.

    For example:

      cmpl-float vAA,vBB,vCC	compares two floats. Stores 1 in vAA if vBB is greater than vCC, 0 if they are equal, and -1 if vBB is less than vCC or either operand is NaN
      cmpg-float vAA,vBB,vCC	compares two floats. Stores 1 in vAA if vBB is greater than vCC or either operand is NaN, 0 if they are equal, and -1 if vBB is less than vCC
      cmpl-double vAA,vBB,vCC	compares two doubles, with the same semantics as cmpl-float
      cmpg-double vAA,vBB,vCC	compares two doubles, with the same semantics as cmpg-float
      cmp-long vAA,vBB,vCC	compares two longs. Stores 1 in vAA if vBB is greater than vCC, 0 if they are equal, and -1 if vBB is less than vCC
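The result convention of the comparison instructions, as given in the Dalvik bytecode reference (0 for equal, 1 for greater, -1 for less, with cmpl and cmpg differing only in how NaN is treated), can be sketched in Java as follows. The class FloatCompare and its method names are hypothetical names for this illustration.

```java
final class FloatCompare {
    // cmpl-float: NaN comparisons yield -1 ("less than" bias).
    static int cmplFloat(float b, float c) {
        if (b == c) return 0;
        if (b > c) return 1;
        return -1; // b < c, or at least one operand is NaN
    }

    // cmpg-float: NaN comparisons yield 1 ("greater than" bias).
    static int cmpgFloat(float b, float c) {
        if (b == c) return 0;
        if (b < c) return -1;
        return 1; // b > c, or at least one operand is NaN
    }

    // cmp-long: longs have no NaN, so a single variant is enough.
    static int cmpLong(long b, long c) {
        return b < c ? -1 : (b == c ? 0 : 1);
    }

    public static void main(String[] args) {
        System.out.println(cmplFloat(2f, 1f));        // prints 1
        System.out.println(cmplFloat(Float.NaN, 1f)); // prints -1
        System.out.println(cmpgFloat(Float.NaN, 1f)); // prints 1
        System.out.println(cmpLong(3L, 5L));          // prints -1
    }
}
```

The two biases let a compiler pick the variant for which a NaN operand makes the following if-testz branch fall on the "condition is false" side, which is the outcome Java requires for comparisons involving NaN.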
    
  • Field operation instructions

    Field operation instructions read and write object fields, much like the get and set methods in your own code. The basic instructions are iput-type, iget-type, sput-type, and sget-type, where type is the data type. 64-bit fields (long and double) use the -wide suffix.

      Instructions prefixed with i (iput-type and iget-type) read and write instance fields.
      iget-byte vA,vB,field_id	reads the field field_id of the object in register vB and assigns it to register vA
      iput-byte vA,vB,field_id	sets the field field_id of the object in register vB to the value of register vA
      iget-boolean vA,vB,field_id	
      iput-boolean vA,vB,field_id	
      iget-wide vA,vB,field_id	
      iput-wide vA,vB,field_id	
      Instructions prefixed with s (sput-type and sget-type) read and write static fields. They take no object register.
      sget-byte vAA,field_id	
      sput-byte vAA,field_id	
      sget-boolean vAA,field_id	
      sput-boolean vAA,field_id	
      sget-wide vAA,field_id	
      sput-wide vAA,field_id
    
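  • Method call instructions

    Most method call instructions in Dalvik closely resemble their JVM counterparts. There are five basic instructions:

      invoke-direct {parameters},methodtocall	calls a direct method of an instance, that is, a private method or a constructor. Note that the first element in {} is the current instance (this), and the actual arguments follow it. The same layout applies to every non-static call: in invoke-virtual {v3,v1,v4},Test2.method5:(II)V, v3 is the Test2 instance and v1, v4 are the method arguments
      invoke-static {parameters},methodtocall	calls a static method. Everything in {} is a method argument
      invoke-super {parameters},methodtocall	calls a superclass method
      invoke-virtual {parameters},methodtocall	calls a virtual method of an instance, such as a public or protected method
      invoke-interface {parameters},methodtocall	calls an interface method
    
      These five are the basic instructions. You will also meet invoke-direct/range, invoke-static/range, invoke-super/range, invoke-virtual/range, and invoke-interface/range. They differ from the basic forms only in that the arguments are given as a range of consecutive registers. They are used when a call needs more than five argument registers or registers numbered above v15.
    
      To repeat: for a non-static method the layout of {} is {current instance, argument 1, argument 2, … argument n}, while for a static method it is {argument 1, argument 2, … argument n}.
    

    To use the return value of a call, fetch it with one of the move-result instructions described above, immediately after the invoke.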

  • Synchronization instructions

    In Java, a synchronized sequence of instructions is usually written as a synchronized block. The JVM supports the synchronized keyword through the monitorenter and monitorexit instructions, and Dalvik provides two similar instructions:

      monitor-enter vAA	acquires the lock of the given object
      monitor-exit vAA	releases the lock of the given object
    
  • Exception instruction

      throw vAA	throws the exception object held in register vAA
    

Smali file details

  • Each .smali file produced by a decompilation tool corresponds to one Java class. A smali file consists of Dalvik instructions and follows a fixed structure. Smali uses a number of directives to describe the corresponding Java file; every directive starts with ".". The commonly used directives are:

      .field	defines a field
      .method….end method	defines a method
      .annotation….end annotation	defines an annotation
      .implements	declares an interface the class implements
      .locals	specifies the number of local (non-parameter) registers in a method
      .registers	specifies the total number of registers a method uses
      .prologue	marks the start of the code in a method
      .line	marks the corresponding line in the Java source file
      .parameter	specifies a method parameter
      .param	same meaning as .parameter, but with a different syntax
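The relationship between .locals and .registers is worth spelling out: the total register count is the number of local registers plus the registers taken by the parameters, where an instance method reserves p0 for this and long and double parameters take two registers each. The Java sketch below computes this from a method descriptor. The class RegisterLayout and its method names are hypothetical, and the parser assumes a well-formed descriptor.

```java
final class RegisterLayout {
    // Counts the registers occupied by a method's parameters.
    // J (long) and D (double) take two registers, every other type one;
    // instance methods need one more register for "this" (p0).
    static int parameterRegisters(String descriptor, boolean isStatic) {
        int count = isStatic ? 0 : 1;
        int i = descriptor.indexOf('(') + 1;
        int end = descriptor.indexOf(')');
        while (i < end) {
            char c = descriptor.charAt(i);
            if (c == '[') {
                // Array reference: skip the brackets and the element type.
                while (descriptor.charAt(i) == '[') {
                    i++;
                }
                if (descriptor.charAt(i) == 'L') {
                    i = descriptor.indexOf(';', i);
                }
                i++;
                count += 1;
            } else if (c == 'L') {
                // Object reference: skip to the terminating ';'.
                i = descriptor.indexOf(';', i) + 1;
                count += 1;
            } else {
                count += (c == 'J' || c == 'D') ? 2 : 1;
                i++;
            }
        }
        return count;
    }

    // Value a .registers directive would carry for the given .locals.
    static int totalRegisters(int locals, String descriptor, boolean isStatic) {
        return locals + parameterRegisters(descriptor, isStatic);
    }

    public static void main(String[] args) {
        // Instance method with .locals 2 and one object parameter:
        // v0, v1 are locals, p0 (this) is v2, p1 is v3. Prints 4.
        System.out.println(totalRegisters(2, "onClick(Landroid/view/View;)V", false));
        // Static method taking (long, String[], double). Prints 5.
        System.out.println(parameterRegisters("(J[Ljava/lang/String;D)V", true));
    }
}
```

Parameter registers always sit at the end of the register frame, so in a method with .locals 2 the name p0 is simply an alias for v2.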
    
  • Let's write a simple Hello World to explain

    The Java source code is as follows:

      public class MainActivity extends AppCompatActivity implements View.OnClickListener {
    
          private static final String TAG = "MainActivity";
          private TextView tvShowText;
    
          private static final String HELLO = "HELLO";
          private static final String WORLD = "WORLD";
    
          @Override
          protected void onCreate(Bundle savedInstanceState) {
              super.onCreate(savedInstanceState);
              setContentView(R.layout.activity_main);
              initView();
              setListener();
          }
    
          private void setListener() {
              tvShowText.setOnClickListener(this);
          }
    
          private void initView() {
              tvShowText = (TextView) findViewById(R.id.tv_show_text);
          }
    
          @Override
          public void onClick(View v) {
              Log.d(TAG, "onClick: TextView");
              tvShowText.setText(getText());
          }
    
          private String getText() {
              return HELLO + WORLD;
          }
      }
    
      The decompiled smali file is as follows:
    
      # File header: class declaration
      .class public Lorg/professor/helloworld/MainActivity;
      # Superclass
      .super Landroid/support/v7/app/AppCompatActivity;
      # Source file name
      .source "MainActivity.java"
    
      # The class implements the View.OnClickListener interface
      # interfaces
      .implements Landroid/view/View$OnClickListener;
    
      # Static String fields
      # static fields
      .field private static final HELLO:Ljava/lang/String; = "HELLO"
    
      .field private static final TAG:Ljava/lang/String; = "MainActivity"
    
      .field private static final WORLD:Ljava/lang/String; = "WORLD"
    
      # TextView instance field
      # instance fields
      .field private tvShowText:Landroid/widget/TextView;
    
      # Constructor
      # direct methods
      .method public constructor <init>()V
          .locals 0 # the method uses no local registers
    
          .prologue # marks the start of the method's code
          .line 9   # corresponds to line 9 of the Java source file
    
          # Call the AppCompatActivity constructor <init>()
          invoke-direct {p0}, Landroid/support/v7/app/AppCompatActivity;-><init>()V
    
          # Return; this method returns no value
          return-void
      .end method     # end of method
	
	 
	.method private getText()Ljava/lang/String;
	    .locals 1
	
	    .prologue
	    .line 40

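	    # Load the string "HELLOWORLD" into v0 (HELLO + WORLD was folded into one constant at compile time)
	    const-string v0, "HELLOWORLD"
	
	    # Return the object reference held in v0
	    return-object v0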
	.end method
	
	.method private initView()V
	    .locals 1
	
	    .prologue
	    .line 30
	    # Load the constant 0x7f0b005e (the value of R.id.tv_show_text) into v0
	    const v0, 0x7f0b005e

	    # Call findViewById
	    invoke-virtual {p0, v0}, Lorg/professor/helloworld/MainActivity;->findViewById(I)Landroid/view/View;
	
	    move-result-object v0
	
	    # Cast the object reference in v0 to TextView
	    check-cast v0, Landroid/widget/TextView;
	    # Store v0 into the tvShowText field of the object in p0
	    iput-object v0, p0, Lorg/professor/helloworld/MainActivity;->tvShowText:Landroid/widget/TextView;
	
	    .line 31
	    return-void
	.end method
	
	.method private setListener()V
	    .locals 1
	
	    .prologue
	    .line 26
	    # Load the tvShowText field of the object in p0 into v0
	    iget-object v0, p0, Lorg/professor/helloworld/MainActivity;->tvShowText:Landroid/widget/TextView;
	    # Call setOnClickListener on v0, passing this (p0)
	    invoke-virtual {v0, p0}, Landroid/widget/TextView;->setOnClickListener(Landroid/view/View$OnClickListener;)V
	
	    .line 27
	    return-void
	.end method
	
	
	# virtual methods
	.method public onClick(Landroid/view/View;)V
	    .locals 2   # the method uses 2 local registers (v0 and v1)
	    .param p1, "v"    # Landroid/view/View;
	
	    .prologue
	    .line 35
	    const-string v0, "MainActivity"
	
	    const-string v1, "onClick: TextView"
	
	    invoke-static {v0, v1}, Landroid/util/Log;->d(Ljava/lang/String;Ljava/lang/String;)I
	
	    .line 36
	    iget-object v0, p0, Lorg/professor/helloworld/MainActivity;->tvShowText:Landroid/widget/TextView;
	
	    invoke-direct {p0}, Lorg/professor/helloworld/MainActivity;->getText()Ljava/lang/String;
	
	    move-result-object v1
	
	    invoke-virtual {v0, v1}, Landroid/widget/TextView;->setText(Ljava/lang/CharSequence;)V
	
	    .line 37
	    return-void
	.end method
	
	.method protected onCreate(Landroid/os/Bundle;)V
	    .locals 1
	    .param p1, "savedInstanceState"    # Landroid/os/Bundle;   # the savedInstanceState parameter
	
	    .prologue
	    .line 19
	
	    # Call the superclass method onCreate()
	    invoke-super {p0, p1}, Landroid/support/v7/app/AppCompatActivity;->onCreate(Landroid/os/Bundle;)V
	
	    .line 20
	    # Load the constant 0x7f04001b (the value of R.layout.activity_main) into v0
	    const v0, 0x7f04001b
	
	    # Call setContentView()
	    invoke-virtual {p0, v0}, Lorg/professor/helloworld/MainActivity;->setContentView(I)V
	
	    .line 21
	    invoke-direct {p0}, Lorg/professor/helloworld/MainActivity;->initView()V
	
	    .line 22
	    invoke-direct {p0}, Lorg/professor/helloworld/MainActivity;->setListener()V
	
	    .line 23
	    return-void
	.end method

