In-depth study of smali grammar

The author works in the field of Android security. You are welcome to follow my personal WeChat public account "Android Security Engineering", which focuses on the security protection and reverse analysis of Android applications and shares attack and defense techniques, hooking, ARM assembly, and other Android-related knowledge.

Foreword

When injecting code into an APK file, we usually work with decompiled smali code rather than the original Java source files, so it is necessary to understand the basics of smali syntax. First, a brief introduction to the Dalvik virtual machine. Dalvik is a virtual machine that Google designed specifically for the Android platform. Although Android programs can be written in Java, the Dalvik VM and the Java VM are two different virtual machines. The Dalvik VM is register-based, while the Java VM is stack-based. The Dalvik VM executes its own file format, dex (Dalvik Executable), while the Java VM executes Java bytecode (.class files). Dex was designed for memory-constrained devices: one dex file merges the classes of an app and shares constant data such as strings between them, so it is typically smaller than the equivalent set of .class files. Since Android 5.0, Dalvik has been replaced by ART, but ART still runs dex bytecode, so smali remains the format you read and edit when modifying an APK.

Smali file structure

The following smali code is taken from a test demo (obtained by decompiling the .apk file with apktool). The goal here is to get a general picture of how a smali file is structured, which will make the syntax details explained later easier to place.

.class public abstract Lcom/happy/learnsmali/BaseActivity;
.super Landroidx/appcompat/app/AppCompatActivity;
.source "BaseActivity.kt"

# interfaces
.implements Lcom/happy/learnsmali/action/ActivityAction;
.implements Lcom/happy/learnsmali/action/ClickAction;
.implements Lcom/happy/learnsmali/action/HandlerAction;
.implements Lcom/happy/learnsmali/action/BundleAction;
.implements Lcom/happy/learnsmali/action/KeyboardAction;


# annotations
.annotation system Ldalvik/annotation/MemberClasses;
    value = {
        Lcom/happy/learnsmali/BaseActivity$Companion;,
        Lcom/happy/learnsmali/BaseActivity$OnActivityCallback;
    }
.end annotation

.annotation system Ldalvik/annotation/SourceDebugExtension;
    value = "SMAP\nBaseActivity.kt\nKotlin\n*S Kotlin\n*F\n+ 1 BaseActivity.kt\ncom/happy/learnsmali/BaseActivity\n+ 2 fake.kt\nkotlin/jvm/internal/FakeKt\n*L\n1#1,179:1\n1#2:180\n*E\n"
.end annotation

# static fields
.field public static final Companion:Lcom/happy/learnsmali/BaseActivity$Companion;

.field public static final RESULT_ERROR:I = -0x2


# instance fields
.field private final activityCallbacks$delegate:Lkotlin/Lazy;


# direct methods
.method public static synthetic $r8$lambda$mAxgPA6JBXhjuhBfNvUeqmKUmlk(Lcom/happy/learnsmali/BaseActivity;Landroid/view/View;)V
    .locals 0

    invoke-static {p0, p1}, Lcom/happy/learnsmali/BaseActivity;->initSoftKeyboard$lambda-0(Lcom/happy/learnsmali/BaseActivity;Landroid/view/View;)V

    return-void
.end method

.method static constructor <clinit>()V
    .locals 2

    new-instance v0, Lcom/happy/learnsmali/BaseActivity$Companion;

    const/4 v1, 0x0

    invoke-direct {v0, v1}, Lcom/happy/learnsmali/BaseActivity$Companion;-><init>(Lkotlin/jvm/internal/DefaultConstructorMarker;)V

    sput-object v0, Lcom/happy/learnsmali/BaseActivity;->Companion:Lcom/happy/learnsmali/BaseActivity$Companion;

    return-void
.end method

.method public constructor <init>()V
    # ...
.end method

If you are new to smali, it is normal to find the code above confusing; it is analyzed piece by piece below. Understanding what these directives and symbols mean makes injecting code into a decompiled APK much easier.

Inheritance, interface, package information in smali

First, let's look at the first few lines:

.class public abstract Lcom/happy/learnsmali/BaseActivity; # .class: modifiers and class path (package name + class name)
.super Landroidx/appcompat/app/AppCompatActivity;          # .super: path of the parent class
.source "BaseActivity.kt"                                  # .source: name of the source file

# interfaces
.implements Lcom/happy/learnsmali/action/ActivityAction;
.implements Lcom/happy/learnsmali/action/ClickAction;
.implements Lcom/happy/learnsmali/action/HandlerAction;
.implements Lcom/happy/learnsmali/action/BundleAction;
.implements Lcom/happy/learnsmali/action/KeyboardAction;


# annotations
.annotation system Ldalvik/annotation/MemberClasses;
    value = {
        Lcom/happy/learnsmali/BaseActivity$Companion;,
        Lcom/happy/learnsmali/BaseActivity$OnActivityCallback;
    }
.end annotation

Lines 1-3 define basic information: the public abstract class BaseActivity lives in the package path com/happy/learnsmali/ (line 1); it inherits from androidx/appcompat/app/AppCompatActivity (line 2); and the smali file was decompiled from the source file BaseActivity.kt (line 3).

The # interfaces section (the .implements lines) defines interface information: the BaseActivity class implements the following interfaces:

  • com/happy/learnsmali/action/ActivityAction
  • com/happy/learnsmali/action/ClickAction
  • com/happy/learnsmali/action/HandlerAction
  • com/happy/learnsmali/action/BundleAction
  • com/happy/learnsmali/action/KeyboardAction

The # annotations section defines inner classes: the system annotation dalvik.annotation.MemberClasses lists the member classes of BaseActivity, so the class has two inner classes, Companion and OnActivityCallback.

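After analyzing the header of the smali file, we can reconstruct the following Java skeleton:

public abstract class BaseActivity extends AppCompatActivity
        implements ActivityAction, ClickAction, HandlerAction, BundleAction, KeyboardAction {

    // Kotlin companion object, compiled to a static nested class
    public static final class Companion {
        // ...
    }

    // Listed in MemberClasses; the snippet does not show whether it is a class or an interface
    class OnActivityCallback {
        // ...
    }
}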

Other methods

# virtual methods   # the methods below are virtual methods
.method protected onCreate(Landroid/os/Bundle;)V
    .locals 1
    .param p1, "savedInstanceState"    # Landroid/os/Bundle;

    .line 10
    invoke-super {p0, p1}, Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V

    .line 11
    const/high16 v0, 0x7f050000

    invoke-virtual {p0, v0}, Lcom/justart/samlidemo/MainActivity;->setContentView(I)V

    .line 12
    return-void
.end method
  • A method starts with .method and ends with .end method;
  • The V at the end of the first line indicates that the return type is void;
  • The parameter descriptor Landroid/os/Bundle; inside the parentheses indicates that onCreate() takes one parameter of type Bundle;
  • .param records the name of parameter p1, savedInstanceState;
  • Finally, return-void returns from the method without a value.

Data types

  • byte:B
  • char:C
  • double:D
  • float:F
  • int:I
  • long:J
  • short:S
  • void:V
  • boolean:Z
  • array:[XXX
  • Object:Lxxx/yyy;

Readers with a JNI background will recognize these type descriptors. The last two items above need some explanation:

array:[XXX

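Prefix a type with [ to denote an array of that type: for example, int[] and byte[] are written [I and [B. Each additional dimension adds another [, so int[][] is [[I, and object arrays follow the same rule: String[] is [Ljava/lang/String;.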

Object:Lxxx/yyy;

Types starting with L are objects. For example, a String object is written Ljava/lang/String; (an object type must end with a semicolon), where java/lang is the java.lang package and String is the class in that package.

Some readers may wonder: if a class is written as Ljava/lang/String;, how is an inner class written in smali? Anyone who has used Java reflection may already think of the $ symbol, and that is right: smali also uses $ for inner classes. For example, Lcom/happy/learnsmali/BaseActivity$Companion; in the demo above means that Companion is an inner class of BaseActivity.
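To make these descriptor rules concrete, here is a minimal Java sketch that builds the smali descriptor for a Java Class object using the table above. The class SmaliTypes and its method descriptor are hypothetical names for illustration, not part of apktool or any other tool. It relies on Class.getName(), which already uses $ for nested classes (on JDK 12 and later, Class.descriptorString() should return the same strings).

```java
import java.util.Map;

final class SmaliTypes {
    // Single-letter descriptors for primitive types and void (from the table above)
    private static final Map<Class<?>, String> PRIMITIVES = Map.of(
            byte.class, "B", char.class, "C", double.class, "D",
            float.class, "F", int.class, "I", long.class, "J",
            short.class, "S", void.class, "V", boolean.class, "Z");

    static String descriptor(Class<?> type) {
        if (type.isPrimitive()) {          // also true for void.class
            return PRIMITIVES.get(type);
        }
        if (type.isArray()) {
            // One '[' per dimension, followed by the element type's descriptor
            return "[" + descriptor(type.getComponentType());
        }
        // Objects: 'L' + class name with '/' separators + ';'
        // getName() already writes nested classes with '$', e.g. java.util.Map$Entry
        return "L" + type.getName().replace('.', '/') + ";";
    }

    public static void main(String[] args) {
        System.out.println(descriptor(int[].class));      // [I
        System.out.println(descriptor(String.class));     // Ljava/lang/String;
        System.out.println(descriptor(Map.Entry.class));  // Ljava/util/Map$Entry;
        System.out.println(descriptor(byte[][].class));   // [[B
    }
}
```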

Registers

One of the biggest differences between the Dalvik VM and the JVM is that the Dalvik VM is register-based. Roughly speaking, this is similar to assembly language: data is stored and passed around in registers instead of on an operand stack. In smali, local registers are written as v followed by a number, such as v0, v1, v2, ..., and parameter registers as p followed by a number, such as p0, p1, p2, .... Note that p0 is not always the first parameter: in a non-static method, p0 is this, p1 is the first parameter and p2 is the second; in a static method, p0 is the first parameter (because a static method has no this object). Parameters of type long or double are "wide" and occupy two consecutive registers, so every later parameter shifts up by one register number. The number of local registers is declared per method (with .locals), and within that count they can be used freely, but not without limit: many instructions can only address v0-v15 or v0-v255.
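Parameter registers are not separate storage: they are the last registers of the method's frame, so with .locals N the register pK is the same as v(N + K). The following minimal Java sketch (RegisterLayout and layout are hypothetical names) prints this mapping for a method, given whether it is static, its .locals count and its parameter descriptors, including the two-register wide types J and D.

```java
import java.util.ArrayList;
import java.util.List;

final class RegisterLayout {
    // One entry per parameter, e.g. "p0 = v1 : this"
    static List<String> layout(boolean isStatic, int locals, String... paramDescriptors) {
        List<String> out = new ArrayList<>();
        int p = 0;
        if (!isStatic) {
            out.add(slot(p, locals, "this"));  // non-static methods receive this in p0
            p++;
        }
        for (String d : paramDescriptors) {
            // long (J) and double (D) are wide and occupy two consecutive registers
            boolean wide = d.equals("J") || d.equals("D");
            out.add(slot(p, locals, wide ? d + " (wide, also p" + (p + 1) + ")" : d));
            p += wide ? 2 : 1;
        }
        return out;
    }

    // Parameter registers follow the locals: pK is v(locals + K)
    private static String slot(int p, int locals, String what) {
        return "p" + p + " = v" + (locals + p) + " : " + what;
    }

    public static void main(String[] args) {
        // onCreate(Bundle) with .locals 1, as in the earlier example
        layout(false, 1, "Landroid/os/Bundle;").forEach(System.out::println);
    }
}
```

For the onCreate() example, this reports that p0 (this) is v1 and p1 (the Bundle) is v2, which is why that method can use v0 as its single local register.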

Member variables

Let's continue with member variables:

# static fields
.field private static final PREFS_INSTALLATION_ID:Ljava/lang/String; = "installationId"
# ...

# instance fields
.field private _activityPackageName:Ljava/lang/String;

Both the static field and the instance field above are member variables, declared in the format:

.field <public/private/protected> [static] [final] varName:<type> [= initialValue]

Although both are member variables, static fields and instance fields differ, most obviously in whether they belong to an object: a static field is a class-level concept shared by all instances, while an instance field is an object-level concept with a separate copy in every object.

Where there are member variables, there are reads and writes. In smali syntax, the read instructions include iget, sget, iget-boolean, sget-boolean, iget-object, sget-object, etc., and the write instructions include iput, sput, iput-boolean, sput-boolean, iput-object, sput-object, etc.

iget / iput read and write instance fields, respectively;

sget / sput read and write static fields, respectively;

The prefix of an instruction (i or s) tells you whether it accesses an instance field or a static field, and the suffix tells you the type of the value: the -object suffix means the field holds an object (reference) type, while no suffix means a 32-bit primitive (int or float). The other primitive types have their own suffixes: -boolean, -byte, -char, -short, and -wide for the 64-bit long and double.
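The naming rule can be summarized as a small lookup: the prefix i/s selects instance or static, get/put selects read or write, and the first character of the field's type descriptor selects the suffix (none for int/float, -wide for long/double, -object for classes and arrays, and -boolean, -byte, -char, -short for the smaller primitives). The following is a minimal Java sketch of that rule; FieldOps and opcode are hypothetical names.

```java
final class FieldOps {
    // Picks the field instruction for a type descriptor such as "Z" or "Ljava/lang/String;"
    static String opcode(boolean isStatic, boolean isWrite, String type) {
        String base = (isStatic ? "s" : "i") + (isWrite ? "put" : "get");
        switch (type.charAt(0)) {
            case 'I': case 'F': return base;              // 32-bit int/float: no suffix
            case 'J': case 'D': return base + "-wide";    // 64-bit long/double
            case 'Z': return base + "-boolean";
            case 'B': return base + "-byte";
            case 'C': return base + "-char";
            case 'S': return base + "-short";
            case 'L': case '[': return base + "-object";  // classes and arrays are references
            default: throw new IllegalArgumentException("not a field type: " + type);
        }
    }

    public static void main(String[] args) {
        // The isRunning:Z write in the example below
        System.out.println(opcode(false, true, "Z"));
    }
}
```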

Here is an example:

const/4 v0, 0x0  
iput-boolean v0, p0, Lcom/disney/xx/XxActivity;->isRunning:Z

In the above example, const/4 loads the constant 0x0 into local register v0, and then iput-boolean stores the value in v0 into the isRunning field of the object in p0. It is equivalent to this.isRunning = false; (as mentioned above, in a non-static method p0 is this; here it is an instance of com.disney.xx.XxActivity).

Static field access

sget-object v0, Lcom/disney/xx/XxActivity;->PREFS_INSTALLATION_ID:Ljava/lang/String;

The sget-object instruction reads a static field of object type and stores the value in the register that follows it. Here the value of the static field PREFS_INSTALLATION_ID of class com.disney.xx.XxActivity is loaded into local register v0.

Instance field access

iget-object v0, p0, Lcom/disney/xx/XxActivity;->_view:Lcom/disney/common/WMWView;

The iget-object instruction reads an instance field of object type. Here the _view field of the com.disney.xx.XxActivity object in p0 is loaded into local register v0.

Comparing the static field and instance field examples above gives the following general format:

<instruction> <value register>, [<object register>], <declaring class>-><fieldName>:<field type>

The object register is present only for the instance forms (iget/iput); the static forms (sget/sput) omit it.

The put instructions use the same format as the get instructions above; see the following examples:

const/4 v3, 0x0
sput-object v3, Lcom/disney/xx/XxActivity;->globalIapHandler:Lcom/disney/config/GlobalPurchaseHandler;

Java code representation: XxActivity.globalIapHandler = null; (sput-object writes a static field, so it takes no object register; the constant 0x0 in v3 serves as null for an object reference)

.local v0, wait:Landroid/os/Message;  
const/4 v1, 0x2  
iput v1, v0, Landroid/os/Message;->what:I

Java code representation: wait.what = 0x2; (the .local debug directive states that v0 holds the local variable wait, an instance of Message)

Method calls

The format of a method descriptor is:

methodName(type1type2type3…)ReturnType

Note that the parameter types must be written as smali type descriptors, with no separators between them. For example:

helloSmali()V
corresponds to void helloSmali()

helloSmali([BI)Z
corresponds to boolean helloSmali(byte[], int)

helloSmali(ZLjava/lang/String;[I[I)V
corresponds to void helloSmali(boolean, String, int[], int[])
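Because there are no separators, reading a descriptor means consuming one type at a time: any number of [ prefixes, then either one primitive letter or an L...; class name. The following minimal Java sketch (MethodDescriptors and toJava are hypothetical names) turns the descriptors above into Java-style signatures, printing class names without their package as the examples do.

```java
import java.util.ArrayList;
import java.util.List;

final class MethodDescriptors {
    // "helloSmali([BI)Z" -> "boolean helloSmali(byte[], int)"
    static String toJava(String descriptor) {
        int open = descriptor.indexOf('(');
        int close = descriptor.indexOf(')');
        String name = descriptor.substring(0, open);
        String params = descriptor.substring(open + 1, close);
        List<String> javaParams = new ArrayList<>();
        int i = 0;
        while (i < params.length()) {
            int end = typeEnd(params, i);
            javaParams.add(javaType(params.substring(i, end)));
            i = end;
        }
        String ret = javaType(descriptor.substring(close + 1));
        return ret + " " + name + "(" + String.join(", ", javaParams) + ")";
    }

    // Index just past the single type descriptor starting at 'start'
    private static int typeEnd(String s, int start) {
        int i = start;
        while (s.charAt(i) == '[') i++;                        // array dimensions
        if (s.charAt(i) == 'L') return s.indexOf(';', i) + 1;  // object: up to ';'
        return i + 1;                                          // primitive: one letter
    }

    private static String javaType(String d) {
        switch (d.charAt(0)) {
            case '[': return javaType(d.substring(1)) + "[]";
            case 'L': {
                String cls = d.substring(1, d.length() - 1);   // strip 'L' and ';'
                return cls.substring(cls.lastIndexOf('/') + 1).replace('$', '.');
            }
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'V': return "void";
            case 'Z': return "boolean";
            default: throw new IllegalArgumentException("bad descriptor: " + d);
        }
    }

    public static void main(String[] args) {
        System.out.println(toJava("helloSmali(ZLjava/lang/String;[I[I)V"));
    }
}
```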

In smali, methods are also divided into two kinds, much like member variables are divided into static and instance fields: direct methods and virtual methods. What is the difference? Direct methods are the ones that are never dispatched dynamically: private methods, constructors (<init> and <clinit>) and static methods. That is why the static $r8$lambda$... method and both constructors in the first example appear under # direct methods. Virtual methods are the non-private instance methods (public, protected or package-private) that subclasses can override.

Accordingly, there are several different instructions for calling a method: invoke-direct, invoke-virtual, invoke-static, invoke-super and invoke-interface (for calling a method through an interface type). Each of them also has an invoke-XXX/range form, used when the arguments do not fit the regular form (see invoke-xxxxx/range below).
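The direct/virtual split can be reproduced with Java reflection: in dex, static and private methods (and constructors) are direct, and every other instance method is virtual. The minimal sketch below (MethodKinds, kind, invokeFor and find are hypothetical names) classifies a java.lang.reflect.Method and guesses the invoke instruction a caller would typically use. It ignores invoke-super, and actual compiler output (javac plus d8/R8) can differ in edge cases, so treat it as an approximation.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class MethodKinds {
    // Constructors are also direct, but reflection models them as Constructor, not Method
    static String kind(Method m) {
        int mods = m.getModifiers();
        return (Modifier.isStatic(mods) || Modifier.isPrivate(mods)) ? "direct" : "virtual";
    }

    // The invoke instruction a caller would typically use (ignoring invoke-super)
    static String invokeFor(Method m) {
        int mods = m.getModifiers();
        if (Modifier.isStatic(mods)) return "invoke-static";
        if (Modifier.isPrivate(mods)) return "invoke-direct";
        if (m.getDeclaringClass().isInterface()) return "invoke-interface";
        return "invoke-virtual";
    }

    // Looks up a public method, turning the checked exception into an unchecked one
    static Method find(Class<?> owner, String name, Class<?>... params) {
        try {
            return owner.getMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeFor(find(System.class, "loadLibrary", String.class)));
    }
}
```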

invoke-static

invoke-static {}, Lcom/disney/xx/UnlockHelper;->unlockCrankypack()Z

invoke-static calls a static method of a class; in Java this is UnlockHelper.unlockCrankypack(). The braces after invoke-static hold the registers passed to the method: for an instance method, the first register is the object the method is called on, followed by the arguments. Since this method is static and takes no parameters, the braces {} are empty. Let's look at another example:

const-string v0, "fmodex"  
invoke-static {v0}, Ljava/lang/System;->loadLibrary(Ljava/lang/String;)V

This calls static void System.loadLibrary(String) to load the native .so library; v0 holds the argument "fmodex".

invoke-super

invoke-super calls the parent class's implementation of a method. It typically appears in overridden methods, for example the super.onCreate() call inside onCreate() shown earlier.

invoke-direct

invoke-direct calls a direct method, such as a private method or a constructor, for example:

invoke-direct {p0}, Lcom/disney/xx/XxActivity;->getGlobalIapHandler()Lcom/disney/config/GlobalPurchaseHandler;

This calls GlobalPurchaseHandler getGlobalIapHandler() on this (p0); getGlobalIapHandler() is a private method defined in the XxActivity class.

invoke-virtual

invoke-virtual calls a virtual method, that is, a non-private instance method (public, protected or package-private).

sget-object v0, Lcom/disney/xx/XxActivity;->shareHandler:Landroid/os/Handler;  
invoke-virtual {v0, v3}, Landroid/os/Handler;->removeCallbacksAndMessages(Ljava/lang/Object;)V

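Here v0 holds the static field shareHandler (of type android.os.Handler), the object the method is called on, and v3 holds the Object argument of removeCallbacksAndMessages. In Java this is shareHandler.removeCallbacksAndMessages(obj), where obj is the value in v3.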

invoke-xxxxx/range

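The regular invoke form can list at most 5 argument registers (counting this and both halves of a long or double), and each of them must be one of v0-v15. When a call needs more registers than that, or uses higher-numbered registers, the /range form is used instead: it passes a contiguous block of registers, written with the first and last register, e.g. invoke-virtual/range {v16 .. v21}.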
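The choice between the two forms comes down to counting registers. The regular form lists at most 5 registers, each in v0-v15; the count includes the receiver for instance methods, and long (J) and double (D) arguments take two registers each. The following minimal Java sketch (InvokeForm and its methods are hypothetical names) applies that rule.

```java
final class InvokeForm {
    // Registers an invoke passes: 1 for the receiver (non-static),
    // plus 1 per parameter, with long (J) and double (D) taking 2
    static int argumentRegisters(boolean isStatic, String... params) {
        int n = isStatic ? 0 : 1;
        for (String p : params) {
            n += (p.equals("J") || p.equals("D")) ? 2 : 1;
        }
        return n;
    }

    // The regular form lists at most 5 registers, each of which must be v0-v15
    static boolean needsRange(int argumentRegisters, int highestRegisterUsed) {
        return argumentRegisters > 5 || highestRegisterUsed > 15;
    }

    public static void main(String[] args) {
        // A static method taking three longs needs 6 registers, so /range is required
        int n = argumentRegisters(true, "J", "J", "J");
        System.out.println(n + " registers, range needed: " + needsRange(n, n - 1));
    }
}
```

Note that an instance method with only four int parameters already uses all 5 slots, because this counts as one of them.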

Some readers may notice that all the examples above only call methods and never use a return value. In smali, if the called method returns a non-void value, the invoke must be immediately followed by a move-result instruction that stores the value in a register: move-result for 32-bit primitives (int, boolean, etc.), move-result-wide for long and double, and move-result-object for objects:

const/4 v2, 0x0  
invoke-virtual {p0, v2}, Lcom/disney/xx/XxActivity;->getPreferences(I)Landroid/content/SharedPreferences;  
move-result-object v1

v1 holds the SharedPreferences object returned by this.getPreferences(0).

invoke-virtual {v2}, Ljava/lang/String;->length()I  
move-result v2

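v2 now holds the int returned by String.length(). Before the call, the same register v2 held the String itself; registers can be reused once their old value is no longer needed.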
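Which move-result variant to emit depends only on the return type at the end of the method descriptor: nothing for V, move-result-wide for the 64-bit J and D, move-result-object for classes and arrays, and plain move-result for the remaining primitives. A minimal Java sketch (MoveResult and forReturnType are hypothetical names):

```java
final class MoveResult {
    // Returns the instruction that must follow an invoke, or null for a void method
    static String forReturnType(String ret) {
        switch (ret.charAt(0)) {
            case 'V': return null;                          // void: nothing to move
            case 'J': case 'D': return "move-result-wide";  // long/double use a register pair
            case 'L': case '[': return "move-result-object";
            default: return "move-result";                  // int, float, boolean, byte, char, short
        }
    }

    public static void main(String[] args) {
        System.out.println(forReturnType("Landroid/content/SharedPreferences;"));
        System.out.println(forReturnType("I"));
    }
}
```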

Example analysis

The sections above covered member variables, method definitions and method calls. The following example puts them together:

.method protected onDestroy()V
    .locals 0

    .line 79
    invoke-super {p0}, Landroidx/appcompat/app/AppCompatActivity;->onDestroy()V

    .line 80
    invoke-virtual {p0}, Lcom/happy/learnsmali/BaseActivity;->removeCallbacks()V

    .line 81
    return-void
.end method

This is the familiar onDestroy() method. Its first line, .locals 0, declares the number of local registers the method uses. It is 0 here because the method body only uses the parameter register p0 (this) and no local registers. If we add this.exited = true; to the method (assuming BaseActivity declares a boolean field named exited), the method becomes:

.method protected onDestroy()V
    .locals 1

    .line 79
    invoke-super {p0}, Landroidx/appcompat/app/AppCompatActivity;->onDestroy()V

    .line 80
    invoke-virtual {p0}, Lcom/happy/learnsmali/BaseActivity;->removeCallbacks()V
    
    .line 81
    const/4 v0, 0x1
    iput-boolean v0, p0, Lcom/happy/learnsmali/BaseActivity;->exited:Z

    .line 82
    return-void
.end method

Because the modified onDestroy() uses the local register v0, .locals 0 is changed to .locals 1. You may also notice the .line directive, which records the line number in the original source file that the following smali code corresponds to. When an app crashes, the line numbers shown in the logcat stack trace come from these directives. They are not required, but keeping them makes debugging easier.
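When injecting code by hand, forgetting to raise .locals is a common mistake. The following minimal Java sketch (LocalsPatcher and ensureLocals are hypothetical names) raises the .locals line of a single method body to at least the number of local registers the injected code needs, and never lowers it. It assumes the method uses .locals and ignores methods declared with .registers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LocalsPatcher {
    // Matches a line such as "    .locals 0", keeping its indentation in group 1
    private static final Pattern LOCALS = Pattern.compile("^(\\s*)\\.locals\\s+(\\d+)\\s*$");

    // Returns a copy of one method's lines with ".locals N" raised to at least 'needed'
    static List<String> ensureLocals(List<String> methodLines, int needed) {
        List<String> out = new ArrayList<>(methodLines);
        for (int i = 0; i < out.size(); i++) {
            Matcher m = LOCALS.matcher(out.get(i));
            if (m.matches()) {
                int current = Integer.parseInt(m.group(2));
                if (current < needed) {
                    out.set(i, m.group(1) + ".locals " + needed);
                }
                break; // a method has a single .locals directive
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> onDestroy = List.of(".method protected onDestroy()V", "    .locals 0",
                "    return-void", ".end method");
        ensureLocals(onDestroy, 1).forEach(System.out::println);
    }
}
```

Keep in mind that raising .locals also renumbers the parameter registers (with .locals N, p0 is vN). Code written with p names keeps working, but if p0 ends up above v15, instructions whose register fields are only 4 bits wide, such as iput-boolean, will no longer assemble with p0 as an operand.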

Resources

At the end of the article, I am sharing the notes I wrote and organized while learning smali syntax with anyone who needs them:


They include a more detailed reference of basic smali opcodes (usable as a lookup manual), the steps for reversing an app, and more.

To get them: search for and follow the WeChat public account Android Security Engineering, then reply with the keyword smali.


Original article: blog.csdn.net/HongHua_bai/article/details/122815018