Decompilation knowledge necessary for Java development

Reprinted from: The necessary decompilation knowledge for Java development

Programming language

    Before introducing compilation and decompilation, let's briefly look at programming languages. Programming languages are divided into low-level languages and high-level languages.
    Machine language and assembly language are low-level languages: programs are written directly with computer instructions.
    C, C++, Java, Python, and similar languages are high-level languages. Programs are written with statements, which are abstract representations of computer instructions.
For example, a single statement such as a=b+1; looks very different when expressed in C, in assembly language, and in machine language.

    Computers can only operate on numbers. Symbols, sounds, and images must all be represented as numbers inside the computer, and instructions are no exception. Machine language consists entirely of numbers, usually written in hexadecimal. The earliest programmers wrote programs directly in machine language, which was very tedious: they had to look up large tables to find out what each number meant, and the resulting programs were unintuitive and error-prone. Assembly language was introduced to solve this. Each group of numbers in machine language is represented by a mnemonic, programmers write assembly programs using these mnemonics, and an assembler then looks up the table to replace each mnemonic with its number, translating assembly language into machine language.

    However, assembly language is still cumbersome to use, so high-level languages such as Java, C, and C++ were developed later.


What is compilation

    Two kinds of language were mentioned above: low-level and high-level. Put simply, a low-level language is one the computer understands, and a high-level language is one the programmer understands.
So how do we get from a high-level language to a low-level language? That process is compilation.
    The earlier example also shows that there is no simple one-to-one correspondence between C statements and low-level instructions: a single a=b+1; statement has to be translated into three assembly or machine instructions. This translation is called compilation and is performed by a compiler, which is clearly much more complex than an assembler. Programs written in C must be compiled into machine instructions before the computer can execute them. Compilation takes time, which is a drawback of programming in high-level languages, but the advantages outweigh it: code written in C is easier to write, more compact, more readable, and easier to fix when something goes wrong.
    In short, compilation is the process of translating source code written in a high-level language, which is easy for humans to write, read, and maintain, into a low-level machine language that the computer can execute. The tool that performs this process is called a compiler.
    Now we know what compilation is and what a compiler is. Each language has its own compiler. In Java, the compiler is the javac command.

    javac is the Java compiler included in the JDK. It compiles source files with the .java suffix into bytecode files with the .class suffix that can run on the Java virtual machine.

    After writing a HelloWorld.java file, we can run javac HelloWorld.java to generate HelloWorld.class. This class file is what the JVM can recognize, and we usually think of this step as compilation in Java. Strictly speaking, the class file is still not something the machine itself can execute, because the machine only understands machine language; the JVM must still convert the bytecode in the class file into machine language.
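
The javac step described above can also be driven from Java code. The following is only a minimal sketch, assuming it runs on a full JDK (the standard javax.tools API returns no compiler on a plain JRE). The class name CompileDemo, the helper names, and the temporary directory are illustrative choices, not part of the original article. It compiles a small HelloWorld.java and reads the first four bytes of the resulting class file, which for any valid class file are the magic number 0xCAFEBABE.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileDemo {

    /** Writes a tiny HelloWorld.java into a temp directory, compiles it, and returns the .class path. */
    public static Path compileHelloWorld() {
        try {
            Path dir = Files.createTempDirectory("compile-demo");
            Path source = dir.resolve("HelloWorld.java");
            String code =
                "public class HelloWorld {\n" +
                "    public static void main(String[] args) {\n" +
                "        System.out.println(\"Hello, world\");\n" +
                "    }\n" +
                "}\n";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));

            // Same compiler that the javac command uses; null means "not running on a JDK".
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            if (javac == null) {
                throw new IllegalStateException("No system compiler available: run on a JDK");
            }
            // Equivalent to: javac HelloWorld.java (the class file lands next to the source).
            int exitCode = javac.run(null, null, null, source.toString());
            if (exitCode != 0) {
                throw new IllegalStateException("Compilation failed with exit code " + exitCode);
            }
            return dir.resolve("HelloWorld.class");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the first four bytes of the file as an upper-case hex string. */
    public static String magicOf(Path classFile) {
        try {
            byte[] b = Files.readAllBytes(classFile);
            return String.format("%02X%02X%02X%02X", b[0], b[1], b[2], b[3]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path classFile = compileHelloWorld();
        System.out.println(classFile.getFileName() + " starts with 0x" + magicOf(classFile));
    }
}
```

The generated file contains bytecode for the JVM, not instructions for the physical machine, which is why the JVM still has to translate it at run time.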


What is decompilation

    Decompilation is the opposite of compilation: it restores a compiled program to its uncompiled form, that is, it recovers source code, converting a language the machine can understand back into one the programmer can understand. In Java, decompilation generally means converting class files back into java files.
    Decompilation tools let us do many things, but their main use is reading and understanding the bytecode generated by the Java compiler. If you ask what the point of reading bytecode is, I can tell you with confidence that the benefits are considerable. Several of the principle-oriented articles on my blog came from analyzing decompiled code: the implementation of synchronized, an in-depth analysis of Java enum types, how switch is implemented for integer, character, and string types, and Java type erasure. I also recently wrote an article on Java syntactic sugar on GitChat (see the original article for details), most of which relies on decompilation tools to reveal the principles behind the syntactic sugar.


Java decompilation tool

This article mainly introduces three Java decompilation tools: javap, jad, and CFR.

javap

javap is a tool that ships with the JDK. It disassembles class files so that we can view the bytecode generated by the Java compiler. The biggest difference between javap and the other two tools is that its output is not a java file, and it is not as easy to read as the code the other two produce. Take a simple example. To analyze how Java 7 supports String in switch, we start with the following compilable source code:

public class switchDemoString {
    public static void main(String[] args) {
        String str = "world";
        switch (str) {
            case "hello":
                System.out.println("hello");
                break;
            case "world":
                System.out.println("world");
                break;
            default:
                break;
        }
    }
}

Execute the following two commands:

javac switchDemoString.java
javap -c switchDemoString.class

The generated code is as follows:

public class com.hollis.suguar.switchDemoString {
  public com.hollis.suguar.switchDemoString();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: ldc           #2                  // String world
       2: astore_1
       3: aload_1
       4: astore_2
       5: iconst_m1
       6: istore_3
       7: aload_2
       8: invokevirtual #3                  // Method java/lang/String.hashCode:()I
      11: lookupswitch  { // 2
              99162322: 36
             113318802: 50
               default: 61
          }
      36: aload_2
      37: ldc           #4                  // String hello
      39: invokevirtual #5                  // Method java/lang/String.equals:(Ljava/lang/Object;)Z
      42: ifeq          61
      45: iconst_0
      46: istore_3
      47: goto          61
      50: aload_2
      51: ldc           #2                  // String world
      53: invokevirtual #5                  // Method java/lang/String.equals:(Ljava/lang/Object;)Z
      56: ifeq          61
      59: iconst_1
      60: istore_3
      61: iload_3
      62: lookupswitch  { // 2
                     0: 88
                     1: 99
               default: 110
          }
      88: getstatic     #6                  // Field java/lang/System.out:Ljava/io/PrintStream;
      91: ldc           #4                  // String hello
      93: invokevirtual #7                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      96: goto          110
      99: getstatic     #6                  // Field java/lang/System.out:Ljava/io/PrintStream;
     102: ldc           #2                  // String world
     104: invokevirtual #7                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
     107: goto          110
     110: return
}

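My personal understanding is that javap does not decompile bytecode into a java file; instead it produces a form of the bytecode that we can read. The output of javap is still bytecode, only somewhat easier for programmers to follow. If you have some grasp of bytecode, you can understand the code above: the String is converted to its hash code, and the comparison is then made on that.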
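
To make the bytecode easier to follow, here is a hand-written Java approximation of the two-step dispatch it performs. This is a sketch for illustration only: the class and method names are invented, and the return values stand in for the println calls. The first switch is keyed by hashCode() and confirms the match with equals(), storing a small index; the second switch dispatches on that index, mirroring the two lookupswitch instructions.

```java
public class StringSwitchDesugared {

    /** What the programmer writes: a switch directly on a String. */
    public static String withStringSwitch(String str) {
        switch (str) {
            case "hello": return "matched hello";
            case "world": return "matched world";
            default:      return "no match";
        }
    }

    /** Hand-written approximation of what the compiled bytecode does. */
    public static String desugared(String str) {
        int index = -1;                         // iconst_m1, istore_3
        switch (str.hashCode()) {               // first lookupswitch: keyed by hash code
            case 99162322:                      // "hello".hashCode()
                if (str.equals("hello")) index = 0;
                break;
            case 113318802:                     // "world".hashCode()
                if (str.equals("world")) index = 1;
                break;
            default:
                break;
        }
        switch (index) {                        // second lookupswitch: keyed by the index
            case 0:  return "matched hello";
            case 1:  return "matched world";
            default: return "no match";
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hello", "world", "other"}) {
            System.out.println(s + " -> " + withStringSwitch(s) + " / " + desugared(s));
        }
    }
}
```

The constants 99162322 and 113318802 in the javap output are simply the hash codes of "hello" and "world".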

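In my opinion, we do not need the javap command very often; it is generally used only when we really need to look at the bytecode. But bytecode exposes the most complete information, so you will certainly have occasion to use it. For example, I used javap when analyzing how synchronized works. From the bytecode printed by javap, I found that synchronized relies on the ACC_SYNCHRONIZED flag and on the two instructions monitorenter and monitorexit to implement synchronization.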
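
The two mechanisms correspond to the two ways of writing synchronized in source code. The sketch below uses invented names (SyncDemo, incrementMethod, incrementBlock) and assumes nothing beyond the standard reflection API. A synchronized method carries the ACC_SYNCHRONIZED flag, which reflection exposes through Modifier.isSynchronized; a synchronized block carries no such flag on its method, and its monitorenter and monitorexit instructions only become visible when the class is disassembled, for example with javap -c.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class SyncDemo {
    private int counter;

    // Synchronized method: the compiler marks the method with the ACC_SYNCHRONIZED flag.
    public synchronized void incrementMethod() {
        counter++;
    }

    // Synchronized block: the compiler wraps the body in monitorenter / monitorexit.
    public void incrementBlock() {
        synchronized (this) {
            counter++;
        }
    }

    public synchronized int get() {
        return counter;
    }

    /** Reports whether the named no-argument method of this class has the synchronized flag. */
    public static boolean hasSynchronizedFlag(String methodName) {
        try {
            Method m = SyncDemo.class.getDeclaredMethod(methodName);
            return Modifier.isSynchronized(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("incrementMethod flagged: " + hasSynchronizedFlag("incrementMethod"));
        System.out.println("incrementBlock flagged:  " + hasSynchronizedFlag("incrementBlock"));
    }
}
```

Compiling this class and inspecting it with javap is a quick way to see both forms side by side.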

jad

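jad is a fairly good decompilation tool. You only need to download a single executable to decompile class files. Using the same source code as above, decompiling with jad gives the following: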

Command: jad switchDemoString.class

public class switchDemoString
{
    public switchDemoString()
    {
    }
    public static void main(String args[])
    {
        String str = "world";
        String s;
        switch((s = str).hashCode())
        {
        default:
            break;
        case 99162322:
            if(s.equals("hello"))
                System.out.println("hello");
            break;
        case 113318802:
            if(s.equals("world"))
                System.out.println("world");
            break;
        }
    }
}

See, you can certainly read this code, because it is just standard Java source code. It clearly shows that switch on strings is implemented with the equals() and hashCode() methods.

However, jad has not been updated for a long time. It occasionally fails on bytecode generated by Java 7, and it fails completely when decompiling Java 8 lambda expressions.

CFR

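jad works well, but unfortunately it has not been updated for a long time, so we have to replace it with a newer tool. CFR is a good choice. Compared with jad, its command syntax may be slightly more complicated, but the good news is that it works.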

For example, let's decompile the same code with CFR. Run the following command:

java -jar cfr_0_125.jar switchDemoString.class --decodestringswitch false

This gives the following code:

public class switchDemoString {
    public static void main(String[] arrstring) {
        String string;
        String string2 = string = "world";
        int n = -1;
        switch (string2.hashCode()) {
            case 99162322: {
                if (!string2.equals("hello")) break;
                n = 0;
                break;
            }
            case 113318802: {
                if (!string2.equals("world")) break;
                n = 1;
            }
        }
        switch (n) {
            case 0: {
                System.out.println("hello");
                break;
            }
            case 1: {
                System.out.println("world");
                break;
            }
        }
    }
}

This code also leads to the conclusion that switch on strings is implemented with the equals() and hashCode() methods.
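
One reason the equals() call is needed in addition to hashCode() is that different strings can share a hash code. The following sketch, with invented class and method names, uses the well-known colliding pair "Aa" and "BB" (both hash to 2112) to compare a dispatch that trusts the hash alone with one that confirms the match, as the decompiled code does.

```java
public class HashCollisionDemo {

    /** Dispatches on hashCode() only, without the equals() check. Deliberately wrong. */
    public static String hashOnly(String s) {
        switch (s.hashCode()) {
            case 2112:                 // "Aa".hashCode() and "BB".hashCode() are both 2112
                return "Aa";
            default:
                return "other";
        }
    }

    /** Dispatches on hashCode() and then confirms with equals(), like the decompiled code. */
    public static String hashThenEquals(String s) {
        switch (s.hashCode()) {
            case 2112:
                if (s.equals("Aa")) {
                    return "Aa";
                }
                break;
            default:
                break;
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println("hash only, input BB:      " + hashOnly("BB"));
        System.out.println("hash + equals, input BB:  " + hashThenEquals("BB"));
    }
}
```

With the hash-only version, "BB" is wrongly treated as "Aa"; the equals() check removes that ambiguity.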

Compared with jad, CFR has many options. For the same code, the following command produces a different result:

java -jar cfr_0_125.jar switchDemoString.class

public class switchDemoString {
    public static void main(String[] arrstring) {
        String string;
        switch (string = "world") {
            case "hello": {
                System.out.println("hello");
                break;
            }
            case "world": {
                System.out.println("world");
                break;
            }
        }
    }
}

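So --decodestringswitch controls whether the details of switch on String are decoded back into a string switch. Similar options include --decodeenumswitch, --decodefinally, and --decodelambdas. In my article on syntactic sugar, I used --decodelambdas to decompile lambda expressions. Source code: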

public static void main(String... args) {
    List<String> strList = ImmutableList.of("Hollis", "公众号:Hollis", "博客:www.hollischuang.com");

    strList.forEach( s -> { System.out.println(s); } );
}

Code decompiled with java -jar cfr_0_125.jar lambdaDemo.class --decodelambdas false:

public static /* varargs */ void main(String ... args) {
    ImmutableList strList = ImmutableList.of((Object)"Hollis", (Object)"\u516c\u4f17\u53f7\uff1aHollis", (Object)"\u535a\u5ba2\uff1awww.hollischuang.com");
    strList.forEach((Consumer<String>)LambdaMetafactory.metafactory(null, null, null, (Ljava/lang/Object;)V, lambda$main$0(java.lang.String ), (Ljava/lang/String;)V)());
}

private static /* synthetic */ void lambda$main$0(String s) {
    System.out.println(s);
}
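
The synthetic lambda$main$0 method in the decompiled output can also be observed without a decompiler. The sketch below is an illustration under assumptions: the class and method names are invented, and the generated method name is a compiler implementation detail (javac typically uses the pattern lambda$<method>$<n>), so the exact name may differ between compilers. It uses reflection to list the synthetic methods that the compiler added to a class containing a lambda.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LambdaSyntheticDemo {

    /** Uses a lambda so that the compiler has something to desugar. */
    public static List<String> collect(List<String> input) {
        List<String> out = new ArrayList<>();
        input.forEach(s -> { out.add(s); });
        return out;
    }

    /** Returns the names of compiler-generated (synthetic) methods declared in this class. */
    public static List<String> syntheticMethodNames() {
        List<String> names = new ArrayList<>();
        for (Method m : LambdaSyntheticDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("collected: " + collect(Arrays.asList("a", "b")));
        // The lambda body lives in a generated method; its name is compiler-specific.
        System.out.println("synthetic methods: " + syntheticMethodNames());
    }
}
```

This matches what CFR shows with --decodelambdas false: the lambda body is moved into a separate generated method, and the call site is linked to it through LambdaMetafactory.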

CFR has many other options for different scenarios. Readers can run java -jar cfr_0_125.jar --help to learn about them; they are not covered one by one here.


How to prevent decompilation

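Because tools exist that can decompile class files, protecting Java programs has become an important challenge for developers. But for every attack there is a countermeasure, and techniques exist to resist decompilation. One point must be made clear, though: as with network security, no matter how much effort is invested, all it really does is raise the attacker's cost. Decompilation cannot be prevented completely.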

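Typical strategies include the following:

  • Isolate the Java program

    • Keep users from ever getting access to your class files

  • Encrypt the class files

    • Raise the difficulty of cracking

  • Obfuscate the code

    • Transform the code into a form that is functionally equivalent but hard to read and understand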
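
As a rough illustration of the last strategy, the sketch below shows a readable method next to a hand-obfuscated equivalent. It is an assumption-laden toy: the class, the methods, and the pricing logic are invented for this example, and real obfuscation tools apply such transformations automatically and far more aggressively. Renaming identifiers to meaningless names and folding intermediate variables is just one simple transformation.

```java
public class ObfuscationDemo {

    /** Readable version: price after a percentage discount, never below zero. */
    public static int discountedPrice(int priceInCents, int discountPercent) {
        int discount = priceInCents * discountPercent / 100;
        int result = priceInCents - discount;
        return Math.max(result, 0);
    }

    /** Functionally equivalent version with meaningless names and folded expressions. */
    public static int a(int a, int b) {
        int c = a - a * b / 100;
        return c < 0 ? 0 : c;
    }

    public static void main(String[] args) {
        // Both calls compute the same value; only readability differs.
        System.out.println(discountedPrice(1999, 15));
        System.out.println(a(1999, 15));
    }
}
```

A decompiler recovers the second form just as easily as the first, but a reader learns much less from it, which is exactly the added cost that obfuscation aims for.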







Origin http://43.154.161.224:23101/article/api/json?id=325580558&siteId=291194637