Java compilation and decompilation

Programming language

Before introducing compilation and decompilation, let's take a brief look at programming languages. Programming languages are divided into low-level languages and high-level languages.

Machine language and assembly language are low-level languages: programs are written directly in computer instructions.

C, C++, Java, Python, and the like are high-level languages: programs are written in statements, and a statement is an abstract representation of computer instructions.

For example, a single C statement such as a=b+1; corresponds to several assembly language instructions, and each of those corresponds to a machine language instruction.

A computer can only operate on numbers. Symbols, sounds, and images must all be represented as numbers inside the computer, and instructions are no exception, so machine language consists entirely of numbers, conventionally written as hexadecimal digits. The earliest programmers wrote programs directly in machine language, but this was very troublesome: they had to look up many tables to find out what each number meant, and the resulting programs were unintuitive and error-prone. Hence assembly language: each group of machine-language numbers is represented by a mnemonic, programmers write the mnemonics directly, and an assembler then replaces the mnemonics with numbers by table lookup, translating assembly language into machine language.

However, assembly language is still complicated to use, which later gave rise to Java, C, C++, and other high-level languages.

What is compilation

Two kinds of language were mentioned above: low-level and high-level. Put simply, a low-level language is a language the computer understands, and a high-level language is a language the programmer understands.

So how do we get from a high-level language to a low-level language? That process is compilation.

As the example above also shows, C statements and low-level instructions do not correspond one to one: the single statement a=b+1; has to be translated into three assembly or machine instructions. This process is called compilation and is performed by a compiler, whose job is clearly much more complex than an assembler's. A program written in C must be compiled into machine instructions before a computer can execute it, and compiling takes some time. That is a disadvantage of programming in a high-level language, but the advantages outweigh it. For one, programming in C is easier: the code is more compact, more readable, and easier to correct.

In short, source code written in a high-level language is easy for people to write, read, and maintain, and compilation translates it into low-level machine language that a computer can interpret and run. The tool responsible for this process is called a compiler.

Now we know what compilation is and what a compiler is. Each language has its own compiler. The Java compiler is the javac command.

javac is the Java compiler included in the JDK. It compiles source files with the .java suffix into .class bytecode files that can run on the Java virtual machine.

After writing a HelloWorld.java file, we can run the javac HelloWorld.java command to generate a HelloWorld.class file, a file type that the JVM can recognize. We usually call this process compiling Java. In fact, a class file is still not something the machine can run directly, because the machine only recognizes machine language. The JVM still has to convert the bytecode in the class file into machine language that the machine can execute.
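
The compile step described above can also be driven from Java code. The following is a minimal sketch, assuming it runs on a JDK rather than a bare JRE. The class name CompileDemo and the temporary directory are illustrative and do not come from the original article. It uses the standard javax.tools API to compile a small HelloWorld.java, then reads the first four bytes of the resulting HelloWorld.class, which in any valid class file are the magic number 0xCAFEBABE.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CompileDemo {
    // Writes HelloWorld.java to a temp directory, compiles it, and returns
    // the first four bytes of HelloWorld.class as an int.
    static int compileAndReadMagic() {
        try {
            Path dir = Files.createTempDirectory("compile-demo");
            Path source = dir.resolve("HelloWorld.java");
            String code = "public class HelloWorld {\n"
                    + "    public static void main(String[] args) {\n"
                    + "        System.out.println(\"Hello, world\");\n"
                    + "    }\n"
                    + "}\n";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));

            // getSystemJavaCompiler() returns null when no compiler is available.
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No system Java compiler; run on a JDK");
            }

            // Equivalent to running: javac HelloWorld.java
            // Without -d, the class file is written next to the source file.
            int exitCode = compiler.run(null, null, null, source.toString());
            if (exitCode != 0) {
                throw new IllegalStateException("Compilation failed with code " + exitCode);
            }

            Path classFile = dir.resolve("HelloWorld.class");
            try (DataInputStream in = new DataInputStream(Files.newInputStream(classFile))) {
                return in.readInt(); // class files are big-endian, as is readInt()
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("magic = 0x%08X%n", compileAndReadMagic());
    }
}
```

The point of the sketch is only that the output of javac is a bytecode file in the class file format, not native machine code.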

What is decompilation

Decompilation is the reverse of compilation: it restores an already compiled program to its uncompiled state, that is, it recovers the source code. It converts a language the machine understands into a language the programmer understands. In Java, decompilation generally means converting a class file into a java file.

Decompilers let us do many things. Most importantly, they let us read and understand the bytecode produced by the Java compiler. If you ask what the use of reading bytecode is, I can responsibly tell you that the benefits are great. For example, several of the articles on my blog that explain underlying principles are based on analyzing decompiled code, such as "In-depth understanding of multithreading (1): how synchronized is implemented", "In-depth analysis of Java enum types: thread safety and serialization of enums", "How Java's switch supports integer, character, and string types", and "Java's type erasure". I also recently wrote a GitChat article about Java syntactic sugar, most of which relied on decompilers to uncover the principles behind the sugar.

Java decompilers

This article introduces three Java decompilation tools: javap, jad, and CFR.

javap

javap is a tool that ships with the JDK. It disassembles class files so that we can view the bytecode generated by the compiler. The biggest difference between javap and the other two tools is that its output is not a java file, and it is not as easy to read as the code the other two produce. Take a simple example: to analyze how switch supports String in Java 7, we start with source code that compiles successfully:

public class switchDemoString {
    public static void main(String[] args) {
        String str = "world";
        switch (str) {
            case "hello":
                System.out.println("hello");
                break;
            case "world":
                System.out.println("world");
                break;
            default:
                break;
        }
    }
}

Execute the following two commands:

javac switchDemoString.java
javap -c switchDemoString.class

This produces the following output:

public class com.hollis.suguar.switchDemoString {
  public com.hollis.suguar.switchDemoString();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return
 
  public static void main(java.lang.String[]);
    Code:
       0: ldc           #2                  // String world
       2: astore_1
       3: aload_1
       4: astore_2
       5: iconst_m1
       6: istore_3
       7: aload_2
       8: invokevirtual #3                  // Method java/lang/String.hashCode:()I
      11: lookupswitch  { // 2
              99162322: 36
             113318802: 50
               default: 61
          }
      36: aload_2
      37: ldc           #4                  // String hello
      39: invokevirtual #5                  // Method java/lang/String.equals:(Ljava/lang/Object;)Z
      42: ifeq          61
      45: iconst_0
      46: istore_3
      47: goto          61
      50: aload_2
      51: ldc           #2                  // String world
      53: invokevirtual #5                  // Method java/lang/String.equals:(Ljava/lang/Object;)Z
      56: ifeq          61
      59: iconst_1
      60: istore_3
      61: iload_3
      62: lookupswitch  { // 2
                     0: 88
                     1: 99
               default: 110
          }
      88: getstatic     #6                  // Field java/lang/System.out:Ljava/io/PrintStream;
      91: ldc           #4                  // String hello
      93: invokevirtual #7                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      96: goto          110
      99: getstatic     #6                  // Field java/lang/System.out:Ljava/io/PrintStream;
     102: ldc           #2                  // String world
     104: invokevirtual #7                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
     107: goto          110
     110: return
}

As I understand it, javap does not decompile bytecode into a java file. It prints the bytecode in a form we can read. Even so, the output of javap is something only programmers with some bytecode knowledge can follow, and the more bytecode you know, the more of it you can understand. In essence, the string is converted to its hash code, which is then compared.
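
To make the bytecode above easier to follow, here is a hand-written Java approximation of what it does. This is a sketch, not decompiler output: the class and method names (StringSwitchSketch, viaSwitch, viaHashCode) are invented for illustration. The two integer constants are the hash codes that appear in the lookupswitch table above.

```java
class StringSwitchSketch {
    // What the programmer writes: a switch on a String (Java 7 and later).
    static String viaSwitch(String str) {
        switch (str) {
            case "hello": return "hello";
            case "world": return "world";
            default:      return "";
        }
    }

    // Roughly what the compiled code does: switch on hashCode(), confirm
    // with equals() to rule out hash collisions, then switch on a small index.
    static String viaHashCode(String str) {
        int index = -1;
        switch (str.hashCode()) {
            case 99162322:   // "hello".hashCode()
                if (str.equals("hello")) index = 0;
                break;
            case 113318802:  // "world".hashCode()
                if (str.equals("world")) index = 1;
                break;
            default:
                break;
        }
        switch (index) {
            case 0:  return "hello";
            case 1:  return "world";
            default: return "";
        }
    }

    public static void main(String[] args) {
        System.out.println("hello".hashCode()); // 99162322
        System.out.println("world".hashCode()); // 113318802
        System.out.println(viaSwitch("world").equals(viaHashCode("world"))); // true
    }
}
```

The equals() check matters because two different strings can share a hash code. The hash only narrows down the candidate case.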

Personally, I think we do not use the javap command very often. Normally it is needed only when we really have to look at the bytecode. But the bytecode exposes the most complete picture, so you will certainly find occasions to use it. For example, I used javap when analyzing how synchronized works. From the bytecode printed by javap, I found that at the bottom level synchronized relies on the ACC_SYNCHRONIZED flag and the two instructions monitorenter and monitorexit.
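
A small class for reproducing that observation might look like the sketch below. The class SyncDemo and its method names are hypothetical. After compiling it, running javap -v SyncDemo.class should list ACC_SYNCHRONIZED among the flags of the synchronized method and show monitorenter and monitorexit in the method that uses a synchronized block. The helper hasSynchronizedFlag uses reflection to check the same method-level flag without reading bytecode.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class SyncDemo {
    private int counter;

    // Synchronized method: marked with the ACC_SYNCHRONIZED flag.
    synchronized void incrementMethod() {
        counter++;
    }

    // Synchronized block: compiled to monitorenter / monitorexit instructions.
    // The method itself does not carry ACC_SYNCHRONIZED.
    void incrementBlock() {
        synchronized (this) {
            counter++;
        }
    }

    // Reports whether the named no-argument method has the synchronized modifier.
    static boolean hasSynchronizedFlag(String methodName) {
        try {
            Method m = SyncDemo.class.getDeclaredMethod(methodName);
            return Modifier.isSynchronized(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasSynchronizedFlag("incrementMethod")); // true
        System.out.println(hasSynchronizedFlag("incrementBlock"));  // false
    }
}
```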

jad

jad is a fairly good decompiler: just download the executable and it can decompile class files. Taking the same source code as above, decompiling with jad gives the following:

Command: jad switchDemoString.class

public class switchDemoString
{
    public switchDemoString()
    {
    }
    public static void main(String args[])
    {
        String str = "world";
        String s;
        switch((s = str).hashCode())
        {
        default:
            break;
        case 99162322:
            if(s.equals("hello"))
                System.out.println("hello");
            break;
        case 113318802:
            if(s.equals("world"))
                System.out.println("world");
            break;
        }
    }
}

You can surely read this code, since it is simply standard Java source code. It shows very clearly that switch on strings is implemented through the equals() and hashCode() methods.

However, jad has not been updated for a long time. It occasionally has problems decompiling bytecode generated by Java 7, and it fails completely on Java 8 lambda expressions.

CFR

jad is handy, but unfortunately it has not been updated for a long time, so we have to replace it with a newer tool. CFR is a good choice: compared with jad, its command-line syntax may be a little more complicated, but the good thing is that it works.

For example, let's decompile the same code with CFR. Execute the following command:

java -jar cfr_0_125.jar switchDemoString.class --decodestringswitch false

Get the following code:

public class switchDemoString {
    public static void main(String[] arrstring) {
        String string;
        String string2 = string = "world";
        int n = -1;
        switch (string2.hashCode()) {
            case 99162322: {
                if (!string2.equals("hello")) break;
                n = 0;
                break;
            }
            case 113318802: {
                if (!string2.equals("world")) break;
                n = 1;
            }
        }
        switch (n) {
            case 0: {
                System.out.println("hello");
                break;
            }
            case 1: {
                System.out.println("world");
                break;
            }
        }
    }
}

From this code we can also conclude that switch on strings is implemented through equals() and hashCode().

Compared with jad, CFR has many options. For the same code, the following command produces different output:

java -jar cfr_0_125.jar switchDemoString.class
 
public class switchDemoString {
    public static void main(String[] arrstring) {
        String string;
        switch (string = "world") {
            case "hello": {
                System.out.println("hello");
                break;
            }
            case "world": {
                System.out.println("world");
                break;
            }
        }
    }
}

So --decodestringswitch controls whether the details of switch on strings are decoded back into a string switch. Similar options include --decodeenumswitch, --decodefinally, --decodelambdas, and so on. In my article on syntactic sugar, I used --decodelambdas to decompile lambda expressions. Source:

public static void main(String... args) {
    List<String> strList = ImmutableList.of("Hollis", "公众号:Hollis", "博客:www.hollischuang.com");
 
    strList.forEach( s -> { System.out.println(s); } );
}

Decompiling with java -jar cfr_0_125.jar lambdaDemo.class --decodelambdas false gives the following code:

public static /* varargs */ void main(String ... args) {
    ImmutableList strList = ImmutableList.of((Object)"Hollis", (Object)"\u516c\u4f17\u53f7\uff1aHollis", (Object)"\u535a\u5ba2\uff1awww.hollischuang.com");
    strList.forEach((Consumer<String>)LambdaMetafactory.metafactory(null, null, null, (Ljava/lang/Object;)V, lambda$main$0(java.lang.String ), (Ljava/lang/String;)V)());
}
 
private static /* synthetic */ void lambda$main$0(String s) {
    System.out.println(s);
}
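
The synthetic lambda$main$0 method in the decompiled output can also be observed through reflection. The sketch below is an illustration under the assumption that the class is compiled with javac, which translates a lambda body into a private synthetic method whose name starts with lambda$. The class LambdaDemo and the helper syntheticMethodNames are hypothetical names, and other compilers may name the method differently.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LambdaDemo {
    // The lambda body below is compiled into a separate synthetic method.
    static void printAll(List<String> items) {
        items.forEach(s -> { System.out.println(s); });
    }

    // Collects the names of compiler-generated (synthetic) methods of this class.
    static List<String> syntheticMethodNames() {
        List<String> names = new ArrayList<>();
        for (Method m : LambdaDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        printAll(Arrays.asList("Hollis", "www.hollischuang.com"));
        // Prints the generated method name(s); the exact name is compiler-specific.
        System.out.println(syntheticMethodNames());
    }
}
```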

CFR has many other options for different scenarios. Readers can run java -jar cfr_0_125.jar --help to learn about them. They are not introduced one by one here.

How to prevent decompilation

Because tools exist that can decompile class files, protecting Java programs becomes a very important challenge for developers. However, for every attack there is a defense, and there are techniques for countering decompilation. It must be pointed out, though, that as with network security, no matter how much effort is invested, these techniques only raise the attacker's cost. They cannot prevent decompilation completely.

Typical countermeasures include the following:

  • Isolate the Java program
    • Keep your class files out of users' reach
  • Encrypt the class files
    • Raise the difficulty of cracking them
  • Obfuscate the code
    • Transform the code into a functionally equivalent form that is hard to read and understand
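
To give a feel for the last item, the sketch below places a readable method next to a hand-obfuscated equivalent. It is an illustration only, not the output of any real obfuscation tool, and the names ObfuscationSketch, sumOfEvens, and a are invented for this example.

```java
class ObfuscationSketch {
    // Readable original: sums the even numbers in an array.
    static int sumOfEvens(int[] values) {
        int sum = 0;
        for (int value : values) {
            if (value % 2 == 0) {
                sum += value;
            }
        }
        return sum;
    }

    // Functionally equivalent, but renamed and rewritten to hide the intent.
    // ((x & 1) ^ 1) is 1 for even x and 0 for odd x; negating it gives an
    // all-ones or all-zeros mask that keeps or discards the element.
    static int a(int[] b) {
        int c = 0, d = b.length;
        while (d-- > 0) {
            c += b[d] & -((b[d] & 1) ^ 1);
        }
        return c;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 10, -3};
        System.out.println(sumOfEvens(data)); // 16
        System.out.println(a(data));          // 16
    }
}
```

A decompiler recovers the second method just as easily as the first, but a reader has to work much harder to see what it computes, which is the cost increase described above.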


Origin www.cnblogs.com/zouwangblog/p/10984087.html