Casual talk: how to explain compilation and decompilation to your girlfriend

One day after work, I conducted a phone interview from home and asked the candidate this question: "Do you know what can be used to decompile Java code?" The candidate's answer was not good, so I wrote in the interview evaluation: "Understanding of compilation principles is not thorough." Just then, my girlfriend saw this sentence.

Computer Languages

A computer language is the language used for communication between people and computers. It is the medium through which information passes between the two.

The defining feature of a computer system is that instructions are conveyed to the machine through a language. To make a computer carry out all kinds of work, we need a set of digits, characters, and grammar rules for writing programs; the computer's instructions (or statements) are composed from these characters and grammar rules. Together they form the language that a computer can accept.

Computer languages fall into three kinds: machine language, assembly language, and high-level language.

Machine language

Machine language is the set of machine instructions, expressed in binary code, that a computer can recognize and execute directly. It is flexible, is executed directly, and runs fast. However, the machine languages of different types of computer are not interchangeable: a program written with one computer's machine instructions cannot be executed on another.

Because machine language is expressed in binary, all the code of a program written in it consists of 0s and 1s.

The advantage of machine language is that the computer can recognize and execute it directly, which makes it efficient. But it also has many disadvantages, such as:

  • 1. The machine knows only 0 and 1, and it is hard for a programmer to remember which combination of 0s and 1s each instruction becomes, so many tables have to be consulted to work out what each number means.

  • 2. Because everything is written as "cipher-like" code, readability is poor and the code is hard to share or collaborate on.

  • 3. Because it depends heavily on the specific computer, portability and reusability are poor.

Because machine language has so many drawbacks, assembly language came along.

Assembly language

Assembly language uses mnemonics in place of the operation codes of a specific low-level machine language.

A mnemonic is a symbol that is easy for people to remember and that describes an instruction's function and operands. An instruction mnemonic is an English word or abbreviation that indicates what the instruction does. For example, ADD denotes addition, MOV denotes data transfer, and SUB denotes subtraction.

However, assembly language only makes things easier for programmers to remember and use; the computer does not understand it. So before the computer can execute assembly code, an assembler must first convert it into executable machine code. This process is called assembly.
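To make the assembly step concrete, here is a minimal sketch of a toy assembler in Java. The class name ToyAssembler, the 4-bit opcodes, and the "mnemonic plus two small operands" instruction format are all invented for illustration; they do not match any real instruction set.

```java
import java.util.Map;

// Toy assembler: translates one mnemonic line into a string of bits.
// The opcodes and the instruction format are hypothetical.
class ToyAssembler {
    private static final Map<String, String> OPCODES = Map.of(
            "MOV", "0001",
            "ADD", "0010",
            "SUB", "0011");

    // Encodes a line such as "ADD 3 4" as a 4-bit opcode followed by
    // two 4-bit operands.
    static String assemble(String line) {
        String[] parts = line.trim().split("\\s+");
        String opcode = OPCODES.get(parts[0].toUpperCase());
        if (opcode == null || parts.length != 3) {
            throw new IllegalArgumentException("Cannot assemble: " + line);
        }
        StringBuilder bits = new StringBuilder(opcode);
        for (int i = 1; i < parts.length; i++) {
            bits.append(toFourBits(Integer.parseInt(parts[i])));
        }
        return bits.toString();
    }

    // Pads the binary form of a value in the range 0..15 to four digits.
    private static String toFourBits(int value) {
        if (value < 0 || value > 15) {
            throw new IllegalArgumentException("Operand out of range: " + value);
        }
        String binary = Integer.toBinaryString(value);
        return "0".repeat(4 - binary.length()) + binary;
    }

    public static void main(String[] args) {
        System.out.println(assemble("ADD 3 4")); // prints 001000110100
    }
}
```

The programmer gets to write ADD, and the machine still receives nothing but 0s and 1s.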

Because assembly language is close to machine language, it can operate on the hardware directly, and the programs it produces run faster and use less memory than those written in other languages. It therefore suits time-critical programs and is widely used in the core modules of many large programs and in industrial control.

Machine language and assembly language offer little or no abstraction; we usually call them low-level languages. They are closer to the hardware, but programs written in them cannot be moved between different hardware.

As modern software systems grew larger and more complex, a large number of well-encapsulated high-level languages came into being, such as C/C++ and Pascal/Object Pascal. These languages make development simpler and more efficient, which lets developers meet the demand for rapid software development.

High-level language

A high-level language is a highly encapsulated programming language, as opposed to a low-level language.

It is a programming language based on everyday human language, written with words that ordinary people readily accept (for example Chinese characters, irregular English, or other foreign languages). This makes programs easier to write and more readable, so that even people with little knowledge of computers can roughly understand their content.

Examples include the currently popular Java, C, C++, C#, Pascal, Python, Lisp, Prolog, FoxPro, Easy Language, and Chinese-language versions of C. The syntax and command formats of these languages all differ.

As with assembly language, and even more so because a high-level language is further from machine language, the computer cannot recognize a high-level language directly. So for the computer to execute a high-level language, it has to be translated into machine language.

Moving from machine language to abstract high-level languages brings these main benefits:

  • 1. A high-level language is close to an algorithmic language and is easy to learn and master; ordinary engineers and technicians can do a programmer's job after only a few weeks of training.

  • 2. A high-level language gives programmers an environment and tools for structured programming, so the resulting programs are readable, maintainable, and reliable.

  • 3. A high-level language is far from machine language and has little to do with specific computer hardware, so the programs written in it are portable and highly reusable.

  • 4. Because the complex, tedious work is handed to the compiler, the degree of automation is high and the development cycle is short. Programmers are freed to spend their time and energy on the creative work that matters more to them, which improves the quality of the program.

Compilation

Two kinds of language were mentioned above: low-level and high-level. Put simply, a low-level language is the language the computer understands, and a high-level language is the language the programmer understands.

So how does the high-level language a programmer writes get converted into a low-level language that the computer understands and can then execute?

This process is compilation!

The main purpose of compilation is to take a source program written in a high-level language, which is easy for people to write, read, and maintain, and translate it into a low-level program that the computer can interpret and run, that is, an executable.

Compiling the Java language

Java is a high-level language, so to be executed it also has to be converted into machine language by compilation.

A Java source file is a .java file. Converting a .java file into binary machine code takes two steps.

First, the front-end compiler compiles the .java file into intermediate code. This intermediate code is the .class file, i.e., the bytecode file.

Then, the back-end compiler compiles the bytecode in the .class file into machine language.

Java front-end compilers mainly include javac and ECJ, the incremental compiler in Eclipse JDT.

Java back-end compilers are mainly provided by the major virtual machine implementations, such as the JIT compiler in HotSpot.
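The front-end step can be tried out from Java itself. The sketch below uses the standard javax.tools API to compile a tiny source file and then reads the first four bytes of the resulting .class file. The class names FrontEndDemo and Hello and the temporary directory are made up for this example, and it assumes it runs on a full JDK, since a runtime without the compiler module has no system compiler.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Runs the front-end step programmatically: Hello.java -> Hello.class.
class FrontEndDemo {
    // Compiles a small source file in a temporary directory and returns
    // the first four bytes of the generated class file as hex.
    static String compileAndReadMagic() {
        try {
            Path dir = Files.createTempDirectory("frontend-demo");
            Path source = dir.resolve("Hello.java");
            Files.writeString(source,
                    "public class Hello { public static void main(String[] a) {"
                    + " System.out.println(\"hi\"); } }");

            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            if (javac == null) {
                throw new IllegalStateException("No system compiler; run on a JDK");
            }
            // Without a -d option the class file is written next to the source.
            int exitCode = javac.run(null, null, null, source.toString());
            if (exitCode != 0) {
                throw new IllegalStateException("Compilation failed");
            }

            try (DataInputStream in = new DataInputStream(
                    Files.newInputStream(dir.resolve("Hello.class")))) {
                return Integer.toHexString(in.readInt());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileAndReadMagic()); // prints cafebabe
    }
}
```

Every class file starts with the magic number 0xCAFEBABE, which is a quick way to confirm that what came out is bytecode rather than machine code. The second step, bytecode to machine code, happens inside the virtual machine at run time and is not shown here.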

Decompilation

As described above, a compiler turns high-level source code into a low-level language. Conversely, we can also work backward from the low-level language to recover the source code. This process is called decompilation.

Although it is hard to decompile machine language back into source code, we can decompile the intermediate code. In other words, we cannot easily decompile the machine code produced by the virtual machine's compiler, but decompiling the .class files produced by javac is feasible.

So when we talk about Java decompilation, we generally mean converting .class files into .java files.

What decompilation is for

First, decompilation is a good way to learn Java.

Java provides a lot of syntactic sugar, such as generics and auto-boxing/unboxing, that the Java virtual machine knows nothing about. So javac removes the sugar at compile time, and the resulting .class file contains the desugared code. If we then decompile that .class file, we get a .java file from which we can learn how the syntactic sugar is actually implemented.
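As a small illustration, the sketch below puts a sugared method next to a hand-written approximation of what a decompiler typically shows for it. The class and method names are made up, and the second method is written by hand, not copied from any tool's output; the exact text varies from one decompiler to another.

```java
import java.util.ArrayList;
import java.util.List;

class SugarDemo {
    // As the programmer writes it: generics plus auto-boxing and unboxing.
    static int withSugar() {
        List<Integer> numbers = new ArrayList<>();
        numbers.add(42);             // auto-boxing: int -> Integer
        int first = numbers.get(0);  // auto-unboxing: Integer -> int
        return first;
    }

    // Roughly what remains after javac removes the sugar: a raw List,
    // an explicit cast, and explicit valueOf/intValue calls.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int desugared() {
        List numbers = new ArrayList();
        numbers.add(Integer.valueOf(42));
        int first = ((Integer) numbers.get(0)).intValue();
        return first;
    }

    public static void main(String[] args) {
        System.out.println(withSugar() == desugared()); // prints true
    }
}
```

Both methods behave the same; the second simply spells out the work that javac does on the programmer's behalf.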

Second, with a decompilation tool we can decompile other people's code and learn how it is implemented. We may also find bugs through the recovered source, build plug-ins, and so on.

Decompilation tools

There are many Java decompilation tools; here is a brief look at a few.

javap

javap is a tool bundled with the JDK. It disassembles class files and lets you view the Java bytecode generated by the compiler. What javap produces is not a .java file but a listing of the class bytecode in a form that programmers can read.
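javap is normally run from the command line, but on JDK 9 and later it can also be invoked in-process through java.util.spi.ToolProvider, which keeps this example in Java. This is a minimal sketch: the class name JavapDemo is made up, java.lang.Object is an arbitrary choice of class to disassemble, and it assumes a full JDK where the javap tool is available.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.spi.ToolProvider;

class JavapDemo {
    // Runs the equivalent of "javap -c <className>" and returns its output.
    static String disassemble(String className) {
        ToolProvider javap = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException(
                        "javap not found; run on a JDK"));
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true);
        // -c asks javap to print the bytecode of each method.
        int exitCode = javap.run(out, out, "-c", className);
        out.flush();
        if (exitCode != 0) {
            throw new IllegalStateException("javap failed: " + buffer);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(disassemble("java.lang.Object"));
    }
}
```

The listing contains method signatures followed by bytecode instructions, not Java statements, which is why javap is better described as a disassembler than as a decompiler.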

jad

jad is a fairly good decompilation tool; you only need to download a single executable to decompile .class files.

jad can decompile .class files into .java files.

However, jad has not been updated for a long time. It occasionally runs into problems when decompiling bytecode generated by Java 7, and it fails completely when decompiling Java 8 lambda expressions.

Address: http://www.javadecompilers.com/jad

CFR

jad is useful, but since it has gone so long without updates we have to replace it with a newer tool. CFR is a good choice. Compared with jad its syntax may be slightly more complicated, but fortunately it works.

Address: http://www.benf.org/other/cfr/index.html

JD-GUI

JD-GUI is a standalone graphical utility that displays the Java source code of ".class" files. You can browse the reconstructed source code with JD-GUI for instant access to methods and fields.

How to prevent decompilation

Because tools exist that can decompile class files, protecting Java programs becomes an important challenge for developers.

But for every attack there is a countermeasure, and of course there are techniques for dealing with decompilation.

It has to be pointed out, though, that just as with network security, no matter how much effort is made, it only raises the attacker's cost. Decompilation cannot be prevented completely.

Typical countermeasures include the following:

  • Isolate the Java program so that users cannot get at your class files

  • Encrypt the class files to make them harder to crack

  • Obfuscate the code, transforming it into a functionally equivalent form that is hard to read and understand
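To show what "functionally equivalent but hard to read" means, here is a hand-written imitation of the effect of obfuscation. It is not the output of any real obfuscator, and the class and method names are made up; real tools apply many more transformations than renaming and loop restructuring.

```java
class ObfuscationDemo {
    // Readable original: the names explain the intent.
    static int sumOfSquares(int[] values) {
        int total = 0;
        for (int value : values) {
            total += value * value;
        }
        return total;
    }

    // Imitation of an obfuscated version: same result, but the names carry
    // no meaning and the loop walks the array backwards.
    static int a(int[] a) {
        int b = 0, c = a.length;
        while (c-- > 0) {
            b += a[c] * a[c];
        }
        return b;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sumOfSquares(data)); // prints 14
        System.out.println(a(data));            // prints 14
    }
}
```

A decompiler recovers the second form just as easily as the first, but a reader then has to work out what a, b, and c were meant to be.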

In practice, the jar of a distributed transaction middleware open-sourced by Alibaba is protected with obfuscation, so its decompiled code is hard to read.



Origin juejin.im/post/5ceb4dd6e51d454fd8057b07