A step-by-step guide to understanding the Class file

 

Language independence of JVM

Platform independence is built on top of the operating system: virtual machine vendors provide virtual machines for many different platforms, and all of them can load and execute the same bytecode, which is what makes "write once, run anywhere" possible.

Bytecode (ByteCode), the program storage format used uniformly by virtual machines on all platforms, is the cornerstone of platform independence and also the foundation of language independence. The Java virtual machine is not bound to any language, including Java; it is associated only with one specific binary file format, the Class file. The Class file contains the Java virtual machine instruction set, the symbol table, and a number of other auxiliary pieces of information.

image.png

The evolution process and location of the Class file

image.png

Java technology has always maintained very good backward compatibility, and the stability of the Class file structure deserves much of the credit. Java has reached version 14, yet most of the content of the Class file structure was already defined in JDK 1.2. Although JDK 1.2 is quite old and Java has gone through more than ten major versions since then, each release has basically only added new content and extensions on top of the original structure, leaving what was already defined almost unmodified.

Any Class file corresponds to the definition of exactly one class or interface. On the other hand, a class or interface does not have to exist as a file on disk (for example, it can be generated dynamically or handed directly to a class loader). A Class file is a binary stream whose basic unit is the 8-bit byte.

The Class file structure rarely comes up in interviews, but as senior Java developers we should still understand it.

Class file format

We now know why the Class file exists and where it sits in the overall Java execution process. So what does the Class file structure, which we rarely pay attention to, actually look like?

Below, we start from a piece of code and look at the "true face of Mount Lu" of the corresponding Class file.

image.png

First, write a simple Java program and compile it, then locate the compiled Class file.

image.png

image.png

The above shows the Class file opened in a hexadecimal viewer. The file is stored in binary with the 8-bit byte as the basic unit, and each byte is displayed as two hexadecimal digits.

The data items are arranged in the Class file in strict order and compactly, without any separators, so almost everything stored in the Class file is data the program needs in order to run, with no gaps.

The Class file format stores data in a pseudo-structure similar to a C struct. This pseudo-structure has only two data types: unsigned numbers and tables.

Unsigned numbers are the basic data type. u1, u2, u4, and u8 denote unsigned numbers of 1 byte, 2 bytes, 4 bytes, and 8 bytes respectively. (A byte is made up of two hexadecimal digits. In cafe babe, for example, c is one hexadecimal digit and a is another, and together CA forms one byte.) Unsigned numbers can be used to describe numbers, index references, quantity values, or string values encoded in UTF-8.

A table is a composite data type made up of multiple unsigned numbers or other tables as data items. The correspondence is shown in the figure below.
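To make the u1, u2, and u4 types concrete, here is a minimal Java sketch that reads them from a byte array. The class and method names (UnsignedReader, u1, u2, u4) are made up for this illustration; the only facts it relies on are that Class file data is big-endian and that Java's byte type is signed, so each byte has to be masked with 0xFF.

```java
final class UnsignedReader {
    // u1: one byte, masked so that 0xCA reads as 202 instead of -54.
    static int u1(byte[] data, int offset) {
        return data[offset] & 0xFF;
    }

    // u2: two bytes, most significant byte first (big-endian).
    static int u2(byte[] data, int offset) {
        return (u1(data, offset) << 8) | u1(data, offset + 1);
    }

    // u4: four bytes, returned as a long so the value stays non-negative.
    static long u4(byte[] data, int offset) {
        return ((long) u2(data, offset) << 16) | u2(data, offset + 2);
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        System.out.println(Integer.toHexString(u1(sample, 0))); // ca
        System.out.println(Integer.toHexString(u2(sample, 0))); // cafe
        System.out.println(Long.toHexString(u4(sample, 0)));    // cafebabe
    }
}
```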

image.png

By convention, the names of all tables end with "_info", so the data items listed between two "_info" names can be regarded as the data of one table. Tables are used to describe hierarchical, composite data. The entire Class file is essentially one table made up of other tables.

Class file format detailed analysis

The Class file structure is unlike a description language such as XML. Because it has no separators, both the order and the number of its data items are strictly fixed: which byte means what, how long each item is, and in what order the items appear may not be changed. The items appear in the following order:

Magic number

image.png

The first 4 bytes (u4) of every Class file are called the magic number (Magic Number). Its only purpose is to determine whether the file is a Class file that the virtual machine can accept. Magic numbers are used for identification instead of file extensions (such as .java, .class, or .jar) mainly for security reasons, because an extension can be changed at will. (In other words, cafe babe is the mark that identifies a file as a Class file.)
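The magic number check can be sketched in a few lines of Java. This is an illustration only, not how any particular virtual machine implements it; the class name MagicCheck and the method hasClassMagic are hypothetical.

```java
final class MagicCheck {
    static final int MAGIC = 0xCAFEBABE;

    // Returns true when the first four bytes are CA FE BA BE.
    static boolean hasClassMagic(byte[] data) {
        if (data.length < 4) {
            return false;
        }
        int first4 = ((data[0] & 0xFF) << 24) | ((data[1] & 0xFF) << 16)
                   | ((data[2] & 0xFF) << 8) | (data[3] & 0xFF);
        return first4 == MAGIC;
    }

    public static void main(String[] args) {
        byte[] classLike = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34};
        byte[] zipLike = {'P', 'K', 3, 4}; // a jar is a zip archive and starts with "PK"
        System.out.println(hasClassMagic(classLike)); // true
        System.out.println(hasClassMagic(zipLike));   // false
    }
}
```

Renaming a file to .class does not change its first four bytes, which is why the check looks at the content rather than the name.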

Version

The next four bytes hold the version number: the first u2 (bytes 5 and 6) is the minor version number (MinorVersion), and the second u2 (bytes 7 and 8) is the major version number (MajorVersion). Java version numbers start at 45; since JDK 1.1, each major JDK release has increased the major version number by 1. A newer JDK is backward compatible with Class files of earlier versions but cannot run Class files of later versions: even if the file format has not changed, the virtual machine must refuse to execute a Class file whose version number exceeds its own. In the file shown here, the major version is 34 in hexadecimal, which is 52 in decimal and represents JDK 1.8 (JDK 1.1 is 45, JDK 1.8 is 52).
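The version bytes and the "45 is JDK 1.1, 52 is JDK 1.8" rule can be expressed as a small Java sketch. The names ClassVersion, jdkName, and canLoad are hypothetical, and canLoad is a deliberate simplification of the rule described above: a real virtual machine also enforces a lower bound and treats the minor version specially in some releases.

```java
final class ClassVersion {
    // Bytes 5 and 6 of the file (offsets 4 and 5): minor version.
    static int minor(byte[] data) {
        return ((data[4] & 0xFF) << 8) | (data[5] & 0xFF);
    }

    // Bytes 7 and 8 of the file (offsets 6 and 7): major version.
    static int major(byte[] data) {
        return ((data[6] & 0xFF) << 8) | (data[7] & 0xFF);
    }

    // Maps a major version to a JDK release: 45 is 1.1, 46 is 1.2, 52 is 8, 58 is 14.
    static String jdkName(int major) {
        if (major < 45) {
            throw new IllegalArgumentException("major version below 45: " + major);
        }
        return major <= 48 ? "JDK 1." + (major - 44) : "JDK " + (major - 44);
    }

    // Simplified rule: a VM rejects files whose major version exceeds its own.
    static boolean canLoad(int vmMajor, int fileMajor) {
        return fileMajor <= vmMajor;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34};
        System.out.println(major(header) + " -> " + jdkName(major(header))); // 52 -> JDK 8
        System.out.println(canLoad(52, 58)); // false: a JDK 8 VM refuses a JDK 14 file
    }
}
```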

Constant pool

The number of constants in the constant pool is not fixed, so a u2 item is placed at the entrance of the constant pool to hold the constant pool count (constant_pool_count). Contrary to usual Java practice, this count starts from 1 rather than 0: a count of 1 means there are no constants, and a count of 2 means there is one. Index 0 is left free so that data which points into the constant pool can, in certain situations, express "does not reference any constant pool item".

As the figure shows, the count value here is 16, so the actual number of constants is 16 - 1 = 15.

image.png

Decompiling with javap -v confirms this:

There are 15 entries in the constant pool.

Each constant in the constant pool is a table. As of JDK 13, there are 17 different types of constant.
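Because there are no separators, the only way to find the end of the constant pool is to walk it entry by entry, using each entry's leading u1 tag to know how many bytes follow. The sketch below does that for the 17 tag values; the class and method names are hypothetical, and the sample bytes in main are hand-built for illustration rather than taken from the file in the figures.

```java
final class ConstantPoolWalker {
    static int u2(byte[] d, int off) {
        return ((d[off] & 0xFF) << 8) | (d[off + 1] & 0xFF);
    }

    // Number of bytes after the tag for every fixed-size constant type.
    static int payloadSize(int tag) {
        switch (tag) {
            case 7: case 8: case 16: case 19: case 20:
                return 2; // Class, String, MethodType, Module, Package
            case 15:
                return 3; // MethodHandle
            case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                return 4; // Integer, Float, Fieldref, Methodref, InterfaceMethodref,
                          // NameAndType, Dynamic, InvokeDynamic
            case 5: case 6:
                return 8; // Long, Double
            default:
                throw new IllegalArgumentException("unknown constant pool tag " + tag);
        }
    }

    // Returns the offset of the first byte after the constant pool.
    static int endOfConstantPool(byte[] d) {
        int count = u2(d, 8); // constant_pool_count follows magic and version
        int off = 10;
        for (int index = 1; index < count; index++) { // valid indexes are 1..count-1
            int tag = d[off] & 0xFF;
            if (tag == 1) {
                off += 3 + u2(d, off + 1); // Utf8: tag, u2 length, then the bytes
            } else {
                off += 1 + payloadSize(tag);
                if (tag == 5 || tag == 6) {
                    index++; // Long and Double occupy two index slots
                }
            }
        }
        return off;
    }

    public static void main(String[] args) {
        byte[] sample = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34,
            0, 3,           // constant_pool_count = 3, so two entries
            1, 0, 1, 'A',   // #1 Utf8 "A"
            7, 0, 1         // #2 Class, name at #1
        };
        System.out.println(endOfConstantPool(sample)); // 17
    }
}
```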

There are two main types of constants stored in the constant pool: Literal and Symbolic References.

Literals are close to the concept of constants in the Java language, such as text strings and constant values declared as final.

We use the Jclasslib tool to decompile this Class file and check the data in the constant pool.

image.png

More on symbolic references

We know that when a program is running, an object on the heap calls a method through a direct reference: the type pointer stored in the object header is used to find the address of the specific method in the method area.

During class loading, however, the Class file still has to be loaded into the runtime data area, and at that stage symbolic references are used instead. With the concept of a symbolic reference in mind, which details of a symbolic reference help us find the address?

Symbolic references include the fully qualified names (Fully Qualified Name) of classes and interfaces, the names and descriptors (Descriptor) of fields, and the names and descriptors of methods.
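To see what these names and descriptors look like in practice, the standard library can produce them. The sketch below uses java.lang.invoke.MethodType to print a method descriptor and shows the slash-separated form that fully qualified names take inside a Class file; the class name DescriptorDemo and its helper methods are made up for this illustration.

```java
import java.lang.invoke.MethodType;

final class DescriptorDemo {
    // Fully qualified name as stored in a Class file: dots replaced by slashes.
    static String internalName(Class<?> type) {
        return type.getName().replace('.', '/');
    }

    // Method descriptor: parameter types in parentheses, then the return type.
    static String describe(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(internalName(String.class));                  // java/lang/String
        System.out.println(describe(void.class));                        // ()V
        System.out.println(describe(int.class, String.class, int[].class)); // (Ljava/lang/String;[I)I
    }
}
```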

Access flags (which modifiers the class is declared with)

Access flags identify access information at the class or interface level, including: whether this Class is a class or an interface; whether it is defined as public; whether it is defined as abstract; and, if it is a class, whether it is declared final. In short, they record the modifiers of the class.

Class index, parent class index, and interface index collection

These three items determine the inheritance relationship of this class.

The class index determines the fully qualified name of this class, and the parent class index determines the fully qualified name of its parent class. Because the Java language does not allow multiple inheritance, there is only one parent class index. Every class except java.lang.Object has a parent class, so the parent class index of every Java class other than java.lang.Object is non-zero. The interface index collection describes which interfaces this class implements. The implemented interfaces are listed in the interface index collection from left to right, in the order they appear after the implements keyword (or after the extends keyword if the class itself is an interface).
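The class-level access flags are a u2 bit mask, so decoding them is a series of bit tests. The sketch below lists the flag values defined for classes; the class name ClassAccessFlags and the decode method are hypothetical, and the two sample values are typical examples rather than values read from the file in the figures.

```java
final class ClassAccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000, 0x8000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE", "ACC_ABSTRACT",
        "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM", "ACC_MODULE"
    };

    // Lists the names of all flags whose bit is set in the given u2 value.
    static String decode(int flags) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < MASKS.length; i++) {
            if ((flags & MASKS[i]) != 0) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(NAMES[i]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // ACC_PUBLIC ACC_SUPER
        System.out.println(decode(0x0601)); // ACC_PUBLIC ACC_INTERFACE ACC_ABSTRACT
    }
}
```

A plain public class usually shows up as 0x0021, and a public interface as 0x0601.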
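The same inheritance information can be observed at run time through reflection, which gives a quick way to check the rules above without parsing bytes. The class InheritanceInfo and its describe method are made up for this sketch.

```java
final class InheritanceInfo {
    // Prints the class name, its parent class, and its interfaces in declaration order.
    static String describe(Class<?> type) {
        StringBuilder out = new StringBuilder(type.getName());
        Class<?> parent = type.getSuperclass();
        out.append(" extends ").append(parent == null ? "(none)" : parent.getName());
        for (Class<?> iface : type.getInterfaces()) {
            out.append(" | ").append(iface.getName());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // java.lang.Object is the only class without a parent class.
        System.out.println(describe(Object.class));
        // Interfaces come back in the order of the implements clause.
        System.out.println(describe(java.util.ArrayList.class));
    }
}
```

One difference to keep in mind: reflection also reports no superclass for an interface, whereas in the Class file an interface's parent class index points to java.lang.Object.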

Field table collection

The field table collection describes the variables declared in an interface or class. Fields include class-level variables (static variables, declared with the static keyword) and instance-level variables (member variables, for which memory is allocated only after instantiation).

The modifiers a field can carry include its access scope (public, private, protected), whether it is an instance variable or a class variable (static), whether it is immutable (final), its concurrency visibility (volatile), and whether it can be serialized (transient). Each of these is a Boolean flag in the field table: 1 if the modifier is present, 0 if it is not.

The name of a field and its data type cannot be fixed in advance, so they can only be described by referring to constants in the constant pool.

The field table collection does not list fields inherited from a superclass or parent interface, but it may list fields that do not exist in the original Java code. For example, to let an inner class keep access to its outer class, the compiler automatically adds a field that points to the outer class instance.
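The flag bits for these field modifiers have the same values as the constants in java.lang.reflect.Modifier, so the standard library can decode a field's access flags directly. The class name FieldFlags is made up for this sketch, and the sample values are illustrative.

```java
import java.lang.reflect.Modifier;

final class FieldFlags {
    // Turns a field's u2 access flags into the modifier keywords they stand for.
    static String decode(int accessFlags) {
        return Modifier.toString(accessFlags);
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0019)); // public static final
        System.out.println(decode(0x0042)); // private volatile
        System.out.println(decode(0x0082)); // private transient
        System.out.println(Modifier.isStatic(0x0019)); // true: the 0x0008 bit is set
    }
}
```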
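The compiler-added field can be observed with reflection. In the sketch below, Inner declares no fields of its own, yet it reports one synthetic field; with javac this field is conventionally named this$0, although the exact name is a compiler detail. The inner class deliberately uses a field of the outer instance, since newer javac versions may omit the reference when it is never used. All class names here are hypothetical.

```java
import java.lang.reflect.Field;

final class SyntheticFieldDemo {
    private int counter = 0;

    // A non-static inner class: each instance is tied to an outer instance.
    class Inner {
        int next() {
            return ++counter; // touches the outer instance, so the reference is needed
        }
    }

    // Names of the fields in Inner that were added by the compiler.
    static String syntheticFieldNames() {
        StringBuilder out = new StringBuilder();
        for (Field field : Inner.class.getDeclaredFields()) {
            if (field.isSynthetic()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(field.getName());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(syntheticFieldNames());
    }
}
```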

Method table collection

Methods are described in the Class file storage format in almost exactly the same way as fields. The method table has the same structure as the field table, containing, in order, the access flags, name index, descriptor index, and attribute table collection.

The method table describes the definition of a method, while the Java code inside the method is compiled by the compiler into bytecode instructions and stored in an attribute named "Code" in the method's attribute table collection. As with the field table collection, if a parent class method is not overridden (Override) in the child class, no information about that method appears in the child's method table collection. Likewise, there may be methods added automatically by the compiler; the most typical are the class constructor "<clinit>" and the instance constructor "<init>".

Attribute table collection

Class files, field tables, and method tables can each carry their own attribute table collection, which holds information specific to certain scenarios. For example, the code of a method is stored in the Code attribute table.

Bytecode instruction

Bytecode instructions belong to the content of the method table.

image.png

image.png

A Java virtual machine instruction consists of a one-byte number that represents a specific operation (the opcode, Opcode), followed by zero or more parameters required by that operation (the operands, Operands).

Since the opcode of the Java virtual machine is limited to one byte (i.e. 0 to 255), the total number of opcodes in the instruction set cannot exceed 256.
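The "one-byte opcode plus zero or more operands" layout can be illustrated with a tiny disassembler that knows only a handful of opcodes. It is a sketch for illustration, not a complete decoder, and the class name MiniDisassembler is hypothetical. The sample bytes in main are the body javac typically emits for a default constructor.

```java
final class MiniDisassembler {
    // Decodes a byte sequence that uses only the few opcodes listed here.
    static String disassemble(byte[] code) {
        StringBuilder out = new StringBuilder();
        int pc = 0;
        while (pc < code.length) {
            if (out.length() > 0) {
                out.append("; ");
            }
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0x04: out.append("iconst_1"); pc += 1; break; // no operands
                case 0x10: out.append("bipush ").append(code[pc + 1]); pc += 2; break; // one signed byte
                case 0x1b: out.append("iload_1"); pc += 1; break;
                case 0x2a: out.append("aload_0"); pc += 1; break;
                case 0x60: out.append("iadd"); pc += 1; break;
                case 0xac: out.append("ireturn"); pc += 1; break;
                case 0xb1: out.append("return"); pc += 1; break;
                case 0xb7: { // invokespecial: a u2 constant pool index follows
                    int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                    out.append("invokespecial #").append(index);
                    pc += 3;
                    break;
                }
                default:
                    throw new IllegalArgumentException("opcode not covered by this sketch: " + opcode);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] constructorBody = {0x2a, (byte) 0xb7, 0x00, 0x01, (byte) 0xb1};
        System.out.println(disassemble(constructorBody)); // aload_0; invokespecial #1; return
    }
}
```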

Most instructions contain the data type information corresponding to their operations. E.g:

The iload instruction is used to load int type data from the local variable table to the operand stack, while the fload instruction loads float type data (not listed here).

Most instructions do not support the integer types byte, char, and short, and there are no instructions at all dedicated to the boolean type. Most operations on boolean, byte, short, and char data actually use the corresponding int instructions.

Reading bytecode is a basic skill for understanding the Java virtual machine, and it is worth learning the common instructions as the need arises.

An explanation of the bytecode mnemonics is available at: https://cloud.tencent.com/developer/article/1333540

 

Origin blog.csdn.net/weixin_47184173/article/details/109733873