Parsing the Java class file format

Foreword

About five years ago, I wanted to study bytecode manipulation libraries such as Javassist and cglib in order to enhance classes at runtime. When it came to actually working with the bytecode, I got stuck and had to give up.

To understand JVM bytecode, you need to know how a class file is laid out and be comfortable with assembly-style instructions and the operand stack. I had no choice but to go back and study compiler principles and assembly first, and then read the JVM specification again. It is much easier to follow now.

Class file specification

Code that is compiled for execution by the Java Virtual Machine is represented in a platform-neutral (hardware and operating system independent) binary format. It is usually (but not always) stored in a file, so this format is called the class file format. The class file format precisely defines the representation of a class or interface, including details such as byte ordering that a platform-specific object file format would leave to convention.

Related Documentation

Chapter 4, "The class File Format", of The Java Virtual Machine Specification

ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

Next, let's go through what each field means.

What do u4 and u2 mean?

The u stands for unsigned, and the number after it is how many bytes the value occupies. Multi-byte values are stored in big-endian order (high byte first).

u4 occupies 4 bytes

u2 occupies 2 bytes
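
Because Java has no unsigned integer types, each byte has to be masked with 0xFF before it is combined. The sketch below shows one way to decode u1, u2 and u4 values from a byte array; the class name UnsignedReader and its method names are made up for illustration and do not come from the original parser.

```java
final class UnsignedReader {
    private UnsignedReader() {}

    /** Decodes an unsigned 1-byte value (u1) at the given offset. */
    static int u1(byte[] data, int offset) {
        return data[offset] & 0xFF;
    }

    /** Decodes an unsigned 2-byte big-endian value (u2) at the given offset. */
    static int u2(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    /** Decodes an unsigned 4-byte big-endian value (u4); a long is needed to hold the full range. */
    static long u4(byte[] data, int offset) {
        return ((long) (data[offset] & 0xFF) << 24)
                | ((data[offset + 1] & 0xFF) << 16)
                | ((data[offset + 2] & 0xFF) << 8)
                | (data[offset + 3] & 0xFF);
    }
}
```

For example, the four bytes ca fe ba be decode with u4 to 0xCAFEBABE, and the two bytes 00 34 decode with u2 to 52.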

  1. magic: 4 bytes, always ca fe ba be (0xCAFEBABE); it identifies the file as a class file
  2. minor_version: minor version number, 2 bytes
  3. major_version: major version number, 2 bytes
  4. constant_pool_count: 2 bytes; the number of constant pool entries plus one
  5. constant_pool[constant_pool_count-1]: the constant pool array; valid indexes run from 1 to constant_pool_count-1
  6. access_flags: access flags, 2 bytes
  7. this_class: constant pool index of the entry that names this class
  8. super_class: constant pool index of the entry that names the superclass (0 if there is none, as for java.lang.Object)
  9. interfaces_count: number of direct superinterfaces
  10. interfaces[interfaces_count]: array of constant pool indexes, one per interface
  11. fields_count: number of fields
  12. fields[fields_count]: array of fields
  13. methods_count: number of methods
  14. methods[methods_count]: array of methods
  15. attributes_count: number of attributes
  16. attributes[attributes_count]: array of attributes
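
Two of these fields are easier to understand once decoded. The sketch below turns a major_version into a Java release name and an access_flags value into the class-level flag names from the JVM specification. The class HeaderFields and its methods are hypothetical helpers written for this example, not part of the original parser.

```java
import java.util.ArrayList;
import java.util.List;

final class HeaderFields {
    // Class-level access flag bits and names as listed in the JVM specification.
    private static final int[] FLAG_BITS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000, 0x8000
    };
    private static final String[] FLAG_NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE", "ACC_ABSTRACT",
        "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM", "ACC_MODULE"
    };

    private HeaderFields() {}

    /** Lists the names of all class-level flags set in the given access_flags value. */
    static List<String> classAccessFlags(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < FLAG_BITS.length; i++) {
            if ((accessFlags & FLAG_BITS[i]) != 0) {
                names.add(FLAG_NAMES[i]);
            }
        }
        return names;
    }

    /** Maps major_version to a release name: 45..48 are 1.1..1.4, and 49 onward are 5, 6, 7, ... */
    static String javaRelease(int majorVersion) {
        if (majorVersion < 45) {
            throw new IllegalArgumentException("Unsupported major_version " + majorVersion);
        }
        int n = majorVersion - 44;
        return majorVersion >= 49 ? String.valueOf(n) : "1." + n;
    }
}
```

A typical public class compiled for Java 8 has major_version 52 and access_flags 0x0021, which decode to release 8 and the flags ACC_PUBLIC and ACC_SUPER.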

How to parse a class file yourself

I believe most people can follow the layout above the first time they see it, but when they have to parse out each field themselves, they do not know where to start. The first few steps look like this:

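  1. Open the class file
// requires: import java.io.DataInputStream; import java.io.FileInputStream;
DataInputStream in = new DataInputStream(new FileInputStream("d:/my.class"));
  2. Read magic (u4, 4 bytes)
byte[] bytes = new byte[4];
in.readFully(bytes); // readFully keeps reading until the array is full; a plain read() may return fewer bytes
  3. Read minor_version (u2, 2 bytes)
byte[] minorByte = new byte[2];
in.readFully(minorByte);
  4. Read major_version (u2, 2 bytes)
byte[] majorVersion = new byte[2];
in.readFully(majorVersion);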
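
The same steps can also be written with the typed read methods of java.io.DataInputStream, which already decode big-endian values. The following is a minimal sketch under that assumption; the class ClassHeader and its methods readFrom and parse are hypothetical names, not the author's code.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ClassHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Reads magic, minor_version, major_version and constant_pool_count from the start of a stream. */
    static ClassHeader readFrom(InputStream source) throws IOException {
        DataInputStream in = new DataInputStream(source);
        int magic = in.readInt();                  // u4; only compared, so the sign does not matter
        if (magic != 0xCAFEBABE) {
            throw new IOException("Not a class file, magic = 0x" + Integer.toHexString(magic));
        }
        int minor = in.readUnsignedShort();        // u2
        int major = in.readUnsignedShort();        // u2
        int cpCount = in.readUnsignedShort();      // u2
        return new ClassHeader(minor, major, cpCount);
    }

    /** Convenience for in-memory data; wraps IOException so callers need no try/catch. */
    static ClassHeader parse(byte[] data) {
        try {
            return readFrom(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

To read a real file, pass a FileInputStream (for example one opened on d:/my.class) to readFrom. Checking the magic number first lets the parser reject files that are not class files before it reads anything else.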

Does the analysis above make sense? The format is actually very regular, as long as you read the specification carefully (and more than once).

The final code that parses a class file looks like this:

        ClassFile classFile = new ClassFile();

        // PcBufferInputStream is not a JDK class; it and the read* helpers are not shown here
        PcBufferInputStream in = new PcBufferInputStream(new FileInputStream(fileName));
        // u4 magic
        classFile.setMagic(readMagic(in));
        // u2 minor_version, u2 major_version
        classFile.setMinorVersion(readMinorVersion(in));
        classFile.setMajorVersion(readMajorVersion(in));
        // u2 constant_pool_count
        classFile.setConstantPoolCount(readConstantPoolCount(in));
        // cp_info constant_pool[constant_pool_count-1]
        classFile.setCpInfo(readCpInfo(in));
        // u2 access_flags, u2 this_class, u2 super_class
        classFile.setAccessFlags(readAccessFlags(in));
        classFile.setThisClass(readThisClass(in));
        classFile.setSuperClass(readSuperClass(in));
        // u2 interfaces_count
        classFile.setInterfacesCount(readInterfacesCount(in));
        // u2 interfaces[interfaces_count]
        classFile.setInterfaces(readInterfaces(in));
        // u2 fields_count
        classFile.setFieldsCount(readFieldsCount(in));
        // field_info fields[fields_count]
        classFile.setFields(readFields(in));
        // u2 methods_count
        classFile.setMethodsCount(readMethodsCount(in));
        // method_info methods[methods_count]
        classFile.setMethods(readMethods(in));
        // u2 attributes_count
        classFile.setAttributeCount(readAttributeCount(in));
        // attribute_info attributes[attributes_count]
        classFile.setAttributes(readAttributes(in));
        classFile.setPcRecord(recordMap);
        return classFile;
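
The constant pool is the only part of this sequence whose entries vary in size, so it is the step most likely to go wrong. The body of readCpInfo is not shown above, so the following is only a sketch of how such a step might walk the pool, using the tag values and entry sizes from the JVM specification. It keeps the text of CONSTANT_Utf8 entries and skips everything else. The class ConstantPoolSketch and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ConstantPoolSketch {
    private ConstantPoolSketch() {}

    /** Returns an array indexed like the constant pool: Utf8 slots hold their text, all others are null. */
    static String[] readUtf8Entries(DataInputStream in, int constantPoolCount) throws IOException {
        String[] utf8 = new String[constantPoolCount];
        for (int i = 1; i < constantPoolCount; i++) {   // index 0 is never used
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:                                  // Utf8: u2 length + modified UTF-8 bytes
                    utf8[i] = in.readUTF();
                    break;
                case 3: case 4:                          // Integer, Float
                    skip(in, 4);
                    break;
                case 5: case 6:                          // Long, Double occupy two pool slots
                    skip(in, 8);
                    i++;
                    break;
                case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                    skip(in, 2);
                    break;
                case 15:                                 // MethodHandle: u1 kind + u2 index
                    skip(in, 3);
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18: // refs, NameAndType, (Invoke)Dynamic
                    skip(in, 4);
                    break;
                default:
                    throw new IOException("Unknown constant pool tag " + tag + " at index " + i);
            }
        }
        return utf8;
    }

    // readFully guarantees that exactly n bytes are consumed.
    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }

    /** Convenience for in-memory data; wraps IOException so callers need no try/catch. */
    static String[] parse(byte[] poolBytes, int constantPoolCount) {
        try {
            return readUtf8Entries(new DataInputStream(new ByteArrayInputStream(poolBytes)), constantPoolCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note the extra i++ for Long and Double entries: each one takes up two constant pool indexes, which is why constant_pool_count can be larger than the number of entries actually stored in the file.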


Origin blog.csdn.net/Trouvailless/article/details/124259190