Exploring the _init function, the constructor attribute, and the .init / .init_array sections in C on Android

C programmers know there are two ways to have code run before any other function when a shared object (.so) or executable is loaded. One is to define a void _init(void) function; the other is to declare a function with the constructor attribute. Do the two behave differently at run time, and in what order do they run? Readers familiar with the ELF file format will also ask where each one ends up in the file. This article answers these questions.

This article assumes basic familiarity with the ELF file format. It is not explained here; readers who are new to it can look it up first.

Here is an example. Add the following lines to the C/C++ code in your Android project:

........

#ifdef __cplusplus
extern "C" {
#endif

/* Defined with C linkage so the symbol is exactly "_init". */
void _init(void){mlog_info("_init enter");}

#ifdef __cplusplus
}
#endif

/* The constructor attribute asks the toolchain to run this function at load time. */
void __attribute__((constructor)) myConstructor(void){mlog_info("myConstructor enter\n");}

........
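
The mlog_info helper is not shown in the snippet. Judging from the BRIAN tag in the log output and the __android_log_print call in the disassembly further down, it presumably wraps the Android logging API. A minimal sketch under that assumption follows; the function body, the MLOG_TAG name, and the stderr fallback for non-Android builds are hypothetical, not the author's code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#ifdef __ANDROID__
#include <android/log.h>   /* link with -llog */
#endif

#define MLOG_TAG "BRIAN"   /* assumed: the tag seen in the log output */

/* Hypothetical stand-in for the author's mlog_info helper.
 * Returns whatever the underlying logging call returns. */
static int mlog_info(const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(fmt != NULL);
    va_start(ap, fmt);
#ifdef __ANDROID__
    /* Priority ANDROID_LOG_INFO has the value 4, matching "MOV R1, #4". */
    n = __android_log_vprint(ANDROID_LOG_INFO, MLOG_TAG, fmt, ap);
#else
    /* Host fallback so the snippet can be tried off-device. */
    fprintf(stderr, "I/%s: ", MLOG_TAG);
    n = vfprintf(stderr, fmt, ap);
    fputc('\n', stderr);
#endif
    va_end(ap);
    return n;
}
```

With something like this in place, the calls mlog_info("_init enter") and mlog_info("myConstructor enter\n") in the example compile unchanged.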

I compiled the project into libcheckcert.so and ran it on a phone. The log output was:

........

12-13 11:04:46.603: I/BRIAN(12203): _init enter
12-13 11:04:46.603: I/BRIAN(12203): myConstructor enter

........
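
The constructor half of this experiment can also be tried with a desktop GCC or Clang toolchain. The sketch below is a hypothetical stand-alone variant, not the Android code: it counts calls instead of logging, and it leaves out _init because desktop C runtimes such as glibc commonly provide that symbol themselves. The names constructor_calls and library_entry are illustrative.

```c
#include <assert.h>

/* Number of times the constructor has run (illustrative name). */
static int constructor_calls = 0;

/* Runs when the executable or shared object is loaded, before main(). */
__attribute__((constructor))
void my_constructor(void)
{
    constructor_calls++;
}

/* An ordinary entry point can rely on the constructor having run already. */
int library_entry(void)
{
    assert(constructor_calls == 1);
    return constructor_calls;
}
```

Calling library_entry() from main() should return 1 without my_constructor ever being called explicitly.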
In the log, the _init function runs first. Why? ELF files have two sections, .init and .init_array, that are used for initialization when the file is loaded. How do they relate to the _init function and the constructor attribute? To find out, we use readelf and IDA Pro. First, run readelf -d libcheckcert.so to view the dynamic section of the ELF file:

BriansdeMacBook-Pro:armeabi-v7a brian$ arm-linux-androideabi-readelf -d libcheckcert.so 

Dynamic section at offset 0x19b80 contains 27 entries:
  Tag        Type                         Name/Value
 0x00000003 (PLTGOT)                     0x1ad84
 0x00000002 (PLTRELSZ)                   1248 (bytes)
 0x00000017 (JMPREL)                     0x4200
 0x00000014 (PLTREL)                     REL
 0x00000011 (REL)                        0x31a8
 0x00000012 (RELSZ)                      4184 (bytes)
 0x00000013 (RELENT)                     8 (bytes)
 0x6ffffffa (RELCOUNT)                   390
 0x00000006 (SYMTAB)                     0x148
 0x0000000b (SYMENT)                     16 (bytes)
 0x00000005 (STRTAB)                     0x1028
 0x0000000a (STRSZ)                      6825 (bytes)
 0x00000004 (HASH)                       0x2ad4
 0x00000001 (NEEDED)                     Shared library: [liblog.so]
 0x00000001 (NEEDED)                     Shared library: [libc.so]
 0x00000001 (NEEDED)                     Shared library: [libm.so]
 0x00000001 (NEEDED)                     Shared library: [libstdc++.so]
 0x00000001 (NEEDED)                     Shared library: [libdl.so]
 0x0000000e (SONAME)                     Library soname: [libcheckcert.so]
 0x0000000c (INIT)                       0x4f9c
 0x0000001a (FINI_ARRAY)                 0x1a658
 0x0000001c (FINI_ARRAYSZ)               8 (bytes)
 0x00000019 (INIT_ARRAY)                 0x1a660
 0x0000001b (INIT_ARRAYSZ)               20 (bytes)
 0x0000001e (FLAGS)                      BIND_NOW
 0x6ffffffb (FLAGS_1)                    Flags: NOW
 0x00000000 (NULL)                       0x0
The INIT and INIT_ARRAY entries hold the addresses 0x4f9c and 0x1a660. Opening the file in IDA Pro shows what is at those locations:

.text:00004F9C ; =============== S U B R O U T I N E =======================================
.text:00004F9C
.text:00004F9C ; Attributes: bp-based frame
.text:00004F9C
.text:00004F9C                 EXPORT _init
.text:00004F9C _init
.text:00004F9C
.text:00004F9C var_8           = -8
.text:00004F9C var_4           = -4
.text:00004F9C
.text:00004F9C                 STMFD           SP!, {R11,LR}
.text:00004FA0                 MOV             R11, SP
.text:00004FA4                 SUB             SP, SP, #8
.text:00004FA8                 LDR             R0, =(_GLOBAL_OFFSET_TABLE_ - 0x4FB4)
.text:00004FAC                 ADD             R0, PC, R0 ; _GLOBAL_OFFSET_TABLE_
.text:00004FB0                 MOV             R1, #4
.text:00004FB4                 LDR             R2, =(aBrian_1 - 0x1AD84)
.text:00004FB8                 ADD             R2, R2, R0 ; "BRIAN"
.text:00004FBC                 LDR             R3, =(a_initEnter - 0x1AD84)
.text:00004FC0                 ADD             R0, R3, R0 ; "_init enter"
.text:00004FC4                 STR             R0, [SP,#8+var_4]
.text:00004FC8                 MOV             R0, R1
.text:00004FCC                 MOV             R1, R2
.text:00004FD0                 LDR             R2, [SP,#8+var_4]
.text:00004FD4                 BL              __android_log_print
.text:00004FD8                 STR             R0, [SP,#8+var_8]
.text:00004FDC                 MOV             SP, R11
.text:00004FE0                 LDMFD           SP!, {R11,PC}
.text:00004FE0 ; End of function _init

.init_array:0001A660 ; ===========================================================================
.init_array:0001A660
.init_array:0001A660 ; Segment type: Pure data
.init_array:0001A660                 AREA .init_array, DATA
.init_array:0001A660                 ; ORG 0x1A660
.init_array:0001A660                 DCD _Z13myConstructorv  ; myConstructor(void)
.init_array:0001A664                 DCD sub_4E90
.init_array:0001A668                 DCD sub_4EA8
.init_array:0001A66C                 DCD sub_4F04
.init_array:0001A670                 DCB    0
.init_array:0001A671                 DCB    0
.init_array:0001A672                 DCB    0
.init_array:0001A673                 DCB    0
.init_array:0001A673 ; .init_array   ends

The two IDA listings show the functions we defined. The INIT entry, which corresponds to .init, points at the code of the _init function (IDA places it in .text in this build). The .init_array section is an array of function pointers; each entry refers to a block of code that performs some initialization. INIT_ARRAYSZ is 20 bytes, which matches the four 4-byte pointers plus the trailing zero word in the listing. So why does the INIT code run before the code referenced from .init_array? The answer is in the dynamic linker, whose source is in the bionic/linker directory of AOSP. Here is a short excerpt:

void soinfo::CallConstructors() {

   ........

   // DT_INIT should be called before DT_INIT_ARRAY if both are present.
   CallFunction("DT_INIT", init_func);
   CallArray("DT_INIT_ARRAY", init_array, init_array_count, false);
}
The function referenced by DT_INIT is called first, and then the functions referenced from .init_array are called one after another, in array order.
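
The call order in the excerpt can be modelled in plain C. This is a simplified sketch, not the bionic source: the function names, the trace buffer, and the demo initializers are hypothetical, the reverse flag of CallArray is omitted, and skipping empty (NULL) slots is an assumption suggested by the zero word at the end of the .init_array listing.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*init_fn)(void);

enum { STEP_INIT = 1, STEP_CTOR_A, STEP_CTOR_B };

#define TRACE_MAX 8
static int trace[TRACE_MAX];   /* order in which initializers ran */
static size_t trace_len;

static void record(int step)
{
    assert(trace_len < TRACE_MAX);
    trace[trace_len++] = step;
}

/* Stand-ins for _init and two constructor functions. */
static void demo_init(void)   { record(STEP_INIT); }
static void demo_ctor_a(void) { record(STEP_CTOR_A); }
static void demo_ctor_b(void) { record(STEP_CTOR_B); }

/* Call a single initializer; empty slots are skipped. */
static void call_function(init_fn fn)
{
    if (fn == NULL)
        return;
    fn();
}

/* Call every entry of the array from first to last. */
static void call_array(const init_fn *array, size_t count)
{
    for (size_t i = 0; i < count; i++)
        call_function(array[i]);
}

/* Same order as the excerpt: DT_INIT first, then DT_INIT_ARRAY. */
static void call_constructors(init_fn init_func,
                              const init_fn *init_array, size_t count)
{
    call_function(init_func);
    call_array(init_array, count);
}

/* Model of an .init_array with a trailing zero entry. */
static const init_fn demo_array[] = { demo_ctor_a, demo_ctor_b, NULL };

/* Runs the model once and returns how many initializers were called. */
static size_t run_demo(void)
{
    trace_len = 0;
    call_constructors(demo_init, demo_array, 3);
    return trace_len;
}
```

After run_demo(), the trace holds STEP_INIT, STEP_CTOR_A, STEP_CTOR_B, in that order, which mirrors the "_init enter" then "myConstructor enter" lines in the log.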

This should make the correspondence clear: the _init function is reached through the INIT entry (.init), and functions declared with the constructor attribute are reached through the .init_array section.
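
To tie this back to the readelf output, here is a rough sketch of how the three relevant dynamic entries can be picked out of a dynamic table. The struct and function names are hypothetical and the entry layout is simplified (a 32-bit tag plus a 32-bit value); the tag numbers and sample values are the ones printed by readelf above, and the 4-byte pointer size assumes 32-bit ARM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag numbers as shown in the first column of the readelf output. */
enum {
    TAG_NULL         = 0x00,
    TAG_INIT         = 0x0c,
    TAG_INIT_ARRAY   = 0x19,
    TAG_INIT_ARRAYSZ = 0x1b
};

/* Simplified 32-bit dynamic entry: a tag plus a value or address. */
struct dyn_entry {
    uint32_t tag;
    uint32_t val;
};

struct init_info {
    uint32_t init_addr;         /* address of the _init function      */
    uint32_t init_array_addr;   /* address of the pointer array       */
    uint32_t init_array_count;  /* number of 4-byte slots in the array */
};

/* Walk the table until the terminating NULL tag and collect the entries. */
static struct init_info find_init_entries(const struct dyn_entry *dyn)
{
    struct init_info info = { 0, 0, 0 };

    assert(dyn != NULL);
    for (; dyn->tag != TAG_NULL; dyn++) {
        switch (dyn->tag) {
        case TAG_INIT:
            info.init_addr = dyn->val;
            break;
        case TAG_INIT_ARRAY:
            info.init_array_addr = dyn->val;
            break;
        case TAG_INIT_ARRAYSZ:
            info.init_array_count = dyn->val / 4;  /* bytes to slots */
            break;
        default:
            break;
        }
    }
    return info;
}

/* The three relevant entries copied from the readelf output. */
static const struct dyn_entry sample_dyn[] = {
    { TAG_INIT,         0x4f9c  },
    { TAG_INIT_ARRAY,   0x1a660 },
    { TAG_INIT_ARRAYSZ, 20      },
    { TAG_NULL,         0       }
};
```

For sample_dyn this yields 0x4f9c, 0x1a660, and a count of 5: the four function pointers plus the zero word seen in the .init_array listing.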

There is one thing I do not quite understand. Using readelf to view the symbol and relocation information in the ELF file, the myConstructor symbol appears in both .rel.dyn and .rel.plt, once with type R_ARM_ABS32 and once with type R_ARM_JUMP_SLOT. In IDA Pro, the body of myConstructor is in the .text section, but there are also myConstructor entries in the .plt and .got sections. So every explicit call to myConstructor jumps through the PLT and then fetches the real address of myConstructor in .text from the GOT. The address stored in .init_array, however, is its real address in .text, so calling myConstructor during initialization does not go through the PLT and GOT. I do not understand why this is; I will leave it for later.

Update: the behavior above comes from the compiler. ELF files produced by different compilers are not identical. The file described above was built with the LLVM compiler. If you build with the arm-linux-androideabi-* toolchain instead, the myConstructor symbol appears only in the .text section and does not show up in .rel.dyn or .rel.plt.
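
Independently of the compiler, one way to keep a constructor out of the exported symbols is to give it internal linkage, assuming nothing outside the source file needs to call it by name. This was not part of the original experiment; it is a hedged sketch with illustrative names. A static function has no dynamic symbol, so it should not need a PLT entry or a relocation that refers to it by name.

```c
#include <assert.h>

static int init_done = 0;

/* static: internal linkage, so the symbol is not exported from the library. */
__attribute__((constructor))
static void my_static_constructor(void)
{
    init_done = 1;
}

/* Exported query function; loading has finished by the time it can be called. */
int library_is_initialized(void)
{
    assert(init_done == 0 || init_done == 1);
    return init_done;
}
```

Running readelf on a library built this way is a simple way to check whether the constructor still shows up in .rel.dyn or .rel.plt with a given toolchain.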






Origin blog.csdn.net/beyond702/article/details/53607212