Mach-O file format analysis

I recently wrote a pure-Swift library modeled on the well-known Aspects, which is based on Runtime method swizzling. Before that I had heard that fishhook can dynamically rebind C functions, so I studied it a little. However, to understand fishhook you need to learn Mach-O, which has long been a blind spot in my knowledge, so this time I took some time to digest it. Apple's source code can be found here.

Mach-O overview

Mach-O, short for Mach object file format, is a file format for executables, object code, shared libraries, dynamically loaded code, and core dumps. It replaced the a.out format, offering better extensibility and faster access to symbol table information.

/*
 * Constants for the filetype field of the mach_header
 */
#define	MH_OBJECT	0x1		/* relocatable object file */
#define	MH_EXECUTE	0x2		/* demand paged executable file */
#define	MH_FVMLIB	0x3		/* fixed VM shared library file */
#define	MH_CORE		0x4		/* core file */
#define	MH_PRELOAD	0x5		/* preloaded executable file */
#define	MH_DYLIB	0x6		/* dynamically bound shared library */
#define	MH_DYLINKER	0x7		/* dynamic link editor */
#define	MH_BUNDLE	0x8		/* dynamically bound bundle file */
#define	MH_DYLIB_STUB	0x9		/* shared library stub for static */
					/*  linking only, no section contents */
#define	MH_DSYM		0xa		/* companion file with only debug */
					/*  sections */
#define	MH_KEXT_BUNDLE	0xb		/* x86_64 kexts */


As we can see, Mach-O has a variety of file types. The common ones are:

  1. Executable files

  2. Object files

    • .o files (object files)
    • .a static libraries, which are actually archives of N .o files
  3. DYLIB: dynamic library files

    • dylib
    • framework
  4. DYLINKER: the dynamic linker

  5. DSYM: used to analyze app crash information
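To connect the filetype constants above with these categories, here is a minimal C sketch that maps a filetype value to a label. The function name filetype_category is hypothetical, and the MH_* values are redefined locally (copied from the listing above) so the snippet compiles without Apple's <mach-o/loader.h>. Note that a .a archive is not itself a Mach-O file; its members are MH_OBJECT files.

```c
#include <stdint.h>

/* Local copies of some MH_* values from the listing above (same values as
   <mach-o/loader.h>), so this sketch also compiles off Apple platforms. */
#define MH_OBJECT   0x1
#define MH_EXECUTE  0x2
#define MH_DYLIB    0x6
#define MH_DYLINKER 0x7
#define MH_BUNDLE   0x8
#define MH_DSYM     0xa

/* Map a mach_header filetype value to the categories listed above. */
const char *filetype_category(uint32_t filetype) {
    switch (filetype) {
    case MH_EXECUTE:  return "executable";
    case MH_OBJECT:   return "object (.o; a .a archive bundles several)";
    case MH_DYLIB:    return "dylib (.dylib or framework binary)";
    case MH_DYLINKER: return "dynamic linker";
    case MH_BUNDLE:   return "bundle";
    case MH_DSYM:     return "dSYM debug companion";
    default:          return "unknown";
    }
}
```

A switch over the constants keeps the mapping readable and makes unknown values fall through to a single default.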

C File -> executable file

I highly recommend the Mach-O article listed at the end of this post; the following example is taken from it.

  1. Write a C file, test.c:

     int main(){
         return 0;
     }
  2. Compile it with clang -c test.c, which generates the object file test.o.

  3. Run file test.o to inspect it. The output shows that test.o is a Mach-O object file: test.o: Mach-O 64-bit object x86_64

  4. Link the object file with clang test.o, which turns it into the executable a.out.

  5. Run ./a.out to execute the program.

  6. Run clang -o test1 test.o to link the object file test.o into an executable named test1.

  7. Run clang -o test2 test.c to compile and link the source file into the executable test2 in one step.
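Step 3 relies on the file command recognizing the Mach-O magic number at the start of test.o. The sketch below imitates that first check in C. The function names macho_kind and macho_kind_of_file are hypothetical; the byte patterns correspond to MH_MAGIC_64, MH_MAGIC and FAT_MAGIC (and their byte-swapped forms) from Apple's headers. Because 0xcafebabe is also the Java class-file magic, real tools look further than these four bytes.

```c
#include <stdint.h>
#include <stdio.h>

/* Classify the first four bytes of a file, roughly the first check that
   `file test.o` performs. */
const char *macho_kind(const unsigned char b[4]) {
    /* Read the bytes as a little-endian integer so the result does not
       depend on the byte order of the machine running this code. */
    uint32_t v = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
                 (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    switch (v) {
    case 0xfeedfacfu: return "Mach-O 64-bit (little-endian)"; /* MH_MAGIC_64 */
    case 0xcffaedfeu: return "Mach-O 64-bit (big-endian)";    /* MH_CIGAM_64 */
    case 0xfeedfaceu: return "Mach-O 32-bit (little-endian)"; /* MH_MAGIC */
    case 0xcefaedfeu: return "Mach-O 32-bit (big-endian)";    /* MH_CIGAM */
    case 0xbebafecau: return "universal (fat) binary";        /* FAT_MAGIC, stored big-endian */
    default:          return "not Mach-O";
    }
}

/* Open `path`, read its first four bytes and classify them. */
const char *macho_kind_of_file(const char *path) {
    unsigned char b[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL) return "unreadable";
    size_t n = fread(b, 1, sizeof b, f);
    fclose(f);
    return n == sizeof b ? macho_kind(b) : "too short";
}
```

On an x86_64 test.o, macho_kind_of_file("test.o") should report a 64-bit little-endian Mach-O.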

Mach-O structure

As the figure shows, a Mach-O file consists of the following three parts:

  • Header (Mach Header): describes the CPU architecture, the file type, and the load commands of the Mach-O file.
  • Load commands: describe how the data in the file is organized; different kinds of load commands are represented by different data structures.
  • Data: the data of every segment is stored here. The segment concept is similar to the segment concept in ELF files. Each segment contains one or more sections that hold the actual code and data, for example the main code, data, the symbol table, and the dynamic symbol table.

Using MachOView to verify the file structure of this example:

Browsing the Mach-O executable briefly, it is divided into the following parts:

  • Header: mach64 Header
  • Load commands: Load Commands
  • Text segment: __TEXT
  • Data segment: __DATA
  • Dynamic loader information: Dynamic Loader Info
  • Function start addresses: Function Starts
  • Symbol table: Symbol Table
  • Dynamic symbol table: Dynamic Symbol Table
  • String table: String Table

mach_header_64

/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
	uint32_t	magic;		/* mach magic number identifier */
	cpu_type_t	cputype;	/* cpu specifier */
	cpu_subtype_t	cpusubtype;	/* machine specifier */
	uint32_t	filetype;	/* type of file */
	uint32_t	ncmds;		/* number of load commands */
	uint32_t	sizeofcmds;	/* the size of all the load commands */
	uint32_t	flags;		/* flags */
	uint32_t	reserved;	/* reserved */
};

  • magic: the magic number, which lets the system loader quickly tell whether the file is 32-bit or 64-bit (and which byte order it uses).
  • cputype: the CPU architecture, such as ARM or x86; this field lets the system check that the binary can run on the current architecture.
  • cpusubtype: the specific CPU variant, distinguishing processor versions such as armv7 vs. armv7s, or arm64 vs. arm64e.
  • filetype: the Mach-O file type (executable, object file, dynamic library, core dump, kernel extension, dSYM file, and so on).
  • ncmds: the number of load commands that follow the header.
  • sizeofcmds: the total size in bytes of all the load commands.
  • flags: flags describing features of the binary, mostly related to loading and linking (for example MH_PIE).
  • reserved: reserved field.
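The fields above can be decoded from raw bytes with a small parser. This is a sketch, not Apple's API: header64 and parse_header64 are hypothetical names, all fields are kept as uint32_t (the real cpu_type_t is a signed int), and it assumes a little-endian file, which is what arm64 and x86_64 binaries use. It also checks that the load commands, which occupy the byte range [32, 32 + sizeofcmds), fit in the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as struct mach_header_64 in <mach-o/loader.h>. */
typedef struct {
    uint32_t magic;       /* 0xfeedfacf (MH_MAGIC_64) */
    uint32_t cputype;     /* e.g. 0x01000007 x86_64, 0x0100000c arm64 */
    uint32_t cpusubtype;
    uint32_t filetype;    /* MH_OBJECT, MH_EXECUTE, ... */
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
} header64;

/* Read a little-endian 32-bit value without caring about host byte order. */
static uint32_t rd32le(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode a 64-bit header; returns 0 on success, -1 if the buffer is too
   short, the magic is not MH_MAGIC_64, or the load commands overrun it. */
int parse_header64(const unsigned char *buf, size_t len, header64 *h) {
    if (len < 32) return -1;                /* sizeof(struct mach_header_64) */
    h->magic = rd32le(buf + 0);
    if (h->magic != 0xfeedfacfu) return -1;
    h->cputype    = rd32le(buf + 4);
    h->cpusubtype = rd32le(buf + 8);
    h->filetype   = rd32le(buf + 12);
    h->ncmds      = rd32le(buf + 16);
    h->sizeofcmds = rd32le(buf + 20);
    h->flags      = rd32le(buf + 24);
    h->reserved   = rd32le(buf + 28);
    if ((uint64_t)32 + h->sizeofcmds > len) return -1;
    return 0;
}
```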

Load commands

Load commands directly follow the mach_header. The total size of all the commands is given by the sizeofcmds field of the mach_header. Every load command must begin with two fields, cmd and cmdsize. The cmd field holds a constant for the command type, and each command type has its own structure. The cmdsize field is the size in bytes of that particular load command structure plus anything that follows it and is part of the command (for example section structures or strings). To advance to the next load command, add cmdsize to the offset or pointer of the current one. On 32-bit architectures cmdsize must be a multiple of 4 bytes; on 64-bit architectures it must be a multiple of 8 bytes (the maximum alignment of any load command). Padding bytes must be zero. All tables in the object file must follow these rules so that the file can be memory-mapped; otherwise pointers into these tables will not work, or will not work well, on some machines. With all padding zeroed, identical objects compare equal byte by byte.
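The cmd/cmdsize rules above are enough to step through every load command without knowing each command's structure. The following sketch walks the command area of a little-endian 64-bit file. walk_load_commands is a hypothetical helper; it rejects any cmdsize that is not a non-zero multiple of 8 (the 64-bit rule; a 32-bit walker would check for multiples of 4) or that runs past sizeofcmds.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32le(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* `cmds` points at the first load command (32 bytes past the header).
   If `out_cmds` is non-NULL it must have room for ncmds entries and
   receives each command's cmd value. Returns the number of commands
   visited, or -1 if the command area is malformed. */
int walk_load_commands(const unsigned char *cmds, uint32_t ncmds,
                       uint32_t sizeofcmds, uint32_t *out_cmds) {
    uint32_t off = 0;                        /* invariant: off <= sizeofcmds */
    for (uint32_t i = 0; i < ncmds; i++) {
        if (sizeofcmds - off < 8) return -1; /* no room for cmd + cmdsize */
        uint32_t cmd     = rd32le(cmds + off);
        uint32_t cmdsize = rd32le(cmds + off + 4);
        if (cmdsize < 8 || cmdsize % 8 != 0 || cmdsize > sizeofcmds - off)
            return -1;
        if (out_cmds != NULL) out_cmds[i] = cmd;
        off += cmdsize;                      /* advance to the next command */
    }
    return (int)ncmds;
}
```

In a real file, cmds would point 32 bytes past the start of the mapped image, with ncmds and sizeofcmds taken from the header.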

/*
 * The segment load command indicates that a part of this file is to be
 * mapped into the task's address space.  The size of this segment in memory,
 * vmsize, maybe equal to or larger than the amount to map from this file,
 * filesize.  The file is mapped starting at fileoff to the beginning of
 * the segment in memory, vmaddr.  The rest of the memory of the segment,
 * if any, is allocated zero fill on demand.  The segment's maximum virtual
 * memory protection and initial virtual memory protection are specified
 * by the maxprot and initprot fields.  If the segment has sections then the
 * section structures directly follow the segment command and their size is
 * reflected in cmdsize.
 */
struct segment_command { /* for 32-bit architectures */
	uint32_t	cmd;		/* LC_SEGMENT */
	uint32_t	cmdsize;	/* includes sizeof section structs */
	char		segname[16];	/* segment name */
	uint32_t	vmaddr;		/* memory address of this segment */
	uint32_t	vmsize;		/* memory size of this segment */
	uint32_t	fileoff;	/* file offset of this segment */
	uint32_t	filesize;	/* amount to map from the file */
	vm_prot_t	maxprot;	/* maximum VM protection */
	vm_prot_t	initprot;	/* initial VM protection */
	uint32_t	nsects;		/* number of sections in segment */
	uint32_t	flags;		/* flags */
};
  • cmd: the load command type. These commands are read by the kernel and the dynamic linker (dyld) when the file is loaded. Common types include:
    • LC_SEGMENT: a segment load command; the segment is mapped into the address space of the process (64-bit files use LC_SEGMENT_64 with the segment_command_64 structure).
    • LC_LOAD_DYLIB: a dynamic library that needs to be loaded, represented by the dylib_command structure.
    • LC_MAIN: records the file offset of the executable's main() function, represented by the entry_point_command structure.
    • LC_CODE_SIGNATURE: the code-signing load command, which describes the code signature of the Mach-O file. It is link-edit information and is represented by the linkedit_data_command structure.
  • cmdsize: the size of the load command, including any section structures that follow it.
  • segname[16]: the segment name, 16 bytes.
  • vmaddr: the start address of the segment in virtual memory.
  • vmsize: the size of the segment in virtual memory.
  • fileoff: the offset of the segment in the file.
  • filesize: the size of the segment in the file.
  • maxprot: the maximum virtual memory protection allowed for the segment's pages (1 = r, 2 = w, 4 = x).
  • initprot: the initial virtual memory protection of the segment's pages.
  • nsects: the number of sections in the segment.
  • flags: segment flags.
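maxprot and initprot are bit masks, and vmsize can exceed filesize. The sketch below shows both points: prot_string renders the protection bits in the familiar r/w/x form (typically r-x for __TEXT and rw- for __DATA), and zero_fill_bytes computes how much of a segment is zero-filled on demand rather than read from the file, as the loader.h comment above describes. Both function names are hypothetical; the VM_PROT_* values are copied from <mach/vm_prot.h>.

```c
#include <stdint.h>

/* Protection bits as defined in <mach/vm_prot.h>. */
#define VM_PROT_READ    0x1
#define VM_PROT_WRITE   0x2
#define VM_PROT_EXECUTE 0x4

/* Write "rwx"-style text for a maxprot/initprot value; `out` needs 4 chars. */
void prot_string(int prot, char out[4]) {
    out[0] = (prot & VM_PROT_READ)    ? 'r' : '-';
    out[1] = (prot & VM_PROT_WRITE)   ? 'w' : '-';
    out[2] = (prot & VM_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}

/* Bytes of the segment that are not backed by the file and are
   zero-filled on demand (vmsize may exceed filesize, e.g. for __bss). */
uint64_t zero_fill_bytes(uint64_t vmsize, uint64_t filesize) {
    return vmsize > filesize ? vmsize - filesize : 0;
}
```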

Section data

Segments (mainly __TEXT and __DATA) can be further divided into sections. The reason for the Segment -> Section structure is that sections within the same segment share the same memory permissions, so each section does not need to be aligned to the page size on its own, which saves memory. The segment as a whole is what is exposed to the outside: when the program is loaded, each segment is mapped into virtual memory as one unit, which keeps memory alignment efficient.
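The memory saving described above can be made concrete with page arithmetic. This sketch assumes, for simplicity, that sections need no padding between them, and compares giving each section its own pages against packing them into one segment. The helper names are hypothetical; 16 KB (0x4000) is the page size on arm64 Apple platforms and 4 KB on x86_64. For example, three sections of 100, 200 and 300 bytes need three 16 KB pages (48 KB) if each is page-aligned, but a single 16 KB page when grouped in one segment.

```c
#include <stddef.h>
#include <stdint.h>

/* Round `size` up to a multiple of `page`, which must be a power of two. */
uint64_t round_to_page(uint64_t size, uint64_t page) {
    return (size + page - 1) & ~(page - 1);
}

/* Memory used if every section were mapped on its own pages. */
uint64_t bytes_if_separate(const uint64_t *sizes, size_t n, uint64_t page) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) total += round_to_page(sizes[i], page);
    return total;
}

/* Memory used if the sections are packed into one segment (no padding). */
uint64_t bytes_if_grouped(const uint64_t *sizes, size_t n, uint64_t page) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) sum += sizes[i];
    return round_to_page(sum, page);
}
```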

struct section_64 { /* for 64-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint64_t	addr;		/* memory address of this section */
	uint64_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
	uint32_t	reserved3;	/* reserved */
};

  • sectname[16]: the section name, such as __text or __stubs; __text holds the main program code.
  • segname[16]: the segment this section belongs to, such as __TEXT.
  • addr: the start address of the section in memory.
  • size: the size of the section in bytes.
  • offset: the file offset of the section.
  • align: the section alignment, stored as a power of 2.
  • reloff: the file offset of the relocation entries.
  • nreloc: the number of relocation entries.
  • flags: the section type and attributes.
  • reserved1: reserved field 1 (used as an offset or index).
  • reserved2: reserved field 2 (used as a count or size).
  • reserved3: reserved field 3.
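For fishhook, the interesting fields are flags and reserved1. The low byte of flags (SECTION_TYPE) gives the section type, and for symbol pointer sections such as __la_symbol_ptr and __nl_symbol_ptr, reserved1 is the index of the section's first entry in the indirect symbol table. The sketch below shows that mapping; the function names are hypothetical, the constants are copied from <mach-o/loader.h>, and it assumes 8-byte pointers (64-bit). It also shows that align is an exponent, not a byte count.

```c
#include <stdint.h>

#define SECTION_TYPE               0x000000ffu  /* low byte of flags */
#define S_NON_LAZY_SYMBOL_POINTERS 0x6u
#define S_LAZY_SYMBOL_POINTERS     0x7u

/* 1 if the section holds lazy or non-lazy symbol pointers. */
int is_symbol_pointer_section(uint32_t flags) {
    uint32_t type = flags & SECTION_TYPE;
    return type == S_NON_LAZY_SYMBOL_POINTERS || type == S_LAZY_SYMBOL_POINTERS;
}

/* Entry i of a symbol pointer section corresponds to slot reserved1 + i
   of the indirect symbol table. Returns -1 if i is past the last entry. */
int64_t indirect_symbol_slot(uint64_t size, uint32_t reserved1, uint64_t i) {
    uint64_t count = size / 8;               /* 8-byte pointers on 64-bit */
    if (i >= count) return -1;
    return (int64_t)reserved1 + (int64_t)i;
}

/* align stores a power of two: align == 3 means 8-byte alignment. */
uint64_t alignment_bytes(uint32_t align) {
    return (uint64_t)1 << align;
}
```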

By convention, segment names are two underscores followed by uppercase letters (such as __TEXT), while section names are two underscores followed by lowercase letters (such as __text).
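One practical detail when comparing these names in code: segname and sectname are fixed 16-byte fields that are NUL-terminated only when the name is shorter than 16 bytes (for example, __objc_classlist fills all 16). The sketch below therefore compares with a bound of 16 instead of using strcmp; name16_equals and name16_copy are hypothetical helper names.

```c
#include <string.h>

/* Compare a fixed 16-byte name field with a C string. */
int name16_equals(const char name[16], const char *want) {
    if (strlen(want) > 16) return 0;         /* cannot fit in the field */
    return strncmp(name, want, 16) == 0;     /* never reads past 16 bytes */
}

/* Copy a 16-byte name field into a NUL-terminated buffer for printing. */
void name16_copy(char out[17], const char name[16]) {
    memcpy(out, name, 16);
    out[16] = '\0';
}
```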

The sections that each segment may contain are listed below:

  • __TEXT segment:
    __text, __cstring, __picsymbol_stub, __symbol_stub, __const, __literal4, __literal8;
  • __DATA segment:
    __data, __la_symbol_ptr, __nl_symbol_ptr, __dyld, __const, __mod_init_func, __mod_term_func, __bss, __common;
  • __IMPORT segment:
    __jump_table, __pointers;

Of these, the __text section of the __TEXT segment holds the actual code, and the __data section of the __DATA segment holds the actual initialized data.

That covers the Mach-O file format. If you are interested in how a program goes from being loaded to being executed, the articles "Mach-O file format and the process from loading to execution" and "Exploring Mach-O: the loading process" explain it in great detail.

Mach-O file structure in detail
A Mach-O file
Mach-O file format and the process from loading to execution
iOS reverse engineering basics: Mach-O files (1)
Exploring the Mach-O file format

Reposted from: https://juejin.im/post/5d060880f265da1b860885d7


Origin: blog.csdn.net/weixin_34402090/article/details/93176114