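I recently wrote an Aspect library in pure Swift, modeled on the well-known Aspects, that swaps methods through the runtime. I had also heard that fishhook can dynamically rebind C functions, so I looked into it. Understanding fishhook, however, requires understanding Mach-O first, which had long been a blind spot of mine, so this time I decided to spend the time to digest both together. The Apple source code can be viewed here.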

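Introduction to Mach-O

Mach-O, short for Mach object file format, is a file format for executables, object code, shared libraries, dynamically loaded code, and core dumps. It is a replacement for the a.out format. Mach-O offers greater extensibility and faster access to the information in the symbol table.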

/*
 * Constants for the filetype field of the mach_header
 */
#define	MH_OBJECT	0x1		/* relocatable object file */
#define	MH_EXECUTE	0x2		/* demand paged executable file */
#define	MH_FVMLIB	0x3		/* fixed VM shared library file */
#define	MH_CORE		0x4		/* core file */
#define	MH_PRELOAD	0x5		/* preloaded executable file */
#define	MH_DYLIB	0x6		/* dynamically bound shared library */
#define	MH_DYLINKER	0x7		/* dynamic link editor */
#define	MH_BUNDLE	0x8		/* dynamically bound bundle file */
#define	MH_DYLIB_STUB	0x9		/* shared library stub for static */
					/*  linking only, no section contents */
#define	MH_DSYM		0xa		/* companion file with only debug */
					/*  sections */
#define	MH_KEXT_BUNDLE	0xb		/* x86_64 kexts */


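As the constants show, Mach-O covers several file types. The common ones are:

Executable files
Object files
- .o files (object files)
- .a static library files, which are really just a collection of N .o files
DYLIB: dynamic library files
- dylib
- framework
Dynamic linker
DSYM: used to analyze app crash information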
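To make the mapping between the filetype constants and the list above concrete, here is a minimal sketch. The function name filetype_name is a hypothetical helper, not part of any Apple API; the numeric values are the MH_* constants quoted earlier.

```c
#include <stdint.h>

/* Hypothetical helper: maps a mach_header filetype value to the name of
 * the matching MH_* constant from the listing above. */
const char *filetype_name(uint32_t filetype) {
    switch (filetype) {
    case 0x1: return "MH_OBJECT";    /* .o file; a .a archive is a set of these */
    case 0x2: return "MH_EXECUTE";   /* executable */
    case 0x6: return "MH_DYLIB";     /* dynamic library */
    case 0x7: return "MH_DYLINKER";  /* dynamic linker */
    case 0x8: return "MH_BUNDLE";    /* loadable bundle */
    case 0xa: return "MH_DSYM";      /* debug symbols companion file */
    default:  return "other";        /* types not covered by this sketch */
    }
}
```

Passing the filetype field of a parsed header to this function gives a readable label for the file kinds discussed here.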

From a C file to an executable

I highly recommend the article Mach-O 文件一; the example here is also taken from it.

The C file test.c:

 int main(){
     return 0;
 }

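Compile it with clang -c test.c to produce the file test.o.
Inspect it with file test.o. The output, test.o: Mach-O 64-bit object x86_64, shows that test.o is a Mach-O object file.
Link the object file with clang test.o; test.c has now been turned into an executable named a.out.
Run it with ./a.out.
Run clang -o test1 test.o to link the test.o object file into an executable named test1.
Run clang -o test2 test.c to go from the source file to the executable test2 in a single step.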

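Mach-O structure

As the figure above shows, a Mach-O file consists of three main parts:

Mach header: describes the CPU architecture, the file type, and the number and total size of the load commands.
Load commands: describe how the data in the file is organized; each kind of data is represented by its own type of load command.
Data: holds the contents of every segment. Segments here are similar in concept to segments in ELF files. Each segment contains zero or more sections, which hold the actual code and data; the symbol table, dynamic symbol table, and similar data also live in this region.

Let's use MachOView to check the structure of the example Mach-O file:

Browsing the Mach-O executable, it breaks down into the following parts:

Mach64 Header: file header
Load Commands
__TEXT: text segment
__DATA: data segment
Dynamic Loader Info: dynamic library loading information
Function Starts: table of function start addresses
Symbol Table
Dynamic Symbol Table
String Table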

mach_header_64

/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
	uint32_t	magic;		/* mach magic number identifier */
	cpu_type_t	cputype;	/* cpu specifier */
	cpu_subtype_t	cpusubtype;	/* machine specifier */
	uint32_t	filetype;	/* type of file */
	uint32_t	ncmds;		/* number of load commands */
	uint32_t	sizeofcmds;	/* the size of all the load commands */
	uint32_t	flags;		/* flags */
	uint32_t	reserved;	/* reserved */
};


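magic: the magic number; the loader uses it to quickly tell whether the file is a 32-bit or a 64-bit Mach-O.
cputype: identifies the CPU architecture, such as ARM or x86; it lets the system make sure a binary only runs on an architecture it was built for.
cpusubtype: the specific CPU variant, distinguishing processor versions within an architecture, such as armv7 and armv7s.
filetype: the type of this Mach-O file (executable, library, core dump, kernel extension, dSYM file, dynamic library, and so on).
ncmds: the number of load commands. Each load command describes one piece of how the file is laid out or loaded; segment commands are the most common kind.
sizeofcmds: the total size in bytes of all load commands.
flags: bit flags describing features the binary supports, mostly related to loading and linking.
reserved: reserved field.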
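The header fields can be read straight from the first 32 bytes of a 64-bit Mach-O file. The sketch below assumes the file contents are already in memory and in the host's byte order. The demo_ names are hypothetical; the struct is a local mirror of mach_header_64 so the code builds without Apple headers, and 0xfeedfacf is the value of MH_MAGIC_64 in mach-o/loader.h.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_MH_MAGIC_64 0xfeedfacfu  /* MH_MAGIC_64 in mach-o/loader.h */

/* Local mirror of struct mach_header_64 (cpu_type_t and cpu_subtype_t are ints). */
struct demo_mach_header_64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Copies the header out of buf. Returns 0 on success, or -1 if buf is too
 * short or does not start with the native-endian 64-bit magic number. */
int demo_read_header(const uint8_t *buf, size_t len,
                     struct demo_mach_header_64 *out) {
    if (len < sizeof *out) return -1;
    memcpy(out, buf, sizeof *out);  /* memcpy avoids unaligned access */
    if (out->magic != DEMO_MH_MAGIC_64) return -1;
    return 0;
}
```

After a successful call, the load commands start at offset sizeof(struct demo_mach_header_64) and occupy out->sizeofcmds bytes.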

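Load commands

The load commands directly follow the mach_header. The total size of all the commands is given by the sizeofcmds field in the mach_header. Every load command must have cmd and cmdsize as its first two fields. The cmd field is filled in with a constant for that command type, and each command type has a structure specifically for it. The cmdsize field is the size in bytes of the particular load command structure plus anything that follows it as part of the load command (section structures, strings, and so on). To advance to the next load command, add cmdsize to the offset or pointer of the current load command. For 32-bit architectures cmdsize must be a multiple of 4 bytes, and for 64-bit architectures a multiple of 8 bytes (these are forever the maximum alignment of any load command). The padding bytes must be zero. All tables in the object file must also follow these rules so that the file can be memory mapped; otherwise pointers to these tables will work poorly or not at all on some machines. With all padding zeroed, identical objects compare equal byte for byte.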
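The rule "add cmdsize to get to the next command" can be sketched as a loop. This is an illustration under stated assumptions rather than how any particular loader does it: the load command bytes are assumed to be in memory already, the file is assumed to be 64-bit (hence the multiple-of-8 check), and the demo_ names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Every load command starts with these two fields
 * (struct load_command in mach-o/loader.h). */
struct demo_load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Walks ncmds load commands stored in cmds[0..sizeofcmds) and counts those
 * whose cmd equals wanted. Returns -1 if a cmdsize is smaller than the
 * common prefix, is not a multiple of 8, or runs past the buffer. */
int demo_count_commands(const uint8_t *cmds, uint32_t sizeofcmds,
                        uint32_t ncmds, uint32_t wanted) {
    uint32_t offset = 0;
    int count = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct demo_load_command lc;
        if (sizeofcmds - offset < sizeof lc) return -1;
        memcpy(&lc, cmds + offset, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize % 8 != 0) return -1;
        if (lc.cmdsize > sizeofcmds - offset) return -1;
        if (lc.cmd == wanted) count++;
        offset += lc.cmdsize;  /* advance to the next load command */
    }
    return count;
}
```

The bounds checks matter because cmdsize comes from the file itself; a malformed value would otherwise send the walk outside the buffer.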

/*
 * The segment load command indicates that a part of this file is to be
 * mapped into the task's address space.  The size of this segment in memory,
 * vmsize, maybe equal to or larger than the amount to map from this file,
 * filesize.  The file is mapped starting at fileoff to the beginning of
 * the segment in memory, vmaddr.  The rest of the memory of the segment,
 * if any, is allocated zero fill on demand.  The segment's maximum virtual
 * memory protection and initial virtual memory protection are specified
 * by the maxprot and initprot fields.  If the segment has sections then the
 * section structures directly follow the segment command and their size is
 * reflected in cmdsize.
 */
struct segment_command { /* for 32-bit architectures */
	uint32_t	cmd;		/* LC_SEGMENT */
	uint32_t	cmdsize;	/* includes sizeof section structs */
	char		segname[16];	/* segment name */
	uint32_t	vmaddr;		/* memory address of this segment */
	uint32_t	vmsize;		/* memory size of this segment */
	uint32_t	fileoff;	/* file offset of this segment */
	uint32_t	filesize;	/* amount to map from the file */
	vm_prot_t	maxprot;	/* maximum VM protection */
	vm_prot_t	initprot;	/* initial VM protection */
	uint32_t	nsects;		/* number of sections in segment */
	uint32_t	flags;		/* flags */
};

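cmd: the type of the load command. Load commands are consumed by the kernel loader or by the dynamic linker. Common ones include:
- LC_SEGMENT: a segment load command; the segment is to be mapped into the process's address space.
- LC_LOAD_DYLIB: a dynamic library that needs to be loaded, represented by the dylib_command structure.
- LC_MAIN: records the location of the executable's main() function, represented by the entry_point_command structure.
- LC_CODE_SIGNATURE: the code signature load command, which describes the Mach-O file's code signature; it is link-edit information and is represented by the linkedit_data_command structure.
cmdsize: the size of the load command.
segname[16]: the 16-byte segment name.
vmaddr: the segment's starting virtual memory address.
vmsize: the segment's size in virtual memory.
fileoff: the segment's offset in the file.
filesize: the segment's size in the file.
maxprot: the maximum memory protection allowed for the segment's pages (1=r, 2=w, 4=x).
initprot: the initial memory protection of the segment's pages.
nsects: the number of sections in the segment.
flags: flag bits.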
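The maxprot and initprot fields are bit masks, so they are easier to read once decoded. A minimal sketch follows; the demo_ names are hypothetical, and the bit values are those of VM_PROT_READ, VM_PROT_WRITE and VM_PROT_EXECUTE in mach/vm_prot.h.

```c
#include <stdint.h>

/* Bit values of vm_prot_t as defined in mach/vm_prot.h. */
#define DEMO_VM_PROT_READ    0x1
#define DEMO_VM_PROT_WRITE   0x2
#define DEMO_VM_PROT_EXECUTE 0x4

/* Writes an "rwx"-style string plus a terminating NUL into out,
 * which must have room for at least 4 bytes. */
void demo_prot_string(int32_t prot, char out[4]) {
    out[0] = (prot & DEMO_VM_PROT_READ)    ? 'r' : '-';
    out[1] = (prot & DEMO_VM_PROT_WRITE)   ? 'w' : '-';
    out[2] = (prot & DEMO_VM_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}
```

A value of 5, for example, has the read and execute bits set, which is what a code segment such as __TEXT typically uses for initprot.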

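Section data

Some segments (mainly __TEXT and __DATA) are further divided into sections. The Segment -> Section organization exists because sections within the same segment share the same permissions and do not each have to be aligned to a full page, which saves memory. The segment, in turn, is exposed as a whole and mapped as one complete region of virtual memory when the program is loaded, which makes memory alignment easier.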

struct section_64 { /* for 64-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint64_t	addr;		/* memory address of this section */
	uint64_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
	uint32_t	reserved3;	/* reserved */
};


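sectname[16]: the section name, such as __text or __stubs. The first section is __text, which holds the main program code.
segname[16]: the segment this section belongs to, such as __TEXT.
addr: the section's starting address in memory.
size: the section's size in bytes.
offset: the section's offset in the file.
align: the section's alignment, stored as a power of 2.
reloff: the file offset of the relocation entries.
nreloc: the number of relocation entries.
flags: the section's type and attributes.
reserved1: reserved field 1 (for offset or index).
reserved2: reserved field 2 (for count or sizeof).
reserved3: reserved field 3.

Segment names are two underscores followed by uppercase letters (such as __TEXT), while section names are two underscores followed by lowercase letters (such as __text).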
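Two of these fields are easy to misread in code: the 16-byte name fields are not guaranteed to be NUL-terminated when a name uses all 16 bytes, and align is an exponent rather than a byte count. The helpers below are a hedged sketch of handling both; the demo_ names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copies a fixed 16-byte sectname or segname field into a 17-byte buffer
 * so the result is always NUL-terminated. */
void demo_copy_name(const char field[16], char out[17]) {
    memcpy(out, field, 16);
    out[16] = '\0';
}

/* Converts the align field (a power-of-2 exponent) into bytes.
 * Returns 0 for exponents that do not fit in 64 bits. */
uint64_t demo_align_bytes(uint32_t align) {
    return align < 64 ? (uint64_t)1 << align : 0;
}

/* Returns 1 if a segname field names the given segment, comparing at most
 * 16 bytes so an unterminated field is never read past its end. */
int demo_in_segment(const char segname_field[16], const char *segment) {
    return strncmp(segname_field, segment, 16) == 0;
}
```

With these, a section whose align is 4 is reported as 16-byte aligned, and names that fill the whole field can still be printed safely.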

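The sections that each segment may contain are listed below:

__TEXT segment:
__text, __cstring, __picsymbol_stub, __symbol_stub, __const, __literal4, __literal8;

__DATA segment:
__data, __la_symbol_ptr, __nl_symbol_ptr, __dyld, __const, __mod_init_func, __mod_term_func, __bss, __common;

__IMPORT segment:
__jump_table, __pointers;

Of these, __text in the __TEXT segment is the actual code, and __data in the __DATA segment is the actual initialized data.

That covers the Mach-O file format. If you are interested in how a program goes from being loaded to being executed, see Mach-O文件格式和程序从加载到执行过程 and 趣探 Mach-O：加载过程, both of which go into great detail.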

MachO 文件结构详解
Mach-O 文件一
Mach-O文件格式和程序从加载到执行过程
iOS逆向基础Mach-O文件（1）
Mach-O 文件格式探索

Reposted from: https://juejin.im/post/5d060880f265da1b860885d7
