iOS Reverse Engineering: Understanding the Basics of the Mach-O File Format

This article draws on the following post: https://mp.weixin.qq.com/s?__biz=MjM5NTQ2NzE0NQ==&mid=2247483817&idx=1&sn=9088b0d9b74c9c32e410e7cd74d7498e&chksm=a6f95b4f918ed259e502a87716267ebe29211cb18efe6e46569447479c0d2b32a4130f6760ac&scene=21#wechat_redirect
The original was written by 老峰 and published on the WeChat official account iOSTips.

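While learning iOS reverse engineering, I found that understanding how dumpdecrypted decrypts an encrypted executable requires knowledge of Mach-O, and that dynamic library injection also depends on understanding the structure of a Mach-O executable. It is therefore worth studying the layout of Mach-O files systematically and writing it down.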

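Mach-O, short for Mach object file format, is a file format for executables, object code, shared libraries, dynamically loaded code, and core dumps. It replaces the a.out format and offers more extensibility and faster access to symbol table information. Mach-O is used on systems built on the Mach kernel: NeXTSTEP, Darwin, and Mac OS X (as well as the iPhone's operating system) all use it as their executable format. Knowing the format helps in understanding how Apple's low-level software runs and how dyld loads a Mach-O image, and it lays the groundwork for writing your own Mach-O encryption, decryption, and injection tools.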

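A Mach-O file consists of three main parts:

Mach header. Describes the CPU architecture, the file type, and the number and total size of the load commands.
Load commands. Describe how the data in the file is organized; each kind of data is represented by its own load command type.
Data. Holds the contents of every segment; the concept is similar to a segment in an ELF file. Each segment contains zero or more sections, which store the actual code and data. Link-edit information such as the symbol table and the dynamic symbol table also lives here.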

The Mach-O Header

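The structures related to the Mach-O format can all be found, directly or indirectly, in the header file "mach-o/loader.h". The quickest way to read it is to open mach-o/loader.h in Xcode.
The Mach-O header is described by the mach_header structure on 32-bit architectures and by mach_header_64 on 64-bit architectures.
mach_header is defined as follows: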

/*
 * The 32-bit mach header appears at the very beginning of the object file for
 * 32-bit architectures.
 */
struct mach_header {
    uint32_t    magic;      /* mach magic number identifier */
    cpu_type_t  cputype;    /* cpu specifier */
    cpu_subtype_t   cpusubtype; /* machine specifier */
    uint32_t    filetype;   /* type of file */
    uint32_t    ncmds;      /* number of load commands */
    uint32_t    sizeofcmds; /* the size of all the load commands */
    uint32_t    flags;      /* flags */
};

/* Constant for the magic field of the mach_header (32-bit architectures) */
#define MH_MAGIC    0xfeedface  /* the mach magic number */
#define MH_CIGAM    0xcefaedfe  /* NXSwapInt(MH_MAGIC) */

mach_header_64 is defined as follows:

/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
    uint32_t    magic;      /* mach magic number identifier */
    cpu_type_t  cputype;    /* cpu specifier */
    cpu_subtype_t   cpusubtype; /* machine specifier */
    uint32_t    filetype;   /* type of file */
    uint32_t    ncmds;      /* number of load commands */
    uint32_t    sizeofcmds; /* the size of all the load commands */
    uint32_t    flags;      /* flags */
    uint32_t    reserved;   /* reserved */
};

/* Constant for the magic field of the mach_header_64 (64-bit architectures) */
#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */

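The 64-bit header has one more field than the 32-bit one: reserved. Apple's documentation does not describe its purpose; as the name suggests, it is a reserved field. It brings the header size from 28 bytes to 32 bytes.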

The official documentation describes the 64-bit magic field as follows (the 32-bit field works the same way):
An integer containing a value identifying this file as a 64-bit Mach-O file. Use the constant MH_MAGIC_64 if the file is intended for use on a CPU with the same endianness as the computer on which the compiler is running. The constant MH_CIGAM_64 can be used when the byte ordering scheme of the target machine is the reverse of the host CPU.
In short, MH_MAGIC_64 marks a 64-bit Mach-O file whose byte order matches that of the machine reading it. MH_CIGAM_64 is the same value byte-swapped, so a reader that sees it knows the file uses the opposite byte order and must swap every header field it reads.
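The magic check can be sketched in a few lines of Python. This is only an illustration: the function name classify_magic and its host_order parameter are made up for this example, and the constants are copied from the loader.h excerpts above. It handles thin (single-architecture) files only.

```python
import struct

# Magic constants copied from mach-o/loader.h.
MH_MAGIC = 0xFEEDFACE
MH_CIGAM = 0xCEFAEDFE
MH_MAGIC_64 = 0xFEEDFACF
MH_CIGAM_64 = 0xCFFAEDFE


def classify_magic(first4, host_order="<"):
    """Return (is_64bit, byte_order) for the first four bytes of a file.

    host_order is the byte order of the machine doing the parsing, written
    as a struct prefix: "<" for little-endian, ">" for big-endian.
    The returned byte_order is the prefix to use for the remaining fields.
    """
    (magic,) = struct.unpack(host_order + "I", first4)
    swapped = ">" if host_order == "<" else "<"
    table = {
        MH_MAGIC: (False, host_order),
        MH_MAGIC_64: (True, host_order),
        MH_CIGAM: (False, swapped),      # file byte order is reversed
        MH_CIGAM_64: (True, swapped),
    }
    if magic not in table:
        raise ValueError(f"not a thin Mach-O magic: {magic:#010x}")
    return table[magic]
```

On a little-endian host, a 64-bit little-endian file starts with the bytes cf fa ed fe, which read back as MH_MAGIC_64.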

All values of the cputype and cpusubtype fields are defined as macros in "mach/machine.h", and they are clearly written. A few of the cputype values are listed here:

#define CPU_TYPE_ANY        ((cpu_type_t) -1)

#define CPU_TYPE_VAX        ((cpu_type_t) 1)
/* skip             ((cpu_type_t) 2)    */
/* skip             ((cpu_type_t) 3)    */
/* skip             ((cpu_type_t) 4)    */
/* skip             ((cpu_type_t) 5)    */
#define CPU_TYPE_MC680x0    ((cpu_type_t) 6)
#define CPU_TYPE_X86        ((cpu_type_t) 7)
#define CPU_TYPE_I386       CPU_TYPE_X86        /* compatibility */
#define CPU_TYPE_X86_64     (CPU_TYPE_X86 | CPU_ARCH_ABI64)

/* skip CPU_TYPE_MIPS       ((cpu_type_t) 8)    */
/* skip             ((cpu_type_t) 9)    */
#define CPU_TYPE_MC98000    ((cpu_type_t) 10)
#define CPU_TYPE_HPPA           ((cpu_type_t) 11)
#define CPU_TYPE_ARM        ((cpu_type_t) 12)
#define CPU_TYPE_ARM64          (CPU_TYPE_ARM | CPU_ARCH_ABI64)
#define CPU_TYPE_MC88000    ((cpu_type_t) 13)
#define CPU_TYPE_SPARC      ((cpu_type_t) 14)
#define CPU_TYPE_I860       ((cpu_type_t) 15)
/* skip CPU_TYPE_ALPHA      ((cpu_type_t) 16)   */
/* skip             ((cpu_type_t) 17)   */
#define CPU_TYPE_POWERPC        ((cpu_type_t) 18)
#define CPU_TYPE_POWERPC64      (CPU_TYPE_POWERPC | CPU_ARCH_ABI64)
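The 64-bit variants above are just the 32-bit value with the CPU_ARCH_ABI64 bit set. The sketch below splits a cputype back into its parts. CPU_ARCH_MASK (0xff000000) and CPU_ARCH_ABI64 (0x01000000) come from mach/machine.h; the helper describe_cputype and its small name table are made up for this example and cover only a few types.

```python
# Constants from mach/machine.h.
CPU_ARCH_MASK = 0xFF000000   # bits that carry architecture capability flags
CPU_ARCH_ABI64 = 0x01000000  # set for 64-bit ABI variants

# Partial table, for illustration only.
CPU_TYPE_NAMES = {7: "X86", 12: "ARM", 18: "POWERPC"}


def describe_cputype(cputype):
    """Turn a non-negative cputype value into a readable name."""
    base = cputype & ~CPU_ARCH_MASK & 0xFFFFFFFF  # strip the flag bits
    name = CPU_TYPE_NAMES.get(base, f"unknown({base})")
    if cputype & CPU_ARCH_ABI64:
        # Follow the header's spelling: X86_64, but ARM64 and POWERPC64.
        name += "_64" if base == 7 else "64"
    return "CPU_TYPE_" + name
```

For example, 16777228 is 0x0100000C, which decodes to CPU_TYPE_ARM with the 64-bit flag set.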

ncmds is the number of load commands in the Mach-O file.

sizeofcmds is the total size, in bytes, of all the load commands.

flags is an integer made up of bit flags that describe properties of the Mach-O file.
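To see how these fields are read in practice, here is a minimal Python sketch that unpacks a header from raw bytes. It assumes a thin file whose byte order is already known (passed as order); the names MachHeader and parse_mach_header are invented for this example.

```python
import struct
from collections import namedtuple

MachHeader = namedtuple(
    "MachHeader", "magic cputype cpusubtype filetype ncmds sizeofcmds flags"
)


def parse_mach_header(data, order="<"):
    """Parse a mach_header or mach_header_64 from the start of data.

    Returns (header, header_size). header_size is also the file offset of
    the first load command; the load commands end at
    header_size + header.sizeofcmds.
    """
    # The first seven fields are identical in both header variants.
    header = MachHeader(*struct.unpack_from(order + "IiiIIII", data, 0))
    if header.magic == 0xFEEDFACF:      # MH_MAGIC_64
        return header, 32               # seven fields plus reserved
    if header.magic == 0xFEEDFACE:      # MH_MAGIC
        return header, 28
    raise ValueError("unexpected magic for the chosen byte order")
```

With a real file, data would be the bytes read from the start of the executable (or from the start of one architecture slice).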

The official documentation is linked below, although it says little about most of the fields.
https://developer.apple.com/documentation/kernel/mach_header?language=objc
https://developer.apple.com/documentation/kernel/mach_header_64?language=objc

Pick any executable and use the otool command to print its mach_header:

otool -h <executable>
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedface      12          9  0x00           2    94       8524 0x00210085
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777228          0  0x00           2    94       9392 0x00210085

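Two headers are printed because this file contains two architecture slices, one 32-bit and one 64-bit. The magic value tells them apart: 0xfeedface is MH_MAGIC (32-bit) and 0xfeedfacf is MH_MAGIC_64 (64-bit). The cputype values agree with this: 12 is CPU_TYPE_ARM, and 16777228 is 0x0100000C, that is, CPU_TYPE_ARM | CPU_ARCH_ABI64 = CPU_TYPE_ARM64.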
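The filetype and flags columns in this output can be decoded the same way. The sketch below uses a partial table of the MH_* constants from mach-o/loader.h; the tables are deliberately incomplete, and the helper decode_flags is made up for this example.

```python
# Partial tables copied from mach-o/loader.h (not exhaustive).
MH_FILETYPES = {
    0x1: "MH_OBJECT",
    0x2: "MH_EXECUTE",
    0x6: "MH_DYLIB",
    0x7: "MH_DYLINKER",
    0x8: "MH_BUNDLE",
}

MH_FLAGS = {
    0x1: "MH_NOUNDEFS",
    0x2: "MH_INCRLINK",
    0x4: "MH_DYLDLINK",
    0x8: "MH_BINDATLOAD",
    0x10: "MH_PREBOUND",
    0x80: "MH_TWOLEVEL",
    0x2000: "MH_SUBSECTIONS_VIA_SYMBOLS",
    0x8000: "MH_WEAK_DEFINES",
    0x10000: "MH_BINDS_TO_WEAK",
    0x200000: "MH_PIE",
}


def decode_flags(flags):
    """Return the names of the set bits; unknown bits are reported in hex."""
    names = []
    for bit, name in sorted(MH_FLAGS.items()):
        if flags & bit:
            names.append(name)
            flags &= ~bit          # clear the bit once it is accounted for
    if flags:
        names.append(hex(flags))   # whatever this partial table does not know
    return names
```

Applied to the values shown above, filetype 2 is MH_EXECUTE, and flags 0x00210085 breaks down into MH_NOUNDEFS, MH_DYLDLINK, MH_TWOLEVEL, MH_BINDS_TO_WEAK, and MH_PIE.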

Load Commands

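The load commands come right after the mach_header. When a Mach-O file is loaded, they are read and interpreted by the kernel loader or the dynamic linker. The basic load command structure, also defined in mach-o/loader.h, is as follows: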

/*
* The load commands directly follow the mach_header. The total size of all
* of the commands is given by the sizeofcmds field in the mach_header. All
* load commands must have as their first two fields cmd and cmdsize. The cmd
* field is filled in with a constant for that command type. Each command type
* has a structure specifically for it. The cmdsize field is the size in bytes
* of the particular load command structure plus anything that follows it that
* is a part of the load command (i.e. section structures, strings, etc.). To
* advance to the next load command the cmdsize can be added to the offset or
* pointer of the current load command. The cmdsize for 32-bit architectures
* MUST be a multiple of 4 bytes and for 64-bit architectures MUST be a multiple
* of 8 bytes (these are forever the maximum alignment of any load commands).
* The padded bytes must be zero. All tables in the object file must also
* follow these rules so the file can be memory mapped. Otherwise the pointers
* to these tables will not work well or at all on some machines. With all
* padding zeroed like objects will compare byte for byte.
*/
struct load_command {
    uint32_t    cmd;        /* type of load command */
    uint32_t    cmdsize;    /* total size of command in bytes */
};

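Anyone who has studied computer organization should find this comment easy to follow. Its key points are:

The load commands directly follow the mach_header, and their total size is given by the sizeofcmds field.
Every load command begins with the two fields cmd and cmdsize. cmd holds the constant for the command type, and each command type has its own structure.
cmdsize is the size in bytes of that structure plus everything that belongs to the command (section structures, strings, and so on), so adding cmdsize to the current offset or pointer gives the next load command.
cmdsize must be a multiple of 4 on 32-bit architectures and a multiple of 8 on 64-bit architectures. Padding bytes must be zero.
All tables in the object file follow the same rules so that the file can be memory mapped, and zeroed padding means that equivalent object files compare equal byte for byte.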
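The rule "add cmdsize to reach the next command" translates directly into a loop. The following Python sketch walks the load commands in a byte buffer; iter_load_commands is a name invented for this example, and the caller is assumed to have already read ncmds and sizeofcmds from the header.

```python
import struct


def iter_load_commands(data, offset, ncmds, sizeofcmds, order="<", is_64=False):
    """Yield (offset, cmd, cmdsize) for each load command.

    offset is where the first load command starts, i.e. the size of the
    mach_header (28 bytes) or mach_header_64 (32 bytes) for a thin file.
    """
    end = offset + sizeofcmds
    align = 8 if is_64 else 4      # required multiple for cmdsize
    for _ in range(ncmds):
        if offset + 8 > end:
            raise ValueError("load command header runs past sizeofcmds")
        # Every load command starts with cmd and cmdsize.
        cmd, cmdsize = struct.unpack_from(order + "II", data, offset)
        if cmdsize < 8 or cmdsize % align or offset + cmdsize > end:
            raise ValueError(f"bad cmdsize {cmdsize} at offset {offset}")
        yield offset, cmd, cmdsize
        offset += cmdsize          # advance to the next load command
```

The checks on cmdsize are a defensive choice for this sketch: a malformed or hostile file could otherwise cause an endless loop (cmdsize of 0) or reads outside the load command area.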

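The segment_command structure is shown below. The 64-bit variant, segment_command_64, has the same fields in the same order, but its cmd is LC_SEGMENT_64 and its vmaddr, vmsize, fileoff, and filesize fields are uint64_t rather than uint32_t: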

struct segment_command { /* for 32-bit architectures */
    uint32_t    cmd;        /* LC_SEGMENT */
    uint32_t    cmdsize;    /* includes sizeof section structs */
    char        segname[16];    /* segment name */
    uint32_t    vmaddr;     /* memory address of this segment */
    uint32_t    vmsize;     /* memory size of this segment */
    uint32_t    fileoff;    /* file offset of this segment */
    uint32_t    filesize;   /* amount to map from the file */
    vm_prot_t   maxprot;    /* maximum VM protection */
    vm_prot_t   initprot;   /* initial VM protection */
    uint32_t    nsects;     /* number of sections in segment */
    uint32_t    flags;      /* flags */
};

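cmd: the type of the load command. Some common values:
LC_SEGMENT: a segment load command; the segment is mapped into the process address space.
LC_LOAD_DYLIB: a dynamic library that must be loaded, represented by the dylib_command structure.
LC_MAIN: records the location of the executable's main() entry point, represented by the entry_point_command structure.
LC_CODE_SIGNATURE: describes the Mach-O code signature; it is link-edit information, represented by the linkedit_data_command structure.
cmdsize: the size of the load command in bytes, including the section structures that follow it (0x58 bytes in the example examined here).
segname: the 16-byte segment name; __PAGEZERO in the example examined here.
vmaddr: the starting virtual memory address of the segment.
vmsize: the size of the segment in virtual memory.
fileoff: the offset of the segment in the file.
filesize: the number of bytes of the segment stored in the file.
maxprot: the maximum memory protection allowed for the segment's pages (1 = read, 2 = write, 4 = execute).
initprot: the initial memory protection of the segment's pages.
nsects: the number of sections in the segment.
flags: miscellaneous flag bits.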
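The field list above maps one to one onto a struct format string. Below is a minimal Python sketch for the 32-bit segment_command (56 bytes). The names SegmentCommand, parse_segment_command, and prot_string are made up for this example; the protection bits are VM_PROT_READ = 1, VM_PROT_WRITE = 2, and VM_PROT_EXECUTE = 4, as defined in mach/vm_prot.h.

```python
import struct
from collections import namedtuple

SegmentCommand = namedtuple(
    "SegmentCommand",
    "cmd cmdsize segname vmaddr vmsize fileoff filesize "
    "maxprot initprot nsects flags",
)

# 32-bit layout: 2 x uint32, char[16], 4 x uint32, 2 x vm_prot_t (int), 2 x uint32.
SEGMENT_FMT = "II16sIIIIiiII"


def parse_segment_command(data, offset=0, order="<"):
    """Parse one 32-bit segment_command starting at offset."""
    seg = SegmentCommand(*struct.unpack_from(order + SEGMENT_FMT, data, offset))
    # segname is NUL-padded to 16 bytes; keep only the text before the padding.
    name = seg.segname.split(b"\0", 1)[0].decode("ascii")
    return seg._replace(segname=name)


def prot_string(prot):
    """Render a vm_prot_t value as an rwx string, e.g. 5 -> 'r-x'."""
    bits = (("r", 1), ("w", 2), ("x", 4))  # VM_PROT_READ / WRITE / EXECUTE
    return "".join(ch if prot & bit else "-" for ch, bit in bits)
```

For segment_command_64 the format string would become "II16sQQQQiiII" under the same assumptions, since only the four address and size fields widen to 64 bits.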

The following command lists all the load commands:

otool -l <executable>

Section Data

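When a segment contains sections, their headers are stored as an array immediately after the segment load command. Each entry is a section structure (section_64 on 64-bit architectures), defined as follows: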

struct section { /* for 32-bit architectures */
    char        sectname[16];    /* name of this section */
    char        segname[16];    /* segment this section goes in */
    uint32_t    addr;        /* memory address of this section */
    uint32_t    size;        /* size in bytes of this section */
    uint32_t    offset;        /* file offset of this section */
    uint32_t    align;        /* section alignment (power of 2) */
    uint32_t    reloff;        /* file offset of relocation entries */
    uint32_t    nreloc;        /* number of relocation entries */
    uint32_t    flags;        /* flags (section type and attributes)*/
    uint32_t    reserved1;    /* reserved (for offset or index) */
    uint32_t    reserved2;    /* reserved (for count or sizeof) */
};

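sectname: the section name; the first one is __text, the main program code.
segname: the name of the segment this section belongs to; for the first section it is __TEXT.
addr: the starting address of the section in memory, 0xa588 here.
size: the size of the section in bytes, 0x84a here.
offset: the file offset of the section, 28116 (0x6dd4) here.
align: the section alignment, stored as a power of two; the value 4 here means 2^4 = 16-byte alignment.
reloff: the file offset of the relocation entries, 0 here.
nreloc: the number of relocation entries, 0 here.
flags: the section type and attributes.
reserved1 and reserved2: the last two fields are reserved.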
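Since the section headers form an array right after the segment command, they can be read with a simple indexed loop. The Python sketch below handles the 32-bit section structure (68 bytes). Section and parse_sections are names invented for this example; SECTION_TYPE and SECTION_ATTRIBUTES are the masks that mach-o/loader.h defines for splitting the flags field.

```python
import struct
from collections import namedtuple

Section = namedtuple(
    "Section",
    "sectname segname addr size offset align reloff nreloc "
    "flags reserved1 reserved2",
)

SECTION_FMT = "16s16sIIIIIIIII"   # 32-bit layout, 68 bytes
SECTION_TYPE = 0x000000FF         # low byte of flags: section type
SECTION_ATTRIBUTES = 0xFFFFFF00   # remaining bits: section attributes


def _name(raw):
    # Names are NUL-padded and may fill all 16 bytes with no terminator.
    return raw.split(b"\0", 1)[0].decode("ascii")


def parse_sections(data, offset, nsects, order="<"):
    """Parse nsects section headers starting at offset.

    offset should point just past the fixed part of the segment command,
    i.e. segment offset + 56 for a 32-bit segment_command.
    """
    size = struct.calcsize(order + SECTION_FMT)
    sections = []
    for i in range(nsects):
        sec = Section(*struct.unpack_from(order + SECTION_FMT, data, offset + i * size))
        sections.append(
            sec._replace(sectname=_name(sec.sectname), segname=_name(sec.segname))
        )
    return sections
```

The alignment in bytes is 1 << section.align, so the value 4 from the example above gives 16. The section type is section.flags & SECTION_TYPE.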

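By convention, segment names are two underscores followed by uppercase letters (for example __TEXT), and section names are two underscores followed by lowercase letters (for example __text).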

The sections that each segment may contain are listed below:
__TEXT segment:
__text, __cstring, __picsymbol_stub, __symbol_stub, __const, __literal4, __literal8;

__DATA segment:
__data, __la_symbol_ptr, __nl_symbol_ptr, __dyld, __const, __mod_init_func, __mod_term_func, __bss, __common;

__IMPORT segment:
__jump_table, __pointers;

Of these, __text in the __TEXT segment holds the actual code, and __data in the __DATA segment holds the initialized data.