Mach-O: computing the in-memory addresses of the symbol table and the string table

KSCrash is a crash-capturing framework for the iOS platform. I recently read some of its source, and its KSDynamicLinker source file contains the following function:

/** Get the segment base address of the specified image.
 *
 * This is required for any symtab command offsets.
 *
 * @param idx The image index.
 * @return The image's base address, or 0 if none was found.
 */
static uintptr_t segmentBaseOfImageIndex(const uint32_t idx)
{
    const struct mach_header* header = _dyld_get_image_header(idx);
    
    // Look for a segment command and return the file image address.
    uintptr_t cmdPtr = firstCmdAfterHeader(header);
    if(cmdPtr == 0)
    {
        return 0;
    }
    for(uint32_t i = 0;i < header->ncmds; i++)
    {
        const struct load_command* loadCmd = (struct load_command*)cmdPtr;
        if(loadCmd->cmd == LC_SEGMENT)
        {
            const struct segment_command* segmentCmd = (struct segment_command*)cmdPtr;
            if(strcmp(segmentCmd->segname, SEG_LINKEDIT) == 0)
            {
                return segmentCmd->vmaddr - segmentCmd->fileoff;
            }
        }
        else if(loadCmd->cmd == LC_SEGMENT_64)
        {
            const struct segment_command_64* segmentCmd = (struct segment_command_64*)cmdPtr;
            if(strcmp(segmentCmd->segname, SEG_LINKEDIT) == 0)
            {
                return (uintptr_t)(segmentCmd->vmaddr - segmentCmd->fileoff);
            }
        }
        cmdPtr += loadCmd->cmdsize;
    }
    
    return 0;
}

The function is called like this:

const uintptr_t segmentBase = segmentBaseOfImageIndex(idx) + imageVMAddrSlide;

0 The confusing part

An image contains more than one segment, and the parameter idx is the index of the image. If the return value is a segment base, which segment is it the base of?

Some will point out that the comment says the function returns the image's base address, or 0 if none was found, so the value must be the image base. But in principle vmaddr - fileoff does not give the image base (this is explained later).

At the call site, the slide introduced by ASLR is added and the result is assigned to segmentBase.

fishhook contains a similar line of code:

uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;

Leaving aside the slide introduced by ASLR, this is the same vmaddr - fileoff as above, and here the variable is named linkedit_base.

So what exactly do KSCrash's segmentBase and fishhook's linkedit_base mean? If they referred to the real address of the __LINKEDIT segment in memory, the value should be vmaddr + the ASLR slide.

While searching for information I read many blog posts and other material. On this point they either said nothing, mentioned it only in passing, or got it wrong. Some claim the value is the base address of the __LINKEDIT segment in memory; others claim it is the base address of the current image in memory.

1 Unveiled

1.1 Background knowledge

Before working out what this value really is, we need some background knowledge.

  • Mach-O file structure
  • Virtual Memory
  • ASLR

Below is a brief look at the Mach-O file format.

Mach-O

A process is what you get when an executable file is loaded into memory, and Mach-O is the executable file format on the macOS platform.

A Mach-O file is divided into three regions: Header, Load commands, and Data. The load commands region tells the loader how to set up and load the binary data. Below are the commands we care about, as defined for 32-bit platforms:

Command | Data structure | Description
LC_SEGMENT | segment_command | Defines a segment of the file. When the Mach-O file is loaded, this segment is mapped into the corresponding address space. Note that segment_command has a segname field, which can be used to look up a specific segment.
LC_SYMTAB | symtab_command | Describes the file's symbol table. symtab_command holds the file offset of the symbol table, the number of symbols, the file offset of the string table, and the size of the string table.
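
To make the two load commands above concrete, here is a minimal sketch in C of how load commands can be walked to find the symtab command or a segment by name. Every demo_ name is hypothetical and introduced only for this illustration. The structs mirror the 32-bit layouts so that the snippet compiles without Apple's headers; real code would include <mach-o/loader.h> and use the definitions from there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Command values as defined in <mach-o/loader.h>. */
#define DEMO_LC_SEGMENT 0x1u
#define DEMO_LC_SYMTAB  0x2u

/* Hypothetical mirrors of load_command, segment_command and symtab_command. */
typedef struct {
    uint32_t cmd;
    uint32_t cmdsize;
} demo_load_command;

typedef struct {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint32_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;
    uint32_t nsects, flags;
} demo_segment_command;

typedef struct {
    uint32_t cmd, cmdsize, symoff, nsyms, stroff, strsize;
} demo_symtab_command;

/* Return the first load command of type `wanted`, or NULL if there is none.
 * `first_cmd` points just past the Mach-O header; `ncmds` comes from the header. */
const void *demo_find_command(const void *first_cmd, uint32_t ncmds, uint32_t wanted)
{
    assert(first_cmd != NULL);
    const uint8_t *p = first_cmd;
    for (uint32_t i = 0; i < ncmds; i++) {
        demo_load_command lc;
        memcpy(&lc, p, sizeof lc);  /* every command starts with cmd and cmdsize */
        if (lc.cmd == wanted)
            return p;
        p += lc.cmdsize;            /* cmdsize is the stride to the next command */
    }
    return NULL;
}

/* Return the LC_SEGMENT command whose segname matches, or NULL if none does. */
const demo_segment_command *demo_find_segment(const void *first_cmd, uint32_t ncmds,
                                              const char *segname)
{
    assert(first_cmd != NULL);
    const uint8_t *p = first_cmd;
    for (uint32_t i = 0; i < ncmds; i++) {
        demo_load_command lc;
        memcpy(&lc, p, sizeof lc);
        if (lc.cmd == DEMO_LC_SEGMENT) {
            const demo_segment_command *seg = (const void *)p;
            /* segname is a fixed 16-byte field, so bound the comparison. */
            if (strncmp(seg->segname, segname, sizeof seg->segname) == 0)
                return seg;
        }
        p += lc.cmdsize;
    }
    return NULL;
}
```

Calling demo_find_segment(cmds, ncmds, "__LINKEDIT") and demo_find_command(cmds, ncmds, DEMO_LC_SYMTAB) yields the two commands that the rest of this article relies on.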

segment_command is defined as follows:

struct segment_command { /* for 32-bit architectures */
    uint32_t    cmd;        /* LC_SEGMENT */
    uint32_t    cmdsize;    /* includes sizeof section structs */
    char        segname[16];    /* segment name */
    uint32_t    vmaddr;     /* memory address of this segment */
    uint32_t    vmsize;     /* memory size of this segment */
    uint32_t    fileoff;    /* file offset of this segment */
    uint32_t    filesize;   /* amount to map from the file */
    vm_prot_t   maxprot;    /* maximum VM protection */
    vm_prot_t   initprot;   /* initial VM protection */
    uint32_t    nsects;     /* number of sections in segment */
    uint32_t    flags;      /* flags */
};

For each segment, setting up the process's virtual memory means loading the corresponding content into memory: filesize bytes starting at offset fileoff in the Mach-O file are mapped at virtual address vmaddr, where the segment occupies vmsize bytes. Note that for some segments vmsize may be larger than filesize, for example __DATA and __LINKEDIT.

In the discussion that follows, the segment we care about is the one whose segname is __LINKEDIT. The __LINKEDIT segment is used by dyld and contains the symbol table, the string table, and other data.
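
The mapping rule above can be sketched as a small C helper that translates a file offset into the virtual address it is loaded at. The type demo_segment and the function demo_file_offset_to_vmaddr are hypothetical names for this illustration, and ASLR is ignored here. Only offsets backed by file content (the first filesize bytes) are translated, since anything between filesize and vmsize has no counterpart in the file.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four segment_command fields that describe the file-to-memory mapping. */
typedef struct {
    uint32_t vmaddr;    /* memory address of the segment */
    uint32_t vmsize;    /* memory size of the segment */
    uint32_t fileoff;   /* file offset of the segment */
    uint32_t filesize;  /* amount mapped from the file */
} demo_segment;

/* Translate a file offset to its virtual address inside `seg`.
 * Returns false when the offset is not part of the segment's file content. */
bool demo_file_offset_to_vmaddr(const demo_segment *seg, uint32_t file_off,
                                uint32_t *out_vmaddr)
{
    if (file_off < seg->fileoff || file_off - seg->fileoff >= seg->filesize)
        return false;
    /* Same distance from the start of the segment in the file and in memory. */
    *out_vmaddr = seg->vmaddr + (file_off - seg->fileoff);
    return true;
}
```

For a made-up segment with vmaddr = 0x4000, fileoff = 0x3000 and filesize = 0x1800, file offset 0x3100 maps to 0x4100, while 0x2fff and 0x4800 fall outside the segment's file content.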

symtab_command is defined as follows:

struct symtab_command {
    uint32_t    cmd;        /* LC_SYMTAB */
    uint32_t    cmdsize;    /* sizeof(struct symtab_command) */
    uint32_t    symoff;     /* symbol table offset */
    uint32_t    nsyms;      /* number of symbol table entries */
    uint32_t    stroff;     /* string table offset */
    uint32_t    strsize;    /* string table size in bytes */
};

In symtab_command, symoff is the offset of the symbol table in the Mach-O file, and stroff is the offset of the string table in the Mach-O file.

1.2 The secret revealed

We can open a Mach-O file with MachOView and look at LC_SEGMENT (__LINKEDIT) and LC_SYMTAB. For reasons of space, no screenshots are included here. Note, however, that both the symbol table and the string table lie inside the __LINKEDIT segment of the Mach-O file, which confirms the description of the __LINKEDIT segment above.

Let's work backwards from the symbol table's address in virtual memory to the so-called segmentBase / linkedit_base.

[Figure: Mach-O Map]

Let's ignore ASLR for now. In the figure, the dark gray background represents virtual memory. The __TEXT and __DATA segments are not relevant here, so the figure does not show them.

sym_vmaddr denotes the address of the symbol table in virtual memory. The symbol table's offset within the __LINKEDIT segment in virtual memory, sym_vmaddr - vmaddr, is equal to its offset within the segment in the Mach-O file, symoff - fileoff.

That is, sym_vmaddr - vmaddr = symoff - fileoff.
Moving vmaddr to the right-hand side gives sym_vmaddr = symoff - fileoff + vmaddr.

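Notice anything?

Continuing the derivation:
subtracting the symbol table offset symoff from both sides gives sym_vmaddr - symoff = vmaddr - fileoff (Formula 1).
Adding the ASLR slide to the right-hand side of Formula 1 gives vmaddr - fileoff + slide, which is exactly the so-called segmentBase / linkedit_base.

And with that, the truth is out. The value is neither the image base nor the real address of the __LINKEDIT segment. It is the constant that, added to a file offset inside __LINKEDIT such as symoff or stroff, gives the address of that data in memory.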
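
Formula 1 can be checked with a short C sketch. The function names demo_linkedit_base and demo_address_of are hypothetical, and the numbers in the usage note are made up for illustration. This is not the code of either library, only the arithmetic from the derivation above.

```c
#include <stdint.h>

/* vmaddr - fileoff + slide for the __LINKEDIT segment: the value that
 * KSCrash calls segmentBase and fishhook calls linkedit_base. */
uintptr_t demo_linkedit_base(uintptr_t vmaddr, uintptr_t fileoff, intptr_t slide)
{
    return (uintptr_t)slide + vmaddr - fileoff;
}

/* In-memory address of data stored at file offset `file_off` inside
 * __LINKEDIT, for example symoff (symbol table) or stroff (string table). */
uintptr_t demo_address_of(uintptr_t linkedit_base, uint32_t file_off)
{
    return linkedit_base + file_off;
}
```

With vmaddr = 0x8000, fileoff = 0x6000 and slide = 0x1000, the base is 0x3000, which is not the segment's real address 0x9000. A symbol table at symoff = 0x6100 lands at 0x9100 and a string table at stroff = 0x6400 lands at 0x9400, that is, 0x100 and 0x400 bytes into the loaded segment, exactly as far as they sit from the start of the segment in the file.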

References

  • Depth analysis of Mac OS X & iOS operating system

  • In-depth understanding of computer systems

  • Mach-O File Format

Origin www.cnblogs.com/xjshi/p/11595234.html