How dyld loads an application at startup: a detailed explanation

Every app's entry point is main(), but what happens before main() is called? Let's walk through the app loading process step by step.

1. Tracing with breakpoints

  • First, create a new project, write no code, and set a breakpoint at the main() function. You will see something like the following:

    [Figure 1: call stack at the main() breakpoint]

  • In the figure above, the call stack shows only start and main on the main thread, and nothing else. How can we see more of the call stack? An Objective-C +load method is called earlier than main(), so add a +load method to the view controller, set a breakpoint in it, and run again, as shown in the following figure:

    [Figure 2: call stack at the +load breakpoint]

  • This figure shows a much more detailed call sequence, from _dyld_start in frame 13 to dyld::notifySingle in frame 3. The name dyld appears most often, so what is dyld and what does it do? Simply put, dyld is the dynamic linker: it loads the executable and all the libraries it depends on. Next, we follow the call chain in Figure 2 through the dyld source.

2. Analyzing the dyld loading process

1. First, download the dyld source code.

2. Open the dyld source project and, following dyldbootstrap::start in frame 12 of Figure 2, search for the start function in the dyldbootstrap namespace, as shown below:

 

3. The source of the function is as follows. We will go through its key parts:

uintptr_t start(const struct macho_header* appsMachHeader, int argc, const char* argv[], intptr_t slide)
{
    // compute the address of dyld's own Mach-O header (link-time address plus slide)
    const struct macho_header* dyldsMachHeader =  (const struct macho_header*)(((char*)&_mh_dylinker_header)+slide);
    
    // if the kernel had to slide dyld, fix up its slide-sensitive locations (rebase)
    if ( slide != 0 ) {
        rebaseDyld(dyldsMachHeader, slide);
    }
    
    uintptr_t appsSlide = 0;
        
    // enable C++ exceptions to work inside dyld
    dyld_exceptions_init(dyldsMachHeader, slide);
    
    // allow dyld to use Mach messaging
    mach_init();

    // set protection on dyld's segments (segments are part of the Mach-O format, described below)
    segmentProtectDyld(dyldsMachHeader, slide);
    
    // the kernel places the environment pointer array just past the end of the argv array
    const char** envp = &argv[argc+1];
    
    // the kernel places the apple pointer array just past the end of the envp array
    const char** apple = envp;
    while(*apple != NULL) { ++apple; }
    ++apple;

    // run all C++ initializers inside dyld
    runDyldInitializers(dyldsMachHeader, slide, argc, argv, envp, apple);
    
    // if the main executable was linked with -pie, randomize its load address
    if ( appsMachHeader->flags & MH_PIE )
        appsMachHeader = randomizeExecutableLoadAddress(appsMachHeader, envp, &appsSlide);
    
    // bootstrapping is done: pass the header, slide, etc. to dyld's own main (not the app's main)
    return dyld::_main(appsMachHeader, appsSlide, argc, argv, envp, apple);
}
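The pointer arithmetic in the middle of start() depends on the vector layout it expects from the kernel: the argv strings, a NULL, the environment strings, a NULL, then the "apple" strings and a final NULL. Below is a minimal sketch in C that walks a hand-built vector instead of a real process stack. The helper names find_envp, find_apple, and count_strings are hypothetical and are not part of dyld.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: envp starts just past the NULL that terminates argv. */
static const char **find_envp(int argc, const char **argv)
{
    assert(argv != NULL && argv[argc] == NULL); /* argv must be NULL-terminated */
    return &argv[argc + 1];
}

/* Hypothetical helper: the "apple" strings start just past the NULL
 * that terminates envp. */
static const char **find_apple(const char **envp)
{
    const char **apple = envp;
    while (*apple != NULL) { ++apple; }
    return apple + 1;
}

/* Counts the entries of a NULL-terminated string vector. */
static int count_strings(const char **vec)
{
    int n = 0;
    while (vec[n] != NULL) { ++n; }
    return n;
}
```

With a vector such as { "app", "-v", NULL, "HOME=/root", "PATH=/bin", NULL, "executable_path=/app", NULL } and argc equal to 2, find_envp returns a pointer to the "HOME=/root" slot and find_apple returns a pointer to the "executable_path=/app" slot.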

  • 3.1 Among the function's parameters is a macho_header. What is it? Mach-O is short for Mach Object, the file format used for executables and libraries on macOS and iOS, and it has its own layout. Apple's diagram of a Mach-O file is shown below:

     

    [Figure 4: layout of a Mach-O file]

  • 3.2 Jumping to the definition of macho_header (which resolves to mach_header_64 on 64-bit targets), we see:

struct mach_header_64 {
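    uint32_t    magic;      /* Mach-O magic number (identifies 32/64-bit and byte order) */
    cpu_type_t  cputype;    /* CPU type */
    cpu_subtype_t   cpusubtype; /* CPU subtype (specific machine) */
    uint32_t    filetype;   /* type of file (executable, dylib, ...) */
    uint32_t    ncmds;      /* number of load commands */
    uint32_t    sizeofcmds; /* total size of all the load commands */
    uint32_t    flags;      /* flags */
    uint32_t    reserved;   /* reserved, currently unused */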
};
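As a rough illustration of how a loader might use these fields, the sketch below copies a header out of a byte buffer and checks the magic number, the file type, and the PIE flag. The constant values are the ones published in <mach-o/loader.h>, redefined here with a SKETCH_ prefix so the example compiles where that header is unavailable. The struct and function names are hypothetical, and byte-swapped (opposite-endian) files are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from <mach-o/loader.h>, renamed to avoid clashing with it. */
#define SKETCH_MH_MAGIC     0xfeedfaceu  /* 32-bit Mach-O */
#define SKETCH_MH_MAGIC_64  0xfeedfacfu  /* 64-bit Mach-O */
#define SKETCH_MH_EXECUTE   0x2u         /* filetype: main executable */
#define SKETCH_MH_PIE       0x200000u    /* flag: position-independent executable */

/* Same field order as mach_header_64, with fixed-width types. */
struct sketch_header_64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Returns 64 or 32 for a native-endian Mach-O magic, 0 for anything else. */
static int sketch_bitness(uint32_t magic)
{
    if (magic == SKETCH_MH_MAGIC_64) return 64;
    if (magic == SKETCH_MH_MAGIC) return 32;
    return 0;
}

/* Copies a header out of buf. Returns 1 on success, 0 if the buffer is
 * too short or does not start with a 64-bit Mach-O magic. */
static int sketch_read_header(const uint8_t *buf, size_t len,
                              struct sketch_header_64 *out)
{
    assert(buf != NULL && out != NULL);
    if (len < sizeof *out) return 0;
    memcpy(out, buf, sizeof *out);
    return sketch_bitness(out->magic) == 64;
}

/* True when the header describes a main executable linked with -pie. */
static int sketch_is_pie_executable(const struct sketch_header_64 *h)
{
    return h->filetype == SKETCH_MH_EXECUTE && (h->flags & SKETCH_MH_PIE) != 0;
}
```

The PIE check corresponds to the appsMachHeader->flags & MH_PIE test in start().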

 

  • 3.3 Here the macho_header carries the header information of the Mach-O file. The header describes the binary: byte order, CPU architecture, number of load commands, and so on. It lets a loader quickly check, for example, whether the file is 32-bit or 64-bit and what type of file it is. Where can we find the Mach-O file? As shown below, locate it in the built app bundle and open it with MachOView:

     

    [Figure 5: the Mach-O executable inside the app bundle]

  • 3.4 The dark file above is the Mach-O file, the executable itself. Let's look at what its header contains, since this information is passed on to the next function. Number of Load Commands is 22 here, meaning the file has 22 load commands. Not every load command names a library: load commands also describe segments, the symbol table, the entry point, and so on, while each LC_LOAD_DYLIB command records one library to load. The sections hold the actual contents, such as code and constants in __TEXT and writable data in __DATA.

     

    [Figure 6: the Mach-O header in MachOView]
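To make the relationship between ncmds and the libraries concrete, here is a small sketch that walks a run of load commands and counts only those of one kind. It assumes the standard layout in which every load command starts with a 32-bit cmd and a 32-bit cmdsize. The constant values match LC_SEGMENT_64 and LC_LOAD_DYLIB from <mach-o/loader.h>, while the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from <mach-o/loader.h>, renamed to avoid clashing with it. */
#define SKETCH_LC_LOAD_DYLIB 0xcu
#define SKETCH_LC_SEGMENT_64 0x19u

/* Common prefix of every load command. */
struct sketch_load_command {
    uint32_t cmd;      /* kind of load command */
    uint32_t cmdsize;  /* size of this command in bytes, prefix included */
};

/* Walks ncmds load commands stored back to back in cmds[0..size) and
 * counts those whose kind equals wanted. Returns -1 if the data is
 * truncated or a cmdsize is implausible. */
static int sketch_count_commands(const uint8_t *cmds, size_t size,
                                 uint32_t ncmds, uint32_t wanted)
{
    size_t offset = 0;
    int found = 0;

    assert(cmds != NULL);
    for (uint32_t i = 0; i < ncmds; ++i) {
        struct sketch_load_command lc;
        if (size - offset < sizeof lc) return -1;
        memcpy(&lc, cmds + offset, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > size - offset) return -1;
        if (lc.cmd == wanted) ++found;
        offset += lc.cmdsize;  /* the next command starts right after this one */
    }
    return found;
}
```

Called with SKETCH_LC_LOAD_DYLIB, the function gives the number of directly linked libraries, which is normally smaller than the total number of load commands.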

  • 3.5 Summary: the start function locates dyld's Mach-O header and applies the slide (the virtual address offset), which is used for relocation (rebasing). It then sets up Mach messaging, runs dyld's own C++ initializers, and finally enters dyld's main function.
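The idea behind the slide can be sketched in a few lines of C. This is only an illustration of the arithmetic, assuming that rebasing amounts to adding the slide to every stored absolute address. It is not how rebaseDyld is actually written, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The slide is the distance between where the image actually landed
 * and the address it was linked to live at. */
static intptr_t sketch_compute_slide(uintptr_t actual_base, uintptr_t preferred_base)
{
    return (intptr_t)(actual_base - preferred_base);
}

/* Adds the slide to every stored absolute address so that each one
 * points into the image at its real location. */
static void sketch_rebase(uintptr_t *slots, size_t count, intptr_t slide)
{
    assert(slots != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        slots[i] += (uintptr_t)slide;
    }
}
```

For an image linked at 0x100000 but loaded at 0x104000, the slide is 0x4000, so a stored address of 0x100010 becomes 0x104010 after rebasing.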

  • 4. Continuing the trace: from the call stack in Figure 2, dyldbootstrap::start calls dyld::_main, which is the dyld main function mentioned above, as shown in the following figure:

    [Figure 7]

  • 4.1 Stepping into the function, here is an excerpt. There are several if checks that deal with environment variables: when these variables are set (for example in the Xcode scheme), dyld prints the corresponding details to the console:
    if ( sProcessIsRestricted )
            pruneEnvironmentVariables(envp, &apple);
        else
            checkEnvironmentVariables(envp, ignoreEnvironmentVariables);
        if ( sEnv.DYLD_PRINT_OPTS ) 
            printOptions(argv);
        if ( sEnv.DYLD_PRINT_ENV ) 
            printEnvironmentVariables(envp);
        getHostInfo();  
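The checks above treat variables such as DYLD_PRINT_OPTS as switches that are on whenever they are present in the environment. A minimal sketch of that kind of lookup over a NULL-terminated "KEY=value" vector follows. The function names are hypothetical, and the real dyld parses the environment once into its sEnv structure rather than searching on every check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the value stored for key in a NULL-terminated "KEY=value"
 * vector, or NULL when the key is absent. */
static const char *sketch_env_value(const char **envp, const char *key)
{
    assert(envp != NULL && key != NULL);
    size_t key_len = strlen(key);
    for (const char **p = envp; *p != NULL; ++p) {
        /* the key must match in full and be followed directly by '=' */
        if (strncmp(*p, key, key_len) == 0 && (*p)[key_len] == '=') {
            return *p + key_len + 1;
        }
    }
    return NULL;
}

/* A switch-style variable counts as set whenever it is present. */
static int sketch_flag_is_set(const char **envp, const char *key)
{
    return sketch_env_value(envp, key) != NULL;
}
```

Checking for the '=' right after the key keeps DYLD_PRINT_ENV from matching a longer name that merely starts with the same characters.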
    
  • 4.2 With these environment variables set, the console shows the program's directory, user-level settings, the inserted dynamic libraries, the dynamic library paths, and so on, as shown below:
    [Figure 8: console output with the environment variables set]

  • 4.3 After handling the environment variables, getHostInfo() is called. It does not read the Mach-O header; it asks the kernel (via host_info) for the CPU type and subtype of the machine the process is running on. The function is as follows:
    static void getHostInfo()
    {
    #if 1
        struct host_basic_info info;
        mach_msg_type_number_t count = HOST_BASIC_INFO_COUNT;
        mach_port_t hostPort = mach_host_self();
        kern_return_t result = host_info(hostPort, HOST_BASIC_INFO, (host_info_t)&info, &count);
        if ( result != KERN_SUCCESS )
            throw "host_info() failed";
        
        sHostCPU        = info.cpu_type;
        sHostCPUsubtype = info.cpu_subtype;
    #else
        size_t valSize = sizeof(sHostCPU);
        if (sysctlbyname ("hw.cputype", &sHostCPU, &valSize, NULL, 0) != 0) 
            throw "sysctlbyname(hw.cputype) failed";
        valSize = sizeof(sHostCPUsubtype);
        if (sysctlbyname ("hw.cpusubtype", &sHostCPUsubtype, &valSize, NULL, 0) != 0) 
            throw "sysctlbyname(hw.cpusubtype) failed";
    #endif
    }
    
  • 4.4 Further down, the main executable (the Mach-O file) is instantiated:
        try {
            // instantiate an ImageLoader for the main program, i.e. the Mach-O executable
            sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
            sMainExecutable->setNeverUnload();
            gLinkContext.mainExecutable = sMainExecutable;
            gLinkContext.processIsRestricted = sProcessIsRestricted;
            // load the shared cache
            checkSharedRegionDisable();
        #if DYLD_SHARED_CACHE_SUPPORT
            if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion )
                mapSharedCache();
        #endif
    
  • 4.5 The code that instantiates the main program is shown below. It returns an ImageLoader, an abstract base class for loading images of a specific executable file format. The main executable, each dependent library, and each inserted library gets a corresponding image object; dyld then links these images and calls each image's initializers, which includes initializing the runtime.
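    static ImageLoader* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
    {
        // isCompatibleMachO checks whether the Mach-O's CPU type and subtype are supported by the current CPU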
        if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
            ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
            // add the image to the image list; this is why the first entry printed by LLDB's image list command is our Mach-O executable
            addImage(image);
            return image;
        }
        
        throw "main executable not a known format";
    }
    
  • 4.6 Running LLDB's image list command gives the output in the figure below. The first entry, at address 0x000000010401c000, is the Mach-O executable.

     

     

  • 4.7 After the main executable is instantiated, checkSharedRegionDisable() checks whether the shared cache may be used, and mapSharedCache() then maps it in. What is the shared cache? It can be understood as the set of system dynamic libraries shared by all processes (third-party libraries cannot be placed in it). The UIKit framework, for example, lives in the shared cache. Apps such as WeChat, QQ, Alipay, and Tmall all use UIKit; if every app loaded its own copy, a great deal of memory would be wasted. Instead, these apps share a single copy of UIKit, and dyld binds each app's UIKit references to that copy. The following figure shows the frameworks under the System library directory of a jailbroken phone, which is consistent with this:

     

    Shared cache library

  • 5. Insert libraries: continuing through the rest of the function, this is where all inserted libraries are loaded. Code injection, as used in reverse engineering, happens at this step. For the detailed process of injecting a framework, see my other article. Note the sAllImages.size()-1: the subtraction excludes the main program from the count.

            // load any inserted libraries
            if  ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
                for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib) 
                    loadInsertedDylib(*lib);
            }
            // record count of inserted libraries so that a flat search will look at 
            // inserted libraries, then main, then others.
            sInsertedDylibCount = sAllImages.size()-1;
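The loop above iterates over an array of paths that has already been split. The dyld man page describes the DYLD_INSERT_LIBRARIES variable itself as a colon-separated list, so a sketch of the splitting step could look like this. Both helper names are hypothetical, and dyld's own parsing code is not shown in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the entries of a colon-separated list; an empty string has none. */
static size_t sketch_count_paths(const char *list)
{
    assert(list != NULL);
    if (list[0] == '\0') return 0;
    size_t n = 1;
    for (const char *p = list; *p != '\0'; ++p) {
        if (*p == ':') ++n;
    }
    return n;
}

/* Copies entry number index (0-based) into out as a C string.
 * Returns 1 on success, 0 if there is no such entry or it does not fit. */
static int sketch_nth_path(const char *list, size_t index,
                           char *out, size_t out_size)
{
    assert(list != NULL && out != NULL);
    const char *start = list;
    for (;;) {
        const char *end = strchr(start, ':');
        size_t len = end ? (size_t)(end - start) : strlen(start);
        if (index == 0) {
            if (len + 1 > out_size) return 0;
            memcpy(out, start, len);
            out[len] = '\0';
            return 1;
        }
        if (end == NULL) return 0;  /* ran out of entries */
        start = end + 1;
        --index;
    }
}
```

Each path obtained this way is what a call such as loadInsertedDylib(*lib) would receive.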
    
    
    

    6. Link the main program: the link function is called on the main executable's ImageLoader instance, which recursively loads the system libraries and third-party libraries it depends on.

            // link main executable
            gLinkContext.linkingMainExecutable = true;
            link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, ImageLoader::RPathChain(NULL, NULL));
            gLinkContext.linkingMainExecutable = false;
            if ( sMainExecutable->forceFlat() ) {
                gLinkContext.bindFlat = true;
                gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
            }
            result = (uintptr_t)sMainExecutable->getMain();
    

    7. Initialization function

    [Figure 10]

    8. Run the initialization program:

     

  • 8.1 Recursion: Load the dependent system libraries and third-party libraries we need.

     

    [Figure 12]

  • 9. The notifySingle function is the key function that connects dyld to the Objective-C runtime:

    [Figure 13]

  • 9.1 The call stack in Figure 2 shows load_images being called from notifySingle. Jumping to notifySingle in the link context, however, shows only a function pointer, and no direct call to load_images appears there. A global search of the dyld source does not find load_images either. So we infer that load_images belongs to the runtime and is handed to dyld as a callback. The objc runtime is also open source, so we download the objc source to check.
    void     (*notifySingle)(dyld_image_states, const ImageLoader* image);
    
  • 9.2 In _objc_init we find the registration, which passes load_images to dyld:
    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
    

    [Figure 14]

  • 9.3 load_images calls call_load_methods, which is where the +load methods of all classes and categories are invoked:
    void load_images(const char *path __unused, const struct mach_header *mh)
    {
        // Return without taking locks if there are no +load methods here.
        if (!hasLoadMethods((const headerType *)mh)) return;
    
        recursive_mutex_locker_t lock(loadMethodLock);
    
        // Discover load methods
        {
            mutex_locker_t lock2(runtimeLock);
            prepare_load_methods((const headerType *)mh);
        }
    
        // Call all +load methods
        call_load_methods();
    }
    
  • 9.4 Inside call_load_methods, a do-while loop first calls call_class_loads to invoke the +load method of each class, and then calls call_category_loads to invoke the +load methods of the categories.
    void call_load_methods(void)
    {
        static bool loading = NO;
        bool more_categories;
    
        loadMethodLock.assertLocked();
    
        // Re-entrant calls do nothing; the outermost call will finish the job.
        if (loading) return;
        loading = YES;
    
        void *pool = objc_autoreleasePoolPush();
    
        do {
            // 1. Repeatedly call the +load methods of all classes
            while (loadable_classes_used > 0) {
                call_class_loads();
            }
    
            // 2. Call the +load methods of the categories
            more_categories = call_category_loads();
    
            // 3. Run more +loads if there are classes OR more untried categories
        } while (loadable_classes_used > 0  ||  more_categories);
    
        objc_autoreleasePoolPop(pool);
    
        loading = NO;
    }
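The ordering that this loop produces can be reproduced with a small simulation in plain C. The sketch below is an assumption-laden stand-in, not runtime code: function pointers play the role of +load methods, two fixed-size arrays play the role of the pending class and category lists, and every sketch_ name is hypothetical.

```c
#include <assert.h>

#define SKETCH_MAX 8

typedef void (*sketch_load_fn)(void);

/* Pending +load stand-ins for classes and for categories. */
static sketch_load_fn sketch_classes[SKETCH_MAX];
static int sketch_classes_used = 0;
static sketch_load_fn sketch_categories[SKETCH_MAX];
static int sketch_categories_used = 0;

/* Records the order in which the sample stand-ins ran. */
static char sketch_log[SKETCH_MAX];
static int sketch_log_len = 0;

static void sketch_note(char tag)
{
    assert(sketch_log_len < SKETCH_MAX);
    sketch_log[sketch_log_len++] = tag;
}

static void sketch_add_class(sketch_load_fn fn)
{
    assert(sketch_classes_used < SKETCH_MAX);
    sketch_classes[sketch_classes_used++] = fn;
}

static void sketch_add_category(sketch_load_fn fn)
{
    assert(sketch_categories_used < SKETCH_MAX);
    sketch_categories[sketch_categories_used++] = fn;
}

/* Detaches a pending list, then calls every entry exactly once. A callee
 * may register new entries; those are left for the next pass. */
static void sketch_drain(sketch_load_fn *list, int *used)
{
    sketch_load_fn pending[SKETCH_MAX];
    int count = *used;
    for (int i = 0; i < count; ++i) pending[i] = list[i];
    *used = 0;
    for (int i = 0; i < count; ++i) pending[i]();
}

/* Returns 1 if calling the categories made new categories pending. */
static int sketch_call_category_loads(void)
{
    sketch_drain(sketch_categories, &sketch_categories_used);
    return sketch_categories_used > 0;
}

/* Same loop shape as call_load_methods: classes first, then categories. */
static void sketch_call_load_methods(void)
{
    int more_categories;
    do {
        while (sketch_classes_used > 0) {
            sketch_drain(sketch_classes, &sketch_classes_used);
        }
        more_categories = sketch_call_category_loads();
    } while (sketch_classes_used > 0 || more_categories);
}

/* Sample stand-ins for +load methods. */
static void sketch_load_class_a(void)    { sketch_note('A'); }
static void sketch_load_class_b(void)    { sketch_note('B'); }
static void sketch_load_category_a(void) { sketch_note('a'); }
```

Even if the category stand-in is registered before the two class stand-ins, sketch_call_load_methods runs both classes first and the category last, because the inner while loop empties the class list before the categories are touched.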
    
  • 9.5 From this calling sequence, the +load methods of classes are called first, followed by the +load methods of categories, as the demonstration shows:

    [Figure 15]

  • 10. After notifySingle returns, doInitialization is called. Inside it, doModInitFunctions calls the functions listed in the __mod_init_func section of the Mach-O file, which are the global C++ constructors defined in our files.

    // let objc know we are about to initialize this image
    fState = dyld_image_state_dependents_initialized;
    oldState = fState;
    context.notifySingle(dyld_image_state_dependents_initialized, this);
    
    // initialize this image
    this->doInitialization(context);
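The effect of these initializer functions is easy to observe from C as well. In the sketch below, a function marked with the GCC/Clang constructor attribute records an event. On Apple platforms such functions are expected to be listed among the image's initializers and run during doInitialization, before main(). The names are hypothetical, and the attribute is a compiler extension rather than standard C.

```c
#include <assert.h>

#define SKETCH_MAX_EVENTS 4

/* Events recorded in the order they happen. */
static const char *sketch_events[SKETCH_MAX_EVENTS];
static int sketch_event_count = 0;

static void sketch_record(const char *what)
{
    assert(sketch_event_count < SKETCH_MAX_EVENTS);
    sketch_events[sketch_event_count++] = what;
}

/* Runs when the image is initialized, i.e. before main() is entered. */
__attribute__((constructor))
static void sketch_run_before_main(void)
{
    sketch_record("constructor");
}
```

By the time main() starts, sketch_event_count is already 1 and the first recorded event is "constructor".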
    
  • 10.1 From the order of calls in the code above: class +load methods run first, then category +load methods, then C++ constructors, and finally our main function. The demonstration is as follows:

     

  • In the analysis above, we started from a breakpoint, read the call stack, and traced the dyld loading process step by step, which uncovers what happens before main() is called. Try tracing an app yourself with the same steps; going through the loading process first-hand leaves a much deeper impression.

    Summary: a lot of preparatory work is done before main() is called, and the dynamic linker dyld is in charge of most of it. The core process is as follows:

    1. Program execution starts from _dyld_start

  • 1.1. Locate dyld's Mach-O header and apply the virtual address offset (slide) used for relocation.
  • 1.2. Call dyld::_main to enter dyld's own main function.

2. Configure some environment variables

  • 2.1. Setting these environment variables makes dyld print more information.
  • 2.2. Call getHostInfo() to ask the kernel for the CPU type and subtype of the current machine.

3. Instantiate the main program, which is the macho executable file.

4. Load the shared cache library.

5. Load the inserted dynamic libraries.

6. Link the main program.

7. Initialization function.

  • 7.1. After a series of initialization functions, the notifySingle function is finally called.
  • 7.2. Its callback is load_images, which the runtime registers with dyld in _objc_init.
  • 7.3. load_images calls call_load_methods, which loops over every class and category and calls its +load method.
  • 7.4. The doModInitFunctions function calls the constructors of global C++ objects, as well as functions marked __attribute__((constructor)).

8. Return to the entry function of the main program, and start to enter the main() function of the main program.



Link: https://www.jianshu.com/p/1b9ca38b8b9f


Origin blog.csdn.net/wangletiancsdn/article/details/104740563