In-Depth Understanding of the Linux Kernel: System Startup

Prehistoric Times: BIOS

A computer is almost useless the moment it is powered on because the RAM chips contain random data and no operating system is running yet. To begin the boot, a special hardware circuit raises the logical value of the RESET pin of the CPU. After RESET is asserted, some registers of the processor (including cs and eip) are set to fixed values and the code found at physical address 0xfffffff0 is executed. The hardware maps this address to a read-only, persistent memory chip, usually called ROM (Read-Only Memory). The set of programs stored in ROM is traditionally called the Basic Input/Output System (BIOS) in the 80x86 architecture because it includes several interrupt-driven low-level procedures. All operating systems use these procedures at startup to initialize the computer's hardware devices. Some operating systems, such as Microsoft's MS-DOS, rely on the BIOS to implement most system calls.

Once Linux enters protected mode (see the "Segmentation in Hardware" section of Chapter 2), it no longer uses the BIOS, but provides its own device driver for each hardware device on the computer. In fact, because BIOS procedures must run in real mode, the two cannot share functions, even when sharing would be beneficial.

The BIOS uses real-mode addresses because these are the only ones available when the computer powers up. A real-mode address consists of a segment seg and an offset off. The corresponding physical address is seg * 16 + off. Therefore, the CPU addressing circuit does not need a global descriptor table, a local descriptor table, or a page table to convert logical addresses into physical addresses. Clearly, the code that initializes the GDT, LDT, and page tables must run in real mode.

Linux must use the BIOS during the boot phase, when it has to retrieve the kernel image from disk or from another external device. The BIOS boot procedure essentially performs the following four operations:

  1. Execute a series of tests on the computer hardware to determine which devices are present and whether they are working properly.
    This stage is usually called POST (Power-On Self-Test). During this stage, several messages are displayed, such as the BIOS version number. Today's 80x86, AMD64 and Itanium computers use the Advanced Configuration and Power Interface (ACPI) standard. In an ACPI-compliant BIOS, the startup code builds several tables that describe the hardware devices in the system. The format of these tables is independent of the device manufacturer, and the operating system can read them to learn how to invoke these devices.
  2. Initialize the hardware devices. This stage is crucial in modern PCI-based architectures because it guarantees that all hardware devices operate without conflicts on IRQ lines and I/O ports. At the end of this stage, a list of all PCI devices installed in the system is displayed.
  3. Search for an operating system to boot. Depending on the BIOS settings, this procedure may try to access (in a user-defined order) the first sector (boot sector) of every floppy disk, hard disk, and CD-ROM in the system.
  4. As soon as a valid device is found, copy the contents of its first sector into RAM starting at physical address 0x00007c00, then jump to that address and execute the code just loaded.

The rest of this appendix takes you through the entire process, from the most primitive starting point to a running Linux system.
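The real-mode address calculation described before the list (seg * 16 + off) is easy to try out. The following is only a minimal sketch; the function name real_mode_to_physical is made up for illustration and does not come from the kernel or the BIOS.

```python
def real_mode_to_physical(seg, off):
    """Translate a real-mode seg:off pair into a physical address."""
    if not (0 <= seg <= 0xFFFF and 0 <= off <= 0xFFFF):
        raise ValueError("seg and off must be 16-bit values")
    # Shifting left by 4 bits is the same as multiplying by 16.
    return (seg << 4) + off


# Two different seg:off pairs that both name the boot sector load address.
print(hex(real_mode_to_physical(0x0000, 0x7C00)))
print(hex(real_mode_to_physical(0x07C0, 0x0000)))
```

Because the segment is only shifted by 4 bits, many seg:off pairs refer to the same physical byte, which is why no descriptor table is needed for the translation.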

Ancient Times: Bootloader

The boot loader is a program called by the BIOS to load the operating system's kernel image into RAM. Let's briefly sketch how the boot loader works in IBM's PC architecture. In order to boot from a floppy disk, the instructions stored in the first sector must be loaded into RAM and executed; these instructions then copy all other sectors containing the kernel image into RAM.

Booting from the hard disk works a little differently. The first sector of the hard disk is called the Master Boot Record (MBR). This sector includes the partition table (Note 1) and a small program, which loads the first sector of the partition containing the operating system to be started.

Operating systems such as Microsoft Windows 98 identify this partition by means of an active flag contained in the partition table (Note 2). With this method, only operating systems whose kernel image is stored in the active partition can be booted. As we will see later, Linux is more flexible: it replaces the rudimentary program in the MBR with a sophisticated boot loader that allows the user to choose which operating system to boot.
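To make the role of the active flag concrete, here is a small sketch that scans the partition table of a 512-byte MBR. It assumes the classic PC layout (four 16-byte entries starting at offset 446, a boot flag of 0x80 in the first byte of an active entry, and the 0x55 0xaa signature in the last two bytes). The helper names and the synthetic sample sector are invented for illustration.

```python
SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # the partition table follows the boot code
ENTRY_SIZE = 16                # four entries of 16 bytes each
ACTIVE_FLAG = 0x80


def active_partitions(mbr):
    """Return the indexes (0-3) of the partition entries marked active."""
    if len(mbr) != SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    active = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        if mbr[start] == ACTIVE_FLAG:   # first byte of the entry is the boot flag
            active.append(index)
    return active


def make_sample_mbr(active_index):
    """Build a synthetic, otherwise empty MBR with one active entry."""
    sector = bytearray(SECTOR_SIZE)
    sector[PARTITION_TABLE_OFFSET + active_index * ENTRY_SIZE] = ACTIVE_FLAG
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


print(active_partitions(make_sample_mbr(2)))
```

A boot loader such as LILO does not need this flag, since it asks the user which system to start.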

Kernel images in earlier versions of Linux (up to the 2.4 series) included a minimal boot loader in their first 512 bytes, so copying a kernel image onto a floppy disk starting from the first sector made the floppy bootable. Linux 2.6 no longer includes such a boot loader, so to boot from a floppy disk, a suitable boot loader must be stored in the first disk sector. Booting from a floppy disk is now very similar to booting from a hard disk or CD-ROM.

Boot Linux from disk

Booting the Linux kernel from disk requires a two-stage boot loader. A well-known Linux boot loader on 80x86 systems is the LInux LOader (LILO). Other boot loaders for 80x86 systems exist; for instance, the GRand Unified Bootloader (GRUB) is also widely used. GRUB is more advanced than LILO because it recognizes several disk-based file systems and can therefore read portions of the boot program from files. Of course, there are dedicated boot loaders for all the architectures supported by Linux.

LILO may be installed either on the MBR (replacing the small program that loads the boot sector of the active partition) or on the boot sector of any disk partition. In both cases, the end result is the same: when the loader is executed at boot time, the user can choose which operating system to load. The LILO boot loader is actually split into two parts, because otherwise it would be too large to fit into a single sector. The MBR or the partition boot sector contains a small boot loader, which the BIOS loads into RAM starting at address 0x00007c00. This small program moves itself to address 0x00096a00, sets up the real-mode stack (ranging from 0x00098000 to 0x000969ff), and loads the second part of LILO into RAM starting at address 0x00096c00.

The second part in turn reads a map of the bootable operating systems from disk and offers the user a prompt for choosing one of them. Finally, after the user has chosen the kernel to be loaded (or after a delay has elapsed and LILO has picked the default), the boot loader either copies the boot sector of the corresponding partition into RAM and executes it, or copies the kernel image directly into RAM. Assuming that a Linux kernel image must be loaded, the LILO boot loader, which relies on BIOS routines, essentially performs the following steps:

  1. Invoke a BIOS procedure to display a "Loading" message.
  2. Invoke a BIOS procedure to load the initial portion of the kernel image from disk: the first 512 bytes of the kernel image are put in RAM at address 0x00090000, and the code of the setup() function (see below) is put in RAM starting at address 0x00090200.
  3. Invoke a BIOS procedure to load the rest of the kernel image from disk and put it in RAM starting either at low address 0x00010000 (for small kernel images compiled with make zImage) or at high address 0x00100000 (for big kernel images compiled with make bzImage). In the following discussion, we say that the kernel image is "loaded low" or "loaded high" in RAM, respectively. Support for big kernel images uses essentially the same booting scheme as for small ones, but it places data at different physical memory addresses to avoid the ISA hole problem mentioned in the "Physical Memory Layout" section of Chapter 2.
  4. Jump to the setup() code.
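The addresses used in the steps above can be collected in a few constants. This is a sketch for orientation only: the constant and function names are invented here, and the "room" figure is simply the difference between two addresses quoted in the text, not a limit taken from the LILO sources.

```python
BOOT_SECTOR_ADDR = 0x00090000   # first 512 bytes of the kernel image
SETUP_ADDR = 0x00090200         # setup() code
LOW_LOAD_ADDR = 0x00010000      # small images (make zImage)
HIGH_LOAD_ADDR = 0x00100000     # big images (make bzImage)


def kernel_load_address(big_kernel):
    """Return the RAM address at which the rest of the image is placed."""
    return HIGH_LOAD_ADDR if big_kernel else LOW_LOAD_ADDR


def low_load_room():
    """Bytes between the low load address and the copy of the boot sector."""
    return BOOT_SECTOR_ADDR - LOW_LOAD_ADDR


print(hex(kernel_load_address(big_kernel=False)))
print(hex(kernel_load_address(big_kernel=True)))
print(low_load_room() // 1024, "KB between 0x00010000 and 0x00090000")
```

The gap between the two low addresses suggests why an image that does not fit there has to be loaded high instead.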

Medieval Times: setup() function

The code for the setup() assembly language function is placed by the linker at offset 0x200 in the kernel image file. The bootloader can therefore easily locate the setup() code and copy it into RAM starting at physical address 0x00090200. The setup() function must initialize the hardware devices in the computer and establish an environment for the execution of the kernel program.

Although the BIOS has already initialized most hardware devices, Linux does not rely on it but re-initializes the devices in its own way to enhance portability and robustness. setup() essentially performs the following operations:

  1. On ACPI-compliant systems, invoke a BIOS routine that builds a table in RAM describing the layout of the system's physical memory (the table can be seen in the kernel boot messages by looking for the "BIOS-e820" label). On older systems, invoke a BIOS routine that just returns the amount of RAM available in the system.
  2. Set the keyboard repeat delay and rate (when the user keeps a key pressed past a certain amount of time, the keyboard device sends the corresponding keycode to the CPU over and over).
  3. Initialize the video adapter card.
  4. Reinitialize the disk controller and determine the hard disk parameters.
  5. Check for an IBM Micro Channel bus (MCA).
  6. Check for a PS/2 pointing device (bus mouse).
  7. Check for Advanced Power Management (APM) BIOS support.
  8. If the BIOS supports the Enhanced Disk Drive Services (EDD), invoke the proper BIOS procedure to build a table in RAM describing the hard disks available in the system (the information in the table can be viewed through the firmware/edd directory of the sysfs special file system).
  9. If the kernel image was loaded low in RAM (at physical address 0x00010000), move it to physical address 0x00001000.
    Conversely, if the kernel image was loaded high in RAM, it is not moved. This step is necessary because the kernel image is stored on disk in compressed form, both to fit on a floppy disk and to reduce boot time, and the decompression routine needs some free space in RAM right after the kernel image to use as a temporary buffer.
  10. Set the A20 pin located on the 8042 keyboard controller. The A20 pin was introduced in 80286-based systems for compatibility with the physical addressing of the ancient 8088 microprocessor. Unfortunately, the A20 pin must be set properly before switching to protected mode; otherwise, the 21st bit of every physical address (address line A20) is always seen as 0 by the CPU. Setting the A20 pin is a messy operation.
  11. Set up a provisional interrupt descriptor table (IDT) and a provisional global descriptor table (GDT).
  12. Reset the floating-point unit (FPU), if any.
  13. Reprogram the Programmable Interrupt Controllers (PIC) to mask all interrupts except IRQ2, which is the cascading interrupt between the two PICs.
  14. Switch the CPU from real mode to protected mode by setting the PE bit in the cr0 status register. The PG bit in the cr0 status register is cleared, so paging is still disabled.
  15. Jump to the startup_32() assembly language function.
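Steps 10 and 14 both come down to single bits, which a short model can illustrate. The sketch below assumes the usual 80x86 conventions (PE is bit 0 of cr0, PG is bit 31, and a masked A20 line forces address bit 20, the 21st bit, to zero). The function names are made up and nothing here touches real hardware.

```python
CR0_PE = 1 << 0    # Protection Enable
CR0_PG = 1 << 31   # Paging
A20_BIT = 1 << 20  # address line A20, i.e. the 21st address bit


def bus_address(addr, a20_enabled):
    """Model the physical address seen on the bus."""
    if a20_enabled:
        return addr
    return addr & ~A20_BIT   # with A20 masked, the 21st bit always reads as 0


def enter_protected_mode(cr0):
    """Set PE and keep PG cleared, as in step 14."""
    return (cr0 | CR0_PE) & ~CR0_PG


# With A20 masked, the first byte above 1 MB wraps around to address 0.
print(hex(bus_address(0x00100000, a20_enabled=False)))
print(hex(bus_address(0x00100000, a20_enabled=True)))
print(hex(enter_protected_mode(0x0)))
```

This wraparound is why the pin has to be set before the kernel, which ends up at 0x00100000, can be addressed reliably.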

Renaissance: startup_32() function

There are two different startup_32() functions; the one we refer to here is implemented in the arch/i386/boot/compressed/head.S file. After setup() terminates, this function has been moved to physical address 0x00100000 or 0x00001000, depending on whether the kernel image was loaded high or low in RAM. This function performs the following operations:

  1. Initialize segment registers and a temporary stack.
  2. Clear all bits in the eflags register.
  3. Fill the kernel's uninitialized data areas identified by the _edata and _end symbols with zeros (see the "Physical Memory Layout" section in Chapter 2).
  4. Invoke the decompress_kernel() function to decompress the kernel image. The "Uncompressing Linux..." message is displayed first. After the kernel image is decompressed, the "OK, booting the kernel." message is shown. If the kernel image was loaded low, the decompressed kernel is placed at physical address 0x00100000. Otherwise, if the kernel image was loaded high, the decompressed kernel is placed in a temporary buffer located after the compressed image and is then moved to its final position, which starts at physical address 0x00100000.
  5. Jump to physical address 0x00100000.
    The decompressed kernel image begins with another startup_32() function, included in the arch/i386/kernel/head.S file. Using the same name for both functions causes no problems (besides confusing readers), because each function is executed by jumping to its starting physical address.

The second startup_32() function establishes the execution environment for the first Linux process (process 0). This function does the following:

  1. Initialize the segment registers with their final values.
  2. Fill the bss segment of the kernel with zeros (see the section "Program Segment and Process Memory Area" in Chapter 20).
  3. Initialize the provisional kernel page tables contained in swapper_pg_dir and pg0 so that linear addresses are identically mapped to the same physical addresses, as explained in the "Kernel Page Table" section of Chapter 2.
  4. Store the address of the page global directory in the cr3 register and enable paging by setting the PG bit in the cr0 register.
  5. Set up the kernel-mode stack for process 0 (see the "Kernel Threads" section in Chapter 3).
  6. Once again, clear all bits of the eflags register.
  7. Invoke setup_idt() to fill the IDT with null interrupt handlers (see the "Preliminary Initialization of IDT" section in Chapter 4).
  8. Put the system parameters obtained from the BIOS and the parameters passed to the operating system into the first page frame (see the "Physical Memory Layout" section in Chapter 2).
  9. Identify the model of the processor.
  10. Load the gdtr and idtr registers with the addresses of the GDT and IDT tables.
  11. Jump to the start_kernel() function.
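Step 3 relies on the way the paging unit splits a linear address. As a rough illustration, assuming the standard two-level 32-bit 80x86 scheme without PAE (10 bits of page directory index, 10 bits of page table index, 12 bits of offset), the split can be sketched as follows. The function name is hypothetical.

```python
PAGE_SHIFT = 12    # 4 KB pages
PGDIR_SHIFT = 22   # each page directory entry covers 4 MB


def split_linear(addr):
    """Split a 32-bit linear address into (directory, table, offset)."""
    directory = (addr >> PGDIR_SHIFT) & 0x3FF
    table = (addr >> PAGE_SHIFT) & 0x3FF
    offset = addr & 0xFFF
    return directory, table, offset


# The decompressed kernel starts at physical address 0x00100000. Under an
# identity mapping, the linear address with the same value selects entry 0
# of the page directory and entry 256 of the page table it points to.
print(split_linear(0x00100000))
```

With an identity mapping in place, the instruction that enables paging and the ones that follow it keep executing from the same numeric addresses.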

Modern: start_kernel() function

The start_kernel() function completes the initialization of the Linux kernel. Nearly every kernel component is initialized by this function; we mention just a few of them:

  1. Call the sched_init() function to initialize the scheduler (see Chapter 7).
  2. Call the build_all_zonelists() function to initialize the memory management area (see the "Memory Management Area" section in Chapter 8).
  3. Call the page_alloc_init() function to initialize the buddy system allocator (see the "Buddy System Algorithm" section in Chapter 8).
  4. Call the trap_init() function (see the "Exception Handling" section in Chapter 4) and the init_IRQ() function (see the "IRQ Data Structure" section in Chapter 4) to complete the IDT initialization.
  5. Call the softirq_init() function to initialize TASKLET_SOFTIRQ and HI_SOFTIRQ (see the "Softirq" section in Chapter 4).
  6. Call the time_init() function to initialize the system date and time (see the "Linux Timing Architecture" section in Chapter 6).
  7. Call the kmem_cache_init() function to initialize the slab allocator (see the "General and Private Cache" section in Chapter 8).
  8. Call the calibrate_delay() function to determine the speed of the CPU clock (see the "Delay Function" section in Chapter 6).
  9. Invoke the kernel_thread() function to create the kernel thread for process 1. As described in the "Kernel Threads" section of Chapter 3, this kernel thread in turn creates the other kernel threads and executes the /sbin/init program.

Besides the "Linux version 2.6.11..." message, which is displayed right after start_kernel() begins, many other messages are displayed in this last phase, both by the init program and by the kernel threads. At the end, the familiar login prompt appears on the console (or in a graphical window, if the X Window System is launched at startup), telling the user that the Linux kernel is up and running.

Modules

Do you want to use modules?

When system programmers want to add new functionality to the Linux kernel, they face a dilemma: should they write the new code so that it is compiled as a module, or should they statically link it into the kernel? As a general rule, system programmers tend to implement new code as a module. Because modules can be linked on demand (as we will see later), the kernel does not have to be bloated with hundreds of seldom-used programs. Nearly every high-level component of the Linux kernel (file systems, device drivers, executable formats, network layers, and so on) can be compiled as a module.

Linux distributions use modules extensively in order to support a wide range of hardware devices. For example, a distribution places dozens of sound card driver modules in some directory, although only one of them is actually loaded on a given computer. However, some Linux code must necessarily be linked statically, which means that the corresponding component is either included in the kernel or not compiled at all. This typically happens when the component requires a modification to some data structure or function that is statically linked in the kernel. For example, suppose the component has to introduce new fields into the process descriptor. Linking a module cannot change an already defined data structure such as task_struct: even if the module uses its own modified version of the data structure, all statically linked code continues to see the old version, which easily leads to data corruption. A partial solution to the problem consists of "statically" adding the new fields to the process descriptor, thus making them available to the kernel component no matter how it has been linked. However, if the kernel component is never used, such extra fields replicated in every process descriptor are a waste of memory. If the new kernel component increases the size of the process descriptor considerably, better system performance might be obtained by adding the required fields to the data structure only when the component is statically linked into the kernel.

As another example, consider a kernel component that has to replace statically linked code. Clearly, such a component cannot be compiled as a module, because the kernel cannot change the machine code already in RAM when it links the module. For instance, it is not possible to link a module that changes the way page frames are allocated, because the buddy system functions are always statically linked into the kernel (Note 1).

The kernel has two key tasks to perform in managing modules. The first task is to make sure that the rest of the kernel can reach the module's global symbols, such as the entry point to its main function. A module must also know the addresses of symbols in the kernel and in other modules. Thus, references between modules are resolved once and for all when a module is linked.

The second task is to record the usage of the module so that the module cannot be unloaded while it is being used by other modules or other parts of the kernel. The system uses a simple reference counter to record the number of references to each module.

Module license

The license of the Linux kernel (GPL, version 2) places few restrictions on how users and companies may use the source code, but it strictly forbids releasing code that derives, wholly or in large part, from Linux code under a non-GPL license. In this way, kernel developers make sure that their code and everything derived from it remains freely usable by all users. Modules, however, pose a threat to this model: someone might release a module for the Linux kernel in binary form only. For example, a manufacturer may distribute the driver for its hardware only as a binary module. Today, this is less common. In theory, modules distributed only in binary form could significantly change the features and behavior of the Linux kernel, turning a Linux-derived kernel into a commercial product. For this reason, the Linux kernel developer community does not accept binary-only modules, and the implementation of Linux modules reflects this. Basically, each module developer should specify the type of license in the module source code by using the MODULE_LICENSE macro.

If the license is not GPL-compatible (or is not specified at all), the module cannot make use of many core functions and data structures of the kernel. Moreover, using a module with a non-GPL license "taints" the kernel, which means that kernel developers no longer take into consideration any bug reported on that kernel.
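The license check can be pictured as a simple string comparison. The sketch below is a user-space model, not kernel code: the set lists license strings that recent kernels treat as GPL-compatible (the exact list in 2.6.11 may be shorter), and the Kernel class with its tainted flag is invented for illustration.

```python
# License strings treated as GPL-compatible by recent kernels (assumption:
# the 2.6.11 list may differ slightly).
GPL_COMPATIBLE = {
    "GPL",
    "GPL v2",
    "GPL and additional rights",
    "Dual BSD/GPL",
    "Dual MIT/GPL",
    "Dual MPL/GPL",
}


class Kernel:
    """Hypothetical stand-in that only tracks the taint flag."""

    def __init__(self):
        self.tainted = False


def check_license(kernel, module_license):
    """Return True if the module may use GPL-only symbols; taint otherwise."""
    gpl_ok = module_license in GPL_COMPATIBLE
    if not gpl_ok:
        kernel.tainted = True   # a missing license (None) also taints
    return gpl_ok


kernel = Kernel()
print(check_license(kernel, "GPL"), kernel.tainted)
print(check_license(kernel, "Proprietary"), kernel.tainted)
```

Once set, the flag stays set in this model, mirroring the idea that a kernel remains tainted even after the offending module is gone.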

Module implementation

Modules are stored in the file system as ELF object files and are linked to the kernel by executing the insmod program (see the "Linking and unlinking modules" section later). For each module, the kernel allocates a memory area containing the following data:

  1. A module object
  2. A null-terminated string that represents the name of the module (all modules must have unique names)
  3. The code that implements the functions of the module

The module object describes a module; its fields are shown in Table B-1. A doubly linked circular list collects all module objects. The list head is stored in the modules variable, while the pointers to the adjacent elements are stored in the list field of each module object.
[Table B-1. The fields of the module object]
The state field encodes the internal state of the module: MODULE_STATE_LIVE (the module is active), MODULE_STATE_COMING (the module is being initialized), or MODULE_STATE_GOING (the module is being removed). As mentioned in the "Dynamic Address Checking: Fixing the Code" section of Chapter 10, each module has its own exception table. The table contains the addresses of the module's fixup code, if any. When the module is linked, the table is copied into RAM and its starting address is stored in the extable field of the module object.

Module usage counter

Each module has a set of usage counters, one for each CPU, stored in the ref field of the corresponding module object. This counter is incremented when the operation involved in the module function begins and decremented at the end of the operation. A module can be unlinked only if the sum of all usage counters reaches 0.

For example, assume that the MS-DOS file system layer is compiled as a module and that the module is linked at run time. Initially, the module's usage counters are all 0. If the user mounts an MS-DOS floppy disk, one of the module's usage counters is increased by 1. Conversely, when the user unmounts the floppy disk, one of the counters (not necessarily the one that was increased) is decreased by 1. The total usage count of the module is the sum of all the per-CPU counters.
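The per-CPU counting scheme is easy to model. In the following sketch the class and method names are made up (only the ref field name comes from the text); it shows that an individual counter may even go negative while the sum stays correct.

```python
class ModuleUsage:
    """Toy model of the per-CPU usage counters kept in the ref field."""

    def __init__(self, nr_cpus):
        self.ref = [0] * nr_cpus   # one counter per CPU

    def get(self, cpu):
        self.ref[cpu] += 1         # an operation on the module starts

    def put(self, cpu):
        self.ref[cpu] -= 1         # the operation ends, possibly on another CPU

    def total(self):
        return sum(self.ref)

    def can_unlink(self):
        return self.total() == 0


msdos = ModuleUsage(nr_cpus=2)
msdos.get(cpu=0)                       # floppy mounted while running on CPU 0
print(msdos.ref, msdos.can_unlink())
msdos.put(cpu=1)                       # floppy unmounted while running on CPU 1
print(msdos.ref, msdos.can_unlink())
```

Keeping one counter per CPU avoids having all CPUs contend for a single shared counter; only the unlink check needs the sum.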

Export symbols

When linking a module, all references to global kernel symbols (variables and functions) in the module's object code must be replaced with suitable addresses. This operation, which is very similar to the one performed by the linker when compiling a user-mode program (see the "Library" section of Chapter 20), is delegated to the insmod external program (described later in the "Linking and unlinking modules" section). The kernel uses some special kernel symbol tables to store the symbols that can be accessed by modules together with their addresses.
They are contained in three sections of the kernel code segment:

  1. __kstrtab section (the names of the symbols)
  2. __ksymtab section (the addresses of the symbols that can be used by all kinds of modules)
  3. __ksymtab_gpl section (the addresses of the symbols that can be used only by modules released under a GPL-compatible license)

When used inside the statically linked kernel code, the EXPORT_SYMBOL and EXPORT_SYMBOL_GPL macros force the C compiler to add a specified symbol to the __ksymtab and __ksymtab_gpl sections, respectively. Only the kernel symbols actually used by some existing module are included in these tables. Should a system programmer need, within some module, to access a kernel symbol that is not already exported, it is enough to add the corresponding EXPORT_SYMBOL_GPL macro to the Linux source code. Of course, a programmer cannot legitimately export a new symbol for a module whose license is not GPL-compatible.

Linked modules can also export their own symbols so that other modules can access them. The module symbol tables are contained in the __ksymtab, __ksymtab_gpl, and __kstrtab sections of the module code segment. To export a subset of symbols from the module, the programmer can use the EXPORT_SYMBOL and EXPORT_SYMBOL_GPL macros described above. When the module is linked, its exported symbols are copied into two memory arrays, whose addresses are stored in the syms and gpl_syms fields of the module object.
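The effect of the two symbol tables on symbol resolution can be sketched with two dictionaries. Everything in this example is hypothetical: the addresses are invented, gpl_only_helper is not a real kernel symbol, and resolve_symbol only models the lookup order.

```python
# Hypothetical table contents; the addresses are made up.
KSYMTAB = {"printk": 0xC011A2B0}
KSYMTAB_GPL = {"gpl_only_helper": 0xC0123450}


def resolve_symbol(name, license_gplok):
    """Return the address of an exported symbol, or None if unavailable."""
    if name in KSYMTAB:
        return KSYMTAB[name]                 # usable by every module
    if license_gplok and name in KSYMTAB_GPL:
        return KSYMTAB_GPL[name]             # only for GPL-compatible modules
    return None                              # unresolved: linking must fail


print(resolve_symbol("printk", license_gplok=False) is not None)
print(resolve_symbol("gpl_only_helper", license_gplok=False))
print(resolve_symbol("gpl_only_helper", license_gplok=True) is not None)
```

A module with a non-GPL license simply never sees the second table, so any reference to a symbol found only there stays unresolved.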

Module dependencies

A module (B) can reference symbols exported by another module (A); in this case, we say that B is loaded on top of A, or that A is used by B. In order to link module B, module A must first be linked; otherwise, references to symbols exported by module A would not be properly linked into B. In short, there is a dependency between these two modules.

The modules_which_use_me field of the module object of A is the head of a dependency list containing all modules that use A. Each element of the list is a small module_use descriptor, which holds the pointers to the adjacent elements in the list and a pointer to the corresponding module object. In our example, a module_use descriptor pointing to the module object of B appears in the modules_which_use_me list of A. The modules_which_use_me list must be updated dynamically whenever a module is loaded on top of A. Module A cannot be unloaded as long as its dependency list is not empty. Of course, beyond A and B, there can be another module (C) loaded on top of B, and so on. Stacking modules is an effective way to modularize the kernel source code and thus to speed up kernel development.
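The dependency list can be mimicked with an ordinary Python list. In this sketch only the modules_which_use_me name comes from the text; the Module class and the use_module, release_module and can_unload helpers are invented for illustration.

```python
class Module:
    def __init__(self, name):
        self.name = name
        self.modules_which_use_me = []   # modules loaded on top of this one


def use_module(user, used):
    """Record that module `user` is loaded on top of module `used`."""
    if user not in used.modules_which_use_me:
        used.modules_which_use_me.append(user)


def release_module(user, used):
    """Remove `user` from the dependency list of `used` (user is unloaded)."""
    if user in used.modules_which_use_me:
        used.modules_which_use_me.remove(user)


def can_unload(module):
    return not module.modules_which_use_me


a, b, c = Module("A"), Module("B"), Module("C")
use_module(b, a)   # B is loaded on top of A
use_module(c, b)   # C is loaded on top of B
print([m.name for m in (a, b, c) if can_unload(m)])
```

Modules stacked this way have to be removed from the top down: C first, then B, and only then A.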

Linking and unlinking modules

Users can link a module into the running kernel by executing the insmod external program. This program does the following:

  1. Read from the command line the name of the module to be linked.
  2. Locate the file containing the module's object code in the system directory tree. The file is usually in a subdirectory of /lib/modules.
  3. Read from disk the file containing the module's object code.
  4. Invoke the init_module() system call, passing to it the address of the user-mode buffer containing the module's object code, the length of the object code, and the address of the user-mode memory area containing the parameters of the insmod program.
  5. Terminate.

The sys_init_module() service routine does all the real work. It performs the following main operations:

  1. Check whether the user is allowed to link the module (the current process must have the CAP_SYS_MODULE capability). Security is crucial whenever functionality is added to the kernel, which has access to all data and processes in the system.
  2. Allocate a temporary memory area for the module's object code, and copy into it the data in the user-mode buffer passed as the first parameter of the system call.
  3. Check that the data in the memory area is a valid ELF object representing a module. If not, return an error code.
  4. Allocate a memory area for the parameters passed to the insmod program and fill it with the data in the user-mode buffer whose address was passed as the third parameter of the system call.
  5. Walk the modules list to verify that the module is not already linked. The check is done by comparing the names of the modules (the name field of the module objects).
  6. Allocate a memory area for the core executable code of the module and fill it with the contents of the relevant sections of the module.
  7. Allocate a memory area for the initialization code of the module and fill it with the contents of the relevant sections of the module.
  8. Determine the address of the module object for the new module. An image of this object is stored in the gnu.linkonce.this_module section of the module's ELF file, so the module object is included in the memory area filled in step 6.
  9. Store the addresses of the memory areas allocated in steps 6 and 7 in the module_core and module_init fields of the module object, respectively.
  10. Initialize the modules_which_use_me list of the module object. Set the usage counter of the executing CPU to 1 and the module's usage counters of all other CPUs to 0.
  11. Set the license_gplok flag of the module object according to the type of license specified in the module.
  12. Relocate the module's object code using the kernel symbol tables and the module symbol tables. This means replacing all occurrences of external and global symbols with the corresponding logical address offsets.
  13. Initialize the syms and gpl_syms fields of the module object so that they point to the in-memory tables of the symbols exported by the module.
  14. The exception table of the module (see the "Exception Table" section in Chapter 10) is contained in the __ex_table section of the module's ELF file, so it was copied into the memory area allocated in step 6; store its address in the extable field of the module object.
  15. Parse the parameters of the insmod program and set the values of the corresponding module variables accordingly.
  16. Register the kobject included in the mkobj field of the module object, so that a new subdirectory for the module appears in the module directory of the sysfs special file system (see the "kobject" section in Chapter 13).
  17. Release the temporary memory area allocated in step 2.
  18. Add the module object to the modules list.
  19. Set the state of the module to MODULE_STATE_COMING.
  20. If the init method of the module object is defined, execute it.
  21. Set the state of the module to MODULE_STATE_LIVE.
  22. Terminate by returning 0 (success).

In order to unlink a module, the user needs to call the rmmod external program, which does the following:

  1. Read from the command line the name of the module to be unlinked.
  2. Open the /proc/modules file, which lists all modules linked into the kernel, and check that the module to be removed is actually linked.
  3. Invoke the delete_module() system call, passing to it the name of the module to be removed.
  4. Terminate.
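Step 2 of rmmod amounts to parsing a small text file. The sketch below assumes the 2.6-style /proc/modules line format (name, size, usage count, comma-terminated list of users or "-", state, load address). The sample text is invented so that the example runs anywhere, and the function names are hypothetical.

```python
# Invented sample in the /proc/modules line format; not taken from a real system.
SAMPLE_PROC_MODULES = """\
msdos 8832 1 - Live 0xe0a5c000
fat 38816 1 msdos, Live 0xe0a4f000
"""


def parse_proc_modules(text):
    """Map each module name to its size, usage count and list of users."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue                      # skip blank or malformed lines
        name, size, refcount, users = fields[:4]
        modules[name] = {
            "size": int(size),
            "refcount": int(refcount),
            "used_by": [] if users == "-" else [u for u in users.split(",") if u],
        }
    return modules


def is_linked(name, text):
    return name in parse_proc_modules(text)


print(is_linked("msdos", SAMPLE_PROC_MODULES))
print(parse_proc_modules(SAMPLE_PROC_MODULES)["fat"]["used_by"])
```

On a running Linux system the same parser could be fed the contents of /proc/modules instead of the sample string.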

The corresponding sys_delete_module() service routine performs the following operations:

  1. Check whether the user is allowed to unlink the module (the current process must have the CAP_SYS_MODULE capability).
  2. Copy the name of the module into a kernel buffer.
  3. Walk the modules list to find the module object of the module.
  4. Check the modules_which_use_me dependency list of the module; if it is not empty, return an error code.
  5. Check the state of the module; if it is not MODULE_STATE_LIVE, return an error code.
  6. If the module has a custom init method, check that it also has a custom exit method; if no exit method is defined, the module must not be unloaded, so return an error code.
  7. To avoid race conditions, stop all CPUs in the system except the one executing the sys_delete_module() service routine.
  8. Set the state of the module to MODULE_STATE_GOING.
  9. If the sum of all the module's usage counters is greater than 0, return an error code.
  10. If the exit method of the module is defined, execute it.
  11. Remove the module object from the modules list and de-register the module from the sysfs special file system.
  12. Remove the module object from the dependency lists of the modules that it was using.
  13. Release the memory areas that contain the module's executable code, the module object, and the related symbol and exception tables.
  14. Return 0 (success).

Link modules as needed

Modules can be linked automatically when the functionality they provide is requested, and they can be removed automatically afterward. For example, assume that the MS-DOS file system has been linked neither statically nor dynamically. If a user tries to mount an MS-DOS file system, the mount() system call would normally fail with an error code, because MS-DOS is not included in the file_systems list of registered file systems. However, if the kernel has been configured to support dynamic linking of modules, Linux tries to link the MS-DOS module and then scans the list of registered file systems again. If the module is successfully linked, the mount() system call can continue as if the MS-DOS file system had been present from the beginning.

modprobe program

In order to link modules automatically, the kernel creates a kernel thread that executes the modprobe external program (Note 2), which takes care of the possible complications due to module dependencies. Module dependencies were introduced earlier: a module may require one or more other modules, and these in turn may require still other modules. For example, the MS-DOS module requires another module named fat, which contains code common to all file systems based on a File Allocation Table (FAT).

Thus, if the fat module is not already present, it must also be linked dynamically into the running kernel when the MS-DOS module is requested. Resolving dependencies and finding modules is best done in user mode, because it requires locating and accessing module object files in the file system. The modprobe external program is similar to insmod in that it links a module specified on the command line. However, modprobe also recursively links all the modules used by the module specified on the command line.

For example, if a user invokes modprobe to link the MS-DOS module, modprobe first links the fat module, if necessary, and then the MS-DOS module. In fact, modprobe only checks module dependencies; the actual linking of each module is done by forking a new process and executing insmod. How does modprobe know about the dependencies among modules? Another external program named depmod is executed at system startup. It looks at all the modules compiled for the running kernel, which are usually stored inside the /lib/modules directory, and writes all module dependencies to a file named modules.dep. The modprobe program can thus compare the information stored in that file with the list of linked modules provided by the /proc/modules file.
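The way a dependency file drives the link order can be sketched in a few lines. The example assumes the usual modules.dep layout, in which each line reads "module file: files it depends on". The sample paths are invented, and parse_modules_dep and link_order are hypothetical helpers rather than parts of modprobe.

```python
# Invented sample in the modules.dep layout ("file: dependencies").
SAMPLE_MODULES_DEP = """\
kernel/fs/fat/fat.ko:
kernel/fs/msdos/msdos.ko: kernel/fs/fat/fat.ko
kernel/fs/vfat/vfat.ko: kernel/fs/fat/fat.ko
"""


def parse_modules_dep(text):
    """Map each module file to the list of files it depends on."""
    deps = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        target, _, rest = line.partition(":")
        deps[target.strip()] = rest.split()
    return deps


def link_order(module, deps, already_linked):
    """Return the files to link, dependencies first, skipping linked ones."""
    order, seen = [], set()

    def visit(name):
        if name in already_linked or name in seen:
            return
        seen.add(name)                    # also guards against cycles
        for dependency in deps.get(name, []):
            visit(dependency)
        order.append(name)                # a module follows what it needs

    visit(module)
    return order


deps = parse_modules_dep(SAMPLE_MODULES_DEP)
print(link_order("kernel/fs/msdos/msdos.ko", deps, already_linked=set()))
```

Passing the set of modules already listed in /proc/modules as already_linked reproduces the "if necessary" part: a fat module that is already present is not linked again.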

request_module() function

In some cases, the kernel may invoke the request_module() function to try to link a module automatically. Consider again the case of a user trying to mount an MS-DOS file system. If the get_fs_type() function discovers that the file system is not registered, it invokes request_module() in the hope that MS-DOS has been compiled as a module. If request_module() succeeds in linking the requested module, get_fs_type() can continue as if the module had always been present. Of course, this does not always happen; in our example, the MS-DOS module might not have been compiled at all. In that case, get_fs_type() returns an error code.
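The "look up, request the module, look up again" pattern can be modeled in user space. In this sketch the two sets and the bodies of the functions are stand-ins invented for illustration; only the names get_fs_type and request_module come from the text, and None plays the role of the error code.

```python
file_systems = {"ext3"}          # registered file systems (toy model)
available_modules = {"msdos"}    # modules that were compiled (hypothetical)


def request_module(name):
    """Pretend to run modprobe; return True if the module was linked."""
    if name in available_modules:
        file_systems.add(name)   # the module's init method registers itself
        return True
    return False


def get_fs_type(name):
    """Return the file system name, or None in place of an error code."""
    if name in file_systems:
        return name
    # Not registered: try to link a module, then look again.
    if request_module(name) and name in file_systems:
        return name
    return None


print(get_fs_type("msdos"))   # linked on demand
print(get_fs_type("ntfs"))    # no such module in this toy setup
```

The caller never learns whether the file system was there from the start or was linked a moment ago, which is the point of linking on demand.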

The request_module() function receives the name of the module to be linked as its parameter. It invokes kernel_thread() to create a new kernel thread and waits until that kernel thread terminates. The kernel thread, in turn, receives the name of the module to be linked as its parameter and invokes the execve() system call to execute the modprobe external program (Note 3), passing the module name to it. The modprobe program then actually links the requested module, along with any modules on which it depends. The name and path of the program executed by exec_modprobe() can be customized by writing to the /proc/sys/kernel/modprobe file.

Origin blog.csdn.net/x13262608581/article/details/132507228