Preface: The Linux kernel is one of the "big three" in Linux driver development. Its startup process is much more complicated than that of uboot and involves far more material. Still, in the spirit of "knowing what it is and knowing why it is", the author gives readers a general overview of the Linux kernel startup process here. (Because hardware platforms and kernel versions differ, the details on your system may vary slightly.)
Experimental hardware: imx6ull; Linux kernel version: 4.1.15
To dig into the startup process of the Linux kernel, start with the kernel's link script, because the entry of a program (the first instruction it executes) is usually specified there. Open the file arch/arm/kernel/vmlinux.lds. This file is generated from arch/arm/kernel/vmlinux.lds.S when the kernel is compiled, so download the Linux source code and build it first. The listing below, which still contains preprocessor directives, shows the beginning of the vmlinux.lds.S source:
/* ld script to make ARM Linux kernel
* taken from the i386 version by Russell King
* Written by Martin Mares <[email protected]>
*/
#include <asm-generic/vmlinux.lds.h>
#include <asm/cache.h>
#include <asm/thread_info.h>
#include <asm/memory.h>
#include <asm/page.h>
#ifdef CONFIG_ARM_KERNMEM_PERMS
#include <asm/pgtable.h>
#endif
#define PROC_INFO \
. = ALIGN(4); \
VMLINUX_SYMBOL(__proc_info_begin) = .; \
*(.proc.info.init) \
VMLINUX_SYMBOL(__proc_info_end) = .;
#define IDMAP_TEXT \
ALIGN_FUNCTION(); \
VMLINUX_SYMBOL(__idmap_text_start) = .; \
*(.idmap.text) \
VMLINUX_SYMBOL(__idmap_text_end) = .; \
. = ALIGN(PAGE_SIZE); \
VMLINUX_SYMBOL(__hyp_idmap_text_start) = .; \
*(.hyp.idmap.text) \
VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
#ifdef CONFIG_HOTPLUG_CPU
#define ARM_CPU_DISCARD(x)
#define ARM_CPU_KEEP(x) x
#else
#define ARM_CPU_DISCARD(x) x
#define ARM_CPU_KEEP(x)
#endif
#if (defined(CONFIG_SMP_ON_UP) && !defined(CONFIG_DEBUG_SPINLOCK)) || \
defined(CONFIG_GENERIC_BUG)
#define ARM_EXIT_KEEP(x) x
#define ARM_EXIT_DISCARD(x)
#else
#define ARM_EXIT_KEEP(x)
#define ARM_EXIT_DISCARD(x) x
#endif
OUTPUT_ARCH(arm)
ENTRY(stext)
#ifndef __ARMEB__
jiffies = jiffies_64;
#else
jiffies = jiffies_64 + 4;
#endif
SECTIONS
{
/*
* XXX: The linker does not define how output sections are
* assigned to input sections when there are multiple statements
* matching the same input section name. There is no documented
* order of matching.
*
* unwind exit sections must be discarded before the rest of the
* unwind sections get included.
*/
/DISCARD/ : {
*(.ARM.exidx.exit.text)
*(.ARM.extab.exit.text)
ARM_CPU_DISCARD(*(.ARM.exidx.cpuexit.text))
ARM_CPU_DISCARD(*(.ARM.extab.cpuexit.text))
ARM_EXIT_DISCARD(EXIT_TEXT)
ARM_EXIT_DISCARD(EXIT_DATA)
EXIT_CALL
#ifndef CONFIG_MMU
*(.text.fixup)
*(__ex_table)
#endif
#ifndef CONFIG_SMP_ON_UP
*(.alt.smp.init)
#endif
*(.discard)
*(.discard.*)
}
#ifdef CONFIG_XIP_KERNEL
. = XIP_VIRT_ADDR(CONFIG_XIP_PHYS_ADDR);
#else
. = PAGE_OFFSET + TEXT_OFFSET;
#endif
.head.text : {
_text = .;
HEAD_TEXT
}
#ifdef CONFIG_ARM_KERNMEM_PERMS
. = ALIGN(1<<SECTION_SHIFT);
#endif
/* ...... remainder omitted ...... */
You can find ENTRY(stext) on line 49 of the link script. ENTRY names the entry point of the Linux kernel, which is stext, and stext is defined in the file arch/arm/kernel/head.S. To analyze the kernel startup process, therefore, begin with stext in arch/arm/kernel/head.S.
1. The overall startup process of the Linux kernel
★The author divides the overall startup process of the Linux kernel into 5 parts:
2. Linux kernel startup process
2.1 Linux kernel entry stext
stext is the entry address of the Linux kernel. In the file arch/arm/kernel/head.S it is preceded by the following comment:
/*
* Kernel startup entry point.
* ---------------------------
*
* This is normally called from the decompressor code. The requirements
* are: MMU = off, D-cache = off, I-cache = dont care, r0 = 0,
* r1 = machine nr, r2 = atags or dtb pointer.
*
* This code is mostly position independent, so if you link the kernel at
* 0xc0008000, you call this at __pa(0xc0008000).
*
* See linux/arch/arm/tools/mach-types for the complete list of machine
* numbers for r1.
*
* We're trying to keep crap to a minimum; DO NOT add any machine specific
* crap here - that's what the boot loader (or in extreme, well justified
* circumstances, zImage) is for.
*/
According to this comment, the following requirements must be met before the Linux kernel starts:
① The MMU is off.
② The D-cache is off.
③ The I-cache may be either on or off.
④ r0 = 0.
⑤ r1 = machine nr (that is, the machine ID).
⑥ r2 = the address of the ATAGS list or of the device tree blob (dtb).
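The comment's remark that a kernel linked at 0xc0008000 is called at __pa(0xc0008000) can be sketched in a few lines of C. This is only an illustration under stated assumptions: the three constants below (a 3G/1G split, a 0x8000 text offset, and DRAM starting at 0x80000000) are assumed values, and kernel_link_address and virt_to_phys_sketch are hypothetical helper names, not kernel functions.

```c
#include <stdint.h>

/* Assumed values for illustration only; check your own .config and board. */
#define PAGE_OFFSET 0xC0000000u /* assumed start of the kernel virtual space */
#define TEXT_OFFSET 0x00008000u /* assumed offset of the image within RAM */
#define PHYS_OFFSET 0x80000000u /* assumed physical base address of DRAM */

/* Virtual address set by ". = PAGE_OFFSET + TEXT_OFFSET" in the link script. */
uint32_t kernel_link_address(void)
{
    return PAGE_OFFSET + TEXT_OFFSET;
}

/* Simplified linear translation, in the spirit of the kernel's __pa(). */
uint32_t virt_to_phys_sketch(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET + PHYS_OFFSET;
}
```

With these assumed constants, the kernel is linked at virtual address 0xC0008000 and the boot loader would jump to physical address 0x80008000, because the MMU is still off at that point.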
The entry point stext of the Linux kernel is actually equivalent to the entry function of the kernel. The content of the stext function is as follows:
ENTRY(stext)
ARM_BE8(setend be ) @ ensure we are in BE8 mode
THUMB( adr r9, BSYM(1f) ) @ Kernel is always entered in ARM.
THUMB( bx r9 ) @ If this is a Thumb-2 kernel,
THUMB( .thumb ) @ switch to Thumb now.
THUMB(1: )
#ifdef CONFIG_ARM_VIRT_EXT
bl __hyp_stub_install
#endif
@ ensure svc mode and all interrupts masked
safe_svcmode_maskall r9
mrc p15, 0, r9, c0, c0 @ get processor id
bl __lookup_processor_type @ r5=procinfo r9=cpuid
movs r10, r5 @ invalid processor (r5=0)?
THUMB( it eq ) @ force fixup-able long branch encoding
beq __error_p @ yes, error 'p'
#ifdef CONFIG_ARM_LPAE
mrc p15, 0, r3, c0, c1, 4 @ read ID_MMFR0
and r3, r3, #0xf @ extract VMSA support
cmp r3, #5 @ long-descriptor translation table format?
THUMB( it lo ) @ force fixup-able long branch encoding
blo __error_lpae @ only classic page table format
#endif
#ifndef CONFIG_XIP_KERNEL
adr r3, 2f
ldmia r3, {r4, r8}
sub r4, r3, r4 @ (PHYS_OFFSET - PAGE_OFFSET)
add r8, r8, r4 @ PHYS_OFFSET
#else
ldr r8, =PLAT_PHYS_OFFSET @ always constant in this case
#endif
/*
* r1 = machine no, r2 = atags or dtb,
* r8 = phys_offset, r9 = cpuid, r10 = procinfo
*/
bl __vet_atags
#ifdef CONFIG_SMP_ON_UP
bl __fixup_smp
#endif
#ifdef CONFIG_ARM_PATCH_PHYS_VIRT
bl __fixup_pv_table
#endif
bl __create_page_tables
/*
* The following calls CPU specific code in a position independent
* manner. See arch/arm/mm/proc-*.S for details. r10 = base of
* xxx_proc_info structure selected by __lookup_processor_type
* above. On return, the CPU will be ready for the MMU to be
* turned on, and r0 will hold the CPU control register value.
*/
ldr r13, =__mmap_switched @ address to jump to after
@ mmu has been enabled
adr lr, BSYM(1f) @ return (PIC) address
mov r8, r4 @ set TTBR1 to swapper_pg_dir
ldr r12, [r10, #PROCINFO_INITFUNC]
add r12, r12, r10
ret r12
1: b __enable_mmu
ENDPROC(stext)
The stext function performs the following steps:
① Invoke safe_svcmode_maskall to ensure that the CPU is in SVC mode and that all interrupts are masked. safe_svcmode_maskall is a macro defined in the file arch/arm/include/asm/assembler.h.
② Read the processor ID and store it in the r9 register.
③ Call __lookup_processor_type to check whether the kernel supports this CPU and, if so, obtain its procinfo. procinfo is a structure of type proc_info_list, which is defined in the file arch/arm/include/asm/procinfo.h. If no match is found (r5 = 0), stext branches to __error_p.
④ Call __create_page_tables to create the initial page tables.
⑤ Save the address of __mmap_switched in the r13 register. __mmap_switched is defined in the file arch/arm/kernel/head-common.S and eventually calls the start_kernel function.
⑥ Call __enable_mmu to enable the MMU. __enable_mmu is defined in the file arch/arm/kernel/head.S. It turns on the MMU by calling __turn_mmu_on, which finally jumps to the __mmap_switched address saved in r13.
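The idea behind the processor lookup in step ③ can be sketched in C as a masked comparison against a table of known CPUs. This is a simplified illustration, not the kernel's assembly: struct proc_info_sketch is a cut-down stand-in for proc_info_list (only the cpu_val and cpu_mask fields are modelled), and the table entries are illustrative values rather than copies of the kernel's tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for struct proc_info_list (the real one has more fields). */
struct proc_info_sketch {
    uint32_t cpu_val;  /* expected ID bits after masking */
    uint32_t cpu_mask; /* which bits of the processor ID are compared */
    const char *name;  /* label used only by this sketch */
};

/* Hypothetical table; the values are illustrative assumptions. */
static const struct proc_info_sketch proc_table[] = {
    { 0x410fc090u, 0xff0ffff0u, "Cortex-A9 (illustrative)" },
    { 0x410fc070u, 0xff0ffff0u, "Cortex-A7 (illustrative)" },
};

/* Return the first entry whose masked ID matches, or NULL if none does. */
const struct proc_info_sketch *lookup_processor_type_sketch(uint32_t cpuid)
{
    size_t n = sizeof(proc_table) / sizeof(proc_table[0]);

    for (size_t i = 0; i < n; i++) {
        if ((cpuid & proc_table[i].cpu_mask) == proc_table[i].cpu_val)
            return &proc_table[i];
    }
    return NULL; /* unsupported CPU: stext would branch to __error_p */
}
```

The mask lets one table entry cover a whole family of revisions: an ID such as 0x410fc075 matches the second entry because the low revision bits are ignored.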
2.2 __mmap_switched function
The __mmap_switched function is defined in the file arch/arm/kernel/head-common.S , and the function code is as follows:
/*
* The following fragment of code is executed with the MMU on in MMU mode,
* and uses absolute addresses; this is not position independent.
*
* r0 = cp#15 control register
* r1 = machine ID
* r2 = atags/dtb pointer
* r9 = processor ID
*/
__INIT
__mmap_switched:
adr r3, __mmap_switched_data
ldmia r3!, {r4, r5, r6, r7}
cmp r4, r5 @ Copy data segment if needed
1: cmpne r5, r6
ldrne fp, [r4], #4
strne fp, [r5], #4
bne 1b
mov fp, #0 @ Clear BSS (and zero fp)
1: cmp r6, r7
strcc fp, [r6],#4
bcc 1b
ARM( ldmia r3, {r4, r5, r6, r7, sp})
THUMB( ldmia r3, {r4, r5, r6, r7} )
THUMB( ldr sp, [r3, #16] )
str r9, [r4] @ Save processor ID
str r1, [r5] @ Save machine type
str r2, [r6] @ Save atags pointer
cmp r7, #0
strne r0, [r7] @ Save control register values
b start_kernel
ENDPROC(__mmap_switched)
After copying the data segment if needed, clearing the BSS, and saving the processor ID, machine type, and atags/dtb pointer, __mmap_switched branches to start_kernel. The start_kernel function is defined in the file init/main.c.
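The two loops at the start of __mmap_switched translate naturally into C. The sketch below is an illustration only: copy_data_sketch and clear_bss_sketch are hypothetical names, and the pointer parameters stand in for the addresses that the assembly loads into r4 to r7 from __mmap_switched_data.

```c
#include <stdint.h>

/*
 * Copy the data segment word by word from its load address to its run
 * address. Nothing is copied when the two addresses are identical,
 * which mirrors the initial "cmp r4, r5" in the assembly.
 */
void copy_data_sketch(const uint32_t *load, uint32_t *start, const uint32_t *end)
{
    if (load == start)
        return;
    while (start != end)
        *start++ = *load++;
}

/* Zero every word in [start, end), like the "Clear BSS" loop. */
void clear_bss_sketch(uint32_t *start, const uint32_t *end)
{
    while (start < end)
        *start++ = 0;
}
```

Clearing the BSS here is what lets C code in start_kernel rely on uninitialized global variables being zero.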
2.3 start_kernel function
start_kernel completes the initialization work that must happen before Linux is up and running by calling many sub-functions. Since there are too many of them, and most are very complicated, we only look briefly at the important ones. The simplified and annotated start_kernel function is as follows:
asmlinkage __visible void __init start_kernel(void)
{
char *command_line;
char *after_dashes;
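lockdep_init(); /* lockdep is the deadlock-detection module; this call
* initializes two hash tables and must run as early as possible.
*/
set_task_stack_end_magic(&init_task);/* Set the magic number at the end of the
* task stack, used to detect stack overflow
*/
smp_setup_processor_id(); /* SMP-related (multi-core): set the processor ID.
* Many sources say this is an empty function on ARM,
* but they describe old Linux versions from before
* ARM had multi-core processors.
*/
debug_objects_early_init(); /* Early debug-related initialization */
boot_init_stack_canary(); /* Initialize the stack canary used for stack-overflow detection */
cgroup_init_early(); /* Early cgroup initialization; cgroups control system resources */
local_irq_disable(); /* Disable interrupts on the current CPU */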
early_boot_irqs_disabled = true;
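/*
* Do some important work while interrupts are disabled, then enable them.
*/
boot_cpu_init(); /* CPU-related initialization */
page_address_init(); /* Page-address-related initialization */
pr_notice("%s", linux_banner);/* Print the Linux version, build time, and so on */
setup_arch(&command_line); /* Architecture-specific initialization. Parses the
* ATAGS or device tree (DTB) passed in, and uses the
* model and compatible properties in the device tree
* to check whether Linux supports this board. It also
* reads the bootargs property under the chosen node to
* obtain the command line, that is, the value of the
* bootargs environment variable in uboot, and stores
* it in command_line.
*/
mm_init_cpumask(&init_mm); /* Initialize the CPU mask of init_mm */
setup_command_line(command_line); /* Save copies of the command line */
setup_nr_cpu_ids(); /* On SMP (multi-core) systems, determine the number of
* CPU cores and store it in the variable
* nr_cpu_ids.
*/
setup_per_cpu_areas(); /* Used on SMP systems: set up each CPU's per-cpu data */
smp_prepare_boot_cpu();
build_all_zonelists(NULL, NULL); /* Build the lists of memory zones */
page_alloc_init(); /* Handle pages for hot-plugged CPUs */
/* Print the command line */
pr_notice("Kernel command line: %s\n", boot_command_line);
parse_early_param(); /* Parse early command-line parameters such as console */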
after_dashes = parse_args("Booting kernel",
static_command_line, __start___param,
__stop___param - __start___param,
-1, -1, &unknown_bootoption);
if (!IS_ERR_OR_NULL(after_dashes))
parse_args("Setting init args", after_dashes, NULL, 0, -1, -1,
set_init_arg);
jump_label_init();
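setup_log_buf(0); /* Set up the buffer used by the kernel log */
pidhash_init(); /* Build the PID hash table. Every Linux process has an
* ID called the PID; the hash table allows fast lookup
* of the per-process information structure.
*/
vfs_caches_init_early(); /* Early initialization of the VFS (virtual file
* system) dentry and inode caches
*/
sort_main_extable(); /* Sort the kernel exception table */
trap_init(); /* Initialize the exception vectors reserved by the system */
mm_init(); /* Memory-management initialization */
sched_init(); /* Initialize the scheduler, mainly its data structures */
preempt_disable(); /* Disable preemption */
if (WARN(!irqs_disabled(), /* Check that interrupts are off; if not, turn them off */
"Interrupts were enabled *very* early, fixing it\n"))
local_irq_disable();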
idr_init_cache(); /* IDR initialization. IDR is the kernel's integer-ID
* management mechanism: it maps an integer ID to a pointer.
*/
rcu_init(); /* Initialize RCU (Read-Copy-Update) */
trace_init(); /* Tracing and debugging initialization */
context_tracking_init();
radix_tree_init(); /* Initialize radix-tree data structures */
early_irq_init(); /* Early interrupt initialization, mainly setting up the
* irq_desc structures that Linux uses to describe an interrupt.
*/
init_IRQ(); /* Interrupt initialization */
tick_init(); /* Tick initialization */
rcu_init_nohz();
init_timers(); /* Initialize timers */
hrtimers_init(); /* Initialize high-resolution timers */
softirq_init(); /* Softirq initialization */
timekeeping_init();
time_init(); /* Initialize the system time */
sched_clock_postinit();
perf_event_init();
profile_init();
call_function_init();
WARN(!irqs_disabled(), "Interrupts were enabled early\n");
early_boot_irqs_disabled = false;
local_irq_enable(); /* Enable interrupts */
kmem_cache_init_late(); /* Late slab initialization; slab is a kernel memory allocator */
console_init(); /* Initialize the console. Everything printk has output
* so far sits in a buffer and has not been printed; only
* after this call can messages appear on the console.
*/
if (panic_later)
panic("Too many boot %s vars at `%s'", panic_later,
panic_param);
lockdep_info();/* Prints some information if CONFIG_LOCKDEP is defined */
locking_selftest(); /* Locking self-test */
/* ...... */
page_ext_init();
debug_objects_mem_init();
kmemleak_init(); /* Initialize kmemleak, which is used to detect memory leaks */
setup_per_cpu_pageset();
numa_policy_init();
if (late_time_init)
late_time_init();
sched_clock_init();
calibrate_delay(); /* Measure the BogoMIPS value, a rough indicator of CPU
* speed: a larger value generally means a faster CPU.
*/
pidmap_init(); /* Initialize the PID bitmap */
anon_vma_init(); /* Create the anon_vma slab cache */
acpi_early_init();
/* ...... */
thread_info_cache_init();
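cred_init(); /* Initialize the credentials cache */
fork_init(); /* Initialize the structures needed by fork */
proc_caches_init(); /* Allocate caches for various resource-management structures */
buffer_init(); /* Initialize the buffer cache */
key_init(); /* Initialize key management */
security_init(); /* Security-related initialization */
dbg_late_init();
vfs_caches_init(totalram_pages); /* Create the VFS caches */
signals_init(); /* Initialize signals */
page_writeback_init(); /* Page-writeback initialization */
proc_root_init(); /* Register and mount the proc file system */
nsfs_init();
cpuset_init(); /* Initialize cpuset, a mechanism that groups CPU and memory
* resources logically and hierarchically; it is one of the
* subsystems used by cgroup
*/
cgroup_init(); /* Initialize cgroup */
taskstats_init_early(); /* Early task-statistics initialization */
delayacct_init();
check_bugs(); /* Check write-buffer coherency */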
acpi_subsystem_init();
sfi_init_late();
if (efi_enabled(EFI_RUNTIME_SERVICES)) {
efi_late_init();
efi_free_boot_services();
}
ftrace_init();
rest_init(); /* Finally, call the rest_init function */
}
start_kernel calls a large number of functions, and each one is a substantial topic in its own right. Anyone who wants to learn the Linux kernel in depth needs to study them in detail. This post is only an overview of the kernel startup process, so it does not delve into the implementation of each function; interested readers can explore them on their own.
The author here briefly summarizes what the start_kernel function does:
(1) Initialization related to kernel architecture and general configuration
(2) Initialization related to the interrupt vector table
(3) Initialization related to memory management
(4) Initialization related to process management
(5) Initialization related to process scheduling
(6) Initialization related to network subsystem management
(7) Virtual file system initialization
(8) File system initialization, etc.
Finally, start_kernel calls rest_init().
2.4 rest_init function
The rest_init function is defined in the file init/main.c , and the content of the function is as follows:
static noinline void __init_refok rest_init(void)
{
int pid;
rcu_scheduler_starting();
smpboot_thread_init();
/*
* We need to spawn init first so that it obtains pid 1, however
* the init task will end up wanting to create kthreads, which, if
* we schedule it before we create kthreadd, will OOPS.
*/
kernel_thread(kernel_init, NULL, CLONE_FS);
numa_default_policy();
pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
rcu_read_lock();
kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
rcu_read_unlock();
complete(&kthreadd_done);
/*
* The boot idle thread must execute schedule()
* at least once to get things moving:
*/
init_idle_bootup_task(current);
schedule_preempt_disabled();
/* Call into cpu_idle with preempt disabled */
cpu_startup_entry(CPUHP_ONLINE);
}
The rest_init function performs the following steps:
① Call rcu_scheduler_starting to tell the RCU subsystem that the scheduler is starting up.
② Call kernel_thread to create the kernel_init process, which is the famous init process. Its PID is 1. The init process begins life as a kernel process (that is, it runs in kernel mode) and then looks for a program named "init" in the root file system. That "init" program is a user-mode program, and by running it the init process makes the transition from kernel mode to user mode.
③ Call kernel_thread to create the kthreadd kernel process, whose PID is 2. kthreadd is responsible for creating and managing all other kernel threads.
④ Finally, call cpu_startup_entry to enter the idle process. cpu_startup_entry calls cpu_idle_loop, which is a while loop containing the idle process code. The PID of the idle process is 0.
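The comment in rest_init says that init must be spawned first so that it obtains PID 1. A toy model makes the ordering argument concrete. This is a deliberately simplified sketch, not the kernel's PID allocator: struct pid_alloc_sketch and spawn_sketch are hypothetical names, and the model only assumes that PIDs are handed out in increasing order starting after the boot task's PID 0.

```c
/*
 * Toy PID allocator. The boot (idle) task already owns PID 0, so the
 * next PIDs handed out are 1, 2, ... in order of creation.
 */
struct pid_alloc_sketch {
    int last_pid; /* most recently assigned PID */
};

/* Model of one kernel_thread() call: return the next free PID. */
int spawn_sketch(struct pid_alloc_sketch *alloc)
{
    return ++alloc->last_pid;
}
```

Starting from last_pid = 0, the first call models kernel_thread(kernel_init, ...) and returns 1, and the second models kernel_thread(kthreadd, ...) and returns 2, which matches the PIDs listed above.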
Enter "ps -A" in the Linux terminal to print out all the processes in the current system, in which you can see the init process and the kthreadd process, as shown in the following figure:
Let's focus on the init process next. kernel_init is the function that the init process executes.
2.5 init process
The kernel_init function does the actual work of the init process. It is defined in the file init/main.c, and its content is as follows:
static int __ref kernel_init(void *unused)
{
int ret;
kernel_init_freeable();
/* need to finish all async __init code before freeing the memory */
async_synchronize_full();
free_initmem();
mark_rodata_ro();
system_state = SYSTEM_RUNNING;
numa_default_policy();
flush_delayed_fput();
if (ramdisk_execute_command) {
ret = run_init_process(ramdisk_execute_command);
if (!ret)
return 0;
pr_err("Failed to execute %s (error %d)\n",
ramdisk_execute_command, ret);
}
/*
* We try each of these until one succeeds.
*
* The Bourne shell can be used instead of init if we are
* trying to recover a really broken machine.
*/
if (execute_command) {
ret = run_init_process(execute_command);
if (!ret)
return 0;
panic("Requested init %s failed (error %d).",
execute_command, ret);
}
if (!try_to_run_init_process("/sbin/init") ||
!try_to_run_init_process("/etc/init") ||
!try_to_run_init_process("/bin/init") ||
!try_to_run_init_process("/bin/sh"))
return 0;
panic("No working init found. Try passing init= option to kernel. "
"See Linux Documentation/init.txt for guidance.");
}
The kernel_init function performs the following steps:
① Call kernel_init_freeable to complete the remaining initialization work of the init process.
② ramdisk_execute_command is a global char pointer whose default value is "/init", that is, the init program in the root directory. It can also be set from uboot: put "rdinit=xxx" in bootargs, where xxx is the name of the specific init program.
③ If ramdisk_execute_command is set, run that program through the function run_init_process. If it fails to run, an error is printed and the search continues.
④ Next, check whether execute_command is set. One way or another, a runnable init program must be found in the root file system. The value of execute_command is passed from uboot with "init=xxxx" in bootargs; for example, "init=/linuxrc" means that linuxrc in the root file system is the user-space init program to execute. If the requested program fails to run, the kernel panics.
⑤ If neither ramdisk_execute_command nor execute_command yields a running program, try "/sbin/init", "/etc/init", "/bin/init" and "/bin/sh" in turn. These four act as fallback init programs.
If none of these steps finds a user-space init program, the kernel panics with the message "No working init found" and Linux fails to start.
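The selection order in kernel_init can be condensed into one small C function. This is a sketch under stated assumptions rather than kernel code: choose_init_sketch, can_exec_fn and only_bin_sh_exists are hypothetical names, and the predicate merely stands in for "run_init_process succeeded for this path".

```c
#include <stddef.h>
#include <string.h>

/* Caller-supplied predicate standing in for "this path can be executed". */
typedef int (*can_exec_fn)(const char *path);

/*
 * Return the path that would be run, or NULL where the kernel would panic.
 * rdinit_cmd models ramdisk_execute_command; init_cmd models execute_command.
 */
const char *choose_init_sketch(const char *rdinit_cmd, const char *init_cmd,
                               can_exec_fn can_exec)
{
    static const char *const fallbacks[] = {
        "/sbin/init", "/etc/init", "/bin/init", "/bin/sh",
    };

    /* A failing rdinit= program only logs an error; the search continues. */
    if (rdinit_cmd && can_exec(rdinit_cmd))
        return rdinit_cmd;

    /* An explicit init= request is final: failure means panic. */
    if (init_cmd)
        return can_exec(init_cmd) ? init_cmd : NULL;

    for (size_t i = 0; i < sizeof(fallbacks) / sizeof(fallbacks[0]); i++) {
        if (can_exec(fallbacks[i]))
            return fallbacks[i];
    }
    return NULL; /* "No working init found" */
}

/* Example predicate for a hypothetical rootfs that contains only /bin/sh. */
int only_bin_sh_exists(const char *path)
{
    return strcmp(path, "/bin/sh") == 0;
}
```

With the example predicate, choose_init_sketch(NULL, NULL, only_bin_sh_exists) falls through to "/bin/sh", while passing "/linuxrc" as init_cmd yields NULL, because a requested init= program that cannot run is treated as fatal instead of falling back.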
3. Summary of Linux kernel startup process
Summary of the boot process of the Linux kernel:
The language of the Linux kernel: mainly C and assembly language, supplemented by other languages.
A simplified version of the boot process for the Linux kernel:
1. Put the CPU into the privileged SVC mode and mask interrupts.
2. Create the page tables, enable the memory management unit (MMU), and jump to the __mmap_switched function.
3. Initialize architecture-specific setup, interrupts, memory management, scheduling and the other major kernel subsystems through the start_kernel function.
4. In rest_init, notify RCU that the scheduler is starting, create the init process and the kthreadd kernel process with kernel_thread, and enter the idle process through cpu_startup_entry.
5. In the init process, complete the remaining initialization with kernel_init_freeable, then run the user-space init program through run_init_process.