System calls and trapping from user mode into kernel mode

We all know that many processes run in an operating system at the same time. If ordinary processes could operate the hardware directly, the security of the system could not be guaranteed. The CPU therefore runs code in one of two modes. In one mode, the running code can operate the hardware. In the other, it cannot, and must switch to the first mode to have the operation performed and then switch back. These are kernel mode and user mode.

In user mode, a process can only access its own virtual address space and a limited set of resources; it cannot directly access hardware devices or the address spaces of other processes. In kernel mode, code can access all resources of the system, including hardware devices and the address spaces of other processes. Kernel mode therefore has higher privileges than user mode. When a process needs a system resource, for example to read or write a file or to create a new process, it must switch from user mode to kernel mode through a system call. Before the system call, the process runs in user mode; during the system call, it runs in kernel mode; after the system call completes, it switches back to user mode. This switching between user mode and kernel mode is an important means by which the operating system isolates and protects resources between processes.
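As a small illustration of the round trip described above, the sketch below sends a few bytes through a pipe. It assumes a standard CPython `os` module; `pipe_round_trip` is a name made up for this example. Each `os` call is a thin wrapper around a system call, so each one enters kernel mode and returns to user mode.

```python
import os


def pipe_round_trip(payload: bytes) -> bytes:
    """Send a small payload through a kernel pipe and read it back.

    Keep the payload small (well under the pipe buffer size), otherwise
    the write would block because nobody is reading concurrently.
    """
    read_fd, write_fd = os.pipe()      # kernel creates the pipe buffer
    try:
        os.write(write_fd, payload)    # data is copied from user space into the kernel
        return os.read(read_fd, len(payload))  # and copied back out again
    finally:
        os.close(read_fd)              # closing a descriptor is a system call too
        os.close(write_fd)
```

The process never touches the pipe buffer directly; it only asks the kernel to act on its behalf.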

1. Privilege level bits

To protect the operating system and application programs, the CPU defines several privilege levels and restricts what code running at each level may do. The x86 architecture, for example, defines four levels (rings 0 to 3); most operating systems use only ring 0 for kernel mode and ring 3 for user mode. Some instructions are privileged and may only be executed at the most privileged level. When a program runs, the CPU checks whether the current privilege level is sufficient for the instruction being executed. If it is not, the CPU raises an exception (on x86, a general protection fault), which is delivered to the operating system kernel. The kernel then decides whether to perform a restricted operation on behalf of the program or to terminate it.

The CPU uses the privilege level bits to distinguish whether the current program is running in user mode or kernel mode. Only code running in kernel mode has sufficient privilege to execute privileged instructions.

The privilege level bits are stored in one of the CPU's control or status registers. Many architectures keep them in a Processor Status Word (PSW) or a similar register, either as a single flag bit or as a multi-bit field. On x86, the current privilege level is not a flag in EFLAGS; it is held in the low two bits of the CS segment register.

On x86, the privilege level (PL) is a two-bit value, but operating systems usually use only two of its four possible values: PL0 is the highest privilege level, kernel mode, and PL3 is the lowest, user mode. When the CPU executes an instruction, it compares the privilege level that the instruction requires with the current privilege level. If the instruction requires more privilege than the current level provides, an exception is generated, which the operating system kernel can catch and handle.
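On x86 the current privilege level (CPL) is held in the low two bits of the CS register, so reading it is a single mask operation. The sketch below shows this under some assumptions: the function names are made up for this example, and the two selector values are only illustrative (they are the kind of values commonly reported for x86-64 Linux, but any selector works).

```python
# Illustrative selector values (assumed for this example, not taken from the article).
EXAMPLE_KERNEL_CS = 0x10
EXAMPLE_USER_CS = 0x33


def current_privilege_level(cs_selector: int) -> int:
    """Return the CPL encoded in the low two bits of a CS selector."""
    return cs_selector & 0b11


def is_kernel_mode(cs_selector: int) -> bool:
    """PL0 means kernel mode; PL3 is user mode."""
    return current_privilege_level(cs_selector) == 0
```

With these values, `is_kernel_mode(EXAMPLE_KERNEL_CS)` is true and `current_privilege_level(EXAMPLE_USER_CS)` is 3.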

It should be noted that the privilege level bits of different CPU architectures may be stored in different registers, and the specific implementation methods are also different. Therefore, when writing the operating system kernel or low-level drivers, special attention needs to be paid to the storage and use of privilege level bits to ensure the correctness and portability of the program.

2. Trapping from user mode into kernel mode

When a program in user mode wants to perform a privileged operation, it must switch to kernel mode to do so. Entering kernel mode is itself a controlled operation: a user program cannot switch to kernel mode arbitrarily. It can enter kernel mode only through a system call, or when an interrupt or exception occurs. Of these three, the system call is the only way to enter kernel mode that the user program initiates deliberately.

  1. System call: A user program can issue a request to the operating system kernel through a system call, asking the kernel to perform a privileged operation such as creating a process, opening a file, or reading and writing a device. While the system call executes, the program runs in kernel mode so that the kernel can service the request.
  2. Exception: When an exception occurs while the processor is executing an instruction (such as a page fault or an illegal opcode), the processor automatically switches to kernel mode and hands control to the operating system kernel. The kernel handles the exception according to its type and associated information, for example by allocating a physical page or terminating the program.
  3. Interrupt: When an external device raises an interrupt, the processor automatically switches to kernel mode and hands control to the operating system kernel. The kernel handles the interrupt according to its type and associated information, for example by reading data from an input device or sending output data.

3. System call

A system call is an interface that the operating system provides to user programs. Through system calls, user programs can request the operating system kernel to perform privileged operations such as creating processes, opening files, and reading and writing devices. The execution of a system call can be summarized in the following steps:

  1. The user program calls the system function (the library wrapper for the system call) and passes it the parameters of the system call.
  2. The system function stores the parameters in registers and stores the system call number (syscall number) in the eax register.
  3. The system function executes the interrupt instruction int 0x80, which makes the CPU switch from user mode to kernel mode and transfers control to the operating system kernel.
  4. The kernel performs the corresponding privileged operation according to the system call number and parameters, such as creating a process, opening a file, or reading and writing a device.
  5. The kernel returns the result of the operation to the user program, switches back to user mode, and the user program continues executing.
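The five steps above can be sketched as a small software model. This is only a simulation under stated assumptions, not how a kernel is implemented: the register set is a dictionary, `int_0x80`, `OUTPUT` and the handler functions are names invented for this example, the fixed PID is made up, and the syscall numbers 4 (write) and 20 (getpid) follow the 32-bit x86 Linux convention.

```python
from typing import Callable, Dict

ENOSYS = 38  # Linux errno value for "function not implemented"

# Stand-in for devices: bytes "written" per file descriptor (model only).
OUTPUT: Dict[int, bytearray] = {}


def sys_write(fd: int, buf: bytes, count: int) -> int:
    """Kernel-side handler: record the bytes and return how many were written."""
    data = buf[:count]
    OUTPUT.setdefault(fd, bytearray()).extend(data)
    return len(data)


def sys_getpid(_b: int, _c: int, _d: int) -> int:
    """Kernel-side handler returning a hypothetical fixed PID."""
    return 4242


# Step 4: the kernel picks a handler by system call number.
SYSCALL_TABLE: Dict[int, Callable] = {4: sys_write, 20: sys_getpid}


def int_0x80(regs: dict) -> dict:
    """Model of the trap: enter kernel mode, dispatch on eax, return in eax."""
    regs["mode"] = "kernel"                      # step 3: user -> kernel
    handler = SYSCALL_TABLE.get(regs["eax"])
    if handler is None:
        regs["eax"] = -ENOSYS                    # unknown syscall number
    else:
        regs["eax"] = handler(regs["ebx"], regs["ecx"], regs["edx"])
    regs["mode"] = "user"                        # step 5: kernel -> user
    return regs


def write(fd: int, buf: bytes) -> int:
    """User-side wrapper (steps 1 and 2): load registers, then trap."""
    regs = {"eax": 4, "ebx": fd, "ecx": buf, "edx": len(buf), "mode": "user"}
    return int_0x80(regs)["eax"]
```

In this model `write(1, b"hi")` returns 2, and an unknown number in eax comes back as a negative error code in the same register, which mirrors the convention of returning results through eax.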

The IDT (Interrupt Descriptor Table) is one of the data structures that the operating system kernel uses to manage and respond to events such as interrupts, exceptions, and system calls.

On an x86-based computer, the IDT is an array of up to 256 entries, each of which contains the address of a handler and associated flags for responding to a specific interrupt or exception. The first 32 entries (vectors 0 to 31) are reserved for the exceptions and interrupts predefined by the CPU. The remaining entries (vectors 32 to 255) are available to the operating system for hardware interrupts and for software interrupts such as system calls.

**When an interrupt or exception occurs, the CPU looks up the corresponding entry in the IDT and jumps to the handler whose address is stored there.** If the handler executes successfully, the interrupt or exception is processed and execution returns; otherwise, the handler may raise a further exception, generate an error message, or cause the system to crash.

The IDT entry for vector 0x80 has its DPL set to 3, so the int 0x80 instruction can be executed directly from user mode (the CPL in user mode is 3, which is less than or equal to the entry's DPL), and the CPU enters kernel mode.
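The lookup and the DPL check for a software interrupt can be modelled in a few lines. This is a simplified sketch: `Gate`, `set_gate` and `software_interrupt` are names invented for this example, the handlers just return strings, and a missing entry is treated the same as a privilege violation. On real hardware the DPL check applies to interrupts raised by instructions such as int n, not to hardware interrupts or CPU exceptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

IDT_SIZE = 256


class GeneralProtectionFault(Exception):
    """Raised by the model when a software interrupt is not permitted."""


@dataclass
class Gate:
    handler: Callable[[], str]  # stands in for the handler address
    dpl: int                    # least-privileged level allowed to use `int n` on it


idt: List[Optional[Gate]] = [None] * IDT_SIZE


def set_gate(vector: int, handler: Callable[[], str], dpl: int) -> None:
    """Install a handler for one vector."""
    idt[vector] = Gate(handler, dpl)


def software_interrupt(vector: int, cpl: int) -> str:
    """Model `int vector` executed at privilege level `cpl`."""
    gate = idt[vector]
    # Numerically larger CPL means less privilege, so CPL must be <= DPL.
    if gate is None or cpl > gate.dpl:
        raise GeneralProtectionFault(f"int {vector:#x} not allowed at CPL {cpl}")
    return gate.handler()


# Vector 14 is the x86 page fault exception; user code must not invoke it with `int`.
set_gate(14, lambda: "page fault handler", dpl=0)
# The system call vector is deliberately reachable from user mode.
set_gate(0x80, lambda: "system call handler", dpl=3)
```

Here `software_interrupt(0x80, cpl=3)` reaches the system call handler, while `software_interrupt(14, cpl=3)` raises the modelled fault.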

In other words, if you embed privileged instructions in user-mode code, executing them fails with a fault because of insufficient privilege. Privileged instructions can only be executed in kernel mode, and a program can only enter kernel mode voluntarily through a system call. The program therefore has to call the system functions defined by the kernel, which run in a legal and safe way provided by the system, and cannot execute privileged instructions directly. This improves the security of the system.

4. Kernel space and user space

Note that there is no physical difference between the operating system kernel and user programs: both reside in the same physical memory. The CPU simply uses the privilege level bits to determine the privilege of the code that is currently executing, so that privileged instructions can only be executed through system functions. But the system functions reside in kernel space, and they also need to be protected from modification. The operating system kernel therefore uses additional mechanisms to ensure its own security and stability, such as isolating programs in separate address spaces and limiting the range of memory each program can access.

The kernel is a collection of modules such as memory management, file management, and hardware management. It can be regarded as a globally shared, lowest-level library with privilege protection. It is essentially a binary image that the bootloader loads into memory at boot time and that stays resident from then on. System calls are how a process interacts with the kernel: when a process wants to perform a high-privilege operation, such as reading or writing a file, it executes the relevant part of the kernel's code through a system call.

In modern operating systems, the memory space is usually divided into kernel space and user space. Kernel space is the memory area where the operating system kernel resides. It contains all the code and data structures of the operating system kernel, as well as protected hardware resources, such as interrupt controllers, clocks, device drivers, etc. User space is the memory area where ordinary applications are located. It contains the code and data structures of the application, but cannot directly access the protected kernel space resources.

On x86, kernel space is usually placed at the top of the virtual address space and user space at the bottom. For example, on 32-bit x86 with the common 3 GB/1 GB split, kernel space occupies addresses 0xC0000000 and above, while user space occupies 0x00000000 to 0xBFFFFFFF. This layout makes it straightforward to prevent user programs from straying into kernel code and data, which improves the security and stability of the operating system.
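The address split can be expressed as a simple range check. The sketch below assumes the 0xC0000000 boundary from the text; `classify_address` and `user_range_ok` are hypothetical helper names, and the second one only loosely resembles the kind of validation a kernel applies to pointers passed in from user programs.

```python
KERNEL_BASE = 0xC0000000        # first kernel address in the 32-bit layout above
ADDRESS_SPACE_END = 0xFFFFFFFF  # last address of a 32-bit address space


def classify_address(addr: int) -> str:
    """Return "user" or "kernel" for a 32-bit virtual address."""
    if not 0 <= addr <= ADDRESS_SPACE_END:
        raise ValueError(f"not a 32-bit address: {addr:#x}")
    return "kernel" if addr >= KERNEL_BASE else "user"


def user_range_ok(start: int, length: int) -> bool:
    """True if [start, start + length) lies entirely inside user space."""
    return start >= 0 and length >= 0 and start + length <= KERNEL_BASE
```

A buffer that ends exactly at 0xC0000000 is still entirely in user space; one byte more and it would cross into kernel space.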

When code accesses memory through a segment, the CPU compares the DPL (the privilege level of the target memory segment, stored in its segment descriptor) with the CPL (the current privilege level, also called the process privilege level, held in the low two bits of the cs register) to decide whether the access is permitted. Access is allowed only when DPL >= CPL. The DPL of the kernel segments is set to 0 during kernel initialization, so code in kernel mode can access any data, while code in user mode cannot access kernel data.
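A minimal sketch of this check follows. It assumes the standard x86 descriptor layout, in which the DPL occupies bits 5 and 6 of the descriptor's access byte. The two access-byte constants are typical flat-model values chosen for illustration, the function names are invented, and the model ignores the requested privilege level (RPL) that real hardware also takes into account.

```python
# Typical access bytes for flat data segments (assumed example values).
KERNEL_DATA_ACCESS = 0x92  # present, DPL 0, writable data segment
USER_DATA_ACCESS = 0xF2    # present, DPL 3, writable data segment


def descriptor_dpl(access_byte: int) -> int:
    """Extract the two DPL bits (bits 5 and 6) from a descriptor access byte."""
    return (access_byte >> 5) & 0b11


def may_access_data(cpl: int, dpl: int) -> bool:
    """Simplified rule from the text: access is allowed only when DPL >= CPL."""
    return dpl >= cpl
```

With these values, user-mode code (CPL 3) is refused access to the kernel data segment, while kernel-mode code (CPL 0) may access both segments.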

5. Switch between kernel stack and user stack

In a computer with x86 architecture, each process has its own kernel stack and user stack, which are used to process function calls in kernel mode and user mode respectively. When a process is created, the operating system allocates a certain amount of virtual memory space for it, including user stack and kernel stack.

When a process enters kernel mode through a system call or an interrupt, the processor automatically loads the kernel-mode stack pointer into ESP and pushes the user-mode stack pointer (SS and ESP), the flags register (EFLAGS), and the return address (CS and EIP) onto the process's kernel stack. From this point the process runs in kernel mode. When the system call or interrupt service routine finishes and returns, the processor automatically pops the saved values from the kernel stack and reloads the user-mode stack pointer into ESP, so that execution returns to user mode. By restoring the previous ESP value, the process continues to use its own user stack.
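The stack switch can be modelled as saving a small frame on entry and restoring it on return. This is a simulation only: the `Cpu` class, `enter_kernel`, `iret`, and all register and selector values are assumptions made for this example. The push order (SS, ESP, EFLAGS, CS, EIP) is the one x86 uses when an interrupt changes privilege level.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical kernel selectors; only their low two bits (CPL 0) matter here.
KERNEL_CS = 0x10
KERNEL_SS = 0x18


@dataclass
class Cpu:
    ss: int
    esp: int
    eflags: int
    cs: int
    eip: int
    kernel_stack: List[int] = field(default_factory=list)


def enter_kernel(cpu: Cpu, handler_eip: int, kernel_esp: int) -> None:
    """Model the hardware part of a user -> kernel transition."""
    frame = [cpu.ss, cpu.esp, cpu.eflags, cpu.cs, cpu.eip]
    cpu.kernel_stack.extend(frame)          # saved user state, pushed in this order
    cpu.ss = KERNEL_SS
    cpu.esp = kernel_esp - 4 * len(frame)   # kernel stack now holds five 32-bit words
    cpu.cs, cpu.eip = KERNEL_CS, handler_eip


def iret(cpu: Cpu) -> None:
    """Model the return: pop the frame in reverse order and restore user state."""
    eip = cpu.kernel_stack.pop()
    cs = cpu.kernel_stack.pop()
    eflags = cpu.kernel_stack.pop()
    esp = cpu.kernel_stack.pop()
    ss = cpu.kernel_stack.pop()
    cpu.eip, cpu.cs, cpu.eflags, cpu.esp, cpu.ss = eip, cs, eflags, esp, ss
```

After `enter_kernel` followed by `iret`, every register in the model holds its original user-mode value, which is what lets the process resume on its own user stack.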

Origin blog.csdn.net/qq_25046827/article/details/130353420