[Reading notes] Linux Kernel Design and Implementation: System Calls

The system call interface exists mainly to keep the system stable and reliable, and to prevent applications from doing whatever they please.

1. Communicating with the kernel

System calls add an intermediate layer between user-space processes and hardware devices.
This layer serves three purposes:

  1. It provides an abstracted hardware interface for user space;
  2. It ensures the stability and security of the system;
  3. Each process runs in a virtualized system, and system calls provide the common interface between user space and the rest of the system.

ps: In Linux, system calls are the only means by which user space can access the kernel; apart from exceptions and traps, they are the only legal entry point into the kernel.

2. APIs, POSIX, and the C library

In general, applications are written against application programming interfaces (APIs) implemented in user space rather than directly against system calls.
ps: The programming interfaces that applications use do not need to correspond one-to-one to the system calls provided by the kernel.

[Figure from the original post: image not preserved]
There is a maxim about Unix interface design: "provide mechanism (what functionality to offer), not policy (how that functionality is used)". That is, Unix system calls abstract out functions that accomplish a specific purpose. How those functions are used is not the kernel's concern.

3. System calls

System calls (often called syscalls in Linux) are usually accessed through functions defined in the C library.
When a system call fails, the C library writes the error code to the global variable errno. This value can be translated into a human-readable error string by calling the perror() library function.

Q: How is a system call defined?
A: To stay compatible across 32-bit and 64-bit systems, a system call has different return types in user space and kernel space: int in user space and long in the kernel. The function name also gets a sys_ prefix. For example, the system call getpid() is defined in the kernel as sys_getpid().

3.1 System call numbers: mapping to system call functions

In Linux, each system call is assigned a unique system call number.
When a user-space process executes a system call, the number indicates which call to run. The process never refers to the call by name, which is why the number matters.

Linux has a "not implemented" system call, sys_ni_syscall(), which does nothing except return -ENOSYS.

The kernel keeps a list of all registered system calls in the system call table, stored in sys_call_table. The table assigns a unique system call number to each valid system call.

3.2 Performance of system calls

System calls in Linux execute faster than those in many other operating systems.
Why?
A: First, Linux has a short context switch time, and both entering and leaving the kernel are streamlined and simple. Second, the system call handler and the individual system calls are themselves simple.

4. System call handler

User-space programs cannot execute kernel code directly.
Instead, the application must signal the kernel in some way and have it execute the system call in kernel space on the application's behalf.
The mechanism for signaling the kernel is a software interrupt: raising an exception switches the system to kernel mode, where it runs the exception handler, which in this case is the system call handler.

4.1 Specify the appropriate system call

Because all system calls trap into the kernel in the same way, simply entering kernel space is not enough. The system call number must also be passed to the kernel.

4.2 Parameter passing

In addition to the system call number, most system calls need one or more parameters from the caller. These parameters must therefore be passed from user space to the kernel when the trap occurs. The easiest way is to store them in registers, just as the system call number is.

5. Implementation of system call

5.1 Implementing system calls

The first step in implementing a new system call is to decide its purpose. Every system call should have exactly one clear purpose. Multiplexed system calls (a single call that performs different tasks depending on the parameter values passed in) are discouraged in Linux.

5.2 Parameter verification

System calls must carefully verify that all of their parameters are valid and legal, so that users cannot pass invalid input into the kernel.
Each parameter must be checked to ensure that it is not only valid and legal but also correct. A process must not be able to make the kernel access resources that the process itself has no right to access.
The most important check is the validity of any pointer the user provides.
Before following a user-space pointer, the kernel must ensure the following:

  1. The memory area the pointer refers to belongs to user space. A process must not trick the kernel into reading data in kernel space on its behalf;
  2. The memory area the pointer refers to is in the process's own address space. A process must not trick the kernel into reading another process's data;
  3. If reading, the memory must be marked readable; if writing, it must be marked writable; if executing, it must be marked executable. A process must not be able to bypass memory access restrictions.

The kernel provides two functions that perform the necessary checks and copy data between kernel space and user space, as shown in the following table:

| Method | Purpose | Parameters | Return value |
| --- | --- | --- | --- |
| copy_to_user() | Writes data to user space | (1) destination address in the process's address space; (2) source address in kernel space; (3) length of the data to copy, in bytes | On failure: the number of bytes that could not be copied. On success: 0. When such a failure occurs, the system call returns the standard -EFAULT |
| copy_from_user() | Reads data from user space | Also three parameters, analogous to copy_to_user(): data at the address given by the second parameter is copied to the address given by the first, and the third gives the length | Same as above |

ps: Both copy_to_user() and copy_from_user() may block. This happens when the page containing the user data has been swapped out to disk instead of residing in physical memory. The process then sleeps until the page fault handler brings the page back from disk into physical memory.

The final check is for valid permission.
Newer Linux kernels provide a finer-grained "capabilities" mechanism, which allows checking for specific permissions on specific resources. A call to capable() checks whether the caller has the right to operate on the specified resource. A nonzero return value means the caller holds the capability; 0 means it does not.
eg:

if (!capable(CAP_SYS_BOOT))	/* caller must be an administrator allowed to reboot the system */
	return -EPERM;

See <linux/capability.h> for a list of all capabilities and the rights each one grants.

6. System call context

The kernel is in process context while it executes a system call. The current pointer points to the current task, which is the process that issued the system call.

In process context, the kernel can sleep (for example, when the system call blocks or explicitly calls schedule()) and can be preempted.

6.1 Final steps in binding a system call

Q: After writing a system call, how do you register it as an official system call?
A:

  1. Add an entry to the end of the system call table (for most architectures, this table is located in the entry.S file). This must be done for every architecture that supports the system call. Counting from 0, the position of the call in this table is its system call number;
  2. For each supported architecture, define the system call number in <asm/unistd.h>. By convention, a comment giving the number is added every five entries, which makes it easy to look up which number belongs to which call;
  3. Compile the system call into the kernel image (it cannot be built as a module). It is enough to put it in a relevant file under kernel/, such as sys.c, which holds a variety of system calls.

6.2 Accessing system calls from user space: C library / Linux macros

Usually, the C library provides support for system calls. By including the standard headers and linking against the C library, user programs can use system calls directly (or call library routines that in turn invoke system calls).
Linux itself also provides a set of macros that wrap access to system calls without needing the C library headers. They set up the registers and issue the trap instruction.
These macros are named _syscalln(), where n ranges from 0 to 6 and is the number of parameters passed to the system call. The count is needed because the macro must know how many parameters to expect and in what order to load them into registers.
eg:
The system call definition of open() is:

long open(const char *filename, int flags, int mode);

Without the support of the C library, the form of direct calling through the macro is:

#define __NR_open 5		/* system call number defined in <asm/unistd.h> */
_syscall3(long, open, const char *, filename, int, flags, int, mode)

Each macro takes 2 + 2 × n parameters.
The first parameter is the return type of the system call.
The second parameter is the name of the system call.
The remaining parameters are the type and name of each argument, in the order the system call takes them.

6.3 Why not to implement a system call

Linux tries to avoid simply adding a new system call every time a new abstraction appears.
The usual alternatives to a new system call are:

  1. Implement a device node and support read() and write() on it. Use ioctl() to manipulate specific settings or retrieve specific information;
  2. Certain interfaces, such as semaphores, can be represented as file descriptors and manipulated in the same way;
  3. Expose the information as a file in an appropriate location in sysfs.

Origin blog.csdn.net/qq_23327993/article/details/105382862