Kernel and user space interaction

 

From the URL http://www.kerneltravel.net/jiaoliu/005.htm   

The information exchange between a user program and the kernel is bidirectional: data can be sent from user space to kernel space, or submitted from kernel space to user space. A user program can also actively pull data out of the kernel. Below we summarize the methods by which the kernel and user space exchange data.

According to who initiates the transfer, information interaction falls into two categories: the user program sends data to, or extracts data from, the kernel; and the kernel submits a request to user space. We start with the interaction initiated by user-level programs.

Information interaction initiated by user-level programs

A: Write your own system call

System calls are the most basic way for user-level programs to access the kernel. Linux currently provides roughly two hundred standard system calls (see include/asm-i386/unistd.h and arch/i386/kernel/entry.S in the kernel source tree), and it allows us to add our own system calls to exchange information with the kernel. Suppose, for example, that we want to build a system call logging system that records every system call for intrusion detection. We can write a kernel service routine that collects all system call requests and records them in a buffer it maintains inside the kernel. A complex intrusion detection program cannot reasonably be implemented in the kernel, so the records in the buffer must be extracted to user space. The most direct method is to write a new system call that extracts the buffered data. Once the kernel service routine and the new system call are in place, we can write a user-space program that performs the intrusion detection. That program can invoke the new system call on a schedule, by polling, or whenever needed, extract the data from the kernel, and then run its detection logic.
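From user space, a system call can be reached by number through the C library's syscall() wrapper. The sketch below is only an illustration under stated assumptions: it uses the standard SYS_getpid as a stand-in, because the number of a self-added call depends on the kernel it was added to, and the function name call_kernel_by_number is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke a system call by its number.
 * SYS_getpid is a stand-in (assumption): a self-added system call would be
 * invoked the same way, using the number it was given in the kernel's
 * system call table, plus whatever arguments it takes. */
long call_kernel_by_number(void)
{
    return syscall(SYS_getpid);
}
```

A log-extraction call would typically also take a user buffer and a length as arguments, which syscall() passes through after the call number.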

B: Write a driver

A defining feature of Linux/UNIX is that everything is a file. The system defines a concise and complete driver interface, and client programs interact with kernel drivers in a uniform way through it. Most users and developers are already very familiar with this interface and the corresponding development process.

A driver runs in kernel space, and a user-space application interacts with it through a file in the /dev/ directory. This is the familiar file operation sequence: open(), read(), write(), ioctl(), close(). (Note that not all kernel drivers expose this interface. Network drivers and the protocol stacks are used differently. Socket programming, for example, also has the notions of opening and closing, but its kernel implementation and its external usage differ considerably from those of ordinary drivers.)

This article is not concerned with interrupt handling, device management, data processing, and the other work a device driver does inside the kernel. We focus on the part that interacts with user-level programs. For this purpose the operating system defines a unified interface, namely the open(), read(), write(), ioctl(), and close() calls mentioned above. Each driver implements these independently according to its own needs, hiding the functions and services it provides behind the unified interface. A user-level program selects the driver or service it needs (in practice, it selects a file in the /dev/ directory) and follows the interface and file operation sequence above to interact with the driver in the kernel. This is easiest to explain in object-oriented terms: the system defines an abstract interface, and each concrete driver is an implementation of that interface.

The driver is therefore one of the important ways for user space and the kernel to exchange information. In fact, ioctl, read, and write are themselves carried out through system calls, but these calls are already standardized and defined by the kernel. The user therefore does not have to modify kernel code and recompile the kernel, as is necessary when adding a new system call. A new virtual device only needs to be loaded into the kernel as a module (insmod) to become usable. Please refer to reference 5 for design details, and to reference 6 for programming details.
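The "abstract interface, concrete implementation" idea can be sketched in plain C as a table of function pointers. This is a reduced user-space model, not the kernel's actual structure: the struct dev_ops type, its three members, and the log_* functions are all hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced operations table: the "abstract interface". */
struct dev_ops {
    int  (*open)(void);
    long (*read)(char *buf, size_t len);
    int  (*release)(void);
};

/* One concrete "driver" implementing the interface. */
static int log_open(void) { return 0; }

static long log_read(char *buf, size_t len)
{
    static const char record[] = "open()";   /* made-up log record */
    size_t n = sizeof(record) - 1 < len ? sizeof(record) - 1 : len;
    memcpy(buf, record, n);
    return (long)n;
}

static int log_release(void) { return 0; }

static const struct dev_ops log_ops = {
    .open = log_open, .read = log_read, .release = log_release,
};

/* Generic caller: works with any implementation of dev_ops. */
long read_through(const struct dev_ops *ops, char *buf, size_t len)
{
    long n;

    if (ops->open() != 0)
        return -1;
    n = ops->read(buf, len);
    ops->release();
    return n;
}
```

The caller never names a specific driver; swapping in a different table changes the behavior without changing read_through().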

In Linux, devices can be roughly divided into character devices, block devices, and network interfaces. Character devices are those that must be accessed sequentially, as a byte stream, such as character terminals and serial ports. Block devices are those, such as hard disks, that can be accessed randomly and in units of whole data blocks. Network interfaces cover complex network input and output services such as network cards and protocol stacks. Implementing our system call logging system as a character driver is an easy job. We can write a character device driver that collects and records the information inside the kernel. There is no corresponding physical device, but that is not a problem: a Linux device driver is a software abstraction to begin with. It can provide services together with hardware, or as pure software (although it will of course use memory). In the driver, open() can start the service, read() can return the processed records, ioctl() can set the record format, and close() can stop the service. write() is not needed, so we simply do not implement it. Finally, we create a device file in the /dev/ directory that corresponds to our new system call logging driver.
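The user-space side of such a driver is ordinary file I/O. The following is a minimal sketch, assuming only that the device file supports open(), read(), and close(); the function name read_from_device is hypothetical, and the ioctl() step is omitted because its command codes would be defined by the driver.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open a device file, read up to len bytes, and close it again.
 * Returns the number of bytes read, or -1 on error. */
long read_from_device(const char *path, void *buf, size_t len)
{
    ssize_t n;
    int fd = open(path, O_RDONLY);  /* for the logging driver: start the service */

    if (fd < 0)
        return -1;
    n = read(fd, buf, len);         /* for the logging driver: fetch records */
    close(fd);                      /* for the logging driver: stop the service */
    return (long)n;
}
```

For the logging system the path would be whatever device node was created for it, for example a hypothetical /dev/syscall_log; the function can be tried against an existing device such as /dev/zero.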

C: Use the proc file system

    Proc is a special file system provided by Linux. Its purpose is to provide a convenient way of interaction between users and the kernel. It uses the file system as the user interface, so that the application can safely and conveniently obtain the current running state of the system and some other kernel data information by file operation.

The proc file system is mostly used for monitoring, management and debugging systems. Many of the management tools we use, such as ps, top, etc., use proc to read kernel information. In addition to reading kernel information, the proc file system also provides a write function. So we can also use it to input information to the kernel. For example, by modifying the system parameter configuration file (/proc/sys) under the proc file system, we can directly change the kernel parameters dynamically at runtime; another example, through the following instruction:

echo 1 > /proc/sys/net/ipv4/ip_forward

By turning on the switch that controls IP forwarding in the kernel, we can enable the routing function on the running Linux system. Similarly, there are many kernel options that can be queried and adjusted directly through the proc file system.
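Because proc entries behave like text files, a program can read and set them with standard I/O. The sketch below shows one way to do this; the function names read_kernel_param and write_kernel_param are hypothetical, and writing under /proc/sys normally requires root privileges.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a parameter file into buf, without the newline.
 * Returns 0 on success, -1 on error. */
int read_kernel_param(const char *path, char *buf, size_t size)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)size, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Write a value to a parameter file, like `echo value > path`.
 * Returns 0 on success, -1 on error. */
int write_kernel_param(const char *path, const char *value)
{
    int ok;
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling write_kernel_param("/proc/sys/net/ipv4/ip_forward", "1\n") as root would have the same effect as the echo command.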

In addition to the entries the system already provides, proc offers an interface for creating new entries in the kernel to share information and data with user programs. For example, we can create a new entry in the proc file system for the system call logging program (whether it is implemented as a driver or as a plain kernel module). This entry can display how many times system calls have been used, how frequently an individual system call is used, and so on. We can also add further entries for setting logging rules, such as not recording uses of the open system call. For details on using the proc file system, please refer to Reference 7.

D: Use a virtual file system

Some kernel developers argue that heavy use of the ioctl() system call makes the system call interface unclear and hard to control, and that putting information into the proc file system leaves that information poorly organized, so neither should be overused. They propose implementing a separate virtual file system in place of ioctl() and /proc, because a file system interface is clear and easy to access from user space. A virtual file system also makes it more convenient and effective to carry out system management tasks with scripts.

Let us look at an example of accessing kernel information through a virtual file system. We can implement a virtual file system called sagafs, in which the file log corresponds to the system call log kept by the kernel. The log can then be obtained with ordinary file access, for example:

# cat /sagafs/log

Using a virtual file system (VFS) for information interaction makes system management more convenient and clear. Some programmers may object that the VFS API is complicated and hard to master. There is no need to worry: the 2.5 kernel provides libfs, which wraps the common operations needed to implement a virtual file system and helps users who are not familiar with file systems. See the reference materials for how to use VFS for this kind of interaction.

E: Use memory mapping

Linux lets user programs access memory directly through the memory mapping mechanism. Memory mapping maps a specific region of kernel memory into the address space of a user-level program, so that user space and kernel space share the same memory. The effect is immediate: whatever data the kernel writes into this region is visible to the user program at once, with no copying at all. When system calls are used to exchange information, the operation always includes a copy, either from a kernel buffer to a user buffer or from a user buffer to a kernel buffer. For applications that move large volumes of data under tight timing constraints this is a fatal blow: many of them simply cannot tolerate the time and resources that the copy consumes.

We once developed a driver for a high-speed sampling device. The device requires 16-bit real-time sampling at a sampling rate of 20 megabytes and a repetition rate of 1 kHz. The amount of data that has to be sampled, transferred by DMA, and processed every millisecond is enormous, and a copy-based method simply cannot meet the requirements. Memory mapping becomes the only choice: we reserve a region of memory and configure it as a circular queue into which the sampling device's DMA writes its output. This region is then mapped into the data processing program running in user space, so that data just acquired by the device and transferred to the host can be processed by the user-space program immediately.

In fact, memory mapping is usually used whenever the kernel and user space need to exchange large amounts of data quickly, especially in applications with strong real-time requirements. The virtual memory area of the X Window System server is a typical example: the X server needs to exchange a great deal of data with video memory, and mapping the graphics memory directly into user space improves performance significantly compared with lseek/write.
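From the user program's point of view, the mapping is set up with mmap(). The sketch below maps an ordinary file as shared memory, which stands in for a device file whose driver implements mmap (that substitution is an assumption made so the example can run anywhere); the function name map_shared is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of the file at path as shared, read-write memory.
 * Returns a pointer to the mapping, or NULL on failure. */
void *map_shared(const char *path, size_t len)
{
    void *p;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return NULL;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : p;
}
```

After the call, loads and stores through the returned pointer reach the shared pages directly, with no read() or write() copy in between. The caller releases the mapping with munmap().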
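The circular queue mentioned above can be sketched as a single-producer, single-consumer ring placed in the shared region. This is not the driver's actual layout, which the text does not give: the struct ring type, the slot count, and the function names are hypothetical, and a real shared ring would additionally need memory barriers and agreement with the DMA engine on who updates head.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8   /* hypothetical size; must be a power of two */

struct ring {
    volatile uint32_t head;        /* next slot the producer writes */
    volatile uint32_t tail;        /* next slot the consumer reads */
    uint16_t samples[RING_SLOTS];  /* 16-bit samples, as in the text */
};

/* Producer side (the device in the text). Returns 0, or -1 if the ring is full. */
int ring_put(struct ring *r, uint16_t sample)
{
    if (r->head - r->tail == RING_SLOTS)
        return -1;
    r->samples[r->head % RING_SLOTS] = sample;
    r->head++;
    return 0;
}

/* Consumer side (the user program). Returns 0 and stores the oldest
 * sample in *out, or returns -1 if the ring is empty. */
int ring_get(struct ring *r, uint16_t *out)
{
    if (r->head == r->tail)
        return -1;
    *out = r->samples[r->tail % RING_SLOTS];
    r->tail++;
    return 0;
}
```

Because each index is written by only one side, the producer and consumer never update the same field, which keeps the coordination between them to a minimum.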

Not every application is suited to mmap. For stream-oriented character devices such as serial ports and mice, mmap is of little use. Moreover, sharing memory in this way makes synchronization difficult. Because user programs and the kernel have no dedicated synchronization mechanism in common, reads and writes must be designed very carefully so that the two sides do not interfere with each other.

mmap is completely based on the concept of shared memory, and because of this, it can provide additional convenience, but it is also particularly difficult to control.

Information interaction initiated by the kernel

A: Call a user program from kernel space

Even inside the kernel, we sometimes need operations that are provided at user level, such as opening a file to read specific data or executing a user program to carry out some function. Much data and functionality already exists or is already implemented in user space, and there is no need to spend resources duplicating it. In addition, the kernel is sometimes designed to rely on user-space resources so that it has the flexibility to support changes that are unknown but likely. For example, the kernel's on-demand module loading relies on kmod. When kmod is compiled, it cannot anticipate every kernel module (if it could, dynamic loading would be pointless), so it cannot know the location or loading method of modules that appear later. Dynamic module loading therefore adopts the following strategy: the loading itself is done with the help of the modprobe program in user space. In the simplest case, modprobe calls insmod with the module name passed from the kernel as its argument, and the required module is loaded in this way.

Starting a user program from the kernel still uses the execve system call, but the call now takes place in kernel space, whereas system calls are normally made from user space. If the system call takes parameters, a problem arises. The implementation of the system call checks the validity of its parameters, and the check requires every parameter to be located in user space, that is, at an address between 0x00000000 and 0xC0000000. If we pass parameters from the kernel (addresses above 0xC0000000), the check rejects the call. To solve this, we can use the set_fs macro to change the checking policy so that kernel addresses are accepted as parameters. The kernel can then use the system call directly.

For example, kmod calls set_fs(KERNEL_DS) before it executes modprobe through execve:
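The effect of set_fs on the parameter check can be modeled in a few lines of ordinary C. This is a user-space model of the idea, not kernel code: the names addr_limit and range_ok and the two limit constants are hypothetical, with 0xC0000000 taken from the i386 layout described above.

```c
#include <stddef.h>
#include <stdint.h>

#define USER_DS_LIMIT   0xC0000000ULL   /* top of user space on i386, per the text */
#define KERNEL_DS_LIMIT 0x100000000ULL  /* the whole 32-bit address space */

/* Current limit; plays the role of the value that set_fs() changes. */
static uint64_t addr_limit = USER_DS_LIMIT;

/* Model of the parameter check: the range [addr, addr + size) must lie
 * entirely below the current limit. Returns 1 if accepted, 0 if rejected. */
int range_ok(uint32_t addr, uint32_t size)
{
    return (uint64_t)addr + size <= addr_limit;
}
```

With the default limit, a buffer at a kernel address such as 0xC0100000 is rejected. After the limit is raised to KERNEL_DS_LIMIT, which corresponds to set_fs(KERNEL_DS), the same buffer passes.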

......

set_fs(KERNEL_DS);

/* Go, go, go... */
if (execve(program_path, argv, envp) < 0)
    return -errno;

In the code above, program_path is "/sbin/modprobe", argv is { modprobe_path, "-s", "-k", "--", (char*)module_name, NULL }, and envp is { "HOME=/", "TERM=linux", "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL }.

Opening a file from the kernel works the same way: the open system call is used with its parameters, and the set_fs macro must be called first.
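The same program_path, argv, and envp arrangement can be tried from an ordinary process. The sketch below is a user-space analogue, not what kmod does internally: it forks a child and calls execve() there, and the function name run_helper is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run program_path with the given argv and envp in a child process.
 * Returns the child's exit status, or -1 if it could not be run or waited for. */
int run_helper(const char *program_path, char *const argv[], char *const envp[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(program_path, argv, envp);
        _exit(127);               /* only reached if execve() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing a minimal envp such as { "HOME=/", "TERM=linux", "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL } gives the helper a known environment instead of inheriting the caller's.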

B: Use the brk system call to export kernel data

The kernel and user space transfer data mainly through the get_user(ptr) and put_user(datum, ptr) routines, which is why they appear in most system calls that need to pass data. But if the transfer is not started by a system call from a user program, so that no user-space buffer location has been supplied explicitly, how can kernel data be passed to user space?

Clearly put_user() cannot be used directly, because there is no destination buffer to give it. We therefore borrow the brk system call and the address space of the current process. brk sets the size of a process's heap. Each process has its own heap, and dynamic allocation functions such as malloc actually obtain their memory from it. We use brk to extend the heap of the current process by a new temporary buffer, and then use put_user to export the kernel data into this known region of user space.

Recall how the kernel called a user program in the previous section. There, we had to bypass the parameter check. With the present method there is another way: we extend the heap of the current process, copy the parameters of the system call into the newly obtained user-space region with put_user(), and pass the address of that region as the parameter when calling execve. The parameter check is then no longer an obstacle.

char *program_path = "/bin/ls";
unsigned long mmm;

/* Find the current position of the top of the heap */
mmm = current->mm->brk;
/* Use brk to extend the heap by a new 256-byte buffer */
ret = brk((void *)(mmm + 256));
/* Copy the path needed by execve into the new buffer; a whole string is copied with copy_to_user() rather than put_user() */
copy_to_user((void *)(mmm + 2), program_path, strlen(program_path) + 1);
/* Execute /bin/ls using the user-space copy of the path (argv and envp are left NULL here for brevity) */
execve((char *)(mmm + 2), NULL, NULL);
/* Restore the original top of the heap */
tmp = brk((void *)mmm);

This method is not general (in particular, whether it has negative side effects remains an open question) and should be regarded only as a trick. Still, it shows that once you are familiar with the kernel's structure, you can do many unexpected things.

C: Use signals

In the kernel, signals are mainly used to notify a user process that a serious error has occurred in the program and to kill the current process forcibly. In that case the kernel sends a SIGKILL signal to tell the process to terminate. The kernel sends signals with the send_sign(pid,sig) routine. As its arguments show, the sender must know the target process ID (pid) in advance, so if the kernel is to notify a user process asynchronously, by signal, to perform some task, it must already know that process's ID. Looking up the ID of a particular process while the kernel is running is laborious and may require traversing the entire linked list of process control blocks. Signaling a specific user process is therefore a poor method and is generally not used in the kernel. In practice the kernel uses signals only to make the current process (whose pid is easily obtained from the current variable) perform some general operation, such as terminating. This method is consequently of little use to kernel developers. Message operations are in a similar position and are not discussed further here.
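The heap extension itself can be observed from user space with sbrk() and brk(), the C library wrappers around the same mechanism. The sketch below is a user-space illustration only; the function name borrow_heap_top is hypothetical, and it assumes nothing else in the process moves the break while it runs.

```c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>

/* Extend the heap by 256 bytes, copy a string into the new area,
 * verify the copy, then restore the original top of the heap.
 * Returns 0 on success, -1 on failure. */
int borrow_heap_top(const char *text)
{
    int ok;
    char *old_top;
    size_t n = strlen(text) + 1;

    if (n > 256)
        return -1;
    old_top = sbrk(0);                 /* current top of the heap */
    if (old_top == (char *)-1)
        return -1;
    if (sbrk(256) == (void *)-1)       /* grow the heap by 256 bytes */
        return -1;
    memcpy(old_top, text, n);          /* the new area starts at the old top */
    ok = strcmp(old_top, text) == 0;
    if (brk(old_top) != 0)             /* restore the original break */
        return -1;
    return ok ? 0 : -1;
}
```

The temporary buffer exists only between the two break changes, which mirrors the "extend, use, restore" sequence of the kernel-side code.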
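On the receiving side, a user process prepares for such a notification by installing a signal handler. The sketch below shows only that user-space half; the process signals itself with raise() as a stand-in for the kernel, and the choice of SIGUSR1 and the names on_signal and receive_one_signal are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

/* Handler: only records that the signal arrived. */
static void on_signal(int sig)
{
    (void)sig;
    got_signal = 1;
}

/* Install a handler for SIGUSR1, deliver one SIGUSR1 to this process,
 * and report whether the handler ran (1) or not (0). Returns -1 on error. */
int receive_one_signal(void)
{
    struct sigaction sa;

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    got_signal = 0;
    if (raise(SIGUSR1) != 0)           /* stand-in for a signal from the kernel */
        return -1;
    return got_signal;
}
```

The handler does nothing except set a flag, which keeps it safe to run at any point in the program; the real work is done later by code that checks the flag.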

 

Summary

Information interaction initiated by user-level programs, whether through the standard calls or through the driver interface, generally relies on system calls. Cases in which the kernel actively initiates the interaction are rare, and there is no standard interface for them, which makes them very inconvenient to use. In general, therefore, prefer the user-initiated methods described in the first part of this article. After all, the kernel is by design a passive service provider to user-level programs, and our own development should follow this principle as far as possible.

Origin blog.csdn.net/u014426028/article/details/110519387