In-Depth Understanding of the Linux Kernel: Inter-Process Communication

Pipes

A pipe is an inter-process communication mechanism provided by all flavors of Unix. A pipe is a one-way flow of data between processes: all the data written by one process to the pipe is routed by the kernel to another process, which can then read it from the pipe. In the Unix command shell, a pipe is created with the "|" operator. For example, the following statement tells the shell to create two processes and connect them with a pipe:

$ ls | more

The standard output of the first process (executing the ls program) is redirected to the pipe; the second process (executing the more program) reads input from this pipe. Note that you can get the same result by executing the following two commands:

$ ls > temp
$ more < temp

The first command redirects the output of ls to a normal file; then, the second command forces more to read input from this normal file.

Of course, it is usually more convenient to use pipes than temporary files because:
1. The shell statement is shorter and simpler.
2. There is no need to create temporary ordinary files that must also be deleted in the future.

Using pipes

The pipe is treated as an open file but has no corresponding image in the mounted file system. A new pipe can be created using the pipe() system call, which returns a pair of file descriptors; the process then passes the two descriptors to its child process through fork(), thereby sharing the pipe with the child process. A process can use the first file descriptor in the read() system call to read data from the pipe, and it can also use the second file descriptor in the write() system call to write data to the pipe.
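For illustration, here is a minimal user-space sketch (error handling mostly omitted; not taken from the kernel sources) of a parent sharing a pipe with a child created by fork():

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];                 /* fd[0]: read channel, fd[1]: write channel */
    char buf[64];

    if (pipe(fd) == -1)
        return 1;

    if (fork() == 0) {         /* child: reads from the pipe */
        close(fd[1]);
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("child read: %s\n", buf);
        }
        close(fd[0]);
        _exit(0);
    }

    close(fd[0]);              /* parent: writes into the pipe */
    write(fd[1], "hello", 5);
    close(fd[1]);
    wait(NULL);
    return 0;
}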

POSIX only defines half-duplex pipes, so even if the pipe() system call returns two descriptors, each process still has to close one file descriptor before using the other.

If a bidirectional data flow is required, the processes must use two different pipes by calling pipe() twice. Some Unix systems, such as System V Release 4, implement full-duplex pipes. In a full-duplex pipe, both file descriptors can be written to and read from, so there are two bidirectional channels of information.

Linux adopts another solution: each pipe file descriptor is still one-way, but it is not necessary to close one descriptor before using the other. Let us return to the previous example. When the command shell interprets the ls | more statement, it actually performs the following operations. The shell process:

  1. The pipe() system call is called; let's assume that pipe() returns file descriptors 3 (the pipe's read channel) and 4 (the pipe's write channel).
  2. The fork() system call is called twice.
  3. The close() system call is called twice to release file descriptors 3 and 4.

The first child process must execute the ls program, which performs the following operations:
a. Call dup2(4,1) to copy file descriptor 4 to file descriptor 1. From now on, file descriptor 1 represents the write channel of this pipe.
b. Call the close() system call twice to release file descriptors 3 and 4.
c. Call the execve() system call to execute the ls program. By default, this program writes its output to the file with file descriptor 1 (standard output), that is, into the pipe.

The second child process must execute the more program; therefore, the process performs the following operations:
1. Calls dup2(3,0) to copy file descriptor 3 to file descriptor 0. From now on, file descriptor 0 represents the read channel of the pipe.
2. Call the close() system call twice to release file descriptors 3 and 4.
3. Call the execve() system call to execute the more program. By default, this program reads input from the file with file descriptor 0 (standard input), that is, from the pipe.
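A user-space sketch of the same plumbing (error handling omitted, and execlp() used instead of execve() for brevity) might look as follows:

#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];

    pipe(fd);                  /* fd[0]: read channel, fd[1]: write channel */

    if (fork() == 0) {         /* first child: runs "ls" */
        dup2(fd[1], 1);        /* standard output now refers to the pipe */
        close(fd[0]);
        close(fd[1]);
        execlp("ls", "ls", (char *)NULL);
        _exit(127);            /* reached only if execlp() fails */
    }

    if (fork() == 0) {         /* second child: runs "more" */
        dup2(fd[0], 0);        /* standard input now refers to the pipe */
        close(fd[0]);
        close(fd[1]);
        execlp("more", "more", (char *)NULL);
        _exit(127);
    }

    close(fd[0]);              /* the shell releases both descriptors */
    close(fd[1]);
    wait(NULL);
    wait(NULL);
    return 0;
}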

In this simple example, the pipe is used by exactly two processes. Because of the way pipes are implemented, however, a pipe can be used by any number of processes. Clearly, if two or more processes read from and write to the same pipe, these processes must explicitly synchronize their accesses by using a file-locking mechanism or IPC semaphores.

Besides the pipe() system call, many Unix systems provide two wrapper functions named popen() and pclose() that handle all the dirty work involved in using pipes. Once a pipe has been created with the popen() function, it can be used with the high-level I/O functions included in the C library (fprintf(), fscanf(), and so on). In Linux, both popen() and pclose() are included in the C library. The popen() function receives two parameters: the pathname filename of an executable file and a type string that specifies the direction of the data transfer. It returns a pointer to a FILE data structure.

The popen() function actually performs the following operations:
1. Creates a new pipe using the pipe() system call.
2. Forks a new process, which in turn performs the following operations:
a. If type is "r", copies the file descriptor associated with the pipe's write channel to file descriptor 1 (standard output); if type is "w", copies the file descriptor associated with the pipe's read channel to file descriptor 0 (standard input).
b. Closes the file descriptors returned by pipe().
c. Calls the execve() system call to execute the program specified by filename.
3. If type is "r", closes the file descriptor associated with the pipe's write channel; if type is "w", closes the file descriptor associated with the pipe's read channel.
4. Returns the address of the FILE file pointer, which refers to whichever file descriptor of the pipe is still open.

After popen() has been invoked, the parent process and the child process can exchange information through the pipe:

Steps 1, 2, 3, and 4 are performed by the parent process.
Steps a, b, and c are performed by the child process.

The parent process can use the FILE pointer returned by this function to read (if type is "r") or write (if type is "w") data. The program executed by the child process writes its data to standard output or reads its data from standard input. The pclose() function receives the file pointer returned by popen() as its parameter; it simply invokes the wait4() system call and waits for the termination of the process created by popen().
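For instance, a minimal sketch of this interface, reading the output of ls line by line through popen():

#include <stdio.h>

int main(void)
{
    char line[256];
    FILE *fp = popen("ls", "r");   /* "r": the parent reads what "ls" writes */

    if (fp == NULL)
        return 1;
    while (fgets(line, sizeof(line), fp) != NULL)
        printf("got: %s", line);
    return pclose(fp);             /* waits for the child created by popen() */
}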

Pipe data structures

Let us now consider the problem again at the system call level. Once a pipe has been created, a process accesses it with the read() and write() VFS system calls. Therefore, for each pipe, the kernel creates an inode object plus two file objects, one for reading and one for writing.

When a process wishes to read data from or write data to a pipe, it must use the appropriate file descriptor. When the inode refers to a pipe, its i_pipe field points to a pipe_inode_info structure as shown in Table 19-1.
[Table 19-1: The fields of the pipe_inode_info structure]
In addition to one inode object and two file objects, each pipe has its own set of pipe buffers. A pipe buffer is essentially a single page frame containing data that has been written to the pipe and is waiting to be read.

Prior to Linux 2.6.10, there was one pipe buffer per pipe. In the 2.6.11 kernel, the data buffers of pipes (and FIFOs) have been greatly changed. Each pipe can use 16 pipe buffers. This change greatly enhances the performance of user-mode applications that write large amounts of data to the pipe.

The bufs field of the pipe_inode_info data structure stores an array of 16 pipe_buffer objects, each object representing a pipe buffer. The fields of this object are shown in Table 19-2.
[Table 19-2: The fields of the pipe_buffer object]
The ops field points to the pipe buffer method table anon_pipe_buf_ops, which is a data structure of type pipe_buf_operations. Actually, it has three methods:

map
	Called before accessing the buffer's data. It simply calls kmap() on the pipe buffer's page frame, in case the pipe buffer is stored in high memory.
unmap
	Called when the buffer's data is no longer being accessed. It calls kunmap() on the pipe buffer's page frame.
release
	Called when the pipe buffer is being released.
	The method implements a one-page memory cache:
	the page frame released is not the one that stores the buffer,
	but the cached page frame pointed to by the tmp_page field of the pipe_inode_info data structure (if it is not NULL).
	The page frame that stored the buffer becomes the new cached page frame.

The 16 buffers can be thought of as an overall ring buffer: the writing process keeps appending data to this large buffer, while the reading process keeps moving data out.

The number of bytes written into the pipe and yet to be read across all the pipe buffers is the so-called pipe size. To improve efficiency, the data still to be read can be spread among several partially filled pipe buffers: in fact, each write operation may copy the data into a new, empty pipe buffer if the previous pipe buffer does not have enough free space to store the new data. Therefore, the kernel must keep track of:
1. The pipe buffer holding the next byte to be read, together with the offset of that byte inside the buffer. The index of this pipe buffer is stored in the curbuf field of the pipe_inode_info data structure, while the offset is stored in the offset field of the corresponding pipe_buffer object.
2. The first empty pipe buffer. Its index is obtained by adding (modulo 16) the index of the current pipe buffer, stored in the curbuf field of the pipe_inode_info data structure, and the number of pipe buffers holding valid data, stored in the nrbufs field.

To avoid race conditions on the pipe's data structures, the kernel uses the i_sem semaphore contained in the inode object.
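The following toy snippet (illustrative only; the structure and field names merely mimic curbuf and nrbufs and are not kernel declarations) shows the modulo-16 arithmetic just described:

#include <stdio.h>

#define PIPE_BUFFERS 16

struct toy_pipe_state {            /* invented names mimicking curbuf/nrbufs */
    unsigned int curbuf;           /* buffer holding the next byte to read */
    unsigned int nrbufs;           /* number of buffers holding valid data */
};

int main(void)
{
    struct toy_pipe_state p = { .curbuf = 14, .nrbufs = 3 };

    /* Data spans buffers 14, 15 and 0; the first empty one is (14 + 3) % 16 = 1. */
    printf("first empty buffer: %u\n", (p.curbuf + p.nrbufs) % PIPE_BUFFERS);
    return 0;
}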

pipefs special file system

Pipes are implemented as a set of VFS objects, so there is no corresponding disk image. In Linux 2.6, these VFS objects are organized into pipefs special file system to speed up their processing. Because this file system has no mount point in the system tree, users cannot see it at all.

However, with pipefs, pipes are fully integrated into the VFS layer, and the kernel can handle them as named pipes or FIFOs, which exist as files recognized by end users. The init_pipe_fs() function (usually executed during kernel initialization) registers the pipefs file system and mounts it:

struct file_system_type pipe_fs_type;
pipe_fs_type.name = "pipefs";
pipe_fs_type.get_sb = pipefs_get_sb;
pipe_fs_type.kill_sb = kill_anon_super;
register_filesystem(&pipe_fs_type);
pipe_mnt = do_kern_mount("pipefs", 0, "pipefs", NULL);

The mounted file system object representing the pipefs root directory is stored in the pipe_mnt variable.

Creating and destroying pipes

The pipe() system call is handled by the sys_pipe() function, which in turn calls the do_pipe() function. To create a new pipe, the do_pipe() function performs the following operations:

  1. Call the get_pipe_inode() function, which allocates an inode object for the pipe in the pipefs file system and initializes it. Specifically, this function performs the following operations:
    a. Allocates a new index node in the pipefs file system.
    b. Allocate the pipe_inode_info data structure and store its address in the i_pipe field of the index node.
    c. Set the curbuf and nrbufs fields of pipe_inode_info to 0, and clear all fields of the pipe buffer object in the bufs array to 0.
    d. Initialize the r_counter and w_counter fields of the pipe_inode_info structure to 1.
    e. Initialize the readers and writers fields of the pipe_inode_info structure to 1.
  2. Allocate a file object and a file descriptor for the read channel of the pipe, set the f_flags field of the file object to O_RDONLY, and initialize the f_op field with the address of the read_pipe_fops table.
  3. Allocate a file object and a file descriptor for the write channel of the pipe, set the f_flags field of the file object to O_WRONLY, and initialize the f_op field with the address of the write_pipe_fops table.
  4. Allocate a directory entry object and use it to join the two file objects and the inode object together;
    then, insert the new inode into the pipefs special file system.
  5. Return the two file descriptors to the user-mode process.

The process that issues a pipe() system call is initially the only process with read and write access to the new pipe. To indicate that the pipe actually has both a reader and a writer, the readers and writers fields of the pipe_inode_info data structure are initialized to 1. In general, each of these two fields is set to 1 as long as the corresponding pipe file object is still open by some process; the field is set to 0 once the corresponding file object has been released, because no process will access that end of the pipe any longer. Creating a new process does not increase the readers and writers fields, so they never rise above 1; however, the reference counters of all file objects still used by the parent process are increased. Therefore, the file objects are not released even when the parent process dies, and the pipe stays open for use by the child processes. Whenever a process invokes the close() system call on a file descriptor associated with a pipe, the kernel executes the fput() function on the corresponding file object, which decrements its reference counter. If the counter becomes 0, the function invokes the release method of the file operations.

Depending on whether the file is associated with a read channel or a write channel, the release method is implemented by either pipe_read_release() or pipe_write_release() function. Both functions call pipe_release(), which sets the readers field or writers field of the pipe_inode_info structure to 0.

pipe_release() also checks whether both readers and writers are equal to 0. If so, it invokes the release method of all pipe buffers, releasing every pipe buffer page frame to the buddy system; in addition, the function releases the cached page frame pointed to by the tmp_page field. Otherwise, when either the readers or the writers field is not 0, the function wakes up any processes sleeping in the pipe's wait queue so that they can recognize the change in the pipe's state.

Reading data from the pipe

A process that wishes to read data from a pipe issues a read() system call, specifying a file descriptor for the read end of the pipe. The kernel eventually calls the read method found in the file operation table associated with this file descriptor. In the case of pipes, the read method's entry in the read_pipe_fops table points to the pipe_read() function. pipe_read() is quite complex because the POSIX standard defines some requirements for pipe read operations. Table 19-3 summarizes the expected behavior of the read() system call that reads n bytes from a pipe with a pipe size (the number of bytes to be read in the pipe buffer) of p.
[Table 19-3: Reading n bytes from a pipe whose pipe size is p]

This system call may block the current process in two ways:
1. The pipe buffer is empty when the system call starts.
2. The pipe buffer does not contain all the requested bytes, and some writing process had previously been put to sleep while waiting for space in the buffers.

Note that read operations can be non-blocking. In this case, the read operation is completed as soon as all available bytes (even 0) are copied into the user address space (Note 3). Also note that the read() system call will return 0 only if the pipe is empty and no process is currently using the file object associated with the pipe's write channel.
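As an illustration of the non-blocking case, a user-mode process can set O_NONBLOCK on the read descriptor with fcntl(); the sketch below (minimal error handling, and an invented helper name) then distinguishes an empty pipe that still has writers from one whose write channel has been closed:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* fd is assumed to be the read-channel descriptor of an existing pipe. */
void read_nonblocking(int fd)
{
    char buf[128];
    ssize_t n;

    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
    n = read(fd, buf, sizeof(buf));
    if (n > 0)
        printf("read %zd bytes\n", n);
    else if (n == 0)
        printf("pipe empty and no writer left\n");
    else if (errno == EAGAIN)
        printf("pipe empty, but some writer still has it open\n");
}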

The pipe_read() function performs the following operations:

  1. Get the i_sem semaphore of the index node.
  2. Determine whether the pipe size stored in the nrbufs field of the pipe_inode_info structure is 0. If so, all pipe buffers are empty. At this point, you also need to determine whether the function must return or whether the process must be blocked while waiting until some other process writes some data to the pipe (see Table 19-3). The type of I/O operation (blocking or non-blocking) is indicated by the O_NONBLOCK flag in the f_flags field of the file object. If the current process must be blocked, the function performs the following operations:
    a. Call prepare_to_wait() to add current to the waiting queue of the pipe (wait field of the pipe_inode_info structure).
    b. Release the semaphore of the index node.
    c. Call schedule().

    d. Once current is awakened, call finish_wait() to remove it from the wait queue, acquire the i_sem index node semaphore again, and then jump back to step 2.
  3. Get the current pipe buffer index from the curbuf field of the pipe_inode_info data structure.
  4. Execute the map method of the pipe buffer.
  5. Copy the requested number of bytes (or, if smaller, the number of bytes available in the pipe buffer) from the pipe buffer to the user address space.
  6. Execute the unmap method of the pipe buffer.
  7. Update the offset and len fields of the corresponding pipe_buffer object.
  8. If the pipe buffer has been emptied (the len field of the pipe_buffer object is now 0), invoke the release method of the pipe buffer to free the corresponding page frame, set the ops field of the pipe_buffer object to NULL, advance (modulo 16) the current pipe buffer index stored in the curbuf field of the pipe_inode_info data structure, and decrement the counter of non-empty pipe buffers stored in the nrbufs field.
  9. If all requested bytes have been copied, jump to step 12.
  10. Not all the requested bytes have yet been copied to the user-mode address space. If the pipe size is greater than 0 (the nrbufs field of pipe_inode_info is not zero), jump back to step 3.
  11. There are no bytes left in the pipe buffers. If at least one writing process is sleeping (that is, the waiting_writers field of the pipe_inode_info data structure is greater than 0) and the read operation is blocking, call wake_up_interruptible_sync() to wake up all processes sleeping in the pipe's wait queue, and then jump back to step 2.
  12. Release the i_sem semaphore of the index node.
  13. Call wake_up_interruptible_sync() to wake up all writer processes sleeping in the pipe's wait queue.
  14. Return the number of bytes copied to the user address space.

Writing data to the pipe

A process that wishes to write data to a pipe issues a write() system call, specifying a file descriptor for the write end of the pipe. The kernel fulfills this request by calling the write method of the appropriate file object; the corresponding entry in the write_pipe_fops table points to the pipe_write() function.

Table 19-4 summarizes the behavior, as defined by the POSIX standard, of a write() system call that requests the writing of n bytes to a pipe whose buffers contain u unused bytes.

Specifically, the standard requires that write operations involving a small number of bytes be executed atomically. More precisely, if two or more processes are concurrently writing to a pipe, each write operation involving fewer than 4096 bytes (the pipe buffer size) must finish without being interleaved with write operations of other processes to the same pipe. However, write operations involving more than 4096 bytes may be split, and may also force the calling process to sleep.
[Table 19-4: Writing n bytes to a pipe with u unused bytes in its buffers]
Also, if the pipe has no reading process (that is, if the readers field of the pipe's inode object has the value 0), any write operation performed on the pipe fails. In this case, the kernel sends a SIGPIPE signal to the writing process and terminates the write() system call with the -EPIPE error code, which yields the familiar "Broken pipe" message.
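A small user-space sketch of this failure mode (it ignores SIGPIPE so that write() returns the error instead of killing the process):

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd[2];

    signal(SIGPIPE, SIG_IGN);     /* otherwise the default action kills us */
    pipe(fd);
    close(fd[0]);                 /* no reader is left for this pipe */

    if (write(fd[1], "x", 1) == -1 && errno == EPIPE)
        perror("write");          /* prints "write: Broken pipe" */
    return 0;
}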

The pipe_write() function performs the following operations:

  1. Get the i_sem semaphore of the index node.
  2. Checks whether the pipe has at least one reading process. If not, send a SIGPIPE signal to the current process, release the index node semaphore and return the -EPIPE value.
  3. Add (modulo 16) the curbuf and nrbufs fields of the pipe_inode_info data structure and subtract 1 to get the index of the last written pipe buffer. If that pipe buffer has enough free space to store the bytes to be written, copy the data into it:
    a. Execute the map method of the pipe buffer.
    b. Copy all bytes to the pipe buffer.
    c. Execute the unmap method of the pipe buffer.
    d. Update the len field of the corresponding pipe_buffer object.
    e. Skip to step 11.
  4. If the nrbufs field of the pipe_inode_info data structure is equal to 16, it indicates that there is no free pipe buffer to store the bytes to be written. In this case:
    a. If the write operation is non-blocking, jump to step 11, terminating the function and returning the -EAGAIN error code.
    b. If the write operation is blocking, add 1 to the waiting_writers field of the pipe_inode_info structure, call prepare_to_wait() to add current to the pipe's wait queue (the wait field of the pipe_inode_info structure), release the index node semaphore, and call schedule(). Once awakened, call finish_wait() to remove current from the wait queue, reacquire the index node semaphore, decrement the waiting_writers field, and then jump back to step 4.
  5. Now that there is at least one empty buffer, add the curbuf and nrbufs fields of the pipe_inode_info data structure to get the first empty pipe buffer index.
  6. If the tmp_page field of the pipe_inode_info data structure is NULL, allocate a new page frame from the buddy system; otherwise, use the cached page frame it points to.
  7. Copy up to 4096 bytes from user-mode address space to the page frame (temporarily mapped in kernel-mode linear address space if necessary).
  8. Update the fields of the pipe_buffer object associated with the pipe buffer: set the page field to the address of the page frame descriptor, the ops field to the address of the anon_pipe_buf_ops table, the offset field to 0, and the len field to the number of bytes written.
  9. Increase the value of the non-empty pipe buffer counter, which is stored in the nrbufs field of the pipe_inode_info structure.
    10. If all requested bytes have not been written, skip to step 4.
    11. Release the index node semaphore.
    12. Wake up all reading processes sleeping on the pipe waiting queue.
    13. Returns the number of bytes written to the pipe buffer (if it cannot be written, an error code is returned).

FIFO

Although pipes are a simple, flexible, and efficient communication mechanism, they have one major drawback: there is no way to open an already existing pipe. This makes it impossible for two arbitrary processes to share the same pipe, unless the pipe was created by a common ancestor process. This drawback is substantial for many applications.

For example, consider a database engine server that continuously polls the client process that issued the query request and returns the results of the database query to the client process.

Each interaction between the server and a given client could be handled by a pipe. However, client processes are usually created on demand by a shell command when a user explicitly queries the database; therefore, the server process and the client processes cannot easily share a pipe.

To overcome this limitation, Unix systems introduce a special file type called the named pipe, or FIFO (which stands for "first in, first out": the first byte written into the special file is also the first byte that is read).

FIFOs are very similar to pipes in these respects: they do not own disk blocks in the file system, and an open FIFO is always associated with a kernel buffer that temporarily stores the data exchanged by two or more processes. Thanks to its disk inode, however, a FIFO can be accessed by any process, because the FIFO file name is included in the system's directory tree.

Therefore, in the previous database example, communication between the server and client could easily use FIFOs instead of pipes. The server creates a FIFO at startup, which is used by client programs to make their own requests. Each client program creates an additional FIFO before establishing a connection and includes the name of this FIFO in its initial request to the server. The server program can then write the query results into this FIFO. In Linux 2.6, FIFOs and pipes are almost identical and use the same pipe_inode_info structure.

In fact, the read and write operations of a FIFO are implemented by the pipe_read() and pipe_write() functions described in the earlier sections "Reading data from the pipe" and "Writing data to the pipe". Actually, there are only two significant differences:
1. FIFO index nodes appear in the system directory tree rather than in the pipefs special file system.
2. A FIFO is a bidirectional communication channel; that is, a FIFO may be opened in read/write mode.

To complete our description, therefore, we only need to explain how a FIFO is created and opened.

Creating and opening a FIFO

A process creates a FIFO by issuing the mknod() system call, passing as parameters the pathname of the new FIFO and the value S_IFIFO (0x1000) logically ORed with the permission bit mask of the new file. POSIX introduces a system call named mkfifo() specifically to create FIFOs. This call is implemented in Linux, as in System V Release 4, as a C library function that invokes mknod().

Once created, a FIFO can be accessed through the ordinary open(), read(), write(), and close() system calls. However, the VFS handles FIFOs in a special way, because the FIFO inode and file operations are dedicated and do not depend on the file system in which the FIFO resides. The POSIX standard defines the behavior of the open() system call on FIFOs; this behavior is inherently related to the requested access type, the kind of I/O operation (blocking or non-blocking), and the presence of other processes accessing the FIFO.
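A brief sketch of these calls; the path /tmp/myfifo and the 0644 permission mask are arbitrary choices for the example:

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    char buf[64];
    int fd;
    ssize_t n;

    /* Roughly equivalent to mknod("/tmp/myfifo", S_IFIFO | 0644, 0). */
    if (mkfifo("/tmp/myfifo", 0644) == -1)
        perror("mkfifo");

    /* Blocks until some other process opens the FIFO for writing. */
    fd = open("/tmp/myfifo", O_RDONLY);
    if (fd == -1)
        return 1;

    n = read(fd, buf, sizeof(buf));
    if (n > 0)
        printf("read %zd bytes from the FIFO\n", n);

    close(fd);
    unlink("/tmp/myfifo");
    return 0;
}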

A process can open a FIFO for read operations, write operations, or read and write operations. According to these three situations, the file operations related to the corresponding file objects are set to specific methods.

When a process opens a FIFO, VFS performs some of the same operations as a device file. The inode object associated with the open FIFO is initialized by the file system-dependent read_inode superblock method.

This method always checks whether the inode on disk represents a special file and calls the init_special_inode() function if necessary. This function in turn sets the i_fop field of the index node object to the address of the def_fifo_fops table.

Subsequently, the kernel sets the file operation table of the file object to def_fifo_fops and executes its open method, which is implemented by fifo_open(). The fifo_open() function initializes a data structure dedicated to FIFO; specifically, it performs the following operations:

  1. Get the i_sem index node semaphore.
  2. Check the i_pipe field of the inode object; if it is NULL, allocate and initialize a new pipe_inode_info structure, as in steps 1b through 1e of the section "Creating and destroying pipes" earlier in this chapter.
  3. According to the access mode specified in the parameters of the open() system call, initialize the f_op field of the file object with the address of the appropriate file operation table (as shown in Table 19-5).
    [Table 19-5: FIFO file operation tables for the three access modes]
  4. If the access mode is either read-only or read/write, add 1 to the readers field and r_counter field of the pipe_inode_info structure. Additionally, if the access mode is read-only and there are no other reading processes, any writing processes waiting on the queue are awakened.
  5. If the access mode is either write-only or read/write, add 1 to the writers field and w_counter field of the pipe_inode_info structure. Additionally, if the access mode is write-only and there are no other writing processes, any reading processes waiting on the queue are awakened.
  6. If there is no reading process or no writing process, determine whether the function should block or return an error code and terminate (as shown in Table 19-6).
    [Table 19-6: Blocking and error behavior of fifo_open()]
  7. Release the index node semaphore and terminate, returning 0 (success).

The three dedicated FIFO file operation tables differ mainly in the implementation of the read and write methods. If the access type allows read operations, the read method is implemented by the pipe_read() function; otherwise, it is implemented by bad_pipe_r(), which simply returns an error code. If the access type allows write operations, the write method is implemented by the pipe_write() function; otherwise, it is implemented by bad_pipe_w(), which simply returns an error code.

System V IPC

IPC is an abbreviation for Interprocess Communication. It usually refers to a set of mechanisms that allow a user-mode process to:
1. Synchronize with other processes by means of semaphores.
2. Send messages to, and receive messages from, other processes.
3. Share a region of memory with other processes.

System V IPC first appeared in a development Unix variant called "Columbus Unix" and was later adopted by AT&T's System III. It is now found on most Unix systems, including Linux.

The IPC data structure is dynamically created when a process requests an IPC resource (semaphore, message queue, or shared memory area). Every IPC resource is persistent: it resides in memory forever (until the system is shut down) unless explicitly released by a process. IPC resources can be used by any process, including those that do not share resources created by ancestor processes.

Since a process may require multiple IPC resources of the same type, each new resource is identified using a 32-bit IPC keyword, which is similar to the file path name in the system's directory tree. Each IPC resource has a 32-bit IPC identifier, which is somewhat similar to the file descriptor associated with the open file. IPC identifiers are assigned to IPC resources by the kernel and are unique within the system, while IPC keywords can be freely selected by the programmer.

When two or more processes want to communicate through an IPC resource, these processes must reference the IPC identifier of the resource.

Using IPC resources

Depending on whether the new resource is a semaphore, message queue or shared memory area, the semget(), msgget() or shmget() function is called respectively to create an IPC resource.

The main purpose of these three functions is to derive the corresponding IPC identifier from the IPC keyword (passed as the first parameter), and the process can later use this identifier to access the resource.

If there is no IPC resource associated with the IPC keyword, create a new resource. If all goes well, the function returns a positive IPC identifier; otherwise, it returns an error code as shown in Table 19-7.
[Table 19-7: Error codes returned while requesting an IPC identifier]
Suppose two independent processes want to share a common IPC resource. This can be achieved using two methods:
1. The two processes uniformly use fixed, predefined IPC keywords. This is the simplest case and works well for any complex application implemented by many processes. However, another unrelated program may also use the same IPC keyword. In this case, the IPC function may be called successfully but return the IPC identifier of the wrong resource (Note 5).

2. A process calls the semget(), msgget(), or shmget() function by specifying IPC_PRIVATE as its IPC keyword. A new IPC resource is thus allocated, and the process can either communicate its IPC identifier to the other processes in the application (Note 6) or fork the other processes itself. This approach ensures that the IPC resource is not accidentally used by other applications.

The last parameter of the semget(), msgget(), and shmget() functions can include three flags.
IPC_CREAT specifies that the IPC resource must be created if it does not already exist; IPC_EXCL specifies that the function must fail if the resource already exists and the IPC_CREAT flag is set; IPC_NOWAIT specifies that the process should never block when accessing the IPC resource (typically, when fetching a message or acquiring a semaphore).

Even if a process uses the IPC_CREAT and IPC_EXCL flags, there is no way to guarantee exclusive access to an IPC resource because other processes may also reference the resource with their own IPC identifiers.
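A short sketch of the two approaches described above; the key value 0x1234 and the 0600 permission mask are arbitrary choices for the example:

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

int main(void)
{
    /* Fixed, predefined key: create a set of 2 primitive semaphores,
     * failing if a resource with this key already exists. */
    int id1 = semget((key_t)0x1234, 2, IPC_CREAT | IPC_EXCL | 0600);
    if (id1 == -1)
        perror("semget(fixed key)");

    /* Private resource: its identifier must then be passed to the
     * cooperating processes by some other means. */
    int id2 = semget(IPC_PRIVATE, 2, 0600);
    if (id2 == -1)
        perror("semget(IPC_PRIVATE)");

    printf("identifiers: %d %d\n", id1, id2);
    return 0;
}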

To minimize the risk of incorrectly referencing the wrong resource, the kernel does not reuse an IPC identifier as soon as it becomes free. In contrast, the IPC identifier assigned to a resource is always greater than the identifier assigned to the previous resource of the same type (the only exception occurs when the 32-bit IPC identifier overflows).

Each IPC identifier is computed by combining the slot usage sequence number relative to the resource type, an arbitrary slot index of the allocated resource, and the maximum number of allocatable resources chosen by the kernel.

If we let s denote the slot usage sequence number, M the maximum number of allocatable resources, and i the slot index, where 0 ≤ i < M, then the identifier of each IPC resource is computed by the following formula: IPC identifier = s × M + i. In Linux 2.6, the value of M is set to 32768 (the IPCMNI macro). The slot usage sequence number s is initialized to 0 and incremented by 1 at every resource allocation. When s reaches a predefined threshold, which depends on the IPC resource type, it restarts from 0.
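For example, with M equal to 32768, the resource allocated in slot i = 2 while the slot usage sequence number s is 3 receives the IPC identifier 3 × 32768 + 2 = 98306. When that slot is later reused, s has been incremented in the meantime, so the new resource gets a different identifier even though it occupies the same slot.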

Each type of IPC resource (semaphore, message queue, and shared memory area) has an ipc_ids data structure. The fields included in this structure are shown in Table 19-8.
[Table 19-8: The fields of the ipc_ids data structure]
The ipc_id_ary data structure has two fields: p and size. The p field is an array of pointers to kern_ipc_perm data structures, each structure corresponding to an allocable resource. The size field is the size of this array.

Initially, the array stores 1, 16, or 128 pointers, respectively, for shared memory regions, message queues, and semaphores. The kernel dynamically enlarges the array when it becomes too small, but each resource type has an upper limit. The system administrator can change these limits by modifying the files /proc/sys/kernel/sem, /proc/sys/kernel/msgmni, and /proc/sys/kernel/shmmni.

Each kern_ipc_perm data structure is associated with an IPC resource and contains the fields shown in Table 19-9. The uid and gid fields store the user and group identifiers of the resource's current owner, while cuid and cgid store those of the resource's creator. The mode bit mask includes six flags that store the read and write access permissions for the resource's owner, group, and other users.

The IPC access permissions are similar to the file access permissions introduced in the "Access Permissions and File Mode" section of Chapter 1. The only difference is that there is no execution permission flag.
[Table 19-9: The fields of the kern_ipc_perm structure]
The kern_ipc_perm data structure also includes a key field and a seq field. The former refers to the IPC keyword of the corresponding resource, and the latter stores the position sequence number used to calculate the IPC identifier of the resource.

The semctl(), msgctl(), and shmctl() functions can all be used to handle IPC resources. The IPC_SET subcommand allows a process to change the owner's user identifier and group identifier as well as the permission bit mask in the ipc_perm data structure. The IPC_STAT and IPC_INFO subcommands obtain information related to resources. Finally, the IPC_RMID subcommand releases the IPC resources.

Depending on the type of IPC resource, other dedicated subcommands can also be used. Once an IPC resource is created, the process can operate on this resource through some dedicated functions. A process can execute the semop() function to obtain or release an IPC semaphore. When a process wishes to send or receive an IPC message, it uses the msgsnd() and msgrcv() functions respectively. Finally, the process can use the shmat() and shmdt() functions respectively to attach a shared memory area to its own address space or to cancel this attachment relationship.

ipc() system call

All IPC functions must be implemented through appropriate Linux system calls. Actually, on the 80x86 architecture, there is just one IPC system call, named ipc(). When a process calls an IPC function, say msgget(), it actually invokes a wrapper function in the C library, which in turn calls the ipc() system call by passing to it all the parameters of msgget() plus an appropriate subcommand code, in this case MSGGET. The sys_ipc() service routine examines the subcommand code and calls the kernel function that implements the requested service. The ipc() "multiplexer" system call is a legacy of earlier Linux versions, which included the IPC code in a dynamic module (see Appendix 2). It did not make much sense to reserve several system call entries in the system_call table for a kernel component that might not be implemented, so the kernel designers adopted this multiplexing approach. Nowadays, System V IPC is no longer compiled as a dynamic module, so there is no real reason to use a single IPC system call. As a matter of fact, Linux provides one system call for each IPC function on HP's Alpha architecture and on Intel's IA-64.

IPC semaphore

IPC semaphores are very similar to the kernel semaphores introduced in Chapter 5: Both are counters used to provide controlled access to data structures shared by multiple processes.

If the protected resource is available, the semaphore value is positive; if the protected resource is currently unavailable, the semaphore value is 0. A process that wants to access the resource tries to decrement the semaphore value, but the kernel blocks the process until the operation on the semaphore can yield a nonnegative value.

When the process releases the protected resource, the value of the semaphore is increased by 1; during this process, all other processes that are waiting for this semaphore are awakened.

In fact, the processing of IPC semaphores is more complicated than kernel semaphores for two main reasons:
1. Each IPC semaphore is a set of one or more semaphore values, unlike a kernel semaphore, which has just one value. This means that the same IPC resource can protect multiple independent, shared data structures. The number of semaphore values in each IPC semaphore must be specified as a parameter of the semget() function when the resource is allocated.

From now on, we will refer to the counters inside an IPC semaphore as primitive semaphores. There are bounds both on the number of IPC semaphore resources and on the number of primitive semaphores inside a single IPC semaphore resource. The default values are 128 for the former and 250 for the latter;

However, system administrators can easily modify these two boundaries through the /proc/sys/kernel/sem file.

2. System V IPC semaphores provide a fail-safe mechanism for situations in which a process dies without being able to undo the operations that it previously issued on a semaphore. When a process chooses to use this mechanism, the resulting operations are so-called cancelable (undoable) semaphore operations. When the process dies, the IPC semaphores can be restored to the values they would have had if the operations had never been started. This helps prevent other processes that use the same semaphores from remaining blocked forever just because a dying process could not manually undo its semaphore operations.

First, let us briefly sketch the typical steps performed by a process that wants to access one or more resources protected by an IPC semaphore:

  1. Call the semget() wrapper function to obtain the IPC semaphore identifier, specifying as a parameter the IPC keyword of the IPC semaphore that protects the shared resources. If the process wants to create a new IPC semaphore, it also specifies the IPC_CREAT or IPC_PRIVATE flag and the required number of primitive semaphores.
  2. Call the semop() wrapper function to test and decrement all the primitive semaphore values involved. If all the tests succeed, the decrements are performed, the function terminates, and the process is allowed to access the protected resources. If some semaphores are in use, the process is usually suspended until some other process releases the resources. The function receives as parameters the IPC semaphore identifier, an array of integers specifying the operations to be atomically performed on the primitive semaphores, and the number of such operations. As an option, the process may also specify the SEM_UNDO flag, which tells the kernel to undo those operations if the process exits without releasing the primitive semaphores.
  3. When relinquishing the protected resources, call the semop() function again to atomically increment all the primitive semaphores involved (a user-space sketch of these two calls follows this list).
  4. As an option, call the semctl() wrapper function, specifying the IPC_RMID command in its parameters, to remove the IPC semaphore from the system.
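A user-space sketch of steps 2 and 3, assuming semid was obtained with semget() as in step 1 and that primitive semaphore 0 is used as a mutex initialized to 1 (the helper names are invented for the example):

#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Decrement primitive semaphore 0 by 1, blocking until this can be done,
 * and register the operation for automatic undo when the process exits. */
void sem_acquire(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
    semop(semid, &op, 1);
}

/* Increment primitive semaphore 0 by 1 when relinquishing the resource. */
void sem_release(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = +1, .sem_flg = SEM_UNDO };
    semop(semid, &op, 1);
}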

Now we can discuss how the kernel implements IPC semaphores. The relevant data structure is shown in Figure 19-1. The sem_ids variable stores the ipc_ids data structure of the IPC semaphore resource type; the corresponding ipc_id_ary data structure contains an array of pointers, which points to the sem_array data structure, and each element corresponds to an IPC semaphore resource.
[Figure 19-1: IPC semaphore data structures]
Formally, this array holds pointers to kern_ipc_perm data structures, but each structure is nothing more than the first field of the sem_array data structure. All fields of the sem_array data structure are shown in Table 19-10.
[Table 19-10: The fields of the sem_array data structure]
The sem_base field points to an array of sem data structures, one element for each IPC primitive semaphore. The sem data structure includes only two fields:

semval
	The value of the semaphore's counter.
sempid
	The PID of the last process that accessed the semaphore. A process can query this value using the semctl() wrapper function.

Cancelable semaphore operations

If a process aborts suddenly, it cannot undo the operations that it started (for instance, release the semaphores it reserved); by defining these operations as cancelable, however, the process lets the kernel return the semaphores to a consistent state and allows other processes to proceed. A process can request a cancelable operation by specifying the SEM_UNDO flag in the semop() function.

To help the kernel reverse the cancelable operations performed by a given process on a given IPC semaphore resource, the relevant information is stored in a sem_undo data structure. It essentially contains the IPC identifier of the semaphore and an array of integers representing the changes to the primitive semaphores' values caused by all the cancelable operations performed by the process.

A simple example illustrates how such sem_undo elements are used. Consider a process that uses an IPC semaphore resource containing four primitive semaphores, and suppose that it calls semop() to increment the first counter by 1 and decrement the second by 2. If the function is called with the SEM_UNDO flag, the integer in the first array element of the sem_undo data structure is decremented by 1, the second is incremented by 2, and the other two are left unchanged.

Further cancelable operations performed by the same process on this IPC semaphore change the integers stored in the sem_undo structure accordingly. When the process exits, any nonzero value in this array corresponds to one or more operations on the matching primitive semaphore that were never undone;

The kernel simply adds this nonzero value to the corresponding primitive semaphore's counter, thereby canceling those operations. In other words, the changes made by the aborted process are rolled back, while the changes made by other processes are still reflected in the state of the semaphores.

For each process, the kernel keeps track of all the semaphore resources handled with cancelable operations, so that it can roll them back if the process exits unexpectedly. Furthermore, for each semaphore, the kernel must keep track of all the sem_undo structures that refer to it,
so that it can quickly access them whenever a process uses semctl() to force an explicit value into a primitive semaphore's counter or to destroy an IPC semaphore resource.

It is precisely because of the two linked lists (which we call the per-process linked list and the per-semaphore linked list) that the kernel can handle these tasks efficiently. The first linked list records all semaphores handled by a given process with cancelable operations. The second linked list records all processes that operate on a given semaphore with a cancelable operation. More precisely:
1. Each process linked list contains all sem_undo data structures, which correspond to IPC semaphores for which the process has performed a cancelable operation. The sysvsem.undo_list field of the process descriptor points to a sem_undo_list type data structure, which in turn contains a pointer pointing to the first element of the linked list.

The proc_next field of each sem_undo data structure points to the next element of the linked list. Because processes cloned by passing the CLONE_SYSVSEM flag to the clone() system call share the same sem_undo_list descriptor, they also share the same list of cancelable semaphore operations.

2. Each per-semaphore linked list contains all the sem_undo data structures corresponding to the processes that have performed cancelable operations on that semaphore. The undo field of the sem_array data structure points to the first element of the linked list, and the id_next field of each sem_undo data structure points to the next element of the linked list.

The per-process linked list is used only when a process terminates. The exit_sem() function, invoked by do_exit(), walks the list and, for every IPC semaphore touched by the process, undoes the effect of its outstanding cancelable operations.

In contrast, the per-semaphore linked list is used when a process calls the semctl() function to force an explicit value into a primitive semaphore. The kernel sets to 0 the corresponding elements of the arrays in all the sem_undo data structures that refer to that IPC semaphore resource, because undoing a cancelable operation on the primitive semaphore would no longer make sense. In addition, the per-semaphore list is used when an IPC semaphore is destroyed: all the related sem_undo data structures are invalidated by setting their semid field to -1 (Note 8).

Queue of pending requests

The kernel associates a queue of pending requests with each IPC semaphore to identify the processes that are waiting on one (or more) of the semaphores in the array.
This queue is a doubly linked list of sem_queue data structures, whose fields are shown in Table 19-11. The first and last pending requests in the queue are pointed to, respectively, by the sem_pending and sem_pending_last fields of the sem_array structure.
This last field allows the list to be easily handled as a FIFO: new pending requests are appended to the end of the list so that they are serviced later.
The most important fields of a pending request are nsops, which stores the number of primitive semaphores involved in the pending operation, and sops, which points to an array of integers describing each semaphore operation. The sleeper field stores the address of the descriptor of the sleeping process that issued the request.
[Table 19-11: The fields of the sem_queue data structure]

Figure 19-1 shows an IPC semaphore with three pending requests. The second and third requests involve cancelable operations, so the undo field of the sem_queue data structure points to the corresponding sem_undo structure; the undo field of the first pending request is NULL because the corresponding operation is non-cancelable.

IPC message

Processes can communicate with each other through IPC messages. Each message generated by a process is sent to an IPC message queue, where the message remains in the queue until another process reads it.
A message consists of a fixed-size header and a variable-length body. An integer value (message type) can be used to identify the message, which allows the process to selectively obtain messages from the message queue (Note 9). Whenever a process reads a message from the IPC message queue, the kernel deletes the message; therefore, only one process can receive a given message.

In order to send a message, a process calls the msgsnd() function, passing it the following parameters:
1. The IPC identifier of the destination message queue
2. The size of the message body
3. The address of a user-mode buffer that contains the message type followed by the message text

To retrieve a message, a process calls the msgrcv() function, passing it the following parameters:
1. The IPC identifier of the IPC message queue resource
2. The pointer to a user-mode buffer into which the message type and the message body should be copied
3. The size of the buffer
4. A value t that specifies which message should be retrieved

If the value of t is 0, the first message in the queue is returned. If t is a positive number, the first message in the queue whose type is equal to t is returned. Finally, if t is a negative number, the first message whose message type is the lowest value less than or equal to the absolute value of t is returned.
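A compact sketch of the two calls; the message layout (a long type followed by the text) follows the convention described above, and the type value 1 is an arbitrary choice:

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct my_msg {
    long mtype;                    /* message type, must be greater than 0 */
    char mtext[64];                /* message body */
};

int main(void)
{
    int qid = msgget(IPC_PRIVATE, 0600);
    struct my_msg out = { .mtype = 1 }, in;

    strcpy(out.mtext, "hello");
    msgsnd(qid, &out, sizeof(out.mtext), 0);

    msgrcv(qid, &in, sizeof(in.mtext), 1, 0);   /* t = 1: first message of type 1 */
    printf("received: %s\n", in.mtext);

    msgctl(qid, IPC_RMID, NULL);   /* release the persistent resource */
    return 0;
}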

To avoid resource exhaustion, IPC message queue resources are limited in three respects: the number of IPC message queues (16 by default), the size of each message (8,192 bytes by default), and the total size of the messages in a queue (16,384 bytes by default). As before, however, the system administrator can tune these values by modifying the files /proc/sys/kernel/msgmni, /proc/sys/kernel/msgmnb, and /proc/sys/kernel/msgmax.

The data structures related to IPC message queues are shown in Figure 19-2. The msg_ids variable stores the ipc_ids data structure of the IPC message queue resource type; the corresponding ipc_id_ary data structure contains an array of pointers to msg_queue data structures, one element for each IPC message queue resource.

Formally, the array stores pointers to the kern_ipc_perm data structure, but each such structure is nothing more than the first field of the msg_queue data structure. All fields of the msg_queue data structure are shown in Table 19-12.
[Table 19-12: The fields of the msg_queue data structure]
The most important field is q_messages, which represents the head (that is, the first dummy element) of a doubly linked circular list containing all the messages currently in the queue. Each message is stored separately in one or more dynamically allocated pages.
The beginning of the first page stores the message header, a data structure of type msg_msg; its fields are shown in Table 19-13. The m_list field points to the previous and next messages in the queue. The message text starts right after the msg_msg descriptor; if the message is larger than 4,072 bytes (the page size minus the size of the msg_msg descriptor), it continues on another page, whose address is stored in the next field of the msg_msg descriptor.

The second page frame starts with a descriptor of type msg_msgseg. This descriptor contains only a next pointer, which stores the address of an optional third page, and so on.
[Table 19-13: The fields of the msg_msg data structure]

When the message queue is full (either the maximum number of messages or the maximum total number of bytes in the queue has been reached), a process trying to enqueue a new message may be blocked.
The q_senders field of the msg_queue data structure is the head of the linked list formed by the descriptors of all blocked sending processes. When the message queue is empty (or when a message type specified by the process is not in the queue), the receiving process will also be blocked.

The q_receivers field of the msg_queue data structure is the head of a list of msg_receiver data structures, one for each blocked receiving process. Each of these structures essentially contains a pointer to the process descriptor, a pointer to the msg_msg structure of the message, and the requested message type.

IPC shared memory

The most useful IPC mechanism is shared memory, which allows two or more processes to access common data structures by placing them in an IPC shared memory region.

If a process wants to access this data structure stored in a shared memory area, it must add a new memory area to its own address space, which will map the page frame related to this shared memory area.

Such page frames can easily be handled by the kernel through demand paging. As with semaphores and message queues, the shmget() function is called to obtain the IPC identifier of a shared memory area, and to create the shared memory area if it does not already exist.

Call the shmat() function to "attach" a shared memory area to a process. This function takes the identifier of the IPC shared memory resource as a parameter and attempts to add a shared memory area to the address space of the calling process.

The calling process can obtain the starting linear address of this memory area, but this address is usually not important. Each process accessing this shared memory area can use a different address in its own address space.

The shmat() function does not modify the process's page table. We'll see later how the kernel handles this when a process attempts to access a page that belongs to a new memory region. Call the shmdt() function to "detach" the shared memory area specified by the IPC identifier, that is, delete the corresponding shared memory area from the process's address space.
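A minimal sketch of the attach/detach sequence (error handling mostly omitted; the 4096-byte size is an arbitrary choice):

#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    /* Create a 4096-byte shared memory region. */
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);

    /* Attach it; the kernel chooses the starting linear address. */
    char *addr = shmat(shmid, NULL, 0);
    if (addr == (char *)-1)
        return 1;

    strcpy(addr, "shared data");   /* first access triggers demand paging */

    shmdt(addr);                   /* detach the region from this process */
    shmctl(shmid, IPC_RMID, NULL); /* otherwise the region stays in memory */
    return 0;
}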

Recall that IPC shared memory resources are persistent: even if no process is currently using them, the corresponding pages cannot be discarded, although they can be swapped out. As with the other types of IPC resources, and in order to prevent user-mode processes from overusing memory, there are limits on the number of IPC shared memory areas allowed (4,096 by default), on the size of each shared segment (32 MB by default), and on the maximum total size of all shared segments (8 GB by default).

However, system administrators can still adjust these values by modifying the /proc/sys/kernel/shmmni, /proc/sys/kernel/shmmax and /proc/sys/kernel/shmall files respectively.
[Figure 19-3: IPC shared memory data structures]

Figure 19-3 shows the data structures associated with IPC shared memory areas. The shm_ids variable stores the ipc_ids data structure of the IPC shared memory resource type; the corresponding ipc_id_ary data structure contains an array of pointers to shmid_kernel data structures, one element for each IPC shared memory resource.

Formally, this array stores pointers to the kern_ipc_perm data structure, but each such structure is nothing more than the first field of the shmid_kernel data structure. All fields of the shmid_kernel data structure are shown in Table 19-14.
[Table 19-14: The fields of the shmid_kernel data structure]

The most important field is shm_file, which stores the address of the file object. This reflects the tight integration of IPC shared memory and VFS in Linux 2.6. Specifically, each IPC shared memory area is associated with an ordinary file belonging to the shm special file system.

Because the shm file system has no mount point in the system directory tree, users cannot open and access its files through ordinary VFS system calls.

However, whenever a process "attaches" a memory segment, the kernel calls do_mmap() and creates a new shared memory mapping of the file in the process's address space. Therefore, files belonging to the shm special file system have only one file object method mmap, which is implemented by the shm_mmap() function.
As shown in Figure 19-3, the memory area corresponding to the IPC shared memory area is described by the vm_area_struct object;
its vm_file field points back to the file object of the special file, and the special file in turn refers to the directory entry object and the index node object.

The index node number stored in the i_ino field of the index node is actually the location index of the IPC shared memory area. Therefore, the index node object indirectly refers to the shmid_kernel descriptor.

Likewise, for any shared memory mapping, the page frame is contained in the page cache through the address_space object, which is contained in the index node and referenced by the i_mapping field of the index node (you can also see Figure 16-2).

If the page frame belongs to the IPC shared memory area, the method of the address_space object is stored in the global variable shmem_aops.

Swap out pages in the IPC shared memory area

The kernel must be careful when swapping out pages included in shared memory areas, and the role of the swap cache is crucial here.
Because an IPC shared memory area maps a special inode that has no image on disk, its pages are swappable; therefore, in order to reclaim a page of an IPC shared memory area, the kernel must write it into the swap area. Because the IPC shared memory area is persistent, its pages must be retained even when the segment is not attached to any process.

So even if no process is using these pages, the kernel cannot simply delete them. Let's see how PFRA reclaims IPC shared memory area page frames. Until the shrink_list() function processes the page, it is the same as described in the "Memory Shortage Reclamation" section of Chapter 17. Because this function does not do any checks for the IPC shared memory area, it calls the try_to_unmap() function to remove every reference to the page frame from the user-space address space. As described in the "Reverse Mapping" section of Chapter 17, the corresponding page table entry is deleted.

Then, the shrink_list() function checks the PG_dirty flag of the page and calls pageout() (page frames in the IPC shared memory area are always marked dirty when allocated, so pageout() is always called). The pageout() function calls the writepage method of the address_space object of the mapped file.

The shmem_writepage() function implements the writepage method for IPC shared memory pages. It effectively allocates a new page slot in the swap area and then moves the page from the page cache to the swap cache (in practice changing the address_space object that owns the page).

The function also stores the swapped-out page identifier in the shmem_inode_info structure that embeds the index node object of the IPC shared memory region, and it sets the PG_dirty flag of the page once more.

As shown in Table 17-5 in Chapter 17, the shrink_list() function checks the PG_dirty flag and interrupts the reclaiming procedure by leaving the page in the inactive list.

Sooner or later, the PFRA will also deal with this page frame. shrink_list() calls pageout() again to try to flush the page to disk.

But this time the page is already in the swap cache, so its owner is the address_space object of the swap subsystem, swapper_space. The corresponding writepage method, swap_writepage(), effectively starts writing the page into the swap area.

Once pageout() completes, shrink_list() verifies that the page is clean, removes it from the swap cache, and releases it to the buddy system.
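The two-pass nature of this procedure can be summed up with a toy model (ordinary user-space code, not kernel code; the boolean flags merely mimic PG_dirty, membership in the page cache, and membership in the swap cache):

```c
#include <stdio.h>
#include <stdbool.h>

/* Toy model of the state a page of an IPC shared memory region goes
   through while the PFRA reclaims it (illustration only). */
struct toy_page {
    bool dirty;           /* PG_dirty                               */
    bool in_page_cache;   /* owned by the shm inode's address_space */
    bool in_swap_cache;   /* owned by swapper_space                 */
    bool on_swap_area;    /* contents written to the swap area      */
};

/* First pass: mimics shmem_writepage() -- reserve a swap slot, move the
   page from the page cache to the swap cache, and set PG_dirty again,
   so shrink_list() leaves the page in the inactive list for now. */
static void first_pass(struct toy_page *p)
{
    p->in_page_cache = false;
    p->in_swap_cache = true;
    p->dirty = true;
}

/* Second pass: mimics swap_writepage() -- the data is written to the swap
   area; the page becomes clean and can be removed from the swap cache
   and handed back to the buddy system. */
static void second_pass(struct toy_page *p)
{
    p->on_swap_area = true;
    p->dirty = false;
    p->in_swap_cache = false;
}

int main(void)
{
    struct toy_page p = { .dirty = true, .in_page_cache = true };
    first_pass(&p);
    printf("pass 1: in swap cache=%d, still dirty=%d\n", p.in_swap_cache, p.dirty);
    second_pass(&p);
    printf("pass 2: on swap area=%d, dirty=%d\n", p.on_swap_area, p.dirty);
    return 0;
}
```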

Demand paging for the IPC shared memory area

The pages added to the process through shmat() are all dummy pages; this function adds a new memory area to the address space of a process, but it does not modify the page table of the process.

Moreover, we have seen that the pages of an IPC shared memory region can be swapped out. Therefore, these pages are handled through the demand paging mechanism.

A page fault exception occurs when a process tries to access a location of an IPC shared memory region whose underlying page frame has not yet been allocated. The corresponding exception handler determines that the faulting address lies inside the process's address space and that the corresponding page table entry is null; therefore, it invokes the do_no_page() function. This function checks whether the nopage method is defined for the memory region; if so, the method is invoked and the page table entry is set to the address it returns.

The memory regions used for IPC shared memory define the nopage method. It is implemented by the shmem_nopage() function, which performs the following steps:

  1. Walks the chain of pointers between the VFS objects and derives the address of the index node object of the IPC shared memory resource (see Figure 19-3).

  2. Computes the logical page number inside the segment from the vm_start field of the memory region descriptor and the requested address.

  3. Checks whether the page is already included in the page cache; if so, terminates and returns the address of its descriptor.

  4. Checks whether the page is included in the swap cache and is up to date; if so, terminates and returns the address of its descriptor.

  5. Checks whether the shmem_inode_info structure embedded in the index node object stores a swapped-out page identifier for the logical page number; if so, it invokes read_swap_cache_async() to perform the swap-in operation, waits until the data transfer completes, and then terminates and returns the address of the page descriptor.

  6. Otherwise, the page is not in the swap area; the function therefore allocates a new page frame from the buddy system, inserts it into the page cache, and returns its address. The do_no_page() function then sets the page table entry corresponding to the faulting address so that it points to the page frame returned by the nopage method.
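A small user-space experiment makes this lazy allocation visible: right after shmat() the resident set barely grows, and only touching the pages (which triggers the page faults handled as described above) brings the frames in. A rough sketch, using the peak-RSS value reported by getrusage() as an indicator:

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/resource.h>

/* Peak resident set size of the process, in kilobytes (Linux semantics). */
static long peak_rss_kb(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_maxrss;
}

int main(void)
{
    size_t size = 16 * 1024 * 1024;                   /* 16 MB segment */
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    char *p = shmat(id, NULL, 0);
    if (id < 0 || p == (char *)-1) { perror("shm"); return 1; }

    printf("peak RSS after shmat : %ld KB\n", peak_rss_kb());

    memset(p, 0xAA, size);   /* first touch: each page faults in through the nopage method */
    printf("peak RSS after touch : %ld KB\n", peak_rss_kb());

    shmdt(p);
    shmctl(id, IPC_RMID, NULL);
    return 0;
}
```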

POSIX message queue

The POSIX standard (IEEE Std 1003.1-2001) defines an IPC mechanism based on message queues, known as POSIX message queues. They are very similar to the System V IPC message queues introduced in the "IPC Messages" section earlier in this chapter. However, POSIX message queues have a number of advantages over the older queues:
1. A much simpler, file-based application interface
2. Full support for message priorities (the priority ultimately determines the position of each message in the queue)
3. Full support for asynchronous notification of message arrival, either through signals or through thread creation
4. Timeouts for blocking send and receive operations
POSIX message queues are implemented through a set of library functions, which are listed in Table 19-15.
[Table 19-15: The library functions for POSIX message queues]

Let's look at how an application typically uses these functions. First, the application invokes the mq_open() library function to open a POSIX message queue. The first parameter of the function is a string specifying the queue name; it is similar to a file name and must start with "/".
The library function accepts a subset of the flags of the open() system call: O_RDONLY, O_WRONLY, O_RDWR, O_CREAT, O_EXCL, and O_NONBLOCK (for nonblocking send and receive operations).

Note that an application can create a new POSIX message queue by specifying the O_CREAT flag. The mq_open() function returns a queue descriptor, which is very similar to the file descriptor returned by the open() system call. Once a POSIX message queue is open, the application can send and receive messages with the mq_send() and mq_receive() library functions, passing them the queue descriptor returned by mq_open() as a parameter.
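A minimal sketch of this life cycle in user space (the queue name "/demo_queue" and the attribute values are arbitrary; on Linux the program is linked with -lrt):

```c
#include <stdio.h>
#include <string.h>
#include <fcntl.h>        /* O_* flags */
#include <sys/stat.h>
#include <mqueue.h>       /* link with -lrt */

int main(void)
{
    struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 128 };

    /* The queue name resembles a file name and must start with '/'. */
    mqd_t q = mq_open("/demo_queue", O_CREAT | O_RDWR, 0600, &attr);
    if (q == (mqd_t)-1) { perror("mq_open"); return 1; }

    /* Send one message with priority 5; higher priorities are received first. */
    const char *msg = "hello";
    mq_send(q, msg, strlen(msg) + 1, 5);

    char buf[128];                     /* must be at least mq_msgsize bytes */
    unsigned int prio;
    if (mq_receive(q, buf, sizeof(buf), &prio) >= 0)
        printf("received \"%s\" with priority %u\n", buf, prio);

    mq_close(q);                       /* releases the descriptor, keeps the queue */
    mq_unlink("/demo_queue");          /* actually removes the queue */
    return 0;
}
```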

An application can also use mq_timedsend() and mq_timedreceive() to specify the maximum time it is willing to wait for a send or receive operation to complete. Instead of blocking in mq_receive(), or repeatedly polling the queue when the O_NONBLOCK flag is set, an application can establish an asynchronous notification mechanism by invoking the mq_notify() library function:
when a message is inserted into an empty queue, the application can request that either a signal be sent to a chosen process or a new thread be created.
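For example, a receiver can ask for a new thread to be started whenever a message lands in the empty queue. A sketch under the assumption that the queue name is "/notify_demo" (the registration is one-shot, so it would have to be re-armed in the handler to get further notifications; compile with -lrt -pthread):

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <signal.h>       /* struct sigevent */
#include <mqueue.h>

static mqd_t q;

/* Runs in a newly created thread when a message reaches the empty queue. */
static void on_message(union sigval sv)
{
    char buf[128];
    unsigned int prio;
    (void)sv;
    if (mq_receive(q, buf, sizeof(buf), &prio) >= 0)
        printf("async notification: \"%s\" (priority %u)\n", buf, prio);
}

int main(void)
{
    struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 128 };
    struct sigevent sev;

    q = mq_open("/notify_demo", O_CREAT | O_RDONLY, 0600, &attr);
    if (q == (mqd_t)-1) { perror("mq_open"); return 1; }

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_THREAD;          /* start a thread instead of sending a signal */
    sev.sigev_notify_function = on_message;
    mq_notify(q, &sev);                       /* one-shot registration */

    sleep(30);                                /* give another process time to mq_send() to us */
    mq_close(q);
    mq_unlink("/notify_demo");
    return 0;
}
```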
Finally, when the application is finished using the message queue, it calls the mq_close() library function, passing it the queue descriptor.

Note that this function does not delete the queue, just as the close() system call does not delete a file; to delete a queue, the application must invoke the mq_unlink() function.

In Linux 2.6, the implementation of POSIX message queue is simple. A special file system called mqueue has been introduced, in which each existing queue has a corresponding index node. The kernel provides several system calls: mq_open(), mq_unlink(), mq_timedsend(), mq_timedreceive(), mq_notify() and mq_getsetattr(). These system calls roughly correspond to the library functions in Table 19-15 above.

These system calls operate transparently on the files of the mqueue file system, and most of the work is handled by the VFS layer. For example, notice that the kernel does not provide an mq_close() system call: the queue descriptor returned to the application is actually a file descriptor, so the work of mq_close() is done by the close() system call. The mqueue special file system need not be mounted in the system directory tree; however, if it is mounted, users can create a POSIX message queue by creating a file in the root directory of the file system, or read the corresponding file to obtain information about the queue. Finally, applications can use select() and poll() to be notified of changes in the queue state.
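Because, as just noted, the queue descriptor is really a file descriptor on Linux, a program can wait for a message with poll(). A sketch relying on that Linux-specific detail (the queue name "/poll_demo" is arbitrary):

```c
#include <stdio.h>
#include <poll.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <mqueue.h>

int main(void)
{
    struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 128 };
    mqd_t q = mq_open("/poll_demo", O_CREAT | O_RDONLY, 0600, &attr);
    if (q == (mqd_t)-1) { perror("mq_open"); return 1; }

    /* On Linux, an mqd_t refers to an inode of the mqueue special file
       system through an ordinary file descriptor, so poll() accepts it. */
    struct pollfd pfd = { .fd = (int)q, .events = POLLIN };

    int ready = poll(&pfd, 1, 5000);          /* wait up to five seconds */
    if (ready > 0 && (pfd.revents & POLLIN))
        printf("a message is waiting in the queue\n");
    else
        printf("no message arrived within the timeout\n");

    mq_close(q);
    return 0;
}
```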

Each queue has an mqueue_inode_info descriptor, which contains an inode object that corresponds to a file in the mqueue special file system.

When a POSIX message queue system call receives a queue descriptor as a parameter, it invokes the VFS fget() function to derive the address of the corresponding file object. Then the system call gets the index node object of the file in the mqueue file system. Finally, it retrieves the address of the mqueue_inode_info descriptor corresponding to that index node object.

The messages pending in the queue are collected in a singly linked list rooted at the mqueue_inode_info descriptor. Each message is represented by a descriptor of type msg_msg, exactly the same message descriptor used for System V IPC messages.
