Why anonymous pipes can be used for communication between parent and child processes

Plenty of articles online already cover how to use anonymous pipes, so I won't repeat that here and will focus on the topic of this article.

The pipe() system call, which creates an anonymous pipe, is implemented underneath as a kind of special file system. Each call creates an inode and associates it with two files, one for reading and one for writing, which gives the pipe its one-way data flow.

A pipe is actually an invisible (existing only in memory) file, and operations on this file are performed through two open files, representing the two ends of the pipe respectively.

Every file is represented by an inode data structure. Although a pipe is an invisible file, it has an inode too. Since this file does not exist before the pipe is created, the inode has to be created on the fly when the pipe is created.
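One way to observe from user space that a pipe really has an inode is to call fstat() on both descriptors. The following is only a minimal sketch, assuming Linux; the helper name pipe_ends_share_inode is made up for illustration:

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if both ends of a fresh pipe report the
 * FIFO file type and the same inode number, 0 if not, -1 on error. */
int pipe_ends_share_inode(void)
{
    int fds[2];
    struct stat rd, wr;

    if (pipe(fds) == -1)
        return -1;

    if (fstat(fds[0], &rd) == -1 || fstat(fds[1], &wr) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[0]);
    close(fds[1]);

    /* Two open files (read end, write end), but one inode behind them. */
    return S_ISFIFO(rd.st_mode) && S_ISFIFO(wr.st_mode) &&
           rd.st_ino == wr.st_ino;
}
```

On Linux the same inode number also shows up in /proc/<pid>/fd, where both ends appear as pipe:[number].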

Roughly, the kernel first allocates a memory page as the pipe's buffer and then allocates memory for the pipe_inode_info data structure. Why do it this way? The file that implements the pipe is invisible: it does not appear on disk or on any other file system storage medium, it exists only in memory, and other processes cannot "open" or access it by name. So this so-called file is essentially just a memory page used as a buffer; it is merely brought under the file system mechanism and managed by borrowing the file system's data structures and operations.

The read and write functions used for ordinary files also work on anonymous pipes. They diverge at the lower layer, where pipe_read() and pipe_write() are called respectively.

Pipe write functions write data by copying bytes to the physical memory pointed to by the VFS inode, and pipe read functions read data by copying bytes in physical memory. Of course, the kernel must use some mechanism to synchronize access to the pipe, and for this, the kernel uses locks, wait queues, and signals.

When a process writes to the pipe, it calls the standard library function write(), and the kernel finds the file structure for the file from the file descriptor that was passed in. The file structure specifies the address of the function used for write operations (i.e. the pipe's write function), so the kernel calls that function to complete the write.
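The idea of looking up the write function through the file structure can be modeled in user space with a table of function pointers. This is only a toy sketch and not kernel code: every name in it (toy_file, toy_file_ops, toy_write, toy_pipe_open) is invented for illustration.

```c
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Simplified stand-in for the kernel's table of file operations. */
struct toy_file_ops {
    long (*write)(struct toy_file *f, const char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_ops *ops;  /* chosen when the file is created */
    char   data[64];                 /* in-memory buffer */
    size_t used;
};

/* "Pipe" flavour of write: append bytes to the in-memory buffer. */
static long toy_pipe_write(struct toy_file *f, const char *buf, size_t len)
{
    size_t room = sizeof f->data - f->used;
    if (len > room)
        len = room;
    memcpy(f->data + f->used, buf, len);
    f->used += len;
    return (long)len;
}

static const struct toy_file_ops toy_pipe_ops = { .write = toy_pipe_write };

/* Generic entry point: it does not know what kind of file it has and
 * simply calls whatever function the file object points to. */
long toy_write(struct toy_file *f, const char *buf, size_t len)
{
    return f->ops->write(f, buf, len);
}

/* Creates a toy "pipe" file bound to toy_pipe_ops. */
struct toy_file toy_pipe_open(void)
{
    struct toy_file f = { .ops = &toy_pipe_ops, .used = 0 };
    return f;
}
```

A caller only ever uses toy_write(); which concrete function runs is decided by the ops pointer stored in the file object, which is the same kind of indirection the paragraph above describes.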

Before the write function copies data into the memory, it must check the information in the VFS inode. The actual memory copy is performed only when both of the following conditions are met:
    There is enough space in the memory to hold all the data to be written;
    The memory is not locked by the reader.

If both of the above conditions are met, the write function first locks the memory and then copies the data from the address space of the writing process to the memory. Otherwise, the writing process sleeps in the waiting queue of the VFS inode, and then the kernel calls the scheduler, which chooses another process to run.
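The "not enough space" case can be seen from user space by putting the write end into non-blocking mode: instead of sleeping, write() then fails with EAGAIN once the buffer is full. A minimal sketch, assuming a POSIX system; the helper name fill_pipe_until_full is hypothetical:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: writes to a fresh pipe in non-blocking mode until
 * the kernel refuses more data. Returns the bytes accepted, -1 on error. */
long fill_pipe_until_full(void)
{
    int fds[2];
    char chunk[1024] = {0};
    long total = 0;

    if (pipe(fds) == -1)
        return -1;

    /* With O_NONBLOCK a full pipe yields EAGAIN instead of sleeping. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == -1 && errno == EAGAIN)
            break;              /* buffer is full; a blocking writer would sleep here */
        total = -1;             /* unexpected failure */
        break;
    }

    close(fds[0]);
    close(fds[1]);
    return total;
}
```

The returned value is the usable capacity of the pipe. On current Linux kernels it is commonly 65536, because the default pipe buffer there is 16 pages and not a single page.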

The writing process is actually in an interruptible wait state. When there is enough space in the memory for the data to be written, or the memory is unlocked, the reading process wakes up the writing process, and the writing process receives the signal at that point. After the data has been written to the memory, the memory is unlocked and all reading processes sleeping on the inode are woken up.

Reading from a pipe is similar to writing. However, depending on the open mode of the file or pipe, a process may return an error immediately instead of blocking when there is no data or the memory is locked. Otherwise, the process sleeps in the inode's wait queue until a writing process writes data. When all processes have finished with the pipe, the pipe's inode is discarded and the shared data pages are released.
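The "return an error immediately" behaviour corresponds to the O_NONBLOCK flag. A minimal sketch, assuming a POSIX system; the helper name empty_pipe_read_would_block is made up for illustration:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a non-blocking read on an empty pipe
 * fails with EAGAIN, 0 if it behaves differently, -1 on setup error. */
int empty_pipe_read_would_block(void)
{
    int fds[2];
    char byte;
    int result;

    if (pipe(fds) == -1)
        return -1;

    int flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* The write end is still open, so "no data" means "try again later"
     * and not end-of-file. */
    errno = 0;
    ssize_t n = read(fds[0], &byte, 1);
    result = (n == -1 && errno == EAGAIN);

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Without O_NONBLOCK the same read() would put the caller to sleep until a writer supplies data or every write end is closed.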

Well, after writing this much and reading through this much material, things are probably still confusing, so let's put it in simpler terms:

In essence, an anonymous pipe is a piece of memory (for example one memory page, usually 4 KB) that the operating system allocates in kernel space, and the operating system treats this memory as a first-in, first-out (FIFO) circular queue for storing and retrieving data. All of this is done by the operating system.
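To make the "FIFO circular queue" idea concrete, here is a toy byte ring buffer in user space. It is a sketch of the general technique only and not the kernel's actual data structure; the names ring, ring_write, ring_read and the tiny RING_CAPACITY are all invented for illustration.

```c
#include <stddef.h>

#define RING_CAPACITY 8   /* tiny on purpose, so wrap-around happens quickly */

struct ring {
    unsigned char buf[RING_CAPACITY];
    size_t head;   /* index of the next byte to read   */
    size_t count;  /* number of bytes currently stored */
};

/* Appends up to len bytes; returns how many fit. */
size_t ring_write(struct ring *r, const unsigned char *src, size_t len)
{
    size_t written = 0;
    while (written < len && r->count < RING_CAPACITY) {
        size_t tail = (r->head + r->count) % RING_CAPACITY;
        r->buf[tail] = src[written++];
        r->count++;
    }
    return written;
}

/* Removes up to len bytes in arrival order; returns how many were read. */
size_t ring_read(struct ring *r, unsigned char *dst, size_t len)
{
    size_t n = 0;
    while (n < len && r->count > 0) {
        dst[n++] = r->buf[r->head];
        r->head = (r->head + 1) % RING_CAPACITY;
        r->count--;
    }
    return n;
}
```

A zero-initialized struct ring is an empty queue. A real pipe adds what this toy leaves out: locking, and putting the caller to sleep when the queue is full or empty.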

That is to say, write data to one end of the anonymous pipe, and the other end will receive the data. It's that simple. Of course, this is possible in the same process, but it doesn't make much sense.
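The single-process case looks roughly like this. It is a minimal sketch, assuming a POSIX system and a message small enough to fit in the pipe buffer (otherwise the write would block forever, since nobody else is reading); the helper name pipe_round_trip is hypothetical.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: sends msg through a pipe inside one process and
 * reads it back into out. Returns the bytes read, or -1 on error. */
long pipe_round_trip(const char *msg, char *out, size_t out_size)
{
    int fds[2];                 /* fds[0] = read end, fds[1] = write end */
    size_t len = strlen(msg);
    long result = -1;

    if (len == 0 || len > out_size)
        return -1;
    if (pipe(fds) == -1)
        return -1;

    /* What goes into the write end comes out of the read end. */
    if (write(fds[1], msg, len) == (ssize_t)len)
        result = (long)read(fds[0], out, out_size);

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Note that out is filled with raw bytes and is not null-terminated by this helper.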

If a process creates an anonymous pipe and then forks a child process, things get interesting. The child process shares some of the parent's data structures, and the anonymous pipe happens to be among them, so the parent and child can communicate through the anonymous pipe. As for exactly how they are shared, let's look at what happens to file descriptors when a process calls fork (fork, dup).

In the Linux process descriptor task_struct{}, there is an array that records the process's open files. The file descriptor is the index into this array, and each array element points to the file table entry created when the file was opened. As shown in the figure below, a file table entry describes the state of a file as currently opened by a particular process, including the file status flags, the current read/write offset (which can be set through lseek), and a pointer to the file's i-node (the i-node describes the file itself, for example its creation and modification times, size, and storage block information). When different processes open the same file, the relationship between the process table and the file table is as shown in the following figure:


The structure of a process's open files after fork is shown in the figure below; the child process shares the parent's file table entries.
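The sharing of file table entries can also be observed through the file offset: if the child writes through an inherited descriptor, the parent's offset moves too. A minimal sketch, assuming a POSIX system with a writable /tmp; the helper name and the temporary file template are made up for illustration.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes 5 bytes through an inherited
 * descriptor; returns the offset the parent then observes, -1 on error. */
long offset_seen_by_parent_after_child_write(void)
{
    char path[] = "/tmp/fork_offset_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);               /* the file lives on until fd is closed */

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: advances the offset stored in the shared file table entry. */
        ssize_t n = write(fd, "hello", 5);
        _exit(n == 5 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }

    /* The parent never wrote, yet its view of the offset has moved. */
    off_t pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return (long)pos;
}
```

If parent and child each had their own file table entry (as two unrelated processes opening the same file would), the parent's offset would still be 0 here.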


As mentioned earlier, anonymous pipes are also presented as files, so after the parent forks the child, the two share the file table entries of the anonymous pipe. From the pipe's point of view, the parent and child operate on it just as a single process would, so of course they can communicate.
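Put together, parent and child communication over an anonymous pipe can be sketched as follows. This assumes a POSIX system and a short message; the helper name receive_from_child is hypothetical, and error handling is kept to a minimum.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that writes msg into a pipe; the
 * parent reads it into out. Returns the bytes received, -1 on error. */
long receive_from_child(const char *msg, char *out, size_t out_size)
{
    int fds[2];
    if (pipe(fds) == -1)        /* create the pipe BEFORE fork */
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        close(fds[0]);          /* child only writes */
        size_t len = strlen(msg);
        ssize_t n = write(fds[1], msg, len);
        close(fds[1]);
        _exit(n == (ssize_t)len ? 0 : 1);
    }

    close(fds[1]);              /* parent only reads; needed to see EOF */
    long total = 0;
    while ((size_t)total < out_size) {
        ssize_t n = read(fds[0], out + total, out_size - (size_t)total);
        if (n == 0)
            break;              /* every write end is closed */
        if (n == -1) {
            total = -1;
            break;
        }
        total += n;
    }

    close(fds[0]);
    waitpid(pid, NULL, 0);      /* reap the child */
    return total;
}
```

Each side closes the end it does not use. If the parent kept its copy of the write end open, its read() would never report end-of-file, because the pipe would still have a writer.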

Finally, let's talk about the dup and dup2 functions:

When dup is called, the kernel creates a new file descriptor in the process. Its value is the smallest file descriptor currently available, and it points to the same file table entry as oldfd.

The difference between dup2 and dup is that the newfd parameter lets you specify the value of the new descriptor. If newfd is already open, it is closed first. If newfd equals oldfd, dup2 returns newfd without closing it. The new file descriptor returned by dup2 also shares the same file table entry as oldfd.
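Both calls can be tried on a pipe. In the sketch below, dup() first reserves a descriptor number, and dup2() then rebinds that number to the pipe's write end. It assumes a POSIX system; the helper name dup2_rebinds_descriptor is made up for illustration.

```c
#include <unistd.h>

/* Hypothetical helper: returns 1 if a descriptor rebound with dup2()
 * writes into the same pipe as the original, 0 if not, -1 on error. */
int dup2_rebinds_descriptor(void)
{
    int fds[2];
    char got = 0;
    int ok = -1;

    if (pipe(fds) == -1)
        return -1;

    /* dup() picks the lowest free number; here it aliases the read end. */
    int spare = dup(fds[0]);
    if (spare == -1)
        goto out;

    /* dup2() closes whatever spare referred to, then points it at the
     * write end's file table entry. */
    if (dup2(fds[1], spare) != spare)
        goto out_spare;

    /* Writing through the copy feeds the same pipe as fds[1] would. */
    if (write(spare, "x", 1) == 1 && read(fds[0], &got, 1) == 1)
        ok = (got == 'x');
    else
        ok = 0;

out_spare:
    close(spare);
out:
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

The same mechanism is what makes redirection work: calling dup2() with STDOUT_FILENO as newfd makes everything a process prints go into the pipe.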

References:

https://blog.csdn.net/ordeder/article/details/21716639

https://blog.csdn.net/silent123go/article/details/71108501

https://www.linuxidc.com/Linux/2017-11/148216.htm

https://blog.csdn.net/vonzhoufz/article/details/44494669

https://segmentfault.com/a/1190000009528245

https://blog.csdn.net/cywosp/article/details/38965239

https://blog.csdn.net/u014379540/article/details/53456070
