Linux kernel: process management - pipes, sockets

pipe

A pipe is one of the most basic inter-process communication (IPC) mechanisms the operating system provides. Creating a pipe yields two file descriptors: one for reading from the pipe and one for writing to it. A pipe can therefore be thought of as two file descriptors plus a buffer in kernel memory, as shown in the figure.

Pipes of this kind can only connect related processes, such as a parent and its child, or sibling processes that share a parent. When a process creates a pipe, it holds both of the pipe's file descriptors, and any child it then forks inherits them, so the child can also read from and write to the pipe, as shown in the figure.

To make pipe communication safer and simpler, each process usually closes the end of the pipe it does not use. For example, the parent closes its read descriptor so that it can only write to the pipe, and the child closes its write descriptor so that it can only read from it, or the other way around, as shown in the figure. Closing unused write ends also matters for correctness: a reader only sees end-of-file once every write descriptor for the pipe has been closed.
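The close-the-unused-end pattern can be sketched in Python, whose os.pipe() and os.fork() map directly onto the system calls. This is a minimal illustration rather than a definitive implementation: it uses the "other way around" direction (the child writes, the parent reads) so that the result comes back to the caller, and the function name child_to_parent is made up for this example.

```python
import os


def child_to_parent(message: bytes) -> bytes:
    """Fork a child that writes `message` into a pipe; return what the parent reads."""
    read_fd, write_fd = os.pipe()   # two descriptors backed by one kernel buffer
    pid = os.fork()                 # the child inherits both descriptors

    if pid == 0:
        # Child: writer only, so close the read end it does not need.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
            os.close(write_fd)
        finally:
            os._exit(0)             # exit immediately, skipping parent cleanup code

    # Parent: reader only, so close the write end. If the parent kept it open,
    # the read loop below would never see end-of-file.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:               # b"" means every write end has been closed
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)              # reap the child
    return b"".join(chunks)


if __name__ == "__main__":
    print(child_to_parent(b"hello from the child"))
```

The sketch assumes a short message; a very large payload would still work, because the parent drains the pipe while the child writes.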

The shell also provides pipes: connect two commands with a vertical bar (|). For example:

[root@docker-03 ~]# ps -elf | grep "sshd"
4 S root        939      1  0  80   0 - 26519 poll_s 18:15 ?        00:00:00 /usr/sbin/sshd -D
4 S root       1306    939  0  80   0 - 37099 poll_s 18:16 ?        00:00:00 sshd: root@pts/0
0 S root       1417   1308  0  80   0 - 28182 pipe_w 19:23 pts/0    00:00:00 grep --color=auto sshd
[root@docker-03 ~]# cat a.log | grep "hello world"

In the shell, this kind of pipe is called an anonymous pipe, that is, a pipe without a name. It is convenient on the command line, and the resulting logic is clear and easy to follow. Shell scripts and command lines owe a large part of their power to it.
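What the shell does for a command such as cat a.log | grep "hello world" can be approximated with Python's subprocess module: the first process's standard output is attached to the write end of an anonymous pipe, and the second process's standard input to the read end. The sketch below assumes the external commands printf and grep are on the PATH (printf stands in for cat so that no file is needed); run_pipeline is a name chosen for this example.

```python
import subprocess


def run_pipeline(text: str, pattern: str) -> str:
    """Rough equivalent of: printf '%s' "$text" | grep -F "$pattern"."""
    # Producer: its stdout is the write end of an anonymous pipe.
    producer = subprocess.Popen(["printf", "%s", text], stdout=subprocess.PIPE)
    # Consumer: its stdin is the read end of that same pipe.
    consumer = subprocess.Popen(
        ["grep", "-F", pattern],
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Drop our own copy of the read end so that only grep holds it.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


if __name__ == "__main__":
    print(run_pipeline("foo\nhello world\nbar\n", "hello world"), end="")
```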

The shell also supports named pipes, created with the mkfifo command. Named pipes are also called FIFOs, and they can carry data between arbitrary processes, not just related ones.

For example, the command below creates a named pipe called a.fifo. Although it appears in the file system as a file, data passing through it does not go through disk I/O; it is transferred in memory, so it is fast. The file name is only a name for the pipe: a handle through which processes reach its inlet and outlet.

$ mkfifo a.fifo
$ ls -l a.fifo
prw-r--r-- 1 root root 0 Apr 30 23:52 a.fifo  # file type is p

A named pipe blocks by default. Any process with permission can open it for reading or for writing, but data flows only from writers to readers, and only once both ends are open. For example, the figure below shows that a writer to a named pipe blocks while no process has opened the pipe for reading. Once cat a.fifo is run, which opens the read end, both the write and the read proceed normally. Similarly, if only the read end is open and no process has opened the write end, the reader blocks.
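The blocking behavior can be tried from Python with os.mkfifo(). In this minimal sketch a thread stands in for the second process (an assumption made to keep the example short; a separate process would behave the same way), and the FIFO is created in a temporary directory rather than as a.fifo in the current directory. The name fifo_round_trip is hypothetical.

```python
import os
import tempfile
import threading


def fifo_round_trip(message: bytes) -> bytes:
    """Send `message` through a named pipe and return what the reader receives."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "a.fifo")
        os.mkfifo(path)                 # same effect as: mkfifo a.fifo

        def writer() -> None:
            # This open() blocks until some reader opens the FIFO.
            with open(path, "wb") as fifo:
                fifo.write(message)

        thread = threading.Thread(target=writer)
        thread.start()

        # This open() blocks until some writer opens the FIFO.
        with open(path, "rb") as fifo:
            data = fifo.read()          # returns once the writer closes its end

        thread.join()
        return data


if __name__ == "__main__":
    print(fifo_round_trip(b"hello through a.fifo"))
```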

socket

Sockets are used for communication between processes on different computers, that is, network-based communication. They can also be used for inter-process communication on the same machine.

There are many kinds of sockets, such as Unix domain sockets, TCP sockets, UDP sockets, and link-layer sockets. TCP sockets are among the most commonly used, so this section covers TCP sockets first; Unix domain sockets are introduced separately afterwards.

A TCP socket carries TCP-based communication between a client and a server, so a socket must be created on both sides. Creating a TCP socket returns a file descriptor, and the socket is read and written through that descriptor.

By comparison, consider a program that reads and writes a disk file at the same time (there does not seem to be a command-line tool that does this, but it is easy to do in a program). Because a single file descriptor handles both reading and writing, and both share one file offset, the program typically has to keep moving the file pointer to change the read/write position; otherwise the data is easily mixed up.
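The shared file offset is easy to observe. The following sketch, with a made-up function name, writes to a temporary file and then reads through the same descriptor twice: once without moving the file pointer and once after seeking back to the start.

```python
import os
import tempfile


def write_then_read(payload: bytes) -> tuple:
    """Return (bytes read without seeking, bytes read after seeking to offset 0)."""
    fd, path = tempfile.mkstemp()       # one descriptor, opened for read and write
    try:
        os.write(fd, payload)           # the shared offset now sits after the payload
        without_seek = os.read(fd, len(payload))   # reads from the end of the file
        os.lseek(fd, 0, os.SEEK_SET)    # move the file pointer back to the start
        with_seek = os.read(fd, len(payload))
        return without_seek, with_seek
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(write_then_read(b"abc"))
```

The first read returns nothing because the write left the offset at the end of the file; only after lseek() does the read see the data.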

A TCP socket is also read and written through a single file descriptor. To keep reads and writes from interfering with each other, the operating system maintains two buffers in kernel space for each TCP socket: one for writing and one for reading. The read buffer is called the recv buffer, the write buffer is called the send buffer, and together they are called the socket buffer.
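The sizes of the two buffers can be queried with the standard SO_SNDBUF and SO_RCVBUF socket options. The sketch below only reports them; the actual values depend on the system's configuration, so no particular output is assumed. The function name is chosen for illustration.

```python
import socket


def socket_buffer_sizes() -> dict:
    """Return the kernel's send and recv buffer sizes (in bytes) for a new TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return {
            "fd": sock.fileno(),    # the single descriptor used for both directions
            "send": sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            "recv": sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        }


if __name__ == "__main__":
    print(socket_buffer_sizes())
```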

Communication between server and client through the two sockets is therefore straightforward. One end writes data into its send buffer, the data is sent over the established TCP connection into the recv buffer at the other end, and the other end only needs to read from its recv buffer. This is how processes on different computers communicate. The process is shown in the figure.
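The send-buffer-to-recv-buffer flow can be sketched with a tiny client and server on the loopback interface. This is an illustration under simplifying assumptions: both ends run in one program (the server in a thread), the message is short and non-empty so that a single recv() call is enough, and the server's uppercase reply is simply a visible marker that the data made the round trip. The function name tcp_round_trip is hypothetical.

```python
import socket
import threading


def tcp_round_trip(message: bytes) -> bytes:
    """Send `message` to a one-shot loopback server and return its uppercase reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))       # port 0: let the kernel pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve_once() -> None:
        conn, _ = server.accept()       # a new socket (and descriptor) per connection
        with conn:
            data = conn.recv(4096)      # read from the server side's recv buffer
            conn.sendall(data.upper())  # write into the server side's send buffer

    thread = threading.Thread(target=serve_once)
    thread.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)         # goes into the client's send buffer
        reply = client.recv(4096)       # arrives in the client's recv buffer

    thread.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(tcp_round_trip(b"hello over tcp"))
```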

Unix domain socket

A Unix domain socket is a kind of socket used for inter-process communication on the same machine, and it is commonly used as a bidirectional pipe. Unix domain sockets are more lightweight and efficient than network sockets because they do not involve the network protocol stack. When a connected pair is created directly (for example with socketpair()), there is no need to listen for connections, bind an address, or care about the protocol type.

When a Unix domain socket pair is created (for example with socketpair()), two file descriptors are returned, one for each end. Both are readable and writable, which gives full-duplex two-way communication.

Similarly, to avoid data confusion when a single file descriptor is both read and written, each Unix domain socket also has two buffers.
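A connected Unix domain socket pair can be sketched with Python's socket.socketpair(), which on Linux creates AF_UNIX stream sockets by default. In this minimal example a forked child receives a request on its end and answers on the same descriptor, which shows the two-way nature of each end. The "ack:" prefix and the function name are made up for illustration, and the message is assumed to be short and non-empty.

```python
import os
import socket


def socketpair_round_trip(request: bytes) -> bytes:
    """Send `request` to a forked child over a socket pair and return its reply."""
    parent_sock, child_sock = socket.socketpair()   # two connected, full-duplex ends
    pid = os.fork()

    if pid == 0:
        # Child: keep only its own end, then read and write on that one descriptor.
        try:
            parent_sock.close()
            data = child_sock.recv(4096)
            child_sock.sendall(b"ack:" + data)
            child_sock.close()
        finally:
            os._exit(0)

    # Parent: keep only its own end, then write and read on that one descriptor.
    child_sock.close()
    parent_sock.sendall(request)
    reply = parent_sock.recv(4096)
    parent_sock.close()
    os.waitpid(pid, 0)                              # reap the child
    return reply


if __name__ == "__main__":
    print(socketpair_round_trip(b"ping"))
```

Unlike the pipe case, neither side needs a second channel to answer: each descriptor carries traffic in both directions.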

Original Author: Extreme Linux Kernel

Original address: Linux Kernel: Process Management - Pipes, Sockets - Zhihu (Copyright belongs to the original author, contact to delete infringement message)

Origin blog.csdn.net/m0_74282605/article/details/130134631