Unix socket (UDS, Unix Domain Socket)

[Introduction to knowledge]

Linux offers many inter-process communication (IPC) mechanisms, and sockets are one of them. Sockets are usually associated with the TCP/IP protocol stack, where each endpoint is identified by an IP address and a port. That is appropriate when two processes on different hosts communicate, but it is overkill when both processes run on the same machine.

Less widely known is that there is a socket family designed for exactly this case: Unix domain sockets. Their API is almost the same as that of ordinary TCP/IP sockets, with only a few differences.

Unix domain sockets are used for inter-process communication on the same computer. Internet domain sockets can serve the same purpose, but Unix domain sockets are more efficient: they do no protocol processing, add or strip no network headers, compute no checksums, generate no sequence numbers, and send no acknowledgements. Unix domain sockets provide two interfaces: stream and datagram. The Unix domain datagram service is reliable: messages are neither lost nor delivered out of order. Simply put, Unix domain sockets are a hybrid between sockets and pipes.

[Process comparison]

 

The upper figure shows the call sequence for a stream-oriented socket, which for TCP/IP sockets corresponds to TCP; the lower figure shows the call sequence for a datagram-oriented socket, which for TCP/IP sockets corresponds to UDP. The following sections go through these APIs, focusing on the differences between ordinary TCP/IP sockets and Unix domain sockets.

[Instructions]

1)socket()

The first step is to create a socket:

int socket(int domain, int type, int protocol);

 

The prototype is the same as for TCP/IP sockets, but the first parameter, domain, must be AF_UNIX (or its synonym AF_LOCAL) instead of AF_INET. The second parameter selects the socket type: stream socket (SOCK_STREAM) or datagram socket (SOCK_DGRAM). Unlike AF_INET sockets, both types are reliable here, because the data only passes through the local kernel: nothing is lost, and data arrives in the order it was sent. The difference between them is that SOCK_STREAM is a byte stream without message boundaries, so any amount of data can be sent, while SOCK_DGRAM preserves message boundaries: a message larger than the maximum datagram size cannot be sent as one datagram, and a datagram larger than the receiver's buffer is truncated on receipt. The last parameter is the protocol; for Unix domain sockets it is set to 0. A Unix domain socket is therefore usually created in one of the following ways:

int sockfd = socket(AF_UNIX, SOCK_STREAM, 0);  // stream Unix domain socket
int sockfd = socket(AF_UNIX, SOCK_DGRAM, 0);   // datagram Unix domain socket
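The difference between the two socket types can be observed with a small experiment. The sketch below is only an illustration: it uses socketpair(), which is not part of this walkthrough, to get two connected Unix domain sockets inside one process, and the helper name small_reads is made up for this example. It sends one 10-byte message and then reads twice with a 4-byte buffer. The behaviour described in the comments is what Linux does; other systems may differ in details.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends one 10-byte message over a connected AF_UNIX socket pair of the
 * given type, then reads twice with a 4-byte buffer.
 * The two recv() results are stored in *first and *second.
 * Returns 0 on success, -1 if a socket call failed. */
int small_reads(int type, ssize_t *first, ssize_t *second)
{
    int sv[2];
    char buf[4];
    const char msg[] = "0123456789";   /* 10 payload bytes */

    assert(first != NULL && second != NULL);
    if (socketpair(AF_UNIX, type, 0, sv) == -1)
        return -1;

    if (send(sv[0], msg, 10, 0) != 10) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* SOCK_STREAM: returns the first 4 bytes, the rest stays queued.
     * SOCK_DGRAM:  returns the first 4 bytes, the rest of the datagram
     *              is discarded. */
    *first = recv(sv[1], buf, sizeof buf, 0);

    /* MSG_DONTWAIT makes an empty queue show up as -1 instead of
     * blocking forever. */
    *second = recv(sv[1], buf, sizeof buf, MSG_DONTWAIT);

    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

With SOCK_STREAM the second read returns the next 4 bytes of the same message; with SOCK_DGRAM the second read finds nothing, because the remainder of the datagram was dropped by the first read.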

2)bind()

On the server side of a stream socket, after socket() has returned the file descriptor of the new socket, the socket must be bound to an address:

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

 

For Unix domain sockets, the socket address is represented by the sockaddr_un structure, which is defined as follows:

struct sockaddr_un {
     sa_family_t  sun_family;       /* AF_UNIX or AF_LOCAL */
     char         sun_path[108];    /* pathname */
};

The first field must be set to AF_UNIX. The second field holds a path name. To bind a Unix domain socket to a local address, create and initialize a sockaddr_un structure, pass a pointer to it as the addr parameter of bind() (cast to struct sockaddr *), and set the addrlen parameter to the size of the structure, sizeof(struct sockaddr_un).
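The steps above can be sketched as a small helper. This is a minimal sketch, not a definitive implementation: the function name bind_unix_stream and the path used in the usage note are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Creates a stream socket and binds it to a filesystem path.
 * Returns the socket descriptor, or -1 with errno set. */
int bind_unix_stream(const char *path)
{
    struct sockaddr_un addr;
    int fd, saved;

    assert(path != NULL);
    /* sun_path is a fixed-size array; the path plus its terminating
     * null byte has to fit into it. */
    if (strlen(path) >= sizeof addr.sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);   /* length checked above */

    /* bind() takes a generic struct sockaddr *, hence the cast. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        saved = errno;             /* keep bind()'s error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A caller would use it as `int fd = bind_unix_stream("/tmp/example.sock");` and check for -1.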

The path name comes in two kinds: an ordinary path name and an abstract path name.

An ordinary path name is easy to understand. It is a regular Linux file path, terminated by a null byte ('\0'). When a Unix domain socket is bound, a file is created at that location in the file system. Its file type is "socket", so it cannot be opened with open(). When the Unix domain socket is no longer needed, the file can be deleted with remove() or unlink(); closing the socket does not delete it. If a file with the given path name already exists, bind() fails with the error EADDRINUSE. A socket can therefore be bound to only one path, and a path can be bound to only one socket.

Abstract path names are a Linux-specific feature. They allow a Unix domain socket to be bound to a name without creating a file of that name in the file system. To create a binding in the abstract namespace, set the first byte of the sun_path field to a null byte ('\0'). The remaining bytes of sun_path after that first byte, as far as they are covered by addrlen, are then treated as the abstract name. In other words, the name is not parsed like an ordinary path name, which ends at the first null byte: every byte covered by addrlen counts, including null bytes. Because no file is created, an abstract name cannot conflict with existing files in the file system, and there is nothing to delete after use: the abstract name disappears automatically when the socket is closed.

Finally, a note on permissions. Because an ordinary path name requires a file to be created in the file system, the process calling bind() must have write and search (execute) permission on the directory part of the path name. By default, the socket file is created with all permissions for owner, group and others (mode 777), reduced by the umask of the process. To change this, modify the permissions and ownership of the created file after bind().
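The EADDRINUSE behaviour, and the usual remedy of unlinking a stale socket file before binding, can be sketched as follows. The helper names bind_path and second_bind_errno are made up for this illustration; unlinking unconditionally is only safe when the path is known to belong to this program.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Binds fd to path. Returns 0, or -1 with errno set. */
static int bind_path(int fd, const char *path)
{
    struct sockaddr_un addr;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);
    return bind(fd, (struct sockaddr *)&addr, sizeof addr);
}

/* Binds one socket to path, then tries to bind a second socket to the
 * same path. Returns the errno of the second bind() (0 if it
 * unexpectedly succeeded), or -1 if the setup failed. */
int second_bind_errno(const char *path)
{
    int a, b, result = -1;

    assert(path != NULL);
    a = socket(AF_UNIX, SOCK_STREAM, 0);
    b = socket(AF_UNIX, SOCK_STREAM, 0);

    if (a != -1 && b != -1) {
        unlink(path);              /* remove a stale file from an earlier run */
        if (bind_path(a, path) == 0)
            result = (bind_path(b, path) == -1) ? errno : 0;
    }

    if (a != -1)
        close(a);
    if (b != -1)
        close(b);
    unlink(path);                  /* close() does not remove the socket file */
    return result;
}
```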
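A binding in the abstract namespace might look like the sketch below (Linux only). The helper name bind_abstract is made up for this example. The sketch passes an addrlen that covers only the bytes actually used, which is an assumption about how the name should be formed: with sizeof(struct sockaddr_un) instead, the trailing null bytes of sun_path would become part of the name, and both sides would have to agree on that.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Binds a new stream socket to an abstract name (Linux only).
 * Returns the socket descriptor, or -1 with errno set. */
int bind_abstract(const char *name)
{
    struct sockaddr_un addr;
    socklen_t addrlen;
    size_t len;
    int fd, saved;

    assert(name != NULL);
    len = strlen(name);
    if (len == 0 || len > sizeof addr.sun_path - 1) {
        errno = EINVAL;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    /* sun_path[0] stays '\0' to select the abstract namespace;
     * the name itself starts at sun_path[1]. */
    memcpy(addr.sun_path + 1, name, len);

    /* Family field + leading null byte + name bytes. */
    addrlen = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);

    if (bind(fd, (struct sockaddr *)&addr, addrlen) == -1) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

No unlink() is needed here: once the descriptor is closed, the same name can be bound again.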

3)listen()

On the server side of a stream socket, listen() is called in exactly the same way for TCP/IP sockets and Unix domain sockets:

int listen(int sockfd, int backlog);

4)accept()

On the server side of a stream socket, after bind() has bound the local path and listen() has been called, the server still has to accept client connections. This is done by calling accept():

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

 

Unlike with ordinary TCP/IP sockets, the client address is usually of little interest for Unix domain sockets (all clients are on the same machine, and a client that has not called bind() has no name), so the addr and addrlen parameters are commonly set to NULL. The kernel keeps the byte streams from different client processes apart: each accepted connection gets its own socket, returned by accept(). Datagram sockets have no such per-client socket, so the clients have to be told apart in the code, as described later.
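The sketch below runs the server-side sequence socket(), bind(), listen(), accept() and, for the sake of a self-contained example, plays the client in the same process. It passes a real address buffer to accept() to show what comes back for a client that never called bind(). The function name accept_one is made up; running both ends in one process relies on connect() to a listening Unix domain socket completing as soon as the connection is queued, which is how Linux behaves.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Listens on path, connects an unbound client to it, and accepts that
 * connection. Stores the peer address length reported by accept() in
 * *peer_len. Returns 0 on success, -1 on failure. */
int accept_one(const char *path, socklen_t *peer_len)
{
    struct sockaddr_un addr, peer;
    int lfd, cfd, conn = -1;

    assert(path != NULL && peer_len != NULL);

    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);

    lfd = socket(AF_UNIX, SOCK_STREAM, 0);   /* listening socket */
    cfd = socket(AF_UNIX, SOCK_STREAM, 0);   /* client socket, never bound */
    if (lfd == -1 || cfd == -1)
        goto out;

    unlink(path);                            /* drop a stale socket file */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) == -1)
        goto out;
    if (listen(lfd, 1) == -1)
        goto out;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) == -1)
        goto out;

    /* accept() returns a new socket for this one connection. For an
     * unbound client, Linux reports an "unnamed" address whose length
     * is just sizeof(sa_family_t). */
    memset(&peer, 0, sizeof peer);
    *peer_len = sizeof peer;
    conn = accept(lfd, (struct sockaddr *)&peer, peer_len);

out:
    if (conn != -1)
        close(conn);
    if (cfd != -1)
        close(cfd);
    if (lfd != -1)
        close(lfd);
    unlink(path);
    return conn == -1 ? -1 : 0;
}
```

In a real server the two NULL arguments, `accept(lfd, NULL, NULL)`, are usually enough.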

5)connect()

On the client side of a stream socket, after socket() has returned the file descriptor of the new socket, connect() is called to connect to the server:

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

 

connect() is called in basically the same way for TCP/IP sockets and Unix domain sockets, except that, as with bind(), the address addr must be a sockaddr_un structure.

6)read() and write()

For stream sockets, read() and write() are called in exactly the same way for TCP/IP sockets and Unix domain sockets, on both the server and the client side:
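A client-side helper could look like the following sketch. The name connect_unix is made up for this example. If no socket file exists at the given path, connect() fails with ENOENT; if the file exists but no process is listening on it, it fails with ECONNREFUSED.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Creates a stream socket and connects it to the server listening on
 * path. Returns the connected descriptor, or -1 with errno set. */
int connect_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd, saved;

    assert(path != NULL);
    if (strlen(path) >= sizeof addr.sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    /* Same address structure as for bind(), filled with the server path. */
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);   /* length checked above */

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        saved = errno;             /* keep connect()'s error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```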

ssize_t read(int sockfd, void *buf, size_t length);
ssize_t write(int sockfd, const void *buf, size_t length);
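Because a stream socket has no message boundaries, a single read() or write() may transfer fewer bytes than requested. A common way to deal with this is a pair of loops like the sketch below. The helper names write_all and read_full are made up for this example, and they assume a blocking descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes exactly len bytes unless an error occurs.
 * Returns len on success, -1 with errno set on error. */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    assert(buf != NULL);
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)len;
}

/* Reads until len bytes have arrived or the peer closes the connection.
 * Returns the number of bytes read (possibly less than len at end of
 * stream), or -1 with errno set on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    assert(buf != NULL);
    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n == 0)
            break;                 /* peer closed the connection */
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```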

 

7) recvfrom() and sendto()

For datagram sockets, the server uses recvfrom() to receive requests sent by clients, and the client uses the same function to receive responses sent by the server:

ssize_t recvfrom(int sockfd, void *buf, size_t length, int flags, struct sockaddr *addr, socklen_t *addrlen);

 

Conversely, the client uses sendto() to send request data to the server, and the server uses the same function to send response data to the client:

ssize_t sendto(int sockfd, const void *buf, size_t length, int flags, const struct sockaddr *addr, socklen_t addrlen);

As mentioned earlier, with datagram sockets the server needs to know which client a request came from, so that it can send the response to the right client. The client likewise needs to know which server it is sending data to and which server a response came from (if it only ever talks to one server, this is not an issue).

However, in the ordinary datagram socket setup only the server binds an address with bind(); the client has no address. For stream sockets this is not a problem: when the server calls accept() to take a client connection, the kernel creates a new socket, and the one-to-one relationship with the client is tied to that new socket. A datagram client therefore has to call bind() itself to give the client an address, and this client path name obviously must not be the same as the server path name.

Everything else works as with an ordinary TCP/IP socket, except that the address addr must be a sockaddr_un structure.
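The datagram exchange described above can be sketched in a single process: both sockets are bound to their own path, the client sends a request, and the server replies to whatever address recvfrom() reported. The function names bind_dgram and dgram_round_trip, the paths, and the "ping"/"pong" payloads are made up for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Creates a datagram socket bound to path. Returns fd or -1. */
static int bind_dgram(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

    if (fd == -1)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);
    unlink(path);                      /* drop a stale socket file */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 0 if the client gets "pong" back for its "ping", else -1. */
int dgram_round_trip(const char *server_path, const char *client_path)
{
    struct sockaddr_un srv, from;
    socklen_t from_len = sizeof from;
    char buf[16];
    ssize_t n;
    int sfd, cfd, rc = -1;

    assert(server_path != NULL && client_path != NULL);
    sfd = bind_dgram(server_path);
    cfd = bind_dgram(client_path);     /* the client needs its own address */
    if (sfd == -1 || cfd == -1)
        goto out;

    memset(&srv, 0, sizeof srv);
    srv.sun_family = AF_UNIX;
    strncpy(srv.sun_path, server_path, sizeof srv.sun_path - 1);

    /* Client -> server: the destination is given on every sendto(). */
    if (sendto(cfd, "ping", 4, 0, (struct sockaddr *)&srv, sizeof srv) != 4)
        goto out;

    /* Server: recvfrom() fills in the sender's (client's) address. */
    n = recvfrom(sfd, buf, sizeof buf, 0, (struct sockaddr *)&from, &from_len);
    if (n != 4)
        goto out;

    /* Server -> client: reply to exactly that address. */
    if (sendto(sfd, "pong", 4, 0, (struct sockaddr *)&from, from_len) != 4)
        goto out;

    n = recvfrom(cfd, buf, sizeof buf, 0, NULL, NULL);
    if (n == 4 && memcmp(buf, "pong", 4) == 0)
        rc = 0;

out:
    if (sfd != -1)
        close(sfd);
    if (cfd != -1)
        close(cfd);
    unlink(server_path);
    unlink(client_path);
    return rc;
}
```

If the client skipped its bind(), the address returned by recvfrom() would carry no path, and the server would have nowhere to send the reply.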

[Required header files]

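#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>     /* struct sockaddr_un */
#include <unistd.h>     /* read(), write(), close(), unlink() */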

[References]

Introduction to Unix Domain Socket (Unix Domain Socket) [Repost], 8465449's Technology Blog, 51CTO Blog. https://blog.51cto.com/u_8475449/5954776


Origin blog.csdn.net/weixin_39538031/article/details/130562287