Inter-process communication (7) - sockets

0. Preface

The concept of inter-process communication originated on stand-alone systems. Because each process runs in its own address space, the operating system provides facilities that let two processes communicate without interfering with each other, such as:

UNIX BSD: pipe, named pipe (FIFO), signal

UNIX System V: message queue (msg), shared memory (shm), semaphore (sem)

All of these are limited to communication between processes on the same machine. Network communication has to solve the problem of communication between processes on different hosts (communication between processes on the same machine can be regarded as a special case). On a single host, processes can be uniquely identified by their process IDs. In a network environment, however, the process IDs that each host assigns independently cannot uniquely identify a process. For example, host A may assign process ID 5 to some process while a process with ID 5 also exists on host B, so the phrase "process number 5" is meaningless by itself. In addition, an operating system supports many network protocols, and different protocols work in different ways and use different address formats. Inter-network process communication therefore also has to deal with identifying multiple protocols.

The TCP/IP protocol suite solves this problem for us. The network layer's IP address uniquely identifies a host on the network, while the transport layer's "protocol + port" uniquely identifies an application (process) on that host. The triplet (IP address, protocol, port) can therefore identify a process on the network, and processes use this identifier to communicate with each other.

Applications that use TCP/IP usually communicate through one of two application programming interfaces: the sockets of UNIX BSD or the TLI of UNIX System V (now obsolete). Today almost all applications use sockets, and in the Internet age process communication over the network is everywhere. This is why I say "everything is a socket".

1. Sockets

A socket is an abstraction of a communication endpoint. Just as an application uses a file descriptor to access a file, it uses a socket descriptor to access a socket. Socket descriptors are implemented as file descriptors on UNIX systems, so many functions that operate on file descriptors (such as read() and write()) also work on socket descriptors.

2. socket()

#include <sys/socket.h>

int socket(int domain, int type, int protocol);

This function is used to create a socket descriptor, which uniquely identifies a socket.

return value:

If successful, return the socket descriptor;
If an error occurs, -1 is returned;

parameter:

domain: the communication domain, which determines the nature of the communication; see Section 2.1 for details;
type: the type of socket, which further determines the communication characteristics; see Section 2.2 for details;
protocol: usually 0, meaning that the default protocol for the given domain and socket type is selected. When several protocols are supported for the same domain and socket type, the protocol parameter selects a specific one.
- In the AF_INET communication domain, the socket type is SOCK_STREAM and the default protocol is TCP;
- In the AF_INET communication domain, the socket type is SOCK_DGRAM and the default protocol is UDP;
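As a minimal sketch of the call described above, the following creates a socket with protocol 0 and asks the kernel what type it ended up with. The helper name socket_type_of() is made up for this illustration; socket(), getsockopt() and SO_TYPE are standard.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a socket with the default protocol, ask the
 * kernel for its type via SO_TYPE, then close it.
 * Returns the socket type, or -1 on error. */
int socket_type_of(int domain, int type)
{
    int fd = socket(domain, type, 0);   /* 0: default protocol for domain + type */
    if (fd < 0)
        return -1;                      /* socket() failed; errno has the reason */

    int actual = -1;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &actual, &len) < 0)
        actual = -1;

    close(fd);
    return actual;
}
```

Calling socket_type_of(AF_INET, SOCK_STREAM) should give back SOCK_STREAM, and the same call with SOCK_DGRAM should give back SOCK_DGRAM.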

2.1 Parameter domain

The types of domains are roughly divided into:

AF_INET: IPv4 Internet domain;
AF_INET6: IPv6 Internet domain;
AF_UNIX: UNIX domain, most systems also define AF_LOCAL, which is the alias of AF_UNIX;
AF_UNSPEC: Not specified, can represent any domain;

UNIX domain sockets are used for communication between processes running on the same machine. Although Internet domain sockets can serve the same purpose, UNIX domain sockets are more efficient. UNIX domain sockets only copy data: they perform no protocol processing, add or remove no network headers, calculate no checksums, generate no sequence numbers, and send no acknowledgments.

UNIX domain sockets provide both stream and datagram interfaces. The UNIX domain datagram service is reliable: messages are neither lost nor delivered out of order. UNIX domain sockets are like a cross between sockets and pipes.

To create a pair of unnamed, connected UNIX domain sockets, you can use the general network-oriented socket interface, or you can use the socketpair() function.

#include <sys/socket.h>

int socketpair(int domain, int type, int protocol, int sockfd[2]);

                                            Return value: 0 on success, -1 on error
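A small sketch of socketpair() in use, assuming a stream-type pair: a message written to one end is read back from the other. The helper name socketpair_echo() is hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>
#include <string.h>

/* Hypothetical helper: write msg into one end of a UNIX domain socket pair
 * and read it back from the other end into out (capacity cap).
 * Returns the number of bytes received, or -1 on error. */
ssize_t socketpair_echo(const char *msg, char *out, size_t cap)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    ssize_t n = -1;
    size_t len = strlen(msg);
    if (write(fds[0], msg, len) == (ssize_t)len)
        n = read(fds[1], out, cap);     /* both ends are full duplex */

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

In real programs the two descriptors are usually split between a parent and a child after fork(); a single process is used here only to keep the sketch short.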

2.2 Parameter type

type is used to determine the type of socket:

SOCK_STREAM: byte stream; a sequenced, reliable, bidirectional, connection-oriented byte stream;
SOCK_DGRAM: datagram; fixed-length, connectionless, unreliable messages;
SOCK_RAW: datagram interface to the IP protocol;
SOCK_SEQPACKET: fixed-length, sequenced, reliable, connection-oriented messages;

A datagram (SOCK_DGRAM) is a self-contained message. Sending a datagram is like mailing a letter. You can mail many letters, but delivery order is not guaranteed and some may be lost along the way. Each letter carries the recipient's address, which makes it independent of all the others, and each letter may even go to a different recipient.

For byte streams (SOCK_STREAM), the application is not aware of message boundaries because the socket provides byte stream services. This means that when reading data from the socket, it may not return all the bytes written by the sending process. Eventually you can get all the data sent, but it may take several function calls to get it.
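Because a stream socket may hand back fewer bytes than were asked for, code that needs exactly n bytes usually loops. The sketch below is one common way to do that; read_full() and read_full_demo() are names invented for this example, and a UNIX domain socket pair stands in for a real connection.

```c
#include <sys/socket.h>
#include <unistd.h>
#include <errno.h>

/* Hypothetical helper: keep calling read() until exactly n bytes have
 * arrived, the peer closes the stream, or an error occurs.
 * Returns the number of bytes actually read, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t n)
{
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, (char *)buf + got, n - got);
        if (r < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0)
            break;              /* end of stream */
        got += (size_t)r;
    }
    return (ssize_t)got;
}

/* Demo: the sender writes "hello" in two pieces; a single read_full()
 * call collects all five bytes. Returns the byte count, or -1 on error. */
ssize_t read_full_demo(char *out)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    ssize_t n = -1;
    if (write(fds[0], "he", 2) == 2 && write(fds[0], "llo", 3) == 3)
        n = read_full(fds[1], out, 5);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```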

SOCK_SEQPACKET is similar to the SOCK_STREAM socket, but the socket is based on packet services rather than byte stream services. This means that the amount of data received from the SOCK_SEQPACKET socket is consistent with what was sent by the other party. The Stream Control Transmission Protocol (SCTP) provides sequential packet services on the Internet domain.

SOCK_RAW sockets provide a datagram interface directly to the underlying network layer (IP in the Internet domain). With this interface the application is responsible for building its own protocol headers, because the transport protocols (TCP and UDP) are bypassed. Creating a raw socket requires superuser privileges, which prevents malicious programs from crafting packets that bypass the built-in security mechanisms.

3. bind()

#include <sys/socket.h>

int bind(int sockfd, const struct sockaddr *addr, socklen_t len);

                                    Return value: 0 on success, -1 on error

For the server, a well-known address needs to be bound to a socket that receives client requests. Use the bind() function to bind the address to the socket.

A client usually does not need to call bind(): the system automatically assigns it a port number combined with one of the host's own IP addresses.

parameter:

sockfd: the socket descriptor created by socket(), to which addr will be bound;
addr: points to the protocol address to bind to sockfd. The address structure depends on the address family used when the socket was created;
len: the length of that address structure;

The addr parameter deserves a closer look.

The format of an address is tied to a specific communication domain. So that addresses of different formats can be passed to the socket functions, each address is cast to the generic sockaddr structure:

struct sockaddr {
	sa_family_t	sa_family;	/* address family, AF_xxx	*/
	char		sa_data[];
    .
    .
    .
};

For example, on Linux, the structure is defined as follows:

struct sockaddr {
	sa_family_t	sa_family;	/* address family, AF_xxx	*/
	char		sa_data[14];	/* 14 bytes of protocol address	*/
};

On FreeBSD, this structure is defined as follows:

struct sockaddr {
    unsigned char sa_len;     /*total length*/
	sa_family_t	sa_family;    /*address family*/
	char		sa_data[14];  /*variable-length address*/
};

In the IPv4 Internet domain (AF_INET), a socket address is represented by struct sockaddr_in:

struct sockaddr_in {
	sa_family_t	sin_family;	/* Address family		*/
	in_port_t	sin_port;	/* Port number			*/
	struct in_addr	sin_addr;	/* Internet address		*/
};

struct in_addr {
	uint32_t    s_addr;        /* address in network byte order */
};

There are some restrictions on the addresses that can be used:

The specified address must be valid on the machine where the process is running; the address of another machine cannot be specified;
The address must match the format supported by the address family used when the socket was created;
The port number must not be less than 1024 unless the process has the appropriate privilege (that is, it runs as the superuser);
Usually only one socket endpoint can be bound to a given address, although some protocols allow duplicate bindings;
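Putting the address structure and bind() together, here is a minimal sketch that binds a TCP socket to the loopback address. The helper name bind_loopback() is hypothetical, and passing port 0 (an assumption of this example) asks the kernel to choose a free port.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to 127.0.0.1:port.
 * Returns the descriptor, or -1 on error. */
int bind_loopback(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;                      /* must match the socket's domain */
    addr.sin_port = htons(port);                    /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* an address valid on this host */

    /* The IPv4 address is cast to the generic struct sockaddr. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```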

3.1 Network byte order and host byte order

Host byte order is what we usually call big-endian or little-endian: different CPUs store the bytes of a multi-byte integer in memory in different orders, and the order a given machine uses is its host byte order. The standard definitions of big-endian and little-endian are as follows:

Little-Endian means that the low-order bytes are arranged at the low address end of the memory, and the high-order bytes are arranged at the high address end of the memory.
Big-Endian means that the high-order bytes are arranged at the low address end of the memory, and the low-order bytes are arranged at the high address end of the memory.

In network byte order, the bytes of a 32-bit value are transmitted in the following order: bits 0-7 first, then bits 8-15, then bits 16-23, and finally bits 24-31, where bit 0 is the most significant bit. This order is big-endian. Because all binary integers in TCP/IP headers must be in this order when sent over the network, it is also called network byte order. Byte order, as the name suggests, is the order in which the bytes of a value larger than one byte are stored; single-byte data has no ordering issue.

Therefore, when binding an address to a socket, first convert it from host byte order to network byte order, and never assume that the host uses big-endian like the network does. This mistake has caused real disasters! It was behind many baffling bugs in my company's project code, so make no assumptions about the host byte order and always convert values to network byte order before storing them in a socket address.
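The conversion is done with htons()/htonl() (host to network) and ntohs()/ntohl() (network to host). The sketch below checks the host's endianness and shows that htons() always produces the high-order byte first in memory; the two helper names are invented for this example.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if this host stores integers little-endian, 0 if big-endian. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);      /* the byte at the lowest address */
    return first == 1;
}

/* Copies the in-memory bytes of htons(port) into out[0..1].
 * Whatever the host byte order, out[0] is the high-order byte. */
void port_wire_bytes(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);
    memcpy(out, &net, sizeof(net));
}
```

For port 8080 (0x1F90), port_wire_bytes() yields 0x1F followed by 0x90 on both little-endian and big-endian hosts, which is why the conversion should never be skipped.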

3.2 getsockname()

#include <sys/socket.h>

int getsockname(int sockfd, struct sockaddr *restrict addr, socklen_t *restrict alenp);

                                         Return value: 0 on success, -1 on error

The address bound to a socket can be discovered by calling the function getsockname().

Before calling getsockname(), set alenp to point to an integer containing the size of the sockaddr buffer. On return, this integer is set to the size of the address returned. If the address does not fit in the buffer provided, it is silently truncated. If no address is currently bound to the socket, the result is undefined.
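A typical use of getsockname() is finding out which port the kernel picked after binding to port 0. The following is a sketch under that assumption; ephemeral_port() is a hypothetical helper name.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to 127.0.0.1 with port 0, then
 * use getsockname() to learn the port the kernel assigned.
 * Returns the port in host byte order, or -1 on error. */
int ephemeral_port(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                       /* 0: let the kernel choose */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    int port = -1;
    socklen_t len = sizeof(addr);                   /* in: buffer size, out: address size */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);                /* back to host byte order */

    close(fd);
    return port;
}
```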

4. connect()

If you are dealing with a connection-oriented network service (SOCK_STREAM or SOCK_SEQPACKET), then before any data can be exchanged you need to establish a connection between the socket of the process requesting the service (the client) and the socket of the process providing it (the server). This is done with connect().

#include <sys/socket.h>

int connect(int sockfd, const struct sockaddr *addr, socklen_t len);

                                           Return value: 0 on success, -1 on error

The address specified in connect() is the address of the server you want to communicate with. If sockfd is not bound to an address, connect() will bind a default address to the caller.

When connecting to a server, the connection may fail for a number of reasons. The machine you want to connect to must be up and running, the server must be bound to an address you want to connect to, and there should be enough space in the server's queue waiting for connections. Therefore, the application must be able to handle errors returned by connect(), which may be caused by some transiently changing conditions.
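One common way to cope with transient failures is to retry with an increasing delay. The sketch below is my own illustration of that idea, not a standard API: connect_retry() and connect_demo() are hypothetical names, and the demo connects to a listening socket created in the same process on the loopback address.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to connect up to max_tries times, sleeping
 * 1, 2, 4, ... seconds between attempts. A fresh socket is used for each
 * attempt. Returns a connected descriptor, or -1 if every attempt failed. */
int connect_retry(const struct sockaddr *addr, socklen_t len, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        int fd = socket(addr->sa_family, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, addr, len) == 0)
            return fd;                  /* connected */
        close(fd);                      /* start over with a new socket */
        if (i + 1 < max_tries)
            sleep(1u << i);             /* exponential backoff */
    }
    return -1;
}

/* Demo: create a listener on 127.0.0.1 (kernel-chosen port) and connect
 * to it. Returns 0 on success, -1 on failure. */
int connect_demo(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    int ok = -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        listen(lfd, 1) == 0 &&
        getsockname(lfd, (struct sockaddr *)&addr, &len) == 0) {
        int cfd = connect_retry((struct sockaddr *)&addr, len, 3);
        if (cfd >= 0) {
            ok = 0;
            close(cfd);
        }
    }
    close(lfd);
    return ok;
}
```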

5. listen()

#include <sys/socket.h>

int listen(int sockfd, int backlog);

                                             Return value: 0 on success, -1 on error

parameter:

sockfd: the socket descriptor to listen on;
backlog: a hint for how many pending connection requests the system should queue. Once the queue is full, the system rejects additional connection requests;

A socket created by socket() is an active socket by default. listen() turns it into a passive socket that waits for connection requests from clients.

6. accept()

#include <sys/socket.h>

int accept(int sockfd, struct sockaddr *restrict addr, socklen_t *restrict len);

                                             Return value: socket descriptor on success, -1 on error

Once the server has called listen(), the socket can receive connection requests. Use accept() to retrieve a connection request and turn it into a connection.

When accept() returns successfully, the return value is a socket descriptor connected to the client that called connect(). This new socket descriptor (also known as the connection socket) has the same socket type and address family as the original sockfd (also known as the listening socket).

If you don't care about the client's identity, you can set both addr and len to NULL. Otherwise, before calling accept(), set addr to a buffer large enough to hold the address, and set len to point to an integer containing the size of that buffer. On return, accept() fills the buffer with the client's address and updates the integer pointed to by len to the size of that address.

If no connection comes, accept() will block until the connection request comes.
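The listen()/accept() sequence can be sketched in a single process: the kernel completes the handshake for a queued connection, so connect() can return before accept() is called. accept_demo() is a hypothetical name, and the loopback address with a kernel-chosen port is an assumption of this example.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Demo: listen on 127.0.0.1, connect from the same process, accept the
 * connection and check the peer address reported by accept().
 * Returns 0 on success, -1 on failure. */
int accept_demo(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);  /* listening socket */
    int cfd = socket(AF_INET, SOCK_STREAM, 0);  /* client socket */
    int afd = -1, ok = -1;
    struct sockaddr_in srv, peer;
    socklen_t slen = sizeof(srv), plen = sizeof(peer);

    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks */

    if (lfd >= 0 && cfd >= 0 &&
        bind(lfd, (struct sockaddr *)&srv, sizeof(srv)) == 0 &&
        listen(lfd, 8) == 0 &&
        getsockname(lfd, (struct sockaddr *)&srv, &slen) == 0 &&
        connect(cfd, (struct sockaddr *)&srv, slen) == 0) {
        /* accept() returns a new descriptor; lfd keeps listening. */
        afd = accept(lfd, (struct sockaddr *)&peer, &plen);
        if (afd >= 0 && afd != lfd &&
            peer.sin_addr.s_addr == htonl(INADDR_LOOPBACK))
            ok = 0;
    }

    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return ok;
}
```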

7. Data transmission

read() / write()
send() / recv()
sendto() / recvfrom()
sendmsg() / recvmsg()

Socket data transmission is roughly divided into the above four groups. In addition to the read() and write() system calls, socket also provides three groups of data transmission interfaces.

7.1 send()

#include <sys/socket.h>

ssize_t send(int sockfd, const void *buf, size_t nbytes, int flags);

                                         Return value: number of bytes sent on success, -1 on error

Similar to write(), the socket must be connected when using send().

If send() returns successfully, it does not necessarily mean that the process on the other end of the connection received the data. All that is guaranteed is that when send() returns successfully, the data has been handed to the network drivers without error.

The return value of send() falls into three cases:

Greater than 0: the number of bytes sent;
Equal to 0: no bytes were sent, which normally happens only when nbytes is 0 (a return of 0 signals an orderly peer shutdown for recv(), not for send());
Less than 0: an error occurred; errno gives the reason;

For a protocol that supports message boundaries, if a single message is larger than the maximum size the protocol supports, send() fails and sets errno to EMSGSIZE. For a byte-stream protocol, send() blocks until the entire amount of data has been transmitted.

The fourth parameter of send(), flags (usually 0), specifies flags that change how the transmitted data is handled:

MSG_DONTROUTE: do not route the packet outside the local network;
MSG_DONTWAIT: enable a non-blocking operation, equivalent to using O_NONBLOCK;
MSG_EOR: if the protocol supports it, this is the end of the record;
MSG_OOB: if the protocol supports it, send out-of-band data;
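In practice, for example on a non-blocking socket or after a signal interrupts the call, send() can report fewer bytes than requested, so defensive code often wraps it in a loop. The following is a sketch of that pattern; send_all() and send_all_demo() are hypothetical names, and a UNIX domain socket pair stands in for a TCP connection.

```c
#include <sys/socket.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: call send() repeatedly until all n bytes have been
 * handed to the kernel. Returns n on success, or -1 on error. */
ssize_t send_all(int fd, const void *buf, size_t n, int flags)
{
    size_t sent = 0;
    while (sent < n) {
        ssize_t r = send(fd, (const char *)buf + sent, n - sent, flags);
        if (r < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        sent += (size_t)r;      /* may be less than requested */
    }
    return (ssize_t)sent;
}

/* Demo: push "hello world" through a socket pair and read it back.
 * Returns the number of bytes received, or -1 on error. */
ssize_t send_all_demo(char *out, size_t cap)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    ssize_t n = -1;
    if (send_all(fds[0], "hello world", 11, 0) == 11)
        n = recv(fds[1], out, cap, 0);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```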

7.1.1 Out-of-band data

Out-of-band data is an optional feature supported by some communication protocols that allows higher-priority data to be delivered ahead of ordinary data. Out-of-band data is sent first even if the transmission queue already holds data.

TCP supports out-of-band data, but UDP does not. The socket interface's support for out-of-band data is largely affected by the specific implementation of TCP out-of-band data.

7.2 recv()

#include <sys/socket.h>

ssize_t recv(int sockfd, void *buf, size_t nbytes, int flags);

                                         Return value: number of bytes received on success, -1 on error

recv() is like read(), but allows specifying options to control how data is received:

MSG_OOB: If the protocol supports it, receive out-of-band data;
MSG_PEEK: Returns the message content without actually taking away the message;
MSG_TRUNC: Even if the message is truncated, the actual length of the message is required to be returned;
MSG_WAITALL: Wait until all data is available (only for SOCK_STREAM type);

For SOCK_STREAM type sockets, less data can be received than requested. But the flag MSG_WAITALL prevents this behavior, and the recv() function cannot return until all required data has been received.

For SOCK_DGRAM and SOCK_SEQPACKET type sockets, the MSG_WAITALL flag does not change the behavior because these message-based socket types return the entire message in one read.

If the sender has called shutdown() to end transmission, or if the network protocol supports orderly shutdown by default and the sender has closed the socket, recv() returns 0 once all the data has been received.
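The sketch below illustrates two of the behaviours just described: MSG_PEEK returns data without consuming it, and recv() returns 0 once the sender has shut down its side and all data has been read. recv_demo() is a hypothetical name, and a UNIX domain stream socket pair is used in place of a network connection.

```c
#include <sys/socket.h>
#include <unistd.h>
#include <string.h>

/* Demo: peek at three bytes, read them for real, then observe the 0
 * return that signals an orderly shutdown.
 * Returns 0 if everything behaved as described, -1 otherwise. */
int recv_demo(void)
{
    int fds[2];
    char peeked[3], taken[3], extra;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    int ok = -1;
    if (send(fds[0], "abc", 3, 0) == 3 &&
        shutdown(fds[0], SHUT_WR) == 0 &&               /* sender is done */
        recv(fds[1], peeked, 3, MSG_PEEK) == 3 &&       /* look, do not consume */
        recv(fds[1], taken, 3, 0) == 3 &&               /* same bytes again */
        memcmp(peeked, taken, 3) == 0 &&
        recv(fds[1], &extra, 1, 0) == 0)                /* no data left: 0 */
        ok = 0;

    close(fds[0]);
    close(fds[1]);
    return ok;
}
```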

7.3 sendto()

#include <sys/socket.h>

ssize_t sendto(int sockfd, const void *buf, size_t nbytes, int flags,
                          const struct sockaddr *destaddr, socklen_t destlen);

                                         Return value: number of bytes sent on success, -1 on error

sendto() is similar to send(), except that sendto() allows specifying a target address on a connectionless socket.

For connection-oriented sockets, the destination address is ignored because it is implied by the connection. On a connectionless socket, send() cannot be used unless the destination address has first been set by calling connect(), so sendto() provides another way to send a message.

7.4 recvfrom()

#include <sys/socket.h>

ssize_t recvfrom(int sockfd, void *restrict buf, size_t len, int flags,
                          struct sockaddr *restrict addr,
                          socklen_t *restrict addrlen);

                                         Return value: number of bytes received on success, -1 on error

If addr is non-null, it will contain the address of the socket endpoint from which the data was sent. When calling recvfrom(), set addrlen to point to an integer containing the size in bytes of the buffer pointed to by addr. On return, this integer is set to the actual size of the address in bytes.

Because the sender's address can be obtained, recvfrom() is usually used for connectionless sockets. Otherwise, recvfrom() is equivalent to recv().

The return value is the same as recv().
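As a sketch of sendto() and recvfrom() working together on connectionless sockets, the following sends one UDP datagram over the loopback interface and checks where it came from. udp_roundtrip() is a hypothetical name; the loopback address and kernel-chosen port are assumptions of this example.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Demo: send "ping" from one UDP socket to another on 127.0.0.1 and
 * receive it with recvfrom(), which also reports the sender's address.
 * Returns the number of bytes received, or -1 on error. */
ssize_t udp_roundtrip(char *out, size_t cap)
{
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr, from;
    socklen_t alen = sizeof(addr), flen = sizeof(from);
    ssize_t n = -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks */

    if (rx >= 0 && tx >= 0 &&
        bind(rx, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(rx, (struct sockaddr *)&addr, &alen) == 0 &&
        /* No connect(): the destination is given on every call. */
        sendto(tx, "ping", 4, 0, (struct sockaddr *)&addr, alen) == 4) {
        n = recvfrom(rx, out, cap, 0, (struct sockaddr *)&from, &flen);
        if (n >= 0 && from.sin_addr.s_addr != htonl(INADDR_LOOPBACK))
            n = -1;                                 /* unexpected sender */
    }

    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return n;
}
```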

7.5 sendmsg()

#include <sys/socket.h>

ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags);

                                         Return value: number of bytes sent on success, -1 on error

parameter:

sockfd: the socket descriptor on which to send data;

msg: pointer to a msghdr structure, which contains the message to be sent and related metadata;

flags: the same flags accepted by send();

Let’s take a look at the msghdr structure:

/* Structure describing messages sent by
    `sendmsg' and received by `recvmsg'.  */
struct msghdr
{
    void *msg_name;     /* Address to send to/receive from.  */
    socklen_t msg_namelen;  /* Length of address data.  */

    struct iovec *msg_iov;  /* Vector of data to send/receive into.  */
    size_t msg_iovlen;      /* Number of elements in the vector.  */

    void *msg_control;      /* Ancillary data (eg BSD filedesc passing). */
    size_t msg_controllen;  /* Ancillary data buffer length.
                   !! The type should be socklen_t but the
                   definition of the kernel is incompatible
                   with this.  */

    int msg_flags;      /* Flags on received message.  */
};

msg_name: a structure pointer pointing to the socket address to send/receive information;
msg_namelen: socket address structure length;
msg_iov: iov array pointer, each element contains a buffer and length of data to be sent;
msg_iovlen: The number of elements in the iov array;
msg_control: points to the buffer of auxiliary data;
msg_controllen: auxiliary data buffer length;
msg_flags: flag bit;

The sendmsg() function uses these members to describe the message to send and where to send it. The message body can be spread over several buffers (the elements of the iov array); sendmsg() gathers them in order and transmits them as a single message, similar to writev(). The flags parameter accepts the same flags as send() and can be used to adjust how the data is sent.
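To make the role of msg_iov concrete, here is a sketch that sends a header and a payload held in two separate buffers as one datagram. sendmsg_demo() is a hypothetical name, and a UNIX domain datagram socket pair is assumed so that message boundaries are visible.

```c
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>
#include <unistd.h>

/* Demo: gather "HDR:" and "payload" from two buffers into one message
 * with sendmsg(), then receive it with a plain recv().
 * Returns the number of bytes received, or -1 on error. */
ssize_t sendmsg_demo(char *out, size_t cap)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) < 0)
        return -1;

    char hdr[] = "HDR:";
    char body[] = "payload";
    struct iovec iov[2];
    iov[0].iov_base = hdr;   iov[0].iov_len = 4;
    iov[1].iov_base = body;  iov[1].iov_len = 7;

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));   /* no address, no ancillary data */
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;

    ssize_t n = -1;
    if (sendmsg(fds[0], &msg, 0) == 11)     /* 4 + 7 bytes, one message */
        n = recv(fds[1], out, cap, 0);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```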

7.6 recvmsg()

#include <sys/socket.h>

ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags);

                                         Return value: number of bytes received on success, -1 on error

Like sendmsg(), recvmsg() works with a msghdr structure: use it when you want to scatter received data into multiple buffers or to receive ancillary data.

The structure msghdr is used by recvmsg() to specify the input buffer to receive data. The parameter flags can be set to change the default behavior of recvmsg. On return, the msg_flags field in the msghdr structure is set to various characteristics of the received data.

msg_flags flags returned from recvmsg():

MSG_CTRUNC: Control data is truncated;
MSG_DONTWAIT: recvmsg() is in non-blocking mode;
MSG_EOR: End of record received;
MSG_OOB: Out-of-band data received;
MSG_TRUNC: General data is truncated;
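The sketch below shows recvmsg() scattering one datagram into two buffers and reporting truncation through msg_flags. recvmsg_scatter() is a hypothetical helper; it assumes a UNIX domain datagram socket pair and the Linux/BSD behaviour of returning the number of bytes actually copied when a datagram is truncated.

```c
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: send the 11-byte datagram "HDR:payload" and receive
 * it into head (4 bytes) and tail (tail_cap bytes). Stores msg_flags in
 * *flags_out. Returns the number of bytes received, or -1 on error. */
ssize_t recvmsg_scatter(char head[4], char *tail, size_t tail_cap,
                        int *flags_out)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) < 0)
        return -1;

    struct iovec iov[2];
    iov[0].iov_base = head;  iov[0].iov_len = 4;
    iov[1].iov_base = tail;  iov[1].iov_len = tail_cap;

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;

    ssize_t n = -1;
    if (send(fds[0], "HDR:payload", 11, 0) == 11) {
        n = recvmsg(fds[1], &msg, 0);
        if (n >= 0)
            *flags_out = msg.msg_flags;     /* e.g. MSG_TRUNC if cut short */
    }

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

With a 7-byte tail buffer the whole datagram fits; with a 3-byte tail the excess bytes are discarded and MSG_TRUNC should be set in the returned flags.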

8. close()

After the server has established a connection with the client and finished its read and write operations, it must close the corresponding socket descriptor, just as you call fclose() on an open file when you are done with it.

#include <unistd.h>

int close(int fd);

By default, close() on a TCP socket marks the socket as closed and returns to the calling process immediately. The descriptor can no longer be used by the calling process; that is, it can no longer be passed as the first argument to read() or write().

Note: close() only decrements the reference count of the underlying socket by 1. Only when the reference count reaches 0 does TCP send a termination request to the peer.
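The reference-counting behaviour can be observed with dup(), which creates a second descriptor for the same socket. This is a sketch only: close_refcount_demo() is a hypothetical name, and a UNIX domain socket pair is used instead of TCP, on the assumption that descriptor reference counting works the same way for both.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Demo: closing one of two descriptors that refer to the same socket does
 * not end the connection; the peer sees end-of-stream only after the last
 * descriptor is closed. Returns 0 if that is what happened, -1 otherwise. */
int close_refcount_demo(void)
{
    int fds[2];
    char c = 0;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    int copy = dup(fds[0]);     /* second reference to the same socket */
    if (copy < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[0]);              /* count drops to 1: socket stays open */

    int ok = -1;
    if (write(copy, "x", 1) == 1 &&
        read(fds[1], &c, 1) == 1 && c == 'x') {
        close(copy);            /* count drops to 0: socket really closes */
        copy = -1;
        if (read(fds[1], &c, 1) == 0)   /* peer now sees end-of-stream */
            ok = 0;
    }

    if (copy >= 0)
        close(copy);
    close(fds[1]);
    return ok;
}
```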

9. TCP communication process in socket

The picture above shows the communication process of TCP in socket, which is roughly divided into four parts:

In the initialization phase, the server creates a socket descriptor with socket(), binds the server address and port with bind(), and finally calls listen() on the descriptor, at which point the server is officially listening; the client creates its own socket descriptor with socket();
In the connection creation phase, the server calls accept() after listen() and blocks waiting for a client, and the client tries to connect with connect();
In the data transmission phase, the server and client exchange data;
In the communication termination phase, the client and server go through the four-way wave that closes the connection;

9.1 Three-way handshake to establish connection

Before the handshake, the server has two states:

The initial CLOSED state (the client is also in the CLOSED state before it calls connect());
The LISTEN state, entered after listen() is called;

When the client calls connect(), it officially enters the connection creation phase. There are three handshakes in the process:

In the first handshake, the client calls connect() and sends a SYN (Synchronize Sequence Numbers) packet with seq=j to the server; the client enters the SYN_SENT state and waits for the server's acknowledgment;
In the second handshake, the server receives the client's SYN packet and acknowledges it (ack=j+1) while sending its own SYN (seq=k) in the same packet, that is, SYN + ACK; the server enters the SYN_RECV state;
In the third handshake, the client receives the server's SYN + ACK packet and sends an acknowledgment (ack=k+1) to the server. Once this packet has been sent, the client and server are in the ESTABLISHED state and the handshake is complete;

When the three-way handshake is completed, the client and server enter the ESTABLISHED state, and data can be transmitted between the two ends.

In essence, the TCP handshake lets each side tell the other its Initial Sequence Number (ISN). This sequence number serves as the baseline for the data transfer that follows and ensures that TCP segments are not confused with one another in transit.

Recall the TCP header: the Sequence Number and Acknowledgment Number fields are both 32 bits wide, so seq and ack range from 0 to 2^32-1.
Each time seq or ack reaches 2^32-1, it wraps around to 0. Note that the initial value of seq (the ISN) does not start from 0 for every connection. Suppose it did: after the three-way handshake the Client sends 30 segments and then loses its connection, so it reconnects and again uses 0 as its initial seq. Two segments would then carry the same seq, and confusion would result.
What TCP actually does is increment the ISN by 1 roughly every 4 microseconds. By the time the ISN has reached 2^32-1 and wrapped to 0, several hours have passed, and the earlier segment with seq=0 no longer exists on the network, which avoids the problem above.
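The "several hours" figure follows from simple arithmetic: 2^32 values at one tick every 4 microseconds is about 17,180 seconds, or roughly 4.77 hours. A tiny sketch of that calculation (isn_wrap_hours() is a name made up for this example):

```c
/* Hours until a 32-bit counter that advances once every tick_us
 * microseconds wraps around to 0. */
double isn_wrap_hours(double tick_us)
{
    double values = 4294967296.0;               /* 2^32 distinct values */
    double seconds = values * tick_us / 1e6;    /* microseconds to seconds */
    return seconds / 3600.0;
}
```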

9.2 Four-way wave to close the connection

In the first wave, the Client sends a FIN segment (seq=m) to the Server to close data transfer from the Client to the Server, and the Client enters the FIN_WAIT_1 state. m is the sequence number of the last byte of the last segment the Client sent to the Server, plus 1;
In the second wave, the Server receives the FIN and sends an acknowledgment (seq=n, ack=m+1) to the Client, and the Server enters the CLOSE_WAIT state. The TCP connection is now half-closed: if the Server sends data, the Client can still receive it. n is the sequence number of the last byte of the last segment the Server sent to the Client, plus 1;
In the third wave, the Server sends its own FIN segment (seq=u, ack=m+1) to the Client, and the Server enters the LAST_ACK state. u is the sequence number of the last byte the Server sent to the Client while the connection was half-closed, plus 1;
In the fourth wave, after receiving the Server's FIN, the Client sends an acknowledgment (seq=m+1, ack=u+1) to the Server and enters the TIME_WAIT state.

When the Server receives the Client's ACK, it enters the CLOSED state and its side of the TCP connection is gone. The Client waits in the TIME_WAIT state for a period of time to make sure that its final acknowledgment reached the Server (if it did not, the Server retransmits the FIN from the third wave, which tells the Client that its last acknowledgment was lost). If the Client receives no further segment from the Server during TIME_WAIT, it enters the CLOSED state. The TCP connection is now fully closed.