[Computer Network] 1. Concept, TCP | UDP | Local socket programming

1. Basic concepts of network

1.1 port (port)

The "IP address" of a computer is fixed, but one machine can hold many connections at once (for example, a mobile phone acting as a client can run WeChat and Feishu at the same time, and a server can offer ssh and http services at the same time). How are these told apart? This is what the "port" is for: each service or connection on the machine uses a different port.

A port is a 16-bit integer, so its maximum value is 65535. Ports below 1024 are reserved for privileged system services, and glibc treats ports up to 5000 as reserved as well, so applications usually pick larger numbers. When a client makes a request, its port is allocated temporarily by the OS, while the server's port is generally a well-known one.

The quadruple (clientip, clientport, serverip, serverport) uniquely identifies a connection.

1.2 IP address = network address (network) and host (host)

The IP address is divided into two parts: the network address (network) and the host (host):

  • Network address (network): refers to the common part of this group of IPs. For example, in the range of 192.168.1.1~192.168.1.255, the common part is 192.168.1.0.
  • Host (host): refers to the different parts of this group of IPs. For example, 1~255 in the above example means that there are 255 different IPs.
    • For example, take the IPv4 address 192.0.2.12. If the first three bytes are the network and the last byte is the host, the subnet mask (netmask) is 255.255.255.0 and the network is written as 192.0.2.0/24.

Among them, network address (network): is calculated by bitwise AND of IP and subnet mask (netmask).

Example 1: If the IP address is 192.168.2.99 (binary is 11000000.10101000.00000010.01100011).

  • Take a netmask of 255.255.255.252 (11111111.11111111.11111111.11111100 in binary). It means a 30-bit network address (network) and a 2-bit host (host), so the subnet holds at most 4 addresses (only the last two bits can vary, that is, 2^2 = 4).
  • The network address (network) is then 255.255.255.252 & 192.168.2.99, which is binary 11000000.10101000.00000010.01100000 (decimal 192.168.2.96).

Example 2: If the IP is 192.0.2.12 and the subnet mask is 255.255.255.0, then the network address is 192.0.2.0. The calculation is as follows.

concept          byte1     byte2     byte3     byte4     value
IP address       11000000  00000000  00000010  00001100  192.0.2.12
subnet mask      11111111  11111111  11111111  00000000  255.255.255.0
network address  11000000  00000000  00000010  00000000  192.0.2.0
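
The bitwise AND in the two examples can be sketched in C as follows. This is a minimal illustration, not code from the project: the helper names ipv4 and network_address are made up here, and addresses are kept as plain 32-bit host-order integers.

```c
#include <stdint.h>

/* Pack four dotted-decimal octets into one 32-bit value (host byte order).
 * Hypothetical helper, used only for this illustration. */
uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/* Network address = IP address AND netmask. */
uint32_t network_address(uint32_t ip, uint32_t netmask) {
    return ip & netmask;
}
```

With these helpers, network_address(ipv4(192,168,2,99), ipv4(255,255,255,252)) equals ipv4(192,168,2,96), and 192.0.2.12 with mask 255.255.255.0 gives 192.0.2.0, matching the binary results above.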

1.3 Subnet (subnet)

Subnet: under classful addressing, the first one, two, or three bytes of an IPv4 address form the network part.

  • "Class A network (Class A)": one byte of network and three bytes of host, which gives 2^24 = 16777216 addresses. The netmask of this one-byte network is 255.0.0.0.
  • "Class B network (Class B)": two bytes of network and two bytes of host, which gives 2^16 = 65536 addresses. The netmask of this two-byte network is 255.255.0.0.
  • "Class C network (Class C)": three bytes of network and one byte of host, which gives 2^8 = 256 addresses. The netmask of this three-byte network is 255.255.255.0.

1.4 Subnet mask (netmask)

  • In binary it is always a run of 1s followed by a run of 0s
  • The boundary can fall on any bit; it is not limited to the 8, 16, and 24 bits discussed above
  • A dotted mask is awkward to read, though: with a subnet mask like 255.192.0.0 it is not obvious how many 1s and 0s there are. So put a slash after the IP address, followed by a decimal number giving the number of network bits, like this: 192.0.2.12/30. It is then easy to see that there are 30 1s and 2 0s, so the subnet holds 4 addresses.
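
As a rough sketch of the slash notation, the two functions below turn a prefix length into a netmask and into an address count. The function names are hypothetical; prefix 0 is special-cased because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/* Build a netmask with prefix_len leading 1 bits (prefix_len in 0..32). */
uint32_t prefix_to_netmask(unsigned prefix_len) {
    if (prefix_len == 0)
        return 0;                 /* avoid an undefined 32-bit shift by 32 */
    if (prefix_len >= 32)
        return 0xFFFFFFFFu;
    return 0xFFFFFFFFu << (32 - prefix_len);
}

/* Number of addresses covered by the prefix: 2^(32 - prefix_len). */
uint64_t addresses_in_prefix(unsigned prefix_len) {
    if (prefix_len > 32)
        prefix_len = 32;
    return (uint64_t)1 << (32 - prefix_len);
}
```

For /30 this yields the mask 0xFFFFFFFC (255.255.255.252) and 4 addresses, and the hard-to-read 255.192.0.0 from the text turns out to be /10.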

1.5 reserved network segment

A common phenomenon is that companies and organizations generally use IP addresses such as 10.x.x.x or 192.168.x.x. You may wonder what such an IP represents, and whether different organizations using the same IP will conflict.

The reason is that some network segments in the IPv4 address space have been specially set aside (by the IETF, in RFC 1918). These segments are never used as addresses on the public network and are reserved for internal use only, so reusing them in different organizations causes no conflict. We call them reserved network segments.

The following table shows three reserved network segments, which can accommodate 16777216, 1048576 and 65536 hosts respectively.

  • The second row of the table below: the largest CIDR block is 172.16.0.0/12, and its classful description is 16 consecutive Class B networks:
    • 172.16.0.0 is a Class B address, whose default network prefix is 2*8 = 16 bits. Here the prefix is only 12 bits, so the block covers 2^(16-12) = 16 consecutive Class B networks
  • The third row of the table below: the largest CIDR block is 192.168.0.0/16, and its classful description is 256 consecutive Class C networks:
    • 192.168.0.0 is a Class C address, whose default network prefix is 3*8 = 24 bits. Here the prefix is only 16 bits, so the block covers 2^(24-16) = 256 consecutive Class C networks

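address range                  number of addresses  largest CIDR block  classful description
10.0.0.0 ~ 10.255.255.255      16777216             10.0.0.0/8          a single Class A network
172.16.0.0 ~ 172.31.255.255    1048576              172.16.0.0/12       16 consecutive Class B networks
192.168.0.0 ~ 192.168.255.255  65536                192.168.0.0/16      256 consecutive Class C networks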
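
A membership test for the three reserved segments might look like the sketch below. It assumes the RFC 1918 blocks 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16, takes the address as a 32-bit host-order integer, and uses made-up function names.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr, masked with mask, equals network. */
bool in_block(uint32_t addr, uint32_t network, uint32_t mask) {
    return (addr & mask) == network;
}

/* True when addr lies in one of the three reserved (private) segments. */
bool is_reserved_private(uint32_t addr) {
    return in_block(addr, 0x0A000000u, 0xFF000000u)    /* 10.0.0.0/8     */
        || in_block(addr, 0xAC100000u, 0xFFF00000u)    /* 172.16.0.0/12  */
        || in_block(addr, 0xC0A80000u, 0xFFFF0000u);   /* 192.168.0.0/16 */
}
```

Note how the /12 block ends at 172.31.255.255: 172.32.0.1 is already a public address.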

1.6 CIDR Notation: Meaning of 192.168.0.1/27

First, 192.168.0.1 is an IP address. In classful terms it is a Class C address: Class C uses a 24-bit network number, and borrowing 3 more bits from the host part gives exactly 27 bits. The trailing /27 is the length of the network number. Allowing that length to vary is known as VLSM (Variable Length Subnet Mask), and the form 192.168.0.1/27 is CIDR (Classless Inter-Domain Routing) notation.

The IP address is divided into four parts by dots, and each part is 8 bits (bits), that is, a byte (byte). In a class C address, the network number occupies 24 bits, and the host number occupies 8 bits

network number host number (host)
11111111 11111111 11111111 00000000

The /27 above means that the network number occupies 27 bits

network number host number (host)
11111111 11111111 11111111 11100000
  • Network number (network): the network number borrows 3 bits from the host number, which gives 2^3 = 8 subnets. Each subnet has 2^5 - 2 = 30 usable hosts; the -2 is because the first and last addresses (the network address and the broadcast address) cannot be assigned to hosts.
  • Subnet mask (netmask): 255.255.255.224, which is the decimal form of the binary above and is equivalent to /27.
  • Network address: the bitwise AND of the IP address and the subnet mask, which is 192.168.0.0. The calculation is in the table below.
  • Broadcast address: take the network address and set all 5 host bits (the rightmost 5 bits) to 1, which gives 192.168.0.31. So the valid host addresses are 192.168.0.1 to 192.168.0.30. A request sent to the broadcast address is delivered to every host on the subnet.

concept            byte1     byte2     byte3     byte4     value
IP address         11000000  10101000  00000000  00000001  192.168.0.1/27
subnet mask        11111111  11111111  11111111  11100000  255.255.255.224 (/27)
network address    11000000  10101000  00000000  00000000  192.168.0.0
broadcast address  11000000  10101000  00000000  00011111  192.168.0.31

The calculation above gives 8 subnets, each with 32 addresses (set by the 5 host bits, 11111, at the end of the broadcast address). 192.168.0.1 falls in the first subnet, whose usable hosts are 192.168.0.1 ~ 192.168.0.30. The subnets are laid out as follows:

subnet  IP segment                     available hosts
1       192.168.0.0 ~ 192.168.0.31     192.168.0.1 ~ 192.168.0.30
2       192.168.0.32 ~ 192.168.0.63    192.168.0.33 ~ 192.168.0.62
3       192.168.0.64 ~ 192.168.0.95    192.168.0.65 ~ 192.168.0.94
4       192.168.0.96 ~ 192.168.0.127   192.168.0.97 ~ 192.168.0.126
5       192.168.0.128 ~ 192.168.0.159  192.168.0.129 ~ 192.168.0.158
6       192.168.0.160 ~ 192.168.0.191  192.168.0.161 ~ 192.168.0.190
7       192.168.0.192 ~ 192.168.0.223  192.168.0.193 ~ 192.168.0.222
8       192.168.0.224 ~ 192.168.0.255  192.168.0.225 ~ 192.168.0.254
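
Each row of this table can be derived from any address in the subnet plus the prefix length. The following is a small sketch under the assumption that the prefix length is between 1 and 30; the struct and function names are invented for the example.

```c
#include <stdint.h>

struct subnet_range {
    uint32_t network;     /* host bits all 0 */
    uint32_t broadcast;   /* host bits all 1 */
    uint32_t first_host;  /* network + 1     */
    uint32_t last_host;   /* broadcast - 1   */
};

/* Compute the subnet that contains ip. Assumes 1 <= prefix_len <= 30. */
struct subnet_range subnet_range_of(uint32_t ip, unsigned prefix_len) {
    uint32_t mask = 0xFFFFFFFFu << (32 - prefix_len);
    struct subnet_range r;

    r.network = ip & mask;            /* clear the host bits */
    r.broadcast = r.network | ~mask;  /* set the host bits   */
    r.first_host = r.network + 1;
    r.last_host = r.broadcast - 1;
    return r;
}
```

For 192.168.0.1/27 (0xC0A80001) this gives network 192.168.0.0, broadcast 192.168.0.31, and hosts .1 to .30; for 192.168.0.100/27 it gives the fourth row, 192.168.0.96 to 192.168.0.127.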

2. Socket

The figure below shows connection setup (handshake), data exchange, and teardown (wave) between client and server. The connect, accept, read, write, and other calls involved are all made through sockets:
[Figure: client and server socket call sequence]

2.1 sockaddr format

The sockaddr format is as follows:

typedef unsigned short int sa_family_t; // POSIX.1g specifies the address family as a 2-byte value
// Generic socket address
struct sockaddr {
    sa_family_t sa_family;  // address family, 16-bit
    char sa_data[14];       // the actual address value, 112-bit
};

Here sa_family is the address family, which says how the address is to be interpreted and stored. It is defined in <sys/socket.h> as follows:

// Macros for the address families. AF means Address Family and PF means Protocol Family; the two correspond one to one:
#define AF_UNSPEC PF_UNSPEC
#define AF_LOCAL  PF_LOCAL // local communication, same meaning as AF_UNIX and AF_FILE
#define AF_UNIX   PF_UNIX
#define AF_FILE   PF_FILE
#define AF_INET   PF_INET // IPv4
#define AF_AX25   PF_AX25
#define AF_IPX    PF_IPX
#define AF_APPLETALK  PF_APPLETALK
#define AF_NETROM PF_NETROM
#define AF_BRIDGE PF_BRIDGE
#define AF_ATMPVC PF_ATMPVC
#define AF_X25    PF_X25
#define AF_INET6  PF_INET6 // IPv6

Three address formats are used here: IPv4, IPv6, and local addresses, as shown in the following figure:

[Figure: IPv4, IPv6, and local socket address formats]

sockaddr is the generic address format and usually appears as a function parameter. An implementation reads the 16-bit family field at the start to tell whether the address is really a sockaddr_in, sockaddr_in6, or sockaddr_un. That is why sockaddr does not need to be long; it only has to match the shortest format, IPv4. (The generic address structure is an abstraction over all the concrete address structures. With one uniform structure, a single set of interfaces can be designed, which simplifies the API. The first field of the generic structure gives the address type, and the remaining data is parsed according to that type. In practice only a pointer to a concrete address structure is cast to the generic type, so these operations do not access memory out of bounds.)
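
The dispatch on the family field can be sketched like this. The function name is hypothetical, and AF_UNIX is used as the POSIX spelling of AF_LOCAL.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Look at the family field of a generic address and return the size of the
 * concrete structure behind it, or 0 for a family this sketch does not handle. */
socklen_t concrete_addr_len(const struct sockaddr *addr) {
    switch (addr->sa_family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_INET6:
        return sizeof(struct sockaddr_in6);
    case AF_UNIX:                      /* same value as AF_LOCAL */
        return sizeof(struct sockaddr_un);
    default:
        return 0;
    }
}
```

A caller casts a pointer to its concrete structure, for example (struct sockaddr *)&v4, exactly as it would when calling bind or connect.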

2.1.1 IPv4 sockaddr format

// IPv4 address, a 32-bit value
typedef uint32_t in_addr_t;
struct in_addr {
    in_addr_t s_addr;
};

// IPv4 socket address format
struct sockaddr_in {
    sa_family_t sin_family;    // 16-bit, the constant AF_INET, meaning IPv4
    in_port_t sin_port;        // port, 16-bit, so at most 2^16 = 65536 ports; to avoid conflicts, applications usually use port numbers above 5000
    struct in_addr sin_addr;   // Internet address, 32-bit
    unsigned char sin_zero[8]; // padding only, not otherwise used
};

// Reserved ports defined by glibc (standard well-known ports):
enum
  {
    IPPORT_ECHO = 7,    /* Echo service.  */
    IPPORT_DISCARD = 9,   /* Discard transmissions service.  */
    IPPORT_SYSTAT = 11,   /* System status service.  */
    IPPORT_DAYTIME = 13,  /* Time of day service.  */
    IPPORT_NETSTAT = 15,  /* Network status service.  */
    IPPORT_FTP = 21,    /* File Transfer Protocol.  */
    IPPORT_TELNET = 23,   /* Telnet protocol.  */
    IPPORT_SMTP = 25,   /* Simple Mail Transfer Protocol.  */
    IPPORT_TIMESERVER = 37, /* Timeserver service.  */
    IPPORT_NAMESERVER = 42, /* Domain Name Service.  */
    IPPORT_WHOIS = 43,    /* Internet Whois service.  */
    IPPORT_MTP = 57,
 
    IPPORT_TFTP = 69,   /* Trivial File Transfer Protocol.  */
    IPPORT_RJE = 77,
    IPPORT_FINGER = 79,   /* Finger service.  */
    IPPORT_TTYLINK = 87,
    IPPORT_SUPDUP = 95,   /* SUPDUP protocol.  */
 
    IPPORT_EXECSERVER = 512,  /* execd service.  */
    IPPORT_LOGINSERVER = 513, /* rlogind service.  */
    IPPORT_CMDSERVER = 514,
    IPPORT_EFSSERVER = 520,

    /* UDP ports.  */
    IPPORT_BIFFUDP = 512,
    IPPORT_WHOSERVER = 513,
    IPPORT_ROUTESERVER = 520,
 
    /* Ports less than this value are reserved for privileged processes.  */
    IPPORT_RESERVED = 1024,
 
    /* Ports greater this value are reserved for (non-privileged) servers.  */
    IPPORT_USERRESERVED = 5000
  };
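
Filling in a sockaddr_in usually comes down to three assignments plus byte-order conversion. The helper below is a sketch with an invented name; it relies on the standard htons and inet_pton calls to store the port and address in network byte order.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Fill out with the given dotted-decimal IPv4 address and port.
 * Returns 0 on success, -1 if ip is not a valid IPv4 string. */
int fill_ipv4_addr(struct sockaddr_in *out, const char *ip, uint16_t port) {
    memset(out, 0, sizeof(*out));      /* also clears the sin_zero padding */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);       /* host to network byte order */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}
```

Reading the fields back requires the reverse conversions, ntohs for the port and ntohl for the address.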

2.1.2 IPv6 sockaddr format

An IPv4 address is a 32-bit field, so at most 2^32 addresses, about 4.3 billion, can exist. That looked like a very large number when IPv4 was designed, but with the rapid growth of the Internet and ever more devices coming online it gradually became insufficient, which is why IPv6 was introduced.

struct sockaddr_in6 {
    sa_family_t sin6_family;   // 16-bit, the constant AF_INET6, meaning IPv6
    in_port_t sin6_port;       // transport port number, 16-bit
    uint32_t sin6_flowinfo;    // IPv6 flow information, 32-bit
    struct in6_addr sin6_addr; // IPv6 address, 128-bit
    uint32_t sin6_scope_id;    // IPv6 scope ID, 32-bit
};

The entire structure is 28 bytes long.

  • The flow information and scope ID can be ignored for now. One of these two fields does not appear in the glibc documentation at all, and the other is currently unused.
  • The port is the same as in IPv4. The key change is the address, which grows from 32 bits to 128 bits. That is an enormous number, and it removes the address shortage.

2.1.3 Local sockaddr format

To communicate with the outside world, a program must at least tell the computer the other party's address and which address family it uses. Communicating with a remote computer also requires a port number: a remote socket sends a byte stream to a remote computer, and that computer may have several processes listening at the same time, so the port number marks which process the data is for.

AF_LOCAL is a local socket format used for communication between local processes. The local socket is essentially accessing the local file system, so naturally no port is needed.

The length of the path name is variable, such as /var/a.sock, /var/lib/a.sock, etc.

struct sockaddr_un {
    unsigned short sun_family; // the constant AF_LOCAL
    char sun_path[108];        // path name
};

By contrast, if two programs on the same machine talk over TCP, such as a redis server and a redis client, the client still has to specify a port number when connecting to the server. That traffic still goes through the network protocol stack; it just travels over the local loopback address localhost (127.0.0.1).
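
For a local socket, the address is just a path, so the only thing to watch is that it fits in sun_path. A minimal sketch, with a made-up helper name and AF_UNIX as the POSIX spelling of AF_LOCAL:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill out with a local socket path such as /var/a.sock.
 * Returns 0 on success, -1 if the path (plus its '\0') does not fit. */
int fill_local_addr(struct sockaddr_un *out, const char *path) {
    if (strlen(path) >= sizeof(out->sun_path))
        return -1;
    memset(out, 0, sizeof(*out));
    out->sun_family = AF_UNIX;         /* same value as AF_LOCAL */
    strcpy(out->sun_path, path);       /* safe: length checked above */
    return 0;
}
```

No port is involved anywhere; the path alone identifies the endpoint.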

2.2 http and websocket

HTTP is an application-layer protocol implemented on top of TCP sockets. WebSocket is an enhancement of HTTP that makes use of TCP's bidirectional nature to strengthen transmission from server to client.

In the past, the client needed to continuously obtain information from the server through polling. After using websocket, the server can directly push information to the client.

3. TCP programming

3.1 server side

On the server side, the process of preparing to connect is as follows:

3.1.1 socket() Create a socket

  • domain: PF_INET, PF_INET6, PF_LOCAL
  • type: SOCK_STREAM is a byte stream (TCP), SOCK_DGRAM is datagrams (UDP), SOCK_RAW is a raw socket
  • protocol: generally 0, which selects the default protocol for the given domain and type
int socket(int domain, int type, int protocol)

3.1.2 bind() set phone number

If the created socket needs to be used by others, you need to call the bind function to bind the socket and the socket address, just like registering our phone number at the telecommunications bureau.

int bind(int fd, const struct sockaddr *addr, socklen_t len)

Here sockaddr *addr is the generic address format and can be thought of as void *addr: although the parameter type is the generic format, the address actually passed in may be in IPv4, IPv6, or local socket format. len is the length of the address passed in; it varies with the format, and it is used to interpret addr. Usage:

struct sockaddr_in name; // IPv4 format
bind(sock, (struct sockaddr *)&name, sizeof(name)); // cast to the generic format

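The server only handles requests sent to the address it is bound to. For example, if a machine has two network cards (IPs are 202.61. …), the application cannot know in advance which address requests will arrive on, so it binds to the "wildcard address".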

The "wildcard address" configuration method is:

  • Address: use INADDR_ANY for IPv4, or in6addr_any (initializer IN6ADDR_ANY_INIT) for IPv6
struct sockaddr_in name;
name.sin_addr.s_addr = htonl(INADDR_ANY); // IPv4 wildcard address
  • Port: usually set explicitly; if it is 0, the operating system picks a free port

Examples are as follows:

// https://github.com/datager/yolanda/blob/master/chap-4/make_socket.c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

int make_socket(uint16_t port) {
    int sock;
    struct sockaddr_in name;

    sock = socket(PF_INET, SOCK_STREAM, 0); // create a byte-stream IPv4 socket
    if (sock < 0) {
        perror("socket");
        exit(EXIT_FAILURE);
    }

    // bind to the port and IP
    memset(&name, 0, sizeof(name));            // clear padding such as sin_zero
    name.sin_family = AF_INET;                 // IPv4
    name.sin_port = htons(port);               // the given port, in network byte order
    name.sin_addr.s_addr = htonl(INADDR_ANY);  // wildcard address
    // cast the IPv4 address to the generic address format and pass its length
    if (bind(sock, (struct sockaddr *) &name, sizeof(name)) < 0) {
        perror("bind");
        exit(EXIT_FAILURE);
    }

    printf("bind success with sock: %d\n", sock);
    return sock;
}

int main(void) {
    make_socket(12345);
    return 0;
}

// Output:
root@node:/home# gcc a.c
root@node:/home# ./a.out
bind success with sock: 3
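
The port-0 case mentioned earlier can be checked with getsockname, which reports the address a socket ended up bound to. This is a sketch with a hypothetical function name; the actual port number differs from run to run.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Bind sock to the wildcard address with port 0 and return the port the
 * kernel picked (host byte order), or -1 on error. */
int bind_any_port(int sock) {
    struct sockaddr_in name;
    socklen_t len = sizeof(name);

    memset(&name, 0, sizeof(name));
    name.sin_family = AF_INET;
    name.sin_addr.s_addr = htonl(INADDR_ANY);
    name.sin_port = htons(0);          /* 0 = let the kernel choose */
    if (bind(sock, (struct sockaddr *)&name, sizeof(name)) < 0)
        return -1;
    if (getsockname(sock, (struct sockaddr *)&name, &len) < 0)
        return -1;
    return ntohs(name.sin_port);
}
```

This is the same mechanism that gives a client its temporary port when it calls connect without binding first.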

3.1.3 listen() Connect the phone line, everything is ready

bind() just associates our socket with an address, just like registering a phone number. If we want to let others make a call, we also need to connect the telephone device to the telephone line so that the server can really be answered. This process needs to rely on listen().

A freshly created socket can be regarded as an "active" socket, meant to initiate requests later (by calling the connect function, covered below). The listen function converts this "active" socket into a "passive" one, telling the operating system kernel: "This socket is for waiting for user requests." The kernel then makes the preparations needed to receive them, such as setting up the connection queues.

int listen(int socketfd, int backlog)
  • The first parameter socketfd is the socket descriptor
  • The second parameter, backlog, was historically described as the size of the unfinished connection queue. On Linux (since 2.2) it is the length of the queue of completed connections waiting to be accepted, and a value larger than /proc/sys/net/core/somaxconn is silently capped to that limit. It bounds how many pending connections can wait at once: a larger value lets more concurrent connection attempts queue up, but too large a value occupies more system resources.
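
The switch from "active" to "passive" can be observed with the SO_ACCEPTCONN socket option. The two helpers below are an illustrative sketch with invented names; the listener binds to 127.0.0.1 on a kernel-chosen port so that it does not clash with anything else on the machine.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Returns 1 if sock is a listening (passive) socket, 0 if not, -1 on error. */
int is_listening(int sock) {
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(sock, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) < 0)
        return -1;
    return val != 0;
}

/* Create a TCP socket on 127.0.0.1 with a kernel-chosen port and call listen.
 * Returns the listening descriptor, or -1 on error. */
int listen_on_loopback(int backlog) {
    struct sockaddr_in addr;
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(sock, backlog) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}
```

A socket straight out of socket() reports 0 from is_listening; the one returned by listen_on_loopback reports 1.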

3.1.4 accept() The phone rings

When the client's connection request arrives, the server responds successfully and the connection is established. At this time, the operating system kernel needs to notify the application of this event and let the application perceive the connection. This process is like a telecom operator completing the establishment of a telephone connection, and the answering party's phone rings to notify that someone has dialed a number. At this time, you need to pick up the phone and start answering.

int accept(int listensockfd, struct sockaddr *cliaddr, socklen_t *addrlen)
  • The first parameter listensockfd is the listening socket, that is, the socket obtained through the earlier socket, bind, and listen calls.
  • The function returns information in two parts:
    • The first part is the two output parameters
      • cliaddr receives the client's address through the pointer
      • addrlen receives the size of that address
      • These two can be understood as the caller ID we see when picking up the phone, which tells us the other party's number
    • The other part is the return value of the function, a brand new descriptor that represents the connection with the client

This function has two socket descriptors, the first is the listening socket descriptor listensockfd (which is used as an input parameter), and the second is the returned connected socket descriptor. You may ask, why separate the two sockets? Wouldn't it be nice to use one?

  • The situation here differs from a phone call. Once a call is connected, nobody else can call in; they only hear "The number you dialed is busy." An important feature of network programs, however, is "concurrent" processing. An application cannot serve just one client at a time; if it did, how many servers would it take to serve all the online shoppers in the country?
  • So the listening socket always exists, and it is to serve thousands of customers until the listening socket is closed;
  • Once a client and server are successfully connected and the TCP three-way handshake is completed, the operating system kernel generates a "connected socket" for the client, allowing the application server to use the connected socket to communicate with the client.
  • If the application server completes the service for this customer, such as placing an online shopping order and making a successful payment, then the "connected socket" is closed, thus completing the release of the TCP connection. Please note that only this client connection is released at this time, and other serviced client connections may still exist. Most importantly, the listening socket is always in the "listening" state, waiting for new client requests to arrive and be serviced.
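
The difference between the two descriptors can be seen in a single process on the loopback interface. The function below is a sketch, not the project's code, and its name is made up: it listens on a kernel-chosen port, connects a client socket to it, and accepts that connection.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Set up listener, client, and accepted connection on 127.0.0.1.
 * Returns 0 on success and fills in the three descriptors, -1 on error. */
int accept_one_on_loopback(int *listen_fd, int *client_fd, int *conn_fd) {
    struct sockaddr_in addr, peer;
    socklen_t len = sizeof(addr), peer_len = sizeof(peer);

    *conn_fd = -1;
    *listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    *client_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (*listen_fd < 0 || *client_fd < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(*listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(*listen_fd, 8) < 0 ||
        getsockname(*listen_fd, (struct sockaddr *)&addr, &len) < 0)
        goto fail;

    /* The kernel finishes the handshake and queues the connection, so
     * connect can return before accept has been called. */
    if (connect(*client_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;

    /* accept hands back a new descriptor; listen_fd keeps listening. */
    *conn_fd = accept(*listen_fd, (struct sockaddr *)&peer, &peer_len);
    if (*conn_fd >= 0)
        return 0;

fail:
    if (*listen_fd >= 0) close(*listen_fd);
    if (*client_fd >= 0) close(*client_fd);
    return -1;
}
```

After a successful call, data written to client_fd is read from conn_fd, while listen_fd never carries data and stays available for further accept calls.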

The complete code on the server side is as follows:

// https://github.com/datager/yolanda/blob/master/chap-5/tcp_server.c
#include "lib/common.h"

/* Read exactly size bytes from fd unless EOF or an error comes first */
ssize_t readn(int fd, void *buffer, size_t size) {
    char *buffer_pointer = buffer;
    size_t length = size;

    while (length > 0) {
        ssize_t result = read(fd, buffer_pointer, length);

        if (result < 0) {
            if (errno == EINTR)
                continue;     /* interrupted by a signal: call read again */
            else
                return (-1);
        } else if (result == 0)
            break;                /* EOF (End of File): the peer closed the socket */

        length -= result;
        buffer_pointer += result;
    }
    return (size - length);        /* number of bytes actually read */
}

void read_data(int sockfd) {
    ssize_t n;
    char buf[1024];

    int time = 0;
    for (;;) {
        fprintf(stdout, "block in read\n");
        if ((n = readn(sockfd, buf, 1024)) <= 0)   /* EOF or error: stop reading */
            return;

        time++;
        fprintf(stdout, "1K read for %d \n", time);
        usleep(1000);
    }
}

int main(int argc, char **argv) {
    int listenfd, connfd;
    socklen_t clilen;
    struct sockaddr_in cliaddr, servaddr;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0)
        error(1, errno, "socket failed");

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(12345);
    // bind to the wildcard address, port 12345
    if (bind(listenfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0)
        error(1, errno, "bind failed");

    if (listen(listenfd, 1024) < 0) // backlog of 1024
        error(1, errno, "listen failed");

    for (;;) { // handle client requests in a loop
        clilen = sizeof(cliaddr);
        connfd = accept(listenfd, (struct sockaddr *) &cliaddr, &clilen);
        if (connfd < 0)
            continue;        // accept failed: wait for the next client
        read_data(connfd);   // read the data
        close(connfd);       // close the connected socket, not the listening socket
    }
}

3.2 client side

The process of initiating a connection request on the client side is as follows:

  • The first step is still the same as the server, to create a socket, the method is the same as before.
  • The difference is that the client needs to call connect to initiate a request to the server.

3.2.1 connect() to make a call

int connect(int sockfd, const struct sockaddr *servaddr, socklen_t addrlen)
  • The first parameter of the function, sockfd, is the connection socket, which is created by the socket function described above.
  • The second and third parameters servaddr and addrlen respectively represent the pointer to the socket address structure and the size of the structure. The socket address structure must contain the server's IP address and port number.

The client does not have to call the bind function before calling the function connect, because if necessary, the kernel will determine the source IP address and select a temporary port as the source port according to a certain algorithm.

The client code is as follows:

// https://github.com/datager/yolanda/blob/master/chap-5/tcpclient.c
#include "lib/common.h"

#define MESSAGE_SIZE 102400

void send_data(int sockfd) {
    char *query;
    query = malloc(MESSAGE_SIZE + 1);
    if (query == NULL)
        error(1, errno, "malloc failed");
    for (int i = 0; i < MESSAGE_SIZE; i++) {
        query[i] = 'a';
    }
    query[MESSAGE_SIZE] = '\0';

    const char *cp;
    cp = query;
    size_t remaining = strlen(query);
    while (remaining) {
        ssize_t n_written = send(sockfd, cp, remaining, 0);
        fprintf(stdout, "send into buffer %zd \n", n_written);
        if (n_written <= 0) {
            error(1, errno, "send failed");   /* error() exits when its first argument is nonzero */
        }
        remaining -= n_written;
        cp += n_written;
    }

    free(query);
}

int main(int argc, char **argv) {
    int sockfd;
    struct sockaddr_in servaddr;

    if (argc != 2)
        error(1, 0, "usage: tcpclient <IPaddress>");

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        error(1, errno, "socket failed");

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(12345);
    // convert the IP string to the binary form used on the network; the p and n in the name stand for presentation and numeric
    if (inet_pton(AF_INET, argv[1], &servaddr.sin_addr) != 1)
        error(1, 0, "invalid IPv4 address: %s", argv[1]);
    int connect_rt = connect(sockfd, (struct sockaddr *) &servaddr, sizeof(servaddr));
    if (connect_rt < 0) {
        error(1, errno, "connect failed ");
    }
    send_data(sockfd);
    exit(0);
}

3.3 Three-way handshake

The essence of "three" is this: the channel is unreliable, yet the two parties must reach agreement on something. Whatever information the messages carry, three messages are the theoretical minimum for that. So the three-way handshake is not a quirk of TCP itself; it follows from the requirement of "transmitting information reliably over an unreliable channel".

If it is a TCP socket, calling the connect function will trigger the TCP three-way handshake process, and it will only return when the connection is successfully established or an error occurs. The error return may have the following situations:

  • The three-way handshake cannot complete: the client's SYN packet gets no response, so a timeout error (ETIMEDOUT) is returned. A common cause is a mistyped server IP.
  • The client receives an RST (reset) response and immediately returns a connection refused error (ECONNREFUSED). This usually means the port in the connection request is wrong. An RST is a TCP segment sent when something is wrong, and it is generated in three situations: a SYN arrives for a port that has no listening server (the case above); TCP wants to abort an existing connection; TCP receives a segment for a connection that does not exist.
  • The client's SYN packet triggers a "destination unreachable" error in the network. A common cause is that there is no working route between the client and the server.
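
In code, these three cases show up as different errno values after connect fails. The sketch below uses the usual POSIX names (ETIMEDOUT, ECONNREFUSED, EHOSTUNREACH, ENETUNREACH); the function names and the message strings are made up for this example.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Try a TCP connect to ip:port. Returns 0 on success, otherwise the errno
 * value left by the failing call (EINVAL for a malformed address). */
int try_connect(const char *ip, uint16_t port) {
    struct sockaddr_in addr;
    int err = 0;
    int fd;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return EINVAL;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;                   /* save before close can change it */
    close(fd);
    return err;
}

/* Map the result of try_connect to the three cases described above. */
const char *explain_connect_error(int err) {
    switch (err) {
    case 0:            return "connected";
    case ETIMEDOUT:    return "no reply to SYN (check the server IP)";
    case ECONNREFUSED: return "RST received (nothing listening on that port)";
    case EHOSTUNREACH:
    case ENETUNREACH:  return "destination unreachable (routing problem)";
    default:           return strerror(err);
    }
}
```

Connecting to a local port that is bound but not listening is an easy way to provoke the ECONNREFUSED case.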

The following describes the blocking programming model (that is, the call will not return directly after being initiated, and will return after being processed by the operating system kernel):

  • First, the server completes the preparation of the passive socket through socket, bind and listen. Passive means waiting for others to connect, and then calling accept, it will block here, waiting for the connection of the client to come;
  • After the client calls the socket and connect functions, it will also block.
  • The next thing is done by the operating system kernel, more specifically, the operating system kernel network protocol stack is working.

The following is the specific process:

  • The client's protocol stack sends a SYN packet to the server, telling it that the client's current sending sequence number is j, and the client enters the SYN_SENT state;
  • When the server's protocol stack receives this packet, it replies to the client with an ACK whose value is j+1, confirming SYN packet j. In the same segment the server sends its own SYN, telling the client that its current sending sequence number is k, and the server enters the SYN_RCVD state;
  • When the client's protocol stack receives the ACK, the application's blocking connect call returns, meaning the one-way connection from client to server is established, and the client's state becomes ESTABLISHED. The client's protocol stack also acknowledges the server's SYN packet, with the value k+1;
  • When that acknowledgment reaches the server, the server's protocol stack makes the blocking accept call return. The one-way connection from server to client is now established as well, and the server enters the ESTABLISHED state.

A more figurative analogy is this: A and B want to make a call:

  • A first said to B: "Hello, are you there? I am, and my password is j."
  • After B received it, he replied loudly: "I have received your password j and am ready. Are you ready? My password is k."
  • After receiving it, A also replied loudly: "I have received your password k and am ready, let's start."

It can be seen that such a response process has been carried out a total of three times, which is why the establishment of a TCP connection is called a "three-way handshake".
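
The exchange of j, j+1, k, and k+1 can be written down as a toy state machine. This is only a model of the bookkeeping, with invented type and function names and the usual TCP state names; a real protocol stack also handles retransmission, timeouts, and many more states.

```c
#include <stdint.h>

enum hs_state { HS_CLOSED, HS_LISTEN, HS_SYN_SENT, HS_SYN_RCVD, HS_ESTABLISHED };

struct hs_segment {
    int syn;          /* SYN flag */
    int ack;          /* ACK flag */
    uint32_t seq;     /* sender's sequence number */
    uint32_t ack_no;  /* next sequence number expected from the peer */
};

struct hs_endpoint {
    enum hs_state state;
    uint32_t isn;     /* this side's initial sequence number (j or k) */
};

/* Message 1: the client sends SYN(seq=j) and enters SYN_SENT. */
struct hs_segment hs_client_connect(struct hs_endpoint *c) {
    struct hs_segment out = { 1, 0, c->isn, 0 };
    c->state = HS_SYN_SENT;
    return out;
}

/* Message 2: the server answers SYN+ACK(seq=k, ack=j+1) and enters SYN_RCVD. */
struct hs_segment hs_server_on_syn(struct hs_endpoint *s, struct hs_segment in) {
    struct hs_segment out = { 1, 1, s->isn, in.seq + 1 };
    s->state = HS_SYN_RCVD;
    return out;
}

/* Message 3: the client checks ack=j+1, becomes ESTABLISHED, sends ACK(ack=k+1). */
struct hs_segment hs_client_on_syn_ack(struct hs_endpoint *c, struct hs_segment in) {
    struct hs_segment out = { 0, 1, c->isn + 1, in.seq + 1 };
    if (in.syn && in.ack && in.ack_no == c->isn + 1)
        c->state = HS_ESTABLISHED;
    return out;
}

/* The server checks ack=k+1 and becomes ESTABLISHED. */
void hs_server_on_ack(struct hs_endpoint *s, struct hs_segment in) {
    if (in.ack && in.ack_no == s->isn + 1)
        s->state = HS_ESTABLISHED;
}
```

Starting with j = 100 and k = 300, the three messages carry ack numbers 0, 101, and 301, and both endpoints finish in the ESTABLISHED state.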

3.4 read/write data

A socket descriptor is no different from a local file descriptor. Everything is a file in the UNIX world, which means socket descriptors can be passed to functions originally designed for local files. These include the write and read functions used for exchanging data:

3.4.1 write

Any of the following three functions can be used to send data:

ssize_t write(int socketfd, const void *buffer, size_t size)
ssize_t send(int socketfd, const void *buffer, size_t size, int flags)
ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags)
  • The first function is a common file writing function. If you replace socketfd with a file descriptor, it is an ordinary file writing function.
  • If you want to specify options and send out-of-band data, you need to use the second function with flag. The so-called out-of-band data is an urgent data based on the TCP protocol, which is used for client-server emergency processing in specific scenarios.
  • If you want to specify multiple buffers to transmit data, you need to use the third function to send data in the form of structure msghdr.

The write() function can write to both files and sockets, but with slightly different effects:

  • An ordinary file descriptor represents an open file. When we call write, the kernel writes the byte stream to the file system. The number of bytes written is normally equal to the size argument; anything else indicates an error.
  • A socket descriptor represents a two-way connection. The number of bytes written by calling write on a socket descriptor may be less than the number requested, which would be abnormal for an ordinary file descriptor.
  • The reason is that the kernel does a lot of work behind the scenes to receive and send data. The next section takes the write function as an example and focuses on the concept of the send buffer.
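Because a write on a socket may return fewer bytes than requested, applications often wrap it in a loop. The following is a minimal sketch of such a wrapper; the name writen mirrors the readn function shown later, but it is an assumption here rather than code from the project.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly "size" bytes to fd, retrying after short writes.
 * Returns size on success, or -1 on error. */
ssize_t writen(int fd, const void *vptr, size_t size) {
    const char *ptr = vptr;
    size_t nleft = size;

    while (nleft > 0) {
        ssize_t nwritten = write(fd, ptr, nleft);
        if (nwritten < 0) {
            if (errno == EINTR) /* interrupted by a signal: try again */
                continue;
            return -1;
        }
        nleft -= (size_t) nwritten; /* fewer bytes remain to be written */
        ptr += nwritten;            /* advance past the bytes already written */
    }
    return (ssize_t) size;
}
```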

3.4.1.1 Send buffer

When the TCP three-way handshake is successful and the TCP connection is successfully established, the operating system kernel will create supporting infrastructure for each connection, such as a sending buffer.

The size of the send buffer can be changed through socket options. When our application calls the write function, what actually happens is that the data is copied from the application into the kernel's send buffer; it does not necessarily mean that the data has been sent out through the socket.
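The socket option in question is SO_SNDBUF. A minimal sketch for reading and changing it follows; the two helper names are made up for illustration, and the kernel may round or clamp the value it is asked for.

```c
#include <sys/socket.h>

/* Return the current send-buffer size of a socket in bytes, or -1 on error. */
int get_send_buffer_size(int fd) {
    int size = 0;
    socklen_t len = sizeof(size);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &len) < 0)
        return -1;
    return size;
}

/* Ask the kernel for a send buffer of "size" bytes.
 * Returns 0 on success, -1 on error. */
int set_send_buffer_size(int fd, int size) {
    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size));
}
```

Reading the value back after setting it shows what the kernel actually granted, which is not necessarily the number that was requested.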

Here are a few scenarios:

  • The first case is simple: the kernel's send buffer is large enough to hold the data directly. The program returns from the write call, and the returned number of bytes written equals the size of the application's data.
  • In the second case, the send buffer does not have enough free space for the application's data: either the buffer is large enough in total but still holds data that has not been sent, or the buffer is simply smaller than the data. What result would you expect? An error? An immediate return?
    • The kernel neither returns nor reports an error. Instead the application blocks: it stays inside the write call and does not return. The term "suspended" expresses the same thing from the kernel's perspective.
    • When will it return? Each kernel handles this differently. Most UNIX systems wait until all of the application's data has been placed in the send buffer before returning from the system call. How should we understand this? Once the TCP connection is established, the kernel starts working. Think of the send buffer as a conveyor belt of parcels, with a busy worker who keeps taking parcels (data) off the belt. Following TCP/IP semantics, the worker packs the data into TCP segments no larger than the MSS and IP packets no larger than the MTU, and finally sends them out through the data link layer. This frees part of the send buffer, so more data can be moved from the application into it. The process repeats until all of the application's data fits in the send buffer, and at that point the blocking write call returns. Note that when it returns, not all of the application's data has been sent: some is still in the send buffer and will be sent over the network by the kernel later.

3.4.2 read

The read function asks the kernel to read at most size bytes from the socket descriptor socketfd and store them in buffer. The return value is the number of bytes actually read, with some special cases:

  • A return value of 0 means EOF (end-of-file): the peer has sent a FIN packet, and the disconnection needs to be handled.
  • A return value of -1 means an error.
  • With non-blocking I/O the situation is slightly different. The characteristics of non-blocking I/O are covered in later articles.
ssize_t read (int socketfd, void *buffer, size_t size)

Note that read reads at most size bytes. If we want the application to read exactly size bytes each time, we need a function like the following, which reads in a loop:

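// Read exactly "size" bytes from the descriptor fd, unless EOF or an error occurs first
#include <errno.h>
#include <unistd.h>

ssize_t readn(int fd, void *vptr, size_t size) {
    size_t  nleft;
    ssize_t nread;
    char    *ptr;

    ptr = vptr;
    nleft = size;

    while (nleft > 0) {
        // keep looping until size bytes have been read
        if ((nread = read(fd, ptr, nleft)) < 0) {
            if (errno == EINTR) // interrupted by a signal before any data was read: call read again
                nread = 0;
            else
                return -1;
        } else if (nread == 0) // the peer sent a FIN packet, which shows up as EOF; the socket should now be closed
            break;

        nleft -= nread; // fewer bytes remain to be read
        ptr += nread;   // advance the buffer pointer
    }
    return size - nleft; // number of bytes actually read (less than size only after EOF)
}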

3.4.3 Buffer experiment

We use a client-server example to explain the concept of read buffer and send buffer. In this example, the client sends data continuously, and the server sleeps after reading a piece of data to simulate the time required for actual business processing. See Sections 3.1 and 3.2 for code details (https://github.com/datager/yolanda/blob/master/chap-5/). The effect is as follows:

Experiment 1: Observe client data sending behavior

  • The client program sends a large byte stream (#define MESSAGE_SIZE 102400). The line fprintf(stdout, "send into buffer %ld \n", n_written); prints only after the whole byte stream has been handed over, which shows that send() blocks until then. In other words, on a blocking socket the number of bytes written that the call returns equals the number of bytes requested.
  • The server continuously prints the process of reading the byte stream on the screen

Experiment 2: Server processing slows down

  • If we slightly increase the server's sleep time, reduce the number of bytes sent by the client from 10240000 to 1024000, and run the example again, we find that the client prints its message quickly.
  • At the same time, the server is still printing its reading progress on the screen, which shows that the server's reading program is still working to read data from the buffer.

Conclusion:

  • Successful sending only means that the data has been copied into the sending buffer, and it does not mean that the connected peer has received all the data.
  • When the data reaches the peer's receive buffer, or further, when the peer application reads it into its own buffer, is something we cannot observe.
  • The buffer cannot simply be enlarged without limit, because write() only copies data into the kernel buffer and the kernel decides when to send it.
    • When the kernel buffer is always full of data, messages run together ("sticky packets").
    • The network's MTU also limits the size of each transmission.
    • Data piling up in the buffer consumes a lot of memory, so resources are used inefficiently.
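Since TCP delivers a byte stream with no message boundaries, a common way to keep messages from running together is to prefix each one with its length. The sketch below is one possible framing scheme, not something taken from the project: the 4-byte length header and all function names are assumptions chosen for illustration.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly n bytes; returns 0 on success, -1 on error. */
static int write_all(int fd, const char *p, size_t n) {
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += w;
        n -= (size_t) w;
    }
    return 0;
}

/* Read exactly n bytes; returns 0 on success, -1 on error or early EOF. */
static int read_all(int fd, char *p, size_t n) {
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (r == 0) return -1; /* peer closed before the message was complete */
        p += r;
        n -= (size_t) r;
    }
    return 0;
}

/* Send one message: a 4-byte length in network byte order, then the payload. */
int send_msg(int fd, const char *data, uint32_t len) {
    uint32_t net_len = htonl(len);
    if (write_all(fd, (const char *) &net_len, sizeof(net_len)) < 0) return -1;
    return write_all(fd, data, len);
}

/* Receive one message into buf (capacity cap).
 * Returns the payload length, or -1 on error or if the message does not fit. */
int recv_msg(int fd, char *buf, uint32_t cap) {
    uint32_t net_len;
    if (read_all(fd, (char *) &net_len, sizeof(net_len)) < 0) return -1;
    uint32_t len = ntohl(net_len);
    if (len > cap) return -1;
    if (read_all(fd, buf, len) < 0) return -1;
    return (int) len;
}
```

With this scheme, two messages written back to back are still read as two separate messages, regardless of how the kernel splits or merges them on the wire.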

4. UDP programming

TCP is a connection-oriented "stream" protocol, and UDP is a "datagram" protocol.

  • TCP is similar to making a phone call: dial the number, wait for the call to connect, and start talking, which corresponds to TCP's three-way handshake followed by message transmission. Once the connection is established, both parties know who they are talking to. We say that this kind of dialogue has a context.
  • UDP is similar to sending a postcard: the sender writes the recipient's address and zip code on the postcard, drops it into the mailbox, and can then forget about it.
    • The sender can mail a second, third, or even fourth postcard to the same recipient, but the postcards are unrelated and their order of arrival is not guaranteed. The fourth postcard may arrive first, and because there is no serial number, the recipient cannot tell that it was the fourth one sent.
    • Also, if the recipient never receives a postcard, nothing causes it to be sent again.

The specific differences are as follows:

  • TCP is a connection-oriented protocol. On the basis of IP packets, TCP adds capabilities such as retransmission, confirmation, orderly transmission, and congestion control. The two parties to the communication work in a definite context.
  • UDP has no such context. It is an unreliable protocol: there is no retransmission or acknowledgment, no ordering, and no congestion control. In other words, UDP adds very little on top of IP packets. It guarantees neither delivery nor order, so an application that uses UDP must handle packet loss, retransmission, and reassembly itself.
  • TCP always sends and receives within a context, in a process like this:
    • A is connected: receive→send→receive→send→…
    • B is connected: receive→send→receive→send→…
  • Each UDP receive and send is an independent context, like this:
    • Receive A→Send A→Receive B→Send B→Receive C→Send C→…

Because UDP is relatively simple, there are still many suitable scenarios:

  • DNS service, SNMP service
  • Multiplayer communication scenarios, such as chat rooms, multiplayer games, etc.

The procedure of the UDP program is as follows:

insert image description here

recvfrom() is defined as follows:

  • sockfd, buff and nbytes are the first three parameters. sockfd is a socket descriptor created locally, buff points to the local buffer, and nbytes is the maximum number of bytes to receive.
  • The fourth parameter flags is related to I/O. We do not use it here and set it to 0.
  • The last two parameters, from and addrlen, return the address and port of the sender. This is very different from TCP, where the peer is determined through the descriptor returned by accept(). With UDP, the peer's information is obtained anew each time a message is received, which means there is no context between one message and the next.
  • The return value of the function: the actual number of bytes received.
#include <sys/socket.h>
ssize_t recvfrom(int sockfd, void *buff, size_t nbytes, int flags, 
          struct sockaddr *from, socklen_t *addrlen); 

The sendto() function is defined as follows:

  • The first three parameters are sockfd, buff and nbytes. sockfd is a socket descriptor created locally, buff points to the data to send, and nbytes is the number of bytes to send.
  • The fourth parameter flags is still set to 0.
  • The latter two parameters to and addrlen indicate the information such as the address and port of the peer to send.
  • The return value of the function: the actual number of bytes sent.
  • The maximum length of data that can be sent is 65535 - IP header (20) - UDP header (8) = 65507 bytes. If sendto is called with more data than this, the function returns an error.
    • In addition, the link layer limits packets to its MTU (1500 bytes in this calculation), so to fit in a single packet:
      • A UDP payload should be at most 1500 - IP header (20) - UDP header (8) = 1472 bytes
      • A TCP payload should be at most 1500 - IP header (20) - TCP header (20) = 1460 bytes
#include <sys/socket.h>
ssize_t sendto(int sockfd, const void *buff, size_t nbytes, int flags,
                const struct sockaddr *to, socklen_t addrlen); 
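The size arithmetic above can be written down as two small helpers. This is only a worked example of the subtraction; the names are made up, and the header sizes assume IPv4 and TCP headers without options.

```c
/* Header sizes in bytes, assuming no IP or TCP options. */
enum {
    IPV4_HEADER_BYTES = 20,
    UDP_HEADER_BYTES  = 8,
    TCP_HEADER_BYTES  = 20
};

/* Largest UDP payload that fits in an IP packet of ip_packet_bytes bytes. */
int udp_max_payload(int ip_packet_bytes) {
    return ip_packet_bytes - IPV4_HEADER_BYTES - UDP_HEADER_BYTES;
}

/* Largest TCP payload that fits in an IP packet of ip_packet_bytes bytes. */
int tcp_max_payload(int ip_packet_bytes) {
    return ip_packet_bytes - IPV4_HEADER_BYTES - TCP_HEADER_BYTES;
}
```

Calling udp_max_payload(65535) gives the absolute limit of 65507 bytes, while udp_max_payload(1500) and tcp_max_payload(1500) give the 1472 and 1460 bytes that fit in a 1500-byte MTU.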

4.1 server side

// https://github.com/datager/yolanda/blob/master/chap-6/udpserver.c
#include "lib/common.h"
 
static int count;
static void recvfrom_int(int signo) {
    printf("\nreceived %d datagrams\n", count);
    exit(0);
}
 
int main(int argc, char **argv) {
    int socket_fd;
    socket_fd = socket(AF_INET, SOCK_DGRAM, 0); // create a UDP (SOCK_DGRAM) socket
 
    struct sockaddr_in server_addr;
    bzero(&server_addr, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    server_addr.sin_port = htons(SERV_PORT);
	
    bind(socket_fd, (struct sockaddr *) &server_addr, sizeof(server_addr)); // bind to the local port
 
    socklen_t client_len;
    char message[MAXLINE];
    count = 0;
 
    signal(SIGINT, recvfrom_int); // install a signal handler that prints the total number of datagrams received when the program exits on Ctrl+C
 
    struct sockaddr_in client_addr;
    client_len = sizeof(client_addr);
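    for (;;) {
        client_len = sizeof(client_addr); // value-result argument: reset it before every call
        int n = recvfrom(socket_fd, message, MAXLINE - 1, 0, (struct sockaddr *) &client_addr, &client_len); // receive one datagram, leaving room for the terminating 0
        if (n < 0) {
            error(1, errno, "recvfrom failed");
        }
        message[n] = 0;
        printf("received %d bytes: %s\n", n, message);
 
        char send_line[MAXLINE];
        snprintf(send_line, sizeof(send_line), "Hi, %s", message); // snprintf cannot overflow send_line
        sendto(socket_fd, send_line, strlen(send_line), 0, (struct sockaddr *) &client_addr, client_len); // send the reply back to the sender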
 
        count++;
    }
}

4.2 client side

In this example, the client reads a string from stdin, sends it to the server, and prints the server's reply to stdout.

// https://github.com/datager/yolanda/blob/master/chap-6/udpclient.c
#include "lib/common.h"
#define    MAXLINE     4096

int main(int argc, char **argv) {
    if (argc != 2) {
        error(1, 0, "usage: udpclient <IPaddress>");
    }
    
    int socket_fd;
    socket_fd = socket(AF_INET, SOCK_DGRAM, 0); // create a UDP (SOCK_DGRAM) socket
 
    // initialize the destination address (server_addr.sin_addr) and port (server_addr.sin_port)
    struct sockaddr_in server_addr;
    bzero(&server_addr, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_port = htons(SERV_PORT);
    inet_pton(AF_INET, argv[1], &server_addr.sin_addr);
 
    socklen_t server_len = sizeof(server_addr);
 
    struct sockaddr *reply_addr; // receives the address of whoever sent the reply
    reply_addr = malloc(server_len);
 
    char send_line[MAXLINE], recv_line[MAXLINE + 1];
    socklen_t len;
    int n;
 
    while (fgets(send_line, MAXLINE, stdin) != NULL) {
        // strip the trailing newline from the line read from stdin
        int i = strlen(send_line);
        if (i > 0 && send_line[i - 1] == '\n') {
            send_line[i - 1] = 0;
        }
 
        printf("now sending %s\n", send_line);
        ssize_t rt = sendto(socket_fd, send_line, strlen(send_line), 0, (struct sockaddr *) &server_addr, server_len); // send
        if (rt < 0) { // rt must be signed, otherwise this check can never be true
            error(1, errno, "send failed ");
        }
        printf("send bytes: %zd \n", rt);
 
        len = server_len; // value-result argument: pass in the size of the reply_addr buffer
        n = recvfrom(socket_fd, recv_line, MAXLINE, 0, reply_addr, &len); // receive
        if (n < 0)
            error(1, errno, "recvfrom failed");
        recv_line[n] = 0;
        fputs(recv_line, stdout); // print to stdout
        fputs("\n", stdout);
    }
 
    exit(0);
}

4.3 Experiment

To better understand the difference between UDP and TCP, let's simulate three UDP scenarios. For each one, think about how the result differs from TCP.


Scenario 1: Only run the client

  • If we run only the client and type "1" on stdin, the call to sendto() succeeds, and the program then blocks in recvfrom().
  • Remember the TCP program? If the server is not running, the TCP client's connect function returns a "Connection refused" error right away (the error is produced by the TCP protocol stack in the peer's kernel, not by the server program, which has not been started). The UDP program simply stays blocked.
    • Blocking forever like this is rarely what we want. We can add a timeout, or go further and implement our own request-acknowledgment scheme, which is similar to what TCP does. HTTP/3 takes this approach.
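One way to add such a timeout is the SO_RCVTIMEO socket option. The following is a minimal sketch under that assumption; the helper names and the -2 return convention are invented for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Make blocking receive calls on fd give up after timeout_ms milliseconds.
 * Returns 0 on success, -1 on error. */
int set_recv_timeout(int fd, long timeout_ms) {
    struct timeval tv;
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Receive one datagram. Returns its length, -1 on error,
 * or -2 if the timeout expired before anything arrived. */
ssize_t recv_or_timeout(int fd, void *buf, size_t cap) {
    ssize_t n = recvfrom(fd, buf, cap, 0, NULL, NULL);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

With the timeout set, the client in Scenario 1 would get control back after the chosen interval and could decide whether to resend or give up.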

Scenario 2: Start the server first, and then start the client

  • In this scenario, we first start the server so that it listens on the port, and then start the client.
  • We type g1 and then g2 on the client. The server prints the received characters on the screen, and the client receives the server's responses: "Hi, g1" and "Hi, g2".

Scenario 3: Start the server and start two clients again

  • In this experiment, after the server is started, two clients are started one by one and send messages. We see that the messages sent by the two clients are received by the server in turn, and the client can also receive the messages processed by the server.

If we kill the server process at this time, we can see that the signal function prints out the number of packets received by the server before the process exits.
insert image description here

After that, we restart the server process and keep sending new messages from client1 and client2. The result is very different from TCP: the restarted server continues to receive the clients' messages. With TCP this is impossible, because after a disconnection the client must reconnect before it can send anything. Because UDP is "connectionless", the clients can keep sending packets after the UDP server restarts, which is the best illustration of UDP's lack of context. The experimental process is as follows:
insert image description here

4.4 connect() for UDP

4.4.1 connect() on the client

Let's start with an example of a client. In this example, the client calls connect() on a UDP socket, then sends a string from standard input to the server, and receives processed messages from the server. Of course, sending and receiving messages with the server is done by calling the functions sendto() and recvfrom().

The connect() call in the code below "binds" the UDP socket to the server's IPv4 address. The name connect() is a bit misleading here; setpeername() might have been a better choice.

// The example code is this snippet
#include "lib/common.h"
#define    MAXLINE     4096
 
int main(int argc, char **argv) {
    if (argc != 2) {
        error(1, 0, "usage: udpclient1 <IPaddress>");
    }
 
    int socket_fd;
    socket_fd = socket(AF_INET, SOCK_DGRAM, 0);
 
    struct sockaddr_in server_addr;
    bzero(&server_addr, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_port = htons(SERV_PORT);
    inet_pton(AF_INET, argv[1], &server_addr.sin_addr);
 
    socklen_t server_len = sizeof(server_addr);
 
    if (connect(socket_fd, (struct sockaddr *) &server_addr, server_len)) {
        // connect() "binds" the UDP socket to the server's IPv4 address; the name is a bit misleading, and setpeername() might have been a better choice
        error(1, errno, "connect failed");
    }
 
    struct sockaddr *reply_addr;
    reply_addr = malloc(server_len);
 
    char send_line[MAXLINE], recv_line[MAXLINE + 1];
    socklen_t len;
    int n;
 
    while (fgets(send_line, MAXLINE, stdin) != NULL) {
        int i = strlen(send_line);
        if (i > 0 && send_line[i - 1] == '\n') {
            send_line[i - 1] = 0;
        }
 
        printf("now sending %s\n", send_line);
        ssize_t rt = sendto(socket_fd, send_line, strlen(send_line), 0, (struct sockaddr *) &server_addr, server_len);
        if (rt < 0) { // rt must be signed, otherwise this check can never be true
            error(1, errno, "sendto failed");
        }
        printf("send bytes: %zd \n", rt);
        
        len = server_len; // value-result argument: pass in the size of the reply_addr buffer
        recv_line[0] = 0;
        n = recvfrom(socket_fd, recv_line, MAXLINE, 0, reply_addr, &len);
        if (n < 0)
            error(1, errno, "recvfrom failed");
        recv_line[n] = 0;
        fputs(recv_line, stdout);
        fputs("\n", stdout);
    }
 
    exit(0);
}

When the server is not enabled, the effect of running this client is as follows:
insert image description here

Unlike the TCP connect call that causes the TCP three-way handshake to establish a valid TCP connection, the call of the UDP connect function does not cause network interaction with the server target, that is, it does not trigger the so-called "handshake" message sending and response.

So what is the point of calling connect() on a UDP socket? The example above has already given the answer: mainly, it allows the application to receive "asynchronous error" information.

  • Recall the client that does not call connect() (section 4.2 of this article): when the server is not running, the client reports no error and simply blocks on recvfrom(), waiting for a reply (or a timeout).
  • Here (section 4.4 of this article), calling connect() on the UDP socket establishes a "context" for it by associating the socket with the server's address and port. This association gives the kernel the information it needs to match what it receives with the corresponding socket.
    • When we call sendto() or send(), the call returns as soon as the kernel takes over the message. The kernel then tries to send it to the target address and port. Because that address and port are unreachable, an ICMP message comes back to the kernel, and the ICMP message contains information such as the destination address and port.
    • Without connect(), there is no mapping between the UDP socket and the destination address and port, so the kernel cannot associate the ICMP unreachable message with the socket and cannot pass the error on to the application.
    • With connect(), the kernel has that mapping. When an ICMP unreachable message arrives, the kernel can look up which UDP socket owns the destination address and port (a socket is unique within the operating system), and the next call to recvfrom() or recv() returns the "Connection refused" error.
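The "mapping" that connect() records can be observed with getpeername(). The sketch below, with hypothetical helper names, connects a UDP socket and reads the recorded peer port back; no packet is sent by either function.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket and connect() it to ip:port.
 * Returns the descriptor, or -1 on error. Nothing is sent on the network. */
int udp_connect(const char *ip, unsigned short port) {
    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *) &peer, sizeof(peer)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return the peer port recorded by connect(), or -1 if the socket has no peer. */
int udp_peer_port(int fd) {
    struct sockaddr_in peer;
    socklen_t len = sizeof(peer);
    if (getpeername(fd, (struct sockaddr *) &peer, &len) < 0)
        return -1;
    return ntohs(peer.sin_port);
}
```

On a UDP socket that was never connected, getpeername() fails, which is exactly the situation in which the kernel has no socket to hand the ICMP error to.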

After calling connect() on a UDP socket, send() and recv() are the recommended functions for sending and receiving:

  • To send, use send() or write(). If you use sendto(), the to address information must be set to zero.
  • To receive, use recv() or read(). If you use recvfrom(), the from address information must be set to zero as well.
  • However, different UNIX implementations behave differently in this regard.
    • In my Linux 4.4.0 environment, the system silently ignores the to and from information passed to sendto() and recvfrom().
    • On my macOS 10.13, the rule really has to be followed: sendto() or recvfrom() gives strange results, and things return to normal after switching to send() and recv().
    • Conclusion: for compatibility, follow the convention and use send() and recv(). The next program does so.
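To see the recommended calls in action without starting two programs, here is a single-process sketch on the loopback interface. The function name and buffer sizes are assumptions for this illustration only, and reply must have room for at least one byte.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* A connected UDP client talks to an unconnected UDP server socket
 * inside one process. Returns the reply length, or -1 on error. */
ssize_t udp_connected_roundtrip(const char *msg, char *reply, size_t cap) {
    ssize_t result = -1;
    int server = socket(AF_INET, SOCK_DGRAM, 0);
    int client = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr, from;
    socklen_t addr_len = sizeof(addr), from_len = sizeof(from);
    char in[512], hi[520];
    ssize_t n;

    if (server < 0 || client < 0)
        goto done;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */
    if (bind(server, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        getsockname(server, (struct sockaddr *) &addr, &addr_len) < 0)
        goto done;

    /* Client: record the peer once, then use send()/recv() without addresses. */
    if (connect(client, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        send(client, msg, strlen(msg), 0) < 0)
        goto done;

    /* Server: not connected, so it still uses recvfrom()/sendto(). */
    n = recvfrom(server, in, sizeof(in) - 1, 0, (struct sockaddr *) &from, &from_len);
    if (n < 0)
        goto done;
    in[n] = 0;
    snprintf(hi, sizeof(hi), "Hi, %s", in);
    if (sendto(server, hi, strlen(hi), 0, (struct sockaddr *) &from, from_len) < 0)
        goto done;

    n = recv(client, reply, cap - 1, 0);
    if (n < 0)
        goto done;
    reply[n] = 0;
    result = n;
done:
    if (server >= 0) close(server);
    if (client >= 0) close(client);
    return result;
}
```

The client side never passes an address after connect(), which is the portable pattern recommended above.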

4.4.2 connect() on the server

Generally speaking, a server does not call connect(), because once it does, it can only respond to one client. Occasionally, though, that is exactly what is wanted: once a client has sent a UDP packet to the server, the server must serve only that client.

The server is as follows. After receiving the first datagram, it calls connect() to bind the UDP socket to client_addr:

// The example code is this snippet
#include "lib/common.h"
 
static int count;
 
static void recvfrom_int(int signo) {
    printf("\nreceived %d datagrams\n", count);
    exit(0);
}
 
int main(int argc, char **argv) {
    int socket_fd;
    socket_fd = socket(AF_INET, SOCK_DGRAM, 0);
 
    struct sockaddr_in server_addr;
    bzero(&server_addr, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    server_addr.sin_port = htons(SERV_PORT);
 
    bind(socket_fd, (struct sockaddr *) &server_addr, sizeof(server_addr));
 
    socklen_t client_len;
    char message[MAXLINE];
    message[0] = 0;
    count = 0;
 
    signal(SIGINT, recvfrom_int);
 
    struct sockaddr_in client_addr;
    client_len = sizeof(client_addr);
 
    int n = recvfrom(socket_fd, message, MAXLINE - 1, 0, (struct sockaddr *) &client_addr, &client_len); // leave room for the terminating 0
    if (n < 0) {
        error(1, errno, "recvfrom failed");
    }
    message[n] = 0;
    printf("received %d bytes: %s\n", n, message);
 
    if (connect(socket_fd, (struct sockaddr *) &client_addr, client_len)) {
        // connect() binds the UDP socket to client_addr
        error(1, errno, "connect failed");
    }
 
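    while (strncmp(message, "goodbye", 7) != 0) {
        char send_line[MAXLINE];
        snprintf(send_line, sizeof(send_line), "Hi, %s", message); // snprintf cannot overflow send_line
 
        ssize_t rt = send(socket_fd, send_line, strlen(send_line), 0);
        if (rt < 0) { // rt must be signed, otherwise this check can never be true
            error(1, errno, "send failed ");
        }
        printf("send bytes: %zd \n", rt);
 
        ssize_t rc = recv(socket_fd, message, MAXLINE - 1, 0);
        if (rc < 0) {
            error(1, errno, "recv failed");
        }
        message[rc] = 0; // terminate the string before it is used with strncmp() and "%s"
        
        count++;
    }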
 
    exit(0);
}

Next, we start the server first, and then start two clients in turn (clients are as in Section 4.4.1 of this article), namely client1 and client2, and let client1 send UDP packets first.

We see that client1 sends a message first, and the server then "binds" to client1 through connect(). As a result, client2 gets an ICMP error from the operating system kernel: the error is returned by its receive call and shows up as a "Connection refused" message.
insert image description here

Calling connect() on a UDP socket to bind it to a peer address and port lets the application receive asynchronous error notifications quickly, and it also brings a certain performance improvement.

  • Without connect(), every message goes through this process: connect socket→send message→disconnect socket→connect socket→send message→disconnect socket→…
  • With connect(), the process becomes: connect socket→send message→send message→…→finally disconnect the socket
  • Connecting a socket has a certain overhead, such as looking up routing table information. A UDP client can therefore gain some performance by calling connect().

5. Local socket programming

Local sockets are a form of IPC, that is, communication between processes on the same machine. Other techniques, such as pipes and shared message queues, are also common IPC mechanisms, but local sockets are convenient to develop with and widely accepted, so they suit most inter-process communication scenarios on a single host.

"Local socket" is also known as "UNIX domain socket".

  • TCP/UDP: even when communicating locally through 127.0.0.1, the data still goes through the network protocol stack.
  • Local socket: a mechanism for communication between processes on a single machine. It avoids the complexity of the protocol stack and is much more efficient than TCP/UDP. Similar mechanisms include UNIX pipes, shared memory, and RPC calls.

Local sockets are still inter-process communication at heart. They borrow the programming semantics of sockets, such as stream and datagram, but the underlying layer does not use the IP protocol.

5.1 Local byte stream socket

The biggest difference between a local byte-stream socket and TCP server/client programming is the socket type. To identify the server, a local byte-stream socket uses a local file path instead of an IP address and port.
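Because the server is identified by a path stored in the fixed-size sun_path field, it is worth checking the path length before copying it. A minimal sketch, with a helper name invented for this example:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill addr with a local socket address for path.
 * Returns 0 on success, or -1 if path does not fit in sun_path. */
int make_local_addr(const char *path, struct sockaddr_un *addr) {
    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;   /* AF_UNIX and AF_LOCAL are equivalent */
    strcpy(addr->sun_path, path); /* safe: the length was checked above */
    return 0;
}
```

The resulting structure can be passed to bind() on the server or connect() on the client, cast to struct sockaddr * as in the listings below.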

The server side is as follows: After opening the local socket, the server receives the byte stream sent by the client, and returns a new byte stream to the client.

// https://github.com/datager/yolanda/blob/master/chap-7/unixstreamserver.c
#include  "lib/common.h"
int main(int argc, char **argv) {
    if (argc != 2) {
        error(1, 0, "usage: unixstreamserver <local_path>");
    }
 
    int 		listenfd, connfd;
    socklen_t 	clilen;
    struct 		sockaddr_un cliaddr, servaddr;
 
    listenfd = socket(AF_LOCAL, SOCK_STREAM, 0); // TCP uses AF_INET with a byte-stream type; UDP uses AF_INET with a datagram type; a local socket uses AF_UNIX (equivalent to AF_LOCAL)
    if (listenfd < 0) {
        error(1, errno, "socket create failed");
    }
 
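    // create a local address of type sockaddr_un, the counterpart of an IPv4 or IPv6 address
    char *local_path = argv[1]; // must be an absolute path so the program can be started and managed from any directory; it is a file (not a directory), and the user needs chown/chmod permission on it
    unlink(local_path); // delete the file if it already exists, to keep startup idempotent
    bzero(&servaddr, sizeof(servaddr));
    servaddr.sun_family = AF_LOCAL;
    strcpy(servaddr.sun_path, local_path); // set sun_path to the local file path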
 
    if (bind(listenfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0) {
        // bind (if the file does not exist, bind creates it)
        error(1, errno, "bind failed");
    }

    if (listen(listenfd, LISTENQ) < 0) {
        // listen
        error(1, errno, "listen failed");
    }
 
    clilen = sizeof(cliaddr);
    if ((connfd = accept(listenfd, (struct sockaddr *) &cliaddr, &clilen)) < 0) {
        error(1, errno, "accept failed"); // there is no accept loop here, so even EINTR is treated as fatal
    }
 
    char buf[BUFFER_SIZE];

    while (1) {
        bzero(buf, sizeof(buf));
        if (read(connfd, buf, BUFFER_SIZE) == 0) {
            printf("client quit");
            break;
        }
        printf("Receive: %s", buf);
 
        char send_line[MAXLINE];
        sprintf(send_line, "Hi, %s", buf);
        int nbytes = sizeof(send_line);
        if (write(connfd, send_line, nbytes) != nbytes)
            error(1, errno, "write error");
    }
 
    close(listenfd);
    close(connfd);
 
    exit(0);
}

The client side is as follows:

// https://github.com/datager/yolanda/blob/master/chap-7/unixstreamclient.c
#include "lib/common.h"
 
int main(int argc, char **argv) {
    if (argc != 2) {
        error(1, 0, "usage: unixstreamclient <local_path>");
    }
 
    int sockfd;
    struct sockaddr_un servaddr;
 
    sockfd = socket(AF_LOCAL, SOCK_STREAM, 0);
    if (sockfd < 0) {
        error(1, errno, "create socket failed");
    }
 
    bzero(&servaddr, sizeof(servaddr));
    servaddr.sun_family = AF_LOCAL;
    strcpy(servaddr.sun_path, argv[1]); // a local socket is addressed by a file path (not an IP address and port)

    if (connect(sockfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0) {
        // a local socket does not perform a three-way handshake internally
        error(1, errno, "connect failed");
    }
 
    char send_line[MAXLINE];
    bzero(send_line, MAXLINE);
    char recv_line[MAXLINE];
 
    while (fgets(send_line, MAXLINE, stdin) != NULL) {
        int nbytes = sizeof(send_line);
        if (write(sockfd, send_line, nbytes) != nbytes)
            error(1, errno, "write error");
 
        if (read(sockfd, recv_line, MAXLINE) == 0)
            error(1, errno, "server terminated prematurely");
 
        fputs(recv_line, stdout);
    }
 
    exit(0);
}

Next, we will run this program to deepen our understanding of this.

5.1.1 Only start the client

In the first scenario, we only start the client program: we see that because the server is not started, there is no local socket listening on the file /tmp/unixstream.sock, and the client directly reports an error, prompting us that there is no file.
insert image description here

5.1.2 The server listens on the file path without permission

Under Linux, every running application has an owner. Here, the owner of the server program has no permission on the /var/lib/ directory. When we try to start the server program, it reports the following error:

$ ./unixstreamserver /var/lib/unixstream.sock
bind failed: Permission denied (13)

This result tells us that the user who starts the server program must have permission to create a file at the local listening path, that is, write permission on the directory that contains it.
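
Because bind() has to create a new entry in the directory, it needs write and search permission there. A minimal sketch of an up-front check follows; can_create_socket_in is a hypothetical helper name. The check is only advisory, since permissions can change between the check and the call, so the result of bind() itself remains authoritative.

```c
#include <unistd.h>

/* Hypothetical helper: returns 1 if the calling user may create entries
 * in `dir` (write + search permission), 0 otherwise. Advisory only:
 * the return value of bind() is what finally counts. */
int can_create_socket_in(const char *dir) {
    return access(dir, W_OK | X_OK) == 0;
}
```

A server could call this on the directory of its socket path and print a clearer message before attempting bind().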

If we start the program as the root user instead, the server program runs normally.

In another shell, we can see that a local file has been created under /var/lib. Its size is 0, and the listing marks it with a trailing = sign, which denotes a socket. This is the file that bind() creates automatically.

If you check the UNIX domain sockets with the netstat command, you will find the unixstreamserver process listening on the file path /var/lib/unixstream.sock. As expected, our program behaves just like the local sockets used by the well-known Kubernetes on the same machine: the principle and behavior are exactly the same.

5.1.3 server-client response

Now start both the server and the client normally, and let the client send several lines in sequence.

We can see that the server receives the bytes sent by the client one after another, and the client receives each response from the server. Finally, when we press Ctrl+C to make the client program exit, the server also exits normally.

5.2 Local datagram socket

The server side is shown below. A "local datagram socket" differs from the previous "local byte stream socket" in the following points:

  • The socket() call creates a local socket with address family AF_LOCAL and socket type SOCK_DGRAM.
  • After bind() to the local address, the server does not call listen() or accept(). Recall that this is the same as UDP.
  • The main loop uses recvfrom() and sendto() to receive and send datagrams instead of read() and write(), which is also consistent with a UDP network program.
// https://github.com/datager/yolanda/blob/master/chap-7/unixdataserver.c
#include "lib/common.h"
 
int main(int argc, char **argv) {
    if (argc != 2) {
        error(1, 0, "usage: unixdataserver <local_path>");
    }
 
    int socket_fd;
    socket_fd = socket(AF_LOCAL, SOCK_DGRAM, 0); // AF_LOCAL, SOCK_DGRAM
    if (socket_fd < 0) {
        error(1, errno, "socket create failed");
    }
 
    struct sockaddr_un servaddr;
    char *local_path = argv[1];
    unlink(local_path);
    bzero(&servaddr, sizeof(servaddr));
    servaddr.sun_family = AF_LOCAL;
    strcpy(servaddr.sun_path, local_path);
 
    if (bind(socket_fd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0) {
        error(1, errno, "bind failed");
    }
 
    char buf[BUFFER_SIZE];
    struct sockaddr_un client_addr;
    socklen_t client_len = sizeof(client_addr);
    while (1) {
        bzero(buf, sizeof(buf));
        client_len = sizeof(client_addr); // value-result argument: reset before every recvfrom()
        if (recvfrom(socket_fd, buf, BUFFER_SIZE, 0, (struct sockaddr *) &client_addr, &client_len) == 0) {
            printf("client quit\n");
            break;
        }
        printf("Receive: %s \n", buf);
 
        char send_line[MAXLINE];
        bzero(send_line, MAXLINE);
        sprintf(send_line, "Hi, %s", buf);
 
        size_t nbytes = strlen(send_line);
        printf("now sending: %s \n", send_line);
 
        // reply to the address that recvfrom() filled in
        if (sendto(socket_fd, send_line, nbytes, 0, (struct sockaddr *) &client_addr, client_len) != nbytes)
            error(1, errno, "sendto error");
    }
 
    close(socket_fd);
 
    exit(0);
}
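
One practical consequence of choosing SOCK_DGRAM over SOCK_STREAM, as with UDP versus TCP, is that each datagram keeps its own boundary. The following is a minimal sketch, assuming a hypothetical function name; it uses socketpair() instead of a bound path to keep the demonstration short.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: send "g1" and "g2" back to back over a connected
 * AF_UNIX socket pair of the given type, then do ONE receive with a
 * large buffer. Returns the byte count of that receive, or -1 on error. */
ssize_t first_recv_size(int type) {
    int sv[2];
    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;

    ssize_t n = -1;
    if (send(sv[0], "g1", 2, 0) == 2 && send(sv[0], "g2", 2, 0) == 2) {
        char buf[64];
        n = recv(sv[1], buf, sizeof(buf), 0);
    }
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With SOCK_DGRAM the single receive returns exactly one message (2 bytes). With SOCK_STREAM the two writes form one byte stream, so the same receive may return both messages at once.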

The client side is shown below. This program is basically the same as the UDP network programming example: we can regard it as a UDP program that replaces the IP address and port with a local file path. There is one big difference, however: the client bind()s its own socket to a local path before sending, which a UDP client program does not need to do:

  • A local datagram socket must be bound to a local path so that the server has an address to send the reply to;
  • A UDP client does not need this, because the kernel assigns it a local address and port automatically on the first send, and the server replies to the source address and port carried in the datagram.
// https://github.com/datager/yolanda/blob/master/chap-7/unixdataclient.c
#include "lib/common.h"
 
int main(int argc, char **argv) {
    if (argc != 2) {
        error(1, 0, "usage: unixdataclient <local_path>");
    }
 
    int sockfd;
    struct sockaddr_un client_addr, server_addr;
 
    sockfd = socket(AF_LOCAL, SOCK_DGRAM, 0);
    if (sockfd < 0) {
        error(1, errno, "create socket failed");
    }
 
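    bzero(&client_addr, sizeof(client_addr)); // bind an address of our own so the server can reply
    client_addr.sun_family = AF_LOCAL;
    // tmpnam() generates a unique path name; it is race-prone, so real code should prefer a path in a private directory
    strcpy(client_addr.sun_path, tmpnam(NULL));
 
    if (bind(sockfd, (struct sockaddr *) &client_addr, sizeof(client_addr)) < 0) {
        error(1, errno, "bind failed");
    }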
 
    bzero(&server_addr, sizeof(server_addr));
    server_addr.sun_family = AF_LOCAL;
    strcpy(server_addr.sun_path, argv[1]);
 
    char send_line[MAXLINE];
    bzero(send_line, MAXLINE);
    char recv_line[MAXLINE];
 
    while (fgets(send_line, MAXLINE, stdin) != NULL) {
        int i = strlen(send_line);
        if (i > 0 && send_line[i - 1] == '\n') {
            send_line[i - 1] = 0; // strip the trailing newline
        }
        size_t nbytes = strlen(send_line);
        printf("now sending %s \n", send_line);
 
        if (sendto(sockfd, send_line, nbytes, 0, (struct sockaddr *) &server_addr, sizeof(server_addr)) != nbytes)
            error(1, errno, "sendto error");
 
        // receive at most MAXLINE - 1 bytes so the terminating NUL always fits
        int n = recvfrom(sockfd, recv_line, MAXLINE - 1, 0, NULL, NULL);
        if (n < 0)
            error(1, errno, "recvfrom error");
        recv_line[n] = 0;
 
        fputs(recv_line, stdout);
        fputs("\n", stdout);
    }
 
    exit(0);
}
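
Why the client-side bind() matters can be checked by looking at what the server's recvfrom() reports as the sender address. The following is a minimal sketch, assuming a hypothetical function name and throwaway socket paths in a directory created with mkdtemp(); both sockets live in one process only to keep the example short.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the server's recvfrom() reports a
 * path for the client, 0 if the client address is empty, -1 on error.
 * bind_client selects whether the client calls bind() before sendto(). */
int server_sees_client_path(int bind_client) {
    char dir[] = "/tmp/udsdemoXXXXXX";
    if (mkdtemp(dir) == NULL)
        return -1;

    struct sockaddr_un srv, cli, from;
    char buf[16];
    socklen_t from_len = sizeof(from);
    int result = -1;

    memset(&srv, 0, sizeof(srv));
    memset(&cli, 0, sizeof(cli));
    memset(&from, 0, sizeof(from));
    srv.sun_family = AF_UNIX;
    cli.sun_family = AF_UNIX;
    snprintf(srv.sun_path, sizeof(srv.sun_path), "%s/server.sock", dir);
    snprintf(cli.sun_path, sizeof(cli.sun_path), "%s/client.sock", dir);

    int sfd = socket(AF_UNIX, SOCK_DGRAM, 0);
    int cfd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (sfd < 0 || cfd < 0)
        goto out;
    if (bind(sfd, (struct sockaddr *) &srv, sizeof(srv)) < 0)
        goto out;
    if (bind_client && bind(cfd, (struct sockaddr *) &cli, sizeof(cli)) < 0)
        goto out;
    if (sendto(cfd, "g1", 2, 0, (struct sockaddr *) &srv, sizeof(srv)) != 2)
        goto out;
    if (recvfrom(sfd, buf, sizeof(buf), 0, (struct sockaddr *) &from, &from_len) < 0)
        goto out;

    /* A bound client yields an address that extends past the family
     * field and carries a non-empty sun_path. */
    result = from_len > offsetof(struct sockaddr_un, sun_path)
             && from.sun_path[0] != '\0';

out:
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    unlink(srv.sun_path);
    unlink(cli.sun_path);
    rmdir(dir);
    return result;
}
```

With bind_client set to 0 the server receives the datagram but gets no usable return address, so a sendto() reply like the one in unixdataserver.c has nowhere to go.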

The following transcript shows the server and client exchanging datagrams: the server receives the datagrams sent by the client one after another, and the client receives each response from the server:

$ ./unixdataserver /tmp/unixdata.sock
Receive: g1
now sending: Hi, g1
Receive: g2
now sending: Hi, g2
Receive: g3
now sending: Hi, g3
$ ./unixdataclient /tmp/unixdata.sock
g1
now sending g1
Hi, g1
g2
now sending g2
Hi, g2
g3
now sending g3
Hi, g3
^C

5.3 Socket case of k8s and docker

k8s has many excellent designs. One of them is the k8s CRI (Container Runtime Interface), whose idea is to decouple the main logic of k8s from the implementation of the container runtime.

You can view the local sockets on a Linux system with the netstat command:

  • netstat lists a stream-type local socket with the path /var/run/dockershim.socket, and the process that opened this socket is kubelet. Kubelet is the k8s component responsible for turning the commands of the controller and scheduler into container instances on a single machine. To stay decoupled from the container runtime, kubelet uses a client-server gRPC call over a local socket.
  • docker-containerd.sock is a socket used by Docker.
NETSTAT(8)                Linux Programmer's Manual                NETSTAT(8)

NAME
       netstat - Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships

   -a, --all
       Show both listening and non-listening sockets.  With the --interfaces option, show interfaces that are not up

   --protocol=family , -A
       Specifies the address families (perhaps better described as low level protocols) for which connections are to be shown.  
       family is a comma (',') separated list of address family keywords like inet, unix,  ipx,  ax25,  netrom, and ddp.  
       This has the same effect as using the --inet, --unix (-x), --ipx, --ax25, --netrom, and --ddp options.
       The address family inet includes raw, udp and tcp protocol sockets.

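
On Linux, the kernel also exposes its UNIX domain socket table as text in /proc/net/unix. The following is a minimal sketch of parsing one row of that file, assuming the usual column order (Num, RefCount, Protocol, Flags, Type, St, Inode, Path); the struct, the function name, and the sample row in the comment are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one row of /proc/net/unix (Linux only). */
struct unix_sock_row {
    unsigned int flags; /* 0x10000 marks a listening socket */
    unsigned int type;  /* 1 = SOCK_STREAM, 2 = SOCK_DGRAM on most Linux ports */
    char path[108];     /* empty for unnamed sockets */
};

/* Parse one data row such as
 *   "0000000000000000: 00000002 00000000 00010000 0001 01 16163 /tmp/unixstream.sock"
 * Returns 1 on success, 0 if the line is not a data row (e.g. the header). */
int parse_unix_row(const char *line, struct unix_sock_row *row) {
    unsigned int refs, proto, state;
    unsigned long inode;

    row->path[0] = '\0';
    int n = sscanf(line, "%*[^:]: %x %x %x %x %x %lu %107s",
                   &refs, &proto, &row->flags, &row->type,
                   &state, &inode, row->path);
    return n >= 6; /* the path column is optional */
}
```

Reading /proc/net/unix line by line with fgets() and keeping the rows whose flags equal 0x10000 gives a rough list of the listening local sockets on the machine.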

The docker socket can be seen under /var/run.

If you do not know which header files a call needs, you can look them up with man:

# You can run the man command on a Linux system, for example man socket:

SOCKET(2) Linux Programmer's Manual SOCKET(2)

NAME
       socket - create an endpoint for communication

SYNOPSIS
       #include <sys/types.h> /* See NOTES */
       #include <sys/socket.h>

       int socket(int domain, int type, int protocol);

Origin blog.csdn.net/jiaoyangwm/article/details/128550594