1. Create a Socket

#include<sys/types.h>
#include<sys/socket.h>

int sock = ::socket(PF_INET, SOCK_STREAM, 0);

Prototype: int socket(int domain, int type, int protocol);

domain: the protocol family; one of PF_INET (IPv4), PF_INET6 (IPv6), or PF_UNIX (local)

type: the socket type; one of SOCK_STREAM, SOCK_DGRAM, or SOCK_RAW, optionally OR-ed (|) with SOCK_NONBLOCK (non-blocking) and SOCK_CLOEXEC (close-on-exec: a descriptor inherited across fork is closed automatically when the child process calls exec)

Among them: SOCK_STREAM: transport layer, TCP

SOCK_DGRAM: transport layer, UDP

SOCK_RAW: network layer (for handling ICMP, IGMP and other network-layer messages, or special IPv4 packets; the user can construct the IP header by setting the IP_HDRINCL socket option)

protocol: the specific protocol within the given domain and type. There is usually exactly one, so it is set to 0
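
As a minimal sketch of the type flags described above, the following creates a TCP socket with SOCK_NONBLOCK and SOCK_CLOEXEC OR-ed into the type and then reads the flags back with fcntl. The helper names (make_nonblocking_tcp_socket, is_nonblocking, is_cloexec) are made up for this illustration, and the two flags are assumed to be available (they are Linux extensions, since kernel 2.6.27).

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: creates a TCP socket that is non-blocking and
// close-on-exec in a single call. Returns the descriptor, or -1 on error.
int make_nonblocking_tcp_socket() {
    return ::socket(PF_INET, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
}

// Returns true if O_NONBLOCK is set on the descriptor's file status flags.
bool is_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    return flags != -1 && (flags & O_NONBLOCK) != 0;
}

// Returns true if FD_CLOEXEC is set on the descriptor flags.
bool is_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

Setting both flags at creation time avoids the separate fcntl calls that would otherwise be needed after socket() returns.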


2. Set socket options

struct linger opt = {1, 1};
setsockopt(sock, SOL_SOCKET, SO_LINGER, (void *)&opt, sizeof(opt));

Prototype: int setsockopt(int sockfd, int level, int optname,
const void *optval, socklen_t optlen);

sockfd: the descriptor of the created socket

level: the protocol level at which the option is defined: SOL_SOCKET (socket-level options), IPPROTO_IP (IP options), IPPROTO_TCP (TCP options), or IPPROTO_UDP (UDP options)

Some of the SOL_SOCKET options (see man 7 socket):

SO_REUSEADDR: lets bind() succeed immediately even if the address is still held by another socket in the TIME_WAIT state. (Changing /proc/sys/net/ipv4/tcp_tw_recycle to recycle TIME_WAIT sockets quickly is another way, but it is not recommended, and the setting was removed in Linux 4.12.)

SO_KEEPALIVE: TCP keepalive. Once set, if the TCP connection carries no data for 7200 s, the keepalive mechanism is triggered and the side that enabled it (usually the server) sends a keepalive probe segment (carrying 1 byte of data or none):

If the peer replies with an ACK, wait another 7200 s before sending the next probe;

If the peer has crashed and restarted, it replies with an RST. The socket's pending error is set to ECONNRESET and the socket is closed;

If the peer does not reply, a probe is sent every 75 s, up to 9 times. If there is still no response, the TCP connection is dropped, the socket's pending error is set to ETIMEDOUT, and the socket is closed. If an ICMP error comes back instead, for example "host unreachable", the peer host may not have crashed but cannot be reached, and the pending error is set to EHOSTUNREACH.

SO_LINGER: controls whether closing (or shutting down) the socket waits for the data remaining in the send buffer to be sent before returning

By default, close() returns immediately and the kernel's TCP module takes care of sending the data left in the socket's send buffer to the peer.

If l_onoff is 0, SO_LINGER has no effect and close uses the default behavior

If l_onoff is 1 and l_linger is 0, close returns immediately; the TCP module discards the data in the send buffer and sends an RST to the peer (this avoids TIME_WAIT, but data may be lost)

If l_onoff is 1 and l_linger is greater than 0:

If the socket is blocking, close waits up to l_linger seconds for the TCP module to send the buffered data and receive the ACK. If that does not happen within l_linger seconds, close returns -1 with errno set to EWOULDBLOCK.

If the socket is non-blocking, close returns immediately and the buffered data is discarded.
struct linger {
    int l_onoff;    /* linger active */
    int l_linger;   /* how many seconds to linger for */
};
SO_OOBINLINE: receive out-of-band data inline in the normal data stream

SO_SNDBUF: send buffer size (of little use for UDP, which hands each datagram straight to the network layer)

SO_RCVBUF: receive buffer size (a socket-level option, so it applies to both TCP and UDP)

SO_SNDLOWAT: low-water mark of the send buffer (a performance measure that avoids keeping the CPU busy with tiny writes). On Linux this option cannot be changed: setsockopt fails with ENOPROTOOPT and the value is fixed at 1.

It only affects non-blocking sockets. On a non-blocking socket, send either sends all the data or at least SO_SNDLOWAT bytes. For example, with SO_SNDLOWAT set to 100, sending 50 bytes either sends all 50 or fails, while sending 200 bytes either sends between 100 and 200 bytes or fails. With the default of 1, send either sends at least 1 byte or fails with EWOULDBLOCK.

On a blocking socket, if send cannot send all the data, it blocks until there is enough buffer space or the timeout (SO_SNDTIMEO) expires.

With select/poll/epoll, the socket is reported writable only when at least SO_SNDLOWAT bytes of buffer space are free.

SO_RCVLOWAT: low-water mark of the receive buffer

For a blocking recv, the call returns once it has the number of bytes requested or at least SO_RCVLOWAT bytes. (The behavior on non-blocking sockets is worth verifying.)

select/poll/epoll report the socket readable only when at least SO_RCVLOWAT bytes are available to read

SO_RCVTIMEO: receive timeout. When set, recv/recvfrom block for at most this long. If no further data arrives within the timeout, the call returns the data received so far, or fails with errno set to EWOULDBLOCK or EAGAIN if nothing was received. The default is 0 (never time out). The option value is a struct timeval
#include <sys/time.h>

struct timeval {
    time_t      tv_sec;     /* seconds */
    suseconds_t tv_usec;    /* microseconds */
};
SO_SNDTIMEO: Timeout for sending data. Similar to SO_RCVTIMEO

optname: the option to set, for example one of the SOL_SOCKET options listed above

optval: pointer to the option value, usually an int used as a number or a boolean; SO_LINGER, SO_SNDTIMEO and SO_RCVTIMEO take their own structures (struct linger, struct timeval)

optlen: the size of optval in bytes
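
To make the SO_RCVTIMEO description above concrete, here is a small sketch that sets a 100 ms receive timeout on an idle UDP socket and reports the errno that recv fails with. The helper names (set_recv_timeout, recv_errno_after_timeout) and the 100 ms value are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper: sets the receive timeout of fd in milliseconds.
// Returns 0 on success, -1 on error (errno set by setsockopt).
int set_recv_timeout(int fd, long millis) {
    struct timeval tv;
    tv.tv_sec = millis / 1000;
    tv.tv_usec = (millis % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

// Calls recv on a UDP socket that nobody sends to, with a 100 ms timeout.
// Returns the errno observed (0 if data unexpectedly arrived).
int recv_errno_after_timeout() {
    int fd = ::socket(PF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return errno;
    if (set_recv_timeout(fd, 100) == -1) {
        int e = errno;
        close(fd);
        return e;
    }
    char buf[16];
    ssize_t n = recv(fd, buf, sizeof(buf), 0); // blocks for at most ~100 ms
    int err = (n == -1) ? errno : 0;
    close(fd);
    return err;
}
```

Without the timeout, the same recv call would block indefinitely because no datagram ever arrives.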


Change the TCP keep-alive interval:


Method 1 - System wide:

The three values above (7200 seconds, 75 seconds, 9 probes) can be changed through these files:
/proc/sys/net/ipv4/tcp_keepalive_time
/proc/sys/net/ipv4/tcp_keepalive_intvl
/proc/sys/net/ipv4/tcp_keepalive_probes
Alternatively, on Linux, add the following settings to the global configuration file /etc/sysctl.conf and run sysctl -p to apply them. Use sysctl -a | grep keepalive to view the current values
net.ipv4.tcp_keepalive_time=7200
net.ipv4.tcp_keepalive_intvl=75
net.ipv4.tcp_keepalive_probes=9
Method 2 - For a single TCP connection (see man 7 tcp):

Use the TCP options TCP_KEEPIDLE (idle time before the first probe), TCP_KEEPINTVL (interval between probes) and TCP_KEEPCNT (number of probes)
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int keepalive = 1;
int count = 5;
setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, (void *)&keepalive, sizeof(keepalive));
setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, (void *)&count, sizeof(count));
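
The snippet above only changes the probe count. A fuller sketch, assuming Linux and using made-up helper names (enable_keepalive, get_tcp_int_option), sets all three per-connection values and reads them back with getsockopt:

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: enables keepalive on fd and overrides the three
// system-wide defaults for this socket only.
// Returns 0 on success, -1 as soon as one setsockopt call fails.
int enable_keepalive(int fd, int idle_sec, int interval_sec, int probes) {
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1) return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_sec, sizeof(idle_sec)) == -1) return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_sec, sizeof(interval_sec)) == -1) return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) == -1) return -1;
    return 0;
}

// Reads back an int-valued TCP-level option; returns -1 on error.
int get_tcp_int_option(int fd, int optname) {
    int value = -1;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, optname, &value, &len) == -1) return -1;
    return value;
}
```

For example, enable_keepalive(sock, 60, 10, 5) would start probing after 60 s of idle time, probe every 10 s, and give up after 5 unanswered probes.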

The difference between TCP keep-alive and HTTP keep-alive:


The two are implemented at different layers and serve different purposes:

TCP keepalive is implemented by the TCP layer (in the kernel). It is the TCP keep-alive mechanism, used to check that a TCP connection is still valid and to detect connection errors;

HTTP keep-alive is implemented at the application layer (in user space). It is the HTTP persistent connection, used to reuse a connection: requests and responses are sent one after another on the same TCP connection, which saves the cost of setting up and tearing down a TCP connection for every short HTTP exchange.

With TCP keep-alive, the reasons why application layer heartbeat packets are still required:


TCP keepalive can only detect whether the connection is alive, not whether the service is usable. For example, the server may be unable to respond because its load is too high, or the process may be deadlocked or blocked, while the connection itself still exists. In that case keepalive probes are still sent and acknowledged normally, so no connection error is detected.

There may be a layer-4 proxy or load balancer between the client and the server (for example SOCKS5). Such a proxy forwards only the data above the transport layer, so keepalive probes check only the connection to the proxy, not the path to the real peer.

If the peer suddenly drops off the network, for example because of a power outage, we do not know that the connection is broken. If data is sent at that point, it fails and is retransmitted. Because retransmission takes priority over keepalive, the keepalive probes are not sent, and the broken connection is noticed only after a long period of failed retransmissions.

TCP read and write buffer size (see man 7 tcp):

View them with sysctl -a | grep tcp_rmem and sysctl -a | grep tcp_wmem:
net.ipv4.tcp_rmem = 4096	131072	6291456
net.ipv4.tcp_wmem = 4096	16384	4194304
or with cat /proc/sys/net/ipv4/tcp_rmem and cat /proc/sys/net/ipv4/tcp_wmem:
# cat /proc/sys/net/ipv4/tcp_rmem 
4096	131072	6291456
# cat /proc/sys/net/ipv4/tcp_wmem 
4096	16384	4194304
From left to right, the three values are the minimum, default and maximum sizes (in bytes) of the TCP receive and send buffers. The size currently in effect for a socket can be read with getsockopt.
int wmem = 0;
socklen_t wmem_len = sizeof(wmem);
getsockopt(sock, SOL_SOCKET, SO_SNDBUF, (void *)&wmem, &wmem_len);
The send/receive buffer of a single TCP socket can be set through SO_SNDBUF/SO_RCVBUF. Within the system limits, the kernel doubles the requested value (to leave room for bookkeeping overhead). For example, after setting 10240, getsockopt returns 20480
int wmem = 10240;
setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (void *)&wmem, sizeof(wmem));

UDP has a receive buffer but no meaningful send buffer (datagrams are handed straight to the network layer):

View the limits with sysctl -a | grep rmem:
net.core.rmem_default = 212992
net.core.rmem_max = 212992
The receive buffer of a single UDP socket (SOCK_DGRAM) can be set through SO_RCVBUF. As with the TCP buffers, the kernel doubles the requested value
int rmem = 10240;
socklen_t rmem_len = sizeof(rmem);
setsockopt(sock, SOL_SOCKET, SO_RCVBUF, (void *)&rmem, sizeof(rmem));
getsockopt(sock, SOL_SOCKET, SO_RCVBUF, (void *)&rmem, &rmem_len);
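
The set-then-get sequence above can be wrapped in one function to observe the doubling directly. This is only a sketch: set_and_get_rcvbuf is a made-up name, and the exact value returned depends on the kernel and on net.core.rmem_max, so the usage below prints it rather than assuming 20480.

```cpp
#include <cassert>
#include <iostream>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: requests a receive buffer size and returns the
// size the kernel actually reports afterwards, or -1 on error.
int set_and_get_rcvbuf(int fd, int requested) {
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) == -1) {
        return -1;
    }
    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1) {
        return -1;
    }
    return actual;
}

// Prints the requested and the effective size for a fresh UDP socket.
void print_rcvbuf_demo() {
    int fd = ::socket(PF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return;
    std::cout << "requested 10240, kernel reports "
              << set_and_get_rcvbuf(fd, 10240) << std::endl;
    close(fd);
}
```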

Relationship between SO_SNDBUF/SO_RCVBUF and sliding window:

SO_SNDBUF and SO_RCVBUF

TCP sliding window (send window and receive window) - hongdada - 博客园

[Necessary series for Dachang interviews] Sliding window protocol- Feitian Veal- Blog Garden

TCP/IP Study Notes: TCP Congestion Control - Mingming 1109 - 博客园

3. Socket binding address (named socket)

1. Several different address structures:

- Generic socket address

// total size: 16 bytes
struct sockaddr  {
    sa_family_t sa_family; // unsigned short int, 16 bits (2 bytes); AF_xxx
    char        sa_data[14];
};

// total size: 128 bytes
struct sockaddr_storage {
    sa_family_t sa_family;
    char __ss_padding[_SS_PADSIZE];
    __ss_aligntype __ss_align;	/* Force desired alignment.  */
};

- Dedicated socket address

// IP protocol family, IPv4
// total size: 16 bytes
struct sockaddr_in {
    sa_family_t sin_family; // AF_INET
    uint16_t    sin_port; // network byte order; convert from host byte order, e.g. htons(3490)
    struct in_addr sin_addr;

    /* Pad to size of `struct sockaddr'. 8 bytes */
    unsigned char sin_zero[sizeof (struct sockaddr)
			   - sizeof (sa_family_t)
			   - sizeof (uint16_t)
			   - sizeof (struct in_addr)];
};
struct in_addr {
    uint32_t s_addr;
};

// IP protocol family, IPv6
// total size: 28 bytes
struct sockaddr_in6 {
    sa_family_t sin6_family; // AF_INET6
    uint16_t sin6_port;	/* Transport layer port # */
    uint32_t sin6_flowinfo;	/* IPv6 flow information */
    struct in6_addr sin6_addr;	/* IPv6 address */
    uint32_t sin6_scope_id;	/* IPv6 scope-id */
};
struct in6_addr {
    union {
	    uint8_t	__u6_addr8[16];
	    uint16_t __u6_addr16[8];
	    uint32_t __u6_addr32[4];
    } __in6_u;
};

// UNIX local protocol family
struct sockaddr_un {
    sa_family_t sun_family; // AF_LOCAL/AF_UNIX
    char sun_path[108];
};

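Pointers to these structures are usually converted to one another with casts (for example to struct sockaddr * when calling bind or connect).

2. Converting an IP address between its binary form (such as sin_addr) and the human-readable dotted form: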

#include <iostream>
#include <arpa/inet.h>

char addr_point[] = "192.168.1.1";
struct in_addr addr_bytes; // binary address in network byte order
// inet_pton returns 1 on success, 0 for an invalid string, -1 for an unsupported family
if (inet_pton(AF_INET, addr_point, &addr_bytes) != 1) {
    std::cout << "Error" << std::endl;
    return -1;
}

char buf[INET_ADDRSTRLEN];
if (inet_ntop(AF_INET, &addr_bytes, buf, INET_ADDRSTRLEN) == NULL) {
    std::cout << "Error" << std::endl;
    return -1;
}
std::cout << "addr:" << buf << std::endl;

The following macros give the buffer length needed for an address in dotted (text) form:

#define INET_ADDRSTRLEN 16
#define INET6_ADDRSTRLEN 46

Function prototypes:

- int inet_pton(int af, const char *src, void *dst);

af - address family, AF_INET or AF_INET6

src - address in text (dotted) form

dst - receives the address in network byte order

Return value - 1 on success; 0 if src does not contain a valid address for the family; -1 if af is not supported, with errno set to EAFNOSUPPORT

- const char *inet_ntop(int af, const void *src, char *dst, socklen_t size);

af - address family, AF_INET or AF_INET6

src - address in network byte order

dst - receives the address in text (dotted) form

size - the size of dst, INET_ADDRSTRLEN or INET6_ADDRSTRLEN

Return value - a non-NULL pointer to dst on success; NULL on failure with errno set: ENOSPC means the text form does not fit in size bytes, EAFNOSUPPORT means af is not supported

PF_INET is the protocol family and AF_INET is the address family; on Linux the two constants have the same value
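
As a sketch of how the two functions combine for either address family, the helper below (normalize_ip, a name invented for this example) parses a text address with inet_pton and formats it again with inet_ntop, returning an empty string when either step fails:

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <string>

// Hypothetical helper: round-trips a text address through its binary form.
// af must be AF_INET or AF_INET6. Returns "" if the text is not a valid
// address for that family or the conversion back to text fails.
std::string normalize_ip(int af, const char *text) {
    struct in6_addr bytes; // large enough for an IPv4 or an IPv6 address
    if (inet_pton(af, text, &bytes) != 1) {
        return "";
    }
    char buf[INET6_ADDRSTRLEN]; // large enough for both text forms
    if (inet_ntop(af, &bytes, buf, sizeof(buf)) == NULL) {
        return "";
    }
    return std::string(buf);
}
```

Sizing both buffers for IPv6 lets one function serve both families, at the cost of a few unused bytes in the IPv4 case.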


3. bind binding address (named socket)

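#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h> // struct sockaddr_in, htons, htonl, INADDR_ANY
#include <cstring>      // memset

struct sockaddr_in address;
memset(&address, 0, sizeof(address)); // clear the structure, including sin_zero
address.sin_family = AF_INET;
address.sin_port = htons(666); // ports below 1024 normally require root
address.sin_addr.s_addr = htonl(INADDR_ANY);

int rt = bind(sock, (struct sockaddr *)&address, sizeof(address));
if (rt == -1) {
    return -1;
}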

Function prototype:

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

sockfd - the socket that needs to bind address

addr - the address to bind to

addrlen - the length of addr

Return value - return 0 for success, -1 for failure, and set errno

The role of bind:

bind is usually called on the server side to attach the socket to a specific address and port, because clients can only know where to connect once the server has a fixed address.

Use of bind:

The address is often set to INADDR_ANY. A host may have several network interfaces, and which local address a connection uses is not known until a client connects. Binding to INADDR_ANY accepts connections on all of them and defers the choice of local address until then.

When the port in addr is set to 0, the system assigns a free port
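
When the port is left at 0, the assigned port can be read back with getsockname. The following is a minimal sketch under the assumption that the loopback address 127.0.0.1 is available; bind_ephemeral_loopback is a name made up for the example.

```cpp
#include <cassert>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: binds fd to 127.0.0.1 with port 0 and returns the
// port the kernel chose (host byte order), or -1 on error.
int bind_ephemeral_loopback(int fd) {
    struct sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                      // 0: let the kernel choose
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); // 127.0.0.1
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        return -1;
    }
    socklen_t len = sizeof(addr);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        return -1;
    }
    return ntohs(addr.sin_port);
}
```

This pattern is handy in tests, where a fixed port might already be in use.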


4. listen socket

#include <sys/types.h>
#include <sys/socket.h>

int rc = listen(sock, 5);

prototype:

int listen(int sockfd, int backlog);

sockfd - the socket to which the address has been bound

backlog - the maximum length of the queue of established connections waiting to be returned by accept. If backlog exceeds the value in /proc/sys/net/core/somaxconn, it is silently capped to that value. The maximum length of the queue of incomplete connections is set separately through /proc/sys/net/ipv4/tcp_max_syn_backlog (since Linux 2.2)

Return value - returns 0 on success, -1 on failure, and sets errno


SYN and Accept queue:

A socket in the LISTEN state has two queues: the SYN queue and the Accept queue

1. SYN queue (semi-connection queue, holding half-open connections): when the kernel receives a SYN from a client, it creates a request entry (struct inet_request_sock), stores it in the SYN queue and replies with SYN+ACK. For entries in the SYN queue that have not yet received the final ACK, the kernel retransmits the SYN+ACK on timeout until the retry limit is exceeded. Connections in this queue are in the SYN-RECV state.

2. Accept queue (full connection queue): when the kernel receives the ACK, it finds the matching entry in the SYN queue, removes it, creates a full connection (struct inet_sock) and stores it in the Accept queue, where it waits for the application to take it. Connections in this queue are in the ESTABLISHED state. When the application's accept call returns, the connection is removed from the Accept queue.

View the number of connections in the SYN queue: ss -n state syn-recv sport = :80 | wc -l (the SYN queue of port 80)

View the Accept queue: ss -plnt sport = :6443 (for a listening socket, Recv-Q is the current number of connections in the Accept queue and Send-Q is its maximum length)

Check whether the SYN queue and Accept queue have overflowed: nstat -az | grep -E 'TcpExtListenOverflows|TcpExtListenDrops'

TcpExtListenOverflows: incremented each time the Accept queue exceeds its limit

TcpExtListenDrops: incremented each time a connection is dropped for any reason, including Accept queue overflow, failure to create the new connection, failure to inherit the port, and so on

What is the relationship between socket and tcp (analysis of the principle of difference between TCP socket SYN queue and Accept queue) - happy learning

https://blog.cloudflare.com/syn-packet-handling-in-the-wild/

How to see Listen and connection queue of Socket TCP from Linux source code- System Operation and Maintenance- Billion Speed Cloud

Summary of 10,000-level concurrent server kernel tuning - Zwjsec's Blog - CSDN Blog

Various usage examples of ss command (socket statistics) under Linux system_Linux command_Cloud Network Bull Station

5. accept connection

#include <sys/types.h>
#include <sys/socket.h>

int fd;
struct sockaddr_in addr;
socklen_t addr_len = sizeof(addr); // must be set: tells the kernel how much space is available for the returned address; otherwise accept fails with EINVAL
fd = accept(sock, (struct sockaddr *)&addr, &addr_len);
if (fd < 0) {
    return -1;
}

prototype:

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

sockfd - the listening socket (bound with bind and passed to listen)

addr - on success, receives the address of the peer. Pass NULL if the peer address is not needed.

addrlen - on input, the size of the buffer addr points to; on success, receives the length of the peer address. Pass NULL if the peer address is not needed.

Return value - on success, a new file descriptor (use this fd to communicate with the client); on failure, -1 with errno set


The usage of accept (man 2 accept):

For connection-oriented socket types (SOCK_STREAM, SOCK_SEQPACKET), accept takes the first connection request from the listening socket's queue, creates a new connected socket for it, and returns a file descriptor for that socket (used to exchange data with the client).

If there are no connection requests in the listening socket's queue:

1. If the socket is blocking, accept blocks until a connection request arrives

2. If the socket is non-blocking, accept fails with errno set to EAGAIN or EWOULDBLOCK
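
The non-blocking case can be sketched as follows. Both helpers (make_nonblocking_listener, try_accept) are invented for this illustration, and the listener is assumed to be bound to the loopback address with a kernel-chosen port so that no client connects to it.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: creates a non-blocking TCP listener on 127.0.0.1
// with a kernel-chosen port. Returns the listening fd, or -1 on error.
int make_nonblocking_listener() {
    int fd = ::socket(PF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    struct sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, 5) == -1) {
        close(fd);
        return -1;
    }
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

// Tries accept once. Returns the new fd, or -1; *would_block is set to
// true when the failure only means that the queue is currently empty.
int try_accept(int listen_fd, bool *would_block) {
    int fd = accept(listen_fd, NULL, NULL);
    *would_block = (fd == -1) && (errno == EAGAIN || errno == EWOULDBLOCK);
    return fd;
}
```

A caller would normally go back to waiting for a readable event when would_block is true, rather than calling accept in a tight loop.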

To be notified of new connection requests, use:

1. select, poll, epoll: a readable event is reported when a new connection request arrives

2. SIGIO: the signal is generated when a connection request on the listening socket completes


Cases where accept returns a non-fatal error and should be retried:

1. On Linux, accept passes already-pending network errors on the new socket back as an error from accept. For reliable operation, detect these errno values after a failed accept and treat them like EAGAIN by retrying. On a TCP/IP network these are ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP, and ENETUNREACH (man 2 accept - Error handling)

2. If accept is interrupted by a signal, errno is set to EINTR and the call should also be retried

3. The connection is aborted before accept returns: the connection has reached ESTABLISHED, but the client sends an RST before accept returns. (UNIX Network Programming says accept fails with ECONNABORTED or EPROTO; on Ubuntu accept succeeds instead, and the later read fails with ECONNRESET, indicating that an RST was received.) This error can be ignored.
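
The three cases above can be folded into one predicate that an accept loop consults after a failure. This is a sketch, not a definitive list: is_retryable_accept_error is a made-up name, and ENONET is guarded because it is assumed to be Linux-specific.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical helper: returns true if an errno value from a failed
// accept() means "try again" rather than "the listening socket is broken".
bool is_retryable_accept_error(int err) {
    switch (err) {
    case EINTR:        // interrupted by a signal
    case EAGAIN:       // non-blocking socket, queue empty: wait, then retry
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
    case ECONNABORTED: // peer aborted the connection before accept returned
    case ENETDOWN:     // pending network errors reported through accept
    case EPROTO:
    case ENOPROTOOPT:
    case EHOSTDOWN:
#ifdef ENONET
    case ENONET:
#endif
    case EHOSTUNREACH:
    case EOPNOTSUPP:
    case ENETUNREACH:
        return true;
    default:
        return false;
    }
}
```

In a loop this would read roughly as: call accept, and if it returns -1 and is_retryable_accept_error(errno) is true, continue; otherwise report the error.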

21-Non-blocking accept_songly_'s blog-CSDN blog_accept non-blocking

UNIX Network Programming Volume 1: Socket Networking API

What to do when the semi-connection/full connection queue is full:

The semi-connection queue (SYN queue) is full: (once it is full and SYN cookies are not enabled, the kernel drops further SYN requests by default)

- Increase the semi-connection queue length: /proc/sys/net/ipv4/tcp_max_syn_backlog

- Enable /proc/sys/net/ipv4/tcp_syncookies (with SYN cookies, the server encodes the connection information in the sequence number of the SYN+ACK it returns to the client instead of storing an entry in the queue)

- Reduce the number of SYN+ACK retransmissions: /proc/sys/net/ipv4/tcp_synack_retries

The full connection queue (Accept queue) is full: (when it is full, the kernel by default drops new SYN requests and the ACKs that would complete a handshake. If tcp_abort_on_overflow is set to 1, an RST is sent to the client instead)

- Increase the full connection queue length: full connection queue length = min(backlog, /proc/sys/net/core/somaxconn)


6. connect

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h> // struct sockaddr_in
#include <arpa/inet.h> // for inet_pton & htons

char address_p[] = "127.0.0.1";
struct in_addr address_bytes;
if (inet_pton(AF_INET, address_p, &address_bytes) != 1) { // 1 means the text was converted
    return -1;
}

struct sockaddr_in addr = {}; // zero-initialize, including sin_zero
addr.sin_family = AF_INET;
addr.sin_port = htons(6666);
addr.sin_addr.s_addr = address_bytes.s_addr;

int rc = connect(sock, (struct sockaddr *)&addr, sizeof(addr));
if (rc == -1) {
    return -1;
}

prototype:

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

sockfd - the fd created by the client

addr - the address of the server to connect to

addrlen - the length of addr

Return value - return 0 for success, -1 for failure, and set errno, man 2 connect

The use of connect:

Both connection-oriented (SOCK_STREAM or SOCK_SEQPACKET) and connectionless (SOCK_DGRAM) sockets can use connect

1. For sockets of type SOCK_DGRAM:

connect sets the default address that datagrams are sent to and the only address from which datagrams are received.

To change the address, call connect again.

To dissolve the association, call connect again with the sa_family of addr set to AF_UNSPEC.

2. For sockets of type SOCK_STREAM/SOCK_SEQPACKET:

connect establishes a connection to the socket bound to the address addr
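
The SOCK_DGRAM case can be sketched like this. The three helper names are invented for the example, and the peer is assumed to be a loopback address; connecting a UDP socket sends nothing, so no server needs to be running on that port.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: sets 127.0.0.1:port as the default peer of a UDP
// socket. Returns 0 on success, -1 on error.
int udp_connect_loopback(int fd, uint16_t port) {
    struct sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return connect(fd, (struct sockaddr *)&addr, sizeof(addr));
}

// Dissolves the association by connecting to an AF_UNSPEC address.
int udp_disconnect(int fd) {
    struct sockaddr addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sa_family = AF_UNSPEC;
    return connect(fd, &addr, sizeof(addr));
}

// Returns true if the socket currently has a default peer.
bool has_peer(int fd) {
    struct sockaddr_storage peer;
    socklen_t len = sizeof(peer);
    return getpeername(fd, (struct sockaddr *)&peer, &len) == 0;
}
```

Once a peer is set, send and recv can be used instead of sendto and recvfrom, and datagrams from other addresses are not delivered to the socket.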

Reasons why connect fails:

1. An RST is received from the server. The server sends an RST when:

- The port being connected to is not in the listening state

- The port is in the TIME_WAIT state

2. No reply is received from the server before the timeout:

- The server load is too high: the server receives the client's SYN but cannot respond in time (e.g. the semi-connection queue is full)

- Network congestion: the server's reply is lost in transit, so the client never receives the acknowledgment

- The route is unreachable: an intermediate router has a problem, the SYN is discarded, and an ICMP Destination Unreachable error is sent to the client

PS: when no reply is received, the Linux kernel retries the SYN several times. The maximum number of retries is controlled by tcp_syn_retries (default 6)
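
The first failure case (the port is not listening) is easy to reproduce on the loopback interface. In this sketch, bind_without_listen and connect_errno are made-up helpers: the first reserves a port that is bound but never passed to listen, the second reports the errno that connect fails with.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: binds a TCP socket to 127.0.0.1 on a kernel-chosen
// port WITHOUT calling listen. Stores the port in *port_out.
// Returns the fd (keep it open to hold the port), or -1 on error.
int bind_without_listen(uint16_t *port_out) {
    int fd = ::socket(PF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    struct sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    socklen_t len = sizeof(addr);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

// Tries to connect to 127.0.0.1:port. Returns 0 on success, otherwise
// the errno set by socket() or connect().
int connect_errno(uint16_t port) {
    int fd = ::socket(PF_INET, SOCK_STREAM, 0);
    if (fd < 0) return errno;
    struct sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    int rc = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    int err = (rc == -1) ? errno : 0;
    close(fd);
    return err;
}
```

Because nothing is listening on the reserved port, the kernel answers the SYN with an RST and connect fails with ECONNREFUSED.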