1. Create a Socket
#include <sys/types.h>
#include <sys/socket.h>
int sock = ::socket(PF_INET, SOCK_STREAM, 0); // returns -1 and sets errno on failure
Prototype: int socket(int domain, int type, int protocol);
domain: protocol family, e.g. PF_INET (IPv4), PF_INET6 (IPv6), PF_UNIX (local)
type: socket type, one of SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, optionally OR'ed (|) with SOCK_NONBLOCK (non-blocking) and/or SOCK_CLOEXEC (close-on-exec: the descriptor is closed automatically when the process, for example a child created by fork, calls exec)
Among them: SOCK_STREAM: transport layer, TCP
SOCK_DGRAM: transport layer, UDP
SOCK_RAW: network layer (handles ICMP, IGMP and other network-layer messages, as well as IPv4 datagrams whose protocol the kernel does not process itself; the user can construct the IP header by enabling the IP_HDRINCL socket option)
protocol: the specific protocol within the domain. Since the type usually maps one-to-one to a protocol (e.g. SOCK_STREAM to TCP), it is normally set to 0 to select the default.
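As a small illustration of the type flags above, the following Linux-specific sketch (SOCK_NONBLOCK and SOCK_CLOEXEC can be OR'ed into type since Linux 2.6.27) creates a TCP socket with both flags in a single call and checks them with fcntl(). The helper names are hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a non-blocking, close-on-exec TCP socket
 * in one call instead of socket() followed by fcntl(). Returns fd or -1. */
int make_tcp_socket_nb_cloexec(void) {
    int sock = socket(PF_INET, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
    if (sock == -1)
        perror("socket");
    return sock;
}

/* 1 if fd has O_NONBLOCK set (a file status flag, read with F_GETFL). */
int is_nonblocking(int fd) {
    int fl = fcntl(fd, F_GETFL);
    return fl != -1 && (fl & O_NONBLOCK) != 0;
}

/* 1 if fd has FD_CLOEXEC set (a descriptor flag, read with F_GETFD). */
int is_cloexec(int fd) {
    int fl = fcntl(fd, F_GETFD);
    return fl != -1 && (fl & FD_CLOEXEC) != 0;
}
```

Setting the flags at creation time also avoids the window between socket() and a later fcntl() in which another thread could fork and exec with the descriptor still open.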
reference:
The difference between SOCK_RAW and SOCK_STREAM, SOCK_DGRAM_zhu2695's Blog-CSDN Blog
2. Set socket options
struct linger opt = {1, 1};
setsockopt(sock, SOL_SOCKET, SO_LINGER, (void *)&opt, sizeof(opt));
Prototype: int setsockopt(int sockfd, int level, int optname,
const void *optval, socklen_t optlen);
sockfd: the socket fd created above
level: the protocol layer the option belongs to: SOL_SOCKET (socket options), IPPROTO_IP (IP options), IPPROTO_TCP (TCP options) or IPPROTO_UDP (UDP options)
Some of the SOL_SOCKET options (see man 7 socket):
SO_REUSEADDR: allows bind() to reuse a local address immediately, even if it is still held by a socket in TIME_WAIT state. (TIME_WAIT sockets could also be recycled quickly via /proc/sys/net/ipv4/tcp_tw_recycle, but this is not recommended, and the option was removed in Linux 4.12.)
SO_KEEPALIVE: TCP keepalive. Once enabled, if a TCP connection has exchanged no data for 7200 s, the keepalive mechanism is triggered and the side that enabled it (usually the server) sends a keepalive probe (a TCP segment carrying 1 byte of data or none):
If the peer replies with an ACK, the connection is fine, and the next probe is sent after another 7200 s of inactivity;
If the peer has crashed and rebooted, it replies with RST; the socket's pending error is set to ECONNRESET and the socket is closed;
If the peer does not reply, a probe is sent every 75 s, up to 9 times. If there is still no response, the connection is dropped, the socket's pending error is set to ETIMEDOUT, and the socket is closed. If an ICMP error such as "host unreachable" is received in response to a probe, the peer host has not necessarily crashed but cannot be reached; in that case the pending error is set to EHOSTUNREACH.
SO_LINGER: controls whether close() (or shutdown()) waits for the data remaining in the send buffer to be sent before returning
By default, close() returns immediately, and the kernel's TCP module takes care of sending the data remaining in the socket's send buffer to the peer.
If l_onoff is 0, SO_LINGER has no effect and close() uses the default behavior
If l_onoff is 1 and l_linger is 0, close() returns immediately, the TCP module discards the data in the send buffer and sends an RST to the peer (this avoids TIME_WAIT, but data may be lost)
If l_onoff is 1 and l_linger is greater than 0:
If the socket is blocking, close() waits up to l_linger seconds for the TCP module to send the buffered data and receive the ACK. If that does not happen within l_linger seconds, close() returns -1 with errno set to EWOULDBLOCK.
If the socket is non-blocking, close() returns immediately and the buffered data is discarded.
struct linger {
    int l_onoff;  /* linger active */
    int l_linger; /* how many seconds to linger for */
};
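A minimal sketch of setting SO_LINGER and reading it back, assuming a Linux system; set_linger and get_linger are hypothetical helper names. Passing onoff = 1, seconds = 0 selects the abortive close described above, while onoff = 1, seconds = 1 corresponds to the {1, 1} used in the example at the start of this section.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: configure SO_LINGER on fd.
 * onoff = 1, seconds = 0 -> abortive close: close() discards unsent data
 * and sends RST instead of the normal FIN sequence. */
int set_linger(int fd, int onoff, int seconds) {
    struct linger opt;
    opt.l_onoff = onoff;
    opt.l_linger = seconds;
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &opt, sizeof(opt)) == -1) {
        perror("setsockopt(SO_LINGER)");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: read the current setting back; 0 on success. */
int get_linger(int fd, struct linger *out) {
    socklen_t len = sizeof(*out);
    return getsockopt(fd, SOL_SOCKET, SO_LINGER, out, &len);
}
```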
SO_OOBINLINE: Receive out-of-band data in the regular data stream
SO_SNDBUF: send buffer size (of little use for UDP, since UDP datagrams are passed straight down to the network)
SO_RCVBUF: receive buffer size (a socket-level option, so it applies to both TCP and UDP)
SO_SNDLOWAT: send buffer low-water mark (a performance measure that keeps the CPU from being kept busy constantly)
For non-blocking sockets, send either sends all the data or at least SO_SNDLOWAT bytes. For example, with SO_SNDLOWAT = 100, sending 50 bytes either sends all 50 bytes or fails; sending 200 bytes either sends between 100 and 200 bytes or fails. With the default SO_SNDLOWAT of 1, send transfers at least 1 byte or fails with EWOULDBLOCK.
If the socket is blocking and send cannot send all the data, it blocks until enough buffer space is available or the timeout (SO_SNDTIMEO) expires.
With select/poll/epoll, the socket is reported writable only when at least SO_SNDLOWAT bytes of buffer space are free.
SO_RCVLOWAT: receive buffer low-water mark
For a blocking recv, the call returns once the requested number of bytes, or at least SO_RCVLOWAT bytes, is available (the behavior for non-blocking sockets remains to be verified)
select/poll/epoll report the socket readable only when at least SO_RCVLOWAT bytes are available in the buffer
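A minimal sketch for SO_RCVLOWAT, assuming Linux (set_rcvlowat and get_rcvlowat are hypothetical helpers). Per man 7 socket, both low-water marks start at 1, but Linux only lets you change SO_RCVLOWAT: setsockopt() on SO_SNDLOWAT fails there with ENOPROTOOPT, so the SO_SNDLOWAT behavior described above cannot be tuned on Linux. The man page also notes that select, poll and epoll honor SO_RCVLOWAT only since Linux 2.6.28.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: set the receive low-water mark (in bytes) on fd. */
int set_rcvlowat(int fd, int bytes) {
    if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &bytes, sizeof(bytes)) == -1) {
        perror("setsockopt(SO_RCVLOWAT)");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: current receive low-water mark, or -1 on error. */
int get_rcvlowat(int fd) {
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &val, &len) == -1)
        return -1;
    return val;
}
```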
SO_RCVTIMEO: timeout for receiving data. When set, a blocking recv/recvfrom waits at most this long. If the timeout expires after some data has been received, the call returns that data; if no data has been received, it fails with errno set to EAGAIN or EWOULDBLOCK. The default value is 0 (never time out). The value is a struct timeval:
#include <sys/time.h>
struct timeval {
    time_t      tv_sec;  /* seconds */
    suseconds_t tv_usec; /* microseconds */
};
SO_SNDTIMEO: Timeout for sending data. Similar to SO_RCVTIMEO
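A sketch of SO_RCVTIMEO in use, assuming Linux; set_recv_timeout and recv_timed_out are hypothetical helpers. The timeout is given in milliseconds and converted into the struct timeval shown above; a recv() that finds no data within that time fails with EAGAIN/EWOULDBLOCK. SO_SNDTIMEO would be set the same way.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: give blocking recv() on fd a timeout of ms milliseconds. */
int set_recv_timeout(int fd, long ms) {
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1) {
        perror("setsockopt(SO_RCVTIMEO)");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: 1 if recv() timed out (-1 with EAGAIN/EWOULDBLOCK),
 * 0 if data arrived or the peer closed, -1 on any other error. */
int recv_timed_out(int fd) {
    char buf[64];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    if (n >= 0)
        return 0;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 1;
    return -1;
}
```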
optname: the option to set, e.g. one of the SOL_SOCKET options above
optval: pointer to the option value, usually an int (used as a boolean flag or an integer)
SO_LINGER/SO_SNDTIMEO/SO_RCVTIMEO each use their own structure (struct linger, struct timeval)
optlen: byte size of optval
reference:
Network Sockets and Communication (Guile Reference Manual)
Setsockopt Detailed Explanation_Zhang Zhonglin's Blog-CSDN Blog_linux setsockopt
10.2. Changing Network Kernel Settings Red Hat Enterprise Linux 5 | Red Hat Customer Portal
linux - What's the purpose of the socket option SO_SNDLOWAT - Stack Overflow
Related:
Change the TCP keep-alive interval:
Detailed Explanation of TCP keepalive (confusion) - Xiangyun 123456 - Blog Garden
Method 1 - System wide:
The three defaults above (7200 s idle time, 75 s probe interval, 9 probes) can be changed through these files:
/proc/sys/net/ipv4/tcp_keepalive_time
/proc/sys/net/ipv4/tcp_keepalive_intvl
/proc/sys/net/ipv4/tcp_keepalive_probes
Alternatively, add the following lines to the global configuration file /etc/sysctl.conf and run sysctl -p to make them take effect; sysctl -a | grep keepalive shows the current values:
net.ipv4.tcp_keepalive_time=7200
net.ipv4.tcp_keepalive_intvl=75
net.ipv4.tcp_keepalive_probes=9
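Method 2 - Per connection (see man 7 tcp):
Use the TCP-level options TCP_KEEPIDLE (idle time before the first probe, overriding tcp_keepalive_time), TCP_KEEPINTVL (interval between probes, overriding tcp_keepalive_intvl) and TCP_KEEPCNT (number of probes, overriding tcp_keepalive_probes):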
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
int keepalive = 1;
int count = 5;
setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, (void *)&keepalive, sizeof(keepalive));
setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, (void *)&count, sizeof(count));
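The snippet above only overrides the probe count. A slightly fuller sketch, assuming Linux (the TCP_KEEP* options are Linux-specific), enables keepalive and overrides all three parameters for one socket; enable_keepalive and get_int_opt are hypothetical helpers, and 60/10/5 are arbitrary example values.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keepalive on fd, overriding the system-wide
 * defaults for this socket only.
 *   idle  - seconds of inactivity before the first probe   (TCP_KEEPIDLE)
 *   intvl - seconds between probes                          (TCP_KEEPINTVL)
 *   cnt   - unanswered probes before the connection drops   (TCP_KEEPCNT) */
int enable_keepalive(int fd, int idle, int intvl, int cnt) {
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) == -1) {
        perror("setsockopt(keepalive)");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: read back one int-valued option, -1 on error. */
int get_int_opt(int fd, int level, int name) {
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(fd, level, name, &val, &len) == -1)
        return -1;
    return val;
}
```

With these values a silent peer is detected after roughly 60 + 5 * 10 = 110 seconds, instead of 7200 + 9 * 75 = 7875 seconds with the system defaults.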
The difference between TCP keep-alive and HTTP keep-alive:
The two are implemented at different layers and serve different purposes:
TCP keep-alive is implemented by the TCP layer (in the kernel) and is known as the TCP keepalive mechanism; it checks that a TCP connection is still valid and detects broken connections;
HTTP keep-alive is implemented at the application layer (in user space) and is known as HTTP persistent connections; it reuses one TCP connection for several request-response exchanges in sequence, avoiding the cost of establishing and releasing a TCP connection for every short-lived HTTP request;
Why application-layer heartbeats are still needed even though TCP keep-alive exists:
- TCP keepalive can only detect whether the connection is alive, not whether it is usable. For example, if the server cannot respond to requests because its load is too high, or its process is deadlocked or blocked, the connection still exists and the kernel still answers keepalive probes normally, so no error is detected.
- There may be a layer-4 proxy or load balancer between the client and the server, i.e. a proxy above the transport layer that only forwards the data above the transport layer, such as SOCKS5. TCP keepalive then only checks the connection to the proxy, not to the real peer.
- If the peer suddenly loses network connectivity (e.g. because of a power failure), we do not learn that the connection is broken. If we then send data, the unacknowledged data is retransmitted, and since retransmission takes precedence over keepalive probes, no probe can be sent. Only after retransmissions have failed for a long time do we find out that the connection is broken.
TCP read and write buffer sizes (see man 7 tcp):
View them with sysctl -a | grep -E 'tcp_rmem|tcp_wmem':
net.ipv4.tcp_rmem = 4096 131072 6291456
net.ipv4.tcp_wmem = 4096 16384 4194304
or read them from /proc:
# cat /proc/sys/net/ipv4/tcp_rmem
4096 131072 6291456
# cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 4194304
From left to right, the three values are the minimum, default and maximum sizes of the TCP receive/send buffers. The value in effect for a socket can be read with getsockopt:
int wmem = 0;
socklen_t wmem_len = sizeof(wmem);
getsockopt(sock, SOL_SOCKET, SO_SNDBUF, (void *)&wmem, &wmem_len);
The TCP send/receive buffer of a single socket can be set with SO_SNDBUF/SO_RCVBUF. In testing, the kernel doubles the value that is set (within the allowed range; man 7 socket explains that the extra space is for bookkeeping overhead). For example, after setting 10240, getsockopt returns 20480.
int wmem = 10240; setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (void *)&wmem, sizeof(wmem));
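A small sketch combining the two snippets above: it requests a send buffer size and returns the value the kernel actually applied. Per man 7 socket, the maximum allowed value is net.core.wmem_max, so on Linux the request is capped there and then doubled, which is why the value read back is typically twice the request. set_sndbuf is a hypothetical helper.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: request a send buffer of `bytes` for fd and return
 * the size the kernel actually applied (on Linux usually 2 * bytes), or -1. */
int set_sndbuf(int fd, int bytes) {
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) == -1) {
        perror("setsockopt(SO_SNDBUF)");
        return -1;
    }
    int effective = 0;
    socklen_t len = sizeof(effective);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &effective, &len) == -1)
        return -1;
    return effective;
}
```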
UDP receive buffer (there is no send buffer setting, since it is of little use for UDP):
View it with sysctl -a | grep rmem:
net.core.rmem_default = 212992
net.core.rmem_max = 212992
The receive buffer of a single UDP socket (SOCK_DGRAM) can be set with SO_RCVBUF. As with the TCP buffers, the kernel doubles the value that is set.
int rmem = 10240;
socklen_t rmem_len = sizeof(rmem);
setsockopt(sock, SOL_SOCKET, SO_RCVBUF, (void *)&rmem, sizeof(rmem));
getsockopt(sock, SOL_SOCKET, SO_RCVBUF, (void *)&rmem, &rmem_len); // reads back the doubled value
Relationship between SO_SNDBUF/SO_RCVBUF and sliding window:
TCP sliding window (send window and receive window) - hongdada - Blog Garden
[Necessary series for Dachang interviews] Sliding window protocol- Feitian Veal- Blog Garden
TCP/IP Study Notes: TCP Congestion Control - Mingming 1109 - Blog Garden
3. Socket binding address (named socket)
1. Several different address structures:
- Generic socket address
// total size: 16 bytes
struct sockaddr {
sa_family_t sa_family; // unsigned short int, 16 bits (2 bytes), AF_xxx
char sa_data[14];
};
// total size: 128 bytes
struct sockaddr_storage {
sa_family_t sa_family;
char __ss_padding[_SS_PADSIZE];
__ss_aligntype __ss_align; /* Force desired alignment. */
};
- Dedicated socket address
// IP protocol family: IPv4
// total size: 16 bytes
struct sockaddr_in {
sa_family_t sin_family; // AF_INET
uint16_t sin_port; // network byte order; convert from host byte order, e.g. htons(3490)
struct in_addr sin_addr;
/* Pad to size of `struct sockaddr'. 8 bytes */
unsigned char sin_zero[sizeof (struct sockaddr)
- sizeof (sa_family_t)
- sizeof (uint16_t)
- sizeof (struct in_addr)];
};
struct in_addr {
uint32_t s_addr;
};
// IP protocol family: IPv6
// total size: 28 bytes
struct sockaddr_in6 {
sa_family_t sin6_family; // AF_INET6
uint16_t sin6_port; /* Transport layer port # */
uint32_t sin6_flowinfo; /* IPv6 flow information */
struct in6_addr sin6_addr; /* IPv6 address */
uint32_t sin6_scope_id; /* IPv6 scope-id */
};
struct in6_addr {
union {
uint8_t __u6_addr8[16];
uint16_t __u6_addr16[8];
uint32_t __u6_addr32[4];
} __in6_u;
};
// UNIX local protocol family
struct sockaddr_un {
sa_family_t sun_family; // AF_LOCAL/AF_UNIX
char sun_path[108];
};
Pointers to these structures are converted to and from struct sockaddr * with casts (for example, when passing a struct sockaddr_in to bind or connect).
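To make the cast-based conversions concrete, here is a sketch of a protocol-independent helper: the caller keeps any address in a struct sockaddr_storage (large enough for every family), passes it as a struct sockaddr *, and the helper inspects sa_family before casting to struct sockaddr_in or struct sockaddr_in6. format_sockaddr and its "addr:port" / "[addr]:port" output format are hypothetical choices.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: format an IPv4/IPv6 socket address as "addr:port"
 * ("[addr]:port" for IPv6). Dispatches on sa_family, then casts to the
 * matching dedicated structure. Returns 0 on success, -1 otherwise. */
int format_sockaddr(const struct sockaddr *sa, char *out, size_t outlen) {
    char ip[INET6_ADDRSTRLEN];
    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;
        if (!inet_ntop(AF_INET, &in4->sin_addr, ip, sizeof(ip)))
            return -1;
        snprintf(out, outlen, "%s:%u", ip, (unsigned)ntohs(in4->sin_port));
        return 0;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;
        if (!inet_ntop(AF_INET6, &in6->sin6_addr, ip, sizeof(ip)))
            return -1;
        snprintf(out, outlen, "[%s]:%u", ip, (unsigned)ntohs(in6->sin6_port));
        return 0;
    }
    return -1; /* unsupported family */
}
```

A typical use is formatting the peer address that accept() writes into a struct sockaddr_storage.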
2. Converting IP addresses between binary form (such as sin_addr) and human-readable presentation form (dotted decimal for IPv4):
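#include <iostream>
#include <arpa/inet.h>
int main() {
    char addr_point[] = "192.168.1.1";
    struct in_addr addr_bytes; // binary address, network byte order
    // presentation form -> binary; 1 means success
    if (inet_pton(AF_INET, addr_point, &addr_bytes) != 1) {
        std::cout << "inet_pton error" << std::endl;
        return -1;
    }
    char buf[INET_ADDRSTRLEN];
    // binary -> presentation form; NULL means failure
    if (inet_ntop(AF_INET, &addr_bytes, buf, INET_ADDRSTRLEN) == NULL) {
        std::cout << "inet_ntop error" << std::endl;
        return -1;
    }
    std::cout << "addr:" << buf << std::endl;
    return 0;
}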
The following macros give the buffer size needed to hold an address in presentation form, including the terminating null byte:
#define INET_ADDRSTRLEN 16
#define INET6_ADDRSTRLEN 46
Function prototype:
- int inet_pton(int af, const char *src, void *dst);
af - address family, AF_INET or AF_INET6
src - address in presentation form
dst - buffer receiving the binary address in network byte order (a struct in_addr or struct in6_addr)
Return value - 1 on success; 0 if src does not contain a valid address for af; -1 if af is not supported, with errno set to EAFNOSUPPORT
- const char *inet_ntop(int af, const void *src, char *dst, socklen_t size);
af - address family, AF_INET or AF_INET6
src - binary address in network byte order
dst - buffer receiving the address in presentation form
size - the size of dst, e.g. INET_ADDRSTRLEN or INET6_ADDRSTRLEN
Return value - on success, a non-NULL pointer to dst; on failure, NULL with errno set: ENOSPC if the converted address does not fit in size bytes, EAFNOSUPPORT if af is not supported
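The return values above can be seen in a sketch that round-trips an IPv6 address through inet_pton() and inet_ntop(); normalize_ipv6 is a hypothetical helper. Besides validating the input, the round trip normalizes the text: glibc's inet_ntop(), for example, prints IPv6 addresses in compressed form, so "2001:0db8:0000:0000:0000:0000:0000:0001" comes back as "2001:db8::1".

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: validate and normalize an IPv6 address string.
 * Returns 1 on success, otherwise inet_pton()'s failure code
 * (0 = not a valid address, -1 = family unsupported), or -1 if
 * inet_ntop() fails (errno = ENOSPC when dst is too small). */
int normalize_ipv6(const char *src, char *dst, socklen_t size) {
    struct in6_addr bin; /* binary form, network byte order */
    int rc = inet_pton(AF_INET6, src, &bin);
    if (rc != 1)
        return rc;
    if (inet_ntop(AF_INET6, &bin, dst, size) == NULL)
        return -1;
    return 1;
}
```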
PF_INET denotes a protocol family and AF_INET an address family; on Linux they have the same value and are used interchangeably.
reference:
illumos: manual page: sockaddr_storage.3socket
Implementing AF-independent application
3. bind: binding an address (naming the socket)
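#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h> // struct sockaddr_in, INADDR_ANY, htons/htonl
#include <string.h>     // memset
struct sockaddr_in address;
memset(&address, 0, sizeof(address)); // zero sin_zero and any padding
address.sin_family = AF_INET;
address.sin_port = htons(666); // ports below 1024 require root or CAP_NET_BIND_SERVICE
address.sin_addr.s_addr = htonl(INADDR_ANY);
int rt = bind(sock, (struct sockaddr *)&address, sizeof(address));
if (rt == -1) {
    return -1;
}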
Function prototype:
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
sockfd - the socket that needs to bind address
addr - the address to bind to
addrlen - the length of addr
Return value - return 0 for success, -1 for failure, and set errno
The role of bind:
It is usually used on the server side to bind a specific address and port: clients can only connect once they know the server's address.
Use of bind:
INADDR_ANY is often used as the bind address, because a host may have several network interfaces and the local address cannot be fixed in advance. With INADDR_ANY, binding of the local IP address is deferred: the kernel chooses it when the socket is connected (TCP) or when a datagram is sent on it (UDP).
If the port in addr is 0, the kernel assigns an ephemeral port.
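A sketch of the port-0 case, assuming Linux: bind_any_port (a hypothetical helper) binds a TCP socket to INADDR_ANY with port 0 and then calls getsockname() to find out which ephemeral port the kernel assigned. It also sets SO_REUSEADDR, described in section 2, before bind(), as servers commonly do.

```c
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a new TCP socket to INADDR_ANY:0 and report the
 * kernel-chosen port (host byte order) in *port. Returns the fd or -1. */
int bind_any_port(unsigned short *port) {
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock == -1)
        return -1;
    int on = 1; /* allow rebinding while old connections sit in TIME_WAIT */
    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
        close(sock);
        return -1;
    }
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                 /* 0: let the kernel choose */
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* all local interfaces */
    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        perror("bind");
        close(sock);
        return -1;
    }
    socklen_t len = sizeof(addr);
    if (getsockname(sock, (struct sockaddr *)&addr, &len) == -1) {
        close(sock);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return sock;
}
```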
4. listen socket
#include <sys/types.h>
#include <sys/socket.h>
int rc = listen(sock, 5);
prototype:
int listen(int sockfd, int backlog);
sockfd - the socket to which the address has been bound
backlog - the maximum length of the queue of established connections waiting to be returned by accept. If backlog is larger than the value in /proc/sys/net/core/somaxconn, it is silently truncated to that value. Since Linux 2.2, the maximum length of the queue of incomplete connections is set separately through /proc/sys/net/ipv4/tcp_max_syn_backlog
Return value - returns 0 on success, -1 on failure, and sets errno
How TCP backlog works in Linux
SYN and Accept queue:
The socket in the Listening state has two queues: SYN queue and Accept queue
1. SYN queue (half-open connection queue): when the kernel receives a SYN from a client, it creates a request socket (struct inet_request_sock), stores it in the SYN queue and replies with SYN+ACK. For entries that receive no ACK, the kernel retransmits the SYN+ACK on timeout until the retry limit is exceeded. Connections in this queue are in the SYN-RECV state.
2. Accept queue (fully established connection queue): when the kernel receives the client's ACK, it finds the matching entry in the SYN queue, removes it, creates a full socket (struct inet_sock) and places it in the Accept queue, where it waits for the application to take it. Connections in this queue are in the ESTABLISHED state; a connection leaves the Accept queue when the application's accept call returns it.
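The Accept queue can be observed from user space with a sketch like the following (demo_accept_queue is a hypothetical name; it assumes the loopback interface is up). The listener has not called accept() when the client connects, yet connect() returns successfully, because the handshake is completed by the kernel and the new connection simply waits in the Accept queue; accept() then removes it. Between the two calls, ss -plnt would count the connection in the listener's Recv-Q.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: connect() completes before accept() is called, because
 * the established connection is parked in the Accept queue.
 * Returns 0 if both steps behave as described, -1 otherwise. */
int demo_accept_queue(void) {
    int ret = -1, lfd = -1, cfd = -1, afd = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* kernel-chosen port */

    if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) goto out;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) == -1) goto out;
    if (listen(lfd, 5) == -1) goto out;
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) == -1) goto out;

    if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) goto out;
    /* Succeeds although the server has not called accept() yet. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == -1) goto out;

    /* Take the established connection off the Accept queue. */
    if ((afd = accept(lfd, NULL, NULL)) == -1) goto out;
    ret = 0;
out:
    if (afd != -1) close(afd);
    if (cfd != -1) close(cfd);
    if (lfd != -1) close(lfd);
    return ret;
}
```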
View the number of connections in the SYN queue (here for port 80): ss -n state syn-recv sport = :80 | wc -l
View the Accept queue (here for port 6443): ss -plnt sport = :6443 (for a listening socket, Recv-Q is the current Accept queue length and Send-Q its maximum)
Check whether the SYN queue and Accept queue have overflowed: nstat -az | grep -E 'TcpExtListenOverflows|TcpExtListenDrops'
TcpExtListenOverflows: incremented when the Accept queue is full
TcpExtListenDrops: incremented for any drop, including Accept queue overflow, failure to create the new socket, failure to inherit the port, etc.
https://blog.cloudflare.com/syn-packet-handling-in-the-wild/
Summary of 10,000-level concurrent server kernel tuning - Zwjsec's Blog - CSDN Blog
5. accept connection
#include <sys/types.h>
#include <sys/socket.h>
int fd;
struct sockaddr_in addr;
socklen_t addr_len = sizeof(addr); // must be set: tells the kernel the size of the addr buffer; if it is not set, accept can fail with EINVAL
fd = accept(sock, (struct sockaddr *)&addr, &addr_len);
if (fd < 0) {
return -1;
}
prototype:
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
sockfd - socket bound with bind
addr - on success, receives the address of the peer; may be NULL if the peer address is not needed
addrlen - on input, the size of addr; on success, set to the actual length of the peer address; may be NULL if addr is NULL
Return value - successfully return a new file descriptor (use this fd to communicate with the client), fail to return -1, and set errno
accept() function Unix/Linux -Unix/Linux system call
Usage of accept (man 2 accept):
For connection-oriented socket types (SOCK_STREAM, SOCK_SEQPACKET), accept takes the first connection request from the listening socket's queue, creates a new connected socket and returns a file descriptor for it (used for data transfer with the client).
If there is no pending connection in the queue:
1. If the socket is blocking, accept blocks until a connection request arrives
2. If the socket is non-blocking, accept fails with errno set to EAGAIN or EWOULDBLOCK
To be notified of new connection requests, use:
1. select, poll or epoll: a new connection request is reported as a readable event on the listening socket
2. SIGIO: a SIGIO signal is generated when a connection request on the listening socket completes
Signal-Driven I/O - Stories, - Blog Garden
Cases where accept returns a non-fatal error and should be retried:
1. On Linux, accept passes already-pending network errors on the new socket back as its own error, so the application should treat these errno values like EAGAIN and call accept again. For TCP/IP these are ENETDOWN, EPROTO, ENOPROTOOPT, EHOSTDOWN, ENONET, EHOSTUNREACH, EOPNOTSUPP and ENETUNREACH (man 2 accept, "Error handling")
2. If accept is interrupted by a signal, it fails with errno EINTR and should also be retried
3. The connection is aborted before accept returns: the connection has reached ESTABLISHED, but the client sends an RST before accept returns it. (UNIX Network Programming says accept fails with ECONNABORTED or EPROTO, but on Ubuntu accept succeeds and the subsequent read fails with ECONNRESET, indicating that the RST was received.) This error can be ignored.
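The retry rules above can be collected into a wrapper. This is a sketch, assuming Linux (ENONET is Linux-specific); accept_retry and wait_readable are hypothetical helpers. wait_readable uses poll(), one of the notification mechanisms listed earlier, to wait until the listening socket is readable, i.e. until a completed connection is waiting. EAGAIN/EWOULDBLOCK is returned to the caller rather than retried, since on a non-blocking socket it just means the queue is empty.

```c
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: wait up to ms milliseconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int ms) {
    struct pollfd p;
    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    int n = poll(&p, 1, ms);
    if (n <= 0)
        return n;
    return (p.revents & POLLIN) ? 1 : 0;
}

/* Hypothetical helper: accept() that retries on EINTR, ECONNABORTED and the
 * pending network errors from man 2 accept. Returns the new fd, or -1 with
 * errno set (including EAGAIN/EWOULDBLOCK on an empty non-blocking queue). */
int accept_retry(int lfd, struct sockaddr *addr, socklen_t *addrlen) {
    socklen_t cap = addrlen ? *addrlen : 0; /* caller's buffer size */
    for (;;) {
        if (addrlen)
            *addrlen = cap; /* value-result argument: reset before each try */
        int fd = accept(lfd, addr, addrlen);
        if (fd >= 0)
            return fd;
        switch (errno) {
        case EINTR:
        case ECONNABORTED:
        case ENETDOWN: case EPROTO: case ENOPROTOOPT: case EHOSTDOWN:
        case ENONET: case EHOSTUNREACH: case EOPNOTSUPP: case ENETUNREACH:
            continue; /* non-fatal: try the next pending connection */
        default:
            return -1;
        }
    }
}
```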
21-Non-blocking accept_songly_'s blog-CSDN blog_accept non-blocking
UNIX Network Programming Volume 1: Socket Networking API
What to do when the half-open/fully established connection queue is full:
The SYN (half-open) queue is full (once it is full and SYN cookies are not enabled, the kernel drops further SYNs by default):
- Increase the SYN queue length: /proc/sys/net/ipv4/tcp_max_syn_backlog
- Enable /proc/sys/net/ipv4/tcp_syncookies (with SYN cookies, the server encodes the connection information in the sequence number of the SYN+ACK it sends to the client, so it does not need a SYN queue entry)
- Reduce the number of SYN+ACK retransmissions: /proc/sys/net/ipv4/tcp_synack_retries
The Accept (fully established) queue is full (by default the kernel then drops further SYNs; if tcp_abort_on_overflow is set to 1, it sends an RST instead):
- Increase the Accept queue length: Accept queue length = min(backlog, /proc/sys/net/core/somaxconn)
The function, principle and defect of TCP SYN cookie
A simple explanation of SYN-Cookies in TCP Develop Paper
6. connect
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h> // for inet_pton & htons
char address_p[] = "127.0.0.1";
struct in_addr address_bytes;
inet_pton(AF_INET, address_p, &address_bytes);
struct sockaddr_in addr;
addr.sin_family = AF_INET;
addr.sin_port = htons(6666);
addr.sin_addr.s_addr = address_bytes.s_addr;
int rc = connect(sock, (struct sockaddr *)&addr, sizeof(addr));
if (rc == -1) {
return -1;
}
prototype:
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
sockfd - the fd created by the client
addr - the address of the server to connect to
addrlen - the length of addr
Return value - return 0 for success, -1 for failure, and set errno, man 2 connect
Usage of connect:
Both connection-oriented (SOCK_STREAM, SOCK_SEQPACKET) and connectionless (SOCK_DGRAM) sockets can call connect
1. For SOCK_DGRAM sockets:
connect sets the default destination address for sending data and the only address from which data is received.
To change the address, call connect again.
To dissolve the association, call connect again with the sa_family of addr set to AF_UNSPEC.
2. For SOCK_STREAM/SOCK_SEQPACKET sockets:
connect attempts to establish a connection to the socket bound to the address addr
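A sketch of the SOCK_DGRAM case, assuming Linux and IPv4; udp_set_peer and udp_clear_peer are hypothetical helpers. After connect(), plain send()/recv() work without an address; after the AF_UNSPEC "disconnect", the socket has no default destination any more, so send() without an address fails with EDESTADDRREQ.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: set the default peer of a UDP socket. */
int udp_set_peer(int fd, const struct sockaddr_in *peer) {
    return connect(fd, (const struct sockaddr *)peer, sizeof(*peer));
}

/* Hypothetical helper: dissolve the association by connecting to an
 * address whose family is AF_UNSPEC. */
int udp_clear_peer(int fd) {
    struct sockaddr unspec;
    memset(&unspec, 0, sizeof(unspec));
    unspec.sa_family = AF_UNSPEC;
    return connect(fd, &unspec, sizeof(unspec));
}
```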
Reasons connect may fail:
1. An RST is received from the server. The server sends RST when:
- the target port is not in the listening state
- the port is in TIME_WAIT state
2. No reply from the server arrives before the timeout:
- The server is overloaded: it receives the client's SYN but cannot respond in time (e.g. the SYN queue is full)
- Network congestion: the server's response is lost in transit, so the client never receives the acknowledgment
- The route is unreachable: an intermediate router has a problem, drops the SYN and sends an ICMP "Destination unreachable" error back to the client
Note: when no reply is received, the Linux kernel retries the SYN several times; the maximum number of retries is set by tcp_syn_retries (default 6)
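The first failure case is easy to reproduce on loopback. In this sketch (try_connect_loopback is a hypothetical helper, and the loopback interface is assumed to be up), connecting to a port where nothing is listening makes the kernel answer the SYN with RST, which connect() reports as ECONNREFUSED. A timeout, by contrast, is reported as ETIMEDOUT once the SYN retransmissions are exhausted.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: blocking TCP connect() to 127.0.0.1:port.
 * Returns 0 on success, otherwise the errno value connect() reported
 * (ECONNREFUSED when the peer answers the SYN with RST). */
int try_connect_loopback(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return errno;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    int err = 0;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        err = errno;
    close(fd);
    return err;
}
```

For example, binding a socket to port 0 without calling listen() reserves a free port that has no listener, so try_connect_loopback returns ECONNREFUSED for it; after listen() on the same socket it returns 0.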
https://blog.csdn.net/weixin_42226134/article/details/104284380
TCP Source Code Analysis-Connect Process of Three-way Handshake-51CTO.COM
7. close/shutdown to disconnect
8. Transfer data
recv/send
recvfrom/sendto
UDP transmission: analysis of recvfrom function and sendto function - Programmer Sought
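For stream sockets, send() and recv() may transfer fewer bytes than requested (see the SO_SNDLOWAT and SO_RCVLOWAT notes in section 2), so callers usually loop. A minimal sketch for blocking sockets; send_all and recv_all are hypothetical helpers, and a non-blocking version would instead wait for writability or readability when EAGAIN is returned.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send all len bytes on a blocking stream socket,
 * retrying after partial sends and EINTR. Returns len, or -1 on error. */
ssize_t send_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = send(fd, p, left, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)len;
}

/* Hypothetical helper: receive exactly len bytes. Returns len, 0 if the
 * peer closed the connection first, or -1 on error. */
ssize_t recv_all(int fd, void *buf, size_t len) {
    char *p = buf;
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, p + got, len - got, 0);
        if (n == 0)
            return 0;
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)len;
}
```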
recvmsg and ancillary data (ancillary data is received with recvmsg rather than recvfrom)
recvmsg(2) - OpenBSD manual pages
UNIX Network Programming Reading Notes: Auxiliary Data - ITtecman - Blog Garden
Explanation of Linux msghdr structure | Ivanzz