Detailed explanation of the working principle of the TCP network protocol stack

This article is shared from the Huawei Cloud Community post "The Magical Journey of Network Communication: Deciphering the Working Principle of Linux TCP Network Protocol Stack", author: Lion Long.

1. TCP network development API

TCP (Transmission Control Protocol) is a connection-oriented, reliable, byte-stream-based transport-layer communication protocol.

1.1. APIs called by the TCP server

#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>   /* close() */

// 1. Create the socket
int socket(int domain, int type, int protocol);
// 2. Bind the local address and port
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
// 3. Start listening
int listen(int sockfd, int backlog);
// 4. Accept a connection
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
// 5. Receive data
ssize_t recv(int sockfd, void *buf, size_t len, int flags);
// 6. Send data
ssize_t send(int sockfd, const void *buf, size_t len, int flags);
// 7. Close the fd
int close(int fd);
// 8. Shut down one or both directions of the connection
int shutdown(int sockfd, int how);

1.2. APIs called by the TCP client

#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>   /* close() */

// 1. Create the socket
int socket(int domain, int type, int protocol);
// 2. Bind a local address and port (optional on the client)
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
// 3. Connect to the server
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
// 4. Send data
ssize_t send(int sockfd, const void *buf, size_t len, int flags);
// 5. Receive data
ssize_t recv(int sockfd, void *buf, size_t len, int flags);
// 6. Close the fd
int close(int fd);
// 7. Shut down one or both directions of the connection
int shutdown(int sockfd, int how);

1.3. What each API function does

(1) int socket(int domain, int type, int protocol)

Allocates an fd in the file system and creates the TCB (transmission control block) data structure.

(2) int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen)

Binds a local IP address and port to the TCP socket.

(3) int listen(int sockfd, int backlog)

Puts the TCP socket into the LISTEN state.

(4) int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen)

Takes a node from the fully connected queue and assigns an fd to it.

(5) ssize_t recv(int sockfd, void *buf, size_t len, int flags)

Copies data from the read buffer of the TCB corresponding to fd into the caller's buffer.

(6) ssize_t send(int sockfd, const void *buf, size_t len, int flags)

Copies the caller's data into the write buffer of the TCB corresponding to fd.

(7) int close(int fd)

Prepares a FIN packet, puts it in the write buffer, and releases the fd.

(8) int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen)

Prepares a SYN packet, hands it to the protocol stack, and (in blocking mode) returns only after the three-way handshake completes.
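The calls described above can be tied together in a minimal sketch. It runs a server and a client in the same process over the loopback address. The helper name loopback_roundtrip, the port value 0 (so the kernel picks a free port), and the backlog of 8 are assumptions made for illustration and do not come from the original article.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends msg from a client socket to a server socket
 * over 127.0.0.1 and returns the number of bytes the server side received,
 * or -1 on error. */
ssize_t loopback_roundtrip(const char *msg, char *out, size_t outlen)
{
    ssize_t n = -1;
    int lfd = -1, cfd = -1, afd = -1;
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);

    assert(msg != NULL && out != NULL);

    /* Server: socket() -> bind() -> listen() */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) goto done;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;
    if (listen(lfd, 8) < 0) goto done;
    if (getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0) goto done;

    /* Client: socket() -> connect(); bind() is optional on the client */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0) goto done;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;

    /* Server: accept() takes the connection from the fully connected queue */
    afd = accept(lfd, NULL, NULL);
    if (afd < 0) goto done;

    /* Client send() -> server recv() */
    if (send(cfd, msg, strlen(msg), 0) < 0) goto done;
    n = recv(afd, out, outlen, 0);

done:
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return n;
}
```

Note that connect() returns before accept() is called: the kernel completes the handshake on its own once the socket is listening. A real server would call accept() and recv() in loops instead of once.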

2. The three phases of TCP

2.1. TCP connection establishment

The establishment of a TCP connection mainly depends on the functions of socket(), bind(), listen(), connect(), and accept().

2.1.1. TCP three-way handshake

[Figure: schematic diagram of the TCP three-way handshake]

The three-way handshake is performed inside the kernel protocol stack. So which API calls correspond to each step of the handshake?

First handshake: triggered by the client's connect() call, which sends a SYN packet to the server.

Second handshake: happens after listen() and before accept(). On receiving the SYN packet, the server's protocol stack replies to the client with a SYN+ACK packet.

Third handshake: the client sends an ACK packet to the server, which completes the connection establishment.

TCP header:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-------------------------------+-------------------------------+
|          Source Port          |       Destination Port        |
+---------------------------------------------------------------+
|                        Sequence Number                        |
+---------------------------------------------------------------+
|                     Acknowledgment Number                     |
+-------+-----------+-+-+-+-+-+-+-------------------------------+
| Header|           |U|A|P|R|S|F|                               |
| Length| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-------+-----------+-+-+-+-+-+-+-------------------------------+
|           Checksum            |        Urgent Pointer         |
+---------------------------------------------------------------+
|                            Options                            |
+---------------------------------------------------------------+
|                             Data                              |
|                              ...                              |
+---------------------------------------------------------------+
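  • SYN: synchronize; requests a connection and synchronizes sequence numbers.
  • ACK: acknowledgment; marks the Acknowledgment Number field as valid.
  • PSH: push; asks the receiver to pass the data to the application promptly.
  • FIN: finish; the sender has no more data to send.
  • RST: reset; aborts the connection.
  • URG: urgent; marks the Urgent Pointer field as valid.
  • Sequence Number: the sequence number of the first data byte carried in this segment.
  • Acknowledgment Number: the next sequence number that the sender of this segment expects to receive from the peer. During the handshake it is the received Sequence Number plus 1, because a SYN consumes one sequence number; for data segments it is the received Sequence Number plus the number of data bytes received.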

As the header layout shows, the key fields of a SYN packet are the SYN bit set to 1 and the Sequence Number; the key fields of an ACK packet are the ACK bit set to 1 and the Acknowledgment Number.
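To make the header layout concrete, here is a minimal sketch that decodes the fixed 20-byte part of a TCP header from a raw byte buffer. The struct tcp_fields, the FLAG_* constants, and the function tcp_parse are hypothetical names chosen for this example; the byte offsets follow the diagram above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits in byte 13 of the TCP header. */
enum {
    FLAG_FIN = 0x01,
    FLAG_SYN = 0x02,
    FLAG_RST = 0x04,
    FLAG_PSH = 0x08,
    FLAG_ACK = 0x10,
    FLAG_URG = 0x20
};

/* Hypothetical decoded view of the fixed part of the header. */
struct tcp_fields {
    uint16_t src_port, dst_port;
    uint32_t seq, ack;
    uint8_t  header_len;   /* in bytes */
    uint8_t  flags;
    uint16_t window;
};

/* Read big-endian (network byte order) integers. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short or malformed. */
int tcp_parse(const uint8_t *buf, size_t len, struct tcp_fields *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 20) return -1;
    out->src_port   = rd16(buf);
    out->dst_port   = rd16(buf + 2);
    out->seq        = rd32(buf + 4);
    out->ack        = rd32(buf + 8);
    out->header_len = (uint8_t)((buf[12] >> 4) * 4);  /* 32-bit words -> bytes */
    out->flags      = buf[13] & 0x3F;
    out->window     = rd16(buf + 14);
    if (out->header_len < 20 || out->header_len > len) return -1;
    return 0;
}
```

A SYN packet parsed this way has (flags & FLAG_SYN) set and (flags & FLAG_ACK) clear; the second handshake packet has both bits set.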

Semi-connected queue and fully connected queue:

During the three-way handshake, the Linux kernel protocol stack maintains two queues: a semi-connected queue and a fully connected queue.

Semi-connected queue (also called the SYN queue): in the first handshake, when the server receives the client's SYN packet, it adds a node to its semi-connected queue, indicating that the connection is half-established.

Fully connected queue (also called the accept queue): in the third handshake, when the server receives the client's ACK packet, it looks up the connection node in the semi-connected queue (by five-tuple). If the node exists, it is moved to the fully connected queue; otherwise, the ACK is discarded.

After the three-way handshake completes, accept() takes a connection node from the fully connected queue, allocates a socket fd for it, and returns to user space.

So, how does the accept() function know that there are nodes in the fully connected queue?

When the three-way handshake completes and the node is added to the fully connected queue, the kernel also raises a notification (a signal or semaphore) that a connection is ready. This notification is what allows accept() to take a node from the fully connected queue; it is also what lets I/O multiplexers such as epoll report the listening fd as readable.

In blocking mode, accept() waits for this notification and returns only once the fully connected queue contains a node.

In non-blocking mode, accept() returns -1 immediately (with errno set to EAGAIN or EWOULDBLOCK) if the fully connected queue is empty; otherwise it returns a socket fd.
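The non-blocking behavior and the readiness notification can be observed with a small sketch. It uses poll() rather than epoll to stay portable; the helper names make_nonblocking_listener, listener_readable, and try_accept, as well as the ACCEPT_EMPTY return code, are assumptions made for this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define ACCEPT_EMPTY (-2)   /* hypothetical code: accept queue is empty */

/* Creates a non-blocking listener on 127.0.0.1 with a kernel-chosen port.
 * Returns the fd (or -1 on error) and stores the bound address in *addr. */
int make_nonblocking_listener(struct sockaddr_in *addr)
{
    socklen_t alen = sizeof(*addr);
    int flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    assert(addr != NULL);
    if (fd < 0) return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &alen) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if the listening fd becomes readable (a connection is waiting)
 * within timeout_ms, 0 on timeout, -1 on error. */
int listener_readable(int lfd, int timeout_ms)
{
    struct pollfd p = { .fd = lfd, .events = POLLIN, .revents = 0 };
    int r = poll(&p, 1, timeout_ms);
    if (r < 0) return -1;
    return (r > 0 && (p.revents & POLLIN)) ? 1 : 0;
}

/* Returns the accepted fd, ACCEPT_EMPTY if no connection is queued,
 * or -1 on any other error. */
int try_accept(int lfd)
{
    int fd = accept(lfd, NULL, NULL);
    if (fd < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return ACCEPT_EMPTY;
    return fd;
}
```

Before any client connects, listener_readable() reports 0 and try_accept() reports ACCEPT_EMPTY; after a client's connect() succeeds, the listening fd becomes readable and try_accept() returns a new fd.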

The listen() function takes a backlog parameter. Does it specify the size of the fully connected queue or of the semi-connected queue?

The meaning of backlog has changed across protocol stack implementations and versions: it has been the size of the semi-connected queue, the size of the fully connected queue, or the sum of the two. In practice the effect is similar. In current Linux kernels (since 2.2), backlog specifies the size of the fully connected queue.

SYN flood (a DDoS attack):

The three-way handshake gives rise to an attack on the server: the SYN flood, a common form of DDoS attack. The attacker forges source IP addresses that do not exist and keeps sending SYN packets, so the server's semi-connected queue keeps filling with nodes whose handshake will never complete. Once the semi-connected queue reaches its limit, the server can no longer accept new connections, and legitimate clients are denied service.

2.1.2. TCP state transitions

[Figure: TCP state transition diagram]

(1) As the state transition diagram shows, a socket in the LISTEN state can move to the SYN_SENT state by sending a SYN together with data; that is, in the protocol's state machine, a listening endpoint can itself send data and initiate a connection.

(2) A socket in the SYN_SENT state can receive a SYN and reply with SYN and ACK, moving to the SYN_RECV state; that is, two endpoints can send SYN packets to each other at the same time and still establish a connection (a simultaneous open).

2.2. TCP data transmission

TCP data transmission mainly relies on two functions, send() and recv().

When send() returns a positive number, it does not necessarily mean the data reached the peer. send() only copies the data into the write buffer of the protocol stack, which then transmits it; along the way the data passes through any number of gateways, and packet loss or a broken link may prevent it from reaching the destination. If the application needs to know that the peer actually received the data, it must add its own acknowledgment mechanism (an application-level ACK).
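One possible shape for such an application-level acknowledgment is sketched below. The one-byte code APP_ACK, the function name send_with_ack, and the timeout handling are assumptions for illustration; a real protocol would define its own acknowledgment format.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#define APP_ACK 0x06   /* hypothetical one-byte acknowledgment code */

/* Sends len bytes, then waits up to timeout_ms for the peer's one-byte
 * application-level ACK.
 * Returns 1 if acknowledged, 0 on timeout, -1 on error.
 * Sketch only: a short send() is treated as an error rather than retried. */
int send_with_ack(int fd, const void *buf, size_t len, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    unsigned char reply = 0;
    int r;

    assert(buf != NULL);

    /* Success here only means the data was copied to the write buffer. */
    if (send(fd, buf, len, 0) != (ssize_t)len) return -1;

    /* Wait for the peer to confirm that it received the data. */
    r = poll(&p, 1, timeout_ms);
    if (r < 0) return -1;
    if (r == 0) return 0;                 /* no confirmation in time */
    if (recv(fd, &reply, 1, 0) != 1) return -1;
    return reply == APP_ACK ? 1 : -1;
}
```

The important case is the return value 0: send() succeeded, yet the sender still cannot tell whether the peer got the data. The function works on any connected stream socket, so it can be tried in one process with socketpair() as a stand-in for a TCP connection.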

2.2.1. Transmission control block (TCB)

To make sure data is delivered to the right connection, TCP keeps a TCB (transmission control block) data structure for each connection. The TCB exists for the whole life of the TCP connection, until the connection is torn down.

A TCB holds the socket information of the two communicating parties and the buffers that store the data. Before establishing a connection and sending data, each party must do some preparatory work: allocate memory and create its TCB. Once both parties have prepared their socket and TCB data structures, they can begin the three-way handshake to establish the connection.

2.2.2. TCP packet splitting

TCP packet splitting happens in two cases: the data to be transmitted is larger than the remaining space in the send buffer, so only part of it can be copied per call; or the data is larger than the maximum segment size, so TCP splits it into several segments before transmission.

Applications usually handle splitting by calling send() in a loop on the sender and recv() in a loop on the receiver.
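The two loops might look like the following sketch. The helper names send_all and recv_all are assumptions for this example; both rely only on the documented behavior that send() and recv() may transfer fewer bytes than requested.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keeps calling send() until all len bytes are handed to the kernel.
 * Returns 0 on success, -1 on error. */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted: retry */
            return -1;
        }
        p += n;                             /* skip what was already sent */
        len -= (size_t)n;
    }
    return 0;
}

/* Keeps calling recv() until exactly len bytes have arrived.
 * Returns 0 on success, -1 on error or if the peer closed early. */
int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0) return -1;              /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;                             /* append after what was read */
        len -= (size_t)n;
    }
    return 0;
}
```

With these helpers, two separate send_all() calls of 3 and 5 bytes can be read back by a single recv_all() of 8 bytes, which also shows that TCP does not preserve the boundaries between individual send() calls.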

2.2.3. TCP sticky packets and solutions

A TCP sticky packet occurs when several messages sent by the sender arrive glued together at the receiver. Seen from the receive buffer, the head of one message immediately follows the tail of the previous one, because TCP is a byte stream and does not preserve message boundaries.

Common solutions:

(1) (Recommended) Put the message length at the front of the application-layer protocol header. The receiver reads in two steps: first the length, then the payload, read in one call or in a loop until that many bytes have arrived.

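For example:

// ...
ssize_t count = 0;   // bytes of this message received so far
ssize_t size = 0;    // result of the latest recv()

// Assumes buffer can hold at least tcpHdr->length bytes.
while (count < tcpHdr->length)
{
    // Append after the bytes already read and never read past this message.
    size = recv(fd, buffer + count, tcpHdr->length - count, 0);
    if (size <= 0)   // 0: peer closed; -1: error (check errno)
        break;
    count += size;
}
// ...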
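A fuller, self-contained sketch of the length-prefix approach follows. The wire format (a 4-byte big-endian length followed by the payload) and the names send_msg and recv_msg are assumptions for illustration, not the article's protocol. MSG_WAITALL is used so that recv() blocks until the requested number of bytes has arrived.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sends one message as [4-byte big-endian length][payload].
 * Returns 0 on success, -1 on error.
 * Sketch only: a short send() is treated as an error rather than retried. */
int send_msg(int fd, const void *payload, uint32_t len)
{
    uint32_t be_len = htonl(len);
    assert(payload != NULL || len == 0);
    if (send(fd, &be_len, sizeof(be_len), 0) != (ssize_t)sizeof(be_len))
        return -1;
    if (len > 0 && send(fd, payload, len, 0) != (ssize_t)len)
        return -1;
    return 0;
}

/* Receives one message into buf (capacity cap).
 * Returns the payload length, or -1 on error, early close, or if the
 * message does not fit into buf. */
ssize_t recv_msg(int fd, void *buf, size_t cap)
{
    uint32_t be_len;
    uint32_t len;

    assert(buf != NULL);

    /* First read: the length prefix only. */
    if (recv(fd, &be_len, sizeof(be_len), MSG_WAITALL) != (ssize_t)sizeof(be_len))
        return -1;
    len = ntohl(be_len);
    if (len > cap) return -1;
    if (len == 0) return 0;

    /* Second read: exactly len bytes of payload, so the next message
     * stays in the receive buffer untouched. */
    if (recv(fd, buf, len, MSG_WAITALL) != (ssize_t)len)
        return -1;
    return (ssize_t)len;
}
```

If the sender calls send_msg() twice in a row, the receiver still gets the two messages separately from two recv_msg() calls, even when both arrived stuck together in the receive buffer.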

(2) Add a delimiter to each message. The delimiter is appended at the end of the data. A single recv() may then return data that ends in the middle of a message, so after splitting on the delimiter the receiver must keep the trailing partial message and merge it with the data that arrives next.
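The bookkeeping for the delimiter approach can be sketched without any sockets. The struct framer, its fixed capacity FRAME_CAP, the '\n' delimiter, and the function names are all assumptions chosen for this example; the chunks passed to framer_feed() stand for whatever recv() returned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_CAP 256   /* hypothetical maximum number of buffered bytes */

/* Holds received bytes that do not yet form a complete message. */
struct framer {
    char   buf[FRAME_CAP];
    size_t used;
};

/* Appends a chunk as returned by recv().
 * Returns 0 on success, -1 if the chunk does not fit. */
int framer_feed(struct framer *f, const char *data, size_t len)
{
    assert(f != NULL && data != NULL);
    if (len > FRAME_CAP - f->used) return -1;
    memcpy(f->buf + f->used, data, len);
    f->used += len;
    return 0;
}

/* Extracts the next '\n'-terminated message into out (NUL-terminated,
 * delimiter stripped).
 * Returns 1 if a message was extracted, 0 if more data is needed,
 * -1 if out is too small. */
int framer_next(struct framer *f, char *out, size_t outcap)
{
    char *nl;
    size_t msg_len;

    assert(f != NULL && out != NULL);
    nl = memchr(f->buf, '\n', f->used);
    if (nl == NULL) return 0;             /* partial message: keep it for later */
    msg_len = (size_t)(nl - f->buf);
    if (msg_len + 1 > outcap) return -1;
    memcpy(out, f->buf, msg_len);
    out[msg_len] = '\0';

    /* Move the leftover bytes (the start of the next message) to the front. */
    f->used -= msg_len + 1;
    memmove(f->buf, nl + 1, f->used);
    return 1;
}
```

Feeding "hello\nwor" yields the message "hello" and keeps "wor" buffered; feeding "ld\n" afterwards completes the second message "world".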


2.3. TCP four-way wave (connection teardown)

Tearing down a connection is more complicated than establishing one or transmitting data. There are two roles: active close and passive close.

[Figure: schematic diagram of the four-way wave]

Note that calling close() does not complete the disconnection immediately. It stops data transmission and starts the four-way wave; the TCB data structure is not released yet. Only after the four-way wave finishes is the TCB actually released.
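The shutdown() function from the API list gives finer control over this process than close(). The sketch below closes only the sending direction, so the caller can still read what the peer sends before the peer closes its own side. The helper name finish_sending is an assumption for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sends the final bytes, then shuts down only the sending direction.
 * On a TCP socket, shutdown(SHUT_WR) makes the protocol stack send a FIN
 * once the buffered data is out, while the fd stays open for reading.
 * Returns 0 on success, -1 on error.
 * Sketch only: a short send() is treated as an error rather than retried. */
int finish_sending(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len > 0 && send(fd, buf, len, 0) != (ssize_t)len)
        return -1;
    return shutdown(fd, SHUT_WR);
}
```

After finish_sending(), the peer's recv() first returns the remaining data and then 0 (end of stream), yet the peer can still send() a reply that the caller receives. The fd itself must still be released with close() afterwards.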

According to the four-wave process, some questions can be considered:

(1) During data transmission, if the network cable is unplugged and immediately plugged back in, how does TCP know?

TCP itself does not know. Unplugging the cable takes the network interface's link down, and plugging it back in brings the link up again, but neither event produces a TCP segment, so the connection stays in the ESTABLISHED state on both ends and any lost segments are simply retransmitted. The application has to detect the interruption itself, using heartbeat packets.

(2) How does the server know whether the client is down?

It also needs to be detected through the heartbeat packet mechanism.

(3) How does the server distinguish network congestion from client downtime?

The server sends the heartbeat packet several times rather than once. If the network is merely congested, a reply arrives within some bounded period; if the client is down, no reply arrives no matter how long the server waits.
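The decision logic behind repeated heartbeats can be sketched as a counter of consecutive missed replies. The struct heartbeat, its function names, and the idea of a fixed threshold are assumptions for this example; the sending of probes and the timers themselves are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client liveness record kept by the server. */
struct heartbeat {
    int missed;       /* consecutive probes that got no reply */
    int max_missed;   /* threshold before the peer is declared down */
};

void heartbeat_init(struct heartbeat *hb, int max_missed)
{
    assert(hb != NULL && max_missed > 0);
    hb->missed = 0;
    hb->max_missed = max_missed;
}

/* Call each time a probe times out without a reply. */
void heartbeat_on_timeout(struct heartbeat *hb)
{
    assert(hb != NULL);
    hb->missed++;
}

/* Call when any reply arrives. A late reply points to congestion rather
 * than downtime, so the counter starts over. */
void heartbeat_on_reply(struct heartbeat *hb)
{
    assert(hb != NULL);
    hb->missed = 0;
}

/* True once max_missed probes in a row went unanswered. */
bool heartbeat_peer_down(const struct heartbeat *hb)
{
    assert(hb != NULL);
    return hb->missed >= hb->max_missed;
}
```

With a threshold of 3, two timeouts followed by a reply leave the peer marked as alive, while three timeouts in a row mark it as down. Choosing the threshold and the probe interval is a trade-off between detection speed and false alarms on a congested network.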

(4) How to deal with a large number of connections in the CLOSE_WAIT state?

A large number of CLOSE_WAIT connections usually means the business logic has too much to process after the peer's FIN arrives, so close() is called late and the connections stay in CLOSE_WAIT. Use asynchronous processing to separate the network layer from the business layer, so that closing the connection is not held up by business processing.

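(5) Why does the four-way wave have a TIME_WAIT state?

The last ACK sent by the active closer may be lost. The passive closer, still in the LAST_ACK state, then retransmits its FIN. TIME_WAIT keeps the active closer's TCB alive long enough to acknowledge that retransmitted FIN; without it, the retransmission would arrive at a socket that no longer exists.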
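One practical side effect of TIME_WAIT is that a server which actively closed its connections and is then restarted may see bind() fail with EADDRINUSE while old connections on the same port are still in TIME_WAIT. A common remedy, sketched here with the hypothetical helper name enable_reuseaddr, is to set the SO_REUSEADDR option on the listening socket before calling bind().

```c
#include <assert.h>
#include <sys/socket.h>

/* Call on the listening socket before bind().
 * Returns 0 on success, -1 on error. */
int enable_reuseaddr(int fd)
{
    int on = 1;
    assert(fd >= 0);
    return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}
```

This does not shorten or remove TIME_WAIT for existing connections; it only allows the new listening socket to bind to the port while they expire.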

Summary

Master the TCP three-way handshake and four-way wave, be familiar with the TCP state transitions, and know what SYN and ACK packets are.

(1) The three-way handshake is initiated by the client's SYN; the server replies with SYN and ACK after receiving the SYN; the client replies with ACK, and the connection is established.

(2) Disconnection involves an active close and a passive close.

(3) The four-way wave starts when the initiator calls close(), which sends a FIN packet; the receiver returns an ACK packet on receiving the FIN and later sends its own FIN packet; the initiator returns an ACK packet on receiving that FIN, and the disconnection is complete.

(4) Understand the TCP state transition diagram: how LISTEN moves to SYN_RECV or SYN_SENT and then to ESTABLISHED; during the four-way wave, the transitions among FIN_WAIT_1, FIN_WAIT_2, CLOSING, and TIME_WAIT on the active side, and the handling of CLOSE_WAIT and LAST_ACK on the passive side.

(5) Understand what the API functions do underneath, as well as the fully connected queue and the semi-connected queue.

(6) Know when TCP splits packets and how to handle TCP sticky packets.

[Figure: the complete process of TCP communication]

 


Origin my.oschina.net/u/4526289/blog/10090377