[Network Programming] Transport layer protocol - TCP protocol

1. TCP protocol format

(figure: TCP protocol header format)

  • The meaning of each field in the TCP header is as follows

Source/destination port number : identifies which process on the sending host the data comes from and which process on the peer host it is delivered to.
32-bit sequence number / 32-bit acknowledgment number : respectively number each byte of data in the TCP stream and acknowledge the peer's data; these are key fields for TCP's reliability guarantees.
4-bit TCP header length : the length of the TCP header, in units of 4 bytes.
6-bit reserved field : 6 bits in the TCP header that are currently unused.
16-bit window size : an important field for both TCP's reliability mechanism and its efficiency mechanisms.
16-bit checksum : filled in by the sender using a 16-bit ones'-complement sum (not a CRC). If verification fails at the receiver, the data is considered corrupted. (The checksum covers the TCP header, the TCP data, and a pseudo-header.)
16-bit urgent pointer : gives the offset of urgent data within the segment; it is used together with the URG flag bit.
Options field : the TCP header may carry additional options, up to 40 bytes.

  • The six flag bits in the TCP header:

URG : whether the urgent pointer is valid.
ACK : whether the acknowledgment number is valid.
PSH : prompts the receiving application to read the data out of the TCP receive buffer immediately.
RST : asks the peer to re-establish the connection. A segment carrying the RST flag is called a reset segment.
SYN : requests that a connection be established with the peer. A segment carrying the SYN flag is called a synchronization segment.
FIN : notifies the peer that the local end is about to close. A segment carrying the FIN flag is called a finish (end) segment.

1.1 How does TCP separate the header from the payload?

Looking at the TCP protocol format, the header is a fixed 20 bytes excluding the options. So the receiver can read 20 bytes first. The 4-bit header length inside them gives the total header size, from which the size of the options can be computed.
After reading the basic TCP header plus the options, whatever remains is the payload.

  • About the 4-bit header length

The field has four bits, so its raw value range is [0, 15] — which is obviously too small, since the header alone is at least 20 bytes.
Therefore it is stipulated that the 4-bit header length is expressed in units of 4 bytes, giving a range of [0, 60] bytes. The whole header is thus 20 to 60 bytes, and the options field in the header is at most 40 bytes.
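The computation above can be sketched in C. This is a minimal sketch, assuming the 4-bit header length ("data offset") sits in the high nibble of byte 12 of the header, as in the on-wire format:

```c
#include <stdint.h>

/* The 4-bit header length is the high nibble of byte 12 (0-indexed)
   of the TCP header; its unit is 4 bytes. */
int tcp_header_bytes(const uint8_t *hdr) {
    int data_offset = hdr[12] >> 4;      /* raw field value, range [5, 15] */
    return data_offset * 4;              /* header size, range [20, 60]    */
}

/* Everything beyond the 20 fixed bytes is options. */
int tcp_option_bytes(const uint8_t *hdr) {
    return tcp_header_bytes(hdr) - 20;   /* options occupy [0, 40] bytes */
}
```

Whatever follows `tcp_header_bytes(hdr)` bytes within the segment is the payload.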

1.2 How is the payload delivered upwards?

Because every application-layer process binds a port number:

  • The server explicitly binds a port number
  • The client is automatically assigned a port number by the operating system

Once the header has been stripped off as described above, the destination port inside it identifies which process the payload should be delivered up to.

Supplement : the kernel maintains the mapping between port numbers and process IDs in a hash table, so the transport layer can quickly find the owning process ID — and hence the application-layer process — from the port number. The mapping is established when the port is bound.

1.3 Understanding of TCP header

Like the UDP header mentioned in the previous chapter, the TCP header is a structured object:
[Network Programming] Transport layer protocol - UDP protocol

As with UDP, the kernel allocates a block of memory, interprets it as this structured header, fills in each field, and then appends the payload.
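A sketch of that structured object in C. The field names here are illustrative, not the kernel's actual definitions (Linux's real layout lives in `struct tcphdr` in `<netinet/tcp.h>`):

```c
#include <stdint.h>

/* Illustrative 20-byte fixed TCP header; multi-byte fields are carried
   in network byte order on the wire. */
struct tcp_header {
    uint16_t src_port;     /* 16-bit source port                        */
    uint16_t dst_port;     /* 16-bit destination port                   */
    uint32_t seq;          /* 32-bit sequence number                    */
    uint32_t ack_seq;      /* 32-bit acknowledgment number              */
    uint8_t  off_reserved; /* 4-bit header length + 4 reserved bits     */
    uint8_t  flags;        /* 2 reserved bits + URG/ACK/PSH/RST/SYN/FIN */
    uint16_t window;       /* 16-bit window size                        */
    uint16_t checksum;     /* 16-bit checksum                           */
    uint16_t urg_ptr;      /* 16-bit urgent pointer                     */
};                         /* options (0-40 bytes) follow if present    */
```

All fields fall on their natural alignment, so the structure occupies exactly the 20 fixed header bytes.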

1.4 Sequence number and acknowledgment number

Before explaining why the sequence number and acknowledgment number are needed, let us introduce the concept of network reliability:

1.4.1 Unreliable network problem

Now computers are basically based on the von Neumann architecture:
(figure: von Neumann architecture)
Although the devices in the figure all live in one machine, they are independent hardware devices and must communicate to exchange data. They are therefore connected by "wires": the wires between memory and the peripherals form the I/O bus, and the wires between memory and the CPU form the system bus.

Inside a single machine these "wires" are very short, so the probability of an error while transferring data is tiny. But when the two communicating machines are far apart (across a network), the probability of transmission errors increases greatly.

Therefore, the essence of the network's unreliability is that the distance has become long.

  • Scenarios of unreliability

Packet loss, out-of-order arrival (network congestion), bit errors (bit flips), duplication

How do you make sure the other person heard what you said? By getting a reply (a response). Only once a response arrives can you be sure the earlier message was received; it is reliable only when acknowledged.
However, in any conversation there is always a most recent message, and the most recent message can generally not itself be guaranteed to be reliable.

From this we can see that there is no absolute reliability, only relative reliability.

One of the mechanisms by which TCP ensures reliability is the acknowledgment mechanism.

Therefore, besides normal data segments, the traffic between the two parties also contains acknowledgment segments.
(figure: serial send-and-wait communication)
As shown in the figure, if the two parties communicate serially — each new segment is sent only after the previous acknowledgment arrives — efficiency is clearly very low.

In practice this is not how it works: one side sends multiple data segments at once, as long as every segment is eventually acknowledged.

(figure: sending multiple segments at once)
But now there is a problem: the segments do not necessarily arrive at the peer in the order they were sent.
Also, if four data segments are sent but only three acknowledgments come back, how do we know which segment was lost?

1.4.2 32-bit sequence number

The solution to these problems is the 32-bit sequence number field in the TCP header.
TCP numbers every data segment it sends; this number is called the sequence number.

In this way, the order of the transmitted data segments is guaranteed.
For example:

Suppose 4000 bytes of data are to be sent in four parts, i.e. four TCP segments. The 32-bit sequence number in each of the four segments is the number of the first byte that segment carries, so the four segments are filled in with 1, 1001, 2001 and 3001 respectively.
(figure: four segments numbered 1, 1001, 2001, 3001)
When host B receives the four TCP segments, it can sort them using the sequence number fields in their headers.
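The numbering in this example can be reproduced with a small helper. This is an idealized sketch: real TCP starts from a randomly chosen initial sequence number rather than 1.

```c
#include <stdint.h>

/* Sequence number of segment i, when sending starts at first_seq and
   each segment carries seg_bytes bytes of payload. The sequence number
   is always the number of the segment's first byte. */
uint32_t segment_seq(uint32_t first_seq, uint32_t seg_bytes, uint32_t i) {
    return first_seq + i * seg_bytes;
}
```

With `first_seq = 1` and `seg_bytes = 1000`, segments 0 through 3 get sequence numbers 1, 1001, 2001 and 3001, as in the example above.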

1.4.3 32-bit acknowledgment number

The 32-bit acknowledgment number in the TCP header tells the peer which data I have received so far, and from which position its next data should start.

(figure: acknowledgment number example)

For example, suppose the client sends a segment with sequence number 1 carrying 1000 bytes of data. If the server receives it, it fills in 1001 as the 32-bit acknowledgment number in the header of its response. This 1001 has two meanings:

1️⃣ It tells host A that all bytes with sequence numbers before 1001 have been received.
2️⃣ It tells host A that the next data it sends should start from the byte with sequence number 1001.

Together, the sequence number and the acknowledgment number express that
the receiver has received all (contiguous) data before the acknowledged (ACK) sequence number.

For example, suppose each segment carries 1000 bytes of data.
(figure: a lost segment and the resulting acknowledgments)
If the segment with sequence number 1001 fails to reach host B while the others arrive, then the acknowledgments for the segments after 1001 can still only be filled in with 1001 — meaning that every byte before sequence number 1001 has been received.
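This "keep acknowledging 1001" behavior is cumulative acknowledgment, which can be sketched as follows. The sketch assumes fixed 1000-byte segments numbered from 1, as in the example above:

```c
#include <stdint.h>

#define SEG_BYTES 1000u

/* received[i] is nonzero if segment i (covering bytes
   [1 + i*SEG_BYTES, (i+1)*SEG_BYTES]) has arrived. The acknowledgment
   number is one past the last byte of the longest contiguous prefix. */
uint32_t cumulative_ack(const int *received, int nseg) {
    uint32_t ack = 1;          /* nothing received yet -> expect byte 1 */
    for (int i = 0; i < nseg; i++) {
        if (!received[i])
            break;             /* gap: cannot acknowledge past it */
        ack += SEG_BYTES;
    }
    return ack;
}
```

If segments 1, 3 and 4 arrive but segment 2 (bytes 1001-2000) is lost, the acknowledgment stays at 1001 until the gap is filled.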


  • Why are there two sequence-number fields?

Why can't the 32-bit sequence number and the 32-bit acknowledgment number be compressed into one field — filled with the sequence number when sending and with the acknowledgment number when replying?

If one end only sent data and the other only received it, that would indeed work. But TCP is full-duplex : both parties may be sending to each other at the same time.
Each segment a party sends must therefore carry both a 32-bit sequence number, giving the position of the data it itself is sending, and a 32-bit acknowledgment number, acknowledging the data the peer sent last time and telling the peer from which byte it should send next.

1.5 Window size

First of all, we must know that TCP has its own sending buffer and receiving buffer.

When the upper layer calls write/send, the data is actually just copied into the send buffer.
When the upper layer calls read/recv, the data is actually copied out of the receive buffer.

This can lead to two situations:
if data is sent too fast , the receive buffer fills up and subsequent segments are discarded;
if data is sent too slowly , the upper layer's business processing on the receiving side is held up.

In either case TCP must control the transmission speed . To do that, it needs to know the peer's receiving capacity — that is, the amount of free space left in the peer's receive buffer.

  • How does a host learn the free space in the peer's buffer?

Each side fills the 16-bit window field with the remaining space in its own receive buffer — that is, the host's current capacity to accept data. Once the peer learns this value, it adjusts its sending speed accordingly.

  • The larger the window field, the more data the receiver can accept, and the sender can increase its sending rate.

  • The smaller the window field, the less data the receiver can accept, and the sender should reduce its sending rate.

  • If the window field is 0, the receiver's buffer is full, and the sender should stop sending data for the time being.

  • One more point

Because the window field has 16 bits, the largest window it can represent is 64 KB. If a larger window is needed, a window-scale option in the options field can be used to enlarge it.

1.6 Six flag bits

  • Why are there flag bits?

TCP segments come in different types : ordinary segments for normal communication, segments sent when establishing a connection, and segments sent when tearing a connection down.
Different types require different actions . For example, a normal-communication segment's data goes into the buffer, while a connection-establishment segment triggers the three-way handshake.
The six flag bits exist to distinguish these types.

(figure: the six TCP flag bits)

  • SYN

A segment with SYN set to 1 is a connection-establishment request segment .
SYN is set only during connection establishment, never during normal communication.

  • FIN

A segment with FIN set to 1 is a disconnection request segment .
FIN is set only while the connection is being torn down, never during normal communication.

  • ACK

A segment with ACK set to 1 acknowledges previously received data .
Generally, apart from the very first request segment, almost every segment sets ACK, because any data segment can also piggyback an acknowledgment of the peer's last data; so while the two parties exchange data, each segment can simultaneously acknowledge what the other side sent last time .

  • PSH

A segment with PSH set to 1 tells the peer's upper layer to consume the data as soon as possible .
If the receiver's advertised window is small, the sender may be blocked waiting for the receiver to drain its buffer; in that case the PSH flag can be used to urge the receiver along.

  • URG & 16-bit urgent pointer

A segment with URG set to 1 tells the peer that this data needs to be processed as soon as possible .
Because TCP is a reliable, ordered transport, segments are normally consumed by the receiver strictly in order; a segment that needs to "jump the queue" can set URG.

Note that this does not mean the entire payload of the segment is urgent — it may be only a small part of it. So how is the urgent part located?

That is the job of the 16-bit urgent pointer in the TCP header: it holds an offset from which the urgent data can be found . Since there is only one urgent pointer, it can mark only one position in the segment, so only one byte of urgent data can be sent .

URG is generally used to send out-of-band data , which does not go through the normal TCP stream because the receiver processes it directly. For example, suppose we have sent a large amount of data and the peer is busy processing it, but we suddenly realize the data is no longer needed; we can then send urgent out-of-band data and close the socket.

  • RST

A segment with RST set to 1 asks the peer to re-establish the connection.
If one side sends data while the other side does not consider the connection established, the responding side sets RST to 1 in its reply, asking the sender to re-establish the connection.
Another case: the server's network cable is unplugged and its side of the connection is torn down, but the client does not know this and keeps sending segments. The server then replies with RST set to 1, telling the client to establish a new connection.
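Since each of the six flags occupies a single bit of the header, checking a segment's type is just a bit test. A sketch, with masks in the on-wire bit order (FIN in the lowest bit, matching common definitions such as Linux's):

```c
#include <stdint.h>

enum {
    TCP_FLAG_FIN = 0x01, /* finish segment: local side is done sending   */
    TCP_FLAG_SYN = 0x02, /* synchronization segment: request a connection */
    TCP_FLAG_RST = 0x04, /* reset segment: ask the peer to reconnect     */
    TCP_FLAG_PSH = 0x08, /* push: deliver buffered data to the app soon  */
    TCP_FLAG_ACK = 0x10, /* the acknowledgment number is valid           */
    TCP_FLAG_URG = 0x20, /* the urgent pointer is valid                  */
};

/* Is this the second handshake (SYN and ACK both set)? */
int is_syn_ack(uint8_t flags) {
    return (flags & (TCP_FLAG_SYN | TCP_FLAG_ACK))
        == (TCP_FLAG_SYN | TCP_FLAG_ACK);
}
```

The same pattern tests any other segment type, e.g. `flags & TCP_FLAG_RST` for a reset segment.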

2. Acknowledgment response mechanism (ACK)

One of the mechanisms for TCP to ensure reliability is the acknowledgment response mechanism.

The acknowledgment mechanism is implemented by the 32-bit sequence number and the 32-bit acknowledgment number in the TCP header. A received acknowledgment means that all data before the acknowledged sequence number has been received .

  • How to understand that TCP numbers each byte of data?

(figure: send buffer viewed as a byte array)

We can regard the transport layer's send buffer as an array. When application-layer data is copied into the send buffer, every byte naturally gets a number — its subscript — except that the numbering starts from 1 rather than 0.

(figure: sequence numbers as buffer subscripts)

The sequence number the sender fills into the header is simply the subscript, in the send buffer, of the first byte among the bytes being sent .
When the receiver acknowledges received data, the acknowledgment number in its response header is the subscript of the position just past the last contiguous byte received into the receive buffer .
After receiving the acknowledgment, the sender can continue sending from the subscript given by the acknowledgment number .

2.1 Timeout retransmission mechanism

2.1.1 Two cases of packet loss

  • Case one

The data segment itself is lost. If the sender receives no corresponding acknowledgment within a certain time, it retransmits after the timeout.
(figure: the data segment is lost)

  • Case two

The peer's acknowledgment segment is lost. The sender, again receiving no acknowledgment, retransmits after the timeout.
(figure: the acknowledgment is lost)


When a loss occurs, the sender cannot tell whether the data segment was lost on the way out or the acknowledgment was lost on the way back, so in either case it can only retransmit after a timeout.

In the second case, the receiver may receive the same data twice. Since duplicate segments are also a form of unreliability, host B must deduplicate them ( using the sequence numbers ).

Because a timeout retransmission may be needed, sent data is not removed from the send buffer immediately; it is kept for a while, and that portion of the buffer may not be deleted or overwritten until the data's acknowledgment arrives.

2.1.2 Waiting time for timeout retransmission

We judge packet loss by a timeout — so how long should that timeout be?

We know that transmission time depends on network conditions, and the network changes constantly with its environment, so the retransmission timeout cannot be a fixed value .

To guarantee reasonably high-performance communication in any environment, TCP computes the timeout dynamically:

In Linux (and likewise in BSD Unix and Windows), the timeout is managed with a granularity of 500 ms , so every retransmission timeout is an integer multiple of 500 ms.
If a retransmission still gets no response, the wait before the next retransmission becomes 2 × 500 ms.
If there is still no response, the next wait becomes 4 × 500 ms, and so on, growing exponentially.
After a certain number of retransmissions have accumulated, TCP concludes that the network or the peer host is abnormal and forcibly closes the connection.
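The doubling schedule above can be sketched as follows. The 500 ms granularity follows the description in this section; real kernels also derive the base timeout from measured round-trip times, so treat this as an idealization:

```c
/* Retransmission timeout, in ms, before retransmission number
   (attempt + 1): 500, 1000, 2000, ... doubling each time. */
int retransmit_timeout_ms(int attempt) {
    return 500 << attempt;
}
```

So the first retransmission waits 500 ms, the second 1000 ms, the third 2000 ms, and so on, until the retransmission limit is reached and the connection is closed.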

3. Connection management mechanism

3.1 Connection-oriented related concepts

Connection-oriented means that the two hosts each set aside an area on their own machine and jointly maintain these two areas through the TCP protocol, in order to achieve reliable network transmission. In other words, being connection-oriented exists to guarantee the reliability of the data .

If that is hard to grasp, compare it with connectionless communication. UDP is a typical connectionless protocol: when two hosts communicate, neither needs to know whether the target IP and port even exist — data is simply sent into the network addressed to the given IP and port. A connection-oriented protocol, by contrast, first sends some messages to the given target IP and port to confirm that the target host exists; if it does not, the subsequent communication cannot take place.

Therefore, a connection-oriented protocol must establish a connection before communicating. Establishing the connection confirms the peer's existence and negotiates some control parameters that make the subsequent communication reliable.

  • The concepts of connectionless and connection-oriented protocols

Packets in a connectionless protocol are called datagrams, and each packet is addressed and sent individually by the application. From the protocol's point of view, each datagram is an independent entity, unrelated to any other datagram transmitted between the same two peer entities, which means the protocol is likely to be unreliable . In other words, the network will do its best to deliver each datagram, but there is no guarantee that a datagram will not be lost, delayed, or delivered out of order.
  On the other hand, connection-oriented protocols maintain state between packets , and applications using such protocols typically have long-lived conversations. Remembering these states, the protocol can provide reliable transport. For example, the sender can remember which data has been sent but not acknowledged, and when the data was sent. If no acknowledgment is received within a certain time interval, the sender can retransmit the data. The receiving end can remember which data has been received and discard duplicate data. If the packet arrives out of order, the receiving end can save it until the logically preceding packet arrives.
  A typical connection-oriented protocol has three phases. In the first phase, a connection is established between peer entities. Next comes the data transfer phase, in which data is transferred between peer entities. Finally, when the peer entity completes the data transfer, the connection is torn down.
  A standard analogy is: using a connectionless protocol is like sending a letter, while using a connection-oriented protocol is like making a phone call.

  • Why does TCP establish a connection?

Because reliability must be guaranteed — although the connection itself does not directly guarantee reliability .
Once a connection is established there is a connection structure , which holds the policies — timeout retransmission, in-order delivery, flow control, congestion control — along with the communication state and segment attributes. This connection structure is the basis for ensuring data reliability, and the three-way handshake is the basis for creating the connection structure, so the three-way handshake guarantees reliability indirectly .
UDP needs no communication state or segment attributes, so it has no connection to establish.

3.2 Three-way handshake

The two parties need to establish a connection before performing TCP communication. The process of establishing a connection is called a three-way handshake.

(figure: the three-way handshake)
The first handshake : the client sends the server a segment with the SYN bit set to 1, requesting to establish a connection with the server .
The second handshake : on receiving the client's connection request, the server both initiates a connection request toward the client and acknowledges the client's request — the segment it sends back has both the SYN bit and the ACK bit set to 1.
The third handshake : on receiving the server's segment, the client knows the server has received its request and wants a connection in return, so the client finally sends an acknowledgment.

  • Why the three-way handshake?

Establishing a connection is not 100% guaranteed; any of the three handshake segments may be lost. Loss of either of the first two handshakes can be handled, because each of them is answered — if no answer arrives, the segment times out and is retransmitted. But what if the third handshake's ACK is lost?
(figure: losing the third handshake's ACK)
When the client sends the final ACK , it considers the three-way handshake complete. If that ACK is lost, the connection has not actually been established on the server side — but there is no need to worry, because there are remedies:

If the server receives no acknowledgment, it retransmits the second handshake, and the client then realizes the connection was not successfully established.
Furthermore, if the client has already sent data (data is sent only once the client believes the handshake succeeded), the server replies with an RST segment asking the client to re-establish the connection.

  • Would one or two handshakes work?

Consider one handshake first. The moment the client sends a connection request, it would consider the connection established, and the server would have to maintain one. A malicious client could then use multiple threads to keep firing connection requests; the server would treat all of them as established connections and maintain every one, and with enough connections its resources would be exhausted. This situation is called a SYN flood . In addition, a single handshake cannot verify that the full-duplex channel works in both directions (the client cannot confirm that its message reached the server), so one handshake cannot complete a connection.
What about two handshakes? The moment the server sends the second handshake segment, it would consider the connection established — yet the client may never receive that segment. So the same problem as one handshake (a SYN flood) arises, and again the full-duplex channel cannot be verified (the server cannot prove that what it sends is received by the peer).

The essential reason a single host can attack the server in both cases is that the server establishes the connection before the client does. Therefore the client must be the one to establish its connection first, and only then should the server establish its own.


Now we can explain why there is a three-way handshake:

the three-way handshake verifies, at the minimum cost, that the full-duplex communication channel works in both directions.
(figure: what each handshake verifies)
And since the client must establish its connection before the server establishes its own , the problem of a single host attacking the server is effectively avoided.

  • DDoS attacks
    Note that the three-way handshake cannot solve every security problem. A large number of hosts sending TCP requests at the same time can still bring the server down. Suppose an attacker has compromised many hosts and makes them all send TCP connection requests to the server simultaneously:
    (figure: many hosts flooding the server with connections)
    at this point a legitimate client's connection request cannot be served (it cannot connect). This attack is a DDoS (distributed denial-of-service) attack.

  • Can a four-way handshake work?

It would work, but it is unnecessary and reduces efficiency.
In a four-way handshake the server would send the SYN and ACK of the second handshake separately. Since the two can be combined into a single segment, there is no reason to send them in two steps; merging handshakes two and three of the four is exactly the optimization that yields the three-way handshake.

  • State changes in the three-way handshake

(figure: state changes during the three-way handshake)

At the beginning, both the client and the server are in the CLOSED state.
1️⃣ To be able to receive connection requests from clients, the server moves from CLOSED to the LISTEN state.
2️⃣ The client can now initiate the three-way handshake. When it sends the first handshake, its state becomes SYN_SENT.
3️⃣ On receiving the client's connection request, the server (in LISTEN) places the connection in its kernel queue and sends the second handshake; the server's state becomes SYN_RCVD.
4️⃣ On receiving the second handshake, the client sends the final handshake; the client's side of the connection is now established and its state becomes ESTABLISHED.
5️⃣ When the server receives the final handshake, the connection is fully established and the server's state also becomes ESTABLISHED.

  • The relationship between sockets and the three-way handshake

Before the client can initiate a connection request, the server must first enter the LISTEN state, which it does by calling the listen function to set the socket's attribute .
Once the server is in LISTEN, the client can start the three-way handshake toward it by calling the connect function.
Note that the connect function does not itself carry out the underlying three-way handshake — its job is only to initiate it. By the time connect returns, the kernel has either completed the handshake successfully or the handshake has failed.
If the handshake succeeds, a connection now exists on the server side — but it sits in the kernel's queue, and the server must call the accept function to obtain the established connection .
After the server obtains the connection, the two parties can exchange data with the read/recv and write/send functions.
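The division of labor just described — listen() makes a socket able to receive handshakes, connect() starts a handshake, and accept() merely takes an already-established connection out of the kernel queue — can be seen in a self-contained loopback sketch (error handling reduced to early returns; an ephemeral port is used so nothing is assumed about the machine):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Drive a real three-way handshake over the loopback interface. */
int three_way_handshake_demo(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                /* let the kernel pick a port */
    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) return -1;
    if (listen(listen_fd, 5) < 0) return -1; /* server enters LISTEN */

    socklen_t len = sizeof(addr);            /* learn the chosen port */
    if (getsockname(listen_fd, (struct sockaddr *)&addr, &len) < 0) return -1;

    int client_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (client_fd < 0) return -1;
    /* connect() initiates the handshake; it returns once the kernel
       has completed (or failed) the handshake. */
    if (connect(client_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) return -1;

    /* The connection already sits in the kernel queue, even though
       accept() has not run yet; accept() just takes it out. */
    int server_fd = accept(listen_fd, NULL, NULL);
    if (server_fd < 0) return -1;

    close(server_fd);
    close(client_fd);
    close(listen_fd);
    return 0;
}
```

Note that connect() succeeds here before accept() is ever called — evidence that the handshake is completed by the kernel, not by accept().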

3.3 Four-way wave

Since maintaining a connection costs both parties resources, the connection must be torn down once the TCP communication between them ends. We call this teardown process the four-way wave.

When one side has nothing more to send to the other, it sends a disconnection request. Suppose the client wants to disconnect:

The client sends a disconnection request (FIN), and the server returns an ACK — that is two waves already.
Later the server also disconnects: it sends its own request, and the client returns an ACK — four waves in total.

A question arises: since the client has already declared it will send no more data to the server, why does it still send an acknowledgment afterwards?

Note that "no more data" here refers to user data (the application layer sends nothing more); it does not mean the underlying protocol stops exchanging segments.
(figure: the four-way wave)
Note also that the second and third waves here can sometimes be merged into one segment.

  • State changes when waved four times

(figure: state changes during the four-way wave)

Before the wave begins, both the client and the server are in the ESTABLISHED state.
1️⃣ The client actively sends the server a disconnection request; the client's state becomes FIN_WAIT_1.
2️⃣ The server acknowledges the client's disconnection request; the server's state becomes CLOSE_WAIT.
3️⃣ When the server has no more data to send to the client, it sends its own disconnection request and waits for the final ACK; the server's state becomes LAST_ACK.
4️⃣ On receiving the server's third wave, the client sends the final acknowledgment and enters the TIME_WAIT state.
5️⃣ When the server receives that final acknowledgment, it closes the connection completely and becomes CLOSED.
6️⃣ The client waits for 2 MSL (Maximum Segment Lifetime) before it too enters the CLOSED state.

  • The relationship between sockets and four waves

When the client initiates its disconnection request, it does so by actively calling the close function on its socket.
When the server initiates its disconnection request, it likewise actively calls the close function on its socket.
One close corresponds to two waves, and both parties must call close — hence four waves.
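The "one close() equals two waves" pairing is visible from the reader's side: once the peer has closed (its FIN has arrived and been acknowledged by the kernel), read() returns 0. A sketch using a local AF_UNIX socket pair — not TCP, but the end-of-stream semantics of close() are the same:

```c
#include <sys/socket.h>
#include <unistd.h>

/* Close one end of a connected pair; the other end then reads EOF. */
int peer_close_yields_eof(void) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    close(fds[0]);                  /* the peer is done: its "first wave" */

    char buf[8];
    ssize_t n = read(fds[1], buf, sizeof(buf));
    close(fds[1]);                  /* our own close: the other two waves */
    return n == 0 ? 0 : -1;         /* reading 0 bytes == end of stream  */
}
```

This returning-0 behavior of read() is exactly how a server notices that a client has hung up and knows it is time to call close() on its own side.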

The party that actively disconnects ends up in the TIME_WAIT state,
while the party that is passively disconnected is in the CLOSE_WAIT state after the first two waves.

We mainly study these two states:

  • CLOSE_WAIT state

During the four waves, if only the client calls the close function while the server does not (and therefore never sends its FIN), the server stays in the CLOSE_WAIT state and the client stays in FIN_WAIT_2.
If the server does not actively close file descriptors it no longer needs, a large number of connections will accumulate in the CLOSE_WAIT state on the server side. Each of these connections occupies server resources, so the server's available resources shrink over time.
Therefore, when writing network socket code, if you find a large number of CLOSE_WAIT connections on the server side, check whether the server failed to call close in time on the corresponding file descriptors.

  • TIME_WAIT state

If any of the first three waves is lost, the timeout retransmission mechanism can handle it. The real worry is the fourth wave, the final ACK, being lost.

If the client entered the CLOSED state immediately after sending the fourth wave, then even though the server performs timeout retransmissions of its FIN, it would never receive a response, because the client has already closed the connection.
After several unanswered retransmissions the server would eventually close the connection anyway, but it has to keep maintaining this abandoned connection throughout those retransmissions, which is very unfriendly to the server.
To avoid this, the client does not enter CLOSED immediately after the four waves; it enters the TIME_WAIT state and waits. If the fourth wave was lost, the client can still receive the server's retransmitted FIN and respond to it.

Therefore, TIME_WAIT ensures as far as possible that the last ACK is received by the peer. In addition, segments sent before the disconnection may still be lingering in the network, and TIME_WAIT gives this leftover data time to dissipate from the network as well.

The TIME_WAIT state does not last forever: after a fixed period the connection is closed automatically. So how long is this period?

  • TIME_WAIT wait time

The TCP protocol stipulates that the party that actively closes the connection must stay in the TIME_WAIT state after the four waves and wait for two MSLs (MSL: Maximum Segment Lifetime) before entering the CLOSED state.

MSL is the maximum time a segment can survive in the network on its way from the sender to the receiver.

The reason the TIME_WAIT period is set to two MSLs:

MSL is the maximum lifetime of a TCP segment, so persisting in TIME_WAIT for 2 MSL ensures that any unreceived or late segments in both transmission directions have disappeared from the network.
It is also the theoretically guaranteed time for the last ACK to arrive reliably.

On CentOS 7 the default configured value is 60 seconds.

3.4 bind failed

  • bind binding failure reason

In the earlier code we found that if the server actively disconnects, bind fails for a period of time afterwards. This is because the server is in the TIME_WAIT state: the port and the connection still exist, so the bind fails (the port is considered occupied).

The harm of not being able to restart the server immediately:

For example, during Double Eleven, too many connections may cause the server to crash. We want to restart it immediately, but we are forced to wait (60 seconds under the configuration above), which can cause huge losses.

So how to solve this problem?

  • Set socket multiplexing

Use setsockopt() to set the socket option SO_REUSEADDR to 1. This allows the local address and port to be reused, so that bind can succeed even while an old connection on the same port is still in the TIME_WAIT state.

int opt = 1;
setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
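A slightly fuller sketch of the typical usage (the function name, loopback address, and error handling here are illustrative, not from the original code): the option must be set after socket() but before bind().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// Create a listening-side socket with SO_REUSEADDR enabled, so that
// bind() can succeed even while an old connection on the same port is
// still sitting in TIME_WAIT.
int make_reusable_socket(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    int opt = 1;  // 1 = enable the option
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);  // port 0 lets the kernel pick one
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The key point of the design is ordering: setting SO_REUSEADDR after bind has no effect on a bind that already failed.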

4. Flow control

TCP can adjust the speed at which the sender transmits according to the receiver's capacity to receive. This mechanism is called flow control.

When the 16-bit window size was introduced above, it was said that the transmission speed should be moderate; this header field is what controls it. By filling in the 16-bit window size, each end tells its peer its current receiving capacity (how much space is left in its receive buffer).

But how does the sender learn the peer's receiving capacity for the first time?

The three-way handshake takes place before any data is exchanged, so the two ends can tell each other their window sizes during the handshake.

How the window size controls the transmission speed has already been mentioned above, so I won’t repeat it.

One more point:
when the sender learns that the receiver's capacity to receive data is 0, it stops sending. The sender then finds out when it may resume in two ways:

Waiting for notification: after the receiver's upper layer reads data out of the receive buffer, the receiver sends a TCP segment to the sender, actively announcing its new window size; once the sender learns that the receive buffer has space again, it resumes sending.
Active inquiry: the sender periodically sends a segment carrying no payload (a window probe) just to ask about the receiver's window size, and keeps probing until the receiver's buffer has space, at which point it resumes sending.

In practice both strategies are used at the same time; whichever happens first takes effect.
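The sender's side of this rule fits in a few lines. A minimal sketch (the function name is made up for illustration): the peer's advertised window caps the amount of sent-but-unacknowledged data in flight.

```c
#include <assert.h>
#include <stddef.h>

// Flow control from the sender's point of view: the peer's advertised
// window (rwnd) limits how much sent-but-unacknowledged data may exist.
// Returns how many more bytes may be sent right now.
size_t sendable_now(size_t rwnd, size_t bytes_in_flight) {
    if (bytes_in_flight >= rwnd)
        return 0;  // window full (or zero): wait for an update or probe
    return rwnd - bytes_in_flight;
}
```

When this returns 0, the sender waits for a window-update notification or fires a window probe, as described above.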

5. Sliding window

Before a response arrives for data we have sent, we must keep that data around to support a possible timeout retransmission. So where is it kept?

The answer is the send buffer.

As mentioned earlier, multiple segments are generally in flight at once: the next segment is sent before the previous response arrives, which improves efficiency.

Then we can divide the send buffer into three parts:

  • Data that has been sent and an ACK has been received.
  • Data that has been sent but has not received an ACK.
  • Data that has not been sent yet.

The essence of the sliding window is that only a portion of the send buffer may be in flight at any moment; the three intervals above are re-divided as the window continuously slides.

  • How to understand sliding window?

Think of the buffer as an array; the movement of the sliding window is then just an update of subscripts.

  • The size of the sliding window

The size of the sliding window is tied to the peer's receiving capacity. However the window slides, the peer must remain able to receive normally (sliding window size <= peer's receiving capacity). The exact size is revisited below under congestion control.

  • Does the sliding window always move right as a whole?

It may slide right, or it may stay put, because the data in the peer's receive buffer may not have been taken away yet.

  • How does the sliding window slide?

When the sender receives a response from the peer whose acknowledgment number is ACK_SEQ and whose window size is tcp_win, it can update win_start to ACK_SEQ and win_end to win_start + tcp_win.
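That update rule can be written down directly. A minimal sketch in C, treating win_start and win_end as positions in sequence-number space (the struct and names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

// Illustrative sliding-window bounds in sequence-number space.
typedef struct {
    uint32_t win_start;  // first byte sent but not yet acknowledged
    uint32_t win_end;    // one past the last byte allowed to be sent
} sliding_window;

// On receiving an ACK: the left edge jumps to the acknowledged sequence
// number, and the right edge is left edge + the advertised window.
void on_ack(sliding_window *w, uint32_t ack_seq, uint32_t tcp_win) {
    w->win_start = ack_seq;
    w->win_end   = ack_seq + tcp_win;
}
```

Note that when the peer advertises a zero window, win_start and win_end coincide and nothing may be sent.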

If the peer's upper layer never takes the data away while the sender keeps sending, tcp_win gets smaller and smaller: the left edge of the sliding window keeps moving right while the right edge stays put, until the sliding window finally shrinks to 0.

  • What if the received ACK is not the acknowledgment of the leftmost data, but the middle one?

Because acknowledgment numbers are cumulative, a response for intermediate data confirms that everything before it has also been received; so if such a response arrives, some earlier packet must have been lost along the way.

  • packet loss problem

Packet loss can be divided into two situations:
1️⃣ The data arrived, but the ACK response was lost.
By the definition of the acknowledgment number, receiving 3001 means all data up to byte 3000 has been received, so win_start can simply be moved to 3001.

2️⃣ The data itself was really lost.
When the segment carrying bytes 1001-2000 is lost, the sender keeps receiving responses with acknowledgment number 1001, which is the receiver's way of reminding the sender: "next, send me the data starting at sequence number 1001".

If three identical acknowledgment numbers are received in a row, retransmission is triggered. This is called fast retransmit:
fast retransmit is the ability to resend lost data quickly. When the sender receives three identical responses in a row it retransmits immediately, unlike timeout retransmission, which only resends after a retransmission timer expires.
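A minimal sketch of the duplicate-ACK counter behind fast retransmit, following the "three identical responses" rule in the text (the struct and function names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Duplicate-ACK counter for fast retransmit: seeing the same
// acknowledgment number three times in a row triggers a retransmit.
typedef struct {
    uint32_t last_ack;
    int      dup_count;
} dup_ack_state;

// Feed one incoming ACK; returns true when fast retransmit should fire.
bool on_ack_fast_retx(dup_ack_state *s, uint32_t ack_seq) {
    if (ack_seq == s->last_ack) {
        s->dup_count++;
    } else {
        s->last_ack  = ack_seq;  // new data acknowledged: reset the count
        s->dup_count = 1;
    }
    return s->dup_count == 3;
}
```

Once the window advances (a new acknowledgment number arrives), the counter resets and normal transmission resumes.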

To sum up:
the left edge of the sliding window is determined by the acknowledgment number, and the right edge by the left edge plus the remaining space in the peer's receive buffer.

  • sliding window space problem

The sliding window keeps sliding to the right, so what happens when it reaches the end of the buffer?
The kernel organizes the send buffer as a ring structure, so the window simply wraps around.
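The wrap-around itself is one line of index arithmetic. A sketch, assuming a power-of-two buffer size so the modulo can be a bit-mask (the 4096 here is just an example, not a real kernel constant):

```c
#include <assert.h>
#include <stddef.h>

// With a ring buffer, "sliding right" never runs off the end: every
// logical position is mapped back into the buffer modulo its size.
#define BUF_SIZE 4096u

size_t ring_index(size_t logical_pos) {
    return logical_pos & (BUF_SIZE - 1);  // same as logical_pos % BUF_SIZE
}
```

The window indices can therefore grow without bound in sequence-number space while the physical storage stays fixed.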

6. Congestion Control

Losing one or two segments out of 1000 is normal; we just resend them. But if 999 out of 1000 are lost, should we still retransmit them all?

For example, if only one of the 40 students in a class fails an exam, it is most likely that student's own problem; but if 39 students fail, is it still the students' problem?

For packet loss over such a large area, TCP assumes network congestion. Retransmitting at this point is useless; it would only aggravate the congestion.

  • How to solve the problem of network congestion?

When the network is congested, the communicating parties cannot offer a particularly effective cure for the network itself, but both hosts can at least avoid adding to its burden.
If a large number of packets are lost during communication, they should not be retransmitted immediately; instead the hosts should send less data, or even none, and only slowly ramp the transmission rate back up once the network recovers.

Note that when the network is congested, not just one host is affected; almost every host on the network is. At that point, all hosts using TCP execute the congestion-avoidance algorithm.

  • congestion control

TCP introduces a slow start mechanism: at the beginning of communication it sends a small amount of data to probe the path, finds out the current congestion state of the network, and then decides how fast to transmit.

Before describing slow start, one concept must be introduced: the congestion window.
It is simply a number; sending more than this amount may cause network congestion.
It starts at 1 and is increased by 1 each time an ACK response is received. Each time data is sent, the congestion window is compared with the window size advertised by the receiving host, and the smaller of the two is taken as the actual sending window, i.e. the size of the sliding window.

Sliding window size = min(congestion window, advertised window size (the peer's receiving capacity))


The congestion window grows by 1 for every ACK received, so across round trips it grows exponentially. Ignoring the peer's receiving capacity for the moment, the sliding window then depends only on the congestion window, whose size evolves as 1, 2, 4, 8, ...
But exponential growth is explosive, and could soon cause network congestion all over again.

To prevent this, a slow start threshold is introduced. Once the congestion window exceeds this threshold, it no longer grows exponentially but linearly.
When TCP starts up, the slow start threshold is set to the maximum window size.
On each timeout retransmission, the slow start threshold becomes half of the current congestion window, the congestion window is reset to 1, and the cycle repeats.
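The three rules above can be condensed into a small simulation. This is a sketch under the simplification that the window is updated once per transmission round; the names and the per-round granularity are illustrative, not from any real stack.

```c
#include <assert.h>

// Per-round congestion window update, following the rules in the text:
// below the slow-start threshold the window doubles per round; above it,
// it grows by 1. On a retransmission timeout the threshold drops to half
// the current window and the window resets to 1.
typedef struct {
    unsigned cwnd;      // congestion window
    unsigned ssthresh;  // slow start threshold
} cc_state;

void cc_on_round_ok(cc_state *s) {
    if (s->cwnd < s->ssthresh)
        s->cwnd *= 2;  // slow start: exponential growth
    else
        s->cwnd += 1;  // congestion avoidance: linear growth
}

void cc_on_timeout(cc_state *s) {
    s->ssthresh = s->cwnd / 2;
    s->cwnd = 1;
}

// The actual send window also respects the peer: min(cwnd, rwnd).
unsigned send_window(const cc_state *s, unsigned rwnd) {
    return s->cwnd < rwnd ? s->cwnd : rwnd;
}
```

Running this from cwnd = 1 with ssthresh = 16 reproduces the 1, 2, 4, 8, 16 growth followed by linear steps; a timeout halves ssthresh and resets cwnd to 1.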

As shown in the figure, the slow start phase gives the network time to recover on its own, and the exponential growth that follows restores full-speed communication as quickly as possible.

7. Delayed Response & Piggybacked Response

  • delayed response

Suppose there is a lot of data in the receiver's buffer, but the application layer will very probably take it away almost immediately. If the receiver waits a short while before replying, it can report a larger window.

Note that the purpose of delayed response is not reliability; it is to give the upper application layer a little time to consume the data in the receive buffer, so that the window size reported in the ACK response can be larger, which increases network throughput and data transmission efficiency.

Also, not every packet's acknowledgment can be delayed.

  • Quantity limit: respond once for every N packets.
  • Time limit: respond once the maximum delay time is reached (a delay short enough not to trigger a spurious timeout retransmission).

The exact packet count and timeout vary by operating system; typically N is 2 and the timeout is 200 ms.
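The two limits combine into one small decision function. A sketch using the typical values above (N = 2, 200 ms); the names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

// Delayed-ACK decision, using the typical limits quoted in the text:
// acknowledge every N = 2 segments, or after at most 200 ms of delay.
#define DELACK_N        2
#define DELACK_MAX_MS 200

// pending: segments received but not yet acknowledged (including this one);
// waited_ms: how long the oldest unacknowledged segment has waited.
bool should_ack_now(int pending, int waited_ms) {
    return pending >= DELACK_N || waited_ms >= DELACK_MAX_MS;
}
```

Whichever limit is hit first forces the acknowledgment out, so a lone segment is never delayed past the time limit.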


  • piggybacking

We know that after receiving data the receiver must send the sender a response. If the receiver also has data of its own to send, can the response simply ride along with that data?

The most direct benefit of piggybacking is sending efficiency: during communication the two parties no longer need to exchange bare acknowledgment segments.

8. TCP derivative problems

8.1 Byte stream oriented

When a TCP socket is created, a send buffer and a receive buffer are created in the kernel at the same time.
Because of these buffers, the reads and writes of a TCP program do not need to match one to one. For example:

  • To write 100 bytes of data, you can call write once with 100 bytes, or call write 100 times with one byte each.
  • To read 100 bytes of data, you need not care how it was written: you can read all 100 bytes at once, or read one byte at a time, 100 times over.

In fact, TCP does not care what the data in the send buffer means. From TCP's perspective it is just bytes, and its only task is to deliver those bytes accurately into the peer's receive buffer. How the data is interpreted is entirely up to the upper-layer application. This is what byte stream-oriented means.

Compare this with UDP, which is not byte-stream oriented: what is sent in one call must be read in one call, and ten sends require ten reads. A transport-layer protocol with such clear boundaries between packets is called datagram-oriented.

8.2 The sticky packet problem

  • What is a sticky packet?

Because TCP is byte-stream oriented, the application layer must separate the packets itself. If this is done badly, reading too much or too little corrupts the subsequent messages. This problem is called the sticky packet problem.

  • How to solve the sticky packet problem?

The essence of solving the sticky packet problem is determining the boundaries between packets.
For fixed-length packets, it suffices to read a fixed size every time.
For variable-length packets, a field giving the total packet length can be agreed upon at a fixed position in the header, so the end of the packet is known. For example, the HTTP header carries the Content-Length attribute, indicating the length of the body.
For variable-length packets, an explicit delimiter between packets also works; since the application-layer protocol is defined by the programmer, any delimiter that cannot conflict with the payload will do.
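As an illustration of the length-field approach, here is a sketch of pulling one packet out of a byte-stream buffer, assuming a made-up framing convention of a 4-byte big-endian payload length:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// One way to draw packet boundaries on a byte stream: each application
// packet starts with a 4-byte big-endian length of its payload. Returns
// the payload length if the buffer holds at least one complete packet,
// or -1 if more bytes must arrive first.
int try_extract_packet(const uint8_t *buf, size_t buf_len,
                       const uint8_t **payload_out) {
    if (buf_len < 4)
        return -1;  // the header itself is incomplete
    uint32_t body_len = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
                      | ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    if (buf_len < 4 + (size_t)body_len)
        return -1;  // the payload has not fully arrived yet
    *payload_out = buf + 4;
    return (int)body_len;
}
```

If the function returns -1, the application simply keeps the bytes in its buffer and waits for the next read to append more data.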

8.3 Abnormal TCP connection

  • process terminated

Suppose two processes have established a connection and one of them suddenly crashes. What happens to the established connection?

In fact, a connection is also backed by a file, and file descriptors belong to the process. When a process exits, the operating system closes its files, so the OS still performs the normal four waves to disconnect, which is no different from calling close yourself.

  • host reboot

When a host is rebooted normally, the operating system first kills all processes and then shuts down and restarts. So a reboot is the same situation as process termination: the operating systems on both sides complete the four waves normally and then release the corresponding connection resources.

  • Unplug the network cable/cut off the power

When the client drops offline this way, the server cannot learn of it in the short term, so it keeps maintaining the connection with the client. But the connection will not be maintained forever, because TCP has a keep-alive strategy:
the surviving party periodically asks the peer whether the connection still exists, and disconnects directly if it does not.

9. Summary

The TCP protocol is so complicated because TCP must not only ensure reliability, but also improve performance as much as possible.

  • reliability

Checksum
Sequence Number
Acknowledgment
Timeout Retransmission
Connection Management
Flow Control
Congestion Control

  • improve performance

Sliding window
Fast retransmission
Delayed response
Piggybacked response

  • How to achieve reliable transmission with UDP?

It essentially means re-implementing the reliability mechanisms listed above on top of UDP: sequence numbers, acknowledgments, timeout retransmission, and so on.




Origin blog.csdn.net/qq_66314292/article/details/131731784