I. Introduction
Over the past few days I have written four blog posts about TCP. This one is the fifth, and it should be the last for now, as I am about to wrap up my study of this layer. If you are interested in my other posts on TCP, you can find them under the computer network category on my personal blog. This post discusses the means by which TCP ensures reliable data transmission.
II. Main Text
2.1 Problems in network transmission
Before studying how TCP ensures reliable data transmission, let us first list the problems that exist in network transmission; only once a problem is identified can we find a way to deal with it. TCP relies on the network-layer IP protocol to send data, and IP is an unreliable protocol: it offers only best-effort delivery, guaranteeing neither that data arrives intact nor that it arrives at all. In addition, the maximum transmission unit (MTU) of a network link is limited, typically 1500 bytes. To transmit data larger than this, TCP must therefore split the data into multiple segments. For these reasons, network transmission faces the following problems:
- Data is corrupted during transmission: a bit 0 becomes 1, or a bit 1 becomes 0;
- Data is lost during transmission and never reaches its destination;
- Segments arrive at the receiver out of order, so the receiver cannot reassemble the data correctly;
The design of TCP essentially revolves around the three problems above, along with how to improve the transfer rate. Let us now look at what TCP does to deal with each of these problems.
2.2 How TCP handles data corruption
We first discuss the first problem, data corruption. Corruption is detected by running a check over the data. The TCP segment format includes a 16-bit field called the checksum, which the receiver uses to check the segment for errors.
The field is called the checksum because TCP checks the data with a checksum algorithm, which works as follows:
- The checksum field is set to 0, and the data is split into 16-bit units (in practice the TCP header and a pseudo-header taken from the IP header are included in the sum as well);
- The units are added together in binary. If the addition carries into a 17th bit, that carry bit is added back to the lowest bit (this is ones' complement addition, also known as end-around carry). The final sum is then inverted (0 becomes 1, 1 becomes 0) and placed in the checksum field;
- When the segment reaches the destination, the destination host performs the same summation, adds the result to the checksum field from the header, and inverts the total. If the result is 0, the data is considered error-free;
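The steps above can be sketched in a few lines of Python. This is only a minimal illustration of the 16-bit ones' complement sum over a piece of data; it leaves out the TCP header and pseudo-header, and the function names (ones_complement_sum, make_checksum, is_intact) are made up for this example.

```python
def ones_complement_sum(data: bytes) -> int:
    """Add the data as 16-bit big-endian words, wrapping any carry back in."""
    if len(data) % 2:              # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total


def make_checksum(data: bytes) -> int:
    """Sender side: invert the 16-bit sum to get the checksum field."""
    return ~ones_complement_sum(data) & 0xFFFF


def is_intact(data: bytes, checksum: int) -> bool:
    """Receiver side: sum plus checksum must be all ones, i.e. 0 once inverted."""
    total = ones_complement_sum(data) + checksum
    total = (total & 0xFFFF) + (total >> 16)
    return (~total & 0xFFFF) == 0


payload = b"hello, tcp"
checksum = make_checksum(payload)
print(is_intact(payload, checksum))          # the unchanged data passes the check
print(is_intact(b"hellp, tcp", checksum))    # one changed byte fails the check
```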
How does this detect errors? It is quite simple. Suppose no error occurs during transmission; then the sender and the receiver obtain the same sum. The checksum field holds the bitwise inverse of that sum: wherever the sum has a 1, the checksum has a 0, and wherever the sum has a 0, the checksum has a 1. So when the data is error-free, adding the sum and the checksum field must yield all 1s, and inverting that gives 0. If the final result is not 0, the receiver concludes that the data is in error (I did not find material on how the error is handled afterwards, I am ashamed to say).
However, can this algorithm always detect an error in the data? The answer is no. Fundamentally, the check just sums the data and then tests whether the sum has changed. We all know that 1 + 2 == 2 + 1 and 1 + 4 == 2 + 3, so a sum alone cannot guarantee that the data is unchanged. If several errors occur and cancel each other out, the algorithm will not detect them. TCP still uses this algorithm; I personally think the reason is that it is simple, the probability of data errors in the network is not high, and the probability of multiple errors that cancel each other out is smaller still, so the reliability of the algorithm remains relatively high.
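To make the cancellation concrete, here is a small sketch that mirrors the 1 + 4 == 2 + 3 example with 16-bit words. The helper checksum16 is a hypothetical name for this illustration and only handles even-length data.

```python
def checksum16(data: bytes) -> int:
    """16-bit ones' complement checksum over an even-length byte string."""
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF


original = bytes([0x00, 0x01, 0x00, 0x04])   # words 1 and 4
damaged = bytes([0x00, 0x02, 0x00, 0x03])    # words 2 and 3: two errors that cancel
swapped = bytes([0x00, 0x04, 0x00, 0x01])    # the same words in a different order

print(checksum16(original) == checksum16(damaged))   # same checksum, different data
print(checksum16(original) == checksum16(swapped))   # reordering is not detected either
```

All three byte strings sum to 5, so they share one checksum even though the data differs.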
2.3 How TCP handles packet loss
TCP's solution to data loss is timeout retransmission. TCP maintains a timer and sets a timeout; if no ACK arrives within the timeout after a segment is sent, the sender assumes the data was lost and retransmits the segment, repeating until an ACK is received. Because TCP uses pipelined transmission, several segments may be sent but not yet acknowledged at the same time. Logically, then, TCP would need to maintain multiple timers, one per segment, but that would be costly and would make timer management very complex. So in practice TCP maintains only a single timer, which tracks the earliest segment that has been sent but not yet acknowledged. When that segment times out, the sender retransmits it and restarts the timer. When the ACK for that segment is received, the timer is also restarted, but the earliest unacknowledged segment has now changed, so the timer now tracks this new segment. Triggering a fast retransmit also restarts the timer. For more on TCP pipelined transmission, see this blog post: https://www.cnblogs.com/tuyang1129/p/12450978.html .
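The single-timer idea can be sketched roughly as below. This is a toy model, not how any real TCP stack is written: time is passed in by hand, segments are identified only by their starting number, and the class and method names are invented for the example.

```python
from collections import deque


class SingleTimerSender:
    """Toy sender that keeps one timer for the oldest unacknowledged segment."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.unacked = deque()      # numbers of segments sent but not yet acknowledged
        self.timer_start = None     # None means the timer is not running
        self.retransmitted = []     # record of retransmissions, for inspection

    def send(self, seq: int, now: float) -> None:
        self.unacked.append(seq)
        if self.timer_start is None:        # start the timer only if it is idle
            self.timer_start = now

    def on_ack(self, ack: int, now: float) -> None:
        # Every segment that starts below `ack` counts as confirmed.
        acked_any = False
        while self.unacked and self.unacked[0] < ack:
            self.unacked.popleft()
            acked_any = True
        if acked_any:
            # Restart for the new oldest segment, or stop if nothing is outstanding.
            self.timer_start = now if self.unacked else None

    def on_tick(self, now: float) -> None:
        if self.timer_start is not None and now - self.timer_start >= self.timeout:
            self.retransmitted.append(self.unacked[0])   # resend the oldest segment only
            self.timer_start = now                       # and restart the timer


sender = SingleTimerSender(timeout=1.0)
sender.send(0, now=0.0)
sender.send(100, now=0.1)
sender.on_ack(100, now=0.5)    # segment 0 confirmed; the timer now covers segment 100
sender.on_tick(now=1.2)        # 0.7 s since the restart: no timeout yet
sender.on_tick(now=1.6)        # 1.1 s since the restart: segment 100 is retransmitted
print(sender.retransmitted)
```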
A complicated question arises here: how should the timeout be set? It is not hard to see that the timeout should be slightly larger than the round-trip time (RTT). For example, if it takes 200ms from sending the data until the ACK comes back, the timeout should be somewhat larger than that, say 400ms. But the network is unstable: packets take different paths and meet different levels of congestion, so the round-trip time varies from packet to packet. The timeout should therefore be based on an average RTT. A plain statistical average is too crude for TCP, however, so it uses a more elaborate algorithm to calculate the timeout.
To compute an approximate average RTT we first need sample values; call a sample RTT SampleRTT. While running, TCP measures a SampleRTT at some point, that is, the time from sending a segment until its ACK is received, and uses it to compute a weighted average of the RTT. After a while it measures again and uses the new SampleRTT to update the weighted average. Calling the weighted average EstimatedRTT, the TCP specification gives the following formula for computing it:
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
(Formula 1)
Here SampleRTT is the most recently measured sample, and the formula dynamically updates the weighted average of the RTT. Because more recent SampleRTT values better reflect the current state of the network, they should carry more weight than older ones when EstimatedRTT is updated. The TCP specification recommends setting α to 1/8, so the formula becomes:
EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT
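To see why newer samples weigh more, the recursion can be unrolled. In the sketch below, E_n is shorthand for EstimatedRTT after the n-th sample, S_n for the n-th SampleRTT, and E_0 for the initial estimate; these symbols are introduced only for this derivation.

```latex
\begin{aligned}
E_n &= (1-\alpha)\,E_{n-1} + \alpha\,S_n \\
    &= \alpha\,S_n + \alpha(1-\alpha)\,S_{n-1} + \alpha(1-\alpha)^2\,S_{n-2} + \dots + (1-\alpha)^n\,E_0 \\
    &= \alpha \sum_{k=0}^{n-1} (1-\alpha)^k\,S_{n-k} + (1-\alpha)^n\,E_0
\end{aligned}
```

With α = 1/8 the newest sample has weight 0.125, the one before it about 0.109, the one before that about 0.096, and so on, so the influence of old samples decays exponentially.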
[Figure: fluctuation of SampleRTT and EstimatedRTT]
Besides the weighted average of the RTT, we also need to know how much the RTT varies. After all, as the figure shows, the sample RTT fluctuates sharply, and EstimatedRTT alone is not enough for an accurate estimate of the timeout. So we need to measure how far SampleRTT deviates from EstimatedRTT, a quantity similar to a variance, and use it to set the timeout dynamically. Calling this deviation DevRTT, the TCP specification defines it as follows:
DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
(Formula 2)
As the formula shows, if SampleRTT fluctuates widely, DevRTT will be large, and otherwise it will be small. The TCP specification recommends a value of 0.25 for β. Now that we know both the weighted average of the RTT and how much it fluctuates, we can consider how to set the timeout. The timeout should clearly be larger than the weighted average EstimatedRTT, so that the RTT of most segments falls below it and unnecessary timeout retransmissions are avoided. But how much larger? Consider this: when the network fluctuates heavily, actual RTTs tend to lie far from EstimatedRTT; when it fluctuates little, they stay close to EstimatedRTT. We have already computed this fluctuation: it is DevRTT from the formula above. Calling the timeout TimeoutInterval, the TCP specification recommends computing it as:
TimeoutInterval = EstimatedRTT + 4 * DevRTT
(Formula 3)
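Putting the three formulas together, a timeout estimator might look roughly like the sketch below. The class name RttEstimator is invented here, and two details are assumptions on my part rather than something stated above: the state is initialised from the first sample (with half of it as the deviation), and DevRTT is updated before EstimatedRTT on each new sample.

```python
class RttEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval from them."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        # Assumed initialisation: start from the first sample, half of it as deviation.
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt: float) -> None:
        # Assumed ordering: DevRTT first, against the EstimatedRTT from before this sample.
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt

    @property
    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=200.0)      # all values in milliseconds
for sample in [220.0, 180.0, 400.0, 210.0]:
    estimator.update(sample)
    print(f"sample={sample:.0f}ms  timeout={estimator.timeout_interval:.1f}ms")
```

A spike such as the 400ms sample raises both terms, so the timeout jumps and then relaxes gradually as calmer samples arrive.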
In this way both the weighted average and the fluctuation of the network are taken into account. The TCP standard recommends an initial TimeoutInterval of 1s (only after reading this part of the book did I deeply appreciate the power of mathematics in turning theory into practice). There are, however, two special cases in computing the timeout:
- When a segment times out, the sender retransmits it and restarts the timer, but the timeout is set to twice its previous value instead of the value computed with Formula 3. If the retransmission also times out, the timeout is doubled again, and so on until the segment is successfully acknowledged; after that, the timeout is again computed with Formula 3. The goal is to keep repeated timeouts from causing continuous retransmissions that make network congestion even worse, since timeouts are usually a result of congestion.
- When recording sample RTTs, retransmitted segments are not used as samples. When a timeout occurs, the sender cannot tell whether the data was lost or merely delayed. If the segment was only delayed and the ACK for the original arrives after the retransmission has been sent, the sender would mistake it for the ACK of the retransmitted segment and measure a wrong SampleRTT.
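The two special cases can be modelled in a few lines. This sketch follows the description above literally (the timeout returns to the computed value as soon as an ACK arrives); the class RetransmitTimer and its fields are hypothetical names for this example.

```python
class RetransmitTimer:
    """Toy model of timeout doubling and of skipping retransmitted samples."""

    def __init__(self, computed_timeout: float):
        self.computed_timeout = computed_timeout   # value from the TimeoutInterval formula
        self.current_timeout = computed_timeout
        self.samples = []                          # SampleRTT values accepted for averaging

    def on_timeout(self) -> None:
        self.current_timeout *= 2                  # double on every consecutive timeout

    def on_ack(self, rtt: float, was_retransmitted: bool) -> None:
        if not was_retransmitted:
            self.samples.append(rtt)               # only unambiguous measurements count
        self.current_timeout = self.computed_timeout   # back to the computed value


timer = RetransmitTimer(computed_timeout=1.0)
timer.on_timeout()
timer.on_timeout()
timer.on_timeout()
print(timer.current_timeout)                    # doubled three times
timer.on_ack(0.9, was_retransmitted=True)       # ambiguous ACK: no sample recorded
timer.on_ack(0.3, was_retransmitted=False)      # clean measurement: recorded
print(timer.current_timeout, timer.samples)
```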
In short, TCP's timeout retransmission mechanism solves the problem of data loss in the network well. To improve efficiency, TCP also has a fast retransmit mechanism that, under certain conditions, determines that a segment is lost before the timeout expires and retransmits it; it is not described in detail here.
2.4 How TCP handles out-of-order arrival
The third problem is data arriving out of order. Because of network limits, TCP must split large data into smaller pieces, encapsulate them in TCP segments, and send them one by one. Because of the uncertainty of network transmission (segments taking different paths, a segment being lost and then retransmitted, and so on), these segments may not arrive in the order they were sent. So for the receiver to obtain the complete data and assemble the segments in order, TCP needs a mechanism to solve this problem.
TCP's approach is to assign each segment a sequence number that increases from segment to segment, so that the receiver can tell from the sequence number which part of the data a received segment carries and whether all parts have arrived. The TCP segment format mentioned above includes a 32-bit sequence number field. However, TCP segment numbers are not as simple as 0, 1, 2, 3, ...; what concerns us here is how TCP implements this numbering.
First we must make one point clear: TCP numbers bytes, not segments. TCP assigns a number to every byte of the data to be transmitted: the first byte is number 0, the second is number 1, and so on. A segment usually carries more than one byte, so the sequence number of a segment is the number of the first byte of data it carries. For example, suppose a sender wants to send 250 bytes of data and the initial sequence number is 0, so the bytes are numbered 0-249. Suppose each segment can carry at most 100 bytes. The first segment then carries the 1st through 100th bytes, numbered 0-99, and 0 is placed in the sequence number field of its header. The second segment carries bytes 100-199, so its sequence number is 100, and the third carries bytes 200-249, so its sequence number is 200. This is how TCP numbers segments on the sending side.
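The sender-side numbering in this example can be reproduced with a short sketch. The function name segment and its parameters are made up for illustration, and wrap-around of the 32-bit sequence number is ignored.

```python
def segment(data: bytes, mss: int, initial_seq: int = 0):
    """Split data into (sequence_number, payload) pairs of at most mss bytes."""
    return [
        (initial_seq + offset, data[offset:offset + mss])
        for offset in range(0, len(data), mss)
    ]


segments = segment(bytes(250), mss=100)
for seq, payload in segments:
    print(seq, len(payload))
```

For 250 bytes and at most 100 bytes per segment this gives sequence numbers 0, 100 and 200 with lengths 100, 100 and 50, matching the example above.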
Now let us see how the numbering works on the receiving side. Using the same three segments, suppose the sender sends them and the receiver first receives the segment with sequence number 0 carrying 100 bytes of data. The receiver must confirm to the sender that the segment was received, and it does so with the acknowledgment number field of the TCP header. Having received the segment with sequence number 0 and length 100 bytes, the receiver fills in acknowledgment number 100 in its ACK, meaning that it has received all bytes numbered below 100 and that the next segment it expects has sequence number 100. When the second segment arrives in order, with sequence number 100 and length 100 bytes, the receiver sends back another ACK, this time with acknowledgment number 200, meaning that it has received all bytes before 200 and expects sequence number 200 next. When the segment with sequence number 200 and length 50 bytes arrives, it sends back an ACK with acknowledgment number 250.
The above describes segments received in order. Now suppose the three segments arrive in the order 0 -> 200 -> 100, that is, out of order. Then the following happens:
- The receiver receives the segment with sequence number 0 and length 100 bytes, and sends back an ACK with acknowledgment number 100;
- The segment with sequence number 200 and length 50 bytes arrives. The receiver was expecting sequence number 100, so it concludes that segments have arrived out of order. It does not deliver this data to the upper layer but places it in the receive buffer;
- The segment with sequence number 100 and length 100 bytes arrives. This is exactly the segment the receiver was expecting, so it is accepted and delivered to the upper layer. The receiver then finds the segment with sequence number 200 in the receive buffer, which is exactly the next one it expects, so it takes it out and delivers it to the upper layer as well. It then sends the sender an ACK with acknowledgment number 250, meaning that it has received all bytes before 250 and expects sequence number 250 next;
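The three steps above can be traced with a toy receiver. This is a simplified sketch with invented names (Receiver, receive); it also assumes that the receiver answers the out-of-order segment by repeating the acknowledgment number it is still waiting for, which the list above does not spell out.

```python
class Receiver:
    """Toy receiver with cumulative ACKs and a buffer for out-of-order segments."""

    def __init__(self, initial_seq: int = 0):
        self.expected = initial_seq   # number of the next byte the receiver wants
        self.buffer = {}              # out-of-order segments, keyed by sequence number
        self.delivered = b""          # data handed to the upper layer, in order

    def receive(self, seq: int, payload: bytes) -> int:
        """Process one segment and return the acknowledgment number to send back."""
        if seq > self.expected:
            self.buffer[seq] = payload          # hold it until the gap is filled
        elif seq == self.expected:
            self.delivered += payload
            self.expected += len(payload)
            # Drain any buffered segments that are now in order.
            while self.expected in self.buffer:
                chunk = self.buffer.pop(self.expected)
                self.delivered += chunk
                self.expected += len(chunk)
        # Segments with seq < expected are duplicates and are ignored here.
        return self.expected


data = bytes(range(250))
receiver = Receiver()
acks = [
    receiver.receive(0, data[0:100]),
    receiver.receive(200, data[200:250]),   # arrives early and is buffered
    receiver.receive(100, data[100:200]),   # fills the gap
]
print(acks)
print(receiver.delivered == data)
```

With the arrival order 0 -> 200 -> 100 the acknowledgment numbers come out as 100, 100 and 250, and the upper layer still receives the bytes in their original order.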
Through this mechanism the receiver solves the problem of data arriving out of order. Of course, the numbering mechanism involves more than this, including TCP's pipelined transmission; if you want to know more, see my other blog post: https://www.cnblogs.com/tuyang1129/p/12450978.html .
There is one more issue. In the example above I assumed that numbering starts at 0, but that is not actually the case. In real implementations the initial sequence number is generally a random value computed by a dedicated algorithm, for two reasons:
- Suppose every TCP connection started numbering from 0. Suppose a client sends a segment to the server and, before any acknowledgment is received, the connection is closed; the two then immediately establish a new connection, and only now does the segment sent earlier reach the server. What happens? The server would take it for data sent on the newly established connection, and because both connections start at 0, the receiver would accept the segment. To reduce the probability of such situations, TCP randomizes the initial sequence number so that the initial sequence numbers of two connections very probably differ, and in this situation the receiver will not accept the stale segment;
- The second reason is security. If the initial sequence number were fixed, the sequence number of every segment could be inferred, and an attacker could use it to forge TCP segments that appear to come from the sender and mount attacks, such as sending a large number of connection requests to tie up server resources;
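As a very rough illustration of a random starting point, the sketch below draws an unpredictable 32-bit value and numbers bytes relative to it. It is a deliberate simplification: real implementations use a dedicated algorithm rather than a plain random draw, and the handshake details are ignored. Both function names are hypothetical.

```python
import secrets

SEQ_SPACE = 2 ** 32   # the sequence number field is 32 bits wide


def random_initial_seq() -> int:
    """Pick an unpredictable 32-bit initial sequence number (simplified)."""
    return secrets.randbits(32)


def byte_seq(initial_seq: int, offset: int) -> int:
    """Sequence number of the byte at `offset` in the stream, wrapping at 2**32."""
    return (initial_seq + offset) % SEQ_SPACE


isn = random_initial_seq()
print(isn, byte_seq(isn, 100), byte_seq(isn, 200))
```

Two connections opened back to back are then very unlikely to start at the same number, so a stale segment from the old connection will almost certainly fall outside what the new connection expects.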
2.5 TCP flow control and congestion control
Strictly speaking, flow control and congestion control are not part of TCP's reliable transport mechanism, but they are related to it, so I mention them here.
- Flow control: the TCP receiver maintains a receive buffer that holds data sent by the sender. The receive buffer is not infinite, however; if data keeps arriving once the buffer is full, it cannot be stored and can only be discarded. To reduce the chance of this happening, the TCP receiver tells the sender how much data it can still accept, and the sender limits how much it sends based on this information. This is flow control;
- Congestion control: this is similar to flow control, except that what limits the amount of data sent is not the receiver but the routers. Routers have buffers too, and if too much data accumulates in a router's buffer, network transmission suffers, which is where network packet loss comes from. Congestion control regulates how much data is sent according to the congestion state of the network;
Of these two mechanisms, flow control is the simpler. The TCP segment format has a field called the window size, through which the receiver tells the sender how much data it can currently accept at most, and the sender keeps the amount of data it sends within this window. There is one special case, however. A window size of 0 means the receive buffer is currently full, and normally the sender would stop sending data. In practice, though, the sender keeps sending one byte of data to the receiver as a probe. The reason is that the receiver does not normally send segments to the sender on its own; the window size is usually carried in ACKs. If the sender stopped sending entirely when the window size is 0, the receiver would have no occasion to send an ACK, and the sender would never learn that the buffer has since been freed. So even when the window size is 0, the sender still sends probe data, and once the buffer has space again, the ACK for a probe segment lets the sender know.
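The sender's side of this rule can be written as a tiny function. It is a simplified sketch with hypothetical names (sendable_bytes, last_byte_sent, last_byte_acked), and it leaves out the timing of when probes are actually sent.

```python
def sendable_bytes(window: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """How many new bytes the sender may send under the advertised window."""
    in_flight = last_byte_sent - last_byte_acked   # sent but not yet acknowledged
    if window == 0:
        # Zero window: send a 1-byte probe once nothing else is outstanding,
        # so that the receiver's ACK can report when space is available again.
        return 1 if in_flight == 0 else 0
    return max(0, window - in_flight)


print(sendable_bytes(window=1000, last_byte_sent=500, last_byte_acked=200))
print(sendable_bytes(window=0, last_byte_sent=500, last_byte_acked=500))
```

In the first call 300 bytes are still unacknowledged, so 700 more may be sent; in the second the window is closed and only the one-byte probe goes out.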
Congestion control is a relatively complex TCP mechanism that cannot be explained in a few words. I wrote a separate blog post about it; if you are interested, you can read it here: https://www.cnblogs.com/tuyang1129/p/12439862.html .
III. Summary
That concludes this description of TCP reliable transmission. The content above is a basic introduction to the principles behind it; actual implementations may refine and optimize on this foundation. TCP's various mechanisms complement one another, so if you do not yet know much about TCP, some of the explanations here may be hard to follow. If you really want to understand TCP and other computer networking topics, I recommend buying a book and studying systematically. I hope this post helps those who read it; if anything in it is wrong, corrections are welcome.
IV. References
"Computer Networking: A Top-Down Approach" (7th edition)