Computer Networks: How TCP Achieves Reliable Data Transmission

I. Introduction

  Over the past few days I have written four blog posts in my TCP series. This is the fifth, and I expect it to be the last for now; with it I will wrap up my study of this layer. If you are interested in my other posts about TCP, you can find them under the Computer Network category on my personal blog. This post discusses the means by which TCP ensures reliable data transmission.


II. Main Text

 2.1 Problems in network transmission

  Before looking at how TCP ensures reliable data transmission, let us first list the problems that network transmission suffers from; only once a problem is identified can we find a way to deal with it. TCP relies on the network-layer IP protocol to send data, and IP is an unreliable protocol: it offers only best-effort delivery, with no guarantee that data arrives intact or even that it arrives at all. In addition, the maximum transmission unit (MTU) that a link allows is limited, typically 1500 bytes. To transmit more data than this, TCP has to split the data into multiple segments. As a result, network transmission runs into the following problems:

  1. Data is corrupted in transit: a 0 bit becomes 1, or a 1 bit becomes 0;
  2. Data is lost in transit and never reaches its destination;
  3. Segments arrive at the receiver out of order, so the receiver cannot reassemble the data correctly;

  TCP's implementation is essentially built around these three problems, together with the question of how to improve the transfer rate while solving them. Let us look at what TCP does to handle each one.


 2.2 How TCP handles data corruption

  Let us start with the first problem, data corruption. Corruption is detected by verifying the data. Looking at the TCP segment format, we can see a 16-bit field called the checksum, which the receiver uses to check the segment for errors.

  The field is called the checksum because TCP verifies data with a checksum algorithm, which works as follows:

  1. Set the checksum field to 0, then split the data into 16-bit units (in real TCP the sum also covers the TCP header and a pseudo-header built from the IP addresses);
  2. Add the units together in binary. Whenever the addition carries out into a 17th bit, add that carry back into the lowest bit (this is one's-complement addition, also known as end-around carry). Invert the final sum (0 becomes 1, 1 becomes 0) and store it in the checksum field;
  3. When the segment reaches the destination, the destination host performs the same addition over the received data, adds the checksum field from the header to the result, and inverts the total. If the result is 0, the data is considered error-free;

  How does this detect errors? It is quite simple. Assume the data was not corrupted in transit; then the sender and the receiver compute the same sum. The checksum field holds the bitwise inverse of that sum: wherever the sum has a 1 the checksum has a 0, and wherever the sum has a 0 the checksum has a 1. So when the data is intact, adding the sum and the checksum field must give all 1s, and inverting that gives 0. If the final result is anything other than 0, the receiver concludes that the data is corrupted. (I could not find a detailed description of what happens after an error is detected; as far as I know, the receiver simply discards the segment, and the retransmission mechanism described in 2.3 recovers it.)

  But is this algorithm guaranteed to detect every error? The answer is no. At bottom, the check just sums the data and then tests whether the sum has changed. We all know that 1 + 2 == 2 + 1 and 1 + 4 == 2 + 3, so a sum alone cannot guarantee that the data is unchanged: if several errors occur and cancel each other out, the algorithm will not notice. TCP still uses it, and I think the reasons are that it is simple, that the probability of corruption on the network is not high to begin with, and that the probability of several errors cancelling out exactly is lower still, so the algorithm remains fairly reliable.
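  The checksum steps and their blind spot can be tried out with a small sketch. This is only an illustration under simplifying assumptions: it sums a few made-up 16-bit data words, whereas real TCP also includes the header and a pseudo-header, and the function names are my own.

```python
def ones_complement_sum(words):
    """Add 16-bit words, folding any carry out of bit 16 back into the low bits."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def make_checksum(words):
    """Sender side: invert the one's-complement sum of the words."""
    return ~ones_complement_sum(words) & 0xFFFF


def verify(words, checksum):
    """Receiver side: sum plus checksum must be all ones, so its inverse is 0."""
    return (~ones_complement_sum(list(words) + [checksum]) & 0xFFFF) == 0


data = [0x4500, 0x0073, 0xF0F0]  # made-up 16-bit words
checksum = make_checksum(data)

ok = verify(data, checksum)                           # unchanged data
flipped = verify([0x4500, 0x0072, 0xF0F0], checksum)  # one bit flipped
swapped = verify([0x0073, 0x4500, 0xF0F0], checksum)  # two words swapped
print(ok, flipped, swapped)
```

  The single flipped bit is caught, but swapping two words leaves the sum unchanged, so that change passes the check. This is the "errors cancel each other out" case described above.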


 2.3 How TCP handles packet loss

  TCP's solution to data loss is timeout retransmission. TCP maintains a timer and sets a timeout; if no ACK arrives within the timeout after a segment is sent, the sender assumes the data was lost and retransmits the missing segment, repeating until it is acknowledged. Because TCP uses pipelined transmission, several segments may have been sent but not yet acknowledged at the same moment, so logically TCP would need one timer per segment. That would be costly, and managing the timers would be complicated, so in practice TCP maintains only a single timer, which tracks the earliest segment that has been sent but not yet acknowledged. When this segment times out, the sender retransmits it and restarts the timer. When the ACK for this segment arrives, the timer is also restarted, but now the earliest unacknowledged segment has changed, so the timer tracks that new segment instead. The timer is also restarted when a fast retransmit is triggered. For more on TCP's pipelined transmission, see this blog post of mine: https://www.cnblogs.com/tuyang1129/p/12450978.html .
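  The single-timer idea can be sketched as a toy sender. Everything here is a simplification for illustration: the class name, the millisecond clock passed in as `now`, and the `tick` method are assumptions of mine, not how a real TCP stack is organized.

```python
class SingleTimerSender:
    """Toy sender that keeps one timer for the oldest unacknowledged segment."""

    def __init__(self, timeout):
        self.timeout = timeout      # timeout in milliseconds
        self.unacked = []           # (seq, length) pairs in send order
        self.timer_deadline = None  # None means the timer is stopped
        self.retransmitted = []     # sequence numbers resent after a timeout

    def send(self, seq, length, now):
        self.unacked.append((seq, length))
        if self.timer_deadline is None:  # the timer only tracks the oldest segment
            self.timer_deadline = now + self.timeout

    def on_ack(self, ack, now):
        # Cumulative ACK: every byte numbered below `ack` has been received.
        remaining = [(s, n) for s, n in self.unacked if s + n > ack]
        if len(remaining) == len(self.unacked):
            return  # nothing new was acknowledged
        self.unacked = remaining
        # Restart for the new oldest segment, or stop if nothing is outstanding.
        self.timer_deadline = now + self.timeout if remaining else None

    def tick(self, now):
        if self.timer_deadline is not None and now >= self.timer_deadline:
            seq, _ = self.unacked[0]
            self.retransmitted.append(seq)  # resend only the oldest segment
            self.timer_deadline = now + self.timeout


s = SingleTimerSender(timeout=1000)
s.send(0, 100, now=0)
s.send(100, 100, now=100)
s.on_ack(100, now=500)  # first segment acknowledged, timer restarts for seq 100
s.tick(now=1200)        # deadline is now 1500, so nothing happens yet
s.tick(now=1600)        # timeout: the segment with seq 100 is retransmitted
print(s.retransmitted, s.timer_deadline)
```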

  This raises a tricky question: how long should the timeout be? It is easy to see that it should be somewhat larger than the round-trip time (RTT). For example, if it takes 200 ms from the moment the sender transmits the data until the ACK comes back, the timeout should be somewhat larger than that, say 400 ms. But the network is not stable: each segment may take a different path and meet a different level of congestion, so the round-trip time varies from segment to segment. The timeout should therefore be based on an average RTT. A plain arithmetic mean is too crude, however, so TCP uses a more elaborate algorithm to compute the timeout.

  To compute an approximate average RTT we first need sample values; call a sample RTT SampleRTT. While it is running, TCP measures a SampleRTT at some moment, that is, the time from sending a segment until its ACK is received, and uses it to compute a weighted average of the RTT. Some time later it measures again and uses the new SampleRTT to update the weighted average. Call this weighted average EstimatedRTT. In the TCP specification, the formula for EstimatedRTT is:

EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT    (Formula 1)

  Here SampleRTT is the most recently measured sample, so the formula updates the weighted average of the RTT dynamically. The more recent a SampleRTT is, the better it reflects the current state of the network, so recent samples should carry more weight than old ones when EstimatedRTT is updated. The formula has this property: each time it is applied, the contribution of every earlier sample is multiplied by (1 - α), so older samples fade out gradually. The TCP specification recommends α = 1/8, which turns the formula into:

EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT
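  To see how this weighted average behaves, here is a minimal sketch. The starting estimate of 100 ms and the three samples are made-up values, and the helper names are mine.

```python
ALPHA = 0.125  # recommended weight of the newest sample


def update_estimated_rtt(estimated_rtt, sample_rtt, alpha=ALPHA):
    """Apply the EstimatedRTT update once."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


def sample_weights(n, alpha=ALPHA):
    """Weight of each of the last n samples, newest first: alpha * (1 - alpha)**k."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


estimated = 100.0  # ms, assumed starting estimate
for sample in [120.0, 80.0, 200.0]:
    estimated = update_estimated_rtt(estimated, sample)
    print(f"sample={sample:6.1f} ms  EstimatedRTT={estimated:8.4f} ms")

print(sample_weights(3))
```

  The weights printed at the end shrink from one sample to the next, which is why a sample matters less and less to EstimatedRTT as it ages.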

  The fluctuation of SampleRTT and EstimatedRTT is shown in the following figure:

  [Figure: SampleRTT and EstimatedRTT over time]

  Besides the weighted average, we also need to know how much the RTT varies. As the figure shows, the sample RTT fluctuates heavily, so EstimatedRTT alone is not enough to estimate the timeout accurately. We therefore need the degree to which SampleRTT deviates from EstimatedRTT, a quantity similar to the variance, and use it to set the timeout dynamically. Call this deviation DevRTT. The TCP specification defines its calculation as follows:

DevRTT = (1 - β) * DevRTT + β * | SampleRTT - EstimatedRTT |    (Formula 2)

  The formula shows that DevRTT is large when SampleRTT fluctuates widely and small otherwise. The TCP specification recommends β = 0.25. Now that we know both the weighted average of the RTT and how much it fluctuates, we can consider how to set the timeout. The timeout should clearly be larger than the weighted average EstimatedRTT, so that the RTT of most segments stays below it and unnecessary timeout retransmissions are avoided. How much larger? When the network fluctuates heavily, the actual RTT can be far from EstimatedRTT; when it fluctuates little, the actual RTT stays close to EstimatedRTT. We have already computed a measure of this fluctuation, namely DevRTT. So, calling the timeout TimeoutInterval, the TCP specification recommends computing it as:

TimeoutInterval = EstimatedRTT + 4 * DevRTT    (Formula 3)
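  Putting the EstimatedRTT, DevRTT and TimeoutInterval formulas together gives roughly the sketch below. The starting state (100 ms estimate, 10 ms deviation) is made up, and the order of the two updates is my assumption: here DevRTT is computed against the freshly updated EstimatedRTT, and real implementations may order the steps differently.

```python
ALPHA = 0.125  # weight of the newest sample in EstimatedRTT
BETA = 0.25    # weight of the newest deviation in DevRTT


def update_timeout(estimated_rtt, dev_rtt, sample_rtt):
    """Apply the three formulas once.

    Returns (EstimatedRTT, DevRTT, TimeoutInterval), all in the unit of the inputs.
    """
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    return estimated_rtt, dev_rtt, estimated_rtt + 4 * dev_rtt


# Assumed starting state: 100 ms estimate, 10 ms deviation.
steady = update_timeout(100.0, 10.0, 100.0)  # sample matches the estimate
spike = update_timeout(100.0, 10.0, 300.0)   # sample far from the estimate
print(steady)
print(spike)
```

  A sample that matches the estimate lets the timeout shrink, while a single large spike widens it considerably because the deviation term is multiplied by 4.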

  In this way both the weighted average and the fluctuation of the network RTT are taken into account. The TCP standard recommends an initial TimeoutInterval of 1 s. (Reading this part of the book made me truly appreciate how powerful mathematics is at turning theory into practice.) There are, however, two exceptions to this calculation:

  • When a segment times out, the sender retransmits it and restarts the timer, but this time the timeout is set to twice the previous value rather than the value computed with Formula 3. If the segment times out again, the timeout is doubled again, and so on until the segment is successfully acknowledged; after that, the timeout is once more computed with Formula 3. The goal is to prevent repeated timeouts from causing back-to-back retransmissions that make network congestion even worse, since timeouts are usually the result of congestion in the first place.
  • When recording RTT samples, retransmitted segments are never used. When a timeout occurs, the sender cannot tell whether the data was lost or merely delayed. If the segment was only delayed, then shortly after the retransmission the delayed ACK for the original segment arrives; the sender would mistake it for the ACK of the retransmitted segment and measure a wrong SampleRTT.
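  The two exceptions can be sketched as follows. This is a simplified illustration: the class and method names are hypothetical, and instead of recomputing the timeout from fresh samples after an ACK, it just falls back to a fixed "computed" value.

```python
class RetransmissionTimer:
    """Toy model of timeout doubling and of skipping ambiguous RTT samples."""

    def __init__(self, computed_timeout):
        self.computed_timeout = computed_timeout  # value from the TimeoutInterval formula
        self.current_timeout = computed_timeout
        self.samples = []                         # RTT samples accepted for averaging

    def on_timeout(self):
        self.current_timeout *= 2  # double after every consecutive timeout

    def on_ack(self, rtt, was_retransmitted):
        if not was_retransmitted:
            self.samples.append(rtt)  # only unambiguous measurements are kept
        self.current_timeout = self.computed_timeout  # back to the computed value


t = RetransmissionTimer(computed_timeout=400)  # milliseconds
t.on_timeout()
t.on_timeout()
backed_off = t.current_timeout              # doubled twice
t.on_ack(rtt=900, was_retransmitted=True)   # ambiguous, not used as a sample
t.on_ack(rtt=210, was_retransmitted=False)  # clean measurement
print(backed_off, t.current_timeout, t.samples)
```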

  In short, TCP's timeout retransmission mechanism solves the problem of data loss in the network well. To improve efficiency, TCP also has a fast retransmit mechanism, which under certain conditions can determine that a segment has been lost before the timeout expires and retransmit it early; it is not described in detail here.


 2.4 How TCP handles out-of-order arrival

  The third problem is data arriving out of order. Because of network limits, TCP must split larger data into smaller pieces, encapsulate each piece in a TCP segment, and send the segments one by one. Because transmission over the network is unpredictable (segments may take different paths, or a segment may be lost and then retransmitted, and so on), the segments do not necessarily arrive in the order they were sent. TCP therefore needs a mechanism that lets the receiver tell whether it has received the complete data and reassemble the segments in the right order.

  TCP's approach is to give each segment a sequence number that increases from segment to segment, so that from the sequence number the receiver can determine which part of the data a segment carries and whether all parts have arrived. In the TCP segment format mentioned above, there is a 32-bit sequence number field. However, TCP does not simply number its segments 0, 1, 2, 3, ...; what we care about here is how TCP actually implements this numbering.

  First, one point must be made clear: TCP numbers bytes, not segments. TCP assigns a number to every byte of the data to be transmitted: the first byte is number 0, the second is number 1, and so on. Each segment usually carries many bytes, so the sequence number of a segment is the number of the first byte of the data it carries. For example, suppose a sender wants to send 250 bytes and the initial sequence number is 0, so the bytes are numbered 0-249. Suppose each segment can carry at most 100 bytes. The first segment then carries the 1st through 100th bytes, numbered 0-99, and 0 is placed in the sequence number field of its header. The second segment carries bytes 100-199, so its sequence number is 100. The third segment carries bytes 200-249, so its sequence number is 200. This is how the sender side of TCP handles numbering.
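  The 250-byte example can be reproduced with a few lines of code. This is just a sketch of the numbering rule; the function name and the `mss` parameter (maximum payload per segment) are my own labels.

```python
def segment(data, mss, initial_seq=0):
    """Split data into (sequence_number, payload) pairs of at most mss bytes.

    The sequence number is the number of the first byte in the payload,
    wrapped to 32 bits to match the size of the header field.
    """
    return [((initial_seq + i) % 2**32, data[i:i + mss])
            for i in range(0, len(data), mss)]


segments = segment(bytes(250), mss=100)
for seq, payload in segments:
    print(f"seq={seq:3d} length={len(payload)}")
```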

  Now let us see how the numbering works on the receiving side, using the same three segments. Suppose the sender sends them and the receiver gets the first one, with sequence number 0 and 100 bytes of data. The receiver must confirm to the sender that the segment has arrived, and it does so with the acknowledgment number field of the TCP header. Having received the segment with sequence number 0 and length 100, the receiver puts acknowledgment number 100 in its ACK, meaning that all bytes numbered below 100 have been received and the next segment it expects has sequence number 100. When the second segment arrives in order, with sequence number 100 and length 100, the receiver sends back an ACK with acknowledgment number 200, meaning that all bytes before 200 have been received and the next expected sequence number is 200. When the segment with sequence number 200 and length 50 arrives, the receiver sends back an ACK with acknowledgment number 250.

  That was the in-order case. Now suppose the three segments arrive in the order 0 -> 200 -> 100, that is, out of order. Then the following happens:

  1. The receiver gets the segment with sequence number 0 and length 100, and sends back an ACK with acknowledgment number 100;
  2. The receiver gets the segment with sequence number 200 and length 50. It was expecting sequence number 100, so it concludes that segments have arrived out of order. It does not deliver this data to the upper layer but stores it in the receive buffer (any ACK it sends at this point still carries acknowledgment number 100);
  3. The receiver gets the segment with sequence number 100 and length 100. This is exactly the segment it was expecting, so it accepts it and delivers it to the upper layer. It then finds the segment with sequence number 200 in the receive buffer, which is now the next one expected, so it takes that segment out and delivers it as well. Finally it sends the sender an ACK with acknowledgment number 250, meaning that all bytes before 250 have been received and the next expected sequence number is 250;
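  The out-of-order scenario above can be replayed with a small receiver sketch. The class is hypothetical and heavily simplified: it ignores duplicates and sequence number wraparound, and it assumes that the receiver answers every segment with an ACK carrying the next expected byte number, including the out-of-order one.

```python
class Receiver:
    """Toy receiver that buffers out-of-order segments and acknowledges cumulatively."""

    def __init__(self, initial_seq=0):
        self.expected = initial_seq   # number of the next byte we want
        self.buffer = {}              # out-of-order payloads keyed by sequence number
        self.delivered = bytearray()  # data handed to the upper layer

    def receive(self, seq, payload):
        """Process one segment and return the acknowledgment number to send."""
        if seq > self.expected:
            self.buffer[seq] = payload  # gap detected: hold the segment back
        elif seq == self.expected:
            self.delivered += payload
            self.expected += len(payload)
            # Drain buffered segments that have now become in-order.
            while self.expected in self.buffer:
                nxt = self.buffer.pop(self.expected)
                self.delivered += nxt
                self.expected += len(nxt)
        # seq < expected would be a duplicate; it is ignored in this sketch
        return self.expected


data = bytes(range(250))
r = Receiver()
acks = [
    r.receive(0, data[0:100]),
    r.receive(200, data[200:250]),  # arrives early and is buffered
    r.receive(100, data[100:200]),  # fills the gap
]
print(acks, bytes(r.delivered) == data)
```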

  With this mechanism the receiver solves the problem of data arriving out of order. The numbering mechanism is actually used for more than this, since it is also tied into TCP's pipelined transmission; if you want to know more, see my other blog post: https://www.cnblogs.com/tuyang1129/p/12450978.html .

  There is one more issue. In the examples above I assumed that numbering starts at 0, but that is not actually the case. In real implementations the initial sequence number is generally a random value produced by a dedicated algorithm, for two reasons:

  1. Suppose every TCP connection started numbering at 0. A client sends a segment to the server and, before receiving any acknowledgment, disconnects; the two then immediately establish a new connection, and only now does the segment from the first connection reach the server. What happens? The server takes it for data sent over the new connection, and since both connections start at 0, the receiver accepts the segment. To reduce the probability of this, TCP randomizes the initial sequence number, so that two different connections very likely have different initial sequence numbers and the receiver can reject the stale segment;
  2. The second reason is security. If the initial sequence number were fixed, the sequence number of every segment could be inferred, and an attacker could use this to forge TCP segments that appear to come from the sender and launch attacks, such as sending a large number of connection requests to tie up server resources;


 2.5 TCP flow control and congestion control

  Strictly speaking, flow control and congestion control are not part of TCP's reliable transmission mechanism, but they are related, so I will mention them briefly.

  • Flow control: the TCP receiver maintains a receive buffer that holds data sent by the sender. The buffer is not infinite: once it is full, any further data cannot be accepted and has to be discarded. To reduce how often this happens, the receiver tells the sender how much data it can currently accept, and the sender limits what it sends accordingly. This is flow control;
  • Congestion control: this is similar to flow control, except that what limits the amount of data sent is not the receiver but the routers. Routers also have buffers; if too much data piles up in a router's buffer, network transmission suffers, and this is one cause of packet loss in the network. Congestion control regulates how much data is sent according to how congested the network is;

  Of the two mechanisms, flow control is the simpler. The TCP segment format has a field called the window size, which the receiver uses to tell the sender how much data it can currently accept at most; the sender keeps the amount of data it sends within this window. There is one special case. If the window size is 0, the receive buffer is full and the sender would normally send nothing, but in practice the sender still sends one byte of data to the receiver as a probe. The reason is that the receiver does not usually send segments to the sender on its own initiative; the window size normally travels in ACKs. If the window size is 0 and the sender stops sending, the receiver has nothing to acknowledge and so sends no ACKs, and the sender would never learn that the buffer has been emptied. So even with a window size of 0 the sender keeps sending probe data; once the buffer has been cleared, the ACK for a probe segment tells the sender so.
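  A rough sketch of the window rule and the zero-window probe follows. The function names, the 4096-byte buffer and the other numbers are all made up for illustration, and the timing of probes, which a real TCP stack controls with a timer, is left out.

```python
def advertised_window(buffer_size, buffered):
    """Receiver side: free space left in the receive buffer."""
    return buffer_size - buffered


def bytes_to_send(pending, in_flight, window):
    """Sender side: how many bytes may be transmitted right now."""
    if window == 0:
        return min(1, pending)  # zero window: send a 1-byte probe
    return max(0, min(pending, window - in_flight))


window = advertised_window(buffer_size=4096, buffered=4096)  # buffer is full
probe = bytes_to_send(pending=500, in_flight=0, window=window)

window = advertised_window(buffer_size=4096, buffered=3796)  # 300 bytes were read
normal = bytes_to_send(pending=500, in_flight=0, window=window)
print(probe, normal)
```

  With a full buffer the sender is limited to a one-byte probe; once the application has read 300 bytes, the ACK for that probe would carry the new window and the sender could send up to 300 bytes.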

  Congestion control is one of TCP's more complex mechanisms and cannot be explained in a few words. I have written a separate blog post about it; if you are interested, see: https://www.cnblogs.com/tuyang1129/p/12439862.html .


III. Summary

  That concludes this description of TCP reliable transmission. The content above is a basic introduction to the principles; concrete implementations may refine and optimize on this foundation. TCP's mechanisms complement one another, so if you do not know much about TCP, some parts of this post may be hard to follow. If you really want to understand TCP and other computer networking topics, I recommend buying a book and studying systematically. I hope this post helps those who read it; if anything in it is wrong, I would appreciate your corrections.


IV. References

  "Computer Networking: A Top-Down Approach" (7th edition)

Origin www.cnblogs.com/tuyang1129/p/12458592.html