Detailed explanation of the relationship between TCP RTT and TCP RTO

1. Definition of RTT of TCP and RTO of TCP

1.1, what is RTT of TCP

RTT stands for Round Trip Time. For TCP, the RTT is the time between sending a segment and receiving the ACK that acknowledges it. The routing path between two hosts can change dynamically, so the network delay between them varies both with network conditions and with the path taken. The RTT between two hosts is therefore not a fixed value but one that changes over time.
TCP maintains an independent current RTT value for each TCP connection.

BTW: The round-trip delay RTT matters mainly to TCP at the transport layer, because TCP uses the averaged RTT to set the timeout of its retransmission timer and to choose an appropriate receive window rwnd. UDP has no acknowledgment or retransmission mechanism, so UDP itself makes no use of RTT.

1.2, what is RTO of TCP

As we know, TCP uses timeout-based retransmission to cope with packet loss: when the ACK for a message does not arrive, the message is sent again, which gives an end-to-end reliable transport. So how long should TCP wait before retransmitting? This is what the RTO defines. RTO stands for Retransmission TimeOut: it is the period of TCP's retransmission timer. If the ACK has not arrived once this much time has passed since the data was sent, the message is retransmitted.

  • If the RTO is too large, retransmission is slow: a lost message waits a long time before it is resent. The connection leaves the network underused, which leads to long end-to-end delays and poor performance.
  • If the RTO is too small, retransmission is too eager and messages that were not lost are resent. These needless retransmissions consume bandwidth and add to congestion, which causes more timeouts, and more timeouts in turn cause more retransmissions.

Therefore, choosing an appropriate RTO is very important. In an ideal network, an RTO slightly greater than the RTT would make the best use of network capacity, but real networks are much more complicated than this.
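To make the trade-off concrete, here is a minimal sketch that tries several fixed RTO values against a handful of RTT samples. The function name `rto_tradeoff` and the sample values are hypothetical, chosen only for illustration; a real TCP stack does not use a fixed RTO.

```python
def rto_tradeoff(rtt_samples_ms, rto_ms):
    """Evaluate one fixed RTO against a list of RTT samples (all in ms).

    spurious: samples whose ACK would arrive after the timer fired
              (rtt > rto), so an unlost message is retransmitted.
    slack_ms: how far the RTO sits above the mean RTT, i.e. roughly how
              long a real loss goes unnoticed beyond a normal round trip.
    """
    spurious = sum(1 for rtt in rtt_samples_ms if rtt > rto_ms)
    mean_rtt = sum(rtt_samples_ms) / len(rtt_samples_ms)
    slack_ms = rto_ms - mean_rtt
    return spurious, slack_ms


samples = [100, 110, 95, 180, 105, 100]  # hypothetical RTTs in ms
for rto in (90, 120, 200, 3000):
    spurious, slack = rto_tradeoff(samples, rto)
    print(f"RTO={rto:>4} ms  spurious timeouts={spurious}  slack={slack:.0f} ms")
```

With these numbers, an RTO of 90 ms times out on every sample, while an RTO of 3000 ms never fires early but would leave a real loss undetected for almost three seconds.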

2. Calculation of the current RTT and RTO of TCP

In a stable network the RTT stays almost constant, so it would be enough to measure it once and keep using that value. In a real network, however, the RTT changes dynamically, sometimes rising and sometimes falling, so the key question is how to compute an RTT estimate that best reflects current network conditions.

2.1, before starting to talk about the RTT calculation algorithm, let's understand the process of sampling RTT in TCP

TCP does not take an RTT sample for every packet; only some packets are sampled. The rule is a bit abstract: at any moment, a TCP connection has at most one RTT measurement in progress. When a packet is sent while the connection is already measuring an RTT (the timer for measuring RTT is already in use), no RTT is measured for that packet.
As shown below:

  • RTT #1:
    The time between sending message 1 and receiving its ACK (message 2). This case is straightforward.
  • RTT #2:
    The time between sending message 3 and receiving its ACK (message 5). Note that message 4 is not used to calculate an RTT: a packet sent by the left host while a measurement is already in progress (message 4) is not sampled.
  • RTT #3:
    The time between sending message 6 and receiving its ACK (message 10). Here messages 7 and 9 are not used to calculate an RTT, for the same reason: the left host sent both while the measurement started by message 6 was still in progress.

[Figure: message exchange between two hosts showing RTT samples #1, #2, and #3]
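The "one measurement at a time" rule can be sketched as a small class. This is only an illustration: the class `RttSampler`, its method names, and the use of simple message numbers instead of real byte sequence numbers (with no wraparound handling) are all assumptions, not taken from any TCP implementation.

```python
class RttSampler:
    """Times at most one outstanding message per connection."""

    def __init__(self):
        self.timed_seq = None  # number of the message currently being timed
        self.sent_at = None    # time at which that message was sent

    def on_send(self, seq, now):
        """Start a measurement only if none is in progress."""
        if self.timed_seq is None:
            self.timed_seq = seq
            self.sent_at = now
            return True   # this message is being timed
        return False      # a measurement is running, so this one is skipped

    def on_ack(self, acked_seq, now):
        """Return an RTT sample if the ACK covers the timed message."""
        if self.timed_seq is not None and acked_seq >= self.timed_seq:
            rtt = now - self.sent_at
            self.timed_seq = None
            self.sent_at = None
            return rtt
        return None


sampler = RttSampler()
print(sampler.on_send(seq=3, now=0))      # True: measurement starts
print(sampler.on_send(seq=4, now=10))     # False: already measuring
print(sampler.on_ack(acked_seq=3, now=50))  # 50: one sample for this window
```

Times here are in milliseconds. Once the ACK arrives, the sampler is free again and the next message sent starts a new measurement.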

2.2, the initial calculation method of TCP RTT and RTO and existing problems

  • Calculation method
    RTT = time of receiving ACK - time of sending message
    Each time a new RTT sample is obtained, the current RTT estimate, called SRTT (Smoothed RTT), is updated with the formula below. The weight alpha can be chosen to suit how quickly the RTT changes in a given network. If alpha is small, the newest RTT sample carries a large weight: SRTT adapts quickly to changes in RTT, but it is strongly affected by temporary fluctuations. If alpha is large, the historical SRTT carries a large weight and the newest sample a small one: the SRTT curve is more stable and smooth and is less affected by temporary fluctuations, but it responds slowly and cannot track real RTT changes in a network where the RTT changes quickly. The value of alpha is therefore empirical, and different networks are better served by different values. Usually alpha is chosen between 0.8 and 0.9.
  SRTT = ( alpha * SRTT ) + ((1 - alpha) * RTT)

(In the first iteration, the initial value of SRTT is the latest RTT.)
Weighting SRTT against each new RTT sample in this way smooths the estimate: the current RTT (i.e. SRTT) does not jump with every sample, yet it still adjusts in time to follow the measured RTT.
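The effect of alpha is easier to see with numbers. The sketch below applies the smoothing formula above to a short series containing one spike; the helper names `update_srtt` and `smooth` and the sample values are made up for the example.

```python
def update_srtt(srtt, rtt, alpha):
    """SRTT = alpha * SRTT + (1 - alpha) * RTT.

    The first sample seeds SRTT directly, as described in the text.
    """
    if srtt is None:
        return rtt
    return alpha * srtt + (1 - alpha) * rtt


def smooth(samples, alpha):
    """Return the SRTT value after each sample in the series."""
    srtt = None
    history = []
    for rtt in samples:
        srtt = update_srtt(srtt, rtt, alpha)
        history.append(srtt)
    return history


samples = [100, 100, 300, 100, 100]  # hypothetical RTTs in ms, one spike
print([round(v, 1) for v in smooth(samples, alpha=0.9)])
print([round(v, 1) for v in smooth(samples, alpha=0.5)])
```

With alpha = 0.9 the 300 ms spike lifts SRTT only to about 120 ms, and it decays slowly afterwards. With alpha = 0.5 the same spike pushes SRTT to 200 ms, but two samples later it is back to 125 ms.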

In the Linux implementation, TCP_RTO_MAX is 120 seconds, TCP_RTO_MIN is 200 milliseconds, and the initial value of RTO is 1 second. The value of beta is generally between 1.3 and 2.0.

 RTO = min [ TCP_RTO_MAX,  max [ TCP_RTO_MIN,   (beta * SRTT) ]  ]

After the SRTT starts to calculate, the RTO can be calculated with the above formula.
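As a rough sketch, the clamped formula translates directly into code. The constant `TCP_RTO_INIT` and the function `compute_rto` are names invented for this example (the text only gives the 1 second initial value), and beta = 2.0 is just one value from the range quoted above.

```python
TCP_RTO_MIN = 0.2     # seconds, lower bound quoted above
TCP_RTO_MAX = 120.0   # seconds, upper bound quoted above
TCP_RTO_INIT = 1.0    # seconds, used before any RTT sample exists (name assumed)


def compute_rto(srtt, beta=2.0):
    """RTO = min(TCP_RTO_MAX, max(TCP_RTO_MIN, beta * SRTT)), in seconds."""
    if srtt is None:
        # No SRTT yet, so fall back to the initial RTO.
        return TCP_RTO_INIT
    return min(TCP_RTO_MAX, max(TCP_RTO_MIN, beta * srtt))


print(compute_rto(None))   # no sample yet
print(compute_rto(0.05))   # beta * SRTT is below the minimum, so it is raised
print(compute_rto(0.3))    # inside the allowed range
print(compute_rto(100.0))  # above the maximum, so it is capped
```

The two clamps matter at the extremes: on a very fast link the RTO never drops below 200 ms, and on a very slow one it never grows beyond 120 seconds.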

  • Existing problems
    When a retransmission occurs, the following problem arises: the sender cannot tell whether an ACK acknowledges the original transmission or the retransmission.
    If the measurement starts from the original transmission but the ACK answers the retransmission, the sampled RTT may be much larger than the actual RTT. If the measurement starts from the retransmission but the ACK answers the original transmission, the sampled RTT may be much smaller than the actual RTT.
    [Figure: the two RTT sampling cases after a retransmission]
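A small worked example shows how far off the sample can be in each case. The timestamps are hypothetical (in milliseconds) and the function name `candidate_samples` is made up for the illustration.

```python
def candidate_samples(first_send, retransmit_send, ack_arrival):
    """Return the two possible RTT samples for one ACK after a retransmission.

    The sender sees a single ACK and cannot tell which transmission it
    answers, so both values below are candidates.
    """
    from_original = ack_arrival - first_send
    from_retransmit = ack_arrival - retransmit_send
    return from_original, from_retransmit


# Case A: the original was lost and the ACK answers the retransmission.
# The true RTT is 100 ms, but timing from the original yields 600 ms.
print(candidate_samples(first_send=0, retransmit_send=500, ack_arrival=600))

# Case B: the original was only delayed and its ACK arrives just after the
# retransmission. The true RTT is 520 ms, but timing from the
# retransmission yields 20 ms.
print(candidate_samples(first_send=0, retransmit_send=500, ack_arrival=520))
```

In both cases one of the two candidates is badly wrong, and the sender has no way of knowing which one to trust.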

2.3, the improvement of the calculation method of TCP RTT and RTO

Because of various problems in the initial algorithm, the following improvements are proposed:

  • Improvement 1: Do not perform RTT sampling on retransmitted packets:
    RTT samples are taken only from newly transmitted packets, never from retransmitted ones. This avoids the question above of whether the original or the retransmitted message should be the starting point of the sample. But it introduces a new problem. Suppose the RTT suddenly becomes much larger, that is, the network starts adding a large delay. Because the previous RTO is small, the retransmission timer expires before any ACK arrives, so every outstanding packet is retransmitted. Since retransmitted packets contribute no RTT samples, the RTO is not updated and the retransmission timer stays at its old, too-small value. Further retransmissions follow, the network quickly becomes congested, and throughput drops sharply. Improvement 2 addresses this.
  • Improvement 2: Once retransmission occurs, the existing RTO value is doubled:
    Each time a retransmission occurs, the existing RTO is doubled, which avoids the congestion caused by a flood of retransmissions.
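The two improvements can be combined in a few lines. This is a simplified sketch, not code from any real TCP stack: the class `BackoffRto`, its method names, and the default parameter values (alpha = 0.875, beta = 2.0, times in seconds) are assumptions made for the example.

```python
class BackoffRto:
    """SRTT-based RTO that skips retransmitted samples and doubles on timeout."""

    def __init__(self, rto=1.0, alpha=0.875, beta=2.0,
                 rto_min=0.2, rto_max=120.0):
        self.srtt = None
        self.rto = rto
        self.alpha = alpha
        self.beta = beta
        self.rto_min = rto_min
        self.rto_max = rto_max

    def on_ack(self, rtt, was_retransmitted):
        """Process an ACK; returns the RTO now in effect."""
        if was_retransmitted:
            # Improvement 1: the sample is ambiguous, so ignore it and
            # keep the current (possibly doubled) RTO.
            return self.rto
        if self.srtt is None:
            self.srtt = rtt
        else:
            self.srtt = self.alpha * self.srtt + (1 - self.alpha) * rtt
        self.rto = min(self.rto_max, max(self.rto_min, self.beta * self.srtt))
        return self.rto

    def on_timeout(self):
        """Improvement 2: double the RTO on every retransmission."""
        self.rto = min(self.rto_max, self.rto * 2)
        return self.rto


est = BackoffRto()
print(est.on_ack(0.1, was_retransmitted=False))  # RTO derived from the sample
print(est.on_timeout())                          # RTO doubled after a timeout
print(est.on_ack(5.0, was_retransmitted=True))   # ambiguous sample, RTO unchanged
```

Note that the doubled RTO stays in effect until an ACK for a packet that was not retransmitted provides a fresh, unambiguous sample.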

Problems with this algorithm:
1) The parameter alpha is hard to choose. As mentioned above, if alpha is small, the newest RTT sample has a large weight, so SRTT adapts quickly to changes in RTT but is strongly affected by temporary fluctuations. If alpha is large, the historical SRTT has a large weight and the newest sample a small one, so the SRTT curve is stable and smooth and resists temporary fluctuations, but it responds slowly and cannot track real RTT changes in a network where the RTT changes quickly.
2) When the network is poor, the RTO keeps doubling and retransmissions slow down sharply, so TCP's transfer efficiency and performance deteriorate very quickly and network utilization becomes very low.
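To get a feel for how quickly the doubling in point 2 adds up, the following sketch lists the wait before each successive retransmission of the same packet. The function name `backoff_schedule` is hypothetical; the 1 second starting value and the 120 second cap are the Linux figures quoted earlier.

```python
def backoff_schedule(initial_rto, attempts, rto_max=120.0):
    """Return the timeout used before each successive retransmission."""
    waits = []
    rto = initial_rto
    for _ in range(attempts):
        waits.append(rto)
        rto = min(rto_max, rto * 2)  # double after every timeout, up to the cap
    return waits


waits = backoff_schedule(1.0, 8)
print(waits)       # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 120.0]
print(sum(waits))  # 247.0
```

After eight consecutive timeouts the sender has spent more than four minutes waiting, which is why throughput collapses on a persistently bad path.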

2.4, Further improvement of calculation method of TCP RTT and RTO (current implementation of Linux)

To solve the problems above, the RTT deviation, DevRTT, is introduced (RFC 6298). The algorithm uses the difference between the latest RTT sample and the smoothed SRTT as an additional input. If the difference is large, DevRTT dominates the result; if it is small, SRTT dominates.

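SRTT = SRTT + alpha*(RTT - SRTT)

DevRTT = (1-beta)*DevRTT + beta*(|RTT-SRTT|)

RTO = mu * SRTT + delta * DevRTT

In the Linux implementation, alpha = 0.125, beta = 0.25, mu = 1, delta = 4. Note that RFC 6298 updates the deviation first, so |RTT-SRTT| uses the SRTT value from before the new sample is folded in.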

This removes the difficulty of choosing a good alpha and lets the estimate stay smooth while still reacting quickly to RTT changes.
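Here is a minimal sketch of the three update formulas, using the parameter values quoted above. Several details are assumptions rather than part of the text: the class name `RttEstimator`, the names `mu` and `delta` for the two RTO weights, seeding the deviation with half of the first sample (as RFC 6298 does), computing the deviation from the SRTT before it is updated, and reusing the 200 ms and 120 s clamps from section 2.2. Times are in milliseconds.

```python
class RttEstimator:
    """SRTT plus deviation estimator; all times in milliseconds."""

    def __init__(self, alpha=0.125, beta=0.25, mu=1, delta=4,
                 rto_min=200.0, rto_max=120000.0):
        self.alpha = alpha
        self.beta = beta
        self.mu = mu
        self.delta = delta
        self.rto_min = rto_min
        self.rto_max = rto_max
        self.srtt = None
        self.devrtt = None

    def update(self, rtt):
        """Fold in one RTT sample and return the resulting RTO."""
        if self.srtt is None:
            # First sample: seed SRTT with it and the deviation with half
            # of it (assumption taken from RFC 6298).
            self.srtt = rtt
            self.devrtt = rtt / 2
        else:
            # Update the deviation first, using the SRTT from before this sample.
            self.devrtt = ((1 - self.beta) * self.devrtt
                           + self.beta * abs(rtt - self.srtt))
            self.srtt = self.srtt + self.alpha * (rtt - self.srtt)
        rto = self.mu * self.srtt + self.delta * self.devrtt
        return min(self.rto_max, max(self.rto_min, rto))


est = RttEstimator()
for sample in (100, 100, 300):  # hypothetical RTT samples with one spike
    print(sample, est.update(sample))
```

In this run the RTO goes from 300 ms to 250 ms while the RTT is steady, then rises to 437.5 ms after the 300 ms spike. Most of that increase comes from the deviation term (4 * 78.125), while SRTT itself only moves from 100 to 125.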

Origin blog.csdn.net/meihualing/article/details/129473851