A Simple Explanation of the BBR Congestion Control Algorithm

The ACM paper on TCP BBR opens its introduction with Figure 1, which I will use as the entry point for explaining the BBR algorithm and for answering two questions:

  • Why do today's loss-based TCP congestion control algorithms still leave room for optimization?
  • Where does BBR's optimization reach its limit?

Figure 1

Understanding this chart took me an entire evening, and it made me re-examine all the basic concepts. In the discussion below I have simplified the definitions of TCP RTT, bandwidth, and inflight data so that the physical intuition is easier to grasp; this does not affect the final conclusion.

For ease of discussion, let us first introduce some notation. When we transmit data over TCP from endpoint A to endpoint B, the network between A and B is complex and dynamic, but we can picture the path from A to B as a black-box link with the following physical properties:

  • RTprop: the minimum delay for a signal to travel from A to B (strictly speaking it is the round trip, i.e. twice this value, but that is not discussed here), determined by the physical distance
  • BtlBw: the bandwidth of the A-to-B link, which is determined by the slowest segment along the path and is therefore called the bottleneck bandwidth; you can picture it as the thickness of the fiber
  • BtlBufSize: every router on the A-to-B link has its own buffer; this is the buffer size at the bottleneck
  • BDP: the total number of bits that the physical link itself (excluding router buffers) can hold in flight, BDP = BtlBw * RTprop (a quick worked example follows below)
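
As a quick sanity check, here is the arithmetic for the 10 Mbps / 40 ms example used later in this post, treating 40 ms as RTprop under the post's simplification (a minimal sketch; the numbers are only illustrative):

# Illustrative only: BDP for a 10 Mbps bottleneck and a 40 ms RTprop.
btl_bw_bps = 10e6   # bottleneck bandwidth, bits per second
rt_prop_s = 0.04    # propagation delay, seconds

bdp_bits = btl_bw_bps * rt_prop_s
print(bdp_bits / 8 / 1e6, "MB")   # -> 0.05 MB fits in the pipe itself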

Physical attributes alone are not enough; in practice, what we care about most are two actual properties of the TCP link:

  • T (delay): the actual delay experienced by data going from A to B, corresponding to the round-trip time in the figure (strictly this is the one-way time and the RTT is twice it, which does not affect the discussion)
  • R (bandwidth): the actual rate at which data is delivered, corresponding to the delivery rate in the figure (again not a strict correspondence; it is simplified the same way as T)

Let me belabor one point: the bandwidth (delivery rate) that TCP BBR defines is not the same as our intuition. It is defined as:

bandwidth = amount of data / the time from when that data was sent to when its ACK was received

whereas our intuition says: the speed at which data travels through the wire.
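
To make the definition concrete, here is a tiny sketch of how such a delivery rate can be computed from per-packet bookkeeping (the variable names mirror the pseudocode later in this post; the numbers are assumptions chosen only for illustration):

# Illustrative only: BBR-style delivery rate from per-packet bookkeeping.
# 'delivered' counts bytes cumulatively ACKed; each packet remembers the value
# of 'delivered' (and its timestamp) at the moment the packet was sent.
def delivery_rate(delivered_now, delivered_at_send, now, delivered_time_at_send):
    return (delivered_now - delivered_at_send) / (now - delivered_time_at_send)

# 50 KB newly acknowledged over 40 ms -> 1,250,000 bytes/s, i.e. 10 Mbps.
print(delivery_rate(150_000, 100_000, 1.04, 1.00))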

To keep things intuitive, we will use T in place of RTT and R in place of delivery rate; every concept below uses the same simplification, so please keep that in mind!

Further, to analyze T and R quantitatively, we introduce one more concept:

  • D: data that A has sent but B has not yet received (this corresponds to the inflight data; I have deliberately changed its original definition, data that A has sent but for which A has not yet received B's ACK, to make it easier to picture, and it does not affect the conclusion)

With the above definitions, the following three relations naturally hold:

  • T >= RTprop, i.e., the actual delay is always greater than or equal to the minimum delay
  • R <= BtlBw, i.e., the actual bandwidth is always less than or equal to the bottleneck bandwidth
  • R = D/T

From these three relations we can derive:

  1. T/D >= 1/BtlBw
  2. R/D <= 1/RTprop

These two formulas are the origin of the two slopes in the figure.
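
Written out in LaTeX notation, the derivation is just one substitution each way (a restatement of the three relations above, nothing more):

T = \frac{D}{R} \ge \frac{D}{\mathrm{BtlBw}} \quad\Longrightarrow\quad \frac{T}{D} \ge \frac{1}{\mathrm{BtlBw}}

R = \frac{D}{T} \le \frac{D}{\mathrm{RTprop}} \quad\Longrightarrow\quad \frac{R}{D} \le \frac{1}{\mathrm{RTprop}}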

With the discussion above, we can now more easily understand the physical meaning of the upper and lower halves of the figure.

Upper half of the figure:

  • Before the data in transit exceeds the capacity of the physical link (BDP), the transmission delay stays at its limit RTprop, corresponding to the blue horizontal line in the upper half of the figure
  • Once the data fills the physical capacity of the whole link, the routers begin buffering bits, which effectively lengthens the link, so the transmission delay starts to rise and departs from the physical limit RTprop; this is the green segment with slope = 1/BtlBw
  • When the router buffers fill up (at BDP + BtlBufSize inflight), the link starts dropping data and the 1/BtlBw slope ends, corresponding to the red dashed line in the upper half of the figure

Lower half of the figure:

  • Before the data in transit exceeds the capacity of the physical link (BDP), the bandwidth observed at B rises gradually with the amount of inflight data, at a rate determined by the delay, i.e., slope = 1/RTprop, corresponding to the blue segment in the lower half of the figure
  • Once the data fills the physical capacity of the whole link, the routers begin buffering bits, but this does not change the bandwidth observed at B, which is capped at BtlBw, corresponding to the horizontal green BtlBw line in the figure
  • When the router buffers fill up (at BDP + BtlBufSize inflight), the link starts dropping data, but the bandwidth observed at B is still capped at BtlBw, corresponding to the red dashed line in the lower half of the figure (a toy model of these three regions is sketched below)
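
Here is a minimal sketch, under the simplified definitions above, of how delay and delivery rate respond to the amount of data in flight; the function and its numbers are my own illustration of the figure, not part of BBR itself:

# Toy model of Figure 1: delay and delivery rate as a function of inflight data.
def rtt_and_rate(inflight, btl_bw, rt_prop, btl_buf_size):
    """Return (delay, delivery_rate) for a given amount of inflight data (bits)."""
    bdp = btl_bw * rt_prop
    if inflight <= bdp:
        # App-limited region: delay stays at RTprop, rate grows with inflight.
        return rt_prop, inflight / rt_prop
    elif inflight <= bdp + btl_buf_size:
        # Bandwidth-limited region: rate is pinned at BtlBw, queueing inflates delay.
        return inflight / btl_bw, btl_bw
    else:
        # Buffer-limited region: the bottleneck buffer is full and packets are dropped.
        return (bdp + btl_buf_size) / btl_bw, btl_bw

# Example: 10 Mbps bottleneck, 40 ms RTprop, 1 Mbit of bottleneck buffer.
for inflight in (0.1e6, 0.4e6, 1.0e6, 2.0e6):
    delay, rate = rtt_and_rate(inflight, 10e6, 0.04, 1e6)
    print(f"inflight={inflight/1e6:.1f} Mbit  delay={delay*1e3:.0f} ms  rate={rate/1e6:.1f} Mbps")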

With all of this in place, we can now answer the two opening questions.

  • Why do today's loss-based TCP congestion control algorithms still leave room for optimization?

Because loss-based algorithms keep pushing the amount of inflight data up to BDP + BtlBufSize, and modern routers have large buffers. This artificially lengthens the physical link, so the transmission delay grows, i.e., the RTT becomes large.
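
A rough back-of-the-envelope illustration of this effect (the buffer size below is an assumption chosen only to show the scale of the problem):

# Illustrative only: a loss-based sender keeps inflight near BDP + BtlBufSize,
# so the steady-state RTT inflates to roughly RTprop + BtlBufSize / BtlBw.
btl_bw = 10e6 / 8        # 10 Mbps bottleneck, in bytes per second
rt_prop = 0.04           # 40 ms
btl_buf_size = 1_000_000 # assume a deep 1 MB bottleneck buffer

rtt_full_buffer = rt_prop + btl_buf_size / btl_bw
print(f"{rtt_full_buffer * 1e3:.0f} ms vs the physical limit of {rt_prop * 1e3:.0f} ms")
# -> about 840 ms vs 40 ms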

  • Where does BBR's optimization reach its limit?

BBR no longer relies on detecting packet loss; instead it keeps estimating BtlBw and RTprop (and hence the BDP), so that the RTT stays close to its physical limit RTprop, thereby reducing the transmission delay and achieving the goal of speeding up TCP.

So what do BBR and the loss-based algorithms have in common? Both try to drive the bandwidth of the link toward its physical limit: BtlBw.

With BBR, because the bottleneck router's queue stays nearly empty, the most direct effect is a substantial drop in RTT; in the figure below you can see that the red CUBIC line sits at a much higher RTT than BBR:

And because there is essentially no packet loss, BBR's transfer rate rises dramatically; the inset in the figure shows the CDF (cumulative distribution function), from which you can clearly see that most CUBIC connections end up with lower throughput:

What happens if the link changes and the new bottleneck bandwidth becomes larger or smaller? BBR periodically probes for a new bottleneck bandwidth by cycling pacing_gain through 1.25, 0.75, 1, 1, 1, 1, 1, 1, as follows:

A pacing_gain of 1.25 makes BBR try to put more packets in flight, and if that builds up a queue at the bottleneck, the 0.75 phase drains it again. In the figure, the TCP connection initially runs on a 10 Mbps link; at the 20th second it switches to a faster 40 Mbps link, and thanks to the 1.25 probing phase BBR quickly discovers the larger bandwidth. At the 40th second it switches back to the 10 Mbps link, and because the RTT rises rapidly, BBR lowers its sending rate within about 2 seconds. You can see that the pacing_gain cycle works well.
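
A minimal sketch of this gain cycling, using the cycle values named above; the surrounding wiring (function name, loop) is my own illustration:

# Illustrative only: how the pacing_gain cycle scales the sending rate around
# the current bottleneck-bandwidth estimate (one phase lasts roughly one RTprop).
from itertools import cycle

PACING_GAIN_CYCLE = [1.25, 0.75, 1, 1, 1, 1, 1, 1]

def pacing_rate(btl_bw_estimate, gain):
    # Probe with 1.25x to discover new bandwidth; drain with 0.75x if a queue built up.
    return gain * btl_bw_estimate

gains = cycle(PACING_GAIN_CYCLE)
for _ in range(4):
    print(pacing_rate(10e6, next(gains)) / 1e6, "Mbps")   # 12.5, 7.5, 10.0, 10.0 ...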

A further advantage of the pacing_gain cycle is that it lets multiple TCP connections that start at different rates quickly converge to an equal share of the link bandwidth. As the figure shows, a connection that starts later initially overestimates the BDP and creates a queue backlog; the connections that started earlier then rapidly reduce their sending rate over several cycles, until eventually no backlog builds up, the RTTs are the same, and the flows reach an equilibrium in which the bandwidth is shared evenly:

Now let's look at the slow-start phase. In the figure below the network is 10 Mbps with a 40 ms delay, so the unacknowledged inflight bytes should be 10 Mbps * 0.04 s = 0.05 MB. The red line is the number of bytes sent under CUBIC, the blue line is the number of bytes acknowledged by ACKs, and the green line is the number of bytes sent under BBR. Clearly, CUBIC and BBR behave identically at first; at 0.25 seconds the inflight bytes far exceed 0.05 MB, reaching roughly 0.1 MB, i.e., twice the BDP:

At about 0.3 seconds CUBIC begins to grow its congestion window linearly, while after 0.5 seconds BBR starts lowering its sending rate, i.e., draining the congestion queue at the bottleneck router; by 0.75 seconds the inflight bytes have been brought back down to the BDP, which is the most appropriate sending rate.

When a busy network suffers heavy packet loss, BBR also performs far better than CUBIC. In the figure below, as the loss rate ranges from 0.001% to 50%, you can see the green BBR curve doing far better than the red CUBIC curve. At a loss rate of roughly 0.1%, CUBIC keeps triggering its congestion response, so its throughput plunges to 10 Mbps, only 1/10 of the original, whereas BBR shows no significant throughput drop until the loss rate reaches about 5%.

CUBIC keeps filling the bottleneck router's buffer queue, so the RTT grows larger and larger, while the operating system puts a hard time limit on completing the three-way handshake. As a result, when a network under CUBIC becomes extremely congested, new connections are hard to establish: as the figure below shows, once the median RTT reaches 100 seconds Windows can barely establish a new connection, and at 200 seconds Linux/Android cannot establish one either.

The pseudocode of the BBR algorithm is shown below. It consists of two procedures: handling an ACK, and sending a packet:

function onAck(packet) 
  // feed this RTT sample into the windowed min filter for RTprop
  rtt = now - packet.sendtime 
  update_min_filter(RTpropFilter, rtt) 
  // track how much data has been delivered so far, and when
  delivered += packet.size 
  delivered_time = now 
  deliveryRate = (delivered - packet.delivered) / (delivered_time - packet.delivered_time) 
  // app-limited samples may understate the bandwidth, so only use them if they raise the estimate
  if (deliveryRate > BtlBwFilter.currentMax || ! packet.app_limited) 
     update_max_filter(BtlBwFilter, deliveryRate) 
  if (app_limited_until > 0) 
     app_limited_until = app_limited_until - packet.size

Here, app_limited_until is determined by checking, at the moment sending is allowed, whether there is actually data waiting to be sent. The pseudocode for sending a packet is:

function send(packet) 
  bdp = BtlBwFilter.currentMax × RTpropFilter.currentMin 
  // cwnd_gain caps the data in flight at a small multiple of the estimated BDP
  if (inflight >= cwnd_gain × bdp) 
     // wait for ack or retransmission timeout 
     return 
  if (now >= nextSendTime) 
     packet = nextPacketToSend() 
     if (! packet) 
        // nothing to send: the connection is application-limited
        app_limited_until = inflight 
        return 
     packet.app_limited = (app_limited_until > 0) 
     // snapshot the delivery state so onAck can compute a delivery rate for this packet
     packet.sendtime = now 
     packet.delivered = delivered 
     packet.delivered_time = delivered_time 
     ship(packet) 
     // pacing: space packets out at pacing_gain × the estimated bottleneck bandwidth
     nextSendTime = now + packet.size / (pacing_gain × BtlBwFilter.currentMax) 
  timerCallbackAt(send, nextSendTime)
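
The update_min_filter / update_max_filter calls above stand for windowed max/min filters: BtlBw is roughly the maximum delivery rate seen over the last several RTTs, and RTprop is roughly the minimum RTT seen over the last ten seconds or so. Below is a minimal Python sketch of such a filter; the class, its name, and the window lengths chosen here are my own simplification, not BBR's actual implementation:

import time
from collections import deque

class WindowedFilter:
    """Keep (timestamp, value) samples and report the max or min over a time window."""
    def __init__(self, window_seconds, keep_max):
        self.window = window_seconds
        self.keep_max = keep_max
        self.samples = deque()

    def update(self, value, now=None):
        now = time.monotonic() if now is None else now
        self.samples.append((now, value))
        # drop samples that have aged out of the window
        while self.samples and now - self.samples[0][0] > self.window:
            self.samples.popleft()

    @property
    def current(self):
        values = [v for _, v in self.samples]
        if not values:
            return None
        return max(values) if self.keep_max else min(values)

# Roughly: BtlBw = max delivery rate over a few RTTs, RTprop = min RTT over ~10 s.
BtlBwFilter = WindowedFilter(window_seconds=1.0, keep_max=True)    # window length assumed
RTpropFilter = WindowedFilter(window_seconds=10.0, keep_max=False)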

pacing_gain is the key array of cycle values that governs how the sending rate on the link is adjusted.

The BBR algorithm matters a great deal for congestion control across the network, especially since router queues can be expected to keep growing. HTTP/3 abandons TCP, which means it needs to implement congestion control at the application layer (in the various frameworks and middleware) based on the BBR algorithm, so BBR is actually very close to us. By understanding BBR, we can deal better with performance problems caused by network congestion, and we will also have a clearer view of how congestion control algorithms will evolve in the future.

 
