BBR Performance Evaluation Report (1)

This report is divided into two parts: the first covers BBR's network performance when only BBR flows are present; the second covers BBR's performance when it competes with Cubic.

This is a preliminary report, based mainly on the evaluation and analysis of BBR in a published article [Experimental Evaluation of BBR Congestion Control, ICNP 2017].

Since we will run a large number of comparative tests under large and small buffers below, a note on terminology:

"Buffer" refers to the queue size of the network forwarding devices (routers, switches). "Large buffer" means a queue of 8 BDP; "small buffer" means a queue of 0.8 BDP (less than 1 BDP).
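For concreteness, here is the arithmetic behind these buffer sizes, using assumed example numbers (a 10 Mbit/s bottleneck and 40 ms RTT, not values taken from the paper):

```python
# Illustrative BDP arithmetic with assumed example values.
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product in bytes: link rate (bytes/s) x base RTT."""
    return bandwidth_bps / 8 * rtt_s

bw = 10e6     # assumed 10 Mbit/s bottleneck
rtt = 0.040   # assumed 40 ms base RTT

bdp = bdp_bytes(bw, rtt)    # 50000.0 bytes
large_buffer = 8 * bdp      # 400000.0 bytes (8 BDP)
small_buffer = 0.8 * bdp    # 40000.0 bytes (0.8 BDP)
print(bdp, large_buffer, small_buffer)
```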

1 BBR-only network, same RTT

Although the network is unlikely to become an all-BBR environment, the competition among BBR flows is still worth exploring.

By varying the experimental parameters we can identify the key factors that affect BBR's performance; as it turns out, buffer size and RTT both have a very strong impact on BBR flows.

First, observe the RTT when multiple BBR flows compete. RTT_min can be measured correctly when all flows are BBR.

In practice, however, the RTT that TCP actually experiences (RTT_use) is usually about twice RTT_min, because in large-buffer scenarios BBR tends to fill up its inflight cap (= 2 BDP).

This cap is set by bbr_cwnd_gain, which limits the congestion window and thereby the number of inflight packets.
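The arithmetic behind "RTT_use ≈ 2 × RTT_min" can be sketched as follows, with assumed example numbers (not values from the paper) and assuming the buffer is at least 1 BDP deep:

```python
# Why RTT_use ~= 2 x RTT_min under BBR's inflight cap (large buffer).
# cwnd_gain = 2 matches bbr_cwnd_gain in BBR v1.
def steady_state_rtt(rtt_min_s, bw_Bps, cwnd_gain=2.0):
    bdp = bw_Bps * rtt_min_s             # bytes the pipe itself can hold
    inflight_cap = cwnd_gain * bdp       # BBR caps inflight at cwnd_gain x BDP
    queued = inflight_cap - bdp          # the excess BDP sits in the buffer
    return rtt_min_s + queued / bw_Bps   # base RTT plus queuing delay

print(steady_state_rtt(0.040, 1.25e6))   # 0.08: twice the 40 ms base RTT
```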


Next, look at the fluctuation of BBR's goodput. Goodput excludes retransmitted traffic, while throughput includes it; in practical applications goodput is what we care about more, since it counts only useful data delivered.

Six flows are shown here, starting at 0 s, 23 s, 31 s, 38 s, 42 s, and 47 s. In general, fairness among BBR flows is quite random: even after they stabilize, the differences between flows remain large.


Notably, in practice multiple BBR flows do not send at exactly the physical bandwidth, especially when the buffer is relatively small: their aggregate sending rate exceeds the real physical bandwidth.

This strategy is understandable: if BBR sent at less than the actual physical bandwidth, then when competing with other algorithms its share would be preempted by other flows, and its throughput would shrink over time.

However, since BBR has no fast-recovery stage and does not reduce its window on loss, the outcome is worse: no flow backs off, the link becomes overloaded, and the loss rate climbs. Look at the figure:


Three interfaces send six flows (two flows per interface). Looking at total throughput, with a small buffer the offered load basically reaches 1.1x the link capacity.


Cubic under the same conditions is simply the conscience of the industry.

2 BBR flows with different RTTs

The experiments show that with a large buffer, flows with different RTTs are clearly unfair to each other: the larger the RTT, the higher the throughput.

With a small buffer, flows with different RTTs compete fairly.


2.1 Analysis of the results

So why does buffer size have such a big impact on BBR? Why can't multiple BBR flows settle on their real bandwidth and RTT when working together?

The article's explanation: since BBR sets its sending rate on its own schedule, its bandwidth is easily overestimated when multiple flows' phases are not synchronized (for example, A is in its 1.25x probing phase while B is in its 0.75x drain phase, and the bandwidth estimate always takes the maximum over a window of samples).
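This overestimation mechanism can be illustrated with a toy model (my own sketch, not the paper's): two flows share an assumed 10 Mbit/s link, each entitled to half of it, and one flow's 1.25x probe happens while the other is not probing; the windowed max filter then holds on to that inflated delivery-rate sample:

```python
# Toy model of BBR bandwidth overestimation with unsynchronized ProbeBW cycles.
fair_share = 5e6                            # each flow's fair share, bits/s
gains_a = [1.25, 0.75, 1, 1, 1, 1, 1, 1]    # flow A's pacing-gain phases
gains_b = [1, 1, 1, 1, 1.25, 0.75, 1, 1]    # same cycle, phase-shifted

samples_a = []
for ga, gb in zip(gains_a, gains_b):
    sent_a, sent_b = fair_share * ga, fair_share * gb
    total = sent_a + sent_b
    # A's delivered rate on the 10 Mbit/s link, in proportion to its sending rate:
    delivered_a = min(sent_a, 10e6 * sent_a / total)
    samples_a.append(delivered_a)

# BBR's bandwidth filter keeps the MAX of recent samples, so the one
# inflated probe sample sticks for a whole filter window (~10 RTTs).
estimate_a = max(samples_a)
print(estimate_a / fair_share)   # ~1.11: about 11% above A's fair share
```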

When bandwidth is overestimated, BBR no longer operates at the correct, reasonable point. "Operating point" here refers to the point at which the sending window limits the number of inflight packets.

The ideal single-flow BBR operating point:

And the actual operating point when bandwidth is overestimated:

To explain the figure: with a large buffer, BBR's inflight-cap-based operating point comes before Cubic's loss-based operating point.

With a small buffer, BBR's operating point comes slightly after Cubic's, so with a small buffer BBR visibly overloads the link.

Likewise, with a large buffer, different RTTs mean different BDPs, so a large-RTT flow's inflight cap is larger than a small-RTT flow's, letting it occupy more bandwidth.
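A back-of-envelope sketch of this RTT unfairness (my own illustration with assumed numbers, not the paper's data): in a large shared buffer, each BBR flow keeps about 2x its own BDP in flight, so a flow's share of the bottleneck is roughly proportional to its inflight cap, i.e. to its RTT:

```python
# Rough bandwidth shares of two BBR flows with different RTTs (large buffer).
bw_Bps = 1.25e6                                   # assumed 10 Mbit/s bottleneck
rtts = {"flow_10ms": 0.010, "flow_80ms": 0.080}   # assumed base RTTs

# Each flow's inflight cap: 2 x (its own BDP), as in BBR v1.
caps = {f: 2 * bw_Bps * rtt for f, rtt in rtts.items()}
total = sum(caps.values())

# Share of the bottleneck is roughly proportional to bytes held in flight.
shares = {f: cap / total for f, cap in caps.items()}
print(shares)   # the 80 ms flow gets ~8x the 10 ms flow's share
```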

3 BBR vs. Cubic

Relatively speaking, BBR has the bigger advantage when the buffer is small (a throughput comparison gives a first impression of this).

BBR also has one very clear advantage: its RTT stays low under both large and small buffers, whereas Cubic's RTT only approaches BBR's level when the buffer is small.

First, a set of TCP RTT comparison plots:

Note that these measured an all-BBR environment and an all-Cubic environment separately; the RTT when the two compete on the same link was not tested.

Common sense suggests that since both would then share the same network (same queue, same link), their RTTs should be the same.

Although that guess is discouraging, there may still be bright spots, for example the RTT behavior when transferring small pages of a few tens of KB.

Experiments on this will be added when we evaluate BBR on our testbed.

Now look at the loss behavior of each algorithm in a homogeneous (same-algorithm) environment:

BBR is, frankly, ruthless: under both large and small buffers, its retransmission rate (loss rate) is very high.

With the high loss rate it creates, it can basically crush any algorithm that regulates its window based on loss.

Now look at Cubic and BBR competing directly; the result is no surprise:

BBR already has the advantage with a large buffer, and is especially brutal with a small buffer.

Summary of the comparison with Cubic

The article summarizes BBR's behavior in four scenarios and compares it with Cubic. The results show BBR's character clearly: high bandwidth occupancy, indifference to packet loss, and an advantage in competition.

But note several shortcomings of the experiments:

a. In the experiments with Cubic, the BBR flow was always started first and the Cubic flow second. What if Cubic starts first and BBR second?

b. There is no comparison of multiple BBR flows vs. a single Cubic flow, or multiple Cubic flows vs. a single BBR flow.

c. What about lower latency and higher bandwidth? With latency under 10 ms, which performs better?

Takeaways

From the analysis above, a single BBR flow works effectively, reasonably, and correctly. When multiple BBR flows work together, they send more than the available bandwidth, the RTT grows longer, and packet loss becomes excessive.

Why is BBR strong in competition? Because it tends to occupy more bandwidth, and packet loss does not make it reduce its window.

What is our countermeasure? boost adopts an equally aggressive loss strategy: on random loss it does not reduce the window, and on congestion loss it reduces the window to match the estimated bandwidth. Intuitively, this should be no worse than BBR.
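The loss response described above might be sketched as follows; the function name, the byte-based window, and the congestion-vs-random classification input are my assumptions for illustration, not boost's actual code:

```python
# Hypothetical sketch of the loss response described in the text.
def on_packet_loss(cwnd_bytes, est_bw_Bps, min_rtt_s, loss_is_congestion):
    """Random loss: keep the window unchanged. Congestion loss: shrink
    the window to the estimated BDP (the 'bandwidth size' in the text)."""
    if not loss_is_congestion:
        return cwnd_bytes                # ignore random loss entirely
    bdp = est_bw_Bps * min_rtt_s         # estimated pipe size in bytes
    return min(cwnd_bytes, bdp)          # never grow the window on loss

print(on_packet_loss(100000, 1.25e6, 0.040, True))    # 50000.0 (shrunk to BDP)
print(on_packet_loss(100000, 1.25e6, 0.040, False))   # 100000 (unchanged)
```

How to tell congestion loss from random loss is itself the hard part; the sketch simply takes that classification as an input.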

But is it really fine not to reduce the window on loss? Are large numbers of retransmitted packets really harmless, or what negative effects do they bring? Would boost lose when competing with BBR? All of these are worth exploring.

For the first two questions, the next step is to analyze Linux's TCP retransmission mechanism in depth. The function call relationships were clarified in a previous wiki; what remains is the cost of each call, whether the scheme can actually be implemented, and what unexpected cases arise. These require further work.

For the third question, evaluating boost and BBR on the testbed can give a preliminary answer.
