Delay optimization in audio and video network communication

Foreword

When working on audio and video network communication, we often run into the following questions. In this post I'll share my personal optimization experience around them.

  1. Why is the latency of TCP transmission higher than that of UDP?
  2. Live-streaming latency is often more than 1 s. What is the root cause?
  3. RTC real-time communication requires latency within 400 ms. How is that achieved?

As an example, let's look at online shopping, and at optimizing the time a parcel takes from shipment to delivery.

What factors could affect how long a parcel takes to travel from merchant A to consumer B? As shown in the figure:

  1. The parcel sits too long in warehouses and at courier stations, e.g. half a day in storage at each stop and then a full day in distribution
  2. Parcel volume is uneven, sometimes huge (piled up like a mountain) and sometimes tiny (a scattered few), so the resources at each node can never be fully utilized
  3. A parcel lost in transit has to be reshipped (roughly doubling the total time)
  4. The transport routes are congested, so too much time is spent on the road

Any ideas for tackling these delay factors one by one to speed up delivery?

  1. Improve the throughput of every courier station and shrink the backlog at each transfer station, so parcels keep flowing
  2. Split collection, transport and delivery from a granularity of days down to something finer, such as hours, to schedule resources more dynamically and smoothly
  3. Ship two copies of each parcel at the same time, so that if one is lost the other still arrives; the downside, of course, is the extra capacity it consumes
  4. Dynamically choose an uncongested transit route; planning transport routes to dodge congestion is an art

From the exploration above, optimizing latency boils down to: reducing buffering at intermediate hops, maximizing resource utilization, fighting loss with redundancy, and dynamically planning the transmission path. If you can internalize these abstractions, then congratulations: understanding how to reduce the network transmission latency of audio and video will not be hard.

Analyzing the causes

Why is the latency of TCP transmission higher than that of UDP?

  1. TCP guarantees reliable delivery: when data is lost it is retransmitted (like a lost parcel being reshipped), which adds delay and, worse, blocks the timely delivery of all the data queued behind it.
  2. TCP has a send buffer (like a courier station): data is not sent the moment it is produced but queued first, and the sending rate is paced by a sliding window (like the station's delivery manpower). Only after the previous batch is sent and ACKed does the next batch go out (deliver one boxful, then load the next). UDP has no such mechanism; it sends with its eyes closed (grabbing as many of the region's couriers as it can).
  3. TCP is a "gentleman": as soon as it detects loss or a poor network, it voluntarily backs off and shrinks its congestion window (ceding courier manpower to the other companies in the region), so your own data goes out more slowly.
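To make the retransmission cost concrete, here is a minimal sketch with illustrative numbers (not a real network model): under reliable TCP-style delivery, one lost packet costs at least a retransmission timeout, and everything queued behind it inherits that delay; UDP simply lets the packet go.

```python
from typing import Optional

def tcp_delivery_delay(one_way_ms: float, lost: bool, rto_ms: float) -> float:
    """Delay until a packet arrives under reliable (TCP-style) transport.

    A lost packet is detected after a retransmission timeout (RTO) and
    resent, so its delay grows by at least one RTO -- and subsequent
    packets queue behind it (head-of-line blocking), inheriting the delay.
    """
    if lost:
        return rto_ms + one_way_ms  # wait out the timeout, then resend
    return one_way_ms

def udp_delivery_delay(one_way_ms: float, lost: bool) -> Optional[float]:
    """UDP sends once; a lost packet simply never arrives (None)."""
    return None if lost else one_way_ms

if __name__ == "__main__":
    # 50 ms one-way latency, 200 ms retransmission timeout
    print(tcp_delivery_delay(50, lost=True, rto_ms=200))  # 250.0: 5x the normal delay
    print(udp_delivery_delay(50, lost=True))              # None: the app must tolerate the gap
```

The asymmetry is the whole point: TCP converts loss into delay, while UDP converts loss into a gap that the application layer must conceal.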
Live-streaming latency is often more than 1 s. What is the root cause?

  1. Live streaming's transport protocol (RTMP) runs over TCP by default, inheriting TCP's large delay.
  2. Because TCP never drops data, once the player stalls for a few seconds due to the network, and the player does not chase frames, that many seconds of delay persist after playback resumes (the data held up by the congestion does not vanish into thin air).
  3. To achieve instant (sub-second) startup, the live-streaming server usually caches the latest GOP; when the player does not chase frames this adds a fixed delay (depending on the GOP size configured on the publishing end).
  4. If the player uses HLS, the delay is even larger; it depends on the server's minimum HLS segment duration (the official recommendation is 10 s).
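The fixed delays in points 3 and 4 can be estimated directly from the encoder and server settings. A small sketch (parameter names and the 3-segment startup buffer are illustrative assumptions, not fixed by any spec):

```python
def gop_cache_delay_s(gop_frames: int, fps: float) -> float:
    """A server that caches the latest full GOP for instant startup adds
    up to one GOP of latency for a viewer who does not chase frames."""
    return gop_frames / fps

def hls_delay_s(segment_s: float, startup_segments: int = 3) -> float:
    """HLS players typically buffer a few segments before playing, so
    latency is roughly segment duration times segments buffered."""
    return segment_s * startup_segments

print(gop_cache_delay_s(gop_frames=60, fps=30))  # 2.0 s of GOP-cache delay
print(hls_delay_s(segment_s=10))                 # 30.0 s with a 3-segment buffer
```

This is why shrinking the GOP on the publishing end, or the segment duration on the HLS server, directly shrinks live latency (at the cost of encoding efficiency and request overhead).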
 


RTC real-time communication requires latency within 400 ms. How is that achieved?

To answer this, you need to know the following keywords: UDP, FEC, congestion control, data-volume adaptation, jitter buffer, and server-side scheduling and cascading.

1. The bottom layer uses UDP instead of TCP. Compared with the "gentlemanly concession" of TCP's congestion window, UDP can send at full rate and maximize throughput (better bandwidth utilization).

2. Since the bottom layer uses UDP, how are packet loss and retransmission handled?

For audio and video transmission, meeting low latency first means "allowing" some unimportant data to be lost. For the lost data, the receiving and playback ends apply concealment, e.g. PLC (Packet Loss Concealment) for audio, or skipping B/P frames at the end of the GOP for video.
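As a sketch of "allowing unimportant data to be lost": in a typical IPB stream, B frames are (in simple profiles) not referenced by other frames, so they can be skipped first, while a P frame near the end of the GOP hurts only the few frames after it. A hypothetical drop policy (the one-quarter tail threshold is an assumption for illustration):

```python
def droppable(frame_type: str, index_in_gop: int, gop_size: int) -> bool:
    """Decide whether a frame may be skipped under network pressure.

    - I frames start a GOP and every later frame depends on them: never drop.
    - B frames (non-reference in simple profiles): safe to drop.
    - P frames: only droppable in the tail of the GOP, where few frames
      depend on them and the next I frame will soon reset the picture.
    """
    if frame_type == "I":
        return False
    if frame_type == "B":
        return True
    # P frame: drop only in the last quarter of the GOP (illustrative cutoff)
    return index_in_gop >= gop_size - gop_size // 4

print(droppable("I", 0, 60))   # False: keep the keyframe
print(droppable("B", 10, 60))  # True: B frames go first
print(droppable("P", 58, 60))  # True: end-of-GOP P frame
print(droppable("P", 5, 60))   # False: early P frame, too many dependents
```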

On top of UDP, FEC redundancy (like shipping one extra parcel, but smarter) lets the receiving end "recover" lost data without any retransmission.
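A minimal sketch of the FEC idea using a single XOR parity packet. Real systems use stronger codes (e.g. Reed-Solomon, or FlexFEC in WebRTC); this toy version can only recover one lost packet per group, but it shows why no retransmission round trip is needed:

```python
from functools import reduce

def xor_parity(packets: list) -> bytes:
    """Parity packet: byte-wise XOR of all packets (equal length assumed)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), packets)

def recover(received: dict, parity: bytes, total: int) -> dict:
    """If exactly one of `total` packets is missing, XOR-ing the parity
    with every received packet reconstructs it locally -- no round trip."""
    missing = [i for i in range(total) if i not in received]
    if len(missing) == 1:
        received[missing[0]] = xor_parity(list(received.values()) + [parity])
    return received

pkts = [b"aaaa", b"bbbb", b"cccc"]
par = xor_parity(pkts)            # sent alongside the data packets
got = {0: pkts[0], 2: pkts[2]}    # packet 1 was lost on the wire
print(recover(got, par, 3)[1])    # b'bbbb': recovered from parity alone
```

The trade-off named in the text is visible here: one parity packet per group is a fixed bandwidth overhead, paid whether or not anything is lost.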

Of course, retransmission is still used on top of UDP as well: FEC redundancy costs extra bandwidth, so when the network RTT is small, retransmission remains a good choice.
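Whether a retransmission can still make the playout deadline depends on the RTT, so the choice between NACK-based retransmission and FEC can be phrased as a simple rule. A hypothetical decision function (the 1.5x safety factor is an illustrative assumption for RTT variance):

```python
def prefer_retransmit(rtt_ms: float, playout_budget_ms: float,
                      safety: float = 1.5) -> bool:
    """Retransmit (NACK) if one extra round trip -- padded by a safety
    margin for RTT variance -- still fits in the remaining playout budget.
    Otherwise rely on FEC redundancy, which costs bandwidth but no RTT."""
    return rtt_ms * safety < playout_budget_ms

print(prefer_retransmit(rtt_ms=40, playout_budget_ms=120))   # True: NACK is cheap here
print(prefer_retransmit(rtt_ms=150, playout_budget_ms=120))  # False: lean on FEC instead
```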

3. If the bottom layer blasts out UDP data with no restraint, won't the network burst?

Of course it will. So we add traffic-control mechanisms on top of UDP to smooth the overall transmission. For details, see my article "Understanding Traffic Shaping in Network Communication".

However, if the amount of data to send (like the parcel volume) itself exceeds what the current network can carry, no amount of smoothing at the sender will resolve the congestion. That is why one of the most essential ways RTC reduces latency is bandwidth estimation, paired with data-volume adaptation.

4. What bandwidth-estimation and data-volume-adaptation methods are available?

Bandwidth estimation means dynamically estimating the current network bandwidth. How to predict bandwidth accurately in real time is an art worth digging into and researching long-term; this article won't cover it. You can search for keywords such as BBR and GCC.
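To give a flavor of the problem, here is a toy estimator: an exponentially weighted moving average of per-interval throughput samples. This is not GCC or BBR (those also model queuing delay, loss, and probing); it only shows how an estimate is smoothed so a single noisy sample does not whipsaw the sender:

```python
def ewma_bandwidth(samples_kbps: list, alpha: float = 0.2) -> float:
    """Toy bandwidth estimator: exponentially weighted moving average of
    measured receive rates. alpha controls how fast the estimate reacts;
    real estimators (GCC, BBR) are far more sophisticated."""
    est = samples_kbps[0]
    for s in samples_kbps[1:]:
        est = alpha * s + (1 - alpha) * est
    return est

# The link drops from ~1000 kbps to ~400 kbps; the estimate follows gradually.
print(round(ewma_bandwidth([1000, 1000, 400, 400, 400]), 1))  # 707.2
```

The lag visible in the output is the core tension: react too fast and you chase noise, too slow and you overrun the shrinking link.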

Data-volume adaptation: when the estimated bandwidth is insufficient, reduce the amount of data sent to avoid congestion-induced delay. The usual knobs include audio/video frame rate and bitrate, FEC redundancy ratio, and simulcast big/small streams.
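The knobs above can be sketched as one adaptation ladder. All thresholds and ratios here are hypothetical, purely to show the shape of the policy (weak link: lower fps, more FEC, small stream; strong link: the reverse):

```python
def adjust_sending(est_kbps: float) -> dict:
    """Hypothetical adaptation ladder: scale the main knobs so the total
    sent stays under the estimated bandwidth (all values illustrative)."""
    video_kbps = est_kbps * 0.75                  # leave headroom for audio + FEC
    fec_ratio = 0.3 if est_kbps < 800 else 0.1    # more redundancy on lossy, weak links
    fps = 15 if est_kbps < 500 else 30            # halve frame rate when starved
    layer = "small" if est_kbps < 600 else "big"  # simulcast big/small stream switch
    return {"video_kbps": video_kbps, "fec_ratio": fec_ratio,
            "fps": fps, "layer": layer}

print(adjust_sending(400))   # constrained: 15 fps, small stream, heavy FEC
print(adjust_sending(2000))  # comfortable: 30 fps, big stream, light FEC
```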

5. Since UDP also retransmits, network jitter is inevitable, and buffered data accumulates passively at the receiving and playback ends. How does RTC eliminate these passive buffers?

The receive-side buffer is used to absorb network jitter. A fixed buffer brings a fixed delay (e.g. always buffering 500 ms of data before playout), whereas a dynamic jitter buffer adjusts its size based on the measured network quality.
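A sketch of how a dynamic jitter buffer can size itself: measure jitter with the RFC 3550 smoothed interarrival-jitter formula, then target a multiple of it. The multiplier, floor, and cap below are illustrative assumptions, not values from any particular stack:

```python
def interarrival_jitter(transit_ms: list) -> float:
    """RFC 3550 smoothed interarrival jitter: J += (|D| - J) / 16,
    where D is the change in one-way transit time between packets."""
    j = 0.0
    for prev, cur in zip(transit_ms, transit_ms[1:]):
        j += (abs(cur - prev) - j) / 16
    return j

def target_buffer_ms(jitter_ms: float, k: float = 4.0,
                     floor_ms: float = 20.0, cap_ms: float = 400.0) -> float:
    """Adaptive target: a multiple of measured jitter, clamped between a
    small floor and the RTC latency budget (all constants illustrative)."""
    return min(max(k * jitter_ms, floor_ms), cap_ms)

j = interarrival_jitter([50, 90, 50, 110])  # bursty transit times
print(round(j, 2))                          # 8.29 ms of smoothed jitter
print(round(target_buffer_ms(j), 2))        # 33.16 ms buffer, not a fixed 500 ms
```

On a steady network the jitter estimate decays toward zero and the buffer shrinks to its floor, which is exactly the latency win over a fixed buffer.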

TSM, time-scale modification (changing speed without changing pitch), is an audio frame-chasing strategy that does not hurt the listening experience. Similar to 0.9x/1.1x playback, it speeds up or slows down the consumption of buffered data to catch up toward live or to slow down and let the buffer refill.
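The control side of TSM reduces to picking a playback rate from the buffer level. A hypothetical controller (the 0.9/1.1 rates come from the text; the 40 ms dead band around the target is an illustrative assumption to avoid oscillating):

```python
def playback_rate(buffer_ms: float, target_ms: float,
                  band_ms: float = 40.0) -> float:
    """Pick a time-scale-modification rate from the buffer level:
    play slightly faster to drain an over-full buffer (chase frames),
    slightly slower to refill a starving one. Rates this close to 1.0
    keep the pitch-preserved change effectively inaudible."""
    if buffer_ms > target_ms + band_ms:
        return 1.1   # drain: catch up toward live
    if buffer_ms < target_ms - band_ms:
        return 0.9   # refill: avoid an underrun glitch
    return 1.0       # inside the dead band: leave playback alone

print(playback_rate(300, 100))  # 1.1: buffer bloated, speed up
print(playback_rate(100, 100))  # 1.0: on target
print(playback_rate(20, 100))   # 0.9: nearly empty, slow down
```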

6. After this series of client-side latency optimizations, what remains is the scheduling and cascading of the overall server-side stream-forwarding network. This too is a direction worth digging into and researching long-term, and won't be discussed at length here; the figure sketches it roughly.
Origin blog.csdn.net/m0_60259116/article/details/124477048