WebRTC | The essence of real-time audio and video communication

Table of contents

1. Two indicators of real-time audio and video communication

1. Real-time communication delay indicators

2. Basic concepts related to video

3. Audio and video service quality indicators

2. Solve the main contradiction of real-time communication

1. Increase bandwidth

A. Provide better access services

B. Guarantee the bandwidth and quality of the cloud network

C. A more reasonable routing scheduling strategy

2. Reduce data volume

A. Use a better compression algorithm

B. SVC technology

C. Simulcast Technology

D. Dynamic bit rate

E. Drop frames or reduce services

3. Appropriately increase the delay

4. Improve network quality

5. Quickly and accurately assess bandwidth


        The essence of real-time audio and video communication is to come as close as possible to the effect of face-to-face communication; that is also its goal.

1. Two indicators of real-time audio and video communication

        One is the delay (latency) indicator of real-time communication; the other is the audio and video service quality indicator.

1. Real-time communication delay indicators

        Between the two ends, many factors introduce delay, such as audio and video capture time, encoding and decoding time, network transmission time, rendering time, and the time spent in various buffers. Among these, the delay caused by network transmission is dynamic (sometimes fast, sometimes slow, and hard to predict), so it is the most difficult to evaluate, control, and resolve, while the delay contributed by the other factors is basically constant.
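        As a rough illustration of how these components add up, here is a minimal sketch; every number in it is an assumed, illustrative value, not a measurement:

```typescript
// Illustrative one-way delay budget; all numbers below are assumptions, not measurements.
const delayBudgetMs = {
  capture: 30,       // audio/video acquisition
  encode: 10,        // encoding
  network: 80,       // transmission, the only term that fluctuates at run time
  jitterBuffer: 50,  // buffering used to smooth out jitter
  decode: 10,        // decoding
  render: 20,        // rendering/playback
};

const totalMs = Object.values(delayBudgetMs).reduce((a, b) => a + b, 0);
console.log(`estimated one-way delay: ${totalMs} ms`); // 200 ms with these assumed values
```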

2. Basic concepts related to video

(1) Resolution: refers to how many pixels an image occupies on the screen. The denser the pixels, the higher the resolution. For real-time communication, the default resolution is generally set to 640×480 or 640×360. Below this, the image carries too little information (basically only a head can be seen) and the viewing experience is very poor. In addition, the resolution also sets the upper limit of image clarity.

(2) Frame rate: refers to the number of video frames (images) played per second. The more frames played per second, the smoother the video. Animations and movies generally run at 24 frames per second or higher, and high-definition video at 60 frames per second or higher. For real-time communication, 15 frames per second is the watershed: when the frame rate drops below 15 frames per second, most people feel the video quality is poor and the picture stutters badly.
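        For reference, a minimal sketch of how these defaults can be requested through the browser's getUserMedia API; the exact constraint values are assumptions based on the figures above:

```typescript
// Request 640×480 video and keep the frame rate at or above the 15 fps watershed.
// The ideal/min values are assumptions that mirror the defaults discussed above.
async function openCamera(): Promise<MediaStream> {
  return navigator.mediaDevices.getUserMedia({
    audio: true,
    video: {
      width: { ideal: 640 },
      height: { ideal: 480 },
      frameRate: { min: 15, ideal: 30 },
    },
  });
}
```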

(3) Bit rate: refers to the amount of data produced per second after the video is compressed. In principle, the higher the resolution, the higher the bit rate needs to be. If the resolution is high but the bit rate is low, a large amount of image information is discarded during encoding, the image cannot be fully restored during decoding, and the result is visible distortion.

        At the same resolution, the higher the bit rate, the better the image is restored and the clearer it looks. Of course, the benefit is bounded: once a certain threshold is exceeded (MOS = 5), raising the bit rate further is meaningless.

(4) MOS value: used to evaluate the quality of a service; the higher the MOS value, the better the service quality. It is divided into 5 levels, from high to low: 5 - excellent; 4 - good; 3 - fair; 2 - poor; 1 - very poor.

3. Audio and video service quality indicators

        Service quality indicators include audio service quality and video service quality.

        To make online real-time communication approach or match the effect of face-to-face communication, the transmission delay must be reduced as much as possible while the audio and video bit rate is increased. However, reducing delay conflicts with increasing bit rate, unless every user has sufficient bandwidth and good enough network quality, which is obviously unrealistic.

2. Solve the main contradiction of real-time communication

        The main contradiction in real-time communication is the one between audio and video service quality on one side and bandwidth, network quality, and real-time performance on the other. There are several ways to resolve this contradiction:

  • One is to increase the bandwidth;
  • The second is to reduce the amount of data;
  • The third is to increase the delay appropriately;
  • The fourth is to improve network quality;
  • The fifth is to evaluate bandwidth quickly and accurately.

1. Increase bandwidth

        Besides passively waiting for 5G to improve network capacity, there are also solutions that increase bandwidth indirectly; they fall into client-side solutions and server-side solutions.

        Among the client-side solutions, the most typical is the route selection built into WebRTC: it can pick the best-quality network connection according to candidate priority.
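        In WebRTC this selection is performed by ICE, which ranks candidate connection paths by a priority value. Below is a minimal sketch of the standard priority formula from RFC 5245/8445; the localPreference argument in the example call is an arbitrary assumption:

```typescript
// ICE candidate priority (RFC 5245 / RFC 8445):
// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
// Candidates with higher priority (e.g. direct host paths) are preferred over relayed ones.
const TYPE_PREFERENCE: Record<string, number> = {
  host: 126,   // local interface, usually the best path
  prflx: 110,  // peer-reflexive
  srflx: 100,  // server-reflexive (discovered via STUN)
  relay: 0,    // TURN relay, the last resort
};

function icePriority(type: string, localPreference: number, componentId: number): number {
  return (TYPE_PREFERENCE[type] ?? 0) * 2 ** 24 + localPreference * 2 ** 8 + (256 - componentId);
}

console.log(icePriority("host", 65535, 1) > icePriority("relay", 65535, 1)); // true
```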

        On the server side, there are three ways to increase bandwidth indirectly: providing better access services, guaranteeing the bandwidth and quality of the cloud network, and using a more reasonable routing scheduling strategy.

A. Provide better access services

        Under normal circumstances, users on the same carrier (such as China Unicom) have no trouble communicating with each other, but when users on different carriers (such as China Unicom and China Telecom) communicate, network quality is hard to guarantee. The usual way to solve this problem is to have each user connect to an access server in the same region and on the same carrier, so that the channel between the user and the server is effectively guaranteed. For example, a China Telecom user in Shanghai should connect to the least-loaded access server that is located in Shanghai and on the China Telecom network.

B. Guarantee the bandwidth and quality of the cloud network

        That is, once the data enters the cloud, the network quality inside the cloud must be good. Because the bandwidth and quality inside the cloud are under the provider's control, improving this part of the network is relatively simple. The easiest way is to buy a high-quality BGP network for internal use in the cloud, but high-quality BGP is relatively expensive.

C. A more reasonable routing scheduling strategy

        The basic principle of route selection is that the route with the shortest distance, the best network quality, and the lowest server load is the best route.
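        As an illustration only, a hypothetical scoring function that combines these three criteria might look like the sketch below; the field names and weights are assumptions, not taken from any real scheduling system:

```typescript
// Hypothetical route-scoring sketch; all weights and field names are assumptions.
interface RouteCandidate {
  distanceKm: number;  // geographic distance to the access server
  lossRate: number;    // measured packet loss rate, 0..1
  rttMs: number;       // measured round-trip time in milliseconds
  serverLoad: number;  // current server load, 0..1
}

// Lower score is better: each term penalises distance, poor quality, or high load.
function routeScore(r: RouteCandidate): number {
  return r.distanceKm * 0.01 + r.lossRate * 100 + r.rttMs * 0.5 + r.serverLoad * 50;
}

// Assumes a non-empty candidate list.
function pickBestRoute(candidates: RouteCandidate[]): RouteCandidate {
  return candidates.reduce((best, c) => (routeScore(c) < routeScore(best) ? c : best));
}
```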

2. Reduce data volume

        Reducing the amount of audio and video data necessarily sacrifices some audio and video service quality, so it is a matter of striking a balance.

        What are the ways to preserve the real-time performance of audio and video by reducing the amount of data? Five methods are summarized here: using a better compression algorithm, SVC technology, Simulcast technology, dynamic bit rate, and dropping frames or reducing services. (Simulcast and dynamic bit rate are the most widely used.)

A. Use a better compression algorithm

        H.265 and AV1 are codecs introduced only in recent years, and their compression efficiency is much higher than that of the widely deployed H.264.

B. SVC technology

        The basic principle is to encode the video into multiple layers along the temporal, spatial, and quality dimensions and send them to the server as a single stream. After receiving it, the server selects different layers to forward according to each user's bandwidth. The advantage is that users with different network conditions can all get reasonable service quality. The disadvantages are: first, the upstream bit rate does not decrease but increases, so the uploading user needs good bandwidth; second, because SVC is complex to implement and lacks hardware support, decoding it consumes a lot of CPU on the terminal.
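        For reference, a minimal sketch of how an application can ask for SVC encoding through the W3C WebRTC-SVC extension; the "L3T3" mode (3 spatial and 3 temporal layers) is an illustrative choice, and codec and browser support varies:

```typescript
// Request an SVC-encoded video stream via the WebRTC-SVC scalabilityMode field.
// "L3T3" (3 spatial + 3 temporal layers) is an illustrative choice; support varies by browser and codec.
function addSvcVideo(pc: RTCPeerConnection, track: MediaStreamTrack): void {
  const svcEncoding: RTCRtpEncodingParameters & { scalabilityMode?: string } = {
    scalabilityMode: "L3T3",
  };
  pc.addTransceiver(track, { direction: "sendonly", sendEncodings: [svcEncoding] });
}
```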

C. Simulcast Technology

        Simulcast is similar to SVC, but much simpler to implement. The basic principle is to encode the video into several independent streams at different resolutions and upload all of them to the server. After receiving the streams, the server picks the most suitable one for each user according to that user's bandwidth. Compared with SVC, it differs in the following ways:

  • One, each stream uploaded by Simulcast can be decoded independently, whereas SVC layers cannot;
  • Two, because each Simulcast stream can be decoded independently, its decoding complexity is the same as that of ordinary decoding;
  • Three, because Simulcast uploads multiple separate streams, its upload bit rate is much higher than SVC's.
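        A minimal simulcast sketch using the standard sendEncodings API; the rid names and scaling factors below are illustrative choices:

```typescript
// Encode the same track as three independent streams at full, half and quarter resolution;
// the server (SFU) then forwards the most suitable stream to each user.
function addSimulcastVideo(pc: RTCPeerConnection, track: MediaStreamTrack): void {
  pc.addTransceiver(track, {
    direction: "sendonly",
    sendEncodings: [
      { rid: "f", scaleResolutionDownBy: 1 }, // full resolution
      { rid: "h", scaleResolutionDownBy: 2 }, // half resolution
      { rid: "q", scaleResolutionDownBy: 4 }, // quarter resolution
    ],
  });
}
```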

D. Dynamic bit rate

        When the bandwidth estimator determines that the user's bandwidth is insufficient, the encoder is told to lower its output bit rate; when the estimated bandwidth increases, the output bit rate is raised again. This is dynamic bit rate. If the picture of an audio and video product becomes clear for a while and blurry for a while when network jitter is heavy, it is probably using a dynamic bit rate strategy.
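        WebRTC adapts the sending bit rate automatically from its bandwidth estimate; what an application can do directly is cap the encoder's output. A minimal sketch using the standard RTCRtpSender.setParameters API; the bit rate value in the comment is an assumption:

```typescript
// Cap the encoder's output bit rate for a sender; WebRTC's own estimator still adapts below this cap.
async function capVideoBitrate(sender: RTCRtpSender, maxBitrateBps: number): Promise<void> {
  const params = sender.getParameters();
  if (!params.encodings || params.encodings.length === 0) {
    params.encodings = [{}];
  }
  params.encodings[0].maxBitrate = maxBitrateBps; // e.g. 800_000 for roughly 800 kbps
  await sender.setParameters(params);
}
```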

E. Drop frames or reduce services

        In addition to the methods described above, there is a less friendly method: dropping frames or shutting down some unimportant services to reduce the amount of data. Of course, this method is used only when the user's bandwidth is seriously insufficient, and only as a last resort.

3. Appropriately increase the delay

        The phenomenon of data arriving sometimes fast and sometimes slow is called network jitter. For video, network jitter causes frequent freezes and bursts of fast playback; for audio, it causes stutters and swallowed syllables. How is this solved? The method is actually simple: increase the delay, that is, put incoming data into a queue to buffer it first, and then take it out of the queue for processing, so that the data stream becomes "smooth".

        However, for real-time audio and video communication, the delay must be kept within a certain range; as long as the one-way delay is below 500 ms, most people find it acceptable. Since the time spent on capture, encoding and decoding, rendering, and so on is roughly fixed, once the network delay is measured, the delay budget for the buffer can be determined.
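        A toy sketch of the buffering idea described above (this is an illustration, not WebRTC's actual jitter buffer): packets are held for a fixed extra delay so that late or out-of-order packets can still be handed out in order:

```typescript
// Toy jitter buffer: hold packets for an extra bufferDelayMs so late packets can be reordered.
interface Packet {
  seq: number;        // sequence number
  arrivalMs: number;  // local arrival time in milliseconds
  payload: Uint8Array;
}

class SimpleJitterBuffer {
  private queue: Packet[] = [];
  constructor(private readonly bufferDelayMs: number) {}

  push(p: Packet): void {
    this.queue.push(p);
    this.queue.sort((a, b) => a.seq - b.seq); // restore order inside the buffer
  }

  // Return, in sequence order, the packets whose extra waiting time has elapsed.
  pop(nowMs: number): Packet[] {
    const ready = this.queue.filter(p => nowMs - p.arrivalMs >= this.bufferDelayMs);
    this.queue = this.queue.filter(p => nowMs - p.arrivalMs < this.bufferDelayMs);
    return ready;
  }
}
```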

4. Improve network quality

        Improving network quality has a default prerequisite: it is only possible when the network is not congested; once congestion occurs, the network quality cannot be improved.

        On the network, packet loss, delay, and jitter all affect network quality:

  • Packet loss is the most important indicator of network quality during transmission and has the greatest impact on the network. A high-quality network has a packet loss rate of no more than 2%; for WebRTC, a loss rate above 2% but below 10% still counts as a normal network.
  • Latency is also an important indicator of network quality, though its impact is smaller than that of packet loss. If the transmission delay between the two ends keeps increasing, the network path is very likely congested.
  • Jitter has the smallest impact on network quality. Some jitter always occurs on a network. Small jitter can be absorbed by a circular queue; if the jitter is too large, out-of-order packets are treated as lost. In WebRTC the jitter tolerance is 10 ms: if a packet arrives out of order, the receiver waits for it at most 10 ms; beyond that the packet is considered lost (even if it arrives at 11 ms, it is still treated as lost).
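        These three indicators can be observed at run time through WebRTC's standard statistics API. A minimal sketch reading loss and jitter for incoming media (field names follow the W3C webrtc-stats spec; jitter is reported in seconds):

```typescript
// Log packet loss and jitter for every incoming RTP stream using getStats().
async function logNetworkQuality(pc: RTCPeerConnection): Promise<void> {
  const report = await pc.getStats();
  report.forEach(stat => {
    if (stat.type === "inbound-rtp") {
      const lost = stat.packetsLost ?? 0;
      const received = stat.packetsReceived ?? 0;
      const lossRate = received + lost > 0 ? lost / (received + lost) : 0;
      console.log(`${stat.kind}: loss ${(lossRate * 100).toFixed(1)}%, jitter ${stat.jitter ?? 0} s`);
    }
  });
}
```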

        Five methods address packet loss, delay, and jitter: NACK/RTX, FEC (forward error correction), JitterBuffer (anti-jitter), NetEQ, and congestion control.

  • NACK/RTX: NACK is an RTCP message type with which the receiver reports to the sender which packets were lost within a period of time; RTX means the sender retransmits the lost packets on a new SSRC, so retransmitted packets can be distinguished from the original audio and video packets.
  • FEC (forward error correction) sends redundant data computed with XOR alongside the media, so that a lost packet can be recovered from the redundancy. FEC is especially suitable for scenarios with a small amount of random packet loss (see the XOR sketch after this list).
  • JitterBuffer is used for anti-jitter: it restores mildly jittered, out-of-order packets to ordered packets.
  • NetEQ is dedicated to audio; it contains a JitterBuffer and can additionally use audio time-stretching (changing playback speed without changing pitch) to play back accumulated audio quickly or stretch insufficient audio, achieving audio anti-jitter.
  • Congestion control: see below.
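        A toy XOR sketch of the FEC idea mentioned in the list above (illustration only, not WebRTC's actual FEC format): the parity packet is the XOR of a group of media packets, so any single lost packet in the group can be rebuilt from the parity and the surviving packets:

```typescript
// XOR a group of packets into one parity packet (packets are zero-padded to the longest length).
function xorPackets(packets: Uint8Array[]): Uint8Array {
  const len = Math.max(...packets.map(p => p.length)); // assumes a non-empty group
  const parity = new Uint8Array(len);
  for (const p of packets) {
    for (let i = 0; i < p.length; i++) parity[i] ^= p[i];
  }
  return parity;
}

// Recover a single lost packet: XOR the parity with every packet that did arrive.
function recoverLostPacket(parity: Uint8Array, received: Uint8Array[]): Uint8Array {
  return xorPackets([parity, ...received]);
}

// Usage: const parity = xorPackets([p1, p2, p3]);
// If p2 is lost, recoverLostPacket(parity, [p1, p3]) reproduces p2 (up to trailing zero padding).
```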

5. Quickly and accurately assess bandwidth

        In the field of real-time communication, there are four common bandwidth estimation algorithms: Goog-REMB, Goog-TCC (transport-wide congestion control), NADA, and SCReAM. Each has its own advantages and disadvantages in estimating network bandwidth, but overall Google's newer algorithm, Goog-TCC, performs best.
