Tencent Cloud H5 Voice Communication QoE Optimization

This article was first published in the cloud+ community and may not be reproduced without permission.

Cloud+ Introduction: On April 21, the Tencent Cloud+ Community held its audio and video technology development practical salon in Beijing. Zhang Ke, senior engineer at Tencent's Audio and Video Lab, focused on network transmission in his talk "Tencent Cloud H5 Voice Communication QoE Optimization", covering Tencent Cloud's H5 solution, the overall framework of audio QoS optimization and its optimization techniques, and operational methods. QoS optimization covers four areas: bandwidth estimation and congestion control, anti-packet-loss technology, delay, and anti-jitter technology. Zhang Ke focused on the QoS issues involved in interoperation between WebRTC and WebRTC, between TBS and WebRTC, and between TBS and native clients. Retrospective analysis tools can improve work efficiency, quickly uncover potential technical improvement points, and speed up technology iteration.

Zhang Ke, Senior Engineer, Tencent Audio and Video Lab

    In November, the W3C released the WebRTC standard. The IETF, another international organization involved with WebRTC, also published RFC 8298 in December, which has not yet become a full standard. The focus of my talk today is some experience around network transmission.

    According to the global WebRTC quality report released by callstats.io in March 2017: first, about 10% of calls are interrupted midway for various bandwidth and packet-loss reasons; second, 10% to 15% of users are not satisfied with the quality of their audio and video calls; third, roughly 7% of calls suffer heavy packet loss, and for about 95% of users the delay is below 240 milliseconds.

It is precisely because the current WebRTC solution still has many problems. Let's briefly analyze the reasons behind the poor quality just mentioned. There are roughly three:

First, WebRTC itself uses P2P network connections, without a large transit system in the middle. When calls cross operators, or involve small operators, the quality of the network links is not guaranteed.

Second, interoperability and compatibility across browsers are very poor. These problems happen to be our specialty, so we developed a solution around them.

Two core technical advantages

The first is our real-time audio and video capability; the second is that browsers based on the TBS kernel of QQ Browser support WebRTC, so we can make many modifications and even optimizations to the WebRTC code.

Our customers can develop with WebRTC inside WeChat and QQ Browser. We also provide capabilities such as stream pushing, recording, on-demand playback, and more.

Three differentiated quality-of-service offerings

Implementing a basic WebRTC offering mainly involves backend construction and access deployment, including transmission control, media encryption, and adding operational capabilities on the backend. The QQ-based offering draws on more than ten years of experience with a massive user base; daily QQ calls amount to roughly 1 to 2 billion minutes. The extended WebRTC offering integrates the WebRTC kernel into the QQ Browser (TBS) kernel and extends it to solve some of WebRTC's current problems; we have also made some simple QoS extensions. Going forward, we will decide, based on user numbers and usage patterns, how these three offerings can be better coordinated so that their strengths complement each other.

Distribution and access status of interface machines

We have established a global link-quality monitoring system across our data-center nodes. For the WebRTC project, more than 60 nodes have been deployed, covering more than 30 countries. Within China, the link delay between 98% of node pairs is less than 36 ms; globally, the link delay between about 90% of node pairs is less than 100 ms. Link optimization is a continuous process, including node deployment; where to deploy additional nodes will likewise be decided according to user numbers in each region.

QoS optimization - congestion avoidance algorithms and anti-packet-loss technology

In the backend quality-control schematic, you can see that we have implemented both SFU and MCU. Quality control is a three-level strategy. At the interface-machine level, the quality of the terminal's uplink network is estimated first, including bandwidth estimation and the hookup of the anti-packet-loss mechanisms; downlink bandwidth estimation and anti-packet-loss mechanisms are connected in the same way. At the core level, the control strategy decides how traffic control should be done based on the uplink and downlink quality information. This core decision-making system is also a very central, highly competitive part of our entire real-time technology.

What technologies are involved in QoS optimization?

It is divided into four areas: bandwidth, packet loss, delay, and anti-jitter. Doing QoS well involves many links in the chain: codec, network link, transmission, processing, devices, and more. Handling how these technologies cooperate, and the scheduling, management, and coordination of the various algorithms, are the core difficulties.

Bandwidth-related tools, including network congestion control, include GCC, NADA, SCReAM, and FRACTal. Anti-packet-loss technology consists of ARQ, FEC, and PLC. For delay, including when the core is doing capture and playback, we apply control in many areas. Different problems involve different technologies; the depth and breadth of understanding of the various basic technologies and their application strategies is the first level.

The second level is the coordinated effect of the various technologies and factors, including feedback and linkage strategies; this level is about comprehensive integration.

At the third level, business characteristics determine differentiated, scenario-based strategies, which in turn lead to different strategies being adopted.

To do QoS well, you must address all three levels; otherwise the QoS will not be good.

What kind of algorithm counts as a good congestion control algorithm?

Congestion avoidance and control schemes for real-time media have long been a hot research topic in the IETF RMCAT working group. The transmission delay should be less than 100 milliseconds. Data streams must not interfere with each other, and must not spontaneously cause inappropriate bandwidth usage. Packet loss should be as small as possible, bandwidth utilization should be high, and the available bandwidth should be used as fully as possible. When bandwidth is limited, the transmission channel must not be starved. For security, something must be done about possible feedback attacks on the congestion control; for stability, media transmission must remain smooth. For fairness, a stream should neither be starved nor grab extra bandwidth, but share it.

Adaptability has many aspects. The first is adapting to fluctuations in the actual bandwidth. For example, audio may switch to discontinuous transmission (DTX), which can cause the estimated bandwidth to drop lower and lower; how do you deal with that? The last aspect is feedback: what do you do if the feedback channel is poor, or feedback packets are lost? How do you control for that? If you can handle all ten of these aspects well, you will have a better congestion control algorithm.

    TFRC is a technology from about ten years ago. Its biggest problem is that delay is uncontrollable; with video frame sizes that change frequently, it exhibits rate oscillation, and its accuracy under high packet loss is problematic.

    LEDBAT's advantage is that its delay is well controlled in absolute terms. Its disadvantage is that it is too sensitive: network fluctuations or bursts of traffic on the link cause it to back off quickly, which can starve the stream of bandwidth. Another issue is the latecomer effect: a newly started stream will preempt the bandwidth of the earlier stream.

    GCC has two parts: a delay-based control algorithm, traditionally run at the receiving end, and a loss-based algorithm at the sending end. Its main problems: it performs poorly with multiple coexisting streams and in mobile network environments.

When coexisting with TCP streams, it yields excessively, which can starve the WebRTC stream. When multiple WebRTC streams run concurrently, a newly added stream degrades the quality of existing streams. SCReAM is window-based and byte-oriented; its disadvantage, as seen in the SCReAM C++ reference implementation, is that on wired networks it does not perform as well as GCC. NADA is based on latency and loss and is still experimental code. FRACTal probes bandwidth using FEC; its disadvantage is that robustness still needs improvement and it has yet to be validated on mobile and wireless networks. QUIC is Quick UDP Internet Connections; its default congestion control algorithm cannot guarantee low latency, but private congestion control algorithms can be plugged in.
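To make the loss-based half of GCC concrete, here is a minimal sketch of the sender-side rule as described in the IETF GCC draft: back off when reported loss exceeds about 10%, probe upward by 5% when loss is below about 2%, and cap the result with the delay-based estimate. The function and variable names are illustrative, not WebRTC's actual identifiers.

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>

    // Sketch of the sender-side, loss-based half of GCC (draft-ietf-rmcat-gcc).
    // The thresholds (2% / 10%) and factors (1.05, 1 - 0.5*p) follow the draft;
    // the names are illustrative, not WebRTC's actual identifiers.
    uint32_t LossBasedEstimate(uint32_t prev_bitrate_bps,
                               double loss_fraction,          // from RTCP receiver reports
                               uint32_t delay_based_cap_bps)  // from the delay-based estimator
    {
        uint32_t target = prev_bitrate_bps;
        if (loss_fraction < 0.02) {
            // Almost no loss: probe upward by 5%.
            target = static_cast<uint32_t>(prev_bitrate_bps * 1.05);
        } else if (loss_fraction > 0.10) {
            // Heavy loss: back off proportionally to the loss fraction.
            target = static_cast<uint32_t>(prev_bitrate_bps * (1.0 - 0.5 * loss_fraction));
        }
        // else 2%..10%: hold the current rate.

        // The final send rate never exceeds what the delay-based controller allows.
        return std::min(target, delay_based_cap_bps);
    }

    int main() {
        uint32_t rate = 800'000;  // 800 kbps
        for (double p : {0.0, 0.01, 0.05, 0.15, 0.25}) {
            rate = LossBasedEstimate(rate, p, 1'000'000);
            std::printf("loss=%.2f -> rate=%u bps\n", p, rate);
        }
        return 0;
    }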

There are two implementation schemes: current WebRTC has switched to the sender-side delay-based congestion scheme (the green part in the slide), and which of the two algorithms is used is negotiated through SDP parameters.

This is the practice of several mainstream browsers: Edge still uses an older algorithm, while OpenWebRTC mainly promotes the SCReAM algorithm.

Our application strategy differs across browsers and browser versions, and anyone providing a WebRTC solution must be clear about this. In the basic WebRTC call scenario, we select the congestion control scheme through SDP parameter negotiation; across different browsers, these congestion control algorithms must be managed. We also provide a private congestion control algorithm, built mainly on our accumulated experience, including the interpretation of the various algorithms I just gave and ways of optimizing around their strengths and weaknesses, and we compare the different algorithms at the operational level as well.

There are many kinds of FEC algorithms. The first is in-band FEC, which generates redundant information inside the speech encoder. Its disadvantage is that it works at the expense of voice quality: although the traffic stays stable, the resulting quality is not good.

XOR-based FEC is the main implementation method in WebRTC.

    Reed-Solomon has a higher error-correction capability. Interleaved coding is also used in WebRTC; its purpose is not to correct errors but to disperse packet losses. There are also fountain codes, a fairly old technology that in the last couple of years has found wide use in broadcast scenarios; they are rateless codes, especially suitable for one-to-many scenarios. FEC increases traffic and delay, but relatively speaking the delay it adds is stable, making it suitable for channels with stable characteristics.

    WebRTC provides both XOR and interleaved coding. The XOR FEC used in WebRTC builds the redundant packets by XOR-ing the original media packets. In the example I mentioned, four media packets generate three redundant packets: packets 1-4 are media packets and 5-7 are redundant packets. If some media packets are lost but enough of the remaining media and redundant packets arrive, the lost packets can be recovered from the XOR relationships.
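As a concrete illustration of the XOR relationship, the sketch below protects a group of media packets with one parity packet and recovers a single lost packet by XOR-ing the parity with the packets that did arrive. This is a simplified toy, not WebRTC's actual ULPFEC/FlexFEC packetization, which also carries masks and sequence-number metadata.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Minimal illustration of XOR-based FEC: one parity packet protects a group
    // of media packets; any single missing packet in the group can be rebuilt
    // by XOR-ing the parity packet with the packets that did arrive.
    std::vector<uint8_t> XorPackets(const std::vector<std::vector<uint8_t>>& pkts) {
        std::vector<uint8_t> parity(pkts[0].size(), 0);
        for (const auto& p : pkts)
            for (size_t i = 0; i < parity.size(); ++i)
                parity[i] ^= p[i];
        return parity;
    }

    int main() {
        // Four equal-sized media packets (real payloads are padded to a common length).
        std::vector<std::vector<uint8_t>> media = {
            {'V', 'o', 'I', 'P'}, {'p', 'k', 't', '2'},
            {'p', 'k', 't', '3'}, {'p', 'k', 't', '4'}};
        std::vector<uint8_t> parity = XorPackets(media);

        // Suppose packet 2 (index 1) is lost; XOR the parity with the survivors.
        std::vector<std::vector<uint8_t>> survivors = {media[0], media[2], media[3], parity};
        std::vector<uint8_t> recovered = XorPackets(survivors);

        std::printf("recovered: %.4s\n",
                    reinterpret_cast<const char*>(recovered.data()));  // prints "pkt2"
        return 0;
    }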

Feedback about the loss level and reordering is required to correctly select between kFecMaskRandom and kFecMaskBursty.

This is FlexFEC in WebRTC, still using the XOR relationship just described. One mode is non-interleaved and the other interleaved. In the non-interleaved case, packets 1, 2, 3, 4 are XOR-ed to produce one redundant packet, and 5, 6, 7, 8 produce another.

In the interleaved case, it may be packets 1, 5, 9 that generate a "vertical" redundant packet. There are also two-dimensional schemes, where a redundant packet is generated both horizontally and vertically. Its error-correction ability is not that strong; this is what WebRTC offers.

Another approach combines FEC with interleaving: data packets are written into a matrix row by row and then read out column by column, thus realizing the interleaving.

The goal of interleaving is to discretize errors so that FEC can recover them. The greater the interleaving depth M, the greater the dispersion and the stronger the resistance to burst packet loss, but also the greater the delay.
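The following block-interleaver sketch shows the write-by-row, read-by-column idea with depth M=3: a burst of consecutive losses on the wire becomes isolated losses after de-interleaving, which FEC can then repair. The helper is illustrative only.

    #include <cstdio>
    #include <vector>

    // Block interleaver sketch: packets are written into an M x N matrix row by
    // row and sent column by column, so a burst of consecutive losses on the wire
    // turns into isolated losses after de-interleaving. Larger depth M spreads
    // bursts further but adds more buffering delay before sending can start.
    std::vector<int> Interleave(const std::vector<int>& seq, int rows, int cols) {
        std::vector<int> out;
        out.reserve(seq.size());
        for (int c = 0; c < cols; ++c)
            for (int r = 0; r < rows; ++r)
                out.push_back(seq[r * cols + c]);
        return out;
    }

    int main() {
        // 12 packets, depth M=3, width N=4: written as rows 1-4 / 5-8 / 9-12.
        std::vector<int> packets = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
        for (int p : Interleave(packets, 3, 4)) std::printf("%d ", p);
        std::printf("\n");  // send order: 1 5 9 2 6 10 3 7 11 4 8 12
        return 0;
    }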

How to design a good FEC algorithm?

    1. The anti-packet-loss algorithm must be incorporated into the congestion control algorithm and adapt to the network; this is very important and a prerequisite.

    2. How to reduce redundant traffic under the premise of ensuring anti-packet loss capability.

    3. How to maximize the advantages of various FEC mechanisms and scene feedback (number of consecutive packet loss, packet loss characteristics).

    4. The FEC algorithm, the selection of the packet size, and the impact on the flow, delay, and anti-packet loss performance should be considered.

    5. A dynamic redundancy-rate mechanism and its convergence speed (a minimal sketch follows this list).

    6. FEC effect evaluation.

    7. In one-to-many scenarios, it is necessary to customize the FEC protection scheme for each receiver.
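The sketch below, referenced in point 5, shows one simple way a dynamic redundancy rate could be driven by smoothed loss feedback while staying inside the congestion-controlled bitrate budget. The 1.5x protection factor and 50% cap are assumptions for illustration, not our production policy.

    #include <algorithm>
    #include <cstdio>

    // Illustrative dynamic-redundancy sketch (not WebRTC's actual policy):
    // pick an FEC redundancy rate from smoothed receiver loss feedback and cap
    // it so that media plus redundancy stays inside the congestion-controlled budget.
    struct FecConfig {
        double redundancy;      // redundant packets per media packet (0..1)
        int media_bitrate_bps;  // bitrate left for the codec
    };

    FecConfig ChooseRedundancy(double smoothed_loss,  // e.g. EWMA of reported loss
                               int budget_bps)        // from congestion control
    {
        // Protect a bit above the observed loss rate, but never more than 50%.
        double redundancy = std::clamp(1.5 * smoothed_loss, 0.0, 0.5);
        int media = static_cast<int>(budget_bps / (1.0 + redundancy));
        return {redundancy, media};
    }

    int main() {
        for (double loss : {0.0, 0.05, 0.10, 0.30}) {
            FecConfig c = ChooseRedundancy(loss, 64'000);
            std::printf("loss=%.2f -> redundancy=%.2f, media=%d bps\n",
                        loss, c.redundancy, c.media_bitrate_bps);
        }
        return 0;
    }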

The receiving end needs to provide various kinds of feedback: when it receives the data packets, it calculates the recovery success rate and feeds it back to the strategy center for the corresponding statistics and control.

Key points of the NACK algorithm (a minimal request-building sketch follows the list):

    1. Base the request decision on the audio/video jitter buffer status.

    2. Transmitter/receiver delay control.

    3. Many other fine control details.

    4. Evaluation of retransmission effect.

    5. Operation and data monitoring.
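Here is the minimal request-building sketch mentioned above: it scans for sequence-number gaps behind the highest received packet and only issues NACKs when the round-trip time still fits inside the buffered audio, combining points 1 and 2. It is an illustration, not WebRTC's actual NACK module.

    #include <cstdint>
    #include <cstdio>
    #include <set>
    #include <vector>

    // Sketch of NACK generation: scan the gap up to the highest received sequence
    // number and only request packets that can still arrive before their playout
    // deadline (RTT vs. remaining buffered audio).
    std::vector<uint16_t> BuildNackList(const std::set<uint16_t>& received,
                                        uint16_t highest_seq,
                                        int rtt_ms,
                                        int buffered_audio_ms)
    {
        std::vector<uint16_t> nacks;
        if (received.empty()) return nacks;
        // A retransmission is only useful if it can beat the playout deadline.
        if (rtt_ms >= buffered_audio_ms) return nacks;

        for (uint16_t seq = *received.begin(); seq != highest_seq; ++seq) {
            if (received.count(seq) == 0) nacks.push_back(seq);
        }
        return nacks;
    }

    int main() {
        std::set<uint16_t> received = {100, 101, 104, 105, 107};
        for (uint16_t s : BuildNackList(received, 107, /*rtt_ms=*/40, /*buffered_audio_ms=*/80))
            std::printf("NACK seq %u\n", s);  // 102, 103, 106
        return 0;
    }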

We use a mechanism that combines forward error correction with retransmission. The slide above compares the capabilities of ARQ and FEC. ARQ introduces burst jitter, which is hard to handle, but it handles burst packet loss well and is traffic-efficient; the delay it introduces is not fixed, though an upper limit can be set.

    With FEC, jitter changes little, and both light and heavy packet loss can be handled well, but at the expense of bandwidth; burst packet loss is not handled well, and the delay it introduces is relatively fixed.
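As an illustration of how the two mechanisms can be combined, the sketch below picks a protection mode from the RTT, the playout delay budget, and the loss level: retransmission when it can still beat the deadline, FEC when it cannot, and both under heavy loss. The thresholds are assumptions, not our production values.

    #include <cstdio>

    // Illustrative policy (not WebRTC's actual logic) for combining ARQ and FEC:
    // retransmit when the RTT still fits inside the playout delay budget, fall
    // back to FEC when it does not, and use both under heavy loss.
    enum class LossProtection { kNackOnly = 0, kNackPlusFec = 1, kFecOnly = 2 };

    LossProtection ChooseProtection(int rtt_ms, int delay_budget_ms, double loss_fraction) {
        const int kMarginMs = 20;  // time left for decoding after the retransmission arrives
        if (rtt_ms + kMarginMs >= delay_budget_ms)
            return LossProtection::kFecOnly;   // too slow to retransmit: rely on FEC
        if (loss_fraction < 0.05)
            return LossProtection::kNackOnly;  // light loss: ARQ alone is traffic-efficient
        return LossProtection::kNackPlusFec;   // heavy loss: FEC covers what NACK cannot
    }

    int main() {
        struct Case { int rtt_ms; int budget_ms; double loss; } cases[] = {
            {30, 120, 0.02}, {30, 120, 0.15}, {200, 120, 0.15}};
        for (const auto& c : cases)
            std::printf("rtt=%dms budget=%dms loss=%.2f -> mode=%d\n",
                        c.rtt_ms, c.budget_ms, c.loss,
                        static_cast<int>(ChooseProtection(c.rtt_ms, c.budget_ms, c.loss)));
        return 0;
    }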

    The key points in the WebRTC NetEQ algorithm (a minimal delay-estimation sketch follows the list):

    1. Delay estimation algorithm.

    2. Playout delay estimation algorithm.

    3. Adaptive decision logic.

    4. Voice time-stretching (variable speed) algorithm.

    5. VAD and CNG data algorithms.
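The delay-estimation sketch mentioned above: real NetEQ maintains an inter-arrival-time histogram with exponential forgetting, but the simplified version below just takes a high percentile of recent inter-arrival intervals as the jitter-buffer target delay.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Simplified jitter-buffer target-delay estimate in the spirit of NetEQ's
    // delay manager: use a high percentile of recent inter-arrival intervals as
    // the playout delay target (real NetEQ is histogram-based with forgetting).
    int TargetDelayMs(std::vector<int> inter_arrival_ms, double percentile = 0.95) {
        if (inter_arrival_ms.empty()) return 0;
        std::sort(inter_arrival_ms.begin(), inter_arrival_ms.end());
        size_t idx = static_cast<size_t>(percentile * (inter_arrival_ms.size() - 1));
        return inter_arrival_ms[idx];
    }

    int main() {
        // Mostly 20 ms packets, with a few late arrivals (network jitter bursts).
        std::vector<int> iat = {20, 20, 21, 19, 20, 60, 20, 22, 20, 85, 20, 20};
        std::printf("target jitter-buffer delay ~ %d ms\n", TargetDelayMs(iat));
        return 0;
    }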

About traffic

1. Reduce the transport header: shrink the transport-layer packet headers.

    2. Increase the packet duration: adjust 20 ms packets to 60 or 80 ms to reduce the per-packet header load (see the overhead calculation after this list).

    3. Reduce the core bitrate: optimize the bitrate at the VAD, DTX, and codec levels.

    4. Reduce redundancy.
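The overhead calculation referenced in item 2, assuming a fixed 40-byte IPv4+UDP+RTP header per packet: fewer packets per second means less header bandwidth.

    #include <cstdio>

    // Back-of-the-envelope header overhead for different packet durations,
    // assuming a fixed 40-byte IPv4+UDP+RTP header per packet.
    int HeaderOverheadBps(int ptime_ms, int header_bytes = 40) {
        int packets_per_second_x1000 = 1000 * 1000 / ptime_ms;  // scaled to avoid floats
        return packets_per_second_x1000 * header_bytes * 8 / 1000;
    }

    int main() {
        for (int ptime : {20, 60, 80})
            std::printf("ptime=%d ms -> header overhead ~ %d bps\n",
                        ptime, HeaderOverheadBps(ptime));
        // 20 ms -> 16000 bps, 60 ms -> ~5333 bps, 80 ms -> 4000 bps
        return 0;
    }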

About delay

Network delay: processing delay, queuing delay, transmission delay and propagation delay.

Device delay: capture and playback devices.

Playout delay: the delay between a packet's arrival and its playback time, i.e. the anti-jitter delay.

Codec and processing delay: Codec and various pre- and post-processing algorithm delays.
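To show how these components add up, here is a trivial mouth-to-ear budget with purely hypothetical numbers:

    #include <cstdio>

    // Illustrative mouth-to-ear delay budget that simply sums the components
    // listed above; all numbers are hypothetical and only show how the pieces add.
    int main() {
        int device_ms  = 30;   // capture + playback hardware buffers (assumed)
        int codec_ms   = 25;   // encode/decode + pre/post-processing (assumed)
        int network_ms = 50;   // processing + queuing + transmission + propagation (assumed)
        int playout_ms = 60;   // jitter-buffer (anti-jitter) delay (assumed)
        std::printf("end-to-end delay ~ %d ms\n",
                    device_ms + codec_ms + network_ms + playout_ms);
        return 0;
    }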

Operation method

The first part is a professional laboratory, which ensures the accuracy, reliability, and traceability of our test data and provides assurance before the system officially launches.

    The factors influencing QoE are very complex. At present we focus only on objective quality and have designed an evaluation system around it. When operating the live system, it gives us a simple indicator for judging whether an algorithm is good or bad, lets us do segment-level monitoring, and points to subsequent optimization directions. Even at the algorithm-optimization level, we can do clustering, delineate analysis samples, and carry out further targeted optimization.

Problem analysis tool: it restores the technical parameters of the call process, quickly reproduces the problem for analysis and diagnosis, and provides rich cases for further optimization.

We verify the effectiveness of our methods through A/B test operations. The quality of the new strategy (odd-numbered rooms) in the FEC-optimized version on Android is significantly better than the old strategy (even-numbered rooms). Good systems and algorithms are validated and iterated with operational data.

What does our cloud voice quality data look like? Calls scoring below 2 points account for less than 3%. Compare that with the figures mentioned earlier: 10% of calls interrupted, and 10% to 15% of users not satisfied with quality.

Optimization is a never-ending subject. From M56 to the M66 version released two days ago, WebRTC has fixed more than 1,000 bugs.

Q/A

Q: I would like to ask: when handling an access request, efficiency can be affected by your software, your network, and other hardware factors. What kinds of problems do you run into when optimizing this, and how do you avoid them? For example, when your hardware limits transmission efficiency, or when your software has to interact with various integrated systems, can you share the difficulties you have encountered?

A: As I just mentioned, at the system integration level, if you are only using the browser there are not many things you can optimize beyond the backend; it is mostly a matter of optimizing your own workflow. If you optimize inside WebRTC itself, the scope is huge: every area of audio and video can be worked on. What I have covered today is only the network transmission level, which itself has many aspects such as recovery, efficiency, and so on.
