Evolution and Future of Media Transmission Protocols

Audio and video applications have developed rapidly in recent years and have become the main carrier of Internet traffic. As these applications have grown richer and more varied, many complex media transmission protocols have emerged to meet their needs. At LiveVideoStackCon 2022 Beijing, Zhou Chao, head of transmission algorithms at Kuaishou, drew on Kuaishou's optimization work and practice in media transmission to describe the evolution of media transmission protocols and the challenges they face, using Kuaishou's KTP, KLP, LAS, and other protocols and standards as examples. He also introduced CMTP, the latest media transmission standard, and explored possibilities for the future.

Text: Zhou Chao

Editor: LiveVideoStack

-01-

The Prosperity of Media Transmission Protocols in the Age of Audio and Video

This talk is organized into three parts: the current state of media transmission protocols, Kuaishou's practice in optimizing them, and an outlook for the future.

In recent years, audio and video technology has developed rapidly. Combined with advances in networking and AI, audio and video have become ubiquitous, with application scenarios spanning on-demand video, live streaming, e-commerce, real-time interaction, gaming, healthcare, and education.

From the user's perspective, audio and video applications must strike a balance among latency, smoothness, and clarity. At the network transmission level, this amounts to balancing latency, transmission reliability, and bandwidth utilization.

Based on this trade-off, audio and video applications can be roughly divided into three categories: pan-VoD, pan-RTC, and pan-Live.

Pan-VoD covers on-demand applications. It is not sensitive to latency and cares more about transmission reliability and bandwidth utilization. Pan-RTC applications are highly sensitive to latency: reliability and bandwidth utilization can be pursued only once latency is guaranteed. Pan-Live applications sit between the two and have requirements on every dimension, and different live-streaming verticals strike the balance among reliability, latency, and bandwidth utilization somewhat differently.

Architecturally, a pan-VoD pipeline can be roughly divided into four steps. After a video is captured or imported, effects such as magic expressions (face effects) are applied, and the video is transcoded, compressed, and uploaded to the cloud. On the server side, after extensive processing such as content review and preprocessing, a second transcoding pass is usually performed to further improve the compression ratio and to generate multiple renditions at different quality levels. Finally, the video is distributed through a CDN, from which users download, decode, render, and play it.

On the production side, whether creators can publish their works quickly and successfully directly affects their motivation to create.

On the consumption side, users care most about video clarity and smoothness, both of which require reliable transmission and sufficiently high bandwidth utilization. For reliable transmission, the most common choices are HTTP and QUIC. In addition, to cope with the diverse network conditions of a large user base, the consumption side generally adopts multi-bitrate adaptive streaming technologies such as DASH and HLS.

The pan-Live application architecture is similar to pan-VoD. Both streamers and viewers expect low latency, high definition, and smooth playback. However, live streams are generated in real time, so transmission must be highly stable, and some compromises are made in bandwidth utilization and latency. For example, when the network fluctuates sharply, frames and packets may be dropped. The industry mainly uses RTMP for live streaming, and in recent years QUIC has also been tried, for example in RTMP-over-QUIC solutions. The consumption side usually uses multi-bitrate adaptive technology, but the common DASH and HLS technologies are segment-based, which introduces significant latency in live scenarios. To reduce live-streaming latency, many vendors are also trying WebRTC-based low-latency live streaming solutions.
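
The talk notes that live streaming tolerates frame loss when the network fluctuates. The following is a minimal sketch of one common sender-side policy, assuming a hypothetical `LiveSendQueue` that holds encoded frames waiting to be sent: when the queued media duration exceeds a latency budget, stale frames are discarded so the stream resumes at a keyframe and the decoder never receives a frame whose reference was dropped. The class, its fields, and the 1-second default budget are illustrative assumptions, not Kuaishou's implementation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    pts_ms: int        # presentation timestamp, milliseconds
    is_keyframe: bool  # True for independently decodable (I/IDR) frames
    size: int          # encoded size in bytes


class LiveSendQueue:
    """Hypothetical live sender queue that sheds backlog to bound latency."""

    def __init__(self, max_backlog_ms: int = 1000):
        self.max_backlog_ms = max_backlog_ms
        self.frames: deque = deque()
        self.dropped = 0
        self.waiting_for_keyframe = False

    def backlog_ms(self) -> int:
        # Media duration currently queued, from oldest to newest frame
        if len(self.frames) < 2:
            return 0
        return self.frames[-1].pts_ms - self.frames[0].pts_ms

    def push(self, frame: Frame) -> None:
        # After a full flush, non-keyframes are undecodable until the next keyframe
        if self.waiting_for_keyframe:
            if not frame.is_keyframe:
                self.dropped += 1
                return
            self.waiting_for_keyframe = False
        self.frames.append(frame)
        if self.backlog_ms() > self.max_backlog_ms:
            self._shed_backlog()

    def pop(self):
        # Called by the sending loop whenever the network can take another frame
        return self.frames.popleft() if self.frames else None

    def _shed_backlog(self) -> None:
        newest_key = max(
            (i for i, f in enumerate(self.frames) if f.is_keyframe), default=-1
        )
        if newest_key > 0:
            # Skip ahead to the newest queued keyframe; everything before it is stale
            for _ in range(newest_key):
                self.frames.popleft()
                self.dropped += 1
        else:
            # No newer keyframe queued: flush everything and wait for the next one
            self.dropped += len(self.frames)
            self.frames.clear()
            self.waiting_for_keyframe = True
```

Jumping to a keyframe keeps latency bounded at the cost of a visible skip, which matches the live-streaming trade-off described above: freshness is preferred over completeness.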

The goal of pan-RTC scenarios is clear: ultra-low-latency interaction, with clarity and smoothness improved further once low latency is guaranteed. The most widely used solution today is WebRTC, and many companies have built their own solutions on top of it.

In general, each type of application scenario now has its own relatively mature protocols, which are highly stable and well supported by vendors. However, these protocols also suffer from poor flexibility, difficulty with cross-layer optimization, and a lack of business awareness.

-02-

Kuaishou's practice in media transmission optimization

The core of Kuaishou's transmission system is its underlying algorithms, including congestion control, multi-bitrate adaptation, and weak-network resilience algorithms. On top of these, we have designed a range of transport protocols, such as KTP, LAS, AAS, and KLP. KTP is Kuaishou's first private transmission protocol and is used in business scenarios such as live stream ingest, work publishing, and RTC. LAS is a low-latency live multi-bitrate adaptive protocol developed by Kuaishou and is supported by all major cloud vendors. KLP is a live streaming protocol developed by Kuaishou to improve the efficiency of live stream delivery. AAS is a multi-bitrate adaptive protocol for on-demand scenarios, covering both short and long videos.

KTP was designed from the outset so that a single protocol could serve on-demand, live streaming, and RTC at the same time, avoiding a proliferation of protocols and the high cost of maintaining and optimizing them. Architecturally, KTP has two layers. The bottom layer is the transmission control layer, whose protocol design allows transmission latency, reliability, and bandwidth utilization to be balanced dynamically. Above it is the business perception layer, which detects the characteristics of each business and applies the strategies and algorithms best suited to them.
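
The split between a transmission control layer and a business perception layer can be illustrated with a minimal sketch, assuming the perception layer's job is to map a detected business type to transport parameters that the control layer then enforces. The `Scenario`, `TransportPolicy`, and `should_retransmit` names and every numeric value are hypothetical, chosen only to reflect the VoD/Live/RTC trade-offs described earlier; they are not KTP's actual parameters.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scenario(Enum):
    VOD = "vod"
    LIVE = "live"
    RTC = "rtc"


@dataclass(frozen=True)
class TransportPolicy:
    # None means "retransmit until delivered" (fully reliable)
    retransmit_deadline_ms: Optional[int]
    allow_frame_drop: bool  # may the sender discard media under congestion?
    fec_ratio: float        # redundant packets per media packet


# Hypothetical business-perception table: VoD favors reliability,
# RTC favors latency, Live sits in between.
POLICIES = {
    Scenario.VOD: TransportPolicy(None, False, 0.0),
    Scenario.LIVE: TransportPolicy(1000, True, 0.05),
    Scenario.RTC: TransportPolicy(150, True, 0.2),
}


def should_retransmit(scenario: Scenario, packet_age_ms: int) -> bool:
    """Transmission-control decision driven by the business-level policy."""
    deadline = POLICIES[scenario].retransmit_deadline_ms
    # A retransmission that would arrive after its playout deadline only wastes bandwidth
    return deadline is None or packet_age_ms < deadline
```

Keeping the policy table separate from the control logic is what lets one protocol serve several scenarios: supporting a new business means adding a policy entry rather than a new protocol.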

d9cba019207aa0d842f941470a58ae4c.png

In comparative tests of live stream ingest, KTP maintained clear, smooth streaming even at a 60% packet loss rate (left image), whereas RTMP stalled severely at just 15% packet loss and became unusable (right image).
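
The talk does not say how KTP survives such heavy loss. Forward error correction (FEC), which CMTP's transmission control layer also lists, is one standard ingredient of loss resilience. Its simplest form is XOR parity: one parity packet per group of equal-length packets lets the receiver rebuild any single lost packet without waiting for a retransmission. This is a generic textbook sketch, not KTP's FEC scheme; production systems typically use stronger codes (such as Reed-Solomon) that can repair several losses per group.

```python
from functools import reduce
from typing import List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # This simple scheme assumes the sender pads every packet in a group to equal length
    if len(a) != len(b):
        raise ValueError("packets in an FEC group must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(group: List[bytes]) -> bytes:
    """Parity packet = XOR of all media packets in the group."""
    return reduce(xor_bytes, group)


def recover(received: List[Optional[bytes]], parity: bytes) -> Tuple[int, bytes]:
    """Rebuild the single missing packet (marked None) from the survivors and parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one loss per group")
    survivors = [p for p in received if p is not None]
    # XOR-ing parity with every survivor cancels them out, leaving the lost packet
    return missing[0], reduce(xor_bytes, survivors, parity)
```

With groups of k packets, this costs one extra packet per k media packets (1/k overhead) in exchange for repairing a loss without an extra round trip.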

For work publishing, the KTP-based general upload service is now used across Kuaishou's work-publishing and file-transfer scenarios. It has significantly raised the publishing success rate, from an initial 70-80% to over 99%. Moreover, even as users' networks grow more complex and works grow larger, publishing time has kept falling.

Finally, in RTC scenarios, KTP underpins all of Kuaishou's RTC services, such as PK (streamer battles), Lianmai (co-hosting), conferencing, and StreamLake. Thanks to its algorithms and architecture, the KTP-based RTC solution is significantly ahead of competing products in experience and performance.

In addition, Kuaishou won first place by a wide margin in the 2021 ACM Multimedia Low Latency Transmission Challenge.

The protocol is a bridge that supports various functions and business requirements, but its transmission performance depends mainly on the underlying algorithms. Congestion control, for example, is one of the core algorithms in networking and has been both a research hotspot and a hard problem for decades; it directly affects a protocol's transmission performance, bandwidth utilization, and resilience to weak networks. Kuaishou has kept investing in this area. Its self-developed congestion control algorithm IA2C performs far better than BBR; NNCC, based on reinforcement learning, has achieved new breakthroughs in bandwidth utilization; and AQDC, the next-generation congestion control algorithm now being prepared for launch, delivers significant gains in both bandwidth utilization and latency.
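
IA2C, NNCC, and AQDC are not publicly specified, so the following shows only the baseline the talk compares against: a minimal sketch of BBR's path model, assuming per-ACK delivery-rate and RTT samples are available. BBR estimates bottleneck bandwidth as a windowed maximum of the delivery rate and propagation delay as a windowed minimum of RTT, then paces at a gain times that bandwidth and caps data in flight at a gain times the bandwidth-delay product. For brevity, this sketch uses fixed gains, windows bandwidth by sample count rather than by round trips, and never expires the minimum RTT, all unlike real BBR.

```python
from collections import deque


class BbrLikeModel:
    """Simplified BBR-style path model (not IA2C, NNCC, or AQDC)."""

    def __init__(self, bw_window: int = 10, pacing_gain: float = 1.0,
                 cwnd_gain: float = 2.0) -> None:
        self.bw_samples: deque = deque(maxlen=bw_window)  # recent delivery rates, bytes/s
        self.min_rtt_s = float("inf")
        self.pacing_gain = pacing_gain
        self.cwnd_gain = cwnd_gain

    def on_ack(self, delivered_bytes: int, interval_s: float, rtt_s: float) -> None:
        # Delivery-rate sample: bytes acknowledged over the interval they took
        if interval_s > 0:
            self.bw_samples.append(delivered_bytes / interval_s)
        self.min_rtt_s = min(self.min_rtt_s, rtt_s)

    def btl_bw(self) -> float:
        # Bottleneck bandwidth estimate = windowed max of delivery rate
        return max(self.bw_samples, default=0.0)

    def bdp_bytes(self) -> float:
        # Bandwidth-delay product: how much data the path holds in flight
        if self.min_rtt_s == float("inf"):
            return 0.0
        return self.btl_bw() * self.min_rtt_s

    def pacing_rate(self) -> float:
        return self.pacing_gain * self.btl_bw()

    def cwnd_bytes(self) -> float:
        return self.cwnd_gain * self.bdp_bytes()
```

Unlike loss-based algorithms, this model does not treat random packet loss as a congestion signal, which is one reason model-based control tends to hold up better on lossy mobile links.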

KTP is widely used for work publishing, live stream ingest, and RTC, with good results. For historical reasons, however, KTP was never well supported or optimized for the downlink. So in 2020 we reused KTP's underlying transmission control and added strategies and algorithms tailored to live streaming, forming the KLP protocol. KLP achieved very good results when it was launched overseas.

On the consumption side, to cope with users' differing network conditions, multi-bitrate adaptive technology is generally used to balance smoothness and clarity; examples include the international standards DASH and HLS. The general principle is to transcode a video into multiple renditions (bitrate levels) and split each rendition into segments. The player picks segments from different renditions according to real-time network conditions and stitches them into a continuous video. Both standards are highly mature, but they were originally designed for on-demand playback, and using them directly for live streaming introduces large latency.
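
To make the rendition-selection step concrete, here is a minimal throughput-based sketch, assuming the player records the measured download throughput of recent segments. It takes a harmonic mean of the last few samples (which weights dips more heavily than an arithmetic mean), applies a safety margin, and picks the highest rendition that fits. The function names, the 0.8 margin, and the five-sample window are illustrative assumptions; DASH and HLS standardize manifests and segment formats, while the adaptation algorithm is left to each player.

```python
from typing import List, Sequence


def harmonic_mean(samples: Sequence[float]) -> float:
    return len(samples) / sum(1.0 / s for s in samples)


def choose_rendition(bitrates_kbps: Sequence[int],
                     throughput_kbps: List[float],
                     safety: float = 0.8,
                     window: int = 5) -> int:
    """Pick the highest bitrate below a discounted throughput estimate."""
    ladder = sorted(bitrates_kbps)
    if not throughput_kbps:
        return ladder[0]  # no measurements yet: start conservatively
    estimate = safety * harmonic_mean(throughput_kbps[-window:])
    fitting = [b for b in ladder if b <= estimate]
    return fitting[-1] if fitting else ladder[0]
```

For example, with a ladder of 500/1200/2500/4000 kbps and samples of 6000 and 1000 kbps, the arithmetic mean (3500) would allow 2500 kbps after the margin, while the harmonic mean (about 1714) settles on 1200 kbps, which is less likely to stall if the dip persists.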

After thorough research and discussion, Kuaishou decided to establish a low-latency live multi-bitrate standard: LAS. LAS has since officially become an industry standard and is widely adopted. For details, see the official website: https://las-tech.org.cn/#/
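
The talk does not describe LAS's switching mechanism. One generic way to avoid segment-induced latency in live ABR is to keep each rendition as a continuous stream and switch between streams only at keyframe boundaries, so the decoder can start cleanly on the new rendition. The sketch below illustrates that idea under this assumption; the `Frame` fields, the function names, and the splice logic are hypothetical and are not taken from the LAS specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    rendition: str     # e.g. "720p"; illustrative label
    pts_ms: int        # presentation timestamp, milliseconds
    is_keyframe: bool


def switch_point(target: List[Frame], request_pts_ms: int) -> Optional[int]:
    """First keyframe of the target stream at or after the switch request."""
    for f in target:
        if f.is_keyframe and f.pts_ms >= request_pts_ms:
            return f.pts_ms
    return None


def splice(current: List[Frame], target: List[Frame],
           request_pts_ms: int) -> List[Frame]:
    """Keep playing the current stream until the target stream reaches a keyframe."""
    sp = switch_point(target, request_pts_ms)
    if sp is None:
        return list(current)  # no suitable keyframe yet: defer the switch
    return ([f for f in current if f.pts_ms < sp] +
            [f for f in target if f.pts_ms >= sp])
```

Because the switch waits at most one GOP rather than one multi-second segment, shorter GOPs in the live encoder translate directly into faster adaptation.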

For on-demand multi-bitrate, we also took the differences between short and long videos into account and created Kuaishou's on-demand multi-bitrate adaptive standard, AAS. Its protocol description borrows from the DASH MPD design. At its core are the multi-bitrate (ABR) algorithms developed by Kuaishou, including both traditional model-based algorithms and deep-learning-based ABR, which have achieved very good results in different scenarios.
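
Buffer-based adaptation is one classic family among traditional (non-learned) ABR algorithms. The minimal sketch below follows the BBA idea from the research literature, assuming the player knows its current buffer level: below a "reservoir" it always picks the lowest bitrate, above reservoir plus "cushion" it picks the highest, and in between it maps buffer level linearly to a target bitrate. The thresholds and names are illustrative assumptions; the talk does not describe AAS's actual algorithms.

```python
from typing import Sequence


def bba_choose(bitrates_kbps: Sequence[int], buffer_s: float,
               reservoir_s: float = 5.0, cushion_s: float = 10.0) -> int:
    """Buffer-based rate choice in the spirit of BBA (simplified, no hysteresis)."""
    ladder = sorted(bitrates_kbps)
    if buffer_s <= reservoir_s:
        return ladder[0]   # buffer nearly empty: protect against stalls
    if buffer_s >= reservoir_s + cushion_s:
        return ladder[-1]  # buffer comfortably full: take the best quality
    # Linear map of buffer level onto the [min, max] bitrate range
    fraction = (buffer_s - reservoir_s) / cushion_s
    target = ladder[0] + fraction * (ladder[-1] - ladder[0])
    return max(b for b in ladder if b <= target)
```

Because the decision depends only on buffer occupancy, it needs no bandwidth estimate and is robust to noisy throughput samples; the cost is a slow ramp-up at startup while the buffer is still small.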

In addition, Kuaishou won first place by a wide margin in the 2022 ACM Multimedia short video transmission challenge.

Today, Kuaishou's network transmission relies mainly on this series of self-developed protocols, but problems remain: insufficient coverage of downlink scenarios, coupling to specific businesses, a closed ecosystem, no way to benefit the wider industry, and third-party CDNs that cannot support every scenario.

-03-

Next Generation Media Transfer Protocol: CMTP

Building on the experience and algorithmic work behind these earlier protocols, we set out to design a new protocol, CMTP, that applies to all scenarios and addresses problems such as insufficient coverage and a closed ecosystem. CMTP has four main characteristics: a common architecture, support for all scenarios, high scalability, and rich features.

Architecturally, CMTP is divided into five layers:

UDP/TCP layer: the network protocol used for underlying I/O. UDP is used by default because it is flexible and easy to extend, allowing multiple algorithms and strategies to be supported. When UDP is blocked, TCP is used instead.

Transmission control layer: supports two modes, UDP and TCP. In UDP mode, it standardizes the protocol fields, packetization and depacketization, session management, and so on, and supports functions such as ARQ, FEC, congestion control, 0-RTT, encryption, and multiplexing. In TCP mode, it likewise standardizes protocol fields and packetization and depacketization, and supports functions such as encryption and multiplexing.

Transmission presentation layer: standardizes the interfaces and functions that the transmission control layer must provide, including the definitions of media sessions and media streams and the representation of media data and control signaling, and supports protocol optimization.

Application-aware layer: organized as components that are aware of different business needs and provide dedicated optimizations, including a live streaming component (Live), a video-on-demand component (VoD), a real-time communication component (RTC), and general components. Each component is functionally independent and can be plugged in, replaced, or added, which gives the layer strong scalability, compatibility, and business adaptability.

General interface layer: standardizes external interfaces and configuration, including client and server interfaces, metadata, and common configuration formats.
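
To make the layering concrete, here is a minimal sketch of how the application-aware and transmission presentation layers might interact, assuming components annotate outgoing messages with delivery hints and the presentation layer serializes them into a fixed binary header. The header layout (`!BBHIIQ`: type, flags, stream ID, session ID, sequence number, timestamp), the flag names, and the component classes are all hypothetical; CMTP's actual wire format and interfaces are not described in the talk.

```python
import struct
from dataclasses import dataclass, replace
from enum import IntEnum, IntFlag
from typing import Dict, Protocol


class MsgType(IntEnum):
    MEDIA = 0
    CONTROL = 1


class Flags(IntFlag):
    NONE = 0
    KEYFRAME = 1
    DROPPABLE = 2  # sender may discard under congestion
    RELIABLE = 4   # must be retransmitted until acknowledged


# Hypothetical presentation-layer header, network byte order (20 bytes):
# type, flags, stream_id, session_id, seq, timestamp_ms
HEADER = struct.Struct("!BBHIIQ")


@dataclass(frozen=True)
class Message:
    msg_type: MsgType
    flags: Flags
    session_id: int
    stream_id: int
    seq: int
    timestamp_ms: int
    payload: bytes

    def encode(self) -> bytes:
        return HEADER.pack(self.msg_type, self.flags, self.stream_id,
                           self.session_id, self.seq, self.timestamp_ms) + self.payload

    @classmethod
    def decode(cls, data: bytes) -> "Message":
        t, flags, stream_id, session_id, seq, ts = HEADER.unpack_from(data)
        return cls(MsgType(t), Flags(flags), session_id, stream_id, seq, ts,
                   data[HEADER.size:])


class Component(Protocol):
    name: str

    def on_send(self, msg: Message) -> Message: ...


class LiveComponent:
    name = "live"

    def on_send(self, msg: Message) -> Message:
        # Live: non-key media frames may be dropped when the network degrades
        if msg.msg_type is MsgType.MEDIA and not (msg.flags & Flags.KEYFRAME):
            return replace(msg, flags=msg.flags | Flags.DROPPABLE)
        return msg


class GeneralComponent:
    name = "general"

    def on_send(self, msg: Message) -> Message:
        # Control signaling (e.g. a rendition switch) must always arrive
        if msg.msg_type is MsgType.CONTROL:
            return replace(msg, flags=msg.flags | Flags.RELIABLE)
        return msg


class ApplicationLayer:
    """Runs pluggable components, then hands the message to the presentation layer."""

    def __init__(self) -> None:
        self._components: Dict[str, Component] = {}

    def plug(self, component: Component) -> None:
        self._components[component.name] = component  # same name replaces

    def unplug(self, name: str) -> None:
        self._components.pop(name, None)

    def send(self, msg: Message) -> bytes:
        for component in self._components.values():
            msg = component.on_send(msg)
        return msg.encode()
```

Because components are looked up by name, a scenario-specific component can be plugged in, replaced, or removed without touching the header format or the transport below it, which is the property the application-aware layer is meant to provide.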

CMTP is now deployed at Kuaishou and has delivered significant benefits. In addition, many vendors already support CMTP and are working with Kuaishou to standardize it. We hope more teams will join us in building a healthy CMTP ecosystem.

Source: blog.csdn.net/vn9PLgZvnPs1522s82g/article/details/130838077