A 2022 overview of live broadcast technology and low-latency live broadcast solutions

The common live broadcast solutions today are:

  • RTMP/HTTP-FLV
  • WebRTC
  • RTSP
  • HLS

HLS: Its delay mainly comes from encoding/decoding delay, network delay, and CDN distribution delay. Since HLS is a segment-based protocol, the delay splits into two parts: the segment buffering delay on the server side, and the anti-jitter buffer delay on the player side. Segment size and count determine the overall HLS delay, which is generally more than ten seconds.

RTMP/HTTP-FLV: Most domestic vendors currently use RTMP. Compared with HLS it is optimized on the server side: the RTMP server no longer slices the stream but forwards each frame individually, so CDN distribution delay is very small. RTMP delay mainly comes from the player's anti-jitter buffer: to keep the broadcast smooth under weak-network jitter, the buffer is generally set to five to ten seconds.

Both of these protocol families are based on TCP, and domestic vendors have already pushed RTMP-over-TCP latency close to its limit. Any further TCP-based latency optimization is unlikely to beat today's RTMP.

Among these, only WebRTC supports publishing a stream directly from a web page.

The RTMP protocol generally carries FLV/F4V-format streams, and the RTSP protocol generally carries TS/MP4-format streams; HTTP itself prescribes no particular stream format.

RTMP can scale out its audience with the help of streaming CDNs, and most CDN streaming services on the market only support RTMP.

TCP / UDP

When to use UDP

  • Network bandwidth requirements are small, but real-time requirements are high;
  • Most applications do not need to maintain a connection;
  • Low power consumption is required;

With TCP, once packet loss occurs, subsequent packets are held in the buffer until the lost packet is retransmitted and received (head-of-line blocking), so delay keeps growing. UDP-based protocols such as WebRTC are therefore excellent choices.

Ali's low-latency live broadcast transmission uses UDP.


  • RTMP is transmitted over TCP.
  • RTSP audio and video stream data can be transmitted using TCP or UDP.
  • WebRTC is based on the UDP protocol.

Due to some of its own characteristics, TCP is not suitable for low-latency live broadcast scenarios. The main reasons are as follows:

  • Slow retransmission: with TCP's ACK acknowledgment mechanism, the sender retransmits only after a timeout once packets are lost, and the timeout is commonly around 200 ms, which causes frame jitter on the receiving side.
  • Inaccurate congestion detection: loss-based congestion control algorithms cannot detect congestion accurately, packet loss does not equal congestion, and they also cause bufferbloat on the sending path, which increases the link RTT and thus the delay.
  • Poor flexibility: this is the main reason. TCP congestion control is implemented in the operating-system kernel, so it is expensive to optimize, and mobile clients can only rely on whatever optimizations the system already ships.

Therefore, it is better to build the solution on UDP.


Audio and video

The common audio and video codecs for RTMP live streaming are:

  • Video H.264 encoding
  • Audio AAC encoding

Ali's low-latency live broadcast

  • Video: H.264 encoding (B-frames are not supported in the source stream, as they cause the picture to jump; H.265 is not supported; the GOP must be shorter than 3 seconds)

  • Audio uses Opus encoding

Opus is a lossy audio coding format developed by the Xiph.Org Foundation and later standardized by the IETF (Internet Engineering Task Force). Its goal is to replace Speex and Vorbis with a single format covering both general audio and speech, suitable for low-delay real-time transmission. The format is defined in RFC 6716. Opus is an open format, with no patents or restrictions on its use.

Opus combines two coding technologies: the speech-oriented SILK and the low-latency CELT. It can shift seamlessly between high and low bitrates: internally, the encoder uses linear predictive coding at lower bitrates and transform coding at higher bitrates (with a hybrid of the two in the crossover range). Opus has a very low algorithmic delay (26.5 ms by default), which makes it very suitable for encoding low-latency voice, such as real-time audio streaming over the Internet and real-time synchronized narration. By reducing the frame size, Opus can trade some bitrate efficiency for even lower algorithmic delay, down to a minimum of 5 ms. In multiple blind listening tests, Opus showed lower latency and better audio compression than common formats such as MP3, AAC, and HE-AAC.
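As an illustration of trading frame size for delay, here is a minimal sketch that drives ffmpeg's libopus encoder from Python. The file names are placeholders, and this is only one reasonable set of options, not the article's own tooling.

```python
import subprocess

# Minimal sketch: encode audio to Opus with ffmpeg's libopus encoder.
# "input.wav" / "output.opus" are placeholder file names.
cmd = [
    "ffmpeg",
    "-i", "input.wav",
    "-c:a", "libopus",
    "-b:a", "64k",                # typical bitrate for mixed speech/music
    "-application", "lowdelay",   # favor the CELT / low-delay mode
    "-frame_duration", "10",      # 10 ms frames: less efficient, lower delay
    "output.opus",
]
subprocess.run(cmd, check=True)
```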

WebRTC

The end-to-end WebRTC live broadcast method is not suitable for live broadcast scenarios

One-to-many WebRTC live broadcast based on media server

WebRTC is generally used for point-to-point, two-way video and voice chat, such as WeChat video calls. Live broadcast needs no two-way video: it is generally one-to-many with one-sided video, so WebRTC's voice processing, echo cancellation, noise suppression, automatic gain control and the like, has no effect in live streaming.

Most of the Internet says that WebRTC is mainly used for end-to-end, such as video conferences with a small number of people, but in fact, many manufacturers have already used WebRTC for live broadcast solutions.

The migration cost is relatively high compared with a traditional live broadcast setup, but the benefits are equally obvious: delay, time-to-first-frame, and weak-network stalling all improve noticeably, and millisecond-level delay can be achieved.

The WebRTC (JavaScript) part in Chrome must be matched by development against native WebRTC (C++) on the server (cloud) side.

There are three main nodes involved in the live broadcast process: the live broadcast initiator, the streaming media server and the playback terminal.

The live broadcast initiator is relatively simple to implement. A js script can call the browser's WebRTC API to capture, mix, encode and send the audio and video, or the Android or iOS WebRTC SDK can be used instead. Any programmer with some js or app development experience can handle this part.
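For readers who prefer a non-browser illustration, below is a minimal publisher sketch in Python using the third-party aiortc library (an assumption on my part; the article itself describes the browser js path), which mirrors the browser's RTCPeerConnection API. Delivering the SDP offer to the media server is signaling, which is server-specific and only stubbed here.

```python
import asyncio

from aiortc import RTCPeerConnection
from aiortc.contrib.media import MediaPlayer

async def publish() -> None:
    pc = RTCPeerConnection()

    # Read audio/video from a placeholder file; a camera could be used instead.
    player = MediaPlayer("input.mp4")
    if player.audio:
        pc.addTrack(player.audio)
    if player.video:
        pc.addTrack(player.video)

    offer = await pc.createOffer()
    await pc.setLocalDescription(offer)

    # Signaling is outside WebRTC itself: send pc.localDescription.sdp to the
    # media server (e.g. over HTTP or WebSocket) and feed its SDP answer back
    # via pc.setRemoteDescription(). Stubbed here.
    print(pc.localDescription.sdp)

asyncio.run(publish())
```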

Developing the streaming media server is comparatively difficult. It must handle WebRTC signaling, receive the RTP protocol, and perform protocol conversion in order to serve highly concurrent live broadcast output. To shorten the development cycle and reduce investment, existing mature products can be used, such as the domestic streaming server software NTV Media Server G3, which performs well in protocol conversion and distribution capability.

Whichever product is used, the server-side functions should be the same: WebRTC protocol adaptation, audio and video stream reception, protocol re-multiplexing, and distribution.

The playback terminal requires relatively little work. After adaptation by the streaming media server, ordinary terminals need no changes at all and can keep playing with the original protocols and methods, such as HTTP-FLV or HLS. Of course, a WebRTC playback terminal can also be developed.

WebRTC has better cross-platform support and lower latency, but it is harder to get started with than RTMP.

RTMP related

push stream

  1. Publish raw H.264 data over RTMP

  2. The easiest way to publish H.264 over RTMP is srs-librtmp

  3. Use ffmpeg directly (see the sketch after this list)

  4. The stream can also be packaged into TS
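For option 3, a minimal sketch of pushing with ffmpeg from Python follows. The input file and the rtmp:// URL are placeholders for your own source and server (for example a local srs instance).

```python
import subprocess

# Minimal sketch: push a local file to an RTMP server with ffmpeg.
# "input.mp4" and the rtmp:// URL are placeholders.
cmd = [
    "ffmpeg",
    "-re",                          # read input at native frame rate (simulate live)
    "-i", "input.mp4",
    "-c:v", "libx264",              # H.264 video, the usual RTMP pairing
    "-c:a", "aac",                  # AAC audio
    "-f", "flv",                    # RTMP carries an FLV-muxed stream
    "rtmp://localhost/live/stream",
]
subprocess.run(cmd, check=True)
```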

Server

  • srs

  • nginx + nginx-rtmp-module

  • livego

  • node-rtsp-rtmp-server

Pull stream

  • ffmpeg

  • python-librtmp

RTMP Latency Optimization

One of RTMP's weaknesses is cumulative delay: because RTMP is based on TCP, no packets are ever dropped. When network conditions are poor, the server buffers the packets, so delay accumulates; when conditions improve, the backlog is delivered to the client all at once. The countermeasure is to disconnect and reconnect when the client's buffer grows too large, as sketched below.
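A conceptual sketch of that countermeasure: the player object and its methods are hypothetical stand-ins for whatever client library is actually in use, and the threshold is an assumed value.

```python
import time

MAX_BUFFER_SECONDS = 3.0  # assumed threshold for "buffer is too large"

def watch_buffer(player) -> None:
    """Reconnect when the accumulated buffer (accumulated delay) grows too big."""
    while player.is_playing():
        # Seconds of data received but not yet played = current extra delay.
        if player.buffered_duration() > MAX_BUFFER_SECONDS:
            player.disconnect()
            player.connect()  # rejoin at the live edge, dropping the backlog
        time.sleep(1.0)
```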

Push end -> RTMP server -> pull end

Looking at this pipeline, the main delay can only be introduced at these three places.


Push end

For a push end, the first thing involved is encoding, that is, packaging the video stream. An important concept here is the GOP, the interval between two I-frames in the video. So let's first look at the I-frames, B-frames and P-frames of video coding.

I frame, B frame, P frame

We know that video and animation exploit the human eye's persistence of vision to create motion from a series of still pictures, so video transmission is essentially the transmission of frame-by-frame picture data.

So how much data is needed if each frame is a complete picture? Suppose we transmit 1080p video with 8-bit color: each frame is 1920 x 1080 x 8 x 3 = 49,766,400 bits, about 47.46 Mbit (roughly 5.93 MB). At a frame rate of 30 Hz, that is about 1,423.8 Mbit per second, i.e. roughly 1.39 Gbit/s (about 178 MB/s). This amount of data is far too large in most cases: watching such a video for just 10 minutes would consume about 104 GB of traffic.
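The same arithmetic, restated as a sketch you can run:

```python
# Raw (uncompressed) data rate of 1080p, 8-bit, 3-channel video at 30 fps.
width, height, bit_depth, channels, fps = 1920, 1080, 8, 3, 30

bits_per_frame = width * height * bit_depth * channels          # 49,766,400 bits
print(bits_per_frame / 1024**2, "Mbit per frame")               # ~47.46 Mbit (~5.93 MB)
print(bits_per_frame * fps / 1024**3, "Gbit/s")                 # ~1.39 Gbit/s (~178 MB/s)
print(bits_per_frame * fps * 600 / 8 / 1024**3, "GB / 10 min")  # ~104 GB
```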

To solve this problem, video compression technologies such as H.264 are used for video transmission and storage today. In such a codec, the encoder divides pictures into three types: I-frames, B-frames and P-frames:

  • I-frame: the key frame (Intra-coded picture). It is a complete picture and carries full picture information on its own.
  • B-frame: bidirectional predicted picture. This picture is incomplete; reconstructing the full picture requires referring to both the previous and the following frame. It carries the least picture information.
  • P-frame: forward predicted picture. This picture is also incomplete; reconstructing the full picture requires referring to the previous frame.

Video encoding produces GOPs (Group of Pictures): a group of pictures containing one key frame (I-frame), where the GOP length is the distance between two I-frames. How does the GOP affect latency? Because of the GOP structure, the decoder on the playback side needs a key frame before it can decode anything. If no key frame is available at startup, the decoder can only wait, and the viewer sees a black screen until one arrives. How long can that wait be at worst? The length of one GOP.

So, to avoid the black screen, the server will often cache the previous GOP, which means the client always starts playing from the previous I-frame, so the delay is at least one GOP long. Many readers will surely think: can't we just shorten the GOP? Indeed, and places with strict real-time requirements do exactly that. The reason it is not done in every scenario is that a GOP that is too short lowers the encoder's compression ratio and hence the image quality.
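As an illustration of shortening the GOP on the push side, the hedged ffmpeg sketch below sets a 1-second keyframe interval at 30 fps and disables B-frames (matching the no-B-frame constraint noted earlier for Ali's low-latency live). File name and URL are placeholders.

```python
import subprocess

cmd = [
    "ffmpeg",
    "-re", "-i", "input.mp4",        # placeholder input
    "-c:v", "libx264",
    "-g", "30",                      # I-frame every 30 frames = 1 s GOP at 30 fps
    "-bf", "0",                      # disable B-frames
    "-tune", "zerolatency",          # drop x264 lookahead/buffering for low delay
    "-c:a", "aac",
    "-f", "flv",
    "rtmp://localhost/live/stream",  # placeholder server URL
]
subprocess.run(cmd, check=True)
```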

cache

Because RTMP runs over TCP, there is a cumulative-delay problem: when network conditions are poor, unsent packets are kept, to guarantee reliable transmission, and sent together once conditions improve. For live broadcast this only increases delay, and under heavy network fluctuation the cache becomes harmful, so the push-side cache is generally set as small as possible.

Server

For the server, the first consideration is the same as above: the cache. For a low-latency requirement, the server-side cache must not be designed too large. Besides the cache, what else does the server need to consider?

Merged-Read

The read efficiency of RTMP is very low: the parser first reads one byte to determine which chunk it belongs to, then reads the header, then the payload. To improve performance, servers therefore generally use merged-read: receive a few milliseconds of data at a time and parse it in a single read. The drawback is that the server must accumulate at least that much data before parsing, and that window is exactly the added delay. In a low-latency scenario this feature should be turned off so the server parses every packet as soon as it arrives.
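To see why plain RTMP parsing is byte-by-byte, here is a sketch of reading just the RTMP chunk basic header: one byte must be read before the parser even knows how long the rest of the header is. (A sketch only; a real server would read from a buffered stream.)

```python
import socket

def read_basic_header(sock: socket.socket):
    """Parse the 1-3 byte RTMP chunk basic header."""
    first = sock.recv(1)[0]
    fmt = first >> 6            # 2-bit chunk header format (0-3)
    csid = first & 0x3F         # 6-bit chunk stream id
    if csid == 0:               # 2-byte form: csid = 64 + next byte
        csid = 64 + sock.recv(1)[0]
    elif csid == 1:             # 3-byte form: csid = 64 + b0 + b1 * 256
        b = sock.recv(2)
        csid = 64 + b[0] + b[1] * 256
    return fmt, csid            # message header and payload still to be read
```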

Merged-Write

Similarly, to improve efficiency the server also performs merged-write: it sends several milliseconds' worth of data to the client in a single write. This too adds delay, but its advantage is that more clients can be supported per server. In low-latency scenarios we need to trade off according to requirements and set this window to a small value.
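A conceptual sketch of merged-write batching follows. The socket object and the 350 ms default window are assumptions for illustration, not a specific server's implementation.

```python
import time

class MergedWriter:
    """Queue packets and flush them in one write every `window_ms` milliseconds."""

    def __init__(self, sock, window_ms: int = 350):
        self.sock = sock
        self.window = window_ms / 1000.0   # bigger window: fewer writes, more delay
        self.queue: list[bytes] = []
        self.last_flush = time.monotonic()

    def send(self, packet: bytes) -> None:
        self.queue.append(packet)
        if time.monotonic() - self.last_flush >= self.window:
            self.sock.sendall(b"".join(self.queue))  # one syscall for the batch
            self.queue.clear()
            self.last_flush = time.monotonic()
```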

GOP

Accordingly, when relaying the push end's content in a low-latency scenario, the server should turn off the GOP cache and not cache the previous GOP (in srs, for example, this is the gop_cache switch).

cumulative delay

Likewise, because distribution may use RTMP or HTTP-FLV, both based on TCP, cumulative delay appears here as well. The solution is to keep the buffer small and discard data when too much has accumulated.

pull end

On the pull end there is really only one thing to consider: the cache size and the caching strategy. Not being a specialist here, I can only outline the idea: obtain the buffer length and the current playback position; the difference between the two is the actual delay. Set a threshold, and when the delay exceeds it, fast-forward dynamically. This reduces the perceived latency.
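A sketch of that idea, with a hypothetical player API and assumed threshold and speed values:

```python
CATCH_UP_THRESHOLD = 2.0  # assumed: seconds of delay we tolerate
NORMAL_SPEED = 1.0
FAST_SPEED = 1.2          # assumed: barely perceptible speed-up

def adjust_speed(player) -> None:
    """Delay = buffered end minus playback position; fast-forward when too big."""
    delay = player.buffered_end() - player.current_position()
    rate = FAST_SPEED if delay > CATCH_UP_THRESHOLD else NORMAL_SPEED
    player.set_playback_rate(rate)
```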

Summary

From the analysis above, delay in RTMP is an unavoidable problem; what we can do is balance delay against performance according to the requirements. The key lever is the cache: its benefit is stability, and its equally obvious cost is added delay.

RTMP's large delay comes mainly from capture/push-side and playback-side processing, especially the playback side. With current RTMP players, even after modifying the open-source frameworks, the delay cannot be brought down much further.

And TCP's own behavior of buffering subsequent packets whenever the network is unstable adds delay, so to truly reduce latency a custom protocol must be built on UDP.

Applicable scenarios

  • Education live streaming

    Large online classes can support many students interacting with the teacher at low latency simultaneously.

  • E-commerce live streaming

    Interact with buyers in real time to answer questions and discuss products.

  • Live sports

    Exciting matches, e-sports and other events can be followed by the audience in real time.

  • Interactive entertainment

    Timely feedback strengthens interaction, greatly improving the host's response experience when viewers send gifts.

For the scenarios above, the choice basically comes down to RTMP versus WebRTC.

However, in terms of delay, WebRTC beats RTMP: WebRTC can achieve sub-second delay, while RTMP generally stays above 2 seconds, typically between 2 and 10 seconds.

In terms of maturity, RTMP beats WebRTC.

Our vision for the future of low-latency live streaming technology is three-fold:

  1. Today's open-source WebRTC software does not support live broadcast very well. We hope a future standard WebRTC will support live broadcast properly, so that low-latency live broadcast in the browser becomes easy.
  2. With the arrival of 5G the network environment will keep improving, and low-latency live broadcast will become a technical direction for the live broadcast industry.
  3. At present most vendors' low-latency live broadcast protocols are proprietary, so for users the cost of switching from one vendor to another is very high. Unifying and standardizing the low-latency live broadcast protocol matters greatly to the industry. A basic judgment: as low-latency live broadcast technology matures and spreads, its protocols will inevitably converge toward unification and standardization. We also hope that domestic technology vendors make their voices heard and contribute to that standardization process.

Manufacturer's choice

Instant Technology (RTMP)

At the beginning we also considered WebRTC for live video, but after research we gave up and switched to RTMP. The reason: 60% of browsers in China do not support WebRTC, so the reach of Google Chrome, WebRTC's main promoter, is greatly reduced here. RTMP is actually not the best choice either, yet we chose it in the end. Why? Because RTMP is a standard protocol, supported by many CDN networks and compatible with customers' existing systems. Although achieving low latency over RTMP is hard, after constant effort we achieved something of a miracle: about 400 milliseconds of delay on the anchor side and about 1 second on the audience side. In truth, UDP is the most suitable transport for live video, since relatively low latency comes easily; unfortunately a UDP-based private protocol has inherent compatibility deficiencies, so we relegated it to a complementary role: we push streams over the UDP-based private protocol only when the network is poor, and use RTMP the rest of the time. In providing live broadcast services to Huajiao, Yizhu, and Momo, we were all the more glad we made the right protocol decision; had we adopted WebRTC, those big companies would not have chosen us no matter how good we were.

So if you need to cover a wide range of users and keep your live broadcast platform universal, WebRTC is really not recommended. Run a few more comparative tests to verify what I have described above.

Tencent Classroom (WebRTC)

Tencent Classroom has launched a one-to-many live broadcast solution based on WebRTC.

The migration cost is relatively high compared with a traditional live broadcast setup, but the benefits are equally obvious: delay, time-to-first-frame, and weak-network stalling all improve significantly, and millisecond-level delay can be achieved.

Taobao Live (WebRTC)

RTS is a low-latency live broadcast system jointly built by Alibaba Cloud and Taobao Live. This system is divided into two parts:

  • Upstream access: three input methods can be connected. The first is the H5 end, which pushes streams to the RTS system using standard WebRTC; the second is traditional RTMP streaming software such as OBS, which pushes streams using the RTMP protocol; the third is a low-latency push client, which pushes streams using a proprietary protocol based on RTP/RTCP extensions.
  • Downstream distribution: two kinds of low-latency distribution are provided, standard WebRTC distribution and a private protocol extension based on WebRTC. Taobao Live currently mostly uses the private-protocol distribution.

Low-latency live RTS (Real-time Streaming) is built on Apsara Video Live, adding full-link delay monitoring, CDN transport-protocol transformation, and low-level optimizations such as UDP transport. It supports millisecond-level delay at tens of millions of concurrent viewers, closing the 3-6 second delay of traditional live broadcast and guaranteeing a viewing experience with low latency, low stalling, and sub-second startup.

Notes

  • The push end continues to use RTMP.
  • Standard live streaming (RTMP, FLV, HLS) uses the native rtmp:// and http:// formats.
  • Low-latency live streaming (UDP) uses the artc:// format.

Summary

RTMP method

Advantages:

  • CDN support is good; mainstream CDN vendors all support it
  • The technology is relatively mature and integration is convenient
  • Compared with the end-to-end WebRTC method, it offers high concurrency and suits live broadcast scenarios with large audiences

Disadvantages:

  • The protocol is based on TCP, so the delay is larger than with WebRTC, giving a poor experience in some low-latency scenarios
  • It does not support browser push streaming, etc.

End-to-end WebRTC

Strictly speaking, end-to-end WebRTC is not a conventional live broadcast setup; it mainly suits video conferences and other small-group scenarios. Each node establishes a p2p connection for audio and video transmission, with the main workflow as shown in the WebRTC section above.

Advantages:

  • On the web, it simplifies audio/video communication for developers and users alike: developers face a low threshold, needing no streaming-media expertise, just calls to the js API; users just open a browser.
  • Peer-to-peer communication saves server bandwidth costs.
  • Compared with TCP-based rtmp push/pull streaming, UDP-based WebRTC has low latency.

Disadvantages:

  • Client browser performance is limited. 1v1 broadcast is fine, but with many viewers the browser must upload the stream to each of them simultaneously, and performance suffers.
  • Audio and video processing is hard to customize: WebRTC exposes few APIs, making it difficult to integrate third-party processing such as beautification filters for live shows.
  • Transmission quality is hard to guarantee, especially across regions and carriers, where only end-to-end quality control algorithms are available, which is not enough.
  • Compatibility issues: mainstream PC browsers support WebRTC, but on mobile only some browsers do (at the time of writing, none of the mainstream mobile browsers in China supported it).
  • Follow-up work on broadcast content is hard to carry out and content quality is hard to control: replay and content review, straightforward with rtmp push/pull streaming, are difficult to handle.

WebRTC live broadcast based on media server

End-to-end WebRTC is limited by client performance and by the number of connections, so it is hard to apply to live broadcast scenarios. To solve these problems, a media server can be introduced: each publishing client transmits a single audio/video stream to the media server, and all other clients display the audio and video by connecting to that server.

The current open source mainstream WebRTC media servers are as follows:

  • kurento
  • licode
  • janus

Advantages:

  • Compared with the end-to-end WebRTC method, it avoids the problems of client performance, audio/video processing, and content review, and supports more complex application scenarios;
  • Supports many viewers watching the broadcast simultaneously, with high concurrency;
  • Integration on the web side is relatively simple, a browser is enough to join, and the delay is low;

Disadvantages:

  • Development cost is higher than the end-to-end WebRTC method: you need to implement your own media server, and there is currently no fully mature off-the-shelf solution.
  • Compared with the mature rtmp ecosystem, supporting facilities are relatively scarce.

Summary

In summary

RTMP-based solutions suit live broadcast with a large audience, but not low-latency live broadcast, and they do not support web-side push streaming;

The end-to-end WebRTC live broadcast method is not suitable for live broadcast scenarios;

Media-server-based WebRTC live broadcast has no fully mature off-the-shelf solution; the media server must be implemented in-house, so the barrier to entry is relatively high.
