Dewu Live Streaming Low-Latency Exploration

1. Background

Low latency underpins a good live-streaming experience. Experience from the transaction funnel shows that the lower the delay, the better the conversion. The latency of traditional live streaming has become a problem we can no longer ignore: high delay not only hurts the viewing experience, it also prevents the host from getting user feedback in real time. To further optimize the timeliness of our live streams, we need a clear understanding of where the delay comes from and of the whole interactive link, so that we can land the related optimizations reliably.

2. Subjective experience

Our team has internally measured the latency of other e-commerce platforms. The end-to-end latency of the top platform is about 3 s, while Dewu's is about 5 s, so there is clearly room for improvement, and we need to pin down the specific causes.

3. What are the benefits of lower latency

3.1 Improve the smoothness of the transaction process

Dewu's live rooms include a flash-sale product flow whose countdown runs in real time. If the live picture lags by close to 8 s, gaps open up in the communication between users and the host: a user asks a question, but under high latency the host does not respond for a long time, and during that window the user may leave the live room or skip the product. That outcome is hard to accept, both for the host and for transaction conversion.


3.2 Reduce the latency gap between different viewers

Suppose users A and B are watching the same live room. User A entered very early, while user B is a newcomer, yet user B's delay is several seconds lower than user A's. When user A notices this, they may wonder whether something is wrong with their phone, network, or the app, which leads to negative experience feedback.

4. How does live-streaming delay arise?

To find out how the delay is generated, we must understand which stages of the pipeline can introduce delay and can be optimized.

Host --> Cloud Server --> CDN Node --> User

Host --> Cloud server: the live content is transcoded, compressed, etc.

Cloud server --> CDN node: the live content is distributed to multiple edge nodes

CDN node --> User: the device receives the live content and renders it


4.1 Where delay may occur in these stages

(Part of the explanation comes from third-party literature)

  • Delay from the host's capture and encoding equipment

This mainly consists of the encoding delay plus the delay introduced by the send buffer, and there is not much room for optimization here. Encoding delay can be reduced effectively by adjusting encoder parameters, but at the cost of image quality and compression efficiency. Most of the effort at this stage therefore goes into optimizing transmission over weak networks; the starting point is to give users a smooth viewing experience, not merely to cut latency.
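
As an illustration of the trade-off above, here is a minimal sketch of pushing with low-latency x264 settings via ffmpeg (assuming ffmpeg is installed; the input file and RTMP URL are placeholders, and the exact values would need tuning against quality targets):

```python
# Minimal sketch: low-latency x264 push settings (placeholder input/URL).
import subprocess

cmd = [
    "ffmpeg", "-re", "-i", "input.mp4",   # read input at its native frame rate
    "-c:v", "libx264",
    "-preset", "veryfast",                # shorter lookahead -> lower encode latency
    "-tune", "zerolatency",               # drops B-frames/buffering, trading compression for delay
    "-g", "30",                           # 1 s GOP at 30 fps: faster join, worse compression
    "-c:a", "aac",
    "-f", "flv", "rtmp://push.example.com/live/stream",  # placeholder push URL
]
subprocess.run(cmd, check=True)
```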

  • Time spent by the cloud server transcoding and compressing the live content

For a live platform, real-time transcoding is an essential technology. By transcoding the video stream in real time, a high-definition stream can be turned into multiple resolutions to meet the compatibility and bandwidth requirements of different terminal devices and to reduce network transmission overhead. However, real-time transcoding inevitably introduces a certain delay, because:

    1. The transcoding process must analyze and process the video stream (compression, format conversion, and so on), which takes a certain amount of computing resources and time.
    2. The transcoded video must then be re-sent to the CDN nodes before the viewer's device can play it; this step can be affected by network bandwidth, transmission rate, and similar factors, adding further delay.

Transcoding delay is therefore a trade-off between reducing latency and preserving video quality. Advanced transcoding algorithms, lower output quality, and tuned encoding parameters can all shave delay, but they cost image quality and compression rate, so this part of the delay has to be weighed against the actual scenario; if the latency requirement is strict, it can be adjusted slightly.

  • Network transmission delay of CDN nodes

Leaving aside back-to-source cases, the dominant factor in this stage is the GOP cache strategy. CDN vendors name it inconsistently; some also call it (RTMP, FLV, HLS, ...) delay. It means the edge node caches the most recent few GOPs of the stream (5-7 s of media on average), so that when a pull request is established the node can send media data immediately, optimizing first-frame time and stutter. The side effect is that the first frame the player receives is stale data from 5-7 s ago, so the first-frame delay already reaches 5-7 s. This stage is the root cause of excessive end-to-end delay.

  • The player's anti-stutter buffer and its fixed delay of n seconds

During live playback, the player usually caches part of the stream to improve smoothness and user experience. Caching means pre-loading a certain amount of video data into a local buffer before playback starts, so that subsequent playback can read quickly from the buffer and avoid stuttering. This "preloaded" cache is called the cache buffer.

Different buffer sizes lead to different delays. A common buffer size is 2 seconds or less: the player preloads 2 seconds of data from the video source into the local buffer, and when playback approaches the end of the buffered data, another 2 seconds is preloaded to keep playback smooth.

A fixed delay means that after the player receives data from the network, it waits a fixed amount of time before playing it, generally to prevent stuttering during playback. For example, with a fixed delay of 1 second, a packet takes about 1 extra second from arriving on the phone to actually being played; that is the effect of the fixed delay.

Normally, the player automatically adjusts the buffer size and the fixed delay according to the current network environment to get the best playback. If the network is poor, stutters may still occur, and tuning the buffer size and fixed delay can then improve playback.
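
A toy model of this pre-buffer gate, to make the mechanism concrete (class and threshold names are illustrative, not our player's actual implementation):

```python
# Toy "cache buffer": playback starts only once `prebuffer_s` of media is
# buffered, which is exactly the startup latency this section describes.
import collections

class JitterBuffer:
    def __init__(self, prebuffer_s=2.0):
        self.prebuffer_s = prebuffer_s
        self.frames = collections.deque()      # (pts_seconds, payload)
        self.started = False

    def push(self, pts, payload):
        self.frames.append((pts, payload))

    def buffered_duration(self):
        if len(self.frames) < 2:
            return 0.0
        return self.frames[-1][0] - self.frames[0][0]

    def pop(self):
        if not self.started:
            if self.buffered_duration() < self.prebuffer_s:
                return None                    # still pre-buffering -> adds delay up front
            self.started = True
        return self.frames.popleft() if self.frames else None
```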

  • Delay from receiving, decoding, and other operations on the user's device

Suppose the user's device has relatively low-end hardware and stutters while receiving and decoding live data. We can tune the stream parameters, such as bitrate and resolution, to better fit the device's capabilities, or keep the transport and decode path as simple as possible to reduce decoding time and the computing resources it uses.

  • Summing up

A problem in any of these stages can add delay to the live stream. To reduce it, we can improve the processing efficiency of each stage and tune network transmission parameters and related settings.

Along the delivery path, the biggest contributors to delay are CDN transmission, transcoding, distribution, and the playback buffer. In real-time transcoding mode, the delay introduced by the transcoder is generally within 300 ms or even shorter, and the CDN distribution step adds a relatively short delay as well. The playback buffer, introduced to absorb network jitter, often contributes 5 s or more.

4.2 How do we measure the actual delay?

  • Method One:

Use an end-to-end test: the difference between the time shown on the push side and the time shown on the playback side is taken as the delay.

Find an online clock accurate to milliseconds: http://www.daojishiqi.com/bjtime.asp

A. Open the above webpage and point the push-side camera or capture window at it.

B. On the playback side, open the corresponding live room and note the time shown.

Compare the two screens (A and B) and take screenshots to compute the time difference.


  • Method Two:

At encoding time, write a timestamp into the SEI information. After the player pulls the stream successfully, it parses the SEI information and compares the embedded timestamp with the current timestamp.

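A simplified sketch of this mechanism using an H.264 user_data_unregistered SEI (payload type 5): embed a wall-clock timestamp at push time, parse it at play time and subtract. Annex B start codes, emulation-prevention bytes, and fragmentation are ignored, the 16-byte UUID is an arbitrary placeholder, and both ends are assumed to be NTP-synced:

```python
import struct, time, uuid

SEI_UUID = uuid.UUID("00000000-0000-0000-0000-000000000001").bytes  # placeholder UUID

def build_sei(ts_ms: int) -> bytes:
    payload = SEI_UUID + struct.pack(">Q", ts_ms)   # UUID + 8-byte millisecond timestamp
    assert len(payload) < 255                       # one size byte is enough for this payload
    # NAL type 6 (SEI), payload type 5 (user_data_unregistered), size, payload, rbsp stop bit
    return bytes([0x06, 0x05, len(payload)]) + payload + b"\x80"

def parse_sei(nal: bytes):
    if nal[0] != 0x06 or nal[1] != 0x05:
        return None
    payload = nal[3:3 + nal[2]]
    if not payload.startswith(SEI_UUID):
        return None
    return struct.unpack(">Q", payload[len(SEI_UUID):len(SEI_UUID) + 8])[0]

# Playback side: end-to-end delay = local wall clock minus the embedded timestamp.
sent_ms = parse_sei(build_sei(int(time.time() * 1000)))
delay_ms = int(time.time() * 1000) - sent_ms
```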

The SEI approach requires low-level development on both the push and pull sides, so Method One was tentatively chosen for testing. The test results show that the most conservative, fastest option at this stage is to adjust the delay gear directly in the cloud live-streaming console. To adjust the gear, we must first understand the impact the change will have, which involves some knowledge of GOPs.

4.3 What are GOP and GOP cache, and why do we need to know about them?

As the name implies, the GOP cache is a group of GOPs cached on the CDN server; the larger the GOP cache, the greater its impact on delay. Understanding it lets us optimize the live-delay link more precisely. "GOP cache" is a term coined by one vendor, and the major CDNs name it inconsistently; some cloud vendors also call it (RTMP, FLV, HLS, ...) delay. Together with the push-stream GOP, or the transcoded stream's GOP, it makes up the complete end-to-end delay.

  • GOP (Group of Pictures)

A GOP is a group of consecutive video frames, usually one I-frame (key frame) plus several P-frames (predicted frames) and B-frames (bidirectionally predicted frames). During a live stream, if the GOP is too large, the receiver waits a comparatively long time for an I-frame to arrive, which increases the video delay. Reducing the GOP size can therefore reduce the delay to a certain extent.
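
To make the numbers concrete, a quick sketch using the 1080p template values quoted later in this section (gop = 10 s = 200 frames, i.e. 20 fps):

```python
# Worst-case wait for a decodable I-frame when joining mid-GOP (illustrative).
fps, gop_s = 20, 10           # 1080p template example: gop = 10 s = 200 frames
gop_frames = fps * gop_s      # 200 frames per GOP
worst_wait_s = gop_s          # joining right after an I-frame means waiting a whole GOP
print(gop_frames, worst_wait_s)   # -> 200 10
```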

  • The console path for the delay (GOP cache) configuration: Domain name configuration -> Live delay configuration


  • Relationship between the push-stream GOP and the transcoding GOP


    • Without transcoding, the playback GOP == the push-stream GOP
    • With transcoding, if the transcoding template configures a GOP, the playback GOP == the template's GOP
    • If the transcoding template does not specify a GOP, the transcoded GOP == the push-stream GOP (see the sketch below)

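The rules above reduce to a small resolution function; this is a sketch with illustrative names, not a real console API:

```python
def effective_gop(push_gop_s: float, transcoded: bool, template_gop_s=None) -> float:
    """Resolve the GOP that playback actually sees (illustrative)."""
    if not transcoded:
        return push_gop_s          # no transcoding: the push GOP passes through
    if template_gop_s is not None:
        return template_gop_s      # template GOP overrides the push GOP
    return push_gop_s              # template silent on GOP: push GOP still applies
```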

  • The delay-configuration description emphasizes the push-stream GOP. Is that misleading?

Not entirely. On the one hand, not all live streams are transcoded, let alone have their GOP modified; on the other hand, the push-stream GOP does have some impact on delivery efficiency. The description simply does not cover the effect of, and differences introduced by, the transcoding GOP.

  • Is the cache unit a duration or a size?

Dewu's live cache is measured in duration.

In live-delay control, the cache unit can be either time (duration) or size.

Caching by duration means that as the video is captured, encoded, and pushed to the cloud server, the data is first held in a buffer and only pushed on to the CDN distribution nodes for playback after a certain amount of time (the cache duration) has passed.

Caching by size controls the buffer by capacity: as the video is captured, encoded, and pushed to the cloud server, the data is forwarded to the CDN distribution nodes for playback whenever it reaches a certain size.

In practice, the two units, time and size, are usually combined for delay control. If smoothness is the bigger concern, the cache time and size can be increased appropriately so that the receiver has enough buffered data to play through jitter, reducing the probability of stutters and pauses. If real-time behavior matters more, the cache time and size can be reduced to shorten the delay and keep the stream close to live.

Note that cache time and cache size are two key knobs for latency optimization on a live platform. Set sensibly, they can significantly reduce latency and improve the user experience; set too large or too small, they cause problems of their own, so they must be tuned to the actual situation.
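
A toy sender buffer that flushes on either unit, mirroring the combined duration/size control described above (the thresholds and the flush target are illustrative):

```python
class SendBuffer:
    """Toy buffer that flushes by buffered duration OR byte size (illustrative)."""

    def __init__(self, max_duration_s=2.0, max_bytes=512 * 1024):
        self.max_duration_s = max_duration_s
        self.max_bytes = max_bytes
        self.chunks, self.byte_count, self.duration_s = [], 0, 0.0

    def add(self, data: bytes, duration_s: float):
        self.chunks.append(data)
        self.byte_count += len(data)
        self.duration_s += duration_s
        # Flush as soon as either unit (time or size) crosses its threshold.
        if self.duration_s >= self.max_duration_s or self.byte_count >= self.max_bytes:
            self.flush()

    def flush(self):
        # Placeholder: push self.chunks on to the CDN distribution node here.
        self.chunks, self.byte_count, self.duration_s = [], 0, 0.0
```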

  • How many GOPs are cached?

It is not fixed; there is no notion of a GOP count. The size is based on a duration and depends on the buffer on the CDN side. Whatever the buffer size, the data sent when a pull starts lags live by at least the configured delay and at most delay + GOP. The stream keeps producing new data, so the window keeps sliding as the node sends; beyond that, the buffer size has no direct impact on latency.

  • Base delay values

RTMP: Low (2s) Medium (4s) High (8s)

FLV: Low (2s) Medium (4s) High (8s)

Delay calculation: [RtmpDelay, RtmpDelay + GOP]. The GOP here is the push-stream GOP before transcoding, or the GOP configured in the transcoding template after transcoding. With the custom template's 1080p configuration (gop = 10 s = 200 frames), the theoretical minimum and maximum fall into the ranges [2, 12], [4, 14], and [8, 18] seconds.
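
The ranges follow directly from the gear table above; a quick sketch of the arithmetic:

```python
# Theoretical end-to-end range per gear: [base_delay, base_delay + gop] seconds.
GEARS = {"low": 2, "medium": 4, "high": 8}   # RTMP/FLV base delay per console gear
gop_s = 10                                   # custom 1080p template: gop = 10 s (200 frames)

for name, base in GEARS.items():
    print(f"{name}: [{base}, {base + gop_s}] s")
# low: [2, 12] s, medium: [4, 14] s, high: [8, 18] s
```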

For FLV playback with the delay gear set to 2 seconds and the GOP set to 1 second, the end-to-end delay is theoretically about 3 seconds. If the bitrate is high, the delay value must be raised appropriately to keep the live stream smooth. Besides the CDN cache delay, the player's caching strategy also has to be taken into account.

To achieve a stable 2 seconds, consider an ultra-low-latency live-streaming solution.

5. Effective, implementable ways to reduce live-streaming delay going forward

  • Lower the gopCache of the production CDN environment to the low gear

After this adjustment, the end-to-end delay is expected to drop from 5-8 s to 3-5 s.

  • Adjust the push-stream GOP to 1 s, reducing the average end-to-end delay by roughly another 1 s

In theory, reducing the push GOP helps latency. If the GOP is reduced to 1 second, a key frame is pushed every second, and the receiver can start playing as soon as it receives one, further reducing the delay. Pushing more key frames per second also has a positive effect on video quality and stability.

The push GOP size is not the only factor affecting live latency: video encoding, streaming-server configuration, the network environment, and other factors all contribute. Only by weighing all of them and setting the push GOP sensibly can the live delay be minimized.

  • Speed up consumption of buffered video data to cut delay, e.g. faster playback or outright frame dropping; the strategy needs further refinement

That is, increase the consumption speed on the playback side: when necessary, combine this with a catch-up strategy that plays the video at a higher speed, starting and stopping at configured thresholds.

During pushing, the push side writes the key frame's timestamp into the SEI information.

After the playback side decodes successfully, it parses the timestamp out of the corresponding SEI information.

It then computes the difference between the two against the server's reference time to decide the playback rate, for example ramping within 1.0x-1.4x, increasing and decreasing gradually, and finally stopping the accelerated consumption once the expected delay is reached.
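
A sketch of such a rate controller, with the 1.0x-1.4x range from above; the target delay and ramp width are illustrative assumptions:

```python
def playback_rate(delay_s: float, target_s: float = 3.0, max_rate: float = 1.4) -> float:
    """Ramp playback speed with the measured delay; drop to 1.0x at the target."""
    if delay_s <= target_s:
        return 1.0                                   # at or below target: stop accelerating
    excess = min(delay_s - target_s, 4.0)            # fully ramped once 4 s over target
    return 1.0 + (max_rate - 1.0) * excess / 4.0

# e.g. playback_rate(3.0) -> 1.0, playback_rate(5.0) -> 1.2, playback_rate(9.0) -> 1.4
```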

  • Confirm the current size, in seconds, of the self-developed player's buffer, and align it with competing products at around 2 s

A common live-player buffer size is 2 seconds, mainly to reduce the probability of stutters and pauses and improve the user experience. The player buffer pre-buffers a certain amount of video data for playback; when network conditions are bad, the stream is interrupted, or the delay spikes, the buffer kicks in to keep playback continuous and smooth.

In general, the buffer size varies with the network environment, bandwidth, and similar factors. If it is too small, playback stutters and pauses; if it is too large, delay grows and real-time behavior suffers. After optimization, common live players settle at a buffer of about 2 seconds, which keeps playback smooth without inflating the delay too much. Different platforms (PC, mobile), networks (Wi-Fi, 4G, 5G), and devices (different manufacturers) may use different buffer sizes, so the setting has to be adjusted and optimized for the actual situation.

  • Use Alibaba Cloud's RTS or ByteDance's RTM protocol; an ultra-low-latency solution needs a confirmed usage scenario (for example, only for top, high-traffic live rooms)

Alibaba Cloud's RTS (Real-Time Streaming) and ByteDance's RTM (Real Time Media) are both commercial ultra-low-latency solutions that can bring the delay down to <= 1 s; in concrete application scenarios and features they are nearly equivalent.

    1. RTS provides full-link delay monitoring and reworks the CDN transport protocol on top of UDP and other low-level techniques, offering low-latency streaming transport and processing with support for high concurrency, low stutter, and second-level smooth playback.
    2. RTM likewise moves the link transport onto UDP and other low-level optimizations, working around TCP's inherent limitations and the delay accumulation caused by network jitter, and supports a high-concurrency, high-reliability, high-quality viewing experience.

Both commercial solutions must be used together with the vendor's player SDK, RTM additionally requires the Volcano Engine CDN environment, and their costs differ, so the trade-offs have to be weighed against our current architecture and status quo.

  • Use the QUIC protocol (UDP-based, so theoretically lower latency); verified on a third-party player, ordinary FLV dropped from <= 5 s to <= 2 s

Conventional live streaming is transported over TCP, whose theoretical floor is about 3 seconds; that is already the lowest it can go.

QUIC (Quick UDP Internet Connections) is a protocol built on the User Datagram Protocol (UDP). Compared with the traditional transport protocol (TCP), it has the following advantages in transmission, which translate into lower latency:

    1. Shorter connection establishment. TCP needs a three-way handshake to establish a connection, while QUIC completes its handshake in roughly one round trip, greatly reducing setup time and improving communication efficiency.
    2. Different transmission behavior. Before sending data, TCP goes through slow start, gradually raising its send rate while probing for congestion; QUIC adjusts window size and send rate dynamically, making transmission more efficient.
    3. Better multiplexing. QUIC multiplexes multiple streams over a single connection, giving higher bandwidth utilization and lower latency.
    4. Less dependence on the network path. QUIC can recover quickly when a connection fails or the network becomes unavailable, keeping data transmission stable.

In short, QUIC outperforms TCP in connection establishment, transmission, multiplexing, and failure handling, so it can deliver lower latency and a faster, more reliable connection. That said, adopting QUIC (UDP) also raises questions that need to be weighed: once the latency drops, are other aspects hurt, such as pull success rate or stability? The final rollout plan therefore needs more detailed testing to weigh the pros and cons.
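
A back-of-the-envelope comparison of connection setup, to illustrate point 1 (the RTT value is an assumption, and the TCP figure assumes a TLS 1.2 handshake on top; plain TCP without TLS would be about one round trip):

```python
rtt_ms = 50                        # assumed round-trip time

tcp_handshake = 1 * rtt_ms         # TCP three-way handshake: ~1 RTT before data
tls12_handshake = 2 * rtt_ms       # TLS 1.2 adds ~2 RTTs on top of TCP
quic_handshake = 1 * rtt_ms        # QUIC bundles transport + TLS 1.3 into ~1 RTT

print(tcp_handshake + tls12_handshake, quic_handshake)   # -> 150 50
# With QUIC 0-RTT session resumption, the setup cost can approach 0 ms.
```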

6. Some thoughts

Live-streaming latency involves many factors, including buffer settings, transport protocols, and GOP control on both the push and playback sides. To solve the delay problem and deliver a better user experience in real development, we need to weigh and optimize these factors together and find the best solution through continuous practice and experimentation. By combining these techniques, we can improve the real-time behavior and viewing experience of the live platform.




