[Netease Yunxin] Common playback-side problems in live streaming scenarios: analysis and practical experience

Common playback process

[Figure: common playback process]

Analysis of the main process of the player

The playback process of the player mirrors the push process in reverse. The push end first captures audio and video, encodes and encapsulates them, and processes them according to the streaming media protocol to produce the output stream. The player parses and demultiplexes the input stream to obtain audio packets (such as AAC) and video packets (such as H.264 or H.265), decodes them into audio frames (PCM) and video frames (YUV), and finally renders and plays the audio and video through the audio/video synchronization module, presenting the result to the user.

From the perspective of the thread model, the core threads of a playback engine can usually be divided into the "demultiplexing (reading) thread", the "audio/video decoding threads", and the "audio/video playback threads". In addition, to cooperate better with upper-layer applications, the playback engine generally maintains a "message thread" that reports information to the upper layer, such as playback failure error codes, the playback time of the first audio/video frame, and statistics collected during playback (for example, time to first frame, audio/video bitrate, audio/video frames received per second, frames decoded per second, frames played per second, number of playback stalls, stall duration, and the audio/video synchronization clock difference).

As the thread model shows, the core threads of the playback engine form a very typical producer-consumer model. This model makes efficient use of multi-core CPU parallelism, and with reasonable media buffer management it can absorb network and decoding jitter to a certain extent and avoid frequent stalls.
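As an illustration, the following is a minimal sketch (in Java, with an assumed MediaPacket type and queue capacity, not the engine's actual code) of the producer-consumer relationship between the reading thread and a decoding thread:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PacketPipeline {
    public static class MediaPacket { /* compressed audio or video data */ }

    // Bounded media buffer: absorbs network and decoding jitter; when it is
    // full, the reading thread blocks, which naturally throttles downloading.
    private final BlockingQueue<MediaPacket> videoPackets = new ArrayBlockingQueue<>(256);

    // Called on the reading (demultiplexing) thread for each packet produced.
    public void onPacketRead(MediaPacket packet) throws InterruptedException {
        videoPackets.put(packet); // blocks while the buffer is full
    }

    // Called in a loop on the video decoding thread (the consumer).
    public MediaPacket nextPacketForDecoder() throws InterruptedException {
        return videoPackets.take(); // blocks while the buffer is empty
    }
}
```

In a real engine the buffer size is usually expressed as a buffered duration rather than a fixed packet count, and separate queues are kept for audio and video.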

The common problems during playback mainly fall into categories such as failure to play, missing audio or video, blurred (corrupted) pictures, stalls, and high latency. Troubleshooting usually proceeds backwards from the playback side. Below we analyze some common problems in live stream playback based on cases we have accumulated in our work.

Hardware decoding failure caused by a resolution that is too small

Phenomenon: A customer reported that during a live broadcast the picture suddenly froze while the sound remained normal.

Analysis: By analyzing the playback log and a dump of the video stream, we found that when the problem occurred the player was using Android hardware decoding and the resolution of the pulled video stream had changed (1080p -> 16p). After consulting the documentation, we located the root cause: the range of resolutions supported by Android's MediaCodec codecs varies from device to device. It can be checked in /vendor/etc/media_codecs_xxx.xml on the device. For example, in media_codecs_mediatek_video.xml on the device "M2006C3LC", the H.264 decoding entry (shown in the figure below) indicates that "OMX.MTK.VIDEO.DECODER.AVC" supports resolutions from 64x64 to 2048x1088, so parsing a 16x16 video may cause unpredictable problems.

[Figure: H.264 decoding capability entry for "OMX.MTK.VIDEO.DECODER.AVC" in media_codecs_mediatek_video.xml]

Practical experience: adapt to Android hardware capabilities

1. Compared with iOS devices, Android devices are far more fragmented. Adapting to the different capabilities of different models can greatly reduce problems like the one above. Fortunately, the MediaCodec framework provides developers with a fairly comprehensive capability query interface, such as the following methods in MediaCodecInfo.VideoCapabilities:

```java
public Range<Integer> getBitrateRange()
public Range<Integer> getSupportedFrameRates()
public Range<Integer> getSupportedHeights()
public boolean isSizeSupported(int width, int height)
...
```

These MediaCodec interfaces help us decide, when initializing the encoder/decoder, whether to use a hardware or a software codec according to the device's capability, and avoid decoding failures and fallbacks caused by capability limits as much as possible. On the encoding side, the checks mainly include whether the MediaFormat is supported, whether the profile/level is supported, whether the encoding resolution, bitrate, and frame rate are supported, and whether the number of encoder instances exceeds the maximum supported range. The checks on the decoding side are similar.
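For example, the following is a minimal sketch (not the SDK's actual selection logic) that uses these capability interfaces to decide whether hardware decoding can handle a given stream; isSoftwareCodec() is a common name-prefix heuristic:

```java
import android.media.MediaCodecInfo;
import android.media.MediaCodecList;

public final class DecoderCapabilityCheck {

    public static boolean hwDecodeSupported(String mime, int width, int height) {
        MediaCodecList list = new MediaCodecList(MediaCodecList.REGULAR_CODECS);
        for (MediaCodecInfo info : list.getCodecInfos()) {
            if (info.isEncoder() || isSoftwareCodec(info.getName())) {
                continue;
            }
            try {
                MediaCodecInfo.VideoCapabilities video =
                        info.getCapabilitiesForType(mime).getVideoCapabilities();
                if (video != null && video.isSizeSupported(width, height)) {
                    return true; // some hardware decoder can handle this size
                }
            } catch (IllegalArgumentException ignored) {
                // this codec does not handle the given mime type
            }
        }
        return false;
    }

    // Name-prefix heuristic; MediaCodecInfo.isHardwareAccelerated() can be
    // used instead on API 29+.
    private static boolean isSoftwareCodec(String name) {
        return name.startsWith("OMX.google.") || name.startsWith("c2.android.");
    }
}
```

For the 16x16 stream in the case above, a call such as hwDecodeSupported("video/avc", 16, 16) would be expected to return false on that device, so the player could choose software decoding from the start instead of failing in the hardware path.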

2. Maintain a blacklist: enabling hardware video encoding/decoding reduces the CPU load to a certain extent and improves video performance. However, because Android is severely fragmented and much of MediaCodec's encoding/decoding capability is implemented by device manufacturers, consistent behavior is hard to achieve. Our past cases show that the hardware codecs of some models perform very poorly and the root cause cannot be found, so for such devices we can choose to maintain a blacklist.
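A minimal sketch of such a check is shown below; the model names are placeholders only, and in practice the list would be built from accumulated cases and ideally delivered through remote configuration:

```java
import android.os.Build;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public final class HwCodecBlacklist {
    // Placeholder entries; a real list would come from accumulated bad cases.
    private static final Set<String> BLACKLISTED_MODELS =
            new HashSet<>(Arrays.asList("EXAMPLE-MODEL-A", "EXAMPLE-MODEL-B"));

    public static boolean shouldDisableHwCodec() {
        return BLACKLISTED_MODELS.contains(Build.MODEL);
    }
}
```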

3. Improve the mechanism for falling back from hardware to software decoding: relatively common hardware decoding problems include excessive decoding time and MediaCodec API errors. Errors reported by MediaCodec calls are generally easy to capture; the captured error information can be examined, and for critical errors the player can fall back to software decoding or report the error to the application layer. However, the case where no error is reported but decoding takes too long still severely hurts the user experience, so excessive decoding time should also trigger a fallback and a report to the application layer.
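The following is a simplified sketch of this idea (fallbackToSoftwareDecoder() is a hypothetical helper, and the timeout threshold is an assumption): fall back both on MediaCodec exceptions and when no output frame has been produced for too long, since the latter reports no error by itself.

```java
import android.media.MediaCodec;
import android.os.SystemClock;

public class HwDecodeFallback {
    private static final long MAX_NO_OUTPUT_MS = 2_000; // assumed threshold

    private long lastOutputMs = SystemClock.elapsedRealtime();

    void drainOnce(MediaCodec codec) {
        try {
            MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
            int index = codec.dequeueOutputBuffer(info, 10_000 /* us */);
            if (index >= 0) {
                lastOutputMs = SystemClock.elapsedRealtime();
                codec.releaseOutputBuffer(index, /* render= */ true);
            } else if (SystemClock.elapsedRealtime() - lastOutputMs > MAX_NO_OUTPUT_MS) {
                fallbackToSoftwareDecoder("decode timeout"); // also report upward
            }
        } catch (MediaCodec.CodecException e) {
            // e.isTransient()/isRecoverable() can refine the handling; here any
            // codec exception triggers a fallback and a report to the app layer.
            fallbackToSoftwareDecoder(e.getDiagnosticInfo());
        }
    }

    private void fallbackToSoftwareDecoder(String reason) {
        /* hypothetical: tear down MediaCodec, switch to the soft decoder, report */
    }
}
```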

Blurred screen caused by failing to update the sps/pps parameters correctly

Phenomenon: During project testing, we encountered blurred screens with hardware decoding in scenarios such as PK.

Analysis: Our tests showed that software decoding did not produce a blurred screen but hardware decoding did. By dumping the received bitstream with ffmpeg and analyzing it, we found that the sps/pps of the video had changed when the blur appeared. This is clearly a typical case of a decoding blur caused by failing to update sps/pps correctly.

Practical experience: Under normal circumstances, sps/pps does not change during a single stable push. It generally changes when the video resolution changes, and the hardware decoding module of the original ijkplayer engine already supports updating sps/pps when the resolution switches. However, the business of some push ends can be more complicated, for example involving dynamic switching of encoders, which can change sps/pps even while the resolution stays the same. Therefore, for the sake of generality, every time video header information is received it should be compared with the current header information, and the decoder should be updated promptly when it changes, to avoid problems such as blurred screens.
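A simplified sketch of this comparison is shown below (restartDecoder() is a hypothetical helper): every received video header is compared with the cached one, and the hardware decoder is rebuilt when sps/pps change, instead of reacting only to resolution changes.

```java
import android.media.MediaFormat;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class VideoHeaderTracker {
    private byte[] cachedSps;
    private byte[] cachedPps;

    void onVideoHeader(byte[] sps, byte[] pps, MediaFormat format) {
        if (Arrays.equals(sps, cachedSps) && Arrays.equals(pps, cachedPps)) {
            return; // header unchanged, keep the current decoder
        }
        cachedSps = sps.clone();
        cachedPps = pps.clone();
        format.setByteBuffer("csd-0", ByteBuffer.wrap(sps)); // H.264 SPS
        format.setByteBuffer("csd-1", ByteBuffer.wrap(pps)); // H.264 PPS
        restartDecoder(format); // stop/flush, configure with the new format, start
    }

    private void restartDecoder(MediaFormat format) { /* hypothetical */ }
}
```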

Blurred screen caused by missing key frames or reference frames

Phenomenon: The screen becomes blurred and recovers after a period of time.

Analysis: Analysis of the bitstream dump showed no problem with the bitstream itself. Later, by analyzing the log, we identified the root cause: the decoded picture was corrupted because reference frames had been lost.

Practical experience: Blurred screens often occur when key frames or reference frames are lost. At any point after encoding and before decoding, discarding video frames is not recommended, otherwise the decoder may produce a corrupted picture. In live broadcast scenarios, in order to eliminate accumulated latency, some players choose to discard undecoded frames from the frame buffer. In that case, a more reasonable strategy is to discard an entire GOP, or to catch up with the delay by playing at a higher speed and stopping the speed-up once the latency meets the requirements of the live broadcast. In addition, to avoid such blurred screens as much as possible, key frame detection should be performed in scenarios such as the first play and reconnection after a network interruption, and decoding should start from a key frame.
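The following is a minimal sketch of the two catch-up strategies mentioned above (PacketQueue and PlaybackClock are hypothetical types, and the thresholds are assumptions): drop whole GOPs only, or temporarily raise the playback speed until the buffered delay falls back to the target.

```java
public class LatencyCatchUp {
    private static final long DROP_THRESHOLD_MS = 5_000;    // assumed
    private static final long SPEEDUP_THRESHOLD_MS = 1_500; // assumed
    private static final long TARGET_MS = 1_000;             // assumed

    interface PacketQueue {
        long bufferedDurationMs();
        boolean dropUntilNextKeyFrame(); // drops one whole GOP, false if none left
    }

    interface PlaybackClock {
        void setSpeed(float speed);
    }

    void catchUp(PacketQueue videoQueue, PlaybackClock clock) {
        long buffered = videoQueue.bufferedDurationMs();
        if (buffered > DROP_THRESHOLD_MS) {
            // Drop GOP by GOP so decoding always restarts at a key frame and
            // no reference frame is missing.
            while (videoQueue.bufferedDurationMs() > TARGET_MS
                    && videoQueue.dropUntilNextKeyFrame()) {
                // keep dropping complete GOPs
            }
            clock.setSpeed(1.0f);
        } else if (buffered > SPEEDUP_THRESHOLD_MS) {
            clock.setSpeed(1.2f); // gentle catch-up
        } else {
            clock.setSpeed(1.0f); // latency acceptable, back to normal speed
        }
    }
}
```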

Unexpected playback problems caused by switching audio/video on and off at the push end

Phenomenon: During project testing, the audio and video on the playback end suddenly stopped playing.

Analysis: By analyzing the bitstream at the receiving end, we found that the audio media stream stopped suddenly during the live broadcast. Investigation showed that the push end supports dynamically turning audio/video media streams on and off. This feature causes the audio or video stream at the receiving end to be interrupted suddenly, which conflicts with the existing logic of the playback end and leads to some unpredictable problems.

Practical experience: If the push end supports dynamically turning audio and video streams on and off, adapting the player is relatively difficult. The scenarios that need to be adapted generally include, but are not limited to: "audio first, video added later", "video first, audio added later", "video stream suddenly interrupted", and "audio stream suddenly interrupted". Adapting to streams being added is relatively easy: the demultiplexing module only needs to detect the number of media streams dynamically to determine whether a new stream has appeared. Adapting to audio/video streams being suddenly interrupted, however, is very difficult. On the one hand, a general-purpose playback engine does not interact with the signaling logic of the push end; on the other hand, such scenarios conflict seriously with common player logic such as audio/video synchronization (video synced to audio, or audio synced to video) and buffering. For example, the default synchronization strategy of most players is to sync video to audio, but when the audio stream is suddenly interrupted, the player needs to perceive that the audio stream has been closed at the push end and adjust its synchronization strategy accordingly. It is therefore hard to cover these scenarios from the player side alone. From the push side, however, one can choose to insert silent audio frames or fake video data when the audio/video media stream is turned off, so that the media stream remains continuous and most players stay compatible.
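As an illustration of the push-side approach, here is a minimal sketch (AudioEncoder is a hypothetical push-end interface): while the real microphone stream is switched off, zero-filled PCM is fed to the encoder at the normal frame interval so the audio track stays continuous and the playback end never sees a sudden stream interruption.

```java
public class SilentAudioFeeder {
    private static final int SAMPLE_RATE = 44_100;
    private static final int CHANNELS = 2;
    private static final int SAMPLES_PER_FRAME = 1024; // one AAC frame
    private static final byte[] SILENT_PCM =
            new byte[SAMPLES_PER_FRAME * CHANNELS * 2]; // 16-bit PCM, all zeros

    interface AudioEncoder {
        void encodePcm(byte[] pcm, long ptsUs);
    }

    // Called repeatedly (e.g. by a timer) while audio is muted on the push end.
    long feedSilentFrame(AudioEncoder encoder, long ptsUs) {
        encoder.encodePcm(SILENT_PCM, ptsUs);
        long frameDurationUs = SAMPLES_PER_FRAME * 1_000_000L / SAMPLE_RATE;
        return ptsUs + frameDurationUs; // pts of the next silent frame (~23 ms later)
    }
}
```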

In addition, we have also found in practice that in scenarios where audio/video streams are dynamically switched on and off at the push end, even if the playback end is compatible, whether the server or CDN vendor supports it remains a big question mark, especially when pulling streams over the HLS protocol. Compared with other streaming protocols, HLS also involves segmentation logic on the server side, and it is quite possible that the HLS segmentation logic of the server or CDN vendor is not compatible with turning audio or video on and off.


Audio or video cannot be decoded because the audio/video header is missing

Phenomenon: A customer reported that a certain video stream could be pulled via RTMP but not via HLS.

Analysis: We obtained the customer's source stream address and tested RTMP and HLS pulling separately with ffplay. The HLS stream had no sound, and no audio information was parsed at all. Analysis of a dump of the RTMP stream showed that although the audio could be played normally, the audio header information was missing.


Practical experience: Problems caused by missing video/audio headers cannot be solved by adapting the playback end, and they can be hidden and hard to discover. The push end should check that correct audio and video header information is sent in every situation that requires it, such as when the audio/video configuration changes, after a network disconnection and reconnection, and when the stream is re-pushed.
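A minimal sketch of one way to do this on the push end is shown below (RtmpPublisher and its sendAudioConfig/sendVideoConfig methods are hypothetical): cache the AVC sequence header (sps/pps) and the AAC AudioSpecificConfig when they are first produced, and re-send them before any media data after a reconnect or a re-push, so the playback end can always initialize its decoders.

```java
public class SequenceHeaderCache {
    private byte[] avcSequenceHeader; // AVCDecoderConfigurationRecord (sps/pps)
    private byte[] aacSequenceHeader; // AudioSpecificConfig

    interface RtmpPublisher {
        void sendVideoConfig(byte[] config, long ptsMs);
        void sendAudioConfig(byte[] config, long ptsMs);
    }

    // Cache the headers whenever the encoders (re)generate them.
    void onEncoderConfig(byte[] videoConfig, byte[] audioConfig) {
        avcSequenceHeader = videoConfig;
        aacSequenceHeader = audioConfig;
    }

    // Call before resuming normal packets after a reconnect or re-push.
    void onReconnected(RtmpPublisher publisher, long ptsMs) {
        if (avcSequenceHeader != null) publisher.sendVideoConfig(avcSequenceHeader, ptsMs);
        if (aacSequenceHeader != null) publisher.sendAudioConfig(aacSequenceHeader, ptsMs);
        // ...then continue with media packets, starting from a video key frame
    }
}
```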

Playback exceptions caused by discontinuous timestamps

Phenomenon: Audio and video are out of sync, or playback stalls.

Analysis: Timestamp anomalies fall roughly into two categories. The first is that the audio and video timestamps are not aligned, i.e. the gap between them becomes large. The second is that timestamps jump backwards, for example dts/pts suddenly restarting from 0. Normally, to avoid getting stuck when such anomalies occur, the playback end is made compatible with discontinuous timestamps. Common approaches include dropping frames, or abandoning the usual clock-based synchronization strategy and playing audio and video at their own frame intervals. Although this allows playback to continue when the timestamps are abnormal, it can cause problems such as audio/video desynchronization and stalls.

Practical experience: The playback end can be made compatible with this problem to a certain extent, but the fundamental solution is that the push end must guarantee synchronized and continuous audio and video timestamps.
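As an illustration of the playback-side compatibility mentioned above, here is a minimal sketch (the jump threshold and frame gap are assumptions) that rebases discontinuous timestamps with an offset so the player's clock stays monotonic instead of stalling; it only mitigates the symptom.

```java
public class TimestampNormalizer {
    private static final long MAX_FORWARD_JUMP_US = 5_000_000L; // assumed: 5 s
    private static final long ASSUMED_FRAME_GAP_US = 33_000L;   // ~30 fps

    private long offsetUs = 0;
    private long lastPtsUs = Long.MIN_VALUE;

    long normalize(long rawPtsUs) {
        long pts = rawPtsUs + offsetUs;
        if (lastPtsUs != Long.MIN_VALUE) {
            long delta = pts - lastPtsUs;
            if (delta < 0 || delta > MAX_FORWARD_JUMP_US) {
                // Discontinuity (e.g. dts/pts restarted from 0): rebase so
                // playback continues right after the last timestamp.
                offsetUs += lastPtsUs + ASSUMED_FRAME_GAP_US - pts;
                pts = lastPtsUs + ASSUMED_FRAME_GAP_US;
            }
        }
        lastPtsUs = pts;
        return pts;
    }
}
```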

Commonly used tools for analyzing problems

  • ffmpeg / ffplay / ffprobe: bitstream dump, stream playback, viewing media information, etc.

  • Elecard analyzer: analyzes bitstream content.

  • MediaInfo: analyzes the format information of media files.

  • FlvAnalyser: analyzes FLV encapsulation information.

  • YUV Eye: plays YUV data.


Origin blog.csdn.net/netease_im/article/details/131829930