A record and summary of the video blurring, freezing and delay problems encountered on localized systems

Table of contents

1. Overview of localization system

1.1. Localized operating system and localized CPU

1.2. Localized server operating system 

1.3. The mainstream configuration of the current localization system

2. Video decoding blurry screen and freeze problem

2.1. Video decoding blurred screen

2.2. Video decoding freezes

2.3. Explanation about I frame and P frame

3. The slow processing speed of the domestic graphics card leads to the image freeze problem

3.1. Analysis of the causes of video delay and stutter

3.2. The efficiency of the SDL2 library running on Jingjiawei's domestic graphics card is very low

3.3. Using frame-skipping playback to work around this kind of problem

3.4. Regarding the lip sync problem in audio and video playback

3.5. Problems with localized chips

4. Finally


In recent years, with the continuous deepening of the localization process, many IT vendors have successively launched software products and systems that support localized platforms, and we are no exception. We have participated in many localization projects and have rolled out a complete set of solutions for localized systems. Recently, while testing our localized software on a localized desktop system, I encountered quite a few video codec and playback problems and took part in discussing and troubleshooting them. I am recording and summarizing them in detail here, in the hope that they can serve as a reference for you.

1. Overview of localization system

The problems discussed in this article occurred on localized PCs, so let me first give a brief introduction to localized systems.

1.1. Localized operating system and localized CPU

When it comes to localized systems, two major parts are generally involved: the localized operating system and the localized CPU. Great progress has been made in both areas, and a number of domestic vendors have emerged. At present, the mainstream localized operating systems mainly include Kylin Software's NeoKylin and Galaxy Kylin systems, and Tongxin Software's UOS system. These system vendors provide both desktop and server versions of their operating systems. All of these localized operating systems are developed from Linux and are essentially Linux systems.

The mainstream localized CPUs include the Loongson CPU (based on the self-developed LoongArch architecture), the Phytium CPU (based on the ARM architecture), the Zhaoxin CPU (based on a licensed x86 architecture), and the Huawei Kunpeng CPU (based on the ARM architecture). These CPUs come in desktop and server versions. Information on several mainstream localized CPUs is given below:

1) Loongson CPU: adopted the MIPS architecture in the early days and later developed its own LoongArch architecture. The most powerful model is the Loongson 3A5000 series, built on a 12nm process; the chips were manufactured by STMicroelectronics in the early days, and the 12nm parts were later handed over to TSMC.

2) Phytium CPU: adopts the ARM architecture and comes in desktop and server versions. The latest desktop model is the D2000 series, built on a 16nm process and manufactured by TSMC.

3) Huawei Kunpeng CPU: adopts the ARM architecture and is mainly used in servers. The latest model is the Kunpeng 920, built on a 7nm process and manufactured by TSMC.

4) Zhaoxin CPU: adopts the x86 architecture; the most powerful processor is the KX-U6780A, built on a 16nm process and manufactured by TSMC.

5) Haiguang CPU: adopts the x86 architecture and is mainly used in servers; the latest is the Haiguang 7000 series, built on a 14nm process, with Samsung and GlobalFoundries as foundries.

6) Shenwei CPU: adopted the Alpha architecture early on and later developed its own SW instruction set. The latest is the Shenwei SW26010 series, built on a 28nm process and mainly used in supercomputers; it is manufactured by SMIC.

1.2. Localized server operating system 

For the deployment of localized servers, Great Wall servers with built-in localized systems and localized CPUs are mainly used. Huawei also provides TaiShan servers that support localization; this series of servers mainly uses Huawei's self-developed Euler (openEuler) server operating system and Huawei Kunpeng CPUs. For localized server systems, in addition to Kylin, Tongxin UOS and Huawei openEuler, you can also choose Tencent's TencentOS system or Alibaba's Anolis system.

Over the years, most IT vendors chose the open source and free CentOS as their server operating system. However, Red Hat has announced that it will stop maintaining CentOS, which means CentOS will no longer receive iterative updates; when system or kernel problems are encountered while using CentOS, there is no longer a team to maintain it and fix them.

To cope with the predicament caused by the end of CentOS maintenance, the three major domestic IT manufacturers Huawei, Tencent and Alibaba stepped up and successively launched domestic, free and open source server operating systems evolved from open source Linux and CentOS: Huawei's Euler (openEuler) system, Tencent's TencentOS system, and Alibaba's Anolis system. These server operating systems have made many optimizations and improvements on top of the original open source code, and have established open source communities to work with domestic manufacturers to develop the systems and expand their ecosystems. At present, many IT manufacturers have migrated their server operating systems to these domestic systems; for example, many manufacturers are now using Huawei's openEuler server system.

1.3. The mainstream configuration of the current localization system

The current mainstream desktop localized PCs mainly use the combination of NeoKylin/Galaxy Kylin/UOS desktop operating system + Phytium CPU/Loongson CPU.

The mainstream localized servers use the combination of NeoKylin/Galaxy Kylin/UOS/openEuler server system + Loongson CPU/Phytium CPU/Kunpeng CPU. Among them, the Kunpeng CPU is dedicated to Huawei and is not sold separately (it is only used in Huawei products) and is bound to Huawei's TaiShan servers; to use a Kunpeng CPU you need to purchase a Huawei TaiShan server, which runs the Huawei openEuler system.

For domestic server CPUs, actual measurements show that Huawei Kunpeng CPUs deliver higher performance. In some projects with higher performance requirements, TaiShan servers with built-in Kunpeng CPUs and the openEuler system are chosen.

2. Video decoding blurry screen and freeze problem

When testing the client software on a domestic desktop PC, we found an obvious blurred-screen problem when video was decoded and played, which is a serious problem.

The current localized software runs on the localized system and mainly uses the open source SDL2 library for video rendering. On the Linux localized system platform, SDL2 internally uses OpenGL for rendering.
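To give a concrete picture, below is a minimal sketch (not our actual project code; the function and variable names are illustrative) of how SDL2 is typically used to render a decoded YUV420P frame. On Linux, SDL2 will normally pick its opengl render driver for this:

#include <SDL2/SDL.h>
#include <cstdint>

// Minimal sketch: draw one decoded YUV420P frame with SDL2.
// The Y/U/V plane pointers and pitches are assumed to come from the decoder.
bool RenderYuvFrame(SDL_Window* window, int w, int h,
                    const uint8_t* y, int yPitch,
                    const uint8_t* u, int uPitch,
                    const uint8_t* v, int vPitch)
{
    static SDL_Renderer* renderer =
        SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED);  // uses OpenGL on Linux by default
    static SDL_Texture* texture =
        SDL_CreateTexture(renderer, SDL_PIXELFORMAT_IYUV,
                          SDL_TEXTUREACCESS_STREAMING, w, h);
    if (renderer == nullptr || texture == nullptr)
        return false;

    SDL_UpdateYUVTexture(texture, nullptr, y, yPitch, u, uPitch, v, vPitch);  // upload the three planes
    SDL_RenderClear(renderer);
    SDL_RenderCopy(renderer, texture, nullptr, nullptr);
    SDL_RenderPresent(renderer);
    return true;
}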

2.1. Video decoding blurred screen

By checking the print logs, we found that the video images captured by the USB camera had obvious frame loss, and the frame loss is what caused the blurred screen. We plugged the USB camera in question into a Windows PC and used the amcap tool to view the camera's capture parameters, and found that the camera encodes and compresses the image data internally after capturing it, and supports both MJPG and H264 encoding formats.

In the problematic scenario, the camera's default H264 encoding format was used, and the video data output in this format suffered from frame loss.

2.2. Video decoding freezes

To solve the blurred-screen problem caused by video frame loss, the forced-decoding mode was changed to a wait-for-I-frame playback mode. In wait-for-I-frame mode, once a lost video frame is detected, no further decoding or drawing is done until a new I frame is received. After switching to this mode the blurred screen disappeared, but a serious video freeze problem appeared instead: when a frame is lost, the video image is no longer decoded and displayed, and nothing new is drawn until the next I frame arrives. During that period the previous image stays on screen, and a new image is only drawn when a new I frame is received, so the video freezes for that stretch of time. When the remote end detects frame loss while receiving the video data sent by the local end, it actively requests an I frame from the local end, but this is only a stopgap; when frames are lost frequently, there are still obvious freezes.
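The two strategies can be expressed roughly as follows (a simplified sketch; RequestKeyFrame(), DecodeAndDraw() and the sequence-number based loss detection are illustrative assumptions, not the project's real interfaces):

// Sketch of the wait-for-I-frame strategy: once frame loss is detected, stop
// decoding/drawing until the next I frame arrives, and ask the sender for one.
void OnVideoFrame(const VideoFrame& frame)
{
    static uint16_t expectedSeq = 0;
    static bool waitingForIFrame = false;

    if (frame.seq != expectedSeq) {       // a gap in sequence numbers means frame loss
        waitingForIFrame = true;
        RequestKeyFrame();                // actively request an I frame from the remote end
    }
    expectedSeq = frame.seq + 1;

    if (waitingForIFrame && !frame.isKeyFrame)
        return;                           // keep showing the previous image, do not decode

    waitingForIFrame = false;
    DecodeAndDraw(frame);                 // forced-decoding mode would call this unconditionally
}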

The root cause of both the blurred screen and the stutter is the frequent frame loss in the video output by the USB camera. Therefore, to solve these two problems, we still had to look for a solution at the source (the camera). So I tried changing the camera's video encoding format to MJPG. After rerunning, I found that the image quality output in this format is better and there is no frame loss, so decoding and playback no longer show blurred screens or freezes.
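For reference, on Linux the capture format of a UVC camera can usually be switched through the V4L2 interface; below is a hedged sketch (the device path /dev/video0 and the 1280x720 resolution are just example values):

#include <fcntl.h>
#include <unistd.h>
#include <cstring>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

// Sketch: ask the USB camera to output MJPG instead of H264.
bool SetCameraToMjpg(const char* device = "/dev/video0")
{
    int fd = open(device, O_RDWR);
    if (fd < 0)
        return false;

    v4l2_format fmt;
    std::memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 1280;                        // example resolution
    fmt.fmt.pix.height = 720;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_MJPEG;    // MJPG instead of V4L2_PIX_FMT_H264
    fmt.fmt.pix.field = V4L2_FIELD_ANY;

    bool ok = (ioctl(fd, VIDIOC_S_FMT, &fmt) == 0);  // apply the capture format
    close(fd);
    return ok;
}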

2.3. Explanation about I frame and P frame

An I frame is intra-frame coded: an I frame is a complete image. A P frame is inter-frame coded: a P frame stores only the content that has changed relative to the previous frame. When each frame is drawn, the current P frame is superimposed on the previously reconstructed complete image to form the current complete frame (the complete video image is obtained by superimposition and then drawn). The complete image after each superimposition must be kept in memory so that it can be superimposed again when the next P frame arrives.

If a P frame is lost in the middle, it may no longer be possible to reconstruct a complete image when the next P frame is received; in forced-decoding mode this can cause a blurred screen. In wait-for-I-frame mode, when the receiving end detects frame loss, it stops drawing and waits for a new I frame before drawing again. In addition, when the video receiving end detects frame loss, it actively requests an I frame from the sending end so that an I frame arrives as soon as possible and a new image can be drawn as soon as possible, maintaining the continuity of playback.

       For a detailed description of I frames and P frames, see this article:

H264 (1) I/P/B frame, GOP/IDR and other parameters: https://blog.csdn.net/weixin_39369053/article/details/105747624

3. The slow processing speed of the domestic graphics card leads to the image freeze problem

I thought the problems above were the end of it, but further observation showed that the video image still froze, and there was an obvious delay.

3.1. Analysis of the causes of video delay and stutter

According to the log prints, it takes several seconds from receiving the video data (the remotely encoded and compressed video data) to the completion of decoding and drawing. This delay is rather exaggerated (you can see it by waving at the camera: the video lags noticeably)! By analyzing the code, we found that the delay is probably caused by slow decoding and drawing due to insufficient graphics card performance, and the video freeze is probably caused by video frame loss.

So why is video frame data being lost? Further analysis of the code gave the answer. The video data processing module starts two threads: one thread receives video data frames and puts them into a buffer queue, and the other thread takes video data frames (encoded and compressed video data) out of the buffer queue, decodes them first, and then draws the decoded video data into the video window (where the video is displayed). The interaction of the two threads with the data queue is roughly as follows:
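A simplified model of this two-thread structure is sketched below (EncodedFrame is a placeholder type; this is only meant to illustrate the queue behavior, not the real implementation):

#include <condition_variable>
#include <deque>
#include <mutex>

// Bounded buffer queue shared by the receive thread (Push) and the
// decode/draw thread (Pop). When it is full, the oldest frame is discarded,
// which is exactly how frames end up being lost when the consumer is too slow.
class FrameQueue {
public:
    explicit FrameQueue(size_t maxFrames) : maxFrames_(maxFrames) {}

    void Push(EncodedFrame frame)                      // receive thread
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.size() >= maxFrames_)
            queue_.pop_front();                        // drop the oldest unprocessed frame
        queue_.push_back(std::move(frame));
        cond_.notify_one();
    }

    EncodedFrame Pop()                                 // decode/draw thread
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return !queue_.empty(); });
        EncodedFrame frame = std::move(queue_.front());
        queue_.pop_front();
        return frame;
    }

    size_t Size()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::deque<EncodedFrame> queue_;
    size_t maxFrames_;
};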

Because of the limited graphics card performance, decoding and display are slowed down (the underlying layer uses the graphics card to render and draw the video images), so the thread that handles decoding and display runs slowly. Meanwhile, the receiving thread keeps receiving data, and the queue length is limited (for example, the upper limit of stored frames is a little over 100, or the memory of the buffer queue is limited), so video frame data in the queue gets discarded before it is processed (old frame data is discarded to make room for new frame data). Because frames are lost, and the wait-for-I-frame decoding mode is currently used, nothing is decoded or played for a short period of time, and playback and drawing resume only when the next I frame is received, so the video freezes.

How do we know that the decoding/playback thread is slow? It is actually very simple: just add time-related print logs and look at them. In a logging system, each print generally carries time information, which makes it convenient to analyze problems, for example to check how long a piece of code takes to execute, or to quickly look at the log prints around the time an exception occurred.
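For example, a minimal timing print around one decode-and-draw cycle could look like this (LOG_INFO stands for whatever logging macro the project provides; Decode() and Draw() are illustrative):

#include <chrono>

// Sketch: measure how long one decode + draw cycle takes in the playback thread.
void DecodeAndDrawOneFrame(const EncodedFrame& frame)
{
    auto t0 = std::chrono::steady_clock::now();
    DecodedImage img = Decode(frame);    // illustrative decode call
    auto t1 = std::chrono::steady_clock::now();
    Draw(img);                           // illustrative render call
    auto t2 = std::chrono::steady_clock::now();

    auto decodeMs = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    auto drawMs   = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
    LOG_INFO("decode: %lld ms, draw: %lld ms", (long long)decodeMs, (long long)drawMs);
}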

3.2. The efficiency of the SDL2 library running on Jingjiawei's domestic graphics card is very low

In a previous localization project, I also ran into an obvious delay in video decoding and playback. At that time the problem was investigated for a long time, and in the end we suspected it was caused by insufficient graphics card performance. The situation then was that almost all the localized computers in the customer's environment were fine, but one localized computer had this video lag problem. Later, the customer's localized computer supplier took the initiative to replace the graphics card for the customer, and the problem never occurred again. The problematic domestic computer had previously used a domestic graphics card from Jingjiawei and was later switched to an AMD graphics card. It seems there is still an obvious performance gap between the domestic graphics card and a top AMD graphics card!

If this video playback delay and lag could be reproduced, it should have been exposed in the company's test environment (testers should have reproduced it during testing), but the problem had never appeared before. Through that customer case we now know why it was never exposed in our test environment: the graphics cards in the domestic computers our company used for testing were all AMD. The reason the video delay and freeze problem appeared in our test environment this time is that the localized device used this time was special: it used Jingjiawei's domestic JM7200 graphics card, so the same problem as the project customer's appeared.

Use the lshw command in the terminal to view the graphics card currently used on the localized machine. The specific command is: lshw -c display. For example, the AMD graphics card information is as follows:

description: VGA compatible controller
product: Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779]
vendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002]

The graphics card information of Jingjiawei is as follows:

description: VGA compatible controller
product: JM7200 [731:7200]
vendor: JingJia Micro, Inc. [JJM] [731]

The corresponding graphics card model is the JM7200, so the problematic computer uses Jingjiawei's graphics card. Combined with the similar situation encountered in the previous project, we can basically conclude that the problem is caused by the low efficiency of the SDL2 open source library, which we currently use for video rendering, when running on the Jingjiawei graphics card.

Judging from the behavior at runtime, it should be caused by the open source SDL2 running inefficiently on Jingjiawei's domestic graphics card. It may be that Jingjiawei's support for the OpenGL framework on Linux is poor, or that there is a problem with Jingjiawei's graphics card driver, or that the performance of the Jingjiawei graphics card is simply insufficient, or that the SDL2 open source library does not support domestic graphics cards well enough. What the specific reason is will take time to study later; this problem can also be fed back to Jingjia Micro so they can analyze the cause!
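One simple check that can help narrow this down is to print which render driver SDL2 actually selected and which OpenGL renderer string the driver reports, to confirm whether drawing on the JM7200 is really hardware accelerated (a diagnostic sketch only, under the assumption that an OpenGL-backed SDL renderer has been created):

#include <SDL2/SDL.h>
#include <SDL2/SDL_opengl.h>
#include <cstdio>

// Diagnostic sketch: print the render driver SDL2 chose and the OpenGL
// vendor/renderer strings reported by the driver.
void PrintRenderBackendInfo(SDL_Renderer* renderer)
{
    SDL_RendererInfo info;
    if (SDL_GetRendererInfo(renderer, &info) == 0)
        std::printf("SDL render driver: %s\n", info.name);   // e.g. "opengl" or "software"

    typedef const GLubyte* (*GlGetStringFn)(GLenum);
    GlGetStringFn getString = (GlGetStringFn)SDL_GL_GetProcAddress("glGetString");
    if (getString != nullptr) {
        std::printf("GL_VENDOR  : %s\n", getString(GL_VENDOR));
        std::printf("GL_RENDERER: %s\n", getString(GL_RENDERER));  // "llvmpipe" would indicate software rendering
    }
}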

3.3. Using frame-skipping playback to work around this kind of problem

The current problem is that the decoding/playback thread is obviously (much) slower than the video data receiving thread, so the video frame buffer queue fills up, and some earlier-received video frames that have not been processed in time are kicked out of the queue and discarded:

The slow execution speed of the decoding and playback thread is due to the insufficient performance of Jingjiawei's domestic graphics card.

To solve the current video freeze and delay problem, after discussion we decided to adopt frame skipping: if the video frame buffer queue reaches a certain upper limit (a threshold is set, for example 10), frame-skipping mode is turned on, and for every two frames received only one frame is played, which reduces the load on the graphics card. The queue threshold can be tuned according to the test results to achieve a better playback effect. At present this is only a workaround, not a fundamental solution, but for this kind of special localized machine it seems unrealistic to ask the customer to replace the graphics card with a non-domestic AMD card, so we can only use this kind of compromise.
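The frame-skipping logic in the decode/playback thread can be sketched roughly as follows (the threshold 10 and the "play one of every two frames" rule come from the discussion above; FrameQueue, Decode() and Draw() are the illustrative names used in the earlier sketches):

// Sketch of the frame-skipping workaround: once the queue backlog exceeds the
// threshold, only every other frame is drawn, reducing the graphics card load.
const size_t kSkipThreshold = 10;          // queue length that turns frame skipping on

void PlaybackLoop(FrameQueue& queue)
{
    bool drawThisFrame = true;
    for (;;) {
        EncodedFrame frame = queue.Pop();
        bool skipping = (queue.Size() > kSkipThreshold);   // backlog building up?

        DecodedImage img = Decode(frame);  // P frames still need decoding to keep the reference chain intact
        if (!skipping || drawThisFrame)
            Draw(img);                     // when skipping, only one of every two frames is drawn

        drawThisFrame = skipping ? !drawThisFrame : true;
    }
}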

In addition, playing video by skipping frames has certain problems of its own. For example, when the video frame rate is low (perhaps because of a poor network), say only 10 frames per second, and only 5 of them are actually played, the image will appear discontinuous and choppy. But this cannot be helped: frame skipping is there to guarantee the real-time performance of video playback, and real-time performance matters more than occasional stutter.

3.4. Regarding the lip sync problem in audio and video playback

There is actually one more problem: lip sync. Because the video playback thread runs relatively slowly while the audio playback thread runs very fast, the video playback speed becomes noticeably slower than the audio playback speed. If no audio-video synchronization control is applied, lip sync problems can appear.

The importance of lip sync differs between application scenarios. In the field of video players and live streaming, more attention is paid to both picture and sound; if lip sync is off, it is uncomfortable to watch and the experience is poor. For example, when we watch movies and videos, out-of-sync lips make the viewing painful. In video conferencing, however, more attention is paid to the voice communication between participants and less to the video, so even a lip sync problem is tolerable. But if the meeting video is recorded to a file, out-of-sync lips in the recording will again be uncomfortable to watch.

Therefore, many players perform strict lip sync control based on timestamps (the ffplay player in the open source FFmpeg does such control), while some audio and video software does no lip sync control at all and simply decodes and plays audio and video as they arrive. Video decoding and playback are generally slower than audio decoding and playback (video codec algorithms and video rendering are much more complex than audio and take much longer to process), so lip sync problems are more likely to appear in that case.
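The usual timestamp-based idea (the approach ffplay takes) is to treat the audio clock as the master and adjust each video frame against it. A very simplified sketch, where GetAudioClock() is assumed to return the presentation time of the audio currently being played, in seconds:

// Simplified idea of timestamp-based lip sync with audio as the master clock.
void SyncAndShowVideoFrame(const DecodedImage& img, double videoPts)
{
    const double kSyncThreshold = 0.04;          // roughly one frame at 25 fps
    double diff = videoPts - GetAudioClock();    // > 0: video is ahead, < 0: video is behind

    if (diff > kSyncThreshold)
        SleepSeconds(diff);                      // video is early: wait before drawing
    else if (diff < -kSyncThreshold)
        return;                                  // video is too late: drop this frame to catch up

    Draw(img);
}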

3.5. Problems with localized chips

This problem also shows the gap between the domestic graphics chip (Jingjia Micro) and top foreign graphics chips (AMD): in performance, stability and maturity there is still a certain distance. So there is still a long way to go for localized chips! At present, Huawei HiSilicon is the best in the domestic chip field; in functional completeness, performance and maturity it is at the top domestically.

In the past, Huawei HiSilicon main control chips were basically used in the audio and video application field. Later, because of sanctions, HiSilicon chips could not be produced and could not be supplied, so we had to switch to domestic second-tier manufacturers such as Cambricon and Rockchip. However, limited by their technical level and industry experience, these manufacturers still lag clearly behind Huawei HiSilicon in many respects, such as poorer stability, insufficient performance, incomplete functionality and slower processing speed; we have felt this deeply.

Since Huawei HiSilicon chips cannot be used, we can only work with these second- and third-tier manufacturers to optimize and improve their chips. The process is painful, but in the long run it is of great benefit to the development of domestic chips: domestic chips need everyone to support and use them so that they can be continuously optimized, improved and perfected.

4. Finally

This article records the video playback problems encountered in a localization project from the perspective of someone who is not an audio/video codec developer. The aim is to treat this material as basic audio/video knowledge and common sense, so imprecise or inaccurate wording is hard to avoid; you are welcome to criticize and correct it in the comment area, and also to add relevant details there.

We develop business software related to audio and video, so we often deal with audio/video codec developers and frequently work with them to troubleshoot various audio/video problems and software crashes. We are quite interested in the basic knowledge and business processes of the audio/video field, and we also hope to pick up audio/video knowledge by cooperating with audio/video codec colleagues on troubleshooting.
