Research on H.266/VVC rate control algorithm for video conferencing scenarios

Personal summary

Paper title

Research on H.266/VVC rate control algorithm for video conferencing scenarios

Journal

Master's Electronic Journal

Author

Yu Donghang

Publication date

2022-05-25

Reading date

2023-08-03

Score

_

Type / Ideas / Annotation

Research background (What is the main content of this article? What is the current status of research?)
As demand for high-definition video quality keeps rising, existing video compression technology needs further optimization to meet growing application requirements. The rate control module of VVC is divided into the same layers as the HEVC rate control module, and the layers with practical research value are mainly the frame layer and the LCU layer. Recent research on video coding rate control has therefore focused mainly on these two parts.

Methods and properties (What kind of tasks is it for? How did the author collect data? When and where was this research carried out? What kind of model or method did he propose?)
Test object: video sequences recommended in the VTM common test conditions, classified by resolution. The algorithm is tested on the Class B sequences MarketPlace, RitualDance, Cactus and BQTerrace; the Class C sequences BasketballDrill, BQMall, RaceHorses and PartyScene; the Class D sequences BQSquare, BlowingBubbles, BasketballPass and RaceHorses; and the Class E sequences FourPeople, Johnny and KristenAndSara.

Research result (How has the model effect been improved? Efficiency, accuracy, or something else?)
The relative error of the total average bit rate over all sequences is 0.434%, slightly better than the 0.435% obtained under the adaptive setting. The average peak signal-to-noise ratio of the test sequences is 0.028 dB higher than with the VTM10.0 rate control algorithm. The final BDBR of the algorithm is 0.86% lower on average than that of VTM10.0.

Innovation (What is the main contribution or innovation of this paper? Is the innovation based on a previous model or theory?)
The gray-level co-occurrence matrix is introduced to calculate feature values that reflect the texture complexity of the encoded frame, and these values are used to adjust the frame-layer picture weight. When calculating the LCU weight, the weight is recalculated by introducing the optimal Lagrange multiplier, which improves the accuracy of weight assignment.

Conclusion (What did the author learn from this?)

Research prospects (Any implications or suggestions for future research?)
The H.266/VVC rate control algorithm is studied only at the GOP layer, frame layer and LCU layer, and does not consider bit allocation at layers below the LCU. Future work can therefore examine the relationship between CU partitioning and rate control bit weight allocation, and construct a smaller coding block layer as a new unit layer for rate control.

Importance (Why is this research important?)
Rate control lets the encoder adaptively adjust coding parameters during encoding and make maximum use of the communication channel while maintaining coding quality. It is of great significance in video encoding and video communication applications.

Ideas and questions (What are your thoughts and questions?)

Excellent expressions in this article (What are the key points to reuse?)

Summary

Video coding is an effective way to improve transmission efficiency and reduce data storage pressure. The new-generation video coding standard H.266/VVC (Versatile Video Coding) introduces new compression tools into each coding module, which greatly improves coding efficiency, and it can be widely used in applications such as high-definition and ultra-high-definition TV, telemedicine, and video conferencing. After the outbreak of COVID-19 at the end of 2019, video conferencing became the main means of remote meetings for enterprises and institutions and of remote teaching in schools. With a large number of conference video users and varied conference scenarios, how to ensure the communication quality of the video, especially the coding quality of the Region of Interest (ROI), has become a research focus in the field of video coding. Rate control can not only generate a bitstream that matches the transmission bandwidth, but also protect the quality of the main coding area by adjusting bit allocation, so it is an indispensable module of a video communication system. Because the VVC rate control algorithm does not fully consider the content characteristics of the encoded frame or the spatio-temporal complexity of the Largest Coding Unit (LCU), its bit allocation is inaccurate and the algorithm leaves room for optimization. This paper optimizes the bit allocation of the VVC rate control algorithm at the frame layer and the LCU layer. On this basis, LCU-based target bit adjustment is applied to the ROI in video conference scenes, which improves the subjective quality of the coded video.

This paper proposes a rate control algorithm based on video content-related feature values to address the problem that the H.266/VVC rate control algorithm does not comprehensively consider the actual texture features of conference video frames. First, the gray-level co-occurrence matrix is introduced to calculate feature values that reflect the texture complexity of the encoded frame, and these values are used to adjust the frame-layer picture weight. Then, the λ parameter of the LCU layer is recalculated based on the R-λ model to adjust the allocated weight, and the parameter values are continuously updated according to the bits actually consumed during encoding, which improves the accuracy of LCU-layer bit allocation. Tests under the Low Delay-P (LDP) configuration show that, compared with the adaptive weight allocation algorithm, the proposed algorithm comes closer to the target bit rate and improves rate-distortion (RD) performance by 0.86%, improving the subjective and objective quality of the video sequences.

To improve the coding quality of the conference video ROI, this article first detects the ROI on an LCU basis and marks it with a calculated saliency value. It then uses the Sobel gradient detection operator and an MSE (Mean Square Error) bit detection algorithm to measure the texture complexity and the coding cost of each LCU, respectively, and jointly weights the complexity factor and the coding cost to construct a new weight factor. The LCU bit weights are adjusted and allocated based on this new weight factor. At the same time, the target bit allocation is adjusted through the saliency value of the LCU to protect the coding quality of the ROI and optimize the subjective and objective quality of conference video scenes. Experimental results show that under the LDP configuration, compared with the adaptive rate control algorithm, the proposed algorithm reduces the relative rate control error by 0.011% on average and improves rate-distortion performance by 1.87%, improving the coding performance of conference videos.

Why rate control is needed

If a video sequence is encoded with fixed coding parameters, the size of the bitstream output by the encoder fluctuates with the amount of information in each frame and the complexity of its content. If the output bitstream is too large, it may exceed the capacity of the sender's buffer, causing excessive transmission delay or even frame loss. If the output bitstream is too small, the network channel is not fully utilized, transmission resources are wasted, and the video obtained at the decoder is of poor quality, with artifacts such as blurring and blocking. It is therefore necessary to control the bit rate during encoding so that the number of bits in the encoded bitstream matches the upper limit of the channel bandwidth. At the same time, the quality of the transmitted video must be taken into account so that the distortion of the encoded picture is as small as possible.

The key to rate control

The key point of code rate control technology is to obtain the value of the quantization parameter (QP) through target bit allocation, and then adjust the output code rate to achieve the purpose of controlling the code rate.
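As a rough illustration of how a bit budget turns into a QP under the R-λ model mentioned later in these notes, here is a minimal sketch. The constants (initial α and β, and the logarithmic λ-to-QP fit) are the values commonly quoted for the HM/VTM reference rate control and are assumptions here, not numbers taken from the thesis; the function names are hypothetical.

```python
import math

# Assumed constants: initial R-lambda parameters and the lambda-to-QP fit
# commonly quoted for HM/VTM rate control; not taken from the thesis.
ALPHA_INIT = 3.2003
BETA_INIT = -1.367


def lambda_from_bpp(bpp, alpha=ALPHA_INIT, beta=BETA_INIT):
    """R-lambda model: lambda = alpha * bpp ** beta (bpp = bits per pixel)."""
    return alpha * bpp ** beta


def qp_from_lambda(lam, qp_min=0, qp_max=63):
    """Empirical mapping QP = 4.2005 * ln(lambda) + 13.7122, rounded and clipped.

    The 0..63 clip range is an assumption matching the usual VVC QP range.
    """
    qp = int(round(4.2005 * math.log(lam) + 13.7122))
    return max(qp_min, min(qp_max, qp))


def frame_qp(target_bits, width, height):
    """Map a frame's target bits to (lambda, QP) via bits per pixel."""
    bpp = target_bits / (width * height)
    lam = lambda_from_bpp(bpp)
    return lam, qp_from_lambda(lam)


# A larger bit budget gives a smaller lambda and therefore a lower QP.
lam_rich, qp_rich = frame_qp(200_000, 1280, 720)
lam_poor, qp_poor = frame_qp(50_000, 1280, 720)
```

The point of the sketch is only the direction of the chain: target bits, then bits per pixel, then λ, then QP. A real encoder also updates α and β after each frame from the bits actually spent.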

Research background on conference video rate control

For videos in specific conference scenarios, rate control can not only keep the output bitstream compatible with the network bandwidth, but also keep the distortion of the video as low as possible and improve the quality of the reconstructed video. Given the nature of conference scenes, viewers focus on the region of interest when watching. Before rate control is performed, the region of interest and the non-interest region are distinguished first; during rate control, the coding quality of the region of interest is protected, so that the output bitstream balances subjective and objective quality as far as possible. The H.266/VVC rate control algorithm for conference video therefore has important research value.

Research status of video conferencing system

In conference application scenarios, videos often contain large fixed background areas whose texture complexity is uncertain. Viewers of such videos usually focus on faces or on screen content such as lecture slides, so the foreground is mostly the face area and the screen content area. Traditional video coding ignores the visual characteristics of the human eye and allocates resources and controls the bit rate based only on the texture complexity of the coding units across the whole picture. For sequences with complex backgrounds, this approach degrades subjective quality. For conference video coding, introducing ROI-based coding can therefore improve the coding quality.

Problems with the current code rate control algorithm based on the R-λ model

First of all, in the research on the frame layer rate control algorithm, the VVC rate control algorithm takes into account the distortion value of the actual encoding result, and uses the R-λ model as the derivation benchmark to correct the parameter update formula, which improves the accuracy of parameter update. However, during the video encoding process, the VVC rate control algorithm does not fully consider the texture characteristics of the encoded frames in an image group, so the coding rate distortion performance and visual experience at the frame layer need to be improved.

Second, in LCU-layer bit weight allocation, VVC assigns LCU weights based on the total target bits of the current frame and the model parameters. It does not take into account the spatial texture characteristics of the LCUs within the same frame, the temporal relationship between co-located LCUs, or the error between the bits actually consumed and the target bits, so the LCU-layer bit allocation mechanism can still be improved.

Furthermore, rate control combined with ROI offers a way to further improve video coding quality. Such an algorithm can significantly improve subjective quality by allocating more bits and weight to the ROI while giving non-ROI areas no such priority, so a good visual experience can be achieved even when the objective indicators improve only slightly or even decline.

The article mainly focuses on two optimization algorithms:
  1. Frame-layer bit allocation and R-λ model optimization for conference videos. There have been many studies on frame-layer rate control. For the frame layer of conference video, this paper proposes a rate control algorithm based on video content-related feature values. To address the fact that the rate control algorithm of the original platform does not preprocess or analyze the frame to be encoded, the paper introduces the gray-level co-occurrence matrix to calculate feature values that reflect the texture complexity of the encoded frame, and uses them to adjust the frame-layer picture weight. When calculating the LCU weight, the weight is recalculated by introducing the optimal Lagrange multiplier to improve the accuracy of weight allocation. Under the same rate control configuration, the algorithm improves rate control accuracy and the subjective and objective quality of the video sequences.
  2. ROI rate control optimization at the largest coding unit layer for conference videos. This article uses an image edge detection operator to calculate the complexity factor of the pixels in each LCU, and uses statistics of the mean square error and the actual consumed bits of each LCU in the previously encoded frame to calculate the coding cost of the LCU to be encoded. The complexity factor and coding cost of the LCU to be encoded serve as the texture complexity description of the coding block in the conference video, and are jointly weighted into the LCU-layer bit allocation model to construct an LCU-layer target bit allocation method. The saliency value of the conference video is calculated and the ROI is marked on an LCU basis, and the target bit allocation is adjusted according to the ROI saliency value to improve the coding performance of the conference video and of the ROI.

Overall H.266/VVC encoder workflow

First, the encoder divides the source video picture into blocks, weighs the rate-distortion performance of each coding block under intra-frame and inter-frame coding, selects the appropriate partitioning mode, and sends the partitioned blocks to the intra-frame/inter-frame prediction module for predictive coding. For intra-frame prediction, the pixel values of the coding area are predicted from adjacent, already coded pixels, or areas similar to the current coding area are searched within the current frame through intra-frame motion estimation and the matched area is used to obtain the prediction of the current coding area. For inter-frame prediction, motion estimation searches the reference frame for areas similar to the current coding area, and the matched areas are motion compensated to obtain the predicted pixel values of the current coding area. Next, the prediction residual is calculated from the original and predicted values. To concentrate the energy of the prediction residual, H.266/VVC applies a DCT to the residual and then quantizes the transformed coefficients. Quantization makes the low-frequency coefficients smaller and sets most high-frequency coefficients to 0, which greatly reduces the amount of data to be transmitted. The quantized data is entropy coded and transmitted to the decoder as a video bitstream. After receiving the bitstream, the H.266/VVC decoder reconstructs the video frames in a defined order.
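The transform and quantization step described above can be illustrated with a toy example. This is only a sketch: it uses a floating-point 1-D DCT-II and a plain uniform quantizer as stand-ins, whereas a real VVC encoder uses integer 2-D transforms and a QP-driven quantizer. The residual values and the step size are made up for illustration.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a 1-D list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, x in enumerate(samples))
        out.append(scale * acc)
    return out


def quantize(coefficients, step):
    """Uniform quantizer: divide by the step size and round to an integer level."""
    return [int(round(c / step)) for c in coefficients]


# Hypothetical smooth prediction residual for one row of a block.
residual = [10, 11, 12, 13, 13, 12, 11, 10]
coefficients = dct_1d(residual)
levels = quantize(coefficients, step=8.0)
```

For this smooth residual the transform packs almost all of the energy into the first (DC) coefficient, so after quantization only that level is non-zero and the remaining seven become 0. That is the energy compaction the paragraph refers to.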

Optimization Algorithm 1: Code rate control algorithm based on video content-related feature values
Frame level target bit allocation

Gray-level co-occurrence matrix

For images whose texture changes slowly, the values on the diagonal of the gray-level co-occurrence matrix are larger. For images whose texture changes quickly, the values on the diagonal are smaller and the values on both sides of the diagonal are larger.

Taking entry (1,1) as an example, a GLCM(1,1) value of 1 means that there is only one pair of horizontally adjacent pixels in which both pixels have gray level 1. The GLCM(1,2) value is 2 because there are two pairs of horizontally adjacent pixels with gray levels 1 and 2.

Adjacency is generally evaluated in four directions: horizontal, vertical, and the two diagonals.
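The counting rule above can be sketched in a few lines. The 4x5 matrix below is an assumed illustration (a commonly used textbook-style example with gray levels 1 to 8), not necessarily the figure from the original post. The `contrast` feature is likewise only one possible texture feature; the notes do not say which GLCM features the thesis actually uses.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count pixel pairs (a, b) where b sits at offset (dx, dy) from a.

    Gray levels are assumed to be integers in 1..levels.
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                matrix[image[y][x] - 1][image[ny][nx] - 1] += 1
    return matrix


def contrast(matrix):
    """Normalized GLCM contrast: large when co-occurring levels differ a lot."""
    total = sum(sum(row) for row in matrix)
    weighted = sum((i - j) ** 2 * v
                   for i, row in enumerate(matrix)
                   for j, v in enumerate(row))
    return weighted / total


# Offsets for the four usual directions (x to the right, y downward).
DIRECTIONS = {"horizontal": (1, 0), "vertical": (0, 1),
              "diagonal": (1, 1), "anti-diagonal": (1, -1)}

example = [
    [1, 1, 5, 6, 8],
    [2, 3, 5, 7, 1],
    [4, 5, 7, 1, 2],
    [8, 5, 1, 2, 5],
]
g = glcm(example, levels=8)          # horizontal neighbours
flat = [[3, 3, 3], [3, 3, 3]]        # no texture at all
```

With this example, `g[0][0]` is 1 and `g[0][1]` is 2, matching the GLCM(1,1) and GLCM(1,2) values discussed above. The flat image puts all of its counts on the diagonal, so its contrast is 0, which is the "slow texture change" case.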

LCU layer target bit allocation

The optimization here is to estimate the Lagrange multiplier of each LCU and use it to recalculate the LCU weight.
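A minimal sketch of weight-based LCU bit allocation follows. It assumes the familiar scheme in which each LCU receives a share of the bits still available in the frame, proportional to its weight among the LCUs not yet coded, and in which a weight can be derived by inverting the R-λ model for an estimated λ. The exact formulas of the thesis are not reproduced in these notes, so all names and the structure here are illustrative.

```python
def weight_from_lambda(lam, alpha, beta):
    """Invert lambda = alpha * bpp ** beta to get the bpp the model predicts."""
    return (lam / alpha) ** (1.0 / beta)


def allocate_lcu_bits(frame_target_bits, weights, actual_bits_fn, min_bits=1.0):
    """Allocate target bits LCU by LCU in coding order.

    actual_bits_fn(index, target) returns the bits the encoder really spent,
    so over- or under-spending is compensated on the remaining LCUs.
    """
    remaining_bits = float(frame_target_bits)
    remaining_weight = float(sum(weights))
    targets = []
    for index, weight in enumerate(weights):
        target = max(min_bits, remaining_bits * weight / remaining_weight)
        targets.append(target)
        remaining_bits -= actual_bits_fn(index, target)
        remaining_weight -= weight
    return targets


weights = [1.0, 2.0, 1.0]

# Ideal encoder: spends exactly the target on every LCU.
ideal = allocate_lcu_bits(4000, weights, lambda i, t: t)

# Encoder that overshoots the first LCU by 50 percent.
overshoot = allocate_lcu_bits(4000, weights,
                              lambda i, t: 1.5 * t if i == 0 else t)
```

In the ideal case the targets are simply proportional to the weights. When the first LCU overshoots, the later LCUs receive smaller targets, which is how the error between consumed and target bits feeds back into allocation.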

Algorithm flowchart

Algorithm experiment test results

The relative error of the total average bit rate over all sequences for the algorithm in this chapter is 0.434%, slightly better than the 0.435% obtained under the adaptive setting. The average peak signal-to-noise ratio of the test sequences under the algorithm in this chapter is 0.028 dB higher than with the VTM10.0 rate control algorithm. The final BDBR of the algorithm proposed in this chapter is 0.86% lower on average than that of VTM10.0.
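For reference, the two simpler metrics quoted above can be computed as follows. These are the standard textbook definitions and are assumed to match what the thesis uses; the sample numbers are invented so that the error lands on 0.434%, and are not measurements from the paper. BDBR is omitted because it requires fitting rate-distortion curves over several rate points.

```python
import math


def bitrate_error_percent(actual_kbps, target_kbps):
    """Relative rate control error: |R_actual - R_target| / R_target * 100."""
    return abs(actual_kbps - target_kbps) / target_kbps * 100.0


def psnr(mse, max_value=255):
    """Peak signal-to-noise ratio in dB for a given mean square error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


# Invented example: a 1000 kbps target that came out at 1004.34 kbps.
error = bitrate_error_percent(1004.34, 1000.0)
```

A lower relative error means the encoder tracked the target bit rate more closely, while a higher PSNR at the same rate means better objective quality.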

Optimization Algorithm 2: Conference video rate control algorithm based on region of interest

Since the smallest unit of target bit allocation in the rate control algorithm is the LCU, this chapter first uses the LCU as the basic unit to divide the region of interest in conference video into the face area and the screen content area, and calculates the saliency value of the region of interest from joint features. The Sobel gradient detection operator and the mean square error bit detection algorithm are then used to detect texture complexity and coding cost, respectively, and the complexity factor and coding cost factor of the picture are jointly weighted to construct a new weight factor, which is used to adjust and allocate the target bits of each LCU. Finally, the bit allocation of the LCU is adjusted based on the saliency value of the region of interest to optimize the coding quality of the region of interest.
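The pipeline above might be sketched as follows. This is an illustration under stated assumptions, not the thesis's formulas: the mixing factor `mix`, the ROI gain `roi_gain`, and the way saliency scales the weight are all hypothetical, and the coding costs are taken as given inputs (in the thesis they come from the MSE and consumed bits of LCUs in the previously encoded frame).

```python
def sobel_complexity(block):
    """Mean Sobel gradient magnitude (|Gx| + |Gy|) over the interior pixels."""
    rows, cols = len(block), len(block[0])
    total, count = 0, 0
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = (block[y - 1][x + 1] + 2 * block[y][x + 1] + block[y + 1][x + 1]
                  - block[y - 1][x - 1] - 2 * block[y][x - 1] - block[y + 1][x - 1])
            gy = (block[y + 1][x - 1] + 2 * block[y + 1][x] + block[y + 1][x + 1]
                  - block[y - 1][x - 1] - 2 * block[y - 1][x] - block[y - 1][x + 1])
            total += abs(gx) + abs(gy)
            count += 1
    return total / count if count else 0.0


def normalize(values):
    """Scale values so they sum to 1 (uniform if they are all zero)."""
    s = sum(values)
    if s <= 0:
        return [1.0 / len(values)] * len(values)
    return [v / s for v in values]


def lcu_weights(complexities, costs, saliencies, mix=0.5, roi_gain=1.0):
    """Blend complexity and coding cost, then boost LCUs with high saliency."""
    c = normalize(complexities)
    k = normalize(costs)
    raw = [(mix * ci + (1.0 - mix) * ki) * (1.0 + roi_gain * s)
           for ci, ki, s in zip(c, k, saliencies)]
    return normalize(raw)


flat_block = [[128] * 4 for _ in range(4)]
edge_block = [[0, 0, 255, 255] for _ in range(4)]

# Two LCUs with identical texture and cost; only the second is in the ROI.
weights = lcu_weights([10.0, 10.0], [5.0, 5.0], saliencies=[0.0, 1.0])
```

With equal complexity and cost, the ROI block ends up with the larger share of the frame's bits, which is the behaviour the chapter is after. How strongly saliency should outweigh texture is a tuning question that the sketch leaves open.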

Origin blog.csdn.net/Aure219/article/details/132111967