ICIP2020: Two-step progressive intra prediction for VVC

 

This post is based on the ICIP 2020 paper "TWO-STEP PROGRESSIVE INTRA PREDICTION FOR VERSATILE VIDEO CODING".

Because VVC intra prediction uses only local reference pixels, it handles complex textures poorly. This paper combines local and non-local correlations in intra prediction to further reduce spatial redundancy.

Motivation

Low-frequency coefficients are usually large in magnitude, so they cost many bits to encode. If the low-frequency coefficients can be estimated more accurately, coding efficiency will improve further.

  • Template matching (TM)

Because local information alone struggles to recover low-frequency content, many non-local search algorithms have been developed to improve prediction efficiency. Template matching is one promising technique: it uses neighboring reconstructed pixels as a template to search for a similar non-local block.

  • Intra block copy (IBC)

IBC is similar, but it uses a block vector (BV) to locate the prediction block. Compared with TM, IBC can find more accurate prediction blocks, but it also needs extra bits to signal the prediction information.

By combining neighboring reconstructed pixels with local pixels reconstructed from high-frequency coefficients into a single template, a more accurate non-local similar block can be found without transmitting a BV. The low-frequency coefficients can then be predicted from that non-local similar block.

Method of this article

In VVC, intra prediction uses only neighboring pixels, so spatial redundancy cannot be fully removed. This paper proposes a two-step progressive intra prediction method that combines local and non-local content to generate more accurate predictions.

Fig.1 shows the framework of the proposed algorithm. First, a preliminary prediction block is generated with prediction-based template matching (TMP). The residual coefficients in the first and second frequency bands of the scan order are then set to zero, and the remaining coefficients are inverse-quantized and inverse-transformed to produce a reconstructed block. This reconstructed block is combined with its neighboring reference pixels to form a new template, and TMP is applied a second time. The second prediction is used as the actual prediction of the current block, and the coefficients of the first and second frequency bands are updated accordingly. A flag is transmitted for each PU to indicate whether the tool is enabled.
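A minimal sketch of this two-step flow, assuming hypothetical helper callables (tmp_search, transform, quantize, dequantize, inv_transform and band_mask are placeholders, not the VTM implementation):

```python
import numpy as np

def two_step_intra_predict(block, template_L, search_area,
                           tmp_search, transform, quantize,
                           dequantize, inv_transform, band_mask):
    """Outline of the two-step progressive prediction (hypothetical helpers).

    tmp_search(template, area) -> predicted block (TMP over the search area)
    band_mask                  -> boolean mask of the 1st/2nd frequency bands
    """
    # Step 1: preliminary prediction from the L-shaped template only.
    p_s1 = tmp_search(template_L, search_area)

    # Transform and quantize the residual against the first prediction.
    coeff = quantize(transform(block - p_s1))

    # Zero the first two frequency bands and reconstruct from the rest.
    high_only = np.where(band_mask, 0, coeff)
    t_c = p_s1 + inv_transform(dequantize(high_only))

    # Step 2: combine the L-shaped template with t_c and search again,
    # restricted to the best region found in step 1.
    p_s2 = tmp_search((template_L, t_c), search_area)

    # Update the low-frequency coefficients using the second prediction.
    coeff2 = quantize(transform(block - p_s2))
    final_coeff = np.where(band_mask, coeff2, coeff)
    return p_s2, final_coeff
```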

Prediction-based template matching (TMP)

Template matching (TM) finds the region in the reference picture that is most similar to a template. This paper uses TM to obtain the first prediction block, constructing the template from the upper reference pixels, the upper-left reference pixels, and the left boundary pixels (the L-shape in Fig.1).
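Purely as an illustration, gathering such an L-shaped template from the reconstructed frame could look like the sketch below; the template thickness t and the array layout are assumptions, not values from the paper.

```python
import numpy as np

def l_shaped_template(recon: np.ndarray, x: int, y: int,
                      bw: int, bh: int, t: int = 2):
    """Collect the L-shaped template around a bw x bh block at (x, y).

    `recon` is the reconstructed frame; `t` is the template thickness,
    chosen here only for illustration.
    """
    top      = recon[y - t:y,  x:x + bw]   # upper reference pixels
    top_left = recon[y - t:y,  x - t:x]    # upper-left reference pixels
    left     = recon[y:y + bh, x - t:x]    # left boundary pixels
    return top, top_left, left
```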

Because too many search locations make TM expensive, the search area is restricted and divided into four parts, as shown in Fig.2.

In each search area, the three matching candidates with the smallest MSE are selected, and the weighted average of these three best matching blocks is used as the first prediction of the current block.
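The original equation is not reproduced in this post; one plausible inverse-MSE weighting, written here only as an assumption consistent with the description, is:

```latex
P_{s1}^{i} = \frac{E_{i1}^{-1} P_{i1} + E_{i2}^{-1} P_{i2} + E_{i3}^{-1} P_{i3}}
                  {E_{i1}^{-1} + E_{i2}^{-1} + E_{i3}^{-1}}
```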

Here, P_i1, P_i2, and P_i3 are the reconstructed blocks of the three best matches in the i-th area, E_i1, E_i2, and E_i3 are their corresponding MSEs, and P_s1^i is the first prediction result for that area.

After all regions have been searched, the region whose prediction has the smallest MSE against the original pixels is selected as the best region. Its prediction is denoted P_s1, and the index of the selected region is transmitted in the bitstream, so the decoder can perform TMP in the same region and obtain the same prediction.

Progressive second prediction

For complex textures or noisy content, the residual is large when only the L-shaped template is used for TMP. To improve prediction accuracy, this paper proposes progressive prediction, in which the high-frequency coefficients from the first prediction assist the second prediction, as shown in Fig.3.

In the first step, the first prediction P_s1 is subtracted from the input pixels to obtain the residual, which is transformed and quantized into the coefficients C. The low-frequency coefficients of C are then set to zero, and the remaining coefficients are used for reconstruction, yielding the reconstructed block T_c.
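As a concrete illustration, zeroing the leading frequency bands of a coefficient block could look like the following; treating each anti-diagonal of the block as one band is an assumption made here for simplicity, not necessarily the band definition used in the paper.

```python
import numpy as np

def zero_low_bands(coeff: np.ndarray, num_bands: int = 2) -> np.ndarray:
    """Zero coefficients whose anti-diagonal index (x + y) falls in the
    first `num_bands` bands; illustrative band definition only."""
    ys, xs = np.indices(coeff.shape)
    mask = (ys + xs) < num_bands          # low-frequency corner of the block
    out = coeff.copy()
    out[mask] = 0
    return out

# Example on a 4x4 coefficient block
c = np.arange(16, dtype=float).reshape(4, 4)
print(zero_low_bands(c))
```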

In the second step, the L-shaped reconstructed template T_r and T_c are combined into the final template T_w, which is used for TMP as shown in Fig.1. To reduce complexity, only the best region from the first prediction is searched, and the position with the smallest E_w is chosen as the best matching position.

Here E_r and E_c are the MSEs of a candidate position against T_r and T_c, respectively. The reconstructed block at the optimal position, denoted P_s2, is the final prediction of the current block.
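The exact combination of the two errors is not shown in this post; a simple equal-weight sum, taken here only as an assumption, would be:

```latex
E_w = E_r + E_c
```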

Finally, the low-frequency coefficients are updated. The residual against the second prediction P_s2 is transformed and quantized to obtain new coefficients C'. The low-frequency part of C' is combined with the high-frequency part of C to form the final quantized coefficients, which are transmitted to the decoder. The final reconstruction is obtained from the new prediction and these coefficients.
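Written per coefficient position k, the described update amounts to the following (a sketch of the text above, with the low bands being the first and second frequency bands):

```latex
C_{\mathrm{final}}[k] =
\begin{cases}
C'[k], & k \in \text{bands 1 and 2 (low frequency)} \\
C[k],  & \text{otherwise}
\end{cases}
```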

The second prediction therefore uses not only the neighboring reconstructed information but also texture information of the current block itself, so blocks with truly similar textures can be matched more reliably. This reduces the residual, and hence the bits needed to code the residual coefficients; using the second prediction as the final prediction also reduces quantization error.

Fig.4 shows the number of bits spent on coefficient coding in each frequency band and the corresponding prediction accuracy, where prediction accuracy measures how often the best matching position stays the same when the corresponding frequency band is forced to zero. To balance prediction accuracy against bit cost, only the first and second frequency bands are zeroed.

Experimental results

The algorithm is implemented on VTM7.0 with the all-intra (AI) configuration; only the first 200 frames of each test sequence are encoded, and the QP values are {22, 27, 32, 37}. The test results are shown in Table 1.

Under the AI configuration, the proposed algorithm achieves an average gain of 0.87%; for the SCC sequences (Class F) with IBC enabled, the gain reaches 1.31%.

Fig.5 shows the search results. The red, green, and blue boxes mark the block coded by the proposed algorithm and the blocks found by the first and second predictions, respectively. The second prediction refines the result of the first prediction and finds a more similar block.

