H.266/VVC Technology Learning 51: Decoder-Side Motion Vector Refinement (DMVR)

To increase the accuracy of the merge-mode MVs in bi-prediction, VVC applies decoder-side motion vector refinement (DMVR). A more accurate MV pair is searched for around the initial MVs in reference picture list L0 and reference picture list L1. The method is based on bilateral matching: it calculates the distortion between the two candidate blocks in the L0 and L1 reference pictures.

As shown in the figure below, the SAD between the two red blocks is calculated for each MV candidate around the initial MV (black block). The MV candidate with the smallest SAD becomes the refined MV and is used to generate the bi-predicted signal.

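[Figure: DMVR bilateral matching; the initial MV (black block) and the surrounding MV candidates (red blocks) in the L0 and L1 reference pictures]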
In short, DMVR looks for a better MV in the neighborhood of the initial merge MV and uses it in place of the initial one.

1 Conditions for applying DMVR

In VVC, DMVR is applied only to CUs that meet all of the following conditions:
1. CU-level merge mode with bi-prediction;
2. One reference picture is in the past and the other is in the future with respect to the current picture;
3. The two reference pictures are at the same distance from the current picture (that is, the POC differences are equal);
4. Both reference pictures are short-term reference pictures;
5. The CU has more than 64 luma samples (very small blocks are excluded; together with condition 6, this rules out 8x8 CUs);
6. Both the CU height and the CU width are greater than or equal to 8 luma samples;
7. The BCW weight index indicates equal weights;
8. WP is not enabled for the current block;
9. CIIP mode is not used for the current block.
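The nine conditions can be collected into a single predicate. The following is a minimal sketch, assuming a hypothetical `CuInfo` record whose field names are illustrative and not taken from VTM; conditions 2 and 3 are folded into one check on the signed POC distances.

```python
from dataclasses import dataclass


@dataclass
class CuInfo:
    """Hypothetical per-CU fields; names are illustrative, not VTM's."""
    merge_bi: bool          # merge mode with bi-prediction
    poc_cur: int            # POC of the current picture
    poc_ref0: int           # POC of the L0 reference picture
    poc_ref1: int           # POC of the L1 reference picture
    ref0_long_term: bool
    ref1_long_term: bool
    width: int              # in luma samples
    height: int             # in luma samples
    bcw_equal_weight: bool
    wp_enabled: bool
    ciip: bool


def dmvr_allowed(cu: CuInfo) -> bool:
    """Return True if all DMVR conditions listed above hold (sketch)."""
    d0 = cu.poc_ref0 - cu.poc_cur
    d1 = cu.poc_ref1 - cu.poc_cur
    return (
        cu.merge_bi                                        # condition 1
        and d0 != 0 and d0 == -d1                          # conditions 2 and 3: opposite sides, equal distance
        and not (cu.ref0_long_term or cu.ref1_long_term)   # condition 4
        and cu.width * cu.height > 64                      # condition 5
        and cu.width >= 8 and cu.height >= 8               # condition 6
        and cu.bcw_equal_weight                            # condition 7
        and not cu.wp_enabled                              # condition 8
        and not cu.ciip                                    # condition 9
    )
```

For example, a 16x8 bi-predicted merge CU at POC 8 with references at POC 4 and POC 12 passes, while an 8x8 CU fails condition 5.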

2 Different roles of the two MVs

The MV obtained through DMVR processing is called the refined MV below.
1. The original MV is used in the deblocking (DBF) process and in spatial motion vector prediction for subsequently coded CUs.
2. The refined MV is used to generate the inter prediction samples and in temporal motion vector prediction for the coding of future pictures.

3 Refinement process

In DMVR, the search points surround the initial MV, and the MV offset obeys the mirroring rule. That is, any point checked by DMVR, represented by the candidate MV pair (MV0', MV1'), obeys the following two equations:
$$MV0' = MV0 + MV_{offset}$$
$$MV1' = MV1 - MV_{offset}$$
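The mirroring rule can be enumerated directly. This is a sketch with a hypothetical helper `mirrored_candidates`, assuming MVs are `(x, y)` tuples in 1/16-sample units so that one integer-sample offset equals 16 units; it lists the 5x5 candidate pairs in raster-scan order.

```python
def mirrored_candidates(mv0, mv1, search_range=2, step=16):
    """List (offset, MV0', MV1') for every integer offset in raster-scan order.

    MVs are (x, y) tuples in 1/16-sample units (assumption), so one
    integer-sample offset equals `step` = 16 units.
    """
    cands = []
    for dy in range(-search_range, search_range + 1):       # rows, top to bottom
        for dx in range(-search_range, search_range + 1):   # columns, left to right
            off = (dx * step, dy * step)                     # MV_offset in 1/16 units
            mv0_c = (mv0[0] + off[0], mv0[1] + off[1])       # MV0' = MV0 + MV_offset
            mv1_c = (mv1[0] - off[0], mv1[1] - off[1])       # MV1' = MV1 - MV_offset
            cands.append(((dx, dy), mv0_c, mv1_c))
    return cands
```

With a search range of two samples this yields the 25 points of the integer search; the initial pair sits at index 12, and MV0' + MV1' is the same for every candidate.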

3.1 Search process

Here MV_offset represents the refinement offset between the initial MV and the refined MV in one of the reference pictures; the offset in the other reference picture is its mirror. The refinement search range is two integer luma samples from the initial MV. The search consists of an integer sample offset search stage and a fractional sample refinement stage.

1. Integer sample offset search stage: a 25-point full search is applied. The SAD of the initial MV pair is calculated first. If it is smaller than a threshold, the integer sample stage of DMVR is terminated. Otherwise, the SADs of the remaining 24 points are calculated and checked in raster scan order, and the point with the smallest SAD is selected as the output of the integer sample offset search stage.
To reduce the penalty caused by the uncertainty of the DMVR refinement, the original MV is favored during the search: the SAD between the reference blocks referred to by the initial MV candidates is decreased by 1/4 of its value.

2. Fractional sample refinement stage: the integer sample search is followed by fractional sample refinement, which is invoked conditionally based on the output of the integer stage. Fractional sample refinement is further applied when the integer sample search stage terminates with the center having the smallest SAD in either the first or the second iteration of the search.
To save computational complexity, the fractional offset is derived from a parametric error surface equation instead of additional searches with SAD comparison. The costs at the center position and at its four neighboring positions are used to fit a 2-D parabolic error surface of the form:
$$E(x, y) = A(x - x_{min})^2 + B(y - y_{min})^2 + C$$
where (x_min, y_min) corresponds to the fractional position with the least cost and C corresponds to the minimum cost value. Solving this equation with the cost values of the five search points gives:
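$$x_{min} = \frac{E(-1,0) - E(1,0)}{2\left(E(-1,0) + E(1,0) - 2E(0,0)\right)}$$
$$y_{min} = \frac{E(0,-1) - E(0,1)}{2\left(E(0,-1) + E(0,1) - 2E(0,0)\right)}$$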
Since all cost values are non-negative and the smallest one is E(0,0), x_min and y_min automatically lie within half a sample of the center, i.e., between −8 and 8 when expressed in 1/16-sample units. This corresponds to a half-pel offset at the 1/16-pel MV accuracy of VVC. The calculated fractional (x_min, y_min) is added to the integer-distance refinement MV to obtain the sub-pel-accurate refinement delta MV.
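The two search stages can be sketched as follows. All names (`integer_search`, `error_surface_offset`, `dmvr_delta_mv`) are hypothetical, the `sad(dx, dy)` callback is assumed to return the SAD for the mirrored pair at that integer offset, and the threshold value is not given in the text. Several details are assumptions rather than the standard's exact behavior: the 1/4 bias is applied after the early-termination check, the error surface uses the biased center cost, the single 25-point pass stands in for the "iterations" mentioned above, and floating point replaces the integer arithmetic a real decoder uses.

```python
def integer_search(sad, threshold, search_range=2):
    """25-point integer search around the initial MV pair.

    Returns (best_offset, costs), where costs maps offsets to SADs
    (the center entry carries the 1/4 bias unless the search terminated early).
    """
    costs = {(0, 0): sad(0, 0)}
    if costs[(0, 0)] < threshold:             # early termination at the initial MV
        return (0, 0), costs
    costs[(0, 0)] -= costs[(0, 0)] // 4       # favor the initial MV: SAD reduced by 1/4
    best = (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if (dx, dy) == (0, 0):
                continue
            costs[(dx, dy)] = sad(dx, dy)
            if costs[(dx, dy)] < costs[best]:  # strict: on ties the point found first is kept
                best = (dx, dy)
    return best, costs


def error_surface_offset(costs, center):
    """Fractional offset (x_min, y_min) in 1/16-sample units around `center`.

    Uses E(0,0) and its four neighbours; assumes E(0,0) is the smallest of them.
    """
    cx, cy = center
    e0 = costs[(cx, cy)]

    def axis(e_minus, e_plus):
        denom = 2 * (e_minus + e_plus - 2 * e0)
        return 0.0 if denom == 0 else 16 * (e_minus - e_plus) / denom

    x_min = axis(costs[(cx - 1, cy)], costs[(cx + 1, cy)])
    y_min = axis(costs[(cx, cy - 1)], costs[(cx, cy + 1)])
    return x_min, y_min


def dmvr_delta_mv(sad, threshold):
    """Delta MV in 1/16-sample units (added to MV0, subtracted from MV1)."""
    best, costs = integer_search(sad, threshold)
    dx, dy = best[0] * 16, best[1] * 16           # integer part of the refinement
    if best == (0, 0) and len(costs) > 1:         # sub-pel step only when the center wins
        fx, fy = error_surface_offset(costs, best)
        dx, dy = dx + fx, dy + fy
    return dx, dy
```

For instance, with a biased center cost of 30, E(−1,0) = 54 and E(1,0) = 38, the horizontal fractional offset is 16 · 16 / 64 = 4, i.e. a quarter sample toward the cheaper right neighbour. In practice the fractional part would be kept on the 1/16 grid.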

3.2 Bilinear interpolation and sample padding

In VVC, the MV resolution is 1/16 luma sample, and an 8-tap interpolation filter is used to interpolate samples at fractional positions. In DMVR, the search points surround the initial fractional-pel MV with integer sample offsets, so the samples at those fractional positions must be interpolated for the search. To reduce computational complexity, DMVR uses a bilinear interpolation filter to generate the fractional samples for the search process. Another important effect is that, with a bilinear filter and a search range of 2 samples, DMVR does not access more reference samples than the normal motion compensation process. After the refined MV is obtained by the DMVR search, the regular 8-tap interpolation filter is applied to generate the final prediction.
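A 2-tap bilinear filter at 1/16-sample positions can be sketched as below. The function name `bilinear_sample` is hypothetical, and the integer weights (16 − f, f) with a final rounding offset of 128 and a right shift by 8 are a simple assumption, not the exact intermediate shifts of the standard.

```python
def bilinear_sample(ref, x16, y16):
    """Bilinear interpolation of 2-D list `ref` at (x16/16, y16/16).

    Coordinates are in 1/16-sample units; weights are (16 - frac, frac).
    """
    x, fx = x16 >> 4, x16 & 15      # integer position and 1/16 fraction (floor semantics)
    y, fy = y16 >> 4, y16 & 15
    x1 = x + 1 if fx else x         # skip the second tap at integer positions
    y1 = y + 1 if fy else y
    top = ref[y][x] * (16 - fx) + ref[y][x1] * fx
    bot = ref[y1][x] * (16 - fx) + ref[y1][x1] * fx
    return (top * (16 - fy) + bot * fy + 128) >> 8   # total weight 256, rounded
```

The memory-access claim follows from the tap counts. Assuming the 8-tap filter spans offsets −3 to +4 around the integer position, normal MC for a W-sample-wide block reads columns x−3 to x+W+3, while a bilinear search with offsets up to ±2 reads only x−2 to x+W+2, which stays inside that window.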
To avoid accessing more reference samples than the normal motion compensation process, samples that are not needed for interpolation based on the original MV but are needed for interpolation based on the refined MV are padded from the available samples, rather than fetched from the reference picture.
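The padding idea can be sketched as a clamped fetch. Here `mc_region` and `padded_fetch` are hypothetical helpers: the first computes the inclusive window read by 8-tap MC with the original MV (assuming taps at offsets −3 to +4), and the second replaces any position outside that window with the nearest sample inside it. Replicate-nearest padding is an assumption made for illustration.

```python
def mc_region(bx, by, w, h, mv_int, taps_before=3, taps_after=4):
    """Inclusive (x_lo, y_lo, x_hi, y_hi) read by 8-tap MC with the original MV.

    mv_int is the integer part of the original MV (assumed tap span -3..+4).
    """
    x0, y0 = bx + mv_int[0], by + mv_int[1]
    return (x0 - taps_before, y0 - taps_before,
            x0 + w - 1 + taps_after, y0 + h - 1 + taps_after)


def padded_fetch(ref, x, y, region):
    """Read ref[y][x]; positions outside `region` take the nearest sample
    inside it (padding), so no extra reference samples are accessed."""
    x_lo, y_lo, x_hi, y_hi = region
    xc = min(max(x, x_lo), x_hi)
    yc = min(max(y, y_lo), y_hi)
    return ref[yc][xc]
```

With this design, the final 8-tap interpolation for the refined MV can run unchanged while its reads are silently redirected into the window the original MV already required.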

4 Block size limit

When the width and/or height of a CU is larger than 16 luma samples, the CU is further split into sub-blocks whose width and/or height equals 16 luma samples. The maximum unit size of the DMVR search process is therefore 16x16.
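The split can be sketched as follows, with a hypothetical `dmvr_subblocks` helper that returns `(x, y, width, height)` tuples in luma samples; each tuple is then treated as a separate DMVR processing unit. VVC CU sizes are powers of two, so the sub-blocks tile the CU exactly.

```python
def dmvr_subblocks(cu_x, cu_y, width, height, max_size=16):
    """Split a CU into DMVR processing units of at most max_size x max_size."""
    sb_w = min(width, max_size)     # sub-block width, capped at 16
    sb_h = min(height, max_size)    # sub-block height, capped at 16
    return [(x, y, sb_w, sb_h)
            for y in range(cu_y, cu_y + height, sb_h)
            for x in range(cu_x, cu_x + width, sb_w)]
```

A 32x8 CU therefore yields two 16x8 units, and a 64x64 CU yields sixteen 16x16 units.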


Origin blog.csdn.net/weixin_42979679/article/details/103260000