Understand the Principles of HEVC-SCC in Ten Minutes

1. What is SCC (Screen Content Coding)

In recent years, with the rapid development of cloud computing and the mobile Internet, video applications such as screen sharing, distance learning, video conferencing, and wireless display have become increasingly popular. The video in these applications differs from content captured by ordinary cameras: it contains a large amount of screen content (Screen Content). Unlike camera-captured material, screen content is full of static or moving computer graphics and text, with large uniform flat areas, many repeated patterns, highly saturated colors or a limited set of distinct color values, blocks or regions containing the same digits or letters across an image sequence, and no sensor noise. These characteristics, which differ markedly from natural content, make screen content coding a new and challenging problem. Existing video coding standards, such as the early versions of H.265/HEVC (High Efficiency Video Coding) and the earlier H.264/AVC (Advanced Video Coding), were designed primarily to compress camera-captured video. Proposal results show that adding coding tools designed for the characteristics of screen content on top of HEVC can significantly improve compression performance on such material.

2. The main tools of HEVC-SCC and their typical gains

In 2016, the International Telecommunication Union (ITU), the International Organization for Standardization (ISO), and the International Electrotechnical Commission (IEC) jointly published the screen content coding extension of High Efficiency Video Coding (HEVC), namely the HEVC-SCC extension.

To improve coding performance on screen content video, HEVC-SCC adds four coding tools:

  1. Intra Block Copy (IBC)

  2. Palette Mode (PLT)

  3. Adaptive Color Transform (ACT)

  4. Adaptive Motion Vector Resolution (AMVR)

These new coding tools significantly improve coding efficiency, but they also place a considerable computational burden on the encoder. It is therefore necessary to understand the benefit of each tool and enable them selectively in a practical encoder implementation.
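
The gains below are reported as BD-Rate, the Bjøntegaard delta rate: each configuration is run at several QPs, a curve of bitrate versus quality (e.g. PSNR) is fitted for each, and the average bitrate difference at equal quality over the overlapping quality range is reported, so a negative value means the tool saves bitrate at the same quality. A commonly used formulation is shown below as a reference sketch, with R1(D) and R2(D) the fitted rate curves of the anchor and of the encoder with the tool enabled, and [D_L, D_H] the overlapping quality interval; multiplying by 100 gives the percentages quoted below.

```latex
% Bjontegaard delta rate: average relative bitrate difference at equal quality.
% R_1(D): fitted rate of the anchor, R_2(D): fitted rate with the tool enabled,
% [D_L, D_H]: overlapping quality (e.g. PSNR) interval of the two fits.
\Delta R_{BD} \;=\;
  10^{\frac{1}{D_H - D_L}\int_{D_L}^{D_H}
      \left( \log_{10} R_2(D) - \log_{10} R_1(D) \right) dD}
  \;-\; 1
```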

IBC: As shown in the figure below, the typical BD-Rate gain of this tool is -19.1% (YUV color format, Text and Graphics with Motion test set, Random Access configuration with mixed I/P frames).

[Figure: IBC BD-Rate test results]

PLT: As shown in the figure below, the typical BD-Rate gain of this tool is -11.1% (YUV color format, Text and Graphics with Motion test set, Random Access configuration with mixed I/P frames).

[Figure: PLT BD-Rate test results]

ACT: As shown in the figure below, the typical BD-Rate gain of this tool is -0.7% (YUV color format, Text and Graphics with Motion test set, Random Access configuration with mixed I/P frames).

[Figure: ACT BD-Rate test results]

AMVR: As shown in the figure below, the typical BD-Rate gain of this tool is -1.5% (YUV color format, Text and Graphics with Motion test set, Random Access configuration with mixed I/P frames).

[Figure: AMVR BD-Rate test results]

3. Principles of the main HEVC-SCC tools

As the figures above show, the latter two tools (ACT and AMVR) provide very limited BD-Rate improvement while adding substantial computational cost, so they are usually not enabled in practical encoder implementations. This section therefore focuses on the principles of the first two tools (IBC and PLT), which are widely used in encoders and have been extensively optimized in the literature.

IBC (Intra Block Copy)

IBC is a coding technique similar to inter-frame prediction; its design is almost identical in principle to motion compensation. The difference is that in IBC the reference block is selected within the same picture. Block copying had already been proposed when the H.264 standard was being developed, but the test sequences at that time were all natural images captured by cameras rather than screen content. Because natural images are highly complex, two spatially neighboring blocks usually differ substantially, so the probability that a block has an accurate predictor within the same picture is low, and intra-picture block copying brought little coding gain. With the emergence and spread of screen content video sequences, however, the technique found a new application. Unlike natural images, the pixels within one picture of a screen content image exhibit a distinctive spatial correlation: repeated patterns and characters often appear in the same picture, for example on computer desktops, documents, and slides containing a lot of text. For such images, intra-picture block compensation is very effective. Targeting this characteristic, a new generation of IBC technology was proposed and, after continuous improvement and development, was finally adopted into HEVC-SCC.

In IBC, a displacement vector (called a block vector, or BV) represents the relative displacement from the current block to its reference block, as shown in the figure below. The reconstructed reference block in the same picture is then added to the prediction error to form the reconstructed current block. The reference samples used here are the reconstructed samples of the current picture before the in-loop filters are applied.

[Figure: IBC uses a block vector (BV) to point from the current block to a reference block in the same picture]
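
To make this concrete, here is a minimal sketch of how an IBC predictor could be formed. This is illustrative C++ only, not the HM/SCM reference software; the buffer layout and names such as `ReconPicture` and `ibcPredict` are invented for the example. The predictor is simply a copy of already-reconstructed, pre-loop-filter samples of the same picture, displaced by the BV, and the encoder then codes the difference between the original block and this predictor as the residual.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical picture buffer: reconstructed luma samples of the *current*
// picture, before deblocking/SAO, stored row by row.
struct ReconPicture {
    int width;
    int height;
    std::vector<int16_t> samples;                 // size = width * height
    int16_t at(int x, int y) const { return samples[y * width + x]; }
};

// Block vector: displacement from the current block to its reference block
// inside the same picture (typically pointing up/left into the decoded area).
struct BlockVector { int dx, dy; };

// Form the IBC predictor for a blkW x blkH block at (curX, curY): every
// predicted sample is a reconstructed sample of the same picture shifted by
// the BV. The encoder then codes 'original - pred' as the residual, exactly
// as it would for inter prediction.
void ibcPredict(const ReconPicture& recon,
                int curX, int curY, int blkW, int blkH,
                BlockVector bv, std::vector<int16_t>& pred) {
    pred.resize(static_cast<size_t>(blkW) * blkH);
    for (int y = 0; y < blkH; ++y) {
        for (int x = 0; x < blkW; ++x) {
            int refX = curX + bv.dx + x;          // BV is in integer pixels
            int refY = curY + bv.dy + y;          // (no sub-pel interpolation)
            pred[y * blkW + x] = recon.at(refX, refY);
        }
    }
}
```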

Since IBC was first proposed, it has gone through a series of improvements. When IBC was introduced in the HEVC Range Extensions (RExt), it was restricted to a small area: it was only used for 2Nx2N blocks with a one-dimensional (1-D) BV. That is, the reference block had to lie within the current LCU or a few LCUs to its left, and had to be located directly above or directly to the left of the current CU, so the search direction was only horizontal or vertical. Later, to improve the coding performance of IBC, a two-dimensional (2-D) BV was proposed: it allows the reference block to be located anywhere within the current LCU and the neighbouring LCU to its left, so the displacement is no longer restricted to purely horizontal or vertical and may also be oblique. IBC with a 2-D BV makes full use of the available reference area and improves coding performance considerably. With further study of IBC, many improved algorithms have been proposed, including new block vector prediction, new block vector coding, and improved construction of the HEVC Merge candidate set.
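
The practical consequence of such restrictions is that the encoder must check that every candidate BV points at an area that is both inside the allowed region and already reconstructed. The check below is only a simplified illustration of the restricted form described above (current CTU plus the CTU to its left); the exact availability rules in RExt and in the final SCC design are more involved, and the names here are invented for the sketch.

```cpp
// Simplified BV validity test for a restricted IBC search area: the whole
// reference block must lie inside the current CTU or the CTU to its left,
// inside the picture, and in an already-reconstructed region. The precise
// z-scan availability rule of the standard is only approximated here.
bool isBvValidRestricted(int curX, int curY, int blkW, int blkH,
                         int bvX, int bvY, int ctuSize, int picW, int picH) {
    const int refL = curX + bvX, refT = curY + bvY;
    const int refR = refL + blkW - 1, refB = refT + blkH - 1;

    // 1) The reference block must stay inside the picture.
    if (refL < 0 || refT < 0 || refR >= picW || refB >= picH) return false;

    // 2) It must stay inside the current CTU row, within the current CTU
    //    or the CTU immediately to its left.
    const int ctuX = (curX / ctuSize) * ctuSize;
    const int ctuY = (curY / ctuSize) * ctuSize;
    if (refT < ctuY || refB >= ctuY + ctuSize) return false;
    if (refL < ctuX - ctuSize || refR >= ctuX + ctuSize) return false;

    // 3) It must already be reconstructed: entirely in the left CTU, or,
    //    inside the current CTU, above or to the left of the current block
    //    (a stand-in for the real z-scan order availability rule).
    const bool inLeftCtu = refR < ctuX;
    const bool aboveCur  = refB < curY;
    const bool leftOfCur = refR < curX && refB < curY + blkH;
    return inLeftCtu || aboveCur || leftOfCur;
}
```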

Finally, in HEVC-SCC the IBC mode is treated as a special HEVC inter prediction mode that uses only the current reconstructed picture as its reference picture. In other words, when a block is predicted from a reference block in the current reconstructed picture itself, it is coded in IBC mode. Sharing the same algorithmic framework allows the HEVC-SCC IBC mode to reuse most of the existing HEVC inter mode design.

PLT (Palette Mode)

A palette is an efficient way to represent a block that contains only a few distinct color values. Unlike conventional coding, palette mode does not apply prediction and transform to the block; instead, it uses a palette index to indicate the color value of each sample. Palettes were used in the early days to convert 24-bit RGB images into 8-bit images in order to save RAM or image cache memory. A new CU-based palette coding scheme was first proposed in HEVC RExt and, after further experimental verification and refinement, was finally adopted into SCC.

The palette is a table containing all representative color values of a CU coded in palette mode; each entry consists of three components, either RGB or YCbCr. Each color sample in the CU is mapped to a corresponding index into this table, and these indices are written to the bitstream. On the decoder side, the palette table and the indices are used to reconstruct every color sample of the CU. Besides the regular palette indices there is a special index, called the escape index, which marks escape color samples: pixels that occur very rarely and whose colors are far from any entry in the palette. For such samples, in addition to the escape index, the quantized values of the corresponding color components are also coded.

Since different CUs contain different sets of colors, the palette tables also have different sizes, equal to the number of representative color values in each CU. Indices are assigned to the color entries starting from "Index 0", then "Index 1", "Index 2", and so on, until every representative color in the palette has an index. The escape index is then assigned to the escape colors, so it is always the largest index value, as shown in the figure below.

[Figure: a palette table with indexed color entries (Index 0, 1, 2, ...) and the escape index]
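
As a rough sketch of the idea (illustrative C++ only, not SCM code; the closeness threshold, the structure names, and the way escape colors are stored are assumptions made for the example), every pixel of the CU is mapped either to the index of a sufficiently close palette entry or to the escape index, in which case its quantized color value would be signalled separately:

```cpp
#include <array>
#include <cstdlib>
#include <vector>

// One palette entry: three color components (YCbCr or RGB).
struct Color { std::array<int, 3> c; };

// Result of mapping one CU onto a palette.
struct PaletteCodedCu {
    std::vector<Color> table;     // representative colors of this CU
    std::vector<int>   indexMap;  // one palette index per sample position
    std::vector<Color> escapes;   // escape sample colors (quantized in a real codec)
    int escapeIndex() const { return static_cast<int>(table.size()); }
};

// Map every pixel of the CU to the first palette entry that is close enough
// (within 'maxDiff' per component), or to the escape index if none matches.
PaletteCodedCu mapToPalette(const std::vector<Color>& pixels,
                            const std::vector<Color>& table,
                            int maxDiff) {
    PaletteCodedCu out;
    out.table = table;
    for (const Color& px : pixels) {
        int match = -1;
        for (size_t i = 0; i < table.size() && match < 0; ++i) {
            bool close = true;
            for (int k = 0; k < 3; ++k)
                close = close && std::abs(px.c[k] - table[i].c[k]) <= maxDiff;
            if (close) match = static_cast<int>(i);
        }
        if (match >= 0) {
            out.indexMap.push_back(match);             // regular palette index
        } else {
            out.indexMap.push_back(out.escapeIndex()); // escape index (largest value)
            out.escapes.push_back(px);                 // value signalled separately
        }
    }
    return out;
}
```

Note how the escape index is one larger than the last regular index, matching the figure above.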

Regenerating and signalling a complete palette for every CU would be costly, so the palette of the current CU is generally predicted from the palettes of previously coded CUs. Specifically, a palette predictor is maintained that contains the color entries of the palettes of CUs previously coded in palette mode. For each entry in the palette predictor, a flag indicates whether that color entry is reused in the current CU's palette.

The palette predictor is not static but is continuously updated. It is first initialized at the first CTU of each slice (or tile). Then, as more CUs are coded in palette mode and generate their palette tables, there will be cases where the entries in the palette predictor cannot cover the palette needed by the current CU; new representative colors are then added to the current palette, and these new colors are in turn recorded into the palette predictor. This is how the palette predictor is updated.
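
A compact sketch of that reuse-and-update cycle is shown below (again illustrative C++; the real SCC process also involves a signalled initializer and fixed size limits whose details are omitted, and the names are invented for the example):

```cpp
#include <utility>
#include <vector>

struct Color { int y, cb, cr; };

// Build the current CU's palette from the predictor and the newly signalled
// colors, then update the predictor for the next palette-coded CU.
// reuseFlag[i] is the per-entry flag saying whether predictor entry i is
// reused, so reused colors never need to be transmitted again.
std::vector<Color> buildAndUpdatePalette(std::vector<Color>& predictor,
                                         const std::vector<bool>& reuseFlag,
                                         const std::vector<Color>& newColors,
                                         std::size_t maxPredictorSize) {
    std::vector<Color> palette;

    // 1) Take over the reused entries from the predictor.
    for (std::size_t i = 0; i < predictor.size() && i < reuseFlag.size(); ++i)
        if (reuseFlag[i]) palette.push_back(predictor[i]);

    // 2) Append the colors signalled explicitly for this CU.
    palette.insert(palette.end(), newColors.begin(), newColors.end());

    // 3) Update the predictor: the current palette first, then the unused
    //    old entries, truncated to the maximum predictor size.
    std::vector<Color> updated = palette;
    for (std::size_t i = 0; i < predictor.size() && i < reuseFlag.size(); ++i)
        if (!reuseFlag[i] && updated.size() < maxPredictorSize)
            updated.push_back(predictor[i]);
    if (updated.size() > maxPredictorSize) updated.resize(maxPredictorSize);
    predictor = std::move(updated);

    return palette;
}
```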

After the palette table has been generated and indices have been assigned, the indices are mapped back onto the CU: each color in the CU that is close to a palette entry is represented by that entry's index, forming a palette index map. The index map is then traversed by horizontal or vertical scanning, and the indices are coded using two run modes, COPY_LEFT_MODE and COPY_ABOVE_MODE.
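
A toy run-length view of those two modes is given below (a simplified sketch, not the SCM syntax: the actual palette coding interleaves run types, index values, and escape values with a specific scan and binarization). A COPY_LEFT run signals one index and repeats it, while a COPY_ABOVE run copies the indices of the samples directly above in the scan.

```cpp
#include <vector>

enum class RunMode { CopyLeft, CopyAbove };

struct PaletteRun {
    RunMode mode;
    int     index;   // only meaningful for CopyLeft runs
    int     length;  // number of samples covered by the run
};

// Turn a horizontally scanned palette index map into COPY_LEFT / COPY_ABOVE
// runs. Greedy and simplified: at each position we take the longer of
// "repeat the current index" and "copy from the row above".
std::vector<PaletteRun> encodeIndexMap(const std::vector<int>& idx, int width) {
    std::vector<PaletteRun> runs;
    size_t pos = 0;
    while (pos < idx.size()) {
        // Length of a run copying from the row above (needs a row above).
        size_t above = 0;
        while (pos >= static_cast<size_t>(width) &&
               pos + above < idx.size() &&
               idx[pos + above] == idx[pos + above - width]) ++above;

        // Length of a run repeating the index at 'pos'.
        size_t left = 1;
        while (pos + left < idx.size() && idx[pos + left] == idx[pos]) ++left;

        if (above > left) {
            runs.push_back({RunMode::CopyAbove, -1, static_cast<int>(above)});
            pos += above;
        } else {
            runs.push_back({RunMode::CopyLeft, idx[pos], static_cast<int>(left)});
            pos += left;
        }
    }
    return runs;
}
```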

4. Summary of SCC Status

  1. SCC was introduced as an extension of the HEVC standard (an optional, non-mandatory part). Cloud-side (server-side) implementations and optimizations are very mature and widely deployed.

  2. On mobile phones, HEVC-SCC is constrained by performance and power consumption and has not been widely adopted (e.g. Snapdragon 8 Gen 2, Dimensity 9000).

  3. AV1, the next-generation coding format led by Google, includes screen content coding tools as a normative part of the standard. Mobile phones can already decode such content efficiently, but encoding (generating) SCC content on the device still has a long way to go, in terms of hardware support for the tools and power optimization.

