H264 compression principle

1. Why raw video is generally converted to YUV format before compression

(1) Human vision is more sensitive to luminance than to chrominance. Video coding therefore encodes the Y component separately from the U and V components, and the U and V components can be stored at reduced resolution (chroma subsampling) with little visible loss.
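As a rough illustration of the point above, the following sketch converts an RGB pixel to YUV and halves the chroma resolution in both directions (4:2:0). The full-range BT.601 coefficients and the function names are assumptions made for the example; the original text does not say which conversion matrix or subsampling scheme is used.

```python
def rgb_to_yuv(r, g, b):
    """Convert one 8-bit RGB pixel to Y, U (Cb), V (Cr), assuming full-range BT.601."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return tuple(min(255, max(0, round(c))) for c in (y, u, v))


def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (list of rows, even dimensions)."""
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]


def sample_counts(width, height):
    """Samples per frame for RGB versus YUV 4:2:0."""
    rgb = width * height * 3
    yuv420 = width * height + 2 * (width // 2) * (height // 2)
    return rgb, yuv420
```

For a 1920x1080 frame, `sample_counts` reports half as many samples for YUV 4:2:0 as for RGB, before any actual compression is applied.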

2. Video compression principle

(1) Spatial redundancy: adjacent pixels within an image are strongly correlated. For example, after a frame is divided into 16x16 blocks, neighboring blocks are often very similar.

(2) Temporal redundancy: two pictures captured close together in time usually differ very little.

(3) Visual redundancy: the human eye is less sensitive to high-frequency detail in an image than to low-frequency information, so some high-frequency information can be discarded.

(4) Coding redundancy: different pixel values occur with different probabilities in a picture. Values that occur often are encoded with fewer bits and values that occur rarely with more bits, which reduces the encoded size. Huffman coding is an example.
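To make the coding-redundancy idea concrete, here is a minimal Huffman coding sketch. It is a textbook construction rather than anything H264-specific, and the function name `huffman_code` is chosen only for this example.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from a sequence of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(symbols, code):
    """Total number of bits needed to encode the sequence with the given code."""
    return sum(len(code[s]) for s in symbols)
```

With pixel values `[0]*8 + [255]*4 + [128]*2 + [64]*2`, the most frequent value gets a 1-bit code and the whole sequence takes 28 bits, compared with 32 bits for a fixed 2-bit code.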

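3. Types of image frames (I frame, P frame, B frame)

I frame: key frame, coded without reference to any other frame.

P frame: forward-predicted frame, coded with reference to earlier frames.

B frame: bidirectionally predicted frame, typically coded with reference to both earlier and later frames. Because the later reference frame must be coded first, B frames introduce encoding delay.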
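The delay caused by B frames can be seen by comparing display order with coding order. The sketch below assumes a simplified model in which every B frame depends on the nearest I or P frame before and after it; real H264 streams allow more flexible reference structures, and the function name is hypothetical.

```python
def decode_order(display_types):
    """Reorder frame indices from display order to a possible coding order.

    Assumes each B frame references the nearest I/P frame on either side.
    """
    order, pending_b = [], []
    for index, ftype in enumerate(display_types):
        if ftype == "B":
            pending_b.append(index)   # must wait for the next I/P frame
        else:
            order.append(index)       # the reference frame is coded first
            order.extend(pending_b)   # then the B frames that depend on it
            pending_b = []
    order.extend(pending_b)  # trailing B frames with no later reference
    return order
```

For the display sequence `I B B P B B P`, this yields the coding order `0, 3, 1, 2, 6, 4, 5`: frames 1 and 2 cannot be coded until frame 3 has arrived, which is the source of the delay.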

4. GOP (group of pictures)

The first image of a sequence is called an IDR image. Every IDR image is an I frame, and the GOP length is the distance between two consecutive IDR images.

In general, the larger the GOP length, the higher the video compression ratio, because fewer of the expensive I frames are needed.
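A back-of-the-envelope sketch of why GOP length affects compression: the cost of the one I frame is spread over more predicted frames. The frame sizes used as defaults below (100,000 bytes for an I frame, 20,000 bytes for a P frame) are purely hypothetical figures for illustration, not measurements.

```python
def frame_types(num_frames, gop_length):
    """Frame type pattern with one I frame at the start of each GOP (P frames only)."""
    return ["I" if i % gop_length == 0 else "P" for i in range(num_frames)]


def average_frame_size(gop_length, i_size=100_000, p_size=20_000):
    """Average bytes per frame; i_size and p_size are assumed, illustrative values."""
    return (i_size + (gop_length - 1) * p_size) / gop_length
```

Under these assumed sizes, the average frame shrinks as the GOP grows, approaching the P frame size. The trade-off is slower seeking and slower recovery from errors, since decoding can only start at an IDR image.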

5. H264 mainly uses the following methods to compress video data, including:

(1) Intra-frame prediction compression solves the problem of spatial data redundancy.

(2) Inter-frame prediction compression (motion estimation and compensation) solves the problem of temporal data redundancy.

(3) The integer discrete cosine transform (DCT) converts spatially correlated data into largely decorrelated coefficients in the frequency domain, which are then quantized.

(4) CABAC (context-adaptive binary arithmetic coding) entropy coding compresses the resulting data losslessly.

Divide macroblocks:

H264 uses a 16x16 pixel area as a macroblock by default; a macroblock can also be divided into 8x8 areas.

After the picture is divided into macroblocks, the pixel values of each macroblock are computed.

Each picture in the H264 encoder's buffer is processed this way in turn.
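The macroblock division can be sketched as follows. The frame is represented as a plain list of rows of luma values, and the sketch assumes the frame dimensions are multiples of the block size; real encoders pad frames that are not. The function name is made up for the example.

```python
def split_macroblocks(frame, size=16):
    """Yield (block_row, block_col, block) for each size x size block of a frame.

    Assumes the frame height and width are multiples of `size`.
    """
    height, width = len(frame), len(frame[0])
    for top in range(0, height, size):
        for left in range(0, width, size):
            block = [row[left:left + size] for row in frame[top:top + size]]
            yield top // size, left // size, block
```

A 32x48 frame, for instance, produces six macroblocks arranged in three rows of two.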

Divide into subblocks:

A 16x16 macroblock can be further divided into sub-blocks of size 8x16, 16x8, 8x8, and so on.
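As a small illustration, the helper below lists the rectangles that result from cutting a macroblock into sub-blocks of a given width and height. It only enumerates geometry; how an encoder chooses between partition shapes is not covered here, and the function name is hypothetical.

```python
def partition(mb_size, part_w, part_h):
    """Return (x, y, width, height) for each sub-block of an mb_size x mb_size macroblock."""
    return [
        (x, y, part_w, part_h)
        for y in range(0, mb_size, part_h)
        for x in range(0, mb_size, part_w)
    ]
```

For example, `partition(16, 8, 16)` gives two side-by-side 8x16 sub-blocks, and `partition(16, 8, 8)` gives four 8x8 sub-blocks.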

Frame grouping:

After the macroblocks are divided, the pictures can be grouped.

The algorithm is as follows: among several adjacent images, typically no more than 10% of the pixels differ, the luminance difference does not exceed 2%, and the chroma difference is within 1%. Such images can be placed in one group. After encoding, only the first frame of the group keeps its complete data; the other frames are computed with reference to the previous frame. The first frame is called the IDR/I frame, the other frames are P/B frames, and the encoded group of data is called a GOP.
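One possible reading of the grouping rule above is sketched below. The interpretation of the three thresholds (fraction of changed luma pixels, mean luma change, mean chroma change, each relative to the 8-bit range) and the frame layout as a dict with `"y"` and `"uv"` planes are assumptions made for the example; production encoders typically rely on their own scene-change detection and keyframe interval settings instead.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-size planes, as a fraction of 255."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0])) / 255


def changed_fraction(a, b):
    """Fraction of samples that differ between two equal-size planes."""
    changed = sum(p != q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return changed / (len(a) * len(a[0]))


def same_group(prev, cur, max_changed=0.10, max_luma=0.02, max_chroma=0.01):
    """Decide whether `cur` may join the group of `prev` (assumed thresholds)."""
    return (
        changed_fraction(prev["y"], cur["y"]) <= max_changed
        and mean_abs_diff(prev["y"], cur["y"]) <= max_luma
        and mean_abs_diff(prev["uv"], cur["uv"]) <= max_chroma
    )
```

A frame for which `same_group` returns false would start a new group and be coded as an IDR/I frame.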

Motion Estimation and Compensation:

After the frames are grouped, the H264 encoder calculates the motion vectors of the objects in the group. Once a motion vector is known, the matching part of the reference frame is subtracted to obtain the compensation (residual) data. Motion vectors together with compensation make up inter-frame compression, which removes temporal redundancy.
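Motion estimation is commonly explained with block matching. The sketch below does an exhaustive search in a small window, using the sum of absolute differences (SAD) as the matching cost, and then computes the residual left after compensation. The block size, search radius, and function names are assumptions for the example; real encoders use much faster search strategies and sub-pixel precision.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a block in ref and a block in cur."""
    return sum(
        abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
        for j in range(size)
        for i in range(size)
    )


def motion_search(ref, cur, cx, cy, size=4, radius=2):
    """Full search: return (dx, dy, cost) of the best match for the block at (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(ref, cur, rx, ry, cx, cy, size)
                if best is None or cost < best[2]:
                    best = (dx, dy, cost)
    return best


def residual(ref, cur, cx, cy, dx, dy, size=4):
    """Current block minus the motion-compensated reference block."""
    return [
        [cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i] for i in range(size)]
        for j in range(size)
    ]
```

When the content has simply shifted by one pixel, the search finds that offset with a cost of zero and the residual is all zeros, so only the motion vector needs to be stored.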

Intra prediction:

In addition to the temporal redundancy mentioned above, there is also spatial redundancy within a single frame. The human eye is very sensitive to low-frequency brightness changes but not very sensitive to high-frequency ones. Intra-frame prediction exploits this by predicting each block from its already-coded neighbors in the same frame.

The residual is obtained by subtracting the intra-predicted image from the original image.
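A minimal sketch of intra prediction for a 4x4 block, using three simple modes (vertical, horizontal, DC) that predict from the row of pixels above and the column of pixels to the left. H264 defines more modes than these; the selection by lowest sum of absolute residuals and the function names are assumptions made for the example.

```python
def predict(mode, top, left, size=4):
    """Predict a size x size block from its top row and left column neighbors."""
    if mode == "vertical":
        return [list(top) for _ in range(size)]          # copy the row above downward
    if mode == "horizontal":
        return [[left[j]] * size for j in range(size)]   # copy the left column across
    dc = (sum(top) + sum(left) + size) // (2 * size)     # rounded mean of the neighbors
    return [[dc] * size for _ in range(size)]


def best_intra(block, top, left):
    """Return (mode, cost, residual) for the mode with the smallest absolute residual."""
    size = len(block)
    best = None
    for mode in ("vertical", "horizontal", "dc"):
        pred = predict(mode, top, left, size)
        # Residual = original block minus predicted block.
        res = [[block[j][i] - pred[j][i] for i in range(size)] for j in range(size)]
        cost = sum(abs(v) for row in res for v in row)
        if best is None or cost < best[1]:
            best = (mode, cost, res)
    return best
```

If every row of the block repeats the row above it, vertical prediction is exact and the residual is all zeros, so only the mode needs to be coded.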

Do DCT on the residual data:

After intra-frame and inter-frame compression, there is still room for compression.

The residual data then undergoes an integer discrete cosine transform, which removes the remaining correlation in the data and allows further compression.
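The 4x4 integer transform can be written as Y = C · X · Cᵀ, using the integer core matrix of the H264 4x4 forward transform. The sketch below applies it to a residual block and then quantizes the result. The `quantize` function is a deliberately simplified uniform quantizer with an arbitrary step, not the standard's QP-based scaling, and all function names are assumptions for the example.

```python
# Core matrix of the 4x4 forward integer transform.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform(block):
    """Compute CF * block * CF^T for a 4x4 block of residual values."""
    return matmul(matmul(CF, block), transpose(CF))


def quantize(coeffs, step):
    """Simplified uniform quantizer: divide by `step` and truncate toward zero."""
    return [[int(c / step) for c in row] for row in coeffs]
```

A flat block of 5s transforms to a single nonzero coefficient (80, in the top-left position). A horizontal ramp `[1, 2, 3, 4]` in every row gives a first row of `[40, -28, 0, -4]`; with a step of 8 this quantizes to `[5, -3, 0, 0]`, which shows how small high-frequency coefficients are discarded.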

CABAC:

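The intra-frame and inter-frame compression described above, once quantized, is lossy. Lossless compression can be applied on top of it. Huffman coding is one such method: it gives short codes to frequently occurring symbols and long codes to rarely occurring ones. CABAC (context-adaptive binary arithmetic coding) follows the same idea, using arithmetic coding with probabilities that adapt to the data.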
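The benefit of adaptive probabilities can be illustrated without implementing a full arithmetic coder. The sketch below estimates the ideal code length of a bit sequence under a simple adaptive model that counts zeros and ones. It is not the CABAC algorithm itself: the binarization, context selection, and probability state tables of the standard are left out, and the function name is hypothetical.

```python
import math


def adaptive_cost(bits):
    """Ideal code length, in bits, of a binary sequence under an adaptive model.

    The model starts with one pseudo-count per symbol and updates after each bit.
    """
    counts = [1, 1]
    total = 0.0
    for b in bits:
        p = counts[b] / (counts[0] + counts[1])  # current estimate for this bit
        total += -math.log2(p)                   # cost an ideal arithmetic coder pays
        counts[b] += 1                           # adapt to what was just seen
    return total
```

A run of fifteen zeros costs about 4 bits in total under this model, while an alternating sequence of the same length costs more than one bit per symbol, because the model cannot predict it.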

Origin blog.csdn.net/weixin_43004377/article/details/125668913