H.264 encoder basic principles

1. Encoder

Related concepts

The encoder uses a hybrid coding method of transform and prediction: in hybrid coding, the image is divided into fixed-size blocks, and compression coding is performed block by block.

1. Hybrid coding: the combination of predictive coding and transform coding.
  1.1 Predictive coding:
    1.1.1 Intra-frame prediction: mainly exploits the human eye's different sensitivity to brightness at different spatial frequencies. The predicted (reference) pixels are subtracted from the actual pixels to obtain a difference; at the receiving end, the predicted value is added back to the difference to recover the actual pixel value. It mainly addresses spatial redundancy.
    1.1.2 Inter-frame prediction: exploits the correlation between frames, i.e. already-encoded video frames/fields are used as prediction references together with block-based motion compensation. Inter-frame correlation is stronger than intra-frame correlation. It mainly addresses temporal redundancy.
  1.2 Transform coding: includes the Karhunen-Loève (KL) transform, the discrete cosine transform (DCT), etc. It works by separating the image into its high-frequency and low-frequency components.
    1.2.1 KL transform: transforms each row of the image array.
    1.2.2 DCT: the image is generally divided into blocks, and the DCT is applied block by block.
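
To make the block-transform idea concrete, here is a minimal sketch that applies a 2-D DCT to one 8x8 block. H.264 itself uses a scaled integer approximation of the DCT on 4x4 (or 8x8) blocks; this generic floating-point DCT and the made-up block values are only for illustration.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    c = np.zeros((n, n))
    for k in range(n):
        scale = np.sqrt(1.0 / n) if k == 0 else np.sqrt(2.0 / n)
        for i in range(n):
            c[k, i] = scale * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    return c

def block_dct2(block: np.ndarray) -> np.ndarray:
    """2-D DCT of a square block: C @ B @ C^T."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

# A made-up 8x8 luma block (values 0-255).
block = np.arange(64, dtype=float).reshape(8, 8)
coeffs = block_dct2(block)
# Energy concentrates in the low-frequency (top-left) corner of the coefficient block.
print(np.round(coeffs[:2, :2], 1))
```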

1. Each video field or frame produces one coded picture.
2. The motion vector MV is obtained by comparing the current frame with the previous frame; the MV is fed into the motion-compensated prediction stage to produce the predicted image.

The concept of motion compensation

1. The motion vector MV is obtained by comparing the current frame with the previous frame; the MV is fed into the motion-compensated prediction stage to produce the predicted image.
2. Predicting the current frame from a forward (earlier) reference frame is called forward motion compensation; predicting it from a backward (later) reference frame is called backward motion compensation; predicting it from forward and backward references at the same time is called bidirectional motion compensation.
3. Overlapped block motion compensation (OBMC): with OBMC, the prediction of a pixel is based not only on the MV of the block it belongs to, but also on the MV estimates of neighbouring blocks.
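
A minimal sketch of forward motion compensation, assuming a motion vector has already been estimated: the predicted block is simply copied from the displaced position in the reference frame. The function and variable names are illustrative, and sub-pixel interpolation is ignored.

```python
import numpy as np

def motion_compensate(ref: np.ndarray, top: int, left: int,
                      mv: tuple[int, int], size: int = 16) -> np.ndarray:
    """Copy the block at (top, left), displaced by mv=(dy, dx), from the reference frame."""
    dy, dx = mv
    return ref[top + dy: top + dy + size, left + dx: left + dx + size]

# The prediction error (residual) for the current block would then be:
# residual = current_block - motion_compensate(ref_frame, top, left, mv)
```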

The concepts of motion estimation and motion vectors

1. In inter-frame prediction, because adjacent frames are correlated, the moving image can be divided into blocks/macroblocks, and the best-matching position of each block or macroblock is searched for in the adjacent frame. The spatial offset between the two positions is the motion vector MV, and this search process is motion estimation.
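
A minimal sketch of motion estimation by exhaustive (full-search) block matching using the sum of absolute differences (SAD) as the matching criterion. The helper name, block size and search range are illustrative; real encoders use faster search strategies and sub-pixel refinement.

```python
import numpy as np

def full_search_me(cur: np.ndarray, ref: np.ndarray, top: int, left: int,
                   size: int = 16, search: int = 8) -> tuple[int, int]:
    """Exhaustive block matching: return the (dy, dx) that minimises the SAD
    between the current block and candidate blocks in the reference frame."""
    block = cur[top:top + size, left:left + size].astype(np.int32)
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(block - ref[y:y + size, x:x + size]).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv
```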

Entropy coding concept; CAVLC (Context-Adaptive Variable Length Coding)

1. Coding that uses the statistical characteristics of the source to compress the bit rate is called entropy coding, also known as statistical coding. Entropy coding mainly includes variable-length coding and arithmetic coding.
2. Entropy coding is a lossless compression method: the code stream it generates can be decoded without distortion to recover the original data.
3. The input to entropy coding is the coefficient matrix obtained after the intra/inter prediction residual has been transformed and quantized. A basic approach to data compression is to remove the correlation between source symbols and make the sequence as memoryless as possible, i.e. the appearance of one symbol does not affect the probability of any later symbol.
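
As a concrete instance of variable-length coding, H.264 uses Exp-Golomb codes for many syntax elements. Here is a minimal sketch of unsigned Exp-Golomb (ue(v)) encoding; shorter codes go to smaller (more frequent) values.

```python
def exp_golomb_ue(code_num: int) -> str:
    """Unsigned Exp-Golomb code: leading zeros followed by binary(code_num + 1)."""
    suffix = bin(code_num + 1)[2:]          # binary representation of code_num + 1
    return "0" * (len(suffix) - 1) + suffix

for v in range(6):
    print(v, exp_golomb_ue(v))
# 0 -> 1, 1 -> 010, 2 -> 011, 3 -> 00100, 4 -> 00101, 5 -> 00110
```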

CAVLC basic principles

CAVLC dynamically adjusts the code tables used during encoding according to the syntax elements that have already been coded, which gives it a very high compression ratio. CAVLC is used to encode the luma and chroma residual data. It makes full use of the characteristics of the residual data after integer transform and quantization, further reducing the redundant information in the data.
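
To make this concrete, the sketch below computes two of the quantities CAVLC starts from for a quantized 4x4 block: the total number of non-zero coefficients and the number of trailing ±1 coefficients (at most three) at the end of the zig-zag scan. The block values are made up; the scan order shown is the usual 4x4 frame-coding zig-zag.

```python
import numpy as np

# Zig-zag scan order for a 4x4 block (frame coding).
ZIGZAG_4x4 = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2),
              (2, 1), (3, 0), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3)]

def total_coeffs_and_trailing_ones(block: np.ndarray) -> tuple[int, int]:
    """Count non-zero coefficients and trailing +/-1s (max 3) in zig-zag order."""
    scan = [block[r, c] for r, c in ZIGZAG_4x4]
    nonzero = [v for v in scan if v != 0]
    total_coeffs = len(nonzero)
    trailing_ones = 0
    for v in reversed(nonzero):              # walk back from the highest frequency
        if abs(v) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break
    return total_coeffs, trailing_ones

blk = np.array([[ 3, 2, 1, 0],
                [-1, 1, 0, 0],
                [ 1, 0, 0, 0],
                [ 0, 0, 0, 0]])
print(total_coeffs_and_trailing_ones(blk))   # -> (6, 3)
```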

CAVLC context model

The choice of context model in CAVLC is mainly reflected in two aspects: the selection of the table needed to code the non-zero coefficients, and the update of the suffix length of the trailing coefficients.
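
A sketch of the first aspect: the coeff_token table is chosen from the non-zero-coefficient counts of the already-coded left and upper neighbouring blocks. The thresholds below follow the usual description of the standard, but treat them (and the helper name) as illustrative rather than normative.

```python
def choose_coeff_token_table(nA: int, nB: int) -> int:
    """Pick the coeff_token table index from the non-zero-coefficient counts of
    the left (nA) and upper (nB) neighbouring blocks (context adaptivity)."""
    nC = (nA + nB + 1) >> 1   # rounded average when both neighbours are available
    if nC < 2:
        return 0              # neighbours have few coefficients
    if nC < 4:
        return 1
    if nC < 8:
        return 2
    return 3                  # nC >= 8: a fixed-length code is used instead
```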

Quantization concept

Because the sampled pulse signal is discrete in time but still continuous in amplitude, i.e. it has an infinite number of possible values, those values must be rounded from an infinite set down to a finite set. The process of turning this continuous signal amplitude into a discrete quantity is called quantization.
PCM encoding is performed after quantization, and the quantized signal is expressed as 0s and 1s.
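
A minimal sketch of uniform scalar quantization of transform coefficients. This is not the exact H.264 quantizer, which derives its step size from the QP and folds in the transform scaling; the step value here is made up.

```python
import numpy as np

def quantize(coeffs: np.ndarray, step: float) -> np.ndarray:
    """Map continuous-valued coefficients to a finite set of integer levels."""
    return np.round(coeffs / step).astype(int)

def dequantize(levels: np.ndarray, step: float) -> np.ndarray:
    """Reconstruct approximate coefficient values (with quantization error)."""
    return levels * step

coeffs = np.array([102.7, -3.4, 0.8, 15.2])
levels = quantize(coeffs, step=10.0)        # -> [10, 0, 0, 2]
print(levels, dequantize(levels, 10.0))     # reconstruction differs from the input
```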

The functionality of H.264 is divided into two layers

The Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). VCL data is the output of the encoding process and represents the compressed and encoded video data sequence. Before VCL data is transmitted or stored, it is first mapped or encapsulated into NAL units.
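
For example, each NAL unit in an Annex B byte stream begins (after the start code) with a one-byte header whose fields can be read as below. The bit layout is the standard one; the helper name is made up.

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte H.264 NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,  # must be 0
        "nal_ref_idc":        (first_byte >> 5) & 0x03,  # importance as a reference
        "nal_unit_type":      first_byte & 0x1F,         # e.g. 5 = IDR slice, 7 = SPS, 8 = PPS
    }

print(parse_nal_header(0x67))   # 0x67 -> nal_ref_idc 3, nal_unit_type 7 (SPS)
```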

Compressed frames are divided into: I frame, P frame and B frame

I frame: key frame, compressed with intra-frame compression techniques.
P frame: forward-predicted frame; during compression it references only previously processed frames. It uses inter-frame compression techniques.
B frame: bidirectionally predicted frame; during compression it references both the preceding frame and the following frame. It uses inter-frame compression techniques.

Filtering concept

The image appears blocky after the inverse transform and inverse quantization in the codec. There are two reasons for this: ① the block-based DCT of the intra- and inter-prediction residuals: the quantization of the transform coefficients is relatively coarse, so the coefficients recovered by inverse quantization contain errors, which causes visible discontinuities at the block boundaries.
② The second reason comes from motion-compensated prediction: a motion-compensated block may be copied from interpolated sample data at a different position in a different frame. Because the match of the motion-compensated block can never be absolutely accurate, data discontinuities appear at the boundaries of the copied block.
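
The sketch below is a deliberately simplified illustration of smoothing across a block boundary. It is not the H.264 deblocking filter, which adaptively chooses a boundary strength and filter taps per edge; the function name and strength value are made up.

```python
import numpy as np

def smooth_vertical_boundary(img: np.ndarray, x: int, strength: float = 0.25) -> None:
    """Toy deblocking: pull the two columns on either side of a vertical block
    edge at column x towards each other to soften the visible discontinuity."""
    left = img[:, x - 1].astype(float)
    right = img[:, x].astype(float)
    diff = right - left
    img[:, x - 1] = np.clip(left + strength * diff, 0, 255)
    img[:, x] = np.clip(right - strength * diff, 0, 255)
```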

Coding process

1. Video frames are collected and sent to the H.264 buffer, and the encoder divides each picture into macroblocks. In more complex regions, macroblocks can be further subdivided into smaller sub-blocks for processing.
2. Select intra-frame compression or inter-frame compression:
  2.1 Intra-frame compression: each pixel is predicted from already-coded pixels by a one-/two-dimensional prediction function; subtracting the predicted image from the original image gives the residual, and the prediction information is stored temporarily. Transform coding (DCT/KL transform, mainly to remove the spatial redundancy within a block, i.e. the correlation between pixels) is applied to the residual data. After the transform, quantization is still needed to change the signal from an infinite to a finite set of values, i.e. to discard information that is unnecessary for visual reconstruction. The data is then entropy coded (e.g. CABAC), a lossless compression method (this step mainly assigns short codes to frequently occurring data and long codes to rarely occurring data, and compresses based on context correlation). The compressed data is temporarily stored in NAL units.
  2.2 Inter-frame compression: after the encoder groups the frames, it takes out the first two frames of data from the buffer in sequence and scans them macroblock by macroblock. When an object is found in one frame and the same object is also found in the other frame, the motion vector is calculated. After the motion vector MV is obtained (motion estimation), the identical parts of the two frames are subtracted to obtain the compensation (residual) data, which is temporarily stored in NAL units; the motion vector and the compensation data are then sent to the decoder for decoding (see the sketch after this list).
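
A toy end-to-end sketch of the inter path in step 2.2, under made-up helper names and parameters: full-search motion estimation, motion-compensated prediction, and quantization of the residual. The transform and entropy-coding stages are omitted for brevity, and a real H.264 encoder works on sub-blocks with sub-pixel accuracy.

```python
import numpy as np

def encode_inter_block(cur: np.ndarray, ref: np.ndarray, top: int, left: int,
                       size: int = 16, search: int = 8, q_step: float = 8.0):
    """Toy inter coding of one block: motion estimation, motion-compensated
    prediction, then quantization of the residual."""
    block = cur[top:top + size, left:left + size].astype(float)
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):          # exhaustive search window
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + size <= ref.shape[0] and x + size <= ref.shape[1]:
                sad = np.abs(block - ref[y:y + size, x:x + size]).sum()
                if sad < best_sad:
                    best_sad, best_mv = sad, (dy, dx)
    dy, dx = best_mv
    pred = ref[top + dy:top + dy + size, left + dx:left + dx + size].astype(float)
    residual = block - pred
    levels = np.round(residual / q_step).astype(int)   # quantized residual
    return best_mv, levels                             # what would be entropy coded
```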

2. Decoder

1. The input is the compressed data stored in NAL units. After entropy decoding, a set of quantized transform coefficients X is obtained; inverse quantization and the inverse transform then yield the residual Dn'. Using the header information decoded from the bit stream, the decoder generates a prediction block PRED. Adding PRED to the residual Dn' produces the unfiltered reconstruction uFn'; after filtering, the final block Fn' is obtained, which is the decoded output image.
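
As a rough counterpart to this description, the sketch below reconstructs one block from a quantized residual and a prediction. Entropy decoding, the inverse transform and the deblocking filter are omitted, and the quantization step is a made-up stand-in for the QP-derived step size.

```python
import numpy as np

def decode_block(levels: np.ndarray, pred: np.ndarray, q_step: float = 8.0) -> np.ndarray:
    """Toy decoder for one block: inverse quantization recovers the approximate
    residual Dn', which is added to the prediction PRED (giving uFn') and clipped
    to the valid pixel range."""
    residual = levels * q_step                 # inverse quantization
    recon = pred.astype(float) + residual      # uFn' = PRED + Dn'
    return np.clip(np.round(recon), 0, 255).astype(np.uint8)
```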
