Notes: New Generation Video Compression Coding Standard H.264/AVC

Chapter One Introduction:
    Source model: 1. Waveform-based coding. Waveform-based coding uses a block-based hybrid coding method that combines predictive coding and transform coding.
                      2. Content-based coding. MPEG-4 adopts both a block-based hybrid coding method and a content-based coding method.
Chapter 2: Digital Video
    Sampling theorem: When the highest frequency of the input analog signal is fc, as long as the sampling frequency fs of the sampling pulse us(t) is not lower than 2fc, the original analog signal can always be recovered without distortion from the sampled discrete signal.
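A minimal sketch (not from the original notes) that simply checks the Nyquist condition fs >= 2fc; the function name and the BT.601-style figures in the example are only illustrative.

```python
def satisfies_nyquist(f_c_hz: float, f_s_hz: float) -> bool:
    """True if the sampling rate f_s is at least twice the highest
    signal frequency f_c (the condition in the sampling theorem)."""
    return f_s_hz >= 2.0 * f_c_hz

# Example: sampling a ~5.5 MHz analog luma signal at 13.5 MHz (BT.601-style rate)
print(satisfies_nyquist(f_c_hz=5.5e6, f_s_hz=13.5e6))  # True, since 13.5 MHz >= 11 MHz
```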
PCM: Pulse Code Modulation encoding.
The PAL color TV system adopted in China specifies 25 frames per second, while the NTSC color TV system used in the United States and Japan uses 30 frames per second.
The most commonly used objective quality metric is the Peak Signal-to-Noise Ratio (PSNR). Generally speaking, the higher the PSNR, the higher the video quality, and vice versa.
PSNR is computed from the Mean Squared Error (MSE) between the original and reconstructed frames.
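A short sketch of how MSE and PSNR are typically computed for 8-bit video (peak value 255); the function names are just for illustration.

```python
import numpy as np

def mse(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Mean squared error between two frames of the same shape."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original: np.ndarray, reconstructed: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB; higher values generally indicate better quality."""
    m = mse(original, reconstructed)
    if m == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / m)
```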
Chapter 3 Basic Principles of Video Compression Coding
Predictive coding: 1. Intra-frame predictive coding
                   2. Inter-frame predictive coding
Transform coding: 1. KL transform
                  2. Discrete Cosine Transform (DCT)
Comparison between transform coding and predictive coding: transform coding is more complex to implement, while predictive coding is relatively easy to implement; however, prediction errors propagate, whereas transform-coding errors do not spread and their influence is confined to a single block. In practice, a hybrid coding method is often used: motion-compensated inter-frame predictive coding is applied to the image first, and the DCT is then applied to the prediction residual signal.
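As an illustration of the transform step in the hybrid scheme (a sketch only; H.264 itself uses a 4x4/8x8 integer transform rather than a floating-point DCT), the code below applies a 2-D DCT to an 8x8 prediction residual block.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Hypothetical 8x8 current block and its motion-compensated prediction
current = np.random.randint(0, 256, (8, 8)).astype(np.float64)
prediction = np.clip(current + np.random.randn(8, 8) * 3.0, 0, 255)

residual = current - prediction                        # Dn = current - PRED
coefficients = dctn(residual, norm="ortho")            # forward 2-D DCT of the residual
restored_residual = idctn(coefficients, norm="ortho")  # inverse DCT (quantization omitted)

assert np.allclose(residual, restored_residual)        # lossless without quantization
```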
Entropy coding: Entropy coding, also known as statistical coding, uses the statistical characteristics of the source to compress the bit rate. The two kinds commonly used in video coding are variable-length coding (e.g., Huffman coding) and arithmetic coding.
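A compact Huffman-coding sketch (illustrative only; real codecs use code tables matched to the source statistics) showing how variable-length coding assigns shorter codewords to more frequent symbols.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code table {symbol: bitstring} from a symbol sequence."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate single-symbol source
        return {next(iter(freq)): "0"}
    # Each heap entry: (frequency, unique tie-breaker, {symbol: code_so_far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, i2, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, i2, merged))
    return heap[0][2]

print(huffman_code("aaaabbbccd"))  # the most frequent symbol gets the shortest code
```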
Chapter 4 Introduction to Video Coding Standards
Two famous organizations: MPEG (Moving Picture Experts Group) and ITU-T (International Telecommunication Union, Telecommunication Standardization Sector)
AVS standard (China's national audio and video coding standard)
Chapter 5 Principle of H.264/AVC Encoder
The basic part of H.264 supports 3 different profiles for different applications:
(1) Baseline profile: mainly used for real-time conversational video, such as videoconferencing, videophone, telemedicine, and distance learning.
(2) Extended profile: mainly used for network video streaming, such as video on demand.
(3) Main profile: mainly used for consumer electronics applications, such as digital TV broadcasting and digital video storage.
In inter-frame predictive coding, the predicted value PRED (denoted P) is obtained by motion compensation (MC) from previously coded reference pictures, where the reference picture is denoted F'n-1. To improve prediction accuracy and thus the compression ratio, the actual reference picture is selected from past or future (in display order) frames that have already been encoded, decoded, reconstructed, and filtered.
    After the prediction PRED is subtracted from the current block, a residual block Dn is generated. After block transformation and quantization, a set of quantized transform coefficients X is produced. After entropy coding, these coefficients, together with the header information required for decoding (prediction modes, quantization parameters, motion vectors, etc.), form a compressed bitstream, which is passed to the NAL (Network Abstraction Layer) for transmission and storage.
    The encoder must be able to reconstruct the image. Therefore, the residual D'n obtained by inverse quantization and inverse transformation is added to the prediction P to obtain uF'n (the unfiltered frame). To eliminate the noise generated in the encoding and decoding loop and improve the quality of the reference frame, and thus the performance of the compressed image, a loop filter is used. The filtered output F'n is the reconstructed image, which can be used as a reference picture.
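The forward and reconstruction paths described above can be sketched roughly as follows; transform, quantize and their inverses are placeholders for the real transform/quantization steps, not the actual H.264 routines.

```python
def encode_block(current, prediction, transform, quantize, dequantize, inv_transform):
    """Sketch of one block through a hybrid encoder's forward and reconstruction paths.

    current, prediction : the current block and its predicted value PRED
    returns             : quantized coefficients X and the unfiltered
                          reconstruction uF'n (used, after loop filtering, as a reference)
    """
    residual = current - prediction                               # Dn
    coefficients = quantize(transform(residual))                  # X = Q(T(Dn))
    # Decoder mirror inside the encoder:
    restored_residual = inv_transform(dequantize(coefficients))   # D'n
    unfiltered_recon = prediction + restored_residual             # uF'n, before loop filtering
    return coefficients, unfiltered_recon
```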
Frame and field
Frame coding should be used for pictures with little motion or static pictures, and field coding should be used for pictures with significant motion.
Macroblock, slice: A macroblock consists of a 16x16 block of luma samples plus one 8x8 Cb and one 8x8 Cr block of chroma samples.
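In 4:2:0 format that works out to 16x16 = 256 luma samples plus 64 Cb and 64 Cr samples per macroblock, 384 samples in total; a trivial check:

```python
MB_LUMA_SAMPLES = 16 * 16        # Y samples per macroblock
MB_CHROMA_SAMPLES = 8 * 8        # samples in each of Cb and Cr for 4:2:0
samples_per_macroblock = MB_LUMA_SAMPLES + 2 * MB_CHROMA_SAMPLES
print(samples_per_macroblock)    # 384
```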
An I slice contains only I macroblocks, which use previously decoded pixels in the current slice as the reference for intra prediction; decoded pixels in other slices cannot be used as a reference for intra prediction. A P slice contains P macroblocks and I macroblocks.
A P macroblock uses a previously coded picture as the reference picture for inter prediction, and an inter-coded macroblock can be further divided into macroblock partitions.
A B macroblock uses two reference pictures (bidirectional prediction) for inter prediction.
Profiles and levels
1. Baseline profile: supports intra and inter coding using I slices and P slices, and supports context-adaptive variable-length coding (CAVLC) for entropy coding; mainly used for real-time video communication such as videophone, videoconferencing, and wireless communication.
2. Main profile: supports interlaced video, B-slice inter coding, and weighted prediction for inter coding; supports CABAC; mainly used for digital broadcast TV and digital video storage.
3. Extended profile: supports efficient switching between bitstreams (SP and SI slices) and improved error resilience (data partitioning), but does not support interlaced video or CABAC.
Encoded data format
1 H.264 video format
H.264 supports encoding and decoding of 4:2:0 progressive or interlaced video.
The functionality of H.264 is divided into two layers: the video coding layer (VCL) and the network abstraction layer (NAL). The VCL data is the output of the encoding process and represents the compressed and encoded video data sequence.
Each NAL unit consists of a raw byte sequence payload (RBSP) and a set of NAL header information corresponding to the coded video data.
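For reference, the single NAL header byte packs forbidden_zero_bit (1 bit), nal_ref_idc (2 bits), and nal_unit_type (5 bits); a minimal parse of that byte (illustrative function name):

```python
def parse_nal_header(first_byte: int):
    """Split the one-byte NAL unit header into its three fields."""
    forbidden_zero_bit = (first_byte >> 7) & 0x01
    nal_ref_idc = (first_byte >> 5) & 0x03
    nal_unit_type = first_byte & 0x1F
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type

print(parse_nal_header(0x67))  # (0, 3, 7): nal_unit_type 7 is a sequence parameter set
```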
In H.264, up to 15 reference pictures can be used to select the best matching picture.
For the prediction of inter-coded macroblocks and macroblock partitions in P slices, the reference picture is selected from list 0; for the prediction of inter-coded macroblocks and macroblock partitions in B slices, the reference pictures are selected from list 0 and list 1.
5.3.5 Slices and slice groups
1. Slices: There are 5 different types of coded slices: I, P, B, SP, and SI.
Selection and coding of intra prediction mode
H.264 adopts a Lagrangian rate-distortion optimization (RDO) strategy to select the optimal coding mode: all possible coding modes are traversed, and the mode with the minimum rate-distortion cost is finally selected as the best intra prediction mode.
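The mode decision amounts to minimizing a Lagrangian cost J = D + lambda * R over the candidate modes; the sketch below uses a simplified cost model and made-up numbers, not the exact formula from the H.264 reference software.

```python
def best_mode(candidates, lagrange_multiplier):
    """Pick the candidate with the smallest rate-distortion cost J = D + lambda * R.

    candidates: iterable of (mode_name, distortion, rate_in_bits) tuples.
    """
    return min(candidates, key=lambda c: c[1] + lagrange_multiplier * c[2])

modes = [("Intra_4x4_vertical", 120.0, 96),
         ("Intra_4x4_horizontal", 150.0, 80),
         ("Intra_16x16_DC", 200.0, 40)]
print(best_mode(modes, lagrange_multiplier=0.85))  # ('Intra_4x4_vertical', 120.0, 96)
```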
5.5 Inter Prediction
There are three reference configurations for B-slice prediction: one forward and one backward reference, two forward references, or two backward references.
5.6 SP/SI Technology
The basic principle of SP-frame coding is similar to that of P frames: it is still based on motion-compensated inter-frame predictive coding. The difference between the two is that an SP frame can reconstruct the same image frame from different reference frames. Exploiting this property, SP frames can replace I frames and are widely used in applications such as switching between streams, splicing, random access, fast forward and rewind, and error recovery.
SI frame: Spatial prediction frame.
When the contents of the video streams are the same but the encoding parameters differ, it is more effective to use SP frames; when the contents of the video streams are very different, SI frames are more effective.
5.8 CAVLC: Context-Adaptive Variable-Length Coding (an entropy coding method).
Entropy coding is a lossless compression coding method: the code stream it generates can be decoded to recover the data without distortion. Entropy coding is based on the statistical properties of random processes.
5.9 CABAC: Context-Adaptive Binary Arithmetic Coding
Conclusion: Compared with other mainstream entropy coding methods, CABAC has higher coding efficiency. In measurements on a set of video sequences at a quality of 28-40 dB, applying CABAC further reduces the bit rate by 9%-14%.
5.11 Deblocking filter: Block artifacts have two sources. First, the quantization of DCT transform coefficients is relatively coarse, so the coefficients recovered by inverse quantization contain errors, which cause visible discontinuities at image block boundaries. The second source is motion-compensated prediction.
5.12 IDR pictures generally consist of I slices or SI slices. When it receives an IDR picture, the decoder immediately marks the pictures in the buffer as "unused for reference", and subsequent slices are coded without referring to pictures before the IDR. Usually the first picture of a coded video sequence is an IDR picture.

Chapter 6 Syntax and Semantics of H.264

In H.264, the biggest difference in the hierarchical structure is that the sequence layer and the picture layer are removed; most of the syntax elements that originally belonged to the sequence and picture headers are separated out to form two levels of parameter sets, the sequence parameter set and the picture parameter set, and the rest are placed in the slice layer. A parameter set is an independent data unit.

The encoder sends a new parameter set only when it decides that the contents of the parameter set need to be updated. Because the parameter set is independent, it can be retransmitted multiple times or protected with special techniques.

The first picture of a sequence is called an IDR picture (Instantaneous Decoding Refresh picture), and an IDR picture must be an I picture. H.264 introduces the IDR picture for decoder resynchronization: when an IDR picture is encountered, the reference picture queue is cleared, all decoded data is output or discarded, the parameter sets are located again, and a new sequence is started. In this way, if a major error occurred in the previous sequence, such as severe packet loss or other causes of data misalignment, decoding can be resynchronized here. Pictures after an IDR picture never refer to data from pictures before the IDR for decoding.

The difference between an IDR picture and an I picture is that an IDR picture must be an I picture, but an I picture is not necessarily an IDR picture. There can be many I pictures in a sequence, and pictures after an I picture may still use pictures before that I picture as references for motion compensation.

In order to improve coding efficiency, H.264 transmits the actual width of the picture minus 1.
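For example, the sequence parameter set carries pic_width_in_mbs_minus1 (the width in macroblocks minus 1); ignoring frame cropping, a decoder recovers the luma width roughly as follows:

```python
def luma_width_from_sps(pic_width_in_mbs_minus1: int) -> int:
    """Luma picture width in samples, ignoring any frame cropping offsets."""
    return (pic_width_in_mbs_minus1 + 1) * 16

print(luma_width_from_sps(43))  # 44 macroblocks -> 704 samples
```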

When the data is stored on a medium, a start code is added before each NAL unit: 0x000001.
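A small sketch that scans an Annex B byte stream for the 0x000001 start code to locate NAL units; a real parser would also handle the 4-byte 0x00000001 form and emulation-prevention bytes, which are omitted here.

```python
def find_nal_offsets(data: bytes):
    """Return the offset of the byte right after each 0x000001 start code."""
    offsets = []
    i = 0
    while True:
        i = data.find(b"\x00\x00\x01", i)
        if i < 0:
            break
        offsets.append(i + 3)   # the NAL unit begins immediately after the start code
        i += 3
    return offsets

stream = b"\x00\x00\x01\x67" + b"\x00" * 4 + b"\x00\x00\x01\x68"
print(find_nal_offsets(stream))  # [3, 11]
```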

Chapter 10 Scalable Coding of H.264

The scalability of video coding includes temporal scalability, spatial scalability, and quality scalability.

Temporal scalability: the video stream is decomposed into layers representing different frame rates; the base layer carries the information for the lowest frame rate, and the frame rate increases as the number of layers increases, so that users can watch a smoother, more coherent picture.

Spatial scalability: the video stream is decomposed into layers representing different resolutions; the base layer carries the information for the lowest resolution, and the resolution increases as the number of layers increases, so that users can watch a more detailed picture.

Quality scalability: the pixel values are decomposed into different levels; in the base layer each pixel is represented with only a few bits and the picture quality is coarse, and as the number of layers increases the number of bits per pixel also increases, so richer image content can be displayed.


Origin blog.csdn.net/Doubao93/article/details/118259383