H264 encoding principle

Preface

H264 is a new generation of video coding standard. It is known for high compression at high quality and for supporting streaming media transmission over many kinds of networks. As I understand it, the theoretical basis of its coding is statistical: over a short period of time, consecutive images typically differ in no more than about 10% of their pixels, with brightness differences of no more than 2% and chrominance differences within 1%. So for images that change little, we can first encode a complete image frame A, and then for the following frame B encode only its difference from frame A instead of the whole image (A, B and C here are just labels, not frame types). Frame B then takes 1/10 or less of the size of a complete frame. If frame C after frame B also changes little, we can encode C with reference to B in the same way, and so on. Such a run of images is called a sequence (a sequence is a piece of data with the same characteristics). When an image changes so much from the previous one that it cannot be generated by referring to the previous frame, we end the current sequence and start the next one: a complete frame A1 is generated for this image, and subsequent images are generated with reference to A1, writing only their differences from A1.
   Three types of frames are defined in the H264 protocol. A fully encoded frame is called an I frame; a frame that is generated by referring to a preceding I or P frame and contains only the encoded difference is called a P frame; and a frame encoded with reference to both preceding and following frames is called a B frame.
   The core algorithms used by H264 are intra-frame compression and inter-frame compression. Intra-frame compression is the algorithm for generating I frames, and inter-frame compression is the algorithm for generating B and P frames.

 

Description of the sequence
    In H264, images are organized in units of sequences. A sequence is the data stream produced by encoding a run of images; it starts with an I frame and runs up to (but does not include) the I frame that starts the next sequence.
    The first image in a sequence is called an IDR image (Instantaneous Decoding Refresh image), and IDR images are always I-frame images. H.264 introduces the IDR image for decoding resynchronization. When the decoder decodes an IDR image, it immediately clears the reference frame queue, outputs or discards all the decoded data, searches for the parameter set again, and starts a new sequence. In this way, if there was a major error in the previous sequence, the decoder gets a chance to resynchronize here. Images after the IDR image are never decoded using data from images before the IDR.
    A sequence is the data stream generated by encoding a segment of images whose content does not differ too much. When there is little motion, a sequence can be very long, because little motion means the image content changes little, so you can encode one I frame and then keep producing P and B frames. When there is a lot of motion, a sequence may be relatively short; for example, it may contain one I frame and 3 or 4 P frames.
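
To make the idea of sequence boundaries concrete, here is a minimal Python sketch. It assumes frames are flat lists of pixel values and starts a new sequence whenever more than half of the pixels differ from the previous frame. The function names and the 0.5 threshold are my own illustrative assumptions; a real H264 encoder decides where to place I frames with much more elaborate logic.

```python
def changed_fraction(prev, curr):
    """Fraction of pixel positions whose values differ between two frames."""
    changed = sum(1 for a, b in zip(prev, curr) if a != b)
    return changed / len(curr)


def split_into_sequences(frames, threshold=0.5):
    """Group frame indices into sequences.

    A new sequence starts at the first frame and whenever a frame differs
    from the previous one in more than `threshold` of its pixels.
    The threshold is a hypothetical value chosen for this example.
    """
    sequences = []
    for i, frame in enumerate(frames):
        if i == 0 or changed_fraction(frames[i - 1], frame) > threshold:
            sequences.append([i])    # this frame would be coded as an I frame
        else:
            sequences[-1].append(i)  # this frame would be coded as a difference
    return sequences


frames = [
    [10, 10, 10, 10],
    [10, 10, 10, 11],
    [10, 10, 12, 11],
    [90, 80, 70, 60],  # large change: start a new sequence
    [90, 80, 70, 61],
]
sequences = split_into_sequences(frames)
print(sequences)  # [[0, 1, 2], [3, 4]]
```

With little motion the first list would simply keep growing; a scene change is what cuts it short.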

 

Description of the three frames

1. I frame
I frame: intra-coded frame. The I frame is the key frame; you can think of it as a complete record of this picture. Decoding needs only the data of this frame (because it contains the complete picture).
Features of I frame:
1) It is a full-frame compression coded frame. The full frame image information is compressed in a JPEG-like way and transmitted;
2) Only the I frame's own data is needed to reconstruct the complete image when decoding;
3) I frame describes the details of the image background and moving subjects;
4) I frame is generated without reference to other pictures;
5) I frame is the reference frame of P frame and B frame (its quality directly affects the quality of subsequent frames in the same group);
6) I frame is the basic frame of the frame group GOP (the first frame); there is only one I frame in a group;
7) I frames do not need to consider motion vectors;
8) I frames occupy a relatively large amount of data.

2. P frame 

P frame: forward predictive coded frame. The P frame represents the difference between this frame and a previous key frame (or P frame). When decoding, the difference defined by this frame is superimposed on the previously buffered picture to generate the final picture. (That is, it is a difference frame: a P frame has no complete picture data, only the data that differs from the previous picture.)
P frame prediction and reconstruction: a P frame uses an I frame as its reference frame. The encoder finds, in the I frame, the predicted value and motion vector for "a certain point" of the P frame, and transmits the prediction difference together with the motion vector. At the receiving end, the decoder uses the motion vector to find the predicted value of that "certain point" in the I frame and adds the difference to obtain the sample value of that point in the P frame, thereby obtaining the complete P frame.
P frame characteristics:
1) P frame is an encoded frame separated by 1~2 frames after I frame;
2) P frame uses motion compensation and transmits the difference (prediction error) between it and the previous I or P frame, together with the motion vector;
3) During decoding, the predicted value from the reference frame must be summed with the prediction error to reconstruct the complete P frame image;
4) The P frame belongs to the inter-frame coding of forward prediction. It refers only to the closest I frame or P frame in front of it;
5) The P frame can be the reference frame of the P frame after it, or the reference frame of the B frames before and after it;
6) Since the P frame is a reference frame, it may cause decoding errors to propagate;
7) Because only differences are transmitted, the compression ratio of P frames is relatively high.
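
The prediction and reconstruction described above can be sketched in Python for a single block. This is only an illustration under my own assumptions: frames are small 2D lists, the motion search is a brute-force comparison by sum of absolute differences (SAD) within one pixel, and all function names are hypothetical. Real H264 encoders use much more refined search and then transform and quantize the difference.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block out of a 2D frame."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def find_motion_vector(reference, current, top, left, size, search=1):
    """Brute-force search for the offset (dy, dx) whose reference block fits best."""
    actual = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate block would fall outside the frame
            cost = sad(actual, get_block(reference, t, l, size))
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best[1]


def encode_block(reference, current, top, left, size, mv):
    """Encoder side: the difference between the real block and its prediction."""
    predicted = get_block(reference, top + mv[0], left + mv[1], size)
    actual = get_block(current, top, left, size)
    return [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(actual, predicted)]


def decode_block(reference, residual, top, left, size, mv):
    """Decoder side: prediction found via the motion vector, plus the difference."""
    predicted = get_block(reference, top + mv[0], left + mv[1], size)
    return [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(predicted, residual)]


reference = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
# The content has moved one pixel to the right, and one pixel value changed.
current = [[1, 1, 2, 3], [5, 5, 7, 7], [9, 9, 10, 11], [13, 13, 14, 15]]

mv = find_motion_vector(reference, current, top=0, left=1, size=2)
residual = encode_block(reference, current, 0, 1, 2, mv)
rebuilt = decode_block(reference, residual, 0, 1, 2, mv)
print(mv)        # (0, -1)
print(residual)  # [[0, 0], [0, 1]]
print(rebuilt)   # [[1, 2], [5, 7]]
```

Only the motion vector and the mostly zero difference need to be transmitted, which is where the saving comes from.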

3. B frame

B frame: bidirectionally predicted, interpolated coded frame. The B frame is a two-way difference frame: it records the difference between the current frame and both the previous and the following frames (the details are more complicated, there are 4 cases, but this simplification will do). In other words, to decode a B frame you need not only the previously buffered picture but also the decoded picture that follows it; the final picture is obtained by superimposing the data of the current frame on the preceding and following pictures. The B-frame compression ratio is high, but decoding is heavier on the CPU.
Prediction and reconstruction of B frames: a B frame uses the previous I or P frame and the following P frame as reference frames. The encoder "finds" the predicted value and two motion vectors for "a certain point" of the B frame, and transmits the prediction difference together with the motion vectors. The receiving end "finds (calculates)" the predicted value in the two reference frames according to the motion vectors and sums it with the difference to obtain the sample value of that point in the B frame, thereby obtaining the complete B frame.
B frame characteristics:
1) B frame is predicted from the previous I or P frame and the following P frame;
2) B frame transmits the prediction error between it and the previous I or P frame and the following P frame, together with the motion vectors;
3) B frame is a two-way predictive coding frame;
4) B frame has the highest compression ratio, because it only reflects the changes of the moving subject between the two reference frames, so the prediction is more accurate;
5) In this scheme the B frame is not used as a reference frame, so it does not cause decoding errors to propagate.
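
As a rough illustration of why two-way prediction fits better, the Python sketch below predicts each sample of a B frame as the rounded average of the previous and following reference frames and codes only the difference. It is a simplification under my own assumptions: frames are flat lists, motion vectors are left out (zero motion), and averaging stands in for just one of the several cases mentioned above. The function names are hypothetical.

```python
def predict_bidirectional(prev_ref, next_ref):
    """Predict each sample as the rounded average of the two reference frames."""
    return [(a + b + 1) // 2 for a, b in zip(prev_ref, next_ref)]


def encode_b(actual, prev_ref, next_ref):
    """Encoder side: keep only the difference from the two-way prediction."""
    predicted = predict_bidirectional(prev_ref, next_ref)
    return [x - p for x, p in zip(actual, predicted)]


def decode_b(residual, prev_ref, next_ref):
    """Decoder side: both reference frames must already be decoded."""
    predicted = predict_bidirectional(prev_ref, next_ref)
    return [p + r for p, r in zip(predicted, residual)]


prev_ref = [10, 20, 30, 40]  # earlier I or P frame
next_ref = [20, 30, 40, 50]  # later P frame
actual = [15, 25, 35, 46]    # the frame in between

residual = encode_b(actual, prev_ref, next_ref)
forward_only = [x - p for x, p in zip(actual, prev_ref)]
print(residual)      # [0, 0, 0, 1]
print(forward_only)  # [5, 5, 5, 6]
print(decode_b(residual, prev_ref, next_ref) == actual)  # True
```

The two-way difference is almost all zeros, while the forward-only difference is not, which matches the claim that B frames compress best.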

Note: I, B, and P frames are defined artificially according to the needs of the compression algorithm; they are all real physical frames. Generally speaking, the compression ratio of an I frame is about 7 (similar to JPG), that of a P frame about 20, and a B frame can reach 50. So using B frames saves a lot of space, and the saved space can be spent on more I frames, giving better picture quality at the same bit rate.
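
A quick back-of-the-envelope calculation in Python shows the effect of these ratios on a whole group. The ratios 7, 20 and 50 are the rough figures quoted in the note; the two 7-frame patterns are hypothetical examples of mine, not something H264 prescribes.

```python
# Rough per-frame compression ratios quoted in the note above.
RATIO = {"I": 7, "P": 20, "B": 50}


def average_ratio(pattern):
    """Overall compression ratio of a group of frames.

    Sizes are measured in units of one uncompressed frame, so the result
    does not depend on the resolution.
    """
    compressed = sum(1 / RATIO[frame_type] for frame_type in pattern)
    return len(pattern) / compressed


print(round(average_ratio("IPPPPPP"), 1))  # 15.8
print(round(average_ratio("IBBPBBP"), 1))  # 21.7
```

Under these assumed figures, swapping four of the P frames for B frames raises the overall ratio of the group from about 15.8 to about 21.7.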

 

Description of the compression algorithm
The compression method of H264:
1. Grouping: divide several frames of images into one group (GOP, that is, a sequence). To keep the motion change within a group small, the number of frames should not be too large.
2. Defining frames: define each frame image in each group as one of three types, namely I frame, B frame and P frame;
3. Predicting frames: use the I frame as the basic frame, predict P frames from the I frame, and then predict B frames from the I and P frames;
4. Data transmission: finally, the I frame data and the prediction difference information are stored and transmitted.
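
The grouping and frame-type steps above can be sketched in Python. The sketch labels the frames of one group in display order and then works out an order in which they could be coded, given that a B frame needs the reference frame that follows it. The fixed pattern of two B frames between references and the function names are assumptions of mine for illustration; real encoders choose frame types adaptively.

```python
def assign_frame_types(gop_size, b_frames=2):
    """Label the frames of one group in display order.

    The first frame is I; after that, every (b_frames + 1)-th frame and the
    last frame are P, and the frames in between are B.
    """
    types = []
    for i in range(gop_size):
        if i == 0:
            types.append("I")
        elif i % (b_frames + 1) == 0 or i == gop_size - 1:
            types.append("P")
        else:
            types.append("B")
    return types


def coding_order(types):
    """Order frame indices so every B frame comes after both of its references."""
    order, pending_b = [], []
    for i, frame_type in enumerate(types):
        if frame_type == "B":
            pending_b.append(i)      # wait for the next I or P frame
        else:
            order.append(i)          # the reference frame goes first
            order.extend(pending_b)  # then the B frames that depend on it
            pending_b = []
    order.extend(pending_b)          # trailing B frames, if a pattern has any
    return order


types = assign_frame_types(7)
print("".join(types))       # IBBPBBP
print(coding_order(types))  # [0, 3, 1, 2, 6, 4, 5]
```

The second line of output shows why a decoder has to buffer frames: frame 3 is decoded before frames 1 and 2 even though it is displayed after them.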
    Intra-frame compression is also called spatial compression. When compressing a frame of image, only the data of the current frame is considered, without considering the redundant information between adjacent frames, which is actually similar to still image compression. Intra-frame compression generally uses a lossy algorithm. Since intra-frame compression encodes a complete image, the result can be decoded and displayed independently. Intra-frame compression generally does not achieve a high compression ratio; it is similar to encoding a JPEG.
    The principle of inter-frame compression is that the data of several adjacent frames is highly correlated; in other words, the information changes little from one frame to the next. That is, continuous video has redundant information between adjacent frames. Compressing this redundancy between adjacent frames further reduces the amount of data and raises the compression ratio. Inter-frame compression is also called temporal compression: it compresses data by comparing different frames along the time axis. Taking the difference between frames does not by itself lose any information, although in H264 the differences are then quantized, which is lossy. The frame differencing algorithm is a typical temporal compression method. It compares the current frame with its adjacent frames and records only the differences, which can greatly reduce the amount of data.
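
Frame differencing in its simplest form can be sketched in a few lines of Python. This is a toy version under my own assumptions: a frame is a flat list of pixel values and the difference is stored as a list of (position, new value) pairs. H264 itself works on motion-compensated blocks, not on individual pixels like this.

```python
def frame_diff(prev, curr):
    """Record only the positions whose values changed, with their new values."""
    return [(i, c) for i, (p, c) in enumerate(zip(prev, curr)) if p != c]


def apply_diff(prev, diff):
    """Rebuild the current frame from the previous frame and the recorded changes."""
    frame = list(prev)
    for position, value in diff:
        frame[position] = value
    return frame


frame_a = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
frame_b = [10, 10, 10, 10, 10, 10, 10, 12, 10, 10]

diff = frame_diff(frame_a, frame_b)
print(diff)                                  # [(7, 12)]
print(apply_diff(frame_a, diff) == frame_b)  # True
```

One changed pixel out of ten means one stored pair in place of ten values, and the rebuilt frame is identical to the original.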
      By the way, a word on lossy and lossless compression. Lossless compression means that the data before compression and after decompression are exactly the same. A typical simple lossless method is run-length encoding (RLE). Lossy compression means that the data after decompression is not identical to the data before compression. In the compression process, some image or audio information that human eyes and ears are not sensitive to is discarded, and the discarded information cannot be recovered. Almost all high-compression algorithms use lossy compression in order to reach a low data rate. The amount of data lost is related to the compression ratio: the higher the compression ratio, the more data is lost, and the worse the decompressed result generally is. In addition, some lossy compression algorithms compress repeatedly, which causes additional data loss.
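
The difference between the two kinds of compression can be shown with a small Python sketch. The RLE part is lossless; the quantization part is a stand-in I chose to illustrate lossy compression and is not taken from the text above. The sample values and step sizes are made up for the example.

```python
def rle_encode(data):
    """Run-length encode a list into [value, run_length] pairs (lossless)."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original list."""
    return [value for value, count in runs for _ in range(count)]


def quantize(data, step):
    """Lossy step: map each value to the index of its nearest multiple of step."""
    return [(value + step // 2) // step for value in data]


def dequantize(levels, step):
    """Map level indices back to values; the discarded detail does not come back."""
    return [level * step for level in levels]


samples = [12, 13, 12, 14, 40, 41, 40, 42]

# Lossless: the round trip returns exactly the input.
print(rle_decode(rle_encode(samples)) == samples)  # True

# Lossy: a coarser step gives fewer runs to store but a larger error.
for step in (4, 8):
    levels = quantize(samples, step)
    restored = dequantize(levels, step)
    error = sum(abs(a - b) for a, b in zip(samples, restored))
    print(step, len(rle_encode(levels)), error)
# prints "4 4 6" and then "8 2 16"
```

The raw samples need 8 runs, a step of 4 brings that down to 4 runs with a total error of 6, and a step of 8 brings it down to 2 runs with a total error of 16: more compression, more loss.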

Origin blog.csdn.net/u010868213/article/details/108771260