H.264 coding principles and basic concepts: I frame, P frame, B frame

1. I frame, P frame, B frame:
A frame is the basic unit of a video image; a video file is composed of many consecutive frames.
The key frame, also called the I frame, is an important frame in inter-frame compression coding. It is a full-frame (intra) compressed frame: the complete image can be reconstructed from the I-frame data alone during decoding, and the I frame is generated without reference to any other picture.

Compression method used: divide consecutive frames into groups (GOP):
1. Frame classification: the frames within each group are classified into three types, namely I frames, P frames, and B frames;
2. Prediction: the I frame is the base frame; the I frame is used to predict the P frames, and then the I frame and the P frames are used to predict the B frames;
3. Data transmission: only the I-frame data and the prediction difference information are stored and transmitted (a rough sketch of this grouping follows below);
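
As a rough illustration of this grouping, the sketch below assigns a frame type to each frame index of a hypothetical GOP pattern; the GOP length and the number of B frames between references are invented parameters, not values mandated by H.264.

```python
# Minimal sketch: assign I/P/B types over a hypothetical GOP pattern.
# gop_size and b_frames are illustrative choices, not values fixed by H.264.
def frame_types(num_frames, gop_size=12, b_frames=2):
    types = []
    for i in range(num_frames):
        pos = i % gop_size
        if pos == 0:
            types.append("I")            # start of a GOP: intra-coded frame
        elif pos % (b_frames + 1) == 0:
            types.append("P")            # forward-predicted reference frame
        else:
            types.append("B")            # bidirectionally predicted frame
    return types

print("".join(frame_types(24)))          # IBBPBBPBBPBBIBBPBBPBBPBB
```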

(1) I frame (Intra frame / IDR frame)
The I frame, also known as an intra picture, is typically the first frame of each GOP (as used in MPEG video compression techniques). After moderate compression, it serves as a reference point for random access and can be used as a standalone image.
I-frame features:
1. It is a full-frame intra-coded frame; the complete image can be reconstructed from the I-frame data alone during decoding;
2. An I frame is generated without reference to other frames, and it carries a relatively large amount of data;
3. The I frame is the first frame of a GOP and serves as the reference frame for the P frames and B frames; its quality directly affects the P frames and B frames in the same group;
4. I frames can only remove the spatial redundancy within the image;

(2) P frame (Predictive-coded frame)
The P-frame method compresses a frame's data according to its difference from the nearest preceding frame (an I frame or a P frame).
Compressing P frames jointly with I frames achieves a higher compression ratio without obvious compression artifacts.
A P frame uses an I frame (or an earlier P frame) as its reference: for each point in the P frame, a prediction value and a motion vector are found in the reference frame, and the prediction difference is transmitted together with the motion vector.
At the receiving end, the decoder uses the motion vector to locate the corresponding sample in the reference frame and adds the transmitted difference to it, thereby reconstructing the complete P frame.
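
A minimal numeric sketch of this reconstruction step, assuming a single pixel, an invented reference frame, an invented motion vector, and an invented residual value:

```python
import numpy as np

# Hypothetical 4x4 reference frame (e.g. a decoded I frame); values are invented.
ref = np.arange(16, dtype=np.int16).reshape(4, 4)

# For one P-frame pixel at (y, x), the encoder sends a motion vector (dy, dx)
# pointing into the reference frame plus the prediction residual.
y, x = 2, 1
dy, dx = -1, 1          # motion vector (illustrative)
residual = 3            # prediction difference (illustrative)

prediction = ref[y + dy, x + dx]        # sample fetched from the reference frame
reconstructed = prediction + residual   # decoder: prediction + residual
print(prediction, reconstructed)        # prediction = 6, reconstructed = 9
```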

Features of P frames:
1. A P frame uses forward-predicted inter-frame coding; it only refers to the nearest preceding I frame or P frame;
2. During decoding, the prediction value taken from the reference frame must be added to the prediction error to reconstruct the complete P-frame image;
3. Since a P frame is itself a reference frame, a decoding error in it can propagate to later frames;
4. P frames remove the temporal redundancy of the image;

(3) B frame (Bidirectional Frame)
A B frame is an encoded picture that exploits the temporal redundancy between the already-encoded frames before it and the already-encoded frames after it in the source sequence, further reducing the amount of data to be transmitted; it is also called a bidirectionally predicted frame. In general, the I frame has the lowest compression efficiency, the P frame is higher, and the B frame is the highest.

B-frame features:
1. A B frame is a bidirectionally predicted frame; it is predicted from the preceding I frame or P frame and from the following P frame (see the sketch after this list);
2. A B frame records the prediction error and motion vectors relative to both the preceding and following frames;
3. A B frame is not a reference frame, so it does not cause decoding errors to propagate;
4. B frames remove the temporal redundancy of the image;
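
For the bidirectional case, a simplified way to form the prediction is to combine the samples from the forward and backward references, for example by averaging, and then add the residual; the pixel values below are invented for illustration, and real H.264 also allows weighted combinations:

```python
import numpy as np

# Invented co-located pixel values from the previous and next reference frames.
forward_ref  = np.array([[100, 102], [104, 106]], dtype=np.int16)
backward_ref = np.array([[110, 112], [114, 116]], dtype=np.int16)
residual     = np.array([[  2,  -1], [  0,   3]], dtype=np.int16)

# Simplified bidirectional prediction: average of both references,
# then add the transmitted residual to reconstruct the B-frame block.
prediction    = (forward_ref + backward_ref) // 2
reconstructed = prediction + residual
print(reconstructed)
```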

2. Pixel, macroblock, slice, frame, sequence:
Composition relationship:
Pixel ----> Macroblock ----> Slice ----> Slice Group ----> Frame ----> Sequence (GOP)
(1) Pixel:
A pixel is one of the smallest, indivisible units of an image represented as a sequence of numbers. Indivisible means that it cannot be divided into smaller units or elements; it exists as a small grid cell of a single color.
Each bitmap image contains a certain number of pixels, and these pixels determine the size of the image on the screen.

The number of different colors that a pixel can represent depends on its bits per pixel (bpp).
This maximum number is obtained by raising two to the power of the color depth.
For example, common values are:
8 bpp [2^8 = 256 colors];
16 bpp [2^16 = 65,536 colors, called high color];
24 bpp [2^24 = 16,777,216 colors, called true color];
48 bpp [2^48 = 281,474,976,710,656 colors, used in many professional scanners].
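
A quick check of the 2^bpp relationship for the bit depths listed above:

```python
# Number of representable colors is 2 raised to the bits-per-pixel value.
for bpp in (8, 16, 24, 48):
    print(f"{bpp:2d} bpp -> {2 ** bpp:,} colors")
```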

(2) Macroblock:
The macroblock is the basic unit of H.264 encoding. A frame of data must first be divided into blocks (4x4 pixels) before it can be processed, so a macroblock is composed of several such blocks; the usual macroblock size is 16x16 pixels (a partitioning sketch follows after the list below).
Macroblocks are divided into I, P, and B macroblocks:
I macroblock: (i.e. a macroblock of an I frame) can only use already-decoded pixels within the current slice as a reference for intra prediction;
P macroblock: (i.e. a macroblock of a P frame) can use previously decoded frames as reference frames for inter prediction;
B macroblock: (i.e. a macroblock of a B frame) uses both the preceding and following frames as references for bidirectional inter prediction;
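
To make the 16x16 partitioning concrete, this small sketch splits a luma plane into macroblocks; it assumes the frame dimensions are multiples of 16 and ignores the padding a real encoder would apply:

```python
import numpy as np

MB_SIZE = 16  # macroblock size in pixels (16x16 luma samples)

def split_into_macroblocks(frame):
    """Yield (mb_row, mb_col, 16x16 block) for a frame whose sides are multiples of 16."""
    h, w = frame.shape[:2]
    for y in range(0, h, MB_SIZE):
        for x in range(0, w, MB_SIZE):
            yield y // MB_SIZE, x // MB_SIZE, frame[y:y + MB_SIZE, x:x + MB_SIZE]

# A 1280x720 luma plane contains (1280/16) * (720/16) = 80 * 45 = 3600 macroblocks.
frame = np.zeros((720, 1280), dtype=np.uint8)
print(sum(1 for _ in split_into_macroblocks(frame)))   # 3600
```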

(3) Slice:
A slice is carried in exactly one NALU.
A frame of the image can be encoded into one or more slices, and each slice contains an integer number of macroblocks:
at least one macroblock, and at most all of the macroblocks of the entire frame.
The purpose of slices is to limit the spread of bit errors during transmission and to keep the coded slices independent of each other.
Slice structure:
[slice header] indicates the slice type, which frame it belongs to, reference frame, etc.
[slice data] contains an integer number of macroblocks
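
The slice layout described above can be sketched as a simple data structure; the field names here are illustrative stand-ins, not the actual H.264 syntax element names:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative stand-ins for the slice layout; not the real H.264 syntax elements.
@dataclass
class SliceHeader:
    slice_type: str          # "I", "P" or "B"
    frame_num: int           # which frame this slice belongs to
    ref_frame_idx: int       # reference frame index used by this slice

@dataclass
class Slice:
    header: SliceHeader
    macroblocks: List[bytes] = field(default_factory=list)  # integer number of MBs

s = Slice(SliceHeader(slice_type="P", frame_num=7, ref_frame_idx=0))
s.macroblocks.append(b"\x00" * 384)   # one dummy coded macroblock payload
print(s.header.slice_type, len(s.macroblocks))
```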

(4) Slice Group:
A slice group is a subset of the macroblocks of a coded picture, consisting of one or several slices.

(5) Frame:
A frame is one complete picture's worth of image information, composed of slice groups; it can be understood as one of the pictures we normally see.
The frame is also the smallest unit that constitutes a video stream, and frames are divided into I frames, P frames, and B frames.
Note: the pictures we normally deal with are in "RGB" format, while video frames are usually in "YUV" format.

(6) Sequence (GOP):
A sequence is the data stream generated by encoding a segment of pictures whose content changes little, that is, a group of video frames between two I frames.
For such a segment with little change, we can first encode a complete image, frame A; the subsequent B frames do not encode the whole image but only record their differences from frame A,
so the size of a B frame can be 1/10 of a complete frame or even less. If frame C after frame B also changes little, we can continue by encoding frame C with reference to frame B, and so on.
Such a segment of pictures is called a sequence (a sequence is a run of data with the same characteristics). When a picture differs greatly from the previous one and can no longer be generated by referring to the preceding frames,
the previous sequence is ended and the next sequence begins: a complete frame A1 is generated for this picture, and subsequent pictures are generated with reference to A1, recording only their differences from A1.
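
A toy version of this idea, assuming grayscale frames held as NumPy arrays: the first frame is stored in full, later frames are stored as differences from the previous frame, and a new sequence is started when the average difference exceeds an arbitrary threshold:

```python
import numpy as np

def encode_sequences(frames, threshold=20.0):
    """Toy GOP-style encoder: full frames where change is large, deltas otherwise."""
    encoded, prev = [], None
    for f in frames:
        f = f.astype(np.int16)
        if prev is None or np.abs(f - prev).mean() > threshold:
            encoded.append(("FULL", f.copy()))      # start a new sequence ("A1")
        else:
            encoded.append(("DIFF", f - prev))      # store only the difference
        prev = f
    return encoded

frames = [np.full((4, 4), v) for v in (100, 101, 103, 200)]  # invented pixel data
print([kind for kind, _ in encode_sequences(frames)])        # ['FULL', 'DIFF', 'DIFF', 'FULL']
```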

3. Image spatial redundancy and temporal redundancy:
Spatial redundancy:
This is the most important type of data redundancy in still images; I-frame coding can only remove the spatial redundancy of the image.
The colors of sampled points on the surface of the same scene are often spatially continuous, but representing object colors by discrete pixel samples normally does not take advantage of this continuity.
For example: if an image contains a continuous region in which all pixels have the same color, that region is spatially redundant.
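
Such a run of identical pixels can be stored much more compactly; the sketch below uses simple run-length coding purely to illustrate spatial redundancy (H.264 itself removes it with intra prediction and a transform, not run-length coding):

```python
# Illustration of spatial redundancy only; H.264 removes it with intra
# prediction plus a transform, not with run-length coding.
def run_length_encode(pixels):
    runs, count = [], 1
    for prev, cur in zip(pixels, pixels[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append((prev, count))
            count = 1
    runs.append((pixels[-1], count))
    return runs

row = [255] * 12 + [0] * 4          # a uniform region followed by a dark strip
print(run_length_encode(row))       # [(255, 12), (0, 4)]
```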

Temporal redundancy:
This is the redundancy commonly found in image sequences; P frames and B frames remove the temporal redundancy of the image.
There is often temporal correlation between a group of consecutive pictures, but representing moving images as samples taken at discrete time points normally does not take advantage of this coherence.
For example: two people are chatting in a room. Throughout the chat the background (the room and the furniture) stays the same and nothing in it moves,
and it is the same two people talking; only their movements and positions change.

4. Factors affecting video encoding quality: compression ratio, algorithm complexity, and degree of restoration:
Compression ratio: the larger the compression ratio, the smaller the amount of compressed data, but the higher the complexity of the algorithm;
the smaller the compression ratio, the larger the amount of compressed data and the lower the complexity of the algorithm.
Degree of restoration: both the compression ratio and the algorithm complexity affect how faithfully the video can be restored.
Compression ratio, algorithm complexity, and degree of restoration are three mutually conflicting goals; in practice some trade-off must be made among them.

Origin blog.csdn.net/yanghangwww/article/details/103658891