Common concepts of H264

H.264 should be a familiar technical term for many readers, but what exactly is it?


H.264 is a video coding standard.

As for the spelling of terms, Xiaocheng treats whatever is easiest to understand as the standard, so H264 and H.264 are used interchangeably.

This article introduces the common concepts of H264.

Fair warning: this article is rather dry, and readers are free to stop reading at any time.

(1) Where does H264 come from?

When Xiao Cheng introduced the concept of media formats earlier, he mentioned the International Organization for Standardization (ISO); now it is time for it to take the stage.

H264 is the product of the International Organization for Standardization (ISO) and the International Telecommunication Union (ITU).

But ISO is the big boss: it sets the direction for the benefit of mankind, while the actual work is done by its subordinate group MPEG, the Moving Picture Experts Group.

Similarly, ITU also has an expert group taking the lead, called VCEG, the Video Coding Experts Group.

H.264 has many nicknames, such as: H.264/AVC, AVC, H.264/MPEG-4 AVC, and so on.


(2) What are the advantages of H264?


h264 is the crystallization of the cooperation between ISO and ITU. Before it, the two organizations each had their own standards: ITU had h261, h263, and h263+, while MPEG had MPEG-1, MPEG-2, MPEG-4, and so on.

Naturally, the joint result of the two organizations could hardly be worse than its predecessors.

The advantage of h264 is a higher compression ratio (that is, a lower bit rate) at the same image quality.

This is good news for viewers: it means less data consumed and faster transmission!

(3) Design of H264


(a) vcl and nalu

By design, H264 is divided into different parts and involves complex concepts, such as the vcl/nal split, intra-frame and inter-frame prediction coding, integer transform, entropy coding, and so on.

Xiaocheng only introduces some simple concepts here.

In terms of design, h264 is divided into two layers: vcl and nal.

vcl, the video coding layer, is responsible for encoding the video and is independent of the network environment.
nal, the network abstraction layer, packages the data produced by the vcl for network transmission.

The basic unit of the nal layer is called nalu.

nalu, network abstraction layer unit, network abstraction layer unit.

The general structure of nalu is as follows:

[Figure: nalu structure]

RBSP, the raw byte sequence payload, is the raw data (possibly encoded video data, or other data) plus trailing bits added for byte alignment.

The header of nalu, a total of 8 bits:

forbidden_bit (1 bit): the forbidden bit (forbidden_zero_bit in the spec), which must be 0 in a valid stream and is used to signal errors.
nal_reference_bit (2 bits): the importance indicator (nal_ref_idc in the spec); the larger the value, the more important the nalu, with 0 the least important.
nal_unit_type (5 bits): the low 5 bits, which distinguish the type of the nalu.

Types of nalu:
[Table: nalu types]
The nal_reference_bit column in the table indicates the degree of importance (the larger, the more important).

When nal_unit_type is 1 to 5, the payload is slice data, i.e., encoded video data. A value of 5 indicates an IDR frame, which can be understood as the first I frame of an image sequence.

Besides slice data (video data), a nalu can also encapsulate other kinds of data. For example, when nal_unit_type is 7 or 8, the payload is the sequence parameter set (sps) or the picture parameter set (pps), both of which are required for decoding.
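The header fields above can be illustrated with a minimal sketch (not a full parser): decoding the one-byte nalu header into its three fields. The type values follow the table described above; the helper name `parse_nalu_header` and the partial `NALU_TYPES` mapping are our own.

```python
# Partial mapping from nal_unit_type to a human-readable name
# (only the types mentioned in the article).
NALU_TYPES = {
    1: "non-IDR slice",
    5: "IDR slice",
    7: "SPS",
    8: "PPS",
}

def parse_nalu_header(byte: int) -> dict:
    """Split the 8-bit nalu header into its three fields."""
    return {
        "forbidden_bit": (byte >> 7) & 0x01,      # 1 bit, must be 0
        "nal_reference_bit": (byte >> 5) & 0x03,  # 2 bits, importance
        "nal_unit_type": byte & 0x1F,             # low 5 bits, payload type
    }

# Example: 0x67 = 0110 0111 -> forbidden 0, importance 3, type 7 (SPS)
header = parse_nalu_header(0x67)
print(header, NALU_TYPES.get(header["nal_unit_type"]))
```

Feeding it 0x65 (a typical IDR slice header byte) yields nal_unit_type 5, matching the IDR case described above.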

The content of a nalu is the content of its rbsp, and rbsp content is in turn classified as follows:
[Table: rbsp types]

(b) I frame, etc.

Next, Xiao Cheng introduces I frames, P frames, and related concepts that readers are likely to encounter frequently.

Compression exists to save storage capacity and transmission bandwidth; small size with good quality is the goal.

The key point of video compression is to remove redundancy.

What is redundancy? Information that appears in both places (correlated content) is redundant, and so is information the viewer cannot perceive.

There are two directions for h264 coding to remove redundancy, one is intra-frame prediction coding, and the other is inter-frame prediction coding.

Intra-frame prediction focuses on the redundancy within a single, independent picture (ignoring its connection to preceding and following pictures) and removes that picture's internal redundancy (redundancy between macroblocks).

Inter-frame prediction focuses on the redundancy between successive pictures; it preserves only the differences and relies on reference frames.
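A toy numeric illustration (not real h264 arithmetic) of the inter-frame idea: instead of storing the second frame in full, store only its difference (residual) from the reference frame, which is mostly zeros and therefore cheap to compress. The pixel values here are invented for the example.

```python
reference = [10, 10, 10, 10, 50, 50, 50, 50]   # pixel values of frame 1
current   = [10, 10, 10, 10, 50, 50, 52, 50]   # frame 2, nearly identical

# Encoder: keep only the difference from the reference frame.
residual = [c - r for c, r in zip(current, reference)]
print(residual)  # [0, 0, 0, 0, 0, 0, 2, 0] -- mostly zeros, cheap to compress

# Decoder: add the residual back onto the cached reference picture.
reconstructed = [r + d for r, d in zip(reference, residual)]
assert reconstructed == current
```

Real inter-frame prediction also uses motion vectors so that the "reference" can be a shifted block rather than the same position, but the keep-only-the-difference principle is the same.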

Encoded frames are classified into I frames, P frames, and B frames (I frames come from intra-frame coding; P and B frames from inter-frame coding).

All frames are organized into groups; such a group of pictures is called a GOP (Group of Pictures), also known as an image sequence.

The term GOP often also refers to the length of the group of pictures, which can be configured.

The images within a sequence are closely related; when the scene changes (a big visual change arrives), a new GOP should be started.

MPEG2, h264, and HEVC define GOP differently, so when discussing GOP it is necessary to be clear about which standard is meant. Here we are talking about h264.

An h264 image sequence starts with an IDR frame and runs until the next IDR frame; multiple I frames can appear within one image sequence.

The first I frame of a GOP is called the IDR frame, to distinguish it from other, ordinary I frames. So an IDR is an I frame, but an I frame is not necessarily an IDR.

The appearance of an IDR means that history is invalidated (mistakes in earlier frames must not affect the current group): earlier pictures can no longer be referenced, and encoding starts afresh.

Generally speaking, when consecutive images change little (for example, slowly scrolling the screen during a recording), the GOP is long: after one IDR, the sequence can continue with P or B frames for a long time. When changes are large (scrolling quickly), the GOP is short: one IDR may be followed by only two or three P frames before cutting to the next sequence. In that case the encoded output is also larger, and transmitting it produces bandwidth peaks.
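The role of IDR frames as group boundaries can be sketched with a toy helper (the function name and the frame-type strings are ours, and for simplicity every "I" is treated as an IDR): each IDR opens a new GOP, and the P/B frames that follow belong to it.

```python
def split_into_gops(frame_types: str) -> list:
    """Split a display-order frame-type string into GOPs.

    Simplification: every 'I' here is treated as an IDR frame,
    so each one starts a fresh group.
    """
    gops = []
    for t in frame_types:
        if t == "I" or not gops:   # IDR starts a new group
            gops.append(t)
        else:
            gops[-1] += t          # P/B frames extend the current group
    return gops

# Slowly changing content -> one long GOP; rapid scene cuts -> short GOPs.
print(split_into_gops("IPPPPPPPP"))  # ['IPPPPPPPP']
print(split_into_gops("IPPIPPIPP"))  # ['IPP', 'IPP', 'IPP']
```

The second case mirrors the fast-scrolling scenario above: frequent IDRs, short groups, and a larger encoded volume overall.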

IDR stands for Instantaneous Decoding Refresh: a frame that can be decoded on its own, without reference to any earlier frame.

I frame, namely intra-predicted frame, i.e., an intra-frame prediction frame. I frames go by many other names, such as keyframe, independent full picture, base frame, and so on.

The compression of I-frames is similar to JPEG's algorithm.

P frames and B frames both use inter-frame predictive coding; that is, they depend on other frames and carry only the differences (residual values and motion vectors).

P frame, forward predictive coded frame. A P frame represents the difference between itself and a preceding key frame (or P frame); when decoding, this difference is superimposed on the previously cached picture to produce the final image.

B frame, bidirectional predictive coded frame. A B frame depends both on a preceding I or P frame and on a following P frame, so decoding it requires both the earlier cached picture and the later decoded picture before the final image can be reconstructed.
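Because a B frame cannot be decoded until the later frame it references has arrived, encoders transmit frames in a decode order that differs from display order. A toy sketch (the function is our own simplification, assuming every run of B frames references the next I/P frame):

```python
def display_to_decode_order(frames: list) -> list:
    """Reorder display-order frames so each B follows its forward reference.

    Simplified model: every run of 'B' frames references the next
    'I' or 'P' frame, so that reference must be sent first.
    """
    decode, pending_b = [], []
    for f in frames:
        if f == "B":
            pending_b.append(f)       # hold B frames back...
        else:
            decode.append(f)          # ...send the I/P reference first...
            decode.extend(pending_b)  # ...then the B frames that need it
            pending_b = []
    return decode + pending_b

# Display order I B B P becomes decode/transmission order I P B B.
print(display_to_decode_order(list("IBBP")))  # ['I', 'P', 'B', 'B']
```

This reordering is why B frames add decoding latency: the decoder must wait for the future reference before it can show the B frames in between.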

In terms of compression ratio, B > P > I, and the same ordering holds for decoding complexity. In practice, the (I+P) combination is the most commonly used.

Both I and P frames can serve as reference frames, so their quality matters: if a reference frame is bad, every frame that depends on it will be worse.

So far, Xiao Cheng has introduced some common concepts of H264.


To sum up, this article has introduced some common concepts of the H264 coding standard, and I hope readers now have a conceptual grasp of H264.
