H264 NALU analysis

1.H264 introduction

Work on H.264 started in 1999, and the first version of the standard was completed in 2003; later revisions added further extensions. In the ITU standard it is called H.264. In the MPEG standard it is a part of MPEG-4, namely MPEG-4 Part 10, also called Advanced Video Coding (AVC). It is therefore often called MPEG-4 AVC, or simply AVC.

2.H264 codec analysis

2.1. H264 encoding principle

In audio and video transmission, the size of raw video is a major problem. For example, for a video with a resolution of 1920*1080, where each pixel takes up 3 bytes (RGB) and the frame rate is 25, the required transmission bandwidth is: 1920 * 1080 * 3 * 25 / 1024 / 1024 = 148.315 MB/s. This rate is impractical for current networks, which is why video compression and encoding technology came into being. A video is composed of individual picture frames, and because video content is generally continuous, adjacent frames are similar to each other. Video compression technology exploits this similarity to compress the pictures.
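The bandwidth figure above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name `raw_video_rate_mib_per_s` is made up for this example.

```python
def raw_video_rate_mib_per_s(width, height, bytes_per_pixel, fps):
    """Uncompressed video data rate, in units of 1024 * 1024 bytes per second."""
    bytes_per_second = width * height * bytes_per_pixel * fps
    return bytes_per_second / 1024 / 1024


# 1920*1080, 3 bytes per pixel (RGB), 25 frames per second
rate = raw_video_rate_mib_per_s(1920, 1080, 3, 25)
print(f"{rate:.3f} MB/s")  # 148.315 MB/s
```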

H264 divides each video frame into blocks of 16 * 16 pixels (macroblocks) and performs pixel comparison on these blocks for compression encoding.
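To get a feel for the 16 * 16 block size, the sketch below counts how many such blocks are needed to cover a frame. The helper name `macroblock_grid` is hypothetical, and the sketch assumes the frame is simply padded up to a multiple of 16 in each dimension.

```python
import math

MB_SIZE = 16  # block size mentioned in the text


def macroblock_grid(width, height, mb_size=MB_SIZE):
    """Return (columns, rows) of blocks needed to cover a width x height frame."""
    cols = math.ceil(width / mb_size)
    rows = math.ceil(height / mb_size)
    return cols, rows


cols, rows = macroblock_grid(1920, 1080)
print(cols, rows, cols * rows)  # 120 68 8160
```

Note that 1080 is not a multiple of 16, so the last row of blocks extends past the visible picture (68 * 16 = 1088 lines).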

2.2 H264 I frame, P frame, B frame

H264 uses intra-frame compression and inter-frame compression to improve compression efficiency.
H264 uses I frames, P frames, and B frames to achieve compression between consecutive frames.

The three frame types are as follows:

| Frame type | Name | Description |
| --- | --- | --- |
| I frame | Intra-coded frame | Usually the first frame of each GOP (group of pictures, a structure used in MPEG video coding). It is moderately compressed, serves as a reference point for random access, and can be regarded as a complete compressed image. An I frame can therefore be restored into a complete picture by itself through the decompression algorithm. |
| P frame | Forward predictive coded frame | Also called a predicted frame. It reduces the amount of transmitted data by removing temporal redundancy relative to previously encoded frames in the image sequence. It needs to refer to the previous I frame or P frame to generate a complete picture. |
| B frame | Bidirectional predictive coded frame | Also called a bidirectional prediction frame. It reduces the amount of transmitted data by using temporal redundancy relative to both the preceding and the following encoded frames. A complete picture is generated by referring to the previous I or P frame and the next P or I frame. |

Compression ratio: B>P>I

2.3 H264 encoding structure analysis

In addition to video compression, H264 also provides encoding and fragmentation strategies for network transmission. Similar to the way network data is encapsulated into IP packets, H264 divides video into groups of pictures (GOP), slices, and macroblocks, which together form the hierarchical structure of the H264 bitstream.
H264 organizes the stream into five levels: sequence (GOP), picture, slice (Slice), macroblock (Macroblock), and subblock.
A GOP (group of pictures) mainly describes the frames from one IDR frame up to the next IDR frame; the GOP length is the number of frames in that interval.
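As a rough illustration of the GOP definition, the following sketch groups a list of frame-type labels into GOPs, starting a new group at every IDR frame. The labels and the function name `split_into_gops` are hypothetical; a real stream would need its NALUs parsed to find the frame types.

```python
def split_into_gops(frame_types):
    """Group frame-type labels into GOPs; each IDR frame starts a new GOP."""
    gops = []
    for ftype in frame_types:
        if ftype == "IDR" or not gops:
            gops.append([])
        gops[-1].append(ftype)
    return gops


frames = ["IDR", "P", "B", "B", "P", "IDR", "P", "P"]
gops = split_into_gops(frames)
print([len(g) for g in gops])  # [5, 3]
```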
H264 transmits video as a continuous series of frames, coded as I frames, P frames, and B frames. Within a frame, the image is further divided into slices, macroblocks, and subblocks for segmentation and transmission. Through this process, the video is compressed and packaged.

IDR (Instantaneous Decoding Refresh)
The first image of a sequence is called an IDR frame. An IDR frame must be an I frame, but an I frame is not necessarily an IDR frame. Because P frames and B frames need other frames in order to be restored, the decoder keeps previously restored images in a queue (although not every decoded image stays in the queue indefinitely). When an IDR frame is encountered, the queue is cleared, so that if the previous sequence contained a major error, subsequent frames are not affected.
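The queue behaviour described above can be modelled with a toy simulation. This is a deliberately simplified sketch and not how a real decoder manages its picture buffer: it assumes that only IDR, I, and P frames are kept as references, that the queue holds at most `max_refs` entries, and the name `simulate_reference_queue` is invented for this example.

```python
from collections import deque


def simulate_reference_queue(frame_types, max_refs=4):
    """Return the queue contents (frame indices) after each frame is decoded."""
    refs = deque(maxlen=max_refs)  # oldest entries drop out when the queue is full
    history = []
    for index, ftype in enumerate(frame_types):
        if ftype == "IDR":
            refs.clear()  # an IDR frame clears everything decoded before it
        if ftype in ("IDR", "I", "P"):
            refs.append(index)  # assumption: B frames are not kept as references
        history.append(list(refs))
    return history


history = simulate_reference_queue(["IDR", "P", "B", "P", "IDR", "P"])
print(history)  # [[0], [0, 1], [0, 1], [0, 1, 3], [4], [4, 5]]
```

After the second IDR frame (index 4), no earlier frame remains in the queue, so an error in frames 0 to 3 cannot propagate further.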

2.4 NALU


  • SPS: Sequence parameter set. The SPS stores a set of global parameters for a coded video sequence.
  • PPS: Picture parameter set, holding the parameters for one or several pictures in a sequence.
  • I frame: Intra-coded frame, which can be independently decoded to generate a complete picture.
  • P frame: Forward predictive coded frame. It needs to refer to an I or P frame in front of it to generate a complete picture.
  • B frame: Bidirectional predictive coded frame. It refers to the previous I or P frame and the following P or I frame to generate a complete picture.
  • Before sending an I frame, SPS and PPS must be sent at least once.

2.4.1 NALU structure

The H.264 raw stream (elementary stream) is composed of NALUs. Functionally, H.264 is divided into two layers, VCL (Video Coding Layer) and NAL (Network Abstraction Layer):

  • VCL: Includes the core compression engine and the syntax-level definitions of blocks, macroblocks, and slices. The design goal is to encode efficiently while staying as independent of the network as possible.
  • NAL: Responsible for adapting the bit strings generated by the VCL to a variety of networks and environments, covering all syntax at the slice level and above.

Before VCL data is transmitted or stored, it is mapped or encapsulated into NAL units (NALU).

A NALU = a NALU header + a raw byte sequence payload (RBSP, Raw Byte Sequence Payload).

The main structure of a NALU is as follows. A raw H264 unit usually consists of three parts: [StartCode][NALU Header][NALU Payload]. StartCode marks the start of a NALU and must be "00 00 00 01" or "00 00 01". Apart from the start code, the unit is basically equivalent to NALU Header + RBSP.
In FFmpeg, packets obtained by demultiplexing an MP4 file do not have a startcode, while packets read from a TS file do.
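The [StartCode][NALU Header][NALU Payload] layout suggests a simple way to cut a raw stream into NALUs: search for start codes and take the bytes in between. The sketch below does this in pure Python. The function name `split_annexb` and the sample bytes are made up for illustration, and the code assumes a well-formed stream in which the pattern "00 00 01" only occurs as a start code.

```python
START_CODE_3 = b"\x00\x00\x01"  # a 4-byte start code is this with one extra leading 0x00


def split_annexb(data):
    """Return the NALUs (header + payload, start codes removed) in a raw H264 stream."""
    nalus = []
    pos = data.find(START_CODE_3)
    while pos != -1:
        begin = pos + len(START_CODE_3)
        nxt = data.find(START_CODE_3, begin)
        end = len(data) if nxt == -1 else nxt
        # Trailing zero bytes belong to the next (4-byte) start code or are padding.
        nalu = data[begin:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        pos = nxt
    return nalus


stream = (
    b"\x00\x00\x00\x01\x67\x42\x00\x1f"  # 4-byte start code + NALU
    b"\x00\x00\x00\x01\x68\xce\x3c\x80"  # 4-byte start code + NALU
    b"\x00\x00\x01\x65\x88\x84"          # 3-byte start code + NALU
)
nalus = split_annexb(stream)
print([n.hex() for n in nalus])  # ['6742001f', '68ce3c80', '658884']
```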

2.4.2 Parsing NALU

Each NALU is a variable-length byte string of syntax elements, consisting of one byte of header information (indicating the data type) followed by a number of payload bytes. The header byte contains three fields:

  • T is the payload data type, occupying 5 bits (the low 5 bits of the header byte)
  • nal_unit_type: The type of this NALU. 1~12 are used by H.264, 24~31 are used by applications other than H.264

  • R is the importance indicator, occupying 2 bits
  • nal_ref_idc: 00~11 (0~3), indicating the importance of this NALU. The larger the value, the more important the current NAL is and the more it should be protected. A decoder can discard a NALU whose value is 00 without affecting the playback of the image. If the current NAL belongs to a reference frame, or is a sequence parameter set or picture parameter set, this syntax element must be greater than 0.

  • F is the forbidden bit, occupying 1 bit (the highest bit of the header byte)
  • forbidden_zero_bit: This bit must be 0 in the H.264 specification.
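The three header fields can be extracted from the header byte with shifts and masks. The following is a minimal sketch; the function name `parse_nalu_header` is hypothetical, and the sample bytes 0x67, 0x65, and 0x41 are just typical-looking values chosen for illustration.

```python
def parse_nalu_header(header_byte):
    """Split the one-byte NALU header into (F, R, T)."""
    forbidden_zero_bit = (header_byte >> 7) & 0x01  # F: 1 bit, highest bit
    nal_ref_idc = (header_byte >> 5) & 0x03         # R: 2 bits
    nal_unit_type = header_byte & 0x1F              # T: 5 bits, lowest bits
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type


print(parse_nalu_header(0x67))  # (0, 3, 7)
print(parse_nalu_header(0x65))  # (0, 3, 5)
print(parse_nalu_header(0x41))  # (0, 2, 1)
```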

| nal_unit_type | Content of NAL unit | RBSP syntax structure |
| --- | --- | --- |
| 0 | Unspecified | |
| 1 | Coded slice of a non-IDR picture | slice_layer_without_partitioning_rbsp() |
| 2 | Coded slice data partition A | slice_data_partition_a_layer_rbsp() |
| 3 | Coded slice data partition B | slice_data_partition_b_layer_rbsp() |
| 4 | Coded slice data partition C | slice_data_partition_c_layer_rbsp() |
| 5 | Coded slice of an IDR picture | slice_layer_without_partitioning_rbsp() |
| 6 | Supplemental enhancement information (SEI) | sei_rbsp() |
| 7 | Sequence parameter set | seq_parameter_set_rbsp() |
| 8 | Picture parameter set | pic_parameter_set_rbsp() |
| 9 | Access unit delimiter | access_unit_delimiter_rbsp() |
| 10 | End of sequence | end_of_seq_rbsp() |
| 11 | End of stream | end_of_stream_rbsp() |
| 12 | Filler data | filler_data_rbsp() |
| 13 | Sequence parameter set extension | seq_parameter_set_extension_rbsp() |
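When inspecting a stream, it is handy to turn the type number into a readable name. The lookup below is a small sketch built from the table above; the dictionary `NAL_UNIT_TYPE_NAMES` and the function `describe_nalu` are names invented for this example, and values not listed in the table are simply reported as "other".

```python
NAL_UNIT_TYPE_NAMES = {
    0: "unspecified",
    1: "coded slice of a non-IDR picture",
    2: "coded slice data partition A",
    3: "coded slice data partition B",
    4: "coded slice data partition C",
    5: "coded slice of an IDR picture",
    6: "supplemental enhancement information (SEI)",
    7: "sequence parameter set (SPS)",
    8: "picture parameter set (PPS)",
    9: "access unit delimiter",
    10: "end of sequence",
    11: "end of stream",
    12: "filler data",
    13: "sequence parameter set extension",
}


def describe_nalu(nalu):
    """Name the type of a NALU given its bytes (header first, start code removed)."""
    nal_unit_type = nalu[0] & 0x1F  # low 5 bits of the header byte
    return NAL_UNIT_TYPE_NAMES.get(nal_unit_type, "other")


for sample in (b"\x67\x42", b"\x68\xce", b"\x65\x88", b"\x41\x9a"):
    print(hex(sample[0]), describe_nalu(sample))
```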

2.4.3 annexb mode

H264 has 2 packaging formats:

  • One is annexb mode, the traditional mode. Each NALU is preceded by a startcode, and SPS and PPS are carried in the ES (elementary stream) itself.
  • The other is mp4 mode, generally used by mp4 and mkv. There is no startcode; SPS, PPS and other information are encapsulated in the container, and each NALU is preceded by its length (usually 4 bytes).

Many decoders only support annexb mode, so mp4-mode streams need to be converted. In FFmpeg this is done with the h264_mp4toannexb bitstream filter.
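The core of the conversion is replacing each length prefix with a start code. The sketch below shows only that step in pure Python; it is not FFmpeg's implementation, the function name `mp4_to_annexb` is hypothetical, and it assumes a 4-byte big-endian length prefix. A complete conversion also has to emit the SPS and PPS stored in the container, which this sketch leaves out.

```python
START_CODE = b"\x00\x00\x00\x01"


def mp4_to_annexb(sample, length_size=4):
    """Replace each length prefix in an mp4-mode sample with a start code."""
    out = bytearray()
    offset = 0
    while offset < len(sample):
        if offset + length_size > len(sample):
            raise ValueError("truncated length prefix")
        nalu_len = int.from_bytes(sample[offset:offset + length_size], "big")
        offset += length_size
        nalu = sample[offset:offset + nalu_len]
        if len(nalu) != nalu_len:
            raise ValueError("truncated NALU")
        out += START_CODE + nalu
        offset += nalu_len
    return bytes(out)


# Two NALUs: a 3-byte one and a 2-byte one, each with a 4-byte length prefix.
sample = b"\x00\x00\x00\x03\x65\x88\x84" + b"\x00\x00\x00\x02\x41\x9a"
annexb = mp4_to_annexb(sample)
print(annexb.hex())  # 0000000165888400000001419a
```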


Origin blog.csdn.net/m0_60565784/article/details/131263748