H264 basic concept study notes
- NALU (Network Abstraction Layer Unit)
- GOP (Group of Pictures) mainly describes the number of frames between one IDR frame and the next IDR frame.
- H.264 organizes the video into five levels: sequence (GOP), picture, slice, macroblock, and sub-block.
- H.264 splits the video into consecutive frames for transmission and codes them as I, P, and B frames. Within a frame, the image is further divided into slices, macroblocks, and sub-blocks. Together these steps compress and package the video.
- IDR (Instantaneous Decoding Refresh).
- The first picture in a sequence is called an IDR picture (a refresh picture). Every IDR picture is an I frame.
- Its core function is to resynchronize decoding. When the decoder reaches an IDR picture, it immediately clears the reference frame queue, outputs or discards all decoded data, looks up the parameter sets again, and starts a new sequence.
- B frames are traditionally not used as reference frames (this is a rule in earlier standards such as MPEG-2; H.264 itself does allow a B frame to serve as a reference).
NALU
- SPS: Sequence Parameter Set. It stores the global parameters of a coded video sequence.
- PPS: Picture Parameter Set. It holds the parameters that apply to one or more pictures in a sequence.
- I frame: intra-coded frame. It can be decoded independently to produce a complete picture.
- P frame: forward-predicted frame. It needs to reference a previous I or P frame to produce a complete picture.
- B frame: bidirectionally predicted frame. It must reference a preceding I or P frame and a following P frame to produce a complete picture.
- SPS and PPS must be sent at least once before an I frame is sent.
NALU structure
- A raw H.264 stream (bare stream) is made up of NALUs, one after another. Functionally, H.264 is divided into two layers:
- VCL (Video Coding Layer): includes the core compression engine and the syntax definitions at the block, macroblock, and slice levels. Its design goal is to code efficiently while staying as independent of the network as possible.
- NAL (Network Abstraction Layer): adapts the bit strings produced by the VCL to a wide variety of networks and environments. It covers all syntax levels above the slice level.
- Before VCL data is transmitted or stored, the encoded data is mapped or encapsulated into NAL units.
- 1 NALU = 1 NALU header describing the encoded video + 1 raw byte sequence payload (RBSP, Raw Byte Sequence Payload)
- The main structure of a NALU is as follows:
- A raw H.264 NALU usually consists of three parts: [StartCode] [NALU Header] [NALU Payload]
- The start code marks the beginning of a NALU and must be "00 00 00 01" or "00 00 01".
- Apart from the start code, a NALU is essentially a NAL header plus the RBSP.
- After FFmpeg demuxes a file, packets read from an MP4 file do not carry start codes, while packets read from a TS file do.
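The [StartCode] [NALU Header] [NALU Payload] layout described above can be walked with a simple byte scan. The following is a minimal sketch in plain C, not FFmpeg code: `find_start_code`, `split_nalus`, and `NaluSpan` are hypothetical names introduced for illustration, and the scan assumes the whole stream is already in memory and accepts both start code forms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one NALU inside a buffer, start code excluded. */
typedef struct {
    size_t offset; /* index of the NALU header byte */
    size_t size;   /* header byte plus payload bytes */
} NaluSpan;

/* Find the next start code in buf[from..len).
 * Returns the offset of the first byte of the start code, or -1 if none.
 * *sc_len receives 3 (00 00 01) or 4 (00 00 00 01). */
long find_start_code(const uint8_t *buf, size_t len, size_t from, int *sc_len)
{
    for (size_t i = from; i + 3 <= len; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            /* A zero byte directly before 00 00 01 makes it the 4-byte form. */
            if (i > from && buf[i - 1] == 0) {
                *sc_len = 4;
                return (long)(i - 1);
            }
            *sc_len = 3;
            return (long)i;
        }
    }
    return -1;
}

/* Split an Annex B buffer into NALUs. Stores at most max spans in out and
 * returns the total number of NALUs found. */
size_t split_nalus(const uint8_t *buf, size_t len, NaluSpan *out, size_t max)
{
    size_t count = 0;
    int sc_len = 0;
    long pos = find_start_code(buf, len, 0, &sc_len);

    while (pos >= 0) {
        size_t start = (size_t)pos + (size_t)sc_len;
        int next_len = 0;
        long next = find_start_code(buf, len, start, &next_len);
        /* The current NALU ends where the next start code begins. */
        size_t end = next >= 0 ? (size_t)next : len;

        if (count < max) {
            out[count].offset = start;
            out[count].size = end - start;
        }
        count++;
        pos = next;
        sc_len = next_len;
    }
    return count;
}
```

The sketch treats a zero byte immediately before `00 00 01` as part of a 4-byte start code. That is a simplification: it does not handle trailing zero bytes that an encoder may append after a NALU.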
Analyze NALU
- Each NAL unit is a variable-length byte string with a defined syntax. It consists of one byte of header information (indicating the data type) followed by a whole number of payload bytes.
- NALU header information (one byte, laid out from the most significant bit as F, R, T):
(Note: Picture reference: https://www.jianshu.com/p/31ed32fd7b6b)
- T is the payload data type and occupies 5 bits.
- nal_unit_type: the type of this NALU. Values 1 to 12 are used by H.264; values 24 to 31 are used by applications outside H.264.
- R is the importance indicator and occupies 2 bits.
- nal_ref_idc: takes the values 00 to 11 (binary) and indicates the importance of this NALU.
- For example, a NALU with the value 00 can be discarded by the decoder without affecting playback. Across 0 to 3, the larger the value, the more important the NALU and the higher its priority for protection.
- If the current NALU belongs to a reference frame, or is a sequence parameter set or a picture parameter set, this syntax element must be greater than 0.
- F is the forbidden bit and occupies 1 bit.
- forbidden_zero_bit: the H.264 specification requires this bit to be 0.
- The H.264 standard specifies that when the stream is stored on a medium, a start code (0x000001 or 0x00000001) is placed before each NALU to mark where one NALU starts and the previous one ends:
- Under this mechanism, a start code found in the stream marks the beginning of a NALU, and the current NALU ends when the next start code is found.
- The 3-byte 0x000001 is used in only one case: when a complete frame is coded as multiple slices, the NALUs carrying those slices use the 3-byte start code.
- In all other cases the 4-byte 0x00000001 is used. (This is the usual encoder convention; a parser should still accept either form in front of any NALU.)
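The one-byte header described above (F in the top bit, then R, then T) can be decoded with shifts and masks. This is a minimal sketch: `NalHeader`, `parse_nal_header`, and `nal_type_name` are hypothetical names, and the type numbers in `nal_type_name` (1, 5, 6, 7, 8, 9) come from the H.264 specification's NAL unit type table rather than from these notes.

```c
#include <stdint.h>

/* Hypothetical struct holding the three header fields. */
typedef struct {
    unsigned forbidden_zero_bit; /* F: 1 bit, must be 0 */
    unsigned nal_ref_idc;        /* R: 2 bits, 0..3 */
    unsigned nal_unit_type;      /* T: 5 bits, 0..31 */
} NalHeader;

/* Split the first byte of a NALU (the byte right after the start code). */
NalHeader parse_nal_header(uint8_t b)
{
    NalHeader h;
    h.forbidden_zero_bit = (b >> 7) & 0x01; /* bit 7 */
    h.nal_ref_idc        = (b >> 5) & 0x03; /* bits 6..5 */
    h.nal_unit_type      = b & 0x1F;        /* bits 4..0 */
    return h;
}

/* Names for a few common nal_unit_type values (assumed from the H.264
 * specification; everything else is reported as "other"). */
const char *nal_type_name(unsigned nal_unit_type)
{
    switch (nal_unit_type) {
    case 1:  return "non-IDR slice";
    case 5:  return "IDR slice";
    case 6:  return "SEI";
    case 7:  return "SPS";
    case 8:  return "PPS";
    case 9:  return "access unit delimiter";
    default: return "other";
    }
}
```

For example, the header byte 0x67 is 0110 0111 in binary, so F = 0, R = 3, and T = 7, which is an SPS. The byte 0x65 gives R = 3 and T = 5, an IDR slice.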
H264 annexb mode
- H.264 has two packaging formats:
- Annex B mode: the traditional mode. NALUs carry start codes, and SPS and PPS are in the ES (see "The difference between H264 ES PS TS stream", https://blog.csdn.net/coloriy/article/details/80623192).
- MP4 mode: MP4 and MKV files generally use this mode. There are no start codes; SPS, PPS, and other information are stored in the container, and each frame is preceded by 4 bytes giving the length of the frame.
- Many decoders support only Annex B mode, so MP4-mode data must be converted. In FFmpeg, use the h264_mp4toannexb bitstream filter for the conversion:
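    // Fragment: ifmt_ctx and videoindex come from the surrounding demuxing code.
    // The av_bsf_* functions are declared in libavcodec/bsf.h in recent FFmpeg
    // releases (libavcodec/avcodec.h in older ones).
    // 1. Look up the bitstream filter by name
    const AVBitStreamFilter *bsfilter = av_bsf_get_by_name("h264_mp4toannexb");
    AVBSFContext *bsf_ctx = NULL;
    // 2. Allocate the filter context
    av_bsf_alloc(bsfilter, &bsf_ctx);
    // 3. Copy the video stream's codec parameters into the filter input
    avcodec_parameters_copy(bsf_ctx->par_in, ifmt_ctx->streams[videoindex]->codecpar);
    // 4. Initialize the filter
    av_bsf_init(bsf_ctx);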
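To make the difference between the two modes concrete, here is a minimal sketch of the core rewrite: each 4-byte length prefix is replaced with a 4-byte start code. This is not FFmpeg's implementation. `avcc_to_annexb_inplace` is a hypothetical helper, it assumes the length field is exactly 4 bytes and big-endian, and it leaves out the other job of the real filter, which is inserting the SPS and PPS stored in the container.

```c
#include <stddef.h>
#include <stdint.h>

/* Rewrite a packet made of [4-byte big-endian length][NALU] records in place
 * so that every length field becomes the start code 00 00 00 01.
 * Returns the number of NALUs converted, or -1 if a length field is truncated
 * or points past the end of the packet. On error, records before the bad one
 * have already been rewritten. */
int avcc_to_annexb_inplace(uint8_t *pkt, size_t size)
{
    size_t pos = 0;
    int count = 0;

    while (pos < size) {
        if (size - pos < 4)
            return -1; /* not enough bytes left for a length field */

        uint32_t nalu_len = ((uint32_t)pkt[pos] << 24) |
                            ((uint32_t)pkt[pos + 1] << 16) |
                            ((uint32_t)pkt[pos + 2] << 8) |
                            (uint32_t)pkt[pos + 3];
        if (nalu_len > size - pos - 4)
            return -1; /* NALU would run past the end of the packet */

        /* Same width as the length field, so no bytes need to move. */
        pkt[pos] = 0x00;
        pkt[pos + 1] = 0x00;
        pkt[pos + 2] = 0x00;
        pkt[pos + 3] = 0x01;

        pos += 4 + (size_t)nalu_len;
        count++;
    }
    return count;
}
```

Because the length prefix and the start code are both 4 bytes here, the conversion can overwrite the prefix without moving any payload bytes. With a 1-byte or 2-byte length field the output would have to be written to a larger buffer.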
GOP group of pictures
- In a video coding sequence, GOP stands for Group of Pictures and refers to the distance between two I frames.
- Reference (the reference period) refers to the distance between two P frames.
- An I frame takes more bytes than a P frame, and a P frame takes more bytes than a B frame.
- Therefore, at the same bit rate, a larger GOP value means more P and B frames, so more bytes are available on average for each I, P, and B frame, and better image quality is easier to obtain. Likewise, a larger Reference value means more B frames, which also makes better image quality easier to obtain.
- There is a limit to improving image quality by increasing the GOP value. At a scene change, the H.264 encoder automatically forces an I frame, which shortens the actual GOP.
- Within a GOP, the P and B frames are predicted from the I frame. If the I frame has poor image quality, the quality of the following P and B frames in that GOP suffers too, and it can only recover when the next GOP begins. For this reason the GOP value should not be set too large.
- Because P and B frames are more complex to process than I frames, too many P and B frames reduce encoding efficiency (in terms of processing cost).
- An overly large GOP also slows down seeking. Because P and B frames are predicted from earlier I or P frames, a seek that lands directly on a P or B frame must first decode the I frame of that GOP and the N predicted frames before the target. The longer the GOP, the more predicted frames must be decoded and the longer the seek takes.
- Further reading: https://www.jianshu.com/p/31ed32fd7b6b
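The seek cost discussed in the GOP notes can be put into a small calculation. This is a simplified model under stated assumptions, not a description of any real player: every GOP has exactly `gop_size` frames, starts with an I frame, contains only I and P frames (no B frames), and each P frame references the frame directly before it. `frames_to_decode_for_seek` is a hypothetical name.

```c
/* Number of frames that must be decoded to display frame `target`
 * (0-based index in the stream) under the simplified model above.
 * Decoding starts at the I frame of the GOP containing the target and runs
 * through every frame up to and including the target.
 * Returns -1 for invalid arguments. */
long frames_to_decode_for_seek(long target, long gop_size)
{
    if (target < 0 || gop_size <= 0)
        return -1;
    /* target % gop_size is the position inside the GOP (0 = the I frame). */
    return target % gop_size + 1;
}
```

In this model the worst case equals the GOP size: seeking to the last frame of a 250-frame GOP means decoding all 250 frames, while seeking to an I frame costs a single decode.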