(Audio and video study notes): H264 basic concepts

H264 basic concept study notes

  • NALU (Network Abstraction Layer Unit)
  • GOP (Group of Pictures) is mainly used to describe the number of frames between one IDR frame and the next IDR frame.
  • H.264 organizes the stream into five levels: sequence (GOP), picture, slice, macroblock, and sub-block.
  • H.264 divides the video into consecutive frames for transmission and codes those frames as I, P, and B frames. Within each frame, the picture is further divided into slices, macroblocks, and sub-blocks. Together these steps compress the video and package it for transmission.
  • IDR (Instantaneous Decoding Refresh)
    • The first picture in a sequence is called an IDR picture (a refresh picture). IDR pictures are always I frames, although not every I frame is an IDR frame.
  • Its core function is to resynchronize decoding. When the decoder reaches an IDR picture, it immediately empties the reference frame queue, outputs or discards all decoded data, looks up the parameter sets again, and starts a new sequence.
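  • In the traditional I/P/B model described here, B frames are not used as reference frames. (The H.264 standard itself does allow B frames to serve as references, for example in hierarchical B-frame structures.)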

NALU

  • SPS: Sequence Parameter Set. Stores a set of global parameters for a coded video sequence.
  • PPS: Picture Parameter Set. Holds parameters that apply to one picture, or to a group of pictures, within a sequence.
  • I frame: intra-coded frame, which can be decoded independently into a complete picture.
  • P frame: forward predictive coded frame. It must reference a preceding I or P frame to reconstruct a complete picture.
  • B frame: bidirectionally predictive coded frame. It must reference a preceding I or P frame and a following P frame to reconstruct a complete picture.
  • SPS and PPS must each be sent at least once before an I frame is sent.

NALU structure

  • A raw H.264 stream (bare stream) is made up of a sequence of NALUs, and its functionality is split into two layers:
    • VCL (Video Coding Layer): contains the core compression engine and the block, macroblock, and slice-level syntax definitions; its design goal is efficient coding that is as independent of the network as possible.
    • NAL (Network Abstraction Layer): packages the bit string produced by the VCL so that it suits a variety of networks and multiplexing environments; it covers all syntax levels above the slice level.
  • Before VCL data is transmitted or stored, it is first mapped or encapsulated into NAL units.
  • 1 NALU = 1 NALU header (describing the coded video data) + 1 Raw Byte Sequence Payload (RBSP).
  • The basic structure of a NALU is as follows:

  • An original H.264 NALU unit usually consists of three parts: [StartCode] [NALU Header] [NALU Payload]
    • The Start Code is used to indicate that this is the beginning of a NALU unit, and it must be "00 00 00 01" or "00 00 01"
    • Apart from the start code, a NALU is essentially a NAL header plus an RBSP.
  • After demuxing with FFmpeg, packets read from an MP4 file do not carry a start code, whereas packets read from a TS file do.

Analyze NALU

  • Each NAL unit is a variable-length byte string of syntax elements: a one-byte header (indicating the data type) followed by a payload of an integer number of bytes.
  • NALU header information (one byte):

(Note: Picture reference: https://www.jianshu.com/p/31ed32fd7b6b )

  • T is the payload type, occupying 5 bits
    • nal_unit_type: the type of this NAL unit; values 1 to 12 are used by H.264, and 24 to 31 are used by applications outside H.264
  • R is the importance indicator, occupying 2 bits
    • nal_ref_idc: takes a value from 00 to 11 (0 to 3) and indicates how important this NALU is
    • For example, a decoder can discard a NALU whose nal_ref_idc is 00 without affecting playback. The larger the value (0 to 3), the more important the current NAL unit and the higher its priority for protection.
    • If the current NAL unit belongs to a reference frame, or is a sequence parameter set or picture parameter set, this syntax element must be greater than 0.
  • F is the forbidden bit, occupying 1 bit
    • forbidden_zero_bit: the H.264 specification requires this bit to be 0
  • The H.264 standard (Annex B) specifies that when the stream is stored on a medium, a start code of 0x000001 or 0x00000001 is prepended to each NALU to mark where one NALU starts and ends:
    • Under this mechanism, a start code found in the stream marks the beginning of a NALU, and the current NALU ends when the next start code is found.
    • The 3-byte 0x000001 is used in only one case: when a complete frame is encoded as multiple slices, the NALUs carrying those slices use the 3-byte start code.
    • In all other cases, the 4-byte 0x00000001 is used.

H264 annexb mode

  • H264 has two packages:
    • Annexb mode: the traditional format, with start codes; SPS and PPS are carried in the ES (see "The difference between H264 ES PS TS stream" https://blog.csdn.net/coloriy/article/details/80623192)
    • mp4 mode: the format used by ordinary MP4 and MKV files. There are no start codes; SPS, PPS, and similar information are stored in the container, and each NALU (frame) is preceded by a length field, usually 4 bytes.
  • Many decoders only support Annexb mode, so mp4-mode data has to be converted. In FFmpeg this is done with the h264_mp4toannexb bitstream filter:
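#include <libavcodec/avcodec.h>
#include <libavcodec/bsf.h> // newer FFmpeg; older releases declare the BSF API in avcodec.h
// ifmt_ctx and videoindex come from the surrounding demuxing code
// 1 Look up the bitstream filter by name
const AVBitStreamFilter *bsfilter = av_bsf_get_by_name("h264_mp4toannexb");
AVBSFContext *bsf_ctx = NULL;
// 2 Allocate the filter context
if (!bsfilter || av_bsf_alloc(bsfilter, &bsf_ctx) < 0) { /* handle error */ }
// 3 Copy the input stream's codec parameters (extradata carries SPS/PPS)
avcodec_parameters_copy(bsf_ctx->par_in, ifmt_ctx->streams[videoindex]->codecpar);
// 4 Initialize the filter
av_bsf_init(bsf_ctx);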

GOP group of pictures

  • In the video coding sequence, GOP stands for Group of pictures, which refers to the distance between two I frames .
  • Reference (reference period) refers to the distance between two P frames .
  • One I frame occupies more bytes than one P frame, and one P frame occupies more bytes than one B frame.
  • Therefore, at the same bit rate, the larger the GOP, the more P and B frames there are and the more bytes are available on average to each I, P, and B frame, so better image quality is easier to achieve. Likewise, the larger the Reference, the more B frames there are, which also makes better image quality easier to achieve.
  • There is a limit to how much image quality can be gained by increasing the GOP. When the scene changes, an H.264 encoder automatically forces an I frame, which shortens the actual GOP.
  • Within a GOP, the P and B frames are predicted from the I frame. If the I frame's image quality is poor, the quality of every subsequent P and B frame in that GOP suffers, and it can only recover at the start of the next GOP. So the GOP should not be set too large.
  • Because P and B frames are more complex to encode than I frames, too many P and B frames will reduce encoding efficiency.
  • An overly large GOP also slows down seeking. Since P and B frames are predicted from earlier I or P frames, a seek cannot simply jump to a P or B frame and decode it: the decoder must first decode the GOP's I frame and the N predicted frames before the target. The longer the GOP, the more predicted frames must be decoded and the longer the seek takes.
  • Further reading: https://www.jianshu.com/p/31ed32fd7b6b


Origin blog.csdn.net/baidu_41388533/article/details/114756342