H.264: frame, slice, parameter set, NALU and other concepts

H.264 is a video compression format. Video can be encoded to H.264 with the x264 library, which is open source and can be downloaded and compiled.

-------------------------------------------------------------------------------------------------------------

H.264 Codec


H.264 conceptually distinguishes between the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL).


The VCL contains the signal processing functionality of the codec: mechanisms such as transform, quantization, and motion-compensated prediction, plus a loop filter. It follows the general concept of most of today's video codecs: a macroblock-based coder that uses inter-picture prediction with motion compensation and transform coding of the residual signal.

The NAL encoder encapsulates the slices output by the VCL encoder into Network Abstraction Layer units (NAL units), which are suitable for transmission over packet networks or for use in packet-oriented multiplex environments.

-------------------------------------------------------------------------------------------------------------

Network Abstraction Layer Unit (NALU) type

The NAL unit type byte format is as follows: 
    
      +---------------+ 
      |0|1|2|3|4|5|6|7| 
      +-+-+-+-+-+-+-+-+ 
      |F|NRI|  Type   | 
      +---------------+ 
 
   The semantics of the components of the NAL unit type byte are specified in the H.264 specification and are briefly described below. 
 
   F: 1 bit 
      forbidden_zero_bit. The H.264 specification states that a setting of 1 indicates a syntax violation. 
 
   NRI: 2 bits 
      nal_ref_idc. A value of 00 indicates that the content of the NAL unit is not used to reconstruct reference pictures for inter-picture prediction. Such NAL units can be discarded without risking 
      the integrity of the reference pictures. A value greater than 00 indicates that decoding the NAL unit is required to maintain the integrity of the reference pictures. 
 
   Type: 5 bits 
      nal_unit_type. This component specifies the NAL unit payload type as defined in Table 7-1 of [1] and later in this document. The currently defined NAL unit types 
      and their semantics are listed below.

  • 0: Unspecified
  • 1: Coded slice of a non-IDR picture (no data partitioning)
  • 2: Coded slice data partition A
  • 3: Coded slice data partition B
  • 4: Coded slice data partition C
  • 5: Coded slice of an IDR picture
  • 6: Supplemental Enhancement Information (SEI)
  • 7: Sequence Parameter Set (SPS)
  • 8: Picture Parameter Set (PPS)
  • 9: Access unit delimiter
  • 10: End of sequence
  • 11: End of stream
  • 12: Filler data
  • 13: Sequence parameter set extension
  • 14: Prefix NAL unit
  • 15: Subset sequence parameter set
  • 16 – 18: Reserved
  • 19: Coded slice of an auxiliary coded picture without partitioning
  • 20: Coded slice extension
  • 21 – 23: Reserved
  • 24 – 31: Unspecified
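
As a minimal sketch of the byte layout described above, the three fields can be separated with shifts and masks. The struct and function names here (NalHeader, parse_nal_header) are illustrative and do not come from any library.

```c
#include <stdint.h>

/* The three fields of the one-byte NAL unit header. */
typedef struct {
    uint8_t forbidden_zero_bit; /* F: 1 bit, must be 0 in a valid stream */
    uint8_t nal_ref_idc;        /* NRI: 2 bits, 0 means the unit is not used for reference */
    uint8_t nal_unit_type;      /* Type: 5 bits, 0..31, see the list above */
} NalHeader;

/* Split the header byte into its fields.
 * Bit 0 in the diagram is the most significant bit of the byte. */
static NalHeader parse_nal_header(uint8_t byte)
{
    NalHeader h;
    h.forbidden_zero_bit = (uint8_t)((byte >> 7) & 0x01);
    h.nal_ref_idc        = (uint8_t)((byte >> 5) & 0x03);
    h.nal_unit_type      = (uint8_t)(byte & 0x1F);
    return h;
}
```

For instance, the header byte 0x67 (binary 0110 0111) splits into F = 0, NRI = 3, Type = 7, which is an SPS.
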
In an actual H.264 byte stream, each NAL unit is usually preceded by a 00 00 00 01 or 00 00 01 start code. Generally speaking, the first data output by the encoder is the SPS and PPS, followed by an IDR frame (for the difference between IDR frames and I frames, see http://blog.csdn.net/jammg/article/details/52357245).

For example, after converting RGB to YUV (x264 only encodes YUV input), the beginning of a .264 file produced by the x264 encoder generally looks like this:

00 00 00 01 67 ....(sps)....... 00 00 00 01 68 .........(pps)....... 00 00 00 01 65 ......(IDR)...........

[Because the NAL syntax carries no length information, actual transmission and storage systems need to add extra framing, such as the start codes shown here, to delimit each NAL unit.]
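
The start-code framing can be illustrated with a small scanner. This is only a sketch, assuming the whole stream is already in memory; the function names (next_nal_start, list_nal_types) are made up for this example. Because a four-byte start code 00 00 00 01 ends with the three-byte pattern 00 00 01, searching for the three-byte pattern covers both forms.

```c
#include <stddef.h>
#include <stdint.h>

/* Find the next 00 00 01 pattern at or after offset `from`.
 * Returns the offset of the byte following it (the NAL header byte),
 * or len if no start code is found. */
static size_t next_nal_start(const uint8_t *buf, size_t len, size_t from)
{
    for (size_t i = from; i + 3 <= len; i++) {
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01)
            return i + 3;
    }
    return len;
}

/* Store the nal_unit_type of each NAL unit found, up to max_types entries.
 * Returns the number of entries written. */
static size_t list_nal_types(const uint8_t *buf, size_t len,
                             uint8_t *types, size_t max_types)
{
    size_t count = 0;
    size_t pos = next_nal_start(buf, len, 0);
    while (pos < len && count < max_types) {
        types[count++] = (uint8_t)(buf[pos] & 0x1F); /* low 5 bits of the header */
        pos = next_nal_start(buf, len, pos);
    }
    return count;
}
```

Run over a stream laid out like the example above, this would report the types 7 (SPS), 8 (PPS), and 5 (IDR slice) in that order.
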

-------------------------------------------------------------------------------------------------------------

Parameter Set Concept (SPS/PPS)
 
   A very basic design concept of H.264 is to generate self-contained packets, making mechanisms such as the header duplication of RFC 2429 or MPEG-4's Header Extension Code (HEC) [11] unnecessary. 
   This is achieved by decoupling information that is relevant to more than one slice from the media stream. This higher-level meta information should be sent reliably, asynchronously, and in advance of the RTP 
   packet stream that contains the slice packets. (For applications that have no suitable out-of-band transport channel, this information can also be sent in-band.) A combination of higher-level parameters is called a parameter set. 
   The H.264 specification includes two types of parameter sets: the sequence parameter set and the picture parameter set. An active sequence parameter set remains unchanged throughout a coded video sequence, and an active picture parameter set 
   remains unchanged within a coded picture. The sequence and picture parameter set structures contain information such as picture size, optional coding modes used, and the macroblock to slice group map. 
 
   To be able to change picture parameters (for example, picture size) without transmitting parameter set updates synchronously with the slice packet stream, the encoder and decoder can maintain a list of more than one 
   sequence and picture parameter set. Each slice header contains a codeword indicating the sequence and picture parameter set to be used. 
 
   This mechanism decouples the transmission of parameter sets from the packet stream. They can be transmitted by external means (for example, as a side effect of capability exchange) or through a (reliable or unreliable) control protocol. 
   It is even possible that they are never transmitted and are instead fixed by an application design specification.
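
The lookup described above can be sketched as two tables indexed by ID. In the H.264 syntax the slice header carries pic_parameter_set_id, and the picture parameter set in turn carries seq_parameter_set_id, so the sequence parameter set is reached through the picture parameter set. The struct layouts and function names below are hypothetical and heavily reduced; real parameter sets hold many more fields.

```c
#include <stddef.h>

#define MAX_SPS 32   /* seq_parameter_set_id is in the range 0..31 */
#define MAX_PPS 256  /* pic_parameter_set_id is in the range 0..255 */

/* Hypothetical, reduced parameter sets for illustration only. */
typedef struct { int valid; int width; int height; } Sps;
typedef struct { int valid; int sps_id; } Pps;

typedef struct {
    Sps sps[MAX_SPS];
    Pps pps[MAX_PPS];
} ParamSets;

/* Store (or overwrite) an SPS. Returns 0 on success, -1 on a bad ID. */
static int store_sps(ParamSets *ps, int id, int width, int height)
{
    if (id < 0 || id >= MAX_SPS) return -1;
    ps->sps[id].valid = 1;
    ps->sps[id].width = width;
    ps->sps[id].height = height;
    return 0;
}

/* Store (or overwrite) a PPS that refers to the SPS with sps_id. */
static int store_pps(ParamSets *ps, int id, int sps_id)
{
    if (id < 0 || id >= MAX_PPS || sps_id < 0 || sps_id >= MAX_SPS) return -1;
    ps->pps[id].valid = 1;
    ps->pps[id].sps_id = sps_id;
    return 0;
}

/* Resolve the SPS for a slice from the PPS ID in its slice header.
 * Returns NULL if either parameter set has not been received. */
static const Sps *active_sps_for_slice(const ParamSets *ps, int pps_id)
{
    if (pps_id < 0 || pps_id >= MAX_PPS || !ps->pps[pps_id].valid) return NULL;
    const Sps *sps = &ps->sps[ps->pps[pps_id].sps_id];
    return sps->valid ? sps : NULL;
}
```

Because the tables persist independently of the slice data, parameter sets can arrive well before the slices that use them, whether in-band or out of band.
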


Frame and slice

-------------------------------------------------------------------------------------------------------------

Size relationship

For the concepts that appear in H.264, the order from largest to smallest is: sequence, picture (usually called a frame, including I, P, and B frames), slice group, slice (including I, P, B, SP, and SI slices), NALU, macroblock, sub-macroblock, block, pixel.

NOTE: Pictures are organized into sequences.

-------------------------------------------------------------------------------------------------------------

Frame, NALU, slice

(1) In the H.264 standard, a picture is a collective concept: a top field, a bottom field, and a frame can each be called a picture (this article uses "picture" in that collective sense). The familiar names I frame, P frame, and B frame are therefore specific, refined instances of the picture concept. The "frame" usually mentioned in H.264 refers to a picture that is not split into fields;
(2) If the FMO (Flexible Macroblock Ordering) mechanism is not used, a picture has only one slice group;
(3) If multiple slices are not used, a slice group contains only one slice;
(4) If the DP (data partitioning) mechanism is not used, a slice is a NALU, and a NALU is a slice.

      Otherwise, a slice consists of three NALUs (that is, the three NALUs belong to one slice), with these nal_unit_type values:
   2 Coded slice data partition A: slice_data_partition_a_layer_rbsp()

   3 Coded slice data partition B: slice_data_partition_b_layer_rbsp()

   4 Coded slice data partition C: slice_data_partition_c_layer_rbsp()
These correspond to the enum values:
        H264NT_SLICE_DPA,
        H264NT_SLICE_DPB,
        H264NT_SLICE_DPC,

A frame can contain one or more slices. Slices are composed of macroblocks, which are the basic unit of encoding.

A picture consists of 1 to N slice groups, and each slice group consists of one or several slices. A slice consists of one NALU, or of three NALUs if data partitioning is used. During picture decoding, decoding always proceeds slice by slice, and the decoded macroblocks are then reassembled into a picture according to the slice groups. In this sense, a slice is actually the largest decoding unit.

-------------------------------------------------------------------------------------------------------------

I,P,B frame dependencies

An I frame is coded independently and does not depend on data from other frames.

A P frame depends on data from previously decoded I or P frames.

A B frame depends on data from I frames, P frames, or other B frames.


Two classifications need to be kept apart here: the five nal_unit_type values 1 (coded slice of a non-IDR picture), 2 (coded slice data partition A), 3 (coded slice data partition B), 4 (coded slice data partition C), and 5 (coded slice of an IDR picture), and the three slice coding modes I_slice, P_slice, and B_slice. The five nal_unit_type values indicate what kind of data follows and how it is partitioned. 
I_slice, P_slice, and B_slice denote slices of type I, type P, and type B. An I_slice is coded with intra-frame prediction only; a P_slice uses unidirectional prediction or intra-frame mode; a B_slice uses bidirectional prediction or intra-frame mode.
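
The slice coding mode is carried in the slice header as the slice_type syntax element, not in nal_unit_type. In the H.264 syntax, slice_type takes values 0 to 9: 0 to 4 mean P, B, I, SP, and SI, and 5 to 9 repeat those meanings while also signalling that every slice of the current picture has that type. A minimal sketch of the mapping follows, assuming slice_type has already been decoded from the slice header; the enum and function names are illustrative.

```c
/* Slice coding modes, numbered as in the slice_type syntax element. */
typedef enum {
    SLICE_INVALID = -1,
    SLICE_P  = 0,
    SLICE_B  = 1,
    SLICE_I  = 2,
    SLICE_SP = 3,
    SLICE_SI = 4
} SliceType;

/* Map a decoded slice_type value (0..9) to its coding mode.
 * Values 5..9 have the same meaning as 0..4. */
static SliceType slice_type_from_syntax(unsigned slice_type)
{
    if (slice_type > 9)
        return SLICE_INVALID;
    return (SliceType)(slice_type % 5);
}

/* Short printable name for a coding mode. */
static const char *slice_type_name(SliceType t)
{
    switch (t) {
    case SLICE_P:  return "P";
    case SLICE_B:  return "B";
    case SLICE_I:  return "I";
    case SLICE_SP: return "SP";
    case SLICE_SI: return "SI";
    default:       return "invalid";
    }
}
```
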


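#include <stddef.h>

// H.264 NAL types (nal_unit_type values 0 to 8)
   enum H264NALTYPE
    {
        H264NT_NAL = 0,    // unspecified; also returned on error
        H264NT_SLICE,      // coded slice of a non-IDR picture (for example a P frame)
        H264NT_SLICE_DPA,  // coded slice data partition A
        H264NT_SLICE_DPB,  // coded slice data partition B
        H264NT_SLICE_DPC,  // coded slice data partition C
        H264NT_SLICE_IDR,  // coded slice of an IDR picture (key frame)
        H264NT_SEI,
        H264NT_SPS,
        H264NT_PPS,
   };

// 0x00 0x00 0x00 0x01 0x65 (or 0x45): the first four bytes are the start code, and 0x65 marks a key frame

// 0x00 0x00 0x01 0x65 (or 0x45) is also a key frame, with a three-byte start code

// Returns the NAL type of a buffer that begins with a start code,
// or H264NT_NAL if the buffer is too short, has no start code, or has a type above 8.
int H264GetNALType(const unsigned char * pBSBuf, const int nBSLen)
{
    if ( pBSBuf == NULL || nBSLen < 4 ) // incomplete NAL unit
        return H264NT_NAL;

    int nHdr; // offset of the NAL header byte, right after the start code
    if ( pBSBuf[0] == 0 && pBSBuf[1] == 0 && pBSBuf[2] == 1 )
        nHdr = 3; // three-byte start code
    else if ( nBSLen >= 5 && pBSBuf[0] == 0 && pBSBuf[1] == 0 && pBSBuf[2] == 0 && pBSBuf[3] == 1 )
        nHdr = 4; // four-byte start code
    else
        return H264NT_NAL;

    int nType = pBSBuf[nHdr] & 0x1F; // the low 5 bits are nal_unit_type
    if ( nType <= H264NT_PPS )
        return nType; // nType == 5 means key frame

    return H264NT_NAL;
}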

-------------------------------------------------------------------------------------------------------------

NAL syntax and semantics

NAL layer syntax:

In the code stream output by the encoder, the basic unit of data is the syntax element.

Syntax characterizes the organizational structure of syntactic elements.

Semantics describes the specific meaning of syntactic elements.

On a packet-based network, each packet has a header, so the decoder can easily detect the boundaries of each NAL and take out the NALs for decoding in turn.

However, to save bits, H.264 does not define an additional syntax element in the NAL header to indicate where the NAL starts.

If the encoded data is stored on the medium, since the NALs are closely connected in sequence, the decoder cannot tell where each NAL starts and ends in the data stream.

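Solution: add a start code, 0x000001, before each NAL.

On some types of media, for convenient addressing, the data stream must be aligned to a fixed length or to an integer multiple of some constant, so a few zero bytes are added as padding before the start code.

Detecting NAL boundaries:

The decoder searches for 0x000001 (the start of a NAL) and 0x000000 (the end of the current NAL, since it can only be padding ahead of the next start code).

Problem: the sequences 0x000001 and 0x000000 may also appear inside a NAL.

Solution:

H.264 defines an emulation prevention mechanism. Whenever one of the following sequences would appear inside a NAL, the encoder inserts a 0x03 byte after the two zero bytes, and the decoder removes it again:

0x000000 -> 0x00000300

0x000001 -> 0x00000301

0x000002 -> 0x00000302

0x000003 -> 0x00000303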

From this it follows that:

Within a NAL unit, the following three-byte sequences must not occur at any byte-aligned position:

0x000000

0x000001

0x000002
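
The emulation prevention rule can be sketched in both directions as follows. This is a simplified illustration with made-up function names; it assumes the caller supplies a large enough output buffer, and it ignores the corner case in which a payload ends in a zero byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Encoder side: copy src to dst, inserting 0x03 whenever two zero bytes
 * are followed by a byte in the range 0x00..0x03.
 * dst must have room for len + len / 2 bytes. Returns bytes written. */
static size_t add_emulation_prevention(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t out = 0;
    int zeros = 0; /* consecutive zero bytes written so far */
    for (size_t i = 0; i < len; i++) {
        if (zeros == 2 && src[i] <= 0x03) {
            dst[out++] = 0x03;
            zeros = 0;
        }
        dst[out++] = src[i];
        zeros = (src[i] == 0x00) ? zeros + 1 : 0;
    }
    return out;
}

/* Decoder side: copy src to dst, dropping each 0x03 that follows two
 * zero bytes. dst must have room for len bytes. Returns bytes written. */
static size_t remove_emulation_prevention(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t out = 0;
    int zeros = 0; /* consecutive zero bytes seen so far */
    for (size_t i = 0; i < len; i++) {
        if (zeros >= 2 && src[i] == 0x03) {
            zeros = 0; /* skip the inserted byte */
            continue;
        }
        dst[out++] = src[i];
        zeros = (src[i] == 0x00) ? zeros + 1 : 0;
    }
    return out;
}
```

After the encoder-side pass, the payload contains none of the forbidden three-byte sequences, so a scan for 0x000001 only ever finds real start codes.
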

forbidden_zero_bit: must be 0.

nal_ref_idc: indicates the priority of the NAL, from 0 to 3. The larger the value, the more important the current NAL is and the more it needs to be protected. If the current NAL is a slice belonging to a reference frame, a sequence parameter set, or a picture parameter set (all important units), this syntax element must be greater than 0.

nal_unit_type: the type of the current NAL unit.

