H.264 is a video coding and compression format. One encoder for it is the x264 library, whose source code is open and can be downloaded and compiled.
-------------------------------------------------------------------------------------------------------------
H.264 Codec
H.264 conceptually distinguishes between the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL).
The VCL contains the signal processing functionality of the codec: mechanisms such as transform, quantization, and motion-compensated prediction, plus a loop filter. It follows the general concept of most of today's video codecs: a macroblock-based coder that uses inter picture prediction with motion compensation and transform coding of the residual signal.
The NAL encoder encapsulates the slice output of the VCL encoder into Network Abstraction Layer units (NAL units), which are suitable for transmission over packet networks or for use in packet-oriented multiplex environments.
-------------------------------------------------------------------------------------------------------------
Network Abstraction Layer Unit (NALU) type
The NAL unit type byte format is as follows:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |
+---------------+
The semantics of the components of the NAL unit type byte are specified in the H.264 specification. A brief description follows.
F: 1 bit
forbidden_zero_bit. The H.264 specification states that a value of 1 indicates a syntax violation.
NRI: 2 bits
nal_ref_idc. A value of 00 indicates that the content of the NAL unit is not used to reconstruct reference pictures for inter picture prediction. Such NAL units can be discarded without risking the integrity of the reference pictures. A value greater than 00 indicates that decoding the NAL unit is required to maintain the integrity of the reference pictures.
Type: 5 bits
nal_unit_type. This field specifies the NAL unit payload type as defined in Table 7-1 of [1] and later in this document. The currently defined NAL unit types and their semantics are:
- 0: Unspecified
- 1: Coded slice of a non-IDR picture (no data partitioning)
- 2: Coded slice data partition A of a non-IDR picture
- 3: Coded slice data partition B of a non-IDR picture
- 4: Coded slice data partition C of a non-IDR picture
- 5: Coded slice of an IDR picture
- 6: Supplemental Enhancement Information (SEI)
- 7: Sequence Parameter Set (SPS)
- 8: Picture Parameter Set (PPS)
- 9: Access unit delimiter
- 10: End of sequence
- 11: End of stream
- 12: Filler data
- 13: Sequence parameter set extension
- 14: Prefix NAL unit
- 15: Subset sequence parameter set
- 16-18: Reserved
- 19: Coded slice of an auxiliary coded picture without data partitioning
- 20: Coded slice extension
- 21-23: Reserved
- 24-31: Unspecified
For example, after converting RGB to YUV (x264 only accepts YUV input), the beginning of a .264 file produced by the x264 encoder generally looks like this:
00 00 00 01 67 ....(SPS)....... 00 00 00 01 68 .........(PPS)....... 00 00 00 01 65 ......(IDR)...........
Because the NAL syntax carries no length information, real transmission and storage systems must add extra framing, such as the 00 00 00 01 start codes shown here, to delimit each NAL unit.
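The header bytes in the dump above (0x67, 0x68, 0x65) can be split into the F, NRI, and Type fields with a shift and two masks. The following is a minimal sketch; the `NalHeader` struct and the `parse_nal_header` function are names made up for this illustration and are not part of x264 or any other library.

```c
#include <stdint.h>

/* Hypothetical container for the three fields of the NAL unit header byte. */
typedef struct {
    unsigned forbidden_zero_bit; /* F:    1 bit, must be 0            */
    unsigned nal_ref_idc;        /* NRI:  2 bits, 0 means disposable  */
    unsigned nal_unit_type;      /* Type: 5 bits, see the list above  */
} NalHeader;

/* Splits one header byte into its fields (bit 0 in the diagram is the MSB). */
static NalHeader parse_nal_header(uint8_t b)
{
    NalHeader h;
    h.forbidden_zero_bit = (b >> 7) & 0x01;
    h.nal_ref_idc        = (b >> 5) & 0x03;
    h.nal_unit_type      =  b       & 0x1F;
    return h;
}
```

With this, 0x67 decodes to F=0, NRI=3, Type=7 (SPS), 0x68 to Type=8 (PPS), and 0x65 to Type=5 (IDR slice).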
-------------------------------------------------------------------------------------------------------------
Parameter Set Concept (SPS/PPS)
A very basic design concept of H.264 is to generate self-contained packets, which makes mechanisms such as the header duplication of RFC 2429 or MPEG-4's Header Extension Code (HEC) [11] unnecessary.
This is achieved by decoupling information that is relevant to more than one slice from the media stream. This higher-layer meta information should be sent reliably, asynchronously, and in advance of the RTP packet stream that contains the slice packets. (For applications that have no suitable out-of-band transport channel, provisions for sending this information in-band are also available.) The combination of these higher-level parameters is called a parameter set.
The H.264 specification includes two types of parameter sets: sequence parameter sets and picture parameter sets. An active sequence parameter set remains unchanged throughout a coded video sequence, and an active picture parameter set remains unchanged within a coded picture. The sequence and picture parameter set structures contain information such as picture size, the optional coding modes employed, and the macroblock to slice group map.
To be able to change picture parameters (e.g., picture size) without transmitting parameter set updates synchronously with the slice packet stream, the encoder and decoder can maintain a list of more than one sequence and picture parameter set. Each slice header contains a codeword that indicates the sequence and picture parameter set to be used.
This mechanism allows the transmission of parameter sets to be decoupled from the packet stream, so that they can be transmitted by external means (e.g., as a side effect of the capability exchange) or through a (reliable or unreliable) control protocol. It is even possible that they are never transmitted at all but are fixed by an application design specification.
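The idea of a decoder keeping several parameter sets and letting each slice select one by id can be sketched as two small lookup tables. This is only an illustration: the struct layouts, the function names, and the `width`/`height` fields are simplifications invented here (a real SPS stores the size in macroblock units along with many other fields). The table sizes follow the id ranges in the H.264 specification (0 to 31 for sequence parameter sets, 0 to 255 for picture parameter sets).

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SPS 32   /* seq_parameter_set_id is in the range 0..31  */
#define MAX_PPS 256  /* pic_parameter_set_id is in the range 0..255 */

/* Hypothetical, heavily reduced parameter set records. */
typedef struct { bool valid; int width, height; } Sps;
typedef struct { bool valid; int sps_id; } Pps;

typedef struct {
    Sps sps[MAX_SPS];
    Pps pps[MAX_PPS];
} ParamSetStore;

/* Stores (or overwrites) a sequence parameter set received in-band or out-of-band. */
static bool store_sps(ParamSetStore *s, int id, int width, int height)
{
    if (id < 0 || id >= MAX_SPS) return false;
    s->sps[id] = (Sps){ true, width, height };
    return true;
}

/* Stores a picture parameter set; it refers to its SPS by id only. */
static bool store_pps(ParamSetStore *s, int id, int sps_id)
{
    if (id < 0 || id >= MAX_PPS || sps_id < 0 || sps_id >= MAX_SPS) return false;
    s->pps[id] = (Pps){ true, sps_id };
    return true;
}

/* Resolves the PPS id found in a slice header to the SPS it activates.
   Returns NULL if either parameter set has not been received yet. */
static const Sps *activate(const ParamSetStore *s, int pps_id)
{
    if (pps_id < 0 || pps_id >= MAX_PPS || !s->pps[pps_id].valid) return NULL;
    const Sps *sps = &s->sps[s->pps[pps_id].sps_id];
    return sps->valid ? sps : NULL;
}
```

Because slices carry only an id, a new picture size can be prepared by storing a second SPS/PPS pair ahead of time; the switch happens when the first slice referring to the new id arrives.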
-------------------------------------------------------------------------------------------------------------
Frame and slice: size relationship
For the concepts that appear in H.264, the order from largest to smallest is: sequence, picture (usually called a frame, including I, P, and B frames), slice group, slice (including I, P, B, SP, and SI slices), NALU, macroblock, sub-macroblock, block, pixel.
NOTE: Pictures are organized into sequences.
-------------------------------------------------------------------------------------------------------------
frame, NALU, slice
(1) In the H.264 specification, a picture is a collective concept: a top field, a bottom field, and a frame can all be called pictures (this article uses "picture" in that collective sense). The familiar names I frame, P frame, and B frame are therefore more specific, concrete forms of the picture concept. The "frame" mentioned in H.264 usually refers to a picture that is not divided into fields.
(2) If the FMO (Flexible Macroblock Ordering) mechanism is not used, a picture has only one slice group.
(3) If multiple slices are not used, there is only one slice in a slice group.
(4) If the DP (data partitioning) mechanism is not used, a slice is a NALU, and a NALU is a slice. With data partitioning, a slice is split across three NALUs, which use nal_unit_type values 2, 3, and 4:
2 Coded slice data partition A slice_data_partition_a_layer_rbsp( )
3 Coded slice data partition B slice_data_partition_b_layer_rbsp( )
4 Coded slice data partition C slice_data_partition_c_layer_rbsp( )
These correspond to the enum values:
H264NT_SLICE_DPA,
H264NT_SLICE_DPB,
H264NT_SLICE_DPC,
A frame can contain one or more slices. Slices are composed of macroblocks, which are the basic unit of coding.
A picture consists of 1 to N slice groups, and each slice group consists of one or more slices. A slice is carried in one NALU, or in three NALUs if data partitioning is used. During picture decoding, slices are always decoded first, and the decoded macroblocks are then reassembled into the picture according to the slice groups. In this sense, a slice is actually the largest decoding unit.
-------------------------------------------------------------------------------------------------------------
I, P, B frame dependencies
An I frame is coded independently and does not depend on data from other frames.
A P frame depends on data from a preceding I or P frame. A B frame depends on data from I frames, P frames, or other B frames.
Two separate classifications are involved here: the nal_unit_type values 1 (coded slice of a non-IDR picture), 2 (coded slice data partition A), 3 (coded slice data partition B), 4 (coded slice data partition C), and 5 (coded slice of an IDR picture), and the three slice coding modes I_slice, P_slice, and B_slice. The five nal_unit_type values indicate what kind of data follows and how it is partitioned.
I_slice, P_slice, and B_slice denote slices of type I, P, and B. An I_slice is coded with intra prediction only; a P_slice uses unidirectional inter prediction or intra modes; a B_slice uses bidirectional prediction or intra modes.
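// H.264 NAL unit types (nal_unit_type values 0 to 8)
enum H264NALTYPE
{
    H264NT_NAL = 0,    // unspecified; also returned on error
    H264NT_SLICE,      // 1: slice of a non-IDR picture (typically a P frame)
    H264NT_SLICE_DPA,  // 2: slice data partition A
    H264NT_SLICE_DPB,  // 3: slice data partition B
    H264NT_SLICE_DPC,  // 4: slice data partition C
    H264NT_SLICE_IDR,  // 5: slice of an IDR picture (I frame, key frame)
    H264NT_SEI,        // 6
    H264NT_SPS,        // 7
    H264NT_PPS,        // 8
};
// 0x00 0x00 0x00 0x01 0x65 (or 0x45): 4-byte start code, then the NAL header; 0x65 & 0x1F == 5 is a key frame
// 0x00 0x00 0x01 0x65 (or 0x45): 3-byte start code, also a key frame
// pBSBuf must point at the start code of one NAL unit.
int H264GetNALType(const unsigned char *pBSBuf, int nBSLen)
{
    int nOffset; // position of the NAL header byte, right after the start code
    if (nBSLen >= 5 && pBSBuf[0] == 0 && pBSBuf[1] == 0 && pBSBuf[2] == 0 && pBSBuf[3] == 1)
        nOffset = 4;
    else if (nBSLen >= 4 && pBSBuf[0] == 0 && pBSBuf[1] == 0 && pBSBuf[2] == 1)
        nOffset = 3;
    else
        return H264NT_NAL; // no start code, or incomplete NAL unit
    int nType = pBSBuf[nOffset] & 0x1F; // low 5 bits of the header byte
    if (nType <= H264NT_PPS)
        return nType; // nType == 5 means key frame
    return H264NT_NAL;
}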
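The nal_unit_type only separates IDR slices from non-IDR slices; whether a slice is I, P, or B is signalled by slice_type in the slice header, which follows first_mb_in_slice and is coded as an unsigned Exp-Golomb value, ue(v). The sketch below reads those two fields. The `BitReader` type and all function names are invented for this illustration, and it assumes that no emulation prevention byte falls inside these first few bits, which is an assumption of the sketch rather than a guarantee. slice_type values 0 to 4 mean P, B, I, SP, SI, and values 5 to 9 repeat the same meanings.

```c
#include <stddef.h>
#include <stdint.h>

enum { SLICE_P = 0, SLICE_B = 1, SLICE_I = 2, SLICE_SP = 3, SLICE_SI = 4 };

/* Hypothetical MSB-first bit reader over a byte buffer. */
typedef struct { const uint8_t *data; size_t size; size_t bitpos; } BitReader;

/* Returns the next bit, or -1 when the buffer is exhausted. */
static int read_bit(BitReader *br)
{
    if (br->bitpos >= br->size * 8) return -1;
    int bit = (br->data[br->bitpos >> 3] >> (7 - (br->bitpos & 7))) & 1;
    br->bitpos++;
    return bit;
}

/* ue(v): count n leading zero bits, skip the 1, read n more bits;
   the value is 2^n - 1 + those bits. Returns -1 on error. */
static int64_t read_ue(BitReader *br)
{
    int zeros = 0, bit;
    while ((bit = read_bit(br)) == 0) {
        if (++zeros > 32) return -1;
    }
    if (bit < 0) return -1;
    int64_t suffix = 0;
    for (int i = 0; i < zeros; i++) {
        bit = read_bit(br);
        if (bit < 0) return -1;
        suffix = (suffix << 1) | bit;
    }
    return (((int64_t)1 << zeros) - 1) + suffix;
}

/* nal points at the NAL header byte (start code already removed).
   Returns SLICE_P..SLICE_SI, or -1 if this is not a type 1 or 5 slice
   or the data is too short. */
static int h264_slice_type(const uint8_t *nal, size_t size)
{
    if (size < 2) return -1;
    unsigned type = nal[0] & 0x1F;
    if (type != 1 && type != 5) return -1;
    BitReader br = { nal + 1, size - 1, 0 };
    if (read_ue(&br) < 0) return -1;      /* first_mb_in_slice, ignored */
    int64_t slice_type = read_ue(&br);
    if (slice_type < 0 || slice_type > 9) return -1;
    return (int)(slice_type % 5);
}
```

For instance, the bytes 65 88 decode to first_mb_in_slice 0 and slice_type 7, an I slice, while 41 9A decode to slice_type 5, a P slice.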
-------------------------------------------------------------------------------------------------------------
NAL syntax and semantics
NAL layer syntax:
In the bitstream output by the encoder, the basic unit of data is the syntax element.
Syntax describes how the syntax elements are organized.
Semantics describes what each syntax element means.
On a packet network, each packet has a header, so the decoder can easily detect the boundaries of a NAL unit and take out the NAL units for decoding one by one.
However, to save bits, H.264 does not define a syntax element in the NAL header that indicates where the NAL unit starts.
If the encoded data is stored on a medium, the NAL units follow one another directly, so the decoder cannot tell where each NAL unit starts and ends in the data stream.
Solution: add a start code before each NAL unit: 0x000001
On some types of media, for convenient addressing, the data stream must be aligned to a fixed length or to an integer multiple of some constant. In that case, a few zero bytes are added before the start code as padding.
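Splitting a stored stream into NAL units then comes down to searching for the 0x000001 start code and treating any zero bytes in front of it as padding. The following is a rough sketch with made-up function names; it relies on the rule that a NAL unit never ends with a 0x00 byte, so trailing zeros can safely be attributed to padding or to the extra leading zero of a four-byte start code.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the next 00 00 01 at or after 'from', or 'size' if none. */
static size_t find_start_code(const uint8_t *buf, size_t size, size_t from)
{
    for (size_t i = from; i + 3 <= size; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    }
    return size;
}

/* Extracts the next NAL unit at or after *pos.
   On success returns 1, sets *nal and *len to the unit (header byte first,
   start code and padding removed) and advances *pos. Returns 0 when no
   further start code is found. */
static int next_nal(const uint8_t *buf, size_t size, size_t *pos,
                    const uint8_t **nal, size_t *len)
{
    size_t sc = find_start_code(buf, size, *pos);
    if (sc == size) return 0;
    size_t start = sc + 3;                            /* first byte after 00 00 01 */
    size_t end = find_start_code(buf, size, start);   /* next unit or end of data  */
    size_t stop = end;
    while (stop > start && buf[stop - 1] == 0)        /* drop zero padding */
        stop--;
    *nal = buf + start;
    *len = stop - start;
    *pos = end;
    return 1;
}
```

Calling `next_nal` in a loop with `pos` initialised to 0 walks through the stream one NAL unit at a time, regardless of whether the encoder wrote three-byte or four-byte start codes.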
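Detecting the start of a NAL unit:
The decoder searches for 0x000001, which marks the start of a NAL unit, and for 0x000000, which marks the end of the preceding one (it can only be padding or the beginning of a four-byte start code).
We must therefore consider what happens when 0x000001 or 0x000000 appears inside a NAL unit.
Solution:
H.264 defines an emulation prevention mechanism: the encoder inserts the byte 0x03 after two consecutive zero bytes whenever the next byte is 0x00, 0x01, 0x02, or 0x03, and the decoder removes it again:
0x000000 -> 0x00000300
0x000001 -> 0x00000301
0x000002 -> 0x00000302
0x000003 -> 0x00000303
From this it follows that:
Inside a NAL unit, the following three-byte sequences must not occur at any byte-aligned position:
0x000000
0x000001
0x000002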
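Both directions of this mechanism fit in a few lines each. The sketch below uses invented function names and handles only the byte insertion and removal described above; it leaves out special cases such as a payload that ends in a zero byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Encoder side: copies src to dst, inserting 0x03 after two zero bytes
   whenever the next byte is 0x00..0x03. Returns the number of bytes
   written, or 0 if dst (capacity cap) is too small. */
static size_t insert_emulation_prevention(const uint8_t *src, size_t n,
                                          uint8_t *dst, size_t cap)
{
    size_t zeros = 0, out = 0;
    for (size_t i = 0; i < n; i++) {
        if (zeros == 2 && src[i] <= 0x03) {
            if (out >= cap) return 0;
            dst[out++] = 0x03;   /* emulation prevention byte */
            zeros = 0;
        }
        if (out >= cap) return 0;
        dst[out++] = src[i];
        zeros = (src[i] == 0) ? zeros + 1 : 0;
    }
    return out;
}

/* Decoder side: drops every 0x03 that follows two zero bytes.
   dst must have room for n bytes. Returns the number of bytes written. */
static size_t remove_emulation_prevention(const uint8_t *src, size_t n,
                                          uint8_t *dst)
{
    size_t zeros = 0, out = 0;
    for (size_t i = 0; i < n; i++) {
        if (zeros >= 2 && src[i] == 0x03) {
            zeros = 0;           /* skip the inserted byte */
            continue;
        }
        dst[out++] = src[i];
        zeros = (src[i] == 0) ? zeros + 1 : 0;
    }
    return out;
}
```

A destination buffer of one and a half times the input size plus one byte is always enough for the encoder side, since at most one byte is inserted per two input bytes.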
forbidden_zero_bit: must be 0.
nal_ref_idc: indicates the priority of the NAL unit, from 0 to 3. The larger the value, the more important the NAL unit is and the more it needs to be protected. If the current NAL unit is a slice belonging to a reference frame, a sequence parameter set, or a picture parameter set, this syntax element must be greater than 0.
nal_unit_type: the type of the current NAL unit.
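Going the other way, a header byte can be assembled from these three syntax elements. The sketch below uses a hypothetical function name and enforces only the constraint stated above for IDR slices (type 5), sequence parameter sets (type 7), and picture parameter sets (type 8); whether a non-IDR slice belongs to a reference frame is not known at this level, so that case is left to the caller.

```c
/* Builds the one-byte NAL header with forbidden_zero_bit = 0.
   Returns the byte value (0..127), or -1 if the arguments are out of
   range or nal_ref_idc is 0 for a unit type that must not be discarded. */
static int make_nal_header(unsigned nal_ref_idc, unsigned nal_unit_type)
{
    if (nal_ref_idc > 3 || nal_unit_type > 31)
        return -1;
    int must_be_nonzero = (nal_unit_type == 5 ||   /* IDR slice */
                           nal_unit_type == 7 ||   /* SPS       */
                           nal_unit_type == 8);    /* PPS       */
    if (must_be_nonzero && nal_ref_idc == 0)
        return -1;
    return (int)((nal_ref_idc << 5) | nal_unit_type);
}
```

For example, nal_ref_idc 3 with type 7 gives 0x67 and nal_ref_idc 2 with type 5 gives 0x45, matching the header bytes seen in the earlier byte dumps.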