Detailed explanation of SPS PPS

https://zhuanlan.zhihu.com/p/27896239


Detailed explanation of SPS PPS in H264 code stream

DaveBobo
7 months ago
  • 1 Where do SPS and PPS come from?
  • 2 What does each parameter in SPS and PPS do?
  • 3 How to parse the SPS and PPS strings of H.264 contained in SDP?

1 Client packet capture

When doing client video decoding, the Wireshark packet capture tool is generally used to analyze the received H264 stream, as shown below:

Here we can see the SPS and PPS that play a key role in decoding the video.

Double-click the SPS content as follows:

Double-click the PPS content as follows:

From the SPS above we can work out the width and height of the image:

Width = (19+1)*16 = 320

Height = (14+1)*16 = 240

Why? See the explanation below.

2 SPS PPS detailed explanation

2.1 SPS syntax elements and their meanings

The H.264 standard defines a number of NAL Unit types. Type 7 indicates that the NAL Unit carries a Sequence Parameter Set (SPS). Among the syntax elements of H.264, the information in the SPS is crucial: if it is missing or corrupted, decoding is likely to fail. The SPS and the picture parameter set (PPS), described later, are also commonly used to initialize decoder instances in the video processing frameworks of some platforms (such as VideoToolbox on iOS).

SPS stands for Sequence Parameter Set. It stores a set of global parameters for a coded video sequence. A coded video sequence is the sequence of coded pictures produced by encoding the frames of the original video one by one. The parameters that the coded data of each individual picture depends on are stored in the picture parameter set (PPS). The SPS and PPS NAL Units are usually located at the beginning of the code stream. In some cases, however, they may also appear in the middle of the stream. The main reasons are:

  • The decoder needs to start decoding in the middle of the stream;
  • The encoder changes the parameters of the code stream (such as image resolution, etc.) during the encoding process;

When making a video player, in order to allow the subsequent decoding process to use the parameters contained in the SPS, the data in it must be parsed. The SPS format specified in the H.264 standard protocol is located in section 7.3.2.1.1 of the document, as shown in the following figure:

Each of these syntax elements and their meanings are as follows:

(1) profile_idc:

Identifies the profile of the current H.264 stream. H.264 originally defined three commonly used profiles:

Baseline profile;

Main profile;

Extended profile;

In an H.264 SPS, the first byte after the NAL header is profile_idc, and its value determines which profile the code stream conforms to. The rule is:

profile_idc = 66 → baseline profile;

profile_idc = 77 → main profile;

profile_idc = 88 → extended profile;

Newer versions of the standard also include High, High 10, High 4:2:2, High 4:4:4, High 10 Intra, High 4:2:2 Intra, High 4:4:4 Intra, CAVLC 4:4:4 Intra, etc., each represented by a different profile_idc.

In addition, constraint_set0_flag ~ constraint_set5_flag impose additional constraints on the code stream on top of the profile.

In our experimental code stream, profile_idc = 0x42 = 66, so the stream uses the baseline profile.
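As a rough illustration of the rule above, the sketch below reads profile_idc and the constraint flags from the first bytes of an SPS NAL Unit whose start code has already been removed. The function name, the lookup table, and the sample bytes 0x67 and 0xE0 are assumptions made for this example; only 0x42 comes from the experimental stream.

```python
# Illustrative only: the table lists just the three profiles named above.
PROFILE_NAMES = {66: "baseline", 77: "main", 88: "extended"}


def read_profile(nal: bytes) -> dict:
    """Read profile_idc and the constraint flags from an SPS NAL Unit (no start code)."""
    if nal[0] & 0x1F != 7:   # low 5 bits of the NAL header are nal_unit_type
        raise ValueError("not an SPS NAL Unit")
    profile_idc = nal[1]     # first byte after the NAL header
    flags_byte = nal[2]      # constraint_set0..5_flag followed by 2 reserved bits
    flags = [(flags_byte >> (7 - i)) & 1 for i in range(6)]
    return {
        "profile_idc": profile_idc,
        "profile": PROFILE_NAMES.get(profile_idc, "other"),
        "constraint_set_flags": flags,
    }


# 0x67 (NAL header) and 0xE0 (flags byte) are assumed sample values.
print(read_profile(bytes([0x67, 0x42, 0xE0, 0x1E])))
```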

(2) level_idc

Identifies the Level of the current code stream. The encoded Level defines parameters such as the maximum video resolution and maximum video frame rate under certain conditions, and the level complied with by the code stream is specified by level_idc.

In the current code stream, level_idc = 0x1e = 30, so the level of the code stream is 3 (level_idc is ten times the level number).
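The mapping from level_idc to a level number can be sketched as below. The helper name is hypothetical, and the sketch deliberately ignores the special signalling of level 1b, so treat it as a simplification rather than a complete rule.

```python
def level_from_idc(level_idc: int) -> str:
    """Convert level_idc to a level string (level 1b is not handled here)."""
    major, minor = divmod(level_idc, 10)
    return f"{major}.{minor}" if minor else str(major)


print(level_from_idc(0x1E))  # level_idc taken from the experimental stream
```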

(3) seq_parameter_set_id

Indicates the id of the current sequence parameter set. Through the id value, the image parameter set pps can refer to the parameters in the sps it represents.

(4) log2_max_frame_num_minus4

Used to calculate MaxFrameNum. The formula is MaxFrameNum = 2^(log2_max_frame_num_minus4 + 4). MaxFrameNum is the upper limit of frame_num (frame_num ranges from 0 to MaxFrameNum - 1). frame_num is one way of numbering pictures and is often used to mark reference frames in inter-frame coding.

(5) pic_order_cnt_type

Represents a method for decoding picture order count (POC). POC is another way to measure the image serial number, which has a different calculation method from frame_num. The value of this syntax element is 0, 1, or 2.

(6) log2_max_pic_order_cnt_lsb_minus4

Used to calculate MaxPicOrderCntLsb, the upper limit of the low-order POC bits (pic_order_cnt_lsb) carried in the slice header. The formula is MaxPicOrderCntLsb = 2^(log2_max_pic_order_cnt_lsb_minus4 + 4). This element is only present when pic_order_cnt_type is 0.
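Both limits follow the same power-of-two pattern. A minimal sketch of the two formulas, with hypothetical function names and sample inputs chosen only for illustration:

```python
def max_frame_num(log2_max_frame_num_minus4: int) -> int:
    """MaxFrameNum = 2^(log2_max_frame_num_minus4 + 4)."""
    return 1 << (log2_max_frame_num_minus4 + 4)


def max_pic_order_cnt_lsb(log2_max_pic_order_cnt_lsb_minus4: int) -> int:
    """MaxPicOrderCntLsb = 2^(log2_max_pic_order_cnt_lsb_minus4 + 4)."""
    return 1 << (log2_max_pic_order_cnt_lsb_minus4 + 4)


# Sample inputs (assumed): the smallest allowed value 0 and an arbitrary value 2.
print(max_frame_num(0), max_pic_order_cnt_lsb(2))
```

With a syntax element value of 0, the counter wraps around after 16 values.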

(7) max_num_ref_frames

Used to indicate the maximum number of reference frames.

(8) gaps_in_frame_num_value_allowed_flag

Flag indicating whether discontinuous values are allowed in frame_num.

(9) pic_width_in_mbs_minus1

Used to calculate the width of the image. The unit is the number of macroblocks, so the actual width of the image is:

frame_width = 16 × (pic_width_in_mbs_minus1 + 1);

(10) pic_height_in_map_units_minus1

Used to calculate PicHeightInMapUnits, which measures the height of a picture in the video. PicHeightInMapUnits is not directly the picture height in pixels or macroblocks; its meaning depends on whether the macroblocks are frame coded or field coded. It is calculated as:

PicHeightInMapUnits = pic_height_in_map_units_minus1 + 1;

(11) frame_mbs_only_flag

Flag bit, indicating the encoding method of the macroblock. When the flag is 0, the macroblock may be frame coded or field coded; when the flag is 1, all macroblocks use frame coding. The meaning of PicHeightInMapUnits is different according to the value of the flag. When it is 0, it means the height of one field of data calculated by macroblocks, and when it is 1, it means the height of one frame of data calculated by macroblocks.

The actual height of the image in macroblocks, FrameHeightInMbs, is calculated as:

FrameHeightInMbs = ( 2 − frame_mbs_only_flag ) * PicHeightInMapUnits
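Putting the width and height formulas together, the pixel size before any cropping can be sketched as follows. The function name is hypothetical, and frame_mbs_only_flag = 1 is an assumption about the captured stream in section 1 (it is consistent with the 240-pixel height computed there).

```python
def frame_size_in_pixels(pic_width_in_mbs_minus1: int,
                         pic_height_in_map_units_minus1: int,
                         frame_mbs_only_flag: int) -> tuple:
    """Return (width, height) in pixels, ignoring frame cropping."""
    width_in_mbs = pic_width_in_mbs_minus1 + 1
    pic_height_in_map_units = pic_height_in_map_units_minus1 + 1
    # A map unit covers one macroblock row in frame-only streams, two otherwise.
    frame_height_in_mbs = (2 - frame_mbs_only_flag) * pic_height_in_map_units
    return 16 * width_in_mbs, 16 * frame_height_in_mbs


# Values 19 and 14 come from the packet capture in section 1.
print(frame_size_in_pixels(19, 14, 1))
```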

(12) mb_adaptive_frame_field_flag

Flag bit, indicating whether macroblock-level frame-field adaptive coding is used. When the flag is 0, there is no switching between frame coding and field coding; when the flag is 1, the macroblock may choose between frame coding and field coding modes.

(13) direct_8x8_inference_flag

Flag bit, used for derivation and calculation of motion vectors in B_Skip and B_Direct modes.

(14) frame_cropping_flag

Flag bit, indicating whether the output image frame needs to be cropped.

(15) vui_parameters_present_flag

Flag bit, indicating whether VUI information exists in the SPS.
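When frame_cropping_flag is 1, the SPS also carries four crop offsets (frame_crop_left_offset, frame_crop_right_offset, frame_crop_top_offset, frame_crop_bottom_offset), which are not listed in this article but are part of the SPS syntax in the standard. The sketch below shows how they reduce the macroblock-aligned size to the displayed size. It assumes 4:2:0 chroma sampling, and the function name and the 1920x1088 sample values are illustrative assumptions.

```python
def cropped_size(width: int, height: int, frame_mbs_only_flag: int,
                 crop_left: int = 0, crop_right: int = 0,
                 crop_top: int = 0, crop_bottom: int = 0) -> tuple:
    """Apply SPS frame cropping to a macroblock-aligned size (4:2:0 assumed)."""
    crop_unit_x = 2                              # horizontal crop unit for 4:2:0
    crop_unit_y = 2 * (2 - frame_mbs_only_flag)  # doubled when fields are allowed
    return (width - crop_unit_x * (crop_left + crop_right),
            height - crop_unit_y * (crop_top + crop_bottom))


# Assumed example: a 1080-line video coded as 68 macroblock rows (1088 lines).
print(cropped_size(1920, 1088, 1, crop_bottom=4))
```

This is why a size computed only from the macroblock counts can be slightly larger than the real picture size.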

2.2 PPS syntax elements and their meanings

In addition to the sequence parameter set SPS, the other important parameter set in H.264 is the Picture Parameter Set (PPS). Like the SPS, the PPS is stored in its own NAL Unit in a raw H.264 stream, but the nal_unit_type of a PPS NAL Unit is 8. In container formats, the PPS is usually stored together with the SPS in the header of the video file.

In the H.264 protocol document, the structure of PPS is defined in Section 7.3.2.2, and the specific structure is shown in the following table:

Each of these syntax elements and their meanings are as follows:

(1) pic_parameter_set_id

Indicates the id of the current PPS. A certain PPS will be referenced by the corresponding slice in the code stream. The way the slice refers to the PPS is to save the id value of the PPS in the slice header. The value range is [0,255].

(2) seq_parameter_set_id

Indicates the id of the active SPS referenced by the current PPS. In this way, the parameters in the corresponding SPS can also be obtained from the PPS. The value range is [0,31].

(3) entropy_coding_mode_flag

Entropy coding mode flag. It indicates the algorithm selected for entropy coding/decoding in the code stream. For some syntax elements, different entropy coding methods are used under different encoding configurations. For example, the descriptor of the macroblock type mb_type in the macroblock syntax is "ue(v) | ae(v)": Exponential Golomb coding is used in configurations such as the baseline profile, and CABAC coding can be used in configurations such as the main profile.

The role of the flag entropy_coding_mode_flag is to control this algorithm selection. When the value is 0, the algorithm on the left is selected, usually Exponential Golomb coding or CAVLC; when the value is 1, the algorithm on the right is selected, usually CABAC.

(4) bottom_field_pic_order_in_frame_present_flag

This flag indicates whether the syntax elements delta_pic_order_cnt_bottom and delta_pic_order_cnt[1] are present in the slice header. These two syntax elements are used to calculate the POC of the bottom field of a frame.

(5) num_slice_groups_minus1

Indicates the number of slice groups in a frame, minus 1. When the value is 0, all slices in a frame belong to the same slice group. A slice group is a grouping of macroblocks in a frame, defined in section 3.141 of the protocol document.

(6) num_ref_idx_l0_default_active_minus1, num_ref_idx_l1_default_active_minus1

Indicates the default values of the syntax elements num_ref_idx_l0_active_minus1 and num_ref_idx_l1_active_minus1 of a P/SP/B slice when the num_ref_idx_active_override_flag flag in the slice header is 0.

(7) weighted_pred_flag

Flag bit, indicating whether weighted prediction is enabled in the P/SP slice.

(8) weighted_bipred_idc

Indicates the method of weighted prediction in B Slice, the value range is [0, 2]. 0 means default weighted prediction, 1 means explicit weighted prediction, 2 means implicit weighted prediction.

(9) pic_init_qp_minus26 and pic_init_qs_minus26

Represents the initial quantization parameter. The actual quantization parameter is calculated from this parameter and slice_qp_delta/slice_qs_delta in the slice header.
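The relationship between the PPS values and the slice header deltas can be sketched as follows. The function names and sample numbers are hypothetical; the "26 +" offset is what the "_minus26" suffix in the element names refers to.

```python
def slice_qp(pic_init_qp_minus26: int, slice_qp_delta: int) -> int:
    """Initial luma QP of a slice: 26 + pic_init_qp_minus26 + slice_qp_delta."""
    return 26 + pic_init_qp_minus26 + slice_qp_delta


def slice_qs(pic_init_qs_minus26: int, slice_qs_delta: int) -> int:
    """Initial QS used by SP/SI slices: 26 + pic_init_qs_minus26 + slice_qs_delta."""
    return 26 + pic_init_qs_minus26 + slice_qs_delta


# Assumed sample values: PPS baseline of 23 (26 - 3) raised by 5 in the slice header.
print(slice_qp(-3, 5))
```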

(10) chroma_qp_index_offset

The quantization parameter used to calculate the chroma components, the value range is [-12, 12].

(11) deblocking_filter_control_present_flag

Flag bit, used to indicate whether there is information for deblocking filter control in the Slice header. When the flag bit is 1, the slice header contains the corresponding information of deblocking filtering; when the flag bit is 0, there is no corresponding information in the slice header.

(12) constrained_intra_pred_flag

If the flag is 1, it means that the I macroblock can only use the information from the I and SI type macroblocks when performing intra-frame prediction; if the flag is 0, it means that the I macroblock can use the information from the Inter type macroblock.

(13) redundant_pic_cnt_present_flag

Flag bit used to indicate whether the redundant_pic_cnt syntax element exists in the Slice header. When the flag bit is 1, the slice header contains redundant_pic_cnt; when the flag bit is 0, there is no corresponding information in the slice header.

3 Parse the SPS and PPS strings of H.264 contained in SDP

When H.264 is transmitted over RTP, an SDP description is needed, and it must carry two items: the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS). Where do these come from? They are obtained from the H.264 code stream. In the H.264 code stream, "0x00 0x00 0x01" or "0x00 0x00 0x00 0x01" is the start code. After finding the start code, use the lower 5 bits of the first byte after it to check whether the NAL Unit is type 7 (SPS) or type 8 (PPS); with a 4-byte start code this is (data[4] & 0x1f) == 7 || (data[4] & 0x1f) == 8. Then remove the start code from the NAL Unit and base64-encode it; the result can be used in the SDP. The SPS and PPS must be separated by a comma.
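The procedure described above can be sketched as follows. The function names are hypothetical, and the sample stream is assembled by hand from the SPS and PPS bytes that correspond to the base64 strings used later in this article. In SDP, the resulting string is normally carried in the sprop-parameter-sets parameter of the fmtp line.

```python
import base64

START_CODE = b"\x00\x00\x01"


def split_annexb(stream: bytes) -> list:
    """Split a start-code-delimited stream into NAL Units without start codes."""
    nals = []
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + 3
        pos = stream.find(START_CODE, start)
        end = len(stream) if pos == -1 else pos
        # The extra 0x00 of a 4-byte start code appears as a trailing zero here.
        nal = stream[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
    return nals


def sps_pps_for_sdp(stream: bytes) -> str:
    """Return 'base64(SPS),base64(PPS)' for the first SPS and PPS in the stream."""
    sps = pps = None
    for nal in split_annexb(stream):
        nal_type = nal[0] & 0x1F          # low 5 bits of the NAL header
        if nal_type == 7 and sps is None:
            sps = nal
        elif nal_type == 8 and pps is None:
            pps = nal
    if sps is None or pps is None:
        raise ValueError("stream does not contain both an SPS and a PPS")
    return ",".join(base64.b64encode(n).decode("ascii") for n in (sps, pps))


# Hand-built sample: 4-byte start code + SPS, then 3-byte start code + PPS.
sample = (b"\x00\x00\x00\x01" + bytes.fromhex("6742e014da058251")
          + b"\x00\x00\x01" + bytes.fromhex("68ce30a480"))
print(sps_pps_for_sdp(sample))
```

Searching for the 3-byte pattern covers both start code lengths, which avoids hard-coding the data[4] offset.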

The SPS and PPS strings of H.264 in SDP contain the information parameters required to initialize the H.264 decoder, including the profile, level, image width and height, deblock filter, etc. used for encoding.

Because the SPS and PPS in SDP are base64-encoded, they are not human-readable. There is a tool that can parse the SPS and PPS in SDP. The download address is: davebobo

The usage is to enter on the command line:

spsparser sps.txt pps.txt output.txt

For example, the content in sps.txt is:

Z0LgFNoFglE=

The content in pps.txt is:

aM4wpIA=

The final analysis result is:

Here we need to mention these two parameters in particular

pic_width_in_mbs_minus1 = 21

pic_height_in_mbs_minus1 = 17

These give the width and height of the image in macroblocks (16x16), each minus 1.

Therefore, the actual width is (21+1)*16 = 352 and the height is (17+1)*16 = 288

This also answers the question left open in the first part (client packet capture): how the image width and height are calculated.
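For readers without the tool, the same two values can be recovered with a small parser. The sketch below is not the spsparser implementation; it is a minimal reader, with hypothetical class and function names, that handles only baseline/main/extended profile SPS data and stops after frame_mbs_only_flag. Variable-length fields use the Exponential Golomb code ue(v).

```python
import base64


class BitReader:
    """Read bits MSB-first from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def u(self, n: int) -> int:
        """Read an n-bit unsigned integer."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def ue(self) -> int:
        """Read an unsigned Exp-Golomb code: N zeros, a 1, then N info bits."""
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.u(zeros)


def parse_sps(b64_sps: str) -> dict:
    nal = base64.b64decode(b64_sps)
    if nal[0] & 0x1F != 7:
        raise ValueError("not an SPS NAL Unit")
    # Skip the NAL header and drop emulation prevention bytes (00 00 03 -> 00 00).
    r = BitReader(nal[1:].replace(b"\x00\x00\x03", b"\x00\x00"))
    sps = {"profile_idc": r.u(8)}
    if sps["profile_idc"] not in (66, 77, 88):
        raise NotImplementedError("High-family profiles carry extra fields")
    r.u(8)  # constraint_set flags and reserved bits
    sps["level_idc"] = r.u(8)
    sps["seq_parameter_set_id"] = r.ue()
    sps["log2_max_frame_num_minus4"] = r.ue()
    sps["pic_order_cnt_type"] = r.ue()
    if sps["pic_order_cnt_type"] == 0:
        sps["log2_max_pic_order_cnt_lsb_minus4"] = r.ue()
    elif sps["pic_order_cnt_type"] == 1:
        raise NotImplementedError("pic_order_cnt_type 1 is not handled in this sketch")
    sps["max_num_ref_frames"] = r.ue()
    sps["gaps_in_frame_num_value_allowed_flag"] = r.u(1)
    sps["pic_width_in_mbs_minus1"] = r.ue()
    sps["pic_height_in_map_units_minus1"] = r.ue()
    sps["frame_mbs_only_flag"] = r.u(1)
    return sps


sps = parse_sps("Z0LgFNoFglE=")
width = 16 * (sps["pic_width_in_mbs_minus1"] + 1)
height = 16 * (2 - sps["frame_mbs_only_flag"]) * (sps["pic_height_in_map_units_minus1"] + 1)
print(width, height)  # 352 288
```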
