H264 stream packaging analysis (essence) | H.264 video transmission system implementing RTP protocol

H264 code stream packaging analysis

SODB (String Of Data Bits) --> the rawest form of the encoded data

RBSP (Raw Byte Sequence Payload) --> The SODB followed by the RBSP trailing bits: one "1" bit, then as many "0" bits as are needed for byte alignment.

EBSP (Encapsulated Byte Sequence Payload) --> The RBSP with emulation prevention bytes (0x03) added. The reason: when NALUs are written in the Annex B format, a start code (StartCodePrefix) must be added before each NALU. If the slice carried by the NALU is the start of a frame, the start code is the 4-byte value 0x00000001; otherwise it is the 3-byte value 0x000001. So that the payload cannot imitate a start code, the encoder inserts a 0x03 byte whenever two consecutive 0x00 bytes are followed by a byte in the range 0x00 to 0x03. The decoder removes the 0x03 again, which is also known as the unpacking operation.
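
The SODB-to-RBSP step can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article: the function names `sodb_to_rbsp` and `rbsp_to_sodb` are hypothetical, and the SODB is modelled as a string of '0'/'1' characters purely for readability.

```python
def sodb_to_rbsp(bits: str) -> bytes:
    """Append the RBSP trailing bits to an SODB given as a '0'/'1' string.

    Hypothetical helper: a real encoder works on a bit buffer, not a string.
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only '0' and '1'")
    padded = bits + "1"                     # the single stop bit "1"
    padded += "0" * (-len(padded) % 8)      # zero bits up to the byte boundary
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


def rbsp_to_sodb(rbsp: bytes) -> str:
    """Drop the trailing bits again: cut at the last '1' bit."""
    bits = "".join(f"{byte:08b}" for byte in rbsp)
    return bits[:bits.rindex("1")]
```

Because the stop bit is always the last "1" in the RBSP, the decoder can recover the exact SODB length without any extra length field.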

H.264 is divided into two functional layers: the video coding layer (VCL) and the network abstraction layer (NAL).

        The VCL data is the video data sequence after being compressed and encoded. After the VCL data is encapsulated into NAL units, it can be used for transmission or storage. The NAL unit format is as follows:

   

NAL header | EBSP | NAL header | EBSP | NAL header | EBSP


NAL unit
    Each NAL unit is a variable-length byte string of certain syntax elements, including header information containing one byte (used to indicate the data type), and several integer bytes of payload data. A NAL unit can carry a coded slice, A/B/C type data partition, or a sequence or picture parameter set.

  NAL units are delivered in order of RTP sequence number. In the one-byte header, T is the payload data type, occupying 5 bits; R is the importance indicator, occupying 2 bits; F is the forbidden bit, occupying 1 bit. Details as follows:

  (1) NALU type bit

  It can represent 32 different NALU types. Types 1 to 12 are defined by H.264, and types 24 to 31 are left for use outside H.264. The RTP payload specification uses some of these values to define packet aggregation and fragmentation. The other values are reserved by H.264.

  (2) Importance indicator bit

  Marks the importance of a NAL unit for reconstruction; the higher the value, the more important the unit. A value of 0 indicates that the NAL unit is not used for prediction and can therefore be discarded by the decoder without error propagation. A value higher than 0 indicates that the NAL unit is needed for drift-free reconstruction, and the higher the value, the greater the impact of losing it.

  (3) Forbidden bit

  The encoder sets this bit to 0 by default. When the network recognizes a bit error in the unit, it can set the bit to 1 so that the receiver can discard the unit. This is mainly useful for adapting to different kinds of network (such as mixed wired and wireless environments). Consider a gateway from wireless to wired, where one side is a wireless non-IP environment and the other side is a bit-error-free wired network. If a NAL unit arrives on the wireless side and fails the checksum, the gateway may either remove the NAL unit from the NAL stream or forward it to the receiver marked as corrupted. In the second case a smart decoder will try to reconstruct the NAL unit (knowing that it contains bit errors), whereas a simple decoder will just discard it.

  The NAL unit structure specifies a general format for both packet-oriented and stream-oriented transport subsystems. In stream-oriented systems such as H.320 and MPEG-2, NAL unit boundaries must be found inside the stream, so each NAL unit is preceded by a 3-byte start code prefix. In a packet transmission system, NAL unit boundaries are given by the packet framing of the transport, so the start code prefix is not required.

  A group of NAL units is called an access unit: a delimiter, followed by supplemental information (SEI, such as timing information), followed by the primary coded picture. The primary coded picture (PCP) consists of a set of coded NAL units and may be followed by a redundant coded picture (RCP), which is a redundant representation of the same video picture and is used to recover information if the PCP is lost during decoding. If the coded picture is the last picture of a coded video sequence, an end-of-sequence NAL unit shall appear to indicate the end of the sequence. A picture sequence has only one sequence parameter set and can be decoded independently. If the coded picture is the last picture of the entire NAL unit stream, an end-of-stream NAL unit shall appear.

  H.264 adopts the above strict access unit, which not only makes H.264 adaptable to various networks, but also further improves its anti-error capability. The setting of the serial number can find out which VCL unit is lost. The redundant coded picture makes it possible to obtain a relatively "rough" picture even if the basic coded picture is lost.


H.264 Video Transmission System Realizing RTP Protocol

1. Introduction
       With the development of the information industry, demand for information resources has gradually shifted from text and pictures to audio and video, with more and more emphasis on real-time, interactive access. Users then face an unavoidable problem: before they can watch vivid, clear media on the Internet, they have to spend a long time waiting for files to download. Streaming media technology emerged to resolve this contradiction. Thanks to its short startup delay and low client storage requirements, streaming has gradually become the preferred choice, and streaming applications continue to grow around the world. The Real-time Transport Protocol (RTP) specifies a standard packet format for transmitting audio and video over the Internet. It is used together with the RTP Control Protocol (RTCP) and has become one of the most commonly used protocols in streaming media technology.
        H.264/AVC is a new-generation video coding standard jointly formulated by the Joint Video Team (JVT), which is a joint effort of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). It has a high data compression ratio. Under the same image quality, the compression ratio of H.264 is more than 2 times that of MPEG-2 and 1.5 to 2 times that of MPEG-4. At the same time, the layered design of video coding layer (VCL) and network abstraction layer (NAL) is very suitable for real-time transmission of streaming media technology. This article is based on the RTP protocol to stream and package H.264 video to achieve a basic streaming media server function. At the same time, the open source player VLC is used as the receiver to form a complete H.264 video transmission system.

2. Setting of key parameters of RTP protocol

         RTP was proposed by the IETF in 1996 for real-time data transmission. It actually consists of two parts: the Real-time Transport Protocol (RTP) itself and the RTP Control Protocol (RTCP). RTP provides real-time delivery of continuous media data over multicast or unicast networks. RTCP is the control part of RTP; it monitors the quality of data transmission in real time and gives the system the feedback it needs for congestion control and flow control. RTP is described in detail in RFC3550. Each RTP packet consists of a fixed header and a payload. The meaning of the first 12 bytes of the header is fixed, and the payload can be audio or video data. The format of the RTP fixed header is shown in Figure 1:


       The key parameter settings are explained as follows:
       (1) Marker bit (M): 1 bit. The meaning of this bit is defined by the specific media application framework (profile); its purpose is to mark important events in the RTP stream.
       (2) Payload type (PT): 7 bits, indicating the specific format of the RTP payload. RFC3551 assigns default values to the payload types of commonly used audio and video formats. For example, type 0 indicates that the RTP packet carries G.711 mu-law (PCMU) audio sampled at 8000 Hz, mono.
       (3) Sequence number: 16 bits. The sequence number increases by 1 for each RTP packet sent. The receiver can use it to detect packet loss and restore packet order.
       (4) Timestamp: 32 bits. The timestamp reflects the sampling instant of the first byte of the RTP payload, expressed as an offset from the initial timestamp value. The sender must derive the sampling instant from a clock that increases monotonically and linearly.
       The RTP packet format thus carries the payload type, sequence number, timestamp, and whether additional data is present, all of which provide the basis for real-time streaming media transmission. RTCP supplies the feedback used for congestion control and flow control of the RTP transmission. For its packet structure and the meaning of each field, refer to RFC3550; they are not repeated here.


 

3. H.264 elementary stream structure and its transmission mechanism

3.1 Structure of the H.264 elementary stream

The elementary stream (ES) of H.264 is structured in two layers: the video coding layer (VCL) and the network abstraction layer (NAL). The video coding layer is responsible for efficient representation of the video content, while the network abstraction layer is responsible for packaging and delivering the data in the manner the network requires. Introducing the NAL and separating it from the VCL has two benefits: first, signal processing is separated from network transmission, so VCL and NAL can be implemented on different processing platforms; second, a gateway between different network environments does not need to reconstruct and re-encode the VCL bit stream.
       The H.264 elementary stream consists of a series of NALUs (Network Abstraction Layer Units), and the amount of data varies from one NALU to the next. The H.264 draft pointed out [2] that when the data stream is stored on a medium, the start code 0x000001 is added before each NALU to mark where a NALU starts and ends. Under this mechanism, the decoder treats a start code detected in the stream as the start of a NALU; when the next start code is detected, the current NALU ends. Each NALU consists of a one-byte NALU header and several bytes of payload data (RBSP). The format of the NALU header is shown in Figure 2:

                                                                                                                    
        F: forbidden_zero_bit, 1 bit. It is 1 if there is a syntax violation. When the network recognizes that the unit has a bit error, it can set the bit to 1 so that the receiver drops the unit.
        NRI: nal_ref_idc, 2 bits, indicating the importance level of this NALU. The larger the value, the more important the current NALU is. A value of 0 means the NALU is not used for reference; the H.264 specification does not assign separate meanings to the individual values greater than 0.

 Type: 5 bits, indicating the type of NALU. The details are shown in Table 1:

                                                                                     
      It should be noted that the NALUs with Type values of 7 and 8 are the sequence parameter set (SPS) and the picture parameter set (PPS), respectively. A parameter set is a set of infrequently changing data that provides decoding information for a large number of VCL NALUs. The sequence parameter set applies to a series of consecutive coded pictures, while the picture parameter set applies to one or more individual pictures in the coded video sequence. If the decoder does not receive these two parameter sets correctly, the other NALUs cannot be decoded either. They are therefore generally sent before the other NALUs, may be transmitted over a different channel or a more reliable transport protocol (such as TCP), and may also be sent repeatedly.
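
As an illustration of the header layout described above, the following Python sketch splits the one-byte NALU header into its F, NRI, and Type fields. The names `NalHeader` and `parse_nal_header` are hypothetical and chosen only for this example; the bit positions follow the 1 + 2 + 5 bit layout given in the text.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # F, 1 bit
    nal_ref_idc: int         # NRI, 2 bits
    nal_unit_type: int       # Type, 5 bits


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the first byte of a NALU into F (bit 7), NRI (bits 6-5), Type (bits 4-0)."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("first_byte must fit in one byte")
    return NalHeader(first_byte >> 7, (first_byte >> 5) & 0x03, first_byte & 0x1F)


def is_parameter_set(first_byte: int) -> bool:
    """True for SPS (Type 7) and PPS (Type 8)."""
    return parse_nal_header(first_byte).nal_unit_type in (7, 8)
```

For example, a NALU whose first byte is 0x67 parses to F = 0, NRI = 3, Type = 7, which is a sequence parameter set.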

3.2 Transmission Mechanism for H.264 Video
       The structure of the RTP protocol and of the H.264 elementary stream was discussed above, so how is RTP used to transmit H.264 video? An effective way is to extract each NALU from the H.264 video, add the corresponding RTP header before it, and then send out the packet containing the RTP header and the NALU. The RTP header and the NALU are described separately below.
      The format of the complete RTP fixed header was shown in Figure 1 above. The specific setting of each field, following RFC3984[3], is given here.
      V: Version number, 2 bits. The RTP version currently in use is 2, so this field is set to binary 10.
      P: Padding bit, 1 bit. No padding (as needed by some encryption algorithms) is used, so this bit is set to 0.
      X: Extension bit, 1 bit. The fixed header is not followed by a header extension, so this bit is also 0.
      CC: CSRC count, 4 bits. Indicates the number of CSRC identifiers following the RTP fixed header. The basic streaming media server implemented in this article does not use a mixer, so this field is set to 0x0.
      M: Marker bit, 1 bit. If the current NALU is the last NALU of an access unit, the M bit is set to 1; it is also set to 1 when the current RTP packet is the last fragment of a NALU (the fragmentation of NALUs is described later). The M bit remains 0 in all other cases.
      PT: Payload type, 7 bits. No default PT value is assigned to the H.264 video format, so a dynamic value (96 to 127) is used. Here it is set to 0x60 (96 decimal).
      SQ: Sequence number, 16 bits. The starting value of the sequence number should be random; it is set to 0 here. Each time an RTP packet is sent, the sequence number increases by 1.
      TS: Timestamp, 32 bits. Like the sequence number, the starting value of the timestamp should be random; it is set to 0 here. According to RFC3984, the clock frequency of the timestamp must be 90000 Hz.
      SSRC: Synchronization source identifier, 32 bits. The SSRC shall be randomly generated so that no two synchronization sources in the same RTP session have the same SSRC identifier. There is only one synchronization source here, so it is set to 0x12345678.
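
The field settings listed above can be assembled into the 12-byte fixed header with Python's `struct` module. This is a sketch under the assumptions stated in the text (version 2, no padding, no extension, CC = 0, PT = 96, SSRC = 0x12345678); the function names `build_rtp_header` and `timestamp_step` are hypothetical.

```python
import struct


def build_rtp_header(seq: int, timestamp: int, marker: bool = False,
                     payload_type: int = 96, ssrc: int = 0x12345678) -> bytes:
    """Pack the 12-byte RTP fixed header (no CSRC list, no extension)."""
    version, padding, extension, csrc_count = 2, 0, 0, 0
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | csrc_count
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    # "!" selects network (big-endian) byte order: B B H I I = 1+1+2+4+4 bytes
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)


def timestamp_step(frames_per_second: int, clock_rate: int = 90000) -> int:
    """Timestamp increment between consecutive frames on the 90000 Hz clock."""
    return clock_rate // frames_per_second
```

At 30 frames per second the timestamp advances by 3000 per frame, while the sequence number advances by 1 per packet, so all fragments of one frame share a timestamp but have distinct sequence numbers.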
      The size of each NALU varies with the amount of data it contains. In an IP network, when the IP packet to be transmitted exceeds the Maximum Transmission Unit (MTU), IP fragmentation occurs. The MTU in an Ethernet environment is 1500 bytes. If an IP packet is larger than the MTU, it is split for transmission, which produces many packet fragments, increases the packet loss rate, and reduces network throughput. For video transmission, if an RTP packet is larger than the MTU and is split arbitrarily by the underlying protocol, the player at the receiving end may play with delay or fail to play normally. Therefore, NALUs larger than the MTU must be split before they are sent.

RFC3984 gives 3 different RTP packaging schemes:

(1) Single NALU Packet: Only one NALU is encapsulated in an RTP packet. In this paper, this packing scheme is adopted for NALUs smaller than 1400 bytes.
(2) Aggregation Packet: Multiple NALUs are encapsulated in one RTP packet. This packing scheme can be used for smaller NALUs to improve transmission efficiency.
(3) Fragmentation Unit: One NALU is split across multiple RTP packets. In this paper, this scheme is used for NALUs larger than 1400 bytes.
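
The choice between schemes (1) and (3) can be sketched as follows. This is a minimal Python illustration, not the article's actual server code: the function name `packetize_nalu` is hypothetical, the 1400-byte threshold is the one used in this paper, and the fragment layout follows the FU-A format of RFC3984 (a one-byte FU indicator with type 28, then a one-byte FU header carrying the start bit, end bit, and original NALU type). The RTP header itself is not included in the returned payloads.

```python
from typing import List

FU_A_TYPE = 28       # FU-A payload type defined by RFC3984
MAX_PAYLOAD = 1400   # threshold used in this paper


def packetize_nalu(nalu: bytes, max_payload: int = MAX_PAYLOAD) -> List[bytes]:
    """Return the RTP payload(s) for one NALU (start code already removed)."""
    if len(nalu) <= max_payload:
        return [nalu]                              # single NALU packet
    header, body = nalu[0], nalu[1:]
    fu_indicator = (header & 0xE0) | FU_A_TYPE     # keep F and NRI, set type 28
    nal_type = header & 0x1F
    chunk = max_payload - 2                        # room for the 2 FU bytes
    payloads = []
    for offset in range(0, len(body), chunk):
        start_bit = 0x80 if offset == 0 else 0
        end_bit = 0x40 if offset + chunk >= len(body) else 0
        fu_header = start_bit | end_bit | nal_type
        payloads.append(bytes([fu_indicator, fu_header]) + body[offset:offset + chunk])
    return payloads
```

The original NALU header byte is not transmitted in the fragments; the receiver rebuilds it from the F and NRI bits of the FU indicator and the type bits of the FU header.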

4. Implementation of H.264 Streaming Media Transmission System

      A complete streaming media transmission system consists of two parts, the server side and the client side [5][6]. The main task of the server is to read the H.264 video, separate each NALU from the code stream, analyze the type of the NALU, set the corresponding RTP header, encapsulate the RTP packet, and send it. The main task of the client is to receive the RTP packets, extract the NALUs from them, and pass them to the decoder for decoding and playback. The framework of the streaming media transmission system is shown in Figure 3.

                                                                                                

5 Conclusion

The server side of the streaming media transmission system designed in this paper runs on Windows XP system, and uses VLC player as the client to receive H.264 video RTP data packets. After testing, the client can play smoothly after 2 seconds of buffering. When the transmission speed is set to 30 frames per second, there is no phenomenon such as packet loss and smear. There is no noticeable difference in the video.

AnyChat uses the H.264 (MPEG-4 Part 10 AVC/H.264) video coding standard. H.264/AVC performs especially well in compression efficiency, generally reaching about 2 times the compression efficiency of MPEG-2 and of the MPEG-4 Simple Profile. H.264 has many new features that differ from the older standards and that together produce the gain in coding efficiency. Intra prediction and coding, inter prediction and coding, variable block sizes, quarter-pixel motion estimation, multi-reference-frame prediction, the adaptive in-loop deblocking filter, the integer transform, quantization and transform coefficient scanning, entropy coding, and weighted prediction each have their own particular design considerations.


Miscellaneous notes on H.264 video file frame format, transmission, and encapsulation

RFC 3984: RTP Payload Format for H.264 Video (Standards Track, February 2005)

1. Implementing H.264 video streaming according to the RFC3984 protocol; each NALU packet starts with 0x 00 00 00 01

H.264 NAL format and analyzer:
http://hi.baidu.com/zsw%5Fdavy/b ... c409cc7cd92ace.html
http://hi.baidu.com/zsw_davy/blo ... 081312c8fc7acc.html

---------------------------------- Bitstream information ----------------------------------









① NALU (Network Abstraction Layer Unit): In both standards (H.264 and AVS-M) the bitstream is organized in NAL units. Each NAL unit contains an RBSP, and the NALU header defines the type of the RBSP. Types include the sequence parameter set (SPS), picture parameter set (PPS), supplemental enhancement information (SEI), slice, and so on. SPS and PPS are parameter sets: both standards use the parameter set mechanism to separate the main sequence and picture parameters (decoded image size, number of slice groups, number of reference frames, quantization and filter parameter flags, etc.) from the other parameters so that the decoder decodes them first. In addition, AVS-M adds picture header information. When NALUs are read, each NALU is preceded by the start code 0x000001. To prevent a 0x000001 sequence inside the NALU from being mistaken for a start code, the H.264 encoder inserts a new byte, 0x03, before the last byte of the sequence, and the decoder must delete the 0x03 when it detects it; AVS-M only needs to recognize the start code 0x000001.


② Read the macroblock type (mb type) and the macroblock coded block pattern (cbp): The picture being encoded or decoded is divided into macroblocks. A macroblock consists of one 16*16 luminance block and the corresponding 8*8 Cb and 8*8 Cr chrominance blocks.


(a) The division of macroblocks is different in the intra-frame and inter-frame prediction of the two standards. In H.264, the I_slice luminance block has Intra_4*4 and Intra_16*16 modes, and the chroma block has only 8*8 mode; P_slice macroblocks are divided into 16*16, 16*8, 8*16, 8*8, 8*4, 4*8, 4*4 a total of 7 modes. In AVS-M, the I_slice luminance block has two modes: I_4*4 and I_Direct, and the division of macroblocks in P_slice is consistent with the division in H.264.


(b) The two standards also compute the macroblock cbp value differently. In H.264, the luminance and chrominance cbp of an Intra_16*16 macroblock are obtained directly from the mb type; for other macroblocks, luminance cbp = coded_block_pattern % 16 and chrominance cbp = coded_block_pattern / 16. Only the lowest 4 bits of the luminance cbp are valid, and each bit indicates whether the residual coefficients of the corresponding 8*8 luminance block are all 0. When the chrominance cbp is 0, all chrominance residual coefficients are 0; when it is 1, the DC residual coefficients are not all 0 and the AC coefficients are 0; when it is 2, the AC residual coefficients are not all 0 (and the DC coefficients may also be non-zero). In AVS-M, when the macroblock type is not P_skip, the index value of the cbp is read directly from the code stream, the index is used to look up the codenum value in a table, and the codenum is then used to look up the intra or inter cbp. This cbp is 6 bits, and each bit indicates whether the corresponding 8*8 block of the macroblock contains non-zero coefficients. When the transform coefficients are not 0, the bits of cbp_4*4 must also be read to determine whether each of the four 4*4 blocks inside an 8*8 block has all-zero coefficients.
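
The H.264 split of coded_block_pattern into its luminance and chrominance parts is simple arithmetic and can be sketched in Python. The function name `split_coded_block_pattern` is hypothetical, and the range check assumes 4:2:0 video, where coded_block_pattern lies between 0 and 47.

```python
from typing import List, Tuple


def split_coded_block_pattern(cbp: int) -> Tuple[int, int, List[bool]]:
    """Return (luma cbp, chroma cbp, per-8x8-block luma flags)."""
    if not 0 <= cbp <= 47:
        raise ValueError("coded_block_pattern must be in 0..47 (4:2:0 assumed)")
    luma = cbp % 16      # lowest 4 bits, one bit per 8*8 luminance block
    chroma = cbp // 16   # 0: no chroma residual, 1: DC only, 2: AC present
    luma_block_coded = [bool((luma >> i) & 1) for i in range(4)]
    return luma, chroma, luma_block_coded
```

For example, coded_block_pattern = 21 gives luminance cbp 5 (8*8 blocks 0 and 2 carry residual data) and chrominance cbp 1 (DC coefficients only).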
-------------------------------------------------- --------------------------------------------
In general there are two ways to package an H.264 stream. One is the Annex B byte stream format, the default output format of most encoders, in which the first 3 or 4 bytes of each frame are the H.264 start_code, 0x000001 or 0x00000001.
The other is the raw NAL packaging format, in which the first few bytes (1, 2, or 4 bytes) give the length of the NAL instead of a start_code. In this case the stream can be decoded only with the help of global data that carries the encoder's profile, level, PPS, SPS, and other information.
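
Converting the length-prefixed format to Annex B amounts to replacing each length field with a start code. The sketch below is an assumption-laden illustration: `length_prefixed_to_annexb` is a hypothetical name, the size of the length field (1, 2, or 4 bytes) must be known from the container's global data, and the SPS/PPS stored in that global data are not handled here.

```python
def length_prefixed_to_annexb(data: bytes, length_size: int = 4) -> bytes:
    """Rewrite [length][NAL][length][NAL]... as start-code-delimited NAL units."""
    if length_size not in (1, 2, 4):
        raise ValueError("length_size must be 1, 2, or 4")
    out = bytearray()
    pos = 0
    while pos < len(data):
        if pos + length_size > len(data):
            raise ValueError("truncated length field")
        nal_len = int.from_bytes(data[pos:pos + length_size], "big")
        pos += length_size
        if pos + nal_len > len(data):
            raise ValueError("truncated NAL unit")
        out += b"\x00\x00\x00\x01" + data[pos:pos + nal_len]
        pos += nal_len
    return bytes(out)
```

The NAL payload bytes are copied unchanged, since emulation prevention bytes are already part of the NAL unit in both formats.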
----------------------------------------------------------------------------
AVC vs. H.264
AVC and H.264 are synonymous. The standard is known by the full names "ISO/IEC 14496-10" and "ITU-T Recommendation H.264". In addition, a number of alternate names are used (or have been) in reference to this standard. These include:
 

  • MPEG-4 part 10
  • MPEG-4 AVC
  • AVC
  • MPEG-4 (in the broadcasting world MPEG4 part 2 is ignored)
  • H.264
  • JVT (Joint Video Team, nowadays rarely used referring to actual spec)
  • H.26L (early drafts went by this name)


All of the above (and those I've missed) include the Annex B byte-stream format. Unlike earlier MPEG1/2/4 and H.26x codecs, the H.264 specification proper does not define a full bit-stream syntax. It describes a number of NAL (Network Abstraction Layer) units, a sequence of which can be decoded into video frames. These NAL units have no boundary markers, and rely on some unspecified format to provide framing.

Annex B of the document specifies one such format, which wraps NAL units in a format resembling a traditional MPEG video elementary stream, thus making it suitable for use with containers like MPEG PS/TS that are unable to provide the required framing. Other formats, such as ISO base media based formats, are able to properly separate the NAL units and do not need the Annex B wrapping.

The H.264 spec suffers from a deficiency. It defines several header-type NAL units (SPS and PPS) without specifying how to pack them into the single codec data field available in most containers. Fortunately, most containers seem to have adopted the packing used by the ISO format known as MP4.

1. H.264 start code
   When H.264 data is transmitted over the network, each UDP packet carries one NALU, so the decoder can easily find the NAL boundaries and decode. But if the encoded data is stored as a file, the decoder cannot tell where each NAL starts and ends in the data stream, so H.264 uses the start code to solve this problem.

   During H.264 encoding, the start code 0x000001 is added before each NAL. When the decoder detects a start code in the code stream, the current NAL ends. To prevent 0x000001 from appearing inside a NAL, H.264 defines the emulation prevention (anti-contention) mechanism: after a NAL is encoded, whenever two consecutive 0x00 bytes are followed by a byte from 0x00 to 0x03, a 0x03 is inserted after the two zero bytes. When the decoder detects 0x000003 inside a NAL, it discards the 0x03 and restores the original data.
0x000000 >>>>>> 0x00000300
0x000001 >>>>>> 0x00000301
0x000002 >>>>>> 0x00000302
0x000003 >>>>>> 0x00000303
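
The four substitutions above can be sketched in Python as a pair of functions. The names `add_emulation_prevention` and `remove_emulation_prevention` are hypothetical; the sketch handles only the substitution rule described in the text.

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Encoder side: insert 0x03 after two zero bytes followed by 0x00..0x03."""
    out = bytearray()
    zeros = 0                       # consecutive 0x00 bytes written so far
    for byte in rbsp:
        if zeros >= 2 and byte <= 0x03:
            out.append(0x03)        # break up the forbidden pattern
            zeros = 0
        out.append(byte)
        zeros = zeros + 1 if byte == 0x00 else 0
    return bytes(out)


def remove_emulation_prevention(ebsp: bytes) -> bytes:
    """Decoder side: drop each 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for byte in ebsp:
        if zeros >= 2 and byte == 0x03:
            zeros = 0               # discard the emulation prevention byte
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0x00 else 0
    return bytes(out)
```

After the inserted 0x03 the zero counter restarts, so a long run of zeros receives one 0x03 after every second zero byte.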

The algorithm for detecting the start code when extracting a NALU during H.264 decoding is as follows:

for(;;)
{
    if next 24 bits are 0x000001
    {
        startCodeFound = true
        break;
    }
    else
    {
        // not a start code yet: skip one byte and try again
        flush 8 bits
    }
}// for(;;)
if(true == startCodeFound)
{
    // startcode found
    // Flush the start code found
    flush 24 bits
    // Now navigate up to the next start code and put the in-between
    // bytes in the nal structure.
    for(;;)
    {
        if(next 24 bits == 0x000001)
        {
            // next start code: the current NAL unit ends here
            break;
        }
        else if(next 24 bits == 0x000000)
        {
            // zero padding before the next start code also ends the NAL unit
            break;
        }
        else
        {
            // copy the byte into the buffer
            copy one byte to the Nal unit
        }
    }//for(;;)
}

2. MPEG4 start code
   MPEG4 is organized around the VOP and has no concept of a NALU, but start codes are still used to delimit each frame. The start code prefix of MPEG4 is 0x000001. Several specific MPEG4 start codes are also very useful: visual_object_sequence_start_code 0x000001B0 indicates the start of a visual object sequence, and vop_start_code 0x000001B6 indicates the start of a VOP. The two bits after 0x000001B6 are 00 for an I frame, 01 for a P frame, and 10 for a B frame.
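
Reading the frame type from the two bits that follow the VOP start code can be sketched in Python. The function name `vop_coding_types` is hypothetical, and the sketch simply searches for the 4-byte pattern without parsing the rest of the MPEG4 syntax. The value 11, which the text above does not list, denotes a sprite (S) VOP.

```python
from typing import List

VOP_START_CODE = b"\x00\x00\x01\xb6"
VOP_TYPES = {0b00: "I", 0b01: "P", 0b10: "B", 0b11: "S"}


def vop_coding_types(stream: bytes) -> List[str]:
    """Return the coding type of every VOP found in an MPEG4 elementary stream."""
    types = []
    pos = stream.find(VOP_START_CODE)
    while pos != -1 and pos + 4 < len(stream):
        first_payload_byte = stream[pos + 4]
        types.append(VOP_TYPES[first_payload_byte >> 6])   # top two bits
        pos = stream.find(VOP_START_CODE, pos + 4)
    return types
```

A simple use is counting key frames: the number of "I" entries in the returned list.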











1. Introduction

The main objectives of H.264:

1. High video compression ratio

2. Good network affinity

Solution:

VCL video coding layer

NAL network abstraction layer

VCL: Definition of the core algorithm engine and of the block, macroblock, and slice syntax levels

NAL: Syntax levels above the slice level (such as the sequence parameter set and picture parameter set), with support for the following functions: independent slice decoding, unique start code guarantee, SEI, and transmission of coded data in stream format

VCL design goal: Efficient encoding and decoding, as independent of the network as possible

NAL design goal: Pack data into the corresponding formats for different networks, and adapt the bit strings generated by the VCL to various networks and diverse environments.

NALU header structure: NALU type (5bit), importance indication bit (2bit), prohibition bit (1bit).

NALU type: 1 to 12 are used by H.264, 24 to 31 are used by applications other than H.264.

Importance Indication: Indicates the importance of this NAL unit when it is used for reconstruction. The larger the value, the more important it is.

Forbidden bit: When the network finds that the NAL unit has a bit error, it can set this bit to 1, so that the receiver can discard the unit.

2. NAL syntax semantics

NAL layer syntax:

In the code stream output by the encoder, the basic unit of data is the syntax element.

Syntax characterizes the organizational structure of syntactic elements.

Semantics describes the specific meaning of syntactic elements.

In a packet network, each packet has a header, so the decoder can easily detect the NAL boundaries and take out the NALs for decoding in turn.

However, to save bits, H.264 does not define a separate syntax element in the NAL header to indicate the starting position.

If the encoded data is stored on a medium, the NALs follow one another directly, and the decoder cannot tell where each NAL starts and ends in the data stream.

Solution: Add a start code before each NAL: 0X000001

On some types of media, for the convenience of addressing, the data stream is required to be aligned in length, or an integer multiple of a certain constant. So add a few bytes of 0 to pad before the start code.
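
Splitting a stored stream at its start codes, including any zero padding in front of them, can be sketched in Python. The function name `split_annexb` is hypothetical. The sketch relies on the fact that a NAL unit does not end in a 0x00 byte (the RBSP trailing bits make the last byte non-zero), so trailing zeros before the next start code are treated as padding.

```python
from typing import List

START_CODE = b"\x00\x00\x01"


def split_annexb(stream: bytes) -> List[bytes]:
    """Return the NAL units of a start-code-delimited stream, start codes removed."""
    nalus = []
    start = stream.find(START_CODE)
    while start != -1:
        payload_start = start + len(START_CODE)
        next_start = stream.find(START_CODE, payload_start)
        end = len(stream) if next_start == -1 else next_start
        # Strip padding zeros, which also covers the extra 0x00 of a 4-byte start code.
        nalu = stream[payload_start:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        start = next_start
    return nalus
```

Because emulation prevention guarantees that 0x000001 never occurs inside a NAL unit, a plain byte search is enough to find the boundaries.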

Detecting the start of NAL:

0X000001 and 0X000000

We must consider the case where 0X000001 or 0X000000 appears inside a NAL.

Solution:

H.264 proposes an "anti-contention" (emulation prevention) mechanism:

0X000000 --> 0X00000300

0X000001 --> 0X00000301

0X000002 --> 0X00000302

0X000003 --> 0X00000303

Therefore, inside a NAL unit the following three-byte sequences must not appear at any byte-aligned position:

0X000000

0X000001

0X000002

Forbidden_zero_bit = 0.

Nal_ref_idc: Indicates the priority of the NAL, from 0 to 3. The larger the value, the more important the current NAL is and the more it needs to be protected. If the current NAL is a slice belonging to a reference frame, a sequence parameter set, or a picture parameter set, this syntax element must be greater than 0.

Nal_unit_type: The type of the current NAL unit.



















3. NAL layer processing of H.264

Structure diagram:

NAL uses NALU (NAL unit) as a unit to support the transmission of encoded data in a network based on packet switching technology.

It defines the data format that conforms to the requirements of the transport layer or storage medium, and at the same time gives header information, thus providing an interface between video coding and the outside world.

NALU: defines the basic format available for packet-based and bitstream-based systems

RTP encapsulation: only for native NAL interfaces based on NAL units.

Three different data forms:

SODB (String Of Data Bits) --> the rawest form of the coded data

RBSP (Raw Byte Sequence Payload) --> the SODB followed by the RBSP trailing bits: one "1" bit, then several "0" bits for byte alignment

EBSP (Encapsulated Byte Sequence Payload) --> the RBSP with emulation prevention bytes (0x03) added. The reason: when NALUs are written in the Annex B format, a start code (StartCodePrefix) must be added before each NALU. If the slice carried by the NALU is the beginning of a frame, the start code is the 4-byte value 0x00000001; otherwise it is the 3-byte value 0x000001. When encoding, every time two consecutive 0x00 bytes are followed by a byte from 0x00 to 0x03, a 0x03 byte is inserted. The 0x03 is removed when decoding, which is also known as the shelling operation.

Process:

1. The SODB output by the VCL layer is encapsulated into nal_unit. Nal_unit is a general encapsulation format, which can be applied to the ordered byte stream mode and the IP packet exchange mode.

2. For different transport networks (circuit switching|packet switching), encapsulate nal_unit into the encapsulation format for different networks.



The specific process of the first step:

The bit stream SODB (String Of Data Bits) output by the VCL layer goes through the following three steps to become a nal_unit:

1. The SODB is byte-aligned and packaged into an RBSP (Raw Byte Sequence Payload).

2. To prevent the RBSP byte stream from contending with the SCP (start_code_prefix_one_3bytes, 0x000001) in ordered byte stream transmission, the RBSP is scanned three bytes at a time, and an emulation_prevention_three_byte (0x03) is inserted before the third byte whenever two 0x00 bytes are followed by a byte from 0x00 to 0x03. The nal_unit syntax shows how the bytes are read back and each 0x03 is skipped:

nal_unit( NumBytesInNALunit ) {
    forbidden_zero_bit
    nal_ref_idc
    nal_unit_type
    NumBytesInRBSP = 0
    for( i = 1; i < NumBytesInNALunit; i++ ) {
        if( i + 2 < NumBytesInNALunit && next_bits( 24 ) == 0x000003 ) {
            rbsp_byte[ NumBytesInRBSP++ ]
            rbsp_byte[ NumBytesInRBSP++ ]
            i += 2
            emulation_prevention_three_byte /* equal to 0x03 */
        } else
            rbsp_byte[ NumBytesInRBSP++ ]
    }
}
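The emulation prevention step can be sketched in both directions as follows. This is an illustration under stated assumptions, not the reference implementation: the function names are hypothetical, and the sketch ignores the corner case where an RBSP ends in a zero byte.

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Encoder side: insert 0x03 after 0x00 0x00 when the next byte is <= 0x03."""
    out = bytearray()
    zeros = 0                        # consecutive zero bytes written so far
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)         # emulation_prevention_three_byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(ebsp: bytes) -> bytes:
    """Decoder side: drop every 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in ebsp:
        if zeros >= 2 and b == 0x03:
            zeros = 0                # skip the emulation prevention byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


if __name__ == "__main__":
    raw = bytes([0x00, 0x00, 0x01, 0x25, 0x00, 0x00, 0x00])
    protected = add_emulation_prevention(raw)
    print(protected.hex(), remove_emulation_prevention(protected) == raw)
```

After this step the byte sequence 0x000001 cannot occur inside the payload, so a receiver can treat every occurrence in the byte stream as a real start code.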

3. After emulation prevention, a one-byte header (forbidden_zero_bit + nal_ref_idc + nal_unit_type) is added in front of the RBSP to encapsulate it into a nal_unit.
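The one-byte header packs the three fields as 1 + 2 + 5 bits. A minimal sketch of building and parsing it, with hypothetical helper names `pack_nal_header` and `parse_nal_header`:

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int   # 1 bit, must be 0 in a valid stream
    nal_ref_idc: int          # 2 bits, importance of the NAL unit
    nal_unit_type: int        # 5 bits, type of the payload


def pack_nal_header(nal_ref_idc: int, nal_unit_type: int) -> int:
    """Build the header byte with forbidden_zero_bit set to 0."""
    if not 0 <= nal_ref_idc <= 3:
        raise ValueError("nal_ref_idc must fit in 2 bits")
    if not 0 <= nal_unit_type <= 31:
        raise ValueError("nal_unit_type must fit in 5 bits")
    return (nal_ref_idc << 5) | nal_unit_type


def parse_nal_header(byte: int) -> NalHeader:
    """Split a header byte into its three fields."""
    return NalHeader((byte >> 7) & 0x01, (byte >> 5) & 0x03, byte & 0x1F)


if __name__ == "__main__":
    print(hex(pack_nal_header(3, 7)), parse_nal_header(0x65))
```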

The specific process of the second step:



case1: Encapsulation of an ordered byte stream



byte_stream_nal_unit( NumBytesInNALunit ) {
    while( next_bits( 24 ) != 0x000001 )
        zero_byte /* equal to 0x00 */
    if( more_data_in_byte_stream( ) ) {
        start_code_prefix_one_3bytes /* equal to 0x000001 */
        nal_unit( NumBytesInNALunit )
    }
}

In transmission systems such as H.320 and MPEG-2/H.222.0, NAL units are transmitted as an ordered, continuous byte or bit stream, and the receiver relies on the data itself to identify NAL unit boundaries. For such systems, the H.264/AVC specification defines a byte stream format in which each NAL unit is preceded by a 3-byte start code prefix that serves as a synchronization marker. In bit stream applications, one additional byte is added per picture so that the boundary can be located. As an optional feature, further data can be added to the byte stream to expand the amount of transmitted data, which allows fast boundary location and recovery of synchronization.

case2: RTP packetization for IP networks

Rules for packetization:

(1) Low overhead, so that MTU sizes from 100 bytes to 64 kilobytes are feasible;

(2) The importance of a packet can be judged without decoding the data in it;

(3) The payload specification should make it possible to identify, without decoding, packets that have become undecodable because other data was lost;

(4) Support splitting one NALU across multiple RTP packets;

(5) Support placing multiple NALUs in one RTP packet.

The NALU header can also serve as the RTP payload header, which makes it possible to implement the packetization rules above.

In the simplest case, one NALU is placed in one RTP packet: the NALU (including the NALU header, which is also used as the payload header) goes into the RTP payload, and the RTP header fields are set. To keep the IP layer from fragmenting large packets again, packets are generally kept smaller than the MTU. Because packets may travel along different paths, the receiver has to put them back in order. The sequence number carried in the RTP header is used to solve this problem.
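A single NALU packet can be sketched as a 12-byte RTP fixed header followed by the NALU. This is a rough illustration: the function name `build_rtp_packet` is hypothetical, and payload type 96 is only an assumed dynamic value that a real session would negotiate.

```python
import struct


def build_rtp_packet(nalu: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 96, marker: bool = False) -> bytes:
    """Prefix one NALU with an RTP fixed header (no CSRC list, no extension)."""
    byte0 = 2 << 6                                   # version 2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,               # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,     # 32-bit timestamp
                         ssrc & 0xFFFFFFFF)          # 32-bit source identifier
    return header + nalu                             # NALU header doubles as payload header


def rtp_sequence_number(packet: bytes) -> int:
    """Read the sequence number a receiver would use to reorder packets."""
    return struct.unpack_from("!H", packet, 2)[0]


if __name__ == "__main__":
    pkt = build_rtp_packet(b"\x65\x88\x84", seq=7, timestamp=90000, ssrc=0x1234)
    print(len(pkt), rtp_sequence_number(pkt))
```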

NALU segmentation

For pre-encoded content, a NALU may be larger than the MTU. IP-layer fragmentation can break the data into blocks smaller than 64 kilobytes, but it offers no protection at the application layer, which reduces the effect of unequal error protection schemes. Since a UDP datagram is limited to 64 kilobytes, which is too small for a slice in some applications, application-layer fragmentation is part of the RTP packetization scheme.

The fragmentation scheme under discussion in the IETF should have the following characteristics:

(1) The fragments of a NALU are transmitted in ascending order of RTP sequence number;

(2) The first and last fragments of a NALU can be marked;

(3) Lost fragments can be detected.
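One scheme with these properties is the FU-A fragmentation unit of the H.264 RTP payload format, where each fragment carries an indicator byte (type 28) and a header byte with start and end flags. The sketch below assumes that layout; the function names are hypothetical.

```python
def fragment_fu_a(nalu: bytes, max_payload: int) -> list:
    """Split one NALU into RTP payloads of at most max_payload bytes."""
    if len(nalu) <= max_payload:
        return [nalu]                        # small enough to send unfragmented
    chunk = max_payload - 2                  # 2 bytes go to FU indicator + FU header
    if chunk < 1:
        raise ValueError("max_payload too small for FU-A")
    fu_indicator = (nalu[0] & 0xE0) | 28     # keep F and NRI bits, type 28 = FU-A
    nal_type = nalu[0] & 0x1F
    body = nalu[1:]                          # the original header byte is not repeated
    pieces = [body[i:i + chunk] for i in range(0, len(body), chunk)]
    payloads = []
    for n, piece in enumerate(pieces):
        start = 0x80 if n == 0 else 0                  # S bit marks the first fragment
        end = 0x40 if n == len(pieces) - 1 else 0      # E bit marks the last fragment
        payloads.append(bytes([fu_indicator, start | end | nal_type]) + piece)
    return payloads


def reassemble_fu_a(payloads: list) -> bytes:
    """Rebuild the NALU from a complete, ordered list of FU-A payloads."""
    first = payloads[0]
    header = (first[0] & 0xE0) | (first[1] & 0x1F)     # restore the NALU header byte
    return bytes([header]) + b"".join(p[2:] for p in payloads)


if __name__ == "__main__":
    nalu = bytes([0x65]) + bytes(range(10))
    frags = fragment_fu_a(nalu, 6)
    print(len(frags), reassemble_fu_a(frags) == nalu)
```

The fragments are sent in consecutive RTP packets, so a gap in the sequence numbers between the start and end fragments tells the receiver that part of the NALU was lost.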

NALU merging

Some NALUs, such as SEI and parameter sets, are very small, and merging them into one packet reduces header overhead. There are two types of aggregation packet:

(1) Single-Time Aggregation Packet (STAP), which combines NALUs that share the same timestamp;

(2) Multi-Time Aggregation Packet (MTAP), which can also combine NALUs with different timestamps.
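As an illustration of the single-time case, the sketch below builds and splits a STAP-A payload, assuming the layout of the H.264 RTP payload format: one header byte of type 24, then each NALU preceded by its 16-bit size. The function names are hypothetical, and MTAP is not covered.

```python
import struct


def build_stap_a(nalus: list) -> bytes:
    """Aggregate several NALUs that share one timestamp into a STAP-A payload."""
    f_bit = 0
    nri = 0
    for n in nalus:
        f_bit |= n[0] & 0x80             # F is set if any aggregated NALU has it set
        nri = max(nri, n[0] & 0x60)      # NRI is the highest value among the NALUs
    out = bytearray([f_bit | nri | 24])  # type 24 = STAP-A
    for n in nalus:
        out += struct.pack("!H", len(n)) + n   # 16-bit size, then the NALU itself
    return bytes(out)


def split_stap_a(payload: bytes) -> list:
    """Recover the individual NALUs from a STAP-A payload."""
    if payload[0] & 0x1F != 24:
        raise ValueError("not a STAP-A payload")
    nalus = []
    pos = 1
    while pos + 2 <= len(payload):
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        nalus.append(payload[pos:pos + size])
        pos += size
    return nalus


if __name__ == "__main__":
    sps = bytes([0x67, 0x42, 0x00, 0x1E])
    pps = bytes([0x68, 0xCE, 0x38, 0x80])
    print(build_stap_a([sps, pps]).hex())
```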

The NAL standardizes the format of video data, mainly by providing header information, so that it suits transmission and storage over various media. The NAL supports a variety of networks, including:

1. Any real-time wired or wireless Internet service using the RTP/IP protocol

2. MP4 file storage and multimedia messaging services

3. MPEG-2 systems

4. Other networks

The NAL specifies a general format suitable for both packet-oriented and stream-oriented transmission. The NAL unit itself is the same in both cases; the only difference is that for stream transmission a start code prefix is added in front of each NAL unit.

In packet-oriented systems such as Internet/RTP, the packet structure of the transport protocol already identifies the packet boundaries. In this case, no start code prefix is required.
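The difference between the two modes can be sketched as a pair of helpers that add and remove start code prefixes. This is a simplified illustration with hypothetical function names; it relies on the fact that a NAL unit does not end in a zero byte, so trailing zeros belong to the next start code.

```python
START_CODE = b"\x00\x00\x01"


def to_byte_stream(nalus: list) -> bytes:
    """Stream mode: put a 4-byte start code in front of every NAL unit."""
    return b"".join(b"\x00" + START_CODE + n for n in nalus)


def split_byte_stream(stream: bytes) -> list:
    """Recover the NAL units by searching for start code prefixes."""
    nalus = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        nalu = stream[begin:end].rstrip(b"\x00")   # zeros here pad the next start code
        if nalu:
            nalus.append(nalu)
        pos = nxt
    return nalus


if __name__ == "__main__":
    units = [b"\x67\x42", b"\x68\xce", b"\x65\x88"]
    print(split_byte_stream(to_byte_stream(units)) == units)
```

In packet mode the list of NAL units would be handed to the packetizer directly, with no start codes involved.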

NAL units are divided into two types: VCL and non-VCL.

VCL NAL units contain the sampled data of the video pictures, and non-VCL NAL units contain related additional information, such as parameter sets (header information that applies to a large number of VCL NAL units), supplemental enhancement information, timing information, etc.
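The split can be read directly from nal_unit_type: in H.264, types 1 to 5 carry coded slice data and are VCL, while the remaining defined types are non-VCL. A small sketch, with a hypothetical helper name and a partial name table:

```python
VCL_TYPES = range(1, 6)   # nal_unit_type 1..5 carry coded slice data

TYPE_NAMES = {
    1: "non-IDR slice",
    2: "data partition A",
    3: "data partition B",
    4: "data partition C",
    5: "IDR slice",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
    9: "access unit delimiter",
}


def describe_nalu(header_byte: int) -> str:
    """Classify a NAL unit from its header byte."""
    nal_unit_type = header_byte & 0x1F
    kind = "VCL" if nal_unit_type in VCL_TYPES else "non-VCL"
    return f"{kind}: {TYPE_NAMES.get(nal_unit_type, 'other')}"


if __name__ == "__main__":
    for byte in (0x67, 0x68, 0x65, 0x41):
        print(hex(byte), describe_nalu(byte))
```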

Parameter set:

A parameter set holds information that rarely changes and that is used for decoding a large number of VCL NAL units. There are two types:

1. The sequence parameter set applies to a series of consecutive video pictures, that is, a video sequence.

A video sequence runs from one IDR picture up to the next IDR picture, and one sequence parameter set applies to all of it. The difference between IDR and I frame is shown below.

2. The picture parameter set applies to one or more individual pictures within a video sequence.

The sequence and picture parameter set mechanism reduces the transmission of repeated parameters. Each VCL NAL unit contains an identifier pointing to the relevant picture parameter set, and each picture parameter set contains an identifier pointing to the relevant sequence parameter set.

Therefore, only a small amount of pointer information is needed to refer to a large number of parameters, which greatly reduces the information that each VCL NAL unit would otherwise repeat.
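The two-level reference can be sketched as a pair of lookup tables. This is a simplified illustration: the class names and the `width`, `height` and `entropy_coding_mode_flag` fields stand in for the much larger real parameter sets, and `ParameterSetStore` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class SequenceParameterSet:
    seq_parameter_set_id: int
    width: int                      # illustrative stand-in for the real size fields
    height: int


@dataclass
class PictureParameterSet:
    pic_parameter_set_id: int
    seq_parameter_set_id: int       # pointer to the sequence parameter set
    entropy_coding_mode_flag: int


class ParameterSetStore:
    """Keeps the most recently received parameter sets, keyed by their ids."""

    def __init__(self) -> None:
        self.sps: Dict[int, SequenceParameterSet] = {}
        self.pps: Dict[int, PictureParameterSet] = {}

    def store_sps(self, sps: SequenceParameterSet) -> None:
        self.sps[sps.seq_parameter_set_id] = sps    # a repeated copy just overwrites

    def store_pps(self, pps: PictureParameterSet) -> None:
        self.pps[pps.pic_parameter_set_id] = pps

    def resolve(self, pic_parameter_set_id: int
                ) -> Tuple[SequenceParameterSet, PictureParameterSet]:
        """Follow slice -> picture parameter set -> sequence parameter set."""
        pps = self.pps[pic_parameter_set_id]
        return self.sps[pps.seq_parameter_set_id], pps


if __name__ == "__main__":
    store = ParameterSetStore()
    store.store_sps(SequenceParameterSet(0, 1280, 720))
    store.store_pps(PictureParameterSet(0, 0, 1))
    print(store.resolve(0))
```

A slice only has to carry the small picture parameter set identifier, and the decoder resolves everything else from the stored sets.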

The sequence and picture parameter sets can be sent ahead of the VCL NAL units and can be transmitted repeatedly, which greatly improves error resilience. They can be delivered "in-band" or over a separate, more reliable "out-of-band" channel.

Origin blog.csdn.net/yinshipin007/article/details/126681601