WebRTC source code reading: H.264 RTP packetization

This article analyzes how WebRTC packetizes H.264 into RTP packets. The code analyzed is WebRTC version m98.

1. RTP protocol

1.1 Overview of RTP protocol

  • Real-time Transport Protocol (RTP) is a network protocol that allows real-time audio and video data transmission over the network. The RTP protocol is mainly used to solve the problem of real-time transmission of multimedia data, especially for applications that are sensitive to delay and data loss.
  • The RTP protocol consists of two closely related parts: the RTP data protocol and the RTP control protocol (RTCP). The RTP data protocol is responsible for data transmission, and RTCP is responsible for monitoring service quality and providing synchronization and identification information.
  • The RTP protocol does not guarantee reliable data transmission, because in real-time applications, reducing delay and jitter is more important than ensuring data integrity. Therefore, RTP usually runs over unreliable transport protocols such as UDP.

1.2 Main features of RTP protocol

  • Real-time: The RTP protocol can handle the transmission of real-time data, including delay-sensitive communications such as video conferencing and voice calls.
  • Multicast transmission: RTP supports multicast, which can send data to multiple receivers at the same time.
  • Sequence numbers: RTP gives each data packet a sequence number so that the receiver can detect loss and put the packets back in order.
  • Timestamps: RTP gives each data packet a timestamp so that the receiver can play the data back in sync.
  • Payload identification: RTP identifies the type of payload carried in each packet (such as audio or video).

1.3 RTP protocol header structure

Figure: RTP header
The fields are:

  • V (version): A 2-bit field identifying the RTP version. The current version is 2.
  • P (padding): A 1-bit field. If set, the packet ends with one or more padding bytes that are not part of the payload. The last padding byte holds the number of padding bytes to ignore.
  • X (extension): A 1-bit field. If set, the fixed header is followed by exactly one header extension. Header extensions can carry information such as the video rotation angle.
  • CC (CSRC count): A 4-bit field giving the number of CSRC identifiers that follow the fixed header.
  • M (marker): A 1-bit field whose meaning is defined by the payload format. For audio, a set M bit usually marks the first packet of a talkspurt; for video, it usually marks the last packet of a frame.
  • PT (payload type): A 7-bit field identifying the type of payload data in the RTP packet.
  • Sequence number: A 16-bit field that increases by one for each RTP packet sent, so the receiver can detect loss and restore packet order.
  • Timestamp: A 32-bit field giving the sampling instant of the data in the packet.
  • SSRC (synchronization source): A 32-bit field identifying the source of the RTP stream.
  • CSRC (contributing source): An optional list of 0 to 15 identifiers, 32 bits each, naming the sources that contributed to a mixed payload. It is mainly used for MCU mixing.
  • Extension header: If X is set, the header extension follows the fixed header and the CSRC list, if any.
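
The field layout above can be read straight off the first 12 bytes of a packet. The following is a minimal sketch, not WebRTC code: the struct RtpFixedHeader and the function ParseRtpFixedHeader are hypothetical names introduced for illustration, and the CSRC list and header extension are deliberately not parsed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical struct for illustration; not a WebRTC type.
struct RtpFixedHeader {
  uint8_t version;
  bool padding;
  bool extension;
  uint8_t csrc_count;
  bool marker;
  uint8_t payload_type;
  uint16_t sequence_number;
  uint32_t timestamp;
  uint32_t ssrc;
};

// Reads a 32-bit big-endian (network order) value.
static uint32_t ReadBigEndian32(const uint8_t* p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) | static_cast<uint32_t>(p[3]);
}

// Parses the 12-byte fixed RTP header. Returns false if the buffer is too
// short or the version field is not 2. CSRCs and extensions are ignored.
bool ParseRtpFixedHeader(const uint8_t* data, size_t size,
                         RtpFixedHeader* out) {
  const size_t kFixedHeaderSize = 12;
  if (data == nullptr || out == nullptr || size < kFixedHeaderSize)
    return false;
  out->version = data[0] >> 6;                 // V: top 2 bits
  out->padding = (data[0] & 0x20) != 0;        // P: 1 bit
  out->extension = (data[0] & 0x10) != 0;      // X: 1 bit
  out->csrc_count = data[0] & 0x0F;            // CC: low 4 bits
  out->marker = (data[1] & 0x80) != 0;         // M: 1 bit
  out->payload_type = data[1] & 0x7F;          // PT: low 7 bits
  out->sequence_number =
      static_cast<uint16_t>((data[2] << 8) | data[3]);
  out->timestamp = ReadBigEndian32(data + 4);
  out->ssrc = ReadBigEndian32(data + 8);
  return out->version == 2;
}
```

For a packet starting with the bytes 0x90 0xE0, this yields V=2, X=1, M=1 and PT=96, a dynamic payload type of the kind typically negotiated for H.264.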

2. Packetizing H.264 into RTP

2.1 H.264 NALU

H.264 divides video data into a series of Network Abstraction Layer Units (NALUs). Each NALU carries part of the video data and consists of a one-byte NALU header followed by the RBSP (Raw Byte Sequence Payload). The NALU header has the following layout.
Figure: NAL unit header

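  • F (forbidden_zero_bit): Occupies 1 bit and must be 0. A value of 1 signals that the NALU may contain bit errors or syntax violations.
  • NRI (nal_ref_idc): Occupies 2 bits and indicates the importance of the NALU. A value of 0 means the NALU is not used for reference and can be dropped without affecting other pictures; a higher value means the NALU carries more important data.
  • Type (nal_unit_type): Occupies 5 bits and indicates the type of the NALU. Different types carry different data, such as non-key frames, key frames, SPS (sequence parameter set), etc. Common H.264 NALU types are:
    0: Unspecified
    1: Coded slice of a non-IDR picture (non-key frame, for example a P frame)
    2-4: Coded slice data partitions A, B and C
    5: Coded slice of an IDR picture (key frame)
    6: SEI (Supplemental enhancement information)
    7: SPS (Sequence parameter set)
    8: PPS (Picture parameter set)
    9: Access unit delimiter
    10: End of sequence
    11: End of stream
    12: Filler data
    13-23: Reserved or extension types
    24-31: Unspecified by H.264; the RTP payload format uses 24-29 for aggregation and fragmentation packets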
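
As a small illustration of the header layout above, the three fields can be extracted from the first byte of a NALU with shifts and masks. This is a sketch; NalHeader and ParseNalHeader are hypothetical names, not WebRTC identifiers.

```cpp
#include <cstdint>

// Hypothetical struct for illustration; not a WebRTC type.
struct NalHeader {
  bool forbidden_zero_bit;  // F, 1 bit
  uint8_t nri;              // nal_ref_idc, 2 bits
  uint8_t type;             // nal_unit_type, 5 bits
};

// Splits the one-byte NALU header into its F, NRI and Type fields.
NalHeader ParseNalHeader(uint8_t first_byte) {
  NalHeader header;
  header.forbidden_zero_bit = (first_byte & 0x80) != 0;
  header.nri = (first_byte >> 5) & 0x03;
  header.type = first_byte & 0x1F;
  return header;
}
```

For example, the byte 0x67 decodes to F=0, NRI=3, Type=7 (SPS), and 0x65 decodes to F=0, NRI=3, Type=5 (IDR slice).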

2.2 RTP payload structure

RTP payloads are divided into three different payload structures:

  • Single NAL Unit Packet: Contains exactly one NALU. The first payload byte is the NALU header itself, so the Type field equals the NALU's own type.
    Figure: RTP payload format for single NAL unit packet
    In this case the entire H.264 NALU is carried in one RTP packet. This applies when the NALU does not exceed the maximum payload size of an RTP packet.

  • Aggregation Packet: Contains multiple NALUs.
    Figure: RTP payload format for aggregation packets
    Here the Type field of the payload header identifies the kind of aggregation packet, as listed in the table "Type field for STAPs and MTAPs" of RFC 6184: 24 is STAP-A, 25 is STAP-B, 26 is MTAP16 and 27 is MTAP24.
    WebRTC's packetizer only supports STAP-A (Single-Time Aggregation Packet, type A). As the name suggests, all NALUs aggregated into one STAP-A packet share the same timestamp.
    Figure: example of a complete RTP packet that aggregates two NALUs
    In such a packet the RTP header is followed by the one-byte STAP-A NAL header, and each NALU is preceded by a 16-bit field holding its size.

  • Fragmentation Unit: When an H.264 NALU is too large to fit in one RTP packet, it can be split across multiple RTP packets. WebRTC supports the FU-A fragmentation structure, as shown:
    Figure: RTP payload format for FU-A
    The FU indicator has the same layout as the H.264 NALU header. F and NRI are copied from the fragmented NALU, and Type is 28 for FU-A or 29 for FU-B.
    Figure: FU indicator
    The FU header structure is shown in the figure:
    Figure: FU header
    where:
    S is the Start bit. If it is set to 1, the RTP packet carries the first fragment of the NALU;
    E is the End bit. If it is set to 1, the RTP packet carries the last fragment of the NALU;
    R is the Reserved bit and must be set to 0;
    Type is the type of the fragmented NALU, copied from its header.
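
The FU indicator and FU header described above can be derived from the original NALU header byte with a few bit operations. The sketch below follows the bit layout in the text; the function names and the constant kFuAType are made up for illustration and are not WebRTC identifiers.

```cpp
#include <cstdint>

// NAL unit type value that marks an FU-A packet.
constexpr uint8_t kFuAType = 28;

// FU indicator: F and NRI from the original NALU header, Type = 28 (FU-A).
uint8_t MakeFuIndicator(uint8_t nal_header) {
  return static_cast<uint8_t>((nal_header & 0xE0) | kFuAType);
}

// FU header: S and E bits, R = 0, and the original NALU type in the low 5 bits.
uint8_t MakeFuHeader(uint8_t nal_header, bool start, bool end) {
  uint8_t fu_header = nal_header & 0x1F;
  if (start) fu_header |= 0x80;  // S bit
  if (end) fu_header |= 0x40;    // E bit
  return fu_header;
}

// Receiver side: rebuild the original NALU header from the two FU bytes.
uint8_t RebuildNalHeader(uint8_t fu_indicator, uint8_t fu_header) {
  return static_cast<uint8_t>((fu_indicator & 0xE0) | (fu_header & 0x1F));
}
```

For an IDR slice with header 0x65, the FU indicator is 0x7C, and the FU header is 0x85 for the first fragment, 0x05 for middle fragments and 0x45 for the last one. The original header byte is not sent; the receiver reconstructs it as in RebuildNalHeader.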

2.3 Packetization modes

RFC 6184 defines three packetization modes: Single NAL unit mode, Non-interleaved mode and Interleaved mode. WebRTC supports Single NAL unit mode and Non-interleaved mode. Single NAL unit mode allows only single NAL unit packets; Non-interleaved mode allows single NAL unit packets, STAP-A and FU-A; Interleaved mode allows STAP-B, MTAP16, MTAP24, FU-A and FU-B. The full mapping between the three modes and the RTP payload structures is given in the following table:
Table: Summary of allowed NAL unit types for each packetization mode

3. WebRTC H.264 packetization process

In the earlier article of this WebRTC source code reading series on video capture, encoding, and transmission, we saw that RTPSenderVideo::SendVideo calls RtpPacketizer::Create to set up RTP packetization of the video data, and then calls RtpPacketizerH264::NextPacket to turn the data into actual RTP packets.

  std::unique_ptr<RtpPacketizer> packetizer =
      RtpPacketizer::Create(codec_type, payload, limits, video_header);
  .......
    if (!packetizer->NextPacket(packet.get()))
      return false;

We take H.264 as the video codec for the analysis below.

RtpPacketizerH264::RtpPacketizerH264(rtc::ArrayView<const uint8_t> payload,
                                     PayloadSizeLimits limits,
                                     H264PacketizationMode packetization_mode)
    : limits_(limits), num_packets_left_(0) {
  // Guard against uninitialized memory in packetization_mode.
  RTC_CHECK(packetization_mode == H264PacketizationMode::NonInterleaved ||
            packetization_mode == H264PacketizationMode::SingleNalUnit);

  // When packetizing H.264, the start codes are stripped from the bitstream
  // and packetization works on whole NALUs. The NALUs are located by their
  // start codes; each element of input_fragments_ is one H.264 NALU.
  for (const auto& nalu :
       H264::FindNaluIndices(payload.data(), payload.size())) {
    input_fragments_.push_back(
        payload.subview(nalu.payload_start_offset, nalu.payload_size));
  }

  // Packetize the NALUs into RTP packets.
  if (!GeneratePackets(packetization_mode)) {
    // If failed to generate all the packets, discard already generated
    // packets in case the caller would ignore return value and still try to
    // call NextPacket().
    num_packets_left_ = 0;
    while (!packets_.empty()) {
      packets_.pop();
    }
  }
}

3.1 Parsing NALU

std::vector<NaluIndex> FindNaluIndices(const uint8_t* buffer,
                                       size_t buffer_size) {
  // H.264 has two start codes: 0x00 0x00 0x01 and 0x00 0x00 0x00 0x01.
  // They are used to tell the NALUs apart.
  std::vector<NaluIndex> sequences;
  if (buffer_size < kNaluShortStartSequenceSize)
    return sequences;

  static_assert(kNaluShortStartSequenceSize >= 2,
                "kNaluShortStartSequenceSize must be larger or equals to 2");
  const size_t end = buffer_size - kNaluShortStartSequenceSize;
  for (size_t i = 0; i < end;) {
    if (buffer[i + 2] > 1) {
      // A byte greater than 1 cannot be part of a start code, so skip past it.
      i += 3;
    } else if (buffer[i + 2] == 1) {
      if (buffer[i + 1] == 0 && buffer[i] == 0) {
        // We found a start sequence, now check if it was a 3 or 4 byte one.
        NaluIndex index = {i, i + 3, 0};
        if (index.start_offset > 0 && buffer[index.start_offset - 1] == 0)
          --index.start_offset;

        // Update length of previous entry.
        auto it = sequences.rbegin();
        if (it != sequences.rend())
          it->payload_size = index.start_offset - it->payload_start_offset;

        sequences.push_back(index);
      }

      i += 3;
    } else {
      ++i;
    }
  }

  // Update length of last entry, if any.
  auto it = sequences.rbegin();
  if (it != sequences.rend())
    it->payload_size = buffer_size - it->payload_start_offset;

  return sequences;
}

In an H.264 Annex B byte stream, every NALU is preceded by a start code, which has two forms: 0x00 0x00 0x01 or 0x00 0x00 0x00 0x01. FindNaluIndices scans the buffer for start codes and uses their positions to determine the boundaries of each NALU.
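
To make the boundary search concrete, here is a simplified standalone sketch of the same idea. It is not the WebRTC implementation: SplitAnnexB is a hypothetical helper that checks every position instead of skipping ahead, and it returns plain (offset, size) pairs for the NALU payloads rather than NaluIndex entries.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns the (payload offset, payload size) of every NALU in an Annex B
// buffer. Start codes are excluded from the reported ranges.
std::vector<std::pair<size_t, size_t>> SplitAnnexB(
    const std::vector<uint8_t>& buf) {
  std::vector<std::pair<size_t, size_t>> nalus;
  size_t i = 0;
  while (i + 3 <= buf.size()) {
    if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
      // A zero byte right before 00 00 01 makes it the 4-byte start code.
      const size_t start_code_begin = (i > 0 && buf[i - 1] == 0) ? i - 1 : i;
      // The previous NALU ends where this start code begins.
      if (!nalus.empty())
        nalus.back().second = start_code_begin - nalus.back().first;
      nalus.emplace_back(i + 3, 0);
      i += 3;
    } else {
      ++i;
    }
  }
  // The last NALU runs to the end of the buffer.
  if (!nalus.empty())
    nalus.back().second = buf.size() - nalus.back().first;
  return nalus;
}
```

Given the 18-byte buffer 00 00 00 01 67 42 | 00 00 01 68 CE | 00 00 00 01 65 88 84, the function reports three NALUs at offsets 4, 9 and 15 with sizes 2, 2 and 3, covering both start code forms.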

3.2 Packing a single NALU

bool RtpPacketizerH264::PacketizeSingleNalu(size_t fragment_index) {
  // Add a single NALU to the queue, no aggregation.
  size_t payload_size_left = limits_.max_payload_len;
  if (input_fragments_.size() == 1)
    payload_size_left -= limits_.single_packet_reduction_len;
  else if (fragment_index == 0)
    payload_size_left -= limits_.first_packet_reduction_len;
  else if (fragment_index + 1 == input_fragments_.size())
    payload_size_left -= limits_.last_packet_reduction_len;
  rtc::ArrayView<const uint8_t> fragment = input_fragments_[fragment_index];
  // Compare the available payload size with the NALU size. If the NALU is
  // too large, SingleNalu mode cannot packetize it.
  if (payload_size_left < fragment.size()) {
    RTC_LOG(LS_ERROR) << "Failed to fit a fragment to packet in SingleNalu "
                         "packetization mode. Payload size left "
                      << payload_size_left << ", fragment length "
                      << fragment.size() << ", packet capacity "
                      << limits_.max_payload_len;
    return false;
  }
  RTC_CHECK_GT(fragment.size(), 0u);
  packets_.push(PacketUnit(fragment, true /* first */, true /* last */,
                           false /* aggregated */, fragment[0]));
  ++num_packets_left_;
  return true;
}

PacketizeSingleNalu is used only when SingleNalUnit mode is explicitly selected. WebRTC generally uses Non-interleaved mode.
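
The choice between the payload structures can be summarized in a few lines. The sketch below is an assumption about the dispatch that GeneratePackets performs (that function is not reproduced in this article), reduced to a pure function; the enums and ChoosePayloadStructure are hypothetical names, and packet_capacity stands for max_payload_len after the applicable reduction has been subtracted.

```cpp
#include <cstddef>

// Hypothetical enums for illustration; not WebRTC types.
enum class PacketizationMode { kSingleNalUnit, kNonInterleaved };
enum class PayloadStructure { kSingleNalu, kStapA, kFuA, kUnsupported };

// Picks the payload structure for one NALU, assuming the dispatch is based
// only on the mode and on whether the NALU fits into one packet.
PayloadStructure ChoosePayloadStructure(PacketizationMode mode,
                                        size_t nalu_size,
                                        size_t packet_capacity) {
  if (mode == PacketizationMode::kSingleNalUnit) {
    // No fragmentation available: an oversized NALU cannot be sent.
    return nalu_size <= packet_capacity ? PayloadStructure::kSingleNalu
                                        : PayloadStructure::kUnsupported;
  }
  // Non-interleaved mode: fragment large NALUs, try to aggregate small ones.
  return nalu_size > packet_capacity ? PayloadStructure::kFuA
                                     : PayloadStructure::kStapA;
}
```

Note that in the code shown in this article a STAP-A candidate that ends up containing only one NALU is marked as both first and last fragment, and NextPacket then sends it as a plain Single NAL unit packet.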

3.3 Packaging FU-A

bool RtpPacketizerH264::PacketizeFuA(size_t fragment_index) {
  // Fragment payload into packets (FU-A).
  rtc::ArrayView<const uint8_t> fragment = input_fragments_[fragment_index];

  PayloadSizeLimits limits = limits_;
  // Leave room for the FU-A header.
  limits.max_payload_len -= kFuAHeaderSize;
  // Update single/first/last packet reductions unless it is single/first/last
  // fragment.
  if (input_fragments_.size() != 1) {
    // if this fragment is put into a single packet, it might still be the
    // first or the last packet in the whole sequence of packets.
    if (fragment_index == input_fragments_.size() - 1) {
      // Last fragment.
      limits.single_packet_reduction_len = limits_.last_packet_reduction_len;
    } else if (fragment_index == 0) {
      // First fragment.
      limits.single_packet_reduction_len = limits_.first_packet_reduction_len;
    } else {
      limits.single_packet_reduction_len = 0;
    }
  }
  // Not the first fragment: no first-packet reduction applies.
  if (fragment_index != 0)
    limits.first_packet_reduction_len = 0;
  // Not the last fragment: no last-packet reduction applies.
  if (fragment_index != input_fragments_.size() - 1)
    limits.last_packet_reduction_len = 0;

  // Strip out the original header.
  size_t payload_left = fragment.size() - kNalHeaderSize;
  int offset = kNalHeaderSize;

  // Compute the payload size of each FU-A packet from the size limits.
  std::vector<int> payload_sizes = SplitAboutEqually(payload_left, limits);
  if (payload_sizes.empty())
    return false;

  for (size_t i = 0; i < payload_sizes.size(); ++i) {
    // Cut the NALU payload into FU-A packets of the computed sizes.
    int packet_length = payload_sizes[i];
    RTC_CHECK_GT(packet_length, 0);
    packets_.push(PacketUnit(fragment.subview(offset, packet_length),
                             /*first_fragment=*/i == 0,
                             /*last_fragment=*/i == payload_sizes.size() - 1,
                             false, fragment[0]));
    offset += packet_length;
    payload_left -= packet_length;
  }
  num_packets_left_ += payload_sizes.size();
  RTC_CHECK_EQ(0, payload_left);
  return true;
}

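This is the most common RTP packetization path for H.264 in WebRTC. The NALU header is stripped, the remaining payload is split into roughly equal parts by SplitAboutEqually, and each part becomes one FU-A packet.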
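
The idea of splitting a payload into roughly equal parts can be sketched as follows. This is a deliberately simplified stand-in for SplitAboutEqually, not its actual implementation: SplitEvenly is a hypothetical function that takes a single per-packet limit and ignores the first, last and single packet reductions.

```cpp
#include <cstddef>
#include <vector>

// Splits payload_len bytes into the smallest number of chunks that each fit
// into max_chunk_len, with chunk sizes differing by at most one byte.
// Returns an empty vector if there is nothing to split or no room at all.
std::vector<size_t> SplitEvenly(size_t payload_len, size_t max_chunk_len) {
  std::vector<size_t> sizes;
  if (payload_len == 0 || max_chunk_len == 0)
    return sizes;
  // Minimum number of packets needed (ceiling division).
  const size_t num_chunks = (payload_len + max_chunk_len - 1) / max_chunk_len;
  const size_t base_size = payload_len / num_chunks;
  const size_t num_larger = payload_len % num_chunks;
  // The first chunks get base_size bytes, the last num_larger get one more.
  for (size_t i = 0; i < num_chunks; ++i)
    sizes.push_back(i < num_chunks - num_larger ? base_size : base_size + 1);
  return sizes;
}
```

For example, a 2500-byte NALU payload with 1198 bytes available per packet is split into 833, 833 and 834 bytes rather than 1198, 1198 and 104. Evening out the sizes avoids a tiny trailing packet.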

3.4 Packaging STAP-A

size_t RtpPacketizerH264::PacketizeStapA(size_t fragment_index) {
  // Aggregate fragments into one packet (STAP-A).
  size_t payload_size_left = limits_.max_payload_len;
  if (input_fragments_.size() == 1)
    payload_size_left -= limits_.single_packet_reduction_len;
  else if (fragment_index == 0)
    payload_size_left -= limits_.first_packet_reduction_len;
  int aggregated_fragments = 0;
  size_t fragment_headers_length = 0;
  rtc::ArrayView<const uint8_t> fragment = input_fragments_[fragment_index];
  RTC_CHECK_GE(payload_size_left, fragment.size());
  ++num_packets_left_;

  // Compute the space the next NALU needs inside the STAP-A packet.
  auto payload_size_needed = [&] {
    size_t fragment_size = fragment.size() + fragment_headers_length;
    if (input_fragments_.size() == 1) {
      // Single fragment, single packet, payload_size_left already adjusted
      // with limits_.single_packet_reduction_len.
      return fragment_size;
    }
    if (fragment_index == input_fragments_.size() - 1) {
      // Last fragment, so STAP-A might be the last packet.
      return fragment_size + limits_.last_packet_reduction_len;
    }
    return fragment_size;
  };

  while (payload_size_left >= payload_size_needed()) {
    RTC_CHECK_GT(fragment.size(), 0);
    packets_.push(PacketUnit(fragment, aggregated_fragments == 0, false, true,
                             fragment[0]));
    payload_size_left -= fragment.size();
    payload_size_left -= fragment_headers_length;

    fragment_headers_length = kLengthFieldSize;
    // If we are going to try to aggregate more fragments into this packet
    // we need to add the STAP-A NALU header and a length field for the first
    // NALU of this packet.
    if (aggregated_fragments == 0)
      fragment_headers_length += kNalHeaderSize + kLengthFieldSize;
    ++aggregated_fragments;

    // Next fragment.
    ++fragment_index;
    if (fragment_index == input_fragments_.size())
      break;
    fragment = input_fragments_[fragment_index];
  }
  RTC_CHECK_GT(aggregated_fragments, 0);
  packets_.back().last_fragment = true;
  return fragment_index;
}
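
PacketizeStapA only decides which NALUs go into the same packet; the bytes are written later. As a rough sketch of what the resulting STAP-A payload looks like on the wire (following the layout from section 2.2, not WebRTC's own serialization code), the hypothetical helper BuildStapAPayload below writes the STAP-A NAL header followed by a 16-bit size and the data of each NALU. Setting NRI to the maximum over the aggregated NALUs follows RFC 6184.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// NAL unit type value that marks a STAP-A packet.
constexpr uint8_t kStapAType = 24;

// Builds a STAP-A payload: 1-byte STAP-A NAL header, then for every NALU a
// 16-bit big-endian size followed by the NALU bytes (header included).
std::vector<uint8_t> BuildStapAPayload(
    const std::vector<std::vector<uint8_t>>& nalus) {
  std::vector<uint8_t> payload;
  if (nalus.empty())
    return payload;

  uint8_t forbidden_bit = 0;
  uint8_t max_nri = 0;
  for (const auto& nalu : nalus) {
    // Each NALU needs at least its header and must fit the 16-bit size field.
    assert(!nalu.empty() && nalu.size() <= 0xFFFF);
    forbidden_bit |= nalu[0] & 0x80;
    const uint8_t nri = (nalu[0] >> 5) & 0x03;
    if (nri > max_nri)
      max_nri = nri;
  }
  payload.push_back(
      static_cast<uint8_t>(forbidden_bit | (max_nri << 5) | kStapAType));

  for (const auto& nalu : nalus) {
    payload.push_back(static_cast<uint8_t>(nalu.size() >> 8));
    payload.push_back(static_cast<uint8_t>(nalu.size() & 0xFF));
    payload.insert(payload.end(), nalu.begin(), nalu.end());
  }
  return payload;
}
```

Aggregating a 2-byte SPS (67 42) and a 3-byte PPS (68 CE 38) gives the 10-byte payload 78 00 02 67 42 00 03 68 CE 38. The overhead of one header byte plus two size bytes per NALU matches the kNalHeaderSize and kLengthFieldSize bookkeeping in PacketizeStapA.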

3.5 NextPacket

bool RtpPacketizerH264::NextPacket(RtpPacketToSend* rtp_packet) {
  RTC_DCHECK(rtp_packet);
  if (packets_.empty()) {
    return false;
  }

  PacketUnit packet = packets_.front();
  if (packet.first_fragment && packet.last_fragment) {
    // A packet that is both the first and the last fragment is a
    // Single NAL unit packet.
    size_t bytes_to_send = packet.source_fragment.size();
    uint8_t* buffer = rtp_packet->AllocatePayload(bytes_to_send);
    memcpy(buffer, packet.source_fragment.data(), bytes_to_send);
    packets_.pop();
    input_fragments_.pop_front();
  } else if (packet.aggregated) {
    // Aggregation packet (STAP-A).
    NextAggregatePacket(rtp_packet);
  } else {
    // Fragmentation unit (FU-A).
    NextFragmentPacket(rtp_packet);
  }
  // Set the marker bit if this is the last packet.
  rtp_packet->SetMarker(packets_.empty());
  --num_packets_left_;
  return true;
}

Depending on the packet type, different functions are called.

  • If it is a Single NAL unit packet, the NALU is copied directly into the RTP payload, following the Single NAL Unit Packet rule.
  • If it is an aggregated packet, NextAggregatePacket is called, and the encapsulation follows the Aggregation Packet structure in 2.2.
  • If it is a fragmented packet, NextFragmentPacket is called, and the encapsulation follows the Fragmentation Unit structure in 2.2.
  • If it is the last packet, the marker bit of the RTP header is set to 1 to signal the end of the frame.

This completes the analysis of how WebRTC packetizes H.264 into RTP packets.

References

RTP: A Transport Protocol for Real-Time Applications (RFC 3550)
RTP Payload Format for H.264 Video (RFC 6184)


Origin blog.csdn.net/qq_36383272/article/details/131535444