In audio and video communication, network jitter and delay are common problems, which will lead to audio and video quality degradation and poor user experience. In order to solve these problems, WebRTC introduces an important component of Jitter Buffer (jitter buffer). Jitter Buffer is a buffer for receiving and processing audio and video data in network transmission. Its main function is to solve the problems caused by network jitter and delay to provide more stable and smooth audio and video transmission. Jitter Buffer enables audio and video data to be decoded and played in the correct order and timing by adjusting the receiving and playing time of data packets.
This article will analyze the implementation and version of the jitter buffer from the webrtc source code m98.

1. RTP packet reception and analysis

1. RTP packet receiving process

Similar to the process of P2P, read data from the underlying socket, to UDPPort::OnReadPacket.
PhysicalSocketServer::WaitSelect> ProcessEvents> SocketDispatcher::OnEvent> SignalReadEvent> AsyncUDPSocket::OnReadEvent> SignalReadPacket> > AllocationSequence::OnReadPacket> UDPPort::HandleIncomingPacket>UDPPort::OnReadPacket
from UDPPort::OnReadPacketto Callmodule.
UDPPort::OnReadPacket> Connection::OnReadPacket> Connection::SignalReadPacket> P2PTransportChannel::OnReadPacket> P2PTransportChannel::SignalReadPacket> DtlsTransport::OnReadPacket> DtlsTransport::SignalReadPacket> RtpTransport::OnReadPacket> RtpTransport::OnRtpPacketReceived> RtpTransport::DemuxPacket> > RtpDemuxer::OnRtpPacket> BaseChannel::OnRtpPacket> > WebRtcVideoChannel::OnPacketReceived> Call::DeliverPacket>Call::DeliverRtp

2. RTP packet analysis

The RTP packet is parsed Call::DeliverRtpby calling RtpPacketReceivedthe function in it.Parse

  if (!parsed_packet.Parse(std::move(packet)))
    return DELIVERY_PACKET_ERROR;

RtpPacketReceivedInherited from RtpPacket, that is, it will call RtpPacket::Parse> RtpPacket::ParseBufferfor analysis.
The RTP header structure has been learned in the h264 RTP packaging of webrtc source code reading , as shown in the figure below, according to the RTP header structure, the key parameters in the RTP header are analyzed.

3. RTP packet H264 analysis

Call::DeliverRtpAfter parsing the RTP packet, it will be called RtpStreamReceiverController::OnRtpPacketto process the parsed RTP packet.

    if (video_receiver_controller_.OnRtpPacket(parsed_packet)) {
    
    
      receive_stats_.AddReceivedVideoBytes(length,
                                           parsed_packet.arrival_time());
      event_log_->Log(
          std::make_unique<RtcEventRtpPacketIncoming>(parsed_packet));
      return DELIVERY_OK;
    }

RtpStreamReceiverController::OnRtpPacket > RtpDemuxer::OnRtpPacket > RtpVideoStreamReceiver2::OnRtpPacket > RtpVideoStreamReceiver2::ReceivePacket

Then RtpVideoStreamReceiver2::ReceivePacketcall the subclass of VideoRtpDepacketizer to parse the rtp data, taking H264 as an example, it will be called VideoRtpDepacketizerH264::Parse.

  absl::optional<VideoRtpDepacketizer::ParsedRtpPayload> parsed_payload =
      type_it->second->Parse(packet.PayloadBuffer());

The parsing process is the reverse process of packaging. For details, please refer to the h264 RTP packaging of webrtc source code reading , which will not be analyzed in this article.

2. The principle of video JitterBuffer

Sequence and assemble the received RTP packets through PacketBuffer, and assemble them into a complete frame.
Fill the assembled frame with reference frame information through RtpFrameReferenceFinder.
Use FrameBuffer to judge whether the complete frame filled with reference frame information is a continuous frame, and cache it.
FrameBuffer judges whether a frame is decodable according to whether the frames it depends on have been decoded, and hands it to the decoder for decoding.

Through the above steps, it can be ensured that each frame is complete and reliable, and the reference frame of each frame is complete and reliable, then when the reference frames are all decoded, the frame can be decoded.

1. Determine the complete frame

PacketBuffer::InsertResult PacketBuffer::InsertPacket(
    std::unique_ptr<PacketBuffer::Packet> packet) {
    
    
  PacketBuffer::InsertResult result;

  uint16_t seq_num = packet->seq_num;
  size_t index = seq_num % buffer_.size();

  if (!first_packet_received_) {
    
    
    first_seq_num_ = seq_num;
    first_packet_received_ = true;
  } else if (AheadOf(first_seq_num_, seq_num)) {
    
    //seq_num 在 first_seq_num之前
    // If we have explicitly cleared past this packet then it's old,
    // don't insert it, just silently ignore it.
    if (is_cleared_to_first_seq_num_) {
    
    
      return result;
    }

    first_seq_num_ = seq_num;
  }

  if (buffer_[index] != nullptr) {
    
    //buffer中 index对应的槽被占用了
    // Duplicate packet, just delete the payload.
    if (buffer_[index]->seq_num == packet->seq_num) {
    
    
      return result;
    }

    // The packet buffer is full, try to expand the buffer.
    //buffer满了的话，就扩展buffer,每次都是翻倍扩展
    while (ExpandBufferSize() && buffer_[seq_num % buffer_.size()] != nullptr) {
    
    
    }
    index = seq_num % buffer_.size(); //计算扩展后的新index

    // Packet buffer is still full since we were unable to expand the buffer.
    if (buffer_[index] != nullptr) {
    
     //buffer已经扩展到最大了，就清空buffer，申请一个新的关键帧
      // Clear the buffer, delete payload, and return false to signal that a
      // new keyframe is needed.
      RTC_LOG(LS_WARNING) << "Clear PacketBuffer and request key frame.";
      ClearInternal();
      result.buffer_cleared = true;
      return result;
    }
  }

  packet->continuous = false;
  buffer_[index] = std::move(packet);

  UpdateMissingPackets(seq_num); //更新丢包信息，用于NACK等计算

  result.packets = FindFrames(seq_num);  //查找一个完整帧
  return result;
}

Insert a packet of data through InsertPacketthe function, and FindFramesjudge the complete frame through the function.

std::vector<std::unique_ptr<PacketBuffer::Packet>> PacketBuffer::FindFrames(
    uint16_t seq_num) {
    
    
  std::vector<std::unique_ptr<PacketBuffer::Packet>> found_frames;
  for (size_t i = 0; i < buffer_.size() && PotentialNewFrame(seq_num); ++i) {
    
      //该包是潜在的新帧
    size_t index = seq_num % buffer_.size();
    buffer_[index]->continuous = true;   //设置包的连续性


    //找到一帧的最后一包，往前推找到第一包，就是完整的一帧
    if (buffer_[index]->is_last_packet_in_frame()) {
    
    
      uint16_t start_seq_num = seq_num;

      // Find the start index by searching backward until the packet with
      // the `frame_begin` flag is set.
      int start_index = index;
      size_t tested_packets = 0;
      int64_t frame_timestamp = buffer_[start_index]->timestamp;

      // Identify H.264 keyframes by means of SPS, PPS, and IDR.
      bool is_h264 = buffer_[start_index]->codec() == kVideoCodecH264;
      bool has_h264_sps = false;
      bool has_h264_pps = false;
      bool has_h264_idr = false;
      bool is_h264_keyframe = false;
      int idr_width = -1;
      int idr_height = -1;
      while (true) {
    
    
        ++tested_packets;

        if (!is_h264 && buffer_[start_index]->is_first_packet_in_frame())   //vp8及vp9判断逻辑就是找到第一包的标志
          break;

        if (is_h264) {
    
    
          const auto* h264_header = absl::get_if<RTPVideoHeaderH264>(
              &buffer_[start_index]->video_header.video_type_header);
          if (!h264_header || h264_header->nalus_length >= kMaxNalusPerPacket)
            return found_frames;

          for (size_t j = 0; j < h264_header->nalus_length; ++j) {
    
    
            if (h264_header->nalus[j].type == H264::NaluType::kSps) {
    
    
              has_h264_sps = true;
            } else if (h264_header->nalus[j].type == H264::NaluType::kPps) {
    
    
              has_h264_pps = true;
            } else if (h264_header->nalus[j].type == H264::NaluType::kIdr) {
    
    
              has_h264_idr = true;
            }
          }
          if ((sps_pps_idr_is_h264_keyframe_ && has_h264_idr && has_h264_sps &&
               has_h264_pps) ||
              (!sps_pps_idr_is_h264_keyframe_ && has_h264_idr)) {
    
    
            is_h264_keyframe = true;
            // Store the resolution of key frame which is the packet with
            // smallest index and valid resolution; typically its IDR or SPS
            // packet; there may be packet preceeding this packet, IDR's
            // resolution will be applied to them.
            if (buffer_[start_index]->width() > 0 &&
                buffer_[start_index]->height() > 0) {
    
    
              idr_width = buffer_[start_index]->width();
              idr_height = buffer_[start_index]->height();
            }
          }
        }

        if (tested_packets == buffer_.size())  //已经把所有buffer缓存的包过了一遍了，没有找到与当前帧时间戳不一样的帧，
                                              //也就是说当前帧之前的帧已经都不在buffer里了，或者当前帧就是第一帧，那么buffer中第一包就是帧的起始包
          break;

        start_index = start_index > 0 ? start_index - 1 : buffer_.size() - 1;

        // In the case of H264 we don't have a frame_begin bit (yes,
        // `frame_begin` might be set to true but that is a lie). So instead
        // we traverese backwards as long as we have a previous packet and
        // the timestamp of that packet is the same as this one. This may cause
        // the PacketBuffer to hand out incomplete frames.
        // See: https://bugs.chromium.org/p/webrtc/issues/detail?id=7106
        if (is_h264 && (buffer_[start_index] == nullptr ||
                        buffer_[start_index]->timestamp != frame_timestamp)) {
    
    
          break;
        }

        --start_seq_num;
      }

      if (is_h264) {
    
    
        // Warn if this is an unsafe frame.
        if (has_h264_idr && (!has_h264_sps || !has_h264_pps)) {
    
    
          RTC_LOG(LS_WARNING)
              << "Received H.264-IDR frame "
                 "(SPS: "
              << has_h264_sps << ", PPS: " << has_h264_pps << "). Treating as "
              << (sps_pps_idr_is_h264_keyframe_ ? "delta" : "key")
              << " frame since WebRTC-SpsPpsIdrIsH264Keyframe is "
              << (sps_pps_idr_is_h264_keyframe_ ? "enabled." : "disabled");
        }

        // Now that we have decided whether to treat this frame as a key frame
        // or delta frame in the frame buffer, we update the field that
        // determines if the RtpFrameObject is a key frame or delta frame.
        //更新帧类型信息
        const size_t first_packet_index = start_seq_num % buffer_.size();
        if (is_h264_keyframe) {
    
    
          buffer_[first_packet_index]->video_header.frame_type =
              VideoFrameType::kVideoFrameKey;
          if (idr_width > 0 && idr_height > 0) {
    
    
            // IDR frame was finalized and we have the correct resolution for
            // IDR; update first packet to have same resolution as IDR.
            buffer_[first_packet_index]->video_header.width = idr_width;
            buffer_[first_packet_index]->video_header.height = idr_height;
          }
        } else {
    
    
          buffer_[first_packet_index]->video_header.frame_type =
              VideoFrameType::kVideoFrameDelta;
        }

        // If this is not a keyframe, make sure there are no gaps in the packet
        // sequence numbers up until this point.
        //如果当前帧是P帧，而且在当前序号前面还有丢包，就意味着当前帧之前有帧不完整，所以继续缓存
        if (!is_h264_keyframe && missing_packets_.upper_bound(start_seq_num) !=
                                     missing_packets_.begin()) {
    
    
          return found_frames;
        }
      }

      const uint16_t end_seq_num = seq_num + 1;
      //把找到的完整帧的所有包放进found_frames中
      uint16_t num_packets = end_seq_num - start_seq_num;
      found_frames.reserve(found_frames.size() + num_packets);
      for (uint16_t i = start_seq_num; i != end_seq_num; ++i) {
    
    
        std::unique_ptr<Packet>& packet = buffer_[i % buffer_.size()];
        RTC_DCHECK(packet);
        RTC_DCHECK_EQ(i, packet->seq_num);
        // Ensure frame boundary flags are properly set.
        packet->video_header.is_first_packet_in_frame = (i == start_seq_num);
        packet->video_header.is_last_packet_in_frame = (i == seq_num);
        found_frames.push_back(std::move(packet));
      }

      //前面已经判断过如果是P帧，且前面有空洞会返回，所以这里：如果是P帧则不会清理，如果是I帧，则清理I帧之前的所有丢包
      missing_packets_.erase(missing_packets_.begin(),
                             missing_packets_.upper_bound(seq_num));
    }
    ++seq_num;
  }
  return found_frames;
}

First determine whether the packet is a potential new frame. The judgment logic is roughly: if it is the first packet of a frame, it is a potential new frame; otherwise, if the previous packet of the packet is continuous, it is a potential new frame. Then if the last packet is found, there is a complete frame. For vpx, there are signs of the first packet and the last packet of the frame in the code stream, so it can be judged directly; for h264, the flag of the last packet is set in the Mark bit of RTP, and there will be no misjudgment; But there is a problem with the judgment of the first packet (see the bug report in the comments for details , I don’t understand why this is so), so the change of the timestamp is used to judge the first packet of the frame.

2. Fill in the reference frame information

RtpFrameReferenceFinder::ReturnVector RtpSeqNumOnlyRefFinder::ManageFrame(
    std::unique_ptr<RtpFrameObject> frame) {
    
    
  FrameDecision decision = ManageFrameInternal(frame.get()); 

  //根据ManageFrameInternal的结果，如果是kStash就把帧缓存下来，
  //如果是kHandOff，说明帧是连续的，该帧可能是之前缓存帧的参考帧，所以当该帧连续时，可以判断缓存帧中是否可以传播连续性
  RtpFrameReferenceFinder::ReturnVector res;
  switch (decision) {
    
    
    case kStash:
      if (stashed_frames_.size() > kMaxStashedFrames)
        stashed_frames_.pop_back();
      stashed_frames_.push_front(std::move(frame));
      return res;
    case kHandOff:
      res.push_back(std::move(frame));
      RetryStashedFrames(res);
      return res;
    case kDrop:
      return res;
  }

  return res;
}

According to the result of ManageFrameInternal, if it is kStash, the frame is cached. If it is kHandOff, it means that the frame is continuous. This frame may be the reference frame of the previous cached frame, so when the frame is continuous, it can be judged whether the cached frame can be transmitted. continuity.

RtpSeqNumOnlyRefFinder::FrameDecision
RtpSeqNumOnlyRefFinder::ManageFrameInternal(RtpFrameObject* frame) {
    
    
  if (frame->frame_type() == VideoFrameType::kVideoFrameKey) {
    
      //关键帧、则插入新的GOP缓存
    last_seq_num_gop_.insert(std::make_pair(
        frame->last_seq_num(),
        std::make_pair(frame->last_seq_num(), frame->last_seq_num())));
  }

  // We have received a frame but not yet a keyframe, stash this frame.
  if (last_seq_num_gop_.empty())  //说明至今没收到关键帧，则缓存
    return kStash;

  // Clean up info for old keyframes but make sure to keep info
  // for the last keyframe.
  auto clean_to = last_seq_num_gop_.lower_bound(frame->last_seq_num() - 100);  //最多缓存100个gop，清理掉100个之前的gop缓存
  for (auto it = last_seq_num_gop_.begin();
       it != clean_to && last_seq_num_gop_.size() > 1;) {
    
    
    it = last_seq_num_gop_.erase(it);
  }

  // Find the last sequence number of the last frame for the keyframe
  // that this frame indirectly references.
  //找到第一个大于该帧序列号的关键帧，那么该关键帧的前一个关键帧对应的gop就是该帧所在的gop
  auto seq_num_it = last_seq_num_gop_.upper_bound(frame->last_seq_num());
  if (seq_num_it == last_seq_num_gop_.begin()) {
    
    
    RTC_LOG(LS_WARNING) << "Generic frame with packet range ["
                        << frame->first_seq_num() << ", "
                        << frame->last_seq_num()
                        << "] has no GoP, dropping frame.";
    return kDrop;
  }
  seq_num_it--;

  // Make sure the packet sequence numbers are continuous, otherwise stash
  // this frame.
  //rtp数据中可能带有填充包，last_picture_id_gop对应的是真实编码数据的序列号，
  //last_picture_id_with_padding_gop对应的是可能填充包数据的序列号
  uint16_t last_picture_id_gop = seq_num_it->second.first;
  uint16_t last_picture_id_with_padding_gop = seq_num_it->second.second;
  if (frame->frame_type() == VideoFrameType::kVideoFrameDelta) {
    
    
    uint16_t prev_seq_num = frame->first_seq_num() - 1;

    //判断该帧是否连续，即前一帧有没有缓存在gop中
    if (prev_seq_num != last_picture_id_with_padding_gop)
      return kStash;
  }

  RTC_DCHECK(AheadOrAt(frame->last_seq_num(), seq_num_it->first));

  // Since keyframes can cause reordering we can't simply assign the
  // picture id according to some incrementing counter.
  frame->SetId(frame->last_seq_num());
  frame->num_references =
      frame->frame_type() == VideoFrameType::kVideoFrameDelta;   //设置参考帧个数，关键帧就是0；否则是1，Webrtc默认每个帧前面的一帧是该帧的参考帧
  frame->references[0] = rtp_seq_num_unwrapper_.Unwrap(last_picture_id_gop);
  if (AheadOf<uint16_t>(frame->Id(), last_picture_id_gop)) {
    
    
    seq_num_it->second.first = frame->Id();
    seq_num_it->second.second = frame->Id();
  }

  UpdateLastPictureIdWithPadding(frame->Id());
  frame->SetSpatialIndex(0);
  frame->SetId(rtp_seq_num_unwrapper_.Unwrap(frame->Id()));
  return kHandOff;
}

By RtpFrameReferenceFinder::ManageFramefilling in the key frame information of the good frame, and returning the consecutive frames to RtpVideoStreamReceiver2.

3. FrameBuffer insert frame

int64_t FrameBuffer::InsertFrame(std::unique_ptr<EncodedFrame> frame) {
    
    
  TRACE_EVENT0("webrtc", "FrameBuffer::InsertFrame");
  RTC_DCHECK(frame);

  MutexLock lock(&mutex_);

  int64_t last_continuous_frame_id = last_continuous_frame_.value_or(-1);

  if (!ValidReferences(*frame)) {
    
      //判断帧的参考帧是否有效，判断规则：1、参考帧必须在该帧前面
                                    // 2、每帧的参考帧不能相同 （实际上webrtc默认每帧的参考帧就是前面一帧）
    RTC_LOG(LS_WARNING) << "Frame " << frame->Id()
                        << " has invalid frame references, dropping frame.";
    return last_continuous_frame_id;
  }

  if (frames_.size() >= kMaxFramesBuffered) {
    
    
    if (frame->is_keyframe()) {
    
    
      RTC_LOG(LS_WARNING) << "Inserting keyframe " << frame->Id()
                          << " but buffer is full, clearing"
                             " buffer and inserting the frame.";
      ClearFramesAndHistory();
    } else {
    
    
      RTC_LOG(LS_WARNING) << "Frame " << frame->Id()
                          << " could not be inserted due to the frame "
                             "buffer being full, dropping frame.";
      return last_continuous_frame_id;
    }
  }

  auto last_decoded_frame = decoded_frames_history_.GetLastDecodedFrameId();
  auto last_decoded_frame_timestamp =
      decoded_frames_history_.GetLastDecodedFrameTimestamp();
  if (last_decoded_frame && frame->Id() <= *last_decoded_frame) {
    
      
    if (AheadOf(frame->Timestamp(), *last_decoded_frame_timestamp) &&
        frame->is_keyframe()) {
    
    
      //帧id小于上一个解码的帧，但是时间戳比较新，而且还是关键帧，这可能是编码器重启了，所以这也是一个新的帧

      // If this frame has a newer timestamp but an earlier frame id then we
      // assume there has been a jump in the frame id due to some encoder
      // reconfiguration or some other reason. Even though this is not according
      // to spec we can still continue to decode from this frame if it is a
      // keyframe.
      RTC_LOG(LS_WARNING)
          << "A jump in frame id was detected, clearing buffer.";
      ClearFramesAndHistory();
      last_continuous_frame_id = -1;
    } else {
    
    
      RTC_LOG(LS_WARNING) << "Frame " << frame->Id() << " inserted after frame "
                          << *last_decoded_frame
                          << " was handed off for decoding, dropping frame.";
      return last_continuous_frame_id;
    }
  }

  // Test if inserting this frame would cause the order of the frames to become
  // ambiguous (covering more than half the interval of 2^16). This can happen
  // when the frame id make large jumps mid stream.
  if (!frames_.empty() && frame->Id() < frames_.begin()->first &&
      frames_.rbegin()->first < frame->Id()) {
    
    
    RTC_LOG(LS_WARNING) << "A jump in frame id was detected, clearing buffer.";
    ClearFramesAndHistory();
    last_continuous_frame_id = -1;
  }

  auto info = frames_.emplace(frame->Id(), FrameInfo()).first;  //插入该帧，并为该帧创建一个对应的FrameInfo

  if (info->second.frame) {
    
    
    return last_continuous_frame_id;
  }

  if (!UpdateFrameInfoWithIncomingFrame(*frame, info)) //更新该帧对应的FrameInfo
    return last_continuous_frame_id;

  if (!frame->delayed_by_retransmission())
    timing_->IncomingTimestamp(frame->Timestamp(), frame->ReceivedTime());

  // It can happen that a frame will be reported as fully received even if a
  // lower spatial layer frame is missing.
  if (stats_callback_ && frame->is_last_spatial_layer) {
    
    
    stats_callback_->OnCompleteFrame(frame->is_keyframe(), frame->size(),
                                     frame->contentType());
  }

  info->second.frame = std::move(frame);

  if (info->second.num_missing_continuous == 0) {
    
    
    info->second.continuous = true;
    PropagateContinuity(info);
    last_continuous_frame_id = *last_continuous_frame_;

    // Since we now have new continuous frames there might be a better frame
    // to return from NextFrame.
    //有个新的连续帧了，发送给FindNextFrame函数
    if (callback_queue_) {
    
    
      callback_queue_->PostTask([this] {
    
    
        MutexLock lock(&mutex_);
        if (!callback_task_.Running())
          return;
        RTC_CHECK(frame_handler_);
        callback_task_.Stop();
        StartWaitForNextFrameOnQueue();
      });
    }
  }

  return last_continuous_frame_id;
}

The information updated by UpdateFrameInfoWithIncomingFrame is mainly the continuity of the frame and the number of missing reference frames. If the frame does not lack reference frames and the frames are continuous, then the frame is decodable. The task of finding decodable frames will be restarted, and the decodable frames will be decoded.

4. Find a decodable frame

int64_t FrameBuffer::FindNextFrame(int64_t now_ms) {
    
    
  int64_t wait_ms = latest_return_time_ms_ - now_ms;
  frames_to_decode_.clear();

  // `last_continuous_frame_` may be empty below, but nullopt is smaller
  // than everything else and loop will immediately terminate as expected.
  for (auto frame_it = frames_.begin();
       frame_it != frames_.end() && frame_it->first <= last_continuous_frame_;
       ++frame_it) {
    
    
    if (!frame_it->second.continuous ||  
        frame_it->second.num_missing_decodable > 0) {
    
      //帧是连续的，且不缺失参考帧
      continue;
    }

    EncodedFrame* frame = frame_it->second.frame.get();

    if (keyframe_required_ && !frame->is_keyframe()) //如果需要关键帧，但是当前帧不是关键帧，则不能解码
      continue;

    auto last_decoded_frame_timestamp =
        decoded_frames_history_.GetLastDecodedFrameTimestamp();

    // TODO(https://bugs.webrtc.org/9974): consider removing this check
    // as it may make a stream undecodable after a very long delay between
    // frames.
    //比上次解码的帧时间戳更早，不可解码
    if (last_decoded_frame_timestamp &&
        AheadOf(*last_decoded_frame_timestamp, frame->Timestamp())) {
    
    
      continue;
    }

    // Gather all remaining frames for the same superframe.
    std::vector<FrameMap::iterator> current_superframe;
    current_superframe.push_back(frame_it);
    bool last_layer_completed = frame_it->second.frame->is_last_spatial_layer;
    FrameMap::iterator next_frame_it = frame_it;
    while (!last_layer_completed) {
    
     //vpx的逻辑
      ++next_frame_it;

      if (next_frame_it == frames_.end() || !next_frame_it->second.frame) {
    
    
        break;
      }

      if (next_frame_it->second.frame->Timestamp() != frame->Timestamp() ||
          !next_frame_it->second.continuous) {
    
    
        break;
      }

      if (next_frame_it->second.num_missing_decodable > 0) {
    
    
        bool has_inter_layer_dependency = false;
        for (size_t i = 0; i < EncodedFrame::kMaxFrameReferences &&
                           i < next_frame_it->second.frame->num_references;
             ++i) {
    
    
          if (next_frame_it->second.frame->references[i] >= frame_it->first) {
    
    
            has_inter_layer_dependency = true;
            break;
          }
        }

        // If the frame has an undecoded dependency that is not within the same
        // temporal unit then this frame is not yet ready to be decoded. If it
        // is within the same temporal unit then the not yet decoded dependency
        // is just a lower spatial frame, which is ok.
        if (!has_inter_layer_dependency ||
            next_frame_it->second.num_missing_decodable > 1) {
    
    
          break;
        }
      }

      current_superframe.push_back(next_frame_it);
      last_layer_completed = next_frame_it->second.frame->is_last_spatial_layer;
    }
    // Check if the current superframe is complete.
    // TODO(bugs.webrtc.org/10064): consider returning all available to
    // decode frames even if the superframe is not complete yet.
    if (!last_layer_completed) {
    
    
      continue;
    }

    frames_to_decode_ = std::move(current_superframe);  //将可解码的帧放到frames_to_decode_

    if (frame->RenderTime() == -1) {
    
      //设置Render时间
      frame->SetRenderTime(timing_->RenderTimeMs(frame->Timestamp(), now_ms));
    }
    bool too_many_frames_queued =
        frames_.size() > zero_playout_delay_max_decode_queue_size_ ? true
                                                                   : false;
    wait_ms = timing_->MaxWaitingTime(frame->RenderTime(), now_ms,
                                      too_many_frames_queued);  //设置下一帧等待时间

    // This will cause the frame buffer to prefer high framerate rather
    // than high resolution in the case of the decoder not decoding fast
    // enough and the stream has multiple spatial and temporal layers.
    // For multiple temporal layers it may cause non-base layer frames to be
    // skipped if they are late.
    if (wait_ms < -kMaxAllowedFrameDelayMs)
      continue;

    break;
  }
  wait_ms = std::min<int64_t>(wait_ms, latest_return_time_ms_ - now_ms);
  wait_ms = std::max<int64_t>(wait_ms, 0);
  return wait_ms;
}

FrameBufferBy FindNextFramefinding a decodable frame, by GetNextFramegetting a decodable frame, and then handing it over to the decoder for decoding.

Webrtc source code reading video RTP receiving && JitterBuffer