Overview of Audio Network Countermeasures with WebRTC

In WebRTC audio processing, the goals are low latency, good interactivity, smooth playback free of stutter, and low bitrate with low bandwidth consumption. For data transmission, WebRTC uses the RTP/RTCP protocol over UDP. RTP/RTCP itself provides neither reliable delivery nor quality guarantees, and packet-switched networks such as the public Internet inherently suffer from packet loss, duplication, reordering, and delay. These goals of WebRTC audio processing are difficult to achieve simultaneously, so WebRTC's audio network countermeasures trade them off against one another according to the situation.

Here is a closer look at the WebRTC audio data processing pipeline, with a special focus on the logic related to audio network countermeasures.

WebRTC's audio data receiving, decoding, and playback control pipeline

In the earlier article on WebRTC's audio data encoding and sending control pipeline, we analyzed the logic that controls WebRTC's audio encoding and sending. Here we look at WebRTC's audio data receiving, decoding, and playback process.

At a conceptual level, the complete process of WebRTC's audio data reception and processing is roughly as follows:

-----------------------------     --------------------------     ---------------------------
|                           |     |                        |     |                         |
| webrtc::AudioDeviceModule | <== | webrtc::AudioTransport | <== | webrtc::AudioProcessing |
|                           |     |                        |     |                         |
-----------------------------     --------------------------     ---------------------------
                                                                            / \
                                                                            ||
                                          +=+===============================+=+
                                          | |
                    -------------------------------------------- 
                    |                                          |
                    |            webrtc::AudioMixer            | 
                    |                                          |
                    --------------------------------------------
                                          / \
                                          | |
-------------------------     ---------------------------------------------------------
|                       |     |                                                       |
| cricket::MediaChannel | ==> | webrtc::AudioMixer::Source/webrtc::AudioReceiveStream |
|                       |     |                                                       |
-------------------------     ---------------------------------------------------------
                                                      ||
                                                      \ /
-------------------------------------------     ---------------------
|                                         |     |                   |
| cricket::MediaChannel::NetworkInterface | <== | webrtc::Transport |
|                                         |     |                   |
-------------------------------------------     ---------------------

In WebRTC's audio data receiving and processing pipeline, webrtc::AudioDeviceModule is responsible for pushing PCM data to the device through the system audio API for playback. It usually starts a dedicated playback thread internally, and this thread drives the entire decoding and playback process. webrtc::AudioTransport, an adapter and glue module, ties audio playback together with audio processing and mixing: webrtc::AudioMixer synchronously fetches and mixes each remote audio stream, and besides returning the mixed data to webrtc::AudioDeviceModule for playback, the mix is also fed into webrtc::AudioProcessing as the reference signal for echo cancellation. webrtc::AudioMixer::Source / webrtc::AudioReceiveStream provides the decoded data for the playback process, and RTCP feedback is sent from webrtc::AudioMixer::Source / webrtc::AudioReceiveStream via webrtc::Transport. webrtc::Transport, another adapter and glue module, sends packets through cricket::MediaChannel::NetworkInterface, which actually puts them on the network. cricket::MediaChannel receives audio packets from the network and delivers them to webrtc::AudioMixer::Source / webrtc::AudioReceiveStream.
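
To make the pull model concrete, here is a minimal sketch of the playback-thread-driven flow described above, written against hypothetical, heavily simplified stand-ins for the real interfaces (the actual WebRTC classes take many more parameters; this only illustrates the direction of the pulls):

#include <cstdint>
#include <vector>

struct AudioFrame {
  std::vector<int16_t> samples;  // interleaved PCM
};

struct MixerSource {  // plays the role of webrtc::AudioMixer::Source
  virtual ~MixerSource() = default;
  // Each webrtc::AudioReceiveStream yields one decoded 10 ms frame per pull.
  virtual void GetAudioFrameWithInfo(int sample_rate_hz, AudioFrame* out) = 0;
};

struct Mixer {  // plays the role of webrtc::AudioMixer
  std::vector<MixerSource*> sources;
  void Mix(int sample_rate_hz, AudioFrame* mixed) {
    for (MixerSource* source : sources) {
      AudioFrame frame;
      source->GetAudioFrameWithInfo(sample_rate_hz, &frame);
      // ... sum `frame` into `mixed` (ramping and limiting omitted) ...
    }
  }
};

// The playback thread drives everything: each tick it asks the
// webrtc::AudioTransport glue for the next 10 ms of PCM, which pulls from
// the mixer, feeds the mix to webrtc::AudioProcessing as the echo
// reference, and hands it to the OS audio API.
void PlayoutThreadTick(Mixer& mixer) {
  AudioFrame mixed;
  mixer.Mix(/*sample_rate_hz=*/48000, &mixed);
  // apm->AnalyzeReverseStream(...);  // echo-cancellation reference
  // WriteToDevice(mixed);            // system playback API
}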

If the adaptation and glue modules on this pipeline are omitted, the audio data receiving and processing pipeline simplifies to the following:

-----------------------------     ---------------------------
|                           |     |                         |
| webrtc::AudioDeviceModule | <== | webrtc::AudioProcessing |
|                           |     |                         |
-----------------------------     ---------------------------
                                             / \
                                             ||
                    -------------------------------------------- 
                    |                                          |
                    |            webrtc::AudioMixer            | 
                    |                                          |
                    --------------------------------------------
                                          / \
                                          | |
-------------------------     ---------------------------------------------------------
|                       |     |                                                       |
| cricket::MediaChannel | ==> | webrtc::AudioMixer::Source/webrtc::AudioReceiveStream |
|                       |     |                                                       |
-------------------------     ---------------------------------------------------------
                                                      ||
                                                      \ /
------------------------------------------------------------------------
|                                                                      |
|                 cricket::MediaChannel::NetworkInterface              |
|                                                                      |
------------------------------------------------------------------------

webrtc::AudioMixer::Source / webrtc::AudioReceiveStream is the center of the whole pipeline; its implementation lives in webrtc/audio/audio_receive_stream.h / webrtc/audio/audio_receive_stream.cc. The relevant class hierarchy is as follows:

webrtc::AudioReceiveStream

In RTC, to achieve interactivity and low latency, audio receive processing cannot merely reorder and decode packets; it must also take network countermeasures fully into account, such as PLC and sending RTCP feedback, which makes it a rather complicated process. WebRTC's design makes heavy use of the idea of separating control flow from data flow, and this is also reflected in the design and implementation of webrtc::AudioReceiveStream. We can therefore analyze webrtc::AudioReceiveStream from two angles: configuration and control, and data flow.

The configuration and control operations that webrtc::AudioReceiveStream supports mainly include the following (a configuration sketch follows the list):

  • NACK, the maximum size of the jitter buffer, the mapping between payload types and codecs, etc.;
  • the webrtc::Transport used for sending RTCP packets to the network, and the decryption parameters;
  • life-cycle control of webrtc::AudioReceiveStream, such as starting and stopping it.
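
As a hedged sketch (field names follow the WebRTC revision analyzed in this article and may differ elsewhere; `call` and `transport` are assumed to already exist), these configuration items map onto webrtc::AudioReceiveStream::Config roughly like this:

#include "api/audio_codecs/builtin_audio_decoder_factory.h"
#include "call/audio_receive_stream.h"
#include "call/call.h"

webrtc::AudioReceiveStream* CreateReceiveStream(webrtc::Call* call,
                                                webrtc::Transport* transport) {
  webrtc::AudioReceiveStream::Config config;
  config.rtp.remote_ssrc = 1443723799;        // SSRC of the remote stream
  config.rtp.local_ssrc = 4195875351;         // SSRC used in our RTCP packets
  config.rtp.nack.rtp_history_ms = 5000;      // > 0 enables NACK
  config.decoder_map = {{111, {"opus", 48000, 2}}};  // payload type -> codec
  config.decoder_factory = webrtc::CreateBuiltinAudioDecoderFactory();
  config.jitter_buffer_max_packets = 200;     // cap on NetEQ's packet buffer
  config.rtcp_send_transport = transport;     // where RTCP feedback goes
  // config.frame_decryptor / config.crypto_options carry the decryption
  // parameters mentioned above, when frame encryption is in use.
  return call->CreateAudioReceiveStream(config);
}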

As for the data flow: first, data packets received from the network are delivered into webrtc::AudioReceiveStream; second, during playback, webrtc::AudioDeviceModule obtains decoded data from webrtc::AudioReceiveStream and sends it to the playback device; third, webrtc::AudioReceiveStream sends RTCP feedback packets back to the sender to assist congestion control, which influences the encoding and sending process.

Within the webrtc::AudioReceiveStream implementation, the most important data processing flow, namely audio data receiving, decoding, and playback, and the modules involved are as follows:

WebRTC Audio Receive, Decode and Play

The arrows in this figure indicate the direction of data flow; within each module, data is processed in order from left to right. The red boxes at the bottom of the figure mark the logic closely tied to network countermeasures.

In the data processing flow implemented by webrtc::AudioReceiveStream, the inputs are the audio network packets and RTCP packets that cricket::MediaChannel delivers from the peer; the outputs are the decoded PCM data, handed to webrtc::AudioTransport, and the constructed RTCP feedback packets, such as transport-cc and RTCP NACK packets, which are sent out through webrtc::Transport.

Within the webrtc::AudioReceiveStream implementation, incoming audio network packets ultimately land in NetEQ's buffer, webrtc::PacketBuffer. During playback, NetEQ performs decoding, PLC, and so on, and the decoded data is provided to webrtc::AudioDeviceModule.
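
As a minimal sketch, these are the two NetEQ entry points that bracket that buffer: InsertPacket() on the network side and GetAudio() on the playout side. The signatures match the call stacks shown later in this article; return-value checks and the muted/comfort-noise paths are omitted.

#include <cstdint>

#include "api/array_view.h"
#include "api/audio/audio_frame.h"
#include "api/neteq/neteq.h"
#include "api/rtp_headers.h"

// Network side: an RTP payload is stored into NetEQ's webrtc::PacketBuffer.
void OnRtpPayload(webrtc::NetEq* neteq,
                  const webrtc::RTPHeader& header,
                  rtc::ArrayView<const uint8_t> payload) {
  neteq->InsertPacket(header, payload);
}

// Playout side: every 10 ms one decoded frame is pulled out; NetEQ decides
// internally whether to decode normally, do PLC, accelerate, and so on.
void OnPlayoutTick(webrtc::NetEq* neteq, webrtc::AudioFrame* frame) {
  bool muted = false;
  neteq->GetAudio(frame, &muted);
}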

The construction process of WebRTC audio data receiving and processing pipeline

Let's first look at how the data processing pipeline implemented by webrtc::AudioReceiveStream is constructed.

The data processing pipeline of webrtc::AudioReceiveStream is built step by step. Let's walk through this process against the data processing flow chart above.

When the webrtc::AudioReceiveStream object is created, that is, when the webrtc::voe::(anonymous namespace)::ChannelReceive object is created, some key objects are created and some of the connections between objects are established. The call stack is as follows:

#0  webrtc::voe::(anonymous namespace)::ChannelReceive::ChannelReceive(webrtc::Clock*, webrtc::NetEqFactory*, webrtc::AudioDeviceModule*, webrtc::Transport*, webrtc::RtcEventLog*, unsigned int, unsigned int, unsigned long, bool, int, bool, bool, rtc::scoped_refptr<webrtc::AudioDecoderFactory>, absl::optional<webrtc::AudioCodecPairId>, rtc::scoped_refptr<webrtc::FrameDecryptorInterface>, webrtc::CryptoOptions const&, rtc::scoped_refptr<webrtc::FrameTransformerInterface>)
    (this=0x61b000008c80, clock=0x602000003bb0, neteq_factory=0x0, audio_device_module=0x614000010040, rtcp_send_transport=0x619000017cb8, rtc_event_log=0x613000011f40, local_ssrc=4195875351, remote_ssrc=1443723799, jitter_buffer_max_packets=200, jitter_buffer_fast_playout=false, jitter_buffer_min_delay_ms=0, jitter_buffer_enable_rtx_handling=false, enable_non_sender_rtt=false, decoder_factory=..., codec_pair_id=..., frame_decryptor=..., crypto_options=..., frame_transformer=...) at webrtc/audio/channel_receive.cc:517
#2  webrtc::voe::CreateChannelReceive(webrtc::Clock*, webrtc::NetEqFactory*, webrtc::AudioDeviceModule*, webrtc::Transport*, webrtc::RtcEventLog*, unsigned int, unsigned int, unsigned long, bool, int, bool, bool, rtc::scoped_refptr<webrtc::AudioDecoderFactory>, absl::optional<webrtc::AudioCodecPairId>, rtc::scoped_refptr<webrtc::FrameDecryptorInterface>, webrtc::CryptoOptions const&, rtc::scoped_refptr<webrtc::FrameTransformerInterface>)
    (clock=0x602000003bb0, neteq_factory=0x0, audio_device_module=0x614000010040, rtcp_send_transport=0x619000017cb8, rtc_event_log=0x613000011f40, local_ssrc=4195875351, remote_ssrc=1443723799, jitter_buffer_max_packets=200, jitter_buffer_fast_playout=false, jitter_buffer_min_delay_ms=0, jitter_buffer_enable_rtx_handling=false, enable_non_sender_rtt=false, decoder_factory=..., codec_pair_id=..., frame_decryptor=..., crypto_options=..., frame_transformer=...) at webrtc/audio/channel_receive.cc:1137
#3  webrtc::internal::(anonymous namespace)::CreateChannelReceive(webrtc::Clock*, webrtc::AudioState*, webrtc::NetEqFactory*, webrtc::AudioReceiveStream::Config const&, webrtc::RtcEventLog*) (clock=0x602000003bb0, audio_state=
    0x628000004100, neteq_factory=0x0, config=..., event_log=0x613000011f40) at webrtc/audio/audio_receive_stream.cc:79
#4  webrtc::internal::AudioReceiveStream::AudioReceiveStream(webrtc::Clock*, webrtc::PacketRouter*, webrtc::NetEqFactory*, webrtc::AudioReceiveStream::Config const&, rtc::scoped_refptr<webrtc::AudioState> const&, webrtc::RtcEventLog*) (this=
    0x61600005be80, clock=0x602000003bb0, packet_router=
    0x61c000060908, neteq_factory=0x0, config=..., audio_state=..., event_log=0x613000011f40)
    at webrtc/audio/audio_receive_stream.cc:103
#5  webrtc::internal::Call::CreateAudioReceiveStream(webrtc::AudioReceiveStream::Config const&) (this=
    0x620000001080, config=...) at webrtc/call/call.cc:954
#6  cricket::WebRtcVoiceMediaChannel::WebRtcAudioReceiveStream::WebRtcAudioReceiveStream(webrtc::AudioReceiveStream::Config, webrtc::Call*) (this=0x60b000010fd0, config=..., call=0x620000001080) at webrtc/media/engine/webrtc_voice_engine.cc:1220
#7  cricket::WebRtcVoiceMediaChannel::AddRecvStream(cricket::StreamParams const&) (this=0x619000017c80, sp=...)
    at webrtc/media/engine/webrtc_voice_engine.cc:2025
#8  cricket::BaseChannel::AddRecvStream_w(cricket::StreamParams const&) (this=0x619000018180, sp=...)
    at webrtc/pc/channel.cc:567
#9  cricket::BaseChannel::UpdateRemoteStreams_w(std::vector<cricket::StreamParams, std::allocator<cricket::StreamParams> > const&, webrtc::SdpType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)
    (this=0x619000018180, streams=std::vector of length 1, capacity 1 = {...}, type=webrtc::SdpType::kOffer, error_desc=0x7ffff2387e00)
    at webrtc/pc/channel.cc:725
#10 cricket::VoiceChannel::SetRemoteContent_w(cricket::MediaContentDescription const*, webrtc::SdpType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (this=0x619000018180, content=0x6130000003c0, type=webrtc::SdpType::kOffer, error_desc=0x7ffff2387e00)
    at webrtc/pc/channel.cc:926
#11 cricket::BaseChannel::SetRemoteContent(cricket::MediaContentDescription const*, webrtc::SdpType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (this=0x619000018180, content=0x6130000003c0, type=webrtc::SdpType::kOffer, error_desc=0x7ffff2387e00)
    at webrtc/pc/channel.cc:292

webrtc::AudioReceiveStream is created by webrtc::Call, which passes in a webrtc::AudioReceiveStream::Config containing the various configurations related to NACK, the maximum jitter buffer size, and the payload-type-to-codec mapping, as well as the webrtc::Transport.

The constructor of webrtc::voe::(anonymous namespace)::ChannelReceive is as follows:

ChannelReceive::ChannelReceive(
    Clock* clock,
    NetEqFactory* neteq_factory,
    AudioDeviceModule* audio_device_module,
    Transport* rtcp_send_transport,
    RtcEventLog* rtc_event_log,
    uint32_t local_ssrc,
    uint32_t remote_ssrc,
    size_t jitter_buffer_max_packets,
    bool jitter_buffer_fast_playout,
    int jitter_buffer_min_delay_ms,
    bool jitter_buffer_enable_rtx_handling,
    bool enable_non_sender_rtt,
    rtc::scoped_refptr<AudioDecoderFactory> decoder_factory,
    absl::optional<AudioCodecPairId> codec_pair_id,
    rtc::scoped_refptr<FrameDecryptorInterface> frame_decryptor,
    const webrtc::CryptoOptions& crypto_options,
    rtc::scoped_refptr<FrameTransformerInterface> frame_transformer)
    : worker_thread_(TaskQueueBase::Current()),
      event_log_(rtc_event_log),
      rtp_receive_statistics_(ReceiveStatistics::Create(clock)),
      remote_ssrc_(remote_ssrc),
      acm_receiver_(AcmConfig(neteq_factory,
                              decoder_factory,
                              codec_pair_id,
                              jitter_buffer_max_packets,
                              jitter_buffer_fast_playout)),
      _outputAudioLevel(),
      clock_(clock),
      ntp_estimator_(clock),
      playout_timestamp_rtp_(0),
      playout_delay_ms_(0),
      rtp_ts_wraparound_handler_(new rtc::TimestampWrapAroundHandler()),
      capture_start_rtp_time_stamp_(-1),
      capture_start_ntp_time_ms_(-1),
      _audioDeviceModulePtr(audio_device_module),
      _outputGain(1.0f),
      associated_send_channel_(nullptr),
      frame_decryptor_(frame_decryptor),
      crypto_options_(crypto_options),
      absolute_capture_time_interpolator_(clock) {
  RTC_DCHECK(audio_device_module);

  network_thread_checker_.Detach();

  acm_receiver_.ResetInitialDelay();
  acm_receiver_.SetMinimumDelay(0);
  acm_receiver_.SetMaximumDelay(0);
  acm_receiver_.FlushBuffers();

  _outputAudioLevel.ResetLevelFullRange();

  rtp_receive_statistics_->EnableRetransmitDetection(remote_ssrc_, true);
  RtpRtcpInterface::Configuration configuration;
  configuration.clock = clock;
  configuration.audio = true;
  configuration.receiver_only = true;
  configuration.outgoing_transport = rtcp_send_transport;
  configuration.receive_statistics = rtp_receive_statistics_.get();
  configuration.event_log = event_log_;
  configuration.local_media_ssrc = local_ssrc;
  configuration.rtcp_packet_type_counter_observer = this;
  configuration.non_sender_rtt_measurement = enable_non_sender_rtt;

  if (frame_transformer)
    InitFrameTransformerDelegate(std::move(frame_transformer));

  rtp_rtcp_ = ModuleRtpRtcpImpl2::Create(configuration);
  rtp_rtcp_->SetSendingMediaStatus(false);
  rtp_rtcp_->SetRemoteSSRC(remote_ssrc_);

  // Ensure that RTCP is enabled for the created channel.
  rtp_rtcp_->SetRTCPStatus(RtcpMode::kCompound);
}

The constructor of webrtc::voe::(anonymous namespace)::ChannelReceive does the following:

  • it creates a webrtc::acm2::AcmReceiver object, establishing the two connections labeled 1 and 2 in the figure below;
  • it creates a webrtc::ModuleRtpRtcpImpl2 object, whose configuration parameter's outgoing_transport item points to the webrtc::Transport passed in, establishing the two connections labeled 3 and 4 in the figure below.

ChannelReceive Pipeline

The modules marked green in the figure are those that webrtc::voe::(anonymous namespace)::ChannelReceive has already connected at this stage, and the modules marked yellow are not yet connected; solid arrows indicate connections established at this stage, and dotted arrows indicate connections not yet established.

In ChannelReceive's RegisterReceiverCongestionControlObjects() function, the webrtc::PacketRouter is passed in:

#0  webrtc::voe::(anonymous namespace)::ChannelReceive::RegisterReceiverCongestionControlObjects(webrtc::PacketRouter*)
    (this=0x61b000008c80, packet_router=0x61c000060908) at webrtc/audio/channel_receive.cc:786
#1  webrtc::internal::AudioReceiveStream::AudioReceiveStream(webrtc::Clock*, webrtc::PacketRouter*, webrtc::AudioReceiveStream::Config const&, rtc::scoped_refptr<webrtc::AudioState> const&, webrtc::RtcEventLog*, std::unique_ptr<webrtc::voe::ChannelReceiveInterface, std::default_delete<webrtc::voe::ChannelReceiveInterface> >)
    (this=0x61600005be80, clock=0x602000003bb0, packet_router=0x61c000060908, config=..., audio_state=..., event_log=0x613000011f40, channel_receive=std::unique_ptr<webrtc::voe::ChannelReceiveInterface> = {...}) at webrtc/audio/audio_receive_stream.cc:130
#2  webrtc::internal::AudioReceiveStream::AudioReceiveStream(webrtc::Clock*, webrtc::PacketRouter*, webrtc::NetEqFactory*, webrtc::AudioReceiveStream::Config const&, rtc::scoped_refptr<webrtc::AudioState> const&, webrtc::RtcEventLog*)
    (this=0x61600005be80, clock=0x602000003bb0, packet_router=0x61c000060908, neteq_factory=0x0, config=..., audio_state=..., event_log=0x613000011f40)
    at webrtc/audio/audio_receive_stream.cc:98
#3  webrtc::internal::Call::CreateAudioReceiveStream(webrtc::AudioReceiveStream::Config const&) (this=0x620000001080, config=...)
    at webrtc/call/call.cc:954

This also happens during the creation of the webrtc::AudioReceiveStream object. ChannelReceive's RegisterReceiverCongestionControlObjects() function is implemented as follows:

void ChannelReceive::RegisterReceiverCongestionControlObjects(
    PacketRouter* packet_router) {
  RTC_DCHECK_RUN_ON(&worker_thread_checker_);
  RTC_DCHECK(packet_router);
  RTC_DCHECK(!packet_router_);
  constexpr bool remb_candidate = false;
  packet_router->AddReceiveRtpModule(rtp_rtcp_.get(), remb_candidate);
  packet_router_ = packet_router;
}

Here webrtc::PacketRouter and webrtc::ModuleRtpRtcpImpl2 are connected, establishing the connection numbered 5 in the previous figure. NetEQ creates an audio decoder only when one is needed; that process is not described here.

With this, the state of webrtc::AudioReceiveStream's internal data processing pipeline becomes as shown in the following figure:

ChannelReceive Pipeline 2

When webrtc::AudioReceiveStream's life-cycle function Start() is called, the webrtc::AudioReceiveStream is added to the webrtc::AudioMixer:

#0  webrtc::internal::AudioState::AddReceivingStream(webrtc::AudioReceiveStream*) (this=0x628000004100, stream=0x61600005be80)
    at webrtc/audio/audio_state.cc:59
#1  webrtc::internal::AudioReceiveStream::Start() (this=0x61600005be80) at webrtc/audio/audio_receive_stream.cc:201
#2  cricket::WebRtcVoiceMediaChannel::WebRtcAudioReceiveStream::SetPlayout(bool) (this=0x60b000010fd0, playout=true)
    at webrtc/media/engine/webrtc_voice_engine.cc:1289
#3  cricket::WebRtcVoiceMediaChannel::SetPlayout(bool) (this=0x619000017c80, playout=true)
    at webrtc/media/engine/webrtc_voice_engine.cc:1865
#4  cricket::VoiceChannel::UpdateMediaSendRecvState_w() (this=0x619000018180) at webrtc/pc/channel.cc:811

At this point, the data processing pipeline of webrtc::AudioReceiveStream is complete. The state of the entire audio data processing pipeline becomes as shown in the following figure:

ChannelReceive Pipeline 3
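
A condensed sketch of what the Start() step above amounts to, after webrtc/audio/audio_state.cc (bookkeeping and error handling elided, so treat the body as illustrative rather than a verbatim quote): starting the stream registers it as a mixer source, so that the playout thread's pull through webrtc::AudioMixer reaches this stream's NetEQ.

void AudioState::AddReceivingStream(webrtc::AudioReceiveStream* stream) {
  receiving_streams_.insert(stream);
  // webrtc::internal::AudioReceiveStream implements
  // webrtc::AudioMixer::Source, so adding it here is what lets
  // GetAudioFrameWithInfo() be called on it during mixing.
  config_.audio_mixer->AddSource(
      static_cast<webrtc::internal::AudioReceiveStream*>(stream));
  // If playout is not already running, the ADM is initialized and started
  // here, which spins up the playback thread that drives decoding.
}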

The main process of receiving and processing WebRTC audio data

In WebRTC's audio receive processing, the buffer that holds audio packets received from the network is NetEQ's webrtc::PacketBuffer. The call stack for receiving audio packets and storing them into NetEQ's webrtc::PacketBuffer is as follows:

#0  webrtc::PacketBuffer::InsertPacketList(std::__cxx11::list<webrtc::Packet, std::allocator<webrtc::Packet> >*, webrtc::DecoderDatabase const&, absl::optional<unsigned char>*, absl::optional<unsigned char>*, webrtc::StatisticsCalculator*, unsigned long, unsigned long, int)
    (this=0x606000030e60, packet_list=0x7ffff2629810, decoder_database=..., current_rtp_payload_type=0x61600005c5c5, current_cng_rtp_payload_type=0x61600005c5c7, stats=0x61600005c180, last_decoded_length=480, sample_rate=16000, target_level_ms=80)
    at webrtc/modules/audio_coding/neteq/packet_buffer.cc:216
#1  webrtc::NetEqImpl::InsertPacketInternal(webrtc::RTPHeader const&, rtc::ArrayView<unsigned char const, -4711l>)
    (this=0x61600005c480, rtp_header=..., payload=...) at webrtc/modules/audio_coding/neteq/neteq_impl.cc:690
#2  webrtc::NetEqImpl::InsertPacket(webrtc::RTPHeader const&, rtc::ArrayView<unsigned char const, -4711l>)
    (this=0x61600005c480, rtp_header=..., payload=...) at webrtc/modules/audio_coding/neteq/neteq_impl.cc:170
#3  webrtc::acm2::AcmReceiver::InsertPacket(webrtc::RTPHeader const&, rtc::ArrayView<unsigned char const, -4711l>)
    (this=0x61b000008e48, rtp_header=..., incoming_payload=...) at webrtc/modules/audio_coding/acm2/acm_receiver.cc:136
#4  webrtc::voe::(anonymous namespace)::ChannelReceive::OnReceivedPayloadData(rtc::ArrayView<unsigned char const, -4711l>, webrtc::RTPHeader const&) (this=0x61b000008c80, payload=..., rtpHeader=...) at webrtc/audio/channel_receive.cc:340
#5  webrtc::voe::(anonymous namespace)::ChannelReceive::ReceivePacket(unsigned char const*, unsigned long, webrtc::RTPHeader const&)
    (this=0x61b000008c80, packet=0x60700002b670 "\220\357\037\261\377\364ف\a\350\224\177\276", <incomplete sequence \336>, packet_length=67, header=...) at webrtc/audio/channel_receive.cc:719
#6  webrtc::voe::(anonymous namespace)::ChannelReceive::OnRtpPacket(webrtc::RtpPacketReceived const&)
    (this=0x61b000008c80, packet=...) at webrtc/audio/channel_receive.cc:669
#7  webrtc::RtpDemuxer::OnRtpPacket(webrtc::RtpPacketReceived const&) (this=0x620000001330, packet=...)
    at webrtc/call/rtp_demuxer.cc:249
#8  webrtc::RtpStreamReceiverController::OnRtpPacket(webrtc::RtpPacketReceived const&)
    (this=0x6200000012d0, packet=...) at webrtc/call/rtp_stream_receiver_controller.cc:52
#9  webrtc::internal::Call::DeliverRtp(webrtc::MediaType, rtc::CopyOnWriteBuffer, long) (this=
    0x620000001080, media_type=webrtc::MediaType::AUDIO, packet=..., packet_time_us=1654829839622021)
    at webrtc/call/call.cc:1606
#10 webrtc::internal::Call::DeliverPacket(webrtc::MediaType, rtc::CopyOnWriteBuffer, long)
    (this=0x620000001080, media_type=webrtc::MediaType::AUDIO, packet=..., packet_time_us=1654829839622021)
    at webrtc/call/call.cc:1637
#11 cricket::WebRtcVoiceMediaChannel::OnPacketReceived(rtc::CopyOnWriteBuffer, long)::$_2::operator()() const
    (this=0x606000074c68) at webrtc/media/engine/webrtc_voice_engine.cc:2229

During playback, webrtc::AudioDeviceModule ultimately requests PCM data from NetEQ, at which point NetEQ takes packets out of its webrtc::PacketBuffer and decodes them. The number of audio samples contained in a network audio packet is not necessarily the same as the number of samples webrtc::AudioDeviceModule requests each time. For example, for audio sampled at 48 kHz, webrtc::AudioDeviceModule requests 10 ms of data at a time, i.e. 480 samples, while each encoded Opus frame contains 20 ms of data, i.e. 960 samples. Thus, after NetEQ returns the samples requested each time, there may be decoded audio data left over, which calls for a dedicated PCM buffer. That buffer is NetEQ's webrtc::SyncBuffer.
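
Here is a worked example of that sample accounting (illustrative numbers only): 48 kHz audio, 10 ms pulls of 480 samples against decoded 20 ms Opus frames of 960 samples. The surplus left after each decode is exactly what webrtc::SyncBuffer has to hold between pulls.

#include <cstdio>

int main() {
  const int kSampleRateHz = 48000;
  const int kPullMs = 10, kFrameMs = 20;
  const int kPullSamples = kSampleRateHz * kPullMs / 1000;    // 480
  const int kFrameSamples = kSampleRateHz * kFrameMs / 1000;  // 960

  int buffered = 0;  // decoded-but-not-yet-played samples in the sync buffer
  for (int pull = 0; pull < 4; ++pull) {
    if (buffered < kPullSamples) buffered += kFrameSamples;  // decode a frame
    buffered -= kPullSamples;                                // deliver 10 ms
    std::printf("pull %d: %d samples left over\n", pull, buffered);
  }
  // Prints 480, 0, 480, 0: every other pull is served entirely from
  // previously decoded data, with no new packet taken from the buffer.
  return 0;
}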

The general flow in which webrtc::AudioDeviceModule requests playback data is as follows:

#0  webrtc::SyncBuffer::GetNextAudioInterleaved (this=0x606000062a80, requested_len=480, output=0x628000010110)
    at webrtc/modules/audio_coding/neteq/sync_buffer.cc:86
#1  webrtc::NetEqImpl::GetAudioInternal (this=0x61600005c480, audio_frame=0x628000010110, muted=0x7fffdc92a990, action_override=...)
    at webrtc/modules/audio_coding/neteq/neteq_impl.cc:939
#2  webrtc::NetEqImpl::GetAudio (this=0x61600005c480, audio_frame=0x628000010110, muted=0x7fffdc92a990, current_sample_rate_hz=0x7fffdcc933b0, 
    action_override=...) at webrtc/modules/audio_coding/neteq/neteq_impl.cc:239
#3  webrtc::acm2::AcmReceiver::GetAudio (this=0x61b000008e48, desired_freq_hz=48000, audio_frame=0x628000010110, muted=0x7fffdc92a990)
    at webrtc/modules/audio_coding/acm2/acm_receiver.cc:151
#4  webrtc::voe::(anonymous namespace)::ChannelReceive::GetAudioFrameWithInfo (this=0x61b000008c80, sample_rate_hz=48000, 
    audio_frame=0x628000010110) at webrtc/audio/channel_receive.cc:388
#5  webrtc::internal::AudioReceiveStream::GetAudioFrameWithInfo (this=0x61600005be80, sample_rate_hz=48000, audio_frame=0x628000010110)
    at webrtc/audio/audio_receive_stream.cc:393
#6  webrtc::AudioMixerImpl::GetAudioFromSources (this=0x61d000021280, output_frequency=48000)
    at webrtc/modules/audio_mixer/audio_mixer_impl.cc:205
#7  webrtc::AudioMixerImpl::Mix (this=0x61d000021280, number_of_channels=2, audio_frame_for_mixing=0x6280000042e8)
    at webrtc/modules/audio_mixer/audio_mixer_impl.cc:175
#8  webrtc::AudioTransportImpl::NeedMorePlayData (this=0x6280000041e0, nSamples=441, nBytesPerSample=4, nChannels=2, samplesPerSec=44100, 
    audioSamples=0x61c000080080, nSamplesOut=@0x7fffdc929c00: 0, elapsed_time_ms=0x7fffdc929cc0, ntp_time_ms=0x7fffdc929ce0)
    at webrtc/audio/audio_transport_impl.cc:215
#9  webrtc::AudioDeviceBuffer::RequestPlayoutData (this=0x614000010058, samples_per_channel=441)
    at webrtc/modules/audio_device/audio_device_buffer.cc:303
#10 webrtc::AudioDeviceLinuxPulse::PlayThreadProcess (this=0x61900000ff80)
    at webrtc/modules/audio_device/linux/audio_device_pulse_linux.cc:2106

A closer look at WebRTC's audio data processing, encoding, and sending process

Let's take a more careful look at WebRTC's audio data processing, encoding, and sending process, this time taking network countermeasures into account more completely. The process and the modules involved are as follows:

WebRTC Audio Send Pipeline

In WebRTC's audio processing, encoding, and sending process, the encoder plays a huge role in network countermeasures. WebRTC uses a module called the audio network adaptor (ANA) to adjust the encoding process according to network conditions.

The pacing module sends media data onto the network at a smooth rate. The congestion control module controls congestion by influencing the pacing module, and thereby the sending of media data.

Overview of Audio Network Countermeasures with WebRTC

From WebRTC's audio capture, processing, encoding, and sending process, together with its audio receiving, decoding, processing, and playback process, we can roughly piece together WebRTC's complex set of audio network countermeasures:

  • Audio network adaptor (ANA): ANA counters network impairments by adjusting the encoding process according to network conditions, and is mainly used with the Opus encoder. ANA can affect five parameters of the encoding process (see the sketch after this list):
    • In-band FEC: the Opus encoder can generate in-band FEC; when packets are lost, the lost information can be partially recovered from the FEC data, although the recovered quality may not be very high. Used to resist packet loss.
    • DTX: when the input is silence for a long stretch, DTX packets can be generated to reduce the bitrate. This mechanism may introduce extra delay.
    • Bitrate.
    • Frame length: Opus supports encoded frame lengths from 10 ms to 120 ms.
    • Number of channels.
  • Pacing: smooths the sending of data packets.
  • congestion_controller/goog_cc: congestion control probes network conditions and influences the sending rhythm through pacing.
  • NACK: when packets are lost, the receiver asks the sender to retransmit certain packets; the NACK list is maintained by NetEQ.
  • Jitter buffer: reorders packets and absorbs network jitter; this is where NetEQ stores received audio packets.
  • PLC: generates substitute data for lost packets; performed by NetEQ.
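
As a sketch of those five ANA-adjustable knobs, here are the corresponding static settings in webrtc::AudioEncoderOpusConfig (field names as in the WebRTC revision analyzed here). ANA varies these at runtime based on network feedback; setting them here only fixes the starting point.

#include "api/audio_codecs/opus/audio_encoder_opus_config.h"

webrtc::AudioEncoderOpusConfig MakeOpusConfig() {
  webrtc::AudioEncoderOpusConfig config;
  config.fec_enabled = true;         // 1. in-band FEC, for loss resistance
  config.dtx_enabled = true;         // 2. DTX during silence, saves bitrate
  config.bitrate_bps = 32000;        // 3. bitrate
  config.frame_size_ms = 20;         // 4. frame length (10..120 ms)
  config.num_channels = 1;           // 5. number of channels
  config.supported_frame_lengths_ms = {20, 40, 60};  // lengths ANA may pick
  return config;
}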

I have not seen an implementation of an out-of-band audio FEC mechanism in WebRTC.
