WebRTC Series: All About Audio

WebRTC is composed of three major modules: the voice engine, the video engine, and network transport. The voice engine is one of the most valuable technologies in WebRTC; it implements the full chain of audio processing: capture, pre-processing, encoding, sending, receiving, decoding, mixing, post-processing, and playback.

The audio engine mainly consists of: the audio device module (ADM), the audio encoder factory, the audio decoder factory, the audio mixer, and the audio pre-processing module (APM).

How the Audio Engine Works

To understand the audio engine systematically, you first need to understand its core classes and the audio data flow. Let's briefly analyze each.

Audio engine core class diagram:

The audio engine WebrtcVoiceEngine mainly comprises the audio device module AudioDeviceModule, the audio mixer AudioMixer, the audio 3A processor AudioProcessing, the audio management class AudioState, the audio encoder factory AudioEncoderFactory, the audio decoder factory AudioDecoderFactory, and the voice media channels for sending and receiving.

1. The audio device module AudioDeviceModule is responsible for the hardware layer: capturing and playing audio data, and related operations on the hardware devices.

2. The audio mixer AudioMixer is responsible for mixing the audio to be sent (captured audio plus accompaniment audio) and mixing the audio to be played back (multiple received streams plus accompaniment audio).

3. The audio 3A processor AudioProcessing is mainly responsible for pre-processing captured audio, including acoustic echo cancellation (AEC), automatic gain control (AGC), and noise suppression (NS). The APM operates on two streams: the near-end stream is the data coming in from the microphone, and the far-end stream is the received data that is about to be played out, which serves as the AEC reference. A configuration sketch follows this list.

4. The audio management class AudioState holds the audio device module (ADM), the audio pre-processing module (APM), the audio mixer, and the data transfer hub AudioTransportImpl.

5. The audio encoder factory AudioEncoderFactory provides codecs such as Opus, iSAC, G.711, G.722, iLBC, and L16.

6. The audio decoder factory AudioDecoderFactory provides decoders for the same set of codecs: Opus, iSAC, G.711, G.722, iLBC, L16, and others.
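
To make the near-end/far-end split concrete, here is a minimal sketch of driving the 3A algorithms through the APM. It is written against a recent WebRTC AudioProcessing API; names such as AudioProcessingBuilder and the Config fields have changed across versions, so treat it as illustrative rather than exact.

```cpp
#include <cstdint>
#include "modules/audio_processing/include/audio_processing.h"

void Run3A(const int16_t* mic_10ms, int16_t* playout_10ms,
           int16_t* cleaned_10ms) {
  // In real code the APM is created once and reused for every frame.
  auto apm = webrtc::AudioProcessingBuilder().Create();

  webrtc::AudioProcessing::Config config;
  config.echo_canceller.enabled = true;     // AEC
  config.gain_controller1.enabled = true;   // AGC
  config.noise_suppression.enabled = true;  // NS
  apm->ApplyConfig(config);

  // Mono, 48 kHz, 10 ms frames (480 samples) in this sketch.
  webrtc::StreamConfig stream(48000, 1);

  // Far-end stream: audio about to be played out, fed in as the AEC reference.
  apm->ProcessReverseStream(playout_10ms, stream, stream, playout_10ms);

  // Near-end stream: microphone capture, which receives the 3A processing.
  apm->ProcessStream(mic_10ms, stream, stream, cleaned_10ms);
}
```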

Audio workflow flow chart:
 

1. The sender captures sound through the microphone.
2. The sender feeds the captured signal into the APM for echo cancellation (AEC), noise suppression (NS), and automatic gain control (AGC).
3. The sender passes the processed data to the encoder.
4. The sender ships the encoded data through the RtpRtcp transport module, which carries it over the Internet to the receiver (see the sketch below).
5. The receiver takes the audio data arriving from the network and first sends it to the NetEQ module for jitter removal, packet-loss concealment, decoding, and related operations.
6. The receiver sends the processed audio data to the sound card for playback.
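
A minimal, self-contained sketch of the sender-side ordering in steps 1-4. Every type and function here (Capture10ms, RunApm3A, EncodeFrame, SendRtp) is a hypothetical stand-in, not a WebRTC API; the point is only the ordering.

```cpp
#include <cstdint>
#include <vector>

struct AudioFrame { std::vector<int16_t> samples; };
struct Packet { std::vector<uint8_t> payload; };

AudioFrame Capture10ms() { return AudioFrame{std::vector<int16_t>(480)}; }  // 480 samples = 10 ms @ 48 kHz
void RunApm3A(AudioFrame*) { /* AEC, NS and AGC would run here */ }
Packet EncodeFrame(const AudioFrame&) { return Packet{}; }
void SendRtp(const Packet&) { /* hand off to the RtpRtcp module */ }

int main() {
  AudioFrame frame = Capture10ms();    // 1. microphone capture
  RunApm3A(&frame);                    // 2. 3A pre-processing in the APM
  Packet packet = EncodeFrame(frame);  // 3. encode (e.g. Opus)
  SendRtp(packet);                     // 4. send via the RtpRtcp module
}
```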

NetEQ is the core module of the WebRTC voice engine.

NetEQ is roughly divided into an MCU module and a DSP module. The MCU is responsible for computing delay and jitter statistics and for issuing the corresponding control commands; the DSP receives and processes the data packets according to those commands and hands the result to the next stage.
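
The following toy sketch (not NetEQ's actual code) illustrates that division of labor: an "MCU" turns buffer statistics into a command, and a "DSP" executes the command while producing the next frame.

```cpp
#include <cstdio>

enum class Command { kNormal, kAccelerate, kExpand };

// "MCU": compare the current buffer level against the target jitter delay.
Command DecideCommand(int buffered_ms, int target_delay_ms) {
  if (buffered_ms < target_delay_ms / 2) return Command::kExpand;      // too little data: stretch
  if (buffered_ms > target_delay_ms * 2) return Command::kAccelerate;  // too much data: compress
  return Command::kNormal;
}

// "DSP": act on the command when producing the next 10 ms of audio.
void ProduceAudio(Command cmd) {
  switch (cmd) {
    case Command::kExpand:     std::puts("time-stretch / conceal a loss"); break;
    case Command::kAccelerate: std::puts("time-compress to drain the buffer"); break;
    case Command::kNormal:     std::puts("decode and play as-is"); break;
  }
}

int main() { ProduceAudio(DecideCommand(/*buffered_ms=*/20, /*target_delay_ms=*/80)); }
```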

Audio Data Flow

Building on the workflow introduced above, we can refine the audio data flow further, focusing on the important role the data transfer hub AudioTransportImpl plays across the whole chain.
 

AudioTransportImpl, the data transfer hub, implements the capture-side interface RecordedDataIsAvailable and the playback-side interface NeedMorePlayData. RecordedDataIsAvailable processes the captured audio and distributes it to all sending streams. NeedMorePlayData mixes all received streams, sends the mix to the APM as the far-end reference signal, and finally resamples it to the sample rate requested for output.
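
For reference, the two callbacks look roughly like this, paraphrased from WebRTC's modules/audio_device/include/audio_device_defines.h (parameter lists vary slightly between versions):

```cpp
#include <cstddef>
#include <cstdint>

class AudioTransport {
 public:
  // Capture side: the ADM delivers recorded samples; the implementation
  // (AudioTransportImpl) processes them and feeds every sending stream.
  virtual int32_t RecordedDataIsAvailable(
      const void* audioSamples, size_t nSamples, size_t nBytesPerSample,
      size_t nChannels, uint32_t samplesPerSec, uint32_t totalDelayMS,
      int32_t clockDrift, uint32_t currentMicLevel, bool keyPressed,
      uint32_t& newMicLevel) = 0;

  // Playback side: the ADM asks for the next chunk to play; the
  // implementation mixes all received streams and fills audioSamples.
  virtual int32_t NeedMorePlayData(
      size_t nSamples, size_t nBytesPerSample, size_t nChannels,
      uint32_t samplesPerSec, void* audioSamples, size_t& nSamplesOut,
      int64_t* elapsed_time_ms, int64_t* ntp_time_ms) = 0;

  virtual ~AudioTransport() = default;
};
```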

The main flow inside RecordedDataIsAvailable:

  1. Resample the audio captured by the hardware directly to the send sample rate.
  2. Run 3A processing (audio pre-processing) on the resampled data.
  3. Run VAD processing.
  4. Apply digital gain to adjust the capture volume.
  5. Call the audio data back out for external pre-processing.
  6. Mix all audio that needs to be sent, including the captured data and accompaniment audio.
  7. Compute the energy values of the audio data (see the sketch below).
  8. Distribute the result to all sending streams.
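
Step 7's energy value is, in essence, a sum of squares over one 10 ms frame. WebRTC's real implementation lives in its mixer and audio-level code; the sketch below only shows the idea.

```cpp
#include <cstddef>
#include <cstdint>

uint64_t FrameEnergy(const int16_t* samples, size_t n) {
  uint64_t energy = 0;
  for (size_t i = 0; i < n; ++i) {
    const int64_t s = samples[i];
    energy += static_cast<uint64_t>(s * s);  // sum of squares over the frame
  }
  return energy;
}
```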

The main flow inside NeedMorePlayData:

  1. Mix the audio data of all received streams:
    1.1 Calculate the output sample rate: CalculateOutputFrequency().
    1.2 Fetch audio from each source: GetAudioFromSources() picks the three non-muted streams with the highest energy for mixing.
    1.3 Perform the mixing operation: FrameCombiner::Combine() (steps 1.2 and 1.3 are sketched below).
  2. Under certain conditions, inject noise, which is used on the capture side as a reference signal.
  3. Mix in the local (accompaniment) audio.
  4. Apply digital gain to adjust the playback volume.
  5. Call the audio data back out for external pre-processing.
  6. Compute the energy values of the audio data.
  7. Resample the audio to the requested output sample rate.
  8. Send the audio data to the APM as the far-end reference signal for processing.
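
A compact sketch of steps 1.2 and 1.3: drop muted sources, keep the three with the highest energy, and sum them with saturation. The types here are hypothetical; WebRTC's real logic lives in AudioMixerImpl and FrameCombiner.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Source {
  bool muted = false;
  uint64_t energy = 0;         // e.g. FrameEnergy() from the earlier sketch
  std::vector<int16_t> frame;  // one 10 ms frame
};

std::vector<int16_t> MixTopThree(std::vector<Source> sources, size_t frame_len) {
  // Discard muted sources, then keep the three loudest.
  sources.erase(std::remove_if(sources.begin(), sources.end(),
                               [](const Source& s) { return s.muted; }),
                sources.end());
  std::sort(sources.begin(), sources.end(),
            [](const Source& a, const Source& b) { return a.energy > b.energy; });
  if (sources.size() > 3) sources.resize(3);

  // Sum the surviving frames with saturation to int16 range.
  std::vector<int16_t> mix(frame_len, 0);
  for (const Source& s : sources)
    for (size_t i = 0; i < frame_len; ++i) {
      const int32_t sum = mix[i] + s.frame[i];
      mix[i] = static_cast<int16_t>(std::clamp<int32_t>(sum, -32768, 32767));
    }
  return mix;
}
```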

Given the data flow in the figure above, why are FineAudioBuffer and AudioDeviceBuffer needed? Because WebRTC's audio pipeline only processes data in 10 ms chunks, while different operating systems deliver capture and playback buffers of different durations, and different sample rates also yield different durations. For example, on iOS a 16 kHz sample rate delivers 128-sample buffers of 8 ms; an 8 kHz sample rate delivers 128-sample buffers of 16 ms; and a 48 kHz sample rate delivers 512-sample buffers of 10.67 ms. These buffers re-chunk the incoming data into exact 10 ms frames, as sketched below.
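
A hypothetical illustration of the re-chunking job FineAudioBuffer performs (this is not WebRTC's actual class): accept whatever buffer size the OS delivers and hand out exact 10 ms frames.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

class TenMsChunker {
 public:
  explicit TenMsChunker(int sample_rate_hz)
      : samples_per_10ms_(sample_rate_hz / 100) {}

  // Called from the device callback with whatever size the OS delivers,
  // e.g. 512 samples (10.67 ms) per callback at 48 kHz on iOS.
  void Push(const int16_t* data, size_t n) {
    fifo_.insert(fifo_.end(), data, data + n);
  }

  // Pops exactly one 10 ms frame once enough data has accumulated.
  bool Pop10ms(std::vector<int16_t>* out) {
    if (fifo_.size() < samples_per_10ms_) return false;
    out->assign(fifo_.begin(), fifo_.begin() + samples_per_10ms_);
    fifo_.erase(fifo_.begin(), fifo_.begin() + samples_per_10ms_);
    return true;
  }

 private:
  const size_t samples_per_10ms_;
  std::deque<int16_t> fifo_;
};
```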

Audio-Related Changes

The audio profile implementation supports two scenarios, VoIP and Music, each with a complete strategy covering sample rate, encoding bitrate, encoding mode, and number of channels. On iOS, capture and playback run on separate threads, and dual-channel (stereo) playback is supported.
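
An audio profile of this kind boils down to a small table keyed by scenario. The sketch below is an assumed shape with illustrative numbers, not the article's actual values.

```cpp
struct AudioProfile {
  int sample_rate_hz;   // capture/encode sample rate
  int bitrate_bps;      // target encoder bitrate
  int channels;         // 1 = mono, 2 = stereo
  bool music_mode;      // e.g. an Opus "audio" vs "voip" application mode
};

// Illustrative values only.
constexpr AudioProfile kVoipProfile  = {16000, 24000, 1, false};
constexpr AudioProfile kMusicProfile = {48000, 128000, 2, true};
```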

  1. Delivery of audio 3A parameters for device-compatibility adaptation.
  2. Headset-scenario adaptation: Bluetooth headsets and ordinary wired headsets, with dynamic 3A switching.
  3. The Noise_Injection noise-injection algorithm, whose output serves as a reference signal; it is particularly effective for echo cancellation in the headset scenario.
  4. Support for local audio files and network audio files over http and https.
  5. Audio NACK, which improves audio's resilience to packet loss; in-band FEC is currently in progress.
  6. Audio processing optimized separately for single-talk and double-talk.
  7. Research into iOS's built-in AGC:

(1) The built-in AGC is effective for speech and music but has no effect on noise and ambient sound.
(2) Microphone hardware gain differs across models (iPhone 7 Plus > iPhone 8 > iPhone X), so when both the software AGC and the hardware AGC are switched off, the level heard at the far end differs from device to device.
(3) Besides the switchable AGC that iOS exposes, there is another AGC that is always active, fine-tuning the signal level. Our guess is that this always-on AGC is the analog AGC built into iOS, probably tied to the hardware and with no API to switch it off, while the switchable one is a digital AGC.
(4) On most iOS models, the input volume in speaker mode drops after the earphones are plugged in again. The current workaround is to add a preGain that pulls the input volume back to normal after re-plugging; a sketch follows.
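
The preGain workaround amounts to multiplying the captured samples by a compensating factor with saturation. A minimal sketch follows; the gain value is an illustrative assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

void ApplyPreGain(int16_t* samples, size_t n, float pre_gain /* e.g. 2.0f */) {
  for (size_t i = 0; i < n; ++i) {
    const float boosted = samples[i] * pre_gain;
    samples[i] = static_cast<int16_t>(
        std::clamp(boosted, -32768.0f, 32767.0f));  // saturate to avoid overflow
  }
}
```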

Audio Troubleshooting

Finally, here are some of the most common audio problems and their causes:

[Table: common audio problems and their causes]
