Introduction to the audio technology stack of the WebRTC audio system (Part 2)

1.8 ACM module

ACM is the abbreviation of audio coding module.

WebRTC's audio coding module handles both audio sending and receiving; the acm2 directory contains the API implementation for both paths. Each audio frame to be sent carries 10 ms of audio data and is handed to the audio coding module through the Add10MsData() interface. The module encodes the frame with the corresponding encoder and passes the encoded data to the pre-registered audio packetization callback, which packages the encoded audio into RTP packets and sends them out through the transport layer. WebRTC's built-in audio encoders include G711, G722, iLBC, iSAC, Opus, PCM16B, etc. The audio network adaptor provides additional functionality for an audio encoder (currently limited to Opus) so that the encoder can adapt to network conditions (bandwidth, packet loss, etc.). Receiving is implemented through IncomingPacket(); received packets are processed by the jitter buffer (NetEq) module, whose processing includes decoding. The audio decoder is created through the decoder factory class, and the decoded data is obtained through PlayoutData10Ms().
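
To make the flow concrete, here is a minimal, hedged usage sketch of the API described above (it assumes the built-in decoder factory and default clock; encoder registration, the packetization callback and error handling are omitted, and the function name is purely illustrative):

// Hedged sketch of one send/receive round trip through the ACM API quoted below.
#include <memory>

#include "api/audio/audio_frame.h"
#include "api/audio_codecs/builtin_audio_decoder_factory.h"
#include "modules/audio_coding/include/audio_coding_module.h"

void AcmRoundTripSketch(const webrtc::AudioFrame& capture_frame,
                        const uint8_t* rtp_payload, size_t payload_len,
                        const webrtc::RTPHeader& rtp_header) {
  webrtc::AudioCodingModule::Config config(
      webrtc::CreateBuiltinAudioDecoderFactory());
  std::unique_ptr<webrtc::AudioCodingModule> acm(
      webrtc::AudioCodingModule::Create(config));

  // Send path: one 10 ms PCM frame goes to the encoder; the encoded payload is
  // delivered to the registered packetization callback (not shown here).
  acm->Add10MsData(capture_frame);

  // Receive path: a parsed RTP packet is handed to NetEq for jitter buffering
  // and decoding.
  acm->IncomingPacket(rtp_payload, payload_len, rtp_header);

  // Playout path: pull 10 ms of decoded audio at the desired sample rate.
  webrtc::AudioFrame playout_frame;
  bool muted = false;
  acm->PlayoutData10Ms(48000, &playout_frame, &muted);
}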

1.8.1 Encoding module interface class

The interface covers both the encoding and the decoding side.

The core content of the audio encoding interface class is as follows:

// modules/audio_coding/include/audio_coding_module.h 
 30 // forward declarations
 31 class AudioDecoder;
 32 class AudioEncoder;
 33 class AudioFrame;
 34 struct RTPHeader;
 62 class AudioCodingModule {
    
    
 66  public:
 67   struct Config {
    
    
 68     explicit Config(
 69         rtc::scoped_refptr<AudioDecoderFactory> decoder_factory = nullptr);
 70     Config(const Config&);
 71     ~Config();
 72
 73     NetEq::Config neteq_config;
 74     Clock* clock;
   // The decoder is created via this factory class
 75     rtc::scoped_refptr<AudioDecoderFactory> decoder_factory;
   // NetEq is created via this factory class
 76     NetEqFactory* neteq_factory = nullptr;
 77   };
 // Factory-method style creation: instances are created through this static method
 79   static AudioCodingModule* Create(const Config& config);

  // These are declared as pure virtual functions, the usual approach for an interface
  // class: subclasses must implement them, otherwise compilation fails.
 136   virtual int32_t Add10MsData(const AudioFrame& audio_frame) = 0;
 172   virtual int32_t InitializeReceiver() = 0;
 192   virtual int32_t IncomingPacket(const uint8_t* incoming_payload,
 193                                  size_t payload_len_bytes,
 194                                  const RTPHeader& rtp_header) = 0;
 216   virtual int32_t PlayoutData10Ms(int32_t desired_freq_hz,
 217                                   AudioFrame* audio_frame,
 218                                   bool* muted) = 0;
 }

The factory design pattern is used here to implement the codec interface class; this interface does not need to change for most real-time audio and video application scenarios. Unlike the send side, the data to be decoded arrives over the network, which introduces packet loss, jitter, late arrival and reordering, so a jitter buffer is required. Therefore, when AudioCodingModuleImpl implements the methods of the AudioCodingModule interface class, it defines an AcmReceiver member (NetEq plus decoding), and the received data is sent to that module for decoding.

 //acm2/audio_coding_module.cc
42 class AudioCodingModuleImpl final : public AudioCodingModule {
    
    
  // `override` is the C++ keyword for overriding: the inherited base class must declare the
  // corresponding virtual function (e.g. Add10MsData), otherwise compilation fails.
58   // Add 10 ms of raw (PCM) audio data to the encoder.
59   int Add10MsData(const AudioFrame& audio_frame) override;
72   // Initialize receiver, resets codec database etc.
73   int InitializeReceiver() override;
77   // Incoming packet from network parsed and ready for decode.
78   int IncomingPacket(const uint8_t* incoming_payload,
79                      const size_t payload_length,
80                      const RTPHeader& rtp_info) override;
82   // Get 10 milliseconds of raw audio data to play out, and
83   // automatic resample to the requested frequency if > 0.
84   int PlayoutData10Ms(int desired_freq_hz,
85                       AudioFrame* audio_frame,
86                       bool* muted) override;

98  private:
128   int Add10MsDataInternal(const AudioFrame& audio_frame, InputData* input_data)
129       RTC_EXCLUSIVE_LOCKS_REQUIRED(acm_mutex_);
130
131   // TODO(bugs.webrtc.org/10739): change `absolute_capture_timestamp_ms` to
132   // int64_t when it always receives a valid value.
133   int Encode(const InputData& input_data,
134              absl::optional<int64_t> absolute_capture_timestamp_ms)

137   int InitializeReceiverSafe() RTC_EXCLUSIVE_LOCKS_REQUIRED(acm_mutex_);

162   rtc::Buffer encode_buffer_ RTC_GUARDED_BY(acm_mutex_);
163   uint32_t expected_codec_ts_ RTC_GUARDED_BY(acm_mutex_);
164   uint32_t expected_in_ts_ RTC_GUARDED_BY(acm_mutex_);
165   acm2::ACMResampler resampler_ RTC_GUARDED_BY(acm_mutex_);
166   acm2::AcmReceiver receiver_;  // AcmReceiver has it's own internal lock.
}
//This is a common design idea: the implementation is created through Create(), but the return type is the interface type, which separates interface from implementation. When the interface class and the implementation live in different libraries, only the implementation library needs to be recompiled and relinked, while the library containing the interface class does not, isolating the two during development.
209 AudioCodingModuleImpl::AudioCodingModuleImpl(
210     const AudioCodingModule::Config& config)
211     : expected_codec_ts_(0xD87F3F9F),
212       expected_in_ts_(0xD87F3F9F),
213       receiver_(config),
214       bitrate_logger_("WebRTC.Audio.TargetBitrateInKbps"),
215       encoder_stack_(nullptr),
216       previous_pltype_(255),
217       receiver_initialized_(false),
218       first_10ms_data_(false),
219       first_frame_(true),
220       packetization_callback_(NULL),
221       codec_histogram_bins_log_(),
222       number_of_consecutive_empty_packets_(0) {
    
    
223   if (InitializeReceiverSafe() < 0) {
    
    
224     RTC_LOG(LS_ERROR) << "Cannot initialize receiver";
225   }
226   RTC_LOG(LS_INFO) << "Created";
227 }
633 AudioCodingModule* AudioCodingModule::Create(const Config& config) {
    
    
634   return new AudioCodingModuleImpl(config);
635 }

1.8.2 Encoded data stream

After validity checks and remixing as needed, the audio data is sent to the registered encoder for encoding. This subsection focuses on the encoding data flow; encoder_stack_ is a smart pointer to an AudioEncoder object.

231 int32_t AudioCodingModuleImpl::Encode(
232     const InputData& input_data,
233     absl::optional<int64_t> absolute_capture_timestamp_ms) {
    
    
234   // TODO(bugs.webrtc.org/10739): add dcheck that
235   // `audio_frame.absolute_capture_timestamp_ms()` always has a value.
236   AudioEncoder::EncodedInfo encoded_info;
237   uint8_t previous_pltype;
264   encoded_info = encoder_stack_->Encode(
265       rtp_timestamp,
266       rtc::ArrayView<const int16_t>(
267           input_data.audio,
268           input_data.audio_channel * input_data.length_per_channel),
269       &encode_buffer_);
}

334 // Add 10MS of raw (PCM) audio data to the encoder.
335 int AudioCodingModuleImpl::Add10MsData(const AudioFrame& audio_frame) {
    
    
336   MutexLock lock(&acm_mutex_);
//Validity checks on channel count, sample rate, etc., plus remixing where appropriate
337   int r = Add10MsDataInternal(audio_frame, &input_data_);
338   // TODO(bugs.webrtc.org/10739): add dcheck that
339   // `audio_frame.absolute_capture_timestamp_ms()` always has a value.
340   return r < 0
341              ? r
342              : Encode(input_data_, audio_frame.absolute_capture_timestamp_ms());
343 }

Encode() is the interface method exposed to the upper layer. Internally it calls the protected EncodeImpl() to do the actual encoding, so every codec (Opus, PCM16, G711, ...) must implement that method.
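
This is the classic non-virtual interface (template method) idiom. A generic, non-WebRTC sketch of the idea (all names here are illustrative, not WebRTC types) looks like this:

#include <cstdint>
#include <vector>

// The public Encode() performs the shared checks, while every concrete codec
// supplies the protected EncodeImpl(); failing to implement it is a compile error.
class EncoderBase {
 public:
  virtual ~EncoderBase() = default;

  std::vector<uint8_t> Encode(const std::vector<int16_t>& pcm) {
    // Shared precondition checks would go here.
    std::vector<uint8_t> packet = EncodeImpl(pcm);
    // Shared postcondition checks would go here.
    return packet;
  }

 protected:
  virtual std::vector<uint8_t> EncodeImpl(const std::vector<int16_t>& pcm) = 0;
};

class FakePcmEncoder : public EncoderBase {
 protected:
  std::vector<uint8_t> EncodeImpl(const std::vector<int16_t>& pcm) override {
    // Trivial "codec": pass the 16-bit samples through byte by byte.
    const uint8_t* bytes = reinterpret_cast<const uint8_t*>(pcm.data());
    return std::vector<uint8_t>(bytes, bytes + pcm.size() * sizeof(int16_t));
  }
};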

1.8.3 Packet receiving and decoding data flow

Compared with encoding, the receive side needs jitter handling: real-time audio applications generally require network delay within about 300 ms, so incoming packets pass through a jitter buffer, and decoding is driven by the module that handles the jitter. The AcmReceiver class therefore encapsulates the common receive-side functionality.

559 // Incoming packet from network parsed and ready for decode.
560 int AudioCodingModuleImpl::IncomingPacket(const uint8_t* incoming_payload,
561                                           const size_t payload_length,
562                                           const RTPHeader& rtp_header) {
    
    
563   RTC_DCHECK_EQ(payload_length == 0, incoming_payload == nullptr);
564   return receiver_.InsertPacket(
565       rtp_header,
566       rtc::ArrayView<const uint8_t>(incoming_payload, payload_length));
567 }

1.8.4 Get the decoded data stream

The decoded data stream lives in the NetEq module. Since playout is driven in real time, the method of the receiver_ member is called directly to obtain the data.

569 // Get 10 milliseconds of raw audio data to play out.
570 // Automatic resample to the requested frequency.
571 int AudioCodingModuleImpl::PlayoutData10Ms(int desired_freq_hz,
572                                            AudioFrame* audio_frame,
573                                            bool* muted) {
    
    
574   // GetAudio always returns 10 ms, at the requested sample rate.
575   if (receiver_.GetAudio(desired_freq_hz, audio_frame, muted) != 0) {
    
    
576     RTC_LOG(LS_ERROR) << "PlayoutData failed, RecOut Failed";
577     return -1;
578   }
579   return 0;
580 }

Next, look at the AudioEncoder class and the AcmReceiver class that implement encoding and reception.

1.8.5 AudioEncoder class

AudioEncoder is the encoder interface class, defined in api/audio_codecs/audio_encoder.h. It is a generic type: concrete implementations such as Opus and G711 inherit from it, so the class defines functionality common to all encoders, such as target bitrate, encoding, FEC and DTX. In addition, because this is a real-time scenario, network conditions affect the choice of optimal encoder parameters; as before, the network-statistics-related parts are omitted here.

 64 // This is the interface class for encoders in AudioCoding module. Each codec
 65 // type must have an implementation of this class.
 66 class AudioEncoder {
    
    
 67  public:
 68   // Used for UMA logging of codec usage. The same codecs, with the
 69   // same values, must be listed in
 70   // src/tools/metrics/histograms/histograms.xml in chromium to log
 71   // correct values.
 72   enum class CodecType {
    
    
 73     kOther = 0,  // Codec not specified, and/or not listed in this enum
 74     kOpus = 1,
 75     kIsac = 2,
 76     kPcmA = 3,
 77     kPcmU = 4,
 78     kG722 = 5,
 79     kIlbc = 6,
 80
 81     // Number of histogram bins in the UMA logging of codec types. The
 82     // total number of different codecs that are logged cannot exceed this
 83     // number.
 84     kMaxLoggedAudioCodecTypes
 85   };

 144   // Accepts one 10 ms block of input audio (i.e., SampleRateHz() / 100 *
 145   // NumChannels() samples). Multi-channel audio must be sample-interleaved.
 146   // The encoder appends zero or more bytes of output to `encoded` and returns
 147   // additional encoding information.  Encode() checks some preconditions, calls
 148   // EncodeImpl() which does the actual work, and then checks some
 149   // postconditions.
 150   EncodedInfo Encode(uint32_t rtp_timestamp,
 151                      rtc::ArrayView<const int16_t> audio,
 152                      rtc::Buffer* encoded);

 154   // Resets the encoder to its starting state, discarding any input that has
 155   // been fed to the encoder but not yet emitted in a packet.
 156   virtual void Reset() = 0;
 157
 158   // Enables or disables codec-internal FEC (forward error correction). Returns
 159   // true if the codec was able to comply. The default implementation returns
 160   // true when asked to disable FEC and false when asked to enable it (meaning
 161   // that FEC isn't supported).
 162   virtual bool SetFec(bool enable);
 163
 164   // Enables or disables codec-internal VAD/DTX. Returns true if the codec was
 165   // able to comply. The default implementation returns true when asked to
 166   // disable DTX and false when asked to enable it (meaning that DTX isn't
 167   // supported).
 168   virtual bool SetDtx(bool enable);
 169
 170   // Returns the status of codec-internal DTX. The default implementation always
 171   // returns false.
 172   virtual bool GetDtx() const;
 174   // Sets the application mode. Returns true if the codec was able to comply.
 175   // The default implementation just returns false.
 176   enum class Application { kSpeech, kAudio };
 177   virtual bool SetApplication(Application application);
  // The interface class calls the subclass's concrete encoding implementation, so different
  // encoders can be driven through the same interface.
  // This is the interface method every concrete encoder type must implement.
  protected:
  // Subclasses implement this to perform the actual encoding. Called by
  // Encode().
  virtual EncodedInfo EncodeImpl(uint32_t rtp_timestamp,
                                 rtc::ArrayView<const int16_t> audio,
                                 rtc::Buffer* encoded) = 0;
 }
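
For illustration, once a concrete encoder instance exists, these common controls can be exercised roughly as in the sketch below (hedged: the encoder is assumed to come from a factory, and codecs without FEC/DTX support simply return false):

#include "api/audio_codecs/audio_encoder.h"

// Sketch: toggling codec-internal features through the AudioEncoder interface.
void ConfigureEncoderSketch(webrtc::AudioEncoder* encoder) {
  // Request in-band FEC; the return value reports whether the codec complied.
  bool fec_enabled = encoder->SetFec(true);
  // Request codec-internal DTX; likewise best effort.
  bool dtx_enabled = encoder->SetDtx(true);
  // Hint that the payload is speech rather than general audio.
  encoder->SetApplication(webrtc::AudioEncoder::Application::kSpeech);
  (void)fec_enabled;
  (void)dtx_enabled;
}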

The implementation of the interface's encoding API, Encode(), is located in api/audio_codecs/audio_encoder.cc:

// Information about the encoded result is stored in EncodedInfo.
AudioEncoder::EncodedInfo AudioEncoder::Encode(
   uint32_t rtp_timestamp, // The RTP timestamp is used for playout and audio/video synchronization.
   rtc::ArrayView<const int16_t> audio, // PCM data to be encoded.
   rtc::Buffer* encoded) { // Buffer that receives the encoded data.
 TRACE_EVENT0("webrtc", "AudioEncoder::Encode");
 RTC_CHECK_EQ(audio.size(),
              static_cast<size_t>(NumChannels() * SampleRateHz() / 100));

 const size_t old_size = encoded->size();
 EncodedInfo info = EncodeImpl(rtp_timestamp, audio, encoded);
 RTC_CHECK_EQ(encoded->size() - old_size, info.encoded_bytes);
 return info;
}

1.8.6 Opus Encoder Class Implementation

The open-source Opus third-party library is C code and lives in the src/third_party directory. The advantage of this layout is that third-party libraries are isolated from the WebRTC code: when a third-party library has not changed, it does not need to be recompiled. The time saved for a single third-party library may not be much, but with many third-party libraries compilation is still very time-consuming, so this decoupled design can greatly improve development efficiency.

// modules/audio_coding/codecs/opus/audio_encoder_opus.h

 32 class AudioEncoderOpusImpl final : public AudioEncoder {
    
    
 33  public:
 136  protected:
 137   EncodedInfo EncodeImpl(uint32_t rtp_timestamp,
 138                          rtc::ArrayView<const int16_t> audio,
 139                          rtc::Buffer* encoded) override;
 141  private:
 146   static void AppendSupportedEncoders(std::vector<AudioCodecSpec>* specs);
 185   std::vector<int16_t> input_buffer_;
 186   OpusEncInst* inst_;
 199   friend struct AudioEncoderOpus;
 }

Its implementation is located in:

 // modules/audio_coding/codecs/opus/audio_encoder_opus.cc
648 AudioEncoder::EncodedInfo AudioEncoderOpusImpl::EncodeImpl(
649     uint32_t rtp_timestamp,
650     rtc::ArrayView<const int16_t> audio,
651     rtc::Buffer* encoded) {
    
    
652   MaybeUpdateUplinkBandwidth();
654   if (input_buffer_.empty())
655     first_timestamp_in_buffer_ = rtp_timestamp;
656
657   input_buffer_.insert(input_buffer_.end(), audio.cbegin(), audio.cend());
658   if (input_buffer_.size() <
659       (Num10msFramesPerPacket() * SamplesPer10msFrame())) {
    
    
660     return EncodedInfo();
661   }
//WebRtcOpus_Encode does the actual encoding. It is implemented in opus_interface.cc, which wraps and calls the third-party Opus library, mainly opus_encode or opus_multistream_encode.
665   const size_t max_encoded_bytes = SufficientOutputBufferSize();
666   EncodedInfo info;
667   info.encoded_bytes = encoded->AppendData(
668       max_encoded_bytes, [&](rtc::ArrayView<uint8_t> encoded) {
    
    
669         int status = WebRtcOpus_Encode(
670             inst_, &input_buffer_[0],
671             rtc::CheckedDivExact(input_buffer_.size(), config_.num_channels),
672             rtc::saturated_cast<int16_t>(max_encoded_bytes), encoded.data());
673
674         RTC_CHECK_GE(status, 0);  // Fails only if fed invalid data.
675
676         return static_cast<size_t>(status);
677       });
678   input_buffer_.clear();
679
680   bool dtx_frame = (info.encoded_bytes <= 2);
681

682   // Will use new packet size for next encoding.
683   config_.frame_size_ms = next_frame_length_ms_;
684
685   if (adjust_bandwidth_ && bitrate_changed_) {
    
    
686     const auto bandwidth = GetNewBandwidth(config_, inst_);
687     if (bandwidth) {
    
    
688       RTC_CHECK_EQ(0, WebRtcOpus_SetBandwidth(inst_, *bandwidth));
689     }
690     bitrate_changed_ = false;
691   }

As mentioned above, EncodeImpl() must be implemented by every encoder. For example, the iLBC implementation is as follows:

AudioEncoder::EncodedInfo AudioEncoderIlbcImpl::EncodeImpl(
    uint32_t rtp_timestamp,
    rtc::ArrayView<const int16_t> audio,
    rtc::Buffer* encoded) {
    
    
  // Save timestamp if starting a new packet.
  if (num_10ms_frames_buffered_ == 0)
    first_timestamp_in_buffer_ = rtp_timestamp;

  // Buffer input.
  std::copy(audio.cbegin(), audio.cend(),
            input_buffer_ + kSampleRateHz / 100 * num_10ms_frames_buffered_);

  // If we don't yet have enough buffered input for a whole packet, we're done
  // for now.
  if (++num_10ms_frames_buffered_ < num_10ms_frames_per_packet_) {
    
    
    return EncodedInfo();
  }

  // Encode buffered input.
  RTC_DCHECK_EQ(num_10ms_frames_buffered_, num_10ms_frames_per_packet_);
  num_10ms_frames_buffered_ = 0;
  // Likewise, it calls WebRtcIlbcfix_Encode from the third-party library to do the actual encoding
  size_t encoded_bytes = encoded->AppendData(
      RequiredOutputSizeBytes(), [&](rtc::ArrayView<uint8_t> encoded) {
    
    
        const int r = WebRtcIlbcfix_Encode(
            encoder_, input_buffer_,
            kSampleRateHz / 100 * num_10ms_frames_per_packet_, encoded.data());
        RTC_CHECK_GE(r, 0);

        return static_cast<size_t>(r);
      });

  RTC_DCHECK_EQ(encoded_bytes, RequiredOutputSizeBytes());

  EncodedInfo info;
  info.encoded_bytes = encoded_bytes;
  info.encoded_timestamp = first_timestamp_in_buffer_;
  info.payload_type = payload_type_;
  info.encoder_type = CodecType::kIlbc;
  return info;
}

So far we have seen how concrete encoders derive from the encoder interface class AudioEncoder. The next question is how these derived encoder classes are ultimately invoked by the application, which involves the concepts of call, stream and channel. A channel covers sending and receiving and is bound to an SSRC that identifies the source. For example, in a video call the audio being sent may be the sound captured by the microphone or the sound played from a shared PPT; the two are distinguished by different SSRCs. The same applies to reception: when receiving conference audio from several participants at the same time, each participant's SSRC is different. Each SSRC corresponds to one channel, because each channel may need different processing; for instance, the microphone signal needs speech enhancement, while the audio shared from a PPT does not. The channel mainly holds the encoder, RTP packetization, end-to-end encryption and similar information. A stream is likewise split into sending and receiving and belongs to the transport layer, i.e. it sends and receives the RTP/RTCP packets packaged at the channel layer. A call represents a two-party or multi-party conference; some implementations use different names, but the concept is always present. The Call class is the call class used in the PeerConnection example.

//pc/peer_connection.h
  // Creates a PeerConnection and initializes it with the given values.
  // If the initialization fails, the function releases the PeerConnection
  // and returns nullptr.
  //
  // Note that the function takes ownership of dependencies, and will
  // either use them or release them, whether it succeeds or fails.
  static RTCErrorOr<rtc::scoped_refptr<PeerConnection>> Create(
      rtc::scoped_refptr<ConnectionContext> context,
      const PeerConnectionFactoryInterface::Options& options,
      std::unique_ptr<RtcEventLog> event_log,
      std::unique_ptr<Call> call,
      const PeerConnectionInterface::RTCConfiguration& configuration,
      PeerConnectionDependencies dependencies);

The detailed implementation of channel, stream and call is not expanded here; it will be analyzed at the corresponding layers. This concludes the implementation and usage of the AudioEncoder class; next comes the function and implementation of the jitter-buffer receive class.

1.8.7 AcmReceiver

Here is a list of the receive-side AcmReceiver methods called from AudioCodingModuleImpl in third_party/webrtc/modules/audio_coding/acm2/audio_coding_module.cc.

 acm2::AcmReceiver receiver_;  // AcmReceiver has it's own internal lock.

 receiver_.SetCodecs(codecs);
 receiver_.InsertPacket(
      rtp_header,
      rtc::ArrayView<const uint8_t>(incoming_payload, payload_length));
 receiver_.GetAudio(desired_freq_hz, audio_frame, muted);
 receiver_.GetNetworkStatistics(statistics);

The main AcmReceiver methods are called in the code segment above. The two receive-path methods, InsertPacket and GetAudio, internally call the neteq_ methods of the same name. AcmReceiver is created as follows, mainly by calling the NetEq factory class to create the object and associating it with the decoder factory.

  37 std::unique_ptr<NetEq> CreateNetEq(
 38     NetEqFactory* neteq_factory,
 39     const NetEq::Config& config,
 40     Clock* clock,
 41     const rtc::scoped_refptr<AudioDecoderFactory>& decoder_factory) {
    
    
 42   if (neteq_factory) {
    
    
 43     return neteq_factory->CreateNetEq(config, decoder_factory, clock);
 44   }
 45   return DefaultNetEqFactory().CreateNetEq(config, decoder_factory, clock);
 46 }

 50 AcmReceiver::AcmReceiver(const AudioCodingModule::Config& config)
 51     : last_audio_buffer_(new int16_t[AudioFrame::kMaxDataSizeSamples]),
 52       neteq_(CreateNetEq(config.neteq_factory,
 53                          config.neteq_config,
 54                          config.clock,
 55                          config.decoder_factory)),
 56       clock_(config.clock),
 57       resampled_last_output_frame_(true) {
    
    
 58   RTC_DCHECK(clock_);
 59   memset(last_audio_buffer_.get(), 0,
 60          sizeof(int16_t) * AudioFrame::kMaxDataSizeSamples);
 61 }

neteq_ is also created through a factory class. There are two kinds of NetEqFactory, a default one and a custom one. NetEqFactory is an interface class whose CreateNetEq() returns a NetEq object.

// Creates NetEq instances using the settings provided in the config struct.
class NetEqFactory {
    
    
public:
virtual ~NetEqFactory() = default;

// Creates a new NetEq object, with parameters set in `config`. The `config`
// object will only have to be valid for the duration of the call to this
// method.
virtual std::unique_ptr<NetEq> CreateNetEq(
    const NetEq::Config& config,
    const rtc::scoped_refptr<AudioDecoderFactory>& decoder_factory,
    Clock* clock) const = 0;
};

On top of the NetEqFactory interface, WebRTC provides two concrete factory classes.

class CustomNetEqFactory : public NetEqFactory {
    
    
 private:
  std::unique_ptr<NetEqControllerFactory> controller_factory_;
}
and
class DefaultNetEqFactory : public NetEqFactory {
    
    

 private:
  const DefaultNetEqControllerFactory controller_factory_;
}

These two classes add private members beyond the NetEqFactory interface. In most cases the default one is sufficient; use the custom one only when you need to adjust NetEq behaviour for your own application scenario. For details of the NetEQ algorithm itself, see Chapter 11, Section 11.2 of the "Real-time Speech Processing Practice Guide".
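
A hedged sketch of how a NetEqFactory would be injected through the ACM Config shown earlier (my_neteq_factory is an assumed, application-owned factory; leaving the field null selects DefaultNetEqFactory, as CreateNetEq() above shows):

#include <memory>

#include "api/audio_codecs/builtin_audio_decoder_factory.h"
#include "modules/audio_coding/include/audio_coding_module.h"

std::unique_ptr<webrtc::AudioCodingModule> CreateAcmWithNetEqFactory(
    webrtc::NetEqFactory* my_neteq_factory /* may be nullptr */) {
  webrtc::AudioCodingModule::Config config(
      webrtc::CreateBuiltinAudioDecoderFactory());
  config.neteq_factory = my_neteq_factory;  // nullptr -> DefaultNetEqFactory.
  return std::unique_ptr<webrtc::AudioCodingModule>(
      webrtc::AudioCodingModule::Create(config));
}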

1.9 MediaStream and MediaStreamTrack

Multimedia is composed of the two words "multi" and "media": many kinds of media used to carry information, such as text, pictures, video and audio. The main media that WebRTC communication relies on are video and audio. Because WebRTC (Web Real-Time Communication) is real-time, it uses streaming multimedia: compressed audio and video are streamed over the network and played out in real time at the receiving end. The streamed content can be a file such as MP4, or video captured by a camera and audio captured by a microphone in real time. The receiving end generally plays while receiving (also called streaming playback), and the buffering time of streaming playback is very short, generally within 800 ms. This real-time requirement means that network transmission uses the unreliable UDP/RTP/RTSP protocol stack to carry the multimedia content.

In WebRTC, MediaStream mainly involves two APIs: MediaStreamTrack and MediaStream. A MediaStreamTrack (MST) object represents a single type of media from the end user; the media source can be a physical microphone or camera. When transported with RTP, different media sources are identified by the Synchronization Source (SSRC): different sources in the same conference have different SSRCs, so the source can be distinguished by SSRC. MediaStream aggregates multiple MediaStreamTrack objects so that they can be played out or captured as a whole, i.e. audio and video stay synchronized during playback and rendering. A MediaStreamTrack relies on a media source (MediaSource) to provide the media data.

The concept of a track comes from multimedia recording. For example, when a singer records a song or a concert, the singer's voice corresponds to one capture track, the accompanying guitar to another, and the drums to yet another; these tracks are captured at the same time, mixed, and then played back. A MediaStreamTrack is a streaming multimedia track and can carry multi-channel content, such as stereo, 5.1-channel or 3D video, with the timing of the different channels tightly coordinated. Track is a capture-side concept (think of the singer recording), while channel is a playback-side concept: a song may be recorded as several tracks but is eventually mixed down to 5.1 or stereo (2-channel) for playback.

The interface class of the MediaStreamTrack object is shown below; it exposes an enabled flag, a state flag, and a kind that indicates whether the track is video or audio.

//third_party/webrtc/api/media_stream_interface.h
// C++ version of MediaStreamTrack.
// See: https://www.w3.org/TR/mediacapture-streams/#mediastreamtrack
class RTC_EXPORT MediaStreamTrackInterface : public rtc::RefCountInterface,
                                             public NotifierInterface {
    
    
 public:
  enum TrackState {
    
    
    kLive,
    kEnded,
  };

  static const char* const kAudioKind;
  static const char* const kVideoKind;

  // kind() returns kAudioKind when this class is an AudioTrackInterface subclass.
  // kind() returns kVideoKind when this class is a VideoTrackInterface subclass.
  virtual std::string kind() const = 0;

  // Track ID.
  virtual std::string id() const = 0;

  // A disabled audio track produces silence; a disabled video track produces black frames.
  virtual bool enabled() const = 0;
  virtual bool set_enabled(bool enable) = 0;

  // A track is either live or ended. Once ended, a track never returns to the live state.
  virtual TrackState state() const = 0;

 protected:
  ~MediaStreamTrackInterface() override = default;
};

The video stream track and audio stream track implementations inherit from this class, so the parent-class API can be used to access the subclass methods. The two classes are declared as follows:

class RTC_EXPORT AudioTrackInterface : public MediaStreamTrackInterface {
    
    
 public:
  virtual AudioSourceInterface* GetSource() const = 0;

  // Add and remove sinks that receive data from the track.
  virtual void AddSink(AudioTrackSinkInterface* sink) = 0;
  virtual void RemoveSink(AudioTrackSinkInterface* sink) = 0;

  // Get the signal level from the audio track.
  virtual bool GetSignalLevel(int* level);

  // Get the audio processor used by this audio track. Returns null if the track has no processor.
  virtual rtc::scoped_refptr<AudioProcessorInterface> GetAudioProcessor();

 protected:
  ~AudioTrackInterface() override = default;
};


class RTC_EXPORT VideoTrackInterface
    : public MediaStreamTrackInterface,
      public rtc::VideoSourceInterface<VideoFrame> {
    
    
 public:
  // Video track content hint. It helps video processing and codec algorithms choose
  // different parameters or methods for different content. Fluid suits movie/game content
  // where motion matters most, while Detailed and Text suit presentations, web pages, text
  // and drawings: Detailed cares more about static detail, and Text targets text scenes.
  // Different enhancement and codec techniques are optimal for different content; for
  // example, WebRTC's APM module is tuned for speech and is not optimal for music.
  // Similarly for video, text-like content is sensitive to quantization, whereas movies
  // and games are less so.
  //
  // See https://crbug.com/653531 and https://w3c.github.io/mst-content-hint.
  enum class ContentHint { kNone, kFluid, kDetailed, kText };

  // Register a video sink for this track, used to connect the track to the video engine.
  void AddOrUpdateSink(rtc::VideoSinkInterface<VideoFrame>* sink,
                       const rtc::VideoSinkWants& wants) override {
    
    }
  void RemoveSink(rtc::VideoSinkInterface<VideoFrame>* sink) override {
    
    }

  virtual VideoTrackSourceInterface* GetSource() const = 0;

  virtual ContentHint content_hint() const;
  virtual void set_content_hint(ContentHint hint) {
    
    }

 protected:
  ~VideoTrackInterface() override = default;
};

These two are also interface classes. For thread safety, WebRTC wraps calls to their implementations with proxy classes. The corresponding audio and video track implementations are in the two files webrtc/pc/audio_track.cc and webrtc/pc/video_track.cc.

The MediaStream class is defined as follows:

//third_party/webrtc/api/media_stream_interface.h
// C++ version of https://www.w3.org/TR/mediacapture-streams/#mediastream.
// Remote audio/video tracks received by PeerConnection/RtpReceiver cannot be synchronized simply by adding them to the same MediaStream; the msid (MediaStream ID) attribute that describes the session in SDP must be propagated to synchronize the media sources. Therefore the MediaStreamInterface class only stores tracks.
class MediaStreamInterface : public rtc::RefCountInterface,
                             public NotifierInterface {
    
    
 public:
  virtual std::string id() const = 0;

  virtual AudioTrackVector GetAudioTracks() = 0;
  virtual VideoTrackVector GetVideoTracks() = 0;
  virtual rtc::scoped_refptr<AudioTrackInterface> FindAudioTrack(
      const std::string& track_id) = 0;
  virtual rtc::scoped_refptr<VideoTrackInterface> FindVideoTrack(
      const std::string& track_id) = 0;

  // Takes ownership of added tracks.
  // Note: Default implementations are for avoiding link time errors in
  // implementations that mock this API.
  // TODO(bugs.webrtc.org/13980): Remove default implementations.
  virtual bool AddTrack(rtc::scoped_refptr<AudioTrackInterface> track) {
    
    
    RTC_CHECK_NOTREACHED();
  }
  virtual bool AddTrack(rtc::scoped_refptr<VideoTrackInterface> track) {
    
    
    RTC_CHECK_NOTREACHED();
  }
  virtual bool RemoveTrack(rtc::scoped_refptr<AudioTrackInterface> track) {
    
    
    RTC_CHECK_NOTREACHED();
  }
  virtual bool RemoveTrack(rtc::scoped_refptr<VideoTrackInterface> track) {
    
    
    RTC_CHECK_NOTREACHED();
  }

 protected:
  ~MediaStreamInterface() override = default;
};

MediaStream is an aggregation of multiple tracks, which also means that there can be multiple video (shared screen, camera capture, etc.) and audio tracks. In MediaStream, video and audio are stored in different vectors, and the types are AudioTrackInterface and VideoTrackInterface respectively.

//third_party/webrtc/api/media_stream_interface.h
typedef std::vector<rtc::scoped_refptr<AudioTrackInterface> > AudioTrackVector;
typedef std::vector<rtc::scoped_refptr<VideoTrackInterface> > VideoTrackVector;
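
As a usage illustration, a local MediaStream might be assembled roughly as follows (a hedged sketch: the factory and the sources are assumed to exist already, and the exact Create* signatures vary slightly between WebRTC revisions):

#include "api/media_stream_interface.h"
#include "api/peer_connection_interface.h"

rtc::scoped_refptr<webrtc::MediaStreamInterface> BuildLocalStreamSketch(
    webrtc::PeerConnectionFactoryInterface* factory,
    webrtc::AudioSourceInterface* audio_source,
    webrtc::VideoTrackSourceInterface* video_source) {
  rtc::scoped_refptr<webrtc::AudioTrackInterface> audio_track =
      factory->CreateAudioTrack("audio0", audio_source);
  rtc::scoped_refptr<webrtc::VideoTrackInterface> video_track =
      factory->CreateVideoTrack("video0", video_source);

  rtc::scoped_refptr<webrtc::MediaStreamInterface> stream =
      factory->CreateLocalMediaStream("stream0");
  stream->AddTrack(audio_track);  // Stored in the AudioTrackVector.
  stream->AddTrack(video_track);  // Stored in the VideoTrackVector.
  return stream;
}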

1.10 VoiceEngine

Several important member variables of the VoiceEngine class are defined as follows; as the English comments show, they cover ADM, APM and codec-related state. The ACM module is not used directly here because ACM involves network transmission, and thus network jitter, packet loss, delay and bandwidth limits, so the ACM module is driven by the NetEQ module, while the NetEQ jitter-buffer configuration is kept in VoiceEngine.

 // The audio device module.
  rtc::scoped_refptr<webrtc::AudioDeviceModule> adm_;
  rtc::scoped_refptr<webrtc::AudioEncoderFactory> encoder_factory_;
  rtc::scoped_refptr<webrtc::AudioDecoderFactory> decoder_factory_;
  rtc::scoped_refptr<webrtc::AudioMixer> audio_mixer_;
  // The audio processing module.
  rtc::scoped_refptr<webrtc::AudioProcessing> apm_;
  // Asynchronous audio processing.
  webrtc::AudioFrameProcessor* const audio_frame_processor_;
  // The primary instance of WebRtc VoiceEngine.
  rtc::scoped_refptr<webrtc::AudioState> audio_state_;
  std::vector<AudioCodec> send_codecs_;
  std::vector<AudioCodec> recv_codecs_;


  // Jitter buffer settings for new streams.
  size_t audio_jitter_buffer_max_packets_ = 200;
  bool audio_jitter_buffer_fast_accelerate_ = false;
  int audio_jitter_buffer_min_delay_ms_ = 0;

The voice engine's Init() method is called directly from media_engine_. The core of WebRTC's webrtc::cricket::WebRtcVoiceEngine::Init() is shown below. This function needs the ADM module: an ADM can be passed in as a constructor parameter, or a platform-specific default ADM is created. It then creates an AudioState object according to the configuration, and after that registers an AudioTransport object with the ADM, connecting the ADM to the audio send path. The main flow of this method is as follows:

void WebRtcVoiceEngine::Init() {
    
    

// Load our audio codec lists.
send_codecs_ = CollectCodecs(encoder_factory_->GetSupportedEncoders());
recv_codecs_ = CollectCodecs(decoder_factory_->GetSupportedDecoders());

// No ADM supplied? Create a default one. When peer_connection_factory_ is created the ADM argument is NULL, so the platform-default ADM object is created here. See section 2.1.
if (!adm_) {
    
    
  adm_ = webrtc::AudioDeviceModule::Create(
      webrtc::AudioDeviceModule::kPlatformDefaultAudio, task_queue_factory_);
}

webrtc::adm_helpers::Init(adm());

// Set up AudioState.
  {
    
    
  webrtc::AudioState::Config config;
  if (audio_mixer_) {
    
    
    config.audio_mixer = audio_mixer_;
  } else {
    
    
    config.audio_mixer = webrtc::AudioMixerImpl::Create();
  }
  config.audio_processing = apm_;
  config.audio_device_module = adm_;
  if (audio_frame_processor_)
    config.async_audio_processing_factory =
        rtc::make_ref_counted<webrtc::AsyncAudioProcessing::Factory>(
            *audio_frame_processor_, *task_queue_factory_);
  audio_state_ = webrtc::AudioState::Create(config);
}

  // Connect the ADM to our audio path.
adm()->RegisterAudioCallback(audio_state()->audio_transport());

  initialized_ = true;
}

Here audio_transport is registered with the ADM module (audio_device_impl.cc) and ends up registered with the AudioDeviceBuffer object (audio_device_buffer.cc), because that is where the audio data is stored. When captured data arrives, subsequent processing, such as encoding and sending, is completed through this callback.

adm()->RegisterAudioCallback(audio_state()->audio_transport());

RegisterAudioCallback() on the ADM forwards to AudioDeviceBuffer::RegisterAudioCallback(); after the ADM captures data it calls the transport layer's RecordedDataIsAvailable() method, handing subsequent processing of the data over to the transport layer.

//webrtc/audio/audio_transport_impl.cc
// Not used in Chromium. Process captured audio and distribute to all sending
// streams, and try to do this at the lowest possible sample rate.
int32_t AudioTransportImpl::RecordedDataIsAvailable(
    const void* audio_data,
    const size_t number_of_frames,
    const size_t bytes_per_sample,
    const size_t number_of_channels,
    const uint32_t sample_rate,
    const uint32_t audio_delay_milliseconds,
    const int32_t /*clock_drift*/,
    const uint32_t /*volume*/,
    const bool key_pressed,
    uint32_t& /*new_mic_volume*/,
    const int64_t
        estimated_capture_time_ns) {
    
      // NOLINT: to avoid changing APIs
  RTC_DCHECK(audio_data);
  RTC_DCHECK_GE(number_of_channels, 1);
  RTC_DCHECK_LE(number_of_channels, 2);
  RTC_DCHECK_EQ(2 * number_of_channels, bytes_per_sample);
  RTC_DCHECK_GE(sample_rate, AudioProcessing::NativeRate::kSampleRate8kHz);
  // 100 = 1 second / data duration (10 ms).
  RTC_DCHECK_EQ(number_of_frames * 100, sample_rate);
  RTC_DCHECK_LE(bytes_per_sample * number_of_frames * number_of_channels,
                AudioFrame::kMaxDataSizeBytes);

  int send_sample_rate_hz = 0;
  size_t send_num_channels = 0;
  bool swap_stereo_channels = false;
  {
    
    
    MutexLock lock(&capture_lock_);
    send_sample_rate_hz = send_sample_rate_hz_;
    send_num_channels = send_num_channels_;
    swap_stereo_channels = swap_stereo_channels_;
  }

  std::unique_ptr<AudioFrame> audio_frame(new AudioFrame());
  InitializeCaptureFrame(sample_rate, send_sample_rate_hz, number_of_channels,
                         send_num_channels, audio_frame.get());
  voe::RemixAndResample(static_cast<const int16_t*>(audio_data),
                        number_of_frames, number_of_channels, sample_rate,
                        &capture_resampler_, audio_frame.get());
  ProcessCaptureFrame(audio_delay_milliseconds, key_pressed,
                      swap_stereo_channels, audio_processing_,
                      audio_frame.get());
  audio_frame->set_absolute_capture_timestamp_ms(estimated_capture_time_ns /
                                                 1000000);

  RTC_DCHECK_GT(audio_frame->samples_per_channel_, 0);
  if (async_audio_processing_)
    async_audio_processing_->Process(std::move(audio_frame));
  else
    SendProcessedData(std::move(audio_frame));

  return 0;
}

Because audio mixing may be needed, an audio mixer must be created. AudioState holds the audio state (APM, ADM and mixer) when multiple webrtc::Call instances need audio processing. Before following the specific audio data flow path, let's look at the AudioTransport object definition:

//webrtc/modules/audio_device/include/audio_device_defines.h
class AudioTransport {
    
    
 public:
//Callback invoked when the hardware device has captured microphone data. keyPressed flags whether a keyboard key is pressed, which helps APM processing.
  virtual int32_t RecordedDataIsAvailable(
      const void* audioSamples,
      size_t nSamples,
      size_t nBytesPerSample,
      size_t nChannels,
      uint32_t samplesPerSec,
      uint32_t totalDelayMS,
      int32_t clockDrift,
      uint32_t currentMicLevel,
      bool keyPressed,
      uint32_t& newMicLevel,
      int64_t estimatedCaptureTimeNS) {
    
      // NOLINT

    return RecordedDataIsAvailable(
        audioSamples, nSamples, nBytesPerSample, nChannels, samplesPerSec,
        totalDelayMS, clockDrift, currentMicLevel, keyPressed, newMicLevel);
  }

  // Pull audio data for playout.
  virtual int32_t NeedMorePlayData(size_t nSamples,
                                   size_t nBytesPerSample,
                                   size_t nChannels,
                                   uint32_t samplesPerSec,
                                   void* audioSamples,
                                   size_t& nSamplesOut,  // NOLINT
                                   int64_t* elapsed_time_ms,
                                   int64_t* ntp_time_ms) = 0;  // NOLINT

  // This is the Chromium counterpart of NeedMorePlayData.
  virtual void PullRenderData(int bits_per_sample,
                              int sample_rate,
                              size_t number_of_channels,
                              size_t number_of_frames,
                              void* audio_data,
                              int64_t* elapsed_time_ms,
                              int64_t* ntp_time_ms) = 0;

 protected:
  virtual ~AudioTransport() {
    
    }
};
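
To make the direction of the data flow concrete, the sketch below shows how a capture layer (conceptually, what the ADM does internally after recording) would hand one 10 ms block to the registered AudioTransport; the parameter values are illustrative for 48 kHz stereo capture:

#include <cstddef>
#include <cstdint>

#include "modules/audio_device/include/audio_device_defines.h"

void DeliverCapturedBlockSketch(webrtc::AudioTransport* transport,
                                const int16_t* pcm_10ms,
                                int64_t capture_time_ns) {
  constexpr size_t kSamplesPerChannel = 480;         // 48000 Hz / 100.
  constexpr size_t kChannels = 2;                    // Stereo.
  constexpr size_t kBytesPerSample = 2 * kChannels;  // 16-bit, interleaved.
  uint32_t new_mic_level = 0;

  transport->RecordedDataIsAvailable(
      pcm_10ms, kSamplesPerChannel, kBytesPerSample, kChannels,
      /*samplesPerSec=*/48000, /*totalDelayMS=*/0, /*clockDrift=*/0,
      /*currentMicLevel=*/0, /*keyPressed=*/false, new_mic_level,
      capture_time_ns);
}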

1.11 Audio Encoder/Decoder Factory

The VoiceEngine in section 1.10 does not use the AudioEncoder class directly; it creates encoders through an AudioEncoderFactory. The audio encoder factory creates encoder objects for the various audio codecs, and the audio decoder factory creates the corresponding decoder objects.

The Audio Encoder Factory interface for WebRTC is defined (located at webrtc/src/api/audio_codecs/audio_encoder_factory.h) as follows:

namespace webrtc {
    
    

// A factory that creates AudioEncoders.
class AudioEncoderFactory : public rtc::RefCountInterface {
    
    
 public:
  // Returns a prioritized list of audio codecs, to use for signaling etc.
  virtual std::vector<AudioCodecSpec> GetSupportedEncoders() = 0;

  // Returns information about how this format would be encoded, provided it's
  // supported. More format and format variations may be supported than those
  // returned by GetSupportedEncoders().
  virtual absl::optional<AudioCodecInfo> QueryAudioEncoder(
      const SdpAudioFormat& format) = 0;

  // Creates an AudioEncoder for the specified format. The encoder will tags its
  // payloads with the specified payload type. The `codec_pair_id` argument is
  // used to link encoders and decoders that talk to the same remote entity: if
  // a AudioEncoderFactory::MakeAudioEncoder() and a
  // AudioDecoderFactory::MakeAudioDecoder() call receive non-null IDs that
  // compare equal, the factory implementations may assume that the encoder and
  // decoder form a pair. (The intended use case for this is to set up
  // communication between the AudioEncoder and AudioDecoder instances, which is
  // needed for some codecs with built-in bandwidth adaptation.)
  //
  // Note: Implementations need to be robust against combinations other than
  // one encoder, one decoder getting the same ID; such encoders must still
  // work.
  //
  // TODO(ossu): Try to avoid audio encoders having to know their payload type.
  virtual std::unique_ptr<AudioEncoder> MakeAudioEncoder(
      int payload_type,
      const SdpAudioFormat& format,
      absl::optional<AudioCodecPairId> codec_pair_id) = 0;
};

}  // namespace webrtc
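
A hedged usage sketch of this factory interface (assuming Opus is compiled in; the payload type and format parameters are illustrative values that would normally come from SDP negotiation):

#include <memory>

#include "api/audio_codecs/builtin_audio_encoder_factory.h"

std::unique_ptr<webrtc::AudioEncoder> MakeOpusEncoderSketch() {
  rtc::scoped_refptr<webrtc::AudioEncoderFactory> factory =
      webrtc::CreateBuiltinAudioEncoderFactory();

  webrtc::SdpAudioFormat opus("opus", 48000, 2,
                              {{"minptime", "10"}, {"useinbandfec", "1"}});
  if (!factory->QueryAudioEncoder(opus)) {
    return nullptr;  // No registered codec supports this format.
  }
  return factory->MakeAudioEncoder(/*payload_type=*/111, opus,
                                   /*codec_pair_id=*/absl::nullopt);
}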

Intuitively, AudioEncoderFactory should be implemented in a way similar to the composite pattern:

  • First, implement an AudioEncoderFactory subclass for each codec, e.g. one subclass that creates an OPUS encoder, another that creates an AAC encoder, and so on;
  • Then create a composite AudioEncoderFactory subclass that acts as the container for the encoder factories of all audio codecs to be supported, and implement its interface functions by delegating to each codec's encoder factory;
  • Finally, provide a factory method that creates the composite AudioEncoderFactory subclass object, creates an encoder factory object for each supported codec and registers it with the composite object, and then returns the composite object.

One possible implementation is as follows. The first is the factory method declaration of the audio encoder factory:

#ifndef API_AUDIO_CODECS_FAKE_AUDIO_ENCODER_FACTORY_H_
#define API_AUDIO_CODECS_FAKE_AUDIO_ENCODER_FACTORY_H_

#include "api/audio_codecs/audio_encoder_factory.h"
#include "rtc_base/scoped_ref_ptr.h"

namespace webrtc {
    
    

class CodecAudioEncoderFactory: public AudioEncoderFactory {
    
    
public:
  virtual bool IsSupported(const SdpAudioFormat &format) = 0;
};

// Creates a new factory that can create the built-in types of audio encoders.
// NOTE: This function is still under development and may change without notice.
rtc::scoped_refptr<AudioEncoderFactory> CreateBuiltinAudioEncoderFactory();

}  // namespace webrtc

#endif /* API_AUDIO_CODECS_FAKE_AUDIO_ENCODER_FACTORY_H_ */

Besides the factory method declaration of the audio encoder factory, a new AudioEncoderFactory subclass, CodecAudioEncoderFactory, describes the interface of a single codec's encoder factory. An interface function IsSupported() is added to this class to judge whether an encoder factory supports a specific SdpAudioFormat, which helps implement the composite AudioEncoderFactory subclass.

Then there are the implementations of several related classes:

#include "fake_audio_encoder_factory.h"
#include <vector>
#include "rtc_base/refcountedobject.h"

namespace webrtc {
    
    

class OpusEncoderFactory : public CodecAudioEncoderFactory {
    
    
public:
  std::vector<AudioCodecSpec> GetSupportedEncoders() override {
    
    
    std::vector<AudioCodecSpec> specs;

    return specs;
  }

  absl::optional<AudioCodecInfo> QueryAudioEncoder(const SdpAudioFormat &format)
      override {
    
    
    return absl::nullopt;
  }

  std::unique_ptr<AudioEncoder> MakeAudioEncoder(int payload_type,
      const SdpAudioFormat &format,
      absl::optional<AudioCodecPairId> codec_pair_id) override {
    
    

    return nullptr;
  }

  bool IsSupported(const SdpAudioFormat &format) override {
    
    
    return true;
  }
};

class FakeAudioEncoderFactory: public AudioEncoderFactory {
    
    
public:
  std::vector<AudioCodecSpec> GetSupportedEncoders() override {
    
    
    std::vector<AudioCodecSpec> specs;

    for (auto &factory : audio_encoder_factories) {
    
    
      specs.insert(specs.end(), factory->GetSupportedEncoders().begin(), factory->GetSupportedEncoders().end());
    }

    return specs;
  }

  absl::optional<AudioCodecInfo> QueryAudioEncoder(const SdpAudioFormat &format)
      override {
    
    
    for (auto &factory : audio_encoder_factories) {
    
    
      if (factory->IsSupported(format)) {
    
    
        return factory->QueryAudioEncoder(format);
      }
    }

    return absl::nullopt;
  }

  std::unique_ptr<AudioEncoder> MakeAudioEncoder(int payload_type,
      const SdpAudioFormat &format,
      absl::optional<AudioCodecPairId> codec_pair_id) override {
    
    
    for (auto &factory : audio_encoder_factories) {
    
    
      if (factory->IsSupported(format)) {
    
    
        return factory->MakeAudioEncoder(payload_type, format, codec_pair_id);
      }
    }

    return nullptr;
  }

  void AddAudioEncoderFactory(rtc::scoped_refptr<CodecAudioEncoderFactory> factory) {
    
    
    audio_encoder_factories.push_back(factory);
  }

private:
  std::vector<rtc::scoped_refptr<CodecAudioEncoderFactory>> audio_encoder_factories;
};

rtc::scoped_refptr<AudioEncoderFactory> CreateBuiltinAudioEncoderFactory() {
    
    
  rtc::scoped_refptr<FakeAudioEncoderFactory> factory(new rtc::RefCountedObject<FakeAudioEncoderFactory>);

  rtc::scoped_refptr<OpusEncoderFactory> opus_factory(new rtc::RefCountedObject<OpusEncoderFactory>);
  factory->AddAudioEncoderFactory(opus_factory);

  return factory;
}

However, WebRTC's built-in audio encoder factory is not implemented in this way.

The factory method used to create the AudioEncoderFactory is declared in the file webrtc/src/api/audio_codecs/builtin_audio_encoder_factory.h:

namespace webrtc {
    
    

// Creates a new factory that can create the built-in types of audio encoders.
// NOTE: This function is still under development and may change without notice.
rtc::scoped_refptr<AudioEncoderFactory> CreateBuiltinAudioEncoderFactory();

}  // namespace webrtc

In the file webrtc/src/api/audio_codecs/builtin_audio_encoder_factory.cc, the CreateBuiltinAudioEncoderFactory() function is implemented as follows:

namespace webrtc {
    
    

namespace {
    
    

// Modify an audio encoder to not advertise support for anything.
template <typename T>
struct NotAdvertised {
    
    
  using Config = typename T::Config;
  static absl::optional<Config> SdpToConfig(
      const SdpAudioFormat& audio_format) {
    
    
    return T::SdpToConfig(audio_format);
  }
  static void AppendSupportedEncoders(std::vector<AudioCodecSpec>* specs) {
    
    
    // Don't advertise support for anything.
  }
  static AudioCodecInfo QueryAudioEncoder(const Config& config) {
    
    
    return T::QueryAudioEncoder(config);
  }
  static std::unique_ptr<AudioEncoder> MakeAudioEncoder(
      const Config& config,
      int payload_type,
      absl::optional<AudioCodecPairId> codec_pair_id = absl::nullopt) {
    
    
    return T::MakeAudioEncoder(config, payload_type, codec_pair_id);
  }
};

}  // namespace

rtc::scoped_refptr<AudioEncoderFactory> CreateBuiltinAudioEncoderFactory() {
    
    
  return CreateAudioEncoderFactory<

#if WEBRTC_USE_BUILTIN_OPUS
      AudioEncoderOpus,
#endif

      AudioEncoderIsac, AudioEncoderG722,

#if WEBRTC_USE_BUILTIN_ILBC
      AudioEncoderIlbc,
#endif

      AudioEncoderG711, NotAdvertised<AudioEncoderL16>>();
}

}  // namespace webrtc

The implementation of CreateBuiltinAudioEncoderFactory() is straightforward: it calls the template function CreateAudioEncoderFactory() to create the audio encoder factory, passing the various *Encoder* structs as the template's type parameters.

The definition of the CreateAudioEncoderFactory() template function is located in the file webrtc/src/api/audio_codecs/audio_encoder_factory_template.h:

namespace webrtc {
    
    

namespace audio_encoder_factory_template_impl {
    
    

template <typename... Ts>
struct Helper;

// Base case: 0 template parameters.
template <>
struct Helper<> {
    
    
  static void AppendSupportedEncoders(std::vector<AudioCodecSpec>* specs) {
    
    }
  static absl::optional<AudioCodecInfo> QueryAudioEncoder(
      const SdpAudioFormat& format) {
    
    
    return absl::nullopt;
  }
  static std::unique_ptr<AudioEncoder> MakeAudioEncoder(
      int payload_type,
      const SdpAudioFormat& format,
      absl::optional<AudioCodecPairId> codec_pair_id) {
    
    
    return nullptr;
  }
};

// Inductive case: Called with n + 1 template parameters; calls subroutines
// with n template parameters.
template <typename T, typename... Ts>
struct Helper<T, Ts...> {
    
    
  static void AppendSupportedEncoders(std::vector<AudioCodecSpec>* specs) {
    
    
    T::AppendSupportedEncoders(specs);
    Helper<Ts...>::AppendSupportedEncoders(specs);
  }
  static absl::optional<AudioCodecInfo> QueryAudioEncoder(
      const SdpAudioFormat& format) {
    
    
    auto opt_config = T::SdpToConfig(format);
    static_assert(std::is_same<decltype(opt_config),
                               absl::optional<typename T::Config>>::value,
                  "T::SdpToConfig() must return a value of type "
                  "absl::optional<T::Config>");
    return opt_config ? absl::optional<AudioCodecInfo>(
                            T::QueryAudioEncoder(*opt_config))
                      : Helper<Ts...>::QueryAudioEncoder(format);
  }
  static std::unique_ptr<AudioEncoder> MakeAudioEncoder(
      int payload_type,
      const SdpAudioFormat& format,
      absl::optional<AudioCodecPairId> codec_pair_id) {
    
    
    auto opt_config = T::SdpToConfig(format);
    if (opt_config) {
    
    
      return T::MakeAudioEncoder(*opt_config, payload_type, codec_pair_id);
    } else {
    
    
      return Helper<Ts...>::MakeAudioEncoder(payload_type, format,
                                             codec_pair_id);
    }
  }
};

template <typename... Ts>
class AudioEncoderFactoryT : public AudioEncoderFactory {
    
    
 public:
  std::vector<AudioCodecSpec> GetSupportedEncoders() override {
    
    
    std::vector<AudioCodecSpec> specs;
    Helper<Ts...>::AppendSupportedEncoders(&specs);
    return specs;
  }

  absl::optional<AudioCodecInfo> QueryAudioEncoder(
      const SdpAudioFormat& format) override {
    
    
    return Helper<Ts...>::QueryAudioEncoder(format);
  }

  std::unique_ptr<AudioEncoder> MakeAudioEncoder(
      int payload_type,
      const SdpAudioFormat& format,
      absl::optional<AudioCodecPairId> codec_pair_id) override {
    
    
    return Helper<Ts...>::MakeAudioEncoder(payload_type, format, codec_pair_id);
  }
};

}  // namespace audio_encoder_factory_template_impl

// Make an AudioEncoderFactory that can create instances of the given encoders.
//
// Each encoder type is given as a template argument to the function; it should
// be a struct with the following static member functions:
//
//   // Converts |audio_format| to a ConfigType instance. Returns an empty
//   // optional if |audio_format| doesn't correctly specify an encoder of our
//   // type.
//   absl::optional<ConfigType> SdpToConfig(const SdpAudioFormat& audio_format);
//
//   // Appends zero or more AudioCodecSpecs to the list that will be returned
//   // by AudioEncoderFactory::GetSupportedEncoders().
//   void AppendSupportedEncoders(std::vector<AudioCodecSpec>* specs);
//
//   // Returns information about how this format would be encoded. Used to
//   // implement AudioEncoderFactory::QueryAudioEncoder().
//   AudioCodecInfo QueryAudioEncoder(const ConfigType& config);
//
//   // Creates an AudioEncoder for the specified format. Used to implement
//   // AudioEncoderFactory::MakeAudioEncoder().
//   std::unique_ptr<AudioDecoder> MakeAudioEncoder(
//       const ConfigType& config,
//       int payload_type,
//       absl::optional<AudioCodecPairId> codec_pair_id);
//
// ConfigType should be a type that encapsulates all the settings needed to
// create an AudioEncoder. T::Config (where T is the encoder struct) should
// either be the config type, or an alias for it.
//
// Whenever it tries to do something, the new factory will try each of the
// encoders in the order they were specified in the template argument list,
// stopping at the first one that claims to be able to do the job.
//
// NOTE: This function is still under development and may change without notice.
//
// TODO(kwiberg): Point at CreateBuiltinAudioEncoderFactory() for an example of
// how it is used.
template <typename... Ts>
rtc::scoped_refptr<AudioEncoderFactory> CreateAudioEncoderFactory() {
    
    
  // There's no technical reason we couldn't allow zero template parameters,
  // but such a factory couldn't create any encoders, and callers can do this
  // by mistake by simply forgetting the <> altogether. So we forbid it in
  // order to prevent caller foot-shooting.
  static_assert(sizeof...(Ts) >= 1,
                "Caller must give at least one template parameter");

  return rtc::scoped_refptr<AudioEncoderFactory>(
      new rtc::RefCountedObject<
          audio_encoder_factory_template_impl::AudioEncoderFactoryT<Ts...>>());
}

}  // namespace webrtc
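
With this template machinery, an application could also build a factory for just the codecs it cares about, in the same style as CreateBuiltinAudioEncoderFactory(); a hedged sketch (header paths as in the WebRTC tree):

#include "api/audio_codecs/audio_encoder_factory_template.h"
#include "api/audio_codecs/g711/audio_encoder_g711.h"
#include "api/audio_codecs/opus/audio_encoder_opus.h"

rtc::scoped_refptr<webrtc::AudioEncoderFactory> CreateOpusG711EncoderFactory() {
  // The template arguments are tried in order; the first struct whose
  // SdpToConfig() accepts the SdpAudioFormat wins.
  return webrtc::CreateAudioEncoderFactory<webrtc::AudioEncoderOpus,
                                           webrtc::AudioEncoderG711>();
}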

Comparing WebRTC's implementation with the FakeAudioEncoderFactory above:

  • The list of per-codec encoder factories is not a dynamic list but a static one, constructed via the template mechanism;

  • The template class Helper acts as the traverser and visitor of the codec encoder factory list;

  • The per-codec encoder factory interface does not reuse AudioEncoderFactory; instead another interface is implicitly defined, as explained in the comments of the CreateAudioEncoderFactory() template function. This interface contains the following member functions:
  absl::optional<ConfigType> SdpToConfig(const SdpAudioFormat& audio_format);
  void AppendSupportedEncoders(std::vector<AudioCodecSpec>* specs);
  AudioCodecInfo QueryAudioEncoder(const ConfigType& config);
  std::unique_ptr<AudioDecoder> MakeAudioEncoder(
      const ConfigType& config,
      int payload_type,
      absl::optional<AudioCodecPairId> codec_pair_id);

  • The final implementer of the AudioEncoderFactory interface is AudioEncoderFactoryT, whose interface implementations are mainly completed by Helper.

  • The IsSupported() we added earlier to the CodecAudioEncoderFactory interface is roughly equivalent to WebRTC's SdpToConfig().

You can look at a concrete *Encoder* struct, such as AudioEncoderOpus, whose declaration is located in webrtc/src/api/audio_codecs/opus/audio_encoder_opus.h:

namespace webrtc {
    
    

// Opus encoder API for use as a template parameter to
// CreateAudioEncoderFactory<...>().
//
// NOTE: This struct is still under development and may change without notice.
struct AudioEncoderOpus {
    
    
  using Config = AudioEncoderOpusConfig;
  static absl::optional<AudioEncoderOpusConfig> SdpToConfig(
      const SdpAudioFormat& audio_format);
  static void AppendSupportedEncoders(std::vector<AudioCodecSpec>* specs);
  static AudioCodecInfo QueryAudioEncoder(const AudioEncoderOpusConfig& config);
  static std::unique_ptr<AudioEncoder> MakeAudioEncoder(
      const AudioEncoderOpusConfig& config,
      int payload_type,
      absl::optional<AudioCodecPairId> codec_pair_id = absl::nullopt);
};

}  // namespace webrtc
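To see how these static functions fit together, here is a hedged sketch of mine (not from the WebRTC sources; the SDP parameters and payload type are illustrative) that walks the SdpToConfig() → QueryAudioEncoder() → MakeAudioEncoder() chain directly on the struct:

#include "api/audio_codecs/opus/audio_encoder_opus.h"

std::unique_ptr<webrtc::AudioEncoder> MakeOpusEncoderDirectly() {
  // Translate the SDP description into an encoder config; nullopt means this
  // is not an Opus format that the struct can handle.
  webrtc::SdpAudioFormat format("opus", 48000, 2, {{"useinbandfec", "1"}});
  absl::optional<webrtc::AudioEncoderOpusConfig> config =
      webrtc::AudioEncoderOpus::SdpToConfig(format);
  if (!config) {
    return nullptr;
  }
  // Query static codec capabilities (sample rate, bitrate range, etc.).
  webrtc::AudioCodecInfo info =
      webrtc::AudioEncoderOpus::QueryAudioEncoder(*config);
  (void)info;
  // Finally create the encoder; 111 is an illustrative dynamic payload type.
  return webrtc::AudioEncoderOpus::MakeAudioEncoder(*config,
                                                    /*payload_type=*/111);
}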

AudioEncoderOpusThe implementation is as follows (at webrtc/src/api/audio_codecs/opus/audio_encoder_opus.cc):

namespace webrtc {

absl::optional<AudioEncoderOpusConfig> AudioEncoderOpus::SdpToConfig(
    const SdpAudioFormat& format) {
  return AudioEncoderOpusImpl::SdpToConfig(format);
}

void AudioEncoderOpus::AppendSupportedEncoders(
    std::vector<AudioCodecSpec>* specs) {
  AudioEncoderOpusImpl::AppendSupportedEncoders(specs);
}

AudioCodecInfo AudioEncoderOpus::QueryAudioEncoder(
    const AudioEncoderOpusConfig& config) {
  return AudioEncoderOpusImpl::QueryAudioEncoder(config);
}

std::unique_ptr<AudioEncoder> AudioEncoderOpus::MakeAudioEncoder(
    const AudioEncoderOpusConfig& config,
    int payload_type,
    absl::optional<AudioCodecPairId> /*codec_pair_id*/) {
  return AudioEncoderOpusImpl::MakeAudioEncoder(config, payload_type);
}

}  // namespace webrtc

Although AudioEncoderOpus has only the word encoder in its name and no factory, it is a genuine encoder factory.

The implementation of WebRTC's decoder factory is very similar to that of its encoder factory.

The AudioDecoderFactory interface is defined in the file webrtc/src/api/audio_codecs/audio_decoder_factory.h:

namespace webrtc {

// A factory that creates AudioDecoders.
// NOTE: This class is still under development and may change without notice.
class AudioDecoderFactory : public rtc::RefCountInterface {
 public:
  virtual std::vector<AudioCodecSpec> GetSupportedDecoders() = 0;

  virtual bool IsSupportedDecoder(const SdpAudioFormat& format) = 0;

  // Create a new decoder instance. The `codec_pair_id` argument is used to
  // link encoders and decoders that talk to the same remote entity; if a
  // MakeAudioEncoder() and a MakeAudioDecoder() call receive non-null IDs that
  // compare equal, the factory implementations may assume that the encoder and
  // decoder form a pair.
  //
  // Note: Implementations need to be robust against combinations other than
  // one encoder, one decoder getting the same ID; such decoders must still
  // work.
  virtual std::unique_ptr<AudioDecoder> MakeAudioDecoder(
      const SdpAudioFormat& format,
      absl::optional<AudioCodecPairId> codec_pair_id) = 0;
};

}  // namespace webrtc
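The codec_pair_id is easiest to understand with a small sketch of my own (not WebRTC's; the format and payload type are illustrative): the same ID is handed to both MakeAudioEncoder() and MakeAudioDecoder(), so a factory that cares about pairing may treat the two instances as talking to the same remote entity.

// Assumed to be called with already-created encoder and decoder factories.
void CreateCodecPair(
    const rtc::scoped_refptr<webrtc::AudioEncoderFactory>& encoder_factory,
    const rtc::scoped_refptr<webrtc::AudioDecoderFactory>& decoder_factory) {
  // One fresh ID shared by both sides of the same connection.
  absl::optional<webrtc::AudioCodecPairId> pair_id =
      webrtc::AudioCodecPairId::Create();

  webrtc::SdpAudioFormat opus("opus", 48000, 2);
  std::unique_ptr<webrtc::AudioEncoder> encoder =
      encoder_factory->MakeAudioEncoder(/*payload_type=*/111, opus, pair_id);
  std::unique_ptr<webrtc::AudioDecoder> decoder =
      decoder_factory->MakeAudioDecoder(opus, pair_id);
}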

The factory method CreateBuiltinAudioDecoderFactory() is declared in the file webrtc/src/api/audio_codecs/builtin_audio_decoder_factory.h:

namespace webrtc {

// Creates a new factory that can create the built-in types of audio decoders.
// NOTE: This function is still under development and may change without notice.
rtc::scoped_refptr<AudioDecoderFactory> CreateBuiltinAudioDecoderFactory();

}  // namespace webrtc

The definition of CreateBuiltinAudioDecoderFactory() in the file webrtc/src/api/audio_codecs/builtin_audio_decoder_factory.cc is as follows:

namespace webrtc {

namespace {

// Modify an audio decoder to not advertise support for anything.
template <typename T>
struct NotAdvertised {
  using Config = typename T::Config;
  static absl::optional<Config> SdpToConfig(
      const SdpAudioFormat& audio_format) {
    return T::SdpToConfig(audio_format);
  }
  static void AppendSupportedDecoders(std::vector<AudioCodecSpec>* specs) {
    // Don't advertise support for anything.
  }
  static std::unique_ptr<AudioDecoder> MakeAudioDecoder(
      const Config& config,
      absl::optional<AudioCodecPairId> codec_pair_id = absl::nullopt) {
    return T::MakeAudioDecoder(config, codec_pair_id);
  }
};

}  // namespace

rtc::scoped_refptr<AudioDecoderFactory> CreateBuiltinAudioDecoderFactory() {
  return CreateAudioDecoderFactory<

#if WEBRTC_USE_BUILTIN_OPUS
      AudioDecoderOpus,
#endif

      AudioDecoderIsac, AudioDecoderG722,

#if WEBRTC_USE_BUILTIN_ILBC
      AudioDecoderIlbc,
#endif

      AudioDecoderG711, NotAdvertised<AudioDecoderL16>>();
}

}  // namespace webrtc
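The effect of NotAdvertised<AudioDecoderL16> is that the built-in factory can still create an L16 (pcm16b) decoder when asked for one explicitly, even though it never appears in GetSupportedDecoders(). A small sketch of my own (assuming AudioDecoderL16::SdpToConfig() accepts the "L16" name at this clock rate, which may vary by WebRTC version):

#include "api/audio_codecs/builtin_audio_decoder_factory.h"

void DemoNotAdvertisedL16() {
  rtc::scoped_refptr<webrtc::AudioDecoderFactory> factory =
      webrtc::CreateBuiltinAudioDecoderFactory();

  webrtc::SdpAudioFormat l16("L16", 16000, 1);
  // L16 is not listed by factory->GetSupportedDecoders(), because
  // NotAdvertised<> suppressed AppendSupportedDecoders() ...
  // ... but SdpToConfig() is still forwarded, so these calls succeed:
  bool supported = factory->IsSupportedDecoder(l16);
  std::unique_ptr<webrtc::AudioDecoder> decoder =
      factory->MakeAudioDecoder(l16, absl::nullopt);
  (void)supported;
  (void)decoder;
}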

The CreateAudioDecoderFactory() template function used above is defined at webrtc/src/api/audio_codecs/audio_decoder_factory_template.h:

namespace webrtc {

namespace audio_decoder_factory_template_impl {

template <typename... Ts>
struct Helper;

// Base case: 0 template parameters.
template <>
struct Helper<> {
  static void AppendSupportedDecoders(std::vector<AudioCodecSpec>* specs) {}
  static bool IsSupportedDecoder(const SdpAudioFormat& format) { return false; }
  static std::unique_ptr<AudioDecoder> MakeAudioDecoder(
      const SdpAudioFormat& format,
      absl::optional<AudioCodecPairId> codec_pair_id) {
    return nullptr;
  }
};

// Inductive case: Called with n + 1 template parameters; calls subroutines
// with n template parameters.
template <typename T, typename... Ts>
struct Helper<T, Ts...> {
  static void AppendSupportedDecoders(std::vector<AudioCodecSpec>* specs) {
    T::AppendSupportedDecoders(specs);
    Helper<Ts...>::AppendSupportedDecoders(specs);
  }
  static bool IsSupportedDecoder(const SdpAudioFormat& format) {
    auto opt_config = T::SdpToConfig(format);
    static_assert(std::is_same<decltype(opt_config),
                               absl::optional<typename T::Config>>::value,
                  "T::SdpToConfig() must return a value of type "
                  "absl::optional<T::Config>");
    return opt_config ? true : Helper<Ts...>::IsSupportedDecoder(format);
  }
  static std::unique_ptr<AudioDecoder> MakeAudioDecoder(
      const SdpAudioFormat& format,
      absl::optional<AudioCodecPairId> codec_pair_id) {
    auto opt_config = T::SdpToConfig(format);
    return opt_config ? T::MakeAudioDecoder(*opt_config, codec_pair_id)
                      : Helper<Ts...>::MakeAudioDecoder(format, codec_pair_id);
  }
};

template <typename... Ts>
class AudioDecoderFactoryT : public AudioDecoderFactory {
 public:
  std::vector<AudioCodecSpec> GetSupportedDecoders() override {
    std::vector<AudioCodecSpec> specs;
    Helper<Ts...>::AppendSupportedDecoders(&specs);
    return specs;
  }

  bool IsSupportedDecoder(const SdpAudioFormat& format) override {
    return Helper<Ts...>::IsSupportedDecoder(format);
  }

  std::unique_ptr<AudioDecoder> MakeAudioDecoder(
      const SdpAudioFormat& format,
      absl::optional<AudioCodecPairId> codec_pair_id) override {
    return Helper<Ts...>::MakeAudioDecoder(format, codec_pair_id);
  }
};

}  // namespace audio_decoder_factory_template_impl

// Make an AudioDecoderFactory that can create instances of the given decoders.
//
// Each decoder type is given as a template argument to the function; it should
// be a struct with the following static member functions:
//
//   // Converts |audio_format| to a ConfigType instance. Returns an empty
//   // optional if |audio_format| doesn't correctly specify a decoder of our
//   // type.
//   absl::optional<ConfigType> SdpToConfig(const SdpAudioFormat& audio_format);
//
//   // Appends zero or more AudioCodecSpecs to the list that will be returned
//   // by AudioDecoderFactory::GetSupportedDecoders().
//   void AppendSupportedDecoders(std::vector<AudioCodecSpec>* specs);
//
//   // Creates an AudioDecoder for the specified format. Used to implement
//   // AudioDecoderFactory::MakeAudioDecoder().
//   std::unique_ptr<AudioDecoder> MakeAudioDecoder(
//       const ConfigType& config,
//       absl::optional<AudioCodecPairId> codec_pair_id);
//
// ConfigType should be a type that encapsulates all the settings needed to
// create an AudioDecoder. T::Config (where T is the decoder struct) should
// either be the config type, or an alias for it.
//
// Whenever it tries to do something, the new factory will try each of the
// decoder types in the order they were specified in the template argument
// list, stopping at the first one that claims to be able to do the job.
//
// NOTE: This function is still under development and may change without notice.
//
// TODO(kwiberg): Point at CreateBuiltinAudioDecoderFactory() for an example of
// how it is used.
template <typename... Ts>
rtc::scoped_refptr<AudioDecoderFactory> CreateAudioDecoderFactory() {
  // There's no technical reason we couldn't allow zero template parameters,
  // but such a factory couldn't create any decoders, and callers can do this
  // by mistake by simply forgetting the <> altogether. So we forbid it in
  // order to prevent caller foot-shooting.
  static_assert(sizeof...(Ts) >= 1,
                "Caller must give at least one template parameter");

  return rtc::scoped_refptr<AudioDecoderFactory>(
      new rtc::RefCountedObject<
          audio_decoder_factory_template_impl::AudioDecoderFactoryT<Ts...>>());
}

}  // namespace webrtc
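To make the Helper recursion concrete, the following hedged sketch of mine (it assumes the built-in Opus and G.711 decoder headers) builds a two-codec factory. Helper<AudioDecoderOpus, AudioDecoderG711> asks AudioDecoderOpus::SdpToConfig() first, and only falls through to AudioDecoderG711, and finally to the empty base case, when that returns nullopt.

#include "api/audio_codecs/audio_decoder_factory_template.h"
#include "api/audio_codecs/g711/audio_decoder_g711.h"
#include "api/audio_codecs/opus/audio_decoder_opus.h"

std::unique_ptr<webrtc::AudioDecoder> MakePcmaDecoder() {
  rtc::scoped_refptr<webrtc::AudioDecoderFactory> factory =
      webrtc::CreateAudioDecoderFactory<webrtc::AudioDecoderOpus,
                                        webrtc::AudioDecoderG711>();
  // "PCMA" does not match the Opus entry, so the G.711 entry creates it.
  return factory->MakeAudioDecoder(webrtc::SdpAudioFormat("PCMA", 8000, 1),
                                   absl::nullopt);
}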

As a concrete implementation, the declaration of AudioDecoderOpus is located at webrtc/src/api/audio_codecs/opus/audio_decoder_opus.h:

namespace webrtc {

// Opus decoder API for use as a template parameter to
// CreateAudioDecoderFactory<...>().
//
// NOTE: This struct is still under development and may change without notice.
struct AudioDecoderOpus {
  struct Config {
    int num_channels;
  };
  static absl::optional<Config> SdpToConfig(const SdpAudioFormat& audio_format);
  static void AppendSupportedDecoders(std::vector<AudioCodecSpec>* specs);
  static std::unique_ptr<AudioDecoder> MakeAudioDecoder(
      Config config,
      absl::optional<AudioCodecPairId> codec_pair_id = absl::nullopt);
};

}  // namespace webrtc

AudioDecoderOpus is defined at webrtc/src/api/audio_codecs/opus/audio_decoder_opus.cc; note that the SdpToConfig() shown here accepts the opus, opusswb and opusfb SDP names at different clock rates:

namespace webrtc {

absl::optional<AudioDecoderOpus::Config> AudioDecoderOpus::SdpToConfig(
    const SdpAudioFormat& format) {
  const auto num_channels = [&]() -> absl::optional<int> {
    auto stereo = format.parameters.find("stereo");
    if (stereo != format.parameters.end()) {
      if (stereo->second == "0") {
        return 1;
      } else if (stereo->second == "1") {
        return 2;
      } else {
        return absl::nullopt;  // Bad stereo parameter.
      }
    }
    return 1;  // Default to mono.
  }();

  if (STR_CASE_CMP(format.name.c_str(), "opus") == 0 &&
      format.clockrate_hz == 16000 && format.num_channels == 1 &&
      num_channels) {
    return Config{static_cast<int>(format.num_channels)};
  } else if (STR_CASE_CMP(format.name.c_str(), "opusswb") == 0 &&
             format.clockrate_hz == 32000 && format.num_channels == 1 &&
             num_channels) {
    return Config{static_cast<int>(format.num_channels)};
  } else if (STR_CASE_CMP(format.name.c_str(), "opusfb") == 0 &&
             format.clockrate_hz == 48000 && format.num_channels == 2 &&
             num_channels) {
    return Config{static_cast<int>(format.num_channels)};
  } else if (STR_CASE_CMP(format.name.c_str(), "opusfb") == 0 &&
             format.clockrate_hz == 48000 && format.num_channels == 1 &&
             num_channels) {
    return Config{static_cast<int>(format.num_channels)};
  } else {
    return absl::nullopt;
  }
}

void AudioDecoderOpus::AppendSupportedDecoders(
    std::vector<AudioCodecSpec>* specs) {
  AudioCodecInfo opus_info{48000, 1, 64000, 6000, 510000};
  opus_info.allow_comfort_noise = false;
  opus_info.supports_network_adaption = true;
  SdpAudioFormat opus_format(
      {"opus", 48000, 2, {{"minptime", "10"}, {"useinbandfec", "1"}}});
  specs->push_back({std::move(opus_format), std::move(opus_info)});
}

std::unique_ptr<AudioDecoder> AudioDecoderOpus::MakeAudioDecoder(
    Config config,
    absl::optional<AudioCodecPairId> /*codec_pair_id*/) {
  return absl::make_unique<AudioDecoderOpusImpl>(config.num_channels);
}

}  // namespace webrtc

The built-in audio encoder factory follows almost exactly the same implementation routine as the built-in audio decoder factory shown above, so it is not repeated here.

Origin blog.csdn.net/shichaog/article/details/128881021