An Analysis of the WebRTC Transmission Channel Establishment Process

Foreword

WebRTC is a free, open-source project widely used for real-time audio and video communication. It provides browsers and mobile devices with real-time communication (RTC) capabilities through a simple API. To serve this purpose well, the WebRTC components are still being optimized; the official team's goal is to enable feature-rich, high-quality communication across browsers, mobile devices, and IoT devices through a common set of protocols. During real-time audio and video communication, WebRTC relies on a dedicated multimedia transmission channel. In this article, let's take a look at how this transmission channel is established.

Main text

Readers familiar with WebRTC will know the concept of a PeerConnection. Indeed, WebRTC relies on the PeerConnection channel to transmit multimedia data. Let's go into the details below.

1. Global initialization

Before formally creating a PeerConnection, some global modules need to be initialized and feature switches set, such as enabling the FlexFEC video error-correction mechanism, enabling the Intel VP8 hardware encoder, disabling WebRTC's automatic gain control, and enabling log printing. The walkthrough below uses an Android device and WebRTC version 76 as an example. The reference code is as follows:

			PeerConnectionFactory.initialize(
          PeerConnectionFactory.InitializationOptions.builder(appContext)
              .setFieldTrials(fieldTrials)
              .setEnableInternalTracer(true)
              .createInitializationOptions());
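
The fieldTrials string passed above is a concatenation of feature switches. Below is a minimal sketch of how it might be assembled, using the trial strings from the M76-era AppRTC demo; exact trial names can change between WebRTC versions, so treat these constants as assumptions to verify against your build:

    // Field trial strings as used in the AppRTC demo (M76 era); verify against your WebRTC version.
    final String VIDEO_FLEXFEC_FIELDTRIAL =
        "WebRTC-FlexFEC-03-Advertised/Enabled/WebRTC-FlexFEC-03/Enabled/";
    final String VIDEO_VP8_INTEL_HW_ENCODER_FIELDTRIAL = "WebRTC-IntelVP8/Enabled/";
    final String DISABLE_WEBRTC_AGC_FIELDTRIAL =
        "WebRTC-Audio-MinimizeResamplingOnMobile/Enabled/";

    String fieldTrials = "";
    if (peerConnectionParameters.videoFlexfecEnabled) {
      fieldTrials += VIDEO_FLEXFEC_FIELDTRIAL; // FlexFEC video error correction
    }
    fieldTrials += VIDEO_VP8_INTEL_HW_ENCODER_FIELDTRIAL; // Intel VP8 hardware encoder
    if (peerConnectionParameters.disableWebRtcAGCAndHPF) {
      fieldTrials += DISABLE_WEBRTC_AGC_FIELDTRIAL; // disable WebRTC automatic gain control
    }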

After completing the initialization of the global module, the PeerConnection can be created.

2. PeerConnectionFactory

If you look carefully, you will notice that the global initialization above already uses a method of this class. As the name suggests, PeerConnectionFactory is a factory class, and its instance plays an important role in the subsequent creation of the video encoder and decoder.

When an instance of the PeerConnectionFactory factory class is created, much of the setup work for the PeerConnection channel, audio, and video is done. Let's introduce each part separately; this is very helpful for understanding what the PeerConnectionFactory does.

1. PeerConnection channel

A global PeerConnection parameter determines whether to capture the low-level trace log related to the PeerConnection. The reference code is as follows:

		if (peerConnectionParameters.tracing) {
      PeerConnectionFactory.startInternalTracingCapture(
          Environment.getExternalStorageDirectory().getAbsolutePath() + File.separator
          + "webrtc-trace.txt");
    }

2. Audio Settings

WebRTC does not support AAC, and its default audio codec is Opus; if the demo is configured with ISAC as the audio codec, the SDP is adjusted to prefer ISAC instead. The full name of ISAC is Internet Speech Audio Codec, developed by GIPS (later acquired by Google). It is a free, open-source audio codec that is well suited to VoIP scenarios.

WebRTC also provides an interface for saving raw audio data, which can be used to diagnose audio capture problems. If the audio we send has noise or distortion, we can first check whether the captured raw audio data is already faulty; if the raw data is fine, we can then consider whether the problem is introduced by encoding, transmission, decoding, or playback. After all, network packet loss is one of the most common causes in practice. This interface helps us verify whether the captured audio data is correct.

Although the raw audio data can be saved to a specified file through this setting, the setting does not take effect if OpenSL ES has already been started at the native layer. At the same time, the audio capture and playback modules are also set up here; they drive the microphone and speaker of the Android device. The reference code is as follows:

		preferIsac = peerConnectionParameters.audioCodec != null
        && peerConnectionParameters.audioCodec.equals(AUDIO_CODEC_ISAC);

    if (peerConnectionParameters.saveInputAudioToFile) {
      if (!peerConnectionParameters.useOpenSLES) {
        Log.d(TAG, "Enable recording of microphone input audio to file");
        saveRecordedAudioToFile = new RecordedAudioToFileController(executor);
      } else {
        Log.e(TAG, "Recording of input audio is not supported for OpenSL ES");
      }
    }

    final AudioDeviceModule adm = createJavaAudioDevice();
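
The createJavaAudioDevice() helper is not shown in the snippet above. A minimal sketch of what it can look like, built on org.webrtc.audio.JavaAudioDeviceModule; wiring saveRecordedAudioToFile as the samples-ready callback is an assumption based on how the demo records raw input audio:

    private AudioDeviceModule createJavaAudioDevice() {
      return JavaAudioDeviceModule.builder(appContext)
          // If raw-audio saving is enabled, captured samples are also handed to the recorder.
          .setSamplesReadyCallback(saveRecordedAudioToFile)
          // Prefer the device's hardware echo canceller and noise suppressor when available.
          .setUseHardwareAcousticEchoCanceler(!peerConnectionParameters.disableBuiltInAEC)
          .setUseHardwareNoiseSuppressor(!peerConnectionParameters.disableBuiltInNS)
          .createAudioDeviceModule();
    }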

3. Video Settings

Next, set the video codec type. Generally, a (possibly modified) WebRTC build supports H264, VP8, and VP9; software H264 encoding is not included in default builds, just as the audio codec AAC is not supported. Software versus hardware encoding and decoding is also selected here; software encoding is normally paired with software decoding, and hardware encoding with hardware decoding. The reference code is as follows:

		final boolean enableH264HighProfile =
        VIDEO_CODEC_H264_HIGH.equals(peerConnectionParameters.videoCodec);
    final VideoEncoderFactory encoderFactory;
    final VideoDecoderFactory decoderFactory;

    if (peerConnectionParameters.videoCodecHwAcceleration) {
      encoderFactory = new DefaultVideoEncoderFactory(
          rootEglBase.getEglBaseContext(), true /* enableIntelVp8Encoder */, enableH264HighProfile);
      decoderFactory = new DefaultVideoDecoderFactory(rootEglBase.getEglBaseContext());
    } else {
      encoderFactory = new SoftwareVideoEncoderFactory();
      decoderFactory = new SoftwareVideoDecoderFactory();
    }

    factory = PeerConnectionFactory.builder()
                  .setOptions(options)
                  .setAudioDeviceModule(adm)
                  .setVideoEncoderFactory(encoderFactory)
                  .setVideoDecoderFactory(decoderFactory)
                  .createPeerConnectionFactory();
    Log.d(TAG, "Peer connection factory created.");
    adm.release();

3. PeerConnection

PeerConnection can be understood as WebRTC's multimedia data transmission channel, and it plays an important role in the entire real-time audio and video communication process. It is also one of the three interfaces that WebRTC encapsulates for external use.


Creating the PeerConnection instance depends on the PeerConnectionFactory instance mentioned above. Let's look at it in detail. The RTCConfiguration class holds the configuration parameters related to the PeerConnection, including the ICE servers, ICE-TCP, the bundle policy, the RTCP multiplexing policy, ECDSA encryption, DTLS encryption, SDP semantics, and so on.

		PeerConnection.RTCConfiguration rtcConfig =
        new PeerConnection.RTCConfiguration(signalingParameters.iceServers);
 
    rtcConfig.tcpCandidatePolicy = PeerConnection.TcpCandidatePolicy.DISABLED;
    rtcConfig.bundlePolicy = PeerConnection.BundlePolicy.MAXBUNDLE;
    rtcConfig.rtcpMuxPolicy = PeerConnection.RtcpMuxPolicy.REQUIRE;
    rtcConfig.continualGatheringPolicy = PeerConnection.ContinualGatheringPolicy.GATHER_CONTINUALLY;
    rtcConfig.keyType = PeerConnection.KeyType.ECDSA;
    rtcConfig.enableDtlsSrtp = !peerConnectionParameters.loopback;
    rtcConfig.sdpSemantics = PeerConnection.SdpSemantics.UNIFIED_PLAN;

    peerConnection = factory.createPeerConnection(rtcConfig, pcObserver);
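
The pcObserver passed to createPeerConnection() implements PeerConnection.Observer, through which the native layer reports channel events. A minimal sketch of the callbacks most relevant to channel establishment follows; the comment bodies are placeholders, not the demo's actual logic, and the remaining callbacks are left empty:

    private final PeerConnection.Observer pcObserver = new PeerConnection.Observer() {
      @Override public void onIceCandidate(IceCandidate candidate) {
        // A local ICE candidate was gathered; forward it to the peer over signaling.
      }
      @Override public void onIceConnectionChange(PeerConnection.IceConnectionState state) {
        // CONNECTED means the transport is usable; FAILED may require an ICE restart.
      }
      @Override public void onDataChannel(DataChannel dc) {
        // Fired when the remote side opens a DataChannel that was not pre-negotiated.
      }
      // The remaining callbacks are required by the interface but left empty in this sketch.
      @Override public void onSignalingChange(PeerConnection.SignalingState state) {}
      @Override public void onIceConnectionReceivingChange(boolean receiving) {}
      @Override public void onIceGatheringChange(PeerConnection.IceGatheringState state) {}
      @Override public void onIceCandidatesRemoved(IceCandidate[] candidates) {}
      @Override public void onAddStream(MediaStream stream) {}
      @Override public void onRemoveStream(MediaStream stream) {}
      @Override public void onRenegotiationNeeded() {}
      @Override public void onAddTrack(RtpReceiver receiver, MediaStream[] streams) {}
    };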

In addition, a DataChannel can be enabled dynamically as needed. The DataChannel is a very important data channel; some vendors even use it as a signaling transmission channel. The reference code is as follows:

		if (dataChannelEnabled) {
      DataChannel.Init init = new DataChannel.Init();
      init.ordered = peerConnectionParameters.dataChannelParameters.ordered;
      init.negotiated = peerConnectionParameters.dataChannelParameters.negotiated;
      init.maxRetransmits = peerConnectionParameters.dataChannelParameters.maxRetransmits;
      init.maxRetransmitTimeMs = peerConnectionParameters.dataChannelParameters.maxRetransmitTimeMs;
      init.id = peerConnectionParameters.dataChannelParameters.id;
      init.protocol = peerConnectionParameters.dataChannelParameters.protocol;
      dataChannel = peerConnection.createDataChannel("ApprtcDemo data", init);
    }
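
Once the DataChannel reaches the OPEN state, arbitrary data can flow through it. A minimal sketch of observing incoming messages and sending a UTF-8 text frame (the observer wiring is an assumption, not the demo's exact code; it uses java.nio.ByteBuffer and java.nio.charset.StandardCharsets):

    dataChannel.registerObserver(new DataChannel.Observer() {
      @Override public void onBufferedAmountChange(long previousAmount) {}
      @Override public void onStateChange() {
        Log.d(TAG, "DataChannel state: " + dataChannel.state());
      }
      @Override public void onMessage(DataChannel.Buffer buffer) {
        // Incoming payloads arrive as a ByteBuffer; buffer.binary distinguishes text from binary.
        byte[] bytes = new byte[buffer.data.remaining()];
        buffer.data.get(bytes);
        Log.d(TAG, "Received: " + new String(bytes, StandardCharsets.UTF_8));
      }
    });

    // Send a text frame (binary = false).
    ByteBuffer payload = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
    dataChannel.send(new DataChannel.Buffer(payload, false));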

4. Create audio and video streams

1. Create an audio stream

After the PeerConnection is created, the audio track and audio source are created immediately. Creating the audio track depends on the audio source, but the PeerConnection acts directly on the audio track, so the addTrack() method is called to bind the audio track to the PeerConnection object, as shown in the sketch after the snippet below. The reference code is as follows:

		audioSource = factory.createAudioSource(audioConstraints);
    localAudioTrack = factory.createAudioTrack(AUDIO_TRACK_ID, audioSource);
    localAudioTrack.setEnabled(enableAudio);
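
A minimal sketch of that addTrack() binding, assuming the media stream label "ARDAMS" used in the demo:

    // Bind the local audio track to the PeerConnection under a media stream id.
    List<String> mediaStreamLabels = Collections.singletonList("ARDAMS");
    peerConnection.addTrack(localAudioTrack, mediaStreamLabels);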

2. Create a video stream

After the PeerConnection is created, the video track and video source are also created immediately. Creating the video track depends on the video source, but the PeerConnection acts directly on the video track, so addTrack() is likewise called to bind the video track to the PeerConnection object.

However, creating the video track and video source differs from audio, because video also needs to support local preview: the VideoCapturer instance must be initialized with the video source's capturer observer before it starts collecting image data from the Android device's camera. In addition, the created video track binds a VideoSink instance by calling addSink(); that instance is passed in as a parameter when the PeerConnection client is created. The reference code is as follows:

		surfaceTextureHelper =
        SurfaceTextureHelper.create("CaptureThread", rootEglBase.getEglBaseContext());
    videoSource = factory.createVideoSource(capturer.isScreencast());
    capturer.initialize(surfaceTextureHelper, appContext, videoSource.getCapturerObserver());
    capturer.startCapture(videoWidth, videoHeight, videoFps);

    localVideoTrack = factory.createVideoTrack(VIDEO_TRACK_ID, videoSource);
    localVideoTrack.setEnabled(renderVideo);
    localVideoTrack.addSink(localRender);
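
The video track is then bound to the PeerConnection in the same way; a minimal sketch reusing the stream label from the audio sketch above:

    // Bind the local video track to the PeerConnection under the same media stream id.
    peerConnection.addTrack(localVideoTrack, mediaStreamLabels);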

Therefore, the flow of the local camera's video data is very clear: it is first collected by the VideoCapturer instance, then flows to the video source, then to the video track, and finally to the PeerConnection instance, which completes the transmission of the multimedia data.

5. SDP negotiation

The local audio and video data are now ready, so the remaining work is to negotiate with the remote end which multimedia data to transmit and how to transmit it. This naturally involves WebRTC's classic SDP negotiation mechanism. SDP (Session Description Protocol) is a session description protocol; WebRTC negotiates by exchanging SDP information between the local and remote ends to create a session that meets the call requirements and ultimately determines the content carried by the transmission channel. SDP negotiation is the foundation of WebRTC audio and video communication and plays an important role in the entire interaction process.

1. Create an Offer

An Offer is the collection of SDP information used to describe local multimedia capabilities in WebRTC. The actual Offer-creation logic lives in the native layer; the Java layer only exposes the createOffer() method. Readers familiar with Android system development will know JNI well: as the bridge between the Java layer and the native layer, the JNI module makes it easy to mix the Java programming language with other languages. The reference code is as follows:

  public void createOffer(SdpObserver observer, MediaConstraints constraints) {
    nativeCreateOffer(observer, constraints);
  }
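
At the call site, the SdpObserver's onCreateSuccess() callback delivers the generated SDP, which is then applied as the local description before being sent to the peer. A minimal sketch of that pattern (error handling reduced to logging; this follows the common usage rather than the demo's exact code):

    peerConnection.createOffer(new SdpObserver() {
      @Override public void onCreateSuccess(SessionDescription sdp) {
        // Apply the freshly created Offer as the local description,
        // then send it to the remote peer over the signaling channel.
        peerConnection.setLocalDescription(this, sdp);
      }
      @Override public void onSetSuccess() {}
      @Override public void onCreateFailure(String error) {
        Log.e(TAG, "createOffer failed: " + error);
      }
      @Override public void onSetFailure(String error) {
        Log.e(TAG, "setLocalDescription failed: " + error);
      }
    }, sdpMediaConstraints);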

The SDP information in the Offer will contain local multimedia capabilities. The representation format of the media description information is as follows:

m=<media type> <port> <protocol> <format type>

A session description may contain multiple media descriptions, such as audio, video, and text. Each media description starts with an "m=" line and ends at the next "m=" line, or at the end of the entire SDP session description.

Among them, <media type> currently defines the types video, audio, text, application, and message, and more may be added in the future. <port> is the port on which the media stream is sent; the meaning of this field depends on the "c=" line and on the <protocol> field.

For hierarchically encoded streams transmitted to a unicast address, multiple ports are needed to distinguish the sub-streams. The specific format is as follows:

m=<media type> <port>/<number of ports> <protocol> <format type>

In this case, the meaning of the ports depends on the transport protocol. For RTP, by default only even-numbered ports are used to send RTP data, and the port number plus one is used to send the corresponding RTCP data. If multiple addresses appear in the "c=" line and, correspondingly, multiple ports appear in the "m=" line, the addresses and ports are taken to be in one-to-one correspondence.

<protocol> is the transport protocol, and it is related to the "c=" line; for example, "IP4" in the "c=" line indicates that the protocol runs over IPv4.


If the <protocol> field is "RTP/AVP" or "RTP/SAVP", the media format indicates the number of the RTP payload format. When a linked list appears, it means that all media formats in the linked list can be used for the current media track, but the first media format is the default format. The "a= remap:" attribute is used to dynamically match the media format number and media format. The "a=fmtp:" attribute may be used to describe specific parameters of the media format.

If the <protocol> field is "udp", the media format specifies the media type as audio, video, text, application, or message; these media types also define the corresponding UDP packet formats.

     m=  (media name and transport address)
     i=* (media title)
     c=* (connection information -- optional if included at session level)
     b=* (zero or more bandwidth information lines)
     k=* (encryption key)
     a=* (zero or more media attribute lines)

Let's take a look at what SDP looks like through an example (truncated at the end):

v=0
o=- 7644049451648220451 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS ARDAMS
m=audio 44585 UDP/TLS/RTP/SAVPF 111 103 104 9 102 0 8 106 105 13 110 112 113 126
c=IN IP4 172.31.200.23
a=rtcp:9 IN IP4 0.0.0.0
a=candidate:2586587190 1 udp 2122260223 172.31.200.23 44585 typ host generation 0 network-id 3 network-cost 10
a=candidate:559267639 1 udp 2122202367 ::1 45075 typ host generation 0 network-id 2
a=candidate:1510613869 1 udp 2122129151 127.0.0.1 34137 typ host generation 0 network-id 1
a=ice-ufrag:Rcuq
a=ice-pwd:OxDSE1pHNWhgcdHaX/3cYLE1
a=ice-options:trickle renomination
a=fingerprint:sha-256 49:B6:A0:48:F8:EB:82:1D:FB:DE:B9:22:33:0E:91:EE:60:34:73:45:2B:C3:92:3A:0B:0D:FF:B1:EF:AE:8E:29
a=setup:actpass
a=mid:audio
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=sendrecv
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
...

2. Set up the Answer

The SDP in the Answer is actually the peer's SDP description, sent by the remote client; it corresponds to the local Offer SDP, one describing the local side and the other the remote side. We still need to set the remote SDP on the local end, which is an indispensable step in the SDP negotiation process. The meaning of the individual SDP fields was introduced above, so it is not repeated here. The reference code is as follows:

				peerConnectionClient.setRemoteDescription(sdp);
        if (!signalingParameters.initiator) {
          logAndToast("Creating ANSWER...");
          peerConnectionClient.createAnswer();
        }

Note that setting the remote SDP description happens in the onRemoteDescription() callback. Looking at the code above, you will also find logic that creates the Answer description. This is because the relationship between Offer and Answer is relative: the initiator of the PeerConnection channel creates the Offer and sends it to the peer; the peer sets it as its remote description, creates an Answer, sets that as its own local description, and sends it back, where the initiator in turn sets it as its remote description.

6. Transmission channel establishment

The establishment of the WebRTC transmission channel also depends on setting the candidates. After the initiator creates and sets its local SDP, it starts gathering ICE candidates. Each gathered candidate is reported to the upper layer through the onIceCandidate() callback and then sent to the peer by calling the sendLocalIceCandidate() method. When the peer receives a candidate, it calls the addRemoteIceCandidate() method to bind the candidate to the corresponding PeerConnection object. The local end likewise binds the remote end's candidates, and the two ends finally communicate through a matched pair of candidates. During this process, SDP renegotiation may also be required for various reasons. A minimal sketch of the two directions of this exchange is shown below.
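
In the sketch, sendLocalIceCandidate() stands in for whatever signaling transport the application uses; the method names follow the AppRTC demo:

    // Local side: the PeerConnection.Observer callback reports each gathered candidate.
    @Override
    public void onIceCandidate(final IceCandidate candidate) {
      sendLocalIceCandidate(candidate); // deliver to the peer via the signaling channel
    }

    // Receiving side: a candidate arriving over signaling is bound to the PeerConnection.
    public void addRemoteIceCandidate(final IceCandidate candidate) {
      if (peerConnection != null) {
        peerConnection.addIceCandidate(candidate);
      }
    }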

Conclusion

During real-time audio and video communication, WebRTC relies on a dedicated multimedia transmission channel, and this article has walked through the basic process of establishing that channel. However, it only covers a very simple establishment model; in practice, many other issues arise while the channel is being established and used, which are not expanded on here due to space limitations. Interested readers are welcome to leave comments and discuss.
