WebRTC | Implementing One-to-One Communication of Media Streams

Table of contents

1. Browser support for WebRTC

2. MediaStream and MediaStreamTrack

3. RTCPeerConnection

1. RTCPeerConnection binds local audio and video data

2. Media negotiation (SDP)

3. ICE

(1) Candidate information

(2) How WebRTC collects Candidates

(3) Exchanging Candidates

4. Exchange of SDP and Candidate messages

5. Remote audio and video rendering


        Realizing one-to-one real-time audio and video communication in the browser is the most important application scenario of WebRTC. Since mainstream browsers already support WebRTC, one-to-one communication can be implemented in the browser with just a few lines of code.

1. Browser support for WebRTC

        At present, mainstream browsers such as Chrome, Safari, and Firefox all support WebRTC. Note, however, that Microsoft stated explicitly that Internet Explorer would not support WebRTC, while the newer Edge browser does. There are two main reasons IE never gained WebRTC support: first, IE is being gradually replaced by Edge; second, supporting WebRTC would have required large-scale adjustments to the browser's architecture, which was too costly.

Browsers that support WebRTC

2. MediaStream and MediaStreamTrack

        There are two important concepts in WebRTC, namely MediaStream and MediaStreamTrack.

        A MediaStreamTrack, or "track", represents a single type of media source. For example, the video data captured from a camera is one MediaStreamTrack, and the audio captured from a microphone is another.

        A MediaStream, or "stream", can contain zero or more MediaStreamTracks.

        MediaStream has two important functions. First, it can serve as a source for recording or rendering, so we can record the content of a stream to a file or play its data through the <video> tag in the browser. Second, the MediaStreamTracks within the same MediaStream are synchronized (for example, the audio track and video track in one MediaStream are kept time-synchronized), while tracks in different MediaStreams are not synchronized with each other.
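The relationship between a stream and its tracks can be seen directly in the browser. The sketch below is a minimal, browser-only example (it assumes a page with a <video> element; it will not run in Node.js):

```javascript
// Sketch (browser-only): acquire a MediaStream with getUserMedia and
// inspect its tracks. The <video> element is assumed to exist in the page.
async function showTracks() {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
  });
  // A MediaStream holds zero or more MediaStreamTracks.
  for (const track of stream.getTracks()) {
    console.log(track.kind, track.label); // kind is "audio" or "video"
  }
  // Render the stream through a <video> tag.
  document.querySelector('video').srcObject = stream;
}
```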

3. RTCPeerConnection

        The RTCPeerConnection object is the core of WebRTC and the unified interface that WebRTC exposes to users. It is composed of multiple modules, such as the network processing module, the quality-of-service module, and the audio and video engine module. You can think of it as a super socket through which end-to-end data transmission can easily be completed. Even better, it can dynamically adjust toward the best quality of service according to actual network conditions.

1. RTCPeerConnection binds local audio and video data

        For binding data, the RTCPeerConnection object provides two methods: addTrack() and addStream(). Both bind the captured data to the RTCPeerConnection. However, since addStream() has been deprecated in the WebRTC specification, it is recommended to use addTrack() to avoid compatibility problems in the future.

        When the client receives the joined message from the server, it creates an RTCPeerConnection object and then calls addTrack() to bind it to the audio and video data captured through the getUserMedia() interface.
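The binding step above can be sketched as follows. This is a browser-only sketch (getUserMedia and RTCPeerConnection are browser APIs), and passing the stream as the second argument to addTrack() is what lets the remote side regroup the tracks into one MediaStream:

```javascript
// Sketch (browser-only): bind locally captured tracks to an
// RTCPeerConnection with addTrack() instead of the deprecated addStream().
async function bindLocalMedia(pc) {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
  });
  // Bind each captured track; the stream argument lets the remote
  // side group the tracks back into a single MediaStream.
  for (const track of stream.getTracks()) {
    pc.addTrack(track, stream);
  }
  return stream;
}
```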

2. Media negotiation (SDP)

        After the RTCPeerConnection object is bound to audio and video, media negotiation must take place immediately. For example, suppose your default encoder is VP8. To communicate, you need to know whether the other party can decode VP8 data; if it cannot, you cannot use this encoder. As another example, if one party encrypts its data with DTLS-SRTP, the other party must have the same capability, otherwise the two sides cannot communicate. This is media negotiation.

        During media negotiation, the exchanged content is in SDP format.

         The initiator of the negotiation is user A. After creating the RTCPeerConnection object and binding it to the captured data, user A executes step ❶ in the figure: calling the createOffer interface of the RTCPeerConnection object to generate the local negotiation information, an Offer in SDP format. Once the Offer is generated, user A calls the setLocalDescription interface to save it (step ❷), then sends the Offer to remote user B through the client's signaling system (step ❸). At this point, user A's media negotiation pauses (it is not yet complete).

        After receiving user A's Offer through the signaling system, user B calls the setRemoteDescription interface of its local RTCPeerConnection object to save the Offer (step ❹). It then calls the createAnswer interface to create an Answer message (step ❺); the Answer is also in SDP format and records user B's negotiation information. After the Answer is created, user B calls the setLocalDescription interface to save it (step ❻). At this point, user B's media negotiation is complete. Next, user B sends the Answer message to user A (step ❼) so that user A can finish its own media negotiation.

        After user A receives the Answer message from user B, it resumes its unfinished media negotiation: it calls the setRemoteDescription interface of the RTCPeerConnection object to save the received Answer (step ❽). Once this step is done, the entire media negotiation process is complete.
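The eight steps above map directly onto the RTCPeerConnection API. In the browser-only sketch below, `signaling` is a placeholder object standing in for whatever signaling channel the application uses (its `send` method is an assumption, not a WebRTC API):

```javascript
// User A (initiator): steps 1-3 of the negotiation.
async function startNegotiation(pc, signaling) {
  const offer = await pc.createOffer();               // step 1
  await pc.setLocalDescription(offer);                // step 2
  signaling.send({ type: 'offer', sdp: offer.sdp });  // step 3
}

// User B: steps 4-7.
async function handleOffer(pc, offer, signaling) {
  await pc.setRemoteDescription(offer);               // step 4
  const answer = await pc.createAnswer();             // step 5
  await pc.setLocalDescription(answer);               // step 6
  signaling.send({ type: 'answer', sdp: answer.sdp }); // step 7
}

// User A again: step 8 finishes the negotiation.
async function handleAnswer(pc, answer) {
  await pc.setRemoteDescription(answer);              // step 8
}
```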

3. ICE

        When media negotiation is complete, WebRTC begins to establish a network connection through a process called ICE (Interactive Connectivity Establishment).

        More precisely, ICE starts after each side calls the setLocalDescription() interface. Its process is: collect Candidates (a Candidate is a set of connection information including an IP address, port, and so on), exchange Candidates, and attempt connections in order of priority.

(1) Candidate information

         A Candidate is the basic information WebRTC uses to describe a remote endpoint it can connect to, so it is an information set containing at least the {address, port, protocol} triple.

The Candidates that ICE collects include: the transport (UDP or TCP, possibly secured with TLS or relayed via TURN), the candidate type (host, srflx, prflx, relay), the IP address, the port number, and so on.
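These fields all appear in the `a=candidate` attribute line of the SDP (its grammar is defined in RFC 5245 / RFC 8445). As an illustration, the sketch below extracts the main fields from such a line; the sample line used here is made up for demonstration:

```javascript
// Parse an ICE candidate attribute line into its main fields.
function parseCandidate(line) {
  const parts = line.replace(/^(a=)?candidate:/, '').split(' ');
  return {
    foundation: parts[0],
    component: Number(parts[1]),
    protocol: parts[2].toLowerCase(),   // "udp" or "tcp"
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[parts.indexOf('typ') + 1], // host | srflx | prflx | relay
  };
}
```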

        WebRTC divides Candidates into four types: host, srflx, prflx, and relay, listed in order of priority, with host the highest and relay the lowest. For example, if WebRTC has collected two Candidates, one of host type and one of srflx type, it will first try to establish a connection with the host Candidate and, if that fails, fall back to the srflx Candidate.

* host: The host address is the device's local address, also known as its private address. It is the address the device has on its directly attached network, without NAT translation. In ICE, host addresses are used to establish direct point-to-point connections within the same network.
* srflx (server reflexive): The srflx address is obtained through a STUN server, which helps a device behind NAT discover its own public IP address and port. The srflx address allows devices to establish direct point-to-point connections in a NAT environment.
* prflx (peer reflexive): The prflx address is not obtained from any server; it is discovered during ICE connectivity checks, when a device observes the NAT-mapped address from which the peer's STUN binding requests actually arrive.
* relay: The relay address is obtained through a TURN server and is used for relayed communication between devices. When devices cannot establish a direct connection, they communicate through the TURN server, with all data forwarded by the server.

During WebRTC communication, connections are attempted in the order intranet, P2P, then relay:
1. The intranet connection corresponds to the host candidate type. A host connection is established directly between devices in the same local area network, with no NAT traversal.
2. The P2P connection corresponds to the srflx and prflx candidate types. An srflx (server reflexive) address is the public, NAT-mapped address learned from a STUN server, while a prflx (peer reflexive) address is a NAT-mapped address discovered from the peer during connectivity checks; both are used to establish point-to-point connections across NAT.
3. The relay connection corresponds to the relay candidate type. When a direct P2P connection cannot be established, WebRTC uses a TURN server as a relay: all data passes through the relay server, which guarantees that the data can still be delivered.
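Which candidate types can be gathered depends on the ICE servers configured on the RTCPeerConnection: a STUN server yields srflx candidates, and a TURN server yields relay candidates. A minimal configuration sketch follows; the server URLs and credentials are placeholders, not real servers:

```javascript
// Hypothetical ICE server configuration (placeholder URLs/credentials).
const rtcConfig = {
  iceServers: [
    // STUN: lets the device discover its public (srflx) address.
    { urls: 'stun:stun.example.org:3478' },
    // TURN: provides a relay address as the last resort.
    {
      urls: 'turn:turn.example.org:3478',
      username: 'demo-user',
      credential: 'demo-pass',
    },
  ],
};
// In the browser: const pc = new RTCPeerConnection(rtcConfig);
```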

(2) How WebRTC collects Candidates

        WebRTC collects the different Candidate types in different ways. Host Candidates are determined by the host's network interfaces: generally, each network card corresponds to one IP address, and each IP address is assigned a random port to form a host Candidate. An srflx Candidate is generated from the IP address and port obtained from a STUN server; a relay Candidate is generated from the IP address and port obtained from a TURN server.

(3) Exchanging Candidates

        After WebRTC collects Candidates, it sends them to the peer through the signaling system. On receiving them, the peer pairs each remote Candidate with a local one to form CandidatePairs (connection candidate pairs). With CandidatePairs in place, WebRTC can start trying to establish a connection.

        Candidates are not exchanged only after all of them have been collected; they are exchanged as they are gathered (so-called Trickle ICE).
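This incremental exchange is driven by the `icecandidate` event. In the browser-only sketch below, `signaling` is again a placeholder for the application's signaling channel (its `send` method and `onCandidate` hook are assumptions, not WebRTC APIs):

```javascript
// Sketch (browser-only): exchange candidates as they are gathered.
function wireCandidateExchange(pc, signaling) {
  // Fired each time the local ICE agent gathers a new candidate.
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      signaling.send({ type: 'candidate', candidate: event.candidate });
    }
  };
  // Called when the signaling channel delivers a remote candidate.
  signaling.onCandidate = async (candidate) => {
    await pc.addIceCandidate(candidate);
  };
}
```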

4. Exchange of SDP and Candidate messages

        Signaling messages are used whenever the two parties need to exchange SDP and Candidate messages. The initiator first sends a message to the signaling server; after receiving it, the server forwards the message directly to the target user without any processing.

        Message exchange thus takes three steps: the initiator sends the message to be exchanged, the server forwards it on receipt, and the other client receives it.
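Because the server forwards messages without inspecting them, the relay logic itself is tiny. The sketch below models it in memory for illustration only; a real deployment would run this on a server, typically over WebSockets, and the class and method names here are made up:

```javascript
// A minimal in-memory signaling relay sketch (hypothetical names).
class SignalingRelay {
  constructor() {
    this.clients = new Map(); // userId -> message handler
  }
  // A client registers a handler for messages addressed to it.
  join(userId, onMessage) {
    this.clients.set(userId, onMessage);
  }
  // Forward a message to the target user without touching its content.
  send(from, to, message) {
    const handler = this.clients.get(to);
    if (handler) handler({ from, message });
  }
}
```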

5. Remote audio and video rendering

        WebRTC provides a very convenient interface for this: the track event of the RTCPeerConnection object. Whenever audio or video data arrives from the remote end, the ontrack event handler fires, so you only need to set a callback on ontrack to obtain the remote MediaStream.
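The callback can be as small as this browser-only sketch, which assumes a <video id="remote"> element exists in the page:

```javascript
// Sketch (browser-only): render remote media when the track event fires.
function wireRemoteRendering(pc) {
  pc.ontrack = (event) => {
    const remoteVideo = document.getElementById('remote');
    // event.streams[0] is the MediaStream the sender grouped the track
    // into (via the second argument of addTrack on the other side).
    if (remoteVideo.srcObject !== event.streams[0]) {
      remoteVideo.srcObject = event.streams[0];
    }
  };
}
```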


Origin blog.csdn.net/weixin_39766005/article/details/132193162