WebRTC Real-Time Audio and Video: Overall Architecture, Technical Principles, and Usage

1. Basic introduction


WebRTC (full name: Web Real-Time Communication) is a technology that enables real-time voice and video conversations directly between web browsers. From the perspective of front-end development, it is a set of callable API standards.

Before WebRTC was released, building real-time audio and video applications was very expensive, and many technical issues had to be solved: audio/video encoding and decoding, data transmission, latency, packet loss, jitter, echo processing and cancellation, and so on. Supporting real-time audio and video communication in the browser also required installing extra plug-ins.

In May 2010, Google acquired the GIPS engine from VoIP software developer Global IP Solutions for US$68.2 million and renamed it "WebRTC" (see "Amazing WebRTC: The Ecosystem Keeps Improving, and Real-Time Audio/Video Technology May Become a Commodity"). The goal was to build a platform for real-time communication between Internet browsers and to make WebRTC part of the HTML5 standard.

2. Significance

The emergence and development of WebRTC, and its recognition by industry standards organizations such as the W3C, are of great significance to the present and future of front-end technology.

Lower the threshold for audio and video interactive development on the web:

    1) Previous audio and video interactive development has a certain technical threshold for web developers;

    2) Now with the help of WebRTC, web developers can quickly implement audio and video interactive applications by calling JS interfaces.

Avoid secondary problems caused by dependencies and plug-ins:

    1) In the past, the construction of audio and video interactive applications relied on various plug-ins, software and servers;

    2) End-to-end audio and video interaction can now be formed with the help of mainstream browsers.

Unification and standardization eliminate the inconsistencies of traditional audio and video interaction environments:

    1) In the past, audio and video interaction had to traverse a wide variety of NATs and firewalls, which made establishing P2P media connections very challenging;

    2) WebRTC now includes libjingle, an open-source P2P hole-punching project that supports protocols such as STUN and TURN.

More efficient and optimized algorithms and technologies improve audio and video interaction performance:

    1) WebRTC uses NACK and FEC techniques to recover from packet loss end-to-end, without routing retransmissions through a server, which reduces latency and bandwidth consumption;

    2) Technologies such as TCC (transport-wide congestion control), SVC (scalable video coding), PACER (send pacing), and the JitterBuffer further optimize audio and video smoothness.

3. Technical characteristics

WebRTC covers a lot of ground; its main technical characteristics include the following.

1) Real-time communication:

WebRTC is a real-time communication technology that allows web applications or sites to establish peer-to-peer connections between browsers, without intermediaries, to transmit video streams, audio streams, or arbitrary data.

2) No dependencies/plugins:

The standards that make up WebRTC allow users to create peer-to-peer data sharing and conference calls without installing any plug-ins or third-party software.

3) A rich protocol stack:

WebRTC is not a single protocol. It comprises multiple protocol standards covering media, encryption, and the transport layer, together with a set of JavaScript APIs spanning audio/video capture, encoding and decoding, network transmission, and rendering. Through this simple, easy-to-use JavaScript API, browsers can share P2P audio, video, and data without installing any plug-ins.

Diagram of the protocol stacks WebRTC relies on:

At the same time, WebRTC is not an isolated protocol: its signaling is flexible, so it can easily interface with existing SIP and telephone network systems.

4. Compatibility coverage


Currently, most mainstream browsers support WebRTC:

For more detailed browser and version compatibility, see the figure below:

Mainstream browsers support the standard WebRTC API, which makes plug-in-free audio and video interoperability between browsers possible and greatly lowers the threshold for audio and video development: developers only need to call the WebRTC API to quickly build audio and video applications.
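Because support still varies by browser and version, it is common to feature-detect before calling any WebRTC API. A minimal sketch (the helper name `detectWebRTCSupport` is ours; in a real browser you would pass `window`):

```javascript
// Hypothetical helper: report which WebRTC entry points a given
// global object exposes (pass `window` in a real browser).
function detectWebRTCSupport(g) {
  return {
    getUserMedia: !!(g.navigator && g.navigator.mediaDevices &&
                     g.navigator.mediaDevices.getUserMedia),
    peerConnection: typeof g.RTCPeerConnection === "function",
    dataChannel: typeof g.RTCDataChannel === "function",
  };
}
```

Passing a mock object makes the helper easy to unit-test outside the browser as well.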
5. Technical framework

As shown in the figure below, the technical framework describes the core content of WebRTC and its API design for different kinds of developers.

WebRTC technical framework diagram:

As can be seen from the figure, WebRTC mainly targets API designs for three types of developers:

    1) API for web developers: the framework includes a set of W3C-standardized JavaScript APIs that let web developers build instant-communication applications on top of WebRTC;

    2) API for browser vendors: the framework also includes a C++-based low-level WebRTC interface, which is friendly to browser vendors' low-level integration;

    3) Parts that browser vendors can customize: the framework also includes extension points that vendors can override, such as audio and video capture hooks.

6. Technical core

As the framework in the previous section shows, WebRTC consists of three main parts: the voice engine, the video engine, and transport, each of which contains many protocols and methods.

1) Voice Engine:

    a. Voice Engine includes the iSAC/iLBC codecs (audio codecs: the former for wideband and ultra-wideband audio, the latter for narrowband);

    b. NetEQ for voice (handling network jitter and voice packet loss);

    c. Echo Canceler/Noise Reduction.

2) Video Engine:

    a. VP8 Codec (video image codec);

    b. Video jitter buffer (video jitter buffer, handles video jitter and video packet loss);

    c. Image enhancements (image quality enhancement).

3) Transport.
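The jitter buffers mentioned above (NetEQ for audio, the video jitter buffer) hold late and out-of-order packets and release them in sequence for smooth playout. A toy model of that idea, far simpler than WebRTC's real implementations and ours alone:

```javascript
// Toy jitter buffer: packets may arrive out of order; pop() releases
// them strictly in sequence-number order, simulating smooth playout.
class ToyJitterBuffer {
  constructor(firstSeq) {
    this.nextSeq = firstSeq;   // next sequence number due for playout
    this.buffer = new Map();   // seq -> payload, held until its turn
  }
  push(seq, payload) {
    this.buffer.set(seq, payload);
  }
  pop() {
    // Release the next in-order packet, or null if it hasn't arrived yet.
    if (!this.buffer.has(this.nextSeq)) return null;
    const payload = this.buffer.get(this.nextSeq);
    this.buffer.delete(this.nextSeq);
    this.nextSeq += 1;
    return payload;
  }
}
```

Real jitter buffers additionally adapt their depth to network conditions and conceal losses; this sketch only shows the reordering role.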
7. Technical principles
7.1 Basic situation

The main technical features of WebRTC:

    1) SRTP: the Secure Real-time Transport Protocol, used for encrypted audio and video streaming;

    2) Multiplexing: multiple streams share a single transport connection;

    3) P2P: STUN+TURN+ICE, used for NAT network and firewall traversal;

    4) DTLS: secure transmission may also use DTLS (Datagram Transport Layer Security) for encrypted transport and key negotiation;

    5) UDP: WebRTC media transport is primarily based on UDP.

Due to space limitations, the remaining sections do not cover audio/video capture, encoding, and processing in detail; they focus only on the core principles of how a real-time connection is established.
7.2 Public IP mapping: determining network location information

WebRTC is implemented based on browser end-to-end connection (P2P).

Since no relay server is used for media, each side needs the other's network address. This is obtained with NAT traversal techniques such as STUN, TURN, and the ICE framework, which discover the corresponding host's public IP address, port, and other network location information.

Clear network positioning is the basis for establishing direct end-to-end communication.

NAT traversal schematic diagram:
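The reason a STUN query can discover a host's public address is that the NAT keeps a stable binding from an internal address/port to a public one. A toy model of such a binding table (the class, addresses, and port range are all made up for illustration):

```javascript
// Toy NAT: maps an internal (ip, port) pair to a public (ip, port)
// binding, reusing the same mapping for repeat traffic -- the stable
// behaviour that STUN relies on to report a host's public address.
class ToyNat {
  constructor(publicIp) {
    this.publicIp = publicIp;
    this.bindings = new Map(); // "privIp:privPort" -> allocated public port
    this.nextPort = 40000;     // arbitrary starting port for allocations
  }
  translate(privIp, privPort) {
    const key = `${privIp}:${privPort}`;
    if (!this.bindings.has(key)) {
      this.bindings.set(key, this.nextPort++);
    }
    return { ip: this.publicIp, port: this.bindings.get(key) };
  }
}
```

Real NATs differ in how strictly they filter inbound traffic (which is why some combinations still need TURN), but the mapping idea is the same.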

7.3 Signaling server: network negotiation and information exchange

The signaling server's role is to relay information over a duplex communication channel.

The relayed information includes the network location information obtained from public IP mapping, such as the public IP and port, as well as media negotiation data.

Concept diagram:
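The relay role can be sketched as a pure function, independent of the actual transport (WebSocket or otherwise). The `relay` helper and the message shape here are our assumptions, not a WebRTC API:

```javascript
// Minimal signaling-relay logic: forward a message to every connected
// client except the sender. `clients` maps client id -> send callback.
function relay(clients, fromId, message) {
  const delivered = [];
  for (const [id, send] of clients) {
    if (id === fromId) continue;            // don't echo back to the sender
    send({ ...message, fromID: fromId });   // tag the message with its origin
    delivered.push(id);
  }
  return delivered;
}
```

A real signaling server would wrap this in a WebSocket handler, as the key code in section 8.2.4 does.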

7.4 Session Description Protocol SDP: Unified media negotiation method

The role of SDP:

    1) Different terminals/browsers support different encodings for media stream data (e.g., VP8, VP9), the capabilities of session participants are unequal, and user environments and configurations vary;

    2) WebRTC communication therefore needs to determine and exchange local and remote audio/video media information, such as resolution and codec capabilities. This media-configuration signaling is performed by exchanging an Offer and an Answer expressed in the Session Description Protocol (SDP);

    3) The SDP exchange must precede the exchange of audio and video streams. Its content includes basic session information, media descriptions, and so on. The overall SDP structure:

    // SDP structure
    // Session description (session-level description)
    v=  (protocol version)
    o=  (originator and session identifier)
    s=  (session name)
    c=* (connection information -- not required if included in all media)
    One or more time descriptions ("t=" and "r=" lines; see below)
    a=* (zero or more session attribute lines)
    Zero or more media descriptions

    // Time description
    t=  (time the session is active)

    // Media description (media-level description), if present
    m=  (media name and transport address)
    c=* (connection information -- optional if included at session level)
    a=* (zero or more media attribute lines)
 

   An example of SDP is as follows:

    v=0                          // protocol version; currently always 0
    o=- 3883943731 1 IN IP4 127.0.0.1
    s=
    t=0 0                        // time the session is active
    a=group:BUNDLE audio video   // transport-layer multiplexing information
    m=audio 1 RTP/SAVPF 103 104 0 8 106 105 13 126
    // ...
    a=ssrc:2223794119 label:H4fjnMzxy3dPIgQ7HxuCTLb4wLLLeRHnFxh81
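To make the line-oriented structure concrete, here is a minimal SDP reader. This is our sketch, not a full RFC 4566 parser; it only pulls out the version and the `m=` media descriptions:

```javascript
// Minimal SDP reader: split an SDP blob into "x=value" lines and extract
// the session version plus each media ("m=") description.
function parseSdp(sdp) {
  const lines = sdp.split(/\r?\n/).filter((l) => l.length > 0);
  const result = { version: null, media: [] };
  for (const line of lines) {
    const type = line[0];          // one-letter field name before "="
    const value = line.slice(2);   // text after "x="
    if (type === "v") result.version = value;
    if (type === "m") {
      // "m=<kind> <port> <proto> <payload types...>"
      const [kind, port, proto, ...payloads] = value.split(" ");
      result.media.push({ kind, port: Number(port), proto, payloads });
    }
  }
  return result;
}
```

In practice you rarely parse SDP by hand; the browser does it when you call `setLocalDescription`/`setRemoteDescription`, but seeing the fields helps when debugging offers and answers.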

7.5 One-to-one connection establishment process

Let's briefly walk through the process of establishing a one-to-one WebRTC connection as an example.

One-to-one process diagram:

  • 1) Exchange SDP and obtain respective media configuration information;

  • 2) Exchange network information, such as public addresses and ports, obtained via the STUN server;

  • 3) TURN relays the audio and video media stream data when a direct connection cannot be established.

Workflow diagram:

    1) Both parties A and B first call getUserMedia to open the local camera as the local media stream to be output;

    2) Send a room-joining request to the signaling server;

    3) Peer B receives the Offer SDP object sent by Peer A, stores it via PeerConnection's setRemoteDescription method, creates an Answer SDP object, saves it via setLocalDescription, and sends it back to Peer A through the signaling server;

    4) During the SDP offer/answer exchange, Peer A and Peer B create the corresponding audio and video channels based on the SDP information and start gathering Candidate data (the local IP address, the public IP address, and the address allocated by the relay server);

    5) When Peer A has gathered its Candidate information, it sends it to Peer B through the signaling server, and Peer B does the same in the other direction.
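The candidate strings exchanged in steps 4 and 5 are single attribute lines. A simplified parser (our sketch of the RFC 5245 candidate grammar, ignoring optional trailing fields) shows what each one carries, including the type: `host` (local), `srflx` (STUN-derived public address), or `relay` (TURN-allocated):

```javascript
// Parse one ICE candidate attribute line into its main fields:
// foundation, component, transport, priority, address, port, and type.
function parseCandidate(line) {
  const parts = line.replace(/^(a=)?candidate:/, "").split(" ");
  return {
    foundation: parts[0],
    component: Number(parts[1]),
    transport: parts[2],
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7],   // parts[6] is the literal keyword "typ"
  };
}
```

The address in the example test below is a documentation address, not a real server.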

7.6 Many-to-many establishment

Concept diagram of establishing many-to-many point-to-point connections, taking three users as an example:
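In such a full-mesh topology every pair of users keeps its own RTCPeerConnection, so the link count grows quadratically; this is why plain mesh calls stop scaling beyond a handful of participants:

```javascript
// Number of peer connections in a full mesh of n participants: n*(n-1)/2.
function meshConnections(n) {
  return n * (n - 1) / 2;
}
```

Three users need only 3 connections, but ten users already need 45, each carrying its own encoded media streams.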

7.7 Main JavaScript interfaces for WebRTC

getUserMedia(): Access data streams, such as from the user's camera and microphone

    // Requested media types
    const constraints = { video: true, audio: true };
    const video = document.querySelector('video');
    // Attach the stream to the corresponding DOM element to show the local media stream
    function handleSuccess(stream) {
      video.srcObject = stream;
    }
    function handleError(error) {
      console.error('getUserMedia error: ', error);
    }
    // Capture the multimedia stream from the camera
    navigator.mediaDevices.getUserMedia(constraints).then(handleSuccess).catch(handleError);

RTCPeerConnection: Enable audio or video calls with encryption and bandwidth management tools

    // RTC server configuration.
    const servers = { "iceServers": [{ "urls": "stun:stun.stunprotocol.org" }] };
    // Create a local connection
    const localPeerConnection = new RTCPeerConnection(servers);
    // Gather Candidate data
    localPeerConnection.onicecandidate = function(event) { /* ... */ };
    // React when a remote media track arrives
    localPeerConnection.ontrack = function(event) { /* ... */ };

RTCDataChannel: supports point-to-point communication of general data, often used for point-to-point transmission of data

    const pc = new RTCPeerConnection();
    const dc = pc.createDataChannel("my channel");
    // Receive data
    dc.onmessage = function(event) {
      console.log("received: " + event.data);
    };
    // Channel opened
    dc.onopen = function() {
      console.log("datachannel open");
    };
    // Channel closed
    dc.onclose = function() {
      console.log("datachannel close");
    };
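Browsers historically limit how much data a single DataChannel message can safely carry (a commonly cited cross-browser figure is around 16 KB). A hypothetical sender-side chunking helper; both the helper and the limit constant are our assumptions, not part of the WebRTC API:

```javascript
// Hypothetical helper: split a large string into chunks small enough to
// send over a DataChannel one message at a time.
function chunkMessage(text, maxChars = 16384) {
  const chunks = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}
```

The receiver would concatenate chunks back together, typically with a small header marking the final chunk; that framing is left out of this sketch.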

8. Application cases


Here is a rough walkthrough using a multi-person WebRTC video call as a practical example.

8.1 Design framework

Basic frame diagram of multi-person video:

8.2 Key code
8.2.1) Media capture:

Request browser camera permission, capture the local video media stream, attach it to a Video element, and display the local video. The code is as follows.

    // Camera-compatibility shim for older browsers
    navigator.getUserMedia = (navigator.getUserMedia ||
                              navigator.webkitGetUserMedia ||
                              navigator.mozGetUserMedia ||
                              navigator.msGetUserMedia);
    // Get the local audio/video stream
    navigator.mediaDevices.getUserMedia({ "audio": false, "video": true }).then((stream) => {
      // Show our own output stream by attaching it to the page's Video element
      document.getElementById("myVido").srcObject = stream;
    });

    // STUN and TURN servers
    const iceServer = { "iceServers": [{ urls: "stun:stun.l.google.com:19302" }] };
    // Create an RTCPeerConnection for the point-to-point connection
    const peerRTCConn = new RTCPeerConnection(iceServer);

8.2.2) Network negotiation:

The main tasks are: creating the peer-to-peer connection, gathering ICE candidates, and mounting incoming media streams onto the DOM.

Interactive Connectivity Establishment (ICE) is a framework that allows real-time peers to discover and connect to each other. This technique allows peers to discover enough information about each other's topology to potentially find one or more communication paths between each other. The ICE agent is responsible for: collecting local IP, port tuple candidates, performing connection checks between peers and sending connection keep-alives. (For an introduction to ICE, see "Detailed Explanation of STUN, TURN, and ICE on P2P Technology")

    // Send ICE candidates to the other clients
    peerRTCConn.onicecandidate = function(event) {
      if (event.candidate) {
        // Forward the gathered ICE candidate to the signaling server
        socket.send(JSON.stringify({
          "event": "relayICECandidate",
          "data": {
            'iceCandidate': {
              'sdpMLineIndex': event.candidate.sdpMLineIndex,
              'candidate': event.candidate.candidate
            }
          },
          "fromID": peerId
        }));
      }
    };
    // When a media stream arrives, mount it onto the DOM
    peerRTCConn.ontrack = function(event) {
      let v = document.createElement("video");
      v.autoplay = true;
      v.style = "width:200px";
      document.getElementById("peer").appendChild(v);
      v.srcObject = event.streams[0];
    };

8.2.3) Media consultation:

An Offer is created by the initiating side. That peer adds the session information to its RTCPeerConnection with the setLocalDescription method, and the signaling server relays it; the other peers return corresponding Answers. The SDP process:

    // A newly joined node initiates an offer
    if (canOffer) {
      peerRTCConn.createOffer(function(localDescription) {
        peerRTCConn.setLocalDescription(localDescription, function() {
          // Send the description to the signaling server
          socket.send(JSON.stringify({
            "event": "relaySessionDescription",
            "data": localDescription,
            "fromID": peerId
          }));
        }, function() { alert("offer failed"); });
      }, function(error) {
        console.log("error sending offer: ", error);
      });
    }

An Answer is created in response. The session description includes audio/video information and other content. When the initiator sends an offer-type description to the responder, the responder returns an answer-type description:

    // Create an Answer session description
    peer.createAnswer(function(remoteDescription) {
      peer.setLocalDescription(remoteDescription, function() {
        // Send the description to the signaling server
        socket.send(JSON.stringify({
          "event": "relaySessionDescription",
          "data": remoteDescription,
          "callerID": signalMsg['fromId'],
          "fromID": signalMsg['fromId']
        }));
      }, function() { alert("answer failed"); });
    }, function(error) {
      console.log("error creating answer: ", error);
    });

When an ICE candidate share is received, the ICE candidate is added to the remote peer description:

    // Look up the corresponding RTCPeerConnection
    const peer = peers[signalMsg["fromID"]];
    // Add the ICE candidate to the remote peer description
    peer.addIceCandidate(new RTCIceCandidate(signalMsg["data"].iceCandidate));

8.2.4) Signaling relay:

Signaling service part key code:

    wss.on('connection', function(ws) {
      ws.on('message', function(message) {
        let messageObj = JSON.parse(message);
        // Exchange ICE candidates
        if (messageObj['event'] == 'relayICECandidate') {
          wss.clients.forEach(function(client) {
            console.log("send iceCandidate");
            client.send(JSON.stringify({
              "event": "iceCandidate",
              "data": messageObj['data'],
              "fromID": messageObj['fromID']
            }));
          });
        }
        // Exchange SDP
        if (messageObj['event'] == 'relaySessionDescription') {
          console.log(messageObj["fromID"], messageObj["data"].type);
          wss.clients.forEach(function(client) {
            if (client != ws) {
              client.send(JSON.stringify({
                "event": "sessionDescription",
                "fromId": messageObj["fromID"],
                "data": messageObj["data"]
              }));
            }
          });
        }
      });
    });

9. Summary

The main advantages of WebRTC are:

1) Convenience: for users, real-time communication before WebRTC required installing plug-ins and clients, and for many users downloading, installing, and updating that software was complicated and error-prone. Now WebRTC is built into the browser, so users can communicate in real time without any plug-ins or extra software. For developers, before Google open-sourced WebRTC, browser-to-browser communication technology was in the hands of large companies and very hard to develop; now simple HTML tags and the JavaScript API are enough to implement web audio/video communication.

2) Free: WebRTC is relatively mature and integrates excellent audio/video engines and very advanced codecs, yet Google does not charge any fees for these technologies.

3) Strong hole-punching capability: WebRTC includes key NAT and firewall traversal technologies such as STUN, ICE, TURN, and RTP-over-TCP, and supports proxies.

The main disadvantages of WebRTC are:

1) It lacks server-side solution design and deployment guidance.

2) Transmission quality is hard to guarantee. WebRTC's transport is designed around P2P, which makes quality difficult to guarantee and leaves limited room for optimization: only end-to-end tuning is possible, which struggles with the complexity of the real Internet. For example, transmission quality in cross-region, cross-carrier, low-bandwidth, or high-packet-loss scenarios is essentially unpredictable, and these are exactly the typical scenarios of Internet applications in China.

3) WebRTC is more suitable for one-on-one individual chats. Although the function can be expanded to implement group chats, there is no optimization for group chats, especially very large group chats.

4) Device-side adaptation issues, such as echo and recording failures, keep coming up. This is especially true on Android: because there are so many Android device manufacturers, each customizing the standard Android framework, there are many availability problems (e.g., failure to access the microphone) and quality problems (e.g., echo and howling).

5) Insufficient support for native development. As the name suggests, WebRTC is mainly aimed at web applications. It can also be used for native development, but given the amount of domain knowledge involved (audio/video capture, processing, encoding/decoding, real-time transport, etc.), the framework design is complex and the API is low-level and fine-grained, to the point that even compiling the project is not an easy task.
 


Origin blog.csdn.net/xiehuanbin/article/details/133273590