Introduction to WebRTC Theory

As an introduction: this article is a translation of https://www.html5rocks.com/en/tutorials/webrtc/basics/, an introduction to WebRTC theory. I find it one of the clearer explanations of WebRTC, and it should make it easier to learn the platform-specific SDKs later on.

WebRTC is a new front in the long war for an open and unencumbered web. - Brendan Eich, inventor of JavaScript

Real-time communication without plug-ins

Imagine a world where your phone, TV, and computer can communicate on a common platform. Imagine it becomes easier to add video chat and peer-to-peer data sharing to your web applications. This is the vision of WebRTC.

Want to give it a try? WebRTC is available on desktop and mobile devices in Google Chrome, Safari, Firefox, and Opera. A good place to start is the simple video chat application at appr.tc:

  1. Open appr.tc in your browser.
  2. Click Join to join the chat room and let the app use your webcam.
  3. Open the URL shown at the end of the page in a new tab, or preferably on another computer.

Quick start

  1. If you haven't used the getUserMedia API yet, take a look at Capture audio and video in HTML5 and simpl.info getUserMedia.
  2. To understand the RTCPeerConnection API, see the example that follows and simpl.info RTCPeerConnection.
  3. To understand how WebRTC uses servers for signaling and for firewall and NAT traversal, see the console logging in appr.tc.
  4. Can't wait to try WebRTC right now? Try the 20+ demos that exercise the WebRTC JavaScript APIs.
  5. Having trouble getting WebRTC to work on your machine? Visit the WebRTC Troubleshooter.

Alternatively, jump straight into the WebRTC codelab, a step-by-step guide that explains how to build a complete video chat application, including a simple signaling server.

A brief history of WebRTC

One of the last major challenges for the web is enabling human communication via voice and video: real-time communication, or RTC for short. RTC should be as natural in a web application as entering text in a text input. Without it, your ability to innovate and develop new ways for people to interact is limited. Historically, RTC has been corporate and complex, requiring expensive audio and video technology to be licensed or developed in-house. Integrating RTC technology with existing content, data, and services has been difficult and time-consuming, particularly on the web.

Gmail video chat became popular in 2008, and in 2011 Google introduced Hangouts, which uses Talk (as did Gmail). Google bought GIPS, a company that developed many components required for RTC, such as codecs and echo cancellation techniques. Google open sourced the technologies developed by GIPS and engaged with the relevant standards bodies at the Internet Engineering Task Force (IETF) and World Wide Web Consortium (W3C) to ensure industry consensus. In May 2011, Ericsson built the first implementation of WebRTC.

WebRTC implements open standards for real-time, plug-in-free video, audio, and data communication. The need was real:

  • Many web services use RTC but require downloads, native applications or plug-ins. These include Skype, Facebook and Hangouts.
  • Downloading, installing, and updating plugins is complex, error-prone, and annoying.
  • Plugins are difficult to deploy, debug, troubleshoot, test, and maintain, and may require licensing and integration with complex, expensive technology. Convincing people to install a plugin in the first place is often difficult!

The guiding principle of the WebRTC project is that its API should be open source, free, standardized, built into web browsers, and more efficient than existing technologies.

 

Where are we now?

WebRTC is used in various applications such as Google Meet. WebRTC is also integrated with WebKitGTK+ and Qt native applications.

WebRTC implements the following three APIs:

  • MediaStream (also known as getUserMedia)
  • RTCPeerConnection
  • RTCDataChannel

These APIs are defined in the following two specifications:

  • WebRTC
  • Media Capture and Streams

All three APIs are supported in Chrome, Safari, Firefox, Edge, and Opera on mobile devices and desktops.

getUserMedia: For demos and code, see the WebRTC samples or try Chris Wilson's amazing examples that use getUserMedia as input for Web Audio.

RTCPeerConnection: For a simple demo and a fully functional video chat application, see the WebRTC samples Peer connection and appr.tc. Both apps use adapter.js, a JavaScript shim maintained by Google with help from the WebRTC community, to abstract away the differences between browsers.

RTCDataChannel: To see it in action, check out the WebRTC samples and try one of the data channel demos.

The WebRTC codelab shows how to use all three APIs to build a simple application for video chat and file sharing.

My first WebRTC

A WebRTC application needs to do several things:

  • Get streaming audio, video, or other data.
  • Get network information, such as IP addresses and ports, and exchange it with other WebRTC clients (known as peers) to enable connection, even through NATs and firewalls.
  • Coordinate signaling communication to report errors and initiate or close sessions.
  • Exchange information about media and client capabilities, such as resolution and codecs.
  • Communicate streaming audio, video, or data.

To obtain and deliver streaming data, WebRTC implements the following APIs:

  • MediaStream: Gets access to data streams, such as from the user's camera and microphone.
  • RTCPeerConnection: Enables audio or video calling with facilities for encryption and bandwidth management.
  • RTCDataChannel: Enables peer-to-peer communication of generic data.

(More on the networking and signaling aspects of WebRTC later.)

MediaStream API (also known as the getUserMedia API)

The MediaStream API represents synchronized streams of media. For example, a stream taken from camera and microphone input has synchronized video and audio tracks. (Don't confuse MediaStreamTrack with the <track> element, which is something entirely different.) Probably the easiest way to understand the MediaStream API is to look at it in use:

  1. In your browser, navigate to WebRTC samples getUserMedia.
  2. Open the console (for example, with F12 or Ctrl+Shift+I).
  3. Inspect the stream variable, which is in global scope.

Each MediaStream has an input, which might be a MediaStream generated by getUserMedia(), and an output, which might be passed to a video element or an RTCPeerConnection. Each MediaStream also has a label, such as 'Xk7EuLhsuHKbnjLWkW4yYGNJJ8ONsgwHBvLQ'. The getAudioTracks() and getVideoTracks() methods return arrays of MediaStreamTracks.

For the getUserMedia example, stream.getAudioTracks() returns an empty array (because there is no audio) and, assuming a working webcam is connected, stream.getVideoTracks() returns an array of one MediaStreamTrack representing the stream from the webcam. Each MediaStreamTrack has a kind ('video' or 'audio'), a label (such as 'FaceTime HD Camera'), and represents one or more channels of either audio or video. In this example there is only one video track and no audio, but it is easy to imagine use cases with more tracks, such as a chat application that gets streams from the front camera, rear camera, and microphone, and also shares its screen.
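As a minimal sketch of what inspecting those tracks might look like (assuming a page served over HTTPS with a webcam attached), you could run something like this in a script or the console:

// Request video only and inspect the resulting stream and its tracks.
navigator.mediaDevices.getUserMedia({video: true})
  .then((stream) => {
    console.log('Stream id:', stream.id);
    console.log('Audio tracks:', stream.getAudioTracks()); // [] in this example
    const [videoTrack] = stream.getVideoTracks();
    console.log('Track kind:', videoTrack.kind);   // 'video'
    console.log('Track label:', videoTrack.label); // e.g. 'FaceTime HD Camera'
  })
  .catch((error) => console.error('getUserMedia error:', error));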

A MediaStream can be attached to a video element by setting the srcObject attribute. Previously, this was done by setting the src attribute to an object URL created with URL.createObjectURL(), but that approach is no longer recommended.

A MediaStreamTrack that is actively using the camera consumes resources and keeps the camera open and the camera light on. When you are no longer using a track, make sure to call track.stop() so that the camera can be turned off.
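Putting the two together, here is a minimal sketch, assuming the page contains a <video id="preview"> element and a <button id="stop"> button (both names are only illustrative):

const video = document.querySelector('video#preview');

navigator.mediaDevices.getUserMedia({video: true, audio: false})
  .then((stream) => {
    // Attach the stream to the video element via srcObject.
    video.srcObject = stream;

    // Later, when the camera is no longer needed, stop every track
    // so the device is released and the camera light turns off.
    document.querySelector('button#stop').onclick = () => {
      stream.getTracks().forEach((track) => track.stop());
      video.srcObject = null;
    };
  })
  .catch((error) => console.error('getUserMedia error:', error));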

getUserMedia can also be used as an input node for the Web Audio API:

// Cope with browser differences.
let audioContext;
if (typeof AudioContext === 'function') {
  audioContext = new AudioContext();
} else if (typeof webkitAudioContext === 'function') {
  audioContext = new webkitAudioContext(); // eslint-disable-line new-cap
} else {
  console.log('Sorry! Web Audio not supported.');
}

// Create a filter node.
const filterNode = audioContext.createBiquadFilter();
// See https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#BiquadFilterNode-section
filterNode.type = 'highpass';
// Cutoff frequency. For highpass, audio is attenuated below this frequency.
filterNode.frequency.value = 10000;

// Create a gain node to change audio volume.
const gainNode = audioContext.createGain();
// Default is 1 (no change). Less than 1 means audio is attenuated
// and vice versa.
gainNode.gain.value = 0.5;

navigator.mediaDevices.getUserMedia({audio: true}).then((stream) => {
  // Create an AudioNode from the stream.
  const mediaStreamSource =
    audioContext.createMediaStreamSource(stream);
  mediaStreamSource.connect(filterNode);
  filterNode.connect(gainNode);
  // Connect the gain node to the destination. For example, play the sound.
  gainNode.connect(audioContext.destination);
}).catch((error) => console.error('getUserMedia error:', error));

Chrome apps and extensions can also incorporate getUserMedia. Adding the audioCapture and/or videoCapture permissions to the manifest enables permission to be requested and granted only once, at installation time. Thereafter, the user is not asked again for camera or microphone access. Permission only has to be granted once for getUserMedia(): the first time, an Allow button is displayed in the browser's infobar. Chrome deprecated HTTP access for getUserMedia() at the end of 2015 due to it being classified as a powerful feature. The intention is eventually to enable a MediaStream for any streaming data source, not only a camera or microphone. This would enable streaming from stored data or arbitrary data sources, such as sensors or other inputs.
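For a Chrome app, the relevant entries are the audioCapture and videoCapture permissions mentioned above. A minimal sketch of such a manifest (the app name and background script are placeholders, not taken from the original article) might look like this:

{
  "name": "My WebRTC App",
  "version": "1.0",
  "manifest_version": 2,
  "permissions": [
    "audioCapture",
    "videoCapture"
  ],
  "app": {
    "background": {
      "scripts": ["background.js"]
    }
  }
}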

getUserMedia() really comes to life when combined with other JavaScript APIs and libraries:

  • Webcam Toy is a photo booth application that uses WebGL to add weird and wonderful effects to photos, which can be shared or saved locally.
  • FaceKat is a face tracking game built with headtrackr.js.
  • ASCII Camera uses the Canvas API to generate ASCII images.

Constraints

Constraints can be used to set values for video resolution for getUserMedia(). This also allows support for other constraints, such as aspect ratio; facing mode (front or rear camera); frame rate; height and width; and an applyConstraints() method. For an example, see WebRTC samples getUserMedia: select resolution.

One gotcha: getUserMedia constraints can affect the available configurations of a shared resource. For example, if one tab opens the camera in 640 x 480 mode, another tab cannot use constraints to open it in a higher-resolution mode, because the camera can only be opened in one mode. Note that this is an implementation detail. It would be possible to let the second tab reopen the camera in high-resolution mode and use video processing to downscale the first tab's video track to 640 x 480, but this has not been implemented.

A DOMException or an OverconstrainedError is raised if the requested resolution is not available or if a disallowed constraint value is set. To see this in action, see WebRTC samples getUserMedia: select resolution.
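As a sketch of what such constraints might look like (the specific values are only illustrative), a request for HD video with handling for OverconstrainedError could be written like this:

const hdConstraints = {
  audio: true,
  video: {
    width: {min: 1280},
    height: {min: 720},
    facingMode: 'user' // front-facing camera, where available
  }
};

navigator.mediaDevices.getUserMedia(hdConstraints)
  .then((stream) => {
    document.querySelector('video').srcObject = stream;
  })
  .catch((error) => {
    if (error.name === 'OverconstrainedError') {
      console.error('The requested resolution is not supported by this device.');
    } else {
      console.error('getUserMedia error:', error);
    }
  });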

Screen and tab capture

Chrome apps can also share live video of a single browser tab or the entire desktop via the chrome.tabCapture and chrome.desktopCapture APIs. (For a demo and more information, see Using Screensharing with WebRTC . The article is a few years old, but still interesting.)

Using the experimental chromeMediaSource constraint, it is also possible to use screen capture as a MediaStream source in Chrome. Note that screen capture requires HTTPS and should only be used for development, because it is enabled via a command-line flag, as explained in the linked post.
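A rough sketch of that legacy, Chrome-only constraint looked roughly like the following; the exact shape varied between Chrome versions, and modern code would use getDisplayMedia() instead:

// Legacy, prefixed API: screen capture via the chromeMediaSource constraint.
// Chrome-only, behind a command-line flag, and HTTPS is required.
navigator.webkitGetUserMedia(
  {
    audio: false,
    video: {mandatory: {chromeMediaSource: 'screen'}}
  },
  (stream) => {
    document.querySelector('video').srcObject = stream;
  },
  (error) => console.error('Screen capture error:', error)
);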

 

Signaling: session control, network and media information

WebRTC uses RTCPeerConnection to communicate streaming data between browsers (also known as peers), but it also needs a mechanism to coordinate the communication and to send control messages, a process known as signaling. WebRTC does not specify signaling methods or protocols; signaling is not part of the RTCPeerConnection API. Instead, WebRTC app developers can choose whatever messaging protocol they prefer, such as SIP or XMPP, and any appropriate full-duplex (two-way) channel. The appr.tc example uses XHR and the Channel API as its signaling mechanism. The codelab uses Socket.io running on a Node server.

Signaling is used to exchange three types of information:

  • Session control messages: Initialize or close communication and report errors.
  • Network configuration: What is your computer's IP address and port, as seen by the outside world?
  • Media capabilities: What codecs and resolutions can your browser and the browser it is communicating with handle?

The exchange of information through signaling must have completed successfully before peer-to-peer streaming can begin. For example, imagine Alice wants to communicate with Bob. Here is a code sample from the W3C WebRTC spec that shows the signaling process in action. The code assumes the existence of some signaling mechanism, SignalingChannel.

// handles JSON.stringify/parse
const signaling = new SignalingChannel();
const constraints = {audio: true, video: true};
const configuration = {iceServers: [{urls: 'stuns:stun.example.org'}]};
const pc = new RTCPeerConnection(configuration);

// Send any ice candidates to the other peer.
pc.onicecandidate = ({candidate}) => signaling.send({candidate});

// Let the "negotiationneeded" event trigger offer generation.
pc.onnegotiationneeded = async () => {
  try {
    await pc.setLocalDescription(await pc.createOffer());
    // Send the offer to the other peer.
    signaling.send({desc: pc.localDescription});
  } catch (err) {
    console.error(err);
  }
};

// Once remote track media arrives, show it in remote video element.
pc.ontrack = (event) => {
  // Don't set srcObject again if it is already set.
  if (remoteView.srcObject) return;
  remoteView.srcObject = event.streams[0];
};

// Call start() to initiate.
async function start() {
  try {
    // Get local stream, show it in self-view, and add it to be sent.
    const stream =
      await navigator.mediaDevices.getUserMedia(constraints);
    stream.getTracks().forEach((track) =>
      pc.addTrack(track, stream));
    selfView.srcObject = stream;
  } catch (err) {
    console.error(err);
  }
}

signaling.onmessage = async ({desc, candidate}) => {
  try {
    if (desc) {
      // If you get an offer, you need to reply with an answer.
      if (desc.type === 'offer') {
        await pc.setRemoteDescription(desc);
        const stream =
          await navigator.mediaDevices.getUserMedia(constraints);
        stream.getTracks().forEach((track) =>
          pc.addTrack(track, stream));
        await pc.setLocalDescription(await pc.createAnswer());
        signaling.send({desc: pc.localDescription});
      } else if (desc.type === 'answer') {
        await pc.setRemoteDescription(desc);
      } else {
        console.log('Unsupported SDP type.');
      }
    } else if (candidate) {
      await pc.addIceCandidate(candidate);
    }
  } catch (err) {
    console.error(err);
  }
};
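The SignalingChannel itself is left abstract in the spec. As a rough sketch, assuming a WebSocket server at a hypothetical wss://example.com/signaling endpoint that simply relays messages between the two peers, it might look something like this:

// Minimal sketch of a signaling channel over WebSocket. The server URL and
// the relay-everything behaviour are assumptions, not part of the WebRTC spec.
class SignalingChannel {
  constructor(url = 'wss://example.com/signaling') {
    this.onmessage = null;
    this.socket = new WebSocket(url);
    this.socket.onmessage = (event) => {
      if (this.onmessage) {
        this.onmessage(JSON.parse(event.data)); // handles JSON.parse
      }
    };
  }

  send(message) {
    // A real implementation would queue messages until the socket is open.
    this.socket.send(JSON.stringify(message)); // handles JSON.stringify
  }
}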

First, Alice and Bob exchange network information. (The expression finding candidates refers to the process of finding network interfaces and ports using the ICE framework.)

  1. Alice creates an RTCPeerConnection object with an onicecandidate handler, which runs when network candidates become available.
  2. Alice sends serialized candidate data to Bob through whatever signaling channel they are using, such as WebSocket or some other mechanism.
  3. When Bob gets a candidate message from Alice, he calls addIceCandidate to add the candidate to the remote peer description.

WebRTC clients (known as peers, or Alice and Bob in this example) also need to determine and exchange local and remote audio and video media information, such as resolution and codec capabilities. Signaling to exchange media configuration information proceeds by exchanging an offer and an answer using the Session Description Protocol (SDP):

  1. Alice runs the RTCPeerConnection createOffer() method. The promise returned provides an RTCSessionDescription: Alice's local session description.
  2. In the callback, Alice sets the local description using setLocalDescription() and then sends this session description to Bob through their signaling channel. Note that RTCPeerConnection won't start gathering candidates until setLocalDescription() is called. This is codified in the JSEP IETF draft.
  3. Bob sets the description Alice sent him as the remote description using setRemoteDescription().
  4. Bob runs the RTCPeerConnection createAnswer() method, passing it the remote description he got from Alice so a local session can be generated that is compatible with hers. The createAnswer() promise passes on an RTCSessionDescription, which Bob sets as the local description and sends to Alice.
  5. When Alice gets Bob's session description, she sets it to the remote description using setRemoteDescription.
  6. Ping!

Tip: Make sure to call close() on an RTCPeerConnection once it is no longer needed so that it can be garbage collected. Otherwise, threads and connections are kept alive. It's possible to leak heavy resources in WebRTC!
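As a small sketch of that advice (the hangUp() name and the localStream parameter are only illustrative), the teardown might look like this:

// Release the camera and microphone and close the connection when the call ends.
function hangUp(pc, localStream) {
  localStream.getTracks().forEach((track) => track.stop());
  pc.close(); // allows the RTCPeerConnection to be garbage collected
}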

RTCSessionDescription is an object conforming to the Session Description Protocol (SDP). Serialized, an SDP descriptor looks roughly like this:

v=0
o=- 3883943731 1 IN IP4 127.0.0.1
s=
t=0 0
a=group:BUNDLE audio video
m=audio 1 RTP/SAVPF 103 104 0 8 106 105 13 126
// ...
a=ssrc:2223794119 label:H4fjnMzxy3dPIgQ7HxuCTLb4wLLLeRHnFxh810

The acquisition and exchange of network and media information can occur simultaneously, but both processes must be completed before audio and video streaming between peers can begin.

The offer/answer architecture described previously is called the JavaScript Session Establishment Protocol, or JSEP. (There's an excellent animation explaining the signaling and streaming process in Ericsson's demo video for its first WebRTC implementation.)

Once the signaling process has completed successfully, data can be streamed directly point-to-point between the caller and callee - or, failing that, through an intermediate relay server (more on this later). Streaming is the main work of RTCPeerConnection.

RTCPeerConnection

RTCPeerConnection is a WebRTC component used to handle stable and efficient communication of streaming data between peers. Below is the WebRTC architecture diagram showing the role of RTCPeerConnection. You'll notice that the green part is complicated!

From a JavaScript perspective, the main thing that can be understood from this diagram is that RTCPeerConnection saves web developers from potentially all kinds of complex issues.

A lot of work goes into the codecs and protocols used by WebRTC to enable real-time communication even over unreliable networks:

  • Packet-loss concealment
  • Echo cancellation
  • Bandwidth adaptivity
  • Dynamic jitter buffering
  • Automatic gain control
  • Noise reduction and suppression
  • Image cleaning

The previous W3C code showed a simplified example of WebRTC from a signaling perspective, and below is a walkthrough of two working WebRTC applications. The first is a simple example demonstrating RTCPeerConnection, the second is a full-featured video chat client.

RTCPeerConnection without servers

The following code is taken from WebRTC samples Peer connection, which has local and remote RTCPeerConnections (and local and remote video) on the same web page. This isn't very useful in reality - the caller and callee are on the same page - but it does make the workings of the RTCPeerConnection API a little clearer, because the RTCPeerConnection objects on the page can exchange data and messages directly without having to use an intermediary signaling mechanism. In this example, pc1 represents the local peer (the caller) and pc2 represents the remote peer (the callee).

1. Create a new RTCPeerConnection and add the stream from getUserMedia():

// servers is an optional configuration object. (See the TURN and STUN discussion later.)
pc1 = new RTCPeerConnection(servers);
// ...
localStream.getTracks().forEach((track) => {
  pc1.addTrack(track, localStream);
});

2. Create an offer and set it as the local description for pc1 and as the remote description for pc2. This can be done directly in code without using signaling, because both caller and callee are on the same page:

// desc is the offer produced by pc1.createOffer() in the previous step.
pc1.setLocalDescription(desc).then(() => {
    onSetLocalSuccess(pc1);
  },
  onSetSessionDescriptionError
);
trace('pc2 setRemoteDescription start');
pc2.setRemoteDescription(desc).then(() => {
    onSetRemoteSuccess(pc2);
  },
  onSetSessionDescriptionError
);

3. Create pc2 and, when the stream from pc1 arrives, display it in a video element:

pc2 = new RTCPeerConnection(servers);
pc2.ontrack = gotRemoteStream;
//...
function gotRemoteStream(e) {
  vid2.srcObject = e.streams[0];
}

RTCPeerConnection API plus servers

In the real world, WebRTC needs servers, however simple, so the following can happen:

  • Users discover each other and exchange real-world details, such as names.
  • WebRTC client applications (peers) exchange network information.
  • Peers exchange data about media, such as video format and resolution.
  • WebRTC client applications traverse NAT gateways and firewalls.

In other words, WebRTC requires four types of server-side functionality:

  • User discovery and communication
  • Signaling
  • NAT/firewall traversal
  • Relay servers in case peer-to-peer communication fails

NAT traversal, peer-to-peer networking, and the requirements for building server apps for user discovery and signaling are beyond the scope of this article. Suffice to say that the ICE framework uses the STUN protocol, and its extension TURN, to enable RTCPeerConnection to cope with NAT traversal and other network vagaries. ICE is a framework for connecting peers, such as two video chat clients. Initially, ICE tries to connect peers directly via UDP with the lowest possible latency. In this process, STUN servers have a single task: to enable a peer behind a NAT to find out its public address and port. For more information about STUN and TURN, see Build the backend services needed for a WebRTC app.

If UDP fails, ICE tries TCP. If direct connection fails, in particular because of enterprise NAT traversal and firewalls, ICE uses an intermediary (relay) TURN server. In other words, ICE first uses STUN with UDP to connect peers directly and, if that fails, falls back to a TURN relay server. The expression finding candidates refers to the process of finding network interfaces and ports.
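In practice, the STUN and TURN servers are passed to RTCPeerConnection through its configuration object. A minimal sketch follows; the server URLs and credentials are placeholders, not real services:

// Placeholder STUN/TURN servers - substitute your own deployment.
const configuration = {
  iceServers: [
    {urls: 'stun:stun.example.org:19302'},
    {
      urls: 'turn:turn.example.org:3478',
      username: 'webrtc-user',
      credential: 'secret'
    }
  ]
};

const pc = new RTCPeerConnection(configuration);

// Candidates gathered with the help of these servers surface here and must be
// forwarded to the remote peer over the signaling channel.
pc.onicecandidate = ({candidate}) => {
  if (candidate) {
    console.log('Candidate:', candidate.candidate);
  }
};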

WebRTC engineer Justin Uberti provided more information about ICE, STUN, and TURN in the 2013 Google I/O WebRTC presentation. (The presentation slides give examples of TURN and STUN server implementations.)

The video chat demo at appr.tc is a great place to try out WebRTC, complete with signaling, and NAT/firewall traversal using a STUN server. The app uses adapter.js to insulate itself from spec changes and prefix differences. The code is deliberately verbose in its logging; check the console to understand the order of events. Below is a detailed walkthrough of the code.

If you find this a bit baffling, you may prefer the WebRTC codelab (my translation / original link). This step-by-step guide shows how to build a complete video chat application, including a simple signaling server running on a Node server.

Network topology

WebRTC, as currently implemented, only supports one-to-one communication, but it can be used in more complex network scenarios, such as with multiple peers each communicating with each other directly, or through a Multipoint Control Unit (MCU), a server that can handle large numbers of participants and perform selective stream forwarding and mixing or recording of audio and video.

Many existing WebRTC apps only demonstrate communication between web browsers, but gateway servers can enable a WebRTC app running in a browser to interact with devices, such as telephones (also known as PSTN) and VOIP systems. In May 2012, Doubango Telecom open sourced the sipml5 SIP client, built with WebRTC and WebSocket, which (among other potential uses) enables video calls between browsers and apps running on iOS and Android. At Google I/O, Tethr and Tropo demonstrated a framework for disaster communications in a briefcase, using an OpenBTS cell to enable communication between feature phones and computers through WebRTC - telephone communication without a carrier!

RTCDataChannel API

In addition to audio and video, WebRTC supports real-time communication for other types of data. The RTCDataChannel API enables peer-to-peer exchange of arbitrary data with low latency and high throughput. For single-page demos and to learn how to build a simple file transfer application, see the WebRTC samples and the WebRTC codelab, respectively. There are many potential use cases for the API, including:

  • Gaming
  • Remote desktop applications
  • Real-time text chat
  • File transfer
  • Decentralized networks

The API has several features that make the most of RTCPeerConnection and enable powerful and flexible peer-to-peer communication:

  • Leverages RTCPeerConnection session setup
  • Multiple simultaneous channels with prioritization
  • Reliable and unreliable delivery semantics
  • Built-in security (DTLS) and congestion control
  • Ability to be used with or without audio or video

The syntax is deliberately similar to WebSocket, with a send() method and a message event:

const localConnection = new RTCPeerConnection(servers);
const remoteConnection = new RTCPeerConnection(servers);
const sendChannel = localConnection.createDataChannel('sendDataChannel');
let receiveChannel;

remoteConnection.ondatachannel = (event) => {
  receiveChannel = event.channel;
  receiveChannel.onmessage = onReceiveMessage;
  receiveChannel.onopen = onReceiveChannelStateChange;
  receiveChannel.onclose = onReceiveChannelStateChange;
};

function onReceiveMessage(event) {
  document.querySelector('textarea#receive').value = event.data;
}

document.querySelector('button#send').onclick = () => {
  const data = document.querySelector('textarea#send').value;
  sendChannel.send(data);
};

Communication occurs directly between browsers, so RTCDataChannel can be much faster than WebSocket even if a relay (TURN) server is required when hole punching to cope with firewalls and NATs fails. RTCDataChannel is available in Chrome, Safari, Firefox, Opera, and Samsung Internet. The Cube Slam game uses the API to communicate game state: play against a friend or against the bear! The innovative Sharefest platform enabled file sharing through RTCDataChannel, and peerCDN offered a glimpse of how WebRTC could enable peer-to-peer content distribution. For more information about RTCDataChannel, take a look at the IETF's draft protocol spec.
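To illustrate the reliable and unreliable delivery semantics mentioned above, here is a small sketch (the channel names are arbitrary) of creating one reliable, ordered channel and one unordered channel that never retransmits, trading reliability for latency:

const pc = new RTCPeerConnection(servers);

// Reliable, ordered channel (the default): suitable for chat or file transfer.
const reliableChannel = pc.createDataChannel('chat');

// Unordered channel with no retransmissions: suitable for game-state updates,
// where a late packet is useless because a fresher one is already on the way.
const lossyChannel = pc.createDataChannel('game-state', {
  ordered: false,
  maxRetransmits: 0
});

lossyChannel.onopen = () => lossyChannel.send(JSON.stringify({x: 0, y: 0}));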

Security

Real-time communication applications or plug-ins can compromise security in several ways. For example:

  • Unencrypted media or data can be intercepted between browsers or between browsers and servers.
  • Apps may record and distribute video or audio without the user's knowledge.
  • Malware or viruses may be installed along with apparently harmless plug-ins or applications.

WebRTC has several features to avoid these problems:

  • WebRTC implementations use secure protocols, such as DTLS and SRTP.
  • Encryption is required for all WebRTC components, including signaling mechanisms.
  • WebRTC is not a plug-in. Its components run within the browser sandbox rather than in separate processes. Components do not need to be installed separately and can be updated simply by updating your browser.
  • Access to the camera and microphone must be explicitly granted, and the user interface clearly shows this when the camera or microphone is running.

A complete discussion of streaming security is beyond the scope of this article. For more information, see the IETF's proposed WebRTC Security Architecture.

 

Developer tools

WebRTC statistics for an ongoing session can be found at the following URLs (enter them in a new tab):

  • chrome://webrtc-internals in Chrome
  • opera://webrtc-internals in Opera
  • about:webrtc in Firefox

Learn more

