Zero-based Quick Start WebRTC: Basic Concepts, Key Technologies, Differences from WebSocket, etc.

1. Content overview

This article is a quick-start guide written for beginners learning WebRTC, the open source real-time audio and video project.

It covers the basic concepts and key technical terms of WebRTC (NAT, STUN, TURN, ICE, SDP and signaling), focuses on how WebRTC implements P2P communication and the role signaling plays in it, discusses WebRTC's technical advantages and disadvantages, and ends with a simple WebRTC demo.

2. What is WebRTC?

WebRTC stands for Web Real-Time Communication. It is a technical solution that lets web browsers hold real-time voice or video conversations. From a front-end developer's point of view, it is a set of standard, callable APIs. The technology enables many kinds of applications, such as video conferencing, file transfer, chat and desktop sharing, without any additional plug-ins.

Before WebRTC was released, developing real-time audio and video applications was very expensive, and there were many technical issues to deal with: audio and video codecs, data transmission, latency, packet loss, jitter, echo processing and cancellation, and so on. Supporting real-time audio and video in the browser also required installing additional plug-ins.

May 2010: Google acquired Global IP Solutions (GIPS), a developer of VoIP software, for US$68.2 million, and later open-sourced its GIPS engine under the name "WebRTC" (see "The Great WebRTC: The Ecosystem Is Increasingly Perfect, or Real-time Audio and Video Technology Cabbage"). The goal was to build a platform for real-time communication between browsers and to make WebRTC one of the HTML5 standards.

January 2012: Google integrated the software into the Chrome browser, and Opera added initial WebRTC support.

June 2013: Mozilla Firefox 22.0 was released with official WebRTC support.

November 2017: The W3C WebRTC 1.0 draft was officially finalized.

January 2021: WebRTC was published as an official standard by the W3C and IETF (see "WebRTC 1.0: Real-Time Communication Between Browsers").

Today, WebRTC is completely open source and free. It uses the RTP protocol to transmit audio and video, and is supported by browsers such as Chrome, Firefox, Opera, Microsoft Edge and the Android browser.

(The content of this section is quoted from the article " Introduction to Real-time Audio and Video: Analysis of the Technical Principles and Use of Open Source Engineering WebRTC ")


3. Why do you need WebRTC?

3.1 Why was WebRTC created?

It was created because people need a standard, low-latency way to transfer media data (video and audio).

"Standard" means we need an easy-to-use API. "Low-latency" means a suitable transport protocol is required, and UDP is an obvious choice because it has no acknowledgment round trips. But we need more than raw UDP: the protocol also has to support P2P communication. Once content is delivered through a server, extra delay is introduced by the reverse proxy or relay, and the server has to terminate, inspect, process and convert the streams, which adds further overhead. For video transmission, especially live streaming and conversations, users want the content to arrive as soon as possible, so P2P is the fastest path.

WebRTC also aims to enable rich communication between browsers. The browser has been around for a long time and has many useful resources at hand, such as access to the camera and microphone, and these are worth exploiting. Users do not need to write their own applications; they can simply use the standard WebRTC API. This applies not only to browsers but also to communication between mobile devices and IoT devices.

3.2 What exactly happens in WebRTC?

For example: A wants to communicate with B, but A and B know nothing about each other. So A first needs to find every way in which the public Internet (not just B) can reach it: A checks whether it has a public IP address that outsiders can use, and if not, whether its router has port forwarding rules, whether it has a public representation (a public IP:port mapped to it), and so on. B does the same.

In addition, A and B each collect a lot of other information, such as the encryption methods, security parameters and video codecs they support. Note that none of this has been sent to the other end yet; at this stage it is only collected. All of this information together makes up the "SDP".

Next, A and B send this session information to each other by some other means (WhatsApp, a QR code, a tweet, WebSockets, HTTP fetch...). WebRTC does not care which method is used, as long as the information gets from A to B and from B to A.

On the surface this way of working looks a bit "stupid", and some people may think: "If I already have a communication channel between A and B, what do I need WebRTC for?" But on closer inspection, once the two parties have exchanged SDP, real P2P communication takes over: intermediate channels such as WhatsApp or QR codes are no longer needed, and no communication path is faster than this one. In the end A is connected to B through the optimal path, and that is the WebRTC workflow.

This example is explained in more detail as follows:

As shown in the figure above: suppose A finds three ways to reach it, A1, A2 and A3, along with information such as security parameters and media options. Meanwhile, B does the same. Next, they exchange this information by some means such as WhatsApp. A then finds that B2 is the best available path, B finds that A1 is the best available path, and the two connect to each other directly through these paths.

Essentially, this is how WebRTC works.

4. Key technologies and concepts of WebRTC

4.1 Overview

Next, we take a first look at the key technologies and concepts of WebRTC and go through them in some detail.

We start with NAT (Network Address Translation) and how address translation affects WebRTC, then look at why STUN and TURN are needed, and finally introduce ICE, SDP and signaling.

4.2 NAT (Network Address Translation)

If you have a public IP address, connecting is no problem: you can listen on a port just like a web server, give the IP and port to the other party, and they can connect to you directly. In most cases, however, users sit behind a router on a private network and cannot be reached directly.

In the example shown in the figure below, the router has the public IP 5.5.5.5 and the private IP 10.0.0.1 (the gateway), while your machine only has the private IP 10.0.0.2. You want to reach the machine at 4.4.4.4:80. How does that work?

First, your machine constructs a packet saying it wants to make a GET request to 4.4.4.4:80, with 10.0.0.2 as the source IP address. Next, your machine uses the subnet mask to decide whether it can reach 4.4.4.4 directly; the calculation shows that 4.4.4.4 is not in your subnet, so it cannot. The request is therefore sent to the gateway, that is, the router. The router replaces the source IP address and port with its public IP and a random port, and records the correspondence between the internal address, the external address and the destination in its NAT table. The peer can then receive your GET request and process it (as shown in the figure below).

After this, the server at 4.4.4.4:80 sends its reply. The same principle works in reverse: the router looks up the matching entry in the NAT table and forwards the reply to your machine (as shown in the figure below).
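The translation steps above can be sketched as a small simulation. This is only an illustration, not how any particular router is implemented: the NatRouter class, the /24 subnet mask and the starting port 3333 are assumptions, while the addresses are the ones used in the example.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class NatRouter:
    """Hypothetical router that keeps a NAT table of public -> private endpoints."""
    public_ip: str
    subnet: str                 # assumed LAN range, e.g. "10.0.0.0/24"
    next_port: int = 3333       # assumed; a real router picks any free port
    table: dict = field(default_factory=dict)

    def is_local(self, ip):
        """Subnet-mask check: can the host reach this IP without the gateway?"""
        return ipaddress.ip_address(ip) in ipaddress.ip_network(self.subnet)

    def outbound(self, src, dst):
        """Rewrite the source of an outgoing packet and record the mapping."""
        for public, private in self.table.items():
            if private == src:
                return public, dst          # reuse the existing mapping
        public = (self.public_ip, self.next_port)
        self.next_port += 1
        self.table[public] = src
        return public, dst

    def inbound(self, src, dst):
        """Translate a reply back to the private endpoint; None means drop."""
        private = self.table.get(dst)
        return None if private is None else (src, private)


router = NatRouter(public_ip="5.5.5.5", subnet="10.0.0.0/24")
host = ("10.0.0.2", 8992)
server = ("4.4.4.4", 80)

needs_gateway = not router.is_local(server[0])   # 4.4.4.4 is outside the subnet
request = router.outbound(host, server)          # what the server sees
reply = router.inbound(server, request[0])       # what the host receives
```

Here request carries 5.5.5.5:3333 as its source, and the reply addressed to 5.5.5.5:3333 is mapped back to 10.0.0.2:8992. A packet sent to a port with no table entry is dropped.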

NAT comes in the following main types:

1) One-to-One NAT (Full-cone NAT): packets sent to the external IP:port on the router are always mapped to the internal IP:port, without exception. For example, every packet sent to 5.5.5.5:3333 is forwarded to 10.0.0.2:8992, whether it comes from 4.4.4.4:80 or from any other address.

2) Address restricted NAT: for security reasons, some routers restrict inbound packets by source address, based on whether the internal host has communicated with that address before. A packet sent to the external IP:port is mapped to the internal IP:port only if its source IP matches the NAT table; the source port does not matter. For example, of the packets sent to 5.5.5.5:3333, only those whose source IP is 4.4.4.4 (or another IP recorded in the table) are forwarded to 10.0.0.2:8992, even if they come from a port on that IP that was not contacted before.

3) Port restricted NAT: compared with the previous type, this one adds a port restriction. A packet sent to the external IP:port is mapped to the internal IP:port only if both its source IP and its source port match the NAT table. For example, of the packets sent to 5.5.5.5:3333, only those from 4.4.4.4:80 (or another IP:port recorded in the table) are forwarded to 10.0.0.2:8992.

4) Symmetric NAT: this is the most restrictive type. The complete IP:port pair must match, and the mapping is tied to one specific destination: of the packets sent to 5.5.5.5:3333, only those from 4.4.4.4:80 are forwarded to 10.0.0.2:8992, and all other packets are dropped. This type does not work with WebRTC's STUN-based approach: the public representation reported by the STUN server is only valid for talking to the STUN server itself, because a symmetric NAT ties that mapping to one specific peer, so another peer cannot connect through it.

By default, WebRTC works with the first three NAT types but not well with the last one. According to the author, more than 90% of communication goes through the first three types, and the last one is of little use for P2P.
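To make the difference between the four types concrete, here is a minimal sketch of their rules. The accepts function and the SymmetricNat class are hypothetical names introduced for illustration. The peer address 7.7.7.7:4444 and the STUN server address 9.9.9.9:3478 are borrowed from the STUN example in the next section.

```python
from enum import Enum


class NatType(Enum):
    FULL_CONE = "full-cone"
    ADDRESS_RESTRICTED = "address-restricted"
    PORT_RESTRICTED = "port-restricted"


def accepts(nat_type, contacted, source):
    """Return True if an inbound packet from `source` is forwarded inward.

    `contacted` is the set of (ip, port) endpoints the internal host has
    already sent packets to through this mapping.
    """
    if nat_type is NatType.FULL_CONE:
        return True                                   # anyone may send
    if nat_type is NatType.ADDRESS_RESTRICTED:
        return any(ip == source[0] for ip, _port in contacted)
    return source in contacted                        # IP and port must match


class SymmetricNat:
    """Allocates a separate external port per (internal, destination) pair."""

    def __init__(self, first_port=3333):              # assumed starting port
        self._next = first_port
        self._map = {}

    def external_port(self, internal, destination):
        key = (internal, destination)
        if key not in self._map:
            self._map[key] = self._next
            self._next += 1
        return self._map[key]


contacted = {("4.4.4.4", 80)}
internal = ("10.0.0.2", 8992)

nat = SymmetricNat()
port_seen_by_stun = nat.external_port(internal, ("9.9.9.9", 3478))
port_toward_peer = nat.external_port(internal, ("7.7.7.7", 4444))
```

With a symmetric NAT, the port the STUN server observes differs from the port used toward the peer, which is why an address learned through STUN does not help the peer connect.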

If you are interested in P2P technology, you can read the following articles for more depth:

"P2P Technology Detailed Explanation (1): NAT Detailed Explanation - Detailed Principle, P2P Introduction"

"Detailed Explanation of P2P Technology (2): Detailed Explanation of NAT Traversal (Hole Punching) Scheme in P2P (Basic Principles)"

"P2P Technology Detailed Explanation (3): Detailed Explanation of NAT Traversal (Hole Punching) Scheme in P2P (Advanced Analysis)"

"Easy to understand: quickly understand the principle of NAT penetration in P2P technology"

4.3 STUN (Session Traversal Utilities for NAT)

STUN lets an application discover the public IP and port that the NAT has assigned to it. It works with Full-cone, Address restricted and Port restricted NAT, but not with Symmetric NAT.

STUN servers usually listen on port 3478, or on port 5349 for TLS.

STUN is very lightweight, and you can set up your own STUN server with Docker.

The purpose of a STUN server is to let users find their own public representation (the public IP:port that the outside world sees) and communicate with other users through it. If everyone still had a public IP address, as was common around 1996 or in the early 2000s, communication would be very simple. Today, however, we have to use a STUN server.

The workflow of the STUN server is shown in the figure below:

First, the machine creates a STUN request packet addressed to the STUN server at 9.9.9.9:3478. The router creates a NAT table entry and translates the source address, and the packet is then sent on to the STUN server.

When the server receives the request, it sees the public representation 5.5.5.5:3333 of the machine at 10.0.0.2 as the packet's source address, and sends this information back in its response.

That is the complete process of one STUN request. In the example in the following figure, STUN plays this role in the overall communication: it tells the machine at 10.0.0.2 that its public representation is 5.5.5.5:3333, and tells the machine at 192.168.1.2 that its public representation is 7.7.7.7:4444. The two then connect to each other using these public representations.

Note that the two machines have not communicated before. With Full-cone NAT, connecting is no problem. With Address restricted NAT, the first connection request will fail. In that case each side first has to send at least one request toward the other (coordinated through the server), so that the routers at both ends record the peer's address; when a connection request arrives through the public representation again, a matching address is found and the connection can be completed. Port restricted NAT works similarly.
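For readers who want to see what a STUN request and response look like on the wire, the following is a rough sketch based on the STUN message layout in RFC 5389 (a 20-byte header with a magic cookie, and the XOR-MAPPED-ADDRESS attribute). It only handles IPv4 and does no network I/O. The helper names and the fixed transaction ID are assumptions for illustration, and the response is built locally to stand in for what a server would return.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """20-byte header: type, attribute length (0), magic cookie, 12-byte ID."""
    tid = transaction_id or os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def build_binding_response(transaction_id, ip, port):
    """What a server would send back: the source address it observed."""
    x_port = port ^ (MAGIC_COOKIE >> 16)
    x_addr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, x_port, x_addr)   # 0x01 = IPv4
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


def parse_binding_response(data):
    """Extract the public (ip, port) from a Binding Success Response."""
    msg_type, length, cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding Success Response")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[offset:offset + 4])
        value = data[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            _, family, x_port, x_addr = struct.unpack("!BBHI", value[:8])
            if family != 0x01:
                raise ValueError("only IPv4 is handled in this sketch")
            port = x_port ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", x_addr ^ MAGIC_COOKIE))
            return ip, port
        offset += 4 + attr_len + (-attr_len % 4)   # attributes are 4-byte aligned
    raise ValueError("no XOR-MAPPED-ADDRESS attribute")


tid = bytes(range(12))                       # fixed ID, for illustration only
request = build_binding_request(tid)
response = build_binding_response(tid, "5.5.5.5", 3333)
public = parse_binding_response(response)
```

In a real client the request bytes would be sent over UDP to the STUN server, and the response would come from the network rather than from build_binding_response.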

The following are some public STUN servers for reference (the first four are provided by Google):

stun:stun1.l.google.com:19302
stun:stun2.l.google.com:19302
stun:stun3.l.google.com:19302
stun:stun4.l.google.com:19302
stun:stun.stunprotocol.org:3478

4.4 TURN (Traversal Using Relays around NAT)

With Symmetric NAT, TURN must be used.

All traffic has to be relayed by the TURN server, so running a TURN server is relatively expensive, which is why almost nobody offers one to users for free.

The figure below shows an example of the TURN workflow. There is no direct P2P communication between the two ends; everything is relayed by the TURN server.

coturn is an open source project that you can use to run your own TURN server: https://github.com/coturn/coturn .

4.5 ICE (Interactive Connectivity Establishment)

With STUN and TURN servers in place, there may be many possible paths from A to B. ICE was proposed to manage these paths.

ICE collects every available communication path as a "candidate" (ICE candidate): local IP addresses, addresses obtained from STUN and TURN servers, and so on. All collected addresses are put into the SDP and sent to the peer, which parses the SDP to learn how it can reach us.

ICE is therefore a critical component of WebRTC.

Finally, if you want to know more about STUN, TURN and ICE, you can read: "P2P Technology Detailed Explanation (4): P2P Technology STUN, TURN, ICE Detailed Explanation".
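As a rough illustration of how candidates can be ranked, the sketch below uses the priority formula and the recommended type preferences from the ICE specification (RFC 8445). The candidate list is made up: the host and server-reflexive addresses come from the earlier examples, and the relay address 203.0.113.7 is a placeholder. Real ICE also forms candidate pairs and runs connectivity checks, which this sketch leaves out.

```python
# Recommended type preferences from RFC 8445: direct paths rank highest,
# relayed (TURN) paths lowest.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_preference=65535, component_id=1):
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (
        (TYPE_PREFERENCE[cand_type] << 24)
        + (local_preference << 8)
        + (256 - component_id)
    )


# (type, ip, port); the relay address is a hypothetical TURN allocation.
candidates = [
    ("relay", "203.0.113.7", 50000),
    ("host", "10.0.0.2", 8992),
    ("srflx", "5.5.5.5", 3333),
]

ranked = sorted(candidates, key=lambda c: candidate_priority(c[0]), reverse=True)
```

The ranking puts the host candidate first, the STUN-derived (server-reflexive) candidate second and the TURN relay last, which matches the idea that a relay is only a fallback.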

4.6 SDP (Session Description Protocol)

SDP is the format in which ICE candidates are expressed. It also describes network options, media options, security options and much other information. Developers can even customize the SDP content.

Despite its name, SDP is not really a protocol but a data format. It is nevertheless one of the most important concepts in WebRTC: the SDP generated by one user is meant to be sent to the other end, and how it gets there does not matter.
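To give a feel for the format, here is a small sketch that splits an SDP text into its session-level lines and media sections and pulls out the ICE candidates. The sample SDP is heavily abbreviated and invented for illustration (a real browser-generated SDP is much longer), and parse_sdp and extract_candidates are hypothetical helper names.

```python
SAMPLE_SDP = """\
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=rtpmap:111 opus/48000/2
a=candidate:1 1 udp 2130706431 10.0.0.2 8992 typ host
a=candidate:2 1 udp 1694498815 5.5.5.5 3333 typ srflx raddr 10.0.0.2 rport 8992
"""


def parse_sdp(text):
    """Split SDP into session-level lines and one list per media section."""
    session, media = [], []
    current = session
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition("=")
        if key == "m":                 # each "m=" line starts a media section
            current = []
            media.append(current)
        current.append((key, value))
    return session, media


def extract_candidates(section):
    """Read the "a=candidate:" attributes of one media section."""
    result = []
    for key, value in section:
        if key == "a" and value.startswith("candidate:"):
            # foundation component transport priority ip port "typ" type ...
            fields = value[len("candidate:"):].split()
            result.append({
                "priority": int(fields[3]),
                "ip": fields[4],
                "port": int(fields[5]),
                "type": fields[7],
            })
    return result


session, media = parse_sdp(SAMPLE_SDP)
found = extract_candidates(media[0])
```

Each line has the form key=value, which is why the whole description can be passed around as one long string.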

4.7 Signaling

Signaling is the process of passing the SDP generated by one user to the party it wants to communicate with, by whatever means.

As mentioned above, the means does not matter. Many people pass SDP information over WebSocket or socket.io; this is what "signaling the SDP" means.

Finding all ICE candidates takes time, but once it is done, the next step is to create the SDP. You could, for example, turn it into a QR code and publish it on Twitter, and anyone who scans the QR code gets the corresponding SDP.

Whether this happens through Twitter, a QR code, WhatsApp, WebSockets or an HTTP request makes no difference, because all that is being passed along is a long string.

In short, signaling means passing SDP information to the other party.

4.8 Summary

A typical WebRTC communication process is as follows:

  • 1) A wants to establish a connection with B;
  • 2) A creates an offer: it gathers all ICE candidates, security options, audio and video options, etc. and creates an SDP (put simply, the offer is an SDP);
  • 3) A sends the offer to B (signaling);
  • 4) B applies A's offer and creates an answer;
  • 5) B sends the answer to A (signaling);
  • 6) The connection is established.
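The six steps above can be traced with a toy model. The Peer class below is hypothetical and only imitates the offer/answer bookkeeping; it is not the browser's RTCPeerConnection API, and the SDP strings are placeholders. The state names follow the signaling states used by WebRTC ("stable", "have-local-offer", "have-remote-offer").

```python
class Peer:
    """Toy model of one side of the offer/answer exchange."""

    def __init__(self, name):
        self.name = name
        self.state = "stable"
        self.local = None
        self.remote = None

    def create_offer(self):
        return {"type": "offer", "sdp": f"placeholder SDP from {self.name}"}

    def create_answer(self):
        if self.state != "have-remote-offer":
            raise RuntimeError("no remote offer to answer")
        return {"type": "answer", "sdp": f"placeholder SDP from {self.name}"}

    def set_local(self, description):
        self.local = description
        self.state = "have-local-offer" if description["type"] == "offer" else "stable"

    def set_remote(self, description):
        self.remote = description
        self.state = "have-remote-offer" if description["type"] == "offer" else "stable"

    @property
    def negotiated(self):
        """True once both descriptions are set and the exchange is complete."""
        return (
            self.state == "stable"
            and self.local is not None
            and self.remote is not None
        )


a, b = Peer("A"), Peer("B")          # step 1: A wants to connect to B

offer = a.create_offer()             # step 2: A creates an offer
a.set_local(offer)
b.set_remote(offer)                  # step 3: the offer reaches B via signaling

answer = b.create_answer()           # step 4: B creates an answer
b.set_local(answer)
a.set_remote(answer)                 # step 5: the answer reaches A via signaling

done = a.negotiated and b.negotiated # step 6: both sides are ready to connect
```

The model stops at the point where both sides hold a local and a remote description. In a real connection, ICE connectivity checks follow before media flows.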

5. WebRTC signaling (Signal)

5.1 What is signaling

Signaling is the process of setting up, controlling and terminating a communication session between users. Peer-to-peer (P2P) communication needs signaling in order to be established.

For two ends to communicate, signaling has to do three main things:

  • 1) Share session control information;
  • 2) Exchange network information such as IP addresses and ports;
  • 3) Exchange the codecs and media formats each user supports.

5.2 Why communication requires signaling

Why does communication require signaling?

  • 1) Session control information governs the establishment, teardown and use of the end-to-end connection;
  • 2) IP and port information is used to locate each user at the network layer;
  • 3) Codecs and media formats determine the resolution and media settings used between the users;
  • 4) All of these settings are exchanged in SDP (Session Description Protocol) form.

5.3 Why WebRTC needs signaling

If two users want P2P communication, an additional server is needed between the two ends to exchange the initial data that sets up the WebRTC connection. This server is called a signaling server.

After signaling is complete, all media data is exchanged end-to-end through RTCPeerConnection.

Knowledge points:

  • 1) The signaling server only helps exchange the metadata needed to establish a connection; it does not take part in the WebRTC session itself;
  • 2) The signaling server can be built with any server technology, such as WebSocket, socket.io or SIP;
  • 3) RTCPeerConnection is the WebRTC API used to establish connections and communicate between users.
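The first point, that a signaling server only relays metadata, can be illustrated with an in-memory stand-in. The SignalingServer class below is hypothetical and uses plain Python queues instead of WebSocket or socket.io; what it shows is that the server forwards opaque messages between named users without interpreting the SDP inside them.

```python
import json
from collections import deque


class SignalingServer:
    """In-memory relay: one inbox per user, payloads are never inspected."""

    def __init__(self):
        self.inboxes = {}

    def join(self, user):
        self.inboxes[user] = deque()

    def send(self, sender, recipient, payload):
        if recipient not in self.inboxes:
            raise KeyError(f"unknown user: {recipient}")
        # Serialize as a transport would; the server does not parse the SDP.
        message = json.dumps({"from": sender, "payload": payload})
        self.inboxes[recipient].append(message)

    def receive(self, user):
        inbox = self.inboxes[user]
        return json.loads(inbox.popleft()) if inbox else None


server = SignalingServer()
server.join("A")
server.join("B")

server.send("A", "B", {"type": "offer", "sdp": "placeholder offer SDP"})
offer_msg = server.receive("B")

server.send("B", "A", {"type": "answer", "sdp": "placeholder answer SDP"})
answer_msg = server.receive("A")
```

Once both messages have been delivered, the server has nothing further to do; swapping this class for a WebSocket or SIP service would not change the WebRTC side.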

5.4 How to enable users to realize P2P communication

Each user needs to find out its own public IP address. Because of NAT and firewalls, it is very difficult for two users to communicate directly, so the ICE framework is used together with the STUN and TURN protocols to solve these problems and achieve an end-to-end connection.

The role of STUN: a user behind a NAT only has a LAN IP address and is hard to reach from outside the LAN. Through a STUN server the user can obtain its public IP, so that users on the public network can traverse the NAT and connect to it.

The role of TURN: the STUN approach fails with symmetric NAT, so the TURN protocol has to be used. The drawback of TURN is that, whereas STUN is no longer needed once the connection is established, a TURN server has to stay in the path for the whole session.

5.5 Is WebRTC signaling necessary?

WebRTC lets users communicate directly over P2P, but on its own it gives one user no way to find another (its IP address and so on). That is what the SDP request (offer) and SDP reply (answer) are for, and exchanging them requires a signaling server.

5.6 SDP Requests and Responses

Before two ends can communicate directly, both must be connected to a signaling server so that they can share SDP information.

SDP requests and replies carry information about each user's audio, video, encoders and so on. One user sends an initial SDP request to create a multimedia session, and the peer, after receiving it, can create an SDP reply to accept or reject the request.

6. The architectural principle of WebRTC

The following figure is a schematic diagram of a simple WebRTC connection architecture:

In the connection phase, the users communicate indirectly through the signaling server to set up the connection. Once it is established, the two users communicate directly over the audio and video channel.

The following figure is a detailed version of the schematic diagram of the WebRTC connection architecture:

As shown in the figure above: two users who want to establish a WebRTC connection both connect to the same signaling server first and exchange SDP information through it. After the SDP request and reply have been exchanged, each user knows the other's IP addresses, audio and video configuration and so on. STUN or TURN servers are needed to traverse NAT and achieve a direct WebRTC connection between the users.

7. Advantages and disadvantages of WebRTC

7.1 Advantages

1) P2P communication is excellent: it reduces latency for high-bandwidth content. P2P is the fastest path and does not go through any third party. Even though traffic over the Internet passes through a large number of routers, encrypted content cannot be read by them; they simply forward the packets, so P2P is a very good way to communicate. High-bandwidth content (especially video) is pushed directly over UDP, so delivering it via P2P UDP gives users the best performance.

2) Standard API: WebRTC has a standard and elegant API that can be used directly in the browser, without installing other packages or using extra development tools.

7.2 Disadvantages

1) STUN and TURN servers have to be maintained: in some cases P2P cannot work and you still need a TURN server. Maintaining STUN and TURN servers, especially TURN servers, takes a lot of effort and money: you have to pay for a public IP and keep the server up and running. The author personally thinks that, rather than paying this price, it is better to build a server you fully control and use it as a reverse proxy.

2) P2P breaks down with too many participants: suppose 100 people want to communicate with each other. Would you create P2P connections between them? Everyone has to be connected to everyone else, so each user maintains 99 connections and there are 100 × 99 / 2 = 4,950 connections in total, which is far too many. With a centralized server, each user only needs one connection to the server, and all traffic can be controlled through it, which is obviously a better approach. For this reason WebRTC is sometimes unsuitable for games: a pure P2P mesh is not a good basis for a large multi-user game. 3 users are feasible, of course, but the author considers hundreds of users impossible.
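The growth in connection count is easy to check with a few lines. The function names are made up for this illustration. The mesh figure counts each peer-to-peer link once, and the star figure assumes one connection per user to a central server.

```python
def mesh_connections(users):
    """Full P2P mesh: every pair of users shares one connection."""
    return users * (users - 1) // 2


def star_connections(users):
    """Centralized server: each user keeps a single connection to it."""
    return users


for n in (3, 10, 100):
    print(n, mesh_connections(n), star_connections(n))
```

The mesh grows quadratically while the star grows linearly, which is the core of the scalability argument above.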

8. The difference between WebSocket and WebRTC

8.1 Different design intentions

There are two main transport channels for browser communication: HTTP and WebSocket. WebSocket exists to give the browser a two-way communication mechanism.

HTTP is mainly used to fetch web content such as text or images. It is a client/server (C/S) protocol in which the browser is the client and the web server is the server.

With WebSocket, the browser connects to the web server over a WebSocket connection, which is also a C/S protocol like HTTP. But HTTP is a one-way, request/response channel, whereas WebSocket is two-way: the connection between server and client stays open until one of them actively closes it.

WebSocket is mainly used for real-time web applications and IM chat applications.

Compared with WebSocket, WebRTC has the following characteristics:

  • 1) WebRTC is designed for high-quality audio and video real-time communication;
  • 2) The browser end-to-end communication provided by WebRTC has much lower latency than communication relayed through a WebSocket server.

8.2 Differences in Implementation

Mainly two points:

  • 1) WebRTC mainly uses UDP, while WebSocket uses TCP;
  • 2) WebRTC can simultaneously provide high-quality and low-latency streaming.

8.3 WebRTC actually uses WebSocket

In fact, WebRTC applications often use WebSocket too, but only to build the signaling mechanism. Once the connection is established, WebRTC is end-to-end and no additional server is required.

9. A simple WebRTC Demo

To help with understanding this article, a simple WebRTC demo has been prepared ( click here for the download address ).

The demo does the following:

  • 1) Communication between two browsers (browser A and browser B);
  • 2) A creates an offer (SDP) and sets it as its local description;
  • 3) B receives the offer and sets it as its remote description;
  • 4) B creates an answer, sets it as its local description, and passes it to A;
  • 5) A receives the answer and sets it as its remote description;
  • 6) The connection is established, a data channel is opened, and data is exchanged.

 

Original article: Zero-based Quick Start WebRTC: Basic Concepts, Key Technologies, Differences from WebSocket, etc. (Zhihu)


Origin blog.csdn.net/yinshipin007/article/details/132569652