WebRTC technology and application overview

Web Real-Time Communication (WebRTC) is a browser-based technology that has become very popular in recent years. Many VoIP vendors and application integrators have gradually added WebRTC support to their solutions. WebRTC brings voice and video functions to browsers and mobile terminals through a set of APIs. The attention it has received is inseparable from the strong support of its main promoters, major vendors such as Google, Microsoft, and Mozilla, as well as the work of well-known standards organizations such as the W3C and the IETF.

There are already many articles about WebRTC on the Internet that introduce it in great detail from different angles, and the official WebRTC website also publishes a large amount of documentation. However, these online articles are scattered, and most authoritative documents and printed books are published in English, which can make the technology harder to absorb for readers who are less comfortable reading English. To give Chinese readers a more comprehensive summary of WebRTC technology and applications, the author hopes to provide a relatively systematic introduction in a single long article. The content covers WebRTC background knowledge, media streams, related protocol stacks, NAT handling, security and privacy settings, current problems and future improvements of WebRTC, WebRTC usage scenarios, open source WebRTC media servers and video conferencing, WebRTC testing tools, and other topics. The goal is that, through this one article, ordinary readers can form a clear picture of WebRTC technology, gain a solid foundation for further study, and move quickly into real WebRTC application development. The author gives this introduction in eleven chapters, organized around the architecture of WebRTC and its related applications.
1.Introduction to the technical background of WebRTC

First, let us briefly review some basic communication background. Looking at the development of real-time communication and voice protocols, the earliest voice communication protocol dates back to 1977, when the Network Voice Protocol (NVP, RFC 741) applied real-time communication over a network and demonstrated that the technology was usable. Since then, real-time voice communication has gone through multiple historical stages and has gradually achieved breakthroughs in combination with other technologies. The following shows some of the stages in the development of voice technology.

As shown in the example below, the initial working model was relatively simple. With continuous improvements in technology and revisions to the protocols, today's voice technology has advanced greatly. For the detailed specification of NVP, readers can consult RFC 741.

When talking about real-time communication or WebRTC, we should also briefly introduce the Real-time Transport Protocol (RTP). It was first used around 1992 and was published as a standard in 1996. Today RTP is part of VoIP, SIP, and WebRTC solutions.

In addition to RTP, H.323 and SIP are background knowledge that needs to be introduced before we discuss WebRTC. H.323 was released by the ITU in 1996, and SIP was released by the IETF in 1999. Over recent decades these two protocols have played a very important role in voice and video technology. Today SIP is widely accepted by users and the market, while the number of H.323 users is gradually decreasing.

WebRTC is popular for many reasons, which we will introduce in the following chapters. The main reason is its ease of use: it can use the media devices available to the user's browser, such as microphones and cameras, and access them directly through the browser's APIs. Users do not need to download and install plug-ins to gain this access. WebRTC can also achieve peer-to-peer network interaction and avoid the access problems of routing through remote servers. Especially in a VoLTE network environment, voice can be carried over the data channel, which greatly facilitates end users' voice and video communication. In addition, many online games can now display game scenes in the browser, and users can interact with classmates and friends through voice, data, and video at the same time.

Now, let's briefly introduce how WebRTC's functions are implemented. WebRTC consists of the core modules and APIs shown above. The user's browser makes calls through interfaces exposed to HTML, scripting languages, and client code. Pay special attention to the browser's RTC function, which includes transmission, encoding, echo processing, and other functions. Other media data can communicate with WebRTC through the RTC function.

There are many reasons why WebRTC is recognized by the market. The main ones are:

Platform and device independence. Developers can build WebRTC applications on any browser that supports WebRTC without worrying about compatibility at the terminal or operating system level. WebRTC also provides a standard API (W3C) and standard protocols (IETF) to avoid platform compatibility issues.

Secure voice and video. WebRTC encrypts voice and video with SRTP. Users who access voice and video through a browser need reasonably secure settings, and SRTP meets the security requirements of such scenarios (for example, calls over unsecured Wi-Fi) so that others cannot eavesdrop.

Advanced voice and video processing. WebRTC supports modern codecs: Opus for voice and VP8 for video. Because the codecs are built in, there is no security risk from downloading third-party codecs, and encoding can adapt to network conditions to achieve better voice or video quality.

Reliable session establishment. WebRTC provides a reliable way to establish transport, which remains stable even in NAT environments.

Multimedia stream handling. WebRTC supports aggregating multiple media types and sources, and provides extensions to RTP and SDP.

Adaptation to different network conditions. Because WebRTC runs over the network, it is very sensitive to network conditions and bandwidth. It can detect and adapt to network conditions and bandwidth requirements on its own to avoid congestion, relying on RTCP and the SAVPF profile for this.

Good interoperability with VoIP voice and video. WebRTC can interoperate with other systems, including SIP, Jingle, and XMPP. When it needs to interface with traditional protocols, a WebRTC gateway can provide smooth compatibility.

Using WebRTC has the following benefits for developers and users:

Developers can achieve seamless integration without worrying about platform compatibility.

Developers can use simple API interfaces to implement application development.

Developers don't need to worry about problems caused by NAT.

Developers can use advanced codecs without incurring commercial license fees.

Users can use it without installation.

All user communications are encrypted.

Users can achieve reliable transmission.

Users can use HD voice and video.

Users can choose more real-time communication methods.

Next, we briefly introduce the famous triangle topology example of WebRTC:

The above example is a very common application flow diagram. Readers can visit the official website for further explanation of the flow. In voice communication environments in particular, many users are concerned about how to integrate with SIP and the PSTN, so let's list a few more integration examples related to voice.

If WebRTC needs to place PSTN calls, the calls actually pass through a SIP/PSTN gateway, which can support FXO/FXS or E1 access.

If an IPPBX (for example, an older version) does not support WebRTC, or if you want to avoid problems caused by direct WebRTC interworking, you can connect a traditional SIP IPPBX through a WebRTC gateway, giving a deployment of IPPBX + WebRTC gateway + browser WebRTC application. Two years ago, the author implemented such a case using FreePBX-2.5 together with the PortSIP WebRTC gateway.

In the above example, the IPPBX is FreePBX, an open source enterprise IPPBX. PSTN access can be achieved with a voice board, a PSTN voice gateway, or a wireless gateway, and the connection to the browser terminal is made through the PortSIP WebRTC gateway. Because of the customer's requirements, the access side used a Dingxin Tongda wireless gateway with a SIM card to place calls directly.
2.WebRTC media stream processing

In a WebRTC environment, every terminal is different and has its own way of accessing the network. The following examples illustrate how WebRTC media streams are handled on various terminals. Some terminals may be on a home network, some on a company intranet, and some may use Wi-Fi in a cafe to reach the Internet. The application server is on the public network.

In an ordinary network environment without WebRTC, communication between two terminals can only be routed and relayed through the web server. If the network is unstable or the server is far from the terminals, real-time communication between the terminals is difficult to guarantee.

If the browsers support WebRTC, media can be routed between the two terminals without going through the server, and the NAT problem can be handled at the same time. The terminals communicate directly peer to peer, which helps keep real-time communication stable.

In the introduction above, we mentioned the NAT issue in WebRTC. NAT has been covered many times previously, so we will not explain NAT in general here and will focus on NAT handling in WebRTC. WebRTC uses a built-in mechanism, Interactive Connectivity Establishment (ICE), to solve the NAT problem. In peer-to-peer communication, ICE establishes connectivity through hole punching. Its main purpose is to find the best path between two terminals, with the help of different servers where necessary. In most cases, ICE can achieve peer-to-peer connectivity using STUN; sometimes media must be relayed through a TURN server. ICE's detection and pairing of peers requires six steps. For the connectivity checks, RFC 5245 gives the following definition:

1. Sort the candidate pairs in priority order.
2. Send checks on each candidate pair in priority order.
3. Acknowledge checks received from the other agent.
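
The first of these steps, sorting candidate pairs by priority, can be sketched in a few lines. This is a minimal illustration that assumes the pair priority formula from RFC 5245 (section 5.7.2), where G is the priority of the controlling agent's candidate and D is that of the controlled agent's candidate. The function names and the tuple layout are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def pair_priority(controlling_prio: int, controlled_prio: int) -> int:
    """Pair priority as given in RFC 5245: 2^32*MIN(G,D) + 2*MAX(G,D) + (G>D ? 1 : 0)."""
    g, d = controlling_prio, controlled_prio
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def sort_pairs(pairs: List[Tuple[int, int]], local_is_controlling: bool) -> List[Tuple[int, int]]:
    """Sort (local_priority, remote_priority) pairs, highest pair priority first."""
    def key(pair: Tuple[int, int]) -> int:
        local, remote = pair
        if local_is_controlling:
            return pair_priority(local, remote)
        return pair_priority(remote, local)

    return sorted(pairs, key=key, reverse=True)


if __name__ == "__main__":
    # Hypothetical candidate priorities, small numbers for readability.
    print(sort_pairs([(10, 20), (30, 5), (25, 25)], local_is_controlling=True))
```

Because the smaller of the two priorities dominates the result, a pair is only ranked highly when both of its candidates are good ones.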

In the fifth step, the browser needs to check the STUN data at the same time. As shown below:

The STUN server query process is as follows:

When the two terminals cannot connect using the addresses discovered through the STUN server, a TURN server is needed to relay the traffic. Readers should note that different NAT scenarios may require different strategies; only the symmetric NAT scenario is described here.

The following is a simple comparison between STUN and TURN to help readers understand the role and deployment cost of these two servers more clearly. The use of ICE and its specific parameters and attributes will be introduced in detail in subsequent chapters, and readers can also consult earlier documents for further study. The author reminds readers again that most WebRTC call failures are caused by ICE negotiation problems, so the six steps above deserve special attention when troubleshooting.
3.WebRTC related protocols

WebRTC is built on many RFC standards. Several organizations have drafted the standards, API definitions, and related extension protocols for WebRTC. Three of them deserve readers' attention: the IETF, the W3C, and RTCWEB. Each has its own official website that readers can consult. The protocol stack used by WebRTC includes the following; readers need only focus on the application layer and the transport layer. Each of these protocols has its own RFC specification. The most important is the ICE specification, RFC 5245.

With the continuous development of converged communications, interoperability between WebRTC and SIP has become very important, and enterprise converged communications need access to the PSTN or to enterprise UC functions. Therefore, we will spend more time discussing the relationship between WebRTC and SIP and their applications.

4.WebRTC related drafts

The development of any technology is inseparable from the organizations that promote it and standardize its technical specifications. In the previous chapter, we mentioned the RTCWEB working group. This group is preparing drafts on a number of WebRTC functions that have not yet become formal RFC specifications. These drafts are:

Real Time Protocols for Browser-based Applications
Web Real-Time Communication Use-cases and Requirements
Web Real-Time Communication (WebRTC): Media Transport and Use of RTP
WebRTC Security Architecture
Security Considerations for WebRTC
WebRTC Data Channels
WebRTC Data Channel Establishment Protocol
JavaScript Session Establishment Protocol
WebRTC Audio Codec and Processing Requirements
STUN Usage for Consent Freshness
Transports for RTCWEB

In addition to the above drafts, RTCWEB also cooperates with other IETF working groups that write related protocol standards. Currently, these include:

MMUSIC, which defines SDP extensions and ICE extensions.

AVTCORE, which defines RTP extensions.

RMCAT, which defines RTP congestion control.

TRAM, which defines extensions to STUN and TURN.

The RTCWEB use-case draft lists a variety of user scenarios and their definitions, including browser-to-browser scenarios and browser-to-gateway/server scenarios:

Simple Video Communication Service
Simple Video Communication Service, NAT/Firewall that blocks UDP
Simple Video Communication Service, Firewall that only allows traffic via a HTTP Proxy
Simple Video Communication Service, global service provider
Simple Video Communication Service, enterprise aspects
Simple Video Communication Service, access change
Simple Video Communication Service, QoS
Simple Video Communication Service with screen sharing 
Simple Video Communication Service with file exchange
Hockey Game Viewer
Multiparty video communication
Multiparty on-line game with voice communication
Telephony terminal
FedEx Call
Video conferencing system with central server

5.WebRTC media protocol stack expansion function

In this chapter, we introduce several key concepts in WebRTC's extensions to the media protocol stack. The first important concept is the RTP header.

In the header, pay attention to the fields marked in red and their values. The Sequence Number is used to detect lost or out-of-order packets: a gap or an unexpected jump in the numbers indicates a problem. If voice playback is not smooth or continuous, the Timestamp may be out of sync or wrong. The SSRC identifies the synchronization source that sent the packet, and CC (CSRC count) gives the number of contributing source (CSRC) identifiers that follow the fixed header. For the detailed syntax of the RTP header, readers can refer to RFC 3550.
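
To make these fields concrete, the following sketch decodes the 12-byte fixed RTP header and any CSRC entries that follow it, using the field layout from RFC 3550. The `RtpHeader` type and the `parse_rtp_header` function are hypothetical names for this example, and header extensions and padding are deliberately not handled.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header plus the CSRC list (no extension parsing)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F  # CC: number of CSRC identifiers after the fixed header
    if len(packet) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    csrcs = struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )


def sequence_gap(previous: int, current: int) -> int:
    """Distance between two 16-bit sequence numbers; 1 means 'next packet in order'."""
    return (current - previous) & 0xFFFF
```

A receiver can call `sequence_gap` on consecutive packets from the same SSRC: any value other than 1 suggests loss or reordering, including across the 65535 to 0 wrap-around.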

RTCP is another important protocol. It controls and manages each RTP media session. In many cases the RTCP port can be set explicitly in SDP with an attribute line (a=rtcp). If it is not set, RTCP uses the next higher, odd-numbered port after the RTP port (RTP port + 1). For example, if the RTP port is 7000, RTCP uses port 7001.

RTP and RTCP are bound to their corresponding media session, and both parties send reception quality reports through RTCP. The CNAME carried in RTCP identifies the sender. RTCP traffic is also limited, generally to 5% of the session bandwidth. Each RTP profile defines the RTCP sending frequency, timing, and sending rules. With these policies, RTCP does not consume too much of the available network bandwidth.

RTP uses profiles to negotiate how the two parties communicate, and WebRTC uses an extended profile to support its negotiation and RTCP feedback mechanisms. The following simple example illustrates WebRTC's negotiation.
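
The port rule can be written down as a small helper. This sketch assumes that an explicit port is signalled with an `a=rtcp:<port>` line in SDP and otherwise falls back to RTP port + 1; the function name `rtcp_port` is hypothetical.

```python
import re
from typing import Optional


def rtcp_port(rtp_port: int, sdp: Optional[str] = None) -> int:
    """Return the port from an a=rtcp line if one is present, else rtp_port + 1."""
    if sdp:
        # The colon is required, so a=rtcp-mux and a=rtcp-fb lines do not match.
        match = re.search(r"^a=rtcp:(\d+)", sdp, flags=re.MULTILINE)
        if match:
            return int(match.group(1))
    return rtp_port + 1
```

For example, `rtcp_port(7000)` gives 7001, while an SDP body containing `a=rtcp:53020` overrides the default.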

As described above, in each media stream RTP and RTCP normally use different ports. Sometimes, however, RTP packets flow between the two sides without problems while the RTCP port cannot be reached. To avoid this, WebRTC uses multiplexing (rtcp-mux): RTP and RTCP share a single port, which reduces the number of ports in use. This can cause interworking problems between WebRTC calls and SIP calls, so users may need to check the browser-side or server-side settings in real deployments. For example, on the Asterisk platform, pjsip provides rtcp_mux=yes to support WebRTC port negotiation.

The impact of multiplexing in WebRTC is also significant. Traditionally, voice and video are sent over different RTP ports. WebRTC uses multiplexing to send all media streams through a single RTP port, which has other effects.

The advantages and disadvantages of handling media on a single RTP port are clear:
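
When RTP and RTCP share one port, the receiver has to tell the two apart by looking at the packet itself. The sketch below assumes the convention associated with RFC 5761, in which RTP payload types 64 to 95 are avoided so that a second byte in the range 192 to 223 can be treated as an RTCP packet type. The function name `is_rtcp` is hypothetical.

```python
def is_rtcp(packet: bytes) -> bool:
    """Classify a packet received on a port shared by RTP and RTCP (rtcp-mux)."""
    if len(packet) < 2:
        raise ValueError("packet too short to classify")
    # For RTCP the second byte is the packet type (200 = SR, 201 = RR, ...).
    # For RTP it is the marker bit plus a 7-bit payload type.
    return 192 <= packet[1] <= 223
```

A dynamic RTP payload type such as 111 yields a second byte of 111 (or 239 with the marker bit set), so it falls outside the RTCP range either way.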

Fewer ICE candidates need to be gathered.

ICE takes less time to run.

Because there are fewer sessions, the chance of session failure is reduced.

QoS may be harder to guarantee, because the receiver has to separate voice and video by SSRC and payload type.

Next, let's discuss RTP and NAT in WebRTC. RTP is not transmitted on its own; it is carried over UDP, and the UDP ports are allocated dynamically. To reduce the number of NAT port mappings, WebRTC requires Symmetric RTP and Symmetric RTCP, which makes NAT problems easier to solve. Symmetric RTP requires that sending and receiving use the same RTP port. For the specification, refer to section 3 of RFC 4961, which defines both.

Media stream congestion is also a major problem that directly affects media quality. TCP handles congestion itself, but UDP has no such mechanism, so RTP over UDP cannot rely on the transport for congestion handling. However, RTCP can monitor congestion and report it back, which gives RTP the feedback it needs. In a video conference, bandwidth is an especially sensitive issue. When the RTCP exchange indicates network congestion, the sender can reduce its bandwidth to relieve it. WebRTC handles this through the following mechanisms:

Circuit Breaker: if network congestion occurs, RTP should stop sending packets. The specific policy is defined in terms of the RTP/AVP profile.

RMCAT: this approach builds on TFRC (TCP-Friendly Rate Control).
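
The idea of a sender lowering its rate in response to RTCP feedback can be illustrated with a very small loss-based controller. This is only a sketch: the thresholds (2% and 10%), the 5% increase step, and the rate limits are assumptions chosen for illustration, not values taken from this article, and `adjust_bitrate` is a hypothetical name.

```python
def adjust_bitrate(current_bps: int, fraction_lost: float,
                   min_bps: int = 30_000, max_bps: int = 2_000_000) -> int:
    """Return a new target bitrate from the loss fraction reported via RTCP."""
    if not 0.0 <= fraction_lost <= 1.0:
        raise ValueError("fraction_lost must be between 0 and 1")
    if fraction_lost > 0.10:
        # Heavy loss: back off in proportion to the loss (assumed policy).
        target = current_bps * (1.0 - 0.5 * fraction_lost)
    elif fraction_lost < 0.02:
        # Almost no loss: probe upward slowly (assumed policy).
        target = current_bps * 1.05
    else:
        # Moderate loss: hold the current rate.
        target = float(current_bps)
    return round(max(min_bps, min(max_bps, target)))
```

Calling this once per RTCP receiver report gives the behaviour described above: the sender reduces bandwidth when congestion is reported and cautiously increases it again when the path is clean.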

6.WebRTC signaling/transmission/protocol

In this chapter, we focus on the signaling, transmission methods and several protocols used by WebRTC. Now we briefly discuss the main functions of signaling:

Negotiate media capabilities and settings

Identify and authenticate the participants in a session

Control the media session: indicate progress, and modify or end the session

Resolve conflicts when both parties try to create or modify a session at the same time

Of the above four functions, the first is required and the others are optional. Simply put, WebRTC has no standard signaling, and the interaction between the browser and the server is implemented in a scripting language. For WebRTC developers, the minimum requirement is support for HTTP, HTML, and WebRTC. The rest depends entirely on the developer's own needs.

In a WebRTC environment, the application in the browser runs as JavaScript. Whichever signaling the server uses, compatibility between its users can be ensured. In the following example, servers A, B, and C each choose a different signaling protocol while remaining compatible on the user side.

However, some parts of the exchange do need a standard interworking method, otherwise session negotiation errors will occur. The negotiation mechanism between browsers must be uniform so that negotiation works and each side understands the other's media capabilities. Therefore, no matter what signaling the server uses, for each session between terminals the codecs, media, and resulting settings must follow the standard, ICE must interoperate, and SRTP keying must interoperate. WebRTC supports three ways of transporting signaling: WebSocket, HTTP, and the Data Channel.

In the above illustration, the server uses WebSocket for signaling transport. A WebSocket connection actually starts as an HTTP request: the browser sends an upgrade request, and with this request the HTTP connection is converted into a WebSocket connection. Note that the WebSocket protocol is defined by the IETF, while the WebSocket API is defined by the W3C. In addition, two browsers cannot open a WebSocket directly to each other.
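
The upgrade from HTTP to WebSocket is confirmed by a small computation on the server side. As a sketch of that step, the function below derives the `Sec-WebSocket-Accept` response header from the client's `Sec-WebSocket-Key` as specified in RFC 6455; the function name `websocket_accept` is hypothetical.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
_WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + _WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

The browser checks this value before treating the connection as a WebSocket, which is how it knows the server really understood the upgrade request.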

WebRTC signaling can also be transported over HTTP. Each browser sends data to the server through XMLHttpRequest, using HTTP GET or POST to deliver signaling messages to the web server.

Once the initial signaling over WebSocket or HTTP has been established, the Data Channel can be created and peer-to-peer media interaction begins. The Data Channel can then carry the voice and video signaling. Because this signaling travels peer to peer over an encrypted Data Channel, security is also greatly improved.

Above we mentioned HTTP signaling. The following example illustrates a simple SDP media exchange through HTTP polling:

The following illustration shows HTTP polling when the web server and the signaling server are separate servers:

The following illustration shows how to use a WebSocket proxy instead of polling:

SIP can also be used as the signaling transport in WebRTC. SDP media negotiation here follows RFC 3264, and browser terminals and SIP terminals can interoperate.
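
The server side of such HTTP signaling can be as simple as a per-user mailbox: one browser POSTs a message addressed to a peer, and the peer retrieves it with a later GET. The sketch below models only that mailbox logic in memory. The class name `SignalingMailbox`, its methods, and the message format are hypothetical, and the HTTP layer is left out on purpose.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, List


class SignalingMailbox:
    """In-memory store of pending signaling messages, keyed by recipient."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[Any]] = defaultdict(deque)

    def post(self, to_peer: str, message: Any) -> None:
        """Would be called by the handler for an HTTP POST from the sender."""
        self._queues[to_peer].append(message)

    def poll(self, peer: str) -> List[Any]:
        """Would be called by the handler for an HTTP GET; drains the queue."""
        queue = self._queues[peer]
        messages = list(queue)
        queue.clear()
        return messages


if __name__ == "__main__":
    box = SignalingMailbox()
    box.post("bob", {"type": "offer", "sdp": "v=0"})
    print(box.poll("bob"))  # the offer is delivered once
    print(box.poll("bob"))  # nothing left on the next request
```

The design choice here is that delivery is pull-based: the recipient only learns about an offer on its next request, which is the main latency cost of HTTP compared with WebSocket.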

For SIP voice environments, WebSocket is a new transport when running WebRTC. Many SIP softphones are now implemented in JavaScript; the script is downloaded to the browser and provides a SIP API within it. The browser opens a connection through WebSocket to perform SIP registration, and WSS provides encrypted WebSocket transport. The following example demonstrates a SIP flow in WebRTC (including SIP terminal registration and calling):

Open source voice communication solutions are becoming increasingly popular among users. Here we list several popular open source projects, including media servers and SIP terminal products, so that you can test their functions. To clarify, FreeSWITCH is missing from the list.

Jingle is an extension of XMPP. The client can be written in JavaScript, and Jingle also supports WebSocket as a signaling transport. Because Jingle is an extension of XMPP, the signaling server here is still an XMPP server.

Through our above introduction, we roughly explained the process of various transmission methods. The following illustration summarizes the pros and cons of various methods:

JSEP (JavaScript Session Establishment Protocol) is used in WebRTC to define the negotiation of media sessions and of Data Channels. It still uses SDP objects as the session description and Offer/Answer as the negotiation model. Note that JSEP does not prescribe any particular signaling protocol or signaling state machine. It provides a mechanism for creating Offers and Answers and applying them to the session. Therefore, the application in the browser must handle the exchange of these descriptions itself.

The following is an illustration of the JSEP state machine, which has six states. Readers can consult the JSEP specification for further study.
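
The six states can be modelled as a small transition table. This sketch uses the state names exposed by the browser API (stable, have-local-offer, have-remote-offer, have-local-pranswer, have-remote-pranswer, closed) and covers the basic offer, provisional answer, and answer transitions; rollback is omitted, and the class name `SignalingStateMachine` is hypothetical.

```python
# (current state, which description is set, SDP type) -> next state
_TRANSITIONS = {
    ("stable", "local", "offer"): "have-local-offer",
    ("stable", "remote", "offer"): "have-remote-offer",
    ("have-local-offer", "local", "offer"): "have-local-offer",
    ("have-local-offer", "remote", "pranswer"): "have-remote-pranswer",
    ("have-local-offer", "remote", "answer"): "stable",
    ("have-remote-pranswer", "remote", "pranswer"): "have-remote-pranswer",
    ("have-remote-pranswer", "remote", "answer"): "stable",
    ("have-remote-offer", "remote", "offer"): "have-remote-offer",
    ("have-remote-offer", "local", "pranswer"): "have-local-pranswer",
    ("have-remote-offer", "local", "answer"): "stable",
    ("have-local-pranswer", "local", "pranswer"): "have-local-pranswer",
    ("have-local-pranswer", "local", "answer"): "stable",
}


class SignalingStateMachine:
    """Tracks the offer/answer state of one session."""

    def __init__(self) -> None:
        self.state = "stable"

    def apply(self, side: str, sdp_type: str) -> str:
        """Apply a local or remote description of the given type."""
        key = (self.state, side, sdp_type)
        if self.state == "closed" or key not in _TRANSITIONS:
            raise ValueError(f"cannot set {side} {sdp_type} in state {self.state}")
        self.state = _TRANSITIONS[key]
        return self.state

    def close(self) -> None:
        self.state = "closed"
```

A caller that sends an offer and then receives the answer moves from stable to have-local-offer and back to stable; trying to set an answer while in the stable state is rejected.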

WebRTC's SDP extension supports three relatively new functions, which are BUNDLE, MSID and arbitrary CNAME. Users can check it on the official website.

Next, let's take a look at ICE's processing flow. As introduced in the earlier chapter, ICE detection requires five steps (if the IP address of either party changes, ICE must be restarted, so it can also be regarded as six steps).
7.WebRTC NAT and ICE

WebRTC supports NAT handling, and it uses ICE for this purpose. As briefly introduced earlier, ICE requires the support of STUN and TURN servers. The author has discussed NAT and STUN in earlier documents, which readers can refer to.

ICE stands for Interactive Connectivity Establishment. It is specified in RFC 5245 (updated by RFC 6336). A simple general definition is: ICE = STUN + TURN + negotiation mechanism + path selection. The architecture of ICE is shown in the figure below.

The following is the structure of a candidate message. For the meaning of each parameter in the structure, please refer to section 3 (Terminology) of RFC 5245.
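
As a rough guide to that structure, the sketch below splits an `a=candidate` line from SDP into its fields, assuming the layout defined in RFC 5245: foundation, component ID, transport, priority, address, port, the keyword `typ` with the candidate type, and optional `raddr`/`rport` values. The `IceCandidate` type and `parse_candidate` function are hypothetical names, and the sample line in the usage note uses documentation addresses.

```python
from typing import NamedTuple, Optional


class IceCandidate(NamedTuple):
    foundation: str
    component_id: int
    transport: str
    priority: int
    address: str
    port: int
    cand_type: str
    related_address: Optional[str]
    related_port: Optional[int]


def parse_candidate(line: str) -> IceCandidate:
    """Parse an SDP candidate attribute, with or without the 'a=' prefix."""
    value = line.strip()
    for prefix in ("a=", "candidate:"):
        if value.startswith(prefix):
            value = value[len(prefix):]
    fields = value.split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("not a valid candidate attribute")
    # Everything after the type comes as name/value pairs (raddr, rport, ...).
    extras = dict(zip(fields[8::2], fields[9::2]))
    rport = extras.get("rport")
    return IceCandidate(
        foundation=fields[0],
        component_id=int(fields[1]),
        transport=fields[2].upper(),
        priority=int(fields[3]),
        address=fields[4],
        port=int(fields[5]),
        cand_type=fields[7],
        related_address=extras.get("raddr"),
        related_port=int(rport) if rport is not None else None,
    )
```

For instance, `parse_candidate("a=candidate:1 1 UDP 1694498815 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998")` yields a server reflexive candidate whose related address is the private address 10.0.1.1 behind the NAT.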

In the previous chapters, the author has briefly explained the six steps of ICE creation using many illustrations. Let’s emphasize it in detail again. The six steps performed by ICE are:

Gather candidates. Collect the terminal's transport addresses and their candidate types (host, server reflexive, peer reflexive, and relayed). These four types correspond to the terminal's own internal network address, its public address as seen by the STUN server, its address as seen by the peer during connectivity checks, and the relay address allocated on the TURN server.

Below is an example of an SDP representing three different IP addresses.

Prioritize the candidates. In most cases, host candidates are given the highest priority and relayed candidates the lowest.

Encode the candidate information and send it to the peer

Pair candidates to ensure that both parties match

Check connectivity of paired candidates

Check if ICE can connect successfully, and if successful, send a confirmation message
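
The prioritization step can be made concrete with the candidate priority formula from RFC 5245: priority = 2^24 × type preference + 2^8 × local preference + (256 − component ID). The sketch below assumes the type preference values recommended in that RFC (126 for host, 110 for peer reflexive, 100 for server reflexive, 0 for relayed); the function name `candidate_priority` is hypothetical.

```python
# Type preferences recommended in RFC 5245 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, component_id: int,
                       local_preference: int = 65535) -> int:
    """Compute an ICE candidate priority from its type, component, and local preference."""
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be between 1 and 256")
    if not 0 <= local_preference <= 65535:
        raise ValueError("local preference must be between 0 and 65535")
    return ((TYPE_PREFERENCE[cand_type] << 24)
            + (local_preference << 8)
            + (256 - component_id))
```

With these values a host candidate for component 1 gets priority 2130706431 and a server reflexive one gets 1694498815, so direct paths are tried before paths through a TURN relay.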

For the steps above, the author first introduces some important concepts and then analyzes specific scenarios in detail. First, the terminal needs to obtain its candidate addresses with the help of the STUN server. For the details of STUN, readers can find learning materials on other authoritative websites.

In addition to STUN, ICE uses TURN, an extension of STUN, to obtain a relay address.

A TURN call flow includes the following steps: creating the connection, exchanging media once the connection is created, periodically refreshing the allocation before it times out, and ending the session.

Both STUN and TURN have their own method, attribute settings, security settings, error code management and other detailed specifications. Users can refer to historical documents for further study, which will not be introduced here.

After obtaining the address through STUN and TURN, ICE needs to start SDP exchange. In SDP exchange, readers need to pay attention to its security settings, such as password settings and several main parameter addresses:

In particular, in the above illustration, the terminal uses ice-ufrag and ice-pwd to authenticate STUN messages. Both values are randomly generated, with a minimum of 4 characters for the username fragment and a minimum of 22 characters for the password. Several main parameters in SDP are used for SDP negotiation and exchange. We will explain them in detail in the next section.

After ICE obtains the SDP messages of both parties, it performs pairing. ICE checks whether candidates have the same Component ID. After pairing, it combines the calling and called sides to produce the pair's Foundation, which is formed from the local Foundation and the remote Foundation. Pay attention to the changes in a=candidate. If the local Foundation is 1 and the received remote Foundation is 2, the Foundation of the pair is 1 2.

During ICE checks, the two terminals need to tell each other which is the controlling party and which is the controlled party. When the ICE checks formally start, each candidate pair is in one of five states, depending on its actual progress:
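
Generating these two values is straightforward. The sketch below draws them from a cryptographically strong random source using the character set allowed for ICE credentials in RFC 5245 (letters, digits, "+" and "/") and enforces the minimum lengths of 4 and 22 characters mentioned above. The function names are hypothetical.

```python
import secrets
import string
from typing import List, Tuple

ICE_CHARS = string.ascii_letters + string.digits + "+/"


def make_ice_credentials(ufrag_len: int = 4, pwd_len: int = 22) -> Tuple[str, str]:
    """Return a random (ice-ufrag, ice-pwd) pair of at least the minimum lengths."""
    if ufrag_len < 4 or pwd_len < 22:
        raise ValueError("ice-ufrag needs at least 4 and ice-pwd at least 22 characters")
    ufrag = "".join(secrets.choice(ICE_CHARS) for _ in range(ufrag_len))
    pwd = "".join(secrets.choice(ICE_CHARS) for _ in range(pwd_len))
    return ufrag, pwd


def to_sdp_lines(ufrag: str, pwd: str) -> List[str]:
    """Format the credentials as SDP attribute lines."""
    return [f"a=ice-ufrag:{ufrag}", f"a=ice-pwd:{pwd}"]
```

New credentials are normally generated for every session (and again on an ICE restart), so they should never be hard-coded.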
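
The pairing rule can be sketched as follows: a local and a remote candidate are paired only when their Component IDs match, and the pair's Foundation is the combination of the two candidate Foundations. The function name `form_pairs` and the tuple layout are hypothetical, and other pairing conditions (such as matching IP address family) are left out for brevity.

```python
from itertools import product
from typing import List, Tuple

# Each candidate is reduced to (foundation, component_id) for this example.
Candidate = Tuple[str, int]


def form_pairs(local: List[Candidate], remote: List[Candidate]) -> List[Tuple[int, str]]:
    """Return (component_id, pair_foundation) for every matching local/remote pair."""
    pairs = []
    for (l_found, l_comp), (r_found, r_comp) in product(local, remote):
        if l_comp == r_comp:  # only candidates of the same component are paired
            pairs.append((l_comp, f"{l_found} {r_found}"))
    return pairs
```

With local candidates `[("1", 1), ("1", 2)]` and remote candidates `[("2", 1), ("2", 2)]`, the RTP components pair with each other and the RTCP components pair with each other, and both pairs carry the Foundation "1 2".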

Frozen: the check has not started yet

Waiting: the check is waiting and has not been executed yet

In Progress: the check is being processed

Succeeded/Failed: the check has completed, either successfully or with a failure

After the candidate checks are completed, the ICE controlling party can still tell the ICE controlled party which candidate pair to use for sending media through the USE-CANDIDATE attribute. The ICE controlled party responds to confirm the selected pair.

After the ICE checks are completed, both parties need to send keepalive messages to keep the state alive, ensuring that the connection remains normal and that NAT mappings do not time out. The keepalive interval is 15 seconds.

In the current ICE standard, one rather awkward problem is the handling of response messages on the terminal. STUN sends a response message, but the terminal does not process it. This is a function that still needs further support from ICE extension protocols. For example, if no response is received, does ICE need to restart? If a response is received, how should processing continue? On ICE response handling, readers can refer to draft-muthu-behave-consent-freshness-01.

While both terminals are running, if ICE finds that the address of one of the candidates has changed, it restarts and pairs the candidates again. The above is a brief introduction to ICE creation, checking, pairing, and refresh timing. To explain the ICE negotiation process in more detail, we illustrate the specific steps through a SIP/ICE flow:

The candidate pairs are formed and checked in priority order as ICE starts testing. If the tests succeed on both sides, the next step of signaling takes place; for example, SIP Phone 2 sends a 180 message and a 200 OK message. If the initially sent message is inconsistent with the received message, SIP Phone 1 needs to send a re-INVITE, run the checks again, and finally receive a 200 OK message.

After the ICE test is successful, both parties start sending RTP voice.

As for ICE support, in many common environments some SIP terminals support ICE and others do not. Users can check the ICE configuration options of the various softphones. The sip.ice feature tag in a SIP message indicates that the terminal supports ICE.

If the peer terminal does not support ICE, the terminal has only two options: 1) Continue the connection without ICE. ICE provides an automatic fallback mechanism that detects and signals that the peer does not support ICE. 2) Insist on ICE by making its support mandatory. Following the Mandating Support provisions of RFC 5768, the SIP terminal can add the ice option tag to the Require header of the INVITE request.

In addition, users may sometimes see cases where the SIP INVITE headers of the terminal contain no sip.ice tag, yet the SDP does contain ICE candidate information. This is a result of interoperability differences between implementations, but in the end the candidate information is still a sign that the terminal supports ICE.
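
The different ways a SIP message can indicate ICE support can be checked with a few lines of code. The sketch below is an illustration only: the function name ice_support and the keys of the result are hypothetical, and the parsing is deliberately simplified (no compact header forms, no folded header lines).

```python
def ice_support(sip_message):
    """Report which ICE indications appear in a raw SIP message (headers plus SDP body)."""
    head, _, body = sip_message.replace("\r\n", "\n").partition("\n\n")
    result = {"feature_tag": False, "supported": False,
              "required": False, "sdp_candidates": False}
    for line in head.split("\n")[1:]:              # skip the request/status line
        name, _, value = line.partition(":")
        name, value = name.strip().lower(), value.strip().lower()
        tags = [t.strip() for t in value.split(",")]
        if name == "contact" and "sip.ice" in value:
            result["feature_tag"] = True           # e.g. Contact: <sip:...>;+sip.ice
        elif name == "supported" and "ice" in tags:
            result["supported"] = True             # ice option tag offered
        elif name == "require" and "ice" in tags:
            result["required"] = True              # ice option tag made mandatory
    # Candidate lines in the SDP body show ICE is in use even without any tag.
    result["sdp_candidates"] = any(
        l.startswith("a=candidate:") for l in body.split("\n"))
    return result
```

A message with no sip.ice tag but with a=candidate lines in the SDP yields feature_tag False and sdp_candidates True, which is exactly the mixed case described above.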

Because SIP technology itself is developing rapidly, ICE is also being updated constantly. We briefly introduce two "upgraded" versions of ICE. What I call upgraded versions here are only optimizations of ICE, not upgrades or updates of ICE itself:

ICE-lite mainly aims to reduce the complexity of ICE; we typically see it on devices such as an SBC.

Trickle ICE mainly aims to shorten ICE negotiation time and to avoid resending candidates that have already been sent. It can be enabled when needed. The standard ICE flow has to gather all candidate information before the checks begin; Trickle ICE instead starts connectivity checks with the host candidates right away and handles the remaining candidate exchange in parallel. As a result, Trickle ICE reduces the overall processing time.

8. WebRTC security and privacy

Security and privacy are very important topics in Internet communications. In WebRTC, security involves many technologies. Browser vendors of course provide a number of security mechanisms, and individual users should also have a certain degree of security awareness; we will not spend time on those here. In WebRTC, the two most important terminal resources are the camera and the microphone, so users need appropriate security or permission settings to protect these media resources. Using WebRTC requires browsers to support more protocols and requires more server-side configuration, which inevitably brings more security risks and more opportunities for attack. Below, the author lists several security-related points that readers should pay attention to:

Attackers may use WebRTC to attack through the JavaScript API.

Browser users may need to update their browsers frequently to prevent attacks.

WebRTC signaling can also be attacked; the risk depends on the signaling protocol and port used. For example, if WebSocket and SIP are used, attackers may exploit weaknesses in the security settings of these interfaces.

WebRTC media may be attacked; for example, it may be monitored or recorded.

SRTP cannot encrypt the RTP header, so the media between the two browsers may still not be secure.
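
To make the last point concrete, the sketch below decodes the 12-byte fixed RTP header defined in RFC 3550. SRTP encrypts the payload but leaves these header fields readable to anyone on the path. The function name parse_rtp_header is hypothetical, and the sketch ignores CSRC entries and header extensions.

```python
import struct


def parse_rtp_header(packet):
    """Decode the 12-byte fixed RTP header; these fields stay in clear text under SRTP."""
    if len(packet) < 12:
        raise ValueError("RTP packet too short")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # 2 for current RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,     # identifies the codec in use
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,                  # identifies the media source
    }
```

An observer who captures SRTP packets can therefore still learn the payload type, the packet timing, and the SSRC of each stream, even though the media content itself is encrypted.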

Although SRTP still has certain limitations, SRTP is still the main security protocol in WebRTC. Now, let's take a look at the SRTP processing flow, which mainly goes through the following four steps.

Of these four steps, steps 1 and 2 have been introduced earlier, so here we focus on steps 3 and 4. DTLS authentication follows a client/server model, and certificate verification can use either CA-issued certificates or self-signed certificates. Because DTLS works in a client/server fashion, one browser must act as the client and the other as the server, so both browsers must choose their roles. The role is negotiated in the SDP through the a=setup attribute: the Offer contains a=setup:actpass, and the Answer contains either a=setup:active or a=setup:passive.
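
The role selection described above can be sketched as a small function. This is an illustration under stated assumptions: the function names are hypothetical, and the rule applied is that the side whose setup value ends up as active starts the handshake and is therefore the DTLS client.

```python
def setup_attribute(sdp):
    """Return the value of the first a=setup attribute found in an SDP text."""
    for line in sdp.splitlines():
        if line.startswith("a=setup:"):
            return line[len("a=setup:"):].strip()
    raise ValueError("no a=setup attribute found")


def dtls_roles(offer_sdp, answer_sdp):
    """Decide which side is the DTLS client and which is the DTLS server."""
    offer, answer = setup_attribute(offer_sdp), setup_attribute(answer_sdp)
    # Values the answerer may choose for each value in the offer.
    allowed = {"actpass": {"active", "passive"},
               "active": {"passive"},
               "passive": {"active"}}
    if answer not in allowed.get(offer, set()):
        raise ValueError(f"invalid setup combination: {offer}/{answer}")
    if answer == "active":
        return {"client": "answerer", "server": "offerer"}
    return {"client": "offerer", "server": "answerer"}
```

With an Offer carrying a=setup:actpass and an Answer carrying a=setup:active, the answering browser becomes the DTLS client and the offering browser the DTLS server.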

TLS normally relies on X.509 certificates issued by a CA, but browsers generally do not have such certificates. DTLS-SRTP can instead use self-signed certificates generated from a public/private key pair.
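
Because a self-signed certificate is not vouched for by a CA, the peers need another way to bind it to the session. In DTLS-SRTP this is done with a hash of the certificate carried in the SDP as an a=fingerprint attribute (this detail comes from the DTLS-SRTP specifications rather than from the text above). The sketch below only shows how such a line is formatted; the function name is hypothetical and the input is assumed to be the DER-encoded certificate bytes.

```python
import hashlib


def sdp_fingerprint(cert_der):
    """Format the SHA-256 hash of a DER-encoded certificate as an SDP fingerprint line."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    # Group the hex digest into colon-separated bytes: AB:CD:EF:...
    pairs = ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
    return f"a=fingerprint:sha-256 {pairs}"
```

Each browser compares the certificate presented in the DTLS handshake with the fingerprint received through signaling, so the security of the media ultimately depends on the signaling channel being protected.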

WebRTC has many usage scenarios. What concerns us most here is security in the corporate office environment. Enterprise users therefore need to consider the following aspects when deploying WebRTC:

Security risks caused by corporate firewall settings, ACL settings, and point-to-point data flows

Whether browser-to-browser audio and video recording, system logging, and enterprise security policies affect the deployment of WebRTC

Whether it can be integrated cleanly with the current enterprise network

9. WebRTC usage scenarios

WebRTC has many usage scenarios, and because it is a relatively new technology, many new ones may still emerge. The current scenarios fall roughly into two groups: communication scenarios and non-communication scenarios. The communication scenarios include the following:

Page-based phone/video conferencing

Communication services for customers, including unified communications (UC) and customer communication

Enterprise unified communications/IP PBX/call center, with support for SIP/HTML integration and SIP/PSTN calls

Distributed communication methods/public services, etc.

Mobile WebRTC support: WebRTC supports not only desktop browsers but also native Android/iOS APIs

WebRTC applications built with simple code

Achieve other operations by controlling the camera and microphone

Telemedicine/Home Care

Online customer service/on-site support

Online one-to-one training

Media live broadcast

Smart home

Industrial manufacturing

WebRTC application scenarios in non-communication states include:

Gaming apps, including chat and file sharing

Overlay web applications

Machine learning

Internet of Things

File sharing

Virtual reality games

10. Current status and development trends of WebRTC

WebRTC already has many application scenarios, and the technology is developing very rapidly. Because it is updated so quickly, readers need to check the official websites frequently for the latest developments and trends.

Some of the more important links are as follows:

https://www.w3.org/TR/webrtc/

https://w3c.github.io/webrtc-pc/

https://www.w3.org/TR/mediacapture-streams/

Because WebRTC relies on browser support, and most browsers currently support WebRTC either fully or partially, readers should check each browser's support status before developing their own applications.

More browsers will support WebRTC in the future. Although WebRTC applications have very broad prospects, they still face many challenges:

Compatibility issues on various platforms, especially video encoding compatibility

Standardized deployment issues

Adoption on mobile platforms remains limited

Impact on mobile battery life

Lack of support from government and industry standards

According to official plans and market requirements, a lot of work on WebRTC technology remains, and several major tasks need to be completed in the near future:

Finalizing the W3C and IETF specifications and protocols: many of these recommendations are still drafts, and completing them will take more time.

Browsers need to support more WebRTC features and the latest versions

Video encoding is widely used in WebRTC, but its standardization still needs to be finalized

Enterprise deployments of WebRTC are still few, and more applications are needed to increase its share of usage.

11. WebRTC server and open source project examples

As mentioned in the earlier sections, WebRTC supports many application scenarios. Among them, readers may be most interested in solutions for video conferencing. The following open source WebRTC servers are currently popular:

Jitsi

Kurento

Janus WebRTC Gateway

Mediasoup

One example is the integration of Kurento and Asterisk. SIP terminals are managed through Asterisk, and Kurento serves as the WebRTC media server that implements the mixing function for video conferencing.

As WebRTC has developed, testing tools have gradually multiplied. Many WebRTC testing tools are available on the Internet, and different tools serve quite different purposes. Here the author introduces a WebRTC stress testing tool (Jattack: a WebRTC load testing tool). The paper covers the technical architecture, the test process, and the test results, and it mainly tests and analyzes system resource usage.

For the specific test methods and results, readers can follow the links in the reference materials and study the paper further. There are also many commercial WebRTC testing tools, which can monitor the running state of WebRTC, perform stress testing, and more. One of the better-known ones is testRTC; readers can purchase it or try its demo version.

Because of browser compatibility issues, many WebRTC applications cannot be deployed successfully, and testing compatibility across different browsers is a real headache. Google has released a tool (KITE) for cross-browser compatibility testing, which is very convenient to use; readers can learn more about it.

12. Summary

In this WebRTC technical overview, the author has given a relatively complete and comprehensive introduction to WebRTC technology from eleven aspects. The chapters cover background knowledge, the media flow, the WebRTC standards organizations (IETF/W3C), signaling protocols, media protocols, NAT and the ICE creation process, security and privacy support, WebRTC usage scenarios, and the parts of WebRTC that need to be improved in the future, along with several issues that must be faced when deploying WebRTC. Finally, the author provides readers with several open source solutions, open source WebRTC testing solutions, and tools for WebRTC testing.

The author has tried to give a relatively detailed and comprehensive introduction to the main technical points of each chapter. Due to space limitations, some technical details are left for readers to study further on their own. Readers can find learning resources through the reference links, and can also learn how these technologies work from the RFC specifications and related drafts.
