A brief chat on WebRTC video calls

WebRTC provides a set of standard APIs that let web applications offer real-time audio and video communication directly. Most browsers and operating systems support WebRTC, so real-time audio and video calls can be made right in the browser. This article walks through building a 1v1 web-based real-time audio and video call from a WebRTC beginner's perspective.

To complete an audio and video call, you need to understand four modules: audio and video capture, the STUN/TURN server, the signaling server, and the P2P connection between the two ends. WebRTC's APIs handle audio and video capture, and together with the signaling server and WebRTC's RTCPeerConnection they implement the 1v1 call.

Next, their functions and core APIs will be explained in turn.


Audio and video capture

WebRTC uses getUserMedia to obtain a MediaStream object for the camera and microphone. The media stream can be transmitted over WebRTC and shared among multiple peers. Assigning the stream to a video element's srcObject plays the local audio and video.

Common video constraint properties:

Attribute       Meaning
width           video width
height          video height
aspectRatio     aspect ratio (width / height)
frameRate       frame rate
facingMode      camera facing direction ("user" for front, "environment" for rear)
resizeMode      resize mode

API: navigator.mediaDevices.getUserMedia
Parameters: constraints
Returns: a Promise that resolves to a MediaStream object on success.

const localVideo = document.querySelector("video");

function gotLocalMediaStream(mediaStream) {
  localVideo.srcObject = mediaStream;
}

navigator.mediaDevices
  .getUserMedia({
      video: {
        width: 640,
        height: 480,
        frameRate: 15,
        facingMode: "environment", // use the rear camera
        deviceId: deviceId ? { exact: deviceId } : undefined // deviceId can come from enumerateDevices()
      },
      audio: false
   })
  .then(gotLocalMediaStream)
  .catch((error) => console.log("navigator.getUserMedia error: ", error));
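The deviceId referenced above is not defined in the snippet. A minimal sketch, as an assumption for illustration, of how it could be obtained with navigator.mediaDevices.enumerateDevices():

// Sketch: pick a camera's deviceId by enumerating devices and taking the first video input.
async function getFirstCameraId() {
  const devices = await navigator.mediaDevices.enumerateDevices();
  const camera = devices.find((d) => d.kind === "videoinput");
  return camera ? camera.deviceId : undefined;
}

// e.g. inside an async function:
const deviceId = await getFirstCameraId();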

Connection management

Now that we know how to capture local audio and video, the next step is to establish a connection with the other end so the audio and video data can be transmitted.

RTCPeerConnection is WebRTC's unified interface for network connections, media management, and data management. Establishing a P2P connection involves several important concepts: SDP, ICE, and STUN/TURN.

  1. Session description information RTCSessionDescription (SDP)

SDP describes the capabilities of each end: the audio and video codecs it supports, the transport protocol, and so on. This information must be exchanged before a connection can be established; through SDP, both parties learn whether the other side sends audio or video and which codecs it uses.

For example, if I encode video as H.264 but the other party can only decode H.265, communication is impossible.

An SDP description is divided into two parts: the session-level description and the media-level description. For the full grammar, refer to RFC 4566; fields marked with an asterisk (*) are optional. The common fields are as follows:

Session description (session level)
    v=  (protocol version)
    o=  (originator and session identifier)
    s=  (session name)
    c=* (connection information -- not required if included in all media)
    One or more Time descriptions ("t=" lines; see below)
    a=* (zero or more session attribute lines)
    Zero or more Media descriptions

Time description
    t=  (time the session is active)

Media description (media level), if present
    m=  (media name and transport address)
    c=* (connection information -- optional if included at session level)
    a=* (zero or more media attribute lines)

When parsing SDP, each line has the form key=.... When the key is a, the attribute can take one of two forms (see RFC 4566):

a=<attribute> 
a=<attribute>:<value>

The first colon separates <attribute> from <value>; the value itself may also contain colons, for example:

a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE 
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset 
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time 
a=ssrc:2527104241 msid:gLzQPGuagv3xXolwPiiGAULOwOLNItvl8LyS
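Because only the first colon is a separator, a parser has to split on the first colon and keep the rest as the value. A minimal sketch (not from the original) of splitting an a= line this way:

// Split an "a=" line into attribute and value on the first colon only.
const line = "a=fingerprint:sha-256 7C:93:85:40:01:07:91:BE";
const body = line.slice(2);                               // drop the "a=" prefix
const idx = body.indexOf(":");
const attribute = idx === -1 ? body : body.slice(0, idx); // "fingerprint"
const value = idx === -1 ? null : body.slice(idx + 1);    // "sha-256 7C:93:..."
console.log(attribute, value);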

Take a look at a specific example:

alert(pc.remoteDescription.sdp);

 v=0
 o=alice 2890844526 2890844526 IN IP4 host.anywhere.com
 s=
 c=IN IP4 host.anywhere.com
 t=0 0
 //Media descriptions below; this session contains two media sections, audio and video
 m=audio 49170 RTP/AVP 0
 a=fmtp:111 minptime=10;useinbandfec=1 //format-specific parameters
 a=rtpmap:0 PCMU/8000 //RTP payload type mapping
 
... 
 //The audio media description is above; the video media descriptions follow
 m=video 51372 RTP/AVP 31
 a=rtpmap:31 H261/90000
 ... 
 m=video 53000 RTP/AVP 32
 a=rtpmap:32 MPV/90000

2. ICE candidate RTCIceCandidate

The simplest way to connect two WebRTC peers would be a direct IP connection. In practice, however, both sides usually sit behind NAT devices, which makes obtaining reachable addresses troublesome.

WebRTC uses the ICE framework to determine the best path for establishing network connections at both ends, shielding developers from complex technical details.

(NAT and the ICE framework are a black box for developers using WebRTC. To keep the reading flow, this part is placed at the end as supplementary knowledge.)

Developers need to know:

  1. Principle

Two nodes exchange ICE candidates to negotiate how they will connect. Once both ends agree on a mutually compatible candidate, that candidate's SDP is used to create and open a connection over which media streaming begins.

  2. Two APIs

onicecandidate: fires when the local ICE agent has a new candidate to deliver, typically after createOffer and setLocalDescription(offer); in the event handler, the candidate is sent to the remote end through the signaling server.

addIceCandidate: called when candidate information arrives from the signaling server, to add the remote candidate to the local ICE agent.

API:pc.onicecandidate = eventHandler
pc.onicecandidate = function(event) {
  if (event.candidate) {
    // Send the candidate to the remote peer
  } else {
    // All ICE candidates have been sent
  }
}


API:pc.addIceCandidate
pc.addIceCandidate(candidate).then(_=>{
  // Do stuff when the candidate is successfully passed to the ICE agent
}).catch(e=>{
  console.log("Error: Failure during addIceCandidate()");
});

Signaling server

WebRTC relies on a signaling server to transmit and exchange SDP and ICE information and to establish the P2P connection before audio, video, or text data can flow. Without a signaling server, WebRTC peers cannot communicate.

A signaling server can be built on socket.io's real-time messaging. Socket.io is cross-platform, cross-terminal, and cross-language, which makes it easy to implement signaling on every client and connect it to our server.

This picture expresses the role of the signaling server in the entire call process.

The following code shows how to set up a socket.io signaling server:

const express = require("express");
const app = express();
const http = require("http");
const { Server } = require("socket.io");
const httpServer = http.createServer(app);
const io = new Server(httpServer);

io.on("connection", (socket) => {
    console.log("a user connected");
    // Relay signaling messages (SDP offers/answers, ICE candidates) to the other peers in the room
    socket.on("message", (room, data) => {
      console.log("message, room: " + room + ", data type: " + data.type);
      socket.to(room).emit("message", room, data);
    });
    // Peers join a room so that messages are scoped to one call
    socket.on("join", (room) => {
      socket.join(room);
    });
});

httpServer.listen(3000);
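For completeness, a minimal client-side sketch (not from the original) of connecting to this server with the socket.io client; the server URL and room name are placeholders:

// Client-side signaling sketch (assumes the socket.io client library is loaded)
const socket = io("http://localhost:3000");   // placeholder server URL
const roomid = "room-1";                      // placeholder room name

socket.emit("join", roomid);

// Receive relayed SDP offers/answers and ICE candidates from the other peer
socket.on("message", (room, data) => {
  console.log("signaling message in", room, "type:", data.type);
});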

P2P connection between the two ends

  1. Connection process

The process of establishing a network connection between A and B is as follows:

  • A initiates a WebRTC call to B.
  • A creates an RTCPeerConnection object and specifies the TURN/STUN server addresses in the configuration.
var pcConfig = {
  iceServers: [
    {
      urls: "turn:stun.al.learningrtc.cn:3478",
      credential: "mypasswd",
      username: "garrylea",
    },
    {
      urls:[
        "stun:stun.example.com",
        "stun:stun-1.example.com"
      ]
    }
  ],
};

pc = new RTCPeerConnection(pcConfig);
  • A calls the createOffer method to create a local session description (SDP offer). The offer contains information about the MediaStreamTrack objects already attached to the session, the codecs and options supported by the browser, and any candidates already gathered by the ICE agent; it is sent over the signaling channel to the remote endpoint to request a connection or to update the configuration of an existing connection.
  • A calls the setLocalDescription method to set the offer as its local session description and hand it to the ICE layer, then sends the offer to B through the signaling server.
API: pc.createOffer
Parameters: none
Returns: SDP offer

API: pc.setLocalDescription
Parameters: offer
Returns: Promise<null>

function sendMessage(roomid, data) {
  if (!socket) {
    console.log("socket is null");
  }
  socket.emit("message", roomid, data);
}

const offer = await pc.createOffer();
await pc.setLocalDescription(offer).catch(handleOfferError);
message.log("Sending the caller's local SDP");
sendMessage(roomid, offer);

After A calls pc.setLocalDescription(offer), icecandidate events are dispatched on the RTCPeerConnection and the onicecandidate handler fires; in it, A sends each candidate to B through signaling. When B receives a candidate from the remote side, it adds it to its local ICE agent by calling RTCPeerConnection.addIceCandidate().


//Side A (caller)
pc.onicecandidate = (event) => {
  if (!event.candidate) return;
  sendMessage(roomid, {
    type: "candidate",
    label: event.candidate.sdpMLineIndex,
    id: event.candidate.sdpMid,
    candidate: event.candidate.candidate,
  });
};


//Side B (callee)
socket.onmessage = e => {
 const data = e.data;
 if (data.hasOwnProperty("type") && data.type === "candidate") {
  var candidate = new RTCIceCandidate({
    sdpMLineIndex: data.label,
    candidate: data.candidate,
  });
  pc.addIceCandidate(candidate)
    .then(() => {
      console.log("Succeeded to add ice candidate");
    })
    .catch((err) => {
      console.error(err);
    });
 }
}
  • As the caller, A obtains the local media stream and calls the addTrack method to add its audio and video tracks to the RTCPeerConnection object so they are transmitted to the other end; when the tracks arrive, the other end's ontrack event fires.
Adding the media stream's tracks to the connection
API: stream.getTracks
Parameters: none
Returns: an array of MediaStreamTrack objects

const pc = new RTCPeerConnection();
stream.getTracks().forEach((track) => {
  pc.addTrack(track, stream); 
});

const remoteVideo = document.querySelector("#remote-video");
pc.ontrack = (e) => {
  if (e && e.streams) {
    message.log("收到对方音频/视频流数据...");
    remoteVideo.srcObject = e.streams[0];
  }
};
  • As the callee, B receives A's session description from the signaling server, calls the setRemoteDescription method to pass the offer to its ICE layer, and calls the addTrack method to add its own tracks to the RTCPeerConnection.
  • B calls the createAnswer method to create an answer, then calls the setLocalDescription method to set the answer as its local session description and pass it to the ICE layer.
socket.onmessage = async (e) => {
    message.log("Received the caller's SDP");
    await pc.setRemoteDescription(new RTCSessionDescription(e.data));
    message.log("Creating the callee's (answer) SDP");
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    message.log("Sending the callee's (answer) SDP");
    sendMessage(roomid, answer);
}
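The flow above does not show A applying B's answer; a minimal sketch (the shape of the incoming message and the type check are assumptions for illustration):

// Side A: when B's answer arrives over signaling, apply it as the remote description.
socket.onmessage = async (e) => {
  if (e.data && e.data.type === "answer") {
    await pc.setRemoteDescription(new RTCSessionDescription(e.data));
    message.log("Applied the callee's answer SDP");
  }
};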

At this point A and B each hold both their own and the other side's SDP and have agreed on how to exchange media. The gathered ICE candidates go through connectivity checks to pick the best transmission path, the P2P connection is established, and each side receives the other's audio and video streams in its ontrack handler.

  2. Bidirectional data channel connection

RTCDataChannel establishes a point-to-point data connection through the RTCPeerConnection API, without an intermediary server and with lower latency.

One end creates a data channel with createDataChannel; the other end obtains the channel object in its ondatachannel handler.

API: pc.createDataChannel
Parameters: label - channel name
            options? - channel options
Returns: RTCDataChannel


function receivemsg(e) {
  var msg = e.data;
  if (msg) {
    message.log("-> " + msg + "\r\n");
  } else {
    console.error("received msg is null");
  }
}

// Creating end
let dc = pc.createDataChannel("chat");
dc.onmessage = receivemsg;
dc.onopen = function () {
  console.log("datachannel open");
};

dc.onclose = function () {
  console.log("datachannel close");
};

// Receiving end: called when the remote peer creates a data channel
pc.ondatachannel = e => {
  if (!dc) {
    dc = e.channel;
    dc.onmessage = receivemsg;
    dc.onopen = dataChannelStateChange;  // dataChannelStateChange: a state-change logger defined elsewhere
    dc.onclose = dataChannelStateChange;
  }
};
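Once the channel reports open, either side can send text with the channel's send method. A minimal usage sketch (not from the original):

// Send a chat message once the data channel is open.
function sendChat(text) {
  if (dc && dc.readyState === "open") {
    dc.send(text);            // delivered to the remote peer's onmessage handler
    message.log("<- " + text);
  }
}

sendChat("hello over the data channel");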

NAT and ICE framework

As mentioned above, ICE integrates several NAT traversal techniques, such as STUN and TURN, to traverse NAT and discover P2P transmission paths between hosts. The following is a brief introduction to NAT, STUN, and TURN.

  1. Network Address Translation (NAT)

NAT is usually deployed at an organization's network egress. The network is divided into a private part and a public part, and the NAT gateway sits on the route from the private network to the public network; all traffic between the two must pass through it. A large number of devices inside the organization can share one public IP address through NAT, which eases the shortage of IPv4 addresses.

For example, consider two organizations whose NAT gateways are assigned the public IPs 1.2.3.4 and 1.2.3.5 respectively. Devices on each private network have their internal addresses translated to the public address by NAT before their traffic reaches the Internet.

NAT handles UDP in one of four ways: full cone, address-restricted cone, port-restricted cone, and symmetric.

  2. Session Traversal Utilities for NAT (STUN)

STUN lets a client behind one or more NATs discover its public address, the type of NAT it is behind, and the public port the NAT has bound to a given local port.

STUN is a client/server, request/response protocol: the client sends a STUN request, and the server's response reports the IP address and port that the NAT has assigned to the host. The default port is 3478.

For a host on a private network to learn its public IP, a STUN server must be deployed on the public network; the host sends it a request, and the server returns the host's public address.

In a packet capture of this exchange, the client first sends a Binding Request to the STUN server at 216.93.246.18, and the server replies with a Binding Response carrying the client's public IP.

  3. Traversal Using Relays around NAT (TURN)

TURN is a data relay protocol that allows traffic to traverse NATs or firewalls over TCP or UDP. Like STUN, it is a client/server protocol, and its approach to NAT traversal is similar: both obtain a public address at the application layer, while TURN additionally relays the data when no direct path can be established.
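If a deployment needs to verify the TURN relay or force all media through it, the RTCPeerConnection configuration can restrict gathering to relay candidates. A minimal sketch, with the TURN URL and credentials as placeholders:

// Force the ICE agent to gather only relayed (TURN) candidates.
const relayOnlyPc = new RTCPeerConnection({
  iceServers: [
    { urls: "turn:turn.example.com:3478", username: "user", credential: "pass" }, // placeholders
  ],
  iceTransportPolicy: "relay",   // only relay candidates will be gathered
});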

  4. ICE candidate gathering

Neither end knows in advance where it sits in the network or what kind of NAT it is behind; ICE discovers the best transmission path dynamically. Each side gathers its local addresses, its NAT-mapped public addresses through the STUN service, and relayed addresses through TURN, giving three types of candidate address:

host type, i.e. the IP and port on the local intranet;

srflx type, i.e. the public IP and port after local NAT mapping (server reflexive);

relay type, i.e. the IP and port of the relay (TURN) server.

A candidate contains roughly the following fields:

 
 {
    IP: xxx.xxx.xxx.xxx, 
    port: number, 
    type: host/srflx/relay, 
    priority: number, 
    protocol: UDP/TCP, 
    usernameFragment: string 
    ...
 }
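To see which kinds of candidates are actually gathered in a session, the candidate's type can be logged as it arrives. A minimal sketch (in the call flow above, onicecandidate also forwards candidates to signaling, so in practice this logging would live inside that same handler):

// Log the type (host / srflx / relay) of each gathered ICE candidate.
pc.onicecandidate = (event) => {
  if (event.candidate) {
    console.log("gathered candidate:", event.candidate.type, event.candidate.candidate);
  }
};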

Alice and Bob each gather the three types of candidates through the STUN and TURN servers.

ICE performs connectivity testing after collecting candidates to determine the best P2P transmission path between hosts.

Effect


