[Repost] Using WebRTC to build a front-end video chat room - signaling

 


Reprinted from: http://www.tuicool.com/articles/eYJvee

Before reading this, it is recommended to read the previous article, "Using WebRTC to build a front-end video chat room - getting started".

Foreword

This article describes the signaling exchange involved in WebRTC and the signaling exchange needed in a chat room. Most of the content comes from "WebRTC in the real world: STUN, TURN and signaling". I have extracted some of that information here and added some of my own thoughts from development.

WebRTC server

WebRTC provides browser-to-browser (peer-to-peer) communication, but that does not mean WebRTC needs no server. Leaving aside server-based extended services, WebRTC needs a server for at least two things:
1. The metadata (signaling) that browsers exchange to establish communication must pass through a server
2. Traversing NAT and firewalls

Why do you need signaling?

Browsers establish communication with each other through a series of signaling messages. What exactly needs to be exchanged through signaling? Here is a rough list:
1. Connection control messages, used to open or close communication
2. Messages used to notify each other when an error occurs
3. Media stream metadata, such as codecs, codec configuration, bandwidth and media type
4. Key data used to establish a secure connection
5. Network data as seen from the outside world, such as IP address and port

Obviously, browsers cannot pass data to each other directly until a connection is established. So this data has to be relayed between browsers through a server, and only then can a peer-to-peer connection be established. None of this, however, is implemented in the WebRTC API.

Why does WebRTC not implement signaling exchange?

The reason WebRTC does not implement the signaling exchange is simple: the authors of the WebRTC standard wanted to maximize compatibility with existing, mature technologies. How a connection is established is specified by a protocol called JSEP (JavaScript Session Establishment Protocol). Using JSEP has two advantages:
1. In JSEP, the key information to exchange is the multimedia session description. Developers use different signaling protocols in their applications (SIP, XMPP, or a protocol they define themselves), so WebRTC sets up a call at the media flow control level and keeps it separate from the transport of the upper-layer signaling, so that the two do not contaminate each other. As long as the upper-layer signaling delivers the key information, such as the multimedia session descriptor, a connection can be established, regardless of how the developer chooses to deliver it.
2. The JSEP architecture also avoids storing connection state in the browser, which keeps the browser from having to act as a signaling state machine. Pages are refreshed frequently, and if connection state were stored in the browser it would be lost on every refresh. With JSEP, the state can be kept on the server.

Architecture diagram of JSEP

Session Description Protocol

JSEP divides the signaling exchanged between clients into two types: offer and answer. Their main content follows the Session Description Protocol (SDP). An SDP message looks roughly like this:

v=0
o=- 7806956075423448571 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video data
a=msid-semantic: WMS 5UhOcZZB1uXtVbYAU5thB0SpkXbzk9FHo30g
m=audio 1 RTP/SAVPF 111 103 104 0 8 106 105 13 126
c=IN IP4 0.0.0.0
a=rtcp:1 IN IP4 0.0.0.0
a=ice-ufrag:grnpQ0BSTSnBLroq
a=ice-pwd:N5i4DZKMM2L7FEYnhO8V7Kg5
a=ice-options:google-ice
a=fingerprint:sha-256 01:A3:18:0E:36:5E:EF:24:18:8C:8B:0C:9E:B0:84:F6:34:E9:42:E3:0F:43:64:ED:EC:46:2C:3C:23:E3:78:7B
a=setup:actpass
a=mid:audio
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=recvonly
a=rtcp-mux
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:qzcKu22ar1+lYah6o8ggzGcQ5obCttoOO2IzXwFV
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:126 telephone-event/8000
a=maxptime:60
m=video 1 RTP/SAVPF 100 116 117
c=IN IP4 0.0.0.0
a=rtcp:1 IN IP4 0.0.0.0
a=ice-ufrag:grnpQ0BSTSnBLroq
a=ice-pwd:N5i4DZKMM2L7FEYnhO8V7Kg5
a=ice-options:google-ice
a=fingerprint:sha-256 01:A3:18:0E:36:5E:EF:24:18:8C:8B:0C:9E:B0:84:F6:34:E9:42:E3:0F:43:64:ED:EC:46:2C:3C:23:E3:78:7B
a=setup:actpass
a=mid:video
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=sendrecv
a=rtcp-mux
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:qzcKu22ar1+lYah6o8ggzGcQ5obCttoOO2IzXwFV
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 goog-remb
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
a=ssrc:3162115896 cname:/nERF7Ern+udqf++
a=ssrc:3162115896 msid:5UhOcZZB1uXtVbYAU5thB0SpkXbzk9FHo30g 221b204e-c9a0-4b01-b361-e17e9bf8f639
a=ssrc:3162115896 mslabel:5UhOcZZB1uXtVbYAU5thB0SpkXbzk9FHo30g
a=ssrc:3162115896 label:221b204e-c9a0-4b01-b361-e17e9bf8f639
m=application 1 DTLS/SCTP 5000
c=IN IP4 0.0.0.0
a=ice-ufrag:grnpQ0BSTSnBLroq
a=ice-pwd:N5i4DZKMM2L7FEYnhO8V7Kg5
a=ice-options:google-ice
a=fingerprint:sha-256 01:A3:18:0E:36:5E:EF:24:18:8C:8B:0C:9E:B0:84:F6:34:E9:42:E3:0F:43:64:ED:EC:46:2C:3C:23:E3:78:7B
a=setup:actpass
a=mid:data
a=sctpmap:5000 webrtc-datachannel 1024

What are all these things? To be honest, I don't know either. I put such a large block here just to make the article look longer... If you want to know more, you can work through "SDP for the WebRTC" (draft-nandakumar-rtcweb-sdp-04) yourself.

Put simply, it is a string with which an endpoint describes itself in a peer-to-peer connection. We can wrap it in JSON for transmission. Once a PeerConnection has been created, the descriptors are relayed through the server, and both the local SDP descriptor and the other party's SDP descriptor are handed to the PeerConnection.
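To make the structure a little less opaque, here is a minimal sketch that splits an SDP string into its session-level lines and its m= media sections, then wraps the raw string in JSON for signaling. The helper name summarizeSdp, the shortened sample and the "offer" event name are assumptions made for illustration; this is not a complete SDP parser.

```javascript
// Illustrative only: summarizeSdp is a hypothetical helper, not a WebRTC API.
function summarizeSdp(sdp) {
    var lines = sdp.split(/\r?\n/).filter(function (l) { return l.length > 0; });
    var sessionLines = 0;
    var media = [];
    var current = null;
    lines.forEach(function (line) {
        if (line.indexOf("m=") === 0) {
            // "m=<kind> <port> <protocol> <format> ..." opens a media section
            var parts = line.slice(2).split(" ");
            current = { kind: parts[0], protocol: parts[2], mid: null, codecs: [] };
            media.push(current);
        } else if (current === null) {
            // everything before the first m= line is session-level
            sessionLines++;
        } else if (line.indexOf("a=mid:") === 0) {
            current.mid = line.slice("a=mid:".length);
        } else if (line.indexOf("a=rtpmap:") === 0) {
            // "a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]"
            current.codecs.push(line.split(" ")[1].split("/")[0]);
        }
    });
    return { sessionLines: sessionLines, media: media };
}

// A shortened sample in the same shape as the SDP shown above.
var sample = [
    "v=0",
    "o=- 7806956075423448571 2 IN IP4 127.0.0.1",
    "s=-",
    "t=0 0",
    "m=audio 1 RTP/SAVPF 111 0",
    "a=mid:audio",
    "a=rtpmap:111 opus/48000/2",
    "a=rtpmap:0 PCMU/8000",
    "m=video 1 RTP/SAVPF 100",
    "a=mid:video",
    "a=rtpmap:100 VP8/90000"
].join("\r\n");

var summary = summarizeSdp(sample);

// Wrapping the raw SDP string in JSON for the signaling channel.
// The "offer" event name is an assumption of this sketch.
var message = JSON.stringify({ eventName: "offer", data: { sdp: sample } });

summary.media.forEach(function (m) {
    console.log(m.mid + " (" + m.kind + "): " + m.codecs.join(", "));
});
```

In practice the application rarely needs to look inside the SDP at all; it only has to carry the string from one browser to the other unchanged.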

Signaling and RTCPeerConnection establishment

As introduced in the previous article, WebRTC uses RTCPeerConnection to transfer stream data between browsers. After creating an RTCPeerConnection instance, two things are needed before it can establish a peer-to-peer channel:
1. The characteristics of the media streams on the local machine, such as resolution and codec capabilities (the SDP descriptor)
2. The network addresses of the hosts at both ends of the connection (the ICE candidates)

Note that the hosts at both ends of the connection may be on a private network or behind a firewall, so we need a way to locate them that works for any networked computer. This involves NAT/firewall traversal and the ICE framework that WebRTC uses to achieve it. This was covered in the previous article and is not repeated here.

Exchange SDP descriptors through offer and answer

Roughly, establishing a peer-to-peer connection between two users (A and B) goes as follows (error cases are not considered here, and RTCPeerConnection is abbreviated as PC):
1. A and B each create a PC instance
2. A creates an offer containing A's SDP descriptor with the createOffer() method provided by the PC
3. A passes A's SDP descriptor to A's PC instance with the setLocalDescription() method provided by the PC
4. A sends the offer to B through the server
5. B extracts the SDP descriptor from A's offer and passes it to B's PC instance with the setRemoteDescription() method provided by the PC
6. B creates an answer containing B's SDP descriptor with the createAnswer() method provided by the PC
7. B passes B's SDP descriptor to B's PC instance with the setLocalDescription() method provided by the PC
8. B sends the answer to A through the server
9. After receiving B's answer, A extracts B's SDP descriptor from it and passes it to A's own PC instance with the setRemoteDescription() method

After this series of signaling exchanges, the PC instances created by A and B each hold the SDP descriptors of both A and B, which completes the first of the two things. We still need to do the second: obtain the network addresses of the hosts at both ends of the connection.
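The offer/answer steps can be sketched roughly as follows, using the promise-based forms of createOffer(), createAnswer(), setLocalDescription() and setRemoteDescription(). The function names, the send callback (which stands in for "relay through the server") and the "offer"/"answer" event names are assumptions of this sketch, not part of the WebRTC API.

```javascript
// `pc` is an RTCPeerConnection. `send` is a hypothetical function that
// relays a message to the other user through the signaling server.

// Steps 2-4: A creates an offer, stores it locally, then sends it to B.
function sendOffer(pc, send) {
    return pc.createOffer().then(function (offer) {
        return pc.setLocalDescription(offer).then(function () {
            send({ eventName: "offer", data: { sdp: offer } });
        });
    });
}

// Steps 5-8: B stores A's offer, creates an answer, stores it locally,
// then sends it back to A.
function receiveOffer(pc, offer, send) {
    return pc.setRemoteDescription(offer).then(function () {
        return pc.createAnswer();
    }).then(function (answer) {
        return pc.setLocalDescription(answer).then(function () {
            send({ eventName: "answer", data: { sdp: answer } });
        });
    });
}

// Step 9: A stores B's answer.
function receiveAnswer(pc, answer) {
    return pc.setRemoteDescription(answer);
}
```

A would call sendOffer(pc, send) once, B would call receiveOffer() when the offer message arrives, and A would call receiveAnswer() when the answer message arrives. Passing the connection and the relay function in as parameters keeps the sketch independent of any particular signaling transport.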

Establish NAT/firewall traversal connections through the ICE framework

This network address must be directly reachable from the outside world, and WebRTC uses the ICE framework to obtain it. The address of an ICE server can be passed in when the RTCPeerConnection is created, for example:

var iceServer = {
    "iceServers": [{
        // current browsers expect "urls"; the older "url" key is deprecated
        "urls": "stun:stun.l.google.com:19302"
    }]
};
var pc = new RTCPeerConnection(iceServer);

Of course, these addresses also need to be exchanged. Taking A and B as an example again, the exchange goes as follows (RTCPeerConnection is abbreviated as PC):
1. A and B each create a PC instance configured with an ICE server and attach a callback to its onicecandidate event
2. When a network candidate becomes available, the onicecandidate callback is invoked
3. Inside the callback, A or B wraps the network candidate in an ICE candidate message and sends it to the other party, relayed through the server
4. When A or B receives the other party's ICE candidate message relayed by the server, it parses the message to obtain the network candidate and adds it to its PC instance with the PC's addIceCandidate() method
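The candidate exchange above can be sketched like this. The function names, the send callback and the "ice_candidate" event name are assumptions made for illustration; onicecandidate, addIceCandidate() and the candidate, sdpMid and sdpMLineIndex fields are the standard ones.

```javascript
// `pc` is an RTCPeerConnection. `send` is a hypothetical function that
// relays a message to the other user through the signaling server.

// Steps 1-3: forward every local candidate to the other party.
function forwardLocalCandidates(pc, send) {
    pc.onicecandidate = function (event) {
        // A null candidate means candidate gathering has finished.
        if (!event.candidate) {
            return;
        }
        send({
            eventName: "ice_candidate",
            data: {
                candidate: event.candidate.candidate,
                sdpMid: event.candidate.sdpMid,
                sdpMLineIndex: event.candidate.sdpMLineIndex
            }
        });
    };
}

// Step 4: hand a candidate received through the server to the local PC.
function addRemoteCandidate(pc, data) {
    return pc.addIceCandidate({
        candidate: data.candidate,
        sdpMid: data.sdpMid,
        sdpMLineIndex: data.sdpMLineIndex
    });
}
```

Each side calls forwardLocalCandidates() right after creating its PC, and calls addRemoteCandidate() whenever an ice_candidate message arrives from the server.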

In this way the connection is established, and media stream data can be sent to the other party by adding a stream to the RTCPeerConnection with addStream(). Once a stream has been added to an RTCPeerConnection instance, the other party is notified through the callback bound to onaddstream. addStream() can be called before the connection is complete; once the connection is established, the other party will still receive the media stream.

Signaling in chat rooms

The above is the signaling exchange process between two users, but we need to build a chat room for multi-user online video chat. So some extensions are needed to meet this requirement

User actions

First, determine the general flow of a user's actions in the chat room:
1. Open the page and connect to the server
2. Enter a chat room
3. Establish a peer-to-peer connection with every other user already in the chat room, and display each of them on the page
4. If another user leaves the chat room, receive a notification, close the connection to that user and remove their output from the page
5. If another user joins, receive a notification, establish a connection with the new user and display them on the page
6. Leave the page and close all connections

As the list above shows, apart from helping establish peer-to-peer connections, the server needs to do at least the following:
1. When a new user joins a room, send the new user's information to the other users in the room
2. When a new user joins a room, send the information of the other users in the room to the new user
3. When a user leaves a room, send the leaving user's information to the other users in the room

Implementation approach

Taking WebSocket as an example, the user flow above can be refined as follows:
1. The browser establishes a WebSocket connection with the server
2. The browser sends a join message (join) to enter a chat room; the message must include the name of the chat room the user is entering
3. The server replies with a message about the other users (peers) based on the room the user joined. It contains the information of the other users in the chat room, and the browser uses it to build a peer-to-peer connection with each of them in turn
4. If a user leaves, the server sends a user-left message (remove_peer) containing the information of the leaving user. The browser uses it to close the connection to that user and perform the corresponding cleanup
5. If a new user joins, the server sends a user-joined message (new_peer) containing the information of the new user, and the browser uses it to establish a peer-to-peer connection with the new user
6. The user leaves the page and the WebSocket connection is closed
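On the browser side, the bookkeeping for these messages might look like the sketch below. The createRoomClient name and the createPeer/closePeer callbacks are hypothetical; in a real page they would create and close an RTCPeerConnection. The message shapes (join, peers, new_peer, remove_peer) follow the ones used in this article.

```javascript
// Hypothetical client-side handler for the room signaling.
// createPeer(id) returns whatever represents the connection to user `id`;
// closePeer(id, peer) tears it down. Both are supplied by the caller.
function createRoomClient(createPeer, closePeer) {
    var peers = {};   // socketId -> connection object
    var myId = null;  // id the server assigned to this user

    function handle(raw) {
        var msg = JSON.parse(raw);
        var data = msg.data || {};
        switch (msg.eventName) {
        case "peers":
            // step 3: connect to everyone already in the room
            myId = data.you;
            data.socketIds.forEach(function (id) {
                peers[id] = createPeer(id);
            });
            break;
        case "new_peer":
            // step 5: a new user joined
            peers[data.socketId] = createPeer(data.socketId);
            break;
        case "remove_peer":
            // step 4: a user left, so close and forget the connection
            if (peers[data.socketId]) {
                closePeer(data.socketId, peers[data.socketId]);
                delete peers[data.socketId];
            }
            break;
        }
    }

    return {
        handle: handle,
        // step 2: the message that asks the server to join a room
        joinMessage: function (room) {
            return JSON.stringify({ eventName: "join", data: { room: room } });
        },
        peerIds: function () { return Object.keys(peers); },
        myId: function () { return myId; }
    };
}
```

In a page this would be wired to a WebSocket roughly as socket.onmessage = function (e) { client.handle(e.data); } and socket.send(client.joinMessage("roomName")) once the socket is open.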

Server implementation

A user may have established a connection without yet entering a specific room, so first we need a container that holds all user connections, and we need to listen for users establishing a WebSocket connection with the server:

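// Assumption: WebSocketServer comes from the "ws" package, and port 3000 is an arbitrary example
var WebSocketServer = require('ws').Server;
var server = new WebSocketServer({ port: 3000 });
var sockets = [];

server.on('connection', function(socket) {
    socket.on('close', function() {
        var i = sockets.indexOf(socket);
        sockets.splice(i, 1);
        // other work after the connection is closed
    });
    sockets.push(socket);
    // other work after the connection is established
});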

Because users are divided into rooms, we need a container on the server that stores the users in each room. An object is the obvious fit: the key is the room name and the value is the list of users in that room.

At the same time, we need to listen for the join message (join) described above. After a new user joins, we send the information of the other users in the room (peers) to the new user and the new user's information (new_peer) to the other users in the room. When a user leaves, we send the leaving user's information (remove_peer) to the other users.

So the code becomes roughly this:

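// Assumption: WebSocketServer comes from the "ws" package, and port 3000 is an arbitrary example
var WebSocketServer = require('ws').Server;
var crypto = require('crypto');

var server = new WebSocketServer({ port: 3000 });
var sockets = [];
var rooms = {};

// getRandomString() is not shown in the article; this is one possible implementation
var getRandomString = function() {
    return crypto.randomBytes(8).toString('hex');
};

/*
Format of a received join message:
{
    "eventName": "join",
    "data": {
        "room": "roomName"
    }
}
*/
var joinRoom = function(data, socket) {
    var room = (data && data.room) || "__default";
    var curRoomSockets; // sockets currently in the room
    var socketIds = []; // ids of the other users in the room

    curRoomSockets = rooms[room] = rooms[room] || [];

    // send the new user's id to everyone already in the room
    for (var i = curRoomSockets.length; i--;) {
        socketIds.push(curRoomSockets[i].id);
        curRoomSockets[i].send(JSON.stringify({
            "eventName": "new_peer",
            "data": {
                "socketId": socket.id
            }
        }));
    }

    // add the new user's connection to the room's connection list
    curRoomSockets.push(socket);
    socket.room = room;

    // send the new user the ids of the other users, plus the id the server assigned to the new user
    socket.send(JSON.stringify({
        "eventName": "peers",
        "data": {
            "socketIds": socketIds,
            "you": socket.id
        }
    }));
};

server.on('connection', function(socket) {
    // give the socket a unique id, used to tell users apart
    socket.id = getRandomString();
    // cleanup to perform when the user closes the connection
    socket.on('close', function() {
        var i = sockets.indexOf(socket);
        var room = socket.room;
        var curRoomSockets = rooms[room];
        sockets.splice(i, 1);
        if (curRoomSockets) {
            // remove the socket from its room before notifying, so the closed socket is skipped
            i = curRoomSockets.indexOf(socket);
            if (i !== -1) {
                curRoomSockets.splice(i, 1);
            }
            if (curRoomSockets.length === 0) {
                delete rooms[room];
            }
            // notify the other users in the room
            for (i = curRoomSockets.length; i--;) {
                curRoomSockets[i].send(JSON.stringify({
                    "eventName": "remove_peer",
                    "data": {
                        "socketId": socket.id
                    }
                }));
            }
        }
        // other work after the connection is closed
    });
    // parse the signaling sent by the front-end page and decide how to handle it
    socket.on('message', function(data) {
        var json = JSON.parse(data);
        if (json.eventName) {
            if (json.eventName === "join") {
                // pass the parsed payload, not the raw message
                joinRoom(json.data, socket);
            }
        }
    });
    // store the connection
    sockets.push(socket);
    // other work after the connection is established
});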

Finally, all that remains is to add forwarding of the peer-to-peer signaling (offers, answers and ICE candidates). For the complete code, see the source of the SkyRTC project I wrote.
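As a rough idea of what that forwarding could look like (this is a sketch under assumptions, not the SkyRTC implementation): the sender names the recipient in a socketId field, and the server swaps in the sender's id before relaying, so the recipient knows whom to reply to. The forward function and the socketId convention are hypothetical; rooms has the same shape as in the server code above.

```javascript
// Hypothetical relay helper. `rooms` maps a room name to a list of sockets,
// `from` is the sending socket (with .id and .room set by the server), and
// `json` is the already parsed message, e.g.
// { eventName: "offer", data: { socketId: "<recipient id>", sdp: ... } }
function forward(rooms, from, json) {
    var roomSockets = rooms[from.room] || [];
    var targetId = json.data && json.data.socketId;
    for (var i = 0; i < roomSockets.length; i++) {
        var target = roomSockets[i];
        if (target.id === targetId && target !== from) {
            // copy the payload, replacing the recipient id with the sender id
            var payload = {};
            Object.keys(json.data).forEach(function (key) {
                payload[key] = json.data[key];
            });
            payload.socketId = from.id;
            target.send(JSON.stringify({
                eventName: json.eventName,
                data: payload
            }));
            return true;
        }
    }
    // recipient not found in the sender's room
    return false;
}
```

The server's message handler would call forward(rooms, socket, json) for any event name other than join. Restricting the lookup to the sender's own room keeps users from signaling into rooms they have not joined.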

References

WebRTC in the real world: STUN, TURN and signaling

SDP for the WebRTC draft-nandakumar-rtcweb-sdp-04
