Build a many-to-many audio and video communication service

Three architecture schemes for multi-party audio and video communication

1. Mesh scheme

In a mesh, the terminals connect directly to each other in pairs, forming a mesh network. This scheme is an extension of the 1v1 WebRTC communication model: any two nodes form an ordinary 1v1 WebRTC connection.

Advantages: No server is needed to relay media data; STUN/TURN is only responsible for NAT traversal, so the scheme can be built with the existing 1v1 WebRTC communication model without developing a media server.

Disadvantages: Each media stream must be sent separately to every other participant, which consumes a great deal of upstream bandwidth - the more participants, the more bandwidth is used. The client's machine-resource usage also grows in proportion to the number of participants, so with too many participants the scheme becomes unusable. In practice, it runs into serious problems with more than four participants. In addition, since STUN/TURN only handles NAT traversal, any participant whose NAT cannot be traversed simply cannot connect.
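The uplink cost of the Mesh scheme is easy to quantify. A minimal sketch, with illustrative bitrates (the function name and numbers are assumptions, not from any library):

```javascript
// Rough uplink-bandwidth estimate for a Mesh call: each participant must
// upload a full copy of their own stream to every other peer.
function meshUplinkKbps(participants, streamKbps) {
  // N participants -> each terminal sends (N - 1) copies of the same stream
  return (participants - 1) * streamKbps;
}

// Assuming a 1500 kbps 720p stream:
console.log(meshUplinkKbps(2, 1500)); // 1500 kbps - same as a 1v1 call
console.log(meshUplinkKbps(5, 1500)); // 6000 kbps of uplink - already a problem
```

This is why the scheme degrades quickly past four participants: uplink bandwidth and encoding load both grow linearly with room size.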

2. MCU (Multipoint Control Unit) solution

This scheme consists of a server and multiple terminals in a star topology. The MCU's main processing logic is: receive the audio and video streams from each sharing terminal, decode them, mix them with the other decoded streams, re-encode the result, and send the mixed stream to everyone in the room.

In effect, the server side is an audio and video mixer, so the load on the server in this solution is very high. MCU technology appeared very early in the video conferencing field; it is now very mature and is mainly used in hardware video conferencing.
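The mixing step is the expensive part: for every room, the MCU must decode, combine, and re-encode in real time. A toy illustration of audio mixing only (the function is hypothetical; real MCUs also composite video, which is far costlier):

```javascript
// Toy illustration of the MCU idea: decoded PCM audio frames from every
// participant are summed into one frame, which would then be re-encoded
// and sent back to the room as a single mixed stream.
function mixAudioFrames(frames) {
  // frames: array of equal-length 16-bit PCM sample arrays, one per participant
  const len = frames[0].length;
  const mixed = new Array(len).fill(0);
  for (const frame of frames) {
    for (let i = 0; i < len; i++) mixed[i] += frame[i];
  }
  // Clamp to the signed 16-bit range to avoid overflow after summing.
  return mixed.map(s => Math.max(-32768, Math.min(32767, s)));
}

console.log(mixAudioFrames([[1000, -2000], [500, 500]])); // [1500, -1500]
```

Doing this per room, per frame, at conference scale is why MCUs typically need dedicated hardware.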


3. SFU (Selective Forwarding Unit) scheme

This solution also consists of a server and multiple terminals, but unlike the MCU, the SFU does not mix the audio and video streams. After receiving a stream shared by one terminal, it forwards that stream directly to the other terminals in the room.

The SFU is essentially an audio and video packet router. It can apply flow control based on each terminal's downlink network condition, selectively dropping some media data according to the current bandwidth and network delay to keep the communication smooth. The details are described below.
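The forwarding idea can be sketched in a few lines. This is a minimal illustration of the routing pattern, not Medooze's actual API (the `Room` class and callbacks are assumptions):

```javascript
// Minimal sketch of SFU forwarding: each incoming media packet from one
// publisher is relayed, unmixed and unmodified, to every other peer in
// the same room. No decoding or re-encoding happens on the server.
class Room {
  constructor() {
    this.peers = new Map(); // peerId -> callback that delivers a packet
  }
  join(peerId, send) { this.peers.set(peerId, send); }
  leave(peerId) { this.peers.delete(peerId); }
  // Forward a packet from `fromId` to all other peers in the room.
  forward(fromId, packet) {
    for (const [peerId, send] of this.peers) {
      if (peerId !== fromId) send(packet);
    }
  }
}

const demo = new Room();
demo.join('alice', pkt => console.log('alice got', pkt));
demo.join('bob', pkt => console.log('bob got', pkt));
demo.forward('alice', 'video-packet'); // only bob receives it
```

Because packets are only copied, not transcoded, the CPU cost per stream is tiny compared with an MCU; the trade-off is that downlink bandwidth at each receiver grows with the number of publishers.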

Summary

Compared with the Mesh scheme, both SFU and MCU require a relay server. Because the MCU must mix multiple video streams into one, it performs heavy computation and consumes a lot of CPU; dedicated hardware is usually required. The SFU does not mix video, so its CPU demand is much lower, but with multiple video streams it may cause different participants to see slightly inconsistent pictures at the same moment.

Therefore, if you do not want to use a relay server, choose the Mesh scheme; if you have strict requirements on multi-stream picture consistency and have dedicated hardware, the MCU mode is recommended; otherwise, use the SFU mode.

SFU modes

SFU schemes can be divided into Simulcast mode and SVC mode according to how media data is dropped when the network degrades.

Simulcast mode

In Simulcast mode, the video sharer sends the same video to the SFU as multiple streams at different resolutions at the same time (usually three, e.g. 1080P, 720P, and 360P). For each terminal, the SFU selects one of the three received streams to forward according to that terminal's conditions.

For example, because the PC's network is particularly good, the SFU sends it the 1080P stream, while a phone on a poor mobile network receives the 360P stream.
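The per-subscriber selection logic can be sketched as follows. The layer names and bitrates are illustrative assumptions, not values from any real SFU:

```javascript
// Hypothetical Simulcast layer selection: the SFU picks exactly ONE of
// the independently encoded streams for each subscriber, based on that
// subscriber's estimated downlink bandwidth.
const SIMULCAST_LAYERS = [
  { name: '1080p', kbps: 3000 },
  { name: '720p',  kbps: 1500 },
  { name: '360p',  kbps: 500  },
];

function pickSimulcastLayer(downlinkKbps) {
  // Choose the highest-resolution stream that fits the available bandwidth;
  // fall back to the lowest layer if even that does not fit.
  for (const layer of SIMULCAST_LAYERS) {
    if (layer.kbps <= downlinkKbps) return layer.name;
  }
  return SIMULCAST_LAYERS[SIMULCAST_LAYERS.length - 1].name;
}

console.log(pickSimulcastLayer(4000)); // "1080p" for a good PC connection
console.log(pickSimulcastLayer(800));  // "360p" for a weak mobile link
```

Note that the sharer pays the cost here: it must encode and upload all three streams, whether or not every resolution is used.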

SVC mode

SVC is a scalable video coding mode. During encoding, the video is split into multiple layers - a core layer, a middle layer, and an extension layer. Each upper layer depends on the layers below it: the more layers a receiver gets, the clearer the picture, and with fewer layers the picture is blurrier. When bandwidth is poor, only the bottom layer - the core layer - is transmitted; when bandwidth is sufficient, all three layers are transmitted.
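The key difference from Simulcast is that SVC layers are cumulative rather than alternative, so the SFU decides *how many* layers to forward. A sketch under assumed per-layer bitrates (all names and numbers here are illustrative):

```javascript
// Hypothetical SVC forwarding decision: layers build on each other, so
// the SFU always sends the core layer and adds enhancement layers as
// long as the subscriber's downlink bandwidth allows.
const SVC_LAYER_KBPS = [500, 1000, 1500]; // core, middle, extension layer

function svcLayersToSend(downlinkKbps) {
  let used = 0;
  let layers = 0;
  for (const kbps of SVC_LAYER_KBPS) {
    if (used + kbps > downlinkKbps) break;
    used += kbps;
    layers += 1;
  }
  return Math.max(layers, 1); // always send at least the core layer
}

console.log(svcLayersToSend(600));  // 1 -> core layer only, blurry but smooth
console.log(svcLayersToSend(3000)); // 3 -> all layers, full quality
```

Because dropping an enhancement layer still leaves a decodable stream, the SFU can adapt per packet without asking the sender to change its encoding.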

Deploy a multi-party video conferencing service

Based on the above analysis, we implement a project based on the SFU scheme. Open-source SFU implementations include Licode, Janus-gateway, MediaSoup, and Medooze. Here we deploy SFU, a demo project based on Medooze.

Operating environment preparation

  • OS: Ubuntu 18.04
  • Runtime: Node.js
apt-get update && apt-get install -y nodejs npm git-core

Install dependencies

git clone https://github.com/medooze/sfu.git
cd sfu
npm install

Generate a certificate

I will organize the certificate-generation steps properly later; for now, the Dockerfile below shows how to generate a self-signed certificate with openssl.

Start the service

node index.js <IP>

Docker service

Dockerfile

FROM ubuntu:18.04

WORKDIR /
RUN apt-get update && apt-get upgrade -y && apt-get install -y nodejs npm git-core python3 wget curl && npm install -g n && n stable
RUN hash -r && npm install -g [email protected]
RUN git clone https://github.com/medooze/sfu

WORKDIR /sfu
RUN npm install
# Generate a self-signed certificate for the service
RUN openssl req -x509 -out ./server.cert -keyout ./server.key -newkey rsa:2048 -nodes -sha256 -subj '/CN=*'
ENTRYPOINT ["node", "index.js"]

Docker image

// You can build the image yourself with the Dockerfile above, or use the image I have already packaged
docker pull zhaowg/sfu:v1

How to use the Docker version

docker run -it --net=host  zhaowg/sfu:v1 <IP>

Origin blog.csdn.net/m0_60259116/article/details/124330174