Audio and video calls based on WebRTC

With the growth of the Internet, real-time audio and video calling has become an indispensable feature of remote work, social entertainment, and online education. WebRTC, an open-standard real-time communication protocol, makes it easy to implement real-time audio and video communication directly between browsers.

This article covers audio and video calling built on WebRTC, explaining how WebRTC works and the key concepts behind audio and video transport.

Through a hands-on case study, we will show how to build an audio and video calling application.

Background

With the rapid development of Internet technology, real-time audio and video calling has become a core feature of online education, remote work, social media, and other fields. WebRTC (Web Real-Time Communication), an open real-time communication standard, gives developers the ability to quickly build real-time audio and video calling systems. In this tutorial we will use WebRTC to build, from scratch, an audio and video calling application based on a P2P architecture.

Application Scenarios

  • One-to-one video chat: real-time video calling applications such as WeChat video calls.
  • Multi-party video conferencing: enterprise conferencing systems such as Feishu, DingTalk, and Tencent Meeting.
  • Online education: platforms such as Tencent Classroom and NetEase Cloud Classroom.
  • Live streaming: game streaming, course streaming, and so on.

P2P communication principle

P2P means peer-to-peer: the two clients communicate with each other directly rather than through a media server.

To achieve real-time audio and video communication between two clients that may sit in different network environments and use different devices, three problems must be solved:

  • How do the two ends find each other?
  • How do two ends with different codec capabilities understand each other?
  • How do the two ends reach each other over the network?

We will discuss these three problems one by one.

How do the two ends find each other?

In P2P communication, both parties need to exchange metadata such as media information and network information. This process is called "signaling".

The corresponding server is the "signaling server", often also called the "room server", because besides relaying media and network information between peers it also manages room state.

For example:

1) notifying members of a room when someone joins; 2) notifying them when someone leaves; 3) telling a third party whether the room is full and whether they may join.

To avoid redundancy and maximize compatibility with existing technologies, the WebRTC standard does not mandate a signaling method or protocol. In this tutorial, WebSocket (via socket.io) will be used to build the signaling server.

How do two ends with different codec capabilities understand each other?

Different browsers support different audio and video codecs.

An everyday analogy: Xiao Li speaks Chinese and English, while Xiao Wang speaks Chinese and French. To make sure both sides understand each other correctly, the simplest approach is to use the language they both know, Chinese.

In WebRTC, a dedicated protocol called the Session Description Protocol (SDP) is used to describe this kind of capability information.

Therefore, the two parties in an audio and video call must exchange SDP to learn which media formats the other side supports. The process of exchanging SDP is called media negotiation.
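
For a sense of what gets exchanged: an SDP payload is plain text that lists, among other things, the media sections and codecs an end supports. A trimmed, illustrative excerpt (all values made up):

v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 VP8/90000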

How do the two ends reach each other over the network?

This is the process of network negotiation: the two parties need to learn each other's network conditions so that a path for direct communication can be found.

Ideally, each client would have its own public IP address, allowing a direct point-to-point connection. In reality, for network security and other reasons, most clients sit inside a local area network and need network address translation (NAT).

In WebRTC, the ICE mechanism is used to establish the network connection. ICE employs a series of techniques (such as STUN and TURN servers) to help the communicating parties discover and negotiate usable public addresses, thus achieving NAT traversal.

ICE works as follows:

  1. Each party first collects local network addresses (private and public) as well as candidate addresses obtained from STUN and TURN servers.
  2. The two parties then exchange these candidate addresses through the signaling server.
  3. Both parties run connectivity checks against the candidate addresses to determine the best usable one.
  4. Once a usable address is found, the real-time audio and video call can begin.

In WebRTC, this network information is described by "candidates".
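
For illustration, a serialized ICE candidate is a single line of text describing one possible transport address. Its general shape looks like this (all values made up):

candidate:842163049 1 udp 1677729535 203.0.113.7 52515 typ srflx raddr 192.168.1.10 rport 52515

Here "typ srflx" marks a server-reflexive address discovered via STUN; "typ host" and "typ relay" mark local and TURN-relayed addresses respectively.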

Summing up the three problems: each end obtains its media information (SDP) and network information (candidates) through the API provided by WebRTC, exchanges them via the signaling server, and then establishes a connection between the two ends to complete the real-time video and voice call.

Common APIs

Audio and video capture: getUserMedia

// Get the local audio and video stream
const getLocalStream = async () => {
  const stream = await navigator.mediaDevices.getUserMedia({ // capture audio and video
    audio: true,
    video: true
  })

  localVideo.value!.srcObject = stream
  localVideo.value!.play()

  return stream
}

Core object: RTCPeerConnection

RTCPeerConnection, the API for creating a peer-to-peer connection, is the key to real-time audio and video communication.

const peer = new RTCPeerConnection({
  // iceServers: [
  //   { urls: "stun:stun.l.google.com:19302" }, // Google's public STUN server
  //   {
  //     urls: "turn:***",
  //     credential: "***",
  //     username: "***",
  //   },
  // ],
});
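
A note on the commented-out iceServers: when both peers run on the same machine or LAN, as in this local demo, host candidates are usually enough to connect, so STUN/TURN can be omitted. Across real networks you would uncomment this block, point it at a reachable STUN server, and, for clients behind strict NATs, add your own TURN service.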

The following methods are the main ones used.

Media negotiation methods:

  • createOffer
  • createAnswer
  • setLocalDescription
  • setRemoteDescription

Important events:

  • onicecandidate
  • onaddstream

The whole media negotiation process can be simplified into three steps, corresponding to the four negotiation methods above:

  1. The caller creates an offer (createOffer) and sends it (containing the caller's SDP) to the callee through the signaling server, while also calling setLocalDescription to save the offer with its local SDP.
  2. On receiving the offer, the callee calls setRemoteDescription to save the offer containing the remote SDP, then creates an answer (createAnswer) and sends it (containing the callee's SDP) back to the caller through the signaling server.
  3. On receiving the answer, the caller calls setRemoteDescription to save the answer containing the remote SDP.

After these three steps, the media negotiation part of P2P communication is done. In fact, calling setLocalDescription on each end also starts the collection of that end's network information (candidates). Each end then listens for the onicecandidate event, gathers its candidates, and sends them to the peer through the signaling server. Once the P2P network channel opens, each side obtains the other's media stream by listening for the onaddstream event, completing the video call.
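
Condensed into code, the negotiation looks roughly like this (a sketch: signaling.send and the surrounding message handling stand in for whatever transport you use, and each end has its own peer connection):

// Caller
const offer = await peer.createOffer()
await peer.setLocalDescription(offer) // also kicks off ICE candidate gathering
signaling.send({ type: 'offer', sdp: offer })

// Callee, upon receiving the offer
await peer.setRemoteDescription(offer)
const answer = await peer.createAnswer()
await peer.setLocalDescription(answer)
signaling.send({ type: 'answer', sdp: answer })

// Caller, upon receiving the answer
await peer.setRemoteDescription(answer)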

Practice

Project setup

Front-end project

  1. The project uses Vue 3 + TypeScript. Create it with:
npm create vite@latest webrtc-client -- --template vue-ts

2. Install tailwindcss:

npm install -D tailwindcss postcss autoprefixer
npx tailwindcss init -p

3. In the generated tailwind.config.js configuration file, add the paths to all of your template files:

/** @type {import('tailwindcss').Config} */
module.exports = {
  content: [
    "./index.html",
    "./src/**/*.{vue,js,ts,jsx,tsx}",
  ],
  theme: {
    extend: {},
  },
  plugins: [],
}

4. Replace the contents of style.css with:

@tailwind base;
@tailwind components;
@tailwind utilities;

5. Modify App.vue as follows:

<script lang="ts" setup>
import { ref } from 'vue'

const called = ref<boolean>(false) // 是否是接收方
const caller = ref<boolean>(false) // 是否是发起方
const calling = ref<boolean>(false) // 呼叫中
const communicating = ref<boolean>(false) // 视频通话中
const localVideo = ref<HTMLVideoElement>() // video标签实例,播放本人的视频
const remoteVideo = ref<HTMLVideoElement>() // video标签实例,播放对方的视频

// 发起方发起视频请求
const callRemote = () => {
  console.log('发起视频');
}

// 接收方同意视频请求
const acceptCall = () => {
  console.log('同意视频邀请');
}

// 挂断视频
const hangUp = () => {
  console.log('挂断视频');
}
</script>

<template>
  <div class="flex items-center flex-col text-center p-12 h-screen">
    <div class="relative h-full mb-4">
      <video
        ref="localVideo" 
        class="w-96 h-full bg-gray-200 mb-4 object-cover"
      ></video>
      <video
        ref="remoteVideo"
        class="w-32 h-48 absolute bottom-0 right-0 object-cover"
      ></video>
      <div v-if="caller && calling" class="absolute top-2/3 left-36 flex flex-col items-center">
        <p class="mb-4 text-white">等待对方接听...</p>
        <img @click="hangUp" src="/refuse.svg" class="w-16 cursor-pointer" alt="">
      </div>
      <div v-if="called && calling" class="absolute top-2/3 left-32 flex flex-col items-center">
        <p class="mb-4 text-white">收到视频邀请...</p>
        <div class="flex">
          <img @click="hangUp" src="/refuse.svg" class="w-16 cursor-pointer mr-4" alt="">
          <img @click="acceptCall" src="/accept.svg" class="w-16 cursor-pointer" alt="">
        </div>
      </div>
    </div>
    <div class="flex gap-2 mb-4">
      <button 
        class="rounded-md bg-indigo-600 px-4 py-2 text-sm font-semibold text-white" 
        @click="callRemote"
>Start call</button>
      <button 
        class="rounded-md bg-red-600 px-4 py-2 text-sm font-semibold text-white" 
        @click="hangUp"
>Hang up</button>
    </div>
  </div>
</template>
 

After completing the steps above, run npm run dev to start the project locally.

Back-end project

Create a webrtc-server folder, run npm init (accepting all the defaults), then install socket.io and nodemon:

npm install socket.io nodemon

Create an index.js file with the following content:

const socket = require('socket.io');
const http = require('http');

const server = http.createServer()

const io = socket(server, {
  cors: {
    origin: '*' // allow cross-origin requests
  }
});

io.on('connection', sock => {
  console.log('Connection established...')
  // notify the client that the connection succeeded
  sock.emit('connectionSuccess');
})

server.listen(3000, () => {
  console.log('Server started');
});

Add a start script to package.json that runs the project with nodemon:

"scripts": {
  "test": "echo \"Error: no test specified\" && exit 1",
  "start": "nodemon index.js"
},

Then npm run start starts the Node service on port 3000.

Connecting the front end to the signaling server

The front end needs socket.io-client installed in order to connect to the signaling server:

<script setup lang="ts">
  // App.vue
  import { ref, onMounted, onUnmounted } from 'vue'
	import { io, Socket } from "socket.io-client";
  
  // ...
  const socket = ref<Socket>() // Socket实例
  
  onMounted(() => {
    const sock = io('localhost:3000'); // 对应服务的端口
    
    // 连接成功
    sock.on('connectionSuccess', () => {
      console.log('连接成功')
    });
    
    socket.value = sock;
  })
  
  // ...
</script>
 

Making a video call

Roles: user A is the caller (initiator), user B is the callee (receiver).

The room simulates a chat window.

Join the room once the connection succeeds:

// Client code
const roomId = '001'

sock.on('connectionSuccess', () => {
  console.log('Connected to server...');
  sock.emit('joinRoom', roomId) // client emits the join-room event
})

// Server code
sock.on('joinRoom', (roomId) => {
  sock.join(roomId) // join the room
})

User A initiates the call and notifies user B:

  1. User A initiates a video request and notifies user B through the signaling server:

// Caller initiates a video request
const callRemote = async () => {
  console.log('Initiating video call');
  caller.value = true;
  calling.value = true;
  await getLocalStream()
  // tell the signaling server a call is starting
  socket.value?.emit('callRemote', roomId)
}

2. User B accepts the video request and notifies user A through the signaling server:

// Callee accepts the video request
const acceptCall = () => {
  console.log('Accepting video invitation');
  socket.value?.emit('acceptCall', roomId)
}
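
The original does not show the server side of these two events, nor how user B learns about the incoming call. Presumably the signaling server simply forwards each event to the other clients in the room, and the callee flips its UI state when notified; a sketch under those assumptions:

// Server code (assumed): relay call events to the other peers in the room
sock.on('callRemote', (roomId) => {
  sock.to(roomId).emit('callRemote')
})
sock.on('acceptCall', (roomId) => {
  sock.to(roomId).emit('acceptCall')
})

// Client code (assumed): the callee is notified and shows the incoming-call UI
sock.on('callRemote', () => {
  if (!caller.value) {
    called.value = true
    calling.value = true
  }
})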

Now the two sides exchange SDP and candidate information:

  1. User A creates an RTCPeerConnection, adds the local audio and video stream, generates an offer, and sends it to user B through the signaling server:

// create the RTCPeerConnection
peer.value = new RTCPeerConnection()
// add the local audio and video stream (addStream is a legacy API; addTrack is the modern equivalent)
peer.value.addStream(localStream.value)
// generate the offer (offerToReceiveAudio/Video are likewise legacy options)
const offer = await peer.value.createOffer({
  offerToReceiveAudio: 1,
  offerToReceiveVideo: 1
})
console.log('offer', offer);
// save the offer as the local description
await peer.value.setLocalDescription(offer);
// send the offer to user B via the signaling server
socket.value?.emit('sendOffer', { offer, roomId })
 

2. User B receives the offer from user A:

sock.on('sendOffer', (offer) => {
  if (called.value) { // only the callee handles it
    console.log('Received offer', offer);
  }
})

3. User B creates its own RTCPeerConnection, adds its local audio and video stream, sets the remote description, generates an answer, and sends it to user A through the signaling server:

// create our own RTCPeerConnection
peer.value = new RTCPeerConnection()
// add the local audio and video stream
const stream = await getLocalStream()
peer.value.addStream(stream)
// save the received offer as the remote description
await peer.value.setRemoteDescription(offer);
const answer = await peer.value.createAnswer()
console.log(answer);
await peer.value.setLocalDescription(answer);
// send the answer to the signaling server
socket.value?.emit('sendAnswer', { answer, roomId })

4. User A receives the answer from user B:

sock.on('sendAnswer', (answer) => {
  if (caller.value) { // only the caller handles it
    // save the remote answer
    peer.value.setRemoteDescription(answer);
  }
})

5. User A gathers candidate information and sends each candidate to user B through the signaling server:

// gather candidates by listening for the onicecandidate event
peer.value.onicecandidate = (event: any) => {
  if (event.candidate) {
    console.log('User A gathered a candidate', event.candidate);
    // send the candidate to user B via the signaling server
    socket.value?.emit('sendCandidate', {
      roomId,
      candidate: event.candidate
    })
  }
}

6. User B adds user A's candidate:

// add the remote candidate
sock.on('sendCandidate', async (candidate) => {
  await peer.value.addIceCandidate(candidate);
})

7. User B gathers its candidates and sends them to user A through the signaling server (same as above):

peer.value.onicecandidate = (event: any) => {
  if (event.candidate) {
    console.log('User B gathered a candidate', event.candidate);
    // send the candidate to user A via the signaling server
    socket.value?.emit('sendCandidate', {
      roomId,
      candidate: event.candidate
    })
  }
}

8. User A adds user B's candidate (same as above):

// add the remote candidate
sock.on('sendCandidate', async (candidate) => {
  await peer.value.addIceCandidate(candidate);
})
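
Again, the server-side relays for these events are not shown in the original; presumably they follow the same pattern as joinRoom, unpacking the payload and forwarding it to the rest of the room. A sketch under that assumption:

// Server code (assumed): forward signaling payloads to the other peers in the room
sock.on('sendOffer', ({ roomId, offer }) => {
  sock.to(roomId).emit('sendOffer', offer)
})
sock.on('sendAnswer', ({ roomId, answer }) => {
  sock.to(roomId).emit('sendAnswer', answer)
})
sock.on('sendCandidate', ({ roomId, candidate }) => {
  sock.to(roomId).emit('sendCandidate', candidate)
})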

9. Now user A and user B can communicate peer-to-peer:

// listen for onaddstream to receive the remote stream
// (onaddstream is a legacy event; ontrack is the modern equivalent)
peer.value.onaddstream = (event: any) => {
  calling.value = false;
  communicating.value = true;
  remoteVideo.value!.srcObject = event.stream
  remoteVideo.value!.play()
}

Hanging up

// hang up the call
const hangUp = () => {
  console.log('Hanging up');
  socket.value?.emit('hangUp', roomId)
}

// reset all state
const reset = () => {
  called.value = false
  caller.value = false
  calling.value = false
  communicating.value = false
  peer.value = null
  localVideo.value!.srcObject = null
  remoteVideo.value!.srcObject = null
  localStream.value = undefined
}
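
To complete the loop, both ends presumably listen for the relayed hangUp event and reset their state; a sketch under the same relay assumption as above:

// Server code (assumed): notify everyone in the room, including the sender
sock.on('hangUp', (roomId) => {
  io.to(roomId).emit('hangUp')
})

// Client code (assumed)
sock.on('hangUp', () => {
  reset()
})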

Extension: peerjs

Documentation: https://peerjs.com/docs/#start

Server implementation

// build the signaling server with PeerServer
const { PeerServer } = require('peer');
const peerServer = PeerServer({ port: 3001, path: '/myPeerServer' });

Front-end implementation

<script setup lang="ts">
import { ref, onMounted } from 'vue'
import { Peer } from "peerjs";

const url = ref<string>()
const localVideo = ref<HTMLVideoElement>()
const remoteVideo = ref<HTMLVideoElement>()
const peerId = ref<string>()
const remoteId = ref<string>()
const peer = ref<any>()
const caller = ref<boolean>(false)
const called = ref<boolean>(false)
const callObj = ref<any>(false)

onMounted(() => {
  // 
  peer.value = new Peer({ // 连接信令服务器
    host: 'localhost',
    port: 3001,
    path: '/myPeerServer'
  });
  peer.value.on('open', (id: string) => {
    peerId.value = id
  })

  // 接收视频请求
  peer.value.on('call', async (call: any) => {
    called.value = true
    callObj.value = call
  });
})

// 获取本地音视频流
async function getLocalStream(constraints: MediaStreamConstraints) {
  // 获取媒体流
  const stream = await navigator.mediaDevices.getUserMedia(constraints)
  // 将媒体流设置到 video 标签上播放
  localVideo.value!.srcObject = stream;
  localVideo.value!.play();
  return stream
}

const acceptCalled = async () => {
  // 接收视频
  const stream = await getLocalStream({ video: true, audio: true })
  callObj.value.answer(stream);
  callObj.value.on('stream', (remoteStream: any) => {
    called.value = false
    // 将远程媒体流添加到 video 元素中
    remoteVideo.value!.srcObject = remoteStream;
    remoteVideo.value!.play();
  });
}

// 开启视频
const callRemote = async () => {
  if (!remoteId.value) {
    alert('请输入对方ID')
    return
  }
  const stream = await getLocalStream({ video: true, audio: true })

  // 将本地媒体流发送给远程 Peer
  const call = peer.value.call(remoteId.value, stream);
  caller.value = true
  call.on('stream', (remoteStream: any) => {
    caller.value = false
    // 将远程媒体流添加到 video 元素中
    remoteVideo.value!.srcObject = remoteStream;
    remoteVideo.value!.play();
  });
}
</script>
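
The original stops at the script; the matching template is omitted. A minimal sketch (markup, wiring, and layout are assumptions, not from the original):

<template>
  <div>
    <p>My ID: {{ peerId }}</p>
    <input v-model="remoteId" placeholder="Remote peer ID" />
    <button @click="callRemote">Start call</button>
    <button v-if="called" @click="acceptCalled">Accept</button>
    <video ref="localVideo" muted></video>
    <video ref="remoteVideo"></video>
  </div>
</template>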
