Foreword
This article describes how to implement audio and video recording with WebRTC.
1. What is WebRTC
" WebRTC
( Web Real-Time Communications
) is a real-time communication technology that allows network applications or sites to establish a peer-to-peer (Peer-to-Peer) connection between browsers without intermediaries to achieve video streaming and/or audio Streaming or other arbitrary data transmission. WebRTC
The inclusion of these standards makes it possible for users to create peer-to-peer (Peer-to-Peer) data sharing and teleconferencing without installing any plug-ins or third-party software." In summary, In fact, there are four points:
- Cross-platform
- (mainly) for browsers
- real-time transmission
- audio and video engine
After studying WebRTC a little more deeply, I found that it can be used not only for audio/video recording and video calls, but also in scenarios such as cameras, music players, shared remote desktops, instant messaging tools, P2P network acceleration, file transfer, and real-time face recognition (of course, in combination with many other technologies). A few of these should sound familiar: camera, face recognition, shared desktop. That is because WebRTC's API is based on audio and video streams!
2. WebRTC audio and video data capture
The most important prerequisite for data transmission is data capture. A very important API comes in here:
let promise = navigator.mediaDevices.getUserMedia(constraints);
This API prompts the user for permission to use the media input, which produces a MediaStream containing a track of the requested media type. This stream can contain a video track (from hardware or virtual video sources, such as cameras, video capture devices, screen sharing services, etc.), an audio track (also from hardware or virtual audio sources, such as microphones, A/D converters, etc. etc.), and possibly other track types.
It returns a Promise object. On success, the resolve callback receives a MediaStream object. If the user denies permission, or the required media source is not available, the reject callback receives a PermissionDeniedError (named NotAllowedError in the current spec) or a NotFoundError.
The key concept here is the "track", because everything is based on "streams": what you get back is a MediaStream object (a stream object)! That means if you want to use this result, you must either assign it to the srcObject attribute or convert it to a URL with URL.createObjectURL()!
/**
* params:
* object: the File, Blob, or MediaSource object used to create the URL.
* return: a DOMString containing an object URL that can be used to reference the content of the source object.
*/
objectURL = URL.createObjectURL(object);
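To make the "track" idea above concrete, here is a tiny sketch (the helper name is my own; it relies only on the standard getTracks() method of MediaStream) that lists the kind of each track a stream carries:

```javascript
// List the kind ('audio' / 'video') of each track carried by a stream-like
// object that exposes the standard getTracks() method.
function trackKinds(stream) {
  return stream.getTracks().map(track => track.kind);
}

// In the browser you would use it like this:
// navigator.mediaDevices.getUserMedia({ audio: true, video: true })
//   .then(s => console.log(trackKinds(s)));  // e.g. ['audio', 'video']
```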
Back to the API itself, it can be understood like this: through it, you can obtain all the audio and video sources available to the current page, divided into different "tracks" by source, and you can configure them and consume their output. You can also see this API used in the part of the article that implements "taking a photo".
Let's first add a video element to the page:
<video autoplay playsinline id="player"></video>
Because we need to obtain both audio and video streams, and the related HTML5 elements are video and audio, the only element that can carry both at once is video (if you only need the audio stream, you can use an audio element).
Following the getUserMedia API usage above, it is not hard to write:
navigator.mediaDevices.getUserMedia(constraints)
.then(gotMediaStream)
.catch(handleError);
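As a sketch of how the failure branch can distinguish the error cases mentioned earlier (the mapping helper is my own addition, not part of the API; the error names are the standard DOMException names):

```javascript
// Map getUserMedia rejection names to user-facing hints.
// 'NotAllowedError' is the current spec name for the legacy 'PermissionDeniedError'.
function describeMediaError(name) {
  switch (name) {
    case 'NotAllowedError':
    case 'PermissionDeniedError':
      return 'The user denied permission to use the camera/microphone.';
    case 'NotFoundError':
      return 'No media device matching the constraints was found.';
    case 'NotReadableError':
      return 'The device is already in use or a hardware error occurred.';
    default:
      return 'getUserMedia failed: ' + name;
  }
}

// Usage inside the .catch handler:
// .catch(err => console.log(describeMediaError(err.name)));
```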
3. WebRTC capture constraints
"Constraint" refers to some configuration items of the control object. Constraints fall into two categories: video constraints and audio constraints.
In WebRTC, the commonly used video constraints are:
- width
- height
- aspectRatio: aspect ratio
- frameRate: frame rate
- facingMode: which camera to use (front or rear; this option mainly matters on mobile, since desktop browsers usually only have a front-facing camera)
- resizeMode: whether to crop/scale the frames
The commonly used audio constraints are:
- volume: volume
- sampleRate: sampling rate
- sampleSize: sample size
- echoCancellation: echo cancellation
- autoGainControl: automatic gain control
- noiseSuppression: noise suppression
- latency: delay (scenario-dependent; generally, the smaller the latency, the better the experience)
- channelCount: mono/stereo
…
So what is constraints? Officially it is called "MediaStreamConstraints". We can see it more clearly through the code:
dictionary MediaStreamConstraints {
(boolean or MediaTrackConstraints) video = false;
(boolean or MediaTrackConstraints) audio = false;
}
It is responsible for collecting the audio and video constraints:
- If video/audio is simply set to a bool, it just decides whether to capture at all
- If video/audio is a MediaTrackConstraints object, you can further set things such as video resolution, frame rate, audio volume, sampling rate, etc.
Following the description above, we can flesh out the configuration in the code. Of course, the first thing to do when using this API is usually to check whether the user's current browser supports it:
if(!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia){
console.log('getUserMedia is not supported!');
return;
}else{
var constraints={
// This could also simply be video:false/true, which just means do not capture / capture video
video:{
width:640,
height:480,
frameRate:60
},
// This could also simply be audio:false/true, which just means do not capture / capture audio
audio:{
noiseSuppression:true,
echoCancellation:true
}
}
navigator.mediaDevices.getUserMedia(constraints)
.then(gotMediaStream)
.catch(handleError);
}
Now it is time to implement the success and failure callbacks for the getUserMedia API. Here we need to think about "what to do in the callback". In fact, it is nothing more than handling the acquired "stream": save it as a global variable or assign it directly as the video/audio element's srcObject value (on success), or log the error (on failure).
var videoplay=document.querySelector('#player');
function gotMediaStream(stream){
// 【1】
videoplay.srcObject=stream;
// 【2】
// return navigator.mediaDevices.enumerateDevices();
}
function handleError(err){
console.log('getUserMedia error:',err);
}
The commented-out line uses another API: mediaDevices.enumerateDevices(). This MediaDevices method enumerates all available media input and output devices (such as microphones, cameras, headsets, etc.). The returned promise is resolved with an array of MediaDeviceInfo objects describing the devices.
It can also be seen as the "track collection" mentioned above. For example, if you log the promise's resolved value following the commented method here, you will find an array containing three entries: two audio (input, output) and one video.
Likewise, if you log the stream parameter, you can inspect its audio and video tracks.
In some scenarios, this API lets us expose the available device choices to users.
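As a sketch of that idea (the grouping helper is my own addition; the kind values come from the standard MediaDeviceInfo interface), the devices returned by enumerateDevices() can be grouped by kind to build such a selection UI:

```javascript
// Group a MediaDeviceInfo list by its `kind` field:
// 'audioinput' (microphones), 'audiooutput' (speakers/headsets), 'videoinput' (cameras).
// Falls back to deviceId when the label is empty (labels are hidden until
// the user has granted media permission).
function groupDevices(deviceInfos) {
  const groups = { audioinput: [], audiooutput: [], videoinput: [] };
  deviceInfos.forEach(info => {
    if (groups[info.kind]) {
      groups[info.kind].push(info.label || info.deviceId);
    }
  });
  return groups;
}

// In the browser:
// navigator.mediaDevices.enumerateDevices()
//   .then(infos => console.log(groupDevices(infos)));
```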
So far, the first effect in the video, real-time capture of audio and video, is complete. This is also the basis for everything that follows: if you cannot capture it, how can you record it?
First add some needed nodes:
<video id="recplayer" playsinline></video>
<button id="record">Start Record</button>
<button id="recplay" disabled>Play</button>
<button id="download" disabled>Download</button>
As mentioned earlier, what capture produces is a "stream" object, so the receiving side must also be an API that can take a stream and operate on it: the MediaStream Recording API!
MDN introduces it as follows: "MediaRecorder is an interface provided by the MediaStream Recording API for easy recording of media; it is instantiated by calling the MediaRecorder() constructor." Because it is an interface provided by the MediaStream Recording API, it has a constructor:
- MediaRecorder(): creates a new MediaRecorder object that records the specified MediaStream. Supported options include the MIME type of the container (such as "video/webm" or "video/mp4") and the bit rates of audio and video (or a single bit rate for both)
Following MDN's description above, the idea becomes clearer: first call the constructor, passing in the stream obtained earlier, to get a MediaRecorder object; then, under a specified time slice, push the recorded chunks into an array one by one (since the capture work is basically done by this point, an array makes it convenient to convert everything into a Blob object later):
function startRecord(){
// Define a receiving array
buffer=[];
var options={
mimeType:'video/webm;codecs=vp8'
}
// isTypeSupported returns a Boolean indicating whether the configured MIME type is supported by the current user's device.
if(!MediaRecorder.isTypeSupported(options.mimeType)){
console.error(`${options.mimeType} is not supported`);
return;
}
try{
mediaRecorder=new MediaRecorder(window.stream,options);
}catch(e){
console.error('Fail to create');
return;
}
mediaRecorder.ondataavailable=handleDataAvailable;
// Time slice:
// start() begins recording; it can be given a timeslice value in milliseconds, and if set, the recorded media is split into separate chunks of that duration
mediaRecorder.start(10);
}
There are two points to note here:
- The acquired stream: because the earlier capture logic is wrapped in a function, to reuse the stream here it must be made a global object. I mounted it directly on window, and at the place commented 【1】 in the earlier code added the receiving line: window.stream=stream;
- A buffer array is defined, and likewise a mediaRecorder object. Considering their use in the ondataavailable handler, and the need to stop capturing and recording the stream when finished, both variables are also declared globally:
var buffer;
var mediaRecorder;
The ondataavailable handler fires whenever a chunk of recorded data is ready. In it, following the plan above, all we need to do is store the sliced data in order:
function handleDataAvailable(e){
// console.log(e)
if(e && e.data && e.data.size>0){
buffer.push(e.data);
}
}
// Stop recording
function stopRecord(){
mediaRecorder.stop();
}
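Before wiring up the buttons, it can be useful to sanity-check the recording. Here is a small helper (my own addition, not part of the original code) that totals the byte sizes of the chunks collected in buffer; each chunk is Blob-like and exposes a size property:

```javascript
// Sum the byte sizes of the recorded chunks. Works on the `buffer` array
// filled by handleDataAvailable, since each pushed chunk has a `size` field.
function recordedBytes(chunks) {
  return chunks.reduce((total, chunk) => total + chunk.size, 0);
}

// e.g. before enabling the Play/Download buttons:
// if (recordedBytes(buffer) === 0) console.warn('Nothing was recorded');
```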
Then call:
let recvideo=document.querySelector('#recplayer');
let btnRecord=document.querySelector('#record');
let btnPlay=document.querySelector('#recplay');
let btnDownload=document.querySelector('#download');
// Start/stop recording
btnRecord.onclick=()=>{
if(btnRecord.textContent==='Start Record'){
startRecord();
btnRecord.textContent='Stop Record';
btnPlay.disabled=true;
btnDownload.disabled=true;
}else{
stopRecord();
btnRecord.textContent='Start Record';
btnPlay.disabled=false;
btnDownload.disabled=false;
}
}
// Play
btnPlay.onclick=()=>{
var blob=new Blob(buffer,{type: 'video/webm'});
recvideo.src=window.URL.createObjectURL(blob);
recvideo.srcObject=null;
recvideo.controls=true;
recvideo.play();
}
// Download
btnDownload.onclick=()=>{
var blob=new Blob(buffer,{type:'video/webm'});
var url=window.URL.createObjectURL(blob);
var a=document.createElement('a');
a.href=url;
a.style.display='none';
a.download='recording.webm';
a.click();
}
At this point, the features shown in the video are basically implemented, but one problem remains: if you only want to record audio, then while you are speaking there are actually two sound sources, because the capture stream keeps playing back at the same time: your voice, and the captured playback of your voice. So we need to mute the sound during capture! But if you add volume:0 to the getUserMedia constraints as above, it has no effect, because so far no browser supports this constraint.
However, since what carries our audio and video is a video/audio container, we can simply control the volume property of its corresponding DOM node instead (setting the element's muted attribute would work as well):
// Add this at the place commented 【2】 in the earlier code
videoplay.volume=0;