Audio and video applications -- WebRTC video recording summary

Background: To record in a video call, the specific requirements are as follows:

(1) Video call recording: the recording is saved in AVI format and uploaded to the server; the server sends it to the message module of the device, and the management machine itself does not store the recording;

(2) Other devices can be monitored, and recording of the monitoring video must be supported.

Therefore, the Android system now needs to support two kinds of recording: call recording and monitoring recording. Recording-related functions had already been developed on the device, but the device's overall hardware and performance meant that the previous framework had problems in operation. In addition, call recording and monitoring recording belong to different frameworks, so both sets of frameworks need to be supported.

Android camera framework

(Figure: Android camera framework)
Application framework

The application code sits at the application framework level and uses the Camera2 API to interact with the camera hardware. Internally, this code calls the corresponding Binder interfaces to access the native code that interacts with the camera.

AIDL

The Binder interfaces associated with CameraService can be found in frameworks/av/camera/aidl/android/hardware. The generated code calls the lower-level native code to obtain access to the physical camera and returns the data used to create the CameraDevice and, finally, the CameraCaptureSession objects at the framework level.

Native framework

This framework is located in frameworks/av/ and provides native classes equivalent to CameraDevice and CameraCaptureSession classes. See also NDK camera2 reference.
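For orientation, the same native layer is exposed to applications through the NDK camera2 API; a minimal C++ sketch of enumerating and opening a camera through that API looks roughly like this (error handling is trimmed, and the session/capture-request setup that would follow is omitted):

```cpp
#include <camera/NdkCameraManager.h>
#include <camera/NdkCameraDevice.h>
#include <cstdio>

// Minimal sketch: list cameras and open the first one through the NDK
// camera2 API, which sits on top of the native camera framework.
static void OnDisconnected(void* /*ctx*/, ACameraDevice* /*dev*/) {}
static void OnError(void* /*ctx*/, ACameraDevice* /*dev*/, int /*err*/) {}

int OpenFirstCamera() {
  ACameraManager* manager = ACameraManager_create();

  ACameraIdList* id_list = nullptr;
  if (ACameraManager_getCameraIdList(manager, &id_list) != ACAMERA_OK ||
      id_list->numCameras < 1) {
    ACameraManager_delete(manager);
    return -1;
  }

  ACameraDevice_StateCallbacks callbacks = {};
  callbacks.onDisconnected = OnDisconnected;
  callbacks.onError = OnError;

  ACameraDevice* device = nullptr;
  camera_status_t status = ACameraManager_openCamera(
      manager, id_list->cameraIds[0], &callbacks, &device);
  std::printf("openCamera status: %d\n", static_cast<int>(status));

  // ... create an ACameraCaptureSession and capture requests here ...

  if (device != nullptr) ACameraDevice_close(device);
  ACameraManager_deleteCameraIdList(id_list);
  ACameraManager_delete(manager);
  return status == ACAMERA_OK ? 0 : -1;
}
```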

Binder IPC interface

The Binder IPC interfaces are used to communicate across process boundaries. The camera Binder classes that call the camera service are located in the frameworks/av/camera/aidl/android/hardware directory. ICameraService is the interface of the camera service; ICameraDeviceUser is the interface of a specific opened camera device; ICameraServiceListener and ICameraDeviceCallbacks are the callbacks from CameraService and CameraDevice, respectively, back to the application framework.

Camera service

The camera service, located at frameworks/av/services/camera/libcameraservice/CameraService.cpp, is the actual code that interacts with the HAL.

HAL

The hardware abstraction layer defines a standard interface that is called by the camera service and that you must implement to ensure the normal operation of the camera hardware.



WebRTC part

Each path corresponds to a module, and the function of each module is roughly as follows:

common_audio: general audio functions, mainly abstracted audio-processing utilities

common_video: general video functions, mainly abstracted image-processing utilities

media: directory for media-related content

modules: the individual modules, including codecs, NetEQ, mixing, bitrate control, tools, etc.

system_wrappers: interfaces to operating-system-dependent library functions, with separate implementations for Windows, Linux, Android, and macOS

audio: the voice engine, essentially a logic layer; it handles all voice-related operation logic and statistics and manages and maintains the audio channels

video: the video engine; it handles video-related operation logic and statistics and manages and maintains the video channels

Recording framework in WebRTC

Audio recording process

The audio recording process can be divided into three sub-processes: starting audio recording, audio recording in progress, and stopping audio recording.


Starting/stopping the audio recording process

WebRTC provides two interfaces, ave_VoE_StartRecording and ave_VoE_StopRecording, for the Java layer to start and stop audio recording.
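These are JNI entry points exported from the native library; their exact signatures are project-specific, so the following is only a hedged sketch of what such exports could look like. The AudioFileRecorder facade, the Java class name, and the jstring path parameter are illustrative assumptions, not WebRTC APIs:

```cpp
#include <jni.h>
#include <string>

// Hypothetical recorder facade; in the real project these calls would
// forward to the WebRTC voice-engine file-recording code.
class AudioFileRecorder {
 public:
  int Start(const std::string& path) { path_ = path; return 0; }  // open file, hook mixer
  int Stop() { return 0; }                                        // flush and close file
 private:
  std::string path_;
};

static AudioFileRecorder g_recorder;

// JNI exports for the two interfaces named above. The Java class
// (com.example.AvEngine) and the path argument are assumptions.
extern "C" JNIEXPORT jint JNICALL
Java_com_example_AvEngine_ave_1VoE_1StartRecording(JNIEnv* env, jobject,
                                                    jstring j_path) {
  const char* path = env->GetStringUTFChars(j_path, nullptr);
  const int ret = g_recorder.Start(path);
  env->ReleaseStringUTFChars(j_path, path);
  return ret;
}

extern "C" JNIEXPORT jint JNICALL
Java_com_example_AvEngine_ave_1VoE_1StopRecording(JNIEnv*, jobject) {
  return g_recorder.Stop();
}
```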

Starting the audio recording process: (flow diagram omitted)

Stopping the audio recording process: (flow diagram omitted)

The process of audio recording in progress

The in-progress audio recording process is the core of the entire recording. Its core is the TransmitMixer object, which obtains the audio data of every channel and of the local microphone, mixes them together, and then writes the result to the file through the file handle. Unlike the video recording process, this flow is unified and runs on the same thread as audio data capture: capture, mixing, and file writing form one flow.
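As a rough illustration of the "collect every channel plus the microphone, mix, then write through the file handle" step, here is a minimal mixing sketch for 16-bit PCM. The PcmSource interface and raw-PCM output are stand-ins; the real TransmitMixer path uses WebRTC's own channel and file-recorder classes:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// One 10 ms frame of mono 16-bit PCM at 16 kHz.
constexpr int kSamplesPer10Ms = 160;

// Illustrative source: each remote channel and the local microphone
// delivers one 10 ms block of samples per call.
struct PcmSource {
  virtual ~PcmSource() = default;
  virtual void Get10MsAudio(int16_t* out, int samples) = 0;
};

// Mix all sources sample by sample with saturation, then append the mixed
// block to an already-open file handle (raw PCM here for simplicity).
void MixAndWrite10Ms(const std::vector<PcmSource*>& sources, std::FILE* out) {
  std::vector<int32_t> acc(kSamplesPer10Ms, 0);
  std::vector<int16_t> block(kSamplesPer10Ms);

  for (PcmSource* src : sources) {
    src->Get10MsAudio(block.data(), kSamplesPer10Ms);
    for (int i = 0; i < kSamplesPer10Ms; ++i) acc[i] += block[i];
  }

  std::vector<int16_t> mixed(kSamplesPer10Ms);
  for (int i = 0; i < kSamplesPer10Ms; ++i) {
    mixed[i] = static_cast<int16_t>(std::clamp<int32_t>(acc[i], -32768, 32767));
  }
  std::fwrite(mixed.data(), sizeof(int16_t), mixed.size(), out);
}
```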
Image capture process

The image capture process in WebRTC starts from VideoCaptureAndroid; the captured frames are handed back to the ViECapture class in WebRTC for processing, including mixing, encoding, and sending.
Video recording process

The video recording process can also be divided into three sub-processes: starting video recording, video recording in progress, and stopping video recording.

Starting/stopping the video recording process

WebRTC provides two interfaces, ave_ViE_StartRecording and ave_ViE_StopRecording, for the Java layer to start and stop video recording.

Starting the video recording process: (flow diagram omitted)

Stopping the video recording process: (flow diagram omitted)

The process of video recording in progress

The in-progress video recording process is the core of the entire recording. Its core components are the Capturer object and the recording thread. The Capturer object obtains the image data of every channel and of the local camera, mixes them together, and pushes the result into a buffer. The recording thread drains the buffer on a timer, encodes the image data, and writes it to the file through the file handle. Unlike the audio recording process, image capture/mixing and sound capture/mixing each run in their own threads during recording, while image encoding and file writing run in the thread created when recording starts, so three threads are involved in total.
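A hedged sketch of that producer/consumer structure: the capture/mixing side pushes frames into a queue, and the recording thread wakes up on a short timer, drains the queue, and encodes/writes. VideoFrame and EncodeAndWrite are placeholders for the Capturer output and the encoder/AVI writer, not the actual WebRTC classes:

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

struct VideoFrame { std::vector<uint8_t> data; int64_t timestamp_ms; };

class RecordingThread {
 public:
  void Start() {
    running_ = true;
    worker_ = std::thread([this] { Run(); });
  }
  void Stop() {                       // call before destruction
    running_ = false;
    cv_.notify_all();
    if (worker_.joinable()) worker_.join();
  }
  // Called from the capture/mixing thread.
  void PushFrame(VideoFrame frame) {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push_back(std::move(frame));
    cv_.notify_one();
  }

 private:
  void Run() {
    while (running_) {
      std::unique_lock<std::mutex> lock(mutex_);
      // Wake up roughly every 10 ms (the "timer"), or when a frame arrives.
      cv_.wait_for(lock, std::chrono::milliseconds(10),
                   [this] { return !queue_.empty() || !running_; });
      while (!queue_.empty()) {
        VideoFrame frame = std::move(queue_.front());
        queue_.pop_front();
        lock.unlock();
        EncodeAndWrite(frame);        // placeholder for encode + AVI write
        lock.lock();
      }
    }
  }
  void EncodeAndWrite(const VideoFrame&) { /* encoder + file handle here */ }

  std::deque<VideoFrame> queue_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::thread worker_;
  std::atomic<bool> running_{false};
};
```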
Sound capture and mixing process

The sound capture and mixing process is basically the same as in the audio recording process described above. The only difference is that in audio recording the mixed audio data is written directly to the file, whereas in video recording the mixed audio data is written to an audio buffer queue, from which a separate recording thread writes it to the file together with the video data.

Image encoding and file writing process

The image encoding and file writing flow is driven from AviRecorder::Process.
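The essence of this step is interleaving audio and video chunks into the AVI file in timestamp order, drawn from the queues filled by the other threads. A simplified sketch of that logic (the queue types and Write* helpers are illustrative, not the real AviRecorder code):

```cpp
#include <cstdint>
#include <deque>

struct AudioBlock  { int64_t pts_ms; /* 10 ms of mixed PCM  */ };
struct VideoPacket { int64_t pts_ms; /* one encoded frame   */ };

// Illustrative stand-ins for the AVI writer used by the recorder.
void WriteAudioChunk(const AudioBlock&) {}
void WriteVideoChunk(const VideoPacket&) {}

// Called periodically (AviRecorder::Process-style): drain both queues and
// interleave the chunks into the file in timestamp order.
void ProcessOnce(std::deque<AudioBlock>& audio_q,
                 std::deque<VideoPacket>& video_q) {
  while (!audio_q.empty() || !video_q.empty()) {
    const bool take_audio =
        !audio_q.empty() &&
        (video_q.empty() || audio_q.front().pts_ms <= video_q.front().pts_ms);
    if (take_audio) {
      WriteAudioChunk(audio_q.front());
      audio_q.pop_front();
    } else {
      WriteVideoChunk(video_q.front());
      video_q.pop_front();
    }
  }
}
```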
Problems in WebRTC recording

Recording stutter problem

During recording, audio and video run in two different threads. For video, the received data is decoded, mixed with the local data, re-encoded, and then written to the file, which adds considerable overhead. Most of the time is spent on mixing and encoding, CPU usage is high, and many other video frames are dropped as a result. The recorded video therefore looks very choppy, while the sound is normal.
FrameRate problem

For AVI files, the WebRTC implementation generates the AVI header first and then interleaves the audio and video data into the file frame by frame. The frame rate and other video parameters in the header are therefore written with default values. Because device performance varies a lot, the values written at this point are not the real video parameters, which makes playback of the recorded video stutter.
Audio and video are out of sync

1. The amount of audio and video data is inconsistent. The audio/video capture and file-writing threads are independent. Audio capture and the received audio data are relatively stable, and file writing is also stable, basically one 10 ms frame at a time. Video data, however, is much larger and arrives at an unstable rate, so the time needed to process one frame differs unpredictably between threads and some data is easily lost.

2. Network instability makes the rate of received video unstable.

Solutions in WebRTC

Increase cache

Enlarge the video caching mechanism so that as many of the received video frames as possible are kept.
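One way to realize this is a thread-safe frame cache with a generous upper bound between the receive/decode path and the recording thread, so bursts are buffered rather than dropped. The sketch below is illustrative, not the actual WebRTC change:

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

struct CachedFrame { std::vector<uint8_t> data; int64_t pts_ms; };

// Simple thread-safe cache. "Increase the cache" amounts to raising
// max_frames so bursts of received frames are absorbed instead of lost.
class FrameCache {
 public:
  explicit FrameCache(size_t max_frames) : max_frames_(max_frames) {}

  void Push(CachedFrame frame) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (frames_.size() >= max_frames_) frames_.pop_front();  // drop oldest
    frames_.push_back(std::move(frame));
  }

  bool Pop(CachedFrame* out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (frames_.empty()) return false;
    *out = std::move(frames_.front());
    frames_.pop_front();
    return true;
  }

 private:
  const size_t max_frames_;
  std::deque<CachedFrame> frames_;
  std::mutex mutex_;
};
```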
Remove the encoding step

Because image encoding takes too much time and the customer does not need the local video, the encoding step was removed from the whole process.

Frame rate statistics

Audio and video drift out of sync because the frame rate changes dynamically during a call. Writing a fixed value, as before, cannot match every situation, so audio and video become unsynchronized; the initial statistical value is also inaccurate.

- New scheme: calculate the frame rate from the audio duration and the number of video frames;

- Preferred scheme: record the time points when recording starts and ends, then use the time difference and the number of frames to calculate the frame rate;

Both have some error, but the preferred scheme is closer to the actual frame rate.
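Both schemes boil down to simple arithmetic over values the recorder already has; a small sketch (function names are illustrative) of the two calculations, whose result would then be written into the AVI header at the end of recording:

```cpp
#include <cstdint>
#include <cstdio>

// Scheme 1 (new scheme): derive fps from the recorded audio duration,
// assuming audio is written in steady 10 ms blocks.
double FpsFromAudioDuration(int64_t audio_blocks_10ms, int64_t video_frames) {
  const double duration_s = audio_blocks_10ms * 0.010;
  return duration_s > 0 ? video_frames / duration_s : 0.0;
}

// Scheme 2 (preferred scheme): derive fps from the wall-clock timestamps
// captured when recording starts and stops.
double FpsFromTimestamps(int64_t start_ms, int64_t stop_ms,
                         int64_t video_frames) {
  const double duration_s = (stop_ms - start_ms) / 1000.0;
  return duration_s > 0 ? video_frames / duration_s : 0.0;
}

int main() {
  // 60 s of audio (6000 blocks of 10 ms) and 900 video frames -> 15 fps.
  std::printf("%.2f fps\n", FpsFromAudioDuration(6000, 900));
  // Recording ran from t=1000 ms to t=61000 ms with 900 frames -> 15 fps.
  std::printf("%.2f fps\n", FpsFromTimestamps(1000, 61000, 900));
  return 0;
}
```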

New technology discussion

(1) Frame loss and frame compensation strategy:

For example, the input frame rate is 25 and the output frame rate is 30, or 30 and 25, and so on: the frame rate goes from lower to higher or from higher to lower, while the output frame rate is fixed. This requires dropping frames or compensating (duplicating) frames. With 25 in and 30 out, 5 frames per second must be added; with 30 in and 25 out, 5 frames per second must be dropped. The key question is which frames to drop and which to add. Intuitively, for 30 -> 25 we need to lose 5 frames, so we can drop one frame out of every 6; over 30 frames that is exactly 5 dropped frames.
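The "drop one frame every N" idea generalizes to a small accumulator over the input and output frame rates, which also covers the frame-compensation (duplication) direction. A minimal sketch, not the WebRTC implementation:

```cpp
#include <cstdio>

// Decide, for each input frame, how many times it should appear in the
// output stream: 0 = drop, 1 = keep, 2+ = duplicate (frame compensation).
// A running error accumulator spreads drops/duplicates out evenly.
class FrameRateConverter {
 public:
  FrameRateConverter(int in_fps, int out_fps)
      : in_fps_(in_fps), out_fps_(out_fps) {}

  int OutputCopiesForNextInput() {
    acc_ += out_fps_;
    int copies = acc_ / in_fps_;   // how many output slots this input fills
    acc_ -= copies * in_fps_;
    return copies;
  }

 private:
  int in_fps_;
  int out_fps_;
  int acc_ = 0;
};

int main() {
  FrameRateConverter conv(30, 25);  // 30 fps in -> 25 fps out: drop 5 per 30
  int dropped = 0;
  for (int i = 0; i < 30; ++i) {
    if (conv.OutputCopiesForNextInput() == 0) ++dropped;
  }
  std::printf("dropped %d of 30 frames\n", dropped);  // prints: dropped 5 of 30 frames
  return 0;
}
```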

(2) Customization of recording format:

The current customer requirement is that video is recorded as AVI and audio as WAV. Different format options could be added so that recording is available in any format WebRTC can support. This requires full support in WebRTC, including encoding/decoding for the specific formats and file operations for the specific formats.

(3) Recording SDK package:

Encapsulate the recording function into a simple SDK that can be provided to customers for secondary development.

Anything else is welcome; feel free to add more...


Origin blog.csdn.net/weixin_52622200/article/details/114104772