Implementing custom encoding resolution sending based on WebRTC

Which technology field was the hottest in 2020? Without a doubt: audio and video. The rapid growth of remote work and online education in 2020 was inseparable from audio and video technology; video conferencing, online teaching, and live entertainment are all typical audio and video application scenarios.
Richer usage scenarios require us to consider how to provide more configurable items, such as resolution, frame rate, and bit rate, in order to achieve a better user experience. This article focuses on "resolution".

How to implement custom encoding resolution

Let's first look at the definition of "resolution". Resolution is a parameter that measures the amount of pixel data in an image and a key indicator of the quality of a frame or a video. The higher the resolution, the larger the image (in bytes) and the better the picture quality. For a video stream in YUV I420 format at 1080p, one frame occupies 1920 x 1080 x 1.5 x 8 / 1024 / 1024 ≈ 23.73 Mbit; at a frame rate of 30, one second of video is 30 x 23.73 ≈ 711.9 Mbit. Such a large amount of data would demand a very high bit rate, so in practice video must be compressed and encoded before transmission. Accordingly, the resolution of the raw data produced by the capture device is called the capture resolution, and the resolution of the data actually fed to the encoder is called the encoding resolution.
Whether the picture is clear and its proportions are appropriate directly affects the user experience. The camera offers only a limited set of capture resolutions, and sometimes the resolution we want cannot be captured directly, so the ability to configure an appropriate encoding resolution for each scenario is crucial. How do we convert the captured video into the encoding resolution we want to send? That is the main topic of this article.
WebRTC is Google's open-source, powerful real-time audio and video project, and most real-time communication solutions on the market are built on it. Each module in WebRTC is well abstracted and decoupled, which makes it very friendly to secondary development. When building a real-time audio and video solution, we need to understand WebRTC's design ideas and code modules and be able to extend them. This article is based on WebRTC Release 72 and discusses how to implement a custom encoding resolution.
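
As a quick sanity check, this arithmetic can be reproduced in a few lines of C++ (a minimal sketch; the 1.5 factor is I420's 12 bits per pixel from 4:2:0 chroma subsampling):

#include <cstdio>

int main() {
  const int width = 1920, height = 1080;
  const double bytes_per_pixel = 1.5;  // I420: Y + U/4 + V/4 = 12 bits per pixel
  const double frame_mbits =
      width * height * bytes_per_pixel * 8 / 1024 / 1024;  // ≈ 23.73 Mbit
  const double fps = 30;
  std::printf("one frame: %.2f Mbit, one second: %.1f Mbit\n", frame_mbits,
              frame_mbits * fps);  // ≈ 711.9 Mbit
  return 0;
}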
First, let's consider the following questions:
What is the pipeline of video data from capture to encoding and sending?
How do we select an appropriate capture resolution for a given encoding resolution?
How do we obtain the desired encoding resolution?
This article shares these three points.



Pipeline of video data

First, let's look at the pipeline of video data. Video frames are produced by the VideoCapturer. After capture, they are processed by the VideoAdapter and then distributed through the VideoSource's VideoBroadcaster to the registered VideoSinks, such as the encoder sink (Encoder Sink) and the local preview sink (Preview Sink).
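
In code, this fan-out corresponds to registering sinks on the source. A minimal sketch (EncoderSink is an illustrative name, not a WebRTC class, and header paths vary slightly across WebRTC versions):

#include "api/video/video_frame.h"
#include "api/video/video_sink_interface.h"
#include "api/video/video_source_interface.h"

// Illustrative sink; in a real pipeline this would feed the encoder.
class EncoderSink : public rtc::VideoSinkInterface<webrtc::VideoFrame> {
 public:
  void OnFrame(const webrtc::VideoFrame& frame) override {
    // Frames arriving here have already passed through the VideoAdapter.
  }
};

void AttachSink(rtc::VideoSourceInterface<webrtc::VideoFrame>* source,
                EncoderSink* sink) {
  rtc::VideoSinkWants wants;  // a sink can also express resolution/fps wants
  source->AddOrUpdateSink(sink, wants);
}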

For resolution, the process is: the desired resolution is set on the VideoCapturer, which selects the most suitable format to capture with; the VideoAdapter then computes scale and crop parameters for the raw frames and, when they do not match the expected encoding resolution, scales and crops them to the encoding resolution before the frames are sent to the encoder.


There are two key questions here:
How does VideoCapturer choose the appropriate capture resolution?
How does the VideoAdapter convert the capture resolution to the encoding resolution?

How to choose the right capture resolution

Selection of capture resolution
WebRTC abstracts a base class for video capture in videocapturer.cc; we will call this abstraction VideoCapturer. Parameters such as resolution, frame rate, and supported pixel formats are set on the VideoCapturer, which computes the best capture format from them and then uses that format to drive each platform's VDM (Video Device Module). The relevant declarations are as follows. The code is taken from src/media/base/videocapturer.h in WebRTC:

VideoCapturer.h
bool GetBestCaptureFormat(const VideoFormat& desired, VideoFormat* best_format);  // Iterates over all capture formats the device supports, calls GetFormatDistance() on each, and picks the format with the smallest distance.
int64_t GetFormatDistance(const VideoFormat& desired, const VideoFormat& supported);  // Computes, by a given algorithm, the gap between a device-supported format and the desired capture format; a distance of 0 means an exact match.
void SetSupportedFormats(const std::vector<VideoFormat>& formats);  // Sets the formats the capture device supports: fps, resolution, and pixel formats such as NV12, I420, MJPEG.
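
A sketch of how these fit together for our running example, asking the capturer for 800x800 at 10 fps in NV12 (StartWithBestFormat is our own illustrative helper):

void StartWithBestFormat(cricket::VideoCapturer* capturer) {
  cricket::VideoFormat desired(800, 800,
                               cricket::VideoFormat::FpsToInterval(10),
                               cricket::FOURCC_NV12);
  cricket::VideoFormat best;
  if (capturer->GetBestCaptureFormat(desired, &best)) {
    capturer->Start(best);  // capture with the lowest-distance supported format
  }
}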

Depending on the parameters, GetBestCaptureFormat() sometimes cannot find a capture format close to what we set, because capture capabilities vary across devices: native camera capture on iOS, Android, PC, and Mac and external USB camera capture support different resolutions, and USB cameras in particular are very uneven. We therefore need to adjust GetFormatDistance() slightly to meet our needs. Let's look at how to do that.

Selection strategy source code analysis

Let's first analyze the source of GetFormatDistance(); part of the code is extracted below. The code is taken from src/media/base/videocapturer.cc in WebRTC:

// Get the distance between the supported and desired formats.
int64_t VideoCapturer::GetFormatDistance(const VideoFormat& desired,
                                         const VideoFormat& supported) {
  //.... some code omitted
  // Check resolution and fps.
  int desired_width = desired.width;    // encoding width
  int desired_height = desired.height;  // encoding height
  int64_t delta_w = supported.width - desired_width;  // width difference

  float supported_fps = VideoFormat::IntervalToFpsFloat(supported.interval);  // frame rate supported by the capture device
  float delta_fps = supported_fps - VideoFormat::IntervalToFpsFloat(desired.interval);  // frame rate difference
  int64_t aspect_h = desired_width
                         ? supported.width * desired_height / desired_width
                         : desired_height;  // height under the desired aspect ratio; capture devices generally support width > height
  int64_t delta_h = supported.height - aspect_h;  // height difference
  int64_t delta_fourcc;  // priority order of the configured pixel formats; e.g. if NV12 is set first, NV12 capture is preferred at the same resolution and frame rate

  //.... downgrade-strategy code omitted; it handles the case where the device's supported resolution and frame rate fall short of the settings

  int64_t distance = 0;
  distance |=
      (delta_w << 28) | (delta_h << 16) | (delta_fps << 8) | delta_fourcc;

  return distance;
}

We mainly care about the distance value. Distance is a WebRTC concept: the gap, computed by a particular strategy, between the requested capture format and a format the device supports. The smaller the distance, the closer the supported format is to the desired one; a distance of 0 means an exact match.

Distance is composed of four parts: delta_w, delta_h, delta_fps, and delta_fourcc, with delta_w (width) weighted most heavily, delta_h (height) next, delta_fps (frame rate) after that, and delta_fourcc (pixel format) last. The problem this causes is that width dominates the distance while height contributes too little, so the algorithm fails to match the supported resolution that is actually closest.

Example:
Take the iPhone XS Max with a desired format of 800x800 at 10 fps. We print the distance for each supported capture format. The native GetFormatDistance() algorithm does not meet the need: we want 800x800, but as the log below shows, the selected Best is 960x540, not what we expected:

Supported NV12 192x144x10 distance 489635708928
Supported NV12 352x288x10 distance 360789835776
Supported NV12 480x360x10 distance 257721630720
Supported NV12 640x480x10 distance 128880476160
Supported NV12 960x540x10 distance 43032248320
Supported NV12 1024x768x10 distance 60179873792
Supported NV12 1280x720x10 distance 128959119360
Supported NV12 1440x1080x10 distance 171869470720
Supported NV12 1920x1080x10 distance 300812861440
Supported NV12 1920x1440x10 distance 300742082560
Supported NV12 3088x2316x10 distance 614332104704
Best NV12 960x540x10 distance 43032248320

Selection strategy adjustment

To obtain the resolution we want, we need to adjust the GetFormatDistance() algorithm based on the analysis above: give resolution the highest weight and frame rate the next; if no pixel format is specified, pixel format comes last. The modified code is as follows:

int64_t VideoCapturer::GetFormatDistance(const VideoFormat& desired,
                                         const VideoFormat& supported) {
  //.... some code omitted
  // Check resolution and fps.
  int desired_width = desired.width;    // encoding width
  int desired_height = desired.height;  // encoding height
  int64_t delta_w = supported.width - desired_width;
  int64_t delta_h = supported.height - desired_height;
  int64_t delta_fps = supported.framerate() - desired.framerate();
  int64_t distance = std::abs(delta_w) + std::abs(delta_h);
  //.... downgrade strategy omitted; e.g. if 1080p is requested but the camera supports at most 720p, a downgrade is needed
  distance = (distance << 16) | (std::abs(delta_fps) << 8) | delta_fourcc;
  return distance;
}

After the modification, distance consists of three parts: resolution (|delta_w| + |delta_h|), frame rate (delta_fps), and pixel format (delta_fourcc); resolution has the highest weight, frame rate the second, and pixel format the last.

Example:
Take the iPhone XS Max with a desired format of 800x800 at 10 fps again. We print the distance for each supported capture format. With the modified GetFormatDistance(), we want 800x800 and the selected Best is 1440x1080, from which 800x800 can be obtained by scaling and cropping. This meets expectations (if the resolution requirement is not exact, the downgrade strategy can be tuned to select 1024x768 instead):

Supported NV12 192x144x10 distance 828375040
Supported NV12 352x288x10 distance 629145600
Supported NV12 480x360x10 distance 498073600
Supported NV12 640x480x10 distance 314572800
Supported NV12 960x540x10 distance 275251200
Supported NV12 1024x768x10 distance 167772160
Supported NV12 1280x720x10 distance 367001600
Supported NV12 1440x1080x10 distance 60293120
Supported NV12 1920x1080x10 distance 91750400
Supported NV12 1920x1440x10 distance 115343360
Supported NV12 3088x2316x10 distance 249298944
Best NV12 1440x1080x10 distance 60293120
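
A quick check against the modified formula: for 1440x1080, (|1440 - 800| + |1080 - 800|) << 16 = (640 + 280) << 16 = 920 << 16 = 60,293,120, exactly the distance in the log. Formats smaller than the target (960x540, 1024x768, 1280x720, etc.) carry an extra factor of 10 in this log, which appears to come from the elided downgrade strategy penalizing formats that cannot fully cover the desired resolution.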

How to convert the capture resolution to the encoding resolution

After video data is captured, it is processed by the VideoAdapter (a WebRTC abstraction) and then distributed to the corresponding sinks (also a WebRTC abstraction). We make a slight adjustment in the VideoAdapter to compute the parameters needed for scaling and cropping, then use libyuv to scale and crop the video data to the encoding resolution (cropping away excess pixels when the aspect ratios differ, so as to preserve as much of the image as possible). A sketch of this libyuv step follows.
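
One way to wire the adapter's outputs to libyuv (CropAndScaleI420 is our own illustrative helper; libyuv::I420Scale is the real scaling API): AdaptFrameResolution() returns a crop size in input coordinates plus an output size, so the crop can be applied for free by offsetting the plane pointers before scaling:

#include <cstdint>
#include "libyuv/scale.h"

// Sketch: apply the crop/scale computed by AdaptFrameResolution() to an
// I420 frame. Chroma planes are subsampled 2x2, so crop offsets must be even.
void CropAndScaleI420(const uint8_t* src_y, int src_stride_y,
                      const uint8_t* src_u, int src_stride_u,
                      const uint8_t* src_v, int src_stride_v,
                      int src_width, int src_height,
                      int cropped_width, int cropped_height,
                      uint8_t* dst_y, int dst_stride_y,
                      uint8_t* dst_u, int dst_stride_u,
                      uint8_t* dst_v, int dst_stride_v,
                      int out_width, int out_height) {
  // Center the crop, keeping offsets even.
  const int crop_x = ((src_width - cropped_width) / 2) & ~1;
  const int crop_y = ((src_height - cropped_height) / 2) & ~1;
  const uint8_t* crop_y_plane = src_y + crop_y * src_stride_y + crop_x;
  const uint8_t* crop_u_plane = src_u + (crop_y / 2) * src_stride_u + crop_x / 2;
  const uint8_t* crop_v_plane = src_v + (crop_y / 2) * src_stride_v + crop_x / 2;
  libyuv::I420Scale(crop_y_plane, src_stride_y, crop_u_plane, src_stride_u,
                    crop_v_plane, src_stride_v, cropped_width, cropped_height,
                    dst_y, dst_stride_y, dst_u, dst_stride_u, dst_v,
                    dst_stride_v, out_width, out_height, libyuv::kFilterBox);
}

For the running example, this is called with a 1440x1080 source, a 1080x1080 crop, and an 800x800 output.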

Here we focus on two questions:
Continuing the example above: we want 800x800, but the best capture resolution obtained is 1440x1080. How do we get the configured 800x800 encoding resolution from a 1440x1080 capture?
While flowing from the VideoCapturer to a VideoSink, video data is processed by the VideoAdapter. What exactly does the VideoAdapter do?

Let's analyze these two questions one by one, starting with what the VideoAdapter is.

VideoAdapter introduction

This is how the VideoAdapter is described in WebRTC:

VideoAdapter adapts an input video frame to an output frame based on the specified input and output formats. The adaptation includes dropping frames to reduce frame rate and scaling frames. VideoAdapter is thread safe.

We can understand it this way: the VideoAdapter is a module that controls data input and output; it can control frame rate and resolution and perform the corresponding degradation. Within the VQC (Video Quality Control) module, configuring the VideoAdapter makes it possible to dynamically reduce the frame rate and scale the resolution under low bandwidth and high CPU load, keeping the video smooth and improving the user experience.
Taken from src/media/base/videoadapter.h:

VideoAdapter.h
bool AdaptFrameResolution(int in_width,
                          int in_height,
                          int64_t in_timestamp_ns,
                          int* cropped_width,
                          int* cropped_height,
                          int* out_width,
                          int* out_height);
void OnOutputFormatRequest(
    const absl::optional<std::pair<int, int>>& target_aspect_ratio,
    const absl::optional<int>& max_pixel_count,
    const absl::optional<int>& max_fps);
void OnOutputFormatRequest(const absl::optional<VideoFormat>& format);
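
A minimal usage sketch driving the adapter with the running example's 800x800 target (values follow this article's example; the optional parameters convert implicitly):

cricket::VideoAdapter adapter;
adapter.OnOutputFormatRequest(std::make_pair(800, 800),  // target aspect ratio
                              800 * 800,                 // max pixel count
                              10);                       // max fps

int cropped_width, cropped_height, out_width, out_height;
if (adapter.AdaptFrameResolution(1440, 1080, /*in_timestamp_ns=*/0,
                                 &cropped_width, &cropped_height,
                                 &out_width, &out_height)) {
  // Native algorithm: 720x720. With the adjustment below: 800x800.
}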

VideoAdapter source code analysis

Based on the configured desired_format, the VideoAdapter calls AdaptFrameResolution() to compute the cropped_width, cropped_height, out_width, and out_height parameters for scaling and cropping from the capture resolution to the encoding resolution. WebRTC's native AdaptFrameResolution() derives the scale parameters from pixel area, so it cannot guarantee an exact width and height. The code is taken from src/media/base/videoadapter.cc:

bool VideoAdapter::AdaptFrameResolution(int in_width,
                                        int in_height,
                                        int64_t in_timestamp_ns,
                                        int* cropped_width,
                                        int* cropped_height,
                                        int* out_width,
                                        int* out_height) {
  //.... some code omitted
  // Calculate how the input should be cropped.
  if (!target_aspect_ratio || target_aspect_ratio->first <= 0 ||
      target_aspect_ratio->second <= 0) {
    *cropped_width = in_width;
    *cropped_height = in_height;
  } else {
    const float requested_aspect =
        target_aspect_ratio->first /
        static_cast<float>(target_aspect_ratio->second);
    *cropped_width =
        std::min(in_width, static_cast<int>(in_height * requested_aspect));
    *cropped_height =
        std::min(in_height, static_cast<int>(in_width / requested_aspect));
  }
  const Fraction scale;  // VQC scale factor; .... code omitted
  // Calculate final output size.
  *out_width = *cropped_width / scale.denominator * scale.numerator;
  *out_height = *cropped_height / scale.denominator * scale.numerator;
}

Example:
Take the iPhone XS Max with 800x800 at 10 fps again: the encoding resolution is set to 800x800 and the capture resolution is 1440x1080. The native algorithm computes a new resolution of 720x720, which does not meet expectations.
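Tracing the native code explains the 720x720: with a target aspect ratio of 800:800, requested_aspect = 1.0, so cropped_width = min(1440, 1080 x 1.0) = 1080 and cropped_height = min(1080, 1440 / 1.0) = 1080; the VQC scale factor then shrinks 1080x1080 to fit the 800 x 800 = 640,000-pixel budget, and the nearest supported step (2/3 here) yields 720x720 (518,400 pixels), under budget but not the requested size.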
VideoAdapter adjustment
The VideoAdapter is an important part of video quality adjustment in the VQC (Video Quality Control) module. Frame-rate control, resolution scaling, and other VQC operations rely mainly on the VideoAdapter, so modifications must consider their impact on the VQC.
To obtain exactly the desired resolution without affecting the VQC's resolution control, we adjust AdaptFrameResolution() as follows:

bool VideoAdapter::AdaptFrameResolution(int in_width,
                                        int in_height,
                                        int64_t in_timestamp_ns,
                                        int* cropped_width,
                                        int* cropped_height,
                                        int* out_width,
                                        int* out_height) {
  //.... some code omitted
  bool in_more =
      (static_cast<float>(in_width) / static_cast<float>(in_height)) >=
      (static_cast<float>(desired_width_) /
       static_cast<float>(desired_height_));
  if (in_more) {
    *cropped_height = in_height;
    *cropped_width = *cropped_height * desired_width_ / desired_height_;
  } else {
    *cropped_width = in_width;
    *cropped_height = *cropped_width * desired_height_ / desired_width_;
  }
  *out_width = desired_width_;
  *out_height = desired_height_;
  //.... some code omitted
  return true;
}

Example:
Take the iPhone XS Max with 800x800 at 10 fps again: the encoding resolution is set to 800x800 and the capture resolution is 1440x1080. The adjusted algorithm computes an encoding resolution of 800x800, as expected.
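Tracing the adjusted code: 1440/1080 ≈ 1.33 ≥ 800/800 = 1, so in_more is true, giving cropped_height = 1080 and cropped_width = 1080 x 800 / 800 = 1080; the frame is center-cropped to 1080x1080 and then scaled directly to the requested 800x800.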

Summary

This article introduced how to implement a configurable encoding resolution based on WebRTC. To modify the video encoding resolution, we need to understand the whole pipeline of video data capture, transmission, processing, and encoding. The key steps for sending a custom encoding resolution are:
First, set the desired encoding resolution;
modify VideoCapturer.cc to select an appropriate capture resolution based on the encoding resolution;
modify VideoAdapter.cc to compute the parameters for scaling and cropping from the capture resolution to the encoding resolution;
use libyuv to scale and crop the raw data to the encoding resolution according to those parameters;
send the new data to the encoder to be encoded and sent;
done.
Following the same idea, we can make other adjustments as well. That is all for this article; we will continue to share more audio and video technical implementations, and you are welcome to leave a comment to exchange related technologies with us.
The 5G era has arrived; the application areas of audio and video will keep widening, and everything is promising.
