Interpreting the results of the 2022 MSU World Video Encoder Competition across multiple evaluation dimensions

Top performance, and the best fit for commercial use.

First place in 19 evaluations, with bandwidth savings of up to 63%

Recently, the results of the 2022 MSU World Video Encoder Competition were officially announced. According to the report, the s264 and s265 encoders of Alibaba Media Processing Service (MPS) won first place in 19 evaluations. Compared with the competition's designated benchmark encoder (AWS Elemental MediaConvert), they can save up to 63% of bitrate, dramatically reducing bandwidth and storage costs.

The MSU World Video Encoder Competition is the most authoritative and influential global event in the field of video encoding and compression, and it has been held 17 times so far. In this year's cloud transcoding track alone, 19 encoders from 8 companies in China and abroad took part, including well-known technology companies such as Microsoft, Amazon, and Tencent.

This time, Alibaba Cloud Video Cloud's self-developed s264 encoder took the lead in both the subjective and objective H264 tracks, winning 15 of the 19 sub-tracks. At the same subjective quality it can save at least 16% of bandwidth and storage costs, and it is 13 times ahead of competitors in transcoding efficiency. At the same time, the self-developed s265 encoder delivers 2 to 6 times higher transcoding efficiency and more accurate rate control than competitors at comparable bandwidth and storage costs.

480p Comparison

720p Comparison

1080p Comparison

1080p Subjective Comparison (subjective track)

To comprehensively evaluate the participating encoders, the MSU competition uses several classic objective metrics such as PSNR, SSIM, and VMAF, as well as subjective metrics based on scoring by human viewers:

PSNR evaluates the quality of a distorted video by computing the pixel-by-pixel error between the original video and the distorted video. It is the most traditional and basic criterion in video quality assessment, but because PSNR does not directly consider the visual characteristics of the human eye, its results are not fully consistent with people's subjective impressions;
SSIM estimates the visual quality of distorted images from three aspects: luminance, contrast, and structural information. It compares the structural similarity between the original video and the distorted video and evaluates quality by the damage to perceived structure, which better reflects the subjective characteristics of the human eye;
VMAF is a video quality metric that combines human visual modeling with machine learning. It fuses algorithms covering different evaluation dimensions to obtain a quality score that can accurately reflect subjective perception. Because the human visual system is complex, this metric requires a large amount of valid data that matches the actual viewing environment;
Subjective quality as judged by the human eye is the gold standard of video quality, because people are the ultimate consumers of video. Subjective evaluation takes the observer's perspective and truly reflects people's visual experience and aesthetic judgment, avoiding the problem that no objective quality model can fully simulate the human visual system.
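
As a rough illustration of the pixel-error idea behind PSNR described above, here is a minimal Python sketch. It uses the standard textbook definition (10·log10 of peak value squared over mean squared error) on flat lists of 8-bit sample values. The function name and the sample values are made up for illustration, and this is not the measurement code used by MSU.

```python
import math


def psnr(original, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(distorted) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over corresponding pixels.
    mse = sum((o - d) ** 2 for o, d in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: no distortion at all
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 8-bit luma samples, chosen only to show the calculation.
reference = [52, 55, 61, 66, 70, 61, 64, 73]
encoded = [53, 55, 60, 66, 71, 61, 63, 73]
print(f"PSNR: {psnr(reference, encoded):.2f} dB")
```

A higher value means a smaller pixel-level error, but as noted above it says nothing directly about how the distortion looks to a viewer.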

Looking at the MSU competition, you will find that the organizer selected SSIM as its main objective evaluation metric.

“For objective quality measurements we used YUV-SSIM metric (see Appendix F.1) as a main objective indicator, and other metrics (PSNR, VMAF) as an additional quality metrics. Our team is constantly researching the area of objective video quality metrics to find good solutions for large comparisons.”

In actual encoder development, objective evaluation is often the main method because it reduces the difficulty and cost of testing, but when the final version is released, subjective quality is still the most important basis for evaluation. Practice has shown that evaluating an encoder this way ensures both development efficiency and encoded image quality that matches the subjective characteristics of the human eye.

Cloud transcoding: what makes an encoder best for commercial use?

Whether the indicators are subjective or objective, public attention is limited and often focuses too much on picture quality alone, while ignoring more critical application indicators: transcoding speed and bitrate control.

In this competition, in addition to the image quality of the encoded video, the MSU organizer also evaluated important indicators such as the transcoding speed and bitrate control (bitrate accuracy) of each cloud transcoding vendor, giving a comprehensive view of each vendor's encoder performance and commercial value. Real-world performance and commercial value are the core of an encoder.

First, transcoding speed.

We know that the higher the bitrate, the lower the degree of video compression; conversely, the lower the bitrate, the higher the degree of video compression. With image quality guaranteed, bitrate is the indicator that most directly affects bandwidth and storage costs.

At the same time, the faster the transcoding, the sooner a transcoding task completes. In actual commercial scenarios, the efficiency gain from speed is self-evident. Higher transcoding efficiency also means lower power consumption.

In the figure below, the vertical axis shows the average bitrate of each vendor's output files relative to the benchmark encoder at the same quality; the horizontal axis shows the encoding time required at the same quality, also relative to the benchmark encoder.
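
To make the two relative axes concrete, the following Python sketch computes a candidate encoder's bitrate and encoding time as ratios of the benchmark's values, and converts the relative bitrate into a percentage saving. All names and numbers are hypothetical. The figures are chosen so that the saving works out to the 63% quoted in this article and are not measurements from the competition.

```python
def relative_to_benchmark(value, benchmark_value):
    """Express a measurement as a ratio of the benchmark's value (1.0 = equal)."""
    if benchmark_value <= 0:
        raise ValueError("benchmark value must be positive")
    return value / benchmark_value


def savings_percent(relative_value):
    """Convert a relative value (e.g. 0.37) into a percentage saving (e.g. 63)."""
    return (1.0 - relative_value) * 100.0


# Hypothetical measurements taken at matched quality (illustrative only).
benchmark = {"bitrate_kbps": 4000.0, "encode_seconds": 120.0}
candidate = {"bitrate_kbps": 1480.0, "encode_seconds": 60.0}

rel_bitrate = relative_to_benchmark(candidate["bitrate_kbps"], benchmark["bitrate_kbps"])
rel_time = relative_to_benchmark(candidate["encode_seconds"], benchmark["encode_seconds"])

print(f"relative bitrate: {rel_bitrate:.2f}")
print(f"bitrate saving:   {savings_percent(rel_bitrate):.0f}%")
print(f"relative time:    {rel_time:.2f}")
```

A relative bitrate below 1 means less bandwidth and storage than the benchmark at the same quality, and a relative time below 1 means faster transcoding.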

Take the H264 1080P, YUV (6:1:1) SSIM metric sub-track as an example

As indicated by the horizontal (Faster) and vertical (Better) arrows, the closer an encoder is to the upper left corner of the chart, the lower its bitrate and the faster its transcoding at the same quality. It can be seen that, in addition to the excellent subjective and objective picture quality mentioned above, Ali MPS s264 leads by an even wider margin in encoder performance and commercial value.
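
A note on the "YUV (6:1:1)" label in the sub-track name: it most likely denotes the weights given to the per-plane SSIM scores for Y, U, and V. The exact definition is in Appendix F.1 of the MSU report, which is not reproduced here, so the Python sketch below is only an assumption about how such a weighted score could be combined. The function name and the scores are hypothetical.

```python
def weighted_yuv_score(y_score, u_score, v_score, weights=(6, 1, 1)):
    """Weighted average of per-plane scores; 6:1:1 weights assumed for Y:U:V."""
    wy, wu, wv = weights
    return (wy * y_score + wu * u_score + wv * v_score) / (wy + wu + wv)


# Hypothetical per-plane SSIM values, for illustration only.
combined = weighted_yuv_score(0.96, 0.98, 0.98)
print(f"YUV (6:1:1) SSIM: {combined:.3f}")
```

Under this reading, the luma plane dominates the combined score, which is consistent with the eye being more sensitive to brightness than to color detail.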

Likewise in the HEVC/AV1 track, the vertical comparison in the figure below shows that the Ali MPS s265 encoder can save up to 63% of bitrate compared with the benchmark encoder at the same quality. The horizontal comparison shows that its transcoding efficiency is 2 to 6 times higher than that of competitors at the same quality and comparable bandwidth and storage costs.

Take HEVC 1080P, YUV (6:1:1) SSIM metric sub-track as an example

It is worth mentioning that, as shown in the figure below, the results on the homepage of the MSU official website show that Ali MPS has the fastest transcoding speed and the smallest fluctuation in transcoding time. For transcoding services, this is also a reflection of a cloud vendor's technical strength and comprehensive capabilities.

The horizontal axis represents the transcoding duration, and the length of each bar represents the fluctuation in transcoding duration

Second, rate control.

Bitrate control accuracy is also an important indicator in actual commercial use. Why?

In actual commercial use, customers are very sensitive to picture quality and cost. If rate control is poor, the encoder's actual output bitrate may differ considerably from the target bitrate, which has a large impact on the customer's actual experience.

For example, when the customer's goal is to reduce bandwidth and storage costs, the encoder's actual output bitrate may be much higher than the target bitrate, which increases the customer's bandwidth and storage costs; when the customer's goal is high fidelity, the output bitrate may be much lower than the target bitrate, which seriously damages overall image quality and fails to meet end users' needs. Taken together, highly unstable rate control eventually destroys commercial value.

This shows how fundamental and necessary rate control is.

For bitrate control, the figure below shows the ratio of the actual output bitrate to the preset target bitrate in the HEVC/AV1 track. The closer this ratio is to 1, the more accurate the encoder's bitrate control.

As an example for the following figure, suppose the customer needs to compress a video to 500M. If bitrate control is poor and the actual output bitrate is at least 2 times and at most 7 times the preset target bitrate, then the output video may land anywhere between 1000M and 3500M.
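
The arithmetic behind this example can be sketched in a few lines of Python. The sketch assumes that, for a fixed video duration, output size scales linearly with bitrate, so a bitrate ratio of 2 to 7 maps directly onto the file size. The function names are made up for illustration.

```python
def rate_control_ratio(actual_bitrate, target_bitrate):
    """Ratio of actual to target bitrate; 1.0 means perfectly accurate control."""
    if target_bitrate <= 0:
        raise ValueError("target bitrate must be positive")
    return actual_bitrate / target_bitrate


def output_size_range(target_size, min_ratio, max_ratio):
    """Range of output sizes when the bitrate overshoots by min_ratio..max_ratio.

    Assumes size is proportional to bitrate for a fixed duration.
    """
    return target_size * min_ratio, target_size * max_ratio


# The 500M target from the text, with a 2x to 7x overshoot.
low, high = output_size_range(500, 2, 7)
print(f"expected 500M, got between {low}M and {high}M")

# Deviation from the ideal ratio of 1 for a hypothetical encode.
ratio = rate_control_ratio(actual_bitrate=2100, target_bitrate=2000)
print(f"ratio {ratio:.2f}, deviation {abs(ratio - 1):.0%}")
```

With accurate rate control the ratio stays near 1, and the delivered file stays close to the size the customer planned for.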

Therefore, the core goal of an optimal encoder is not to compete on a single performance metric, but to seek the best multi-dimensional balance among quality, bitrate, efficiency, and cost, and ultimately to bring the most effective application breakthroughs to the industry and to customers. This is the defining quality of a commercial encoder.

The self-evolution of "integration of software and hardware"

Looking beyond the MSU competition to commercial deployment, we can see more technological breakthroughs and application innovations.

Ali MPS is built mainly on the s264 and s265 encoders developed in-house by Video Cloud, covering live streaming, on-demand, and RTC scenarios. From the encoder kernel and pre-processing to rate control, more than 100 algorithms have been developed for different application scenarios.

In addition, Alibaba Cloud Video Cloud and the Pingtouge data center solution team jointly carried out in-depth optimization of the s264 and s265 encoders for Yitian ECS, creating an ARM-friendly video encoder.

For ARM video coding optimization, the video coding data structures were restructured, the parallel framework and fast-algorithm strategies were re-tuned, and cross-layer optimization was carried out across the software, assembly, and hardware levels to drive cost as low as possible.

In the future, building on the computing power of Yitian ECS, Alibaba Cloud Video Cloud will focus on video codecs and video processing to keep tapping the available computing power, and will continue to push performance further through joint optimization under the "integration of software and hardware" approach.

Origin my.oschina.net/u/4713941/blog/8696024