Taobao Technology Internal Lectures | Audio and Video Technology Standards: Industry Panorama and Alibaba Innovation

BBTime (Bridging Business & Technology) is organized by the Taobao Technology PMO & Technology Strategy Development team. It regularly invites well-known industry figures, university scholars, and senior experts from inside and outside the industry, bringing together the people who know business and technology best to share best practices for creating real business value. The aim is to keep linking technological innovation with commercial value and to track the direction of Internet commerce technology.


For this issue of BBTime, [Alibaba Audio and Video Technology Analysis], we are honored to have Ye Yan, a researcher at Alibaba DAMO Academy, present "Audio and Video Technology Standards: Industry Panorama and Alibaba Innovation". The following is a transcript of the talk; we hope you find it helpful.

The talk covers the following three parts:

  • The latest international video standard, VVC, which is major recent news for the video industry.

  • Video trends and the application of VVC under these trends, including the VVC codec developed by DAMO Academy and the Taobao team.

  • Other related standards.

A brief history of international video standards

The picture above shows the two heavyweight international video standards organizations: the International Telecommunication Union (ITU-T) and ISO/IEC MPEG under the International Organization for Standardization. These two bodies began working on video standards about 30 years ago, and the standards are now in their sixth generation. Several of the video standards they formulated jointly have had a far-reaching impact on the international video industry. For example, MPEG-2 helped the video industry complete the important transition from analog TV to digital TV. The contribution of H.264 is self-evident: almost every terminal (TV, mobile phone, computer) and every service (broadcast, satellite, Internet, video conferencing, and so on) supports it. H.265 made an important contribution to the spread of high-definition and ultra-high-definition video and of HDR video. The latest, sixth-generation standard, VVC, serves existing applications by reducing bandwidth costs and improving user experience, and it can also enable emerging video applications under 5G, such as AR/VR, 360-degree panoramic video, and ultra-high-definition 4K and 8K video.

 

Why have we kept going, all the way to a sixth-generation video standard? Every time the standard is updated, the video industry chain has to rebuild an end-to-end ecosystem from the content producer on the service side to the final consumer, and every link in between must be updated. Because updating a video standard takes so much effort, we set a basic requirement for each new standard: double the coding efficiency at the same video quality. In other words, compared with the previous generation, the new standard must save 50% of the bandwidth.
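The link between "double the coding efficiency" and "save 50% of the bandwidth" is simple arithmetic. The sketch below makes the conversion explicit in both directions; the function names are illustrative and do not come from any standards tool.

```python
def bandwidth_saving(efficiency_factor: float) -> float:
    """Fraction of bit rate saved at equal quality.

    efficiency_factor is how many times more efficient the new codec is
    (2.0 means it needs half the bits for the same quality).
    """
    if efficiency_factor <= 0:
        raise ValueError("efficiency_factor must be positive")
    return 1.0 - 1.0 / efficiency_factor


def efficiency_factor(saving: float) -> float:
    """Inverse conversion: a saving of 0.5 corresponds to a factor of 2.0."""
    if not 0.0 <= saving < 1.0:
        raise ValueError("saving must be in [0, 1)")
    return 1.0 / (1.0 - saving)


if __name__ == "__main__":
    # Doubling efficiency is the same statement as halving the bit rate.
    print(f"{bandwidth_saving(2.0):.0%}")  # prints 50%
    # A saving below 50% corresponds to a factor below 2.
    print(round(efficiency_factor(0.34), 2))
```

Note that the relationship is not linear: each additional point of saving near 50% corresponds to a larger jump in the efficiency factor than the same point near 10%.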

 

Let's take a look at some terms that come up frequently around the VVC standard:

  • VVC: Versatile Video Coding. "Versatile" refers to its flexible, multi-purpose nature.

  • VTM: VVC Test Model, the reference software platform.

  • JVET: Joint Video Experts Team, the joint committee of ITU-T and ISO/IEC MPEG.

  • H.266: VVC is a joint standard of the two international standards organizations; H.266 is its standard number in ITU-T.

 

Before the official standardization of VVC began, the international standards organizations and their member companies had spent years on exploratory research and technology accumulation. As the figure above shows, starting in early 2015, JVET spent two and a half years on exploratory research into coding technology, and built and refined the JEM reference software platform. By mid-2017, JEM achieved a bit rate saving of 34% relative to HEVC at the same PSNR, providing strong technical support and proof of performance for formally starting work on the next-generation video standard.

 

In addition, while JVET was accumulating next-generation coding technology during this exploratory phase, it also studied 360-degree panoramic video in depth, driven by emerging applications such as AR/VR. To this end, JVET set up the 360Lib reference software platform, which, combined with JEM, provides a complete workflow and performance analysis capability for panoramic video processing, compression, and quality evaluation. In October 2017, when the technology in JEM and 360Lib was basically mature, the two standards organizations, ITU-T and ISO/IEC MPEG, issued a joint call for proposals covering three main video formats: standard dynamic range (SDR) video (the mainstream video format), high dynamic range (HDR) video, and 360-degree panoramic video. This was also the first call for proposals across the six generations of standards to cover multiple video formats.

 

In April 2018, a total of 32 organizations around the world submitted 23 responses to the call. At the same PSNR, the best response delivered a bit rate saving of more than 40%. With that, VVC standardization officially started. After more than two years of hard work, from April 2018 to July 2020, the first version of VVC was officially finalized.

 

The DAMO Academy video standards team began participating in the formulation of the VVC standard in early 2019. Over a year and a half, it submitted many technical proposals that were adopted into the VVC standard, making important contributions to its formulation.

 

 

The blue part of the figure above shows how performance evolved across the VVC reference software platform from VTM-1.0 to VTM-9.0, that is, the performance gain of VVC relative to HEVC on HD and UHD video. The more than two years of VVC standardization fall into two stages. The first stage focused on adding advanced coding tools to raise the performance gain, so compression performance rose rapidly during the first year. In the second stage, the standards committee JVET paid more attention to the detailed design of the standard: the integration between the various VVC coding tools, and making sure that low-power, efficient software and hardware implementations are feasible. As a result, in the second year the performance gain of VVC gradually levelled off.

 

The figure above also shows how complexity evolved from VTM-1.0 to VTM-9.0. The red line shows encoding time: as the coding gain increases, encoder complexity also rises quickly. The gray line shows decoder complexity, which has stayed below twice that of HEVC, so the complexity of the VVC decoder is quite acceptable. Building a good real-time VVC encoder that gets the highest performance at the lowest complexity involves a great deal of technology and know-how, which is an important reason why the Mobile Taobao (Shoutao) team and DAMO Academy are developing this project jointly.

 

The figure above lists more than 30 VVC coding tools. Within the hybrid video coding framework, new tools have been added to every functional module to improve VVC's compression performance. In addition, because VVC is meant to be flexible and versatile, important content types for specific scenarios were considered during standardization, for example coding tools for screen content and for 360-degree panoramic video.

The figure above shows how much each VVC coding tool contributes to performance gain and to complexity. A tool that falls in the upper right of the graph has good compression performance and low complexity. But there is no free lunch: the actual data show that tools with good coding performance, such as ALF, tend to have relatively high complexity. So when we develop a commercial encoder, choosing sensibly among these coding tools is vital to balancing the encoder's complexity and performance. The figure also shows that, among VVC's many coding tools, 8 deliver a performance gain above 1%, while the gains of the others are relatively small.
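To make the gain-versus-complexity trade-off concrete, here is a minimal sketch of one simple heuristic: rank tools by gain per unit of encoding cost and enable them greedily until a complexity budget is used up. This is only an illustration, not how any actual encoder is configured. The tool names and all numbers are hypothetical placeholders, and real tool gains are not strictly additive.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CodingTool:
    name: str            # hypothetical label, not a real VVC tool name
    gain_percent: float  # bit rate saving attributed to the tool (made up)
    encode_cost: float   # extra encoding time in percent (made up)


def select_tools(tools: Sequence[CodingTool], budget: float) -> List[CodingTool]:
    """Greedy pick by gain per unit cost, staying within the cost budget."""
    if any(t.encode_cost <= 0 for t in tools):
        raise ValueError("encode_cost must be positive for every tool")
    ranked = sorted(tools, key=lambda t: t.gain_percent / t.encode_cost, reverse=True)
    chosen: List[CodingTool] = []
    spent = 0.0
    for tool in ranked:
        # Skip any tool that would push total cost past the budget.
        if spent + tool.encode_cost <= budget:
            chosen.append(tool)
            spent += tool.encode_cost
    return chosen


if __name__ == "__main__":
    candidates = [
        CodingTool("tool_a", gain_percent=4.0, encode_cost=20.0),
        CodingTool("tool_b", gain_percent=1.5, encode_cost=3.0),
        CodingTool("tool_c", gain_percent=0.5, encode_cost=5.0),
        CodingTool("tool_d", gain_percent=2.5, encode_cost=10.0),
    ]
    for tool in select_tools(candidates, budget=15.0):
        print(tool.name)
```

With these placeholder numbers the highest-gain tool is left out because its cost alone exceeds the budget, which is the kind of decision a real-time encoder has to make.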

The figure above shows the performance gain of VVC on mainstream SDR video. For HD and UHD video at the same PSNR, VVC saves 38.9% of the bandwidth compared with HEVC; for image coding, the gain is 26.7%.
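These comparisons are made at equal PSNR, the objective quality metric. As a reminder of what PSNR measures, here is a minimal sketch for 8-bit samples. The function name and the flat-list input are assumptions made for illustration; this is not the measurement tooling used by JVET.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], decoded: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized sample lists.

    PSNR = 10 * log10(max_value^2 / MSE), where MSE is the mean squared error.
    """
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [52, 55, 61, 66, 70, 61, 64, 73]
    reconstructed = [52, 54, 61, 67, 70, 60, 64, 73]
    print(f"{psnr(original, reconstructed):.2f} dB")
```

A bit rate saving "at the same PSNR" means the newer codec reaches the same PSNR value with fewer bits.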

 

The bit rate savings shown in the table above do not reach 50%. So does VVC, as a new-generation standard, meet the design goal of doubling efficiency? The most authoritative basis for judging video quality is subjective quality, so formal subjective quality verification is carried out around the finalization of each generation of standards, and the final bandwidth saving of each generation is measured at the same subjective quality. The following figure shows preliminary data from the VVC subjective quality verification on two UHD 4K videos, obtained with a very strict subjective evaluation methodology. At the same subjective quality, VVC saves more than 50% of the bit rate relative to HEVC.

VVC is versatile and flexible. The picture above shows its objective performance on HDR video and 360-degree panoramic video. At the same objective quality, VVC saves up to 30% on the two mainstream HDR formats (PQ and HLG) and up to 32.5% on 360-degree panoramic video. The gain on 360-degree video comes from two sources: VVC replaces HEVC as a more powerful coding kernel, and more advanced projection formats are used. These figures are bit rate savings under objective metrics only. Subjective evaluation of HDR and 360-degree panoramic video is also under way; the subjective evaluation of 360-degree panoramic video is led mainly by the Alibaba standards team, and a formal report is expected early next year.

 

 

While participating in the formulation of the VVC standard, the DAMO Academy video standards team contributed coding technology in areas including low-latency real-time communication, screen content, lossless compression, high dynamic range compression, inter-frame prediction, and high-level syntax.

At the same time, our team members have served as acting chairs of JVET meetings and breakout sessions, as the lead for panoramic video in the VVC performance verification work, as editors of the test model algorithm description document, as chairs of ad hoc groups (AHGs), and as coordinators of several core experiments. This has established a degree of influence for Alibaba in the international video standards organizations.

 

Let's take a look at the latest video industry trends and the application of VVC in these video trends.

The industry forecast above for the various types of Internet data shows that video will remain a major consumer of bandwidth. Compared with last year's pie chart (left), not only will the overall data volume be five times larger in five years, but the share of video in that volume will also keep growing rapidly.

 

There are four main reasons for the continuous, rapid growth of video data. First, video is used in more and more places: e-commerce (Taobao), social networking, entertainment (Youku), news, and emerging applications such as smart cities all offer more and more forms of video consumption. Second, people are increasingly used to consuming video anytime, anywhere. Third, consumers expect ever higher video quality, moving from high definition to ultra-high definition. Finally, people want more novel video formats, so AR/VR applications based on immersive video will rise quickly.

 

Take Taobao Live as an example, where bandwidth accounts for a large share of cost. Measured by daily active users and average viewing time, it has grown very rapidly in less than a year. The monthly bandwidth cost has increased by orders of magnitude and is a very significant part of the overall business cost. Live streaming content is also very complex these days, with more motion. Viewers expect an ever clearer picture of the streamer, and the requirements for technical indicators such as resolution and frame rate have gone up. At present, Taobao achieves an average bandwidth of 800 Kbps on challenging video content, which is extreme compression from the point of view of an H.265 encoder. To reduce bandwidth costs significantly from here, the only option is to move to a newer video standard.

 

The main goal of the Ali 266 project is to serve Taobao Live. The hope is that Taobao live streams can be encoded in real time during Double 11 next year, with compression performance significantly better than Ali 265.

 

Fraunhofer HHI is a highly respected German research institution that has worked on many generations of video standards and has contributed greatly to the development of the VVC standard. In September of this year it announced its open-source VVC codec. We measured this open-source codec on Taobao live video, and the encoding speed reached only 0.5 frames per second, which is far from our real-time encoding requirement. In addition, for an application like Taobao Live, the decoder must be highly optimized for mobile devices. These reasons made it clearer to us that we need to build our own first-class codec to serve our internal business efficiently. This is very important work and the main goal of the Ali 266 project.
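To put the 0.5 frames per second figure in perspective, the short calculation below estimates how much faster an encoder would need to be for real-time use. The talk does not state the target frame rate, so the 25 fps used here is an assumption, and the function names are illustrative.

```python
def required_speedup(measured_fps: float, target_fps: float) -> float:
    """How many times faster encoding must become to reach the target rate."""
    if measured_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    return target_fps / measured_fps


def frame_budget_ms(fps: float) -> float:
    """Average time available to encode one frame at the given rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    measured = 0.5   # fps reported in the talk for the open-source encoder
    target = 25.0    # assumed live-stream frame rate, not from the talk
    print(required_speedup(measured, target))  # 50.0
    print(frame_budget_ms(target))             # 40.0 ms per frame
    print(frame_budget_ms(measured))           # 2000.0 ms per frame
```

Under this assumption the encoder must run about 50 times faster, and a higher target frame rate widens the gap proportionally.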

 

Finally, let's look at other related video standards in MPEG and the work of other video standards organizations. Earlier we mentioned that novel video formats are one of the main trends to watch; this mainly means immersive video. MPEG has recognized this important trend in the video industry. Besides the new-generation VVC video compression standard, it has also developed a complete series of MPEG Immersive media standards, including point cloud compression standards, six-degrees-of-freedom video and audio compression standards, and several file format standards for immersive media.

 

Besides the international standards organizations, there is another influential video standards body that everyone knows: the Alliance for Open Media (AOM). AOM started from Google's VP8 and VP9 and released its first-generation standard, AV1, in 2018. AOM has also recently begun planning its next-generation video standard, AV2. Among domestic standards organizations, AVS has been through three major generations of standards and is now developing the second phase of AVS3. The first phase of AVS3 followed a timeline very close to that of VVC. The technical requirements were released at the end of 2017. In 2018, technical proposals were solicited and responses collected, and the HPM reference platform was released. After a period of iteration, the first version of AVS3 was finalized at the end of 2019. AVS is now developing the second version of AVS3, aiming to finalize it by the end of next year with performance that exceeds VVC. The DAMO Academy team is also actively participating in this second version and has made important technical contributions to it.

 

Looking ahead, I would like to share what the future of international video standards may look like. When we work on video standards, we look not only at the present, and certainly not only at the past, but also at the future. In terms of technology trends, video coding based on deep learning (DL) is a direction that gives everyone a lot of hope. The past six generations of international video standards are all based on the traditional hybrid coding framework, which has many functional modules, and today this framework has almost reached its performance ceiling. There are two routes for bringing DL into coding: one is to combine it with the traditional framework, adding DL coding tools to each functional module to improve performance; the other is to build an end-to-end DL video coding structure. Both directions are well worth further investigation. For this reason, MPEG set up the DNNVC ad hoc group in April of this year. Its aim is to explore the application of deep learning in video coding, to break through the performance ceiling of the traditional framework and find the future direction of video coding and decoding.

Finally, I would like to introduce the three areas of work that the DAMO Academy video technology team is mainly responsible for. The video standards team focuses on core technologies such as VVC, AVS3, AV2, DL-based coding, VCM, and DCM. On the hardware side, our team has developed a UHD real-time H.265 encoder whose compression performance leads comparable products in the industry; it also provides efficient, fully hardware-based video pre-processing, and it currently serves Youku's live streaming business. On the software side, besides leading the Alibaba 266 project just mentioned, our team works closely with the Taobao team to provide H.264- and H.265-based software and hardware coding optimization for video conferencing, reducing business costs and improving user experience.

Author| Ye Yan

Edit| Orange

Produced| Alibaba's new retail technology

Origin blog.csdn.net/Taobaojishu/article/details/110412606