Zoe Liu: Traditional methods and deep learning each have their strengths

Visionular took first place in the "subjective ratings" of its first MSU video codec comparison. What did the Visionular team do to achieve this? Is deep learning really the future of codecs? LiveVideoStack interviewed Visionular co-founder Zoe Liu by email to uncover the story behind the team's participation in the MSU video codec comparison.

Interviewee / Zoe Liu

Planning / LiveVideoStack

LiveVideoStack: Hello Zoe, this is our second email interview with you, and unlike last time, your role has changed. Could you introduce yourself to LiveVideoStack's readers?

Zoe: Sure. Last time I was a software engineer at Google; now I am a co-founder of the Visionular team. I left Google in July 2018 and founded Visionular together with my partner Zhu Zheng. Time has passed quickly: when I left Google, AV1, the new-generation open-source video coding standard from the Alliance for Open Media (AOM), had just been finalized and released. Visionular now has R&D teams in both Hangzhou and Silicon Valley, and product, operations, and marketing teams in Beijing. We focus on building AI-powered video coding and processing technology, delivering enterprise-oriented products and services that pursue the ultimate goal of lower video bandwidth and clearer pictures. Our core is an optimized AV1 encoder, but we also bring smart encoding and processing engines to mainstream standards such as H.264, combine AI with video processing and coding in many areas, and offer products in different forms, including private-cloud deployments and public-cloud SaaS solutions.

Before Visionular, I worked at Google for five years and was one of the main contributors to AOM/AV1. From my school days until now, I have accumulated a relatively long R&D career in image/video coding standards and the corresponding codec optimization. I was involved in the development and final delivery of Apple's FaceTime, TangoMe's cross-platform mobile video call app, and Google Glass video calling. Earlier, I also worked at a number of labs, including Bell Labs, Nokia Research Center, Sun Labs, and HP Labs.

My connection with the LiveVideoStack audio/video community feels like fate. At the first LiveVideoStackCon in October 2017, I gave an AV1-related keynote on behalf of Google. By chance I met my partner there, and I also saw how rapidly the domestic audio/video field was developing; step by step that changed my career path, and I joined the ranks of entrepreneurs. Since then I have not missed a single LiveVideoStack conference. LiveVideoStack has grown into the most influential technical community in the audio/video field in China. We look forward to growing together with LiveVideoStack, and to exchanging ideas with more industry veterans and friends to jointly push the evolution and development of the technology.

LiveVideoStack: Visionular's AV1 codec ranked first in the "subjective ratings" of the 2019 MSU video codec comparison, and it was also the only AV1 entry in the competition. Can you tell us the story behind it?

Zoe: As mentioned earlier, the core of our team is video coding and the application of AI to video coding algorithms and products. I came out of Google's AV1 team and went through the whole AV1 cycle from scratch, and Visionular, together with iQIYI, became one of the first two AOM members from the Chinese market.

Since our team was founded at the beginning of last year, we have cooperated with Google, contributing substantially to the optimization of libaom, currently the most representative open-source AV1 codebase. We have made more than 200 code contributions to libaom, from me and from my team members, covering both the AV1 standard itself and early libaom encoder acceleration. We have also taken part in building the broader AOM ecosystem: Matt Frost, former chair of AOM, visited Hangzhou twice last year and met with our team face to face.

AV1 is not only an open-source, royalty-free coding standard; it is also built on a number of advanced coding tools. MSU's early evaluation data already showed the advantages of the AV1 standard relative to H.265, VP9, and other mainstream coding standards. In addition, AOM's members include the major overseas video content platforms, such as the UGC platforms YouTube and Facebook and the PGC platforms Netflix and Amazon Prime Video; the domestic Internet giants Alibaba and Tencent have also become important AOM members.

Although the AV1 ecosystem will need some time to mature, the standard having been finalized only recently, Chrome and other major browsers already support AV1 decoding and playback, Android Q will fully support AV1, decoder chips for mobile devices, especially Android devices, are being actively developed, and Apple is an AOM member. Our AV1 encoder R&D started relatively early, and with the optimization experience our team accumulated on H.265 encoders, we have a relative head start on AV1. We are now targeting not only VOD but also live streaming and RTC scenarios, working to bring AV1 into production. While polishing the technology, we pay close attention to customers' real needs.

At IBC in Amsterdam this year, the world's largest media technology show, we were invited by Google Cloud to give a dedicated presentation of our AV1 technology and the corresponding products. In September, at Alibaba's Yunqi conference in Hangzhou, we took part in a roundtable on 5G + video, exploring the development prospects of various video coding standards with scholars and industry colleagues. At the first worldwide AOM Summit, held in San Francisco in October, we were invited by AOM to present an AV1 technical overview and performance update. All of the summit slides are shared on the AOM website (https://aomedia.org/aomedia-research-symposium-2019/), with discussions of the state of AV1 codec optimization, planning for the next-generation standard AV2, and AI + coding technology.

Our most important motivation for entering the MSU evaluation was to test our own encoder's performance under MSU's strict, objective assessment process. MSU sets fairly stringent encoding-speed requirements: even the slowest tiers, including the subjective evaluation tier, must encode the given 1080p videos at a specified frames-per-second rate on a specified machine. The MSU registration deadline was the end of this past March, when we were still at a relatively early stage of AV1 optimization. AV1's coding tools are complex, and maintaining coding performance at the required speed was genuinely challenging. That we could represent the AV1 standard in the competition owes to our own efforts, but we are also very grateful to the AV1 open-source community: the open-source AV1 codebases, including libaom and SVT-AV1, taught us many lessons. We stood on the shoulders of giants.

LiveVideoStack: I noticed that Visionular only appears in the "subjective ratings" report, not in the "objective ratings" report. Why is that?

Zoe: Our encoder optimization, centered on AV1 R&D with step-by-step polishing of our H.264 coding technology and products as well, is mainly driven by customer needs. We are a 2B company, and our R&D and our customers' needs propel each other. What our customers care about most is the subjective quality of the video, so our products, and the algorithms developed behind them, are largely optimized and deployed for subjective quality. That should be one reason for our strong showing in MSU's subjective assessment category.

This year's MSU "HEVC/AV1 Video Codecs Comparison 2019" report is divided into a free edition and an enterprise edition. The detailed full enterprise report can be purchased at the link below.

http://www.compression.ru/video/codec_comparison/hevc_2019/

For the past two years, MSU's free public edition has given objective quality results based only on the SSIM metric. If you can access the full paid report, you will see MSU's assessments under three objective quality metrics, SSIM/PSNR/VMAF, including, for the objective quality category, detailed encoding-performance data on 100 different 1080p videos and the corresponding per-metric rankings.

Our objective-quality results, while not as prominent as those in the subjective category, were also quite competitive, especially on the Y-component PSNR data, where they were remarkable. The PSNR evaluation data can only be seen in the MSU enterprise edition.
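For readers unfamiliar with the Y-component PSNR figure mentioned here, it is straightforward to compute from the luma planes of a reference and a reconstructed frame. A minimal sketch (the frame data below is synthetic, not MSU's test set):

```python
import numpy as np

def psnr_y(ref_y: np.ndarray, rec_y: np.ndarray, peak: float = 255.0) -> float:
    """PSNR over the Y (luma) plane of a reference and a reconstructed frame."""
    mse = np.mean((ref_y.astype(np.float64) - rec_y.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical planes
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy 1080p-sized luma planes: a uniform error of 10 gives MSE = 100.
ref = np.zeros((1080, 1920), dtype=np.uint8)
rec = np.full((1080, 1920), 10, dtype=np.uint8)
print(round(psnr_y(ref, rec), 2))  # ≈ 28.13 dB
```

MSU's enterprise report aggregates this per-frame, per-video figure across the whole test set; the sketch above shows only the core formula.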

As mentioned above, we entered mainly to test our own encoder, not for the sake of the competition itself. The MSU evaluation is divided into categories; this year we took part in two: the subjective quality category, and the "Ripping use case" category, which covers the slowest encoding speeds. Both require the encoder to run at no less than 1 fps, i.e., to encode each frame of a specified 1080p video within one second. For AV1 this is still a considerable challenge: compared with VP9, AV1 adds 70+ new coding tools, which greatly increases the complexity of encoding decisions, so reaching the required speed while preserving enough of the standard's advantages to let AV1's strengths show is not easy.

In the objective metrics, some video sequences did exhibit bad cases for us, with some abnormal RD curves; in MSU's subjective tests we were relatively lucky, and these bad cases occurred less frequently.
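RD-curve comparisons like these are commonly summarized with the Bjøntegaard delta rate (BD-rate), the average bitrate difference between two encoders at equal quality. A minimal sketch of the classic cubic-fit method, with made-up sample points (this is an illustration, not MSU's evaluation pipeline):

```python
import numpy as np

def bd_rate(rate_a, psnr_a, rate_b, psnr_b):
    """Average % bitrate change of encoder B vs. anchor A at equal PSNR
    (negative means B needs less bitrate for the same quality)."""
    la, lb = np.log(rate_a), np.log(rate_b)
    # Cubic fit of log-rate as a function of quality, per the classic method.
    fit_a = np.polyfit(psnr_a, la, 3)
    fit_b = np.polyfit(psnr_b, lb, 3)
    lo = max(min(psnr_a), min(psnr_b))   # overlapping quality interval
    hi = min(max(psnr_a), max(psnr_b))
    int_a, int_b = np.polyint(fit_a), np.polyint(fit_b)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_b = (np.polyval(int_b, hi) - np.polyval(int_b, lo)) / (hi - lo)
    return (np.exp(avg_b - avg_a) - 1.0) * 100.0

# Hypothetical RD points: encoder B reaches the same PSNR at half the bitrate.
psnr = [30.0, 34.0, 38.0, 42.0]
anchor = [1000.0, 2000.0, 4000.0, 8000.0]  # kbps
test = [500.0, 1000.0, 2000.0, 4000.0]
print(round(bd_rate(anchor, psnr, test, psnr), 1))  # ≈ -50.0 (%)
```

An "abnormal RD curve" of the kind mentioned above, e.g. a non-monotonic rate/quality relationship on a particular sequence, distorts exactly this kind of fit, which is why per-sequence bad cases stand out in objective rankings.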

MSU's annual testing and evaluation is very comprehensive and detailed, and the evaluation period is relatively long. The results now published reflect the WZAurora AV1 encoder we submitted at the end of March. We have kept optimizing since then, improving encoding speed, coding performance, and a number of other encoder metrics such as multithreading and memory footprint across the board. We have also added different speed presets to WZAurora, carefully tailored to different application scenarios such as VOD, live streaming, and real-time RTC. We look forward to even better results in the future, and above all to offering more high-quality products to our enterprise customers.

LiveVideoStack: Do you think AI compression technology has the potential to catch up with or even surpass traditional coding technology?

Zoe: In multimedia compression, AI was tried for images earlier than for video. In particular, various deep-neural-network image compression models have attempted to overturn traditional image coding, replacing the wavelet and DCT transforms, and have achieved intra-frame coding performance similar or equivalent to traditional image codecs. Fully machine-learning-based image coding has not yet entered any coding standard, nor yielded very mature products, mainly limited by codec complexity, but it does show real potential.

There have been many industry attempts to apply machine learning to video coding, including many concrete cases in the open-source AV1 codebase. If you check each contributed CL (changelist) in the open libaom codebase and search the commit messages for keywords such as "neural networks", you should find plenty of practical machine-learning applications: optimized rate control, fast RD cost estimation, NN-based fast encoding algorithms, and so on. But compared with images, video adds a time dimension, and the corresponding encoding algorithms are overall an order of magnitude more complex. In the newest video coding standards, VVC (aka H.266), AVS3, and AV1, judging from the coding tools, the proposal phase, and the open-source reference implementations, AI tools based on machine learning mainly work within the traditional hybrid coding framework (motion estimation + 2D transform), further optimizing individual encoder modules rather than overturning the basic structure of video coding.
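To make the "NN-based fast encoding algorithm" idea concrete, here is a deliberately toy sketch: a one-layer logistic model that looks at cheap block statistics and decides whether a partition split is worth evaluating at all, so the encoder can skip expensive RD searches. The features and weights are invented for illustration; this is not libaom's actual model:

```python
import numpy as np

def block_features(block: np.ndarray) -> np.ndarray:
    """Cheap statistics a fast-split predictor might use."""
    v = block.var()                                 # texture energy
    gx = np.abs(np.diff(block, axis=1)).mean()      # horizontal gradient
    gy = np.abs(np.diff(block, axis=0)).mean()      # vertical gradient
    return np.array([v, gx, gy, 1.0])               # last entry is a bias term

def predict_split(block: np.ndarray, weights: np.ndarray) -> bool:
    """One-layer logistic 'network': True = evaluate the split, False = skip it."""
    score = 1.0 / (1.0 + np.exp(-(block_features(block) @ weights)))
    return bool(score > 0.5)

# Hand-picked weights: only textured blocks cross the decision threshold.
w = np.array([0.1, 0.1, 0.1, -2.0])
flat = np.zeros((16, 16))                                  # smooth block
checker = np.indices((16, 16)).sum(axis=0) % 2 * 255.0     # busy block
print(predict_split(flat, w), predict_split(checker, w))   # False True
```

In a real encoder the model would be trained offline on encoder decisions and evaluated against the RD loss of wrong skips; the point of the sketch is only that such a predictor sits inside the hybrid framework, pruning the search rather than replacing the transform and prediction stages.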

The rise of AI and the large-scale rollout of 5G should bring many new forms of video, from the launch of cloud gaming platforms such as Stadia to the deeper adoption of VR panoramic video. Video coding and AI technology should become ever more tightly coupled, especially in the integrated, adaptive use of video coding and video analysis. For different subdivisions of video content, more specialized coding tools may appear, such as tools for screen content, game content, and animation, along with ROI (region of interest) based coding algorithms, which lead quite naturally to video content classification and the detection of specific objects; this gives the combination of AI and coding fairly wide space.

We build our company on core technology, and we often weigh the value of combining traditional video coding and processing algorithms with machine learning. Let me give an example: a Tsinghua classmate of mine runs a business that has spent the past few years polishing integrated hardware/software wearables for eye tracking and FOV detection. Their products can now recognize the scanning pattern of a person's eyes while reading text (movement speed, smoothness of movement, changes of direction, and so on), and can detect which objects the eyes rest on in a large space such as a museum. Their current core algorithms do not use today's hot deep-learning techniques but are based on traditional computer vision and pattern recognition. I use this example to highlight the value and potential of traditional algorithms. Even among the scenarios where machine learning is acknowledged to be comparatively successful, computer vision, natural language processing (NLP), and big-data analysis, and even in computer vision, where deep learning is so sought after, traditional algorithms still have many outstanding advantages. In video coding, there is in fact plenty of room for deep learning and traditional methods to be integrated, and it is well worth exploring.

LiveVideoStack: I recently tried TutorABC's AV1 client (in the browser). The latency in a live lesson was perfectly acceptable, but it can only be deployed on the PC side, not on mobile devices. Is there any latest news to share on mobile AV1 hardware codec support?

Zoe: Very happy to hear that you have hands-on experience with AV1 on the TutorABC platform. We worked with the TutorABC team to push an AV1 RTC solution onto their online platform; it should be the first production deployment of AV1 in an online education scenario. Large online classes are generally centered on the teacher's video, and teachers mostly use PC equipment, which can fully support AV1 encoding; on the decoding side we use the open-source AV1 decoder dav1d, which runs on Android and iOS mobile devices without problems.

When AOM was founded, the earliest version of AV1 was derived from VP9; in a certain sense, VP9 can be seen as a subset of AV1. With all of AV1's computationally complex tools switched off, you can even build a real-time-speed AV1 encoder whose coding performance differs only slightly from VP9's. In pushing AV1 into RTC scenarios, our goal is to preserve the standard's advantages while reducing first-frame time and the CPU and memory resources that encoding requires. We strive to cut the bitrate significantly at the same quality to reduce stalling, or, from the other angle, to deliver sharper picture quality at the same bandwidth and bitrate, ultimately improving the user experience.

Today, most practical video consumption happens on mobile. The open-source AV1 software decoder dav1d has already shown considerable potential on mobile devices. Of course, we also look forward to pushing AV1 software encoding solutions to mobile as soon as possible, but that still needs some time to polish.

Hardware codec support, especially hardware decoder support on mobile devices, is a very important part of promoting the AV1 ecosystem. AOM's major hardware members are developing AV1 codec chips. Realtek, Broadcom, and others were relatively early to launch AV1 hardware decoder IP prototypes. Qualcomm, MediaTek, and other manufacturers that are not AOM members are also building related chip solutions in response to market trends. From what we hear from handset manufacturers, decoder chips are expected to launch at scale in the first half of 2020, especially for the Android platform. As for iOS support, Apple's style has always been to make a product public only when it is completely ready, but Apple has long been an AOM member, which to some extent reflects its support for the open-source AOM/AV1 coding standard. Pre-planning discussions for the AV2 coding standard have already started, and Apple is also very active in AOM.

In addition, some in the industry have mentioned that TV devices with AV1 hardware decoding will likely launch even before mobile ones.

Encoding chips, including products for IPC (IP cameras) and mobile handsets, should follow. The industry noticed long ago that teams including Google and Huawei HiSilicon tracked the AV1 standard very early in its development and are trying to build AV1 encoding chips. We also know of other vendors with AV1 encoder IP products that have already reached the stage of actual sales with end-device and cloud vendors.

Recommended reading:

MSU Video Codec Comparison: the Chinese battlefield

Song Li: Many experts did not take part in the MSU evaluation

Lan Huafeng: Business models drive enterprises to take part in the MSU evaluation

MSU HD/slow-speed codec comparison: AV1 ranks first in compression efficiency in the Slow category

MSU releases its 2018 video compression evaluation report

LiveVideoStack Fall Recruitment

LiveVideoStack is recruiting editors/reporters/operations staff to promote the multimedia technology ecosystem together with the world's leading multimedia technology experts and the young partners at LiveVideoStack. You are also welcome to take part in content production remotely in your spare time. For details, search "LiveVideoStack" on BOSS Zhipin, or contact the editor-in-chief via WeChat "Tony_Bao_".



Origin blog.csdn.net/vn9PLgZvnPs1522s82g/article/details/103308535