Intelligent production is our opportunity to change the times: the ideals and future of Alibaba Cloud Video Cloud

Zou Juan, Senior Technical Expert of Alibaba Cloud Intelligent Video Cloud

LiveVideoStackCon 2020 Beijing lecturer interview

"I am Zou Juan from Alibaba Cloud Video Cloud. I am mainly responsible for the architecture design and server-side development of media production and media processing. Before joining Alibaba, I worked in audio and video technology for many years, designing and developing media asset management, audio and video processing, fast editing, and other systems for the broadcasting industry.

Recently I have been paying closer attention to technologies related to media production: ultra-high-definition production, how to better integrate AI into the production process (in some cases even into the creative stage), and tools and technologies that lower the barrier to use while preserving professional quality and results. I also follow more specialized production scenarios and workflows, such as film editing and the processes and methods of OB trucks, professional studios, and post-production."

"Talking about AI without the scenario is hooliganism"

LiveVideoStack: Since joining Alibaba Cloud, what is the most memorable project you have participated in? Can you share one or two stories with the readers of LiveVideoStack?

Zou Juan: The most memorable project is the star highlight reels for the 2018 World Cup. I was responsible for the technical solution design and the core development work. The first technical difficulty was the extremely tight schedule: from our decision to build the star highlight reels to the final launch, we had only one week. The World Cup lasts only a month, so if you miss the window, it is gone.

In that week we had to select the AI algorithms, the algorithm for determining the main intervals of the timeline, and the algorithm for expanding and converging the timeline boundaries. We also had to work out how these algorithms cooperate at the engineering level and repeatedly tune the final composited result.

The second difficulty was that, in so short a time, the other algorithms might not have time for scenario-specific adaptation; only speech recognition had some prior groundwork for the World Cup scenario. We therefore had to understand the limits of each algorithm very well, that is, the upper bound of the effect each one could achieve. We had to combine face recognition, speech recognition, video splitting, and FIFA real-time match records to serve the final timeline generation.
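The way main intervals, boundary expansion, and several signals feed one timeline can be illustrated with a minimal sketch. This is not the project's actual algorithm, which the interview does not describe in detail; every function name, parameter, and threshold below is a hypothetical chosen for illustration. The sketch assumes face-recognition hits arrive as (start, end) intervals in seconds, match-record events as timestamps, and video splitting as a list of shot-cut timestamps.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)


def main_intervals(face_hits: List[Interval], events: List[float],
                   window: float = 5.0) -> List[Interval]:
    """Keep face intervals that lie within `window` seconds of a recorded event."""
    return [(s, e) for s, e in face_hits
            if any(s - window <= t <= e + window for t in events)]


def snap_to_shots(interval: Interval, shot_cuts: List[float],
                  max_expand: float = 8.0) -> Interval:
    """Expand an interval outward to the nearest shot cuts,
    but never by more than `max_expand` seconds on either side."""
    start, end = interval
    before = [c for c in shot_cuts if c <= start]
    after = [c for c in shot_cuts if c >= end]
    new_start = max(before) if before else start
    new_end = min(after) if after else end
    # Converge: clamp the expansion so a long shot cannot swallow the clip.
    new_start = max(new_start, start - max_expand)
    new_end = min(new_end, end + max_expand)
    return (new_start, new_end)


def merge_intervals(intervals: List[Interval],
                    max_gap: float = 0.0) -> List[Interval]:
    """Merge intervals that overlap or are at most `max_gap` seconds apart."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def build_highlight_timeline(face_hits: List[Interval], events: List[float],
                             shot_cuts: List[float]) -> List[Interval]:
    """Main intervals -> boundary expansion -> merge into final segments."""
    core = main_intervals(face_hits, events)
    expanded = [snap_to_shots(iv, shot_cuts) for iv in core]
    return merge_intervals(expanded, max_gap=1.0)
```

In this sketch, a face hit with no nearby match event is dropped, which is one simple way to let an understanding of the scenario (here, the event records) decide which appearances count as highlights.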

The last point is also very important: you must understand the scenario well. There is a joke I find very apt: "Talking about AI without the scenario is hooliganism." If you do not understand football or the World Cup, you may have no way of knowing which moments of a star's game are brilliant or count as highlights.

So I am very fortunate to have been a devoted football fan for more than ten years. I know football, the World Cup, the stars, and the teams, and that was an important reason the project could launch so quickly.

LiveVideoStack: What was the biggest difficulty you encountered in the process of exploring cloud media production platforms and related products?

Zou Juan: The biggest difficulty is the foundation and architecture design, because it determines the long-term vitality of the entire platform. There are multiple dimensions of collaboration and relationships to consider, such as the collaboration between cloud and client, the balance between professionalism and broad accessibility, and the integration of on-site real-time production with editing and compositing.

Take cloud and client as an example. Our design concept is that the two can be integrated and work together seamlessly, or be separated to provide services independently. In other words, cloud plus client can be split or combined in a PaaS+ architecture while keeping the rendering results as consistent as possible. Then there is the balance between professionalism and a low barrier to entry. The timeline design must retain professional production capability, but the packaging and use of the timeline must consider broad accessibility, so that more customers can use the platform or service.

For on-site real-time production and post-production editing, not only must the inputs and outputs match each other, but the application and reuse of AI capabilities across the various stages also require careful design.

LiveVideoStack: From the perspective of technology and product planning, how can cloud editing meet the requirements of professional video producers and match the editing habits of this group?

Zou Juan: Cloud editing has considered the needs of professional video production from the very start, from 0 to 1, which has a lot to do with my many years in the broadcasting industry. I know very well that for a specialized video technology such as editing, the relevant planning has to be done from the first version; otherwise the later iteration path will be very painful.

The core design of cloud editing, that is, the design of the timeline/storyboard, fully took into account three years ago the requirements of professional non-linear editing for tracks, materials, effects, stage layouts, and so on. As a result, the professional video production model has been supported at the level of the timeline structure and production protocol since the initial version, so upper-layer services and tools can be built incrementally, which is relatively easy.

In 2018 we already supported multi-layer image track overlay, multi-track audio mixing, multiple effects, adaptive stage layout, and custom layout, so our support for professional video production came relatively early.
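To make the idea of a timeline with tracks, materials, and effects concrete, here is a minimal data-structure sketch. It is an illustration only and does not reflect Alibaba Cloud's actual timeline protocol; all class names, field names, and the example effect name are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Effect:
    name: str                       # hypothetical effect identifier
    params: dict = field(default_factory=dict)


@dataclass
class Clip:
    media_id: str                   # reference to a source material
    timeline_in: float              # position on the timeline, in seconds
    duration: float
    effects: List[Effect] = field(default_factory=list)

    @property
    def timeline_out(self) -> float:
        return self.timeline_in + self.duration


@dataclass
class Track:
    kind: str                       # "video" or "audio"
    clips: List[Clip] = field(default_factory=list)


@dataclass
class Timeline:
    # Tracks are ordered bottom to top: later video tracks overlay earlier ones.
    tracks: List[Track] = field(default_factory=list)

    def duration(self) -> float:
        """Total length: the latest end time of any clip on any track."""
        return max((c.timeline_out for t in self.tracks for c in t.clips),
                   default=0.0)

    def active_clips(self, t: float, kind: str) -> List[Clip]:
        """Clips of the given kind that cover time `t`, bottom track first."""
        return [c for tr in self.tracks if tr.kind == kind
                for c in tr.clips if c.timeline_in <= t < c.timeline_out]
```

With a structure like this, a renderer would composite the result of `active_clips(t, "video")` in order for overlay and sum the result of `active_clips(t, "audio")` for mixing. Putting layering and effects into the base structure is the kind of decision that is hard to retrofit after a first version.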

LiveVideoStack: What are the key technologies for perfecting cloud editing products, and in what specific scenarios are Alibaba Cloud's strengths?

Zou Juan: There are several key technologies:

1) The richness of editing effects, and the consistency of effect rendering between cloud and client

2) Real-time production + post-production integration technology

3) Technology in the field of UHD production

4) The evolution of AI in production: from acting on materials, to acting on editing effects and templates, and finally to acting on the production process of the finished product

Alibaba Cloud hopes to promote the shift of most content expression and information dissemination from text and images to video in this era. This is where our strength lies.

LiveVideoStack: What are the target users and applicable scenarios of Alibaba Cloud's cloud editing?

Zou Juan: The target customers are those with media production requirements. Our goal is to provide customers with scalable and personalized smart media production services.

There may be many specific scenarios, but in abstract terms there are three:

1) The first is integrated cloud production, including multi-track audio and video mixing and overlay, subtitle production, mixing of audio, video, and text, special effects rendering, template factory packaging, etc.;

2) The second scenario focuses on intelligently generating videos, and such intelligent scenarios will be replicated across the production field;

3) The third scenario is content library management, such as smart media asset scenarios for the material library and the finished-product library.

LiveVideoStack: What is the uniqueness, or highlight, of Alibaba Cloud's editing products?

Zou Juan: The uniqueness of Alibaba Cloud's editing products lies in the openness and professionalism of the product architecture design.

The openness is reflected in the fact that you can use it as a pure cloud service, or use only the SDK or tools on the web or mobile side. The two produce fully equivalent output.

Of course, the two can also be integrated and used in a PaaS+ or general SaaS form, which ensures consistency of the production process and the final product. In other words, cloud editing can be freely combined according to customer needs.

Professionalism was already covered in the earlier answer, so I won't repeat it.

Another point is Alibaba Cloud's understanding of production and AI in editing products: we believe that production is the core and AI plays a supporting role. At present, AI has not yet reached the point where it can truly create videos that tell a story. We will integrate AI capabilities into every layer of cloud editing, then refine and abstract them into intelligent production scenarios that are highly reproducible.

LiveVideoStack: What environmental and technical conditions are needed for cloud editing products to achieve real-world adoption?

Zou Juan: A more stable network environment and higher bandwidth are required, as well as better accuracy and results from AI in intelligent production.

"Everyone at Alibaba Cloud is running a relay"

LiveVideoStack: What has inspired you the most on your path of technical growth?

Zou Juan: On my path of technical growth, the person who inspired me the most was a senior technical expert I met when I first joined Alibaba. His technical vision, breadth of perspective, and open-mindedness made a deep impression on me.

He told me that when we work on technology, we must stay curious about it, be somewhat idealistic and think about the future, and follow advanced international technologies and concepts. At the same time, we must be down-to-earth and put usable technology into practice to generate business value. That is, look up at the stars while keeping your feet on the ground.

He also told me that everyone at Alibaba Cloud is running a relay. While he can still run, he runs with everything he has and keeps pushing forward, and before he finds he can no longer run, he finds someone stronger to take the baton and carry it well. I think this is the heritage of Alibaba Cloud.

LiveVideoStack: In the post-epidemic era, what new understanding do you have about audio and video services/technology?

Zou Juan: In the post-epidemic era, I have two observations about audio and video technologies and services. The first is that enterprises have quickly become familiar with audio and video technologies and have come to need them, and audio and video cloud services will become part of the infrastructure of cloud computing. Audio and video technologies and services are no longer the special needs of certain industries, but the basic needs of all industries.

The second is that during the epidemic, society and enterprises became significantly more accepting of online services, especially cloud services. So in the post-epidemic era, many customers hope to move their services online quickly, especially those related to audio and video, such as video conferencing, online education, and live broadcasting. This has created demand for video solutions and tools that quickly lower the barrier to using video technology and services, allowing customers to spend their precious time on their own business scenarios.

Seen from another angle, the whole era has new requirements for content and modes of interaction. This is also an opportunity for those of us who work in audio and video technology, an opportunity that may change the era.

LiveVideoStack: What is the specific concept of video cloud media production based on 5G, and how do you see the current specific application status of cloud editing?

Zou Juan: 5G will bring high bandwidth and low latency, so people's expectations for the quality of video content will rise further. On that basis, cloud-based media content production itself will change significantly: it will become more professional, the picture will be clearer, ultra-low latency will improve the production experience and interaction, and "what you see is what you get" production will become standard.

In current applications of cloud editing, two needs are clearly visible, and they echo the trend above. The first is that the requirements for professional editing, rich editing tools, and content quality keep rising. The second is that everyone is taking part in content production, so the barrier to editing keeps falling.

Editor: Coco Liang

LiveVideoStackCon 2020 Beijing

October 31-November 1, 2020


Origin blog.csdn.net/vn9PLgZvnPs1522s82g/article/details/109252184