The Application of Web Multimedia Technology in Video Editing Scenarios

Good morning everyone. I am very happy to be here at the Tencent LIVE Developer Conference. What I am sharing today, the application of web multimedia technology in video editing scenarios, is a very interesting topic, and I hope everyone takes something away from it.

Let me first introduce myself. I am Yuan Yunhui. I joined Tencent in 2010 and am currently in charge of the web team at the Tencent Cloud Video Center.


The Video Cloud web team is a multimedia front-end technology team committed to providing customers with innovative, scenario-based cloud services. Recently we have gained some hands-on experience in the field of video production. Since it relates closely to front-end technology and fits the theme of this conference well, I am very glad to share it with everyone here.

My talk is divided into four parts: the first introduces the business background; the second introduces Cloud Clip, a web-based video production platform; the third introduces Micro Clip, a video editing plug-in for WeChat Mini Programs; and the fourth shares some follow-up plans and front-end technical challenges.

First, let me introduce the production cloud framework that Tencent Video Cloud is developing vigorously. With the continuous advance of digitization, the content and media industry is clearly trending toward cloud-based, intelligent transformation. The production cloud builds on Tencent Cloud's powerful live streaming, video-on-demand, video AI, and other capabilities to provide the industry with integrated cloud services for content production and distribution. It includes rich scenario-based applications such as cloud editing, cloud directing, cloud relay, and cloud media asset management, helping customers move the full "capture, edit, broadcast, distribute, store" link to the cloud and improving content production and distribution efficiency for practitioners. The Cloud Clip and Micro Clip I am sharing today are important parts of this larger framework.

Let me first introduce Cloud Clip. As you can see, the interface is quite polished. It is an online video editing tool on the web, similar to iMovie on the Mac, providing audio and video trimming, picture adjustment, titles, music, subtitles, stickers, special effects, filters, transitions, and other functions, and it can meet most non-linear video editing needs.

When it comes to video editing, people usually think of native client applications. How is editing implemented with native client technology? Here I will walk through Android's processing flow as an example; the basic principles on other platforms are similar.

First, the video is demuxed and the video track is extracted for decoding, then the frame data is drawn with OpenGL. Post-processing relies on OpenGL's powerful graphics rendering capabilities, which makes it easy to add special effects, filters, and other multimedia elements. The OpenGL output is then encoded and finally muxed together with the audio into a media file. This is only the basic principle; real applications are of course much more complicated.

So how do we implement this in the browser? The browser does not natively expose high-level codec interfaces; is there something corresponding to MediaExtractor and MediaMuxer, such as MediaStream, that can be used here? And how should the overall design be structured?

With these questions in mind, I will cover three topics: how to render video frames, that is, how to draw video into WebGL in the browser as in the flow above; how to preview operations in real time, that is, how to design the code structure so that the various editing operations can be previewed instantly; and several approaches to exporting.

For rendering video frames, the first option is to use the WebAssembly build of FFmpeg to decode the video, take the decoded YUV data, convert it to RGB, and draw it on a canvas. Many in the industry use this approach to play formats the browser does not support, such as H.265 and FLV. There are two points to note: first, although WASM performs better than JS, this is still software decoding, taking roughly 25-30 ms per frame; second, you have to implement audio/video synchronization comparable to the browser's native video element, which is very complicated.
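To make the last step concrete, here is a minimal sketch, assuming a hypothetical WASM decoder has already produced an RGBA buffer (the YUV-to-RGB conversion is usually done in the decoder or in a shader):

```typescript
// A minimal sketch (not Cloud Clip's actual code): drawing one decoded frame
// onto a 2D canvas, assuming a WASM decoder has already produced RGBA pixels.
function drawDecodedFrame(
  canvas: HTMLCanvasElement,
  rgba: Uint8ClampedArray, // frame pixels, 4 bytes per pixel
  width: number,
  height: number,
): void {
  const ctx = canvas.getContext('2d');
  if (!ctx) throw new Error('2D context unavailable');
  canvas.width = width;
  canvas.height = height;
  // Wrap the raw pixels and blit them in one call.
  const frame = new ImageData(rgba, width, height);
  ctx.putImageData(frame, 0, 0);
}
```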

The second option is the one we use. In the WebGL API, texImage2D accepts a video element as a texture source, so we can draw an off-screen video into WebGL. The browser's video element is decoded by native hardware and performs much better; the WASM approach tops out at around 30 fps, which is visibly not smooth in an editing scenario, and it is more complicated, so the video-texture approach is more cost-effective. Of course, some will say that 2D canvas drawImage also works, but that makes it harder to take advantage of WebGL's fuller capabilities.
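As a rough illustration of what this looks like in code (a sketch, not our production player), the key call is texImage2D with the video element as the pixel source:

```typescript
// Upload the current <video> frame as a WebGL texture. Names are illustrative;
// a real editor would also manage shaders, buffers, and per-frame re-uploads.
function uploadVideoFrame(gl: WebGLRenderingContext, video: HTMLVideoElement): WebGLTexture {
  const texture = gl.createTexture();
  if (!texture) throw new Error('failed to create texture');
  gl.bindTexture(gl.TEXTURE_2D, texture);
  // Video dimensions are rarely powers of two, so clamp edges and filter linearly.
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
  // texImage2D accepts an HTMLVideoElement directly as the pixel source.
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);
  return texture;
}
```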

Now let's look at real-time preview of operations. Anyone who has built games may notice that video editing software is actually similar to games or animation tools: all of them have timeline sequences, generate real-time previews as you drag and drop, and are driven by a main timer. Our player adopts this game-development idea: editing operations feed track data onto the timeline in real time, and the player manages and schedules the editing objects in the scene, finally composing them into a movie sequence.
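A simplified sketch of this idea follows; the Track and Clip shapes are illustrative stand-ins, not our SDK's actual types:

```typescript
// A timeline-driven render loop in the spirit of a game loop: a main timer
// advances the playhead and asks every active clip to draw itself.
interface Clip { start: number; end: number; render(gl: WebGLRenderingContext, t: number): void; }
interface Track { clips: Clip[]; }

class TimelinePlayer {
  private playing = false;
  private startedAt = 0;

  constructor(private gl: WebGLRenderingContext, private tracks: Track[]) {}

  play(): void {
    this.playing = true;
    this.startedAt = performance.now();
    requestAnimationFrame(this.tick);
  }

  pause(): void {
    this.playing = false;
  }

  // Every frame: compute the timeline position, render the clips that cover
  // it, so any edit to the track data shows up in the very next frame.
  private tick = (now: number): void => {
    if (!this.playing) return;
    const t = (now - this.startedAt) / 1000; // seconds on the timeline
    for (const track of this.tracks) {
      for (const clip of track.clips) {
        if (t >= clip.start && t < clip.end) clip.render(this.gl, t);
      }
    }
    requestAnimationFrame(this.tick);
  };
}
```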

Now let's talk about the most important part: export. Can the front end export video? The answer is yes!

For front-end export, I will introduce two options:

The first is the WASM build of FFmpeg, which can fully reproduce the native processing flow; it is driven through the well-known FFmpeg command-line interface, which is very powerful.

The second can be implemented with native browser interfaces such as captureStream and MediaStream.

These two interfaces are usually used for screen-recording applications, and Cloud Clip does use screen-recorded material. Both the picture and the sound can be obtained as MediaStream objects via captureStream and then combined with the addTrack method. However, this approach only yields a single audio track, which falls short when there are multiple audio sources. In that case some more advanced Web Audio techniques help, such as dynamically connecting sources to maintain one output stream, and this does work in practice.
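Here is a hedged sketch of that combination: a canvas picture stream plus several audio elements mixed down into one recordable stream. Details such as codecs, error handling, and timing are simplified.

```typescript
// Combine a canvas picture stream with several audio sources into one
// MediaStream suitable for MediaRecorder.
function buildExportStream(
  canvas: HTMLCanvasElement,
  audioElements: HTMLAudioElement[],
): MediaStream {
  // 30 fps picture track captured from the editing canvas.
  const stream = canvas.captureStream(30);

  // Mix all audio sources into a single destination node, since we want one
  // combined audio track in the exported stream.
  const audioCtx = new AudioContext();
  const destination = audioCtx.createMediaStreamDestination();
  for (const el of audioElements) {
    audioCtx.createMediaElementSource(el).connect(destination);
  }
  const [mixedAudioTrack] = destination.stream.getAudioTracks();
  if (mixedAudioTrack) stream.addTrack(mixedAudioTrack);
  return stream;
}

// Usage sketch: record the combined stream to a WebM blob.
// const recorder = new MediaRecorder(buildExportStream(canvas, audios));
```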

Overall, for front-end export, the captureStream approach is simple but still needs verification in complex scenarios, and export quality is a concern; FFmpeg is more powerful. They share a common problem: FFmpeg is constrained by browser performance, while captureStream has to wait for the canvas to play through in real time, so export speed may be only 1:1 with the video duration.

The third method is back-end export: FFmpeg, OpenCV, and OpenGL are all mature back-end components. The advantage of the back end is that natively compiled programs perform better, and it is convenient for mass production through API calls or AI. The inconvenient part is that keeping the front-end and back-end rendering effects aligned is troublesome. Considering the business as a whole, this is the option we have currently chosen.

In terms of overall architecture, we have also fully considered the various ways customers use the product, supporting multiple levels of PaaS integration. For example, with our SDK the Cloud Clip editor can be embedded wholesale in an iframe, and the developer's business process can be wired up through the complete cloud API. For capable developers, we can also provide core components such as the player for secondary development. We likewise support a re-skinnable SaaS mode; in the SaaS version you can use more scenario-based applications such as the broadcast station and media asset management to connect the whole production link.
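As a purely illustrative sketch of the iframe-embedding idea (the editor URL, query parameter, and message payload below are hypothetical, not the actual Cloud Clip SDK contract):

```typescript
// Embed an editor page in an iframe and listen for events it posts back.
function embedEditor(container: HTMLElement, projectId: string): HTMLIFrameElement {
  const iframe = document.createElement('iframe');
  iframe.src = `https://example.com/cloud-clip-editor?project=${encodeURIComponent(projectId)}`;
  iframe.style.width = '100%';
  iframe.style.height = '100%';
  container.appendChild(iframe);

  // React to events (e.g. "export finished") sent back by the embedded editor.
  window.addEventListener('message', (event) => {
    if (event.origin !== 'https://example.com') return; // basic origin check
    console.log('editor event:', event.data);
  });
  return iframe;
}
```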

In terms of performance, the editor renders at close to 60 fps in the core preview playback scenario. For material import, we make full use of the browser's local file capabilities: you can edit while the upload is still in progress, with cloud synchronization ensuring the material is efficient to edit and available everywhere.
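A small sketch of the "edit while uploading" idea: the locally selected file is previewed immediately via an object URL while the upload runs in the background (the upload endpoint below is a placeholder):

```typescript
// Preview a freshly picked file at once and upload it in parallel.
async function importMaterial(file: File, video: HTMLVideoElement): Promise<void> {
  // Local preview is instant; no need to wait for the network.
  video.src = URL.createObjectURL(file);

  // Upload in parallel so the cloud copy is ready for server-side export.
  const body = new FormData();
  body.append('media', file);
  await fetch('https://example.com/upload', { method: 'POST', body });
}
```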

Next, let's introduce Micro Clip. Micro Clip is a video editing plug-in for WeChat Mini Programs. Developers can embed it into their own Mini Program to complete their video editing business logic. You can try it by searching for Micro Clip in WeChat or by scanning the QR code. At present it implements a simplified version of native short-video editing capabilities.

Now let me talk about the technical principles you are probably interested in. From the figure you can see that the flow is very similar to the native processing flow introduced earlier. The core is that WeChat recently added a decoder interface to the Mini Program API; during editing and preview, video frames are obtained through it and drawn to WebGL.

For export, a MediaRecorder is provided to record the canvas, and the recording is finally combined and packaged with the audio. This is the basic principle on Mini Programs, and it can be found in the official documentation.
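Here is a rough sketch of that pipeline. It paraphrases the wx.createVideoDecoder and wx.createMediaRecorder interfaces as I understand them from the documentation; exact event names and option fields may differ, so treat this purely as an illustration.

```typescript
declare const wx: any; // WeChat Mini Program global (type definitions omitted here)

// Start the decoder and pull frames in a simple loop; each frame's pixel data
// would then be uploaded to WebGL, just as in the Cloud Clip flow.
function startPreview(source: string, drawFrame: (frame: any) => void) {
  const decoder = wx.createVideoDecoder();
  decoder.on('start', () => {
    const pull = () => {
      const frame = decoder.getFrameData(); // may be null if no frame is ready yet
      if (frame) drawFrame(frame);
      setTimeout(pull, 1000 / 30); // crude pacing; real code also syncs with audio
    };
    pull();
  });
  decoder.start({ source }); // path to the local video file
  return decoder;
}

// Record the editing canvas and resolve with the temporary output file path.
function exportCanvas(canvas: any, durationMs: number): Promise<string> {
  return new Promise((resolve) => {
    const recorder = wx.createMediaRecorder(canvas, { fps: 30 });
    recorder.on('stop', (res: { tempFilePath: string }) => resolve(res.tempFilePath));
    recorder.start();
    setTimeout(() => recorder.stop(), durationMs);
  });
}
```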

On the front end this is also a web solution based on WebGL, in the same lineage as Cloud Clip. However, things are not that simple.

We encountered all kinds of challenges, and they left a deep impression. The first is environment adaptation: our product takes the form of a Mini Program plug-in, and WeChat Mini Programs and Mini Games are completely isolated, which means the interfaces and practices accumulated over several years of Mini Game development cannot be used here. For example, canvas and video elements cannot be created dynamically and workers cannot be used, so various workarounds and compatibility measures are needed. The second is rendering and playback: WeChat's low-level interfaces such as the decoder are not yet perfect, and we ran into many problems such as erratic seeking and audio/video synchronization. The third is special effects: video editing needs effects like those in Weishi or Douyin, which requires shaders, an area the front end rarely touches and that traditional designers cannot handle at all, so a lot of effort went into solving problems there. The fourth is export, which is complicated: merging multiple video pictures, placing multiple audio tracks, and so on are not yet ready on WeChat's side, and for now only the most basic atomic capabilities are provided, so a lot of technique is needed to pull it off. Through our efforts we have basically solved these problems.
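To give a sense of the shader work involved, here is a minimal grayscale filter. It is shown only to illustrate the kind of effect code required; it is not one of Micro Clip's actual filters.

```typescript
// GLSL fragment shader source kept as a string, as it would be when compiled
// from JavaScript/TypeScript at runtime.
const grayscaleFragmentShader = `
  precision mediump float;
  uniform sampler2D u_frame;   // current video frame uploaded as a texture
  varying vec2 v_texCoord;

  void main() {
    vec4 color = texture2D(u_frame, v_texCoord);
    // Standard luminance weights for converting RGB to gray.
    float gray = dot(color.rgb, vec3(0.299, 0.587, 0.114));
    gl_FragColor = vec4(vec3(gray), color.a);
  }
`;
```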

In terms of results, we are the first to implement the complete set of core processes on Mini Programs: media selection, camera capture, trimming and merging multiple videos, adding text, special effects, filters, and music, and exporting on the client side. We will keep enhancing the capabilities, and the performance experience will continue to be polished together with our colleagues on the WeChat side.

In terms of the Micro Clip plug-in architecture, developers can use our out-of-the-box clip component, which offers rich configuration such as clipping duration, UI styles, and watermarks. You can also take our sub-components, such as the camera, cropper, and player, for secondary development and use them on demand to build editing tools that fit your own business scenarios. Because it is a set of components, there are many business scenarios even beyond editing tools, such as using the player component as a multimedia display solution. Everyone is welcome to try it out.

Finally, let me share our plans and outlook. In this area of technology, I think there are several possible directions:

The first point: editing experience

Our current editing tool is not yet complete in capability. Making it comparable to professional native tools, so that more users are willing to use it, will take a lot of effort. Editing capability, ease of use, and some innovation in technology and experience are all very important.

The second point: content ecology

As a to-B service provider, how do we supply customers with material solutions such as motion graphics and animated stickers? This area is actually quite difficult: technically there is no standard, there are many details, and it requires integrating design with the platform's technical architecture. It is hard for us to do ourselves, and even harder for customers, so I think there is a lot of user value here.

The third point: intelligence

Combining the front end with AI/AR is already very common on mobile; whether it can be used at scale on the web and in Mini Programs requires continued exploration of scenarios, and there will be major performance challenges. Scenario-based templates, back-end intelligent production, and the like are also essential.

The fourth point: online collaboration

Because of the pandemic, this term is especially popular right now. Every industry needs collaboration, including the content and media industry. We can already see workflow applications in the industry such as review, revision, and annotation, which may involve more complex collaboration, multi-device synchronization, and many media-technology details.

These are big challenges for front-end technology, but they are also full of opportunity. I have noticed that many front-end developers are now studying serverless; that is the future, and further ahead it may be WebGL, media technology, WebAssembly, and so on. Take WebGL: the front-end AI, VR, AR, and games we are familiar with are all inseparable from it. Shaders alone are complex enough, and they can be used for many advanced effects, animations, and other visual work, involving programming-language knowledge, mathematics, graphics, and even a degree of aesthetics. Learning never ends, and the ceiling of front-end technology has always been high.

Finally, thank you all; that is everything I have to share today.

Reply with the keyword [TLC] in the official account backend to get the speakers' slides.
