Smart Storage: Multimedia Lab's AIGC Capabilities Help Data Vientiane Open the Door to Smart Editing

Introduction

AIGC is reshaping content production in terms of efficiency, quality, creativity, and diversity. With the emergence of breakout products such as Firefly and Midjourney, AIGC is gradually serving more content production scenarios and content producers. As the demand scenarios for content production keep growing, the Multimedia Lab has continued to invest in AIGC and, through Data Vientiane, has successfully applied its capabilities to industries such as media, social networking, and entertainment, gradually building up content productivity across all scenarios.

1) Football Highlights

Data Vientiane's media customers have a large volume of sports video to process, especially during tournaments. Football, as the most popular sporting event, consistently draws high content consumption, and fans have high expectations for how watchable the content is. Because of their length, unedited football matches cannot be used directly in sports news, short videos, and other event-promotion scenarios. To this end, based on its self-developed AIGC technologies, Tencent Multimedia Lab brings smart editing of football matches to customers through the Tencent Cloud Data Vientiane product, and automatically generates game highlight content without the

Technology Introduction

In terms of data, we collected more than 1,200 representative football matches and labeled them to build a high-precision dataset. The dataset covers a range of well-known leagues and cups and totals more than 600 hours of footage. We defined 19 key event categories and, for each video, accurately marked the start point, end point, and category of every key event. The distribution of key events is shown in the figure below and broadly reflects the data distribution of real games.

Dataset key event distribution

In terms of algorithms, we designed a multi-modal solution that completes smart editing through two modules: smart stripping (splitting the match video into key event segments) and goal detection.

Multi-modal intelligent editing solution

Based on the video's image sequence and audio, the smart stripping module extracts multi-modal features through subtasks and then uses the event detection model to locate the key events. For single-frame images, audio, and image sequences respectively, we trained three feature extractors with supervision from the event labels, which extract image, sound, and action features.

| Subtask | Data | Labels | Accuracy |
| --- | --- | --- | --- |
| Image classification | 800,000 images | Goal, clearance, free kick, corner kick, penalty kick, red and yellow cards, substitution, treatment, pop-up information, interview, game start, game end, entry, playing the national anthem, handshake, opening/ending, offside, hydration, other | 90.8% |
| Sound classification | 7,000 | Audience cheers, calm commentary, excited commentary, whistle, other | 84.1% |
| Action recognition | 30,000 segments | Sports goal, sports clearance, replay, free kick, corner kick, penalty kick, red and yellow cards, substitution, treatment, pop-up information, interview, game start, game end, entry, playing the national anthem, handshake, opening/ending, offside, hydration, other | 87.4% |

The event detection model takes as input the multi-modal fusion features, re-encoded by temporal convolution. It estimates, for each moment, the probability of being the start, the end, or the middle of an event, and from these builds candidate intervals with their corresponding temporal features. In the candidate interval evaluation stage, the temporal features are used to predict the intersection ratio (intersection over union, IoU) between each candidate interval and the real event interval, which localizes the event. Finally, combining the event labels obtained in the subtask stage with a post-processing algorithm yields accurate event splitting results.

Event Detection Model
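The candidate interval step described above can be illustrated with a small sketch. It is not the lab's model: it only shows, under stated assumptions, how per-moment start and end probabilities could be paired into candidate intervals, and how the intersection ratio (intersection over union, IoU) between a candidate and a labeled event interval is computed. The function names, the 0.5 threshold, and the max_len parameter are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) on the video timeline


def temporal_iou(a: Interval, b: Interval) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def candidate_intervals(
    start_prob: Sequence[float],
    end_prob: Sequence[float],
    threshold: float = 0.5,         # assumed cut-off, not from the article
    max_len: Optional[int] = None,  # assumed optional cap on interval length
) -> List[Tuple[int, int]]:
    """Pair every likely start moment with every later likely end moment."""
    starts = [t for t, p in enumerate(start_prob) if p >= threshold]
    ends = [t for t, p in enumerate(end_prob) if p >= threshold]
    return [
        (s, e)
        for s in starts
        for e in ends
        if e > s and (max_len is None or e - s <= max_len)
    ]


if __name__ == "__main__":
    start_prob = [0.1, 0.9, 0.2, 0.1, 0.1, 0.7, 0.1, 0.1]
    end_prob = [0.0, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1, 0.9]
    labeled_event = (1.0, 4.0)  # made-up ground-truth interval
    for s, e in candidate_intervals(start_prob, end_prob):
        print((s, e), round(temporal_iou((float(s), float(e)), labeled_event), 3))
```

In a trained detector the IoU of each candidate is predicted from its temporal features; the exact computation shown here is what such a prediction would be trained to approximate.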

Smart stripping reaches an mAP of 82%, and for events such as set pieces and replays the corresponding metric exceeds 90%.

media2.mp4

Examples of sports highlights

The goal detection module supplements goal events by judging score changes. In the scheme above, the recall of goal events is not high because the score information in the video is not fully used; the most direct fix is to use that score information to help judge whether a goal has occurred. In practice, the goal detection module first detects the scoreboard position in the video frame sequence, expands its edges appropriately, and runs text detection to obtain the score position in the current frame. It then clusters these positions over multiple frames to obtain the score position for the whole game, uses text recognition to read the score sequence, and confirms goal moments according to the rule that the score increases on one side only.

Goal detection module
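The last step, confirming goal moments from the recognized score sequence, can be sketched as follows. This is a minimal illustration, not the production module: the input format, the detect_goals name, and the min_stable parameter (requiring a new score to be read on several consecutive frames before it is accepted, as a guard against recognition noise) are assumptions. A change counts as a goal only if exactly one side's score rises by one while the other stays the same.

```python
from typing import List, Optional, Tuple

Score = Tuple[int, int]                        # (home, away)
Reading = Tuple[float, Optional[Score]]        # (time in seconds, score or None if unreadable)


def detect_goals(readings: List[Reading], min_stable: int = 2) -> List[Tuple[float, str, Score]]:
    """Return (time, side, new_score) for each one-sided +1 score change."""
    goals: List[Tuple[float, str, Score]] = []
    confirmed: Optional[Score] = None  # last accepted score
    pending: Optional[Score] = None    # candidate new score awaiting confirmation
    pending_time = 0.0
    pending_count = 0
    for t, score in readings:
        if score is None:
            continue  # recognition failed on this frame
        if confirmed is None:
            confirmed = score  # first readable frame sets the baseline
            continue
        if score == confirmed:
            pending, pending_count = None, 0
            continue
        delta = (score[0] - confirmed[0], score[1] - confirmed[1])
        if delta not in ((1, 0), (0, 1)):
            pending, pending_count = None, 0  # breaks the one-sided rule: treat as noise
            continue
        if score == pending:
            pending_count += 1
        else:
            pending, pending_time, pending_count = score, t, 1
        if pending_count >= min_stable:
            side = "home" if delta == (1, 0) else "away"
            goals.append((pending_time, side, pending))
            confirmed, pending, pending_count = pending, None, 0
    return goals


if __name__ == "__main__":
    readings = [
        (0.0, (0, 0)), (10.0, (0, 0)), (20.0, (1, 0)), (30.0, (1, 0)),
        (40.0, (7, 0)),  # misread digit, ignored
        (50.0, (1, 0)), (60.0, None), (70.0, (1, 1)), (80.0, (1, 1)),
    ]
    print(detect_goals(readings))
```

The reported time is when the scoreboard first shows the new score. Scoreboards typically update some seconds after the goal itself, so a highlight clip would presumably need to start before this timestamp.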

The goal detection module raises the recall of goal events to 96%, improving the completeness and excitement of the highlights.

Goal.mp4

Goal Highlights Example

In the smart editing solution for football, we have also added a star recognition module for editing clips of specific players. This module uses a face detection algorithm to locate all faces in the video, encodes face features with a face recognition model, groups similar faces into sequences with a clustering algorithm, and finally looks up the matching star in the star face database. Our solution can currently identify more than 500 popular football stars.
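The final lookup step can be sketched as below, assuming face features are fixed-length vectors and that a cluster is matched to the database by the cosine similarity of its mean feature. The article does not state the similarity measure or threshold, so cosine similarity, the 0.8 threshold, and the names match_star, centroid, and the toy three-dimensional vectors are all illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Mean feature of one face cluster."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def match_star(
    cluster: List[Sequence[float]],
    database: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed minimum similarity for a match
) -> Optional[str]:
    """Return the best-matching name in the database, or None if nothing is close enough."""
    query = centroid(cluster)
    best_name, best_sim = None, threshold
    for name, reference in database.items():
        sim = cosine(query, reference)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name


if __name__ == "__main__":
    # Toy 3-dimensional features; real face embeddings have far more dimensions.
    database = {"player_a": [1.0, 0.0, 0.0], "player_b": [0.0, 1.0, 0.0]}
    print(match_star([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]], database))
    print(match_star([[0.5, 0.5, 0.7]], database))
```

Matching the cluster mean rather than each face separately is one simple way to benefit from the clustering step: a single blurred or turned face has less influence on the result.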

This capability is now used in multiple sports customer cases on Data Vientiane, and the average daily processing volume continues to rise, greatly improving the efficiency of customers' post-production around events.

2) Video Highlights

In addition to sports, the self-developed smart editing also supports TV dramas, animation, and other film and television scenarios, and can automatically generate the required highlight videos based on user-specified information such as characters, actions, emotions, and keywords.

Taking character-oriented editing as an example, the user only needs to provide 2 to 3 photos of a character for the system to register that character. When editing, specify the character (or a list of characters) to generate a character collection. Pair it with beat-synced music, and it is ready to go viral!

Character-oriented highlights

For clue-based collections, users can specify a keyword from the plot as a clue, and the system will automatically retrieve the related content in the drama and generate a keyword collection, creating the clearest story line on the web.

Blue Silver Grass Collection.mp4

Clue word collection
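As a rough illustration of keyword-based retrieval, the sketch below assumes the clue is searched in timestamped subtitle lines and that nearby hits are merged into one clip. The article does not say which signals the system actually searches (subtitles, speech, or on-screen text), so the subtitle input, the keyword_clips name, and the pad and gap parameters are purely hypothetical.

```python
from typing import List, Tuple

Subtitle = Tuple[float, float, str]  # (start_sec, end_sec, text)


def keyword_clips(
    subtitles: List[Subtitle],
    keyword: str,
    pad: float = 2.0,  # assumed context added before and after each hit, in seconds
    gap: float = 1.0,  # assumed maximum gap between clips that still get merged
) -> List[Tuple[float, float]]:
    """Find subtitle lines containing the keyword and merge nearby hits into clips."""
    needle = keyword.lower()
    hits = sorted(
        (max(0.0, start - pad), end + pad)
        for start, end, text in subtitles
        if needle in text.lower()
    )
    merged: List[Tuple[float, float]] = []
    for start, end in hits:
        if merged and start - merged[-1][1] <= gap:
            # Overlapping or nearly adjacent: extend the previous clip.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    subtitles = [
        (10.0, 12.0, "The Blue Silver Grass awakens"),
        (13.0, 15.0, "Blue silver grass again"),
        (60.0, 62.0, "Nothing relevant here"),
        (100.0, 103.0, "blue silver grass returns"),
    ]
    print(keyword_clips(subtitles, "Blue Silver Grass"))
```

Merging nearby hits keeps a scene that mentions the clue several times as one continuous clip instead of a series of short fragments.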

At present, this feature is available in the Data Vientiane Experience Center, the Smart Toolbox, and the console, where it has a high click-through rate and has attracted many users. It can flexibly adjust the key actions and characters used for a collection according to user needs. Users are welcome to try it.

Summary

In the future, Tencent Multimedia Lab will continue to invest in building AIGC's core capabilities, including intelligent editing. Ye Jialiang, head of Data Vientiane, said that Data Vientiane relies on the Multimedia Lab's years of accumulation in underlying AI algorithms, combined with its own knowledge of the industry and business, to quickly launch capabilities that help customers realize intelligent content production; the rich set of algorithms also makes the business more diverse and flexible. You can go to Data Vientiane to experience the related capabilities and create with data stored on COS. Data Vientiane will continue to work with the Multimedia Lab to provide customers with more intelligent services such as automatic video editing, intelligent composition, and music scoring, so as to improve the efficiency of content production and creation.

Origin blog.csdn.net/Tencent_COS/article/details/129872910