AI voice cloning with realistic timbre and natural rhythm: a limited-time benefit!

Voice gives digital humans a soul.

At the 2023 Yunqi Conference, Alibaba Cloud Video Cloud was interviewed by the CCTV-2 Financial Channel, where it explained and demonstrated how Cloud Intelligent Editing handles digital human rendering and video editing in one place.

As shown at the beginning of the video, AI reproduces the "original voices" of actors. In recent years, as AI technology has developed, AI voices have been used more and more in scenarios such as virtual digital humans, voice-based social networking, and news and information broadcasting.

Video: Generative large models enter the video field, expanding the application scenarios of "digital humans": https://w.yangshipin.cn/video?type=0&vid=u00005703dc

(Video source: CCTV-2 Financial Channel)

#01 AI voice: how to reproduce a lifelike voice

Stiffness, a mechanical feel, electronic artifacts, and unnatural pacing and intonation: these were the main quality problems of earlier speech synthesis technology.

How can AI reproduce a voice with realistic timbre and natural rhythm?

First, the Alibaba Cloud Video Cloud technical team applies pre-processing such as noise reduction and repair to the audio in the user's corpus. This lowers the quality requirements on the corpus and improves its clarity and sound quality.

At the same time, a model trained on Video Cloud's multi-scenario, multi-source, multilingual base material can reproduce natural conversational delivery and dialects, and it supports custom adjustment of emotion and tone color.

After the audio is synthesized, it goes through post-processing such as super-resolution repair, which improves the sound quality and naturalness of the synthesized speech across the whole pipeline and along multiple dimensions.

This also means that the recording conditions for the real voice are less restrictive. Very little corpus data is needed: voice cloning can be completed from a simple recording of as few as 20 sentences.

On this basis, voice cloning in Intelligent Media Service is also designed for convenience, security, and efficiency in self-service, flexible customization.

In terms of convenience, Video Cloud's years of accumulated algorithm work improve the pre-processing of the original audio, so users can assemble training material from their everyday broadcast recordings. This keeps the training corpus matched to the emotional tone of the target scenario and natural-sounding.

In terms of content security, users can constrain the input at the process level by recording the text the system assigns, or they can freely combine assigned-text recording with voiceprint comparison, to avoid the risk of infringement.

In terms of efficiency, once the material has been recorded, a cloned voice code that captures the key voiceprint features is generated and can be put to use quickly in speech synthesis applications.

#02 High-fidelity voice restoration for a range of scenarios

The voice cloning customization service currently comes in three tiers: the basic version, the popular version (light customization), and the advanced customized version.

Basic version:

The basic version is available online. The system automatically assigns 20 sentences of script for a simple recording session, covering three sub-scenarios: storytelling, interaction, and navigation. From the recordings the user uploads, key voiceprint features are captured quickly and at low cost, and the voice is cloned in 30 minutes at a quality suited to consumer entertainment. The basic version suits interactive entertainment applications that need to capture typical voiceprint features quickly.

(Basic version: 20-sentence corpus recording interface and steps)

Popular version (light customization):

Users submit their own rich, clear, high-quality voice material, which is processed by a combination of algorithms for sound-quality detection, audio noise reduction, and digital cloning. From 15 to 30 minutes of valid audio, the voice can be restored with high fidelity. Users can also specify the timbre and mood to suit different needs, flexibly adapting the voice to different application scenarios. The popular version (light customization) suits mass-market, Internet-grade high-definition voice applications. (New, recommended)

Original training audio (excerpt): https://v.youku.com/v_show/id_XNjAxOTM1NzU5Mg==.html

Voice cloning result: https://v.youku.com/v_show/id_XNjAyMjA1NTc0NA==.html

(Voice cloning result, popular version)

Advanced customized version:

Alibaba Cloud provides a full-process service covering professional recording coaching, training algorithms, and result optimization. Training algorithms are customized for the individual voice, with the aim of reaching broadcast-media-grade realism and a personalized, high-standard voice restoration. The advanced version suits broadcast-media-grade ultra-high-definition voice applications.
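As a rough illustration of how the three tiers relate, the Python sketch below picks a tier from the amount of valid audio a user has. It is not part of any Alibaba Cloud API: the names `Tier` and `suggest_tier` and the `broadcast_grade` flag are hypothetical. Only the 20-sentence and 15-30 minute figures come from the tier descriptions above. The treatment of audio longer than 30 minutes is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One customization tier, summarized from the descriptions above."""
    name: str
    corpus: str
    target: str


BASIC = Tier("Basic version",
             "20 system-assigned sentences",
             "interactive entertainment")
POPULAR = Tier("Popular version (light customization)",
               "15-30 minutes of valid audio",
               "Internet-grade high-definition voice")
ADVANCED = Tier("Advanced customized version",
                "guided professional recording",
                "broadcast-media-grade voice")


def suggest_tier(valid_audio_minutes: float, broadcast_grade: bool = False) -> Tier:
    """Suggest a tier (hypothetical helper, not an official rule).

    Assumption: anything under 15 minutes of valid audio falls back to the
    basic version, and more than 30 minutes is still treated as popular.
    """
    if valid_audio_minutes < 0:
        raise ValueError("valid_audio_minutes must be non-negative")
    if broadcast_grade:
        # Broadcast-grade results are only described for the advanced tier.
        return ADVANCED
    if valid_audio_minutes >= 15:
        return POPULAR
    return BASIC


if __name__ == "__main__":
    for minutes, broadcast in [(0, False), (20, False), (20, True)]:
        tier = suggest_tier(minutes, broadcast)
        print(f"{minutes} min, broadcast={broadcast}: {tier.name} ({tier.corpus})")
```

The choice of tier on the real service is made by the user when initiating customization, so a helper like this would at most serve as a pre-check before submitting material.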

#03 Product Power and Vitality

Once voice cloning is complete, voice-overs can be generated automatically through text-to-speech (TTS). Digital human videos can likewise be synthesized from text input.

Alibaba Cloud's "Cloud Intelligent Editing" provides professional audio and video editing capabilities (multi-layer elements, professional subtitles, transitions, special-effect filters) and a complete set of video template tools.

(Browser editing interface)

With the browser-based non-linear editor or AE, users can build a template library and combine the cloned voice with automatic digital human rendering. This yields standard template-based solutions such as digital human business cards and digital human MOOCs, meeting the needs of many kinds of digital human video production, including short videos, teaching videos, and advertising.

#04 Create a “sound-moving” digital human

The "limited-time benefit" countdown is on! If you start a customized digital human service and a popular-version (light customization) voice cloning service within the event period, you get free renewal for life!

Event period

The limited-time benefit runs until 24:00 on December 31, 2023. Please submit your digital human customization task within this period.

Applicable platforms

The current limited-time benefit is only available to users who initiate digital human customization through Alibaba Cloud Intelligent Media Service.

Are digital humans that were customized before the event eligible for it?

Yes. As long as you start a customization task before the event ends, you meet the event conditions and receive the benefit automatically.

Benefit details page: "Limited-time benefit" digital human customization with lifetime free renewal: https://www.aliyun.com/activity/cdn/avatar_free_renewal_activity

You are welcome to join the official Q&A DingTalk group for questions and discussion: 48335001108

Origin blog.csdn.net/VideoCloudTech/article/details/134570688