How to try out and use VALL-E, the AI speech generator from Microsoft

VALL-E is an AI speech generation tool from Microsoft. Unlike typical AI speech generation tools, VALL-E can imitate a speaker's emotion and tone when it synthesizes speech, which makes it smarter and more interesting to use.

VALL-E official website

VALL-E (valle-demo.github.io)

Introduction to VALL-E

Microsoft recently announced an artificial intelligence tool called VALL-E, which can imitate a person's voice from just 3 seconds of audio.

The model was trained on 60,000 hours of English speech data and uses a 3-second clip of a specific speaker's voice as the prompt for generating speech. Unlike many current AI tools, VALL-E can replicate a speaker's emotion and tone, even for words the speaker has never actually said.

The accompanying paper, published on arXiv (the preprint server hosted by Cornell University), presents several voices synthesized with VALL-E, and you can listen to these AI-synthesized audio samples on the GitHub demo page.

VALL-E capabilities and limitations

In many cases, VALL-E outperforms current text-to-speech models, the researchers noted. However, the paper also says that the AI model still has several problems. For example, some words in the text prompt may come out unclear, be missed entirely, or appear twice in the output. In addition, the model currently has difficulty imitating certain voices, especially those of speakers with accents.
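The missed and duplicated words described above are the kind of error that can be checked for automatically. The following is a minimal sketch, not part of VALL-E: it assumes you already have a text transcript of the generated audio from some speech recognizer (not shown here), and it compares that transcript with the original text prompt using Python's standard `difflib` module. The function name `compare_words` is hypothetical.

```python
import re
from difflib import SequenceMatcher


def compare_words(prompt: str, transcript: str) -> dict:
    """Compare a text prompt with a transcript of the synthesized audio.

    Returns the prompt words that are absent from the transcript ("missing")
    and the transcript words that are not in the prompt ("extra", which
    includes repeated words). The transcript is assumed to come from an
    external speech recognizer.
    """
    # Lowercase and keep only letters and apostrophes so punctuation is ignored.
    expected = re.findall(r"[a-z']+", prompt.lower())
    heard = re.findall(r"[a-z']+", transcript.lower())

    missing, extra = [], []
    matcher = SequenceMatcher(a=expected, b=heard, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            missing.extend(expected[i1:i2])
        if tag in ("insert", "replace"):
            extra.extend(heard[j1:j2])
    return {"missing": missing, "extra": extra}


if __name__ == "__main__":
    report = compare_words(
        "The quick brown fox jumps.",
        "the quick quick fox jumps",
    )
    print(report)
```

A check like this only catches dropped or repeated words. It says nothing about unclear pronunciation or how closely the output matches the speaker's voice.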

Like other new AI technologies, VALL-E has raised concerns about safety and ethics. Microsoft has issued an ethics statement regarding the use of VALL-E, but it has not clearly indicated how the tool will be used in the future.

At present, VALL-E is not open source. Microsoft has created a VALL-E repository on GitHub, but it currently contains only a README file.

Origin blog.csdn.net/qqerrr/article/details/129643733