AI Virtual Anchor: Implementing a Digital Human with Wav2Lip [Full Tutorial and Effect Evaluation]

Preface
It is recommended to read the Feishu documentation directly: https://yv2c3kamh3y.feishu.cn/docx/S5AldFeZUoMpU5x8JAuctgPsnfg


Recently, many readers have sent private messages asking how AI digital human anchors are implemented. This article implements and evaluates Wav2Lip, one technique for building an AI digital human virtual anchor; implementations and evaluations of other related techniques will follow in future posts.

This article mainly implements two cases: making a still picture speak (the Mona Lisa shown below) and fusing a video with a separate voice track. The core of both is lip synchronization: matching the character's mouth movements to the speech in the audio. Using Wav2Lip, the video of one person and the audio of another, completely unrelated, person are combined into a single output video whose mouth movements are consistent with the audio content. For example, Xiaohong's voice and Xiaohua's selfie video are merged into a final video: when Xiaohong says "ah", Xiaohua's mouth opens accordingly (a rendering is shown below). Part 4 of this article contains a complete effect evaluation video.

This article is organized into the following five parts:

Part 1: Overview of Deepfake Technology

Part 2: Overview of Wav2Lip Technology

Part 3: In-Depth Practice: Building an AI Anchor Virtual Human with Wav2Lip

Part 4: Effect Evaluation

Part 5: Downloading the Full Wav2Lip Tutorial
Note: all material involved in this case, including the tutorial, pictures, videos, the Wav2Lip code, etc., is packaged and shared with everyone, so you can reproduce the results yourself.

The main text follows.


Part 1: Overview of Deepfake Technology

The word "deepfake" is a portmanteau of "deep learning" and "fake". It refers to techniques for creating synthetic media using deep learning, a subfield of machine learning.
Depending on which medium is being manipulated, deepfakes fall into three categories: forged visuals (fake pictures or videos), forged audio (fake voice content), and forged visuals plus audio (a combination of the two, entirely fabricated).

One of the most important deepfake techniques is expression reenactment, which makes the target identity mimic the expressions of the source identity so closely that the result looks like the target person's own natural expression. This has significant applications in the film and video game industries, such as post-production adjustment of actors' facial expressions. The pictures and videos in this article are all self-generated; you can refer to: [Shocking Attack] AI video animation "The Wolf Is Coming" revealed! [complete tutorial attached]. The audio was generated separately with an editing app (all materials are included in the tutorial package).

Part 2: Overview of Wav2Lip Technology

Wav2Lip is a GAN-based lip-movement transfer algorithm that synchronizes the generated character's mouth shape with the input speech. It can produce a lip-synced video that matches a target voice from a single static image, or directly re-animate the lips of an existing video to match new input audio, commonly known as "lip-sync". Its main purpose is to make the mouth movements look natural when combining audio with pictures or with video.
The readme.md in the GitHub repository explains the key properties of each pretrained model, summarized in the screenshot below, so you can decide which model file to use.
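The key training signal behind this synchronization is a pre-trained lip-sync "expert" discriminator that scores how well a face crop matches a speech segment via the similarity of their embeddings. The following is a simplified NumPy sketch of that idea (cosine similarity mapped to a sync probability, penalized with a log loss); it is an illustration of the concept, not the repository's actual PyTorch code, and the function names are made up for this example.

```python
import numpy as np

def sync_probability(audio_emb, face_emb, eps=1e-8):
    """Cosine similarity between an audio embedding and a face embedding,
    mapped from [-1, 1] to [0, 1] as a 'probability of being in sync'."""
    cos = np.dot(audio_emb, face_emb) / (
        np.linalg.norm(audio_emb) * np.linalg.norm(face_emb) + eps)
    return (cos + 1.0) / 2.0

def sync_loss(audio_emb, face_emb, eps=1e-8):
    """Log loss pushing a matching audio/face pair toward probability 1."""
    p = sync_probability(audio_emb, face_emb)
    return -np.log(p + eps)

# Identical embeddings are "perfectly in sync"; opposite ones are not.
a = np.array([0.2, 0.5, 0.8])
print(sync_probability(a, a))    # ~1.0
print(sync_probability(a, -a))   # ~0.0
```

During training, the generator is penalized whenever this expert judges its output frames to be out of sync with the audio, which is what drives the accurate lip movements.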

Project address: https://github.com/baoxueyuan/DeepFake

Model | Description
Wav2Lip | Highly accurate lip sync
Wav2Lip + GAN | Slightly worse lip sync, but better visual quality
Expert Discriminator | Weights of the lip-sync expert discriminator
Visual Quality Discriminator | Visual-quality discriminator weights trained in the GAN setting


This article focuses on comparing the final results of the Wav2Lip and Wav2Lip + GAN models; see Part 4.

Part 3: In-Depth Practice: Building an AI Anchor Virtual Human with Wav2Lip
The complete, detailed tutorial is available for download; space here is limited, so only a few screenshots are shown:
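For orientation before reading the full tutorial: the upstream Wav2Lip repository is driven through an `inference.py` script whose main flags (per its README) are `--checkpoint_path`, `--face`, `--audio`, and `--outfile`. The helper below is a hypothetical convenience wrapper written for this article that assembles such a command; verify the flag names against the README of the copy you actually download.

```python
def build_wav2lip_command(checkpoint, face, audio,
                          outfile="results/result_voice.mp4"):
    """Assemble the Wav2Lip inference command as an argument list
    (flag names assumed from the upstream repo's README)."""
    return [
        "python", "inference.py",
        "--checkpoint_path", checkpoint,  # wav2lip.pth or wav2lip_gan.pth
        "--face", face,                   # still image or video of the speaker
        "--audio", audio,                 # driving speech audio
        "--outfile", outfile,             # path of the generated video
    ]

# Example: make the Mona Lisa picture "speak" a recorded audio clip.
cmd = build_wav2lip_command("checkpoints/wav2lip_gan.pth",
                            "mona_lisa.jpg", "speech.wav")
print(" ".join(cmd))
```

Swapping `wav2lip_gan.pth` for `wav2lip.pth` selects the non-GAN checkpoint, which is the comparison evaluated in Part 4.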

Part 4: Effect Evaluation

The results of the AI digital human virtual anchor technology:

Part 5: Downloading the Full Wav2Lip Tutorial

Follow the public account [脚学猪] and reply with the code [5301] to get the download link.
All content involved in this case, including the tutorial, models, pictures, videos, the Wav2Lip code, etc., is packaged and shared with everyone, so you can reproduce it yourself.


Source: blog.csdn.net/baoxueyuan/article/details/130954780