AI digital human video with SadTalker (using the AutoDL GPU cloud platform as the deployment example)

Table of contents

1. Introduction to SadTalker

2. Preparation

3. Digital human case (picture to video)

4. Demo

5. References


1. Introduction to SadTalker

SadTalker is an open-source tool for producing virtual digital humans: given a single picture and a driving audio clip, it generates a talking-head broadcast video. Internally, SadTalker predicts the 3D motion coefficients (head pose, expression) of a 3DMM from the audio and feeds them to a 3D-aware face renderer to synthesize the video. SadTalker also provides several modes, such as static mode, reference mode, and resize mode, so the output can be customized to the application.

2. Preparation

Deploy the AutoDL image and open a terminal.

Deployment tutorial: AI digital human video with Wav2Lip+GFPGAN (using the AutoDL GPU cloud platform as the deployment example)

Download my packaged source code to AutoDL from my Baidu Netdisk (the package already includes the model weights; a must for the lazy, highly recommended!)

Link: https://pan.baidu.com/s/1etXmmJ_ftwVSaqIe1EK37g?pwd=i2on 

Extraction code: i2on

You can also run the following command to clone the source code. (If you go this route, the model weights have to be downloaded separately, so it is not recommended!)

(Note: the SadTalker version used here is v0.0.2.)

git clone https://github.com/Winfredy/SadTalker.git
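
If you clone the repository yourself, remember that the pre-trained checkpoints are not included. The official README provides a download script for them; a minimal sketch follows (run it from inside the cloned directory, and verify the script exists in your checkout, since the layout may change between versions):

# Run from inside the cloned SadTalker directory.
# Downloads the pre-trained checkpoints (script name per the official README;
# may differ across versions).
bash scripts/download_models.sh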

First cd into the SadTalker directory, then run the following commands one by one.

sudo apt update

sudo apt install ffmpeg

pip install -r requirements.txt
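
Before moving on, it is worth a quick sanity check that ffmpeg is on the PATH and that PyTorch (installed via requirements.txt) can see the GPU on your AutoDL instance. For example:

# Confirm ffmpeg is installed and on the PATH
ffmpeg -version

# Confirm PyTorch is importable and CUDA is available
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"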

3. Digital human case (picture to video)

Enter the following command on the command line to run the model.

python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a file to store results> \
                    --still \
                    --preprocess full \
                    --enhancer gfpgan

The following command is an example from my own run, for reference only; the paths need to be changed to match your files.

python inference.py --driven_audio AIHuman/audio/AIHuman.mp3 --source_image AIHuman/images/03.jpeg --result_dir AIHuman/results --still --preprocess full --enhancer gfpgan
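
When inference finishes, the generated video is written under the directory passed to --result_dir. In my runs the output is an mp4 named with a timestamp (the exact naming may vary by version); you can check it like this:

# List the generated videos (naming may vary by SadTalker version)
ls -lh AIHuman/results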

Parameter description

--driven_audio: path to the input audio file.
--source_image: path to the input source; image files (e.g. PNG/JPEG) and MP4 videos are supported.
--checkpoint_dir: path where the model weights are stored.
--result_dir: path where the results are saved.
--still: reduces head movement; combined with --preprocess full, it animates the original full image.
--preprocess: how the source is preprocessed; full uses the whole image instead of cropping the face.
--enhancer: face enhancement model; choose gfpgan or RestoreFormer.
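
If you want to generate a video for several pictures driven by the same audio track, a simple shell loop does the job. A minimal sketch, reusing my example paths from above (adjust them to your own layout):

# Generate one talking-head video per source image, all driven by the same audio.
# The AIHuman/* paths follow the example above; change them to your own files.
for img in AIHuman/images/*.jpeg; do
    python inference.py --driven_audio AIHuman/audio/AIHuman.mp3 \
                        --source_image "$img" \
                        --result_dir AIHuman/results \
                        --still --preprocess full --enhancer gfpgan
done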

4. Demo

5. References

Reference project: SadTalker - GitHub (https://github.com/Winfredy/SadTalker)

Reference material: AI anchor based on SadTalker (Stable Diffusion also works) - Mr Data Yang's Blog, CSDN
