AI digital human video based on Wav2Lip+GFPGAN (using deployment on the AutoDL cloud computing platform as an example)

Table of contents

Foreword

1. Introduction to AutoDL cloud computing platform

2. Deploy the Wav2Lip-GFPGAN code on the AutoDL cloud computing platform

2.1. Create an AutoDL cloud computing instance

2.2. Import the source code into the instance

2.3. Connect remotely to the AutoDL cloud service

2.4. Install dependencies

2.5. Import video and audio files

2.6. Configure parameters

2.7. Academic resource acceleration

2.8. Run run.py

2.9. Export video

2.10. Effect demonstration

3. Conclusion

4. References and further reading


Foreword

In recent years, the rapid development of artificial intelligence has greatly changed our lives and opened up countless possibilities. AI digital humans are one of the key technologies behind this change: computer-generated characters that can simulate human behavior and appearance, and even produce video content that is almost indistinguishable from a real person. None of this would be possible without advanced AI algorithms and powerful computing platforms. This article shows how to deploy and use two AI models, Wav2Lip and GFPGAN, on the AutoDL cloud computing platform to create AI digital human videos. The Wav2Lip model synchronizes the input audio with the digital human's mouth movements, and the GFPGAN model then performs high-quality restoration of the generated facial images, producing a realistic AI digital human video.
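To make the two-stage pipeline concrete, here is a minimal Python sketch of the overall flow, assuming the documented command-line interfaces of the upstream Wav2Lip (inference.py) and GFPGAN (inference_gfpgan.py) repositories. The run.py script used later in this article automates essentially the same steps; the paths and checkpoint names below are illustrative, not taken from that script.

import subprocess

def make_digital_human(face_video, audio_file, workdir="results"):
    # Stage 1: Wav2Lip -- re-render the mouth in face_video to match audio_file.
    subprocess.run([
        "python", "Wav2Lip/inference.py",
        "--checkpoint_path", "Wav2Lip/checkpoints/wav2lip_gan.pth",
        "--face", face_video,
        "--audio", audio_file,
        "--outfile", f"{workdir}/wav2lip_result.mp4",
    ], check=True)

    # Stage 2: GFPGAN -- restore the face in each frame at higher quality.
    # (In practice the frames are first extracted with ffmpeg, enhanced,
    # and then re-muxed with the audio; run.py automates this glue work.)
    subprocess.run([
        "python", "GFPGAN/inference_gfpgan.py",
        "-i", f"{workdir}/frames",   # frames extracted from the stage-1 video
        "-o", f"{workdir}/restored",
        "-v", "1.3",                 # GFPGAN model version
        "-s", "2",                   # upscale factor
    ], check=True)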

1. Introduction to AutoDL cloud computing platform

AutoDL is a cloud computing platform focused on providing large-scale parallel computing resources and one-click AI model deployment. It offers an efficient, reliable, and easy-to-use environment in which researchers, developers, and enterprises can run complex computing tasks and deploy AI models.

One of the platform's main strengths is its computing resources: powerful clusters and high-performance nodes that can process large-scale data and complex workloads quickly, allowing users to complete large computations and model training in a short time. In addition, AutoDL provides one-click AI model deployment: users can upload and configure their own models, then use the platform's tools and interfaces to deploy them to a computing cluster for inference. This greatly simplifies deployment and saves time and effort.

Another strength is ease of use. An intuitive user interface and clear operation guides let even non-expert users get started quickly, and the platform offers rich software support and development tools that users can customize and extend as needed. Finally, the platform is scalable and flexible: it can be expanded both horizontally and vertically to keep up with growing computing demands and new technical challenges.

2. Deploy the Wav2Lip-GFPGAN code on the AutoDL cloud computing platform

2.1. Create an AutoDL cloud computing instance

First register on the AutoDL official website (AutoDL - Quality GPU Rental Platform - Rent GPUs on AutoDL), then select a GPU in the "computing power market".

Here we choose an RTX 3090 GPU with pay-as-you-go billing, which keeps the cost down.

Then select "Basic Image", set the number of GPUs to 1, and choose a PyTorch image, since the source code environment will require PyTorch later. Click "Create Now" and wait a moment for the instance to be created.

2.2. Import the source code into the instance

Baidu network disk link: https://pan.baidu.com/s/1einWK_uy-HdpZ4xOgEK0YA?pwd=oshu 
Extraction code: oshu

First download the source code to a local folder and upload the compressed source package to Alibaba Cloud Drive. Then click "AutoPanel" to enter the panel page, scan the QR code to authorize, and import the code into the instance from Alibaba Cloud Drive.

2.3. Connect remotely to the AutoDL cloud service

It is recommended to use VS Code to connect to the instance remotely over SSH.

First download and install VS Code (Visual Studio Code - Code Editing. Redefined);

Then open VS Code, click "Remote Explorer", and then click "+".

Next, enter the SSH login command and the password shown on your AutoDL instance page.

Once the connection succeeds and the remote terminal opens, the setup is complete.

2.4. Install dependencies

Open the VS Code terminal and run the following commands in order to install the dependencies.

sudo apt update                     # refresh the package index
sudo apt install ffmpeg             # ffmpeg handles the audio/video processing
pip install -r requirements.txt     # install the project's Python dependencies
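After the installation finishes, it is worth confirming that PyTorch can actually see the GPU before going further; a quick check from the Python interpreter (assuming the PyTorch image chosen in section 2.1):

import torch

# Should print True and the GPU model (e.g. an RTX 3090 on the instance
# created earlier); False means the torch build and CUDA driver mismatch.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))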

2.5. Import video and audio files

  • inputs/{custom folder name}/source_video: the base video of the digital human to be produced.
  • inputs/{custom folder name}/source_audio: the audio file that will drive the lip sync.
  • outputs: the synthesized video is written here.

Note: the custom folder name must consist only of letters, only of digits, or a mix of letters and digits!
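For convenience, the expected layout can be created, and the naming rule enforced, with a small Python helper. This sketch is illustrative and not part of the project; the folder name "demo01" is just an example:

import os
import re

def prepare_inputs(name):
    # Enforce the letters-and/or-digits rule from the note above.
    if not re.fullmatch(r"[A-Za-z0-9]+", name):
        raise ValueError("custom folder name must contain only letters and digits")
    # Create inputs/{name}/source_video, inputs/{name}/source_audio and outputs/.
    for sub in ("source_video", "source_audio"):
        os.makedirs(os.path.join("inputs", name, sub), exist_ok=True)
    os.makedirs("outputs", exist_ok=True)

prepare_inputs("demo01")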

The base video of the digital human can be made at HeyGen (HeyGen - AI Video Generator);

Audio files can be generated here (Free Microsoft Speech Generation Tool).
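If you would rather script the audio generation than use a web tool, one alternative (not the tool linked above) is the open-source edge-tts package, a community wrapper around Microsoft Edge's online text-to-speech voices. A minimal sketch, assuming the inputs layout from the helper above:

import asyncio
import edge_tts  # pip install edge-tts

async def synthesize(text, outfile):
    # "zh-CN-XiaoxiaoNeural" is one of the available voices;
    # run `edge-tts --list-voices` for the full list.
    communicate = edge_tts.Communicate(text, voice="zh-CN-XiaoxiaoNeural")
    await communicate.save(outfile)

# Write the audio into the source_audio folder created in this section.
asyncio.run(synthesize("Hello, this is a digital human test line.",
                       "inputs/demo01/source_audio/line.mp3"))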

 

2.6. Configure parameters

After importing the files, open run.py and update the environment path.

Then change the input folder path under inputs to match the {custom folder name} created above.
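The exact variable names inside run.py depend on the version you downloaded, so the following is only a hypothetical illustration of the kind of edits this step involves; none of these names are guaranteed to appear verbatim in the script:

# Hypothetical excerpt -- adjust to the real variable names in your run.py.
folder_name = "demo01"                        # the {custom folder name} from section 2.5
base_dir = "/root/autodl-tmp/Wav2Lip-GFPGAN"  # where the source package was unpacked
video_dir = f"{base_dir}/inputs/{folder_name}/source_video"
audio_dir = f"{base_dir}/inputs/{folder_name}/source_audio"
output_dir = f"{base_dir}/outputs"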

 

2.7. Academic resource acceleration

Open the terminal and enter the following commands to enable accelerated access to academic resources such as GitHub. This step matters because the code automatically downloads several pretrained weight files when it runs later.

Enable academic resource acceleration:

source /etc/network_turbo

Disable academic resource acceleration (turn it off when you no longer need it):

unset http_proxy && unset https_proxy
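source /etc/network_turbo works by exporting proxy environment variables into the current shell, which is why the cancel command unsets http_proxy and https_proxy. From that same shell you can confirm the proxy is in effect with a couple of lines of Python:

import os

# Both should print a proxy URL while acceleration is active,
# and None after the unset command above.
print(os.environ.get("http_proxy"))
print(os.environ.get("https_proxy"))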

2.8. Run run.py

After confirming that all of the above configuration is complete, run run.py from the command line.

python run.py

2.9. Export video

After execution completes, the results are written to the corresponding outputs folder; the final output there is the synthesized high-definition video file.
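Before downloading the result, you can sanity-check it directly on the instance with ffprobe (installed alongside ffmpeg in section 2.4). The file name below is an assumption; substitute the actual name produced in your outputs folder:

import subprocess

# Print the codec, resolution and duration of the synthesized video.
subprocess.run([
    "ffprobe", "-v", "error",
    "-select_streams", "v:0",
    "-show_entries", "stream=codec_name,width,height,duration",
    "-of", "default=noprint_wrappers=1",
    "outputs/result_high_quality.mp4",  # hypothetical file name
])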

 

2.10. Effect demonstration

(The demonstration video accompanying this section is available in the original post.)

3. Conclusion

With the rapid development of artificial intelligence technology, AI digital humans have become an increasingly important field of research and application. Two powerful AI models, Wav2Lip and GFPGAN, help us create realistic AI digital human videos: Wav2Lip synchronizes the audio with the digital human's mouth movements, while GFPGAN performs high-quality restoration of the generated facial images.

The AutoDL cloud computing platform makes deploying and running these models straightforward. With its large-scale parallel computing resources and one-click model deployment, the deployment can be completed quickly and efficiently. This article walked through deploying and using Wav2Lip and GFPGAN on AutoDL to create AI digital human videos and demonstrated the generated results.

Although there is still room for improvement in these techniques, such as raising the quality of the generated images and refining the lip sync, they already show strong potential. As artificial intelligence continues to advance, AI digital humans may become more realistic and more intelligent, and find uses in ever more fields. We will keep following new developments in this area and look forward to the possibilities it brings to our lives.

4. References and further reading

(If you have any questions, feel free to ask in the comment area (づ ̄3 ̄)づ╭❤~)

Reference source code ①: Wav2Lip (https://github.com/Rudrabha/Wav2Lip)

Reference source code ②: GFPGAN (https://github.com/TencentARC/GFPGAN)

Reference source code ③: Wav2Lip-GFPGAN, high-quality lip sync (https://github.com/ajay-sainy/Wav2Lip-GFPGAN)

Reference source code ④: https://github.com/jecklianhuo/Wav2Lip-GFPGAN-main

Reference blog ①: "AI anchor based on Wav2Lip", Mr Data Yang's Blog, CSDN

Reference blog ②: "High-definition AI anchor based on Wav2Lip+GFPGAN", Mr Data Yang's Blog, CSDN


Original post: https://blog.csdn.net/Little_Carter/article/details/131265986