With the help of So-VITS, we can train voice models of any timbre ourselves and then reproduce any song we want to hear, achieving true "freedom of song ordering" -- and yet something always seemed to be missing. This time, we let AI Trump's singing voice and his imposing image appear together: based on PaddleGAN, we build a "Knowing King" (a popular nickname for Trump) with both a beautiful voice and a beautiful image.
PaddlePaddle is Baidu's open-source deep learning framework. Its feature set is comprehensive, covering some 40 models across the three major fields of text, image, and video -- you can find it almost everywhere in the deep learning world.
Wav2Lip, a sub-module of PaddleGAN's visual-effects models, is a repackaging and optimization of the open-source Wav2Lip library. It synchronizes a character's mouth shapes with the input lyrics and vocals, so the character really looks as if it were singing.
In addition, Wav2Lip can also re-lip-sync an existing dynamic video directly, outputting footage whose mouth movements match a target voice. In this way, we can customize our own talking-head avatar entirely through AI.
Configuring CUDA and cuDNN locally
Running the PaddlePaddle framework locally is not trivial, but fortunately it is backed by Baidu, a giant of deep learning in China, and the documentation is very rich. As long as you follow the steps one by one, there should be no major problems.
First, configure a Python 3.10 development environment locally; see: "All-platform coverage: installing and configuring a Python 3.10 development environment on different platforms (Win10/Win11/Mac/Ubuntu) and different architectures (Intel x86/Apple M1 silicon)".
Next, you need to configure CUDA and cuDNN locally. cuDNN is a CUDA-based GPU acceleration library for deep learning; only with it can deep learning computations run on the GPU. Think of cuDNN as the working tool and CUDA as the computing platform it plugs into -- and the two versions must match.
First, open the NVIDIA control panel program to check the highest CUDA version supported by the local NVIDIA driver:
As the screenshot shows, the author's graphics card is an RTX 4060, and the current driver supports CUDA up to version 12.1. In other words, any CUDA version less than or equal to 12.1 is supported.
Then check the official documentation of the PaddlePaddle framework to see the framework version supported by Python3.10:
https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#ciwhls-release
According to the document, for Python 3.10 the highest supported PaddlePaddle build is win-cuda11.6-cudnn8.4-mkl-vs2017-avx -- that is, CUDA 11.6 and cuDNN 8.4. Anything newer is not supported.
So this machine needs to install CUDA 11.6 and cuDNN 8.4.
Note that the versions must match exactly, otherwise the program will not start later.
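As a side note, the "toolkit version must not exceed the driver's maximum" rule can be expressed as a tiny check. This is an illustrative sketch (the function name is made up, not part of any toolkit):

```python
# Illustrative helper: a CUDA toolkit version is usable as long as it does
# not exceed the maximum CUDA version the installed driver reports.
def cuda_version_ok(toolkit: str, driver_max: str) -> bool:
    """Compare dotted version strings numerically, e.g. '11.6' vs '12.1'."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(toolkit) <= as_tuple(driver_max)

print(cuda_version_ok("11.6", "12.1"))  # True: CUDA 11.6 runs on a 12.1 driver
print(cuda_version_ok("12.2", "12.1"))  # False: newer than the driver allows
```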
With the version numbers known, we only need to download the installers from NVIDIA's official site.
CUDA 11.6 installer download address:
https://developer.nvidia.com/cuda-toolkit-archive
cuDNN 8.4 installer download address:
https://developer.nvidia.com/rdp/cudnn-archive
First install CUDA 11.6. After the installation completes, unzip the cuDNN 8.4 archive and copy the extracted files into the CUDA 11.6 installation directory. The CUDA install path is:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6
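The "unzip and copy" step above can also be scripted. Below is a sketch under stated assumptions -- the function name and the download path are placeholders, so adjust them to your own machine:

```python
import shutil
from pathlib import Path

def merge_cudnn(cudnn_dir: str, cuda_dir: str) -> int:
    """Copy every file from the extracted cuDNN folder into the matching
    subfolder (bin/include/lib) of the CUDA install directory."""
    count = 0
    for src in Path(cudnn_dir).rglob("*"):
        if src.is_file():
            dst = Path(cuda_dir) / src.relative_to(cudnn_dir)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 preserves file metadata
            count += 1
    return count

# Hypothetical usage (paths are examples only):
# merge_cudnn(r"D:\Downloads\cudnn-8.4",
#             r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6")
```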
Then you need to add the bin directory to the environment variables of the system:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin
Then, in a terminal, enter the demo folder:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\extras\demo_suite
Run bandwidthTest.exe, which returns:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\extras\demo_suite>bandwidthTest.exe
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: NVIDIA GeForce RTX 4060 Laptop GPU
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12477.8
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12337.3
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 179907.9
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
This means the installation succeeded. Next, the GPU device can be queried with deviceQuery.exe:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\extras\demo_suite>deviceQuery.exe
deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 4060 Laptop GPU"
CUDA Driver Version / Runtime Version 12.1 / 11.6
CUDA Capability Major/Minor version number: 8.9
Total amount of global memory: 8188 MBytes (8585216000 bytes)
MapSMtoCores for SM 8.9 is undefined. Default to use 128 Cores/SM
MapSMtoCores for SM 8.9 is undefined. Default to use 128 Cores/SM
(24) Multiprocessors, (128) CUDA Cores/MP: 3072 CUDA Cores
GPU Max Clock rate: 2370 MHz (2.37 GHz)
Memory Clock rate: 8001 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 33554432 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: zu bytes
Total amount of shared memory per block: zu bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: zu bytes
Texture alignment: zu bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.1, CUDA Runtime Version = 11.6, NumDevs = 1, Device0 = NVIDIA GeForce RTX 4060 Laptop GPU
Result = PASS
At this point, CUDA and cudnn are configured.
Configuring the PaddlePaddle framework
After configuring CUDA, let's install the PaddlePaddle framework:
python -m pip install paddlepaddle-gpu==2.4.2.post116 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
Here we install the GPU build of PaddlePaddle, version 2.4.2.post116: 2.4.2 is the latest release, and the post116 suffix indicates CUDA 11.6. Be careful not to get the version wrong.
Then clone the PaddleGan project:
git clone https://gitee.com/PaddlePaddle/PaddleGAN
Then enter the cloned PaddleGAN directory and run the command to build and install the project locally:
pip install -v -e .
Then install other dependencies:
pip install -r requirements.txt
A few pitfalls here deserve explanation:
First, PaddleGAN depends on an older version of numpy and does not support the latest 1.24 release, so if your numpy version is 1.24 you need to uninstall it first:
pip uninstall numpy
Then install version 1.21:
pip install numpy==1.21
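To fail fast instead of waiting for PaddleGAN to crash at import time, the 1.24 ceiling can be checked up front. A sketch -- in a real script you would pass in numpy.__version__; the threshold comes from the note above:

```python
def numpy_compatible(version: str) -> bool:
    """PaddleGAN needs numpy older than 1.24 (per the note above)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (1, 24)

print(numpy_compatible("1.21.6"))  # True: the pinned version works
print(numpy_compatible("1.24.0"))  # False: too new for PaddleGAN
```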
Then verify in a Python terminal that PaddlePaddle is installed successfully:
import paddle
paddle.utils.run_check()
If this error is reported:
PreconditionNotMetError: The third-party dynamic library (cudnn64_7.dll) that Paddle depends on is not configured correctly. (error code is 126)
Suggestions:
1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
2. Configure third-party dynamic library environment variables as follows:
- Linux: set LD_LIBRARY_PATH by `export LD_LIBRARY_PATH=...`
- Windows: set PATH by `set PATH=XXX; (at ..\paddle\phi\backends\dynload\dynamic_loader.cc:305)
[operator < fill_constant > error]
You need to download the cudnn64_7.dll dynamic library and copy it into the bin directory of CUDA 11.6; the download address for the library is given at the end of this article.
Running the check again returns:
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
W0517 20:15:34.881800 31592 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.1, Runtime API Version: 11.6
W0517 20:15:34.889958 31592 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
That means everything is in place and the installation succeeded.
Local inference
Next, let's give Trump's song a moving picture. First, generate a static image of the "Knowing King" with Stable-Diffusion:
For Stable-Diffusion, see: "AI as master painter: building the Stable-Diffusion-WebUI AI painting library on all platforms (native/Docker) (Python 3.10/PyTorch 1.13.0)"; for reasons of space, it will not be repeated here.
Then enter the tools directory of the project:
\PaddleGAN\applications\tools>
Put Trump's static picture and the song file into the tools directory.
Then run the local inference command:
python .\wav2lip.py --face .\Trump.jpg --audio test.wav --outfile pp_put.mp4 --face_enhancement
Here --face is the target image, --audio is the song whose lip shapes need matching, and --outfile is the output video.
--face_enhancement: this optional flag enables face enhancement; if it is omitted, enhancement is off by default. Note that using this flag requires downloading a separate model file.
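To batch several songs, the command line above can be assembled programmatically and handed to subprocess.run. A sketch -- the flags are the ones shown above, the file names are placeholders:

```python
def wav2lip_cmd(face: str, audio: str, outfile: str, enhance: bool = True) -> list[str]:
    """Build the argument list for the wav2lip.py invocation shown above."""
    cmd = ["python", "wav2lip.py",
           "--face", face, "--audio", audio, "--outfile", outfile]
    if enhance:
        cmd.append("--face_enhancement")
    return cmd

print(wav2lip_cmd("Trump.jpg", "test.wav", "pp_put.mp4"))
```

The resulting list can then be run with, e.g., `subprocess.run(wav2lip_cmd(...), check=True)` from the tools directory.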
The key to Wav2Lip's breakthrough in precise lip sync with speech is that it uses a lip sync discriminator to force the generator to continuously produce accurate and realistic lip movements. Furthermore, it improves visual quality by using multiple consecutive frames instead of a single frame in the discriminator, and uses a visual quality loss (rather than just a contrastive loss) to account for temporal correlations.
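The idea of the sync discriminator can be illustrated with a toy calculation. This is not the real Wav2Lip code, only the principle: embed an audio window and a video window, compare the embeddings with cosine similarity, and penalize the generator when the similarity is low:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def sync_loss(audio_emb: list[float], video_emb: list[float]) -> float:
    """Cross-entropy on the clamped similarity: out-of-sync pairs cost more."""
    p = min(max(cosine_similarity(audio_emb, video_emb), 1e-7), 1.0 - 1e-7)
    return -math.log(p)

in_sync = sync_loss([1.0, 0.0], [1.0, 0.0])   # embeddings agree -> near-zero loss
off_sync = sync_loss([1.0, 0.0], [0.0, 1.0])  # orthogonal embeddings -> large loss
print(in_sync < off_sync)  # True
```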
Specific effect:
Epilogue
Sometimes the development of artificial intelligence (AI) really does make you feel the world has changed overnight: what you hear may not be true, and what you see may not be true either. Finally, the finished video can be found by searching on Bilibili (B站) for "Liu Yue's technical blog" -- everyone is welcome to have a look. All the installers and dynamic libraries involved in this article are available at:
https://pan.baidu.com/s/1-6NA2uAOSRlT4O0FGEKUGA?pwd=oo0d
Extraction code: oo0d