With the help of So-VITS, we can train voice models of any timbre ourselves and then reproduce any song we want to hear, achieving true "freedom of song ordering" -- and yet something always seemed to be missing. This time, we let AI Trump's singing voice and his imposing image appear together: based on PaddleGAN, we build a "Knowing King" (a popular nickname for Trump) with both a beautiful voice and a beautiful image.
PaddlePaddle is Baidu's open-source deep learning framework. Its feature set is comprehensive, covering some 40 models across the three major fields of text, image, and video -- you can find it almost everywhere in the deep learning world.
Wav2Lip, a sub-module of PaddleGAN's visual-effects models, is a repackaging and optimization of the open-source Wav2Lip library. It synchronizes a character's mouth shapes with the input lyrics and vocals, so the character really looks as if it were singing.
In addition, Wav2Lip can also re-lip-sync an existing dynamic video directly, outputting footage whose mouth movements match a target voice. In this way, we can customize our own talking-head avatar entirely through AI.
Configuring CUDA and cuDNN locally
Running the PaddlePaddle framework locally is not trivial, but fortunately it is backed by Baidu, a giant of deep learning in China, and the documentation is very rich. As long as you follow the steps one by one, there should be no major problems.
First, configure a Python 3.10 development environment locally; see: "All-platform coverage: installing and configuring a Python 3.10 development environment on different platforms (Win10/Win11/Mac/Ubuntu) and different architectures (Intel x86/Apple M1 silicon)".
Next, you need to configure CUDA and cuDNN locally. cuDNN is a CUDA-based GPU acceleration library for deep learning; only with it can deep learning computations run on the GPU. Think of cuDNN as the working tool and CUDA as the computing platform it plugs into -- and the two versions must match.
First, open the NVIDIA control panel program to check the highest CUDA version supported by the local NVIDIA driver:
As the screenshot shows, the author's graphics card is an RTX 4060, and the current driver supports CUDA up to version 12.1. In other words, any CUDA version less than or equal to 12.1 is supported.
Then check the official documentation of the PaddlePaddle framework to see the framework version supported by Python3.10:
https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#ciwhls-release
According to the document, for Python 3.10 the highest supported PaddlePaddle build is win-cuda11.6-cudnn8.4-mkl-vs2017-avx -- that is, CUDA 11.6 and cuDNN 8.4. Anything newer is not supported.
So this machine needs to install CUDA 11.6 and cuDNN 8.4.
Note that the versions must match exactly, otherwise the program will not start later.
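As a side note, the "toolkit version must not exceed the driver's maximum" rule can be expressed as a tiny check. This is an illustrative sketch (the function name is made up, not part of any toolkit):

```python
# Illustrative helper: a CUDA toolkit version is usable as long as it does
# not exceed the maximum CUDA version the installed driver reports.
def cuda_version_ok(toolkit: str, driver_max: str) -> bool:
    """Compare dotted version strings numerically, e.g. '11.6' vs '12.1'."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(toolkit) <= as_tuple(driver_max)

print(cuda_version_ok("11.6", "12.1"))  # True: CUDA 11.6 runs on a 12.1 driver
print(cuda_version_ok("12.2", "12.1"))  # False: newer than the driver allows
```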
With the version numbers known, we only need to download the installers from NVIDIA's official site.
CUDA 11.6 installer download address:
https://developer.nvidia.com/cuda-toolkit-archive
cuDNN 8.4 installer download address:
https://developer.nvidia.com/rdp/cudnn-archive
First install CUDA 11.6. After the installation completes, unzip the cuDNN 8.4 archive and copy the extracted files into the CUDA 11.6 installation directory. The CUDA install path is:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6
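The "unzip and copy" step above can also be scripted. Below is a sketch under stated assumptions -- the function name and the download path are placeholders, so adjust them to your own machine:

```python
import shutil
from pathlib import Path

def merge_cudnn(cudnn_dir: str, cuda_dir: str) -> int:
    """Copy every file from the extracted cuDNN folder into the matching
    subfolder (bin/include/lib) of the CUDA install directory."""
    count = 0
    for src in Path(cudnn_dir).rglob("*"):
        if src.is_file():
            dst = Path(cuda_dir) / src.relative_to(cudnn_dir)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 preserves file metadata
            count += 1
    return count

# Hypothetical usage (paths are examples only):
# merge_cudnn(r"D:\Downloads\cudnn-8.4",
#             r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6")
```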
Then you need to add the bin directory to the environment variables of the system:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin
Then, in a terminal, enter the demo folder:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\extras\demo_suite
Run bandwidthTest.exe, which returns:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\extras\demo_suite>bandwidthTest.exe
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: NVIDIA GeForce RTX 4060 Laptop GPU
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12477.8
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12337.3
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 179907.9
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
This means the installation succeeded. Next, the GPU device can be queried with deviceQuery.exe:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\extras\demo_suite>deviceQuery.exe
deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 4060 Laptop GPU"
CUDA Driver Version / Runtime Version 12.1 / 11.6
CUDA Capability Major/Minor version number: 8.9
Total amount of global memory: 8188 MBytes (8585216000 bytes)
MapSMtoCores for SM 8.9 is undefined. Default to use 128 Cores/SM
MapSMtoCores for SM 8.9 is undefined. Default to use 128 Cores/SM
(24) Multiprocessors, (128) CUDA Cores/MP: 3072 CUDA Cores
GPU Max Clock rate: 2370 MHz (2.37 GHz)
Memory Clock rate: 8001 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 33554432 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: zu bytes
Total amount of shared memory per block: zu bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: zu bytes
Texture alignment: zu bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.1, CUDA Runtime Version = 11.6, NumDevs = 1, Device0 = NVIDIA GeForce RTX 4060 Laptop GPU
Result = PASS
At this point, CUDA and cudnn are configured.
Configuring the PaddlePaddle framework
After configuring CUDA, let's install the PaddlePaddle framework:
python -m pip install paddlepaddle-gpu==2.4.2.post116 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
Here we install the GPU build of PaddlePaddle, version 2.4.2.post116: 2.4.2 is the latest release, and the post116 suffix indicates CUDA 11.6. Be careful not to get the version wrong.
Then clone the PaddleGan project:
git clone https://gitee.com/PaddlePaddle/PaddleGAN
Then enter the cloned PaddleGAN directory and run the command to build and install the project locally:
pip install -v -e .
Then install other dependencies:
pip install -r requirements.txt
A few pitfalls here deserve explanation:
First, PaddleGAN depends on an older version of numpy and does not support the latest 1.24 release, so if your numpy version is 1.24 you need to uninstall it first:
pip uninstall numpy
Then install version 1.21:
pip install numpy==1.21
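To fail fast instead of waiting for PaddleGAN to crash at import time, the 1.24 ceiling can be checked up front. A sketch -- in a real script you would pass in numpy.__version__; the threshold comes from the note above:

```python
def numpy_compatible(version: str) -> bool:
    """PaddleGAN needs numpy older than 1.24 (per the note above)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (1, 24)

print(numpy_compatible("1.21.6"))  # True: the pinned version works
print(numpy_compatible("1.24.0"))  # False: too new for PaddleGAN
```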
Then verify in a Python terminal that PaddlePaddle is installed successfully:
import paddle
paddle.utils.run_check()
If this error is reported:
PreconditionNotMetError: The third-party dynamic library (cudnn64_7.dll) that Paddle depends on is not configured correctly. (error code is 126)
Suggestions:
1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
2. Configure third-party dynamic library environment variables as follows:
- Linux: set LD_LIBRARY_PATH by `export LD_LIBRARY_PATH=...`
- Windows: set PATH by `set PATH=XXX; (at ..\paddle\phi\backends\dynload\dynamic_loader.cc:305)
[operator < fill_constant > error]
You need to download the cudnn64_7.dll dynamic library and copy it into the bin directory of CUDA 11.6; the download address for the library is given at the end of this article.
Running the check again returns:
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
W0517 20:15:34.881800 31592 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.1, Runtime API Version: 11.6
W0517 20:15:34.889958 31592 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
That means everything is in place and the installation succeeded.
Local inference
Next, let's give Trump's song a moving picture. First, generate a static image of the "Knowing King" with Stable-Diffusion:
For Stable-Diffusion, see: "AI as master painter: building the Stable-Diffusion-WebUI AI painting library on all platforms (native/Docker) (Python 3.10/PyTorch 1.13.0)"; for reasons of space, it will not be repeated here.
Then enter the tools directory of the project:
\PaddleGAN\applications\tools>
Put Trump's static picture and the song file into the tools directory.
Then run the local inference command:
python .\wav2lip.py --face .\Trump.jpg --audio test.wav --outfile pp_put.mp4 --face_enhancement
Here --face is the target image, --audio is the song whose lip shapes need matching, and --outfile is the output video.
--face_enhancement: this optional flag enables face enhancement; if it is omitted, enhancement is off by default. Note that using this flag requires downloading a separate model file.
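To batch several songs, the command line above can be assembled programmatically and handed to subprocess.run. A sketch -- the flags are the ones shown above, the file names are placeholders:

```python
def wav2lip_cmd(face: str, audio: str, outfile: str, enhance: bool = True) -> list[str]:
    """Build the argument list for the wav2lip.py invocation shown above."""
    cmd = ["python", "wav2lip.py",
           "--face", face, "--audio", audio, "--outfile", outfile]
    if enhance:
        cmd.append("--face_enhancement")
    return cmd

print(wav2lip_cmd("Trump.jpg", "test.wav", "pp_put.mp4"))
```

The resulting list can then be run with, e.g., `subprocess.run(wav2lip_cmd(...), check=True)` from the tools directory.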
The key to Wav2Lip's breakthrough in precise lip sync with speech is that it uses a lip sync discriminator to force the generator to continuously produce accurate and realistic lip movements. Furthermore, it improves visual quality by using multiple consecutive frames instead of a single frame in the discriminator, and uses a visual quality loss (rather than just a contrastive loss) to account for temporal correlations.
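The idea of the sync discriminator can be illustrated with a toy calculation. This is not the real Wav2Lip code, only the principle: embed an audio window and a video window, compare the embeddings with cosine similarity, and penalize the generator when the similarity is low:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def sync_loss(audio_emb: list[float], video_emb: list[float]) -> float:
    """Cross-entropy on the clamped similarity: out-of-sync pairs cost more."""
    p = min(max(cosine_similarity(audio_emb, video_emb), 1e-7), 1.0 - 1e-7)
    return -math.log(p)

in_sync = sync_loss([1.0, 0.0], [1.0, 0.0])   # embeddings agree -> near-zero loss
off_sync = sync_loss([1.0, 0.0], [0.0, 1.0])  # orthogonal embeddings -> large loss
print(in_sync < off_sync)  # True
```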
Specific effect:
Epilogue
Sometimes the development of artificial intelligence (AI) really does make you feel the world has changed overnight: what you hear may not be true, and what you see may not be true either. Finally, the finished video can be found by searching on Bilibili (B站) for "Liu Yue's technical blog" -- everyone is welcome to have a look. All the installers and dynamic libraries involved in this article are available at:
https://pan.baidu.com/s/1-6NA2uAOSRlT4O0FGEKUGA?pwd=oo0d
Extraction code: oo0d