AI Voice - Character Voice Training

Recap

2023-07-02, Sunday, Hangzhou: cloudy turning sunny

Getting started with AI comes down to three major topics: the basics of AI painting, AI speech synthesis, and AI conversational (dialogue) training. I have now reached the speech synthesis stage, and I'm rubbing my hands together with excitement. For a tone-deaf person like me, this is simply a blessing (no longer just low-brow fun, though I admit to keeping a little mischief);

** Timeline **
a. 2021: Function Compute programming competition;
b. 2022: "East Data, West Computing" architecture design;
c. 2023: the era of artificial intelligence;

Practice manual

1. Install UVR_v5.5.0
2. Vocal extraction

2.a Mix separation

2.b Dry vocal extraction

2.c Extract your own voice

Gripe: Bilibili is really unfriendly here. Uploaded videos cannot be downloaded directly, so you have to use third-party tools to get them. Fortunately, the video is only needed to extract the voice, but the wasted time is still annoying.

Bilibili video parser: https://bilibili.iiilab.com/

** Prepare your own voice and video resources **
Duration: 10-30 min
Clip: 3-10
Notes: I did not go to a recording studio to record audio specifically for this, so the final result will probably be unsatisfying; treat it as a practice run. Fortunately, I had saved some video material earlier; otherwise I would have been really out of luck this time.
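The 10-30 minute target above can be checked before slicing. A minimal sketch, assuming the extracted vocals have already been exported as PCM `.wav` files into a single folder; the folder layout and the function names `wav_duration`, `total_minutes`, and `enough_material` are hypothetical, and only Python's standard `wave` module is used to read the headers.

```python
import wave
from pathlib import Path


def wav_duration(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def total_minutes(folder: str) -> float:
    """Sum the durations of all .wav files directly inside `folder`, in minutes."""
    return sum(wav_duration(p) for p in Path(folder).glob("*.wav")) / 60.0


def enough_material(folder: str, lo_min: float = 10.0, hi_min: float = 30.0) -> bool:
    """True if the total material falls inside the suggested 10-30 minute window."""
    return lo_min <= total_minutes(folder) <= hi_min
```

For example, `enough_material("my_voice")` (a hypothetical folder name) tells you whether to go back and collect more material before moving on to slicing.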

2.d Audio slicing

Note: Rename the audio files (using the same naming as the BAT script).
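To make the slicing and renaming step concrete, here is a minimal sketch that cuts one long WAV into consecutive fixed-length clips and names them `1.wav`, `2.wav`, and so on. This is an assumption for illustration, not the author's BAT script: the tool used in the original workflow may cut at silences rather than at fixed intervals, and the function name `slice_wav`, the 8-second default, and the numeric naming scheme are all hypothetical.

```python
import wave
from pathlib import Path
from typing import List


def slice_wav(src: str, out_dir: str, clip_seconds: float = 8.0) -> List[Path]:
    """Cut `src` into consecutive fixed-length clips named 1.wav, 2.wav, ...

    A trailing remainder shorter than one full clip is dropped.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with wave.open(src, "rb") as r:
        params = r.getparams()
        frames_per_clip = int(clip_seconds * r.getframerate())
        if frames_per_clip <= 0:
            raise ValueError("clip_seconds is too short for this sample rate")
        bytes_per_frame = params.sampwidth * params.nchannels
        index = 1
        while True:
            data = r.readframes(frames_per_clip)
            # readframes returns raw bytes; convert back to a frame count.
            if len(data) // bytes_per_frame < frames_per_clip:
                break
            path = out / f"{index}.wav"
            with wave.open(str(path), "wb") as w:
                w.setparams(params)  # same channels, width, and rate as the source
                w.writeframes(data)  # the header's frame count is patched on write
            written.append(path)
            index += 1
    return written
```

Fixed-length cutting keeps the sketch dependency-free, but it can split a word in half; a silence-based slicer usually yields cleaner clips for training.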

3. Vocal training

3.a Vocal training dataset

3.b Start so-vits-svc

Double-click webui.bat to start it.

3.c Data preprocessing

Note: During data preprocessing, watch out for GPU memory (VRAM). You can get preprocessing to pass by switching the predictor. This general configuration only works on GPUs with at least 8 GB of VRAM; otherwise, data preprocessing will fail;

Solution: Delete audio clips longer than 20 s or shorter than 10 s. Once data preprocessing has completed, continue with the training step. A normal run should end as shown in the figure below:
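The duration filter in the solution above can be scripted. A minimal sketch, assuming the clips are PCM `.wav` files; the function names `clip_seconds`, `split_by_duration`, and `move_rejected` are hypothetical, and instead of deleting out-of-range clips outright, this version moves them to a separate folder so nothing is lost if a threshold turns out to be wrong.

```python
import shutil
import wave
from pathlib import Path
from typing import Iterable, List, Tuple


def clip_seconds(path: Path) -> float:
    """Duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def split_by_duration(
    paths: Iterable[Path], min_s: float, max_s: float
) -> Tuple[List[Path], List[Path]]:
    """Partition clips into (keep, reject) by whether min_s <= duration <= max_s."""
    keep: List[Path] = []
    reject: List[Path] = []
    for p in map(Path, paths):
        (keep if min_s <= clip_seconds(p) <= max_s else reject).append(p)
    return keep, reject


def move_rejected(reject: Iterable[Path], reject_dir: str) -> None:
    """Move rejected clips into `reject_dir` instead of deleting them."""
    dest = Path(reject_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for p in reject:
        shutil.move(str(p), str(dest / p.name))
```

With the thresholds from the text, you would call `split_by_duration(paths, min_s=10, max_s=20)`, move the rejected clips aside, and then rerun preprocessing.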

3.d Timbre training

Note: Training takes roughly 1-7 days. That is mainly because my computing power is only average; it is not that the computer is terrible, AI simply demands a lot of compute. Fortunately, once the timbre model is trained, it can be applied to any song, so it is worth letting timbre training run for at least about a week. It is best not to shut down the computer during this process; if you do shut it down, training resumes from where it last left off;

4. Training monitoring

Open the monitoring panel:

Summary

There are a lot of pitfalls, and you will fall into them if you are not careful. This is not something you can learn by talking about it; you have to do it yourself, but after going through the whole process once you can basically master it. I am not that interested in digging into the code or the GPU memory fragmentation and reclamation mechanism. There is still plenty of room for optimization, but there is no rush this time, and it cannot be hurried anyway. For students studying artificial intelligence: although graduate-level labs are generally equipped with computing hardware that can be reimbursed, it is best to keep your own computer's performance up to date as well. AI really is not a poor man's game. A gaming laptop at around 20,000 yuan is only an entry-level configuration, but you still have to spend the money. If your equipment is one tier lower, you may fall a year or even several years behind others. Stop talking about talent: if you are not in the same environment, you won't even get the chance to compete.

Appendix

Open-source MP3 downloads: https://tonzhon.com/playlists/6442733bc6d7bdf6d5155629
Video to audio: https://www.aigei.com/tool/video/audio
Audio conversion: https://app.xunjieshipin.com/mp3-to-wav/
Bilibili video parser: https://bilibili.iiilab.com/


Origin blog.csdn.net/weixin_36532747/article/details/131544851