Cloud alchemy on free compute: using the So-vits library on a cloud GPU (Colab) to make AI Trump sing "The Internationale"

AI technology has already reached every corner of daily life; just look at the endless stream of AI Stefanie Sun covers. But not everyone owns an NVIDIA card, and life without a GPU is always hard. No matter, there is a clever workaround. This time we will build a deep learning environment on Google's free Colab cloud server, create an AI Trump, and have him sing "The Internationale".

Colab (full name Colaboratory) is a free, cloud-based server product from Google that lets you write and execute Python code in the browser, which is very convenient. What's more, Colab can allocate a free GPU to its users. For those without an NVIDIA card, this goes far beyond industry goodwill; it is practically charity.

Configure Colab

Colab is built on Google Drive. We can store deep learning Python scripts, trained models, training sets, and other data directly in Drive, and then run them through Colab.

First, visit Google Drive: drive.google.com

Then click New and choose "Connect more apps":

Then install Colab:

Google Drive and Colab are now connected. We can create a new notebook file, my_sovits.ipynb, and type some code:

print("hello colab")

Then press Ctrl + Enter to run the code:

Note that Colab runs Python code in the ipynb format used by Jupyter Notebook.

A Jupyter Notebook opens as a web page: you write and run code directly on the page, and the output of each code block appears right beneath it. If you need to write documentation while programming, you can write it on the same page, which makes it easy to explain things as you go.

Then set the graphics card type (Runtime > Change runtime type):

Then run the following commands to check the CUDA toolkit version and the assigned GPU:

!/usr/local/cuda/bin/nvcc --version  
  
!nvidia-smi

The program returns:

nvcc: NVIDIA (R) Cuda compiler driver  
Copyright (c) 2005-2022 NVIDIA Corporation  
Built on Wed_Sep_21_10:33:58_PDT_2022  
Cuda compilation tools, release 11.8, V11.8.89  
Build cuda_11.8.r11.8/compiler.31833905_0  
Tue May 16 04:49:23 2023         
+-----------------------------------------------------------------------------+  
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |  
|-------------------------------+----------------------+----------------------+  
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |  
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |  
|                               |                      |               MIG M. |  
|===============================+======================+======================|  
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |  
| N/A   65C    P8    13W /  70W |      0MiB / 15360MiB |      0%      Default |  
|                               |                      |                  N/A |  
+-------------------------------+----------------------+----------------------+  
                                                                                 
+-----------------------------------------------------------------------------+  
| Processes:                                                                  |  
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |  
|        ID   ID                                                   Usage      |  
|=============================================================================|  
|  No running processes found                                                 |  
+-----------------------------------------------------------------------------+

We recommend choosing the Tesla T4 graphics card type here, which offers stronger performance.
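
The same check can be made from Python, which is handy when later cells should react to the GPU that was actually assigned. The following is a minimal sketch: the helper name query_gpus is hypothetical, and it assumes nvidia-smi is on the PATH whenever a GPU runtime is attached.

```python
import shutil
import subprocess

def query_gpus():
    """Return one 'name, total memory' string per visible GPU, or [] if none is visible."""
    # Assumption: nvidia-smi is on PATH whenever a GPU runtime is attached.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = query_gpus()
print(gpus if gpus else "No GPU visible; check the runtime type")
```

On a runtime like the one shown above, the list should contain a single entry naming the Tesla T4. An empty list usually means the runtime is still set to CPU.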

Colab is now configured.

Configure So-vits

Next, we configure the so-vits environment. We can install some basic dependencies with pip:

!pip install pyworld==0.3.2  
!pip install numpy==1.23.5

Note that in a Jupyter notebook, a leading exclamation mark runs the line as a shell command.

Also note that because this is not a local environment, Colab will sometimes print a warning:

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/  
Collecting numpy==1.23.5  
  Downloading numpy-1.23.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)  
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 80.1 MB/s eta 0:00:00  
Installing collected packages: numpy  
  Attempting uninstall: numpy  
    Found existing installation: numpy 1.22.4  
    Uninstalling numpy-1.22.4:  
      Successfully uninstalled numpy-1.22.4  
Successfully installed numpy-1.23.5  
WARNING: The following packages were previously imported in this runtime:  
  [numpy]  
You must restart the runtime in order to use newly installed versions.

At this point, the runtime must be restarted before the newly installed numpy version can be imported.

After restarting the runtime, run the install command again until pip reports that the requirement is already satisfied:

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/  
Requirement already satisfied: numpy==1.23.5 in /usr/local/lib/python3.10/dist-packages (1.23.5)
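
The restart is needed because the running Python process keeps the old numpy module in memory even after pip has replaced the files on disk. A small sketch of how to detect that situation is shown below. The function name needs_restart is hypothetical, and the sketch assumes that the import name and the pip distribution name are the same, which holds for numpy but not for every package.

```python
import sys
from importlib import metadata

def needs_restart(package):
    """True if `package` is already imported with a version other than the installed one."""
    module = sys.modules.get(package)
    if module is None:
        return False  # not imported yet: a fresh import picks up the installed version
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False  # nothing installed under this name, so there is nothing to compare
    # Modules without a __version__ attribute are treated as up to date.
    return getattr(module, "__version__", installed) != installed

print(needs_restart("numpy"))
```

If this prints True after a pip install, restart the runtime before importing the package again.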

Then clone the so-vits project and install its dependencies:

import os  
import glob  
!git clone https://github.com/effusiveperiscope/so-vits-svc -b eff-4.0  
os.chdir('/content/so-vits-svc')  
# install requirements one-at-a-time to ignore exceptions  
!cat requirements.txt | xargs -n 1 pip install --extra-index-url https://download.pytorch.org/whl/cu117  
!pip install praat-parselmouth  
!pip install ipywidgets  
!pip install huggingface_hub  
!pip install pip==23.0.1 # fix pip version for fairseq install  
!pip install fairseq==0.12.2  
!jupyter nbextension enable --py widgetsnbextension  
existing_files = glob.glob('/content/**/*.*', recursive=True)  
!pip install --upgrade protobuf==3.9.2  
!pip uninstall -y tensorflow  
!pip install tensorflow==2.11.0
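
The xargs line above installs each requirement with its own pip call, so one package that fails to build does not abort the rest. A rough Python equivalent is sketched below. The name install_one_by_one and its runner parameter are hypothetical and exist only for illustration, and unlike xargs -n 1, which splits on whitespace, this version expects a list of requirement strings.

```python
import subprocess
import sys

def install_one_by_one(requirements, extra_index_url=None, runner=subprocess.call):
    """Install each requirement separately; return the ones whose install failed."""
    failed = []
    for req in requirements:
        req = req.strip()
        if not req or req.startswith("#"):
            continue  # skip blank lines and comments
        cmd = [sys.executable, "-m", "pip", "install", req]
        if extra_index_url:
            cmd += ["--extra-index-url", extra_index_url]
        if runner(cmd) != 0:  # a non-zero exit code means pip failed for this package
            failed.append(req)
    return failed

# Dry run with a fake runner that pretends "badpkg" fails; nothing is installed.
fake_runner = lambda cmd: 1 if "badpkg" in cmd else 0
print(install_one_by_one(["numpy==1.23.5", "# comment", "badpkg"], runner=fake_runner))
```

In the notebook, you would pass the lines of requirements.txt and the cu117 index URL used above, and leave runner at its default so that pip is actually invoked.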

After installing the dependencies, define some helper functions:

os.chdir('/content/so-vits-svc') # force working-directory to so-vits-svc - this line is just for safety and is probably not required  
  
import tarfile  
import os  
from zipfile import ZipFile  
# taken from https://github.com/CookiePPP/cookietts/blob/master/CookieTTS/utils/dataset/extract_unknown.py  
def extract(path):  
    if path.endswith(".zip"):  
        with ZipFile(path, 'r') as zipObj:  
           zipObj.extractall(os.path.split(path)[0])  
    elif path.endswith(".tar.bz2"):  
        tar = tarfile.open(path, "r:bz2")  
        tar.extractall(os.path.split(path)[0])  
        tar.close()  
    elif path.endswith(".tar.gz"):  
        tar = tarfile.open(path, "r:gz")  
        tar.extractall(os.path.split(path)[0])  
        tar.close()  
    elif path.endswith(".tar"):  
        tar = tarfile.open(path, "r:")  
        tar.extractall(os.path.split(path)[0])  
        tar.close()  
    elif path.endswith(".7z"):  
        import py7zr  
        archive = py7zr.SevenZipFile(path, mode='r')  
        archive.extractall(path=os.path.split(path)[0])  
        archive.close()  
    else:  
        raise NotImplementedError(f"{path} extension not implemented.")  
  
# taken from https://github.com/CookiePPP/cookietts/tree/master/CookieTTS/_0_download/scripts  
  
# megatools download urls  
win64_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-win64.zip"  
win32_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-win32.zip"  
linux_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-linux-x86_64.tar.gz"  
# download megatools  
from sys import platform  
import os  
import urllib.request  
import subprocess  
from time import sleep  
  
if platform == "linux" or platform == "linux2":  
    dl_url = linux_url  
elif platform == "darwin":  
    raise NotImplementedError('MacOS not supported.')  
elif platform == "win32":  
    dl_url = win64_url  
else:  
    raise NotImplementedError('Unknown Operating System.')  
  
dlname = dl_url.split("/")[-1]  
if dlname.endswith(".zip"):  
    binary_folder = dlname[:-4] # remove .zip  
elif dlname.endswith(".tar.gz"):  
    binary_folder = dlname[:-7] # remove .tar.gz  
else:  
    raise NameError('downloaded megatools has unknown archive file extension!')  
  
if not os.path.exists(binary_folder):  
    print('"megatools" not found. Downloading...')  
    if not os.path.exists(dlname):  
        urllib.request.urlretrieve(dl_url, dlname)  
    assert os.path.exists(dlname), 'failed to download.'  
    extract(dlname)  
    sleep(0.10)  
    os.unlink(dlname)  
    print("Done!")  
  
  
binary_folder = os.path.abspath(binary_folder)  
  
def megadown(download_link, filename='.', verbose=False):  
    """Use megatools binary executable to download files and folders from MEGA.nz ."""  
    filename = ' --path "'+os.path.abspath(filename)+'"' if filename else ""  
    wd_old = os.getcwd()  
    os.chdir(binary_folder)  
    try:  
        if platform == "linux" or platform == "linux2":  
            subprocess.call(f'./megatools dl{filename}{" --debug http" if verbose else ""} {download_link}', shell=True)  
        elif platform == "win32":  
            subprocess.call(f'megatools.exe dl{filename}{" --debug http" if verbose else ""} {download_link}', shell=True)  
    finally:  
        os.chdir(wd_old) # always go back to the original directory, even if the user interrupts the download  
    return filename  
  
import urllib.request  
from tqdm import tqdm  
import gdown  
from os.path import exists  
  
def request_url_with_progress_bar(url, filename):  
    class DownloadProgressBar(tqdm):  
        def update_to(self, b=1, bsize=1, tsize=None):  
            if tsize is not None:  
                self.total = tsize  
            self.update(b * bsize - self.n)  
      
    def download_url(url, filename):  
        with DownloadProgressBar(unit='B', unit_scale=True,  
                                 miniters=1, desc=url.split('/')[-1]) as t:  
            filename, headers = urllib.request.urlretrieve(url, filename=filename, reporthook=t.update_to)  
            print("Downloaded to "+filename)  
    download_url(url, filename)  
  
  
def download(urls, dataset='', filenames=None, force_dl=False, username='', password='', auth_needed=False):  
    assert filenames is None or len(urls) == len(filenames), f"number of urls does not match filenames. Expected {len(filenames)} urls, containing the files listed below.\n{filenames}"  
    assert not auth_needed or (len(username) and len(password)), f"username and password needed for {dataset} Dataset"  
    if filenames is None:  
        filenames = [None,]*len(urls)  
    for i, (url, filename) in enumerate(zip(urls, filenames)):  
        print(f"Downloading File from {url}")  
        #if filename is None:  
        #    filename = url.split("/")[-1]  
        if filename and (not force_dl) and exists(filename):  
            print(f"{filename} Already Exists, Skipping.")  
            continue  
        if 'drive.google.com' in url:  
            assert 'https://drive.google.com/uc?id=' in url, 'Google Drive links should follow the format "https://drive.google.com/uc?id=1eQAnaoDBGQZldPVk-nzgYzRbcPSmnpv6".\nWhere id=XXXXXXXXXXXXXXXXX is the Google Drive Share ID.'  
            gdown.download(url, filename, quiet=False)  
        elif 'mega.nz' in url:  
            megadown(url, filename)  
        else:  
            #urllib.request.urlretrieve(url, filename=filename) # no progress bar  
            request_url_with_progress_bar(url, filename) # with progress bar  
  
import huggingface_hub  
import os  
import shutil  
  
class HFModels:  
    def __init__(self, repo = "therealvul/so-vits-svc-4.0",   
            model_dir = "hf_vul_models"):  
        self.model_repo = huggingface_hub.Repository(local_dir=model_dir,  
            clone_from=repo, skip_lfs_files=True)  
        self.repo = repo  
        self.model_dir = model_dir  
  
        self.model_folders = os.listdir(model_dir)  
        self.model_folders.remove('.git')  
        self.model_folders.remove('.gitattributes')  
  
    def list_models(self):  
        return self.model_folders  
  
    # Downloads model;  
    # copies config to target_dir and moves model to target_dir  
    def download_model(self, model_name, target_dir):  
        if model_name not in self.model_folders:  
            raise Exception(model_name + " not found")  
        model_dir = self.model_dir  
        charpath = os.path.join(model_dir,model_name)  
  
        gen_pt = next(x for x in os.listdir(charpath) if x.startswith("G_"))  
        cfg = next(x for x in os.listdir(charpath) if x.endswith("json"))  
        try:  
          clust = next(x for x in os.listdir(charpath) if x.endswith("pt"))  
        except StopIteration as e:  
          print("Note - no cluster model for "+model_name)  
          clust = None  
  
        if not os.path.exists(target_dir):  
            os.makedirs(target_dir, exist_ok=True)  
  
        gen_dir = huggingface_hub.hf_hub_download(repo_id = self.repo,  
            filename = model_name + "/" + gen_pt) # this is a symlink  
          
        if clust is not None:  
          clust_dir = huggingface_hub.hf_hub_download(repo_id = self.repo,  
              filename = model_name + "/" + clust) # this is a symlink  
          shutil.move(os.path.realpath(clust_dir), os.path.join(target_dir, clust))  
          clust_out = os.path.join(target_dir, clust)  
        else:  
          clust_out = None  
  
        shutil.copy(os.path.join(charpath,cfg),os.path.join(target_dir, cfg))  
        shutil.move(os.path.realpath(gen_dir), os.path.join(target_dir, gen_pt))  
  
        return {"config_path": os.path.join(target_dir,cfg),  
            "generator_path": os.path.join(target_dir,gen_pt),  
            "cluster_path": clust_out}  
  
# Example usage  
# vul_models = HFModels()  
# print(vul_models.list_models())  
# print("Applejack (singing)" in vul_models.list_models())  
# vul_models.download_model("Applejack (singing)","models/Applejack (singing)")  
  
print("Finished!")

These helpers let us download, extract, and load models.
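
For the archive formats covered by the standard library, the dispatch-by-extension idea in extract can be sketched more compactly with shutil.unpack_archive. This is an alternative illustration and not the code used above: it does not handle .7z, and extract_next_to is a hypothetical name.

```python
import os
import shutil
import tempfile
import zipfile

def extract_next_to(path):
    """Unpack a .zip, .tar, .tar.gz or .tar.bz2 archive into the folder that contains it."""
    target_dir = os.path.dirname(os.path.abspath(path))
    # unpack_archive picks the format from the file extension and raises
    # shutil.ReadError for extensions it does not know (for example .7z).
    shutil.unpack_archive(path, target_dir)
    return target_dir

# Self-contained demo: build a tiny zip in a temporary folder, then extract it.
demo_dir = tempfile.mkdtemp()
archive = os.path.join(demo_dir, "demo.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hello.txt", "hello colab")
extract_next_to(archive)
print(os.path.exists(os.path.join(demo_dir, "hello.txt")))
```

The hand-written version above remains the better choice when .7z archives must be supported, since that format needs the third-party py7zr package.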

Voice model download and online inference

Next, download Trump's voice (timbre) model and its configuration file from:

https://huggingface.co/Nardicality/so-vits-svc-4.0-models/tree/main/Trump18.5k

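Then place the model file and the configuration file together in a subfolder of the project's models folder (for example models/Trump18.5k). The inference code below looks for a G_*.pth file and a config JSON in each folder under models and skips folders where either one is missing.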

Then upload the songs you want to convert to /content, the directory that contains the project folder.
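
Since the get_speakers function in the code below silently skips any folder under models that lacks a G_*.pth file or a config JSON, it can help to verify the layout before launching the GUI. The following is a minimal sketch; check_model_layout is a hypothetical helper, and the file names in the demo are placeholders rather than the real model files.

```python
import glob
import os
import tempfile

def check_model_layout(models_dir="models"):
    """Map each speaker folder to the list of things it is missing (empty list = ready)."""
    report = {}
    for folder in sorted(os.listdir(models_dir)):
        folder_path = os.path.join(models_dir, folder)
        if not os.path.isdir(folder_path):
            continue
        missing = []
        if not glob.glob(os.path.join(folder_path, "G_*.pth")):
            missing.append("G_*.pth")
        if not glob.glob(os.path.join(folder_path, "*.json")):
            missing.append("config json")
        report[folder] = missing
    return report

# Demo on a throwaway folder with one complete and one empty speaker folder.
demo_models = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_models, "complete"))
os.makedirs(os.path.join(demo_models, "incomplete"))
for name in ("G_0.pth", "config.json"):
    open(os.path.join(demo_models, "complete", name), "w").close()
print(check_model_layout(demo_models))
```

In the notebook, call check_model_layout("models") from /content/so-vits-svc; every folder should map to an empty list before you continue.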

Run the code:

import os  
import glob  
import json  
import copy  
import logging  
import io  
from ipywidgets import widgets  
from pathlib import Path  
from IPython.display import Audio, display  
  
os.chdir('/content/so-vits-svc')  
  
import torch  
from inference import infer_tool  
from inference import slicer  
from inference.infer_tool import Svc  
import soundfile  
import numpy as np  
  
MODELS_DIR = "models"  
  
def get_speakers():  
  speakers = []  
  for _,dirs,_ in os.walk(MODELS_DIR):  
    for folder in dirs:  
      cur_speaker = {}  
      # Look for G_****.pth  
      g = glob.glob(os.path.join(MODELS_DIR,folder,'G_*.pth'))  
      if not len(g):  
        print("Skipping "+folder+", no G_*.pth")  
        continue  
      cur_speaker["model_path"] = g[0]  
      cur_speaker["model_folder"] = folder  
  
      # Look for *.pt (clustering model)  
      clst = glob.glob(os.path.join(MODELS_DIR,folder,'*.pt'))  
      if not len(clst):  
        print("Note: No clustering model found for "+folder)  
        cur_speaker["cluster_path"] = ""  
      else:  
        cur_speaker["cluster_path"] = clst[0]  
  
      # Look for config.json  
      cfg = glob.glob(os.path.join(MODELS_DIR,folder,'*.json'))  
      if not len(cfg):  
        print("Skipping "+folder+", no config json")  
        continue  
      cur_speaker["cfg_path"] = cfg[0]  
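      with open(cur_speaker["cfg_path"]) as f:  
        try:  
          cfg_json = json.loads(f.read())  
        except Exception as e:  
          print("Malformed config json in "+folder)  
          continue # skip this folder instead of using an undefined or stale cfg_json  
        for name, i in cfg_json["spk"].items():  
          cur_speaker["name"] = name  
          cur_speaker["id"] = i  
          if not name.startswith('.'):  
            speakers.append(copy.copy(cur_speaker))  
  
    # returning inside the os.walk loop means only the top level of MODELS_DIR is scanned  
    return sorted(speakers, key=lambda x:x["name"].lower())  
  return speakers # reached only if os.walk yields nothing, e.g. MODELS_DIR does not exist  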
  
logging.getLogger('numba').setLevel(logging.WARNING)  
chunks_dict = infer_tool.read_temp("inference/chunks_temp.json")  
existing_files = []  
slice_db = -40  
wav_format = 'wav'  
  
class InferenceGui():  
  def __init__(self):  
    self.speakers = get_speakers()  
    self.speaker_list = [x["name"] for x in self.speakers]  
    self.speaker_box = widgets.Dropdown(  
        options = self.speaker_list  
    )  
    display(self.speaker_box)  
  
    def convert_cb(btn):  
      self.convert()  
    def clean_cb(btn):  
      self.clean()  
  
    self.convert_btn = widgets.Button(description="Convert")  
    self.convert_btn.on_click(convert_cb)  
    self.clean_btn = widgets.Button(description="Delete all audio files")  
    self.clean_btn.on_click(clean_cb)  
  
    self.trans_tx = widgets.IntText(value=0, description='Transpose')  
    self.cluster_ratio_tx = widgets.FloatText(value=0.0,   
      description='Clustering Ratio')  
    self.noise_scale_tx = widgets.FloatText(value=0.4,   
      description='Noise Scale')  
    self.auto_pitch_ck = widgets.Checkbox(value=False, description=  
      'Auto pitch f0 (do not use for singing)')  
  
    display(self.trans_tx)  
    display(self.cluster_ratio_tx)  
    display(self.noise_scale_tx)  
    display(self.auto_pitch_ck)  
    display(self.convert_btn)  
    display(self.clean_btn)  
  
  def convert(self):  
    trans = int(self.trans_tx.value)  
    speaker = next(x for x in self.speakers if x["name"] ==   
          self.speaker_box.value)  
    spkpth2 = os.path.join(os.getcwd(),speaker["model_path"])  
    print(spkpth2)  
    print(os.path.exists(spkpth2))  
  
    svc_model = Svc(speaker["model_path"], speaker["cfg_path"],   
      cluster_model_path=speaker["cluster_path"])  
      
    input_filepaths = [f for f in glob.glob('/content/**/*.*', recursive=True)  
     if f not in existing_files and   
     any(f.endswith(ex) for ex in ['.wav','.flac','.mp3','.ogg','.opus'])]  
    for name in input_filepaths:  
      print("Converting "+os.path.split(name)[-1])  
      infer_tool.format_wav(name)  
  
      wav_path = str(Path(name).with_suffix('.wav'))  
      wav_name = Path(name).stem  
      chunks = slicer.cut(wav_path, db_thresh=slice_db)  
      audio_data, audio_sr = slicer.chunks2audio(wav_path, chunks)  
  
      audio = []  
      for (slice_tag, data) in audio_data:  
          print(f'#=====segment start, '  
              f'{round(len(data)/audio_sr, 3)}s======')  
            
          length = int(np.ceil(len(data) / audio_sr *  
              svc_model.target_sample))  
            
          if slice_tag:  
              print('jump empty segment')  
              _audio = np.zeros(length)  
          else:  
              # Padding "fix" for noise  
              pad_len = int(audio_sr * 0.5)  
              data = np.concatenate([np.zeros([pad_len]),  
                  data, np.zeros([pad_len])])  
              raw_path = io.BytesIO()  
              soundfile.write(raw_path, data, audio_sr, format="wav")  
              raw_path.seek(0)  
              _cluster_ratio = 0.0  
              if speaker["cluster_path"] != "":  
                _cluster_ratio = float(self.cluster_ratio_tx.value)  
              out_audio, out_sr = svc_model.infer(  
                  speaker["name"], trans, raw_path,  
                  cluster_infer_ratio = _cluster_ratio,  
                  auto_predict_f0 = bool(self.auto_pitch_ck.value),  
                  noice_scale = float(self.noise_scale_tx.value))  
              _audio = out_audio.cpu().numpy()  
              pad_len = int(svc_model.target_sample * 0.5)  
              _audio = _audio[pad_len:-pad_len]  
          audio.extend(list(infer_tool.pad_array(_audio, length)))  
            
      res_path = os.path.join('/content/',  
          f'{wav_name}_{trans}_key_'  
          f'{speaker["name"]}.{wav_format}')  
      soundfile.write(res_path, audio, svc_model.target_sample,  
          format=wav_format)  
      display(Audio(res_path, autoplay=True)) # display audio file  
    pass  
  
  def clean(self):  
     input_filepaths = [f for f in glob.glob('/content/**/*.*', recursive=True)  
     if f not in existing_files and   
     any(f.endswith(ex) for ex in ['.wav','.flac','.mp3','.ogg','.opus'])]  
     for f in input_filepaths:  
       os.remove(f)  
  
inference_gui = InferenceGui()

At this point, the system automatically searches /content and its subfolders for audio files with the extensions wav, flac, mp3, ogg, and opus, and then runs inference on them with the downloaded model. Before inference, each file is converted to wav and sliced at silent passages (the slice_db threshold), and every non-silent slice is padded with half a second of silence on both sides to reduce noise.
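
The file search used by convert and clean can be tried on its own. The sketch below mirrors the glob pattern and extension list from the code above; find_audio_inputs is a hypothetical name and the demo runs on a temporary folder instead of /content.

```python
import glob
import os
import tempfile

AUDIO_EXTENSIONS = ('.wav', '.flac', '.mp3', '.ogg', '.opus')

def find_audio_inputs(root, existing_files=()):
    """Recursively list audio files under root, skipping paths in existing_files."""
    pattern = os.path.join(root, '**', '*.*')
    skip = set(existing_files)
    return sorted(f for f in glob.glob(pattern, recursive=True)
                  if f not in skip and f.endswith(AUDIO_EXTENSIONS))

# Demo: one song to convert, one text file, and one older file listed as existing.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "sub"))
for rel in ("song.mp3", "notes.txt", os.path.join("sub", "old.wav")):
    open(os.path.join(demo_root, rel), "w").close()
already_there = [os.path.join(demo_root, "sub", "old.wav")]
found = find_audio_inputs(demo_root, already_there)
print([os.path.basename(p) for p in found])
```

Because existing_files is reset to an empty list in the inference code, nothing under /content is excluded there, so results written to /content are matched as inputs again the next time Convert is clicked. Using the "Delete all audio files" button between runs avoids this.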

After inference finishes, the converted song is played automatically.

Conclusion

If you are just starting out with Colab, the GPU allocated by default has about 15 GB of video memory, which is enough for most training and inference tasks. However, if you often leave long-running jobs unattended, the GPU allocated to you will gradually be downgraded. Long-running, relatively stable GPU resources still require a paid Colab Pro subscription. In addition, the free Google Drive quota is also 15 GB. If you download too many models, Drive will run out of space and the code will raise errors, so it is best to clean up Google Drive regularly to keep deep learning tasks running smoothly.
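
To decide what to clean up, it helps to see which folders take the most space. The following is a minimal sketch; folder_sizes is a hypothetical helper, and using it on Drive assumes that Drive has been mounted in the notebook, typically under /content/drive/MyDrive, which this article does not do. The demo therefore runs on a temporary folder.

```python
import os
import tempfile

def folder_sizes(root):
    """Return (size_in_bytes, path) for each direct subfolder of root, largest first."""
    sizes = []
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _, filenames in os.walk(entry.path):
            for name in filenames:
                file_path = os.path.join(dirpath, name)
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
        sizes.append((total, entry.path))
    return sorted(sizes, reverse=True)

# Demo: two folders of different size inside a temporary directory.
demo_drive = tempfile.mkdtemp()
for folder, n_bytes in (("big", 2048), ("small", 10)):
    os.makedirs(os.path.join(demo_drive, folder))
    with open(os.path.join(demo_drive, folder, "data.bin"), "wb") as fh:
        fh.write(b"\0" * n_bytes)
for size, path in folder_sizes(demo_drive):
    print(f"{size:>8} bytes  {os.path.basename(path)}")
```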

Origin blog.csdn.net/zcxey2911/article/details/130707037