Open source TTS+gtx1080+cuda11.7+conda+python3.9 beats Baidu TTS

1. Introduction 

Open source project, generative audio model for text prompts

https://github.com/suno-ai/bark

 Bark is a transformer-based text-to-audio model created by Suno. Bark can generate extremely realistic multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce non-verbal communication such as laughter, sighs and cries. To support the research community, we provide pretrained model checkpoints, ready for inference, and available for commercial use.

2. Demo link:

https://pan.baidu.com/s/1O9_la6TBar75NfI1yut4Lg?pwd=utqg Extraction code: utqg 

3. Supported languages

Language Status
English (en)
German (de)
Spanish (es)
French (fr)
no (hi)
Italian (it)
Japanese (and)
Korean
Polish (pl)
Portuguese (pt)
Russian (ru)
Turkish (tr)
Chinese, simplified (zh)

Graphics card information

4. Installation steps

1.Install conda

2. Install python3.9

conda create --name brakAI python=3.9

3. Activate brakAI environment

conda activate barkAI

4.Install pytorc

conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia

5. Checked the version

import torch
print(torch.cuda.is_available())
print(torch.__version__)

 

6. Clone bark 

git clone https://github.com/suno-ai/bark
cd bark && pip install . 

7.Testing

from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav
from IPython.display import Audio

# download and load all models
preload_models()

# generate audio from text
text_prompt = """
    CSDN是全球知名中文IT技术交流平台,创建于1999年,包含原创博客、精品问答、职业培训、技术论坛、资源下载等产品服务,提供原创、优质、完整内容的专业IT技术开发社区.。
"""
audio_array = generate_audio(text_prompt)

# save audio to disk
write_wav("bark_generation22.wav", SAMPLE_RATE, audio_array)
  
# play text in notebook
Audio(audio_array, rate=SAMPLE_RATE)

The model file text_2.pt will be automatically downloaded, or you can download suno/bark at main yourself. 

Model path  bark/generation.py

 

 Change the temporary directory to the bark root directory and download the model file to this directory.

5. Services provided by the web version

 

backendmain.pyp

# -*- coding: utf-8 -*-
from flask import Flask, request, send_file, render_template_string ,jsonify
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav
import tempfile
import time
import os

app = Flask(__name__)

# 下载和加载所有模型
preload_models()

@app.route('/')
def index():
    return render_template_string(open('templates/index.html').read())

@app.route('/generate', methods=['POST'])
def generate():
    text_prompt = request.form.get('text')
    if text_prompt:
        text_prompt = request.form['text']
        audio_array = generate_audio(text_prompt)
        timestamp = str(int(time.time()))
        filename = timestamp + "times.wav"
        filepath = os.path.join('wavfile', filename)
        write_wav(filepath, SAMPLE_RATE, audio_array)
        file_url = '/wavfile/' + filename
        return jsonify({"file_url": file_url})
    else:
        return "No text provided!", 400

if __name__ == '__main__':
    app.run(host='0.0.0.0' ,debug=True)

 frontendindex.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Text to Audio</title>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.min.js" integrity="sha384-JZR6Spejh4U02d8jOt6vLEHfe/JQGiRRSQQxSfFWpi1MquVdAyjUar5+76PVCmYl" crossorigin="anonymous"></script>
</head>
<body>
    <div class="container mt-5">
        <h1>Text to Audio Converter By 3yuan 2023.8.22 23.15.00</h1>
        <div class="form-group">
            <label for="text">Enter your text:</label>
            <textarea class="form-control" id="text" rows="4" required></textarea>
        </div>
        <button id="convert" class="btn btn-primary">Convert</button>
                <div  class="mt-3"><a href="https://blog.csdn.net/jxyk2007/article/details/132425993?">Open Source TTS+gtx1080+cuda11.7+conda+python3.9 ,Beat Baidu TTS</a></div>
        <img id="loading" class="img-responsive mt-3" src="{
   
   { url_for('static', filename='loading.gif') }}" style="display: none;" alt="Loading...">
        <div id="result" class="mt-3"></div>
        <div id="result2" class="mt-3">

        </div>

    </div>
    <script src="https://code.jquery.com/jquery-3.3.1.min.js"></script>
    <script>
        $("#convert").click(function() {
            var text = $("#text").val();
            if (text) {
                $("#loading").show();
                $.post("/generate", { text: text }, function(data) {
                    $("#loading").hide();
                    var link = $('<a href="' + data.file_url + '" download="' + data.file_url + '">Download the audio file</a>');
                    $("#result").html(link);
                    var link2 = $(" <video src="+ data.file_url +" data-canonical-src="+ data.file_url + " controls='controls'   autoplay='autoplay' style='max-height:200px; min-height: 100px'></video>");
                    $("#result2").html(link2);



                });
            } else {
                alert("Please enter some text!");
            }
        });
    </script>
</body>
</html>

 

Download other models, convert text to language

Models - Hugging Face

おすすめ

転載: blog.csdn.net/jxyk2007/article/details/132425993