Calling the Tencent Speech Synthesis API from Python

1. Install the Tencent Cloud SDK

pip install tencentcloud-sdk-python -i https://mirrors.cloud.tencent.com/pypi/simple/

The SDK is published as tencentcloud-sdk-python and imported as tencentcloud. Note that the package index is specified explicitly here as the Tencent mirror, https://mirrors.cloud.tencent.com/pypi/simple/. Without it, the installation is likely to fail.
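
A quick way to confirm that the installation worked is to check whether the interpreter can locate the package. The sketch below assumes the SDK is importable under the module name tencentcloud (the name used by the imports in section 3.1); sdk_installed is a hypothetical helper, not part of the SDK.

```python
import importlib.util


def sdk_installed(module_name: str = "tencentcloud") -> bool:
    """Return True if the named top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if sdk_installed():
        print("Tencent Cloud SDK found")
    else:
        print("Tencent Cloud SDK not found; re-run the pip command above")
```

If the check fails inside a virtual environment, make sure pip and python belong to the same environment.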

2. Activate the Tencent speech service

2.1 Log in to the Tencent Cloud platform

Address: https://cloud.tencent.com/
In the main menu, select [Product] | [Artificial Intelligence and Machine Learning] | [Speech Synthesis], then claim the free resource pack.
The first time you claim it, you can synthesize 8 million characters of speech for free.

2.2 Generate SecretKey

Go to [Cloud Resource Management] | [Access Management] and click [New Key] on the API key management page.
You will get three values: APPID, SecretId and SecretKey.
Write them down; they go into the configuration file in 3.2.
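
Anyone who holds the SecretId and SecretKey can act on the account, so it can help to keep them out of source files altogether. One option, sketched below, is to read them from environment variables. The variable names TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY and the helper load_keys_from_env are assumptions made for this example; the rest of this article uses an .ini file instead.

```python
import os
from typing import Mapping, Tuple


def load_keys_from_env(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read SecretId and SecretKey from environment variables.

    The variable names are an assumption for this sketch; use whatever
    naming fits your deployment.
    """
    secret_id = env.get("TENCENTCLOUD_SECRET_ID", "")
    secret_key = env.get("TENCENTCLOUD_SECRET_KEY", "")
    if not secret_id or not secret_key:
        raise RuntimeError(
            "TENCENTCLOUD_SECRET_ID / TENCENTCLOUD_SECRET_KEY are not set"
        )
    return secret_id, secret_key
```

The returned pair can be passed to Credential(secret_id, secret_key) in place of the values read from the configuration file.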

3. Write code

3.1 Import the SDK packages

# -*- coding:utf-8 -*-
import json, uuid
import base64
# Speech synthesis client
from tencentcloud.tts.v20190823.tts_client import TtsClient
# Speech synthesis request model
from tencentcloud.tts.v20190823.models import TextToVoiceRequest
# Tencent Cloud exception handling
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
# Configuration file parsing
from configparser import ConfigParser
# Authentication
from tencentcloud.common.credential import Credential
from tencentcloud.common.profile.client_profile import ClientProfile
from tencentcloud.common.profile.http_profile import HttpProfile

3.2 Configuration file tcloud_auth.ini

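Store the key information obtained in 2.2 in this file. AppId must be an integer, because the code reads it with getint.

# User authentication parameters
# Test account
[authorization]
AppId=YourAppId
SecretId=YourSecretId
SecretKey=YourSecretKey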

[expired]
ExpiredTime=3600

3.3 Call the interface to generate voice

Create a class voice_generation. Its method text_to_voice synthesizes text into speech and writes the audio data to a file.

The code is as follows:

auth_file_path = "./voice/conf/tcloud_auth.ini"

class voice_generation():
    def __init__(self) -> None:
        conf = ConfigParser()
        conf.read(auth_file_path)
        self.appid = conf.getint("authorization","AppId")
        self.secretId = conf.get("authorization", "SecretId")
        self.secretKey = conf.get("authorization", "SecretKey")

    def text_to_voice(self,text):
        try:
            # Instantiate a credential object from the Tencent Cloud account's SecretId and SecretKey.
            # Keep the key pair confidential: leaked code may expose the SecretId and SecretKey and
            # threaten the security of every resource under the account. This example is for reference
            # only; for a more secure way to handle keys, see:
            # https://cloud.tencent.com/document/product/1278/85305
            # Keys can be obtained from the console: https://console.cloud.tencent.com/cam/capi
            cred = Credential(self.secretId, self.secretKey)
            # Instantiate an HTTP profile (optional; skip it if you have no special requirements)
            httpProfile = HttpProfile()
            httpProfile.endpoint = "tts.tencentcloudapi.com"

            # Instantiate a client profile (optional; skip it if you have no special requirements)
            clientProfile = ClientProfile()
            clientProfile.httpProfile = httpProfile
            # Instantiate the client for the requested product; clientProfile is optional
            client = TtsClient(cred, "ap-shenzhen-fsi", clientProfile)

            # Instantiate a request object; every API has a corresponding request object
            req = TextToVoiceRequest()
            sessionid = uuid.uuid4().hex
            params = {
                "Text": text,
                "SessionId": sessionid,
                "Volume": 0,
                "Speed": 0,
                "ProjectId": 0,
                "ModelType": 1,
                "VoiceType": 1009,
                "PrimaryLanguage": 1,
                "SampleRate": 16000,
                "Codec": "mp3",
                "SegmentRate": 0,
                "EmotionCategory": "neutral",
                "EmotionIntensity": 100
            }
            req.from_json_string(json.dumps(params))

            # resp is an instance of TextToVoiceResponse, corresponding to the request object
            resp = client.TextToVoice(req)
            # Print the request ID (needed when troubleshooting)
            print(resp.RequestId)
            # Audio is a base64 string; encode it to bytes before decoding
            audio = resp.Audio.encode()
            file_name = f"{sessionid}.mp3"
            # The static/voice directory must already exist
            file_path = f"static/voice/{file_name}"
            with open(file_path, "wb") as f:
                f.write(base64.decodebytes(audio))
            return file_name
        except TencentCloudSDKException as err:
            print(err)
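
Note that text_to_voice returns the file name on success and, because the except branch only prints the error, None on failure. A caller might handle both cases as in the sketch below. synthesize_all is a hypothetical helper, and FakeGenerator is a stand-in used only so the example runs without network access; in real use you would pass a voice_generation() instance.

```python
from typing import Iterable, List, Optional, Tuple


def synthesize_all(generator, texts: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Call generator.text_to_voice for each text.

    Returns (file_names, failed_texts); a None result counts as a failure.
    """
    file_names: List[str] = []
    failed: List[str] = []
    for text in texts:
        result: Optional[str] = generator.text_to_voice(text)
        if result is None:
            failed.append(text)
        else:
            file_names.append(result)
    return file_names, failed


class FakeGenerator:
    """Stand-in for voice_generation: fails on empty text, otherwise returns a name."""

    def text_to_voice(self, text: str) -> Optional[str]:
        return f"{len(text)}.mp3" if text else None


if __name__ == "__main__":
    ok, failed = synthesize_all(FakeGenerator(), ["hello", ""])
    print(ok, failed)
```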

3.4 Initialization function

3.4.1 Read parameters

Load the configuration file created in 3.2 and read AppId, SecretId and SecretKey from it into instance variables.

    def __init__(self) -> None:
        conf = ConfigParser()
        conf.read(auth_file_path)
        self.appid = conf.getint("authorization","AppId")
        self.secretId = conf.get("authorization", "SecretId")
        self.secretKey = conf.get("authorization", "SecretKey")
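
ConfigParser.read does not raise an error when the file is missing; it returns the list of files it managed to parse, and the failure only shows up later as a NoSectionError. A slightly more defensive variant is sketched below. load_auth is a hypothetical helper name, and the error message is illustrative.

```python
from configparser import ConfigParser
from typing import Tuple


def load_auth(path: str) -> Tuple[int, str, str]:
    """Return (AppId, SecretId, SecretKey) from the [authorization] section of an .ini file."""
    conf = ConfigParser()
    # read() returns the list of files that were parsed successfully.
    if not conf.read(path, encoding="utf-8"):
        raise FileNotFoundError(f"cannot read configuration file: {path}")
    app_id = conf.getint("authorization", "AppId")
    secret_id = conf.get("authorization", "SecretId")
    secret_key = conf.get("authorization", "SecretKey")
    return app_id, secret_id, secret_key
```

Failing early with the file path in the message makes a wrong working directory much easier to spot than a missing-section error.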

3.5 Description of important parameters

3.5.1 Create authentication information

    cred = Credential(self.secretId, self.secretKey)

3.5.2 Interface address

tts.tencentcloudapi.com is the endpoint of the Tencent speech synthesis API.

    httpProfile = HttpProfile()
    httpProfile.endpoint = "tts.tencentcloudapi.com"

3.5.3 Speech Synthesis Parameters

            req = TextToVoiceRequest()
            sessionid = uuid.uuid4().hex
            params = {
                "Text": text,
                "SessionId": sessionid,
                "Volume": 0,
                "Speed": 0,
                "ProjectId": 0,
                "ModelType": 1,
                "VoiceType": 1009,
                "PrimaryLanguage": 1,
                "SampleRate": 16000,
                "Codec": "mp3",
                "SegmentRate": 0,
                "EmotionCategory": "neutral",
                "EmotionIntensity": 100
            }
            req.from_json_string(json.dumps(params))

Required parameters:

| Parameter | Value |
|-----------|-------|
| Text | Text to be converted to speech |
| SessionId | A string identifying the request; it is returned as-is in the response |
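
Since only Text and SessionId are required, the parameter dictionary can be built by a small helper that always sets those two and lets the caller override the rest. This is a minimal sketch: build_params is a hypothetical name, and the default values are simply copied from the example in this article.

```python
import json
import uuid
from typing import Any, Dict, Optional


def build_params(text: str, session_id: Optional[str] = None, **overrides: Any) -> Dict[str, Any]:
    """Build the TextToVoice parameter dict; Text and SessionId are always set."""
    if not text or not text.strip():
        raise ValueError("Text must not be empty")
    params: Dict[str, Any] = {
        "Text": text,
        "SessionId": session_id or uuid.uuid4().hex,
        # Optional values copied from the example in this article.
        "VoiceType": 1009,
        "Codec": "mp3",
        "SampleRate": 16000,
    }
    params.update(overrides)
    return params


if __name__ == "__main__":
    print(json.dumps(build_params("Hello", Speed=1), ensure_ascii=False))
```

The result can be passed to req.from_json_string(json.dumps(params)) exactly as in the code above.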

3.6 Output voice file

The speech synthesis interface returns the synthesized speech as a base64-encoded string, so the data must be base64-decoded before it is written to a file.

3.6.1 Generate voice and save it as a voice file

Code:

    # resp is an instance of TextToVoiceResponse, corresponding to the request object
    resp = client.TextToVoice(req)
    # Print the request ID (needed when troubleshooting)
    print(resp.RequestId)
    # Audio is returned as a string, so encode it to bytes first
    audio = resp.Audio.encode()
    file_name = f"{sessionid}.mp3"
    file_path = f"static/voice/{file_name}"
    with open(file_path, "wb") as f:
        f.write(base64.decodebytes(audio))
    return file_name
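
The decode-and-save step can also be isolated into a small function, which makes it easy to test without calling the API. The sketch below is one possible version: save_audio is a hypothetical helper, and unlike the code above it creates the output directory when it is missing.

```python
import base64
from pathlib import Path


def save_audio(audio_b64: str, session_id: str, out_dir: str = "static/voice") -> Path:
    """Decode base64 audio and write it to <out_dir>/<session_id>.mp3."""
    target_dir = Path(out_dir)
    # Create the output directory if needed; open() alone fails when it is missing.
    target_dir.mkdir(parents=True, exist_ok=True)
    file_path = target_dir / f"{session_id}.mp3"
    file_path.write_bytes(base64.b64decode(audio_b64))
    return file_path
```

With a real response this would be called as save_audio(resp.Audio, sessionid).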

3.6.2 Return data structure description

| Parameter name | Type | Description |
|----------------|------|-------------|
| Audio | String | Base64-encoded audio data |
| SessionId | String | Each request corresponds to one SessionId |
| Subtitles | Array of Subtitle | Timestamp information; if timestamps are not enabled, an empty array is returned |
| RequestId | String | Unique request ID, returned for every request. Provide it when troubleshooting a problem |
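
To inspect a response without dumping the whole audio payload, the fields above can be reduced to a short summary. The sketch below assumes the response has been serialized to a JSON string whose keys match the table (for example with the response object's to_json_string method, if your SDK version provides it); summarize_response is a hypothetical helper.

```python
import base64
import json
from typing import Any, Dict


def summarize_response(resp_json: str) -> Dict[str, Any]:
    """Summarize a TextToVoice response given as a JSON string."""
    data = json.loads(resp_json)
    audio_bytes = base64.b64decode(data.get("Audio") or "")
    return {
        "session_id": data.get("SessionId"),
        "request_id": data.get("RequestId"),
        "audio_size": len(audio_bytes),
        # Subtitles is empty (or absent) when timestamps are not enabled.
        "subtitle_count": len(data.get("Subtitles") or []),
    }
```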

4. References

Tencent speech synthesis API documentation: https://cloud.tencent.com/document/product/1073/37995

Origin blog.csdn.net/all_night_in/article/details/131206522