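Calling Windows' Built-in Speech Synthesis from Python and Generating WAV Files

Can Python talk? This article shows how to call the speech synthesizer that ships with Windows from Python, and how to save the synthesized speech to a WAV file.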

Installing the pythonnet library

Press Win+R, type cmd, and press Enter to open a command prompt. Then type pip install pythonnet and press Enter.

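(If the download is slow, you can point pip at a nearby PyPI mirror with its -i/--index-url option. pip uses pypi.org by default regardless of version, so upgrading pip alone will not switch mirrors.)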
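Before going further, it can help to confirm that the install worked. The sketch below only checks whether the clr module, which pythonnet provides, can be found by the current interpreter, and whether the script is running on Windows. The helper name pythonnet_available is made up for this example.

```python
import importlib.util
import sys


def pythonnet_available() -> bool:
    """Return True if the `clr` module installed by pythonnet can be located."""
    return importlib.util.find_spec("clr") is not None


# System.Speech is a Windows-only .NET library, so report the platform too.
print("Running on Windows:", sys.platform == "win32")
print("pythonnet importable:", pythonnet_available())
```

If the second line prints False, pip most likely installed the package into a different Python interpreter than the one running the script.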

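Calling the system speech synthesizer

The System.Speech library (a .NET assembly) contains the classes for calling the system speech synthesizer. Its SpeechSynthesizer class performs the synthesis and provides, among others, the following methods:

Speak(string): synthesizes speech for string, that is, reads string aloud.
SelectVoice(name): selects a voice by name; the voice must already be installed on the system. Note that a voice usually supports only one language: if you select an English voice and then Speak() Chinese text, you may hear nothing.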

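The steps are as follows:

Required files: 音频 · qfcy_ / Python · GitCode (click the "Clone" button, then download the source code)

Create a new .py file and enter the code below. Then put the downloaded System.Speech.dll in the same directory as the .py file. (The DLL also requires .NET Framework 4.0 to run.)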

import clr  # provided by pythonnet
clr.AddReference("System.Speech")  # loads System.Speech.dll
from System.Speech.Synthesis import SpeechSynthesizer

speak = SpeechSynthesizer()
speak.SelectVoice('Microsoft Zira Desktop')    # English voice
speak.Speak("Hello world")
speak.SelectVoice('Microsoft Huihui Desktop')  # Chinese voice
speak.Speak("中文")                             # Chinese text, read by the Chinese voice
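Because SelectVoice() only accepts voices that are installed, a script that hard-codes one name may fail on another machine. A minimal sketch of a fallback, assuming that selecting a missing voice raises an exception (it is caught broadly here rather than by its specific .NET type); the helper name select_first_available is hypothetical:

```python
def select_first_available(synth, preferred_names):
    """Select the first voice in preferred_names that the synthesizer accepts.

    Returns the selected name, or None if every candidate was rejected
    (the synthesizer then keeps whatever voice it already had).
    """
    for name in preferred_names:
        try:
            synth.SelectVoice(name)
        except Exception:  # assumption: an uninstalled voice raises an exception
            continue
        return name
    return None
```

With the speak object from the script above, select_first_available(speak, ['Microsoft Huihui Desktop', 'Microsoft Zira Desktop']) would try the Chinese voice first and fall back to the English one.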

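Choosing a voice

The previous section showed that SelectVoice() switches between voices. But which voices does the system actually have?

The GetInstalledVoices() method of the SpeechSynthesizer object returns all installed voices, including their names and other details. The name is what you pass to SelectVoice().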

import clr  # provided by pythonnet
clr.AddReference("System.Speech")  # loads System.Speech.dll
from System.Speech.Synthesis import SpeechSynthesizer

speak = SpeechSynthesizer()
voices = speak.GetInstalledVoices()  # note: a .NET collection, not a Python list
for v in voices:
    info = v.VoiceInfo  # pythonnet exposes .NET properties as attributes
    print("Name:", info.Name)
    print("Age:", info.Age)
    print("Gender:", info.Gender)
    print("Culture:", info.Culture)
    print("Description:", info.Description)
    print("Additional info:", info.AdditionalInfo)
    print("ID:", info.Id)
    print("Supported audio formats:", list(info.SupportedAudioFormats))
    print()

Output (partial):

Name: Microsoft Huihui Desktop
Age: Adult
Gender: Female
Culture: zh-CN
Description: Microsoft Huihui Desktop - Chinese (Simplified)
Additional info: System.Collections.ObjectModel.ReadOnlyDictionary`2[System.String,System.String]
ID: TTS_MS_ZH-CN_HUIHUI_11.0
Supported audio formats: []
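The culture reported for each voice can be used to avoid the language mismatch mentioned earlier. The following is a rough sketch, not a general language detector: it only distinguishes text containing Chinese characters from everything else, and it works on plain (name, culture) pairs so that it runs without Windows. The helper names contains_cjk and pick_voice are made up for this example, and the sample list simply uses the two voices that appear in this article.

```python
def contains_cjk(text):
    """Rough check: does text contain a CJK Unified Ideograph (U+4E00..U+9FFF)?"""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def pick_voice(voices, text):
    """Pick a voice name whose culture matches the text.

    voices is a list of (name, culture) pairs, for example built from
    VoiceInfo.Name and str(VoiceInfo.Culture). Returns None if nothing matches.
    """
    wanted = "zh" if contains_cjk(text) else "en"
    for name, culture in voices:
        if culture.lower().startswith(wanted):
            return name
    return None


# Sample data: the two voices used in this article.
voices = [
    ("Microsoft Zira Desktop", "en-US"),
    ("Microsoft Huihui Desktop", "zh-CN"),
]
print(pick_voice(voices, "Hello world"))  # Microsoft Zira Desktop
print(pick_voice(voices, "中文"))          # Microsoft Huihui Desktop
```

On a real system the list would be built from GetInstalledVoices(), and the returned name passed to SelectVoice() before calling Speak().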

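Synthesizing speech to a WAV file

The SetOutputToWaveFile() and SetOutputToDefaultAudioDevice() methods of the SpeechSynthesizer object control whether speech goes to a WAV file or to the computer's default audio device. This makes it easy to, for example, produce narration for videos.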

import clr  # provided by pythonnet
clr.AddReference("System.Speech")  # loads System.Speech.dll
from System.Speech.Synthesis import SpeechSynthesizer

speak = SpeechSynthesizer()
speak.SelectVoice('Microsoft Zira Desktop')
speak.SetOutputToWaveFile("output.wav")  # from here on, speech is written to the WAV file
speak.Speak("Hello world")
speak.SelectVoice('Microsoft Huihui Desktop')
speak.Speak("中文")
speak.SetOutputToDefaultAudioDevice()    # switch back to the speakers
speak.Speak("成功完成")                   # "Completed successfully", played aloud

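The result is a WAV file that contains the speech synthesized while file output was active. The final Speak() call, made after SetOutputToDefaultAudioDevice(), plays through the speakers instead.

To summarize, this article showed how to call the Windows built-in speech synthesizer from Python. Speech synthesis is already widely used in everyday applications.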
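The redirect-then-restore pattern can be wrapped in a small helper, and the resulting file checked with Python's standard wave module. This is a sketch under two assumptions: synth is a SpeechSynthesizer (or any object with the same three methods), and the file is written as plain PCM, which is what the wave module can read. The function names speak_to_wav and wav_duration are hypothetical.

```python
import wave


def speak_to_wav(synth, text, path):
    """Render text into a WAV file at path, then restore speaker output.

    The finally clause switches the output back even if Speak() fails,
    so the synthesizer does not stay attached to the file.
    """
    synth.SetOutputToWaveFile(path)
    try:
        synth.Speak(text)
    finally:
        synth.SetOutputToDefaultAudioDevice()


def wav_duration(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

On Windows, calling speak_to_wav(SpeechSynthesizer(), "Hello world", "hello.wav") and then wav_duration("hello.wav") should report a non-zero duration if the synthesis succeeded.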