Reading text aloud with the Microsoft TTS speech engine

TTS stands for text-to-speech: a TTS engine converts text into speech output. TTS engines include the Microsoft TTS speech engine, the iFLYTEK speech engine, and others. For the iFLYTEK TTS SDK, see http://www.xfyun.cn/sdk/dispatcher

This article mainly describes how to use the Microsoft TTS speech engine to read text aloud and to generate sound files in WAV format.

1. Installing the speech engine and voice library

The Microsoft TTS speech engine provides the Windows Speech SDK development kit for programmers. The Windows Speech SDK contains two kinds of engine: a speech synthesis (SS) engine and a speech recognition (SR) engine. The speech synthesis engine converts text into speech output, and the speech recognition engine recognizes spoken commands.

The Windows Speech SDK can be downloaded for free on Microsoft's official website at: http://www.microsoft.com/download/en/details.aspx?id=10121

On the download page, choose SpeechSDK51.exe, SpeechSDK51LangPack.exe and sapi.chm. The files offered there are:

SpeechSDK51.exe: the speech synthesis engine.

SpeechSDK51LangPack.exe: the voice library; required for Japanese and Simplified Chinese support.

sapi.chm: the help documentation.

speechsdk51MSM.exe: merge modules for integrating the speech engine into your own product and shipping it with the product. It unpacks into three folders, 1033, 1041 and 2052: 1033 mainly holds the .msm files for English TTS and SR, 1041 the .msm files for Japanese SR, and 2052 the .msm files for Chinese TTS and SR.

Sp5TTintXP.exe: the Mike and Mary voices for XP.

After the download completes, install the speech engine SpeechSDK51.exe first, and then the Chinese voice library SpeechSDK51LangPack.exe.

Three versions of the Windows Speech SDK are currently the most commonly used: 5.1, 5.3, and 5.4.

Windows Speech SDK 5.1 supports Windows XP and Server 2003 and must be downloaded and installed. By default, XP ships with only the Microsoft Sam English male voice. If you want a Chinese engine, you need to install Windows Speech SDK 5.1.

Windows Speech SDK 5.3 supports Vista and Server 2008 and is built into the system. Vista and Server 2008 come with the Microsoft Lili Chinese female voice and the Microsoft Anna English female voice by default.

Windows Speech SDK 5.4 supports Windows 7 and is also built into the system, so there is nothing to download or install. Windows 7 likewise ships with the Microsoft Lili Chinese female voice and the Microsoft Anna English female voice. Microsoft Lili can read mixed Chinese and English text.

2. Using the SAPI interface

1) Implementing the basic reading process

Before using the speech engine, initialize COM and obtain the ISpVoice interface:

       ISpVoice *pSpVoice = NULL; // The key COM interface
       ::CoInitialize(NULL);      // Initialize COM
       // Obtain the ISpVoice interface
       CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_INPROC_SERVER, IID_ISpVoice, (void**)&pSpVoice);

Once the ISpVoice interface has been obtained, the SAPI methods can be called through the pSpVoice pointer.

To set the volume, call pSpVoice->SetVolume(80);. The argument of SetVolume is the volume, in the range 0 to 100.

To read a string aloud, call pSpVoice->Speak(string, SPF_DEFAULT, NULL);. The second argument, SPF_DEFAULT, selects the default behavior, which includes synchronous reading; pass SPF_ASYNC for asynchronous reading. With synchronous reading, Speak returns only after the whole string has been read aloud. With asynchronous reading, Speak queues the string and returns immediately without blocking.

When you have finished with the speech engine, execute:

       pSpVoice->Release();
       ::CoUninitialize();

This releases the resources and ends the reading process.

The above completes a simple speech synthesis function.
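One way to make sure the cleanup above always runs, and in the right order, is a small scope guard. The following is only a sketch under stated assumptions: ScopeExit and RunSpeechSession are hypothetical names that do not come from the SDK, and the COM calls are replaced by log entries so that the example runs anywhere. In a real program the guarded actions would be pSpVoice->Release() and ::CoUninitialize().

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: runs the stored action when the scope is left.
class ScopeExit {
public:
    explicit ScopeExit(std::function<void()> fn) : fn_(std::move(fn)) {}
    ~ScopeExit() { if (fn_) fn_(); }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;
private:
    std::function<void()> fn_;
};

// Simulates the init / speak / cleanup sequence and records each step in log.
// The strings stand in for the real calls; no SAPI function is invoked here.
void RunSpeechSession(bool failEarly, std::vector<std::string>& log)
{
    log.push_back("CoInitialize");
    ScopeExit comGuard([&log] { log.push_back("CoUninitialize"); });

    log.push_back("CoCreateInstance");
    ScopeExit voiceGuard([&log] { log.push_back("Release"); });

    if (failEarly)
        return;  // both guards still run on this path

    log.push_back("Speak");
}
```

Because guards are destroyed in reverse order of declaration, the voice is released before COM is uninitialized, even on the early-return path.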

2) Member functions of ISpVoice

鸡啄米 (Jizhuomi) briefly explains several member functions of the ISpVoice interface:

       HRESULT Speak(LPCWSTR pwcs, DWORD dwFlags, ULONG *pulStreamNumber);

Reads the contents of the string pwcs aloud. pwcs is the string to be read. dwFlags is a set of flags that control how it is read; for their meanings, see the SPEAKFLAGS enumeration in the documentation. pulStreamNumber is an output parameter that receives the input stream number assigned to this reading request. Each call that reads a string returns a stream number, which is useful with asynchronous reading.

       HRESULT SetRate(long RateAdjust);    // Set the reading speed, range: -10 to 10
       HRESULT GetRate(long *pRateAdjust);  // Get the reading speed
       HRESULT SetVoice(ISpObjectToken *pToken);    // Set the voice to use
       HRESULT GetVoice(ISpObjectToken **ppToken);  // Get the current voice
       HRESULT Pause(void);   // Pause reading
       HRESULT Resume(void);  // Resume reading
       // Skip forward or backward in the text being read by the absolute value of lNumItems sentences, according to the sign of lNumItems.
       HRESULT Skip(LPCWSTR pItemType, long lNumItems, ULONG *pulNumSkipped);
       // Speak the contents of a stream, for example play a WAV file
       HRESULT SpeakStream(IStream *pStream, DWORD dwFlags, ULONG *pulStreamNumber);
       // Set the output object, for example a stream bound to a WAV file
       HRESULT SetOutput(IUnknown *pUnkOutput, BOOL fAllowFormatChanges);
       HRESULT SetVolume(USHORT usVolume);    // Set the volume, range: 0 to 100
       HRESULT GetVolume(USHORT *pusVolume);  // Get the volume
       HRESULT SetSyncSpeakTimeout(ULONG msTimeout);    // Set the synchronous reading timeout, in milliseconds
       HRESULT GetSyncSpeakTimeout(ULONG *pmsTimeout);  // Get the synchronous reading timeout
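
The ranges noted in the comments above (-10 to 10 for SetRate, 0 to 100 for SetVolume) are easy to violate when the values come from user input. A minimal sketch of two helpers that bring a requested value into range before it is passed on; ClampVolume and ClampRate are hypothetical names, not part of SAPI:

```cpp
#include <algorithm>

// Keep a requested volume inside the 0..100 range noted for SetVolume.
unsigned short ClampVolume(int requested)
{
    return static_cast<unsigned short>(std::min(100, std::max(0, requested)));
}

// Keep a requested rate inside the -10..10 range noted for SetRate.
long ClampRate(long requested)
{
    return std::min(10L, std::max(-10L, requested));
}
```

A call site could then read pSpVoice->SetVolume(ClampVolume(userValue)); and pSpVoice->SetRate(ClampRate(userRate));, where userValue and userRate are whatever the application collected.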

Because Speak blocks during synchronous reading, it will keep waiting if the audio output device is held by another program. It is therefore best to set a timeout, after which Speak returns on its own.

3) Reading aloud with XML

XML can be used in TTS development: SAPI parses XML tags, so some of what the ISpVoice member functions do, such as selecting the voice or setting the volume and speech rate, can also be done through XML. In that case the dwFlags parameter of Speak must include SPF_IS_XML. For example:

       // Select the voice Microsoft Sam
       pSpVoice->Speak(L"<VOICE REQUIRED='NAME=Microsoft Sam'/>鸡啄米", SPF_DEFAULT | SPF_IS_XML, NULL);
       // Set the volume
       <VOLUME LEVEL='90'>鸡啄米</VOLUME>
       // Set the language
       <lang langid='804'>鸡啄米</lang>

804 stands for Chinese and 409 for English. When the language is obtained with the function SpGetLanguageFromToken, 0x804 indicates Chinese and 0x409 indicates English.
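
The langid attribute is written in hexadecimal: 804 is LANGID 0x804 (2052 in decimal, Simplified Chinese) and 409 is 0x409 (1033 in decimal, US English), the same numbers as the 2052 and 1033 folder names mentioned in section 1. The sketch below assembles the three kinds of markup shown above from plain values. WithVoice, WithVolume and WithLanguage are hypothetical helper names, and the code only builds strings; it makes no SAPI calls and does not validate or escape the text.

```cpp
#include <sstream>
#include <string>

// Prefix text with a tag that selects a voice by name.
std::wstring WithVoice(const std::wstring& voiceName, const std::wstring& text)
{
    return L"<VOICE REQUIRED='NAME=" + voiceName + L"'/>" + text;
}

// Wrap text in a VOLUME tag with the given level.
std::wstring WithVolume(int level, const std::wstring& text)
{
    std::wostringstream out;
    out << L"<VOLUME LEVEL='" << level << L"'>" << text << L"</VOLUME>";
    return out.str();
}

// Wrap text in a lang tag; the decimal LANGID is written as hexadecimal.
std::wstring WithLanguage(unsigned int langId, const std::wstring& text)
{
    std::wostringstream out;
    out << L"<lang langid='" << std::hex << langId << L"'>" << text << L"</lang>";
    return out.str();
}
```

The result would be passed to Speak together with SPF_IS_XML, for example pSpVoice->Speak(WithLanguage(2052, text).c_str(), SPF_DEFAULT | SPF_IS_XML, NULL);.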

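4) Setting up SAPI notification messages

While reading, SAPI sends messages to a specified window. On receiving a message, the window can retrieve the pending SAPI events, and from the different events the program can learn about SAPI's current state, for example the position of the word being read or the current viseme (mouth-shape) value, which is used to animate a mouth (this event is not provided for Chinese voices). To receive SAPI notifications, first register a message:

       m_cpVoice->SetNotifyWindowMessage(hWnd, WM_TTSAPPCUSTOMEVENT, 0, 0);

This call is usually made when the main window is initialized. hWnd is the handle of the main window (or of whichever window receives the message), and WM_TTSAPPCUSTOMEVENT is a user-defined message. In the function that handles WM_TTSAPPCUSTOMEVENT, retrieve the SAPI notification events with code like this:

       CSpEvent event;  // This class is more convenient than the SPEVENT structure

       while (event.GetFrom(m_cpVoice) == S_OK)
       {
           switch (event.eEventId)
           {
               ...
           }
       }

There are many eEventId values; for example, SPEI_START_INPUT_STREAM indicates that reading has started and SPEI_END_INPUT_STREAM that it has finished. Check for whichever ones you need.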

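5) Speech recognition with the Speech SDK: recognizing speech and producing English, Chinese, or other text strings.

For details, see this article: http://blog.csdn.net/artemisrj/article/details/8723095

3. Programming example

1) First, add the paths of the Windows Speech SDK header files and library files to the compiler settings.


2) Wrap the TTS operations in a class.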

//TextToSpeech.h

//tts

#pragma once

#include <sapi.h>	//TTS speech engine header and library
#include <sphelper.h>
#include <string.h>
#pragma comment(lib, "sapi.lib")

class TextToSpeech
{
public:
	TextToSpeech(void);
	virtual ~TextToSpeech(void);

	int Init();
	int UnInit();

	//Enumerate all voice tokens
	int EnumAudioToken(CString arrayVoicePackageName[],int nVoicePackageNameCount);

	//Create the SpVoice object
	int CreateSpVoice();
	//Release the SpVoice object
	int DeleteSpVoice();
	//Reset the SpVoice object (used to discard pending reading data)
	int ResetSpVoice();

	//Set the reading speed (range: -10 to 10)
	int  SetRate( long RateAdjust);
	//Get the reading speed
	int  GetRate(long *pRateAdjust);

	//Set the voice to use
	int  SetVoice(ISpObjectToken   *pToken);
	//Get a voice by index
	int  GetVoice(unsigned int nIndex,ISpObjectToken** ppToken);

	//Set the volume (range: 0 to 100)
	int  SetVolume(USHORT usVolume);
	//Get the volume
	int  GetVolume(USHORT *pusVolume);

	//Read aloud
	int Speak(CString strContent,DWORD dwFlags=SPF_DEFAULT);
	//Read aloud into a WAV file
	int SpeakToWaveFile(CString strContent,char *pFilePathName,DWORD dwFlags=SPF_DEFAULT);
	//Pause reading
	int Pause();
	//Resume reading
	int Resume();
	//Skip part of the reading
	int Skip(CString strItemType="Sentence",long lNumItems=65535, ULONG *pulNumSkipped=NULL);

protected:
	IEnumSpObjectTokens * m_pIEnumSpObjectTokens;
	ISpObjectToken * m_pISpObjectToken;
	ISpVoice * m_pISpVoice;
	BOOL m_bComInit;
};


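//TextToSpeech.cpp

#include "StdAfx.h"
#include "TextToSpeech.h"

TextToSpeech::TextToSpeech(void)
{
	m_pIEnumSpObjectTokens  = NULL;
	m_pISpObjectToken = NULL;
	m_pISpVoice = NULL;
	m_bComInit = FALSE;
}

TextToSpeech::~TextToSpeech(void)
{
}

int TextToSpeech::Init()
{
	//Initialize COM (COINIT_MULTITHREADED is the value 0 used originally)
	if(FAILED(::CoInitializeEx(NULL, COINIT_MULTITHREADED)))
	{
		//MessageBox("Failed to initialize COM!", "Notice", MB_OK|MB_ICONWARNING);
		return -1;
	}

	m_bComInit = TRUE;
	return 0;
}

int TextToSpeech::UnInit()
{
	//Release the COM objects this class owns before shutting COM down
	DeleteSpVoice();
	if(m_pIEnumSpObjectTokens != NULL)
	{
		m_pIEnumSpObjectTokens->Release();
		m_pIEnumSpObjectTokens = NULL;
	}

	if(m_bComInit)
	{
		::CoUninitialize();
		m_bComInit = FALSE;
	}

	return 0;
}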

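int TextToSpeech::EnumAudioToken(CString arrayVoicePackageName[],int nVoicePackageNameCount)
{
	//Release the enumerator left over from a previous call
	if(m_pIEnumSpObjectTokens != NULL)
	{
		m_pIEnumSpObjectTokens->Release();
		m_pIEnumSpObjectTokens = NULL;
	}

	//Enumerate all voice tokens
	if(SUCCEEDED(SpEnumTokens(SPCAT_VOICES, NULL, NULL, &m_pIEnumSpObjectTokens)))
	{
		//Get the number of voice tokens
		ULONG ulTokensNumber = 0;
		m_pIEnumSpObjectTokens->GetCount(&ulTokensNumber);

		//Check whether any voice is installed on this machine
		if(ulTokensNumber == 0)
		{
			//MessageBox("No voice is installed on this machine!", "Notice", MB_OK|MB_ICONWARNING);
			return -1;
		}
		if(nVoicePackageNameCount < 0 || ulTokensNumber > (ULONG)nVoicePackageNameCount)
		{
			//The caller's array is too small
			return 0;
		}

		//Store the voice names in the array
		CString strVoicePackageName = _T("");
		CString strTokenPrefixText = _T("HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech\\Voices\\Tokens\\");
		for(ULONG i=0; i<ulTokensNumber; i++)
		{
			ISpObjectToken* pToken = NULL;
			WCHAR* pChar = NULL;
			if(SUCCEEDED(m_pIEnumSpObjectTokens->Item(i, &pToken)) && SUCCEEDED(pToken->GetId(&pChar)))
			{
				strVoicePackageName = pChar;
				strVoicePackageName.Delete(0, strTokenPrefixText.GetLength());
				arrayVoicePackageName[i] = strVoicePackageName;
				::CoTaskMemFree(pChar);	//GetId allocates the string; the caller frees it
			}
			if(pToken != NULL)
				pToken->Release();	//Item returns a reference that must be released
		}

		return ulTokensNumber;
	}

	return -1;
}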

//Create the SpVoice object
int TextToSpeech::CreateSpVoice()
{
	//Obtain the ISpVoice interface
	if(FAILED(CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_INPROC_SERVER, IID_ISpVoice, (void**)&m_pISpVoice)))
	{
		//MessageBox("Failed to obtain the ISpVoice interface!", "Notice", MB_OK|MB_ICONWARNING);
		return -1;
	}

	return 0;
}
//Release the SpVoice object
int TextToSpeech::DeleteSpVoice()
{
	if(m_pISpVoice != NULL)
	{
		m_pISpVoice->Release();
	}
	m_pISpVoice = NULL;

	return 0;
}
//Reset the SpVoice object
int TextToSpeech::ResetSpVoice()
{
	DeleteSpVoice();
	return CreateSpVoice();
}

//Set the reading speed (range: -10 to 10)
int  TextToSpeech::SetRate( long RateAdjust)
{
	if(m_pISpVoice == NULL)
		return -1;

	return SUCCEEDED(m_pISpVoice->SetRate(RateAdjust)) ? 0 : -1;
}
//Get the reading speed
int  TextToSpeech::GetRate(long *pRateAdjust)
{
	if(m_pISpVoice == NULL)
		return -1;

	return SUCCEEDED(m_pISpVoice->GetRate(pRateAdjust)) ? 0 : -1;
}

//Set the voice to use
int  TextToSpeech::SetVoice(ISpObjectToken  *pToken)
{
	if(m_pISpVoice == NULL)
		return -1;

	return SUCCEEDED(m_pISpVoice->SetVoice(pToken)) ? 0 : -1;
}
//Get a voice by index (EnumAudioToken must have been called first)
int  TextToSpeech::GetVoice(unsigned int nIndex,ISpObjectToken** ppToken)
{
	if(m_pIEnumSpObjectTokens == NULL || ppToken == NULL)
		return -1;

	//Fetch the token at nIndex; the caller owns the returned reference
	if(FAILED(m_pIEnumSpObjectTokens->Item(nIndex, ppToken)))
		return -1;
	m_pISpObjectToken = *ppToken;
	return 0;
}

//Set the volume (range: 0 to 100)
int  TextToSpeech::SetVolume(USHORT usVolume)
{
	if(m_pISpVoice == NULL)
		return -1;

	return SUCCEEDED(m_pISpVoice->SetVolume(usVolume)) ? 0 : -1;
}
//Get the volume
int  TextToSpeech::GetVolume(USHORT *pusVolume)
{
	if(m_pISpVoice == NULL)
		return -1;

	return SUCCEEDED(m_pISpVoice->GetVolume(pusVolume)) ? 0 : -1;
}

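//Read aloud
int TextToSpeech::Speak(CString strContent, DWORD dwFlags)
{
	if(m_pISpVoice == NULL)
		return -1;

	//AllocSysString returns a BSTR that the caller must free
	BSTR bstrContent = strContent.AllocSysString();
	HRESULT hResult = m_pISpVoice->Speak(bstrContent, dwFlags, NULL);
	::SysFreeString(bstrContent);

	return SUCCEEDED(hResult) ? 0 : -1;
}
//Read aloud into a WAV file
int TextToSpeech::SpeakToWaveFile(CString strContent,char *pFilePathName,DWORD dwFlags)
{
	if(m_pISpVoice == NULL || pFilePathName == NULL)
		return -1;

	//Bind a stream to the WAV file, reusing the voice's current output format
	CComPtr<ISpStream> cpISpStream;
	CComPtr<ISpStreamFormat> cpISpStreamFormat;
	CSpStreamFormat spStreamFormat;
	m_pISpVoice->GetOutputStream(&cpISpStreamFormat);
	spStreamFormat.AssignFormat(cpISpStreamFormat);
	HRESULT hResult = SPBindToFile(pFilePathName, SPFM_CREATE_ALWAYS, 
		&cpISpStream, &spStreamFormat.FormatId(), spStreamFormat.WaveFormatExPtr());
	if(FAILED(hResult))
	{
		//MessageBox("Failed to generate the WAV file!", "Notice", MB_OK|MB_ICONWARNING);
		return 1;
	}

	//Route the output to the file and speak
	m_pISpVoice->SetOutput(cpISpStream, TRUE);
	BSTR bstrContent = strContent.AllocSysString();
	hResult = m_pISpVoice->Speak(bstrContent, dwFlags, NULL);
	::SysFreeString(bstrContent);

	//Wait in case dwFlags contains SPF_ASYNC, then close the file
	//and switch back to the default audio output
	m_pISpVoice->WaitUntilDone(INFINITE);
	cpISpStream->Close();
	m_pISpVoice->SetOutput(NULL, FALSE);

	//MessageBox("WAV file generated successfully!", "Notice", MB_OK);
	return SUCCEEDED(hResult) ? 0 : 1;
}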

//Pause reading
int TextToSpeech::Pause()
{
	if(m_pISpVoice != NULL)
	{
		m_pISpVoice->Pause();
	}

	return 0;
}
//Resume reading
int TextToSpeech::Resume()
{
	if(m_pISpVoice != NULL)
	{
		m_pISpVoice->Resume();
	}

	return 0;
}
//Skip part of the reading
int TextToSpeech::Skip(CString strItemType,long lNumItems, ULONG *pulNumSkipped)
{
	if(m_pISpVoice == NULL || strItemType.GetLength() == 0)
		return -1;

	//AllocSysString returns a BSTR that the caller must free
	BSTR bstrItemType = strItemType.AllocSysString();
	HRESULT hResult = m_pISpVoice->Skip(bstrItemType, lNumItems, pulNumSkipped);
	::SysFreeString(bstrItemType);
	return SUCCEEDED(hResult) ? 0 : -1;
}
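
EnumAudioToken above derives each voice name by deleting a fixed number of characters from the front of the token ID, which assumes that every ID really starts with the registry prefix. A slightly more defensive variant checks for the prefix first. The sketch below uses std::wstring instead of CString so that it is self-contained; VoiceNameFromTokenId is a hypothetical helper name, and the IDs used with it are example values only.

```cpp
#include <string>

// Return the part of a token ID that follows the voices registry prefix,
// or the ID unchanged when it does not start with that prefix.
std::wstring VoiceNameFromTokenId(const std::wstring& tokenId)
{
    static const std::wstring kPrefix =
        L"HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech\\Voices\\Tokens\\";

    if (tokenId.compare(0, kPrefix.size(), kPrefix) == 0)
        return tokenId.substr(kPrefix.size());
    return tokenId;
}
```

With this check, an ID stored under a different registry path is passed through untouched instead of being truncated at an arbitrary position.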


3) Example calling code.

	TextToSpeech ttsSpeech;
	ttsSpeech.Init();
	CString arrayVoicePackageName[50];
	int nVoicePackageNameCount = 50;
	int nCount = ttsSpeech.EnumAudioToken(arrayVoicePackageName,nVoicePackageNameCount);
	ttsSpeech.CreateSpVoice();
	ISpObjectToken* pToken = NULL;
	ttsSpeech.GetVoice(0,&pToken);
	ttsSpeech.SetVoice(pToken);
	if(pToken != NULL)
		pToken->Release();	//the voice keeps its own reference to the token
	ttsSpeech.SetRate(0);
	ttsSpeech.SetVolume(100);
	ttsSpeech.Speak("我是中国人");	//"I am Chinese"
	//ttsSpeech.SpeakToWaveFile("我是中国人","d:\\11.wav");
	ttsSpeech.DeleteSpVoice();
	ttsSpeech.UnInit();

4. Notes

1) Fixing the sphelper.h compile errors

When developing a speech recognition program with Microsoft Speech SDK 5.1, the header file sphelper.h and the library file sapi.lib are included. Compilation then fails with:

       1>c:\program files\microsoft speech sdk 5.1\include\sphelper.h(769) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
       1>c:\program files\microsoft speech sdk 5.1\include\sphelper.h(1419) : error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
       1>c:\program files\microsoft speech sdk 5.1\include\sphelper.h(2373) : error C2065: 'psz' : undeclared identifier
       1>c:\program files\microsoft speech sdk 5.1\include\sphelper.h(2559) : error C2440: 'initializing' : cannot convert from 'CSpDynamicString' to 'SPPHONEID *'
       1> No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
       1>c:\program files\microsoft speech sdk 5.1\include\sphelper.h(2633) : error C2664: 'wcslen' : cannot convert parameter 1 from 'SPPHONEID *' to 'const wchar_t *'
       1> Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast

After searching around and pooling other people's experience, the cause appears to be that the Speech SDK code was written long ago with loose syntax, while VS2008 checks syntax very strictly, so compilation fails. Modifying the following lines in the header file lets it compile normally:

Line 769:

       const ulLenVendorPreferred = wcslen(pszVendorPreferred);

change to:

       const unsigned long ulLenVendorPreferred = wcslen(pszVendorPreferred);

Line 1418:

       static CoMemCopyWFEX(const WAVEFORMATEX * pSrc, WAVEFORMATEX ** ppCoMemWFEX)

change to:

       static HRESULT CoMemCopyWFEX(const WAVEFORMATEX * pSrc, WAVEFORMATEX ** ppCoMemWFEX)

Line 2372:

       for (const WCHAR * psz = (const WCHAR *)lParam; *psz; psz++) {}

change to:

       const WCHAR * psz; for (psz = (const WCHAR *)lParam; *psz; psz++) {}

Line 2559:

       SPPHONEID* pphoneId = dsPhoneId;

change to:

       SPPHONEID* pphoneId = (SPPHONEID*)((WCHAR *)dsPhoneId);

Line 2633:

       pphoneId += wcslen(pphoneId) + 1;

change to:

       pphoneId += wcslen((const wchar_t *)pphoneId) + 1;

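2) When Speak is called with SPF_ASYNC (asynchronous), do not release the ISpVoice object too early, or there will be no sound: once the ISpVoice object's lifetime ends, nothing is played. The usual approach is to keep the ISpVoice object as a class member variable and release it only when the class is destroyed.

3) The first call to Speak is slow because loading the engine takes some time. You can load the engine ahead of time by calling Speak("", SPF_ASYNC) from a thread, but note that COM must then be initialized with CoInitializeEx rather than CoInitialize.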

