Unity Speech Recognition [Using the GVoice SDK]

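1. What is speech recognition?

Record the user's input -> run recognition on the audio -> determine what the user actually said.

For example: you say "你好" to the computer -> 你好.wav is generated -> 你好.wav is recognized -> the result comes back as string s = "你好".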

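2. What solutions does Unity offer?

Unfortunately, Unity has no built-in solution for this (on PC you could still consider the .NET speech library, but that does not work on mobile). So here we use the GVoice SDK, recently released by Tencent Cloud.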

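3. SDK configuration

Register an account and apply for a cloud application.

Cloud application settings:

The important part is to tick the speech recognition option.

Unity configuration:

Following the setup guide on the official site works, but a faster way is to download the official Unity Demo, copy its project over, and then delete the demo program.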

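4. How do you write the code?

There are 5 steps in total: initialize -> start recording -> stop recording -> upload the audio -> speech recognition.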

Initialization:

// Get the engine instance and register the app and the user.
IGCloudVoice m_voice;
m_voice = GCloudVoice.GetEngine();
// s is the openid string for the current user.
m_voice.SetAppInfo("156********", "f5*******************", s);
m_voice.Init();
m_voice.SetMode(GCloudVoiceMode.Translation);
// Register the callback before requesting the message key.
m_voice.OnApplyMessageKeyComplete += (IGCloudVoice.GCloudVoiceCompleteCode code) =>
{
	Debug.Log("OnApplyMessageKeyComplete c# callback");
	if (code == IGCloudVoice.GCloudVoiceCompleteCode.GV_ON_MESSAGE_KEY_APPLIED_SUCC)
	{
		Debug.Log("OnApplyMessageKeyComplete succ11");
	}
	else
	{
		Debug.Log("OnApplyMessageKeyComplete error");
	}
};
// 60000 is the timeout, in milliseconds.
m_voice.ApplyMessageKey(60000);

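The three parameters of SetAppInfo are: appid (the cloud application's appid), appkey (the cloud application's appkey), and openid, which identifies the user. A common choice is to encrypt the user's account information and use the result as the openid.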
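As a rough illustration of deriving an openid from account information, the sketch below hashes an account string with MD5 from System.Security.Cryptography and hex-encodes the result. OpenIdHelper and FromAccount are hypothetical names, not part of the GVoice SDK, and nothing here confirms that the openid in the test source further down was produced this way.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; not part of the GVoice SDK.
public static class OpenIdHelper
{
    // Hashes the UTF-8 bytes of an account string with MD5 and returns
    // the digest as 32 uppercase hexadecimal characters.
    public static string FromAccount(string account)
    {
        if (string.IsNullOrEmpty(account))
            throw new ArgumentException("account must not be empty", "account");

        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(account));
            var sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
                sb.Append(b.ToString("X2"));
            return sb.ToString();
        }
    }
}
```

The same account always maps to the same openid, so the value can be recomputed at login instead of being stored. MD5 is used here only because it is the algorithm named for the follow-up post; it is not collision-resistant, so it should not be relied on to protect the account string itself.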

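SetMode() supports 3 modes: speech recognition, real-time voice, and voice messages. Here we set it to speech recognition (GCloudVoiceMode.Translation).

OnApplyMessageKeyComplete is the event callback for ApplyMessageKey; it is used to check whether initialization succeeded.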

Start recording:

m_voice.StartRecording (s);

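Recording takes a single line of code. Note that s here is the path of the recording file. The path can be anything you like, but the file extension must be .dat, not .wav.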
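To make the .dat requirement harder to get wrong, the path can be built by a small helper. This is only a sketch: RecordingPath and Build are hypothetical names, and the base directory is left to the caller.

```csharp
using System;
using System.IO;

// Hypothetical helper; not part of the GVoice SDK.
public static class RecordingPath
{
    // Joins baseDir and name, replacing or adding the extension so that
    // the resulting file name always ends in ".dat".
    public static string Build(string baseDir, string name)
    {
        string fileName = Path.ChangeExtension(name, ".dat");
        return Path.Combine(baseDir, fileName);
    }
}
```

In Unity the call could look like RecordingPath.Build(Application.persistentDataPath, "testcapture"). Application.persistentDataPath is writable on mobile devices, whereas Application.dataPath, which the test source further down uses, is generally only suitable in the editor or on desktop.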

Stop recording:

m_voice.StopRecording ();

Nothing more to say here: one line.

Upload the audio file:

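string s = "";  // will hold the fileid returned by the server
m_voice.OnUploadReccordFileComplete += (IGCloudVoice.GCloudVoiceCompleteCode code, string filepath, string fileid) =>
{
	s = fileid;
};
// temp is the local path of the recorded .dat file; 60000 is the timeout in milliseconds.
m_voice.UploadRecordedFile(temp, 60000);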

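The thing to watch here is the callback. Speech recognition takes a string id on the cloud side as its argument, not the local file path, so we have to cache this fileid for the next step. The fileid only arrives when the callback fires, so SpeechToText can only be called after that point.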

Speech recognition:

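string s1 = "";  // will hold the recognized text
m_voice.OnSpeechToText += (IGCloudVoice.GCloudVoiceCompleteCode code, string fileID, string result) =>
{
	s1 = result;
};
// s is the fileid cached by the upload callback; 0 selects Chinese.
m_voice.SpeechToText(s, 0, 60000);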

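The first argument to SpeechToText is the fileid we just obtained. The 0 is a flag meaning the audio is transcribed into Chinese. The callback should cache result, since that is the final output.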
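Because upload and recognition finish asynchronously, it can help to track which of the five steps the client is currently in. The following is a minimal sketch of such a guard. The VoiceStep enum and the RecognitionFlow class are hypothetical and not part of the GVoice SDK; the idea is to call each method next to the corresponding m_voice call or inside its callback.

```csharp
using System;

// Hypothetical step tracker; none of these names come from the GVoice SDK.
public enum VoiceStep { Idle, Recording, Recorded, Uploading, Uploaded, Recognizing, Done }

public class RecognitionFlow
{
    // Starts at Idle, the enum's first (default) value.
    public VoiceStep Step { get; private set; }

    // Moves from 'expected' to 'next'; throws if the flow is in any other step.
    private void Advance(VoiceStep expected, VoiceStep next)
    {
        if (Step != expected)
            throw new InvalidOperationException(
                "Expected step " + expected + " but was " + Step);
        Step = next;
    }

    public void StartRecording()       { Advance(VoiceStep.Idle, VoiceStep.Recording); }
    public void StopRecording()        { Advance(VoiceStep.Recording, VoiceStep.Recorded); }
    public void BeginUpload()          { Advance(VoiceStep.Recorded, VoiceStep.Uploading); }
    public void UploadCompleted()      { Advance(VoiceStep.Uploading, VoiceStep.Uploaded); }
    public void BeginRecognition()     { Advance(VoiceStep.Uploaded, VoiceStep.Recognizing); }
    public void RecognitionCompleted() { Advance(VoiceStep.Recognizing, VoiceStep.Done); }
}
```

With this in place, calling BeginRecognition before the upload callback has reported a fileid fails immediately with an exception instead of sending an empty id to the server.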

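Finally, a test:

I first said "你好" to the computer; after the 5 steps above, the cloud returned "你好。".

The very long string in the middle of the output is the fileid. It looks like it has been encrypted; credit to Tencent on this front.

Full test source: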

using UnityEngine;
using System.Collections;
using gcloud_voice;
using GVoice_Sound;

public class voice_exp : MonoBehaviour
{
	private IGCloudVoice m_Voice = GCloudVoice.GetEngine();
	private string _result;  // recognized text from the last SpeechToText call
	private string _fileID;  // server-side id of the last uploaded recording

	void Start()
	{
		// Arguments: appid, appkey, openid.
		m_Voice.SetAppInfo("1563611570", "f57ec395f9f97eda9534a98e3fa793db", "E81DCA1782C5CE8B0722A366D7ECB41F");
		m_Voice.Init();
		m_Voice.SetMode(GCloudVoiceMode.Translation);

		// Step 5 callback: receives the recognized text.
		m_Voice.OnSpeechToText += (IGCloudVoice.GCloudVoiceCompleteCode code, string fileID, string result) =>
		{
			_result = result;
			Debug.Log("speech:" + code.ToString());
			Debug.Log(fileID);
			Debug.Log(_result);
		};

		// Step 4 callback: caches the fileid and immediately starts recognition (0 = Chinese).
		m_Voice.OnUploadReccordFileComplete += (IGCloudVoice.GCloudVoiceCompleteCode code, string filepath, string fileid) =>
		{
			Debug.Log("Upload:" + code.ToString());
			_fileID = fileid;
			m_Voice.SpeechToText(_fileID, 0, 6000);
			Debug.Log(_fileID);
		};

		// Step 1 callback: reports whether the message key was applied.
		m_Voice.OnApplyMessageKeyComplete += (IGCloudVoice.GCloudVoiceCompleteCode code) =>
		{
			Debug.Log("OnApplyMessageKeyComplete c# callback");
			if (code == IGCloudVoice.GCloudVoiceCompleteCode.GV_ON_MESSAGE_KEY_APPLIED_SUCC)
			{
				Debug.Log("OnApplyMessageKeyComplete succ11");
			}
			else
			{
				Debug.Log("OnApplyMessageKeyComplete error");
			}
		};
		m_Voice.ApplyMessageKey(60000);

		// Scratch test of the MD5 helper used for openid hashing (see the next post).
		string s = "safdjioasfvgfwsefa";
		string _result1 = MD5.digests(s);
		Debug.Log(_result1);
	}

	void Update()
	{
		// Poll every frame so the SDK can dispatch its callbacks.
		if (m_Voice != null)
		{
			m_Voice.Poll();
		}
		// W: upload the recording (step 4); recognition follows in the upload callback.
		if (Input.GetKeyDown(KeyCode.W))
		{
			m_Voice.UploadRecordedFile(Application.dataPath + "/testcapture.dat", 6000);
		}
		// Space: start recording (step 2).
		if (Input.GetKeyDown(KeyCode.Space))
		{
			m_Voice.StartRecording(Application.dataPath + "/testcapture.dat");
			Debug.Log("StartRecording");
		}
		// A: stop recording (step 3) and play the file back as a check.
		if (Input.GetKeyDown(KeyCode.A))
		{
			m_Voice.StopRecording();
			m_Voice.PlayRecordedFile(Application.dataPath + "/testcapture.dat");
		}
	}
}

The next post covers encrypting the openid with the MD5 algorithm. Stay tuned.
