Unity speech recognition [using the GVoice SDK]

1. What is speech recognition?

Record the user's speech -> recognize the audio -> return what the user said

For example: you say "hello" to the computer -> hello.wav is generated -> hello.wav is recognized -> the string s="hello" is returned

2. What is the corresponding solution for Unity?

Unfortunately, Unity has no built-in solution for this. (The .NET speech library is still an option on PC, but not on mobile.) So here we use the GVoice SDK from Tencent Cloud.

3. SDK configuration

Register an account and apply for a cloud application.

Cloud application settings:


The important setting is to tick the speech recognition option.

Unity configuration:

You can follow the instructions on the official website, but a faster way is to download the official Unity demo, copy its project, and then delete the demo code.

4. How to write the code?

Five steps in total: initialization -> start recording -> stop recording -> upload audio -> speech recognition.
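
The order of these steps matters, because each one depends on the result of the previous one. As a minimal sketch, the sequence can be tracked with a small enum. VoiceStep and VoicePipelineSketch are hypothetical names used only for illustration; they are not part of the GVoice SDK.

```csharp
using System;

// Hypothetical names for the five steps listed above, plus a terminal state.
public enum VoiceStep
{
    Initialize,
    StartRecording,
    StopRecording,
    UploadAudio,
    SpeechToText,
    Done
}

public static class VoicePipelineSketch
{
    // Returns the step that follows the given one, in the order listed above.
    public static VoiceStep Next(VoiceStep current)
    {
        if (current == VoiceStep.Done)
        {
            throw new InvalidOperationException("The pipeline has already finished.");
        }
        return current + 1;
    }
}
```

A script could keep the current VoiceStep in a field and advance it inside each SDK callback, so that a step is never started before the previous one has finished.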

Initialization:

IGCloudVoice m_voice;
m_voice = GCloudVoice.GetEngine();
// s is the openid string; appid and appkey are masked here.
m_voice.SetAppInfo ("156********","f5*******************",s);
m_voice.Init ();
m_voice.SetMode (GCloudVoiceMode.Translation);
m_voice.OnApplyMessageKeyComplete += (IGCloudVoice.GCloudVoiceCompleteCode code) =>
{
	Debug.Log ("OnApplyMessageKeyComplete c# callback");
	if (code == IGCloudVoice.GCloudVoiceCompleteCode.GV_ON_MESSAGE_KEY_APPLIED_SUCC)
	{
		Debug.Log ("OnApplyMessageKeyComplete succ11");
	}
	else
	{
		Debug.Log ("OnApplyMessageKeyComplete error");
	}
};
m_voice.ApplyMessageKey (60000);

The three parameters of SetAppInfo are: appid (the appid of the cloud application), appkey (the appkey of the cloud application), and openid, which identifies the user. Typically you can derive the openid from the user's account information, for example by hashing it.
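
As a rough sketch of that idea, the helper below derives an openid by hashing an account name with the standard .NET MD5 class. OpenIdSketch and FromAccount are hypothetical names, and the 32-character uppercase hex format is an assumption, not a documented requirement of the SDK. Note that MD5 is a hash, not encryption.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class OpenIdSketch
{
    // Hypothetical helper (not part of the GVoice SDK): hashes an account
    // name with MD5 and returns the digest as 32 uppercase hex characters.
    public static string FromAccount(string account)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(account));
            var sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
            {
                sb.Append(b.ToString("X2"));
            }
            return sb.ToString();
        }
    }
}
```

The returned string could then be passed as the third argument of SetAppInfo, for example OpenIdSketch.FromAccount(accountName), where accountName is whatever uniquely identifies the user in your game.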

SetMode() supports 3 modes in total: speech recognition, real-time voice, and voice messages. Here we choose speech recognition (GCloudVoiceMode.Translation).

OnApplyMessageKeyComplete is the event callback for ApplyMessageKey; use it to check whether initialization succeeded. The callbacks only fire if Poll() is called regularly, which the complete source at the end does in Update().

Start recording:

m_voice.StartRecording (s);

Recording takes one line of code. Note that s is the path of the recording file. You can choose it freely, but the file extension must be .dat rather than .wav.

Stop recording:

m_voice.StopRecording ();

Nothing more to say here; it is one line.
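
A small sketch of how such a path might be built so that it always ends in .dat. RecordingPathSketch is a hypothetical helper, not part of the SDK; it only uses System.IO.

```csharp
using System.IO;

public static class RecordingPathSketch
{
    // Hypothetical helper: joins a directory and a file name and forces
    // the ".dat" extension, whatever extension the name had before.
    public static string Build(string directory, string fileName)
    {
        return Path.Combine(directory, Path.ChangeExtension(fileName, ".dat"));
    }
}
```

In Unity this could be called as RecordingPathSketch.Build(Application.persistentDataPath, "testcapture.wav"). Using Application.persistentDataPath is an assumption on my part: the complete source at the end uses Application.dataPath, which is generally not writable on mobile devices.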

Upload audio files:

string s="";
m_voice.OnUploadReccordFileComplete += (IGCloudVoice.GCloudVoiceCompleteCode code,string filepath,string fileid) =>
{
	// Cache the cloud-side file id for the recognition step.
	s=fileid;
};
// temp is the path of the recorded .dat file.
m_voice.UploadRecordedFile (temp,60000);

Pay attention to this callback: speech recognition takes a string id assigned by the cloud, not a local file path, so we have to cache this fileid and use it later.

Speech recognition:

string s1="";
m_voice.OnSpeechToText += (IGCloudVoice.GCloudVoiceCompleteCode code, string fileID, string result) =>
{
	// Cache the recognized text.
	s1=result;
};
m_voice.SpeechToText (s,0,60000);

The first parameter of SpeechToText is the fileid you just received, so call it only after the upload callback has fired, as the complete source below does. The 0 is a language identifier meaning the audio is transcribed as Chinese. Cache the result in this callback, because it is the final output.

A final test:



First I said "hello" to the computer, and after these 5 steps the cloud returned "Hello.".

The very long string in the middle of the output is the fileid. It looks encrypted, and I have to say Tencent handles this well.

Complete source code of the test:

using UnityEngine;
using System.Collections;
using gcloud_voice;
using GVoice_Sound;
public class voice_exp : MonoBehaviour
{
	private IGCloudVoice m_Voice=GCloudVoice.GetEngine();
	private string _result;
	private string _fileID;
	void Start ()
	{
		m_Voice.SetAppInfo ("1563611570","f57ec395f9f97eda9534a98e3fa793db","E81DCA1782C5CE8B0722A366D7ECB41F");
		m_Voice.Init ();
		m_Voice.SetMode (GCloudVoiceMode.Translation);
		m_Voice.OnSpeechToText += (IGCloudVoice.GCloudVoiceCompleteCode code, string fileID, string result) =>
		{
			_result=result;
			Debug.Log("speech:"+code.ToString());
			Debug.Log(fileID);
			Debug.Log(_result);
		};
		m_Voice.OnUploadReccordFileComplete += (IGCloudVoice.GCloudVoiceCompleteCode code,string filepath,string fileid) =>
		{
			Debug.Log("Upload:"+code.ToString());
			_fileID=fileid;
			m_Voice.SpeechToText(_fileID,0,6000);
			Debug.Log(_fileID);
		};
		m_Voice.OnApplyMessageKeyComplete += (IGCloudVoice.GCloudVoiceCompleteCode code)
			=> {
			Debug.Log ("OnApplyMessageKeyComplete c# callback");
			if (code == IGCloudVoice.GCloudVoiceCompleteCode.GV_ON_MESSAGE_KEY_APPLIED_SUCC) {
				Debug.Log ("OnApplyMessageKeyComplete succ11");
			} else {
				Debug.Log ("OnApplyMessageKeyComplete error");
			}
		};
		m_Voice.ApplyMessageKey (60000);
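		// Unrelated to the speech pipeline: tries out an MD5 helper for the openid.
		// MD5.digests is a project helper, not a .NET API.
		string s="safdjioasfvgfwsefa";
		string _result1=MD5.digests (s);
		Debug.Log (_result1);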
	}

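	void Update ()
	{
		// Poll drives the SDK callbacks, so call it every frame.
		if (m_Voice != null)
		{
			m_Voice.Poll();
		}
		// W: upload the recorded file; recognition starts in the upload callback.
		if(Input.GetKeyDown(KeyCode.W))
		{
			m_Voice.UploadRecordedFile (Application.dataPath+"/testcapture.dat",6000);
		}
		// Space: start recording.
		if(Input.GetKeyDown(KeyCode.Space))
		{
			m_Voice.StartRecording (Application.dataPath+"/testcapture.dat");
			Debug.Log ("Recording started");
		}
		// A: stop recording and play the file back as a check.
		if(Input.GetKeyDown(KeyCode.A))
		{
			m_Voice.StopRecording ();
			m_Voice.PlayRecordedFile (Application.dataPath+"/testcapture.dat");
		}
	}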
}

The next article will cover generating the openid with the MD5 algorithm, so stay tuned.

