Integrating the Baidu Speech Recognition SDK into Unity (Windows platform)

1. First, register an account on the Baidu AI Open Platform, then follow the documentation to apply for trial access and create an application: https://ai.baidu.com/ai-doc/SPEECH/qknh9i8ed

When creating the application for the Windows platform, choose the "Not required" option.

2. Download the C# SDK package

After extraction, the package contains two folders. My project uses net45, so I copied that folder into Unity's Assets/Plugins folder.

In Unity, set Api Compatibility Level to .NET 4.x. The setting is under Player Settings > Other Settings > Configuration > Api Compatibility Level.

The Baidu Speech SDK is now imported. I have found two ways to implement speech recognition:

The first uses Unity's UnityWebRequest to call the service directly. Another blogger has written a tutorial on it: "Unity Baidu Speech Recognition" (CSDN Blog).

The second calls the SDK interface directly, following the official documentation. The implementation is as follows:

I downloaded the SDK source code from GitHub to study it. The code exposes an interface for each function. Source code: GitHub - Baidu-AIP/dotnet-sdk: Baidu AI Open Platform .NET SDK

After downloading and extracting it, the repository documentation shows that the speech recognition code is in the speech folder.

In the Asr class, find the Recognize method ("recognize voice data"), which returns a JObject and takes these parameters:

byte[] data: audio data;

string format: audio format;

int rate: sampling frequency;

options: optional parameters such as the language type (dev_pid). The default is 1537 (Mandarin); Cantonese, Sichuanese, English, and others are also supported. See the official documentation for details.

 

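1. Record and store the audio

void StartRecord()
{
    Debug.Log("Recording started");
    // An empty device name selects the default microphone; loop is false so recording stops at recordMaxTime.
    saveAudioClip = Microphone.Start(currentDeviceName, false, recordMaxTime, recordFrequency);
}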

2. Convert the recorded audio to a byte array

 

public byte[] ConvertClipToBytes(AudioClip audioClip)
{
    // Read the clip's samples as floats in the range [-1, 1] (a mono clip is assumed).
    float[] samples = new float[audioClip.samples];
    audioClip.GetData(samples, 0);

    // Each sample becomes one 16-bit value, i.e. two bytes.
    byte[] bytesData = new byte[samples.Length * 2];
    int rescaleFactor = 32767;

    for (int i = 0; i < samples.Length; i++)
    {
        short value = (short)(samples[i] * rescaleFactor);
        BitConverter.GetBytes(value).CopyTo(bytesData, i * 2);
    }
    return bytesData;
}
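As a cross-check on the conversion above, here is a minimal sketch of the same float-to-16-bit-PCM step written without any Unity types, so it can be tried outside the editor. The class and method names (PcmEncoder, FloatTo16BitPcm) are hypothetical, and the clamp is a defensive addition that the original code does not have; it only matters if a sample ever falls outside [-1, 1].

```csharp
using System;

public static class PcmEncoder
{
    // Converts normalized float samples (expected range -1..1) to 16-bit little-endian PCM.
    public static byte[] FloatTo16BitPcm(float[] samples)
    {
        byte[] bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Clamp so out-of-range samples saturate instead of overflowing the short.
            float clamped = Math.Max(-1f, Math.Min(1f, samples[i]));
            short value = (short)(clamped * short.MaxValue);

            // Write the low byte first, then the high byte (little-endian).
            bytes[i * 2] = (byte)(value & 0xFF);
            bytes[i * 2 + 1] = (byte)((value >> 8) & 0xFF);
        }
        return bytes;
    }
}
```

In a Unity project, the float array would come from audioClip.GetData, exactly as in ConvertClipToBytes.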

3. After the conversion, send the data through the SDK interface

var result = asr.Recognize(ConvertClipToBytes(saveAudioClip), "pcm", recordFrequency, languageType);

4. Serialize the returned data to a string and use regular expressions to extract the recognized text

string str = JsonConvert.SerializeObject(result, Formatting.None);

if (!string.IsNullOrEmpty(str))
{
    // The response reports success through its err_msg field.
    if (Regex.IsMatch(str, @"err_msg.:.success"))
    {
        // Capture the text inside the result array; the trailing ".." also drops
        // the last character before the closing quote.
        Match match = Regex.Match(str, "result.:..(.*?)..]");
        if (match.Success)
        {
            str = match.Groups[1].ToString();
        }
    }
    else
    {
        str = "Recognition result is empty";
    }
    tex.text = str;
}
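Because Recognize already returns a JObject, the fields can also be read directly instead of serializing and matching with a regex. The following is only a sketch under assumptions: the helper name AsrResultParser.ExtractText is hypothetical, and the field names err_msg and result are taken from the patterns the regex above matches. Unlike the regex, this version returns the full first entry, including its last character.

```csharp
using System;
using Newtonsoft.Json.Linq;

public static class AsrResultParser
{
    // Returns the first recognized sentence, or null when the response does not
    // report success or carries no "result" array.
    public static string ExtractText(JObject response)
    {
        if (response == null) return null;

        // Assumption: a successful response has an err_msg that starts with "success".
        string errMsg = (string)response["err_msg"];
        if (errMsg == null || !errMsg.StartsWith("success", StringComparison.Ordinal))
        {
            return null;
        }

        // Assumption: recognized text is delivered as an array of strings under "result".
        JArray results = response["result"] as JArray;
        if (results == null || results.Count == 0) return null;

        return (string)results[0];
    }
}
```

A null return can then be mapped to the same placeholder message used above before assigning tex.text.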

The complete code is as follows.

This code is not robust: the result may be null when it is read, so wrap the recognition call in try/catch for fault tolerance.
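One way to add that fault tolerance is a small wrapper that runs the recognition step and falls back to a placeholder when it throws or returns nothing. This is a sketch, not part of the original project: SafeCall and RunOrFallback are hypothetical names, and Console.Error is used only to keep the example free of Unity types (inside Unity, Debug.LogWarning would be the natural replacement).

```csharp
using System;

public static class SafeCall
{
    // Runs the supplied recognition delegate and returns its text. If the delegate
    // throws, or returns null or an empty string, the fallback is returned instead.
    public static string RunOrFallback(Func<string> recognize, string fallback)
    {
        try
        {
            string text = recognize();
            return string.IsNullOrEmpty(text) ? fallback : text;
        }
        catch (Exception ex)
        {
            // Report the failure but keep the caller running.
            Console.Error.WriteLine("Recognition failed: " + ex.Message);
            return fallback;
        }
    }
}
```

In EndRecord, the asr.Recognize call and the text extraction would go inside the delegate, and the returned string would be assigned to tex.text.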

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using UnityEngine;
using UnityEngine.UI;
using Baidu.Aip.Speech;
using Newtonsoft.Json;

public class Test : MonoBehaviour
{
    public string app_id;
    public string api_key;
    public string secret_Key;
    public Asr asr;
    string accessToken = string.Empty;
    int recordFrequency = 8000; // recording sample rate in Hz
    int recordMaxTime = 20; // maximum recording length in seconds
    AudioClip saveAudioClip; // the clip currently being recorded
    AudioSource source;
    string currentDeviceName = string.Empty;
    Text tex;
    Dictionary<string, object> languageType = new Dictionary<string, object>();

    // Start is called before the first frame update
    void Start()
    {
        // saveAudioClip is assigned by Microphone.Start; AudioClip is an asset, not a Component,
        // so it cannot be fetched with GetComponent.
        source = this.transform.GetComponent<AudioSource>();
        tex = GameObject.Find("Canvas/ResultTex").GetComponent<Text>();
        asr = new Asr(app_id, api_key, secret_Key);
        languageType.Add("dev_pid", 1537);
    }

    // Update is called once per frame
    void Update()
    {
        if (Input.GetKeyDown(KeyCode.Space))
        {
            StartRecord();
        } else if (Input.GetKeyUp(KeyCode.Space))
        {
            EndRecord();
        }
    }

    public byte[] ConvertClipToBytes(AudioClip audioClip)
    {
        // Read the clip's samples as floats in the range [-1, 1] (a mono clip is assumed).
        float[] samples = new float[audioClip.samples];
        audioClip.GetData(samples, 0);

        // Each sample becomes one 16-bit value, i.e. two bytes.
        byte[] bytesData = new byte[samples.Length * 2];
        int rescaleFactor = 32767;

        for (int i = 0; i < samples.Length; i++)
        {
            short value = (short)(samples[i] * rescaleFactor);
            BitConverter.GetBytes(value).CopyTo(bytesData, i * 2);
        }
        return bytesData;
    }

    /// <summary>
    /// Start recording
    /// </summary>
    void StartRecord()
    {
        Debug.Log("Recording started");
        saveAudioClip = Microphone.Start(currentDeviceName, false, recordMaxTime, recordFrequency);
    }

    /// <summary>
    /// Stop recording and send the audio for recognition
    /// </summary>
    void EndRecord()
    {
        Debug.Log("Recording stopped");
        Microphone.End(currentDeviceName);
        source.PlayOneShot(saveAudioClip);
        var result = asr.Recognize(ConvertClipToBytes(saveAudioClip), "pcm", recordFrequency, languageType);
        string str = JsonConvert.SerializeObject(result, Formatting.None);

        if (!string.IsNullOrEmpty(str))
        {
            // The response reports success through its err_msg field.
            if (Regex.IsMatch(str, @"err_msg.:.success"))
            {
                // Capture the text inside the result array; the trailing ".." also drops
                // the last character before the closing quote.
                Match match = Regex.Match(str, "result.:..(.*?)..]");
                if (match.Success)
                {
                    str = match.Groups[1].ToString();
                }
            }
            else
            {
                str = "Recognition result is empty";
            }
            tex.text = str;
        }
    }

  
}

 

Origin blog.csdn.net/Abel02/article/details/129532909