Using a string matching algorithm for speech recognition in Unity

This reprinted post draws on code written by an expert in our lab, who built a game based on speech recognition.

After implementing the conversion of audio files into strings with the Baidu Voice API, we started developing specific functions for particular recognized strings. But a few problems have to be dealt with first.

1. You can't expect every transcription to be accurate, let alone word for word.

2. In colloquial speech, many different sentences actually mean the same thing.

So it is easy to see that if speech recognition is used, it is best not to map recognized strings to operations one to one. For example, suppose the game object performs the corresponding operation only when the recognized text is exactly "attack". What if the player uses a synonym of "attack" or a more colloquial expression, or says "attack" but it is misrecognized as a similar-sounding word such as "rooster"? Should the game simply do nothing? That obviously doesn't work.

So there are two ideas. First, establish an equivalence table, that is, a many-to-one mapping table in which multiple inputs produce the same output. Second, treat an input string as the same sentence as a table entry as long as their character similarity reaches a certain threshold.
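
The first idea can be sketched on its own as a dictionary lookup. This is only an illustration in plain C#, not the code used in the game: the class name, the phrases, and the instruction codes `Attack` and `Retreat` are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class EquivalenceTableSketch
{
    // Hypothetical instruction codes, for illustration only.
    public const int Attack = 1;
    public const int Retreat = 2;

    // Many-to-one table: several spoken phrases map to the same instruction.
    public static readonly Dictionary<string, int> Table = new Dictionary<string, int>
    {
        { "attack", Attack },
        { "strike", Attack },
        { "hit them", Attack },
        { "retreat", Retreat },
        { "fall back", Retreat },
    };

    // Exact lookup: returns -1 when the phrase is not in the table.
    public static int Lookup(string phrase)
    {
        int code;
        return Table.TryGetValue(phrase, out code) ? code : -1;
    }

    public static void Main()
    {
        Console.WriteLine(Lookup("strike"));    // prints 1
        Console.WriteLine(Lookup("fall back")); // prints 2
        Console.WriteLine(Lookup("atack"));     // prints -1 (misrecognized input)
    }
}
```

The last call shows why the table alone is not enough: a single misrecognized character defeats an exact lookup, which is what the similarity threshold in the second idea is meant to handle.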

With these ideas, a small system for string pairing can be built quickly. The following is the code:

using UnityEngine;  
using System.Collections;  
using System.Collections.Generic;  
  
namespace STRINGTOINSTRUCT  
{  
    public class stringsystem  
    {  
        public float EndureWrongRate;  
        public int MAX_INSTRUCT_COUNT;  
        public int MAX_DEPENCE_COUNT;  
        private int Temp_Instruct_Count;  
        private int Temp_Depence_Count;  
        private int[] InstructTable;  
        private string[] Find_Table_String;  
        private int[] Find_Table_Instruct;  
        public stringsystem()  
        {  
            EndureWrongRate = 0.5f;  
            MAX_INSTRUCT_COUNT = 15;  
            MAX_DEPENCE_COUNT = 1000;  
            Temp_Instruct_Count = 0;  
            Temp_Depence_Count = 0;  
            InstructTable = new int[MAX_INSTRUCT_COUNT];  
            Find_Table_String=new string[MAX_DEPENCE_COUNT];  
            Find_Table_Instruct = new int[MAX_DEPENCE_COUNT];  
        }  
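        // Returns the instruction code of the stored phrase most similar to s,  
        // or -1 if the best similarity is below EndureWrongRate.  
        // pairrate receives the best similarity found.  
        public int Find(string s,ref float pairrate)  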
        {  
            float Temp_Max_PairRate=0;  
            int index=0;  
            for(int i=0;i<Temp_Depence_Count;i++)  
            {  
                float temp_pair=Pair(Find_Table_String[i],s);  
                if(temp_pair>Temp_Max_PairRate)  
                {  
                    Temp_Max_PairRate = temp_pair;  
                    index = i;  
                }  
            }  
            if (Temp_Max_PairRate >= EndureWrongRate)  
            {  
                pairrate = Temp_Max_PairRate;  
                return Find_Table_Instruct [index];  
            }  
            else  
            {  
                pairrate = Temp_Max_PairRate;  
                return -1;    
            }  
        }  
        public int InsertInstruct()  
        {  
            int instructcode = Random.Range (0,10000000);  
            InstructTable [Temp_Instruct_Count] = instructcode;  
            Temp_Instruct_Count++;  
            return instructcode;  
        }  
        public void Insert_Index(int instruct,string word)  
        {  
            Find_Table_Instruct [Temp_Depence_Count] = instruct;  
            Find_Table_String [Temp_Depence_Count] = word;  
            Temp_Depence_Count++;  
        }  
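        // Similarity between str1 and str2, based on the edit (Levenshtein)  
        // distance: 1 means identical, 0 means nothing in common.  
        private float Pair(string str1,string str2)  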
        {  
            int len1 = str1.Length;  
            int len2 = str2.Length;  
            int[,] dif = new int[len1+1, len2+1];  
            for(int i=0;i<len1+1;i++)  
            {  
                dif [i, 0] = i;  
            }  
            for(int i=0;i<len2+1;i++)  
            {  
                dif [0, i] = i;  
            }  
            int temp;  
            for(int i=1;i<len1+1;i++)  
            {  
                for(int j=1;j<len2+1;j++)  
                {  
                    if (str1 [i-1] == str2 [j-1])   
                    {  
                        temp = 0;  
                    }  
                    else  
                    {  
                        temp = 1;  
                    }  
                    dif [i, j] = min (dif[i-1,j-1]+temp,dif[i,j-1]+1,dif[i-1,j]+1);  
                }  
            }  
            int maxLen = Mathf.Max(len1, len2);  
            // Two empty strings are identical; this guard avoids 0/0 (NaN).  
            float similarity = maxLen == 0 ? 1f : 1 - (float)dif [len1, len2] / maxLen;  
            return similarity;  
        }  
        private static int min(params int [] arr)  
        {  
            int min = int.MaxValue;  
            foreach(int ar in arr)  
            {  
                if (ar < min)   
                {  
                    min = ar;  
                }  
            }  
            return min;  
        }  
    }     
};  
Let's focus on the Pair method:

In essence, it computes the similarity from the ratio of the number of differing characters (the edit distance, also known as the Levenshtein distance) to the length of the longer string.

At the beginning, a two-dimensional array is set up, and each cell has a meaning: dif[i, j] stores the number of differences between the first i characters of string 1 and the first j characters of string 2.

So at initialization we can already fill in dif[0, i] = i and dif[i, 0] = i, because turning an empty prefix into i characters takes exactly i differences.

Then a two-level loop performs a very simple dynamic programming, with the state transition equation

dif [i, j] = min (dif[i-1,j-1]+temp,dif[i,j-1]+1,dif[i-1,j]+1);  

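The value of dif[i, j] can come from only three sources: dif[i-1, j-1] plus 1 if the i-th character of string 1 differs from the j-th character of string 2 (plus 0 if they are equal); dif[i-1, j] + 1, because the extra character has nothing to match and must count as a difference; or dif[i, j-1] + 1, for the same reason. The smallest of the three is kept.

Finally, dif[len1, len2] holds the total number of differences, and Pair turns it into a similarity: 1 minus dif[len1, len2] divided by the length of the longer string.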
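
To try the recurrence outside Unity, here is a minimal restatement in plain C#. It follows the same state transition as Pair but uses System.Math instead of Mathf; the class and method names are hypothetical, and the empty-string guard is an added assumption rather than part of the original code.

```csharp
using System;

public static class EditDistanceSketch
{
    // Number of single-character insertions, deletions, and substitutions
    // needed to turn a into b (the value the article calls dif[len1, len2]).
    public static int Distance(string a, string b)
    {
        int[,] dif = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) dif[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) dif[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                dif[i, j] = Math.Min(dif[i - 1, j - 1] + cost,
                            Math.Min(dif[i, j - 1] + 1, dif[i - 1, j] + 1));
            }
        }
        return dif[a.Length, b.Length];
    }

    // Similarity in [0, 1]; two empty strings count as identical.
    public static float Similarity(string a, string b)
    {
        int maxLen = Math.Max(a.Length, b.Length);
        if (maxLen == 0) return 1f;
        return 1f - (float)Distance(a, b) / maxLen;
    }

    public static void Main()
    {
        Console.WriteLine(Distance("kitten", "sitting")); // prints 3
        Console.WriteLine(Distance("attack", "atack"));   // prints 1
        Console.WriteLine(Similarity("attack", "atack") >= 0.5f); // prints True
    }
}
```

With "attack" and "atack" the distance is 1 and the longer string has 6 characters, so the similarity is 1 - 1/6, well above the default EndureWrongRate of 0.5.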

Combined with reading data from Excel, described in the previous article, we can first write the equivalence table in Excel, read it into the program, and then look up each speech recognition result in it. If the match succeeds, the game object performs the corresponding operation. With that, our speech recognition system is basically complete.
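
The workflow just described might be wired up roughly as follows. This is a sketch under assumptions: it needs the stringsystem class above in the same Unity project, the script name `VoiceCommandDemo` and the phrases are hypothetical, and a hard-coded list stands in for the table read from Excel.

```csharp
using UnityEngine;
using STRINGTOINSTRUCT;

// Hypothetical driver script: attach it to any GameObject in a scene.
public class VoiceCommandDemo : MonoBehaviour
{
    private stringsystem matcher;
    private int attackCode;
    private int retreatCode;

    void Start()
    {
        matcher = new stringsystem();

        // One instruction code per game action.
        attackCode = matcher.InsertInstruct();
        retreatCode = matcher.InsertInstruct();

        // Hard-coded stand-in for the equivalence table read from Excel.
        matcher.Insert_Index(attackCode, "attack");
        matcher.Insert_Index(attackCode, "strike");
        matcher.Insert_Index(retreatCode, "retreat");
        matcher.Insert_Index(retreatCode, "fall back");

        // In the real game these strings would come from the speech API.
        OnRecognized("atack");
        OnRecognized("open the door");
    }

    // Looks up a recognized string and reacts to the resulting instruction.
    void OnRecognized(string text)
    {
        float rate = 0f; // must be initialized because Find takes it by ref
        int code = matcher.Find(text, ref rate);

        if (code == attackCode)
            Debug.Log("Attack (similarity " + rate + ")");
        else if (code == retreatCode)
            Debug.Log("Retreat (similarity " + rate + ")");
        else
            Debug.Log("No match for \"" + text + "\" (best similarity " + rate + ")");
    }
}
```

The misrecognized "atack" still resolves to the attack instruction because its similarity to "attack" is about 0.83, while an unrelated phrase stays below EndureWrongRate and Find returns -1.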

Here are some results from testing the string matching system:


The screenshot is incomplete, but it roughly shows the pairing similarity that the system outputs.


