String matching with a trie (dictionary tree)

Problem

Oh no! You accidentally deleted all the spaces and punctuation from a long article and converted every uppercase letter to lowercase. A sentence like "I reset the computer. It still didn't boot!" has become "iresetthecomputeritstilldidntboot". Before you can restore the punctuation and capitalization, you have to split the text back into words. You have a thick dictionary to help, but some words in the article are not in it. Given the dictionary and the article as the string sentence, design an algorithm that splits the article so that the number of unrecognized characters is as small as possible, and return that number.
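One way to formalize the task, and the one the solution below appears to follow, is a dynamic program over prefixes of sentence. The notation here is an assumption introduced for illustration: s is sentence with characters s_1 ... s_n, D is the set of dictionary words, and dp[i] is the smallest number of unrecognized characters among the first i characters (the same meaning as the dp array in the code).

```latex
\[
dp[0] = 0, \qquad
dp[i] = \min\Bigl( dp[i-1] + 1,\;
        \min_{\substack{1 \le j \le i \\ s_j s_{j+1} \cdots s_i \in D}} dp[j-1] \Bigr)
\quad \text{for } 1 \le i \le n
\]
```

The first term treats character i as unrecognized; the second covers every dictionary word that ends exactly at position i. The answer is dp[n].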

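class Solution {

    // Trie node (the identifier keeps its original spelling, Tire)
    static class Tire {
        Tire[] next;
        boolean isEnd;

        public Tire() {
            // Array indices 0..25 stand for the letters 'a'..'z'
            next = new Tire[26];
        }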

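        public void insert(String s) {
            Tire cur = this;
            // Insert each word in reverse order, so that lookups can
            // later walk backward through the sentence
            for (int i = s.length() - 1; i >= 0; i--) {
                int idx = s.charAt(i) - 'a';
                if (cur.next[idx] == null) {
                    cur.next[idx] = new Tire();
                }
                // Descend to the child node, building the path as we go
                cur = cur.next[idx];
            }
            // Mark the node where the reversed word ends
            cur.isEnd = true;
        }
    }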

    public int respace(String[] dictionary, String sentence) {
        Tire root = new Tire();
        for (String s : dictionary) {
            root.insert(s);
        }

        int len = sentence.length();
        // dp[i] is the minimum number of unmatched characters
        // among the first i characters of sentence
        int[] dp = new int[len + 1];
        
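        for (int i = 1; i <= len; i++) {
            // Start by treating the i-th character as unmatched:
            // one more than the count for the previous prefix
            dp[i] = dp[i - 1] + 1;
            // Start each lookup from the root
            Tire cur = root;
            // Walk backward from the i-th character
            for (int j = i; j >= 1; j--) {
                // The character determines the index into next
                int idx = sentence.charAt(j - 1) - 'a';
                if (cur.next[idx] == null) {
                    // No dictionary word ends with this suffix, so stop searching
                    break;
                } else if (cur.next[idx].isEnd) {
                    // Characters j..i form a dictionary word, so the prefix before
                    // it costs dp[j - 1]; keep the smaller of that and dp[i]
                    dp[i] = Math.min(dp[i], dp[j - 1]);
                }
                // In either case, keep descending
                cur = cur.next[idx];
            }
        }
        // Unmatched characters over the whole sentence
        return dp[len];
    }
}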
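
As a cross-check on the recurrence used above, here is a minimal sketch of the same dynamic program with a HashSet in place of the reversed trie. It is not the author's code: the class name RespaceCheck, the method respaceWithSet, and the sample dictionaries are hypothetical and exist only for illustration. It is slower, since it builds a substring for every (j, i) pair and cannot stop early the way the trie walk does, but it is short enough to verify by hand.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class RespaceCheck {

    // Set-based reference version of the same DP (hypothetical helper).
    static int respaceWithSet(String[] dictionary, String sentence) {
        Set<String> words = new HashSet<>(Arrays.asList(dictionary));
        int n = sentence.length();
        // dp[i]: minimum number of unmatched characters in the first i characters
        int[] dp = new int[n + 1];
        for (int i = 1; i <= n; i++) {
            // Treat character i as unmatched first
            dp[i] = dp[i - 1] + 1;
            // Then try every substring that ends at character i
            for (int j = i; j >= 1; j--) {
                if (words.contains(sentence.substring(j - 1, i))) {
                    dp[i] = Math.min(dp[i], dp[j - 1]);
                }
            }
        }
        return dp[n];
    }

    public static void main(String[] args) {
        String sentence = "iresetthecomputeritstilldidntboot";

        // Every word is in this made-up dictionary, so nothing is unrecognized.
        String[] full = {"i", "reset", "the", "computer", "it", "still", "didnt", "boot"};
        System.out.println(respaceWithSet(full, sentence)); // prints 0

        // Without "reset", its five letters cannot be covered by any word.
        String[] partial = {"i", "the", "computer", "it", "still", "didnt", "boot"};
        System.out.println(respaceWithSet(partial, sentence)); // prints 5
    }
}
```

The trie version above should return the same values for the same inputs; the reversed insertion is what lets it abandon a backward scan as soon as no dictionary word can end with the suffix seen so far.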

Origin blog.csdn.net/qq_42007742/article/details/107223639