[LeetCode] 1032. Stream of Characters (Hard): trie (prefix tree), automaton

【Problem link】

https://leetcode.cn/problems/stream-of-characters/

【Problem summary】

Design a StreamChecker that is built from an array of words and then receives a stream of characters, one character per query call. After each character, return true if some suffix of the characters received so far equals one of the words; otherwise return false.

【Example】

Input:
["StreamChecker", "query", "query", "query", "query", "query", "query", "query", "query", "query", "query", "query", "query"]
[[["cd", "f", "kl"]], ["a"], ["b"], ["c"], ["d"], ["e"], ["f"], ["g"], ["h"], ["i"], ["j"], ["k"], ["l"]]
Output:
[null, false, false, false, true, false, true, false, false, false, false, false, true]
Explanation:
StreamChecker streamChecker = new StreamChecker(["cd", "f", "kl"]);
streamChecker.query("a"); // returns false
streamChecker.query("b"); // returns false
streamChecker.query("c"); // returns false
streamChecker.query("d"); // returns true, because "cd" is in words
streamChecker.query("e"); // returns false
streamChecker.query("f"); // returns true, because "f" is in words
streamChecker.query("g"); // returns false
streamChecker.query("h"); // returns false
streamChecker.query("i"); // returns false
streamChecker.query("j"); // returns false
streamChecker.query("k"); // returns false
streamChecker.query("l"); // returns true, because "kl" is in words

【Constraints】

1 <= words.length <= 2000

1 <= words[i].length <= 200

words[i] consists of lowercase English letters

letter is a lowercase English letter

At most $4 \times 10^4$ calls will be made to query

【Simulation (TLE)】

Because we match suffixes, a word can only match when the current stream character is that word's last letter. So we first group the words by their last letter. On each query we look up the group for the current letter and build candidate suffixes backwards from the newest character, checking each candidate against the group. Recording the longest word length in each group limits how far back we need to go. Counting only candidate suffixes, this is at most $200 \times 4 \times 10^4 = 8 \times 10^6$ lookups, but every candidate is a newly built string and every lookup compares strings inside a std::set, so the real cost is much higher and, as expected, the solution exceeds the time limit (TLE).
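A rough count of the string-building work alone shows why, assuming the worst case where every query falls into a group whose longest word has 200 letters. Prepending one character to a suffix of length $i-1$ builds a new string of about $i$ characters, so one query copies roughly

$$\sum_{i=1}^{200} i = \frac{200 \cdot 201}{2} = 20100$$

characters, and over all queries

$$20100 \times 4 \times 10^4 = 8.04 \times 10^8$$

character copies, before counting any string comparisons inside the set.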

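class StreamChecker {
public:
    map<char, set<string>> st;  // words grouped by their last character
    map<char, int> mp;          // longest word length in each group (optional; 200 also works as a fixed bound)
    string ss;                  // all characters received so far

    StreamChecker(vector<string>& words) {
        for (const auto& v : words) {
            char c = v.back();
            st[c].insert(v);
            mp[c] = max(mp[c], (int)v.length());
        }
    }

    bool query(char letter) {
        ss += letter;
        const auto& s = st[letter];  // take a reference so the set is not copied on every query
        if (!s.empty()) {  // some word ends with this character
            string w = "";
            for (int i = 0; i < mp[letter] && i < (int)ss.length(); i++) {
                // prepend the next older character and look the suffix up
                w = ss[ss.length() - 1 - i] + w;
                if (s.count(w)) return true;
            }
        }
        return false;
    }
};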

/**
 * Your StreamChecker object will be instantiated and called as such:
 * StreamChecker* obj = new StreamChecker(words);
 * bool param_1 = obj->query(letter);
 */

【Optimization: trie】

The idea stays the same, but the lookup is replaced by a trie (prefix tree), and whether a word matches is decided by how far we can walk down the tree. Each node has two fields: the child pointers children[26] and an end-of-word flag isEnd. Because we match suffixes of the stream, each word is inserted in reverse: starting at the root, we follow (creating as needed) the child for its last letter, then its second-to-last letter, and so on, and set isEnd = true on the final node. On each query we start again at the root and walk down using the stream's characters from newest to oldest. If the required child does not exist, no word can match and the query returns false; if we reach a node with isEnd = true, a word equals the current suffix and the query returns true. The walk stops after at most 200 steps (the maximum word length) or when the stream is exhausted. Walking the trie character by character behaves like a finite state machine, but strictly speaking it is not an Aho-Corasick (AC) automaton, which would insert words forward and add failure links so that each new character costs a single state transition. See the code comments for details.

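class StreamChecker {
public:
    struct Node {
        vector<Node*> children;
        bool isEnd;  // end-of-word marker
        // Initialize the members explicitly; LeetCode reports an error without this
        Node() : children(26, nullptr), isEnd(false) {}
    };
    string s;                 // the character stream received so far
    Node* trie = new Node();  // root of the trie

    StreamChecker(vector<string>& words) {
        for (const auto& w : words) {
            Node* node = trie;
            // Insert each word reversed, because queries match suffixes of the stream
            for (int i = (int)w.size() - 1; i >= 0; i--) {
                int idx = w[i] - 'a';
                if (node->children[idx] == nullptr) node->children[idx] = new Node();  // create the node if it does not exist
                node = node->children[idx];  // then move down
            }
            node->isEnd = true;  // the word ends here: mark the terminal node
        }
    }

    bool query(char letter) {
        s += letter;
        Node* node = trie;
        // Stop after 200 characters (the maximum word length) or at the start of the stream
        for (int i = 0, j = (int)s.size() - 1; i < 200 && j >= 0; i++, j--) {
            int idx = s[j] - 'a';
            if (node->children[idx] == nullptr) return false;  // unreachable state: no word matches
            node = node->children[idx];
            if (node->isEnd) return true;  // a word matches a suffix of the stream
        }
        return false;
    }
};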
