[Daily Practice] - Dictionary Tree (Trie)

Problem Description

Given a list of words, the list can be encoded as a reference string S and a list of indexes A.
For example, if the list is ["time", "me", "bell"], it can be expressed as S = "time#bell#" and indexes = [0, 2, 5].
For each index, we can restore a word by reading S from that position until the next "#" character.
What is the minimum length of a string S that successfully encodes the given word list?
The problem comes from LeetCode (820. Short Encoding of Words).

Understanding the problem

["time", "me", "bell"] is the list to be encoded, "time#bell#" is the encoded result, and indexes = [0, 2, 5] gives the starting position of each of the three words in the encoded result.

t i m e # b e l l #
0 1 2 3 4 5 6 7 8 9
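
To make the decoding step concrete, here is a minimal sketch of how the word list could be restored from S and the indexes. The class name DecodeDemo and the method decode are hypothetical names chosen for illustration; they are not part of the problem statement.

```java
import java.util.ArrayList;
import java.util.List;

class DecodeDemo {
    // Rebuild the word list: for each index, read s from that position up to the next '#'.
    static List<String> decode(String s, int[] indexes) {
        List<String> words = new ArrayList<>();
        for (int start : indexes) {
            int end = s.indexOf('#', start); // position of the terminating '#'
            words.add(s.substring(start, end));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(decode("time#bell#", new int[]{0, 2, 5})); // [time, me, bell]
    }
}
```

Note that "me" is read from inside "time#", which is why it does not need its own entry in S.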

In the example, "me" is a suffix of "time", so "me" does not need to appear separately in the final encoded result. All we need to do is find which words in the list are suffixes of other words, since the longer word already contains them. A trie is a natural fit for this.
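
Before turning to the trie, the suffix idea can be checked with a simpler technique. The sketch below is not the trie solution of this post; it is a set-based alternative, with hypothetical names SuffixSetDemo and encodedLength, and it assumes the words are non-empty.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SuffixSetDemo {
    // Drop every word that is a proper suffix of another word,
    // then sum (length + 1) over the words that remain.
    static int encodedLength(String[] words) {
        Set<String> keep = new HashSet<>(Arrays.asList(words));
        for (String word : words) {
            for (int i = 1; i < word.length(); i++) {
                keep.remove(word.substring(i)); // a proper suffix needs no entry of its own
            }
        }
        int total = 0;
        for (String word : keep) {
            total += word.length() + 1; // +1 for the trailing '#'
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength(new String[]{"time", "me", "bell"})); // 10
    }
}
```

For ["time", "me", "bell"], only "time" and "bell" survive, giving 5 + 5 = 10, the length of "time#bell#".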
The original post illustrated this with a trie that stores seven words: "a", "to", "ten", "be", "by", "bee", "bye".
[Figure: trie containing the seven words above]
How should we understand a trie? Walk from the root node toward the leaf nodes and try every path. You will find that each path from the root to a leaf spells a word (and some words end before reaching a leaf, such as "be" and "by"). Each node of the trie only needs to store its own character.
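
The seven-word example can be sketched in code as follows. This is an illustrative sketch, not the solution below: the class DictTrie and the isWord flag are assumptions added here, and the words are assumed to contain only lowercase letters 'a' to 'z'. The character of each node is implied by its slot in the parent's array rather than stored explicitly.

```java
class DictTrie {
    private static class Node {
        Node[] children = new Node[26]; // slot c holds the child for letter 'a' + c
        boolean isWord;                 // true if a word ends at this node
    }

    private final Node root = new Node();

    void insert(String word) {
        Node cur = root;
        for (char ch : word.toCharArray()) {
            int c = ch - 'a';
            if (cur.children[c] == null) {
                cur.children[c] = new Node();
            }
            cur = cur.children[c];
        }
        cur.isWord = true;
    }

    boolean contains(String word) {
        Node cur = root;
        for (char ch : word.toCharArray()) {
            cur = cur.children[ch - 'a'];
            if (cur == null) {
                return false; // the path breaks off before the word is complete
            }
        }
        return cur.isWord;
    }

    public static void main(String[] args) {
        DictTrie trie = new DictTrie();
        for (String w : new String[]{"a", "to", "ten", "be", "by", "bee", "bye"}) {
            trie.insert(w);
        }
        System.out.println(trie.contains("be")); // true: a word that ends at an inner node
        System.out.println(trie.contains("te")); // false: the path exists, but no word ends there
    }
}
```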

Solution

import java.util.Arrays;

public class Solution {
    public static void main(String[] args) {
        String[] strings = new String[]{"time","me","bell","la","ll"};
        System.out.println(minimumLengthEncoding(strings));
    }
    public static int minimumLengthEncoding(String[] words) {
        int len = 0;
        Trie trie = new Trie();
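        // First sort the word list by length, longest first.
        // Why sort: reversing the example ["time", "me", "bell"] gives ["emit", "em", "lleb"],
        // and "em" is a prefix of "emit", so "em" can be ignored. Longer words must be inserted first.
        // If "em" were inserted before "emit", both insertions would create new nodes and both words
        // would be counted, which is wrong. So sort by length, longest first, before inserting.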
        Arrays.sort(words, (s1, s2) -> s2.length() - s1.length());
        // Insert each word into the trie; insert returns the encoding length that the word adds
        for (String word: words) {
            len += trie.insert(word);
        }
        return len;
    }
}
// Trie definition
class Trie {

    TrieNode root;

    public Trie() {
        root = new TrieNode();
    }

    public int insert(String word) {
        TrieNode cur = root;
        boolean isNew = false;
        // A word that is a suffix of another word is not counted, so insert each word in reverse
        for (int i = word.length() - 1; i >= 0; i--) {
            int c = word.charAt(i) - 'a';
            if (cur.children[c] == null) {
                isNew = true; // a new node was created, so this is a new word
                cur.children[c] = new TrieNode();
            }
            // Important: move down to the child node so the next iteration continues from there
            cur = cur.children[c];
        }
        // A new word adds its length + 1 (the extra "#") to the encoding; otherwise nothing changes.
        return isNew? word.length() + 1: 0;
    }
}

class TrieNode {
    char val;
    // There are 26 lowercase letters, so the array has length 26
    TrieNode[] children = new TrieNode[26];

    public TrieNode() {}
}

Applications of the trie

Search engines
For example, when you type "new crown" into the Baidu search box, it suggests search terms that begin with "new crown".

Word segmentation
Common word-segmentation lexicons and dictionaries mostly use a trie, or a similar tree structure for storing strings (such as the double-array trie). The reason is that a trie supports efficient prefix queries, an operation that some segmentation algorithms call very frequently.
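
The prefix query behind both applications can be sketched as follows. This is only an illustration under stated assumptions: the class SuggestTrie and its suggest method are hypothetical names, and real search engines or segmentation libraries are not claimed to work this way. A TreeMap is used for the children so that suggestions come out in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SuggestTrie {
    private static class Node {
        Map<Character, Node> children = new TreeMap<>(); // sorted by character
        boolean isWord;                                   // true if a stored word ends here
    }

    private final Node root = new Node();

    void insert(String word) {
        Node cur = root;
        for (char ch : word.toCharArray()) {
            cur = cur.children.computeIfAbsent(ch, k -> new Node());
        }
        cur.isWord = true;
    }

    // Return every stored word that starts with the given prefix.
    List<String> suggest(String prefix) {
        List<String> out = new ArrayList<>();
        Node cur = root;
        for (char ch : prefix.toCharArray()) {
            cur = cur.children.get(ch);
            if (cur == null) {
                return out; // no stored word has this prefix
            }
        }
        collect(cur, new StringBuilder(prefix), out);
        return out;
    }

    // Depth-first walk below the prefix node, recording each complete word.
    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            char c = e.getKey();
            path.append(c);
            collect(e.getValue(), path, out);
            path.deleteCharAt(path.length() - 1); // backtrack
        }
    }
}
```

After inserting "be", "bee", "by", "bye" and "to", calling suggest("be") walks two nodes down from the root and then collects "be" and "bee" from that subtree, without visiting any word that starts with a different prefix.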


Origin blog.csdn.net/weixin_45676630/article/details/105169585