Trie (dictionary tree, prefix tree)

What is a Trie?

  A Trie is a multi-way tree designed specifically for handling strings. If we look up words in a dictionary with the binary search tree we implemented earlier, each query takes O(log n) time (assuming the tree stays balanced); with 1 million (about 2^20) words, log n is roughly 20. With a Trie, however, the cost of a query does not depend on how many entries are stored at all: it is O(w), where w is the length of the word being queried, and most words are shorter than 10 letters.
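
  To make the numbers above concrete (a rough count that assumes a balanced BST and ignores the cost of comparing two strings at each step):

```latex
2^{20} = 1\,048\,576 \approx 10^6, \qquad
\log_2\!\left(10^6\right) = 6\log_2 10 \approx 6 \times 3.32 \approx 19.9 \approx 20
```

  A Trie lookup of panda, by contrast, visits w = 5 nodes whether the Trie holds ten words or a million.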

  A Trie splits each string into its letters and stores them one per node, so the path from the root down to a node spells out a word. The Trie in the figure below stores four words: cat, dog, deer and panda.
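
  Since the figure may not be available, here is a minimal Java sketch that builds a Trie from the same four words and prints it as text, one node per line indented by depth, with * marking a node where a word ends. TrieSketch and its render() method are illustrative names; the node layout (an isWord flag plus a TreeMap of children) anticipates the Node class defined below.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: builds a Trie and renders it as indented text.
// A trailing '*' marks a node that ends a word.
class TrieSketch {
    static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    final Node root = new Node();

    void add(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            // create the child for c if it does not exist yet, then step into it
            cur = cur.next.computeIfAbsent(c, k -> new Node());
        }
        cur.isWord = true;
    }

    // Hypothetical helper: depth-first rendering, children in alphabetical order.
    String render() {
        StringBuilder sb = new StringBuilder();
        render(root, 0, sb);
        return sb.toString();
    }

    private void render(Node node, int depth, StringBuilder sb) {
        for (Map.Entry<Character, Node> e : node.next.entrySet()) {
            for (int i = 0; i < depth; i++) sb.append("  ");
            sb.append(e.getKey());
            if (e.getValue().isWord) sb.append('*');
            sb.append('\n');
            render(e.getValue(), depth + 1, sb);
        }
    }

    public static void main(String[] args) {
        TrieSketch trie = new TrieSketch();
        for (String w : new String[] {"cat", "dog", "deer", "panda"}) trie.add(w);
        System.out.print(trie.render());
    }
}
```

  Running main prints the branches c-a-t*, d-e-e-r*, d-o-g* and p-a-n-d-a*; dog and deer share the d node, which is how a Trie saves space on common prefixes.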

  Each node has pointers to its child nodes, one for each of the 26 letters. But the right set of characters depends on the language and the situation: the 26 letters do not include capital letters, so if capitals are needed each node must have 52 pointers to child nodes, and storing e-mail addresses would require even more characters. That is why we say here that each node has a variable number of pointers to the next nodes.
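
  For comparison, here is a minimal sketch of the fixed-array design, assuming the input uses only the lowercase letters a to z. ArrayTrie is a hypothetical name, not the code used later in this article: each node holds 26 child slots indexed by c - 'a', and any other character is rejected.

```java
// Illustrative sketch, assuming lowercase 'a'..'z' input only:
// each node holds a fixed array of 26 child pointers indexed by (c - 'a').
class ArrayTrie {
    private static final int ALPHABET = 26; // would be 52 with capital letters

    private static class Node {
        boolean isWord;
        Node[] next = new Node[ALPHABET];
    }

    private final Node root = new Node();

    // Maps a letter to its slot; anything outside 'a'..'z' is unsupported.
    private static int index(char c) {
        if (c < 'a' || c > 'z') {
            throw new IllegalArgumentException("unsupported character: " + c);
        }
        return c - 'a';
    }

    public void add(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            int i = index(c);
            if (cur.next[i] == null) cur.next[i] = new Node();
            cur = cur.next[i];
        }
        cur.isWord = true;
    }

    public boolean contains(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next[index(c)];
            if (cur == null) return false;
        }
        return cur.isWord;
    }
}
```

  An array lookup is fast, but every node reserves all 26 slots even when it has a single child, and the alphabet is fixed in advance. The TreeMap used in this article's Node class accepts any character and only stores children that exist, at the cost of an O(log k) map lookup per step, where k is the number of children.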

  Many words are prefixes of other words; for example, pan is a prefix of panda. How do we store both in a Trie? A word does not necessarily end at a leaf node, so leaves cannot tell us where words end. Instead, when designing the Node class we add a flag, isWord, that records whether the node is the end of some word.
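
  The following sketch (WordEndDemo and its counting helpers are illustrative names) stores pan and panda and counts the leaf nodes and the nodes flagged isWord. Only one leaf exists, the final a of panda, yet two words are stored, so the flag is what tells us that pan is a word.

```java
import java.util.TreeMap;

// Illustrative sketch: leaves versus isWord flags after storing "pan" and "panda".
class WordEndDemo {
    static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    final Node root = new Node();

    void add(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
        }
        cur.isWord = true; // the flag, not the node's position, marks the end of a word
    }

    // Number of nodes without children.
    static int countLeaves(Node node) {
        if (node.next.isEmpty()) return 1;
        int total = 0;
        for (Node child : node.next.values()) total += countLeaves(child);
        return total;
    }

    // Number of nodes flagged as the end of a word.
    static int countWordEnds(Node node) {
        int total = node.isWord ? 1 : 0;
        for (Node child : node.next.values()) total += countWordEnds(child);
        return total;
    }
}
```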

Creating a Trie

  Before creating the Trie, we design its node class. As described above, each node has a variable number of pointers to its child nodes, plus an isWord flag that records whether the node is the end of a word. The code is as follows:

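    // Node class of the Trie
    private class Node{

        // whether this node is the end of a word
        public boolean isWord;
        // pointers to the child nodes, keyed by character
        public TreeMap<Character,Node> next;

        // constructor: initialize the node
        public Node(boolean isWord){
            this.isWord = isWord;
            next = new TreeMap<>();
        }

        // no-arg constructor: by default the node is not the end of a word
        public Node(){
            this(false);
        }

    }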

  Now let's implement the Trie class itself, with this node class nested inside:

import java.util.TreeMap;

public class Trie {

    // Node class of the Trie
    private class Node{

        // whether this node is the end of a word
        public boolean isWord;
        // pointers to the child nodes, keyed by character
        public TreeMap<Character,Node> next;

        // constructor: initialize the node
        public Node(boolean isWord){
            this.isWord = isWord;
            next = new TreeMap<>();
        }

        // no-arg constructor: by default the node is not the end of a word
        public Node(){
            this(false);
        }

    }

    private Node root;
    private int size;

    public Trie() {
        root = new Node();
        size = 0;
    }

    // number of words stored in the Trie
    public int getSize(){
        return size;
    }
}

Adding words to a Trie

  The add operation takes a string, splits it into characters, and stores each character as a node along a path in the Trie, creating any nodes that do not exist yet.

    // Add a new word to the Trie
    public void add(String word){
        Node cur = root;
        for (int i = 0; i < word.length(); i++){
            // split the new word into characters, one at a time
            char c = word.charAt(i);
            // if no child of the current node stores this character, create one
            if (cur.next.get(c) == null){
                cur.next.put(c,new Node());
            }
            cur = cur.next.get(c);
        }
        // after walking the whole word, mark the last node as the end of a word;
        // size only grows if this word was not already stored
        if (!cur.isWord){
            cur.isWord = true;
            size++;
        }

    }
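
  The same insertion can also be written recursively. The sketch below is an alternative to the loop above, not the article's code; RecursiveTrie and the helper add(Node, String, int) are illustrative names. The recursion depth equals the word length, so the call stack stays shallow for ordinary words.

```java
import java.util.TreeMap;

// Illustrative alternative: recursive insertion with the same size bookkeeping.
class RecursiveTrie {
    private static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();
    private int size;

    public int getSize() { return size; }

    public void add(String word) {
        add(root, word, 0);
    }

    // Inserts word.substring(index) below node; size grows only for a new word.
    private void add(Node node, String word, int index) {
        if (index == word.length()) {
            if (!node.isWord) {
                node.isWord = true;
                size++;
            }
            return;
        }
        char c = word.charAt(index);
        Node child = node.next.get(c);
        if (child == null) {
            child = new Node();
            node.next.put(c, child);
        }
        add(child, word, index + 1);
    }
}
```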

Querying a Trie

    // Query: does the Trie contain the word?
    public boolean contains(String word){
        Node cur = root;
        for (int i = 0;i < word.length(); i++){
            char c = word.charAt(i);
            if (cur.next.get(c) == null ){
                return false;
            }
            cur = cur.next.get(c);
        }
        return cur.isWord;
    }

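  Using the same traversal, we can check whether any word in the Trie starts with a given prefix. The only difference is that we return true once the whole prefix has been matched, without checking isWord: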

    // Is there any word in the Trie that starts with prefix?
    public boolean isPrefix(String prefix){
        Node cur = root;
        for (int i = 0; i < prefix.length(); i++){
            char c = prefix.charAt(i);
            if (cur.next.get(c) == null) 
                return false;
            cur = cur.next.get(c);
        }
        return true;
    }
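
  contains() and isPrefix() walk exactly the same path and differ only in the last step. A minimal sketch, assuming a hypothetical helper findNode() that is not part of the article's class, factors the walk out so both methods shrink to one line each:

```java
import java.util.TreeMap;

// Illustrative sketch: one shared walk for contains() and isPrefix().
class PrefixTrie {
    private static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();

    public void add(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
        }
        cur.isWord = true;
    }

    // Follows s from the root; returns null as soon as a character has no child.
    private Node findNode(String s) {
        Node cur = root;
        for (int i = 0; i < s.length() && cur != null; i++) {
            cur = cur.next.get(s.charAt(i));
        }
        return cur;
    }

    public boolean contains(String word) {
        Node node = findNode(word);
        return node != null && node.isWord;
    }

    public boolean isPrefix(String prefix) {
        return findNode(prefix) != null;
    }
}
```

  With only panda stored, contains("pan") is false (the n node is not flagged isWord) while isPrefix("pan") is true.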

Comparing the performance of a binary search tree and a Trie

  Next we compare the performance of the binary search tree and the Trie, again using the example of adding and querying every word of the book Pride and Prejudice. The file-reading utility used in the test and the Pride and Prejudice text file can be obtained from my earlier article on sets and maps.

    public static void main(String[] args) {
        System.out.println("Pride and Prejudice");

        List<String> words = new ArrayList<>();

        // FileOperation and BSTSet come from the earlier article on sets and maps
        if(FileOperation.readFile("pride-and-prejudice.txt", words)){
            // Uncomment to sort the words first (needs java.util.Collections);
            // sorted input makes the BST degenerate into a linked list
//            Collections.sort(words);

            long startTime = System.nanoTime();

            // add and query every word with the BST-based set
            BSTSet<String> set = new BSTSet<>();
            for(String word: words)
                set.add(word);

            for(String word: words)
                set.contains(word);

            long endTime = System.nanoTime();

            double time = (endTime - startTime) / 1000000000.0;
            // time spent by the BST-based set
            System.out.println("Total different words: " + set.getSize());
            System.out.println("BSTSet: " + time + " s");

            // --- time the same add and query operations with the Trie

            startTime = System.nanoTime();

            Trie trie = new Trie();
            for(String word: words)
                trie.add(word);

            for(String word: words)
                trie.contains(word);

            endTime = System.nanoTime();

            time = (endTime - startTime) / 1000000000.0;

            System.out.println("Total different words: " + trie.getSize());
            System.out.println("Trie: " + time + " s");
        }

    }


  In practice, when the amount of data is not large and the strings arrive in random order, adding and querying with a binary search tree and with a Trie take roughly the same time. If the data are added in sorted order, however, the binary search tree degenerates into a linked list, its operations become O(n) and it becomes very slow, while the Trie is unaffected. We can sort the words (uncomment the Collections.sort(words) line) and look at the results again:

  This test shows that when the data are sorted, the gap between the two for adding and querying is very large.
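
  Since timing results depend on the machine and on the text file, here is a deterministic sketch of why sorted input hurts the BST: it measures tree height instead of time. SimpleBST is a hypothetical, unbalanced stand-in for the article's BSTSet, and the word list is generated (all words of length 1 to 3 over the letters a to j) rather than read from Pride and Prejudice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for BSTSet: a plain, unbalanced BST of strings.
// Insertion is iterative, so a list-shaped tree cannot overflow the stack.
class SimpleBST {
    private static class Node {
        final String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    private Node root;
    private int height; // depth of the deepest node (root has depth 1)

    void add(String key) {
        if (root == null) {
            root = new Node(key);
            height = 1;
            return;
        }
        Node cur = root;
        int depth = 1;
        while (true) {
            int cmp = key.compareTo(cur.key);
            if (cmp == 0) return; // already present
            Node child = cmp < 0 ? cur.left : cur.right;
            depth++;
            if (child == null) {
                if (cmp < 0) cur.left = new Node(key);
                else cur.right = new Node(key);
                height = Math.max(height, depth);
                return;
            }
            cur = child;
        }
    }

    int height() { return height; }
}

// Trie that tracks its depth; each add walks exactly word.length() nodes.
class DepthTrie {
    private static class Node {
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();
    private int depth;

    void add(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
        }
        depth = Math.max(depth, word.length());
    }

    int depth() { return depth; }
}

class SortedInputDemo {
    // All words of length 1..3 over 'a'..'j' (10 + 100 + 1000 = 1110 words), sorted.
    static List<String> sortedWords() {
        List<String> words = new ArrayList<>();
        for (char a = 'a'; a <= 'j'; a++) {
            words.add("" + a);
            for (char b = 'a'; b <= 'j'; b++) {
                words.add("" + a + b);
                for (char c = 'a'; c <= 'j'; c++) words.add("" + a + b + c);
            }
        }
        Collections.sort(words);
        return words;
    }

    public static void main(String[] args) {
        SimpleBST bst = new SimpleBST();
        DepthTrie trie = new DepthTrie();
        for (String w : sortedWords()) {
            bst.add(w);
            trie.add(w);
        }
        System.out.println("BST height: " + bst.height());
        System.out.println("Trie depth: " + trie.depth());
    }
}
```

  Every sorted key is larger than all earlier keys, so the BST only ever grows to the right and its height equals the number of words, 1110; the Trie's depth is bounded by the longest word, 3. Shuffling the list first (Collections.shuffle) typically keeps the BST height far smaller.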

Trie problems on LeetCode

  Problem 208 on LeetCode, Implement Trie (Prefix Tree), asks us to implement exactly this data structure. As the problem description shows, its three methods insert(), search() and startsWith() correspond to the add(), contains() and isPrefix() operations we implemented above, so we can simply rename the methods in our code and submit it.
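
  A sketch of what the renamed class might look like is below. It assumes LeetCode's required class name Trie and the methods insert, search and startsWith; the size counter is dropped because the problem does not ask for it.

```java
import java.util.TreeMap;

// Our Trie with the method names LeetCode 208 expects:
// add -> insert, contains -> search, isPrefix -> startsWith.
class Trie {
    private static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();

    public Trie() {}

    public void insert(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (cur.next.get(c) == null) cur.next.put(c, new Node());
            cur = cur.next.get(c);
        }
        cur.isWord = true;
    }

    public boolean search(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            cur = cur.next.get(word.charAt(i));
            if (cur == null) return false;
        }
        return cur.isWord; // the path exists; is it a whole word?
    }

    public boolean startsWith(String prefix) {
        Node cur = root;
        for (int i = 0; i < prefix.length(); i++) {
            cur = cur.next.get(prefix.charAt(i));
            if (cur == null) return false;
        }
        return true; // the whole prefix was matched
    }
}
```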



Now let's look at LeetCode problem 211: Add and Search Word

  From the problem description, only the query operation differs from the Trie we implemented; adding a word is unchanged. Because the character '.' can stand for any letter, whenever the pattern contains '.' we have to try every child of the current node.

    public boolean search(String word) {
        // match the pattern recursively, starting at the root
        return match(root,word,0);
    }

    private boolean match(Node node, String word, int index) {
        if (index == word.length())
            return node.isWord;

        char c = word.charAt(index);
        if (c != '.'){
            if (node.next.get(c) == null)
                return false;
            return match(node.next.get(c),word,index+1);
        }
        else {
            // the current pattern character is '.', so try every child of the current node
            for (char nextChar : node.next.keySet()) {
                if (match(node.next.get(nextChar),word,index+1)){
                    return true;
                }
            }
            return false;
        }
    }

Submitted to LeetCode, this code is accepted.
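
  For reference, here is the whole class as one self-contained sketch. It assumes LeetCode's class name WordDictionary and the method name addWord for the unchanged add operation; match() is the recursive search from above, slightly condensed.

```java
import java.util.TreeMap;

// Sketch of a complete solution for LeetCode 211.
class WordDictionary {
    private static class Node {
        boolean isWord;
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();

    public WordDictionary() {}

    // Same as add() in our Trie.
    public void addWord(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
        }
        cur.isWord = true;
    }

    public boolean search(String word) {
        return match(root, word, 0);
    }

    private boolean match(Node node, String word, int index) {
        if (index == word.length()) return node.isWord;
        char c = word.charAt(index);
        if (c != '.') {
            Node child = node.next.get(c);
            return child != null && match(child, word, index + 1);
        }
        // '.' matches any character: try every child
        for (Node child : node.next.values()) {
            if (match(child, word, index + 1)) return true;
        }
        return false;
    }
}
```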

Finally, let's look at LeetCode problem 677: Map Sum Pairs (key-value mapping)

  From the problem description: the map stores words together with integer values (weights), and the sum() method returns the sum of the values of all words that start with the given prefix. The code is as follows:

import java.util.TreeMap;

class MapSum {

    // Node class
    private class Node{
        // value of the word that ends at this node (0 if no word ends here)
        public int value;
        // each node may have several child nodes
        public TreeMap<Character,Node> next;

        public Node(int value){
            this.value = value;
            next = new TreeMap<>();
        }

        public Node(){
            this(0);
        }
    }

    private Node root;

    public MapSum(){
        root = new Node();
    }

    // Insertion works like add() in our Trie; inserting an existing key overwrites its value
    public void insert(String word,int val){
        Node cur = root;

        for (int i = 0 ; i < word.length() ; i++){
            char c = word.charAt(i);
            if (cur.next.get(c) == null){
                cur.next.put(c,new Node());
            }
            cur = cur.next.get(c);
        }
        cur.value = val;
    }

    // Sum of the values of all words that start with prefix
    public int sum(String prefix){
        Node cur = root;
        for (int i = 0 ; i < prefix.length() ; i++){
            char c = prefix.charAt(i);
            if ( cur.next.get(c) == null ){
                return 0;
            }
            cur = cur.next.get(c);
        }
        return sum(cur);
    }

    // Recursively add up the values in the subtree rooted at node
    private int sum(Node node) {
        int res = node.value;
        for (char c : node.next.keySet()) {
            res += sum(node.next.get(c));
        }
        return res;
    }
}
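
  The solution above re-scans the whole subtree under the prefix on every sum() call. An alternative sketch, not the article's approach, stores at every node the total value of all keys passing through it, so sum() only walks the prefix. Because inserting an existing key replaces its old value, the sketch keeps a HashMap of current values and adds only the difference (delta) along the path. MapSumWithTotals is an illustrative name; rename it to MapSum to submit it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Alternative sketch: each node stores the total value of all keys through it.
class MapSumWithTotals {
    private static class Node {
        int total; // sum of the values of all keys whose path passes through this node
        TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();
    private final Map<String, Integer> values = new HashMap<>(); // current value per key

    public void insert(String key, int val) {
        // only the change matters when a key is inserted again
        int delta = val - values.getOrDefault(key, 0);
        values.put(key, val);
        Node cur = root;
        cur.total += delta; // root total = sum of all values (the empty prefix)
        for (char c : key.toCharArray()) {
            cur = cur.next.computeIfAbsent(c, k -> new Node());
            cur.total += delta;
        }
    }

    public int sum(String prefix) {
        Node cur = root;
        for (char c : prefix.toCharArray()) {
            cur = cur.next.get(c);
            if (cur == null) return 0;
        }
        return cur.total;
    }
}
```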

Leetcode submission results:


Origin www.cnblogs.com/reminis/p/12724463.html