A first look at the Trie tree

What is a Trie tree?

The Trie tree goes by several names, including dictionary tree and prefix tree.

Trie trees are mainly used to store strings or binary numbers efficiently. Strings that share a prefix share the nodes for that prefix, which avoids storing duplicate elements and speeds up lookups.

Take storing strings made of the 26 lowercase letters as an example:
Since there are 26 letters, every node, starting from the root, has at most 26 child nodes.

1. Insert operation

Building a Trie tree starts from the root node. Suppose we want to insert the string "in".

  1. We start at the root, which is node 0, and use P=0 to record the current position. We check whether P has an edge labeled with the character i leading to a child node. It does not, so we create a new node, number 1, and label the edge to it with the character i. Then we move to node 1, that is, we set P=1. The character i of "in" is now in the Trie tree.

  2. Next we insert the character n. We check whether P, now node 1, has an edge labeled with the character n. It does not, so we create another new node, number 2, and label the edge with the character n. Finally we move to P=2. The character n is now inserted as well.

  3. Since n is the last character of "in", we also need to mark the node P=2 as an end point.

Inserting other strings works the same way. If the required child node already exists, there is no need to create it again; simply follow the existing edge. If it does not exist, create it.
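The insertion steps above can be sketched in C++ as follows. This is only an illustrative sketch, not the implementation used later in this article: the `Node` and `Trie` types and the use of -1 for "no edge" are assumptions made for this example, and only lowercase input is handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One trie node. child[c] holds the number of the child reached along
// the edge labeled 'a' + c, or -1 when there is no such edge.
struct Node {
    int child[26];
    bool is_end = false;  // true if some inserted string ends here
    Node() { for (int& c : child) c = -1; }
};

struct Trie {
    std::vector<Node> nodes;
    Trie() : nodes(1) {}  // node 0 is the root

    void insert(const std::string& word) {
        int p = 0;  // current position P, starting at the root
        for (char ch : word) {
            assert(ch >= 'a' && ch <= 'z');  // assumption: lowercase input only
            int c = ch - 'a';
            if (nodes[p].child[c] == -1) {
                // No edge labeled ch yet: create a node and connect it.
                nodes[p].child[c] = static_cast<int>(nodes.size());
                nodes.emplace_back();
            }
            p = nodes[p].child[c];  // move along the edge
        }
        nodes[p].is_end = true;  // mark the end point
    }
};
```

After inserting "in" into an empty `Trie`, there are three nodes (0, 1 and 2), the i edge leads from node 0 to node 1, the n edge leads from node 1 to node 2, and only node 2 is marked as an end point. Inserting "int" afterwards reuses nodes 1 and 2 and creates just one new node.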

2. Find operation

How do we check whether a string S is contained in the Trie tree?

We only need to start from the root node and move along the edges labeled S[0] -> S[1] -> S[2] -> ... -> S[S.len-1]:

  1. If every character is consumed and the walk ends on a node marked as an end point, S is in the Trie tree;
  2. If at some step there is no edge to follow, or the walk ends on a node that is not an end point, S is not in the Trie tree.
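The two outcomes above can be checked with a small lookup function. The sketch below is hypothetical: `TinyTrie`, `MAX_NODES` and `contains` are names invented for this example, and 0 is used to mean "no edge", which works because the root (node 0) is never a child.

```cpp
#include <cassert>
#include <string>

const int MAX_NODES = 8;  // hypothetical capacity, enough for a small demo

struct TinyTrie {
    int edges[MAX_NODES][26] = {};  // edges[p][c]: child of p along 'a' + c, 0 = no edge
    bool is_end[MAX_NODES] = {};    // is_end[p]: a stored string ends at node p
};

// Returns true only if the walk consumes all of s and stops on an end point.
bool contains(const TinyTrie& t, const std::string& s) {
    int p = 0;  // start at the root
    for (char ch : s) {
        assert(ch >= 'a' && ch <= 'z');        // assumption: lowercase input only
        int c = ch - 'a';
        if (t.edges[p][c] == 0) return false;  // no way to go: s is absent
        p = t.edges[p][c];
    }
    return t.is_end[p];  // reaching a node is not enough; it must be an end point
}
```

If the edges for "int" are filled in by hand (0 -i-> 1 -n-> 2 -t-> 3) and nodes 2 and 3 are marked as end points, `contains` accepts "in" and "int", rejects "i" because node 1 is not an end point, and rejects "it" and "into" because the walk runs out of edges.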

3. Simulating a Trie tree with a two-dimensional array

3.1 Implementing storage

int trie[M][26];
Still taking a Trie tree that stores the 26 lowercase letters as an example:
M depends on the total number of characters in all input strings, because in the worst case a new node is created for every character;
26 is the number of columns because each node has at most 26 branches.
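To get a feel for how M relates to the input, the sketch below compares the worst-case bound (one node per character, plus the root) with the number of nodes actually needed. It relies on the fact that a Trie tree has exactly one node per distinct non-empty prefix of the stored strings. The function names are hypothetical, and the `std::set` of prefixes is only a counting aid, not a trie.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Upper bound on rows: one node per input character, plus the root.
int worst_case_rows(const std::vector<std::string>& words) {
    int total = 1;
    for (const auto& w : words) total += static_cast<int>(w.size());
    return total;
}

// Rows actually needed: one node per distinct non-empty prefix, plus the root.
int exact_rows(const std::vector<std::string>& words) {
    std::set<std::string> prefixes;
    for (const auto& w : words)
        for (std::size_t len = 1; len <= w.size(); ++len)
            prefixes.insert(w.substr(0, len));
    int rows = static_cast<int>(prefixes.size()) + 1;
    assert(rows <= worst_case_rows(words));  // the bound from the text must hold
    return rows;
}
```

For the strings "in", "int" and "inn" the bound is 9 rows, but only 5 are needed because the prefix "in" is shared. For "ab" and "cd", which share nothing, both numbers are 5, which is the worst case described above.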

Use a two-dimensional array to simulate a Trie tree:

1. trie[i] is the i-th row of the two-dimensional array, and the row number i is exactly the node number p in the Trie tree in the figure above: i==p

Note that the 26 child nodes of the root are not necessarily numbered 1 to 26; the numbers depend on the order in which characters appear during insertion.
For example, trie[0][3]=233 is possible: the child of the root on the branch with index 3 (the letter d) has node number 233 and is stored in row 233 of the two-dimensional array, trie[233].

This requires an extra variable, int idx=1;, to hand out node numbers. Node 0 is the root and is never the target of an edge, and every search or insertion starts from the root, so idx starts at 1. When different characters are stored, whichever character appears first gets the next free node number.
idx plays the same role here as the idx counter in an array-based singly linked list.

2. Each row has 26 columns, one per branch. Here the branches are only a to z, with column indices 0 to 25;

charset[i]=ch; where i runs from 0 to 25 and corresponds to the lowercase letters a to z, namely charset[i]=i+'a';

3. The meaning of trie[i][j]=x:

  1. If x is equal to 0, node i has no outgoing edge labeled charset[j], that is, no child node is reachable through charset[j];
  2. If x is not equal to 0, node i has an outgoing edge labeled charset[j] leading to a child node whose node number is x. Because node numbers are handed out in increasing order, x>i always holds.
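The three points above can be observed on a small example. The sketch below assumes a tiny capacity M chosen only for the demo and lowercase input; it inserts "tea" first and "in" second, so the node numbers follow insertion order rather than alphabetical order.

```cpp
#include <cassert>
#include <string>

const int M = 100;  // hypothetical capacity, enough for this demo
int trie[M][26];    // globals are zero-initialized, so 0 means "no edge"
int idx = 1;        // next unused node number; 0 is reserved for the root

void insert(const std::string& s) {
    int p = 0;
    for (char ch : s) {
        assert(ch >= 'a' && ch <= 'z');           // assumption: lowercase input only
        int c = ch - 'a';                         // column index, charset[c] == c + 'a'
        if (trie[p][c] == 0) trie[p][c] = idx++;  // hand out the next node number
        p = trie[p][c];
    }
}
```

After `insert("tea")` and then `insert("in")`, the t branch of the root holds node number 1 while the i branch holds 4, even though i comes before t in the alphabet. Every non-zero entry trie[i][j] is larger than its row number i, and branches that were never used, such as the z branch of the root, still hold 0.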

So how exactly are characters stored?

trie[0] is the root node, whose node number i is 0. Every value stored in the two-dimensional array is the number of a next node.
Storing trie[i][j]=x means that an edge labeled charset[j] runs from node i to node x. In other words, each character is stored on an edge between two nodes, not in a node itself.
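One way to see that characters live on edges is to rebuild the stored strings by walking the array. In the sketch below the letter is never read from a node; it is recovered from the column index of the edge being followed. The `is_end` array and the `collect` function are hypothetical helpers for this example, and M is a small capacity chosen for the demo.

```cpp
#include <cassert>
#include <string>
#include <vector>

const int M = 100;  // hypothetical capacity, enough for this demo
int trie[M][26];    // 0 means "no edge"
bool is_end[M];     // is_end[p]: a stored string ends at node p
int idx = 1;

void insert(const std::string& s) {
    int p = 0;
    for (char ch : s) {
        assert(ch >= 'a' && ch <= 'z');  // assumption: lowercase input only
        int c = ch - 'a';
        if (trie[p][c] == 0) trie[p][c] = idx++;
        p = trie[p][c];
    }
    is_end[p] = true;
}

// Depth-first walk: the edge p -> trie[p][c] contributes the letter 'a' + c.
void collect(int p, std::string& path, std::vector<std::string>& out) {
    if (is_end[p]) out.push_back(path);
    for (int c = 0; c < 26; ++c) {
        if (trie[p][c] == 0) continue;
        path.push_back(static_cast<char>('a' + c));
        collect(trie[p][c], path, out);
        path.pop_back();
    }
}
```

Calling `collect(0, path, out)` with an empty `path` after inserting "tea", "int" and "in" yields "in", "int", "tea" in that order, because the columns are visited from a to z.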

3.2 Implementing the end mark

Take the path t-e-a in the figure above as an example. Only the a node carries an end mark, so the Trie tree contains the string "tea" but not the string "te".
Now look at the path i-n-t: both the n node and the t node carry end marks, so the strings "in" and "int" are both stored in the Trie tree.

int cnt[M];
We open a cnt array indexed by node number, that is, by row of the two-dimensional array.
cnt[i]=x; means that x inserted strings end at node i. Only one path from the root leads to node i, so cnt[i] is exactly the number of times that particular string has been inserted.
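The counting behaviour can be tried out with the sketch below. It is a variation for illustration, not the fixed-size array version used in this article: rows are appended to a `std::vector` on demand, so no M has to be chosen. `CountingTrie` and its member names are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

struct CountingTrie {
    std::vector<std::array<int, 26>> trie;  // trie[p][c]: child of p, 0 = no edge
    std::vector<int> cnt;                   // cnt[p]: strings ending at node p

    CountingTrie() : trie(1), cnt(1, 0) {}  // row 0 is the root, all zeros

    void insert(const std::string& s) {
        int p = 0;
        for (char ch : s) {
            assert(ch >= 'a' && ch <= 'z');  // assumption: lowercase input only
            int c = ch - 'a';
            if (trie[p][c] == 0) {
                int fresh = static_cast<int>(trie.size());
                trie.push_back(std::array<int, 26>{});  // new all-zero row
                cnt.push_back(0);
                trie[p][c] = fresh;
            }
            p = trie[p][c];
        }
        ++cnt[p];  // one more copy of s ends here
    }

    int count(const std::string& s) const {
        int p = 0;
        for (char ch : s) {
            assert(ch >= 'a' && ch <= 'z');
            int c = ch - 'a';
            if (trie[p][c] == 0) return 0;  // path missing: never inserted
            p = trie[p][c];
        }
        return cnt[p];  // 0 if s is only a prefix of stored strings
    }
};
```

Inserting "ab" twice and "abc" once gives a count of 2 for "ab" and 1 for "abc". The prefix "a" has a node but a count of 0, and "abd" has no path at all, so it also reports 0.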

Problem description

Maintain a collection of strings that supports two operations:

"I x" inserts the string x into the set;
"Q x" asks how many times the string x appears in the set.
There are N operations in total. The total length of the input strings does not exceed 10^5, and the strings contain only lowercase English letters.

Input format
The first line contains an integer N, the number of operations.

The next N lines each contain one operation, either "I x" or "Q x".

Output format
For each query command "Q x", an integer must be output as the result, which represents the number of times x appears in the set.

Each result occupies one line.

Data range
1≤N≤2∗10^4

Sample input:

5
I abc
Q abc
Q ab
I ab
Q ab

Sample output:

1
0
1

Algorithm implementation

#include <cstdio>
#define read(x) scanf("%d",&x)

const int M=1e5+10;
// Node 0 is the root, so real nodes start at index 1. A character is
// represented by an edge i->j (for example 0->1), not by a node.
// Which character the edge 0->1 stores depends on the insertion order;
// the same holds for every edge i->j.
int trie[M][26],cnt[M],idx=1;

void insert(char *str)
{
    int p=0;  // start inserting at the root; p is the current node number
    for (int i=0;str[i]!='\0';i++) {
        int t=str[i]-'a'; // which branch to follow
        if (trie[p][t]==0) trie[p][t]=idx++; // branch missing: allocate a new node
        p=trie[p][t];   // follow the edge to the next node; only now is str[i] inserted
    }
    cnt[p]++; // one more string ends at this node
}

int find(char *str)
{
    int p=0; // start searching at the root; p is the current node number
    for (int i=0;str[i]!='\0';i++) {
        int t=str[i]-'a';
        if (!trie[p][t]) return 0; // no edge from p labeled str[i], so the string was never stored
        p=trie[p][t];
    }
    return cnt[p];
}

int main()
{
    int n;
    read(n);
    char op[3],str[M];
    while (n--) {
        scanf("%s%s",op+1,str);   // op is read starting at index 1; remember that '\0' also takes a slot
        if(op[1]=='I') insert(str);
        else printf("%d\n",find(str));
    }

    return 0;
}

Origin blog.csdn.net/HangHug_L/article/details/114178852