Construction and application of the Trie tree

    A Trie tree is a dictionary tree. When we look a word up in a dictionary, we do not scan sequentially from the first page until we reach it. Instead, we use the shared prefix to keep narrowing the search range. To look up air, we first find a, then ai, then air. If another word also starts with ai, the two words share the prefix ai, and the letters that differ after the prefix become separate subtrees. This is what forms the dictionary tree structure.

    The rest of this post covers the implementation and one application.

Table of contents

Trie tree

insert word

look up word

application


Trie tree

    Take a simple dictionary tree whose root node is a: the path from the root down to any node can spell a word, such as ab, ac, or abc.

    To query aba, first match a, then ab, then aba; if every step succeeds, the word is found. For abd, the search stops at ab because there is no d branch below it, so the word is not found.

insert word

   Next is how to construct a Trie tree. 

   Since the first letter of a word can be anything, we would otherwise need a separate tree rooted at each letter from a to z, which is cumbersome. Instead, a single node 0 is used as the common root. The Trie tree can be regarded as a directed graph in which each node points to its own child nodes, so building the tree is really the process of building a graph.

    Consider giving each node a linked list, so that all the nodes together form an array of linked lists. The linked list of a node treats that node as the parent and stores, for each child, the position of the child's own linked list in the array.

    This is abstract, so take the word ab as an example. We create a linked list for a at position 1 of the array, and then a linked list for b at position 2. The linked list of node a then links to node b by storing the value 2 (1->2).
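The linked-list picture above can be sketched directly. This is only a minimal illustration under assumptions, not the implementation used in the rest of the post: `adj`, `child_of`, and `add_word` are hypothetical names, and a `std::vector` of (letter, child) pairs stands in for each hand-written linked list.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// adj[p] plays the role of node p's linked list: each entry is
// (letter, index of the child node). Node 0 is the common root.
std::vector<std::vector<std::pair<char, int>>> adj(1);

// Return the child of node p along letter c, or 0 if there is none.
// 0 can mean "none" because the root is never the child of any node.
int child_of(int p, char c) {
    assert(p >= 0 && p < static_cast<int>(adj.size()));
    for (const auto& edge : adj[p])
        if (edge.first == c) return edge.second;
    return 0;
}

// Insert a word, appending a new empty list for every missing node.
// Returns the index of the node where the word ends.
int add_word(const std::string& s) {
    int p = 0;
    for (char c : s) {
        int next = child_of(p, c);
        if (next == 0) {
            next = static_cast<int>(adj.size());
            adj.emplace_back();           // the new node's (empty) list
            adj[p].push_back({c, next});  // parent's list now links to it
        }
        p = next;
    }
    return p;
}
```

Calling `add_word("ab")` on the empty structure creates node 1 for a and node 2 for b, so the list of node 1 holds the single link (b, 2), which is the 1->2 link described above.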

    This may still be abstract, so consider how it is done in practice. For convenience, the linked lists are usually not written out. A two-dimensional array is used instead: each row represents one node's linked list, each column represents a letter, and each entry stores the row of the node that the letter leads to. A word is inserted letter by letter. If the child needed for the next letter has not been created yet (the entry is 0), a new row is appended at the end, which is equivalent to creating a node. Finally, to tell whether a word ends at a node, introduce cnt: each time a word ends at a node, add 1 to that node's cnt. If a node's cnt is 0, no word ends there.

code:

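#include <string>
using namespace std;

const int N = 1000050;
int trie[N][26];
int cnt[N];// number of words that end at each node (row)
int id;// index of the most recently created node; node 0 is the root

void insert(string s)
{
	int p = 0;
	for (int i = 0; i < (int)s.size(); i++)
	{
		int x = s[i] - 'a';
		if (trie[p][x] == 0) trie[p][x] = ++id;// no child for this letter yet: append a new row and record its position
		p = trie[p][x];// move to the child's row; the row together with the letter column locates any edge
		/*
		Building a trie is an edge-building process, but a letter does not appear in only one place:
		a may be split into a1, a2, a3... scattered over different positions of the graph.
		Each row of the 2D array is one node; the column selects a letter, and the entry stores the row of the node that letter leads to.
		For example, after inserting aa and then ab: the root is row 0, the first a is row 1, the second a is row 2, and b is row 3,
		so trie[0][a]=1, trie[1][a]=2, trie[1][b]=3.
		*/
	}
	cnt[p]++;// a whole word has been inserted; add 1 at its last node, so cnt[p] is non-zero when some word ends here
}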
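To see what the array holds after a few insertions, here is a scaled-down, self-contained copy of the same routine. `MAX_NODES` and `build_example` are names assumed for this sketch, and the small capacity is chosen only so the trace is easy to follow.

```cpp
#include <cassert>
#include <string>

const int MAX_NODES = 64;   // assumption: tiny capacity, enough for this trace
int trie[MAX_NODES][26];    // trie[p][x]: row reached from row p via letter x, 0 = none
int cnt[MAX_NODES];         // cnt[p]: number of inserted words that end at row p
int id;                     // last row created so far; row 0 is the root

void insert(const std::string& s) {
    int p = 0;
    for (char c : s) {
        assert(c >= 'a' && c <= 'z');    // only lowercase letters are supported
        int x = c - 'a';
        if (trie[p][x] == 0) {
            assert(id + 1 < MAX_NODES);  // stay inside the demo capacity
            trie[p][x] = ++id;
        }
        p = trie[p][x];
    }
    cnt[p]++;
}

// Insert four words and return how many nodes were created.
int build_example() {
    insert("ab");   // creates row 1 (a) and row 2 (ab)
    insert("ac");   // reuses row 1, creates row 3 (ac)
    insert("abc");  // reuses rows 1 and 2, creates row 4 (abc)
    insert("ab");   // creates nothing; only cnt[2] goes from 1 to 2
    return id;
}
```

Shared prefixes reuse existing rows, so the four insertions create only four nodes, and a repeated word changes nothing except the counter at its last node.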

look up word

    To check whether a word appears, follow its letters in order from the root. If the current node has no edge for the next letter, the word cannot be in the tree and the search can stop. For example, when looking up abc, if b has no child c, the search fails there. If c is reached, the result still depends on whether some word ends at that node; if one does, the word exists. This is what the cnt mark described above is for.

code:

int find(string s)
{
	int p = 0;
	for (int i = 0; i < (int)s.size(); i++)
	{
		int x = s[i] - 'a';
		if (trie[p][x] == 0) return 0;// no edge for this letter from the current node, so the word is not in the tree
		p = trie[p][x];
	}
	return cnt[p];// non-zero only if some word ends at this node
}
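The difference between a stored word and a mere prefix is easiest to see on a small example. The sketch below is self-contained and repeats the insert routine at a reduced size; `MAX_NODES` and `build_example` are names assumed for this illustration.

```cpp
#include <cassert>
#include <string>

const int MAX_NODES = 64;   // assumption: tiny capacity for the demo
int trie[MAX_NODES][26];
int cnt[MAX_NODES];
int id;

void insert(const std::string& s) {
    int p = 0;
    for (char c : s) {
        assert(c >= 'a' && c <= 'z');
        int x = c - 'a';
        if (trie[p][x] == 0) {
            assert(id + 1 < MAX_NODES);
            trie[p][x] = ++id;
        }
        p = trie[p][x];
    }
    cnt[p]++;
}

// Returns how many times s was inserted (0 if it never was).
int find(const std::string& s) {
    int p = 0;
    for (char c : s) {
        assert(c >= 'a' && c <= 'z');
        int x = c - 'a';
        if (trie[p][x] == 0) return 0;  // path breaks: word is absent
        p = trie[p][x];
    }
    return cnt[p];                      // path exists: did a word end here?
}

void build_example() {
    insert("ab");
    insert("abc");
    insert("ab");
}
```

After `build_example()`, `find("ab")` returns 2 and `find("abc")` returns 1. `find("a")` returns 0 even though the path exists, because no word ends at that node, and `find("abd")` returns 0 because the path breaks at d.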

application

Problem link: [TJOI2010] Reading Comprehension - Luogu

     This problem asks, for each query word, which lines the word appears in. First build the Trie tree. Because the answer must list lines, counting how many times each node is the end of a word is no longer enough; instead, each node has to record the set of lines in which a word ends there. A bitset is used for this (a bool array would exceed the memory limit, while a bitset needs only one bit per flag): it marks the lines in which the word ending at that node appears, and those lines are output for each query.
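The idea of one bitset per node can be tried at a small scale before reading the full solution. The following is a sketch under assumptions, not the submitted program: `MAX_NODES`, `MAX_LINES`, `lines_at`, `node_count`, and `lines_of` are hypothetical names, and the limits are far smaller than in the actual problem.

```cpp
#include <bitset>
#include <cassert>
#include <string>
#include <vector>

const int MAX_NODES = 64;  // assumption: tiny capacity for the demo
const int MAX_LINES = 8;   // assumption: lines are numbered 1..8
int trie[MAX_NODES][26];
int node_count;
std::bitset<MAX_LINES + 1> lines_at[MAX_NODES];  // bit r set: a word ending here occurs on line r

void insert(const std::string& s, int row) {
    assert(row >= 1 && row <= MAX_LINES);
    int st = 0;
    for (char c : s) {
        assert(c >= 'a' && c <= 'z');
        int x = c - 'a';
        if (trie[st][x] == 0) {
            assert(node_count + 1 < MAX_NODES);
            trie[st][x] = ++node_count;
        }
        st = trie[st][x];
    }
    lines_at[st][row] = 1;  // setting the same bit twice is harmless
}

// Returns the lines, in increasing order, on which s occurs as a whole word.
std::vector<int> lines_of(const std::string& s) {
    int st = 0;
    for (char c : s) {
        assert(c >= 'a' && c <= 'z');
        int x = c - 'a';
        if (trie[st][x] == 0) return {};  // path breaks: no line contains s
        st = trie[st][x];
    }
    std::vector<int> result;
    for (int r = 1; r <= MAX_LINES; r++)
        if (lines_at[st][r]) result.push_back(r);
    return result;
}
```

For instance, after `insert("air", 1)`, `insert("avian", 2)`, and `insert("air", 3)`, the call `lines_of("air")` yields lines 1 and 3, while `lines_of("ai")` is empty because no word ends at that prefix node.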

code:

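#include<stdio.h>
#include<algorithm>
#include<string.h>
#include<bitset>
using namespace std;
#define N 5100000

bitset<1001> b[N];// b[p][r] = 1 if a word ending at node p appears on line r; a bool array would exceed the memory limit, bitset saves space
int trie[N][26],cnt;// 2D array storage: each row is a node; cnt records the number of nodes
int n;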

void insert(char a[], int row)
{
	int st = 0;// st is the row of the current node, starting from the root
	for (int i = 1; i <= strlen(a + 1); i++)
	{
		int x = a[i] - 'a';
		if (!trie[st][x])trie[st][x] = ++cnt;
		st = trie[st][x];
	}
	b[st][row] = 1;// mark that the word ending at this node appears on this line
}

int display(char a[])
{
	int st = 0;
	for (int i = 1; i <= strlen(a + 1); i++)
	{
		int x = a[i] - 'a';
		if (trie[st][x] == 0)return 0;
		st = trie[st][x];
	}
	return st;
}

int main()
{
	scanf("%d", &n);
	int num;
	char a[22];
	for (int i = 1; i <= n; i++)// each line
	{
		//printf("start:%d\n", i);
		scanf("%d", &num);// number of words on this line
		for (int j = 1; j <= num; j++)// each word
		{
			scanf("%s", a+1);
			//printf("%s\n", a+1);
			insert(a, i);// insert the word, tagged with its line number
		}
		//printf("end:%d\n", i);
	}
	int m;
	scanf("%d", &m);
	for (int i = 1; i <= m; i++)
	{
		scanf("%s", a + 1);
		int st = display(a);// row of the node where this word ends, or 0 if the path does not exist
		if (st)// the path exists
		{
			for (int j = 1; j <= 1000; j++)
				if (b[st][j])
					printf("%d ", j);
		}
		printf("\n");
	}
	return 0;
}

Part of the code is based on: [data structure] dictionary tree TrieTree graphic detailed explanation_Avalon Demerzel's blog-CSDN blog_dictionary tree

Origin blog.csdn.net/weixin_60360239/article/details/128894196