What the AC automaton is used for:
The AC (Aho-Corasick) automaton solves the multi-pattern matching problem. For example: given the words s1, s2, s3, s4, s5, s6 and a text string ss, how many of the words appear in ss?
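To make the question concrete, here is a naive baseline. It is only a sketch for comparison and is not from the original post; the function name countWordsNaive is made up for illustration. It runs one independent substring search per word, which is the repeated work the AC automaton avoids by scanning the text once.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Count how many of the given words occur at least once in text.
// Naive baseline: one separate substring search per word.
int countWordsNaive(const std::vector<std::string>& words, const std::string& text) {
    int found = 0;
    for (const std::string& w : words) {
        assert(!w.empty());  // assumption: words are non-empty
        if (text.find(w) != std::string::npos) ++found;
    }
    return found;
}
```

With the words he, she, his, hers and the text ushers, this returns 3 (he, she and hers occur, his does not).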
An AC automaton needs three parts to do this:
1. Build a trie (dictionary tree) from all the words
2. Construct the fail (mismatch) pointers
3. Run the lookup function over the text string
This post mainly covers 2 and 3.
First, building the trie
#include <queue>
#include <string>
using namespace std;
int tree[400005][26],vis[400005],fail[400005];
int t,n,cnt,id,root,num=0;
string s,ss;
void insert()//build the trie from the word s
{
    root=0;
    for(int i=0;s[i];i++)
    {
        id=s[i]-'a';
        if(tree[root][id]==0)
            tree[root][id]=++num;
        root=tree[root][id];
    }
    vis[root]++;//mark the end of a word
}
Second, constructing the fail (mismatch) pointers
The role of a fail pointer is this: when the text string mismatches at the current node, it tells us which node to continue matching from.
The fail pointer of a node points to the node whose string is the longest proper suffix of the current node's string (the path from the root to the current node) that also exists in the trie.
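The definition can be checked directly on strings. The sketch below is a brute-force illustration, not the construction used in this post; the names isTrieNode and failTarget are made up, and the empty string stands for the root.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if s is a prefix of at least one pattern, i.e. s is a node of the trie.
bool isTrieNode(const std::vector<std::string>& patterns, const std::string& s) {
    for (const std::string& p : patterns)
        if (p.compare(0, s.size(), s) == 0) return true;
    return false;
}

// String of the node that the fail pointer of node s points to:
// the longest proper suffix of s that is itself a trie node ("" = root).
std::string failTarget(const std::vector<std::string>& patterns, const std::string& s) {
    assert(isTrieNode(patterns, s));
    for (std::size_t start = 1; start < s.size(); ++start) {
        std::string suffix = s.substr(start);  // suffixes from longest to shortest
        if (isTrieNode(patterns, suffix)) return suffix;
    }
    return "";
}
```

For the words he, she, his, hers, the node she fails to he, the node sh fails to h, and the node h fails to the root.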
How to build the fail pointers:
The goal is to compute every node's fail pointer quickly. We compute them in BFS order, so that by the time a node is processed, its parent's fail pointer is already known. Let the current node be A, its parent B, and let B's fail pointer point to C, so the string of C is the longest suffix of B's string that exists in the trie. If C has a child D on the same character as the edge from B to A, then the string of D (C plus one character) is the longest suffix of the string of A (B plus one character), and A's fail pointer points to D. If C has no child on that character, simply follow C's fail pointer and try again. Repeat until some node on the chain has such a child. If even the root has none, A's fail pointer points to the root: no suffix of A is in the trie, so matching has to restart from the root.
Steps:
- To reduce special cases, add an auxiliary node 0 whose children (on every letter) all point to the real root, node 1; the fail pointer of node 1 then points to node 0. (The code below skips this trick and uses node 0 as the root itself.)
- Find node 0, the fail node of node 2's parent, and check whether node 0 has a child on node 2's character. It does, so node 2's fail pointer points to node 1.
- Find node 0, the fail node of node 3's parent, and check whether node 0 has a child b. It does, so node 3's fail pointer points to that child, node 1 (every child of node 0 is node 1).
- Find node 1, the fail node of node 4's parent, and check whether node 1 has a child b. It does, so node 4's fail pointer points to node 3.
- Same as above.
- Same as above.
- Same as above.
- Find node 5, the fail node of node 8's parent, and check whether node 5 has a child b. It does not, so move on to node 2, the fail node of node 5, and check whether node 2 has a child b. It does, so node 8's fail pointer points to node 4.
Code:
void build()//construct the fail pointers
{
    queue<int>p;
    for(int i=0;i<26;i++)
    {
        if(tree[0][i])//every child of the root fails to the root, node 0
        {
            fail[tree[0][i]]=0;
            p.push(tree[0][i]);
        }
    }
    while(!p.empty())
    {
        root=p.front();
        p.pop();
        for(int i=0;i<26;i++)
        {
            if(tree[root][i]==0)//no child on this letter
                continue;
            p.push(tree[root][i]);
            int fa=fail[root];//fa is the fail node of the parent
            while(fa&&tree[fa][i]==0)//fa is not the root and has no child on this letter
                fa=fail[fa];//move on to fa's own fail node
            fail[tree[root][i]]=tree[fa][i];//set the fail pointer (0 if nothing matched)
        }
    }
}
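Third, the lookup function
Traverse the text string once with a for loop, add up the word-end marks encountered, and record the final answer.
Note that fail pointers do not only matter when a mismatch occurs. A word can end in the middle of a longer match: with the words abcd and bc and the text abcd, the trie is walked along abcd without a single mismatch, yet bc ends at the character c.
To prevent such words from being missed, at every position we must also follow the chain of fail pointers from the current node, as if a mismatch had occurred, so that no substring is skipped.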
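The situation is easy to reproduce with a brute-force check, sketched below for illustration only. The helper name patternsEndingAt is made up; it lists which patterns end exactly at a given text position, which is the set that walking the fail chain from the current node is meant to cover.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Indices of the patterns that end exactly at position pos of text
// (brute force, for illustration only).
std::vector<int> patternsEndingAt(const std::vector<std::string>& patterns,
                                  const std::string& text, std::size_t pos) {
    assert(pos < text.size());
    std::vector<int> hits;
    for (std::size_t k = 0; k < patterns.size(); ++k) {
        const std::string& p = patterns[k];
        if (p.empty() || p.size() > pos + 1) continue;  // cannot end at pos
        if (text.compare(pos + 1 - p.size(), p.size(), p) == 0)
            hits.push_back(static_cast<int>(k));
    }
    return hits;
}
```

For the patterns abcd and bc and the text abcd, position 2 (the character c) yields pattern bc even though the walk along abcd has not mismatched there, and position 3 yields abcd.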
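Code:
int search(string ss)//lookup: count the words that occur in ss
{
    root=0,cnt=0;
    for(int i=0;ss[i];i++)
    {
        id=ss[i]-'a';
        while(root&&tree[root][id]==0)//follow fail pointers on a mismatch
            root=fail[root];
        root=tree[root][id];
        int temp=root;
        while(temp&&vis[temp]!=-1)//walk the fail chain, stopping at nodes already visited
        {
            cnt=cnt+vis[temp];
            vis[temp]=-1;//mark as visited so each word is counted only once
            temp=fail[temp];
        }
    }
    return cnt;
}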
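The three parts can be put together in one self-contained sketch. It follows the same steps as the post, but the struct AhoCorasick and its member names are made up for illustration, it uses vectors instead of fixed global arrays, and it assumes all input consists of lowercase letters a to z. Counted nodes are marked with -1, so countMatches consumes the marks and is meant to be called once per build.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

struct AhoCorasick {
    std::vector<std::array<int, 26>> next;  // trie edges, 0 means "no child"
    std::vector<int> fail;                  // fail pointers, the root is node 0
    std::vector<int> ends;                  // words ending at a node, -1 once counted

    AhoCorasick() : next(1), fail(1, 0), ends(1, 0) {}

    void insert(const std::string& word) {
        int u = 0;
        for (char ch : word) {
            assert(ch >= 'a' && ch <= 'z');
            int c = ch - 'a';
            if (next[u][c] == 0) {
                next.push_back(std::array<int, 26>{});
                fail.push_back(0);
                ends.push_back(0);
                next[u][c] = static_cast<int>(next.size()) - 1;
            }
            u = next[u][c];
        }
        ++ends[u];
    }

    // BFS over the trie; a node's parent is always processed before the node.
    void build() {
        std::queue<int> q;
        for (int c = 0; c < 26; ++c)
            if (next[0][c]) q.push(next[0][c]);  // depth-1 nodes keep fail = 0
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            for (int c = 0; c < 26; ++c) {
                int v = next[u][c];
                if (v == 0) continue;
                int f = fail[u];
                while (f && next[f][c] == 0) f = fail[f];
                fail[v] = next[f][c];  // 0 if no suffix matches
                q.push(v);
            }
        }
    }

    // Number of inserted words that occur in text (each insertion counted once).
    int countMatches(const std::string& text) {
        int u = 0, count = 0;
        for (char ch : text) {
            assert(ch >= 'a' && ch <= 'z');
            int c = ch - 'a';
            while (u && next[u][c] == 0) u = fail[u];
            u = next[u][c];
            // Walk the fail chain; stop at nodes that were already counted.
            for (int t = u; t && ends[t] != -1; t = fail[t]) {
                count += ends[t];
                ends[t] = -1;
            }
        }
        return count;
    }
};
```

After inserting abcd and bc and calling build(), countMatches("abcd") returns 2: bc is found through the fail pointer of the node abc, even though the walk along abcd never mismatches.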
Template problem: https://www.cnblogs.com/-citywall123/p/11300251.html