AC automaton: the Aho-Corasick automaton. The algorithm was born in 1975 at Bell Labs and is a well-known multi-pattern matching algorithm.
Today this humble beginner reviews the AC automaton.
Prerequisites
- KMP algorithm
- Trie (if you don't know this one, go back to kindergarten)
Okay, I admit the AC automaton is easier to understand than KMP.
How to build an AC automaton
- Build a dictionary tree (trie); insertion is exactly the same as in an ordinary trie
- Construct the fail array on the trie (this is where the AC automaton connects with KMP)
- Query
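The first step can be sketched as ordinary trie insertion. This is only a minimal sketch: the array size `MAXN`, the counter `cnt`, and the function name `insert` are assumptions made for illustration, and `tdword` is the per-node pattern counter used in the query part of this post. Patterns are assumed to consist of lowercase letters only.

```cpp
#include <cassert>
#include <string>

const int MAXN = 100005;   // assumed upper bound on the number of trie nodes
int trie[MAXN][26];        // trie[u][c] = child of node u on letter c, 0 if absent
int tdword[MAXN];          // tdword[u] = number of patterns ending at node u
int cnt = 0;               // nodes allocated so far; node 0 is the root

// Insert one lowercase pattern, creating missing nodes along the way.
void insert(const std::string& s) {
    int now = 0;
    for (char ch : s) {
        assert(ch >= 'a' && ch <= 'z');           // lowercase letters only
        int c = ch - 'a';
        if (!trie[now][c]) trie[now][c] = ++cnt;  // allocate a new node
        now = trie[now][c];
    }
    ++tdword[now];  // one more pattern ends here
}
```

Inserting the same pattern twice simply raises `tdword` at its last node to 2; no new nodes are created.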
How do we construct the fail array? / What is the fail array?
The fail array is a mismatch array, just like the next array in KMP. For a node X, fail[X] stores the child of fail[X's father] that carries the same character as X; if there is no such child, fail[X] points to the root.
In other words, fail[x] is either a node with the same character as x or the root. The string on the chain ending at fail[x] necessarily appears inside the string on the chain ending at x: the string reached by jumping to fail[x] is a suffix of x's string, so it ends with the same character as x.
As for how to compute fail, it is done with a BFS.
Let's look at a picture:
- Red line: the root is dequeued; its two children h and s have their fail set to the root, and h and s are enqueued.
- Blue line: h is dequeued. The fail of its child e is the child at position e of its father's fail, i.e. of the root; that child does not exist, so fail is the root, and e is enqueued. Then s is dequeued. For its child a, the child at position a of its father's fail does not exist, so fail is the root, and a is enqueued. For its child h, the child at position h of its father's fail exists, so fail[trie[now][i]] = trie[fail[now]][i], and h is enqueued.
- Green line: e is dequeued. The fail of its child r is the child at position r of its father's fail (the root), which does not exist, so fail is the root, and r is enqueued. a is dequeued; the fail of its child y is likewise the root, since no such child exists, and y is enqueued. h is dequeued. The fail of its child e is the child at position e of its father's fail, so fail[trie[now][i]] = trie[fail[now]][i], which connects it to the e in the left subtree, and e is enqueued. The fail of its child r is the root, and r is enqueued.
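With that, the fail array is built. For a child that does not exist, we set trie[now][i] = trie[fail[now]][i]: a missing child points to the corresponding child of its father's fail. This works a bit like path compression. If fail[x] has no child with the character we need, we would otherwise have to move on to fail[fail[x]], because the character at fail[x] equals the character at x, and the character at fail[fail[x]] equals the character at fail[x].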
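// Build the fail pointers with a BFS over the trie (node 0 is the root).
void getfail() {
    queue<int> q;
    // Nodes directly under the root fail to the root.
    for (int i = 0; i < 26; i++) {
        if (trie[0][i]) {
            q.push(trie[0][i]);
            fail[trie[0][i]] = 0;
        }
    }
    while (!q.empty()) {
        int now = q.front();
        q.pop();
        for (int i = 0; i < 26; i++) {
            if (trie[now][i]) {
                // Existing child: its fail is the matching child of fail[now].
                fail[trie[now][i]] = trie[fail[now]][i];
                q.push(trie[now][i]);
            } else {
                // Missing child: redirect to the matching child of fail[now].
                trie[now][i] = trie[fail[now]][i];
            }
        }
    }
}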
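To see what the assignment trie[now][i] = trie[fail[now]][i] saves, it may help to compare it with a version that leaves missing children alone and walks the fail chain explicitly, KMP-style. The sketch below is a rough illustration, not the template's code: `step`, `getfail_plain`, and `MAXN` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <queue>

const int MAXN = 10005;        // assumed upper bound on the number of trie nodes
int trie[MAXN][26], fail[MAXN];

// Transition on letter c WITHOUT filled-in children:
// climb the fail chain until some node has a child on c, or we reach the root.
int step(int now, int c) {
    assert(0 <= c && c < 26);
    while (now && !trie[now][c]) now = fail[now];
    return trie[now][c];       // 0 (the root) if even the root has no such child
}

// BFS that only sets fail[] and never writes into missing trie entries.
void getfail_plain() {
    std::queue<int> q;
    for (int i = 0; i < 26; i++) {
        if (trie[0][i]) {
            fail[trie[0][i]] = 0;
            q.push(trie[0][i]);
        }
    }
    while (!q.empty()) {
        int now = q.front();
        q.pop();
        for (int i = 0; i < 26; i++) {
            int child = trie[now][i];
            if (!child) continue;              // missing children stay missing
            fail[child] = step(fail[now], i);  // explicit walk up the fail chain
            q.push(child);
        }
    }
}
```

Both versions should produce the same fail pointers; the filled-in version just does the walk once during the BFS, so every later transition is a single array lookup.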
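The dreaded fail array is finally done.
Now let's talk about querying.
The AC automaton is used to count how many times the pattern strings occur in the text, so we introduce a variable tdword[x]: the number of pattern strings that end at node x.
The outer loop of the query walks the trie, determining the current node for each character of the text. Recall the definition of fail: if a node currently matches the text, then the string formed by its fail node and the characters above it must match the text as well. So every time we reach a node, we have to visit every node reachable from it through the fail array.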
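// Count the patterns that occur in x; each trie node is counted at most once.
int query(string x) {
    int ans = 0;
    int now = 0;
    for (int i = 0; i < x.size(); i++) {  // was s.size(); the parameter is x
        now = trie[now][x[i] - 'a'];
        // Follow the fail chain until the root or an already visited node.
        for (int j = now; j && tdword[j] != -1; j = fail[j]) {
            ans += tdword[j];
            tdword[j] = -1;  // mark as visited
        }
    }
    return ans;
}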
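My code is written for the AC automaton template problem on Luogu (洛谷), which asks how many of the pattern strings occur in the text. The inner loop jumps along fail from the current node. If tdword[j] == -1, the node has already been visited once, so its whole fail chain has been walked as well; that is why we stop as soon as we hit tdword[j] == -1.
OK, that's a wrap!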
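Putting the three steps together, here is a minimal end-to-end sketch. The `insert` function, the counter `cnt`, and the size `MAXN` are assumptions added for illustration; `getfail` and `query` follow the code above, with input restricted to lowercase letters.

```cpp
#include <cassert>
#include <queue>
#include <string>

const int MAXN = 10005;   // assumed upper bound on the number of trie nodes
int trie[MAXN][26], fail[MAXN], tdword[MAXN];
int cnt = 0;              // nodes allocated so far; node 0 is the root

// Ordinary trie insertion; tdword counts the patterns ending at each node.
void insert(const std::string& s) {
    int now = 0;
    for (char ch : s) {
        assert(ch >= 'a' && ch <= 'z');
        int c = ch - 'a';
        if (!trie[now][c]) trie[now][c] = ++cnt;
        now = trie[now][c];
    }
    ++tdword[now];
}

// BFS that sets fail[] and fills in missing children.
void getfail() {
    std::queue<int> q;
    for (int i = 0; i < 26; i++) {
        if (trie[0][i]) {
            fail[trie[0][i]] = 0;
            q.push(trie[0][i]);
        }
    }
    while (!q.empty()) {
        int now = q.front();
        q.pop();
        for (int i = 0; i < 26; i++) {
            if (trie[now][i]) {
                fail[trie[now][i]] = trie[fail[now]][i];
                q.push(trie[now][i]);
            } else {
                trie[now][i] = trie[fail[now]][i];
            }
        }
    }
}

// Number of inserted patterns that occur in x; marks visited nodes with -1.
int query(const std::string& x) {
    int ans = 0, now = 0;
    for (char ch : x) {
        assert(ch >= 'a' && ch <= 'z');
        now = trie[now][ch - 'a'];
        for (int j = now; j && tdword[j] != -1; j = fail[j]) {
            ans += tdword[j];
            tdword[j] = -1;
        }
    }
    return ans;
}
```

For example, after inserting she, he, say, shr and her and calling `getfail()`, `query("yasherhs")` should return 3 (she, he and her occur). Because the query overwrites `tdword` with -1, it is destructive: running the same query a second time returns 0.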