On the Suffix Automaton

The suffix automaton is a powerful automaton: the most powerful, the most impressive, and the hardest to understand of all the automata I have learned.

Now a question for you:

Given a string, find the number of occurrences of each of its substrings.

Naive algorithms

① Enumerate the left endpoint, then the right endpoint, and record the count of each substring in a hash table. (Double hashing is best, to keep the collision rate low.)
Expected time complexity: \(O(n^2)\)
② Insert every suffix directly into a trie, for example for aabbabd:
[Figure: trie of all suffixes of aabbabd]
While inserting, increment the counter of every node passed. The trie has \(O(n^2)\) nodes, so the time complexity is \(O(n^2)\).
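Method ① can be sketched roughly as follows. This is only an illustration: the names `DoubleHash`, `countSubstringsNaive` and `occurrencesOf`, as well as the bases 131/137 and the two moduli, are assumptions made for this example and do not come from the original post.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Two polynomial hashes packed into one 64-bit key (double hashing).
// The bases and moduli below are arbitrary choices for this sketch.
struct DoubleHash {
    std::uint64_t h1 = 0, h2 = 0;
    void push(unsigned char c) {
        h1 = (h1 * 131 + c) % 1000000007ULL;
        h2 = (h2 * 137 + c) % 998244353ULL;
    }
    std::uint64_t key() const { return (h1 << 32) | h2; }
};

// Enumerate the left endpoint l, then extend the right endpoint r,
// counting the hash of every substring s[l..r]. O(n^2) hash updates.
std::unordered_map<std::uint64_t, int> countSubstringsNaive(const std::string& s) {
    std::unordered_map<std::uint64_t, int> cnt;
    for (std::size_t l = 0; l < s.size(); ++l) {
        DoubleHash h;
        for (std::size_t r = l; r < s.size(); ++r) {
            h.push(static_cast<unsigned char>(s[r]));
            ++cnt[h.key()];
        }
    }
    return cnt;
}

// Look up how often the non-empty string t was counted.
int occurrencesOf(const std::unordered_map<std::uint64_t, int>& cnt,
                  const std::string& t) {
    assert(!t.empty());
    DoubleHash h;
    for (char c : t) h.push(static_cast<unsigned char>(c));
    auto it = cnt.find(h.key());
    return it == cnt.end() ? 0 : it->second;
}
```

For example, after `auto cnt = countSubstringsNaive("aabbabd");`, the call `occurrencesOf(cnt, "ab")` looks up the count recorded for "ab". The result is only correct up to hash collisions, which is why two hashes are used.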

\(O(n^2)\) looks good enough, but ...

n <= 100000 !!!

That will not work. So what can we do?!

This is where the suffix automaton comes in.

(Pay attention, the hard part starts here.)

Definition of the suffix automaton

The suffix automaton (SAM) of a string \(S\) is a deterministic finite automaton (DFA) that accepts all suffixes of \(S\) (of course, it can do far more than recognize suffixes).

A suffix automaton is in fact a DAG (directed acyclic graph): the vertices are states, and the edges are transitions between states.

One state is called the initial state; every other state can be reached from it.

Every transition in the automaton is a directed edge labeled with a character, and all transitions leaving the same state must carry different labels.

One or more states are marked as final states. If we start from the initial state, follow any path to a final state, and write down the edge labels in order, the resulting string must be a suffix of the original string.

Among all automata satisfying the conditions above, the suffix automaton has the fewest states, and its numbers of states and transitions are both \(O(|S|)\).

Definition of pre

pre is arguably the most central and most difficult part of the suffix automaton, so study it carefully!
pre[x] is the right endpoint of [S ~ pre[x]], the longest suffix of the string [S ~ x] that ends at a different state. For example:
[Figure: automaton with the pre edges drawn as green dotted lines]
The green dotted lines are the pre edges. We can clearly see that the pre edge of node 3 points to node 1, the right endpoint of [S ~ 1], which is the longest such suffix of [S ~ 3].

Connecting the pre edges

For the newly added node now, start at las and follow the pre edges. As long as the current node has no outgoing edge labeled with the newly added character, add such an edge pointing to now, and keep going until we reach a node that already has an edge with this label. There are then two cases. Call the node where we stopped p, and call the child it reaches along that edge q.
[Figure: the nodes p and q]
① If the distance between p and q is 1 (len[q] = len[p] + 1), the pre edge of the new node can point directly to q.
② If the distance between p and q is greater than 1, pointing the pre edge of the new node to q would break the requirement that [S ~ q] is a suffix of [S ~ now] (see the figure below).
[Figure: a case where the distance between p and q is greater than 1]
(ab and abb are not suffixes of one another)
Let us analyze the cause of this error. The original string already contains the longer string that ends with the extra b (the edge at the bottom of the figure), so q also stands for abb, and linking to q would wrongly treat abb and ab as suffix-related. What we actually want to link to is only the shorter suffix ending in b, so some additional operations are needed.

Additional Operation

We add a new node right after p. Every edge labeled with the current character that leads to q from p, or from a node reached from p by following pre edges, is redirected to the new node (in the figure, the character is b). The new node does not correspond to a new position in the original string; it is just a copy of q, so it must receive all outgoing edges of q. The pre edge of the new node naturally becomes the original pre edge of q, and the pre edges of both q and now then point to the new node:
[Figure: the automaton after the additional operation]
For "aabbabd", the automaton looks like this:
[Figure: suffix automaton of aabbabd]

Time complexity

The time complexity is \(O(n)\).
The maximum number of states is 2n-1, because apart from the first three nodes, every added node can be followed by one extra (copied) node.
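The bound on the number of states can be checked empirically with a small sketch like the one below. The struct `SuffixAutomaton` and the function `stateCount` are assumptions made for this illustration: they restate the construction described above on top of `std::vector`, with letters mapped to 0..25 and state 1 as the initial state.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Assumed helper: the construction above, stored in vectors.
// State 1 is the initial state, 0 means "no state"; input is 'a'..'z'.
struct SuffixAutomaton {
    std::vector<std::array<int, 26>> son;
    std::vector<int> pre, len;
    int las = 1;

    explicit SuffixAutomaton(const std::string& s)
        : son(2), pre(2, 0), len(2, 0) {
        for (char ch : s) {
            assert(ch >= 'a' && ch <= 'z');
            add(ch - 'a');
        }
    }
    int newState(int l) {
        son.push_back(std::array<int, 26>{});
        pre.push_back(0);
        len.push_back(l);
        return static_cast<int>(len.size()) - 1;
    }
    void add(int x) {
        int now = newState(len[las] + 1);
        int p = las;
        for (; p && !son[p][x]; p = pre[p]) son[p][x] = now;
        if (!p) {
            pre[now] = 1;                    // no node had an x edge
        } else {
            int q = son[p][x];
            if (len[q] == len[p] + 1) {
                pre[now] = q;                // case 1: distance is 1
            } else {
                int c = newState(len[p] + 1);  // case 2: copy q
                son[c] = son[q];
                pre[c] = pre[q];
                pre[q] = pre[now] = c;
                for (; p && son[p][x] == q; p = pre[p]) son[p][x] = c;
            }
        }
        las = now;
    }
};

// Number of states, including the initial state.
int stateCount(const std::string& s) {
    SuffixAutomaton sam(s);
    return static_cast<int>(sam.len.size()) - 1;
}
```

Calling `stateCount` on a few strings of length n >= 3 and comparing the result with 2n-1 is a quick sanity check of the construction; it is not a proof of the bound.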

Applications

1. Given a string T and a series of queries p, decide for each p whether it is a substring of T.

  • Build the suffix automaton of T.
  • For each query, start from the initial state and walk along the characters of p; p is a substring of T exactly when no transition is missing.
  • Time complexity: \(O(|T| + \sum |p|)\)
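A minimal sketch of this query, assuming lowercase input. The struct `SuffixAutomaton` and the function `contains` are names chosen for this example; the struct restates the construction on top of `std::vector` with state 1 as the initial state.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Assumed helper: suffix automaton stored in vectors.
// State 1 is the initial state, 0 means "no state"; input is 'a'..'z'.
struct SuffixAutomaton {
    std::vector<std::array<int, 26>> son;
    std::vector<int> pre, len;
    int las = 1;

    explicit SuffixAutomaton(const std::string& s)
        : son(2), pre(2, 0), len(2, 0) {
        for (char ch : s) {
            assert(ch >= 'a' && ch <= 'z');
            add(ch - 'a');
        }
    }
    int newState(int l) {
        son.push_back(std::array<int, 26>{});
        pre.push_back(0);
        len.push_back(l);
        return static_cast<int>(len.size()) - 1;
    }
    void add(int x) {
        int now = newState(len[las] + 1);
        int p = las;
        for (; p && !son[p][x]; p = pre[p]) son[p][x] = now;
        if (!p) {
            pre[now] = 1;
        } else {
            int q = son[p][x];
            if (len[q] == len[p] + 1) {
                pre[now] = q;
            } else {
                int c = newState(len[p] + 1);  // copy of q
                son[c] = son[q];
                pre[c] = pre[q];
                pre[q] = pre[now] = c;
                for (; p && son[p][x] == q; p = pre[p]) son[p][x] = c;
            }
        }
        las = now;
    }
};

// Walk from the initial state along p; fail on the first missing edge.
bool contains(const SuffixAutomaton& sam, const std::string& p) {
    int u = 1;
    for (char ch : p) {
        if (ch < 'a' || ch > 'z') return false;
        u = sam.son[u][ch - 'a'];
        if (!u) return false;
    }
    return true;
}
```

The automaton is built once for T, and each call to `contains` then costs time proportional to the length of the query.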

2. Given a string S, find the number of its distinct substrings.

  • First build the suffix automaton.
  • Every path in the suffix automaton that starts at the initial state spells a different substring.
  • So the answer is the number of different non-empty paths starting at the initial state.
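The path count can be computed with a small dynamic program over the states, for instance as sketched below. `SuffixAutomaton` and `countDistinct` are assumed names for this illustration, and processing the states in decreasing order of len is one possible way to make sure that every target of a transition is handled before its source.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Assumed helper: suffix automaton stored in vectors.
// State 1 is the initial state, 0 means "no state"; input is 'a'..'z'.
struct SuffixAutomaton {
    std::vector<std::array<int, 26>> son;
    std::vector<int> pre, len;
    int las = 1;

    explicit SuffixAutomaton(const std::string& s)
        : son(2), pre(2, 0), len(2, 0) {
        for (char ch : s) {
            assert(ch >= 'a' && ch <= 'z');
            add(ch - 'a');
        }
    }
    int newState(int l) {
        son.push_back(std::array<int, 26>{});
        pre.push_back(0);
        len.push_back(l);
        return static_cast<int>(len.size()) - 1;
    }
    void add(int x) {
        int now = newState(len[las] + 1);
        int p = las;
        for (; p && !son[p][x]; p = pre[p]) son[p][x] = now;
        if (!p) {
            pre[now] = 1;
        } else {
            int q = son[p][x];
            if (len[q] == len[p] + 1) {
                pre[now] = q;
            } else {
                int c = newState(len[p] + 1);  // copy of q
                son[c] = son[q];
                pre[c] = pre[q];
                pre[q] = pre[now] = c;
                for (; p && son[p][x] == q; p = pre[p]) son[p][x] = c;
            }
        }
        las = now;
    }
};

// Number of distinct non-empty substrings = number of non-empty paths
// that start at the initial state.
long long countDistinct(const std::string& s) {
    SuffixAutomaton sam(s);
    int n = static_cast<int>(sam.len.size()) - 1;
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 1);
    // A transition always leads to a state with larger len, so sorting
    // by decreasing len handles targets before sources.
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return sam.len[a] > sam.len[b]; });
    std::vector<long long> paths(n + 1, 1);  // 1 = the empty path
    for (int v : order)
        for (int to : sam.son[v])
            if (to) paths[v] += paths[to];
    return paths[1] - 1;  // drop the empty path
}
```

The same number can also be obtained as the sum of len[v] - len[pre[v]] over all states except the initial one, which avoids the sort; the path-counting version is shown here because it follows the reasoning in the list above.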

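3. Given a string S, each query asks for the k-th smallest of all distinct substrings of S in lexicographic order.

  • As in the previous two problems, we only need to precompute, for every state, the number of paths starting there, which also gives the number of paths behind each outgoing character.
  • Then keep searching from the initial state, at each step taking the character whose block of paths contains k.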
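One way to carry out this search is sketched below. The names `SuffixAutomaton`, `countPaths` and `kthSubstring` are assumptions for this example, and returning an empty string for an out-of-range k is a choice made here, not something the original specifies.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Assumed helper: suffix automaton stored in vectors.
// State 1 is the initial state, 0 means "no state"; input is 'a'..'z'.
struct SuffixAutomaton {
    std::vector<std::array<int, 26>> son;
    std::vector<int> pre, len;
    int las = 1;

    explicit SuffixAutomaton(const std::string& s)
        : son(2), pre(2, 0), len(2, 0) {
        for (char ch : s) {
            assert(ch >= 'a' && ch <= 'z');
            add(ch - 'a');
        }
    }
    int newState(int l) {
        son.push_back(std::array<int, 26>{});
        pre.push_back(0);
        len.push_back(l);
        return static_cast<int>(len.size()) - 1;
    }
    void add(int x) {
        int now = newState(len[las] + 1);
        int p = las;
        for (; p && !son[p][x]; p = pre[p]) son[p][x] = now;
        if (!p) {
            pre[now] = 1;
        } else {
            int q = son[p][x];
            if (len[q] == len[p] + 1) {
                pre[now] = q;
            } else {
                int c = newState(len[p] + 1);  // copy of q
                son[c] = son[q];
                pre[c] = pre[q];
                pre[q] = pre[now] = c;
                for (; p && son[p][x] == q; p = pre[p]) son[p][x] = c;
            }
        }
        las = now;
    }
};

// paths[v] = number of paths starting at v, counting the empty path.
std::vector<long long> countPaths(const SuffixAutomaton& sam) {
    int n = static_cast<int>(sam.len.size()) - 1;
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 1);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return sam.len[a] > sam.len[b]; });
    std::vector<long long> paths(n + 1, 1);
    for (int v : order)
        for (int to : sam.son[v])
            if (to) paths[v] += paths[to];
    return paths;
}

// k-th (1-based) distinct substring in lexicographic order,
// or "" when k is out of range.
std::string kthSubstring(const SuffixAutomaton& sam, long long k) {
    std::vector<long long> paths = countPaths(sam);
    if (k < 1 || k > paths[1] - 1) return "";
    std::string res;
    int u = 1;
    while (k > 0) {
        for (int c = 0; c < 26; ++c) {
            int to = sam.son[u][c];
            if (!to) continue;
            if (k <= paths[to]) {      // the answer starts with res + c
                res += static_cast<char>('a' + c);
                u = to;
                --k;                   // res itself is the first of them
                break;
            }
            k -= paths[to];            // skip everything behind c
        }
    }
    return res;
}
```

If many queries are expected, `countPaths` should be computed once and reused; it is recomputed inside `kthSubstring` here only to keep the sketch short.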

4. Given a string S, find the lexicographically smallest string among its cyclic shifts.

  • Build the suffix automaton of S + S, then greedily follow the lexicographically smallest transition \(|S|\) times, and that's it ~
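The greedy walk might look like the sketch below, assuming lowercase input. `SuffixAutomaton` and `minRotation` are names invented for this example.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Assumed helper: suffix automaton stored in vectors.
// State 1 is the initial state, 0 means "no state"; input is 'a'..'z'.
struct SuffixAutomaton {
    std::vector<std::array<int, 26>> son;
    std::vector<int> pre, len;
    int las = 1;

    explicit SuffixAutomaton(const std::string& s)
        : son(2), pre(2, 0), len(2, 0) {
        for (char ch : s) {
            assert(ch >= 'a' && ch <= 'z');
            add(ch - 'a');
        }
    }
    int newState(int l) {
        son.push_back(std::array<int, 26>{});
        pre.push_back(0);
        len.push_back(l);
        return static_cast<int>(len.size()) - 1;
    }
    void add(int x) {
        int now = newState(len[las] + 1);
        int p = las;
        for (; p && !son[p][x]; p = pre[p]) son[p][x] = now;
        if (!p) {
            pre[now] = 1;
        } else {
            int q = son[p][x];
            if (len[q] == len[p] + 1) {
                pre[now] = q;
            } else {
                int c = newState(len[p] + 1);  // copy of q
                son[c] = son[q];
                pre[c] = pre[q];
                pre[q] = pre[now] = c;
                for (; p && son[p][x] == q; p = pre[p]) son[p][x] = c;
            }
        }
        las = now;
    }
};

// Smallest cyclic shift: take the smallest available edge |S| times
// in the automaton of S + S.
std::string minRotation(const std::string& s) {
    SuffixAutomaton sam(s + s);
    std::string res;
    int u = 1;
    for (std::size_t i = 0; i < s.size(); ++i) {
        for (int c = 0; c < 26; ++c) {
            if (sam.son[u][c]) {
                res += static_cast<char>('a' + c);
                u = sam.son[u][c];
                break;
            }
        }
    }
    return res;
}
```

The greedy choice is safe because every substring of S + S shorter than |S| also occurs starting in the first copy of S, so it can always be extended to length |S|.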

5. Given several strings, find their longest common substring.

  • Think about it ...

Template Code

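#include<cstdio>
#include<iostream>
#include<cstring>
#define maxN 2000010
using namespace std;
// son: transitions, pre: pre edges, len: length of the longest string of a state,
// sum: 1 for states created for a real position, 0 for copied states
int son[maxN][27],pre[maxN],len[maxN],sum[maxN];
int sz,las,lens,now,q,p; // sz: number of states, las: state of the whole current string
char s[maxN];
void add(int x)
{
    len[++sz]=len[las]+1,sum[sz]=1,now=sz; // new state for the extended string
    // follow the pre edges from las, adding the missing x edges
    for (p=las;p&&!son[p][x];p=pre[p]) son[p][x]=now;
    if(p)
    {
        q=son[p][x];
        if(len[q]>1+len[p]) // case 2: distance greater than 1, copy q
        {
            len[++sz]=len[p]+1;
            memcpy(son[sz],son[q],sizeof(son[q])); // the copy gets all outgoing edges of q
            pre[sz]=pre[q];pre[q]=pre[now]=sz;
            for (;son[p][x]==q;p=pre[p]) son[p][x]=sz; // redirect the x edges from q to the copy
        }
        else pre[now]=q; // case 1: distance is 1
    }
    else pre[now]=1; // no node had an x edge: point to the initial state
    las=now;
}
int main()
{
    scanf("%s",s+1);
    lens=strlen(s+1);sz=las=1; // state 1 is the initial state
    for (int i=1;i<=lens;++i) add(s[i]-96); // 'a' -> 1, ..., 'z' -> 26
}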
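The template stops after construction, with sum set to 1 on every non-copied state. A plausible way to finish the opening question (how often each substring occurs) is to push these counts up the pre edges, longest states first. The sketch below assumes that this is the intended use of sum; `SuffixAutomaton` and `occurrences` are names made up for this example, and letters are mapped to 0..25 instead of 1..26.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Assumed helper: the template's arrays stored in vectors.
// State 1 is the initial state, 0 means "no state"; input is 'a'..'z'.
struct SuffixAutomaton {
    std::vector<std::array<int, 26>> son;
    std::vector<int> pre, len;
    std::vector<long long> sum;  // occurrence counts after construction
    int las = 1;

    explicit SuffixAutomaton(const std::string& s)
        : son(2), pre(2, 0), len(2, 0), sum(2, 0) {
        for (char ch : s) {
            assert(ch >= 'a' && ch <= 'z');
            add(ch - 'a');
        }
        // Push the counts up the pre edges, longest states first.
        std::vector<int> order(len.size() - 1);
        std::iota(order.begin(), order.end(), 1);
        std::sort(order.begin(), order.end(),
                  [&](int a, int b) { return len[a] > len[b]; });
        for (int v : order)
            if (pre[v]) sum[pre[v]] += sum[v];
    }
    int newState(int l, long long cnt) {
        son.push_back(std::array<int, 26>{});
        pre.push_back(0);
        len.push_back(l);
        sum.push_back(cnt);
        return static_cast<int>(len.size()) - 1;
    }
    void add(int x) {
        int now = newState(len[las] + 1, 1);  // real position: sum = 1
        int p = las;
        for (; p && !son[p][x]; p = pre[p]) son[p][x] = now;
        if (!p) {
            pre[now] = 1;
        } else {
            int q = son[p][x];
            if (len[q] == len[p] + 1) {
                pre[now] = q;
            } else {
                int c = newState(len[p] + 1, 0);  // copy: sum = 0
                son[c] = son[q];
                pre[c] = pre[q];
                pre[q] = pre[now] = c;
                for (; p && son[p][x] == q; p = pre[p]) son[p][x] = c;
            }
        }
        las = now;
    }
};

// Number of occurrences of the non-empty pattern p (0 if absent).
long long occurrences(const SuffixAutomaton& sam, const std::string& p) {
    int u = 1;
    for (char ch : p) {
        if (ch < 'a' || ch > 'z') return 0;
        u = sam.son[u][ch - 'a'];
        if (!u) return 0;
    }
    return sam.sum[u];
}
```

All strings that belong to the same state occur equally often, so after the propagation one number per state is enough to answer the question for every substring.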

Origin www.cnblogs.com/Chandery/p/11332806.html