【Topic】Suffix Automaton (SAM)

The suffix automaton is a powerful automaton, and it is the most powerful and most difficult automaton I have ever learned.

Let's start with a problem:

Given a string, count how many times each of its substrings occurs.

Naive algorithm

① Enumerate the left endpoint and the right endpoint, record each substring by its hash, and count the occurrences. (Double hashing is recommended to keep the collision probability low.)
Estimated time complexity: O(n^2)
② Build a trie directly from all suffixes of the string, for example for aabbabd:
[Figure: trie built from all suffixes of aabbabd]
Each time a suffix is inserted, increase the counter of every node it passes through. Since the trie can have on the order of n^2 nodes, the time complexity is O(n^2).
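Approach ② can be sketched as follows. This is only an illustration under the assumption that the string consists of lowercase letters a to z; the names SuffixTrie, next, cnt and occurrences are made up for this sketch and are not part of the original post.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Trie of all suffixes; cnt[v] counts how often the substring spelled by
// the path from the root to v occurs in the string.
struct SuffixTrie {
    std::vector<std::array<int, 26>> next;  // next[v][c] = child, 0 = none
    std::vector<int> cnt;

    explicit SuffixTrie(const std::string& s) : next(1), cnt(1, 0) {
        for (std::size_t i = 0; i < s.size(); ++i) {      // insert suffix s[i..]
            int v = 0;                                     // node 0 is the root
            for (std::size_t j = i; j < s.size(); ++j) {
                assert('a' <= s[j] && s[j] <= 'z');        // sketch assumption
                int c = s[j] - 'a';
                if (!next[v][c]) {
                    next.push_back(std::array<int, 26>{});
                    cnt.push_back(0);
                    next[v][c] = static_cast<int>(cnt.size()) - 1;
                }
                v = next[v][c];
                ++cnt[v];                                  // one more occurrence
            }
        }
    }

    // Number of occurrences of a non-empty string t, or 0 if it never occurs.
    int occurrences(const std::string& t) const {
        int v = 0;
        for (char ch : t) {
            if (ch < 'a' || ch > 'z') return 0;
            v = next[v][ch - 'a'];
            if (!v) return 0;
        }
        return cnt[v];
    }
};
```

Both the time and the number of nodes grow quadratically with the length of the string, which is exactly why this approach stops working for large n.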

O(n^2) sounds fine, but...

n <= 100000, aaaah...

That is far too slow. What should we do?!

This is where the suffix automaton comes in.

(Concentrate, heavy material ahead.)

Suffix Automaton Definition

The suffix automaton (SAM) of a string S is a deterministic finite automaton (DFA) that accepts all suffixes of S and nothing else (of course, it can do much more than just recognize suffixes).

A suffix automaton is a DAG (directed acyclic graph) whose vertices are states and whose edges are transitions between states.

One state is called the initial state, and every other state can be reached from it. (The rest of this article also writes the initial state as S; do not confuse it with the string.)

Every transition is a directed edge labeled with a character, and all transitions leaving the same state must have different labels.

Some states are called terminal states: if we start from the initial state S, follow any path to a terminal state, and write down the labels of the edges in order, the resulting string must be a suffix of the original string.

Among all automata that satisfy the conditions above, the suffix automaton has the fewest states, and its numbers of states and transitions are both O(|S|).

Definition of pre

pre is the core of the suffix automaton and also its most difficult part, so read carefully!
Write [S~x] for the string read along the path from the initial state S to node x. pre[x] is the right endpoint of the longest proper suffix [S~pre[x]] of the string [S~x] that ends at a node other than x. For example:
[Figure: automaton with the pre edges drawn as green dotted lines]
In the figure, the green dotted lines are the pre edges. The pre edge of node 3 goes to node 1, the right endpoint of [S~1], which is the longest such suffix of [S~3].

Building the pre edges

For the newly added node now, start at las and follow the pre edges upward. At every node that has no outgoing edge labeled with the character being added, add such an edge to now; stop at the first node that already has one. If no such node exists, the pre edge of now points to the initial state. Otherwise, call the node we stopped at p and the child it reaches through that edge q. There are two cases:
[Figure: node p and its child q]
① If the distance between p and q is 1 (len[q] = len[p] + 1), point the pre edge of the new node directly at q.
② If the distance between p and q is greater than 1, pointing the pre edge of the new node at q would break the requirement that [S~q] is a suffix of [S~now]:
[Figure: the pre edge of the new node wrongly pointing at q]
(ab is not a suffix of abb)
Why does this go wrong? Node q stands for ab as well as for b (b reaches it through the bottom edge), so we would be treating ab as a suffix of abb. What we actually want as the suffix is b alone, so we need to add a node for it.

Adding a node

We add a new node after p as a clone of q, with len equal to len[p] + 1. The edge labeled with the current character (b in the example above) that goes from p to q is redirected to the new node, and so is the same edge from every node further up the pre chain that also pointed to q. Because this node does not correspond to a new position in the original string and is only a clone of q, it must copy all outgoing edges of q. Its pre edge is the original pre edge of q, and the pre edges of q and now are then both pointed at the new node:
[Figure: the automaton after adding the cloned node]
For "aabbabd", the complete automaton is:
[Figure: suffix automaton of "aabbabd"]

Time complexity

The time complexity is O(n) for an alphabet of fixed size.
The maximum number of states is 2n-1 (for n >= 2), because apart from the first three nodes, every node that is added can bring one cloned node with it.
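The state bound can be checked experimentally. The sketch below rebuilds the automaton with the same conventions as the template (state 1 is the initial state, 0 means "none") and counts the states; the names Sam, newState and countStates are illustrative, and lowercase input is assumed.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

struct Sam {
    // Index 0 means "no state"; state 1 is the initial state.
    std::vector<std::array<int, 26>> son;
    std::vector<int> pre, len;
    int las = 1;

    explicit Sam(const std::string& s) : son(2), pre(2, 0), len(2, 0) {
        for (char c : s) add(c);
    }
    int newState(int length) {
        son.push_back(std::array<int, 26>{});
        pre.push_back(0);
        len.push_back(length);
        return static_cast<int>(len.size()) - 1;
    }
    void add(char ch) {
        assert('a' <= ch && ch <= 'z');  // sketch assumption: lowercase only
        int x = ch - 'a';
        int now = newState(len[las] + 1);
        int p = las;
        for (; p && !son[p][x]; p = pre[p]) son[p][x] = now;
        if (!p) {
            pre[now] = 1;
        } else {
            int q = son[p][x];
            if (len[q] == len[p] + 1) {
                pre[now] = q;
            } else {
                int cl = newState(len[p] + 1);  // clone of q
                son[cl] = son[q];
                pre[cl] = pre[q];
                pre[q] = pre[now] = cl;
                for (; p && son[p][x] == q; p = pre[p]) son[p][x] = cl;
            }
        }
        las = now;
    }
};

// Number of states, including the initial state.
int countStates(const std::string& s) {
    Sam sam(s);
    return static_cast<int>(sam.len.size()) - 1;
}
```

With this sketch, "abbb" (n = 4) reaches 7 = 2n-1 states, while "aaaa" never needs a clone and stays at n+1 = 5 states.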

Applications

1. Given a string T, answer queries of the form: is p a substring of T?

  • Build the suffix automaton of T.
  • For each query, start from the initial state S and follow the transitions along the characters of p; p is a substring exactly when no transition is missing.
  • Time complexity: O(|T|) to build the automaton, plus O(|p|) per query.
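A minimal sketch of this query, assuming lowercase input. The struct Sam mirrors the template's arrays (son, pre, len, las) with vectors; Sam, newState and contains are names chosen for this sketch only.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

struct Sam {
    // Index 0 means "no state"; state 1 is the initial state.
    std::vector<std::array<int, 26>> son;
    std::vector<int> pre, len;
    int las = 1;

    explicit Sam(const std::string& s) : son(2), pre(2, 0), len(2, 0) {
        for (char c : s) add(c);
    }
    int newState(int length) {
        son.push_back(std::array<int, 26>{});
        pre.push_back(0);
        len.push_back(length);
        return static_cast<int>(len.size()) - 1;
    }
    void add(char ch) {
        assert('a' <= ch && ch <= 'z');  // sketch assumption: lowercase only
        int x = ch - 'a';
        int now = newState(len[las] + 1);
        int p = las;
        for (; p && !son[p][x]; p = pre[p]) son[p][x] = now;
        if (!p) {
            pre[now] = 1;
        } else {
            int q = son[p][x];
            if (len[q] == len[p] + 1) {
                pre[now] = q;
            } else {
                int cl = newState(len[p] + 1);  // clone of q
                son[cl] = son[q];
                pre[cl] = pre[q];
                pre[q] = pre[now] = cl;
                for (; p && son[p][x] == q; p = pre[p]) son[p][x] = cl;
            }
        }
        las = now;
    }
};

// True if p is a substring of the string the automaton was built from.
bool contains(const Sam& sam, const std::string& p) {
    int v = 1;  // start at the initial state
    for (char ch : p) {
        if (ch < 'a' || ch > 'z') return false;
        v = sam.son[v][ch - 'a'];
        if (!v) return false;  // missing transition: not a substring
    }
    return true;
}
```

The automaton is built once and can then be reused for any number of queries.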

2. Given a string S, count its distinct substrings.

  • Again, build the suffix automaton first.
  • Every path that starts from the initial state spells a distinct substring.
  • So the answer is the number of distinct non-empty paths starting from the initial state S.
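The path counting can be sketched as a dynamic program over the DAG. Every transition leads to a state with a strictly larger len, so processing states in decreasing order of len handles all children first. Lowercase input is assumed, and Sam, newState and countDistinct are illustrative names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

struct Sam {
    // Index 0 means "no state"; state 1 is the initial state.
    std::vector<std::array<int, 26>> son;
    std::vector<int> pre, len;
    int las = 1;

    explicit Sam(const std::string& s) : son(2), pre(2, 0), len(2, 0) {
        for (char c : s) add(c);
    }
    int newState(int length) {
        son.push_back(std::array<int, 26>{});
        pre.push_back(0);
        len.push_back(length);
        return static_cast<int>(len.size()) - 1;
    }
    void add(char ch) {
        assert('a' <= ch && ch <= 'z');  // sketch assumption: lowercase only
        int x = ch - 'a';
        int now = newState(len[las] + 1);
        int p = las;
        for (; p && !son[p][x]; p = pre[p]) son[p][x] = now;
        if (!p) {
            pre[now] = 1;
        } else {
            int q = son[p][x];
            if (len[q] == len[p] + 1) {
                pre[now] = q;
            } else {
                int cl = newState(len[p] + 1);  // clone of q
                son[cl] = son[q];
                pre[cl] = pre[q];
                pre[q] = pre[now] = cl;
                for (; p && son[p][x] == q; p = pre[p]) son[p][x] = cl;
            }
        }
        las = now;
    }
};

// Number of distinct non-empty substrings of s.
long long countDistinct(const std::string& s) {
    Sam sam(s);
    int n = static_cast<int>(sam.len.size());  // states are 1..n-1
    std::vector<int> order(n - 1);
    std::iota(order.begin(), order.end(), 1);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return sam.len[a] > sam.len[b]; });
    std::vector<long long> paths(n, 0);  // paths[v] = paths starting at v
    for (int v : order) {
        paths[v] = 1;  // the empty path that stops at v
        for (int u : sam.son[v]) if (u) paths[v] += paths[u];
    }
    return paths[1] - 1;  // drop the empty path from the initial state
}
```

For "aabbabd" this counts 23 distinct substrings, which agrees with listing them by hand (3 + 5 + 5 + 4 + 3 + 2 + 1).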

3. Given a string S, answer queries for the k-th lexicographically smallest among all distinct substrings of S.

  • As in the previous two problems, first compute for each state the number of paths that start from it.
  • Then walk from the initial state S, trying the characters in increasing order and using the counts to decide which transition contains the k-th substring.
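A sketch of the walk, assuming lowercase input and a 1-based k. For simplicity it rebuilds the automaton and the path counts on every call; with many queries they would be computed once. The names Sam, newState and kthSubstring are illustrative.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

struct Sam {
    // Index 0 means "no state"; state 1 is the initial state.
    std::vector<std::array<int, 26>> son;
    std::vector<int> pre, len;
    int las = 1;

    explicit Sam(const std::string& s) : son(2), pre(2, 0), len(2, 0) {
        for (char c : s) add(c);
    }
    int newState(int length) {
        son.push_back(std::array<int, 26>{});
        pre.push_back(0);
        len.push_back(length);
        return static_cast<int>(len.size()) - 1;
    }
    void add(char ch) {
        assert('a' <= ch && ch <= 'z');  // sketch assumption: lowercase only
        int x = ch - 'a';
        int now = newState(len[las] + 1);
        int p = las;
        for (; p && !son[p][x]; p = pre[p]) son[p][x] = now;
        if (!p) {
            pre[now] = 1;
        } else {
            int q = son[p][x];
            if (len[q] == len[p] + 1) {
                pre[now] = q;
            } else {
                int cl = newState(len[p] + 1);  // clone of q
                son[cl] = son[q];
                pre[cl] = pre[q];
                pre[q] = pre[now] = cl;
                for (; p && son[p][x] == q; p = pre[p]) son[p][x] = cl;
            }
        }
        las = now;
    }
};

// k-th (1-based) smallest distinct substring of s, or "" if k is out of range.
std::string kthSubstring(const std::string& s, long long k) {
    Sam sam(s);
    int n = static_cast<int>(sam.len.size());
    std::vector<int> order(n - 1);
    std::iota(order.begin(), order.end(), 1);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return sam.len[a] > sam.len[b]; });
    std::vector<long long> paths(n, 0);  // paths from v, empty path included
    for (int v : order) {
        paths[v] = 1;
        for (int u : sam.son[v]) if (u) paths[v] += paths[u];
    }
    if (k < 1 || k > paths[1] - 1) return "";
    std::string res;
    int v = 1;
    while (k > 0) {
        for (int c = 0; c < 26; ++c) {
            int u = sam.son[v][c];
            if (!u) continue;
            if (k <= paths[u]) {          // answer starts with res + c
                res += static_cast<char>('a' + c);
                --k;                      // res itself is one candidate
                v = u;
                break;
            }
            k -= paths[u];                // skip everything below this edge
        }
    }
    return res;
}
```

For "abc" the sorted substrings are a, ab, abc, b, bc, c, so the 4th one is "b".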

4. Given a string S, find the lexicographically smallest string that is cyclically isomorphic to it (its smallest rotation).

  • Build the suffix automaton of S+S, then walk |S| steps from the initial state, greedily taking the smallest available character each time.
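A sketch of the greedy walk, assuming lowercase input. A transition always exists during the first |S| steps, because any substring of S+S shorter than |S| also occurs starting in the first copy of S and can therefore be extended. The names Sam, newState and smallestRotation are illustrative.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

struct Sam {
    // Index 0 means "no state"; state 1 is the initial state.
    std::vector<std::array<int, 26>> son;
    std::vector<int> pre, len;
    int las = 1;

    explicit Sam(const std::string& s) : son(2), pre(2, 0), len(2, 0) {
        for (char c : s) add(c);
    }
    int newState(int length) {
        son.push_back(std::array<int, 26>{});
        pre.push_back(0);
        len.push_back(length);
        return static_cast<int>(len.size()) - 1;
    }
    void add(char ch) {
        assert('a' <= ch && ch <= 'z');  // sketch assumption: lowercase only
        int x = ch - 'a';
        int now = newState(len[las] + 1);
        int p = las;
        for (; p && !son[p][x]; p = pre[p]) son[p][x] = now;
        if (!p) {
            pre[now] = 1;
        } else {
            int q = son[p][x];
            if (len[q] == len[p] + 1) {
                pre[now] = q;
            } else {
                int cl = newState(len[p] + 1);  // clone of q
                son[cl] = son[q];
                pre[cl] = pre[q];
                pre[q] = pre[now] = cl;
                for (; p && son[p][x] == q; p = pre[p]) son[p][x] = cl;
            }
        }
        las = now;
    }
};

// Lexicographically smallest rotation of s.
std::string smallestRotation(const std::string& s) {
    Sam sam(s + s);
    std::string res;
    int v = 1;
    for (std::size_t step = 0; step < s.size(); ++step) {
        for (int c = 0; c < 26; ++c) {
            if (sam.son[v][c]) {          // smallest available character
                res += static_cast<char>('a' + c);
                v = sam.son[v][c];
                break;
            }
        }
    }
    return res;
}
```

For example, the rotations of "baa" are baa, aab and aba, and the walk returns "aab".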

5. Given multiple strings, find their longest common substring.

  • Think about it…

Template Code

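#include<cstdio>
#include<iostream>
#include<cstring>
#define maxN 2000010
using namespace std;
// son[v][c]: transition of state v on letter c (1..26); 0 means no transition.
// pre[v]: pre edge of v; len[v]: length of the longest string that reaches v.
// sum[v]: 1 for states created from a real position of the string, 0 for clones.
int son[maxN][27],pre[maxN],len[maxN],sum[maxN];
int sz,las,lens,now,q,p;
char s[maxN];
void add(int x)
{
    // Create the state for the whole prefix ending with the new character.
    len[++sz]=len[las]+1,sum[sz]=1,now=sz;
    // Walk up the pre edges from las and add the missing transitions on x.
    for (p=las;p&&!son[p][x];p=pre[p]) son[p][x]=now;
    if(p)
    {
        q=son[p][x];
        if(len[q]>1+len[p])
        {
            // Case 2: create a clone of q with length len[p]+1.
            len[++sz]=len[p]+1;
            memcpy(son[sz],son[q],sizeof(son[q]));
            pre[sz]=pre[q];pre[q]=pre[now]=sz;
            // Redirect the transitions that pointed to q to the clone.
            for (;son[p][x]==q;p=pre[p]) son[p][x]=sz;
        }
        else pre[now]=q; // Case 1: len[q]==len[p]+1.
    }
    else pre[now]=1; // No state had a transition on x: link to the initial state.
    las=now;
}
int main()
{
    scanf("%s",s+1);
    lens=strlen(s+1);sz=las=1; // State 1 is the initial state; 0 means "none".
    for (int i=1;i<=lens;++i) add(s[i]-'a'+1); // Assumes lowercase letters a..z.
    return 0;
}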
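The template only builds the automaton. One common way to answer the opening question (how many times each substring occurs) is to push the sum values up the pre edges, from long states to short ones; after that, sum[v] is the number of occurrences of every substring that ends in state v. The sketch below is an assumption about how the template is meant to be used, not code from the original post. It uses vectors instead of the global arrays, assumes lowercase input, and the names Sam, newState and occurrences are illustrative.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

struct Sam {
    // Index 0 means "no state"; state 1 is the initial state.
    std::vector<std::array<int, 26>> son;
    std::vector<int> pre, len;
    std::vector<long long> sum;  // 1 for real states, 0 for clones, then propagated
    int las = 1;

    explicit Sam(const std::string& s) : son(2), pre(2, 0), len(2, 0), sum(2, 0) {
        for (char c : s) add(c);
        // Process states 2.. in decreasing len so children come before pre[v].
        std::vector<int> order(len.size() - 2);
        std::iota(order.begin(), order.end(), 2);
        std::sort(order.begin(), order.end(),
                  [&](int a, int b) { return len[a] > len[b]; });
        for (int v : order) sum[pre[v]] += sum[v];
    }
    int newState(int length, long long initialSum) {
        son.push_back(std::array<int, 26>{});
        pre.push_back(0);
        len.push_back(length);
        sum.push_back(initialSum);
        return static_cast<int>(len.size()) - 1;
    }
    void add(char ch) {
        assert('a' <= ch && ch <= 'z');  // sketch assumption: lowercase only
        int x = ch - 'a';
        int now = newState(len[las] + 1, 1);
        int p = las;
        for (; p && !son[p][x]; p = pre[p]) son[p][x] = now;
        if (!p) {
            pre[now] = 1;
        } else {
            int q = son[p][x];
            if (len[q] == len[p] + 1) {
                pre[now] = q;
            } else {
                int cl = newState(len[p] + 1, 0);  // clone of q
                son[cl] = son[q];
                pre[cl] = pre[q];
                pre[q] = pre[now] = cl;
                for (; p && son[p][x] == q; p = pre[p]) son[p][x] = cl;
            }
        }
        las = now;
    }
    // Occurrences of a non-empty string t in the original string.
    long long occurrences(const std::string& t) const {
        int v = 1;
        for (char ch : t) {
            if (ch < 'a' || ch > 'z') return 0;
            v = son[v][ch - 'a'];
            if (!v) return 0;
        }
        return sum[v];
    }
};
```

For "aabbabd" this gives 3 occurrences of "a", 3 of "b" and 2 of "ab".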
