"Topic summary" Suffix automaton

The suffix automaton has many properties, so take care to keep the following concepts distinct.

  1. The suffix automaton is a \(DAG\). The paths from the root recognize every suffix (indeed every substring) of \(S\), and no path from the root spells a string that is not a substring of \(S\).

  2. Nodes: each node represents one \(endpos\) class. All strings spelled by paths from the root to that node end at exactly the same set of positions in \(S\).
    Within one node's \(endpos\) class, the strings are suffixes of one another and their lengths are consecutive; call this set of strings \(P\).

  3. Edges: following a \(trans\) edge appends a character at the end; jumping up the \(parent\) tree removes characters from the front.

  4. An \(nq\) node is created to split an \(endpos\) class in two. It has its own \(len\), but it is not a prefix of \(S\), so it has no initial \(siz\).

  5. The size of a node's \(endpos\) set is the number of real (prefix) nodes in its subtree of the \(parent\) tree.

  6. Build the \(parent\) tree on the reversed string: the \(lcp\) of two nodes is the \(len\) of their \(lca\). Watch the case where x or y is itself the lca.

  7. A generalized suffix automaton recognizes the substrings of several strings at once.

  8. \(minlen(x) = maxlen(fa[x]) + 1\). Since \(len\) strictly increases along every \(trans\) edge, sorting the nodes by \(len\) (bucket sort) gives a topological order of the \(trans\) DAG.

  9. The strings of a node \(x\) are exactly the character sequences along all \(trans\) paths from the root to \(x\). Across all nodes, every essentially different substring is represented exactly once, with no duplicates and no omissions.
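
Points 5 and 8 above can be sketched together as follows: bucket-sort the nodes by \(len\), then push each node's size up to its parent. This is only a minimal sketch, assuming lowercase letters; the names `SAM`, `computeSizes` and `occurrences` are made up for the illustration and are not from the original post.

```cpp
#include <array>
#include <string>
#include <vector>

// Minimal suffix automaton over 'a'..'z'. Node 1 is the root, 0 means "none".
struct SAM {
    std::vector<std::array<int, 26>> to;  // trans edges
    std::vector<int> fa, len, siz;        // parent link, max length, |endpos|
    int las = 1;

    SAM() { newNode(0); newNode(0); }     // index 0 is a dummy, index 1 the root

    int newNode(int l) {
        to.push_back(std::array<int, 26>{});
        fa.push_back(0); len.push_back(l); siz.push_back(0);
        return (int)len.size() - 1;
    }

    void extend(int c) {
        int p = las, np = newNode(len[las] + 1);
        las = np;
        siz[np] = 1;                      // np is a prefix ("real") node
        for (; p && !to[p][c]; p = fa[p]) to[p][c] = np;
        if (!p) { fa[np] = 1; return; }
        int q = to[p][c];
        if (len[q] == len[p] + 1) { fa[np] = q; return; }
        int nq = newNode(len[p] + 1);     // clone: not a prefix, siz stays 0
        to[nq] = to[q];
        fa[nq] = fa[q]; fa[np] = fa[q] = nq;
        for (; p && to[p][c] == q; p = fa[p]) to[p][c] = nq;
    }

    // Bucket-sort nodes by len, then add each size to its parent,
    // visiting nodes from the longest len down to the shortest.
    void computeSizes() {
        int tot = (int)len.size() - 1, n = len[las];
        std::vector<int> cnt(n + 1, 0), order(tot + 1, 0);
        for (int i = 1; i <= tot; ++i) ++cnt[len[i]];
        for (int i = 1; i <= n; ++i) cnt[i] += cnt[i - 1];
        for (int i = tot; i >= 1; --i) order[cnt[len[i]]--] = i;
        for (int k = tot; k >= 2; --k) siz[fa[order[k]]] += siz[order[k]];
    }
};

// Number of occurrences of t in s, read off as |endpos| of t's node.
int occurrences(const std::string& s, const std::string& t) {
    SAM sam;
    for (char ch : s) sam.extend(ch - 'a');
    sam.computeSizes();
    int p = 1;
    for (char ch : t) { p = sam.to[p][ch - 'a']; if (!p) return 0; }
    return sam.siz[p];
}
```

For example, `occurrences("aaaa", "aa")` walks to the node of "aa" and returns its \(endpos\) size.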


Applications:

  1. Counting essentially different substrings: count the paths from the root to the other nodes, or use \(Ans = \sum\limits_{i=2}^{tot} (len[i] - len[fa[i]])\)
  2. Sum of lcp over all pairs of suffixes: build the SAM on the reversed string; then the lcp of two suffixes is the \(len\) of the lca, in the parent tree, of the two corresponding prefix (real) nodes.
    There is no need to take a min with the prefix \(length\). Process the nodes in topological order and count, for each node (as lca), the pairs of real nodes that meet there. "difference"

  3. Lexicographically smallest rotation of a circular string: append a copy of the string to itself, build the SAM, then walk \(|S|\) steps from the root, greedily trying 'a' -> 'z' at each step. "Technology"

  4. Essentially different substrings online, after every appended character: the answer grows only by \(len[np] - len[fa[np]]\); the contributions of the other nodes are conserved. "Generation curse"

  5. Longest common substring of S and T: run T over the suffix automaton of S, and on a mismatch jump to fa by brute force (each matched character increases the depth in the parent tree by at most 1,
    and each jump to fa decreases it by at least 1, so a stack-like potential argument bounds the total cost by \(O(length_T)\)). This gives, for each i, the length \(mx_i\) of the longest suffix of T ending at i that matches S; then \(Ans = \max(mx_i)\).

  6. Longest common substring of several strings: build the SAM of \(S_1\) and run each remaining string over it as above, recording \(mx\) on the SAM nodes,
    where \(mx\) is the longest length to which that string matches a string in the node's set \(P\). A parent-tree ancestor holds suffixes of all the strings of its descendants,
    so whenever a node has a nonzero mx, every parent-tree ancestor \(p\) is fully matched and its value must be set to \(len_p\). The final answer is the maximum over SAM nodes of (the minimum of mx over all strings). "Public string"

  7. k-th smallest substring: compute by dp, for each node of the DAG, how many paths start there; then walk from the root, shrinking the problem at each step:
    if \(k > dp[v]\), skip v (subtracting \(dp[v]\) from k); otherwise step into v and subtract from k the number of strings that end at node v. "String theory"
    A word on this dp. If essentially different substrings are wanted, do ++\(dp[u]\) when \(rd[u] = 0\): however large \(|endpos|\) is for this node (these strings),
    it is counted only once, and the dp then carries this contribution along into the count of paths from the root.
    If repeated occurrences count, use \(dp[u]\) += \(|endpos_u|\) instead: occurrences at different positions are considered different, which amounts to multiplying the number of root paths to u by \(|endpos_u|\).
    Similarly, \(\sum\limits_{i=2}^{tot} (len[i] - len[fa[i]]) \times |endpos_i| = \frac{n(n+1)}{2}\)

  8. Number of occurrences of T in S: match T on the SAM of S. On a mismatch, T does not occur in S at all; otherwise find the node u whose set \(P_u\) contains T, and the number of occurrences of T is \(|endpos_u|\).
    To support online modification, maintain the dynamic parent tree with an LCT: either maintain subtree sums, or use chain add with single-point query.
    With the second approach, remember to give nq the value of q. "Substring"

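Application 3 (smallest rotation) is short enough to sketch as well. The code below assumes lowercase letters, and `smallestRotation` is a made-up name for the illustration: it inserts the string twice and then walks \(|S|\) steps, always taking the smallest existing edge.

```cpp
#include <array>
#include <string>
#include <vector>

// Minimal suffix automaton over 'a'..'z'. Node 1 is the root, 0 means "none".
struct SAM {
    std::vector<std::array<int, 26>> to;
    std::vector<int> fa, len;
    int las = 1;

    SAM() { newNode(0); newNode(0); }

    int newNode(int l) {
        to.push_back(std::array<int, 26>{});
        fa.push_back(0); len.push_back(l);
        return (int)len.size() - 1;
    }

    void extend(int c) {
        int p = las, np = newNode(len[las] + 1);
        las = np;
        for (; p && !to[p][c]; p = fa[p]) to[p][c] = np;
        if (!p) { fa[np] = 1; return; }
        int q = to[p][c];
        if (len[q] == len[p] + 1) { fa[np] = q; return; }
        int nq = newNode(len[p] + 1);
        to[nq] = to[q];
        fa[nq] = fa[q]; fa[np] = fa[q] = nq;
        for (; p && to[p][c] == q; p = fa[p]) to[p][c] = nq;
    }
};

std::string smallestRotation(const std::string& s) {
    SAM sam;
    for (int rep = 0; rep < 2; ++rep)     // SAM of the doubled string SS
        for (char ch : s) sam.extend(ch - 'a');
    std::string res;
    int p = 1;
    for (std::size_t i = 0; i < s.size(); ++i) {
        for (int c = 0; c < 26; ++c) {
            if (sam.to[p][c]) {           // smallest available character
                res += char('a' + c);
                p = sam.to[p][c];
                break;
            }
        }
    }
    return res;
}
```

The walk cannot get stuck early: any substring of SS shorter than \(|S|\) also occurs ending inside the first copy, so it can always be extended by one more character.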

Here are some applications that are not quite template problems:

B. gods favored fantasy village (really a generalized SAM template)

Count the essentially different substrings on a tree. There are no more than 20 leaves, so take each leaf as the root in turn and build a generalized suffix automaton over all of them, on top of the template from Deepinc.
The problem effectively gives a Trie. With a fixed root rt, the SAM built during a single dfs is the same as the one built by enumerating the leaves and inserting their strings, thanks to the special-case line q = las.
The complexity of online construction is \(\Theta(|A||T| + G(T))\), where \(|A|\) is the alphabet size, \(|T|\) is the size of the Trie of all inserted strings, and \(G(T)\) is the sum of the depths of all leaves of the Trie.
Can the online dfs construction be hacked into \(n^2\)? No guarantees...

H. Cheat

The answer is monotone, so binary search on L.
Build a generalized SAM over the M strings and compute l[i], the length of the longest matched suffix ending at position i. A stretch matched for less than L is simply cut, since taking a piece shorter than L gains nothing.
This is an interval partition problem, so consider DP.
Let \(f[i]\) be the maximum familiar length within the first i characters, with transition \(f[i] = \max(f[i-1], f[j] + i - j)\), \(i - l[i] \leq j \leq i - L\).
Since \(l[i+1] \leq l[i] + 1\), we have \(i - l[i] \leq i + 1 - l[i+1]\), so the decision window moves monotonically and a monotonic queue performs the check in \(\Theta(n)\).
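
The check for a fixed L can be sketched as below. It takes the match lengths l[i] as given (computing them on the generalized SAM is left out), assumes \(L \geq 1\) and a 1-indexed array with `l[0]` unused, and the name `maxFamiliar` is made up for the illustration. The deque holds candidate j with \(f[j] - j\) decreasing.

```cpp
#include <algorithm>
#include <deque>
#include <vector>

// l is 1-indexed (l[0] is unused, so l.size() == n + 1). Returns f[n].
int maxFamiliar(const std::vector<int>& l, int L) {
    int n = (int)l.size() - 1;
    std::vector<int> f(n + 1, 0);
    std::deque<int> dq;                   // candidates j, f[j]-j decreasing
    for (int i = 1; i <= n; ++i) {
        f[i] = f[i - 1];                  // leave position i unmatched
        int j = i - L;                    // newest candidate entering the window
        if (j >= 0) {
            while (!dq.empty() && f[dq.back()] - dq.back() <= f[j] - j)
                dq.pop_back();
            dq.push_back(j);
        }
        // Left bound i - l[i] never moves backwards, so expired
        // candidates can be dropped from the front for good.
        while (!dq.empty() && dq.front() < i - l[i]) dq.pop_front();
        if (!dq.empty())
            f[i] = std::max(f[i], f[dq.front()] - dq.front() + i);
    }
    return f[n];
}
```

The binary search then compares the returned value with whatever coverage threshold the problem requires.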

SAM template

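// Append character c (0..25). Node 1 is the root, las is the node of the
// whole current string, tot is the number of nodes (both start at 1).
void extend(int c){
    int p=las,np;np=las=++tot;
    len[np]=len[p]+1;
    // Add the new transition up the parent chain until c already exists.
    for(;p&&!to[p][c];p=fa[p])to[p][c]=np;
    if(!p)fa[np]=1;                        // c never appeared: parent is the root
    else{
        int q=to[p][c];
        if(len[q]==len[p]+1)fa[np]=q;      // q already has exactly the right len
        else{
            // Split q: nq takes over the strings of length <= len[p]+1.
            int nq=++tot;len[nq]=len[p]+1;
            for(int i=0;i<26;++i)to[nq][i]=to[q][i];
            fa[nq]=fa[q];fa[np]=fa[q]=nq;
            for(;p&&to[p][c]==q;p=fa[p])to[p][c]=nq;
        }
    }
}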
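
As a quick way to exercise the template above, here is a self-contained sketch that counts essentially different substrings with \(\sum (len[i] - len[fa[i]])\). The array bound `N`, the reset logic and the name `countDistinct` are assumptions added for the example; input is assumed to be lowercase letters.

```cpp
#include <cstring>
#include <string>

const int N = 20005;                      // assumed bound: 2 * max length + 1
int to[N][26], fa[N], len[N], las = 1, tot = 1;

void extend(int c) {
    int p = las, np; np = las = ++tot;
    len[np] = len[p] + 1;
    for (; p && !to[p][c]; p = fa[p]) to[p][c] = np;
    if (!p) fa[np] = 1;
    else {
        int q = to[p][c];
        if (len[q] == len[p] + 1) fa[np] = q;
        else {
            int nq = ++tot; len[nq] = len[p] + 1;
            std::memcpy(to[nq], to[q], sizeof to[q]);
            fa[nq] = fa[q]; fa[np] = fa[q] = nq;
            for (; p && to[p][c] == q; p = fa[p]) to[p][c] = nq;
        }
    }
}

long long countDistinct(const std::string& s) {
    // Reset the global automaton so the function can be called repeatedly.
    std::memset(to, 0, sizeof to);
    std::memset(fa, 0, sizeof fa);
    std::memset(len, 0, sizeof len);
    las = tot = 1;
    for (char ch : s) extend(ch - 'a');
    long long ans = 0;
    for (int i = 2; i <= tot; ++i) ans += len[i] - len[fa[i]];
    return ans;
}
```

For the online version, the same quantity can be accumulated by adding `len[las] - len[fa[las]]` right after each call to `extend`.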
   
Generalized SAM template

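// Append character x to the generalized automaton. Set las=1 before each new
// string. c is presumably the largest character index (a global in this code).
void extend(int x){
    int p=las,q,nq,np;
    if((q=to[p][x])){
        // The transition already exists from an earlier string.
        if(len[q]==len[p]+1){las=q;return;}   // q is exactly the needed node
        // Otherwise split q; the clone nq becomes the new las.
        las=nq=++tot;len[nq]=len[p]+1;
        for(int i=0;i<=c;++i)to[nq][i]=to[q][i];
        fa[nq]=fa[q];fa[q]=nq;
        for(;p&&to[p][x]==q;p=fa[p])to[p][x]=nq;
    }
    else{
        // Same as the ordinary SAM extend.
        np=las=++tot;
        len[np]=len[p]+1;
        for(;p&&!to[p][x];p=fa[p])to[p][x]=np;
        if(!p)fa[np]=1;
        else{
            q=to[p][x];
            if(len[q]==len[p]+1)fa[np]=q;
            else{
                nq=++tot;
                len[nq]=len[p]+1;
                for(int i=0;i<=c;++i)to[nq][i]=to[q][i];
                fa[nq]=fa[q];fa[q]=fa[np]=nq;
                for(;p&&to[p][x]==q;p=fa[p])to[p][x]=nq;
            }
        }
    }
}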
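
A small self-contained sketch of how the generalized template might be driven: reset las to the root before each string, then count essentially different substrings over the whole set. The bound `N`, the fixed 26-letter alphabet and the name `countDistinct` are assumptions for the example.

```cpp
#include <cstring>
#include <string>
#include <vector>

const int N = 20005;                      // assumed bound: 2 * total length + 1
int to[N][26], fa[N], len[N], las = 1, tot = 1;

void extend(int x) {
    int p = las, q, nq, np;
    if ((q = to[p][x])) {                 // transition already exists
        if (len[q] == len[p] + 1) { las = q; return; }
        las = nq = ++tot; len[nq] = len[p] + 1;
        std::memcpy(to[nq], to[q], sizeof to[q]);
        fa[nq] = fa[q]; fa[q] = nq;
        for (; p && to[p][x] == q; p = fa[p]) to[p][x] = nq;
        return;
    }
    np = las = ++tot; len[np] = len[p] + 1;
    for (; p && !to[p][x]; p = fa[p]) to[p][x] = np;
    if (!p) { fa[np] = 1; return; }
    q = to[p][x];
    if (len[q] == len[p] + 1) { fa[np] = q; return; }
    nq = ++tot; len[nq] = len[p] + 1;
    std::memcpy(to[nq], to[q], sizeof to[q]);
    fa[nq] = fa[q]; fa[q] = fa[np] = nq;
    for (; p && to[p][x] == q; p = fa[p]) to[p][x] = nq;
}

// Essentially different substrings over a set of lowercase strings.
long long countDistinct(const std::vector<std::string>& strs) {
    std::memset(to, 0, sizeof to);
    std::memset(fa, 0, sizeof fa);
    std::memset(len, 0, sizeof len);
    tot = 1;
    for (const std::string& s : strs) {
        las = 1;                          // every string starts at the root
        for (char ch : s) extend(ch - 'a');
    }
    long long ans = 0;
    for (int i = 2; i <= tot; ++i) ans += len[i] - len[fa[i]];
    return ans;
}
```

For `{"ab", "bc"}` the second string hits the existing transition on 'b' and splits that node, which is exactly the first branch of the template.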
   

Origin www.cnblogs.com/hzoi-yzh/p/12115758.html