Notes on suffix automata

Perhaps a better reading experience

Preface

This post mostly records my own understanding. If you don't like it, please don't flame.
The posts I found online about the \(right\) set only say how to compute it without explaining why, so some classmates and I discussed it for a long time, and this post records the result.
If anything is unclear or wrong, you are welcome to ask or point it out.
Other questions are welcome too, and I will answer them.
This post covers the methods as well as a deeper analysis of why they work, so some parts are a little long.
If you only want the methods, the reasoning in the middle can be skipped.


An intuitive picture of the suffix automaton

  1. The nodes of the \(Parent\) tree are exactly the nodes of the suffix automaton; only the edges differ.
  2. Starting from the root, you can walk out every substring, and nothing but substrings.
  3. \(endpos\) is a set: for a given string, it contains the position of the last character of every occurrence of that string.
  4. A \(right\) set stands for the set of substrings that share the same \(endpos\). One \(right\) set may represent several substrings; their lengths are consecutive and they are suffixes of one another. The size of a \(right\) set is the size of the \(endpos\) set it represents, that is, the number of elements in \(endpos\).
  5. Two \(right\) sets can be related in only two ways: one contains the other, or they are disjoint. They never partially overlap.

    If two strings share an end position, the shorter one must be a suffix of the longer one.

  6. A \(right\) set \(R1\) may be contained in another \(right\) set \(R2\); in that case the strings of \(R2\) must be suffixes of the strings of \(R1\).
  7. By 2, we can walk the suffix automaton from the root and reach every substring. The \(right\) size of the node we arrive at is the number of occurrences of the string matched so far.
  8. The \(Parent\) tree can be built from the suffix automaton. The \(fail\) node of every node represents suffixes of that node's strings, because the \(right\) set of the \(fail\) node contains the \(right\) set of the node.

    Why? Because that is how the suffix automaton is constructed. For a detailed explanation see the posts of other experts; this article only covers the key points.

  9. A node of the suffix automaton represents suffixes: all strings of a node end at the same positions, and each of them is a suffix of the node's longest string.
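
Points 3 to 5 can be checked by brute force on a small string before building anything. The sketch below is my own illustration, not part of the construction: the helper names `all_endpos`, `right_classes` and `check_nested_or_disjoint` are made up for this note, and positions are 1-indexed as in the rest of the post.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// endpos of every distinct substring of s, by brute force (1-indexed end positions).
std::map<std::string, std::set<int>> all_endpos(const std::string& s) {
    std::map<std::string, std::set<int>> ep;
    int n = (int)s.size();
    for (int i = 0; i < n; ++i)
        for (int j = i; j < n; ++j)
            ep[s.substr(i, j - i + 1)].insert(j + 1);
    return ep;
}

// Group substrings with identical endpos: each group is one right set (point 4).
std::map<std::set<int>, std::vector<std::string>> right_classes(const std::string& s) {
    std::map<std::set<int>, std::vector<std::string>> cls;
    for (const auto& kv : all_endpos(s)) cls[kv.second].push_back(kv.first);
    return cls;
}

// Point 5: any two right sets are nested or disjoint; aborts via assert otherwise.
void check_nested_or_disjoint(const std::string& s) {
    auto cls = right_classes(s);
    for (const auto& a : cls)
        for (const auto& b : cls) {
            std::vector<int> common;
            std::set_intersection(a.first.begin(), a.first.end(),
                                  b.first.begin(), b.first.end(),
                                  std::back_inserter(common));
            bool nested = common.size() == a.first.size() ||
                          common.size() == b.first.size();
            assert(common.empty() || nested);
        }
}
```

For \(abcab\) this gives five classes, for example \(\{2,5\}\) shared by \(ab\) and \(b\). The root (empty string) is not included in the brute-force grouping.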

Constructing the automaton

To be clear up front: I use the array-based version.

Conventions

\(\begin{aligned}
& fa \rightarrow fail \\
& len \rightarrow longest\ length \\
& size \rightarrow size\ of\ right
\end{aligned}\)

Main string: a prefix of the original string. Old main string: the prefix that was inserted last; last in the code records which node of the automaton it ended in.
In addition, \(size\) does not affect the construction of the suffix automaton and can be ignored for now; \(size\) is discussed later, together with the \(Parent\) tree.
The main explanation is written as comments in the code and repeats some of the points above to make it easier to follow.
If the comments look ugly here, paste the code into your own editor, or read the post on CSDN (link at the top).
#include <cstdio>
#include <cstring>
const int maxn = 2000006;
const int maxc = 27;
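int tot=1,last=1;//last -> node of the old main string
int fa[maxn],len[maxn],size[maxn];
//fa -> fail; the right set of fa[x] always contains that of x, and fa[x] is always a suffix of x
//len[x] -> length of the longest string of node x
//size[x] -> size of the right set represented by node x
int son[maxn][maxc];//son[p][c] -> node reached by appending character c to the strings of p; can also be seen as an edge

//node 1 is the initial node; it has no fa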

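//{{{ build the SAM
void extend (int c)
{
    int p=last,np=++tot;
    last=tot,len[np]=len[p]+1;
    while (p&&!son[p][c])   son[p][c]=np,p=fa[p];//jump along the suffixes of the old main string: each of them can be extended by c
    //p has no c edge: that substring appears for the first time, so p gets a c edge to np
    //p already has a c edge: the suffix p+c (and those of the remaining fa's) occurred before and already has a node, so there are now two right sets ending in c
    if (!p) fa[np]=1;//c never appeared before, so the suffix of np is the empty string
    else{
        //reconcile the two nodes that end in c
        int q=son[p][c];
        if (len[q]==len[p]+1)   fa[np]=q;//all of q is a suffix of the new main string
        else{
            //i.e. len[q]>len[p]+1
            int nq=++tot;//the longest string of q is not a suffix of the new main string: p+c is one, but len[q]>len[p]+1 and q was not reached earlier (a suffix would have been reached first)
            len[nq]=len[p]+1;//the endpos of p+c gains the new position n, so a new node nq represents the suffix obtained from p+c
            fa[nq]=fa[q];//nq only gains an endpos; the string itself is unchanged, so its suffix is the old suffix
            fa[np]=fa[q]=nq;//nq is a suffix of q and of the new main string
            memcpy(son[nq],son[q],sizeof(son[q]));
            while (son[p][c]==q)    son[p][c]=nq,p=fa[p];//the suffixes of p, extended by c, also gain position n
        }
    }
    size[np]=1;//initial right size of a main-string node is 1
}
//}}}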
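
To try the construction on small inputs, the same steps can be wrapped in a struct. This is only a sketch under my own assumptions: the wrapper `Sam`, the bound `MAXN` and the helper `dump` are hypothetical names, and letters `'a'..'z'` are mapped to 1..26 the way the k-th substring code later in this post does.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

struct Sam {                        // hypothetical wrapper around the array version
    static const int MAXN = 2005;   // assumed bound: enough for |s| <= 1000
    int tot = 1, last = 1;
    int fa[MAXN] = {}, len[MAXN] = {}, size[MAXN] = {}, son[MAXN][27] = {};

    void extend(int c) {            // same steps as extend() above
        int p = last, np = ++tot;
        last = np; len[np] = len[p] + 1; size[np] = 1;
        while (p && !son[p][c]) son[p][c] = np, p = fa[p];
        if (!p) { fa[np] = 1; return; }
        int q = son[p][c];
        if (len[q] == len[p] + 1) { fa[np] = q; return; }
        int nq = ++tot;             // clone of q that keeps only the shorter strings
        len[nq] = len[p] + 1;
        memcpy(son[nq], son[q], sizeof(son[q]));
        fa[nq] = fa[q]; fa[np] = fa[q] = nq;
        while (son[p][c] == q) son[p][c] = nq, p = fa[p];
    }
    explicit Sam(const std::string& s) {
        assert(s.size() <= 1000);
        for (char ch : s) {
            assert('a' <= ch && ch <= 'z');
            extend(ch - 'a' + 1);
        }
    }
};

// Print len and fa of every node, handy for drawing the Parent tree by hand.
void dump(const Sam& m) {
    for (int i = 1; i <= m.tot; ++i)
        printf("node %d: len=%d fa=%d\n", i, m.len[i], m.fa[i]);
}
```

For \(abcab\) no clone is needed and the automaton has 6 nodes; for \(abb\) the third character forces a clone (node 5, with \(len = 1\)) that becomes the \(fa\) of nodes 3 and 4.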

The \(right\) set

Computing the size of the \(right\) set

We know that on the \(Parent\) tree the \(right\) sets are related by containment,
so the sizes of the \(right\) sets are computed on the \(Parent\) tree.
First, the \(right\) size of every main-string node gets the initial value 1; this is done while building the suffix automaton.
The size of a \(right\) set is then the sum of the \(right\) sizes of its children plus its own initial size (possibly 1).

The \(right\) sets of the children do not intersect, and the own contribution of a main-string node is 1.

Only a node that represents a main string can be a leaf. However, a leaf does not represent only the main string, and a main-string node is not necessarily a leaf.
Why?
There are two ways to see it.

  • An edge of the \(Parent\) tree goes from \(fa[i]\) to \(i\).
    Every extra node that gets created (the clone \(nq\)) is the \(fa\) of some other node, so it can never be a leaf.
  • Consider any substring that occurs only once: the main string (prefix) ending at the same position is at least as long and has it as a suffix, so both are in the same node, and that node is a main-string node.
    That is why only main-string nodes are given the initial value \(size = 1\) when the automaton is built.
    But a main string can occur again later, in which case its node has children. Why does it still get the initial value 1?
  • For the string \(abac\), draw the \(Parent\) tree.
    [Figure: asf.PNG, the \(Parent\) tree of \(abac\)]
    Going from a \(right\) set to its smaller child \(right\) sets, one element goes missing (from node 2 to node 4).
    This is the case where the first character appears again later on.
    Consider the \(endpos\) set of any node: the \(endpos\) set of its suffix necessarily contains it. If the \(endpos\) has exactly one element, the node is a main-string node; if it has more, the node must have children whose \(endpos\) sets are smaller.
    Only the first character has nothing in front of it, so no child holds the set \(\{1\}\). Exactly one element is missing, so giving the initial value 1 is correct.
  • Extending the case above: a main string may repeat (a prefix appears again in the middle), as in \(abcab\).
    This is not the first-character case, yet the children of the \(right\) set of \(ab\) have lost \(\{2\}\).
    The reason is the same as above: the first occurrence has nothing in front of it. For every other occurrence, the string extended to the left has a \(right\) set that is either a child of this one or identical to it.
    So what gets lost is always the position of the first occurrence.

Therefore, giving main-string nodes the initial value 1 is both correct and necessary.
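
The rule argued above fits in one identity. The bracket is notation I am adding for this note only: it is 1 when the condition holds and 0 otherwise.

```latex
\[
|endpos(v)| \;=\; [\,v \text{ is a main-string node}\,] \;+\; \sum_{u:\ fa[u]=v} |endpos(u)|
\]
```

In \(abcab\), the node of \(ab\) is a main-string node with one child, so \(|\{2,5\}| = 1 + |\{5\}|\). In \(abb\), the node of \(b\) is a clone with two children, so \(|\{2,3\}| = 0 + |\{2\}| + |\{3\}|\).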

So all we have to do is build the \(Parent\) tree and run one \(dfs\) over it.
Of course we can also accumulate directly without recursion: the tree is a \(DAG\), so the nodes can be processed in topological order, children before parents. The longer \(len\) is, the lower the node sits in the \(Parent\) tree, so it is enough to process the nodes in decreasing order of \(len\). The order can be obtained with \(sort\) or with a radix sort.
\(\mathcal{Code}\)

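for (int i=2;i<=tot;++i)    add(fa[i],i);//the root has no fa, so start from 2; the root could also be 0, but then fa of the root must be initialised to -1 and the special cases above must be changed too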
dfs(1);

void dfs (int p)
{
    for (int e=head[p];e;e=nxt[e]){
        dfs(to[e]);
        size[p]+=size[to[e]];
    }
}

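//radix sort (counting sort by len), smaller constant factor
for (int i=1;i<=tot;++i)    ++cup[len[i]];
for (int i=1;i<=n;++i)      cup[i]+=cup[i-1];
for (int i=1;i<=tot;++i)    mp[cup[len[i]]--]=i;//mp[1..tot]: nodes in increasing order of len
for (int i=tot;i>=2;--i)    size[fa[mp[i]]]+=size[mp[i]];//longest first, so children are added before their parents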
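
Putting the construction and the radix-sort accumulation together gives an occurrence counter, which is point 7 in action. The following is a sketch under my own assumptions: the wrapper `Sam`, the bound `MAXN` and the function `occurrences` are hypothetical names, and only lowercase letters are handled.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

struct Sam {                        // hypothetical wrapper around the array version
    static const int MAXN = 2005;   // assumed bound: enough for |s| <= 1000
    int tot = 1, last = 1;
    int fa[MAXN] = {}, len[MAXN] = {}, size[MAXN] = {}, son[MAXN][27] = {};

    void extend(int c) {
        int p = last, np = ++tot;
        last = np; len[np] = len[p] + 1; size[np] = 1;
        while (p && !son[p][c]) son[p][c] = np, p = fa[p];
        if (!p) { fa[np] = 1; return; }
        int q = son[p][c];
        if (len[q] == len[p] + 1) { fa[np] = q; return; }
        int nq = ++tot;
        len[nq] = len[p] + 1;
        memcpy(son[nq], son[q], sizeof(son[q]));
        fa[nq] = fa[q]; fa[np] = fa[q] = nq;
        while (son[p][c] == q) son[p][c] = nq, p = fa[p];
    }
    explicit Sam(const std::string& s) {
        assert(s.size() <= 1000);
        for (char ch : s) { assert('a' <= ch && ch <= 'z'); extend(ch - 'a' + 1); }
        // Counting sort by len, then push sizes from long nodes to their fa.
        int n = (int)s.size();
        std::vector<int> cup(n + 1, 0), mp(tot + 1, 0);
        for (int i = 1; i <= tot; ++i) ++cup[len[i]];
        for (int i = 1; i <= n; ++i) cup[i] += cup[i - 1];
        for (int i = 1; i <= tot; ++i) mp[cup[len[i]]--] = i;
        for (int i = tot; i >= 2; --i) size[fa[mp[i]]] += size[mp[i]];
    }
};

// Number of occurrences of pat in text: walk from the root, read size at the end.
int occurrences(const std::string& text, const std::string& pat) {
    Sam m(text);
    int x = 1;
    for (char ch : pat) {
        if (ch < 'a' || ch > 'z') return 0;
        x = m.son[x][ch - 'a' + 1];
        if (!x) return 0;           // fell off the automaton: not a substring
    }
    return m.size[x];
}
```

Rebuilding the automaton on every call is wasteful; it is done here only to keep the example short.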

Understanding the \(right\) set

Personally I find the \(right\) set quite magical.
We can see that the size of a \(right\) set is only ever increased by main-string nodes: every main-string node that is encountered adds one to \(size\).
In other words, the size of the \(right\) set of a node equals the number of main-string nodes in its subtree (the node itself included).
How should this sentence be understood?

  • A substring is always a suffix of some main string (prefix)
  • All main strings (prefixes) are different

So the size of the \(right\) set of a node equals the number of prefixes that have the node's string as a suffix.

Therefore, the size of the \(right\) set of a node on the \(Parent\) tree can be described as
the number of prefixes that have the strings of the corresponding \(endpos\) set as a suffix,
or, equivalently, as the number of main-string nodes in its subtree.


Applications of suffix automata

Checking whether a string is a substring

By 2.

Starting from the root, you can walk out every substring, and nothing but substrings.

Just walk the suffix automaton once, starting from the root.
The Trie is completely outclassed here.
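
A minimal sketch of that walk, under my own assumptions: `Sam`, `MAXN` and `is_substring` are hypothetical names, and characters outside `'a'..'z'` are simply treated as a mismatch.

```cpp
#include <cassert>
#include <cstring>
#include <string>

struct Sam {                        // hypothetical wrapper around the array version
    static const int MAXN = 2005;   // assumed bound: enough for |s| <= 1000
    int tot = 1, last = 1;
    int fa[MAXN] = {}, len[MAXN] = {}, son[MAXN][27] = {};

    void extend(int c) {
        int p = last, np = ++tot;
        last = np; len[np] = len[p] + 1;
        while (p && !son[p][c]) son[p][c] = np, p = fa[p];
        if (!p) { fa[np] = 1; return; }
        int q = son[p][c];
        if (len[q] == len[p] + 1) { fa[np] = q; return; }
        int nq = ++tot;
        len[nq] = len[p] + 1;
        memcpy(son[nq], son[q], sizeof(son[q]));
        fa[nq] = fa[q]; fa[np] = fa[q] = nq;
        while (son[p][c] == q) son[p][c] = nq, p = fa[p];
    }
    explicit Sam(const std::string& s) {
        assert(s.size() <= 1000);
        for (char ch : s) { assert('a' <= ch && ch <= 'z'); extend(ch - 'a' + 1); }
    }
};

// pat is a substring of text exactly when the walk never hits a missing edge.
bool is_substring(const std::string& text, const std::string& pat) {
    Sam m(text);
    int x = 1;                      // node 1 is the root
    for (char ch : pat) {
        if (ch < 'a' || ch > 'z') return false;
        x = m.son[x][ch - 'a' + 1];
        if (!x) return false;
    }
    return true;
}
```

In practice the automaton would be built once and queried many times; each query then costs time proportional to the pattern length.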

Number of distinct substrings

There are two ways to do this.

  • We know that several strings may share one \(right\) set. These strings all have different lengths (two of the same length would be the same string), so \(len[i]-len[fa[i]]\) is the number of strings that share this \(right\) set. Do this for every node; the total is the number of distinct substrings,
    which is \(\sum_{i=1}^{tot}(len[i]-len[fa[i]])\)

  • Consider \(DP\). By 2, we can start from the root and walk over the whole suffix automaton; every time a path arrives at a node, add one to the answer.
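
Both ways can be sketched side by side. The names `Sam`, `MAXN`, `distinct_by_len`, `paths_from` and `distinct_by_dp` are made up for this example, and the \(DP\) is written as a memoised count of paths leaving each node, which is one possible reading of the second bullet.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

struct Sam {                        // hypothetical wrapper around the array version
    static const int MAXN = 2005;   // assumed bound: enough for |s| <= 1000
    int tot = 1, last = 1;
    int fa[MAXN] = {}, len[MAXN] = {}, son[MAXN][27] = {};

    void extend(int c) {
        int p = last, np = ++tot;
        last = np; len[np] = len[p] + 1;
        while (p && !son[p][c]) son[p][c] = np, p = fa[p];
        if (!p) { fa[np] = 1; return; }
        int q = son[p][c];
        if (len[q] == len[p] + 1) { fa[np] = q; return; }
        int nq = ++tot;
        len[nq] = len[p] + 1;
        memcpy(son[nq], son[q], sizeof(son[q]));
        fa[nq] = fa[q]; fa[np] = fa[q] = nq;
        while (son[p][c] == q) son[p][c] = nq, p = fa[p];
    }
    explicit Sam(const std::string& s) {
        assert(s.size() <= 1000);
        for (char ch : s) { assert('a' <= ch && ch <= 'z'); extend(ch - 'a' + 1); }
    }
};

// Way 1: every node i contributes len[i] - len[fa[i]] strings.
long long distinct_by_len(const std::string& s) {
    Sam m(s);
    long long ans = 0;
    for (int i = 2; i <= m.tot; ++i) ans += m.len[i] - m.len[m.fa[i]];
    return ans;
}

// Way 2: count the paths that start at x (including the path that stops at x).
long long paths_from(const Sam& m, int x, std::vector<long long>& memo) {
    if (memo[x] != -1) return memo[x];
    long long r = 1;
    for (int c = 1; c <= 26; ++c)
        if (m.son[x][c]) r += paths_from(m, m.son[x][c], memo);
    return memo[x] = r;
}

long long distinct_by_dp(const std::string& s) {
    Sam m(s);
    std::vector<long long> memo(m.tot + 1, -1);
    return paths_from(m, 1, memo) - 1;   // drop the empty path at the root
}
```

For \(abcab\) both functions return 12: there are 15 substrings by position, and \(a\), \(b\), \(ab\) each occur twice.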

If distinct substrings are defined like this instead:
equal substrings at different positions count as different substrings,
then we can use 7.

By 2, we can walk the suffix automaton from the root and reach every substring. The \(right\) size of the node we arrive at is the number of occurrences of the string matched so far.

First compute the size of every \(right\) set; then, every time a path arrives at a node, add \(size\) to the answer.
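
A quick sanity check on this count, assuming \(size\) has already been accumulated on the \(Parent\) tree and \(n\) is the length of the string: the paths that arrive at node \(i\) are exactly its \(len[i]-len[fa[i]]\) strings, and counting every occurrence separately must give all \(n(n+1)/2\) substrings by position.

```latex
\[
\sum_{i=2}^{tot} \bigl(len[i]-len[fa[i]]\bigr)\cdot size[i] \;=\; \frac{n(n+1)}{2}
\]
```

For \(abcab\) the nodes 2 to 6 give \(1\cdot 2 + 2\cdot 2 + 3\cdot 1 + 3\cdot 1 + 3\cdot 1 = 15 = \frac{5\cdot 6}{2}\).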

The k-th smallest substring

For every node, record how many substrings can still be read starting from it, then walk down the way you would look for the k-th element in a \(Splay\).

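typedef long long ll;
//num[] must be filled with -1 beforehand; size[1] must be 0 so that the empty string is not counted
ll dfs (int x)//first compute how many substrings can still be read from x (x itself included)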
{
    if (num[x]!=-1) return num[x];
    num[x]=size[x];
    for (int i=1;i<=26;++i)
        if (son[x][i])  num[x]+=dfs(son[x][i]);
    return num[x];
}

void kth (int x,ll k)//find the k-th smallest
{
    if (k<=size[x]) return;
    k-=size[x];
    for (int i=1;i<=26;++i){
        if (son[x][i]){
            if (k<=num[son[x][i]]){
                printf("%c",i+'a'-1);
                kth(son[x][i],k);
                return;
            }
            else    k-=num[son[x][i]];
        }
    }
}

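To get the \(k\)-th largest instead, enumerate from \(26\) down to \(1\), and move the check k<=size[x] after the loop, because in descending order a string comes after all of its extensions.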

Minimum cyclic shift (minimal representation)

The code of the minimal-representation method is short and simple, so why use a suffix automaton for this?
Anyway: build the suffix automaton of \(s+s\),
then find the smallest substring of length \(|s|\).
The code is essentially the same as above ...
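
One way to read "the same as above" is a greedy walk: from the root, take the smallest outgoing character \(|s|\) times. The sketch below follows that reading; `Sam`, `MAXN` and `min_rotation` are hypothetical names, and the small bound means \(2|s|\) must not exceed 1000 here.

```cpp
#include <cassert>
#include <cstring>
#include <string>

struct Sam {                        // hypothetical wrapper around the array version
    static const int MAXN = 2005;   // assumed bound: enough for 1000 characters
    int tot = 1, last = 1;
    int fa[MAXN] = {}, len[MAXN] = {}, son[MAXN][27] = {};

    void extend(int c) {
        int p = last, np = ++tot;
        last = np; len[np] = len[p] + 1;
        while (p && !son[p][c]) son[p][c] = np, p = fa[p];
        if (!p) { fa[np] = 1; return; }
        int q = son[p][c];
        if (len[q] == len[p] + 1) { fa[np] = q; return; }
        int nq = ++tot;
        len[nq] = len[p] + 1;
        memcpy(son[nq], son[q], sizeof(son[q]));
        fa[nq] = fa[q]; fa[np] = fa[q] = nq;
        while (son[p][c] == q) son[p][c] = nq, p = fa[p];
    }
    explicit Sam(const std::string& s) {
        assert(s.size() <= 1000);
        for (char ch : s) { assert('a' <= ch && ch <= 'z'); extend(ch - 'a' + 1); }
    }
};

// Smallest rotation of s: greedy walk of length |s| on the automaton of s+s.
std::string min_rotation(const std::string& s) {
    Sam m(s + s);
    std::string res;
    int x = 1;
    for (size_t i = 0; i < s.size(); ++i)
        for (int c = 1; c <= 26; ++c)
            if (m.son[x][c]) {      // smallest available next character
                res += char('a' + c - 1);
                x = m.son[x][c];
                break;
            }
    return res;
}
```

The walk never gets stuck: every substring of \(s+s\) shorter than \(|s|\) also starts somewhere in the first copy of \(s\), where it can still be extended.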

That is about all I have to say.
It was hard work and took several days (I kept getting halfway, finding a new problem and thinking for another couple of days),
so please give it a like.


Origin www.cnblogs.com/Morning-Glory/p/11295040.html