[Data Structure] Suffix Automaton

Foreword

For a string \(s\), \(|s|\) denotes the length of \(s\).
For a character set (alphabet) \(A\), \(|A|\) denotes the size of \(A\).
All strings are indexed from zero.
This article is fairly long; if you find any typos or conceptual errors, please contact the author or leave a comment below.

SAM

The suffix automaton (SAM) is a powerful data structure for solving many string problems.
A SAM built from a string represents all substrings of that string in compressed form.
Formally, the SAM of a string \(S\) is the minimal \(\texttt{DFA}\) (deterministic finite automaton) that accepts exactly the suffixes of \(S\).
We denote by \(t_0\) a virtual source node that stands for the empty string; constructing such a virtual node is a widely used technique.
A SAM therefore has the following properties:

  1. It is a directed acyclic graph; its nodes are called states and its edges are called transitions.
  2. Every state is reachable from \(t_0\).
  3. Each transition is labeled with one letter of the alphabet, and the transitions leaving any single state carry pairwise different letters.
  4. There are one or more terminal states. Reading the labels along any path from \(t_0\) to a terminal state gives a suffix of the original string \(S\), and every suffix of \(S\) can be obtained from such a path.
  5. Among all automata satisfying the conditions above, the SAM has the fewest states.

Simply put, a SAM without its suffix links is a directed acyclic graph with \(t_0\) as its source. The suffix links, introduced below, form a tree on the same states, called the suffix link tree.
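Property 4 can be checked directly in code. The C++ sketch below builds a SAM with the same online algorithm as the map version later in this article (stored in a `std::vector` here, a choice of this sketch) and decides whether a string is a suffix by walking transitions from \(t_0\). It assumes the standard fact that the terminal states are exactly the states on the suffix-link path from the state of the whole string back to \(t_0\); suffix links themselves are explained further down. The names `Sam`, `terminalStates` and `acceptsAsSuffix` are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal SAM; same construction as the article's map version.
struct Sam {
    struct State {
        int len = 0, link = -1;
        std::map<char, int> next;  // transitions
    };
    std::vector<State> st{State{}};  // st[0] is t_0
    int last = 0;                    // state of the whole string read so far

    void extend(char c) {
        int cur = static_cast<int>(st.size());
        st.push_back(State{});
        st[cur].len = st[last].len + 1;
        int p = last;
        while (p != -1 && !st[p].next.count(c)) {
            st[p].next[c] = cur;
            p = st[p].link;
        }
        if (p == -1) {
            st[cur].link = 0;
        } else {
            int q = st[p].next[c];
            if (st[p].len + 1 == st[q].len) {
                st[cur].link = q;
            } else {
                State copy = st[q];  // clone keeps q's transitions and link
                copy.len = st[p].len + 1;
                int clone = static_cast<int>(st.size());
                st.push_back(copy);
                while (p != -1 && st[p].next[c] == q) {
                    st[p].next[c] = clone;
                    p = st[p].link;
                }
                st[q].link = st[cur].link = clone;
            }
        }
        last = cur;
    }
};

// Assumed standard fact: the terminal states are the states on the
// suffix-link path from `last` back to t_0 (both ends included).
std::vector<bool> terminalStates(const Sam& sam) {
    std::vector<bool> term(sam.st.size(), false);
    for (int v = sam.last; v != -1; v = sam.st[v].link) term[v] = true;
    return term;
}

// True iff t is a suffix of s, decided only by walking the SAM of s.
bool acceptsAsSuffix(const std::string& s, const std::string& t) {
    Sam sam;
    for (char c : s) sam.extend(c);
    std::vector<bool> term = terminalStates(sam);
    int v = 0;
    for (char c : t) {
        auto it = sam.st[v].next.find(c);
        if (it == sam.st[v].next.end()) return false;  // not even a substring
        v = it->second;
    }
    return term[v];
}
```

For example, `acceptsAsSuffix("abbb", "bb")` is true, while `acceptsAsSuffix("abbb", "ab")` is false: "ab" is spelled by a path from \(t_0\), but that path does not end in a terminal state.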

What a SAM looks like without suffix links

For now, you do not need to know what a suffix link is.

  • Empty string

  • String s = "a"

  • String s = "abbb"

    (Pictures borrowed from OI-wiki.)
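The same automata can also be listed in text form. The C++ sketch below (hypothetical names `Sam` and `transitions`) builds the SAM with the same online algorithm as the code later in this article and prints every transition as `from -c-> to`. States are numbered in creation order with 0 as \(t_0\); this numbering is an assumption of the sketch and may differ from the labels in the OI-wiki pictures.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal SAM; same construction as the article's map version.
struct Sam {
    struct State {
        int len = 0, link = -1;
        std::map<char, int> next;
    };
    std::vector<State> st{State{}};  // st[0] is t_0
    int last = 0;

    void extend(char c) {
        int cur = static_cast<int>(st.size());
        st.push_back(State{});
        st[cur].len = st[last].len + 1;
        int p = last;
        while (p != -1 && !st[p].next.count(c)) {
            st[p].next[c] = cur;
            p = st[p].link;
        }
        if (p == -1) {
            st[cur].link = 0;
        } else {
            int q = st[p].next[c];
            if (st[p].len + 1 == st[q].len) {
                st[cur].link = q;
            } else {
                State copy = st[q];
                copy.len = st[p].len + 1;
                int clone = static_cast<int>(st.size());
                st.push_back(copy);
                while (p != -1 && st[p].next[c] == q) {
                    st[p].next[c] = clone;
                    p = st[p].link;
                }
                st[q].link = st[cur].link = clone;
            }
        }
        last = cur;
    }
};

// Every transition as "from -c-> to", states in creation order (0 = t_0).
std::vector<std::string> transitions(const std::string& s) {
    Sam sam;
    for (char c : s) sam.extend(c);
    std::vector<std::string> out;
    for (std::size_t v = 0; v < sam.st.size(); ++v)
        for (const auto& kv : sam.st[v].next)
            out.push_back(std::to_string(v) + " -" + kv.first + "-> " +
                          std::to_string(kv.second));
    return out;
}
```

The empty string gives only \(t_0\) and no transitions, and "a" gives the single transition `0 -a-> 1`. For "abbb" the sketch creates 7 states and lists `0 -a-> 1`, `0 -b-> 4`, `1 -b-> 2`, `2 -b-> 3`, `3 -b-> 5`, `4 -b-> 6`, `6 -b-> 5`.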

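A key property

Every path in the SAM that starts at \(t_0\) spells a substring of \(S\) (read the transition labels in order), and every substring of \(S\) is spelled by exactly one such path: paths from \(t_0\) and substrings of \(S\) are in one-to-one correspondence.
Keep this property in mind; it is important.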
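Because of this correspondence, walking transitions from \(t_0\) decides whether a string is a substring, and counting paths that start at \(t_0\) counts distinct substrings. The C++ sketch below (hypothetical names `Sam`, `isSubstring`, `countPaths`, `countDistinctSubstrings`) uses the same construction as the article's map version; the memoised path count is one common way to do this, not the only one.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal SAM; same construction as the article's map version.
struct Sam {
    struct State {
        int len = 0, link = -1;
        std::map<char, int> next;
    };
    std::vector<State> st{State{}};  // st[0] is t_0
    int last = 0;

    void extend(char c) {
        int cur = static_cast<int>(st.size());
        st.push_back(State{});
        st[cur].len = st[last].len + 1;
        int p = last;
        while (p != -1 && !st[p].next.count(c)) {
            st[p].next[c] = cur;
            p = st[p].link;
        }
        if (p == -1) {
            st[cur].link = 0;
        } else {
            int q = st[p].next[c];
            if (st[p].len + 1 == st[q].len) {
                st[cur].link = q;
            } else {
                State copy = st[q];
                copy.len = st[p].len + 1;
                int clone = static_cast<int>(st.size());
                st.push_back(copy);
                while (p != -1 && st[p].next[c] == q) {
                    st[p].next[c] = clone;
                    p = st[p].link;
                }
                st[q].link = st[cur].link = clone;
            }
        }
        last = cur;
    }
};

// t is a substring of s iff every transition along t exists from t_0.
bool isSubstring(const std::string& s, const std::string& t) {
    Sam sam;
    for (char c : s) sam.extend(c);
    int v = 0;
    for (char c : t) {
        auto it = sam.st[v].next.find(c);
        if (it == sam.st[v].next.end()) return false;
        v = it->second;
    }
    return true;
}

// Number of paths starting at v, including the empty one; memoised because
// the transition graph is a DAG (memo value 0 means "not computed yet").
long long countPaths(const Sam& sam, int v, std::vector<long long>& memo) {
    if (memo[v] != 0) return memo[v];
    long long total = 1;  // the empty path that stops at v
    for (const auto& kv : sam.st[v].next) total += countPaths(sam, kv.second, memo);
    return memo[v] = total;
}

long long countDistinctSubstrings(const std::string& s) {
    Sam sam;
    for (char c : s) sam.extend(c);
    std::vector<long long> memo(sam.st.size(), 0);
    return countPaths(sam, 0, memo) - 1;  // exclude the empty string
}
```

For "abbb" the distinct non-empty substrings are a, ab, abb, abbb, b, bb, bbb, so `countDistinctSubstrings("abbb")` is 7. The recursion depth can reach \(|s|\), so for very long strings an iterative order (for example by decreasing \(len\)) would be safer.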

End position

The end position (usually called endpos) of an occurrence of a substring is the index, in the original string, of that occurrence's last character.
A substring may occur several times, so its end position is not unique; we therefore work with the set of all its end positions.
This set, usually called the endpos set, contains every position in the original string where an occurrence of the substring ends.
For example, in the string "abbbabbb", the substring "ab" has the endpos set {1, 5}.
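The definition can be checked with a brute-force C++ sketch (hypothetical name `endpos`; quadratic time, for small examples only). It returns the end index of every occurrence in increasing order, using zero-based indices as in this article.

```cpp
#include <string>
#include <vector>

// Brute force, O(|s| * |t|): endpos set of a non-empty t in s, ascending.
std::vector<int> endpos(const std::string& s, const std::string& t) {
    std::vector<int> result;
    if (t.empty()) return result;  // only non-empty substrings are handled here
    for (std::size_t i = 0; i + t.size() <= s.size(); ++i)
        if (s.compare(i, t.size(), t) == 0)
            result.push_back(static_cast<int>(i + t.size() - 1));
    return result;
}
```

`endpos("abbbabbb", "ab")` yields {1, 5}, matching the example above. Note that "bbb" and "abbb" both yield {3, 7}; substrings with equal endpos sets end up in the same SAM state.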

Suffix link

Substrings with the same endpos set belong to the same SAM state. In short, a suffix link connects two states with different endpos sets: it jumps from the current state to the state containing the longest suffix whose endpos set differs from the current one.
More precisely, the suffix link of a state points to the state containing the longest suffix of that state's longest substring whose endpos set is different from the endpos set of the state.
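A brute-force C++ sketch of this definition (hypothetical names `endpos` and `linkTarget`; slow, for checking small cases only): shorten the substring one character at a time from the left and stop at the first suffix whose endpos set changes. An empty result stands for \(t_0\), whose endpos set by convention contains every position. Since all substrings in one state share the same endpos set and are suffixes of the state's longest one, the result is the same for every substring of a given state.

```cpp
#include <string>
#include <vector>

// Brute force: end index of every occurrence of a non-empty t in s, ascending.
std::vector<int> endpos(const std::string& s, const std::string& t) {
    std::vector<int> result;
    if (t.empty()) return result;
    for (std::size_t i = 0; i + t.size() <= s.size(); ++i)
        if (s.compare(i, t.size(), t) == 0)
            result.push_back(static_cast<int>(i + t.size() - 1));
    return result;
}

// Longest proper suffix of t (t non-empty and occurring in s) whose endpos
// set differs from endpos(t). "" means the suffix link goes to t_0.
std::string linkTarget(const std::string& s, const std::string& t) {
    const std::vector<int> own = endpos(s, t);
    for (int len = static_cast<int>(t.size()) - 1; len > 0; --len) {
        std::string suffix = t.substr(t.size() - len);
        if (endpos(s, suffix) != own) return suffix;
    }
    return "";
}
```

In "abbbabbb", "abbb" and "bbb" share the endpos set {3, 7}, and both have "bb" (endpos {2, 3, 6, 7}) as their link target.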

Suffix link tree

All the suffix links of a SAM together form a tree rooted at \(t_0\), called the suffix link tree.
The suffix link tree is very important, because most of the operations needed in later problems are carried out on it.
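Counting occurrences is a typical use of this tree: the size of a state's endpos set equals the sum over its children in the suffix link tree, plus one if the state was created directly for a new character rather than cloned. The code later in this article does this with a DFS; the C++ sketch below swaps in another common technique, ordering states by \(len\) with a counting sort (a parent in the tree is always shorter than its child) and accumulating from longest to shortest, which avoids deep recursion. The names `Sam`, `cnt` and `occurrences` are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal SAM with an occurrence counter per state.
struct Sam {
    struct State {
        int len = 0, link = -1, cnt = 0;  // cnt: |endpos|, completed later
        std::map<char, int> next;
    };
    std::vector<State> st{State{}};  // st[0] is t_0
    int last = 0;

    void extend(char c) {
        int cur = static_cast<int>(st.size());
        st.push_back(State{});
        st[cur].len = st[last].len + 1;
        st[cur].cnt = 1;  // one new end position
        int p = last;
        while (p != -1 && !st[p].next.count(c)) {
            st[p].next[c] = cur;
            p = st[p].link;
        }
        if (p == -1) {
            st[cur].link = 0;
        } else {
            int q = st[p].next[c];
            if (st[p].len + 1 == st[q].len) {
                st[cur].link = q;
            } else {
                State copy = st[q];
                copy.len = st[p].len + 1;
                copy.cnt = 0;  // clones own no end position themselves
                int clone = static_cast<int>(st.size());
                st.push_back(copy);
                while (p != -1 && st[p].next[c] == q) {
                    st[p].next[c] = clone;
                    p = st[p].link;
                }
                st[q].link = st[cur].link = clone;
            }
        }
        last = cur;
    }
};

// Number of occurrences of a non-empty t in s, via the suffix link tree.
int occurrences(const std::string& s, const std::string& t) {
    Sam sam;
    for (char c : s) sam.extend(c);
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(sam.st.size());
    // Counting sort of states by len; order[0] is t_0 (the only len-0 state).
    std::vector<int> bucket(n + 1, 0), order(m);
    for (const auto& x : sam.st) ++bucket[x.len];
    for (int i = 1; i <= n; ++i) bucket[i] += bucket[i - 1];
    for (int v = m - 1; v >= 0; --v) order[--bucket[sam.st[v].len]] = v;
    // Longest states first: add each count to its suffix-link parent.
    for (int i = m - 1; i > 0; --i) {
        int v = order[i];
        sam.st[sam.st[v].link].cnt += sam.st[v].cnt;
    }
    int v = 0;
    for (char c : t) {
        auto it = sam.st[v].next.find(c);
        if (it == sam.st[v].next.end()) return 0;
        v = it->second;
    }
    return sam.st[v].cnt;
}
```

On "abbbabbb" this gives 2 for "ab" (endpos {1, 5}), 4 for "bb" and 6 for "b".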

SAM construction

The SAM is built online, appending the characters of the string one at a time. Let \(last\) be the state of the whole string read so far (initially \(t_0\)). For each new character:

  • Read the next character \(c\).
  • Create a new state \(cur\) and set \(len_{cur} = len_{last} + 1\).
  • Starting from \(last\), follow suffix links; for each state \(p\) visited, add a transition from \(p\) to \(cur\) labeled \(c\). Stop when we fall off the root (no state is left) or reach a state \(p\) that already has a transition on \(c\).
  • If we fell off the root, set the suffix link of \(cur\) to \(t_0\) and go to the last step.
  • Otherwise, let \(q\) be the state that \(p\) reaches by the character \(c\).
  • If \(len_p + 1 = len_q\), point the suffix link of \(cur\) to \(q\) and go to the last step.
  • Otherwise, "copy" \(q\): create a new node \(clone\) with the same transitions and suffix link as \(q\), but with \(len_{clone} = len_p + 1\). Point the suffix links of both \(cur\) and \(q\) to \(clone\). Finally, walk back from \(p\) along suffix links, and as long as the current state has a transition on \(c\) to \(q\), redirect that transition to \(clone\).
  • Set \(last = cur\).
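To see which branch fires when, the C++ sketch below follows the steps above and records, for every character, how the step ended: "root" (fell off the root, link to \(t_0\)), "direct" (\(len_p + 1 = len_q\)) or "clone" (\(q\) was split). The names `Sam`, `extend`, `traceBuild` and the tag strings are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal SAM whose extend() reports which case finished the step.
struct Sam {
    struct State {
        int len = 0, link = -1;
        std::map<char, int> next;
    };
    std::vector<State> st{State{}};  // st[0] is t_0
    int last = 0;

    std::string extend(char c) {
        int cur = static_cast<int>(st.size());
        st.push_back(State{});
        st[cur].len = st[last].len + 1;
        int p = last;
        while (p != -1 && !st[p].next.count(c)) {
            st[p].next[c] = cur;
            p = st[p].link;
        }
        std::string tag;
        if (p == -1) {
            st[cur].link = 0;
            tag = "root";
        } else {
            int q = st[p].next[c];
            if (st[p].len + 1 == st[q].len) {
                st[cur].link = q;
                tag = "direct";
            } else {
                State copy = st[q];  // same transitions and link as q
                copy.len = st[p].len + 1;
                int clone = static_cast<int>(st.size());
                st.push_back(copy);
                while (p != -1 && st[p].next[c] == q) {
                    st[p].next[c] = clone;  // redirect q -> clone
                    p = st[p].link;
                }
                st[q].link = st[cur].link = clone;
                tag = "clone";
            }
        }
        last = cur;
        return tag;
    }
};

// One tag per character of s, in input order.
std::vector<std::string> traceBuild(const std::string& s) {
    Sam sam;
    std::vector<std::string> log;
    for (char c : s) log.push_back(sam.extend(c));
    return log;
}
```

For "abbb" the log is root, root, clone, clone: at the third character, \(p = t_0\) reaches \(q\) = the state of "ab" with \(len_q = 2 \neq len_p + 1 = 1\), so \(q\) must be split. For "aaa" no clone is ever needed.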

Code

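The code below is for the Luogu problem "[Template] Suffix Automaton": among all substrings that occur more than once, compute (number of occurrences) × (length) and output the maximum.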

map version (\(O_2\))

Its time and memory usage are high, but it is robust and can handle any character.
It relies heavily on the \(O_2\) optimization switch.

#include <cstdio>
#include <cstring>
#include <string>
#include <map>
#include <algorithm>   // std::max

using namespace std;

const int MAXN = 3000005;

int sz[3000005];

struct SAM{
    int size, last;
    struct Node{
        int len, link;
        map<char, int> next;
    } nodes[MAXN];
    void init(){
        nodes[0].len = 0, nodes[0].link = -1;
        size = 1; last = 0;
    }
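    void insert(char ch){
        // New state for the whole current prefix; it owns one new end position.
        int cur = size++, p; nodes[cur].len = nodes[last].len + 1; sz[cur] = 1;
        // Walk suffix links from last (~p means p != -1), adding ch-transitions to cur.
        for (p = last; ~p && !nodes[p].next.count(ch); p = nodes[p].link)
            nodes[p].next[ch] = cur;
        if (p == -1)                                // fell off the root
            nodes[cur].link = 0;
        else{
            int q = nodes[p].next[ch];
            if (nodes[p].len + 1 == nodes[q].len)   // no split needed
                nodes[cur].link = q;
            else{                                   // split q with a clone
                int clone = size++;                 // sz[clone] stays 0
                nodes[clone].len = nodes[p].len + 1;
                nodes[clone].next = nodes[q].next;
                nodes[clone].link = nodes[q].link;
                // Redirect ch-transitions that pointed to q so they point to clone.
                for ( ; ~p && nodes[p].next[ch] == q; p = nodes[p].link)
                    nodes[p].next[ch] = clone;
                nodes[q].link = nodes[cur].link = clone;
            }
        }
        last = cur;
    }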
    void build(char *buf, int len = 0){
        if (!len) len = strlen(buf);
        for (int i = 0; i < len; ++i)
            insert(buf[i]);
    }
} sam;

struct Edge{
    int to, next;
} edges[6000005];

int head[3000005], edge_num;

inline void addEdge(int u, int v){
    edges[++edge_num] = (Edge){v, head[u]};
    head[u] = edge_num;
}

inline void buildParentTree(){
    for (int i = 1; i < sam.size; ++i)
        addEdge(sam.nodes[i].link, i);
}

long long ans = 0;

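// Post-order DFS on the suffix link tree: afterwards sz[u] = |endpos(u)|,
// the number of occurrences of every substring in state u (the longest has length len(u)).
// The recursion depth can reach the number of states, so a large stack is assumed.
void DFS(int u){
    for (int c_e = head[u]; c_e; c_e = edges[c_e].next){
        int v = edges[c_e].to;
        DFS(v); sz[u] += sz[v];
    }
    // For substrings occurring more than once, take occurrences * longest length.
    if (sz[u] > 1)
        ans = max(ans, 1ll * sz[u] * sam.nodes[u].len);
}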

char ch[1000005];

int main(){
    sam.init(); scanf("%s", ch);
    sam.build(ch); buildParentTree();
    DFS(0);
    printf("%lld", ans);
    return 0;
}

Array version

It needs less time and memory, but it only works for a small fixed alphabet, such as the lowercase letters in this problem.

#include <cstdio>
#include <cstring>
#include <string>
#include <algorithm>   // std::max

using namespace std;

const int MAXN = 3000005;

int sz[3000005];

struct SAM{
    int size, last;
    struct Node{
        int len, link;
        int next[26];
    } nodes[MAXN];
    void init(){
        // State 1 is t_0; index 0 serves as the "null" state (no suffix link).
        nodes[1].len = 0, nodes[1].link = 0;
        size = 2; last = 1;
    }
    void insert(char ch){
        // Assumes ch is a lowercase letter 'a'..'z'.
        int cur = size++, p; nodes[cur].len = nodes[last].len + 1; sz[cur] = 1;
        // Walk suffix links from last, adding ch-transitions to cur.
        for (p = last; p && !nodes[p].next[ch - 'a']; p = nodes[p].link)
            nodes[p].next[ch - 'a'] = cur;
        if (!p)                                     // fell off the root
            nodes[cur].link = 1;
        else{
            int q = nodes[p].next[ch - 'a'];
            if (nodes[p].len + 1 == nodes[q].len)   // no split needed
                nodes[cur].link = q;
            else{                                   // split q with a clone
                int clone = size++;                 // sz[clone] stays 0
                nodes[clone].len = nodes[p].len + 1;
                memcpy(nodes[clone].next, nodes[q].next, sizeof(nodes[q].next));
                nodes[clone].link = nodes[q].link;
                // Redirect ch-transitions that pointed to q so they point to clone.
                for ( ; p && nodes[p].next[ch - 'a'] == q; p = nodes[p].link)
                    nodes[p].next[ch - 'a'] = clone;
                nodes[q].link = nodes[cur].link = clone;
            }
        }
        last = cur;
    }
    void build(char *buf, int len = 0){
        if (!len) len = strlen(buf);
        for (int i = 0; i < len; ++i)
            insert(buf[i]);
    }
} sam;

struct Edge{
    int to, next;
} edges[6000005];

int head[3000005], edge_num;

inline void addEdge(int u, int v){
    edges[++edge_num] = (Edge){v, head[u]};
    head[u] = edge_num;
}

inline void buildParentTree(){
    for (int i = 2; i < sam.size; ++i)
        addEdge(sam.nodes[i].link, i);
}

long long ans = 0;

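// Post-order DFS on the suffix link tree: afterwards sz[u] = |endpos(u)|,
// the number of occurrences of every substring in state u (the longest has length len(u)).
// The recursion depth can reach the number of states, so a large stack is assumed.
void DFS(int u){
    for (int c_e = head[u]; c_e; c_e = edges[c_e].next){
        int v = edges[c_e].to;
        DFS(v); sz[u] += sz[v];
    }
    // For substrings occurring more than once, take occurrences * longest length.
    if (sz[u] > 1)
        ans = max(ans, 1ll * sz[u] * sam.nodes[u].len);
}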

char ch[1000005];

int main(){
    sam.init(); scanf("%s", ch);
    sam.build(ch); buildParentTree();
    DFS(1);
    printf("%lld", ans);
    return 0;
}

References

①: OI-wiki > String > Suffix Automaton


Source: www.cnblogs.com/linzhengmin/p/11361325.html