HDU - 2222 Keywords Search (basic AC automaton)

Link: HDU-2222

I had been planning to learn the AC automaton for a long time, and today I finally got the most basic version working.

Since this is the most basic AC automaton, I'll start by summarizing the basic ideas (qvq)

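The most basic problem the AC automaton solves is multi-pattern string matching. KMP optimizes one-to-one matching: one pattern against one text. For N patterns against one text, could we simply run KMP N times? Not efficiently: every run scans the whole text again, so the total cost grows with N times the text length, which is too slow when there are many patterns.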
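
To make that cost concrete, here is a minimal sketch of the "one search per pattern" baseline. It assumes plain std::string inputs; countPatternsNaive is a hypothetical name, and std::string::find stands in for a hand-written KMP (find is not guaranteed to be KMP, but either way the text is rescanned once per pattern).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Baseline: search the whole text once for every pattern.
// Work grows roughly with (number of patterns) x (text length).
int countPatternsNaive(const std::vector<std::string>& pats, const std::string& text) {
    int cnt = 0;
    for (const std::string& p : pats)
        if (text.find(p) != std::string::npos)  // p occurs at least once
            cnt++;
    return cnt;
}
```

With the keywords she, he, say, shr, her and the text yasherhs, this returns 3 (she, he and her occur), but it needed five separate passes over the text to get there.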

The full name of the AC automaton is the Aho-Corasick automaton, and it exists precisely to give an optimized solution to this kind of problem.

People online describe the AC automaton as KMP on top of a trie (a hash table can apparently be used to store the children as well). The trie part is straightforward: the first step in building an AC automaton is to insert all the existing pattern strings into a trie. The KMP part mainly comes down to understanding the fail array.
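
A minimal sketch of that first step, assuming lowercase keywords as in this problem. The Trie struct and its find helper are hypothetical names, and std::vector replaces the fixed-size global arrays used in the full program below; node 0 is the root, and a child index of 0 means "no child".

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

struct Trie {
    std::vector<std::array<int, 26>> child;  // child[v][c]: node reached from v on letter c
    std::vector<int> val;                    // how many patterns end at each node
    Trie() : child(1), val(1, 0) {}          // start with just the root (node 0)

    void insert(const std::string& s) {
        int cur = 0;
        for (char ch : s) {
            assert(ch >= 'a' && ch <= 'z');  // lowercase input assumed
            int id = ch - 'a';
            if (child[cur][id] == 0) {       // prefix not seen before: create a node
                child[cur][id] = (int)child.size();
                child.push_back({});         // new node with no children
                val.push_back(0);
            }
            cur = child[cur][id];
        }
        val[cur]++;                          // one more pattern ends here
    }

    // Node spelled by s, or -1 if s is not a prefix of any inserted pattern.
    int find(const std::string& s) const {
        int cur = 0;
        for (char ch : s) {
            cur = child[cur][ch - 'a'];
            if (cur == 0) return -1;
        }
        return cur;
    }
};
```

Inserting she, he, say, shr, her creates 9 nodes besides the root, because say, she and shr share the node for s, and she and shr also share sh.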

Recall that in KMP, next[i] stores the length of the longest proper prefix of the first i characters that is also a suffix of them. Because this is stored, on a mismatch we can skip the part already known to match and continue directly from the mismatch position instead of starting over.
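
A sketch of the next array and the skip it enables. The indexing here is an assumption: nxt[i] describes the prefix ending at index i (length i + 1), which is one of several common conventions and is shifted by one from the wording above. prefixFunction and kmpFind are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// nxt[i] = length of the longest proper prefix of p[0..i] that is also its suffix.
std::vector<int> prefixFunction(const std::string& p) {
    std::vector<int> nxt(p.size(), 0);
    for (size_t i = 1; i < p.size(); i++) {
        int k = nxt[i - 1];                            // try to extend the previous border
        while (k > 0 && p[i] != p[k]) k = nxt[k - 1];  // fall back to a shorter border
        if (p[i] == p[k]) k++;
        nxt[i] = k;
    }
    return nxt;
}

// First index where p occurs in t, or -1. On a mismatch k falls back to
// nxt[k - 1] instead of restarting, so the text index never moves backwards.
int kmpFind(const std::string& t, const std::string& p) {
    if (p.empty()) return 0;
    std::vector<int> nxt = prefixFunction(p);
    int k = 0;                                         // pattern characters matched so far
    for (size_t i = 0; i < t.size(); i++) {
        while (k > 0 && t[i] != p[k]) k = nxt[k - 1];
        if (t[i] == p[k]) k++;
        if (k == (int)p.size()) return (int)i - k + 1;
    }
    return -1;
}
```

For example, prefixFunction("ababaca") gives 0 0 1 2 3 0 1, and searching abababaca for ababaca falls back from 5 matched characters to 3 at the first mismatch, then finds the match at index 2.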

The AC automaton follows the same idea: for each node, the fail array stores the node for the longest proper suffix of that node's string that is also a path in the trie. For example, take the pattern strings she and her and the text string sher. The h of she points to the h of her, and the e of she points to the e of her (the node for he).

So while traversing the text sher, once she has been matched (and counted), we jump to the e of her and keep matching. The last character r matches as well, so both pattern strings, she and her, are found in the text.
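
The she/her example can be checked against the definition by brute force. This is only an illustration, not how the automaton computes fail pointers: the hypothetical helpers isTrieNode, failTarget and step compare strings directly instead of walking a trie, so they are far slower than the BFS in build() below.

```cpp
#include <cassert>
#include <string>
#include <vector>

// True if s is a prefix of some pattern, i.e. s spells a node of the trie.
// The empty string is the root, which always exists.
bool isTrieNode(const std::string& s, const std::vector<std::string>& pats) {
    if (s.empty()) return true;
    for (const std::string& p : pats)
        if (p.compare(0, s.size(), s) == 0) return true;
    return false;
}

// Fail target by definition: the longest proper suffix of s that is a trie node.
std::string failTarget(const std::string& s, const std::vector<std::string>& pats) {
    for (size_t start = 1; start < s.size(); start++) {  // longest suffix first
        std::string suf = s.substr(start);
        if (isTrieNode(suf, pats)) return suf;
    }
    return "";  // fall back to the root
}

// State after reading c: the longest suffix of state + c that is a trie node.
// Following fail pointers until an edge for c exists computes the same thing.
std::string step(const std::string& state, char c, const std::vector<std::string>& pats) {
    std::string t = state + c;
    for (size_t start = 0; start < t.size(); start++) {
        std::string suf = t.substr(start);
        if (isTrieNode(suf, pats)) return suf;
    }
    return "";  // no suffix survives: back to the root
}
```

With the patterns she and her, failTarget("sh") is "h", failTarget("she") is "he", and feeding s, h, e, r through step ends in the state "her", which is exactly the jump described above.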

Since we must not miss any pattern, at every position we follow the fail pointers as far as they go, so that every pattern ending there is found. While building the trie, val records how many patterns end at each node, so the query only has to add up these values. However, the code below sets val to -1 after using it (to mark it as already counted), and this causes a problem: the automaton cannot be reused and can only be queried once. That is why there are various later improvements, such as making it reusable (persistent).
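
One way to lift the run-once limitation, sketched under the assumption of lowercase input: keep a per-query seen array instead of overwriting val with -1. AhoCorasick and its members are hypothetical names; the BFS follows the same scheme as build() in the code below.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

struct AhoCorasick {
    std::vector<std::array<int, 26>> child;  // trie edges, later also goto edges
    std::vector<int> fail, val;              // fail pointers, patterns ending per node
    AhoCorasick() : child(1), fail(1, 0), val(1, 0) {}

    void insert(const std::string& s) {
        int cur = 0;
        for (char ch : s) {
            assert(ch >= 'a' && ch <= 'z');  // lowercase input assumed
            int id = ch - 'a';
            if (child[cur][id] == 0) {
                child[cur][id] = (int)child.size();
                child.push_back({});
                fail.push_back(0);
                val.push_back(0);
            }
            cur = child[cur][id];
        }
        val[cur]++;  // duplicate keywords are counted separately
    }

    void build() {  // BFS: parents get their fail pointer before their children
        std::queue<int> q;
        for (int i = 0; i < 26; i++)
            if (child[0][i]) q.push(child[0][i]);  // depth-1 nodes fail to the root
        while (!q.empty()) {
            int k = q.front();
            q.pop();
            for (int i = 0; i < 26; i++) {
                int c = child[k][i];
                if (c) {
                    fail[c] = child[fail[k]][i];
                    q.push(c);
                } else {
                    child[k][i] = child[fail[k]][i];  // missing edge: borrow via fail
                }
            }
        }
    }

    // Keywords (with multiplicity) occurring in t at least once; val is left intact.
    int query(const std::string& t) const {
        std::vector<bool> seen(child.size(), false);
        int res = 0, cur = 0;
        for (char ch : t) {
            cur = child[cur][ch - 'a'];
            for (int j = cur; j && !seen[j]; j = fail[j]) {
                seen[j] = true;  // patterns ending at j are now counted
                res += val[j];
            }
        }
        return res;
    }
};
```

Stopping at the first seen node is still safe: whenever a node is marked, every node on its fail chain has been marked too, so nothing is skipped. The price is one O(number of nodes) allocation per query, in exchange for being able to call query repeatedly after a single build.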

Since this is only the most basic AC automaton, I'll leave it at that...

```cpp
#include <bits/stdc++.h>
using namespace std;

const int N = 1000000 + 5;          // max text length, plus slack
const int MAXNODE = 500000 + 5;     // at most 10000 keywords x 50 characters, plus the root

int tot;                 // number of trie nodes allocated so far (node 0 is the root)
int trie[MAXNODE][26];   // trie; after build() it also holds the filled-in goto edges
int val[MAXNODE];        // end marker: how many pattern strings end at this node
int fail[MAXNODE];       // mismatch (fail) pointer

void insert(char *s) {   // insert a pattern string
    int root = 0;        // trie node matched so far
    for (int i = 0; s[i]; i++) {
        int id = s[i] - 'a';              // child index
        if (trie[root][id] == 0)          // this prefix has not been inserted before
            trie[root][id] = ++tot;       // create a new node
        root = trie[root][id];            // walk down the trie
    }
    val[root]++;
}

void build() {           // compute the fail pointers by BFS over the trie
    queue<int> q;
    for (int i = 0; i < 26; i++)          // children of the root fail to the root
        if (trie[0][i])
            q.push(trie[0][i]);

    while (!q.empty()) {
        int k = q.front();                // fail[k] is known; now set its children's
        q.pop();
        for (int i = 0; i < 26; i++) {    // iterate over the character set
            if (trie[k][i]) {             // a child for character i exists
                fail[trie[k][i]] = trie[fail[k]][i];   // its fail is fail[k]'s child on i
                q.push(trie[k][i]);
            } else
                trie[k][i] = trie[fail[k]][i];         // missing edge: borrow fail[k]'s child
        }
    }
}

int query(char *t) {     // match the text string
    int res = 0;         // answer
    int root = 0;        // current trie node
    for (int i = 0; t[i]; i++) {          // walk through the text
        int id = t[i] - 'a';
        root = trie[root][id];            // move along trie / goto edges
        int j = root;
        while (j && val[j] != -1) {       // follow fail pointers to every pattern ending here
            res += val[j];                // accumulate the answer
            val[j] = -1;                  // mark as counted (never added twice)
            j = fail[j];                  // jump along the fail pointer
        }
    }
    return res;
}

char p[N];               // pattern buffer
char t[N];               // text buffer

int main() {
    int T;
    scanf("%d", &T);
    while (T--) {
        memset(trie, 0, sizeof(trie));
        memset(val, 0, sizeof(val));
        memset(fail, 0, sizeof(fail));
        tot = 0;

        int n;                            // number of pattern strings
        scanf("%d", &n);
        while (n--) {
            scanf("%s", p);               // read a pattern string
            insert(p);                    // add it to the trie
        }
        build();                          // build the fail pointers

        scanf("%s", t);                   // read the text string
        int res = query(t);
        printf("%d\n", res);
    }
    return 0;
}
```

Source: www.cnblogs.com/Tianwell/p/11373509.html