The KMP algorithm explained, with a template


I. Introduction

The KMP algorithm searches a text string for a pattern string and reports how many times the pattern occurs and at which positions.
For example, in the text string abcabcccabc, the pattern string abc occurs 3 times, starting at positions 0, 3, and 8.

II. The idea

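The brute-force approach is two nested for loops, O(n*m), where n is the length of the text and m is the length of the pattern. That is clearly not elegant enough, whereas KMP is O(n+m).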
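For comparison, here is a minimal sketch of the brute-force approach with its two nested loops. The function name brute_force_search and the use of std::string and std::vector are choices made for this illustration, not part of the original template.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Brute-force search (illustrative sketch): try every start position in the
// text and compare the pattern character by character. Worst case O(n*m).
std::vector<int> brute_force_search(const std::string& text,
                                    const std::string& pattern) {
    assert(!pattern.empty());  // assumption of this sketch: non-empty pattern
    std::vector<int> positions;
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());
    for (int i = 0; i + m <= n; i++) {   // outer loop: candidate start position
        int j = 0;
        while (j < m && text[i + j] == pattern[j])  // inner loop: compare
            j++;
        if (j == m)                      // the whole pattern matched at i
            positions.push_back(i);
    }
    return positions;
}
```

Called on the text abcabcccabc with the pattern abc, this returns the positions 0, 3, 8 from the introduction. After every mismatch the inner loop restarts from the beginning of the pattern, which is the repeated work KMP avoids.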

To be honest, KMP is a little convoluted, but the key point is simple: for each prefix of the pattern string, find the longest proper prefix that is also a suffix, and store its length in the next array.

How should this be understood? Take an example.

For the pattern string abcabc, we build the next array from scratch.

String      Prefixes                  Suffixes                  Longest common prefix/suffix   next array
"a"         []                        []                        none                           next[0] = 0
"ab"        [a]                       [b]                       none                           next[1] = 0
"abc"       [a,ab]                    [c,bc]                    none                           next[2] = 0
"abca"      [a,ab,abc]                [a,ca,bca]                a                              next[3] = 1
"abcab"     [a,ab,abc,abca]           [b,ab,cab,bcab]           ab                             next[4] = 2
"abcabc"    [a,ab,abc,abca,abcab]     [c,bc,abc,cabc,bcabc]     abc                            next[5] = 3
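The table can be reproduced directly from the definition. The sketch below is deliberately naive: for each prefix of the pattern it tries every proper length, longest first, and keeps the first one whose prefix equals the suffix. The name naive_next is made up for this illustration, and this is not the efficient construction used by KMP itself.

```cpp
#include <cassert>  // only needed for the sanity check mentioned below
#include <string>
#include <vector>

// Naive next-array construction, straight from the definition (illustrative).
// next[i] = length of the longest proper prefix of pattern[0..i] that is
// also a suffix of pattern[0..i].
std::vector<int> naive_next(const std::string& pattern) {
    const int m = static_cast<int>(pattern.size());
    std::vector<int> next(m, 0);
    for (int i = 0; i < m; i++) {
        // Proper prefixes of pattern[0..i] have length 1..i; try longest first.
        for (int len = i; len > 0; len--) {
            // Compare the prefix of length len with the suffix of length len.
            if (pattern.compare(0, len, pattern, i + 1 - len, len) == 0) {
                next[i] = len;
                break;
            }
        }
    }
    return next;
}
```

For the pattern abcabc this yields 0, 0, 0, 1, 2, 3, the same values as the last column of the table, so a check such as `assert(naive_next("abcabc") == std::vector<int>({0, 0, 0, 1, 2, 3}));` should pass.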

Take the text string abcabb. When matching reaches index 5, the text is at abcab[b] and the pattern string is at abcab[c], so the characters at this position differ.

With the brute-force algorithm, the text must jump back to index 1 (the first b), and the pattern string must jump back to its starting position and begin matching again.

With the KMP algorithm, the pattern string looks up next[current position - 1 = 4] = 2 and jumps to index 2, so the pattern is at ab[c] while the text stays at abcab[b]. See it? The prefix ab of the pattern string is the same as the suffix ab of the text matched so far, so the pattern string does not need to return to its start. The comparison simply continues: the ab at the end of the matched text already lines up with the ab at the start of the pattern, and the next character is compared.
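The fallback step can be traced in isolation. The sketch below records every pattern index visited while following the next array after a mismatch. The function name fallback_trace and its parameters are hypothetical, and the next array is passed in by the caller rather than computed here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative trace of the KMP fallback: `matched` characters of the pattern
// already match, and the next text character is `c`. Returns the pattern
// indices visited while falling back through the next array.
std::vector<int> fallback_trace(const std::string& pattern,
                                const std::vector<int>& next,
                                int matched, char c) {
    assert(next.size() == pattern.size());
    assert(matched >= 0 && matched < static_cast<int>(pattern.size()));
    std::vector<int> visited;
    visited.push_back(matched);          // index where the comparison starts
    while (matched > 0 && pattern[matched] != c) {
        matched = next[matched - 1];     // reuse the longest prefix == suffix
        visited.push_back(matched);
    }
    return visited;
}
```

With the pattern abcabc, the next array 0, 0, 0, 1, 2, 3, five characters matched, and the text character b, the trace is 5, 2, 0: the pattern first falls back to index 2 as described above, and because c still differs from b it falls back once more to index 0. The text index never moves backwards during this process.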

III. The code

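#include <cstring>
#include <queue>
using namespace std;

const int maxn = 1000005; // assumed upper bound on the pattern length; adjust to the problem
int nextt[maxn];
void get_nextt(char pattern[]){// build the nextt array for the pattern string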
    nextt[0] = 0;
    int max_length = 0;
    for(int i = 1;pattern[i];i++){
        while(max_length > 0 && pattern[max_length] != pattern[i])
            max_length = nextt[max_length-1];
        if(pattern[i] == pattern[max_length])
            max_length++;
        nextt[i] = max_length;
    }
}
queue<int> search(char text[],char pattern[]){// find every occurrence of pattern in text and store each starting position in the queue
    queue<int> q;
    int pattern_length = strlen(pattern);
    if(pattern_length == 0) return q; // guard: an empty pattern would read nextt[-1] below
    get_nextt(pattern);
    int count = 0;
    for(int i = 0;text[i];i++){
        while(count > 0 && pattern[count] != text[i])
            count = nextt[count-1];
        if(pattern[count] == text[i])
            count++;
        if(count == pattern_length){
            q.push(i-pattern_length+1 );
            count = nextt[count-1];
        }
    }
    return q;
}
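As a usage-oriented sketch, the same two steps can also be written with std::string and std::vector, which avoids the global array and the maxn bound. The names build_next and kmp_search are made up for this variant, and it is one possible adaptation of the template above, not a replacement for it.

```cpp
#include <cassert>  // only needed for the sanity check mentioned below
#include <string>
#include <vector>

// Variant of get_nextt that returns a local vector (hypothetical helper).
std::vector<int> build_next(const std::string& pattern) {
    std::vector<int> next(pattern.size(), 0);
    int max_length = 0;  // length of the current longest prefix == suffix
    for (std::size_t i = 1; i < pattern.size(); i++) {
        while (max_length > 0 && pattern[max_length] != pattern[i])
            max_length = next[max_length - 1];
        if (pattern[i] == pattern[max_length])
            max_length++;
        next[i] = max_length;
    }
    return next;
}

// Variant of search that returns the starting positions of all occurrences.
std::vector<int> kmp_search(const std::string& text, const std::string& pattern) {
    std::vector<int> positions;
    const int m = static_cast<int>(pattern.size());
    if (m == 0) return positions;        // assumption: empty pattern matches nothing
    const std::vector<int> next = build_next(pattern);
    int count = 0;                       // pattern characters matched so far
    for (int i = 0; i < static_cast<int>(text.size()); i++) {
        while (count > 0 && pattern[count] != text[i])
            count = next[count - 1];     // fall back without moving i
        if (pattern[count] == text[i])
            count++;
        if (count == m) {                // full match ending at index i
            positions.push_back(i - m + 1);
            count = next[count - 1];     // keep going to allow overlapping matches
        }
    }
    return positions;
}
```

On the introduction's example, a check such as `assert(kmp_search("abcabcccabc", "abc") == std::vector<int>({0, 3, 8}));` should pass. Because count falls back to next[count - 1] after a full match, overlapping occurrences are reported as well, for example positions 0, 1, 2 for the pattern aa in the text aaaa.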

Origin www.cnblogs.com/MMMMMMMW/p/11607518.html