I. Introduction
The KMP algorithm is a string-matching algorithm: given a text string and a pattern string, it finds how many times the pattern occurs in the text and where each occurrence starts.
For example, in the text string abcabcccabc, the pattern string abc occurs 3 times, at starting positions 0, 3, and 8.
II. The idea
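The brute-force approach is two nested for loops, which is O(n*m) and clearly too slow, whereas KMP is O(n + m). Here n is the length of the text string and m is the length of the pattern string.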
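For comparison, the brute-force approach can be sketched roughly as below. The name naive_search and the use of std::string and std::vector are assumptions made for illustration; they are not part of the original code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): brute-force matching.
// Tries every start position in the text and compares character by character.
std::vector<int> naive_search(const std::string& text, const std::string& pattern) {
    assert(!pattern.empty());  // this sketch assumes a non-empty pattern
    std::vector<int> positions;
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());
    for (int start = 0; start + m <= n; ++start) {      // outer loop: each start position
        int j = 0;
        while (j < m && text[start + j] == pattern[j])  // inner loop: up to m comparisons
            ++j;
        if (j == m)
            positions.push_back(start);                 // the whole pattern matched here
    }
    return positions;
}
```

Calling naive_search("abcabcccabc", "abc") should give the positions 0, 3, 8 from the introduction. After every failed attempt both the text position and the pattern position start over, which is exactly the work KMP avoids.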
Honestly, KMP is a little convoluted, but it really comes down to one key point: for each prefix of the pattern string, find the longest proper prefix that is also a suffix, and store its length in the next array.
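To make that definition concrete, here is a small sketch that fills the next array directly from the definition, by trying every candidate length. It is deliberately slow and is not the method used in the code section; the name next_by_definition is an assumption for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds the next array straight from its definition.
// next[i] = length of the longest proper prefix of pattern[0..i]
//           that is also a suffix of pattern[0..i].
std::vector<int> next_by_definition(const std::string& pattern) {
    const int m = static_cast<int>(pattern.size());
    std::vector<int> next(m, 0);
    for (int i = 0; i < m; ++i) {
        // Try candidate lengths from longest to shortest; len <= i keeps the prefix proper.
        for (int len = i; len > 0; --len) {
            // prefix: pattern[0..len-1], suffix: pattern[i-len+1..i]
            if (pattern.compare(0, len, pattern, i - len + 1, len) == 0) {
                next[i] = len;
                break;
            }
        }
        assert(next[i] <= i);  // a proper prefix is shorter than the string itself
    }
    return next;
}
```

For the pattern abcabc this should produce 0, 0, 0, 1, 2, 3, which is the last column of the table that follows.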
How should we understand this? Let's take an example.
For the pattern string abcabc, we build the next array from scratch:
String | Prefixes | Suffixes | Longest common prefix/suffix | next array |
---|---|---|---|---|
"a" | [] | [] | none | next[0] = 0 |
"ab" | [a] | [b] | none | next[1] = 0 |
"abc" | [a,ab] | [c,bc] | none | next[2] = 0 |
"abca" | [a,ab,abc] | [a,ca,bca] | a | next[3] = 1 |
"abcab" | [a,ab,abc,abca] | [b,ab,cab,bcab] | ab | next[4] = 2 |
"abcabc" | [a,ab,abc,abca,abcab] | [c,bc,abc,cabc,bcabc] | abc | next[5] = 3 |
Now take the text string abcabb and the pattern string abcabc. Matching proceeds up to index 5, where the text is abcabB and the pattern is abcabC (the character being compared is shown in uppercase): the characters at this position differ.
With the brute-force algorithm, the text position must jump back to index 1, i.e. the first b, the pattern must jump back to its initial position, and matching starts over.
With the KMP algorithm, the pattern position becomes next[current position - 1] = next[4] = 2, i.e. the pattern jumps to index 2 (abCabc) while the text stays at index 5 (abcabB). See it? The pattern prefix ab (before C) and the text suffix ab (before B) are the same string ab, so the pattern does not need to return to the start and the comparison can simply continue: the text has effectively matched abcab, the pattern has matched ab, and we compare the next characters. In this example c and b differ again, so the pattern falls back once more to next[1] = 0, but the text position never moves backward.
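The walkthrough above can be replayed with a small tracing sketch that records every comparison as a (text index, pattern index) pair. The name trace_comparisons is an assumption for illustration; the fallback rule is the same next[count - 1] jump described above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: runs KMP matching and records each character comparison
// as a pair (index in text, index in pattern).
std::vector<std::pair<int, int>> trace_comparisons(const std::string& text,
                                                   const std::string& pattern) {
    assert(!pattern.empty());  // this sketch assumes a non-empty pattern
    const int m = static_cast<int>(pattern.size());

    // Build the next array.
    std::vector<int> next(m, 0);
    for (int i = 1, len = 0; i < m; ++i) {
        while (len > 0 && pattern[len] != pattern[i]) len = next[len - 1];
        if (pattern[len] == pattern[i]) ++len;
        next[i] = len;
    }

    std::vector<std::pair<int, int>> trace;
    int count = 0;  // number of pattern characters currently matched
    for (int i = 0; i < static_cast<int>(text.size()); ++i) {
        while (true) {
            trace.emplace_back(i, count);  // compare text[i] with pattern[count]
            if (text[i] == pattern[count]) { ++count; break; }
            if (count == 0) break;         // nothing left to fall back to
            count = next[count - 1];       // pattern jumps back, text index i stays put
        }
        if (count == m) count = next[count - 1];  // full match: keep scanning
    }
    return trace;
}
```

For the text abcabb and the pattern abcabc the trace should be (0,0), (1,1), (2,2), (3,3), (4,4), (5,5), (5,2), (5,0): after the mismatch at index 5 only the pattern index changes, first to 2 and then to 0.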
III. The code
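#include <cstring>
#include <queue>
using namespace std;

const int maxn = 1000005; // assumed upper bound on the pattern length; adjust as needed
int nextt[maxn];

// Build the nextt array for pattern (assumed non-empty): nextt[i] is the length of
// the longest proper prefix of pattern[0..i] that is also a suffix of it.
void get_nextt(char pattern[]) {
    nextt[0] = 0;
    int max_length = 0; // length of the current longest prefix that is also a suffix
    for (int i = 1; pattern[i]; i++) {
        // On a mismatch, fall back to the next shorter candidate.
        while (max_length > 0 && pattern[max_length] != pattern[i])
            max_length = nextt[max_length - 1];
        if (pattern[i] == pattern[max_length])
            max_length++;
        nextt[i] = max_length;
    }
}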
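// Find every occurrence of pattern in text and return the starting
// positions in a queue.
queue<int> search(char text[], char pattern[]) {
    queue<int> q;
    int pattern_length = strlen(pattern);
    if (pattern_length == 0) // guard: an empty pattern would index nextt[-1] below
        return q;
    get_nextt(pattern);
    int count = 0; // number of pattern characters matched so far
    for (int i = 0; text[i]; i++) {
        while (count > 0 && pattern[count] != text[i])
            count = nextt[count - 1];
        if (pattern[count] == text[i])
            count++;
        if (count == pattern_length) {
            q.push(i - pattern_length + 1); // starting index of this match
            count = nextt[count - 1];       // fall back so overlapping matches are found
        }
    }
    return q;
}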
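As an optional variation, the same logic can be written with std::string and std::vector, which avoids the fixed-size global array. This is only a sketch: the name kmp_search and the choice to return no matches for an empty pattern are assumptions, not part of the code above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical variant of search(): same matching logic, but the next array
// is a local std::vector sized to the pattern.
std::vector<int> kmp_search(const std::string& text, const std::string& pattern) {
    std::vector<int> positions;
    const int m = static_cast<int>(pattern.size());
    if (m == 0) return positions;  // assumption: an empty pattern yields no matches

    // Build the next array.
    std::vector<int> next(m, 0);
    for (int i = 1, len = 0; i < m; ++i) {
        while (len > 0 && pattern[len] != pattern[i]) len = next[len - 1];
        if (pattern[len] == pattern[i]) ++len;
        next[i] = len;
    }

    int count = 0;  // number of pattern characters matched so far
    for (int i = 0; i < static_cast<int>(text.size()); ++i) {
        assert(count < m);  // invariant: count always indexes a valid pattern position
        while (count > 0 && pattern[count] != text[i]) count = next[count - 1];
        if (pattern[count] == text[i]) ++count;
        if (count == m) {
            positions.push_back(i - m + 1);  // starting index of this match
            count = next[count - 1];         // fall back so overlapping matches are found
        }
    }
    return positions;
}
```

With the example from the introduction, kmp_search("abcabcccabc", "abc") should return 0, 3, 8. Because of the fallback after a full match, overlapping occurrences are reported too: kmp_search("aaaa", "aa") should return 0, 1, 2.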