On the KMP algorithm

While reviewing data structures and algorithms recently, I reached the chapter on KMP and found myself confused again. I remember the first time I studied this algorithm: the teacher lectured with great passion at the front of the classroom while we sat below, completely lost. What is this algorithm? Reading the book afterwards did not help either. After class I watched some videos and read some blog posts, and slowly understood a little more. Learning is not easy and takes steady, down-to-earth work. Three years later, studying the algorithm again, I find my understanding is still not clear or thorough enough, so I have picked up the books and videos again to study it properly.

1. Introduction to the KMP algorithm

The KMP algorithm is the result of work by three researchers: D. E. Knuth, J. H. Morris and V. R. Pratt. It is a clever string-matching algorithm that avoids re-scanning characters that have already been compared. Its full name is the Knuth-Morris-Pratt algorithm, KMP for short. D. E. Knuth is also the author of "The Art of Computer Programming", a multi-volume work that has been called the "theory of relativity" of the computer field.

2. Computing the next array of the pattern

The key to the KMP algorithm is the next[] array. This array depends only on the pattern string, not on the text being searched. For each index, the value is the length of the longest proper prefix of the pattern up to that index that is also a suffix of it. As an example, let us compute the next array for the pattern "abababca".

Indices start at 0.

index = 0, "a" has no proper prefix or suffix (both sets are empty), value = 0;

index = 1, "ab" has the prefix "a" and the suffix "b", which are not equal, value = 0;

index = 2, "aba" has the prefixes "a", "ab" and the suffixes "ba", "a"; the longest common element is "a", of length 1, value = 1;

index = 3, "abab" has the prefixes "a", "ab", "aba" and the suffixes "bab", "ab", "b"; the longest common element is "ab", of length 2, value = 2;

index = 4, "ababa" has the prefixes "a", "ab", "aba", "abab" and the suffixes "baba", "aba", "ba", "a"; the longest common element is "aba", of length 3, value = 3;

index = 5, "ababab" has the prefixes "a", "ab", "aba", "abab", "ababa" and the suffixes "babab", "abab", "bab", "ab", "b"; the longest common element is "abab", of length 4, value = 4;

index = 6, "abababc" has the prefixes "a", "ab", "aba", "abab", "ababa", "ababab" and the suffixes "bababc", "ababc", "babc", "abc", "bc", "c"; there is no common element, value = 0;

index = 7, "abababca" has the prefixes "a", "ab", "aba", "abab", "ababa", "ababab", "abababc" and the suffixes "bababca", "ababca", "babca", "abca", "bca", "ca", "a"; the longest common element is "a", of length 1, value = 1;

Final results are as follows:

char: | a | b | a | b | a | b | c | a |

index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |

value: | 0 | 0 | 1 | 2 | 3 | 4 | 0 | 1 |
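The table can also be built in linear time, without listing every prefix and suffix. The following is a minimal sketch in C, assuming 0-based indices as in the table above; the function name build_table and its parameters are made up for this illustration and are not part of the program in section 4.

```c
#include <stddef.h>

/* Fill table[0..n-1] so that table[i] is the length of the longest proper
   prefix of pattern[0..i] that is also a suffix of it (0-based values).
   build_table is a hypothetical name used only in this sketch. */
void build_table(const char *pattern, size_t n, int *table)
{
    size_t i;
    int len = 0; /* length of the prefix currently being extended */

    if (n == 0)
        return;
    table[0] = 0; /* a single character has no proper prefix or suffix */
    for (i = 1; i < n; i++) {
        /* On a mismatch, fall back to the next shorter candidate prefix. */
        while (len > 0 && pattern[i] != pattern[len])
            len = table[len - 1];
        if (pattern[i] == pattern[len])
            len++;
        table[i] = len;
    }
}
```

Calling build_table("abababca", 8, table) fills table with 0 0 1 2 3 4 0 1, the same values as the hand computation.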

3. How to use the next[] array

Once the next array of the pattern has been computed, it is used while matching against the target string to avoid re-comparing characters that have already been matched. If a partial match of length partial_match_length (at least 1) is found and the following character does not match, the pattern can be moved forward by partial_match_length - next[partial_match_length - 1] characters. In summary:

shift = number of characters already matched - next value of the last matched character

char: | a | b | a | b | a | b | c | a |

index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |

value: | 0 | 0 | 1 | 2 | 3 | 4 | 0 | 1 |
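As a small illustration of the shift formula, the helper below computes how far the pattern moves for a given partial match length. This is only a sketch: the name shift_after_mismatch is hypothetical, and the table argument is assumed to hold the 0-based values shown above.

```c
/* How far to slide the pattern after a mismatch, given that `matched`
   characters matched before it. `table` holds the 0-based next values.
   shift_after_mismatch is a hypothetical helper, not part of the
   program in section 4. */
int shift_after_mismatch(const int *table, int matched)
{
    if (matched == 0)
        return 1; /* nothing matched: just move one position to the right */
    return matched - table[matched - 1];
}
```

With the table above, a partial match of length 5 gives a shift of 5 - 3 = 2, and a partial match of length 1 gives a shift of 1 - 0 = 1.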

Take matching the pattern against the text "bacbababaabcbab" as an example. The first partial match occurs at index = 1 of the text, as follows:

bacbababaabcbab
 |
 abababca

It is easy to see that the length of the matched part is partial_match_length = 1, and next[partial_match_length - 1] = next[0] = 0, so the shift is 1 - 0 = 1 and no extra characters are skipped. The next two text characters, "c" and "b", do not match the first character of the pattern, so the pattern keeps moving right one position at a time until the next partial match:

bacbababaabcbab
    |||||
    abababca

Here the partial match length is 5, so partial_match_length = 5 and next[partial_match_length - 1] = next[4]. Looking it up in the next array gives next[4] = 3, which means the pattern can move forward by partial_match_length - next[partial_match_length - 1] = 5 - next[4] = 5 - 3 = 2 characters. After skipping two characters, the alignment becomes:

bacbababaabcbab
    xx|||
      abababca

Here xx marks the skipped characters. The matched part now has length 3, so partial_match_length = 3 and next[partial_match_length - 1] = next[2] = 1. The pattern moves forward by partial_match_length - next[partial_match_length - 1] = 3 - 1 = 2 characters, which gives:

bacbababaabcbab
      xx|
        abababca

The partial match length is 1, so partial_match_length = 1 and next[partial_match_length - 1] = next[0] = 0. At this point the pattern is already longer than what remains of the text, so there is nothing left to compare and we know the pattern does not occur in the text.
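The walkthrough above can be condensed into a single left-to-right scan of the text. The following is a minimal sketch in C under some assumptions of my own: indices are 0-based, the function name kmp_search is hypothetical, and the function returns -1 when the pattern does not occur. It is a separate illustration, not the program in section 4.

```c
#include <stddef.h>
#include <stdlib.h>

/* Return the 0-based index of the first occurrence of pattern in text,
   or -1 if there is none (or if memory for the table cannot be allocated).
   kmp_search is a hypothetical name used only in this sketch. */
long kmp_search(const char *text, size_t n, const char *pattern, size_t m)
{
    size_t i, matched = 0;
    size_t *table;
    long result = -1;

    if (m == 0)
        return 0; /* an empty pattern matches at the start */
    table = malloc(m * sizeof *table);
    if (table == NULL)
        return -1;

    /* Build the 0-based next values for the pattern. */
    table[0] = 0;
    for (i = 1; i < m; i++) {
        size_t len = table[i - 1];
        while (len > 0 && pattern[i] != pattern[len])
            len = table[len - 1];
        if (pattern[i] == pattern[len])
            len++;
        table[i] = len;
    }

    /* Scan the text once; `matched` is the current partial match length. */
    for (i = 0; i < n; i++) {
        /* Falling back to table[matched - 1] is the same as shifting the
           pattern by matched - table[matched - 1] positions. */
        while (matched > 0 && text[i] != pattern[matched])
            matched = table[matched - 1];
        if (text[i] == pattern[matched])
            matched++;
        if (matched == m) {
            result = (long)(i + 1 - m);
            break;
        }
    }
    free(table);
    return result;
}
```

For the example above, kmp_search("bacbababaabcbab", 15, "abababca", 8) returns -1, which agrees with the conclusion of the walkthrough. The text index i never moves backwards, which is where the saving over naive matching comes from.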

4. Implementing the KMP algorithm in C

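#include <stdio.h>
#include <string.h>

/* The length of each string is stored in a char at index 0,
   so inputs are limited to 127 characters. */
#define MAXLEN 127

/* Build the next array for pattern T.
   T is 1-based: T[0] holds the length, T[1..T[0]] hold the characters.
   next[j] is the pattern position to compare next after a mismatch at T[j].
   These values are 1-based, so they are not the same numbers as the
   0-based table in section 2. */
void get_next(char T[], int next[])
{
    int i = 0; /* end of the current prefix */
    int j = 1; /* end of the current suffix */
    next[1] = 0;
    while (j < T[0]) {
        if (i == 0 || T[i] == T[j]) {
            i++;
            j++;
            next[j] = i;
            /* Optional refinement, left disabled:
               if (T[i] != T[j]) next[j] = i; else next[j] = next[i]; */
        } else {
            i = next[i]; /* fall back to a shorter prefix */
        }
    }
}

/* Return the 1-based position of the first occurrence of T in S,
   or 0 if T does not occur in S. */
int Index_KMP(char S[], char T[])
{
    int next[MAXLEN + 2];
    int i = 1; /* position in S */
    int j = 1; /* position in T */
    get_next(T, next); /* compute the next array */
    while (i <= S[0] && j <= T[0]) {
        if (j == 0 || S[i] == T[j]) {
            i++;
            j++;
        } else {
            j = next[j]; /* move the pattern; i never goes backwards */
        }
    }
    if (j > T[0])
        return i - T[0];
    return 0;
}

int main(void)
{
    char T[MAXLEN + 2], S[MAXLEN + 2];
    int i, k;
    /* Read the text S and the pattern T, at most MAXLEN characters each. */
    while (scanf("%127s %127s", S, T) == 2) {
        k = (int)strlen(T);
        for (i = k; i > 0; i--) /* shift right by one to make room for the length */
            T[i] = T[i - 1];
        T[0] = (char)k;
        k = (int)strlen(S);
        for (i = k; i > 0; i--) /* shift right by one to make room for the length */
            S[i] = S[i - 1];
        S[0] = (char)k;
        printf("%d\n", Index_KMP(S, T));
    }
    return 0;
}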

For my test input the program printed 4, which is the position (counting from 1) where the first occurrence of the pattern starts.

5. Personal summary

After working through the KMP algorithm again, some of its steps are still not entirely clear to me, and there are places I have not fully thought through; perhaps that is the gap I still have to close. I also ran into a few bugs in the code today, looked things up on several sites to fix them, and brushed up on C along the way. It was a very full day.


References:

http://jakeboxer.com/blog/2009/12/13/the-knuth-morris-pratt-algorithm-in-my-own-words/

http://www.ruanyifeng.com/blog/2013/05/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.html

https://liam.page/2016/12/20/KMP-Algorithm/

https://blog.dotcpp.com/a/8986
