A detailed, illustrated explanation of the KMP algorithm


First, what is the KMP algorithm?

The KMP algorithm solves the problem of locating a pattern string inside a string (also called the main string). Put simply, it is the keyword search we use every day. The pattern string is the keyword (called P below). If it appears in the main string (called T below), return the position where it occurs; otherwise return -1 (the usual convention).

Ways to solve this kind of problem

  • Brute-force method, time complexity O(N*M) (N is the length of the main string, M is the length of the pattern string): the pattern string tries to match at each position of the main string until a match succeeds.
    For example:
    When the two strings are matched, P starts matching from the first character of the main string T. The red position in the figure is the first position where the characters do not match; at this point the index of the mismatched character is i=j=3.
    [Figure: P aligned with T at index 0, mismatch at i=j=3]
    The next attempt matches P from index 0 against T from index 1, and this process then repeats.
    [Figure: P shifted one position to the right]
    This method is too brute-force, so we introduce a faster method: the KMP algorithm.
  • KMP algorithm, time complexity O(N) for the search (N is the length of the main string), plus O(M) to preprocess the pattern string.
    Follow normal intuition. For the match in the figure above, when the first mismatch occurs, it is natural to line up the a at the start of P with the character at index 3 of T, which is also an a. See the figure.
    [Figure: the a at index 0 of P lined up with index 3 of T]
    That is, the index i in T stays unchanged and the index j in P becomes 0. In other words, KMP takes advantage of the valid information in the partial match: the pointer i never moves back, and only the pointer j is modified, so that the pattern string moves as far as possible to a valid position.
    Let us look at another set of data.
    [Figure: T and P mismatching at index 3]
    When T and P mismatch at index 3, as described above, i does not move and P moves to a valid position. The next comparison starts at j=2.
    [Figure: P moved so that j=2 lines up with i=3]
    In fact, this is like a sliding process. The new position of j is determined by the length of the longest common prefix-suffix of the part of P before the mismatch between P and T.
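
For comparison with what follows, the brute-force method described above can be sketched roughly like this. It is only a minimal sketch, not code from the original post, and the function name brute_force_find is a hypothetical name chosen for illustration.

```cpp
#include <string>

// Brute-force search: try every start position in T and compare P character by character.
// Returns the index of the first occurrence of P in T, or -1 if there is none.
// Worst-case running time is O(N*M).
int brute_force_find(const std::string& T, const std::string& P) {
    int n = static_cast<int>(T.size());
    int m = static_cast<int>(P.size());
    for (int start = 0; start + m <= n; ++start) {
        int j = 0;
        while (j < m && T[start + j] == P[j]) ++j;  // extend the match as far as it goes
        if (j == m) return start;                   // the whole pattern matched
        // On a mismatch both pointers fall back: T restarts at start+1, P at 0.
    }
    return -1;
}
```

The fallback in the last comment is exactly the work KMP avoids: there, the pointer into T never moves backwards.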

First, let me explain prefixes and suffixes, taking the string in the figure as an example.
[Figure: example string]
A prefix is any contiguous substring that contains the first character but not the last character, as shown in the figure below.
[Figure: prefixes of the example string]
A suffix is any contiguous substring that contains the last character but not the first character.
[Figure: suffixes of the example string]
Here we want to build a next array for the pattern string, where next[i] is the length of the longest common prefix-suffix of the first i characters (indices 0 to i-1). By convention, next[0]=-1 and next[1]=0:

  • For next[0], there are no characters before index 0, so the value is meaningless; -1 is used as a marker.
  • For next[1], taking the P string above as an example, next[1] is the length of the longest common prefix-suffix of the single character a that precedes b. With only one character, a prefix may not contain the last character and a suffix may not contain the first character, so no valid prefix or suffix exists and the value is 0.

For next[6], the longest common prefix-suffix of the preceding abcabc is abc, so next[6]=3.
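
To make the definition concrete, here is a small sketch that computes the longest common prefix-suffix length directly from the definition by trying every candidate length. It is deliberately naive and is not how KMP computes the next array; the function name longest_common_prefix_suffix is a hypothetical name used only for this illustration.

```cpp
#include <string>

// Length of the longest string that is both a prefix and a suffix of s,
// where the prefix must exclude the last character and the suffix must
// exclude the first character.
int longest_common_prefix_suffix(const std::string& s) {
    int n = static_cast<int>(s.size());
    for (int len = n - 1; len > 0; --len) {
        // Compare the first len characters with the last len characters.
        if (s.compare(0, len, s, n - len, len) == 0) return len;
    }
    return 0;  // no common prefix-suffix (also the case for a single character)
}
```

Under the convention used here, next[i] for i>=1 is this function applied to the first i characters of P, so for abcabc the result is 3, which agrees with next[6]=3.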

How to compute the next array
Start with next[2]; as mentioned earlier, positions 0 and 1 hold fixed values. The method for computing next[i] (i>=2) is as follows.

  • First, introduce a value k. Initially k is next[i-1], the length of the longest common prefix-suffix of the first i-1 characters.
  • To compute next[i], compare the character at index i-1 with the character at index k.
    • If they are the same, next[i]=k+1 (and k is incremented).
    • Otherwise set k=next[k] and compare again. If k is already 0 and the characters still differ, next[i]=0.

For example:
[Figures: the P string and its next array after each step]
Initial values: k=0, next[0]=-1, next[1]=0.
When i=2: p[1]!=p[k] (k=0), that is b!=a. Since k==0, there is nothing left to fall back to, so next[2]=0.
When i=3: p[2]!=p[k] (k=0). Since k==0, next[3]=0.
When i=4: p[3]==p[k] (k=0), so next[4]=++k, that is next[4]=1.
When i=5: p[4]==p[k] (k=1), so next[5]=++k, that is next[5]=2.
In the same way, next[6]=3.

But after running through it, I realized that this example never uses the step k=next[k]. So let me modify the P string, changing the c at index 5 to a.
[Figure: the modified P string]
When i=6, k=2: p[5]!=p[2], so k=next[k], that is k=0.
Now p[5]==p[0], so next[6]=++k=1.

You can split P into two parts to see how the next array arises.
[Figure: P split into two parts]
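
The two walkthroughs above can be checked with a small sketch like the following. It is a sketch under assumptions, not the listing from this post: build_next is a hypothetical name, it returns a std::vector instead of filling a global array, and the last character of the example patterns (d) is my assumption, since the text only pins down the first six characters.

```cpp
#include <string>
#include <vector>

// Builds the next array with the convention used in this article:
// next[0] = -1, and for i >= 1 next[i] is the length of the longest
// common prefix-suffix of the first i characters of p.
std::vector<int> build_next(const std::string& p) {
    int len = static_cast<int>(p.size());
    std::vector<int> next(len, 0);
    if (len == 0) return next;
    next[0] = -1;
    int i = 2, k = 0;  // next[1] is already 0
    while (i < len) {
        if (p[i - 1] == p[k]) next[i++] = ++k;  // characters agree: extend by one
        else if (k > 0) k = next[k];            // fall back to a shorter candidate
        else next[i++] = 0;                     // nothing left to fall back to
    }
    return next;
}
```

With the assumed patterns, build_next("abcabcd") gives -1 0 0 0 1 2 3, and build_next("abcabad") gives -1 0 0 0 1 2 1, where the final 1 comes from the k=next[k] fallback.
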
Next comes the KMP matching itself

  • Input two strings s1 and s2 (s1 is the main string, s2 is the pattern string).
  • Compute the next array of s2.
  • Define two pointers i1 and i2 for the current positions in s1 and s2 respectively.
  • When s1[i1]==s2[i2], do i1++ and i2++.
  • Otherwise, if next[i2]==-1 (that is, i2 is 0), even the first character of the pattern string does not match, so do i1++ and continue with the next character of the main string.
  • Otherwise set i2=next[i2].
  • Finally, check whether i2 equals the length of s2: if so, return i1-i2, otherwise return -1.

For example, suppose the next array of the P string has already been computed.
[Figure: T and P mismatching at i1=i2=6]
When i1=i2=6, the two characters differ. At this point i1 does not move and i2=next[6]=3, as shown in the figure below.
[Figure: P moved so that i2=3 lines up with i1=6]
From here the match succeeds, and the function returns i1-i2=3.
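
The steps and the example above can be put together in a self-contained sketch. This is an illustration under assumptions rather than the exact listing that follows: kmp_find is a hypothetical name, it takes the two strings as parameters instead of using globals, and it treats an empty pattern as matching at index 0.

```cpp
#include <string>
#include <vector>

// Returns the index of the first occurrence of p in t, or -1 if p does not occur.
int kmp_find(const std::string& t, const std::string& p) {
    int n = static_cast<int>(t.size());
    int m = static_cast<int>(p.size());
    if (m == 0) return 0;  // assumption: an empty pattern matches at index 0

    // Phase 1: next array, same convention as in the article (next[0] = -1).
    std::vector<int> next(m, 0);
    next[0] = -1;
    for (int i = 2, k = 0; i < m;) {
        if (p[i - 1] == p[k]) next[i++] = ++k;
        else if (k > 0) k = next[k];
        else next[i++] = 0;
    }

    // Phase 2: scan t; i1 never moves backwards.
    int i1 = 0, i2 = 0;
    while (i1 < n && i2 < m) {
        if (t[i1] == p[i2]) { ++i1; ++i2; }  // equal: advance both pointers
        else if (next[i2] == -1) ++i1;       // mismatch at the first pattern character
        else i2 = next[i2];                  // move the pattern, keep i1
    }
    return i2 == m ? i1 - i2 : -1;
}
```

For instance, with the assumed strings t = "abcabcabcd" and p = "abcabcd", the mismatch happens at i1=i2=6, i2 falls back to 3, and the call returns 3.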

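Below is the code (C++)

Computing the next array

#include <string>
using namespace std;

string s1, s2;      // s1: main string, s2: pattern string
int Next[1000005];  // the array size is an assumption; adjust it to the maximum pattern length

void get_next()
{
    Next[0]=-1;
    Next[1]=0;

    int i=2,k=0;// i is the position being filled, starting from the third character; k is the position that character i-1 is compared with
    int len=s2.size();
    while(i<len)
    {
        if(s2[i-1]==s2[k])// characters i-1 and k are equal: record the value, then advance both i and k
            Next[i++]=++k;
        else if(k>0)// no match: move k back to Next[k]
            k=Next[k];
        else
            Next[i++]=0;// k is already 0, so Next[i] can only be 0
    }
}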

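KMP matching

int kmp()
{
    int i1=0,i2=0;
    int len1=s1.size();
    int len2=s2.size();
    get_next();// build the Next array
    while(i1<len1&&i2<len2)// neither i1 nor i2 has reached the end
    {
        if(s1[i1]==s2[i2])// equal: advance both pointers together
        {
            i1++;
            i2++;
        }
        else if(Next[i2]==-1)// even the first pattern character does not match: advance in the main string
            i1++;
        else
            i2=Next[i2];// mismatch: move i2 back
    }
    return i2==len2?i1-i2:-1;// i2 reaching the end means a match was found; otherwise return -1
}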

Template problem: HDU-1711
Pitfall: the input consists of numbers, not characters, so use integer arrays to store it.
AC code

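#include<iostream>
#include<cstdio>
using namespace std;

int Next[200005];
int s1[1000005];
int s2[1000005];
int a,b;// a: length of the main sequence, b: length of the pattern sequence
void get_next()
{
    Next[0]=-1;
    Next[1]=0;

    int i=2,k=0;// i is the position being filled, starting from the third element; k is the position that element i-1 is compared with
    int len=b;
    while(i<len)
    {
        if(s2[i-1]==s2[k])// elements i-1 and k are equal: record the value, then advance both i and k
            Next[i++]=++k;
        else if(k>0)// no match: move k back to Next[k]
            k=Next[k];
        else
            Next[i++]=0;// k is already 0, so Next[i] can only be 0
    }
}

int kmp()
{
    int i1=0,i2=0;
    get_next();
    int len1=a;
    int len2=b;

    while(i2<len2&&i1<len1)// neither i1 nor i2 has reached the end
    {
        if(s1[i1]==s2[i2])// equal: advance both pointers together
        {
            i1++;
            i2++;
        }
        else if(Next[i2]==-1)// even the first pattern element does not match: advance in the main sequence
            i1++;
        else
            i2=Next[i2];// mismatch: move i2 back
    }
    return i2==len2?i1-i2:-1;// i2 reaching the end means a match was found; otherwise return -1
}

int main(void)
{
    int t;
    cin>>t;

    while(t--)
    {
        // The input-reading step is missing from this listing. The lines below
        // assume the HDU-1711 input format: a and b, then a numbers, then b numbers.
        scanf("%d%d",&a,&b);
        for(int i=0;i<a;i++)
            scanf("%d",&s1[i]);
        for(int i=0;i<b;i++)
            scanf("%d",&s2[i]);

        int ans=kmp();
        if(ans!=-1)
            cout<<ans+1<<endl;// positions are printed 1-based
        else
            cout<<ans<<endl;
    }
    return 0;
}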

Luogu P3375
Pitfall: the next array as computed above only covers the first n-1 characters, so it does not contain the longest common prefix-suffix value for the whole pattern string. I therefore append a useless character to the end of the pattern string, and when printing, output the next array starting from index 1.
AC code

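#include<iostream>
#include<cstdio>
#include<string>
using namespace std;

int Next[2000005];// named Next because a global called next is ambiguous with std::next
string s1,s2;
void get_next()
{
    Next[0]=-1;
    Next[1]=0;

    int i=2,k=0;// i is the position being filled, starting from the third character; k is the position that character i-1 is compared with
    int len=s2.size();
    while(i<len)
    {
        if(s2[i-1]==s2[k])// characters i-1 and k are equal: record the value, then advance both i and k
            Next[i++]=++k;
        else if(k>0)// no match: move k back to Next[k]
            k=Next[k];
        else
            Next[i++]=0;// k is already 0, so Next[i] can only be 0
    }
}

void kmp()// prints the 1-based position of every match
{
    int i1=0,i2=0;
    get_next();
    s2=s2.substr(0,s2.size()-1);// remove the extra character again
    int len1=s1.size();
    int len2=s2.size();

    while(i1<len1)// scan the whole main string
    {
        if(s1[i1]==s2[i2])// equal: advance both pointers together
        {
            i1++;
            i2++;
        }
        else if(Next[i2]==-1)// even the first pattern character does not match: advance in the main string
            i1++;
        else
            i2=Next[i2];// mismatch: move i2 back
        if(i2==len2)
        {
            printf("%d\n",i1-i2+1);
            i2=Next[i2];// match again from the longest common prefix-suffix; i1 stays where it is (decrementing it here loops forever on inputs such as aaa / aa)
        }
    }
}

int main(void)
{
    cin>>s1>>s2;
    s2+="$";// extra character so that Next[len] is also computed
    kmp();

    int len=s2.size();
    for(int i=1;i<=len;i++)
        printf("%d ",Next[i]);
    return 0;
}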
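
As an alternative to appending an extra character, the next array can be given one extra slot so that the value for the whole pattern string is computed directly. The following is only a sketch of that idea under my own assumptions, not the submitted code: build_next_full and find_all are hypothetical names, and results are returned in vectors instead of being printed.

```cpp
#include <string>
#include <vector>

// next has len + 1 entries: next[0] = -1, and next[i] (1 <= i <= len) is the
// longest common prefix-suffix length of the first i characters of p.
std::vector<int> build_next_full(const std::string& p) {
    int len = static_cast<int>(p.size());
    std::vector<int> next(len + 1, 0);
    next[0] = -1;
    int i = 2, k = 0;
    while (i <= len) {  // note <=: this also fills next[len]
        if (p[i - 1] == p[k]) next[i++] = ++k;
        else if (k > 0) k = next[k];
        else next[i++] = 0;
    }
    return next;
}

// Returns the 1-based start position of every occurrence of p in t.
std::vector<int> find_all(const std::string& t, const std::string& p) {
    std::vector<int> positions;
    int n = static_cast<int>(t.size());
    int m = static_cast<int>(p.size());
    if (m == 0) return positions;  // assumption: an empty pattern reports no matches
    std::vector<int> next = build_next_full(p);
    int i1 = 0, i2 = 0;
    while (i1 < n) {
        if (t[i1] == p[i2]) { ++i1; ++i2; }
        else if (next[i2] == -1) ++i1;
        else i2 = next[i2];
        if (i2 == m) {
            positions.push_back(i1 - i2 + 1);
            i2 = next[i2];  // continue from the longest common prefix-suffix; i1 is unchanged
        }
    }
    return positions;
}
```

For t = "ABABABC" and p = "ABA" this yields positions 1 and 3, and entries 1 to 3 of the next array are 0 0 1.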

It took me the whole afternoon to write this, but it is finally done. If you have read this far, please give it a thumbs up. Thank you!

Origin blog.csdn.net/Yang_1998/article/details/89764554