2018 Summer: A Brief Look at the KMP Algorithm and Its Use in Programming Contests

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/u011469138/article/details/82292339

A Brief Look at the KMP Algorithm


Starting from Brute-Force (BF) Naive String Matching


KMP is a string-matching algorithm. A typical use case looks like this:

Given two strings A and B, is B a substring of A? If so, how many times does B occur in A?
A:aaabbcabbccc
B:abbc
With the naive matching method, the idea is as follows.

BF naive string matching

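1. Find the first position flag in A where 'a' occurs.
2. Check whether the character at position flag+1 of A is 'b'.
3. If step 2 holds, check the next character, and so on. As soon as a character does not match, restart the whole procedure from the position after flag.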

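#include<cstdio>
#include<cstring>
int main()
{
    char a[]="aaabbcabbccc";
    char b[]="abbc";
    int la=strlen(a),len=strlen(b);
    int ans=0;
    for(int i=0;i+len<=la;i++)//try every start position, including the last one
    {
        for(int j=0;j<len;j++)
        {
            if(a[i+j]!=b[j])//stop at the first mismatched character
                break;
            if(j==len-1)//every character matched: count one occurrence
                ans++;
        }
    }
    printf("%d\n",ans);//prints 2 for these strings
    return 0;
}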

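This is clearly slow. After any mismatch, matching restarts only one position further along, so for a text of length n and a pattern of length m, BF makes O(nm) character comparisons in the worst case.
What we need is a more efficient algorithm that decides how far the pattern can safely move after each failed match.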
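To put a number on "slow", here is a small sketch that counts the character comparisons BF makes. The function name bfComparisons is hypothetical and does not come from the original. In the worst case (for example, a text made entirely of 'a' and a pattern of the form "aa...ab"), every offset compares all m characters before it fails, for a total of (n-m+1)·m comparisons.

```cpp
#include <string>

// Hypothetical helper: counts the character comparisons made by the BF
// matcher described above. For each start offset it compares left to right
// and stops at the first mismatch.
long long bfComparisons(const std::string& text, const std::string& pat)
{
    long long cmp = 0;
    int n = text.size(), m = pat.size();
    for (int i = 0; i + m <= n; i++)        // every candidate start offset
        for (int j = 0; j < m; j++) {
            cmp++;                           // one comparison: text[i+j] vs pat[j]
            if (text[i + j] != pat[j]) break;
        }
    return cmp;
}
```

For example, bfComparisons(std::string(10, 'a'), "aaab") returns 28: there are 7 offsets and each makes 4 comparisons.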
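For example:
Text: AABBCBBBBACABCBBCD
Pattern: BCBBC
By eye, the only match starts at offset 12 (0-based). What does the actual matching process look like?
Offsets 0 to 2 fail within the first two characters. At offset 3 the first four characters, BCBB, match, and the fifth fails (text B against pattern C):
AABBCBBBBACABCBBCD
###BCBBC
Under the BF algorithm, the pattern now moves right by one position:
AABBCBBBBACABCBBCD
####BCBBC
This fails immediately (text C against pattern B). BF then tries offsets 5, 6, ..., 11 one at a time before it reaches the match:
AABBCBBBBACABCBBCD
############BCBBC
Some of these attempts can be ruled out from characters that have already been read. After the failure at offset 3 we know that positions 3 to 6 of the text are BCBB. Offsets 4 and 5 therefore cannot match, and the pattern can move straight to offset 6.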
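This is where KMP comes in. On a mismatch, KMP uses the characters already matched to move the pattern straight to the next offset that could still produce a match. Every offset it skips is guaranteed to fail.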
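To illustrate the gain, the sketch below counts the comparisons KMP makes on the same kind of input. It uses the next array built by the templates given later. kmpComparisons is a hypothetical name, and the pattern is assumed to be non-empty. Each comparison either extends the current match or shortens it, and a match can only be extended once per text character, so the total stays below about 2n for a text of length n.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: counts character comparisons made by KMP while scanning
// text for pat (pat assumed non-empty). Each text[i] vs pat[j] check counts once.
long long kmpComparisons(const std::string& text, const std::string& pat)
{
    int n = text.size(), m = pat.size();
    std::vector<int> f(m + 1, 0);                 // next array: f[i] for the first i chars
    for (int i = 1, j = 0; i < m; i++) {
        while (j && pat[i] != pat[j]) j = f[j];
        if (pat[i] == pat[j]) j++;
        f[i + 1] = j;
    }
    long long cmp = 0;
    for (int i = 0, j = 0; i < n; i++) {
        while (true) {
            cmp++;                                 // compare text[i] with pat[j]
            if (text[i] == pat[j]) { j++; break; }
            if (j == 0) break;                     // nothing left to fall back to
            j = f[j];                              // slide the pattern, keep the text position
        }
        if (j == m) j = f[j];                      // full match: continue, allowing overlaps
    }
    return cmp;
}
```

On a text of 1000 'a' characters with the pattern "aaaaaaaaab", BF makes 991 × 10 = 9910 comparisons. This function counts 1991.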


The KMP algorithm


The next array

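When the pattern mismatches at some position, the next array tells us which pattern position matching continues from. Equivalently, it tells us how far the pattern shifts.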

Computing the next array

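The next array depends only on the pattern.
For the pattern BCBBC, we call B, BC, BCB and BCBB its prefixes, and C, BC, BBC and CBBC its suffixes. The whole string is excluded from both lists.
next[i] is the length of the longest prefix that is also a suffix of the part of the pattern before position i, that is, of its first i characters.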
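For example, for BCBBC:
B: next value 0
BC: next value 0
BCB: next value 1, because B == B
BCBB: next value 1, because B == B
BCBBC: next value 2, because BC == BC
Write each value under the character it belongs to, with next[0] = 0 for the empty prefix. The next array of BCBBC is then:
BCBBC
00011 2 (the final 2 is next[5]; it no longer lines up with a character, but it is still worth computing)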
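To check hand-computed values like these, the sketch below computes next straight from the definition by trying every border length. It is deliberately slow, O(m^3), and nextByDefinition is a hypothetical helper name. For BCBBC it returns 0 0 0 1 1 2, which matches the table.

```cpp
#include <string>
#include <vector>

// Hypothetical checker: next[i] = length of the longest proper prefix of
// pat[0..i-1] that is also a suffix of it, found by brute force.
std::vector<int> nextByDefinition(const std::string& pat)
{
    int m = pat.size();
    std::vector<int> nxt(m + 1, 0);               // nxt[0] = 0 for the empty prefix
    for (int i = 1; i <= m; i++)                  // prefix of length i
        for (int k = i - 1; k > 0; k--)           // try the longest candidate first
            if (pat.compare(0, k, pat, i - k, k) == 0) {  // prefix of length k == suffix of length k
                nxt[i] = k;
                break;
            }
    return nxt;
}
```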

Now, when the pattern mismatches at position i, the same text character is compared next with position next[i] of the pattern.
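Go back to the mismatch in the example above:
AABBCBBBBACABCBBCD
###BCBBC
The mismatch happens at pattern position i = 4 (0-based, the fifth character). Since next[4] = 1, matching continues from pattern position 1. Equivalently, the pattern shifts right by i - next[i] = 3 positions:

AABBCBBBBACABCBBCD
######BCBBC
Compared with the previous alignment, the pattern has moved right by 3. Matching resumes at pattern position 1, because the leading B is already known to match.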
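The shift rule can be tabulated for every mismatch position. In the sketch below, shiftTable is a hypothetical name, and next is built with the same recurrence as the template below. The special case i == 0 is our own assumption: nothing has matched there, so i - next[i] would be 0, and the pattern simply moves one position, as in BF.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: for each pattern position i, how far the pattern slides
// right when the text mismatches pat[i].
std::vector<int> shiftTable(const std::string& pat)
{
    int m = pat.size();
    std::vector<int> nxt(m + 1, 0);            // nxt[i]: next value for the first i characters
    for (int i = 1, j = 0; i < m; i++) {       // same recurrence as init()
        while (j && pat[i] != pat[j]) j = nxt[j];
        if (pat[i] == pat[j]) j++;
        nxt[i + 1] = j;
    }
    std::vector<int> shift(m);
    for (int i = 0; i < m; i++)
        shift[i] = (i == 0) ? 1 : i - nxt[i];  // nothing matched at i == 0: slide by one
    return shift;
}
```

For BCBBC this gives 1 1 2 2 3. The last entry is the shift of 3 from the example.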

Here is a template for computing the next array:

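#include<cstring>
char t[10005];//the pattern
int f[10005];//f[i]: the next value for the first i characters of t
void init()
{
    int lb=strlen(t);
    int j=0;//length of the current prefix that is also a suffix
    f[0]=f[1]=0;
    for(int i=1;i<lb;i++)
    {
        while(j&&t[i]!=t[j]) j=f[j];//fall back to a shorter candidate
        if(t[i]==t[j]) j++;//prefix == suffix: extend it by one
        f[i+1]=j;
    }
}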

The main KMP routine

The main routine uses the next array: at each mismatch, the pattern moves according to next.

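char s[1000005];//the text
int ans=0;//number of occurrences found
void kmp()
{
    int j=0;//number of pattern characters currently matched
    int la=strlen(s);
    int lb=strlen(t);
    for(int i=0;i<la;i++)
    {
        while(j&&s[i]!=t[j]) j=f[j];//mismatch: fall back along next until it matches or j==0
        if(s[i]==t[j])
            j++;
        if(j==lb)
        {
            ans++;//the whole pattern matched: count it
            j=f[j];//keep matching; overlapping occurrences are counted too
        }
    }
}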
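Putting init() and kmp() together, the sketch below is self-contained and takes std::string arguments instead of the global arrays. kmpCount is a hypothetical name, and treating an empty pattern as "no match" is our assumption. Run on the strings from the start of the article, A = aaabbcabbccc and B = abbc, it returns 2.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: counts (possibly overlapping) occurrences of pat in text
// using the init()/kmp() logic above.
int kmpCount(const std::string& text, const std::string& pat)
{
    int n = text.size(), m = pat.size();
    if (m == 0) return 0;                      // assumption: an empty pattern counts as no match
    std::vector<int> f(m + 1, 0);              // f[i]: next value for the first i characters
    for (int i = 1, j = 0; i < m; i++) {       // same recurrence as init()
        while (j && pat[i] != pat[j]) j = f[j];
        if (pat[i] == pat[j]) j++;
        f[i + 1] = j;
    }
    int ans = 0;
    for (int i = 0, j = 0; i < n; i++) {       // same loop as kmp()
        while (j && text[i] != pat[j]) j = f[j];
        if (text[i] == pat[j]) j++;
        if (j == m) { ans++; j = f[j]; }       // full match; keep going to catch overlaps
    }
    return ans;
}
```

Because of the j=f[j] step after a full match, overlapping occurrences are counted: kmpCount("aaaa", "aa") returns 3.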

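An important conclusion:

Let len be the length of s. If len%(len-next[len])==0, then len-next[len] is the length of the shortest repeating unit of s, and len/(len-next[len]) is the largest number of times it repeats. When next[len]==0, the unit is s itself, repeated once. If the division leaves a remainder, s is not a repetition of any shorter string.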
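The conclusion is easy to cross-check against brute force. In the sketch below, both function names are hypothetical and s is assumed to be non-empty. periodByKmp applies the formula above, and periodBrute tries every divisor of len as a unit length.

```cpp
#include <string>
#include <vector>

// Largest n with s == a^n, via the next-array formula (s assumed non-empty).
int periodByKmp(const std::string& s)
{
    int len = s.size();
    std::vector<int> f(len + 1, 0);
    for (int i = 1, j = 0; i < len; i++) {
        while (j && s[i] != s[j]) j = f[j];
        if (s[i] == s[j]) j++;
        f[i + 1] = j;
    }
    int p = len - f[len];                  // candidate shortest repeating unit
    return len % p == 0 ? len / p : 1;
}

// Same quantity by brute force: the smallest unit length p dividing len
// such that s[i] == s[i-p] for every i >= p.
int periodBrute(const std::string& s)
{
    int len = s.size();
    for (int p = 1; p <= len; p++) {
        if (len % p) continue;
        bool ok = true;
        for (int i = p; i < len && ok; i++)
            ok = (s[i] == s[i - p]);
        if (ok) return len / p;
    }
    return 1;
}
```

Both return 1, 4 and 3 for the samples abcd, aaaa and ababab of the POJ 2406 problem below.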

That concludes the explanation of KMP. Below are several classic problems.


https://vjudge.net/problem/HDU-1711

HDU-1711 Number Sequence

Problem Description
Given two sequences of numbers : a[1], a[2], …… , a[N], and b[1], b[2], …… , b[M] (1 <= M <= 10000, 1 <= N <= 1000000). Your task is to find a number K which make a[K] = b[1], a[K + 1] = b[2], …… , a[K + M - 1] = b[M]. If there are more than one K exist, output the smallest one.

Input
The first line of input is a number T which indicate the number of cases. Each case contains three lines. The first line is two numbers N and M (1 <= M <= 10000, 1 <= N <= 1000000). The second line contains N integers which indicate a[1], a[2], …… , a[N]. The third line contains M integers which indicate b[1], b[2], …… , b[M]. All integers are in the range of [-1000000, 1000000].

Output
For each test case, you should output one line which only contain K described above. If no such K exists, output -1 instead.

Sample Input
2
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 1 3
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 2 1

Sample Output
6
-1
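The task is to decide whether b occurs as a contiguous block in a and, if it does, to output the 1-based position of its first occurrence.
A plain KMP problem; the only difference is that the characters are integers.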

#include<cstdio>
int a[1000005];//the text sequence
int b[10005];//the pattern sequence
int f[10005];//f[i]: next value for the first i elements of b
int n,m;
void init()
{
    int j=0;
    f[0]=f[1]=0;
    for(int i=1;i<m;i++)
    {
        while(j&&b[i]!=b[j]) j=f[j];
        if(b[i]==b[j]) j++;
        f[i+1]=j;
    }
}
int kmp()//returns the 1-based position of the first match, or -1
{
    int j=0;
    for(int i=0;i<n;i++)
    {
        while(j&&a[i]!=b[j]) j=f[j];
        if(a[i]==b[j])
            j++;
        if(j==m)//the match ends at 0-based index i
            return i-m+2;//so it starts at i-m+1 (0-based), i.e. i-m+2 (1-based)
    }
    return -1;
}
int main()
{
    int k;
    scanf("%d",&k);
    while(k--)
    {
        scanf("%d%d",&n,&m);
        for(int i=0;i<n;i++) scanf("%d",&a[i]);
        for(int i=0;i<m;i++) scanf("%d",&b[i]);
        init();
        printf("%d\n",kmp());
    }
    return 0;
}

https://vjudge.net/problem/POJ-2406

poj2406 Power Strings

Given two strings a and b we define a*b to be their concatenation. For example, if a = “abc” and b = “def” then a*b = “abcdef”. If we think of concatenation as multiplication, exponentiation by a non-negative integer is defined in the normal way: a^0 = “” (the empty string) and a^(n+1) = a*(a^n).
Input

Each test case is a line of input representing s, a string of printable characters. The length of s will be at least 1 and will not exceed 1 million characters. A line containing a period follows the last test case.
Output

For each s you should print the largest n such that s = a^n for some string a.
Sample Input

abcd
aaaa
ababab
.
Sample Output

1
4
3
This is a textbook period problem: use the conclusion above to find how many times the shortest repeating unit occurs.

#include<cstdio>
#include<cstring>
char s[1000005];
int f[1000005];//f[i]: next value for the first i characters of s
int len;
void init()
{
    len=strlen(s);
    int j=0;
    f[0]=f[1]=0;
    for(int i=1;i<len;i++)
    {
        while(j&&s[i]!=s[j]) j=f[j];
        if(s[i]==s[j]) j++;
        f[i+1]=j;
    }
}
int main()
{
    while(scanf("%s",s)==1)
    {
        if(strcmp(s,".")==0) break;//a line containing a single period ends the input
        init();
        int p=len-f[len];//length of the shortest candidate repeating unit
        if(len%p==0)
            printf("%d\n",len/p);
        else
            printf("1\n");//not a power of any shorter string
    }
    return 0;
}

https://vjudge.net/problem/HDU-3746

HDU-3746 Cyclic Nacklace

Problem Description
CC always becomes very depressed at the end of this month, he has checked his credit card yesterday, without any surprise, there are only 99.9 yuan left. he is too distressed and thinking about how to tide over the last days. Being inspired by the entrepreneurial spirit of “HDU CakeMan”, he wants to sell some little things to make money. Of course, this is not an easy task.

As Christmas is around the corner, Boys are busy in choosing christmas presents to send to their girlfriends. It is believed that chain bracelet is a good choice. However, Things are not always so simple, as is known to everyone, girl’s fond of the colorful decoration to make bracelet appears vivid and lively, meanwhile they want to display their mature side as college students. after CC understands the girls demands, he intends to sell the chain bracelet called CharmBracelet. The CharmBracelet is made up with colorful pearls to show girls’ lively, and the most important thing is that it must be connected by a cyclic chain which means the color of pearls are cyclic connected from the left to right. And the cyclic count must be more than one. If you connect the leftmost pearl and the rightmost pearl of such chain, you can make a CharmBracelet. Just like the pictrue below, this CharmBracelet’s cycle is 9 and its cyclic count is 2:

Now CC has brought in some ordinary bracelet chains, he wants to buy minimum number of pearls to make CharmBracelets so that he can save more money. but when remaking the bracelet, he can only add color pearls to the left end and right end of the chain, that is to say, adding to the middle is forbidden.
CC is satisfied with his ideas and ask you for help.

Input
The first line of the input is a single integer T ( 0 < T <= 100 ) which means the number of test cases.
Each test case contains only one line describe the original ordinary chain to be remade. Each character in the string stands for one pearl and there are 26 kinds of pearls being described by ‘a’ ~’z’ characters. The length of the string Len: ( 3 <= Len <= 100000 ).

Output
For each case, you are required to output the minimum count of pearls added to make a CharmBracelet.

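This problem asks for the minimum number of pearls (characters) to add at the left or right end of the string so that it becomes cyclic with at least two full repetitions. The approach is fairly brute-force. Rather than enumerating which letters to add, it enumerates how many are added and, after each addition, checks whether len%(len-next[len])==0. The code assumes each added pearl continues the current period, so len and next[len] both grow by one and len-next[len] stays the same. It takes a little insight.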
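Since p = len-next[len] stays fixed while the loop runs, the loop stops after exactly p - len%p steps. The sketch below uses this closed form. It is our simplification of the enumeration, not the author's code, and pearlsToAdd is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Hypothetical closed form of the enumeration above, with p = len - next[len]:
//  - next[len] != 0 and len % p == 0: already at least two full cycles, add 0;
//  - otherwise add p - len % p pearls (this equals len when next[len] == 0).
int pearlsToAdd(const std::string& s)
{
    int len = s.size();
    std::vector<int> f(len + 1, 0);       // f[i]: next value for the first i characters
    for (int i = 1, j = 0; i < len; i++) {
        while (j && s[i] != s[j]) j = f[j];
        if (s[i] == s[j]) j++;
        f[i + 1] = j;
    }
    int p = len - f[len];
    if (f[len] != 0 && len % p == 0) return 0;
    return p - len % p;
}
```

For example, abca has next[len] = 1 and p = 3, so 3 - 4%3 = 2 pearls are needed (appending bc gives abcabc).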

#include<cstdio>
#include<cstring>
char s[100005];
int f[100005];//f[i]: next value for the first i characters of s
int len;
void init()
{
    len=strlen(s);
    int j=0;
    f[0]=f[1]=0;
    for(int i=1;i<len;i++)
    {
        while(j&&s[i]!=s[j]) j=f[j];
        if(s[i]==s[j]) j++;
        f[i+1]=j;
    }
}
int main()
{
    int t;
    scanf("%d",&t);
    while(t--)
    {
        scanf("%s",s);
        init();//also sets len
        if(f[len]!=0&&len%(len-f[len])==0)//already at least two full cycles
            printf("0\n");
        else
        {
            int x=f[len];
            if(x==0)//no prefix equals a suffix: the whole string must be repeated once more
                printf("%d\n",len);
            else
            {
                int ans=0;
                //append pearls that continue the current period; each one increases
                //both len and the next value x by one, until len is a multiple of len-x
                while(len%(len-x)!=0)
                {
                    x++;len++;
                    ans++;
                }
                printf("%d\n",ans);
            }
        }
    }
    return 0;
}


Reposted from blog.csdn.net/u011469138/article/details/82292339