KMP algorithm topics

Reference: https://www.zhihu.com/question/21923021/answer/281346746

Prerequisites for the KMP algorithm: none

KMP is an efficient string matching algorithm. It finds a pattern string P of length m in a main string S of length n, and it reduces the time complexity from O(n * m) to O(n + m).
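For comparison, the O(n * m) baseline that KMP improves on can be sketched as follows. This is only an illustrative sketch: the function name naive_search is an assumption made for this example, and it works on C strings rather than on the integer sequences used later in this article.

```c
#include <string.h>

/* Naive search (hypothetical helper, not part of the reference code).
   Tries every start position in s and compares p character by character,
   so the worst case costs O(n * m) comparisons.
   Returns the 0-based index of the first occurrence of p in s, or -1. */
int naive_search(const char *s, const char *p)
{
    int n = (int)strlen(s);
    int m = (int)strlen(p);
    for (int i = 0; i + m <= n; ++i) {
        int j = 0;
        while (j < m && s[i + j] == p[j])
            ++j;
        if (j == m)
            return i;   /* all m characters matched at start position i */
    }
    return -1;
}
```

After every mismatch this version restarts the comparison from the next start position and throws away what it already learned about the main string, which is exactly the work KMP avoids.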

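The core of the KMP algorithm is an array called the Partial Match Table (hereinafter abbreviated as PMT). For the string "abababca", the PMT is shown in the figure below; its values are 0 0 1 2 3 4 0 1, one per position. The PMT value at a position is the length of the longest element in the intersection of two sets: the set of prefixes and the set of suffixes of the string ending at that position, where neither set contains the whole string itself. For example, "aba" has the prefixes {"a", "ab"} and the suffixes {"ba", "a"}; the intersection is {"a"}, so the PMT value at position 2 is 1.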
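To make the definition of the partial match table concrete, here is a brute-force sketch that computes each value directly from the prefix and suffix comparison. The function name pmt_by_definition is an assumption for this example, and the approach is deliberately slow; it is meant to illustrate the definition, not to be how KMP builds the table.

```c
#include <string.h>

/* Fills pmt[0..m-1] for pattern p straight from the definition
   (hypothetical helper): pmt[i] is the largest k <= i such that the first
   k characters of p[0..i] equal its last k characters. */
void pmt_by_definition(const char *p, int *pmt)
{
    int m = (int)strlen(p);
    for (int i = 0; i < m; ++i) {
        pmt[i] = 0;
        /* try the longest proper prefix first (k = i), then shorter ones */
        for (int k = i; k > 0; --k) {
            /* prefix p[0..k-1] versus suffix p[i-k+1..i] */
            if (memcmp(p, p + i - k + 1, (size_t)k) == 0) {
                pmt[i] = k;
                break;
            }
        }
    }
}
```

For "abababca" this fills the array with 0 0 1 2 3 4 0 1.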

Suppose we search for the pattern string P = "abababca" in the main string S = "ababababca". If the characters fail to match at position j of the pattern, then by the PMT property described above, the PMT[j-1] characters of the main string just before pointer i must equal the first PMT[j-1] characters of the pattern string, that is, positions 0 through PMT[j-1] - 1.

In the example in the figure, the mismatch occurs at i, so the first 6 characters of the main string and of the pattern string are the same. In those first 6 characters of the pattern, the 4-character prefix equals the 4-character suffix, so we can infer that the 4 characters of the main string just before i equal the first 4 characters of the pattern. This is the gray part of the figure, and it does not need to be compared again.
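The reasoning above can be sketched as a search loop that consults the PMT directly on a mismatch. This is a sketch under assumptions: the function name search_with_pmt is made up for this example, the PMT is passed in precomputed, and C strings are used for readability.

```c
#include <string.h>

/* Searches for p in s using a precomputed PMT, where pmt[k] is the PMT
   value of p[0..k] (hypothetical helper).
   Returns the 0-based start of the first match, or -1. */
int search_with_pmt(const char *s, const char *p, const int *pmt)
{
    int n = (int)strlen(s);
    int m = (int)strlen(p);
    int i = 0, j = 0;
    while (i < n && j < m) {
        if (s[i] == p[j]) {
            ++i;
            ++j;
        } else if (j > 0) {
            /* the pmt[j-1] characters before i already equal the first
               pmt[j-1] characters of p, so resume comparing from there */
            j = pmt[j - 1];
        } else {
            ++i;    /* mismatch at the first pattern character */
        }
    }
    return (j == m) ? i - j : -1;
}
```

With S = "ababababca", P = "abababca" and the PMT 0 0 1 2 3 4 0 1, the first mismatch happens at i = 6, j = 6; j falls back to PMT[5] = 4 while i stays put, and the search then succeeds with a match starting at index 2.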

With this idea, we can use the PMT to speed up the search. If a mismatch occurs at position j, the position that the j pointer falls back to is determined by the PMT value at position j-1. So, for programming convenience, we do not use the PMT array directly; instead we shift it one position to the right. We call the resulting array the next array.

For the example above, the next array is shown in the figure; its values are -1 0 0 1 2 3 4 0. When shifting the PMT to the right, we set the value at position 0 to -1. This is purely for programming convenience.
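The shift from the PMT to the next array is small enough to write out. The sketch below assumes the PMT has already been computed; the function name pmt_to_next is hypothetical.

```c
/* Derives next[0..m-1] from pmt[0..m-1] by shifting one place to the
   right (hypothetical helper): next[0] = -1, next[j] = pmt[j-1] for j >= 1. */
void pmt_to_next(const int *pmt, int m, int *next)
{
    if (m <= 0)
        return;
    next[0] = -1;               /* sentinel value for position 0 */
    for (int j = 1; j < m; ++j)
        next[j] = pmt[j - 1];
}
```

After the shift, a mismatch at position j can read the fallback position as next[j] instead of PMT[j-1], and the value -1 marks the case where even the first pattern character failed to match.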

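In fact, computing the next array can itself be viewed as a string matching process: take the pattern string as the main string and the prefixes of the pattern string as the target string. Whenever a match succeeds, the current next value is the length of the matched string. Concretely, we match the pattern against itself starting from position 1 (note: not position 0). At any position, the longest length that can be matched is the next value for that position, as shown in the figure below.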
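The self-matching idea can be sketched on a C string as follows. The function name build_next is an assumption for this example, and the caller is assumed to provide room for strlen(p) + 1 entries, because the loop also writes one value past the last pattern position.

```c
#include <string.h>

/* Builds next[0..m] for pattern p by matching p against its own prefixes
   (hypothetical helper). i scans p as the "main string"; j is the length
   of the prefix matched so far, or -1 when nothing can be matched. */
void build_next(const char *p, int *next)
{
    int m = (int)strlen(p);
    int i = 0, j = -1;
    next[0] = -1;
    while (i < m) {
        if (j == -1 || p[i] == p[j]) {
            ++i;
            ++j;
            next[i] = j;    /* the j characters before i match a prefix */
        } else {
            j = next[j];    /* fall back to a shorter matched prefix */
        }
    }
}
```

For "abababca" the entries next[0] through next[7] come out as -1 0 0 1 2 3 4 0, the PMT shifted right by one position. Because i only ever moves forward, building the table takes time linear in the pattern length.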

Example problem:
HDU1711 Number Sequence

Problem Description
Given two sequences of numbers: a[1], a[2], ...... , a[N], and b[1], b[2], ...... , b[M] (1 <= M <= 10000, 1 <= N <= 1000000). Your task is to find a number K which make a[K] = b[1], a[K + 1] = b[2], ...... , a[K + M - 1] = b[M]. If there are more than one K exist, output the smallest one.

Input
The first line of input is a number T which indicate the number of cases. Each case contains three lines. The first line is two numbers N and M (1 <= M <= 10000, 1 <= N <= 1000000). The second line contains N integers which indicate a[1], a[2], ...... , a[N]. The third line contains M integers which indicate b[1], b[2], ...... , b[M]. All integers are in the range of [-1000000, 1000000].

Output
For each test case, you should output one line which only contain K described above. If no such K exists, output -1 instead.

Sample Input
2
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 1 3
13 5
1 2 1 2 3 1 2 3 1 3 2 1 2
1 2 3 2 1

Sample Output
6
-1

Reference code:

#include <stdio.h>
#include <string.h>

int n, m;
int a[1000005], b[10005], next[10005];

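/* Build next[0..m] for the pattern b by matching b against its own prefixes. */
void buildnext()
{
    next[0] = -1;           /* sentinel: no fallback from position 0 */
    int i = 0, j = -1;      /* i scans b; j is the matched prefix length */
    while (i < m)
    {
        if (j == -1 || b[i] == b[j])
            next[++i] = ++j;    /* extend the match by one element */
        else
            j = next[j];        /* fall back to a shorter prefix */
    }
}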

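/* Return the 0-based index of the first occurrence of b in a, or -1. */
int kmp()
{
    buildnext();
    int i = 0, j = 0;       /* i indexes a, j indexes b */
    while (i < n && j < m)
    {
        if (j == -1 || a[i] == b[j])
        {
            ++i;
            ++j;
        }
        else
            j = next[j];    /* mismatch: move j back, i never moves back */
    }
    if (j == m)
        return i - j;       /* whole pattern matched, starting at i - j */
    else
        return -1;
}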

int main()
{
    int t, i, ret;
    scanf("%d", &t);
    while (t--)
    {
        memset(a, 0, sizeof(a));
        memset(b, 0, sizeof(b));
        memset(next, 0, sizeof(next));
        scanf("%d %d", &n, &m);
        for (i = 0; i < n; ++i)
            scanf("%d", &a[i]);
        for (i = 0; i < m; ++i)
            scanf("%d", &b[i]);
        ret = kmp();
        if (ret >= 0)
            printf("%d\n", ret + 1);    /* the problem counts K from 1 */
        else
            printf("-1\n");
    }
    return 0;
}
