Extended KMP

Problem definition: Given two strings S and T of lengths n and m, indexed from 0, define extend[i] as the length of the longest common prefix of S[i] ... S[n - 1] and T. The task is to compute extend[i] for every i. For example, see the following table:

i 0 1 2 3 4 5 6 7
S a a a a a b b b
T a a a a a c
extend[i] 5 4 3 2 1 0 0 0
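
To make the definition concrete, here is a minimal brute-force sketch that computes extend[] directly from the definition. The function name naiveExtend is made up for illustration and is not part of the original code; it takes O(n * m) time in the worst case and is only meant as a reference to check faster versions against.

```cpp
#include <string>
#include <vector>

// Brute-force reference (hypothetical helper, not from the original post):
// for each i, count how many leading characters of S[i..n-1] agree with T.
std::vector<int> naiveExtend(const std::string& S, const std::string& T) {
    const int n = static_cast<int>(S.size());
    const int m = static_cast<int>(T.size());
    std::vector<int> extend(n, 0);
    for (int i = 0; i < n; ++i) {
        int len = 0;
        // Stop at the end of S, at the end of T, or at the first mismatch.
        while (i + len < n && len < m && S[i + len] == T[len])
            ++len;
        extend[i] = len;
    }
    return extend;
}
```

Calling naiveExtend("aaaaabbb", "aaaaac") should reproduce the extend[i] row of the table above.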

Why is this an extension of the KMP algorithm? If extend[i] equals m at some position i of S, then T occurs in S, and i is the position where that occurrence starts. Moreover, extended KMP finds every occurrence of T in S. The algorithm is described in detail below.
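
As a small illustration of that observation, the sketch below scans an already computed extend[] and collects every index where the value equals m. The helper name matchPositions is an assumption made for this example; how extend[] was produced does not matter here.

```cpp
#include <vector>

// Hypothetical helper: given extend[] for S against T and m = length of T,
// return every index i at which T occurs in S, i.e. where extend[i] == m.
std::vector<int> matchPositions(const std::vector<int>& extend, int m) {
    std::vector<int> positions;
    for (int i = 0; i < static_cast<int>(extend.size()); ++i) {
        if (extend[i] == m)
            positions.push_back(i);
    }
    return positions;
}
```

For S = "abababa" and T = "aba", extend[] is 3 0 3 0 3 0 1, so the occurrences start at positions 0, 2 and 4. For the table above (m = 6) no entry reaches 6, so T does not occur in S.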

A: The algorithm


(1) Suppose S is being traversed and the current position is i, so the values extend[0] ... extend[i - 1] have already been computed. Maintain two variables, a and p. p is the furthest position reached by any successful match so far, that is, "p = last successfully matched position + 1", and a is the starting position of the match that reached it. In terms of T, this means S[a ... p) is equal to T[0 ... p - a).

Then define an auxiliary array int next[], where next[i] is the length of the longest common prefix of T[i] ... T[m - 1] and T, and m is the length of T. For example:

i 0 1 2 3 4 5
T a a a a a c
next[i] 6 4 3 2 1 0
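
The table can be checked with a direct, quadratic-time sketch of the definition, which is simply T compared against its own suffixes. The name naiveNext is hypothetical and used only for this illustration.

```cpp
#include <string>
#include <vector>

// Brute-force next[] (hypothetical helper): next[i] is the number of leading
// characters that T[i..m-1] shares with T itself.
std::vector<int> naiveNext(const std::string& T) {
    const int m = static_cast<int>(T.size());
    std::vector<int> next(m, 0);
    for (int i = 0; i < m; ++i) {
        int len = 0;
        // Compare the suffix starting at i with the start of T.
        while (i + len < m && T[i + len] == T[len])
            ++len;
        next[i] = len;
    }
    return next;
}
```

Note that next[0] is always m, because T trivially matches itself in full.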


(2) S[i] corresponds to T[i - a]. If i + next[i - a] < p, the match that starts at T[i - a] ends strictly inside the region where S and T are already known to agree, so by the definition of the next array, extend[i] = next[i - a].

(3) What if i + next[i - a] == p? Then S[i ... p), T[i - a ... p - a) and T[0 ... p - i) are identical. We know that S[p] != T[p - a] and T[p - i] != T[p - a], but S[p] may still be equal to T[p - i], so matching can resume directly from S[p] and T[p - i], which saves time.

(4) What if i + next[i - a] > p? Then S[i ... p) and T[i - a ... p - a) are identical. Note that S[p] != T[p - a] while T[p - i] == T[p - a], which means S[p] != T[p - i]. So there is no need to compare any further, and extend[i] can be set directly to p - i.
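
The three cases can be written out one branch per case, as in the sketch below. This is an illustrative variant, not the original listing: the function name extendThreeCases is made up, the next array is assumed to be supplied by the caller, and the branch for case (4) leaves a and p unchanged, which is valid because the window S[a ... p) is still the furthest one known.

```cpp
#include <string>
#include <vector>

// Sketch with the cases separated (hypothetical name). `next` must hold, for
// each k, the longest common prefix length of T[k..m-1] and T.
std::vector<int> extendThreeCases(const std::string& S, const std::string& T,
                                  const std::vector<int>& next) {
    const int n = static_cast<int>(S.size());
    const int m = static_cast<int>(T.size());
    std::vector<int> extend(n, 0);
    int a = 0, p = 0;  // invariant: S[a..p) == T[0..p-a)
    for (int i = 0; i < n; ++i) {
        if (i < p && i + next[i - a] < p) {
            // Case (2): the copied match ends strictly before p.
            extend[i] = next[i - a];
        } else if (i < p && i + next[i - a] > p) {
            // Case (4): the known mismatch at p caps the value at p - i.
            extend[i] = p - i;
        } else {
            // Case (3), or i >= p: compare characters explicitly from p.
            if (i >= p)
                p = i;
            while (p < n && p - i < m && S[p] == T[p - i])
                ++p;
            extend[i] = p - i;
            a = i;
        }
    }
    return extend;
}
```

With S = "aaaaabbb", T = "aaaaac" and next = 6 4 3 2 1 0, position 1 falls into case (3) and positions 2 to 4 fall into case (4).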

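(5) Finally, the next array itself has to be computed. Compare the definitions of next[i] and extend[i]:

  • next[i]: the length of the longest common prefix of T[i] ... T[m - 1] and T;
  • extend[i]: the length of the longest common prefix of S[i] ... S[n - 1] and T.
    The insight is that computing next[i] is nothing more than matching T against itself. The code follows.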

B: Code

#include <iostream>
#include <string>

using namespace std;

/* Compute next[] for T; see the comments in GetExtend() */
void GetNext(string & T, int & m, int next[])
{
    int a = 0, p = 0;
    next[0] = m;

    for (int i = 1; i < m; i++)
    {
        if (i >= p || i + next[i - a] >= p)
        {
            if (i >= p)
                p = i;

            while (p < m && T[p] == T[p - i])
                p++;

            next[i] = p - i;
            a = i;
        }
        else
            next[i] = next[i - a];
    }
}

/* Compute extend[] */
void GetExtend(string & S, int & n, string & T, int & m, int extend[], int next[])
{
    int a = 0, p = 0;
    GetNext(T, m, next);

    for (int i = 0; i < n; i++)
    {
        if (i >= p || i + next[i - a] >= p) // why i >= p is needed: a typical example is when S and T have no characters in common
        {
            if (i >= p)
                p = i;

            while (p < n && p - i < m && S[p] == T[p - i])
                p++;

            extend[i] = p - i;
            a = i;
        }
        else
            extend[i] = next[i - a];
    }
}

int main()
{
    int next[100];   // fixed-size buffers: S and T must each be at most 100 characters
    int extend[100];
    string S, T;
    int n, m;
    
    while (cin >> S >> T)
    {
        n = S.size();
        m = T.size();
        GetExtend(S, n, T, m, extend, next);

        // Print next
        cout << "next:   ";
        for (int i = 0; i < m; i++)
            cout << next[i] << " ";
 
        // Print extend
        cout << "\nextend: ";
        for (int i = 0; i < n; i++)
            cout << extend[i] << " ";

        cout << endl << endl;
    }
    return 0;
}

Test runs:

aaaaabbb
aaaaac
next:   6 4 3 2 1 0
extend: 5 4 3 2 1 0 0 0

abc
def
next:   3 0 0
extend: 0 0 0
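
The listing above stores its results in fixed arrays of 100 ints, so longer inputs would overflow them. One possible variant, sketched here under the assumption that returning std::vector is acceptable, sizes the arrays from the inputs. The names computeNext and computeExtend are made up for this sketch; the loop bodies follow the same logic as GetNext and GetExtend.

```cpp
#include <string>
#include <vector>

// Same logic as GetNext(), but returns a vector sized to T (hypothetical name).
std::vector<int> computeNext(const std::string& T) {
    const int m = static_cast<int>(T.size());
    std::vector<int> next(m, 0);
    if (m == 0)
        return next;  // guard: next[0] does not exist for an empty T
    next[0] = m;
    int a = 0, p = 0;
    for (int i = 1; i < m; ++i) {
        if (i >= p || i + next[i - a] >= p) {
            if (i >= p)
                p = i;
            while (p < m && T[p] == T[p - i])
                ++p;
            next[i] = p - i;
            a = i;
        } else {
            next[i] = next[i - a];
        }
    }
    return next;
}

// Same logic as GetExtend(), with vectors sized to S and T (hypothetical name).
std::vector<int> computeExtend(const std::string& S, const std::string& T) {
    const int n = static_cast<int>(S.size());
    const int m = static_cast<int>(T.size());
    const std::vector<int> next = computeNext(T);
    std::vector<int> extend(n, 0);
    int a = 0, p = 0;
    for (int i = 0; i < n; ++i) {
        if (i >= p || i + next[i - a] >= p) {
            if (i >= p)
                p = i;
            while (p < n && p - i < m && S[p] == T[p - i])
                ++p;
            extend[i] = p - i;
            a = i;
        } else {
            extend[i] = next[i - a];
        }
    }
    return extend;
}
```

Calling computeExtend("aaaaabbb", "aaaaac") and computeExtend("abc", "def") should give the same extend rows as the two test runs above.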

Origin www.cnblogs.com/tttfu/p/11309577.html