Extended KMP
Problem definition: Given two strings S and T (of lengths n and m respectively), with indices starting from 0, define extend[i] as the length of the longest common prefix of S[i] ... S[n - 1] and T. The task is to compute extend[i] for every i. For example, see the following table:
i | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
S | a | a | a | a | a | b | b | b |
T | a | a | a | a | a | c | | |
extend[i] | 5 | 4 | 3 | 2 | 1 | 0 | 0 | 0 |
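To make the definition concrete, here is a minimal brute-force sketch that computes extend[] directly from the definition in O(n·m) time. The name `naiveExtend` is an assumption made for illustration and is not part of the original code. It is only meant as a reference to check a faster implementation against.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): compute extend[]
// straight from the definition.
// extend[i] = length of the longest common prefix of S[i..n-1] and T.
std::vector<int> naiveExtend(const std::string& S, const std::string& T) {
    const int n = static_cast<int>(S.size());
    const int m = static_cast<int>(T.size());
    std::vector<int> extend(n, 0);
    for (int i = 0; i < n; ++i) {
        int len = 0;
        // Compare S[i + len] with T[len] until a mismatch or either string ends.
        while (i + len < n && len < m && S[i + len] == T[len]) {
            ++len;
        }
        extend[i] = len;
    }
    return extend;
}
```

Called with S = "aaaaabbb" and T = "aaaaac", this should reproduce the extend row of the table above.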
Why is this called an extension of the KMP algorithm? If extend[i] equals m at some position i of S, then T occurs in S starting at position i. So extended KMP does more than find the first match: it finds every occurrence of T in S. The algorithm is described in detail below.
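The observation above can be sketched as a small scan over an already computed extend[] array: every index whose value equals m is the start of an occurrence of T. The function name `findOccurrences` is hypothetical, and the sketch assumes extend[] was produced by some correct routine.

```cpp
#include <vector>

// Hypothetical helper (not from the original post): given extend[] for S
// against T and the length m of T, list every start index of T in S.
std::vector<int> findOccurrences(const std::vector<int>& extend, int m) {
    std::vector<int> positions;
    for (int i = 0; i < static_cast<int>(extend.size()); ++i) {
        // extend[i] == m means all of T matches S starting at position i.
        if (extend[i] == m) {
            positions.push_back(i);
        }
    }
    return positions;
}
```

For S = "ababa" and T = "aba", extend is 3 0 3 0 1, so the occurrences start at positions 0 and 2. For the table above no value equals m = 6, so T does not occur in S.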
A: Algorithm walkthrough
Suppose we are currently at position i of S, i.e. the values extend[0] ... extend[i - 1] have already been computed. Maintain two variables, a and p. p is one past the rightmost position of S reached by any successful match so far, that is, "p = last successfully matched position + 1", and a is the starting position of the match that reached p. In terms of T, this means S[a ... p) is equal to T[0 ... p - a).
Then define an auxiliary array int next[], where next[i] is the length of the longest common prefix of T[i] ... T[m - 1] and T, with m the length of T. For example:
i | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
T | a | a | a | a | a | c |
next[i] | 6 | 4 | 3 | 2 | 1 | 0 |
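As a cross-check of the next[] table, the definition can be evaluated directly. This is a quadratic-time sketch under the assumption that clarity matters more than speed; the name `naiveNext` is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): compute next[] from
// its definition.
// next[i] = length of the longest common prefix of T[i..m-1] and T.
std::vector<int> naiveNext(const std::string& T) {
    const int m = static_cast<int>(T.size());
    std::vector<int> next(m, 0);
    for (int i = 0; i < m; ++i) {
        int len = 0;
        // Compare the suffix starting at i against the start of T.
        while (i + len < m && T[i + len] == T[len]) {
            ++len;
        }
        next[i] = len;
    }
    return next;
}
```

Note that next[0] is always m, because T trivially matches itself in full.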
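S[i] corresponds to T[i - a]. If i + next[i - a] < p, the three segments S[i ... i + next[i - a]), T[i - a ... i - a + next[i - a]) and T[0 ... next[i - a]) are identical, and the mismatch that ends them lies inside the already matched range S[a ... p). By the definition of the next array, extend[i] = next[i - a].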
(3) What if i + next[i - a] == p? The three segments are still identical, and we know S[p] != T[p - a] and T[p - i] != T[p - a]. However, S[p] may still be equal to T[p - i], so we can resume matching directly from S[p] and T[p - i], which speeds things up.
(4) What if i + next[i - a] > p? Then S[i ... p) and T[i - a ... p - a) are the same. Note that S[p] != T[p - a] and T[p - i] == T[p - a], which means S[p] != T[p - i]. There is no need to compare any further, and we can directly assign extend[i] = p - i.
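The three cases above can be sketched with each one as its own branch. This is an illustrative variant, not the author's listing: the name `extendThreeCases` and the use of `std::vector` are assumptions, and the caller is assumed to pass a correct next[] for T.

```cpp
#include <string>
#include <vector>

// Sketch (hypothetical name): compute extend[] with the three cases
// written as separate branches. `next` must already hold next[] for T.
std::vector<int> extendThreeCases(const std::string& S, const std::string& T,
                                  const std::vector<int>& next) {
    const int n = static_cast<int>(S.size());
    const int m = static_cast<int>(T.size());
    std::vector<int> extend(n, 0);
    int a = 0, p = 0;  // invariant: S[a..p) == T[0..p-a)
    for (int i = 0; i < n; ++i) {
        if (i < p && i + next[i - a] < p) {
            // Case "< p": the match ends strictly inside the known range.
            extend[i] = next[i - a];
        } else if (i < p && i + next[i - a] > p) {
            // Case "> p": S[p] != T[p - i] (or S ends at p), so stop at p.
            extend[i] = p - i;
        } else {
            // Case "== p", or i >= p where nothing is known yet:
            // continue comparing from S[p] and T[p - i].
            if (i >= p) {
                p = i;
            }
            while (p < n && p - i < m && S[p] == T[p - i]) {
                ++p;
            }
            extend[i] = p - i;
            a = i;
        }
    }
    return extend;
}
```

Keeping the "> p" case as a separate branch skips one character comparison per such position. Merging it with the "== p" case gives the same result, because the comparison loop then stops immediately.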
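(5) Finally, we need to compute the next array. Compare the definitions of next[i] and extend[i]:
- next[i]: the length of the longest common prefix of T[i] ... T[m - 1] and T;
- extend[i]: the length of the longest common prefix of S[i] ... S[n - 1] and T.
It becomes clear that computing next[i] is simply matching T against itself. The code is shown below.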
B: Code
#include <iostream>
#include <string>
#include <vector>
using namespace std;

/* Compute next[] for T; see the comments in GetExtend() */
void GetNext(string & T, int & m, int next[])
{
    int a = 0, p = 0;
    next[0] = m;
    for (int i = 1; i < m; i++)
    {
        if (i >= p || i + next[i - a] >= p)
        {
            if (i >= p)
                p = i;
            while (p < m && T[p] == T[p - i])
                p++;
            next[i] = p - i;
            a = i;
        }
        else
            next[i] = next[i - a];
    }
}

/* Compute extend[] */
void GetExtend(string & S, int & n, string & T, int & m, int extend[], int next[])
{
    int a = 0, p = 0;
    GetNext(T, m, next);
    for (int i = 0; i < n; i++)
    {
        // Why i >= p is needed: a typical example is when S and T share no characters
        if (i >= p || i + next[i - a] >= p)
        {
            if (i >= p)
                p = i;
            // Continue matching from S[p] and T[p - i]
            while (p < n && p - i < m && S[p] == T[p - i])
                p++;
            extend[i] = p - i;
            a = i;
        }
        else
            extend[i] = next[i - a];
    }
}

int main()
{
    string S, T;
    int n, m;
    while (cin >> S >> T)
    {
        n = S.size();
        m = T.size();
        // Size the arrays to the input so that long strings cannot overflow them
        vector<int> next(m), extend(n);
        GetExtend(S, n, T, m, extend.data(), next.data());
        // Print next
        cout << "next: ";
        for (int i = 0; i < m; i++)
            cout << next[i] << " ";
        // Print extend
        cout << "\nextend: ";
        for (int i = 0; i < n; i++)
            cout << extend[i] << " ";
        cout << endl << endl;
    }
    return 0;
}
Test data and output:
aaaaabbb
aaaaac
next: 6 4 3 2 1 0
extend: 5 4 3 2 1 0 0 0
abc
def
next: 3 0 0
extend: 0 0 0