KMP algorithm
1. Brief introduction
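The KMP algorithm solves the string matching problem: it returns the starting position of the first occurrence of a substring in a main string, or -1 if there is none. Its time complexity is O(N + M), where N is the length of the main string and M is the length of the substring. Since a match requires M ≤ N, this is O(N).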
Java's built-in indexOf function solves the same problem. In OpenJDK it is implemented as a direct scan tuned for small constant factors rather than as KMP, so it does not carry KMP's worst-case guarantee.
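For contrast with the linear bound, a brute-force matcher restarts the comparison at every position of the main string and can take O(N * M) time in the worst case. The sketch below is only an illustrative baseline; the name bruteForceIndex is made up for this example and is not used elsewhere in the article.

```cpp
#include <cstddef>
#include <string>

// Hypothetical baseline for comparison: try every start position in text
// and compare the pattern character by character. Worst case O(N * M).
int bruteForceIndex(const std::string& text, const std::string& pattern) {
    if (pattern.size() > text.size()) {
        return -1; // the pattern cannot fit
    }
    for (std::size_t start = 0; start + pattern.size() <= text.size(); ++start) {
        std::size_t k = 0;
        while (k < pattern.size() && text[start + k] == pattern[k]) {
            ++k;
        }
        if (k == pattern.size()) {
            return static_cast<int>(start); // full match starting here
        }
    }
    return -1; // no occurrence
}
```

After a mismatch this version throws away everything it learned about the characters it already compared, which is exactly the work KMP avoids.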
2. next array
Purpose
- It speeds up matching, so the comparison does not have to restart from scratch as in brute-force matching
- next[i] stores the maximum matching length between the prefix and the suffix of the first i characters (the prefix and suffix may not be the whole substring itself)
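To make the definition concrete, the sketch below computes every entry directly from the definition by trying each candidate length. It is deliberately slow and only meant to show what the array stores; the name naiveNext is an assumption for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative only: next[i] is the length of the longest prefix of
// str[0..i-1] that is also a suffix of it, excluding the whole substring.
// next[0] holds the sentinel -1.
std::vector<int> naiveNext(const std::string& str) {
    std::vector<int> next(str.size(), 0);
    if (str.empty()) {
        return next;
    }
    next[0] = -1;
    for (std::size_t i = 1; i < str.size(); ++i) {
        // Try candidate lengths from the longest (i - 1) down to 1.
        for (std::size_t len = i - 1; len >= 1; --len) {
            if (str.compare(0, len, str, i - len, len) == 0) {
                next[i] = static_cast<int>(len);
                break;
            }
        }
    }
    return next;
}
```

For the string "abcabc" this yields -1 0 0 0 1 2: before position 4 the prefix "a" matches the suffix "a", and before position 5 the prefix "ab" matches the suffix "ab".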
Implementation process
- next[0] defaults to -1. This value is chosen by convention and serves as a sentinel in later checks
- next[1] = 0. When i = 1, the range [0, i-1] contains only one character, so the prefix and suffix lengths are both 0, because a prefix or suffix may not be the whole string
- Traverse the string starting from i = 2. There are three cases:
  - Case 1: the character at position i-1 equals the character at position index, the next character of the prefix to be matched. Then next[i] equals that prefix position plus 1, i.e. next[i] = ++index
  - Case 2: the characters do not match and index > 0. Fall back to the prefix position stored in the next array at index, i.e. index = next[index], and compare again
  - Case 3: the characters do not match and index = 0, so the next array has nothing further to fall back to. Then next[i] = 0
Graphical next array implementation process
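The process can also be followed step by step in text. The sketch below runs the same loop as getNext but records one line per step, so the three cases can be read off in order. The helper name traceNext and the wording of the recorded lines are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: same loop as getNext, but it records one line per
// step instead of returning the next array.
std::vector<std::string> traceNext(const std::string& str) {
    std::vector<std::string> steps;
    if (str.size() < 2) {
        return steps; // the loop never runs for such short strings
    }
    std::vector<int> next(str.size());
    next[0] = -1;
    next[1] = 0;
    std::size_t i = 2;
    int index = 0;
    while (i < str.size()) {
        std::string head = "i=" + std::to_string(i) +
                           " index=" + std::to_string(index) + ": ";
        if (str[i - 1] == str[index]) {
            next[i] = ++index; // case 1: the match grows by one
            steps.push_back(head + "case 1, next[" + std::to_string(i) +
                            "]=" + std::to_string(index));
            ++i;
        } else if (index > 0) {
            index = next[index]; // case 2: fall back to a shorter prefix
            steps.push_back(head + "case 2, fall back to index=" +
                            std::to_string(index));
        } else {
            next[i] = 0; // case 3: nothing left to fall back to
            steps.push_back(head + "case 3, next[" + std::to_string(i) + "]=0");
            ++i;
        }
    }
    return steps;
}
```

For the string "aaba" the recorded steps are "i=2 index=0: case 1, next[2]=1", then "i=3 index=1: case 2, fall back to index=0", then "i=3 index=0: case 3, next[3]=0". Note that i only advances in cases 1 and 3; case 2 retries the same i with a shorter prefix.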
next array code
vector<int> getNext(const string& str) {
    // next[i]: maximum matching length between the prefix and suffix of str[0..i-1], excluding the whole substring
    vector<int> next(str.size());
    if (str.empty()) {
        return next; // nothing to fill in
    }
    next[0] = -1; // sentinel chosen by convention
    if (str.size() == 1) {
        return next; // next[1] does not exist
    }
    next[1] = 0;
    size_t i = 2; // traverse str starting from position 2
    // index is both a subscript into str and a value: the longest match length found so far
    int index = 0;
    while (i < next.size()) {
        // str[i - 1] is the suffix character to match, str[index] is the prefix character
        if (str[i - 1] == str[index]) {
            // The match extends by one: next[i] is the previous maximum length plus 1
            next[i++] = ++index;
        } else if (index > 0) {
            // Mismatch, but a shorter prefix is still available: fall back to it
            index = next[index];
        } else {
            // index == 0: no prefix left, so the length is 0
            next[i++] = 0;
        }
    }
    return next;
}
3. Main string and substring comparison function
Process
- Call the getNext function to get the next array of the substring
- Use i and j as subscripts to traverse the main string str1 and the substring str2 respectively
- While neither i nor j is out of bounds, there are three cases:
  - Case 1: the character at the current position of the main string equals the character at the current position of the substring. Both i and j are incremented
  - Case 2: the characters differ and next[j] equals -1, the sentinel stored in next[0], which means j equals 0. The match starting at this position of the main string has failed, so i is incremented and j stays at 0
  - Case 3: the characters differ and j > 0. Set j = next[j] to fall back to the previous prefix position recorded in the next array
- Finally, check whether j equals the length of the substring. If it does, the match succeeded and i-j is returned, which is the position in the main string where the match starts
- Return -1 if the match fails
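Code
int getIndex(const string& str1, const string& str2) {
    vector<int> next = getNext(str2);
    size_t i = 0; // position in the main string
    size_t j = 0; // position in the substring
    while (i < str1.size() && j < str2.size()) {
        if (str1[i] == str2[j]) {
            // Case 1: characters match, advance both
            i++;
            j++;
        } else if (next[j] == -1) {
            // Case 2: mismatch at j == 0, move on in the main string
            i++;
        } else {
            // Case 3: fall back to the previous prefix position
            j = static_cast<size_t>(next[j]);
        }
    }
    if (j == str2.size()) {
        return static_cast<int>(i - j); // start of the match in the main string
    }
    return -1; // no match
}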
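One way to gain confidence in the matching loop is to compare its result with std::string::find on a few inputs. The sketch below restates getNext and getIndex in compact form so that it compiles on its own; the helper agreesWithFind is hypothetical, and the guard for short patterns in getNext is an addition made for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Compact restatement of getNext; entries default to 0, so next[1] needs
// no explicit assignment and one-character patterns are safe.
std::vector<int> getNext(const std::string& str) {
    std::vector<int> next(str.size(), 0);
    if (str.empty()) {
        return next;
    }
    next[0] = -1;
    std::size_t i = 2;
    int index = 0;
    while (i < str.size()) {
        if (str[i - 1] == str[index]) {
            next[i++] = ++index;
        } else if (index > 0) {
            index = next[index];
        } else {
            next[i++] = 0;
        }
    }
    return next;
}

int getIndex(const std::string& str1, const std::string& str2) {
    std::vector<int> next = getNext(str2);
    std::size_t i = 0;
    std::size_t j = 0;
    while (i < str1.size() && j < str2.size()) {
        if (str1[i] == str2[j]) {
            ++i;
            ++j;
        } else if (next[j] == -1) {
            ++i;
        } else {
            j = static_cast<std::size_t>(next[j]);
        }
    }
    return j == str2.size() ? static_cast<int>(i - j) : -1;
}

// Hypothetical check: does getIndex agree with std::string::find?
bool agreesWithFind(const std::string& text, const std::string& pattern) {
    std::size_t pos = text.find(pattern);
    int expected = (pos == std::string::npos) ? -1 : static_cast<int>(pos);
    return getIndex(text, pattern) == expected;
}
```

Inputs with repeated prefixes, such as searching for "abcabd" in "abcabcabd", are the interesting ones, because they force the fall-back step j = next[j] to run.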
4. Overall code
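#include<iostream>
#include<string>
#include<vector>
using namespace std;
vector<int> getNext(const string& str) {
    // next[i]: maximum matching length between the prefix and suffix of str[0..i-1], excluding the whole substring
    vector<int> next(str.size());
    if (str.empty()) {
        return next; // nothing to fill in
    }
    next[0] = -1; // sentinel chosen by convention
    if (str.size() == 1) {
        return next; // next[1] does not exist
    }
    next[1] = 0;
    size_t i = 2; // traverse str starting from position 2
    // index is both a subscript into str and a value: the longest match length found so far
    int index = 0;
    while (i < next.size()) {
        // str[i - 1] is the suffix character to match, str[index] is the prefix character
        if (str[i - 1] == str[index]) {
            // The match extends by one: next[i] is the previous maximum length plus 1
            next[i++] = ++index;
        } else if (index > 0) {
            // Mismatch, but a shorter prefix is still available: fall back to it
            index = next[index];
        } else {
            // index == 0: no prefix left, so the length is 0
            next[i++] = 0;
        }
    }
    return next;
}
int getIndex(const string& str1, const string& str2) {
    vector<int> next = getNext(str2);
    size_t i = 0; // position in the main string
    size_t j = 0; // position in the substring
    while (i < str1.size() && j < str2.size()) {
        if (str1[i] == str2[j]) {
            i++;
            j++;
        } else if (next[j] == -1) {
            i++; // mismatch at j == 0, move on in the main string
        } else {
            j = static_cast<size_t>(next[j]); // fall back to the previous prefix position
        }
    }
    if (j == str2.size()) {
        return static_cast<int>(i - j); // start of the match in the main string
    }
    return -1; // no match
}
int main() {
    // Look for the substring str2 inside str1
    string str1 = "abbcabcccc";
    string str2 = "abcabc";
    // cin >> str1 >> str2;
    int index = getIndex(str1, str2);
    cout << index; // prints -1, since "abcabc" does not occur in "abbcabcccc"
    return 0;
}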
5. Related problems on KMP
Minimum number of characters to add
Given a string str, you may only append characters to the end of str to produce a longer string. The longer string must contain str twice, and the two occurrences must start at different positions. Find the minimum number of characters that must be appended.
Input description:
One line containing the original string
Output description:
One integer, the minimum number of characters that must be appended
Example 1
input
123123
output
3
Example 2
input
11111
output
1
Approach
- According to the problem statement, there are three cases
  - Case 1: one character. The answer is 1: append the same character once
  - Case 2: two characters. If the two characters are the same, the answer is 1, since appending that character once is enough. If they are different, the answer is 2, the length of the string, because the whole string has to be appended
  - Case 3: more than two characters. Compute the next value for the position just past the end of the string, next[str.length()], which is the maximum matching length between the prefix and the suffix of the whole string. The matched suffix can be reused as the start of the second copy, so the answer is the length of the string minus next[str.length()]
#include<iostream>
#include<vector>
#include<string>
using namespace std;
// Returns next[str.size()]: the maximum matching length between the prefix
// and the suffix of the whole string. Requires a non-empty str.
int getNext(const string& str) {
    // One extra slot, so that the value for the whole string is computed
    vector<int> next(str.size() + 1);
    next[0] = -1; // sentinel chosen by convention
    next[1] = 0;
    size_t i = 2; // traverse starting from position 2
    int val = 0; // val is both a subscript into str and a match length
    while (i < next.size()) {
        if (str[i - 1] == str[val]) {
            next[i++] = ++val;
        }
        else if (val > 0) {
            val = next[val]; // fall back to the previous next value
        }
        else {
            next[i++] = 0;
        }
    }
    return next[str.size()];
}
int main() {
    string str;
    cin >> str;
    if (str.size() == 0) {
        cout << 0;
        return 0;
    }
    int ans = 0;
    if (str.size() == 1) {
        ans = 1; // append the same character once
    }
    else if (str.size() == 2) {
        // "aa" needs one more 'a'; "ab" needs the whole string again
        ans = str[0] == str[1] ? 1 : 2;
    }
    else {
        // The matched suffix is reused as the start of the second copy
        ans = static_cast<int>(str.size()) - getNext(str);
    }
    cout << ans;
    return 0;
}
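The same idea can also produce the extended string itself rather than only the count: keep str and append the part of it that is not covered by the longest prefix that is also a suffix. This is a minimal sketch under that reading of the problem; the names longestBorder and extendToContainTwice are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest prefix of str that is also a suffix of str,
// excluding str itself. This is the next value one position past the end.
int longestBorder(const std::string& str) {
    if (str.empty()) {
        return 0;
    }
    std::vector<int> next(str.size() + 1, 0); // next[1] is already 0
    next[0] = -1;
    std::size_t i = 2;
    int val = 0;
    while (i < next.size()) {
        if (str[i - 1] == str[val]) {
            next[i++] = ++val;
        } else if (val > 0) {
            val = next[val];
        } else {
            next[i++] = 0;
        }
    }
    return next[str.size()];
}

// Hypothetical helper: the shortest extension of str that contains str
// twice at different starting positions.
std::string extendToContainTwice(const std::string& str) {
    if (str.empty()) {
        return str;
    }
    return str + str.substr(static_cast<std::size_t>(longestBorder(str)));
}
```

For the two samples, "123123" becomes "123123123" (3 characters appended) and "11111" becomes "111111" (1 character appended). A string with no matching prefix and suffix, such as "abc", has to be appended in full.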