Detailed explanation of KMP algorithm with practice questions

KMP algorithm

1. Brief introduction

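The KMP algorithm is used for string matching: it returns the starting position where a substring first occurs in a main string. Its time complexity is O(N), where N is the length of the main string (O(N + M) if you also count the O(M) cost of building the next array for a substring of length M).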

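Java's built-in String.indexOf solves the same problem, but it is not an implementation of KMP. OpenJDK uses a straightforward character-by-character scan (often accelerated by a JIT intrinsic), which is fast in practice but does not guarantee linear time in the worst case.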

2. next array

Purpose

  • It speeds up matching compared with brute-force matching: after a mismatch, the position in the main string never moves backwards
  • next[i] stores the length of the longest prefix of str[0..i-1] that is also a suffix of str[0..i-1], not counting the whole substring str[0..i-1] itself
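For contrast, here is a minimal sketch of the brute-force matcher that the next array lets us avoid. The name bruteForceIndex is hypothetical. After a mismatch it throws away everything it has matched and restarts at the next starting position of the main string, so its worst case is O(N*M) comparisons.

```cpp
#include <string>

// Hypothetical helper for comparison: naive O(N*M) search.
// Returns the first position where str2 occurs in str1, or -1.
int bruteForceIndex(const std::string& str1, const std::string& str2) {
    int n = static_cast<int>(str1.size());
    int m = static_cast<int>(str2.size());
    for (int start = 0; start + m <= n; ++start) {
        int j = 0;
        // compare character by character from this starting position
        while (j < m && str1[start + j] == str2[j]) {
            ++j;
        }
        if (j == m) {
            return start; // every character of str2 matched
        }
        // mismatch: all progress is discarded and we retry from start + 1
    }
    return -1;
}
```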

Implementation process

  • next[0] is set to -1 by convention; this artificial value is used later to recognize that no shorter prefix is left
  • next[1] = 0: when i = 1, the range [0, i-1] holds only one character, and because prefixes and suffixes may not be the whole substring, the match length is 0
  • From i = 2 onward, index starts as the previous match length next[i-1], and each comparison falls into one of three cases:
    • Case 1: str[i-1] (the suffix character to extend) equals str[index] (the prefix character to extend). The match grows by one: next[i] = ++index
    • Case 2: the characters differ and index > 0. Fall back to the longest prefix of the current prefix, index = next[index], and compare again
    • Case 3: the characters differ and index = 0, so no shorter prefix is left to try: next[i] = 0
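In formula form, the definition and the three cases read as follows. The notation str[a..b] (characters a through b, inclusive; empty when b < a) is this summary's own.

```latex
\mathrm{next}[0] = -1, \qquad
\mathrm{next}[i] = \max\{\, k : 0 \le k < i,\ \mathrm{str}[0..k-1] = \mathrm{str}[i-k..i-1] \,\} \quad (i \ge 1)

\text{For } i \ge 2, \text{ with } \mathrm{index} \text{ initialised to } \mathrm{next}[i-1]:\quad
\mathrm{next}[i] =
\begin{cases}
\mathrm{index} + 1 & \text{if } \mathrm{str}[i-1] = \mathrm{str}[\mathrm{index}] \quad \text{(case 1)}\\
\text{repeat with } \mathrm{index} \leftarrow \mathrm{next}[\mathrm{index}] & \text{if they differ and } \mathrm{index} > 0 \quad \text{(case 2)}\\
0 & \text{if they differ and } \mathrm{index} = 0 \quad \text{(case 3)}
\end{cases}
```

For example, for "abcabc" this gives next = [-1, 0, 0, 0, 1, 2].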

Graphical next array implementation process
[Figures: the target substring and the step-by-step construction of its next array]

next array code

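   vector<int> getNext(const string& str) {
   	// next[i]: length of the longest prefix of str[0..i-1] that is also its suffix,
   	// not counting the whole substring str[0..i-1] itself
   	vector<int> next(str.size());
   	if (str.empty()) return next;
   	next[0] = -1; // artificially specified: position 0 gets -1
   	if (str.size() == 1) return next; // avoid writing next[1] past the end
   	next[1] = 0;
   	size_t i = 2; // traverse str starting from position 2
   	// index is the length of the current matched prefix and also the position
   	// of the prefix character that is compared with str[i-1]
   	int index = 0;
   	while (i < next.size()) {
   		// str[i-1] is the suffix character to extend, str[index] the prefix character to extend;
   		// index keeps the previous longest match, which is what speeds up matching
   		if (str[i - 1] == str[index]) {
   			// the characters match: next[i] is the previous longest match length + 1
   			next[i++] = ++index;
   		}
   		else if (index > 0) {
   			// mismatch, but a shorter prefix is still available:
   			// jump to the prefix of the current prefix, next[index]
   			index = next[index];
   		}
   		else {
   			// index == 0: no prefix left, so the match length is 0
   			next[i++] = 0;
   		}
   	}
   	return next;
   }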
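One way to gain confidence in getNext is to compare it with a slow computation taken straight from the definition. This sketch (the name nextByDefinition is hypothetical) tries every prefix length directly, in O(n^3) time, so it is only meant for testing on short strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Computes next[] directly from the definition, in O(n^3) time, for cross-checking.
// nextByDefinition is a hypothetical helper name.
std::vector<int> nextByDefinition(const std::string& str) {
    std::vector<int> next(str.size());
    if (str.empty()) return next;
    next[0] = -1; // same convention as getNext
    for (std::size_t i = 1; i < str.size(); ++i) {
        int best = 0;
        // try every proper prefix length k of str[0..i-1]
        for (std::size_t k = 1; k < i; ++k) {
            // prefix str[0..k-1] vs suffix str[i-k..i-1]
            if (str.compare(0, k, str, i - k, k) == 0) {
                best = static_cast<int>(k);
            }
        }
        next[i] = best;
    }
    return next;
}
```

In a test, getNext(s) == nextByDefinition(s) should hold for every short string s you try.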

3. Main string and substring comparison function

Process

  • Call getNext to build the next array of the substring
  • Use i and j as indices into the main string str1 and the substring str2, respectively
  • While neither i nor j has reached the end of its string, there are three cases:
    • Case 1: the current characters of the main string and the substring are equal; increment both i and j
    • Case 2: the characters differ and next[j] == -1, i.e. j == 0 (next[0] is the artificially specified value). No prefix can be reused, so increment i and keep j at 0
    • Case 3: the characters differ and j > 0. Set j = next[j] to continue from the longest prefix that still matches; i does not move
  • After the loop, check whether j equals the length of the substring. If it does, the match succeeded, and i - j is the position in the main string where the match starts
  • Otherwise the match failed, and -1 is returned

Code

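   int getIndex(const string& str1, const string& str2) {
   	vector<int> next = getNext(str2);
   	int n = static_cast<int>(str1.size());
   	int m = static_cast<int>(str2.size());
   	int i = 0; // position in the main string str1
   	int j = 0; // position in the substring str2
   	while (i < n && j < m) {
   		if (str1[i] == str2[j]) {
   			// case 1: characters match, advance both
   			i++;
   			j++;
   		}else if (next[j] == -1) {
   			// case 2: mismatch at j == 0, no prefix to reuse; advance only i
   			i++;
   		}else{
   			// case 3: mismatch with j > 0; keep the matched prefix of length next[j]
   			j = next[j];
   		}
   	}
   	if (j == m) {
   		// the whole substring matched and ended at i - 1
   		return i - j;
   	}
   	return -1;
   }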

4. Overall code

#include<iostream>
#include<string>
#include<vector>
using namespace std;

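vector<int> getNext(const string& str) {
	// next[i]: length of the longest prefix of str[0..i-1] that is also its suffix,
	// not counting the whole substring str[0..i-1] itself
	vector<int> next(str.size());
	if (str.empty()) return next;
	next[0] = -1; // artificially specified: position 0 gets -1
	if (str.size() == 1) return next; // avoid writing next[1] past the end
	next[1] = 0;
	size_t i = 2; // traverse str starting from position 2
	// index is the length of the current matched prefix and also the position
	// of the prefix character that is compared with str[i-1]
	int index = 0;
	while (i < next.size()) {
		// str[i-1] is the suffix character to extend, str[index] the prefix character to extend
		if (str[i - 1] == str[index]) {
			// the characters match: next[i] is the previous longest match length + 1
			next[i++] = ++index;
		}
		else if (index > 0) {
			// mismatch: jump to the prefix of the current prefix, next[index]
			index = next[index];
		}
		else {
			// index == 0: no prefix left, so the match length is 0
			next[i++] = 0;
		}
	}
	return next;
}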

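int getIndex(const string& str1, const string& str2) {
	vector<int> next = getNext(str2);
	int n = static_cast<int>(str1.size());
	int m = static_cast<int>(str2.size());
	int i = 0; // position in the main string str1
	int j = 0; // position in the substring str2
	while (i < n && j < m) {
		if (str1[i] == str2[j]) {
			// case 1: characters match, advance both
			i++;
			j++;
		}else if (next[j] == -1) {
			// case 2: mismatch at j == 0; advance only i
			i++;
		}else{
			// case 3: mismatch with j > 0; keep the matched prefix of length next[j]
			j = next[j];
		}
	}
	if (j == m) {
		return i - j;
	}
	return -1;
}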

int main() {
	// check whether str1 contains the substring str2
	string str1 = "abbcabcccc";
	string str2 = "abcabc";
	//cin >> str1 >> str2;
	int index = getIndex(str1, str2);
	cout << index; // prints -1: "abcabc" does not occur in "abbcabcccc"
	return 0;
}
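A randomized cross-check against std::string::find (which returns the first match position, or std::string::npos) is a cheap way to test the whole program. This is a sketch: the helper names, the two-letter alphabet, the string lengths, the trial count, and the seed are arbitrary choices for illustration; a small alphabet makes partial matches, and therefore fallbacks, frequent.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Same logic as getNext and getIndex above, repeated so the block is self-contained.
static std::vector<int> buildNext(const std::string& str) {
    std::vector<int> next(str.size());
    if (str.empty()) return next;
    next[0] = -1;
    if (str.size() == 1) return next;
    next[1] = 0;
    std::size_t i = 2;
    int index = 0;
    while (i < next.size()) {
        if (str[i - 1] == str[index]) next[i++] = ++index;
        else if (index > 0) index = next[index];
        else next[i++] = 0;
    }
    return next;
}

static int kmpIndex(const std::string& str1, const std::string& str2) {
    std::vector<int> next = buildNext(str2);
    int n = static_cast<int>(str1.size());
    int m = static_cast<int>(str2.size());
    int i = 0, j = 0;
    while (i < n && j < m) {
        if (str1[i] == str2[j]) { ++i; ++j; }
        else if (next[j] == -1) ++i;
        else j = next[j];
    }
    return j == m ? i - j : -1;
}

// Returns how many random cases disagree with std::string::find (0 means all agree).
int countDisagreements(int trials, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> len1(0, 12), len2(0, 4), letter(0, 1);
    int bad = 0;
    for (int t = 0; t < trials; ++t) {
        std::string s1, s2;
        for (int k = len1(rng); k > 0; --k) s1 += static_cast<char>('a' + letter(rng));
        for (int k = len2(rng); k > 0; --k) s2 += static_cast<char>('a' + letter(rng));
        std::size_t pos = s1.find(s2);
        int expected = pos == std::string::npos ? -1 : static_cast<int>(pos);
        if (kmpIndex(s1, s2) != expected) ++bad;
    }
    return bad;
}
```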

5. Related KMP problems

Minimum number of characters to add

Given a string str, you may only append characters to the end of str to produce a longer string. The longer string must contain two occurrences of str, and the two occurrences must start at different positions. Find the minimum number of characters that must be appended.

Input description:
A single line containing the original string

Output description:
A single integer: the minimum number of characters that must be appended

Example 1
input
123123
output
3

Example 2
input
11111
output
1

Approach

  • Following the problem statement, there are three cases:
    • Case 1: one character. The answer is 1: append the same character ("a" becomes "aa")
    • Case 2: two characters. If they are equal, the answer is 1 ("aa" becomes "aaa"); if they differ, the whole string must be appended, so the answer is 2 ("ab" becomes "abab")
    • Case 3: more than two characters. Extend the next array by one position so that next[n] (n = str.length()) is the length of the longest proper prefix of str that is also a suffix of str. The second copy of str can then start at position n - next[n], overlapping the first copy by next[n] characters, so the answer is n - next[n]. For "123123", next[6] = 3 and the answer is 6 - 3 = 3
#include<iostream>
#include<vector>
#include<string>
using namespace std;

// Returns next[n] for a string of length n >= 1: the length of the longest proper prefix
// of str that is also a suffix of str. The next array gets one extra slot for position n.
int getNext(const string& str) {
	vector<int> next(str.size() + 1);
	next[0] = -1; // artificially specified: position 0 gets -1
	next[1] = 0;
	size_t i = 2; // traverse str starting from position 2
	int val = 0;  // val is both an index into str and the current match length
	while (i < next.size()) {
		if (str[i - 1] == str[val]) {
			next[i++] = ++val;
		}
		else if (val > 0) {
			val = next[val]; // fall back to the longest prefix of the current prefix
		}
		else {
			next[i++] = 0;
		}
	}
	return next[str.size()];
}

int main() {
	string str;
	cin >> str;
	int n = static_cast<int>(str.size());
	int ans = 0;
	if (n == 0) {
		cout << 0;
		return 0;
	}
	else if (n == 1) {
		ans = 1; // "a" -> "aa"
	}
	else if (n == 2) {
		ans = str[0] == str[1] ? 1 : 2; // "aa" -> "aaa", "ab" -> "abab"
	}
	else {
		// the second copy of str starts at n - next[n], so n - next[n] characters are appended
		ans = n - getNext(str);
	}
	cout << ans;
	return 0;
}
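To check the formula n - next[n] on small inputs, this sketch finds the answer by direct search (the name minAddBruteForce is hypothetical). Appending p characters works exactly when the suffix of str starting at p equals the prefix of the same length; p = n always works because the whole string is appended. For example, it returns 3 for "aab": no proper prefix of "aab" is also a suffix, so the whole string must be appended.

```cpp
#include <string>

// Direct search for the "minimum characters to add" answer (hypothetical helper name).
// The answer is the smallest p in 1..n with str[p..n-1] == str[0..n-p-1].
int minAddBruteForce(const std::string& str) {
    int n = static_cast<int>(str.size());
    for (int p = 1; p <= n; ++p) {
        // compare the suffix starting at p with the prefix of the same length
        if (str.compare(p, n - p, str, 0, n - p) == 0) {
            return p;
        }
    }
    return 0; // empty string: nothing to add
}
```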

Source: blog.csdn.net/weixin_44839362/article/details/117134881