Reprinted from http://www.ruanyifeng.com/blog/2013/05/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.html
1. BF (Brute-Force) Solution
The basic idea:
Compare the first character of the target string s with the first character of the pattern string ss. If they are equal, go on to compare the following characters one by one;
otherwise, start again from the next character of s and compare it with the beginning of ss.
Continue in this way until every character of ss equals the corresponding character of a contiguous substring of s. The match then succeeds, and the position of the first character of ss within s is the position of ss in s. If no such substring exists, the match fails.
(In fact, you can simply use the find function here.)
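The brute-force idea above can be sketched as follows. This is only a minimal illustration: the function name brute_force_find and the convention of returning a 0-based index (or -1 when there is no match) are assumptions made for this sketch, not part of the original article.

```cpp
#include <cstddef>
#include <string>

// Brute-force search (illustrative helper, not from the original article).
// Returns the 0-based index of the first occurrence of `pattern` in `text`,
// or -1 if `pattern` does not occur.
int brute_force_find(const std::string &text, const std::string &pattern)
{
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();
    if (m > n)
        return -1;
    // Try every possible starting position in the text.
    for (std::size_t start = 0; start + m <= n; ++start)
    {
        std::size_t k = 0;
        // Compare character by character until a mismatch or a full match.
        while (k < m && text[start + k] == pattern[k])
            ++k;
        if (k == m)
            return static_cast<int>(start);
    }
    return -1;
}
```

For example, brute_force_find("BBC ABCDAB ABCDABCDABDE", "ABCDABD") returns 15, the same index that std::string::find reports for this input. In the worst case every starting position is compared almost to the end of the pattern, which is the inefficiency that KMP avoids.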
2. KMP
For example, given the string "BBC ABCDAB ABCDABCDABDE", how do we find out whether it contains another string, "ABCDABD"?
1.
First, compare the first character of the string "BBC ABCDAB ABCDABCDABDE" with the first character of the search term "ABCDABD". Because B does not match A, the search term is shifted one position to the right.
2.
Because B again does not match A, the search term is shifted one more position to the right.
3.
This continues until a character in the string is the same as the first character of the search term.
4.
Then compare the next character of the string with the next character of the search term; they match as well.
5.
This goes on until a character in the string differs from the corresponding character of the search term. At this point, the brute-force solution is to shift the whole search term one position to the right and compare again from its first character, as follows.
6.
This is very inefficient, because the "search position" has to be moved back to a position that has already been compared, and the comparison is repeated.
7.
A basic fact is that when the space does not match D, you already know that the first six characters are "ABCDAB". The idea of the KMP algorithm is to make use of this known information: instead of moving the "search position" back over positions that have already been compared, keep moving it forward, which improves efficiency.
How do you know how far to move?
To answer this, you compute a "partial match table" for the search term.
First, you need to understand two terms: prefix and suffix. "Prefix" refers to all the head combinations of a string that exclude its last character; "suffix" refers to all the tail combinations of a string that exclude its first character.
The "partial match value" is the length of the longest element common to the prefixes and the suffixes. Take "ABCDABD" as an example:
- The prefix and suffix of "A" are both empty sets, and the length of the common elements is 0;
- The prefix of "AB" is [A], the suffix is [B], and the length of the common elements is 0;
- The prefix of "ABC" is [A, AB], the suffix is [BC, C], and the length of the common elements is 0;
- The prefix of "ABCD" is [A, AB, ABC], the suffix is [BCD, CD, D], and the length of the common elements is 0;
- The prefix of "ABCDA" is [A, AB, ABC, ABCD], the suffix is [BCDA, CDA, DA, A], the common element is "A", and the length is 1;
- The prefix of "ABCDAB" is [A, AB, ABC, ABCD, ABCDA], the suffix is [BCDAB, CDAB, DAB, AB, B], the common element is "AB", and the length is 2;
- The prefix of "ABCDABD" is [A, AB, ABC, ABCD, ABCDA, ABCDAB], the suffix is [BCDABD, CDABD, DABD, ABD, BD, D], and the length of the common elements is 0.
The essence of "partial matching" is that the beginning and the end of a string sometimes repeat. For example, "ABCDAB" contains "AB" twice, so its "partial match value" is 2 (the length of "AB"). When the search term moves, shifting it 4 positions to the right (string length minus partial match value) brings the first "AB" to where the second "AB" was.
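The definition above can be turned into code quite literally. The sketch below compares prefixes and suffixes directly, exactly as in the list for "ABCDABD"; the function name partial_match_table is an assumption made for illustration. It is deliberately naive and much slower than the linear-time construction normally used with KMP, and is meant only to make the definition concrete.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Partial match table computed straight from the definition
// (illustrative helper; not the linear-time construction).
// table[i] is the length of the longest string that is both a prefix and a
// suffix of pattern[0..i], where neither may be the whole piece.
std::vector<int> partial_match_table(const std::string &pattern)
{
    std::vector<int> table(pattern.size(), 0);
    for (std::size_t i = 0; i < pattern.size(); ++i)
    {
        const std::size_t len = i + 1; // length of the piece pattern[0..i]
        // Try the longest candidate first and stop at the first hit.
        for (std::size_t k = len - 1; k > 0; --k)
        {
            // Compare the prefix of length k with the suffix of length k.
            if (pattern.compare(0, k, pattern, len - k, k) == 0)
            {
                table[i] = static_cast<int>(k);
                break;
            }
        }
    }
    return table;
}
```

For "ABCDABD" this yields 0, 0, 0, 0, 1, 2, 0, matching the values worked out by hand above.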
8.
When the space is found not to match D, the first six characters "ABCDAB" have already matched. The table shows that the "partial match value" for the last matched character, B, is 2, so the number of positions to shift is calculated with the following formula:
Number of shifts = number of matched characters - corresponding partial match value
Because 6 - 2 equals 4, the search term is moved 4 positions to the right.
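As a small sketch, the formula can be written as a function. The name shift_count and the choice to pass the table as a std::vector<int> are assumptions for illustration. The table is indexed by the last matched character, so with `matched` characters matched the lookup is table[matched - 1]; when nothing has matched, the search term simply moves one position, as in the first steps of the walkthrough.

```cpp
#include <vector>

// Number of positions to shift the search term after a mismatch
// (illustrative helper). `matched` is how many characters matched before the
// mismatch; `table` is the partial match table of the search term.
int shift_count(int matched, const std::vector<int> &table)
{
    if (matched == 0)
        return 1; // nothing matched yet: move one position
    // matched characters minus the partial match value of the last matched one
    return matched - table[matched - 1];
}
```

With the table 0, 0, 0, 0, 1, 2, 0 for "ABCDABD", shift_count(6, table) gives 4, which is the shift computed above.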
9.
Because the space does not match C, the search term must move on again. The number of matched characters is now 2 ("AB"), and the corresponding "partial match value" is 0. So the number of shifts = 2 - 0 = 2, and the search term is moved 2 positions to the right.
10.
Because the space does not match A, the search term is moved one more position to the right.
11.
Compare character by character until C is found not to match D. Then the number of shifts = 6 - 2 = 4, so the search term is moved another 4 positions to the right.
12.
Compare character by character until the last character of the search term matches; a complete match has been found and the search is done. To continue searching (that is, to find all matches), the number of shifts = 7 - 0 = 7, so the search term is moved 7 positions to the right and the process repeats; the details are omitted here.
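The walkthrough above can be replayed in code. The following sketch records every shift the search term makes until the first complete match. The names SearchTrace and trace_search are hypothetical, and the partial match table is passed in by the caller; here it is assumed to be 0, 0, 0, 0, 1, 2, 0 for "ABCDABD".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Result of replaying the search (illustrative type).
struct SearchTrace
{
    std::vector<int> shifts; // shift applied after each mismatch, in order
    int match_pos = -1;      // 0-based index of the first match, or -1
};

// Slide the search term over the text using the shift formula
// "matched characters - partial match value" (illustrative helper).
SearchTrace trace_search(const std::string &text, const std::string &pattern,
                         const std::vector<int> &table)
{
    SearchTrace trace;
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();
    if (m == 0)
        return trace;
    std::size_t pos = 0;     // where the search term currently starts in the text
    std::size_t matched = 0; // characters already known to match at pos
    while (pos + m <= n)
    {
        // Extend the match as far as possible without re-checking known characters.
        while (matched < m && text[pos + matched] == pattern[matched])
            ++matched;
        if (matched == m)
        {
            trace.match_pos = static_cast<int>(pos);
            return trace;
        }
        // Mismatch: keep the part that is still valid and shift by the rest.
        const std::size_t keep =
            matched == 0 ? 0 : static_cast<std::size_t>(table[matched - 1]);
        const std::size_t shift = matched == 0 ? 1 : matched - keep;
        trace.shifts.push_back(static_cast<int>(shift));
        pos += shift;
        matched = keep;
    }
    return trace;
}
```

For the text "BBC ABCDAB ABCDABCDABDE" and the search term "ABCDABD", the recorded shifts are 1, 1, 1, 1, 4, 2, 1, 4 and the match is found at index 15. The four initial single shifts bring the search term to the first A, and the remaining 4, 2, 1, 4 are the shifts of steps 8 to 11.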
The following is an implementation of the algorithm:
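#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Build the "next" table for the pattern str: the partial match table
// shifted right by one position, with the first entry set to -1.
// next[j] is the partial match value of the first j characters of str.
// The extra entry next[len] is used to continue after a complete match.
void cal_next(const string &str, vector<int> &next)
{
    const int len = str.size();
    next.assign(len + 1, 0); // size the table to the pattern, not to a fixed length
    next[0] = -1;
    int k = -1;
    int j = 0;
    while (j < len)
    {
        if (k == -1 || str[j] == str[k])
        {
            ++k;
            ++j;
            next[j] = k; // the first j characters have a common prefix/suffix of length k
        }
        else
            k = next[k]; // fall back to the next shorter candidate
    }
}

// Return the 0-based start index of every occurrence of str2 (the pattern)
// in str1 (the text). next is filled in by cal_next.
vector<int> KMP(const string &str1, const string &str2, vector<int> &next)
{
    vector<int> vec;
    if (str2.empty())
        return vec; // an empty pattern has no meaningful match position
    cal_next(str2, next);
    int i = 0; // i is the index into str1
    int j = 0; // j is the index into str2
    const int str1_size = str1.size();
    const int str2_size = str2.size();
    while (i < str1_size)
    {
        // If j == -1, or the current characters match (str1[i] == str2[j]),
        // advance both i and j. Note: the order of the two tests must not
        // be swapped, otherwise str2[-1] would be read.
        if (j == -1 || str1[i] == str2[j])
        {
            ++i;
            ++j;
        }
        else
            j = next[j]; // mismatch: resume comparing at str2[next[j]]; i stays where it is
        if (j == str2_size) // complete match
        {
            vec.push_back(i - j); // record where the complete match starts
            j = next[j];          // keep the reusable prefix so no later match is skipped
        }
    }
    return vec;
}

int main(int argc, char const *argv[])
{
    vector<int> next; // sized inside cal_next
    string str1;
    getline(cin, str1); // text; getline keeps spaces, e.g. "BBC ABCDAB ABCDABCDABDE"
    string str2;
    getline(cin, str2); // pattern
    const vector<int> matches = KMP(str1, str2, next);
    for (const int pos : matches)
        cout << pos + 1 << endl; // print 1-based positions
    return 0;
}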