(Content to be completed)
Key Points
String pattern matching, also known as substring location or substring matching, is the operation of finding where a substring occurs within a string. In matching, the main string is called the target (string) and the substring is called the pattern (string).
BF method (Brute Force): a simple, straightforward string matching method.
KMP method (Knuth-Morris-Pratt): uses the results of the partial match already obtained to slide the pattern as far forward as possible.
Both are string pattern matching methods.
Simple pattern matching algorithm (BF algorithm)
Illustration (figures not reproduced): comparison rounds 1 and 2, intermediate rounds omitted (same principle), then rounds 5 and 6.
- Round 1: compare the first character of the pattern with the first character of the main string.
- If they are equal, go on to compare the second character of the main string with the second character of the pattern, and so on.
- If they are not equal, start the second round.
- Round 2: compare the first character of the pattern with the second character of the main string, and continue as in round 1.
- Round N: keep comparing in this way until the whole pattern matches, or until the main string runs out without a match.
Code:
(omitted)
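Advantage of the BF algorithm: the idea is simple and direct. Drawback: whenever a character mismatch occurs, the main-string pointer must go back to just after the starting position of the current round, so the time overhead is large. The worst-case time complexity is O((n - m + 1) * m), where n is the length of the main string and m is the length of the pattern.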
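The BF steps above can be sketched in C as follows. This is a minimal sketch, not the author's omitted code; the function name `bf_match` and the convention of returning -1 when there is no match are assumptions made for illustration.

```c
#include <string.h>

/* bf_match: hypothetical name, illustration only.
 * Brute-force (BF) matching: return the index of the first occurrence of
 * pattern t in main string s, or -1 if t does not occur in s. */
int bf_match(const char *s, const char *t)
{
    int n = (int)strlen(s);
    int m = (int)strlen(t);
    int i = 0, j = 0;             /* i scans s, j scans t */

    while (i < n && j < m) {
        if (s[i] == t[j]) {       /* characters equal: compare the next pair */
            ++i;
            ++j;
        } else {                  /* mismatch: start the next round */
            i = i - j + 1;        /* back to one past this round's start in s */
            j = 0;                /* back to the first character of t */
        }
    }
    return j == m ? i - m : -1;   /* whole pattern matched, or not found */
}
```

For example, `bf_match("ababcabcacbab", "abcac")` returns 5. The line `i = i - j + 1` is the backtracking of the main-string pointer that the drawback above refers to.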
KMP pattern matching algorithm
Illustration:
From the figure, we can easily see that the characters already compared in S and T contain identical elements, so there is no need to move S back to position S[1], nor to move T back to position T[0]. We can skip over the elements known to be identical and directly compare S[8] with T[3].
So we build a next array that stores, for each position of the pattern, the position to fall back to after a mismatch.
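Idea of the KMP algorithm: suppose that during matching we are checking whether T[i] (main string) matches W[j] (pattern). If T[i] = W[j], continue by checking whether T[i + 1] matches W[j + 1]. Otherwise, i stays where it is, j falls back to next[j], and T[i] is compared with W[j] again; when j reaches -1 (the convention next[0] = -1 used in the code below), both i and j advance by one.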
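A minimal C sketch of the KMP search loop described above, assuming the next-array convention used in this article (next[0] = -1). The helper names `get_next` and `kmp_match` are hypothetical; `get_next` repeats the loop from the next-array code so that the block is self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* get_next: hypothetical helper. Fills next[0..m-1] for pattern w with the
 * convention next[0] = -1 (same loop as the next-array code in this article). */
static void get_next(const char *w, int m, int *next)
{
    int i = 0, j = -1;
    next[0] = -1;
    while (i < m - 1) {
        if (j == -1 || w[i] == w[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
}

/* kmp_match: hypothetical name. Index of the first occurrence of pattern w
 * in main string t, or -1 if there is none. i (in t) never moves backwards. */
int kmp_match(const char *t, const char *w)
{
    int n = (int)strlen(t);
    int m = (int)strlen(w);
    if (m == 0)
        return 0;                        /* empty pattern matches at 0 */
    int *next = malloc(m * sizeof *next);
    if (next == NULL)
        return -1;                       /* allocation failure: "not found" in this sketch */
    get_next(w, m, next);

    int i = 0, j = 0;
    while (i < n && j < m) {
        if (j == -1 || t[i] == w[j]) {   /* match, or fell back past w[0] */
            ++i;
            ++j;
        } else {
            j = next[j];                 /* slide the pattern; i stays put */
        }
    }
    free(next);
    return j == m ? i - m : -1;
}
```

Because only j falls back through next[] while i never decreases, the search runs in O(n + m) time including the preprocessing.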
Two ways to compute the next array
(1) The first method: derive each next value from the next value of the previous character
Initialization: next[0] = -1, i = 0, j = -1.
Code:
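#include <stdio.h>
#include <string.h>

int main(void)
{
    char t[] = "ababaabab";
    int len = strlen(t);

    int i = 0, j = -1;
    int next[len];
    next[0] = -1;                       /* sentinel: nothing before t[0] */
    while (i < len - 1) {
        /* j == -1: fell back past the start; t[i] == t[j]: prefix extends */
        if (j == -1 || t[i] == t[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];                /* mismatch: fall back along next[] */
        }
    }

    for (i = 0; i < len; i++)
        printf("next[%d]->%d\n", i, next[i]);
    return 0;
}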
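(2) The second method: compute from the maximal common element length (the length of the longest proper prefix that is also a suffix)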
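A sketch of the second method as a hypothetical C function `next_from_pmt` (the names `next_from_pmt` and `pmt` are assumptions, not from the original). For each prefix of the pattern it finds the maximal common element length by brute force, which is easy to follow but slower than the first method, then shifts these values right by one position and sets next[0] = -1. For "ababaabab" the lengths are 0 0 1 2 3 1 2 3 4, giving the same next array as the first method: -1 0 0 1 2 3 1 2 3.

```c
#include <string.h>

/* next_from_pmt: hypothetical name, illustration only.
 * pmt[k] = length of the longest proper prefix of w[0..k] that is also a
 * suffix of w[0..k] (maximal common element length).
 * next[] is pmt shifted right by one position, with next[0] = -1. */
void next_from_pmt(const char *w, int m, int *pmt, int *next)
{
    int k;
    for (k = 0; k < m; k++) {
        int len;
        pmt[k] = 0;
        /* try the longest candidate first; brute force, for clarity only */
        for (len = k; len > 0; len--) {
            /* suffix of length len of w[0..k] starts at k + 1 - len */
            if (strncmp(w, w + k + 1 - len, (size_t)len) == 0) {
                pmt[k] = len;
                break;
            }
        }
    }
    if (m > 0)
        next[0] = -1;
    for (k = 1; k < m; k++)
        next[k] = pmt[k - 1];
}
```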
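Optimizing the next array (computing nextval)
When the pattern contains several consecutive repeated elements, for example main string S = "aaabcde" and pattern T = "aaaaax", keeping the main-string pointer fixed while moving the pattern pointer back to compare these values does a lot of useless work: the first 5 elements of the pattern are all the same 'a', so we can skip these repeated steps.
nextval is an improvement of next.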
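A minimal sketch of computing nextval in C, following the same loop structure as the next-array code above; the function name `get_nextval` is hypothetical. The only change is what gets stored: if the character at the fallback position equals the current character, comparing it against the main string would fail for the same reason, so the entry inherits the nextval of the fallback position instead.

```c
/* get_nextval: hypothetical name, illustration only.
 * Fills nextval[0..m-1] for pattern w, with nextval[0] = -1. */
void get_nextval(const char *w, int m, int *nextval)
{
    int i = 0, j = -1;
    if (m <= 0)
        return;
    nextval[0] = -1;
    while (i < m - 1) {
        if (j == -1 || w[i] == w[j]) {
            ++i;
            ++j;
            if (w[i] != w[j])
                nextval[i] = j;            /* same value as next[i] */
            else
                nextval[i] = nextval[j];   /* skip a comparison doomed to fail */
        } else {
            j = nextval[j];
        }
    }
}
```

For the pattern "aaaaax" from the example, next is -1 0 1 2 3 4 while nextval is -1 -1 -1 -1 -1 4, so a mismatch on any of the leading a's moves straight on to the next main-string character instead of retrying each earlier 'a'.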