Pattern matching KMP algorithm study notes

Pattern matching KMP algorithm

String: BBC ABCDAB ABCDABCDABDE

Search term: ABCDABD

1.

The first character of the string, B, does not match the first character of the search term, A, so the search term is shifted one position to the right.

2.


B again does not match A, so the search term moves one more position to the right.

3.

The search term keeps moving right one position at a time until a character in the string matches the first character of the search term.

4.

The next character of the string is then compared with the next character of the search term, and they match as well.

5.

The comparison continues until a character in the string differs from the corresponding character of the search term.

6.

At this point, the most natural reaction is to shift the whole search term one position to the right and compare again from its first character. This works, but it is very inefficient, because the "search position" has to go back to characters that have already been compared and the comparisons are repeated.
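The brute-force approach described above can be sketched as follows. This is a minimal illustration in Python, not code from the original notes; the function name `naive_search` is hypothetical.

```python
def naive_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    # Try every possible alignment of the search term against the string.
    for start in range(n - m + 1):
        matched = 0
        # Compare from the first character of the search term every time,
        # even if these string characters were already examined earlier.
        while matched < m and text[start + matched] == pattern[matched]:
            matched += 1
        if matched == m:
            return start
    return -1


print(naive_search("BBC ABCDAB ABCDABCDABDE", "ABCDABD"))  # prints 15
```

In the worst case every alignment re-reads up to len(pattern) characters, which is the repeated work that KMP avoids.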

7.

A basic fact is that when the space fails to match D, you already know that the preceding six characters are "ABCDAB". The idea of the KMP algorithm is to make use of this known information: instead of moving the "search position" back to characters that have already been compared, it only ever moves the position forward, which improves efficiency.

8.

For the search term, a "Partial Match Table" is computed. How this table is generated is explained later; for now it is enough to know how to use it.

9.

When the space fails to match D, the first six characters "ABCDAB" have already been matched. The table shows that the "partial match value" of the last matched character, B, is 2, so the number of positions to shift is given by the following formula:

  Number of shifts = number of matched characters - corresponding partial match value

Here the partial match value is the length of the longest element common to the "prefixes" and "suffixes" of the matched part of the search term. Because 6 - 2 = 4, the search term is shifted 4 positions to the right.
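The formula can be checked against the walkthrough with a few lines of Python. This is a minimal sketch: the function name `shift_after_mismatch` is hypothetical, and the table values are the partial match values for "ABCDABD" that these notes derive in the section on the Partial Match Table. Treating a mismatch with zero matched characters as a shift of 1 is an assumption that agrees with the steps where the search term moves a single position.

```python
from typing import List

# Partial match values for "ABCDABD", one per character (taken from the notes).
PARTIAL_MATCH = [0, 0, 0, 0, 1, 2, 0]


def shift_after_mismatch(matched: int, table: List[int]) -> int:
    """Number of positions to shift once `matched` characters have matched."""
    if matched == 0:
        # Nothing matched: assume the search term just moves one position.
        return 1
    # Look up the value of the last matched character.
    return matched - table[matched - 1]


print(shift_after_mismatch(6, PARTIAL_MATCH))  # prints 4
print(shift_after_mismatch(2, PARTIAL_MATCH))  # prints 2
```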

10.

Because the space does not match C, the search term must keep moving right. The number of matched characters is now 2 ("AB"), and the corresponding "partial match value" is 0. The number of shifts is therefore 2 - 0 = 2, so the search term is shifted 2 positions to the right.

11.

Because the space does not match A, no characters are matched, and the search term simply moves one position to the right.

12.

The comparison proceeds character by character until C fails to match D. Six characters are matched and the partial match value is 2, so the number of shifts is 6 - 2 = 4, and the search term moves another 4 positions to the right.

13.

The comparison proceeds character by character up to the last character of the search term. A complete match is found and the search is finished. To keep searching (that is, to find all matches), the number of shifts is 7 - 0 = 7, so the search term moves 7 positions to the right and the process repeats; the details are not repeated here.
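The whole walkthrough can be condensed into a short Python sketch. It is an illustration under stated assumptions rather than the original author's code: the names `build_partial_match_table` and `kmp_find_all` are hypothetical, and instead of physically shifting the search term the code lowers the count of matched characters to the partial match value, which is equivalent to shifting by (matched characters - partial match value).

```python
from typing import List


def build_partial_match_table(pattern: str) -> List[int]:
    """table[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0  # length of the prefix currently being extended
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]  # fall back to a shorter candidate prefix
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find_all(text: str, pattern: str) -> List[int]:
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    table = build_partial_match_table(pattern)
    matches = []
    matched = 0  # number of search-term characters matched so far
    for i, ch in enumerate(text):  # the search position never moves back
        while matched > 0 and ch != pattern[matched]:
            matched = table[matched - 1]
        if ch == pattern[matched]:
            matched += 1
        if matched == len(pattern):
            matches.append(i - len(pattern) + 1)
            matched = table[matched - 1]  # keep going to find later matches
    return matches


print(kmp_find_all("BBC ABCDAB ABCDABCDABDE", "ABCDABD"))  # prints [15]
```

Each character of the string is read once by the outer loop, so the search position only ever advances, as the notes describe.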

14.

The following describes how the "Partial Match Table" is generated.

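First, two concepts must be understood: "prefix" and "suffix". The "prefixes" of a string are all of its leading substrings except the string itself, that is, every head combination that leaves off at least the last character. The "suffixes" are all of its trailing substrings except the string itself, that is, every tail combination that leaves off at least the first character.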
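As a small illustration of these two definitions, the following Python sketch lists the prefixes and suffixes of a string. The helper names `proper_prefixes` and `proper_suffixes` are hypothetical and chosen only for this example.

```python
from typing import List


def proper_prefixes(s: str) -> List[str]:
    """All leading substrings of s except s itself."""
    return [s[:i] for i in range(1, len(s))]


def proper_suffixes(s: str) -> List[str]:
    """All trailing substrings of s except s itself."""
    return [s[i:] for i in range(1, len(s))]


print(proper_prefixes("ABCDA"))  # prints ['A', 'AB', 'ABC', 'ABCD']
print(proper_suffixes("ABCDA"))  # prints ['BCDA', 'CDA', 'DA', 'A']
```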

15.

The "partial match value" is the length of the longest element common to the set of "prefixes" and the set of "suffixes". Take "ABCDABD" as an example:

  String      Prefixes                             Suffixes                             Length of longest common element

  "A"         (empty set)                          (empty set)                          0
  "AB"        [A]                                  [B]                                  0
  "ABC"       [A, AB]                              [BC, C]                              0
  "ABCD"      [A, AB, ABC]                         [BCD, CD, D]                         0
  "ABCDA"     [A, AB, ABC, ABCD]                   [BCDA, CDA, DA, A]                   1
  "ABCDAB"    [A, AB, ABC, ABCD, ABCDA]            [BCDAB, CDAB, DAB, AB, B]            2
  "ABCDABD"   [A, AB, ABC, ABCD, ABCDA, ABCDAB]    [BCDABD, CDABD, DABD, ABD, BD, D]    0
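The table can be reproduced directly from the definition. The sketch below deliberately uses the slow, literal approach (enumerate prefixes and suffixes, then take the longest common element) so that it mirrors the rows above; the function names are hypothetical, and practical implementations build the table in linear time instead.

```python
from typing import List


def partial_match_value(s: str) -> int:
    """Length of the longest string that is both a prefix and a suffix of s,
    excluding s itself."""
    prefixes = {s[:i] for i in range(1, len(s))}
    suffixes = [s[i:] for i in range(1, len(s))]
    common_lengths = [len(x) for x in suffixes if x in prefixes]
    return max(common_lengths, default=0)


def partial_match_table(pattern: str) -> List[int]:
    """One partial match value per leading substring of the search term."""
    return [partial_match_value(pattern[:i]) for i in range(1, len(pattern) + 1)]


print(partial_match_table("ABCDABD"))  # prints [0, 0, 0, 0, 1, 2, 0]
```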

16.

The essence of "partial matching" is that the beginning and the end of a string sometimes repeat. For example, "ABCDAB" contains "AB" twice, at its start and at its end, so its "partial match value" is 2 (the length of "AB"). When the search term is shifted 4 positions to the right (string length - partial match value), the first "AB" lands where the second "AB" used to be.

17. Next array

The next array is derived from the table of "maximum length values" (the partial match values): shift the whole table one position to the right and set the first entry to -1. For "ABCDABD", the partial match values 0 0 0 0 1 2 0 become the next array -1 0 0 0 0 1 2.

Number of shifts = position of the mismatched character (subscripts start from 0) - next value of the mismatched character
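The relationship between the two tables, and the shift formula based on the next array, can be sketched in Python as follows. The function names are hypothetical, and the input table is the list of partial match values for "ABCDABD" from these notes.

```python
from typing import List


def next_from_partial_match(table: List[int]) -> List[int]:
    """Shift the partial match values one position right; first entry is -1."""
    return [-1] + table[:-1]


def shift_from_next(j: int, next_array: List[int]) -> int:
    """Number of shifts when the mismatch occurs at search-term index j."""
    return j - next_array[j]


next_array = next_from_partial_match([0, 0, 0, 0, 1, 2, 0])
print(next_array)                      # prints [-1, 0, 0, 0, 0, 1, 2]
print(shift_from_next(6, next_array))  # mismatch at D: prints 4
print(shift_from_next(0, next_array))  # mismatch at the first A: prints 1
```

The results agree with the earlier formula: a mismatch at index 6 means 6 characters matched, and 6 - 2 = 4.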

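18. Next array optimized version

Let k be the unoptimized next value of position j. If the search-term characters at positions j and k are equal, a mismatch at j would be followed immediately by the same mismatch at k, so let next[j] = next[k]. The lines below list j, k, and next[k] for the cases in which the characters are equal.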

j=3,k=0,next[0]=-1

j=5,k=1,next[1]=0

j=7,k=1,next[1]=0

j=8, k=2, next[2]=0
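A common way to build the optimized next array is sketched below in Python. This is an illustrative version of the rule "if equal, let next[j] = next[k]", not necessarily the exact code of the reprinted article; the function name `optimized_next` is hypothetical.

```python
from typing import List


def optimized_next(pattern: str) -> List[int]:
    """Build the optimized next array for the search term."""
    m = len(pattern)
    if m == 0:
        return []
    nxt = [-1] * m
    j, k = 0, -1  # k tracks the length of the matching prefix before j
    while j < m - 1:
        if k == -1 or pattern[j] == pattern[k]:
            j += 1
            k += 1
            if pattern[j] != pattern[k]:
                nxt[j] = k
            else:
                # Same character at j and k: falling back to k would fail
                # in the same way, so reuse the fallback already stored for k.
                nxt[j] = nxt[k]
        else:
            k = nxt[k]
    return nxt


print(optimized_next("ABAB"))     # prints [-1, 0, -1, 0]
print(optimized_next("ABCDABD"))  # prints [-1, 0, 0, 0, -1, 0, 2]
```

For "ABCDABD" the unoptimized next array is -1 0 0 0 0 1 2; the optimization changes positions 4 and 5, where the characters A and B repeat the start of the search term.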
——————————————
Reprint address: https://blog.csdn.net/v_JULY_v/article/details/7041827

 


Origin blog.csdn.net/qq_30507287/article/details/105219189