KMP String Matching Algorithm (Part 1)

Disclaimer: This is an original article by the blogger, licensed under the CC 4.0 BY-SA agreement. If you repost it, please include a link to the original source and this statement.
Original link: https://blog.csdn.net/shipsail/article/details/102617736

KMP

The text (the string to be searched)

0 1 2 3 4 5 6 7 8 9
a b a a c a b a b c

The pattern (the string to search for)

0 1 2 3 4
a b a b c

Prefix Table

string P = “a b a b c”

Prefixes of P (one row each in the prefix table):
a
a b
a b a
a b a b
a b a b c

Role of the prefix table: when a match fails partway through, we have to back up, and the prefix table tells us where to go. Suppose matching reaches ababx, where x != c, so an error occurs at x. The prefix table determines the optimal backtrack point from the string abab that precedes x.

Maximum length of a common prefix and suffix

(Only proper prefixes and suffixes count, i.e. the whole string itself is excluded.)

Prefix of P   Proper prefixes     Proper suffixes     Max length
a             -                   -                   0
ab            a                   b                   0
aba           a, ab               a, ba               1
abab          a, ab, aba          b, ab, bab          2
ababc         a, ab, aba, abab    c, bc, abc, babc    0
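The Max length column can be checked with a brute-force sketch. This is my own illustration, not the author's code; the name prefix_table_bruteforce is hypothetical, and the list[int] annotation assumes Python 3.9+. It tries every proper prefix length exactly as in the table, so it takes roughly O(m^3) time for a pattern of length m and is only meant for checking small examples.

```python
def prefix_table_bruteforce(p: str) -> list[int]:
    """For each prefix p[:k], return the length of the longest proper
    prefix that is also a suffix (the 'Max length' column above)."""
    table = []
    for k in range(1, len(p) + 1):
        s = p[:k]
        best = 0
        # Proper prefixes/suffixes have lengths 1 .. len(s) - 1.
        for length in range(1, len(s)):
            if s[:length] == s[-length:]:
                best = length
        table.append(best)
    return table


print(prefix_table_bruteforce("ababc"))  # [0, 0, 1, 2, 0]
```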

Next Array

During matching, if all of ababc has been matched, the match has succeeded, so that case can be ignored. When a mismatch occurs at x in ababx, what we need is the maximum length of a common prefix and suffix of abab, the part before x. So the prefix-table values are shifted one position to the right (the last value is dropped), and the first element is set to -1:

Index        -1   0   1   2   3   4
Character     -   a   b   a   b   c
next          -  -1   0   0   1   2

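Index -1 is an imaginary position that does not exist. next[0] = -1 means there is no prefix to fall back on: the text position moves forward by one and matching restarts from P[0].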
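Shifting the prefix table into the next array fits in one line. A minimal sketch; next_from_prefix_table is a hypothetical helper name, and it assumes a non-empty pattern.

```python
def next_from_prefix_table(table: list[int]) -> list[int]:
    """Shift the prefix table one position to the right: drop the last
    value (only needed once the whole pattern has matched) and put -1
    in front, so next[j] describes the substring *before* index j.
    Assumes table comes from a non-empty pattern."""
    return [-1] + table[:-1]


print(next_from_prefix_table([0, 0, 1, 2, 0]))  # [-1, 0, 0, 1, 2]
```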

Why can we jump directly to next[j]?

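The next table can be understood as follows: next[j] is the length of the longest common prefix and suffix of the substring before index j, that is, P[0..j-1]. Because indices start from 0, this length is also the index of the element right after that prefix. So on a mismatch at P[j] we can jump directly to next[j]: the suffix has already been matched against the text, and since prefix == suffix, the prefix does not need to be matched again.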
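The jump is safe only if the prefix of length next[j] really equals the suffix of P[0..j-1], and if no longer such prefix exists (otherwise the jump could skip a real match). Both conditions can be checked mechanically; check_next is a hypothetical name used only for this sketch.

```python
def check_next(p: str, nxt: list[int]) -> bool:
    """Check the two properties the jump relies on, for every j >= 1:
    the first nxt[j] characters of p[:j] equal its last nxt[j] characters,
    and no longer proper prefix of p[:j] has that property."""
    for j in range(1, len(p)):
        k = nxt[j]
        matched = p[:j]
        if matched[:k] != matched[j - k:]:
            return False  # prefix != suffix: the jump would assume unchecked text
        if any(matched[:length] == matched[j - length:] for length in range(k + 1, j)):
            return False  # a longer border exists: jumping to k could skip a match
    return True


print(check_next("ababc", [-1, 0, 0, 1, 2]))  # True
```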

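For example, when matching

a b a a c a b a b c
a b a b
      ^ mismatch: text[3] = a, P[3] = b

the error happens at b (index 3). The part matched before it is aba, whose longest common prefix and suffix is a, with length 1. Because indices start from 0, 1 is also the index of the element right after that prefix, so next[3] = 1. The suffix a of aba has already been matched against the text, and since the prefix equals the suffix, the prefix a does not need to be matched again: we simply align the element after the prefix (P[1]) with the position of the error.
That is:

a b a a c a b a b c
    a b a b c
      ^ next, compare text[3] with P[1]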
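To follow the example above step by step, here is a hedged sketch of the matching loop that prints every jump. kmp_trace is a hypothetical name; it takes the next array from the table above and returns the index of the first match, or -1. Its first printed line is the jump just described (j falls back from 3 to 1).

```python
def kmp_trace(text: str, p: str, nxt: list[int]) -> int:
    """KMP search that prints every jump; returns the first match index or -1."""
    i = j = 0  # i: position in text, j: position in pattern
    while i < len(text) and j < len(p):
        # j == -1 is the imaginary position: advance in the text, restart at P[0].
        if j == -1 or text[i] == p[j]:
            i += 1
            j += 1
        else:
            print(f"mismatch text[{i}]={text[i]!r} vs p[{j}]={p[j]!r}: j -> {nxt[j]}")
            j = nxt[j]  # i never moves backwards
    return i - j if j == len(p) else -1


print(kmp_trace("abaacababc", "ababc", [-1, 0, 0, 1, 2]))
# mismatch text[3]='a' vs p[3]='b': j -> 1
# mismatch text[3]='a' vs p[1]='b': j -> 0
# mismatch text[4]='c' vs p[1]='b': j -> 0
# mismatch text[4]='c' vs p[0]='a': j -> -1
# 5
```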

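Summary

  1. Compute next. For each position where a mismatch can occur during matching, next gives the position right after the longest common prefix and suffix of the part already matched. Aligning that position with the element where the error occurred means the prefix is never compared again, which is what makes the backtracking efficient.
  2. Iterate through the text; on a mismatch, fall back to next[j] instead of backing up in the text.
  3. This kind of algorithm is hard to explain in words; drawing a diagram makes it much easier to understand, sigh~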
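The two steps above can be sketched in Python as follows. This is my own sketch, not the author's code; build_next and kmp_search are hypothetical names. Note that build_next does not go through the brute-force prefix table: it computes next directly in O(m) by applying the same fall-back to the pattern itself, a standard alternative to the table-based construction described above. kmp_search follows Python's str.find convention (first index, -1 if absent, 0 for an empty pattern).

```python
def build_next(p: str) -> list[int]:
    """Build the next array directly in O(len(p)). Assumes p is non-empty."""
    nxt = [-1] * len(p)
    k, j = -1, 0  # k: length of the longest common prefix/suffix of p[:j]
    while j < len(p) - 1:
        if k == -1 or p[j] == p[k]:
            j += 1
            k += 1
            nxt[j] = k
        else:
            k = nxt[k]  # fall back exactly like a mismatch during search
    return nxt


def kmp_search(text: str, p: str) -> int:
    """Return the index of the first occurrence of p in text, or -1."""
    if not p:
        return 0
    nxt = build_next(p)
    i = j = 0  # i only moves forward; j jumps via nxt on a mismatch
    while i < len(text) and j < len(p):
        if j == -1 or text[i] == p[j]:
            i += 1
            j += 1
        else:
            j = nxt[j]
    return i - j if j == len(p) else -1


print(build_next("ababc"))                 # [-1, 0, 0, 1, 2]
print(kmp_search("abaacababc", "ababc"))   # 5
```

The text index i never decreases, so the search costs O(n + m) for a text of length n.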

Code

It's too late now, so I'll grind out the code tomorrow.
