KMP study notes (template)

Introduction

The Knuth-Morris-Pratt string search algorithm (KMP for short) is used for string pattern matching. The naive algorithm takes O(nm) time in the worst case, while KMP runs in O(n + m), where n is the length of the text string and m is the length of the pattern string.
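
For comparison, here is a minimal sketch of the naive algorithm that KMP improves on. The function name naive_find is an assumption made for illustration and is not part of the original notes.

```c
#include <string.h>

/* Returns the index of the first occurrence of pat in text, or -1 if none.
   Every start position is tried from scratch, so the worst case is O(n * m). */
int naive_find(const char *text, const char *pat) {
    int n = (int)strlen(text), m = (int)strlen(pat);
    for (int i = 0; i + m <= n; i++) {
        int j = 0;
        while (j < m && text[i + j] == pat[j]) j++;  /* compare at offset i */
        if (j == m) return i;                        /* whole pattern matched */
    }
    return -1;
}
```

After a mismatch the inner loop throws away everything it has learned about the text and starts again one position later. That repeated work is what the next array removes.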

Principle

KMP avoids the repeated comparisons of the naive algorithm by precomputing an auxiliary next array for the pattern string.

For i >= 1, next[i] is the length of the longest proper prefix of s[0:i-1] that is also a proper suffix of s[0:i-1] (ranges in these notes are inclusive). For example, for the pattern string s = "ABABCD" the next array is next = [-1 0 0 1 2 0 0]. next[0] is set to -1 as a sentinel, so that the matching loop cannot run forever when even the first character of the pattern fails to match.
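
To make the definition concrete, the sketch below computes the next array directly from it, without any of the KMP tricks. The name next_bruteforce and the caller-supplied buffer are assumptions for illustration. It is far too slow for real use and is only meant for checking small examples by hand.

```c
#include <string.h>

/* Fills next[0..m] for a pattern s of length m, straight from the definition:
   next[0] = -1, and for i >= 1 next[i] is the largest k < i such that the
   first k characters of s[0..i-1] equal its last k characters.
   The buffer next must have room for m + 1 ints. */
void next_bruteforce(const char *s, int *next) {
    int m = (int)strlen(s);
    next[0] = -1;
    for (int i = 1; i <= m; i++) {
        next[i] = 0;
        for (int k = i - 1; k > 0; k--) {          /* try the longest candidate first */
            if (memcmp(s, s + i - k, (size_t)k) == 0) {
                next[i] = k;
                break;
            }
        }
    }
}
```

For s = "ABABCD" this fills the buffer with -1 0 0 1 2 0 0, the array given above.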

Suppose the text string is t = "ABADDABABD". Starting from position 0, t[0:2] = "ABA" matches s[0:2], but t[3] = 'D' != s[3] = 'B'. The naive algorithm would now restart the comparison from position 1 of the text. KMP instead uses next[3] = 1: the longest proper prefix of s[0:2] that is also a suffix has length 1, that is, s[0:0] == s[2:2]. So position 3 of the text can be compared directly with position 1 of the pattern string, that is, position next[3].

How is the next array constructed? Assume next[0..i] has already been computed and let j = next[i]. If s[i] == s[j], then clearly next[i+1] = j + 1. Otherwise, the prefix s[0:next[j]-1] is also a suffix of s[0:i-1], so set j = next[j] and compare again: if s[i] == s[j], then next[i+1] = j + 1, otherwise repeat the process. When j reaches next[0] = -1, no matching prefix and suffix exists, and next[i+1] = 0.

Implementation

The template is easy to write:

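#define N 1000005 // assumed upper bound on pattern length + 1; adjust to the problem

// Compute the next array of the pattern t
int next[N];
void getnext(char *t) {
    int i = 0, j = -1;
    next[i] = j;
    while(t[i]) {
        if(j == -1 || t[i] == t[j]) {
            i++;
            j++;
            next[i] = j;
        } else {
            j = next[j]; // fall back to the next shorter prefix that is also a suffix
        }
    }
}
// Search for the pattern t in the text s; call getnext(t) first
void search(char *s, char *t) {
    int i = 0;
    int j = 0;
    while(s[i]) {
        if(j == -1 || s[i] == t[j]) {
            i++;
            j++;
            if(!t[j]) { // found: t occurs at s[i - j .. i - 1]
                // handle the match here
                j = next[j]; // keep searching; break here instead if one match is enough
            }
        } else {
            j = next[j];
        }
    }
}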
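
As a usage-oriented variant of the template above, the sketch below bundles both steps into one function that reports where the matches start. The name kmp_find_all, the out/cap parameters, the heap-allocated next array, and the handling of an empty pattern are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Stores the start index of each occurrence of pat in text into out
   (at most cap entries) and returns the total number of occurrences.
   Returns -1 if the next array cannot be allocated. */
int kmp_find_all(const char *text, const char *pat, int *out, int cap) {
    int n = (int)strlen(text), m = (int)strlen(pat);
    if (m == 0) return 0;                 /* assumption: empty pattern matches nothing */
    int *next = malloc((size_t)(m + 1) * sizeof *next);
    if (!next) return -1;

    /* Build next[0..m] with the same loop as getnext. */
    int i = 0, j = -1;
    next[0] = -1;
    while (i < m) {
        if (j == -1 || pat[i] == pat[j]) {
            i++;
            j++;
            next[i] = j;
        } else {
            j = next[j];
        }
    }

    /* Scan the text; i never moves backwards. */
    int count = 0;
    i = 0;
    j = 0;
    while (i < n) {
        if (j == -1 || text[i] == pat[j]) {
            i++;
            j++;
            if (j == m) {                 /* full match ending at i - 1 */
                if (count < cap) out[count] = i - m;
                count++;
                j = next[j];              /* keep looking; overlaps are allowed */
            }
        } else {
            j = next[j];
        }
    }
    free(next);
    return count;
}
```

For the text "ABABABCDABABCD" and the pattern "ABABCD" it reports two matches, starting at positions 2 and 8. For the text "ABADDABABD" from the example above it reports none.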

Applications

Besides pattern matching, the next array can also be used to find the repeating section (period) of a string, the longest prefix that is also a suffix, and so on. See the specific problems for details.
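
Finding the repeating section can be sketched as follows. It relies on the standard fact that a string of length m has smallest period m - next[m], where next uses the same convention as getnext above. The name smallest_period and the caller-supplied buffer are assumptions for illustration, and the string is assumed to be non-empty.

```c
#include <string.h>

/* Returns the smallest p >= 1 such that s[i] == s[i + p] for every valid i.
   The buffer next must have room for strlen(s) + 1 ints.
   Assumes s is non-empty. */
int smallest_period(const char *s, int *next) {
    int m = (int)strlen(s);
    int i = 0, j = -1;
    next[0] = -1;
    while (i < m) {                       /* same loop as getnext */
        if (j == -1 || s[i] == s[j]) {
            i++;
            j++;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
    return m - next[m];                   /* length minus the longest border */
}
```

The string is a whole number of copies of its repeating section only when m is divisible by the returned value. "ABABAB" has period 2 and 6 % 2 == 0, while "ABCAB" has period 3 and is not a whole repetition.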

Extended KMP

Let the text string be s (length n) and the pattern string be t (length m). Extended KMP computes the extend array, where extend[i] is the length of the longest common prefix of t and s[i:n-1].

Whenever extend[i] == m, the pattern occurs at position i of the text, which is exactly what KMP finds. That is why the algorithm is called extended KMP.

Computing the extend array requires a next array as an aid. Here next[i] has a different meaning from before: it is the length of the longest common prefix of t and t[i:m-1]. Assuming the next array has already been computed, the extend array is found as follows.

Let p be the rightmost boundary reached so far during matching (the first position not yet matched), and let a be the position that reached it, i.e. extend[a] = p - a. Assume extend[0..i-1] has already been computed. If i >= p, nothing is known about position i and the match starts from scratch at i. Otherwise i lies within [a, p), so s[i:p-1] equals t[i-a:p-a-1] and position i of s corresponds to position i-a of t, for which next[i-a] is known. There are two cases:

  1. If i + next[i-a] is less than p, then extend[i] = next[i-a];
  2. If i + next[i-a] is greater than or equal to p, the first p-i characters are already known to match, so keep extending from t[p-i], comparing s[p] with t[p-i]. Note that the index is p-i, not p-a, because it is a prefix of t that is being extended. Then update p and a.

Computing the next array is just computing the extend array of t against itself.

The template is easy to write:

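#include <string.h>
#define N 1000005 // assumed upper bound on the string lengths; adjust to the problem

int extend[N], next[N];
// next[i] = length of the longest common prefix of t and t[i..len-1]
void getnext(char *t) {
    int a = 0, p = 0; // t[a..p-1] is the match that reaches furthest to the right
    int len = strlen(t);
    next[a] = len;
    for(int i = 1; i < len; i++) {
        if(i >= p || i + next[i - a] >= p) {
            if(i >= p) p = i;
            while(p < len && t[p] == t[p - i]) p++;
            next[i] = p - i;
            a = i;
        } else {
            next[i] = next[i - a];
        }
    }
}

// extend[i] = length of the longest common prefix of t and s[i..n-1]; call getnext(t) first
void search(char *s, char *t) {
    int a = 0, p = 0;
    int n = strlen(s);
    int m = strlen(t);
    for(int i = 0; i < n; i++) {
        if(i >= p || i + next[i - a] >= p) { // i >= p is needed when, for example, S and T share no characters at all
            if(i >= p) p = i;
            while(p < n && p - i < m && s[p] == t[p - i]) p++;
            extend[i] = p - i;
            a = i;
        } else {
            extend[i] = next[i - a];
        }
    }
}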
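
The same two loops can also be written without global arrays, which makes them easier to check on a small input. In the sketch below the names z_array and extend_array and the caller-supplied buffers are assumptions for illustration. The array called z here plays the role of next in the template above.

```c
#include <string.h>

/* z[i] = length of the longest common prefix of t and t[i..m-1].
   The buffer z must have room for strlen(t) ints. */
void z_array(const char *t, int *z) {
    int m = (int)strlen(t);
    int a = 0, p = 0;                     /* t[a..p-1] is the rightmost matched window */
    if (m > 0) z[0] = m;
    for (int i = 1; i < m; i++) {
        if (i >= p || i + z[i - a] >= p) {
            if (i >= p) p = i;
            while (p < m && t[p] == t[p - i]) p++;
            z[i] = p - i;
            a = i;
        } else {
            z[i] = z[i - a];              /* answer lies strictly inside the window */
        }
    }
}

/* extend[i] = length of the longest common prefix of t and s[i..n-1].
   z must come from z_array(t, z); extend must have room for strlen(s) ints. */
void extend_array(const char *s, const char *t, const int *z, int *extend) {
    int n = (int)strlen(s), m = (int)strlen(t);
    int a = 0, p = 0;
    for (int i = 0; i < n; i++) {
        if (i >= p || i + z[i - a] >= p) {
            if (i >= p) p = i;
            while (p < n && p - i < m && s[p] == t[p - i]) p++;
            extend[i] = p - i;
            a = i;
        } else {
            extend[i] = z[i - a];
        }
    }
}
```

With s = "aaaabaa" and t = "aaaaa", z_array gives 5 4 3 2 1 and extend_array gives 4 3 2 1 0 2 1. No entry of extend equals 5, so t does not occur in s.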

Related problems

kuangbin KMP problem set

Extended KMP:
Clairewd’s message
Period II
Count the string

Reference

https://oi-wiki.org/string/kmp/

Origin www.cnblogs.com/limil/p/12695071.html