Explaining the KMP algorithm, starting from the BF algorithm

Contents
I. Introduction to KMP
II. Example: matching a substring against a main string
1. BF algorithm solution
III. Implementing the KMP algorithm
(1) Why do we need KMP when we already have BF?
(2) The basic idea of the algorithm
(3) Implementation


I. Introduction to KMP

KMP is an improved string matching algorithm. It improves on the BF (brute-force) algorithm: BF searches by trying one position after another and backtracking after every failure, and KMP cuts down that backtracking, which significantly reduces the time complexity. It is well suited to matching a substring against a main string.

II. Example: matching a substring against a main string
Main string: abaacababcac
Substring: ababc

Task: match the substring against the main string and find the position where it matches.

1. BF algorithm solution

Keywords: one-by-one matching, brute-force search

Step 1

Main string: abaacababcac
Substring:   ababc
Result: mismatch at position 4 of the substring

Step 2

Main string: abaacababcac
Substring:    ababc
Result: mismatch at position 1 of the substring

Step 3

Main string: abaacababcac
Substring:     ababc
Result: mismatch at position 2 of the substring

Step 4

Main string: abaacababcac
Substring:      ababc
Result: mismatch at position 2 of the substring

Step 5

Main string: abaacababcac
Substring:       ababc
Result: mismatch at position 1 of the substring

Step 6

Main string: abaacababcac
Substring:        ababc
Result: full match, return position 6 (counting from 1)

That is the BF matching process: the substring moves forward one position at a time, and the comparison starts over at each position.
(Illustration of the algorithm; note the positions the pointers go back to.)
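
The six steps above can be reproduced mechanically. The following is a minimal sketch, not the code used later in this article; the function name bf_trace is made up for illustration. For each alignment it records the position in the substring (counting from 1) where the first mismatch occurs, and records 0 for the alignment that matches completely.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: for each alignment of pat against text, record the
// 1-based position in pat of the first mismatch, or 0 for a full match.
// The trace stops after the first full match.
std::vector<int> bf_trace(const std::string& text, const std::string& pat) {
    std::vector<int> fails;
    for (std::size_t start = 0; start + pat.size() <= text.size(); ++start) {
        std::size_t j = 0;
        // Advance while the characters agree.
        while (j < pat.size() && text[start + j] == pat[j]) ++j;
        if (j == pat.size()) {      // every character matched
            fails.push_back(0);
            break;
        }
        fails.push_back(static_cast<int>(j) + 1);
    }
    return fails;
}
```

Calling bf_trace("abaacababcac", "ababc") yields 4, 1, 2, 2, 1, 0, which corresponds to steps 1 through 6.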

i goes back to the position right after the start of the current attempt

j goes back to position 0 of the substring

Reasoning: j equals the number of characters matched so far, so the current attempt started at i - j, and the next attempt starts one position later, at i - j + 1

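#include <string>

// Brute-force search on std::string. Returns the position of the first
// match counting from 1, or -1 if SonStr does not occur in MotherStr.
int BFstring(std::string MotherStr, std::string SonStr){
    int i = 0, j = 0; // i indexes the main string, j the substring
    int m = MotherStr.size(), n = SonStr.size();
    while(i < m && j < n){
        if(MotherStr[i] == SonStr[j]){
            i++, j++;
        }
        else{
            i = i - j + 1; // backtrack: next start position
            j = 0;
        }
    }
    if(j == n){
        return i - j + 1; // start of the match, counting from 1
    }
    return -1;
}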

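// Brute-force search on C strings. Returns the position of the first
// match counting from 1, or -1 if there is no match.
int BFchar(char MotherStr[], char SonStr[]){
    int i = 0; // pointer into the main string
    int j = 0; // pointer into the substring
    while (MotherStr[i] != '\0' && SonStr[j] != '\0') // neither has reached its end
    {
        if (MotherStr[i] == SonStr[j]) // equal: advance both pointers
        {
            i++;
            j++;
        }
        else
        {
            i = i - j + 1; // backtrack
            j = 0;
        }
    }
    if (SonStr[j] == '\0')
    {
        // the substring pointer reached '\0', so the match is complete;
        // j now equals the substring length
        return i - j + 1;
    }
    return -1;
}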

III. Implementing the KMP algorithm
(1) Why do we need KMP when we already have BF?

Consider the following example:

a a a a a a a a a a a a a a a a a a a b
a a a a b

With BF matching, every attempt discovers the mismatch only at the last position of the substring, so every attempt costs the maximum number of comparisons. This is the worst case for BF: for a main string of length m and a substring of length n, the work is on the order of m × n comparisons.
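
To make the cost concrete, here is a small sketch that counts the character comparisons a brute-force search performs before it finds the first match. The function name bf_comparisons is hypothetical and introduced only for this illustration.

```cpp
#include <string>

// Hypothetical helper: counts the character comparisons made by a
// brute-force search that stops at the first full match.
long long bf_comparisons(const std::string& text, const std::string& pat) {
    long long count = 0;
    for (std::size_t start = 0; start + pat.size() <= text.size(); ++start) {
        std::size_t j = 0;
        while (j < pat.size()) {
            ++count;                              // one comparison
            if (text[start + j] != pat[j]) break; // mismatch ends this attempt
            ++j;
        }
        if (j == pat.size()) break;               // full match found
    }
    return count;
}
```

Assuming the example line is 19 'a' characters followed by 'b', the call bf_comparisons(std::string(19, 'a') + "b", "aaaab") tries 16 alignments of 5 comparisons each, 80 in total. The earlier example (abaacababcac, ababc) needs only 15, because most attempts there fail on the first or second character.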

Inventors of the algorithm: Knuth, Morris, and Pratt (hence the name KMP)

(2) The basic idea of the algorithm

When a mismatch occurs, we already know part of the text, because those characters matched the pattern before the mismatch. We can use this information to avoid moving the pointer back over characters that are already known.

(3) Implementation

We use the same example again:

Main string: abaacababcac
Substring: ababc

Prefix table

For each prefix of the substring, find the longest prefix that is also a suffix. This is the longest common prefix and suffix; the whole string itself does not count.

a: -1 (for the first character the value is defined as the special value -1, since the string itself does not count)
a b: 0
a b a: 1
a b a b: 2
a b a b c: 0

This gives the table of longest common prefixes and suffixes:

Substring:  a  b  a  b  c
Table:     -1  0  1  2  0
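
The table can be checked directly against its definition. The sketch below is deliberately naive and is not the linear-time construction used in the implementation later; the names longest_border and prefix_table_by_definition are made up for illustration. It returns 0 for the first entry, so the special value -1 used above for the first character is a convention applied on top of it.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: length of the longest proper prefix of s that is
// also a suffix of s (the whole string itself does not count).
int longest_border(const std::string& s) {
    for (std::size_t len = s.empty() ? 0 : s.size() - 1; len > 0; --len) {
        // Compare the first len characters with the last len characters.
        if (s.compare(0, len, s, s.size() - len, len) == 0)
            return static_cast<int>(len);
    }
    return 0;
}

// One table entry per prefix of pat: pat[0..0], pat[0..1], and so on.
std::vector<int> prefix_table_by_definition(const std::string& pat) {
    std::vector<int> table;
    for (std::size_t i = 1; i <= pat.size(); ++i)
        table.push_back(longest_border(pat.substr(0, i)));
    return table;
}
```

For "ababc" this gives 0, 0, 1, 2, 0. The naive check costs far more than the linear-time construction, so it is only useful for verifying small tables by hand.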

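Now look again at i and j from the BF code. On a mismatch, i no longer has to go back to i - j + 1. Instead, j falls back to the value stored in the next array, which shortens the distance we backtrack. In the code below, the next array is the prefix table shifted right by one position with -1 placed at the front; for ababc that is -1 0 0 1 2.

(The shorter the backtracking distance, the more time is saved.)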

#include <cstdio>
#include <cstring>
#include <iostream>
using namespace std;

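// Fills prefix[i] with the length of the longest proper prefix of
// pattern[0..i] that is also a suffix of it. Requires n >= 1.
void prefix_table(char pattern[], int prefix[], int n){
    prefix[0] = 0;
    int len = 0; // length of the current longest common prefix/suffix
    int i = 1;
    while(i < n){
        if(pattern[i] == pattern[len]){
            len++;
            prefix[i] = len;
            i++;
        }
        else{
            if(len > 0)
                len = prefix[len - 1]; // fall back to a shorter candidate
            else
                prefix[i] = len, i++;  // no common prefix/suffix ends here
        }
    }
}
// Shifts the table right by one position and puts -1 at the front.
void move_prefix_table(int prefix[], int n){
    for(int i = n - 1; i > 0; i--){
        prefix[i] = prefix[i - 1];
    }
    prefix[0] = -1;
}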
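// Prints the 0-based start index of every occurrence of SonStr in MotherStr.
void kmp_search(char MotherStr[], char SonStr[]){
    int n = strlen(SonStr);    // length of the substring (pattern)
    int m = strlen(MotherStr); // length of the main string (text)
    if (n == 0) return;        // nothing to search for
    int *prefix = new int[n];
    prefix_table(SonStr, prefix, n);
    move_prefix_table(prefix, n);
    int i = 0, j = 0;          // i indexes MotherStr, j indexes SonStr
    while(i < m){
        if (j == n - 1 && MotherStr[i] == SonStr[j]){
            printf("Found pattern at %d\n", i - j);
            j = prefix[j];     // fall back and keep searching
            if (j == -1){      // single-character pattern
                i++, j++;
                continue;
            }
        }
        if (MotherStr[i] == SonStr[j]){
            i++, j++;
        }
        else {
            j = prefix[j];     // backtrack j only
            if(j == -1){
                // special case: mismatch at the first pattern character
                i++, j++;
            }
        }
    }
    delete[] prefix;
}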
int main(){
    char pattern[] = "ababcabaa";
    int prefix[9];
    int n = 9; // length of pattern
    prefix_table(pattern, prefix, n);
    move_prefix_table(prefix, n);
    std::cout << "prefix table:" << '\n';
    for(int i = 0; i < n; i++){
        // print the table to check that it is correct
        std::cout << prefix[i] << '\n';
    }
    char text[] = "abababcabaabababab";
    kmp_search(text, pattern); // the pattern occurs at index 2
    return 0;
}
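
For comparison, here is a compact variant that returns the match positions instead of printing them, which makes it easier to test. It is a sketch under stated assumptions and not the code above: it uses std::string and std::vector, keeps the prefix table unshifted (falling back to lps[j - 1] instead of using a -1 sentinel), and the names build_lps and kmp_find_all are made up for illustration.

```cpp
#include <string>
#include <vector>

// lps[i] = length of the longest proper prefix of p[0..i] that is also
// a suffix of p[0..i] (the unshifted prefix table).
std::vector<int> build_lps(const std::string& p) {
    std::vector<int> lps(p.size(), 0);
    int len = 0;
    std::size_t i = 1;
    while (i < p.size()) {
        if (p[i] == p[len]) {
            ++len;
            lps[i] = len;
            ++i;
        } else if (len > 0) {
            len = lps[len - 1];   // try a shorter candidate
        } else {
            lps[i] = 0;
            ++i;
        }
    }
    return lps;
}

// Returns the 0-based start index of every occurrence of p in t.
std::vector<int> kmp_find_all(const std::string& t, const std::string& p) {
    std::vector<int> hits;
    if (p.empty()) return hits;   // assumption: empty pattern yields no hits
    const std::vector<int> lps = build_lps(p);
    std::size_t j = 0;            // number of pattern characters matched
    for (std::size_t i = 0; i < t.size(); ++i) {
        while (j > 0 && t[i] != p[j]) j = lps[j - 1]; // fall back, i never moves back
        if (t[i] == p[j]) ++j;
        if (j == p.size()) {
            hits.push_back(static_cast<int>(i + 1 - j));
            j = lps[j - 1];       // continue, allowing overlapping matches
        }
    }
    return hits;
}
```

With the first example, kmp_find_all("abaacababcac", "ababc") returns the single index 5, which is position 6 when counting from 1, the same answer the BF walkthrough reached. The text pointer i only moves forward here, which is exactly the saving over BF described in (2).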

Origin blog.csdn.net/qq_43382350/article/details/102168930