Notes on learning data structures: computing the next array in the KMP algorithm

1. Algorithm Analysis

This is the algorithm for computing the next array used by the KMP algorithm, taken from page 110 of the 2022 Wangdao data structure textbook. The book does not explain the principle in detail, and the senior known as "Salted Fish" also said that this is one of the more obscure algorithms in the entire course, which prompted the author to think it through.

void get_next(String T,int next[]){
    int i=1, j=0;
    next[1]=0;
    while(i<T.length){
        if(j==0||T.ch[i]==T.ch[j]){
            ++i; ++j;
            next[i]=j;  // if p_i == p_j, then next[j+1] = next[j] + 1
        }
        else
            j=next[j];  // otherwise set j = next[j] and continue the loop
    }
}

Solution principle:

1. Let Z be the character whose next value is wanted and Y the character just before it. Suppose the maximum matching length before Y is m, so the matched prefix is T[1]-T[m], and let X = T[m+1]. If X == Y, the maximum matching length before Z is m+1, so next[Z] = m+2;

2. If X != Y, fall back to a shorter match. Let n be the maximum matching length before X, so the prefix T[1]-T[n] (string ①) equals the n characters just before X (string ②). Because T[1]-T[m] equals the m characters just before Y, string ② also equals the n characters just before Y (string ③), and therefore string ① == string ③. Let ξ = T[n+1]. If ξ == Y, the maximum matching length before Z is n+1, so next[Z] = n+2; otherwise repeat the same fallback with an even shorter match, until a match is found or j reaches 0.

This corresponds to the while loop in the code above:

    while(i<T.length){
        if(j==0||T.ch[i]==T.ch[j]){
            ++i; ++j;
            next[i]=j;  // if p_i == p_j, then next[j+1] = next[j] + 1
        }
        else
            j=next[j];  // otherwise set j = next[j] and continue the loop
    }
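
To make the two cases concrete, here is a minimal sketch of the same loop in standard C++. The helper name next_one_based and the use of std::string padded with one leading character (so that subscripts start at 1, as in the textbook) are assumptions made for illustration and are not part of the textbook code.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the 1-indexed next array for pattern p.
// Entry 0 of the returned vector is unused so that subscripts match the textbook.
std::vector<int> next_one_based(const std::string& p) {
    const int len = static_cast<int>(p.size());
    const std::string t = " " + p;       // t[1..len] holds the pattern
    std::vector<int> next(len + 1, 0);   // next[1] = 0 by definition
    int i = 1, j = 0;
    while (i < len) {
        if (j == 0 || t[i] == t[j]) {
            ++i; ++j;
            next[i] = j;                 // the longest match before position i has length j - 1
        } else {
            j = next[j];                 // fall back to the next shorter match
        }
    }
    return next;
}
```

For the pattern "abaabcac" this yields next[1..8] = 0 1 1 2 2 3 1 2. The entry next[7] shows the fallback at work: Y is the 'c' at position 6 with m = 2, X = T[3] = 'a' does not equal Y, so the loop falls back to n = 0, where ξ = T[1] = 'a' also fails; j then reaches 0 and next[7] becomes 1.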

2. KMP Template

The template uses i = 0 and j = -1 because strings in the program start at subscript 0, whereas the Wangdao textbook assumes that the first character has subscript 1.

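#include <cstring>
#include <string>
using namespace std;

// next[] must have room for T.length() + 1 entries.
void get_next(string T, int next[]) {
    int len = T.length();
    // With multiple test cases, reset next[] each time.
    // sizeof(next) would only give the size of a pointer here, so the byte count is computed explicitly.
    memset(next, 0, (len + 1) * sizeof(int));
    int i = 0, j = -1;
    next[0] = -1;
    while (i < len) {
        if (j == -1 || T[i] == T[j]) {
            ++i; ++j;
            next[i] = j;  // if p_i == p_j, then next[j+1] = next[j] + 1
        }
        else
            j = next[j];  // otherwise set j = next[j] and continue the loop
    }
}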

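int Index_KMP(string S, string T, int next[]) // S is the text string, T is the pattern string
{
    int i = 0, j = 0;
    // Signed lengths: comparing j == -1 against the unsigned length() would end the loop early.
    int n = S.length(), m = T.length();
    while (i < n && j < m) {
        if (j == -1 || S[i] == T[j]) {
            ++i; ++j;
        }
        else
            j = next[j];
    }
    // The result is checked only after the loop has finished.
    if (j == m) // match succeeded
        return i - m;
    else // match failed; 0 is a valid index when subscripts start at 0, so return -1
        return -1;
}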
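
As a usage illustration, the following is a self-contained sketch of the same template built on std::vector. The names build_next and kmp_find are hypothetical, and returning -1 when no match exists is a convention assumed here because 0 is a valid match position with 0-based subscripts.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: 0-indexed next array with t.size() + 1 entries.
std::vector<int> build_next(const std::string& t) {
    const int m = static_cast<int>(t.size());
    std::vector<int> next(m + 1, 0);
    next[0] = -1;
    int i = 0, j = -1;
    while (i < m) {
        if (j == -1 || t[i] == t[j]) {
            ++i; ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
    return next;
}

// Hypothetical helper: index of the first occurrence of t in s, or -1 if there is none.
int kmp_find(const std::string& s, const std::string& t) {
    const std::vector<int> next = build_next(t);
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(t.size());
    int i = 0, j = 0;
    while (i < n && j < m) {
        if (j == -1 || s[i] == t[j]) {
            ++i; ++j;        // characters agree (or j was reset): advance both
        } else {
            j = next[j];     // mismatch: shift the pattern, i never moves back
        }
    }
    return j == m ? i - m : -1;
}
```

For example, kmp_find("ababcabcacbab", "abcac") gives 5, and a pattern that does not occur gives -1.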

Note that Index_KMP() returns the index at which the pattern string first matches. If the problem instead asks for the number of occurrences of the pattern in the string, the condition j < T.length() must be removed from the while loop and the case j == T.length() handled inside the loop (count the match, then continue with j = next[j]); adjust the details to the actual problem.
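
The counting variant can be sketched as follows. This is an illustration under stated assumptions rather than the textbook's code: the name kmp_count is hypothetical, overlapping occurrences are counted, and an empty pattern is defined to occur 0 times.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: number of (possibly overlapping) occurrences of t in s.
int kmp_count(const std::string& s, const std::string& t) {
    if (t.empty()) return 0;             // convention assumed for this sketch
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(t.size());

    // 0-indexed next array with m + 1 entries; next[m] is needed after a full match.
    std::vector<int> next(m + 1, 0);
    next[0] = -1;
    for (int i = 0, j = -1; i < m; ) {
        if (j == -1 || t[i] == t[j]) {
            ++i; ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }

    int count = 0;
    int i = 0, j = 0;
    while (i < n) {                      // no "j < m" condition here
        if (j == -1 || s[i] == t[j]) {
            ++i; ++j;
            if (j == m) {                // full match ending at s[i - 1]
                ++count;
                j = next[m];             // keep the longest reusable prefix
            }
        } else {
            j = next[j];
        }
    }
    return count;
}
```

With this sketch, kmp_count("aaaa", "aa") is 3, since overlapping matches at positions 0, 1 and 2 are all counted.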

In addition, get_next() can be optimized into get_nextval(), which speeds up string matching. For details, see section 4.2.3, "Further optimization of the KMP algorithm", on page 111 of the 2022 Wangdao textbook.

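// nextval[] must have room for T.length() + 1 entries.
void get_nextval(string T, int nextval[]) {
    int len = T.length();
    // With multiple test cases, reset nextval[] each time.
    memset(nextval, 0, (len + 1) * sizeof(int));
    int i = 0, j = -1;
    nextval[0] = -1;
    while (i < len) {
        if (j == -1 || T[i] == T[j]) {
            ++i; ++j;
            if (T[i] != T[j]) nextval[i] = j;  // characters differ: jumping to j can help
            else nextval[i] = nextval[j];      // same character would fail again: reuse nextval[j]
        }
        else
            j = nextval[j];
    }
}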
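
To see what the optimization changes, here is a small self-contained sketch that returns nextval as a std::vector. The name build_nextval is hypothetical, and this version computes entries only for the pattern positions 0 to length - 1, which is an assumption made to keep every subscript inside the pattern.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: 0-indexed nextval array with one entry per pattern character.
std::vector<int> build_nextval(const std::string& t) {
    const int m = static_cast<int>(t.size());
    if (m == 0) return {};
    std::vector<int> nextval(m, 0);
    nextval[0] = -1;
    int i = 0, j = -1;
    while (i < m - 1) {
        if (j == -1 || t[i] == t[j]) {
            ++i; ++j;
            if (t[i] != t[j]) {
                nextval[i] = j;            // a different character: the jump to j is useful
            } else {
                nextval[i] = nextval[j];   // the same character would mismatch again: skip it
            }
        } else {
            j = nextval[j];
        }
    }
    return nextval;
}
```

For the pattern "aaaab", the plain next values for positions 0 to 4 are -1 0 1 2 3, while nextval is -1 -1 -1 -1 3. A mismatch on any of the leading 'a' characters therefore skips all the comparisons against earlier 'a' characters that would fail anyway.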

Origin blog.csdn.net/qq_21891843/article/details/123871649