Algorithm case analysis: string pattern matching algorithms

I came across a high-quality blog post online and want to share it with you. It is reprinted from the well-known blogger "Little Gray Ape"; if you like it, please support the original.

Today I will share pattern matching algorithms for strings. Among the string operations studied in data structures, locating a substring within a string is usually called string pattern matching, and it is one of the most important operations in string processing. The substring being searched for is also called the pattern string. Two matching algorithms are commonly used for a main string and a pattern string: the naive pattern matching algorithm and the KMP algorithm (an improved pattern matching algorithm). The two are analyzed separately below.

1. Naive pattern matching algorithm

The naive pattern matching algorithm is also known as the brute-force (BF) algorithm. Its basic idea is to compare the first character of the main string with the first character of the pattern string. If they are equal, the following characters of both strings are compared one by one; otherwise, matching restarts by comparing the second character of the main string with the first character of the pattern string. This continues until every character of the pattern string has matched a run of consecutive characters in the main string, which is called a successful match. If no content matching the pattern string can be found in the main string, it is called a match failure.

Next is an example that stores the strings in character arrays and implements the naive pattern matching algorithm.

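#include <string.h>

// Takes the main string s and the pattern string t, plus the position pos
// (0-based) in the main string at which matching starts.
// Named Index here because some C libraries already declare index() in <string.h>.
int Index(char s[], char t[], int pos) {
	int i, j, slen, tlen;
	i = pos;
	j = 0;
	slen = (int)strlen(s);	// length of the main string
	tlen = (int)strlen(t);	// length of the pattern string
	while (i < slen && j < tlen) {
		if (s[i] == t[j]) {	// characters match: advance both pointers
			i++;
			j++;
		}
		else {	// mismatch: move i back to the next start position, restart the pattern
			i = i - j + 1;
			j = 0;
		}
	}
	if (j >= tlen) {
		return i - tlen;	// success: index in s where the match starts
	}
	return -1;	// failure: the pattern does not occur
}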
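
To make the cost of moving i back visible, here is a minimal sketch of the same naive scan that also counts character comparisons. The function name `naive_index_count` and its `comparisons` out-parameter are assumptions made for illustration; they are not part of the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post): naive matching that
   also reports how many character comparisons were performed. */
int naive_index_count(const char *s, const char *t, int pos, int *comparisons)
{
    assert(s != NULL && t != NULL && comparisons != NULL && pos >= 0);
    int slen = (int)strlen(s);
    int tlen = (int)strlen(t);
    int i = pos, j = 0;
    *comparisons = 0;
    while (i < slen && j < tlen) {
        ++*comparisons;              /* one comparison of s[i] with t[j] */
        if (s[i] == t[j]) {
            ++i;
            ++j;
        } else {
            i = i - j + 1;           /* back up to the next start position */
            j = 0;
        }
    }
    return (j >= tlen) ? i - tlen : -1;
}
```

With the main string "aaaaab" and the pattern "aab", this sketch returns 3 after 12 comparisons, because every mismatch on 'b' sends i back and re-reads characters that were already matched.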

2. KMP algorithm (improved pattern matching algorithm)

The KMP algorithm improves on the naive algorithm. Unlike the naive pattern matching algorithm, when a mismatch occurs while matching the main string against the pattern string, the KMP algorithm does not need to move the main string's character position pointer back. Instead, it uses the "partial match" result already obtained to "slide" the pattern string as far to the right as possible before continuing the comparison.

Suppose the pattern string is "P0…P(m-1)". The idea of the KMP matching algorithm is: when the character Pj in the pattern string is not equal to the corresponding character Si in the main string, the first j characters ("P0…P(j-1)") have already matched successfully. So if "P0…P(k-1)" in the pattern string is the same as "P(j-k)…P(j-1)", then Pk can be compared with Si next, and i does not need to roll back.
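
The condition in the paragraph above (a prefix of the pattern equal to the part of the pattern ending just before Pj) can be checked directly by brute force. The sketch below is only an illustration of that condition, not the way KMP computes it; the name `longest_border` is an assumption and does not appear in the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post): returns the largest k
   with 0 < k < j such that P0..P(k-1) equals P(j-k)..P(j-1), or 0 when no
   such k exists. Tries the longest candidate first. */
int longest_border(const char *p, int j)
{
    assert(p != NULL && j >= 0 && (size_t)j <= strlen(p));
    for (int k = j - 1; k > 0; --k) {
        /* compare the first k characters with the k characters before Pj */
        if (memcmp(p, p + (j - k), (size_t)k) == 0) {
            return k;
        }
    }
    return 0;
}
```

For the pattern "abaabcac" and j = 5, the already matched part is "abaab"; its prefix "ab" equals its suffix "ab", so the sketch returns 2 and P2 is the next pattern character to compare with Si.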

In the KMP algorithm, the sliding of the pattern string is driven by the next function values of the pattern string. If next[j] = k, then when Pj in the pattern string is not equal to the corresponding character in the main string, the character Pnext[j] (that is, Pk) of the pattern string is compared with that same character of the main string.

The definition of the next function is as follows:

[Image: definition of the next function]
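
The figure itself is not reproduced in this copy. A definition commonly given for patterns indexed from 0, and consistent with the j == -1 checks used by the code in this post, can be written as follows; treat it as an assumption rather than a transcription of the original figure.

```latex
\[
\mathit{next}[j] =
\begin{cases}
-1, & j = 0 \\[4pt]
\max\{\, k \mid 0 < k < j,\ P_0 \cdots P_{k-1} = P_{j-k} \cdots P_{j-1} \,\}, & \text{if such a } k \text{ exists} \\[4pt]
0, & \text{otherwise}
\end{cases}
\]
```

Under this definition the pattern "abaabcac" has the next values -1, 0, 0, 1, 1, 2, 0, 1 for j = 0 through 7.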

The following is the procedure for finding the next function of the pattern string:

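#include <string.h>

// Compute the next function values of the pattern string p and store them in the array next.
// next must have room for at least strlen(p) entries.
void Get_next(char *p, int next[])
{
	int i, j, slen;
	slen = (int)strlen(p);	// length of the pattern string
	if (slen == 0) {
		return;	// empty pattern: nothing to fill in
	}
	i = 0;
	j = -1;	// j must start at -1 (it was left uninitialized before)
	next[0] = -1;
	while (i < slen - 1) {	// stop at slen-1 so next[slen] is never written
		if (j == -1 || p[i] == p[j]) {
			++i;
			++j;
			next[i] = j;
		} else {
			j = next[j];	// fall back to a shorter matching prefix
		}
	}
}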

After obtaining the next function, and taking the subscript of the first character of the pattern string to be 0, the KMP pattern matching algorithm is as follows:

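#include <string.h>

/* Using the next function of the pattern string p, find the position of p in the main string s, searching from index pos (0-based) */
/* On success, return the index in the main string where the match starts; otherwise return -1 */
int Index_KMP(char *s, char *p, int pos, int next[])
{
	int i, j, slen, plen;
	i = pos;	// first character of s to compare
	j = 0;	// first character of p to compare
	slen = (int)strlen(s);	// length of the main string
	plen = (int)strlen(p);	// length of the pattern string
	while (i < slen && j < plen) {
		if (j == -1 || s[i] == p[j]) {
			++i;
			++j;
		} else {
			j = next[j];	// slide the pattern to the right; i does not move back
		}
	}
	if (j >= plen) {
		return i - plen;
	}
	else {
		return -1;
	}
}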
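
To check the claim that i never rolls back, here is a minimal self-contained sketch that builds the next table internally and counts character comparisons during the scan. The name `kmp_index_count`, the `comparisons` out-parameter, and the choice to return -1 on allocation failure are all assumptions made for illustration, not part of the original post.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): KMP search that builds
   its own next table and counts comparisons of s[i] with p[j]. */
int kmp_index_count(const char *s, const char *p, int pos, int *comparisons)
{
    assert(s != NULL && p != NULL && comparisons != NULL && pos >= 0);
    int slen = (int)strlen(s);
    int plen = (int)strlen(p);
    *comparisons = 0;
    if (plen == 0) {
        return (pos <= slen) ? pos : -1;   /* empty pattern matches at pos */
    }

    int *next = (int *)malloc((size_t)plen * sizeof *next);
    if (next == NULL) {
        return -1;                         /* sketch only: treated as "not found" */
    }

    /* Build next[0..plen-1] using the next[0] = -1 convention. */
    int i = 0, j = -1;
    next[0] = -1;
    while (i < plen - 1) {
        if (j == -1 || p[i] == p[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }

    /* Scan the main string; i only ever increases. */
    i = pos;
    j = 0;
    while (i < slen && j < plen) {
        if (j == -1) {                     /* no prefix fits: move to next s[i] */
            ++i;
            ++j;
        } else {
            ++*comparisons;
            if (s[i] == p[j]) {
                ++i;
                ++j;
            } else {
                j = next[j];               /* slide the pattern, keep i */
            }
        }
    }
    free(next);
    return (j >= plen) ? i - plen : -1;
}
```

With the main string "aaaaab" and the pattern "aab", this sketch returns 3 after 9 comparisons, whereas a naive scan that restarts after each mismatch makes 12 on the same input.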

That is all I will share about string pattern matching algorithms here. If there are any shortcomings, I hope everyone will point them out.

Origin blog.csdn.net/weixin_45820444/article/details/108544901