Detailed explanation of KMP principle - different codes


Today, the blogger wrote the KMP algorithm again. This algorithm is genuinely hard to understand, because understanding it essentially requires working through a proof. Yet many people who explain it only give a rough sketch, and they may not really understand the algorithm themselves.

For the KMP algorithm, let me first talk about where it comes from.

What problem does this algorithm solve?
Suppose we have a pattern string such as abcabcef
and a text (the "total string") such as abcabdfaabcabcefefasdfa.

We want to determine whether the text contains the pattern abcabcef and, if so, at which position.

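The simplest method is brute-force enumeration: try every starting position in the text and compare character by character. Its time cost is often very large, O(nm) comparisons in the worst case for a text of length n and a pattern of length m.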
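For reference, here is a minimal sketch of that brute-force search in C. The function name naive_search is hypothetical (it is not from the original post); it returns the 0-based index of the first occurrence, or -1 if there is none.

```c
#include <string.h>

/* Brute-force search: try every start position in the text and compare
 * character by character. Returns the 0-based index of the first occurrence
 * of pattern in text, or -1 if there is none.
 * Worst case about n*m comparisons (n = text length, m = pattern length). */
int naive_search(const char *text, const char *pattern) {
    size_t n = strlen(text);
    size_t m = strlen(pattern);
    size_t start, k;

    if (m > n) {
        return -1;
    }
    for (start = 0; start + m <= n; start++) {
        for (k = 0; k < m && text[start + k] == pattern[k]; k++) {
            /* keep comparing while the characters agree */
        }
        if (k == m) {
            return (int)start; /* every pattern character matched */
        }
    }
    return -1;
}
```

For example, naive_search("abcabdfaabcabcefefasdfa", "abcabcef") returns 8. After every failure it restarts one position later and forgets everything it has already compared, which is exactly the waste KMP removes.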

So we want a faster method. Look at what happens in the first attempt:
abcabdfaabcabcefefasdfa
abcabcef

The match fails at d. At this point the obvious move is to start over from the beginning of the pattern, aligned with the next character of the text, b. However, restarting this way wastes some steps: we can actually start directly from the second a and skip the positions in between, because b and c obviously differ from the pattern's first character a.
In other words, we want a way to simplify the matching.
Let me walk through what happens after abcab has matched and the next character fails.
First, we have to check whether bcab in the text matches abca at the start of the pattern. Isn't that right?
So the next step is:
abcabdfaabcabcefefasdfa
 abcabcef
There is an obvious mismatch (the text's b against the pattern's a), so the next step is:
abcabdfaabcabcefefasdfa
  abcabcef
Still no match (c against a). The next step is:
abcabdfaabcabcefefasdfa
   abcabcef
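Now ab lines up with ab, and the comparison can resume from the pattern's third character. In other words, after every mismatch we slide the pattern forward along the text one character at a time until the characters already matched agree with the beginning of the pattern.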
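That one-position-at-a-time sliding can be written down directly. In this sketch (next_shift is a hypothetical helper, not part of the original code), q is the number of pattern characters that matched before the mismatch. Because those text characters are the pattern's own first q characters, the function only needs the pattern to find the smallest shift at which they still agree with the start of the pattern.

```c
#include <string.h>

/* q = number of pattern characters that matched before the mismatch.
 * The text characters just before the mismatch are pattern[0..q-1], so a
 * shift s survives only if pattern[s..q-1] equals pattern[0..q-1-s].
 * Returns the smallest surviving shift (q if none survives, 1 if q == 0). */
int next_shift(const char *pattern, int q) {
    int s;
    if (q == 0) {
        return 1; /* nothing matched: just move on by one position */
    }
    for (s = 1; s < q; s++) {
        if (strncmp(pattern + s, pattern, (size_t)(q - s)) == 0) {
            return s;
        }
    }
    return q; /* no overlap survives: restart at the mismatched text character */
}
```

For the walk-through above, next_shift("abcabcef", 5) returns 3: shifts 1 and 2 fail, and shift 3 leaves q - 3 = 2 characters (ab) already matched. Trying each shift still costs a loop of comparisons per mismatch.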
But do we really have to slide it forward again and again, one position at a time?
What can we do instead?

Notice that after each mismatch we keep moving the starting position in the text forward by one character and matching the pattern again. Where does this stop? At the first position from which the text characters we have already matched agree with a prefix of the pattern.
We can maintain an array that tells us to jump to that position directly.
For example, in the attempt below the match fails at d, and we can restart directly from the fourth position of the text, where ab is already known to match, so the comparison resumes at the pattern's c:
abcabdfaabcabcefefasdfa
   abcabcef
So what are we actually doing?

Suppose the mismatch happens at the pattern's first character a: we simply move forward one position in the text.
If the mismatch happens at the second character b, we must check whether a matches the current text character.

If the mismatch happens at the third character, what should we do? First, we check whether the first character matches the previous text character; if it does not, we check whether a matches the current character.
That is, whenever we cannot continue, we look for the longest prefix of the pattern that matches the text just before the current position.

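Therefore, the problem to solve is: when the match fails at some position, find the longest prefix of the pattern that still matches the text ending just before that position. Since the text characters just before the failure are the pattern's own characters, this is the longest proper prefix of the matched part that is also a suffix of it, and it depends only on the pattern.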
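A deliberately slow, brute-force sketch of that table makes the definition concrete. It assumes the common convention next[0] = -1 and next[i] = length of the longest proper prefix of the first i pattern characters that is also their suffix. The names longest_border and naive_next are hypothetical; the running time is roughly cubic, so this is only for checking small cases.

```c
#include <string.h>

/* Length of the longest proper prefix of pattern[0..len-1] that is also a
 * suffix of it. Brute force: try the longest candidate first. */
static int longest_border(const char *pattern, int len) {
    int k;
    for (k = len - 1; k > 0; k--) {
        if (strncmp(pattern, pattern + (len - k), (size_t)k) == 0) {
            return k;
        }
    }
    return 0;
}

/* Fill next[0..m-1] (m = strlen(pattern)): next[0] = -1, and for i >= 1
 * next[i] is the longest border of the i characters matched before a
 * mismatch at position i, i.e. the pattern index to compare next. */
void naive_next(const char *pattern, int *next) {
    int m = (int)strlen(pattern);
    int i;
    if (m == 0) {
        return; /* nothing to fill */
    }
    next[0] = -1;
    for (i = 1; i < m; i++) {
        next[i] = longest_border(pattern, i);
    }
}
```

For abcabcef it gives -1 0 0 0 1 2 3 0. A mismatch at index 5 (the c facing the d) jumps to index 2, so the pattern now starts at text position 5 - 2 = 3, the fourth position, as in the example above.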
Therefore, the kmp algorithm builds its table as follows:
1. Take the pattern string as input and compute a jump array kmpindex: kmpindex[i] tells us which pattern character to compare next when the match fails at position i.
2. Keep the length of the current longest matching prefix in jmax, initially jmax = 0.
3. Traverse the pattern:
3.1 For the first position: if the first character fails to match, there is nowhere to jump, so kmpindex[0] is negative. It is conventionally set to -1; other values are possible, but they make the code more awkward.
3.2 For the second position: if the second character fails to match, we would next compare the current text character with the first character. If the second character equals the first, that comparison is bound to fail as well, so it can be skipped directly: kmpindex[1] = kmpindex[0]. Otherwise kmpindex[1] = 0.
3.3 For every position from the third onward: if the match fails there, we want the next longest prefix that could still match, i.e., the longest proper prefix of the characters before it that is also their suffix.

Here is the code. It builds kmpindex for a sample pattern and prints it:

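#include <stdio.h>
#include <string.h>

/*
 * Build the jump table kmpindex for the pattern subs[0..n-1].
 * kmpindex[i] is the pattern index to compare next when subs[i] fails to match:
 *   kmpindex[0] = -1 (nowhere to jump; move on in the text instead);
 *   kmpindex[1] = -1 if subs[1] == subs[0] (retrying subs[0] would fail too), else 0;
 *   kmpindex[i], i >= 2, = length of the longest proper prefix of subs[0..i-1]
 *                          that is also a suffix of it.
 */
void kmp(const char *subs, int *kmpindex, int n) {
    int i;
    int jmax = 0; /* length of the current longest matching prefix */

    for (i = 0; i < n; i++) {
        if (jmax == -1) {
            /* fell back past the start: no prefix can be extended, border length is 0 */
            kmpindex[i] = 0;
            jmax = 0;
            continue;
        }

        if (i == 0) {
            kmpindex[i] = -1;
        } else if (i == 1) {
            kmpindex[i] = (subs[1] == subs[0]) ? -1 : 0;
        } else if (subs[jmax] == subs[i - 1]) {
            /* the matching prefix grows by one character */
            jmax++;
            kmpindex[i] = jmax;
        } else {
            /* try the next shorter prefix, then redo the same i */
            jmax = kmpindex[jmax];
            i--;
        }
    }
}

int main(void) {
    /* text to search; this demo only builds and prints the table for subs */
    char s[100] = "adbcdbdadfcsacsdjhfghasbfkjbfshdshafhsbfbsbfsjabdfhsbajfbnsdkjfnsafsdafdsdgsdfasfsafsadfsa";
    char subs[29] = "abcabceddecaacdeedeed";
    int kmpindex[29];
    int n = (int)strlen(subs); /* use the real pattern length, not a hard-coded 18 */
    int i;

    (void)s; /* not used by this demo */
    kmp(subs, kmpindex, n);

    printf("%s\n", subs);
    for (i = 0; i < n; i++) {
        printf("%d ", kmpindex[i]);
    }
    printf("\n");
    return 0;
}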
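The code above only builds the table and never runs the search itself. Here is a minimal sketch of the matching step, assuming the textbook next convention (next[0] = -1, next[i] = longest border of the first i pattern characters). It builds its own table with a hypothetical build_next helper instead of calling kmp(), so the block stands alone. The table from kmp() should also work in its place, since it differs only at index 1, where it applies the skip from step 3.2. kmp_search is likewise a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Classic next table in O(m): next[0] = -1 and, for i >= 1, next[i] is the
 * length of the longest proper prefix of pattern[0..i-1] that is also its suffix. */
static void build_next(const char *pattern, int m, int *next) {
    int i = 0;  /* index of the latest computed entry */
    int k = -1; /* candidate border length being extended */
    next[0] = -1;
    while (i < m - 1) {
        if (k == -1 || pattern[i] == pattern[k]) {
            i++;
            k++;
            next[i] = k;
        } else {
            k = next[k]; /* fall back to the next shorter border */
        }
    }
}

/* Returns the 0-based index of the first occurrence of pattern in text,
 * or -1 if there is none (also -1 if memory allocation fails). */
int kmp_search(const char *text, const char *pattern) {
    int n = (int)strlen(text);
    int m = (int)strlen(pattern);
    int i = 0; /* position in text: never moves backwards */
    int j = 0; /* position in pattern */
    int result = -1;
    int *next;

    if (m == 0) {
        return 0; /* the empty pattern matches at the start */
    }
    next = malloc((size_t)m * sizeof *next);
    if (next == NULL) {
        return -1;
    }
    build_next(pattern, m, next);

    while (i < n && j < m) {
        if (j == -1 || text[i] == pattern[j]) {
            i++;
            j++;
        } else {
            j = next[j]; /* jump instead of sliding one position at a time */
        }
    }
    if (j == m) {
        result = i - m;
    }
    free(next);
    return result;
}
```

For the example text, kmp_search("abcabdfaabcabcefefasdfa", "abcabcef") returns 8, the same answer as brute force. Because the text index i never moves backwards, the whole search runs in O(n + m) time.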


In fact, building the kmp table is similar to dynamic programming: each entry is derived from entries that were computed earlier.
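To make the comparison concrete, here is the recurrence written out. It assumes the textbook table next for a pattern $p$ of length $m$ (next[0] = -1, and next[i] = length of the longest proper prefix of $p_0 \cdots p_{i-1}$ that is also a suffix of it):

$$
\mathrm{next}[i+1] = k^{*} + 1, \qquad
k^{*} = \max\left\{\, k \in \{\mathrm{next}[i],\ \mathrm{next}[\mathrm{next}[i]],\ \ldots,\ -1\} \;:\; k = -1 \ \text{or}\ p_k = p_i \,\right\}, \qquad 0 \le i \le m-2
$$

The chain next[i], next[next[i]], ... lists the border lengths of $p_0 \cdots p_{i-1}$ from longest down to 0, followed by the sentinel -1; the first one that can be extended by $p_i$ gives the new entry. For abcabcef, next[5] = 2 and $p_2 = p_5 =$ c, so next[6] = 3.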

When implementing the kmp algorithm, keep three core points in mind:
1. The computed array gives, for each pattern position, the index to jump to when the match fails there, that is, which pattern character is compared next. nextval likewise returns the position where the next comparison should start after a failure.
2. The next comparison must start from the longest prefix that can still match, so we must maintain the current maximum prefix length.
3. After a failure at the current position, the jump value is the maximum length of a string that is both a proper prefix and a suffix of the part before the current position. nextval departs from this only when the character at that prefix length equals the current one, in which case it skips ahead, as in step 3.2.
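The post mentions nextval without showing how it is built. Below is a sketch of the common textbook construction (build_nextval is a hypothetical name, and this is one standard way to do it, not necessarily the author's). It tracks the classic next values on the fly and, whenever pattern[i] equals pattern[next[i]], stores nextval[next[i]] instead. This generalises the kmpindex[1] rule from step 3.2 to every position.

```c
#include <string.h>

/* nextval[i]: pattern index to compare next after a mismatch at position i,
 * skipping candidates that would compare the same character that just failed.
 * nextval[i] = next[i], unless pattern[i] == pattern[next[i]], in which case
 * nextval[i] = nextval[next[i]] (a retry would certainly fail again). */
void build_nextval(const char *pattern, int *nextval) {
    int m = (int)strlen(pattern);
    int i = 0;
    int k = -1; /* candidate border length; right after k++ below it equals next[i] */
    if (m == 0) {
        return;
    }
    nextval[0] = -1;
    while (i < m - 1) {
        if (k == -1 || pattern[i] == pattern[k]) {
            i++;
            k++;
            nextval[i] = (pattern[i] == pattern[k]) ? nextval[k] : k;
        } else {
            /* falling back through nextval only skips candidates whose character
             * equals pattern[k], which has just failed against pattern[i] */
            k = nextval[k];
        }
    }
}
```

For abcabcef this gives -1 0 0 -1 0 0 3 0, compared with -1 0 0 0 1 2 3 0 for the plain next table. At index 3, for example, a failed a would otherwise be retried against the pattern's first a, which must fail again.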


Origin blog.csdn.net/weixin_43327597/article/details/131275288