The KMP Algorithm Explained (Super Detailed)

  Hello everyone, I'm A guide, a college undergraduate. I have recently been teaching myself data structures. It has been a tough battle, but hard work pays off, and through steady study I can now follow some of it. To help fellow students and spare everyone some detours, I decided to share my experience and explain some of the algorithms and code I met while learning data structures, for reference and discussion.
  I plan to keep writing this blog series, which also serves as a small record of my university studies. I will try to post one article a week (after all, I still have to study (play games) <manual funny face>). All students are welcome to discuss and learn together. Finally, my ability is limited, so if anything is wrong, please correct me.

    Thanks to the expert who inspired me; some of the pictures here are reproduced from his blog (@ blog link).

Main text

This article is rather long, but none of the explanations are filler (except my chatter just now, which you can skip (^_^)). Please read patiently to the end and reread where needed; I believe you will get something out of it!
First, a diagram that outlines the structure of this article and the points it explains:
[Picture: flow chart]

One: Why use the KMP algorithm

Advantages of the KMP algorithm

  With the brute-force approach (the BF algorithm), every time a match fails the pointer goes back to where the attempt began, and matching starts over from the next position. In the worst case this takes M * (N - M + 1) comparisons, where N is the length of the main string and M is the length of the pattern, so the time complexity is O(N * M), which is very slow. The KMP algorithm cleverly uses a Next array (described later) to cut out the unnecessary backtracking during comparison, which saves time.
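To make the cost of the brute-force approach concrete, here is a minimal sketch of a BF matcher that also counts character comparisons. It is not from the original post: the name bf_search and the counter parameter are made up for illustration, and it uses ordinary 0-based C strings rather than the length-prefixed layout used later in this article.

```c
#include <string.h>

/* Brute-force (BF) search, illustration only.
   Returns the 0-based index of the first occurrence of pat in text,
   or -1 if there is none. *comparisons receives the number of
   character comparisons that were performed. */
int bf_search(const char *text, const char *pat, long *comparisons)
{
    size_t n = strlen(text), m = strlen(pat);
    *comparisons = 0;
    if (m > n)
        return -1;
    for (size_t start = 0; start + m <= n; start++) {
        size_t j = 0;
        while (j < m) {
            (*comparisons)++;
            if (text[start + j] != pat[j])
                break;          /* mismatch: give up on this start position */
            j++;
        }
        if (j == m)
            return (int)start;  /* all m characters matched */
        /* otherwise the next attempt starts over at start + 1 */
    }
    return -1;
}
```

For example, with a variable `long count`, the call `bf_search("aaaaaaaaab", "aaab", &count)` returns 6 and leaves 28 in count. That is exactly M * (N - M + 1) = 4 * 7 for this input, because every failed attempt only breaks on the pattern's last character.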

Two: What is the KMP algorithm

(I won't go through the dry theory here; please search Baidu for "what is the KMP algorithm" yourself.)
Let's start with an animation. Building on the advantage described in point one, it gives a rough idea of how the KMP algorithm runs (details follow below).

[Picture: animation of the KMP matching process]
Picture link (will be removed on request if it infringes)
  From the animation above we can see directly that after a match fails, the KMP algorithm does not go back to the first character of the pattern string. Instead, it falls back to a certain position in the pattern and continues comparing from there.

About the Next array

First, here is my "ancestral" KMP code, so you can get a general idea of the algorithm (it doesn't matter if you can't follow it yet; it is analyzed step by step later). After reading it, trace the steps by hand on paper to understand more deeply how the code runs.

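void GetNext(char T[],int *next);   // defined and explained below

int KMP(int start,char S[],char T[])   /* we assume the first slot of each array,
                                          S[0] and T[0], stores the string length;
                                          the characters start at index 1 */
{
	int i=start,j=0;     // with j==0 the first pass only advances i and j, so
	                     // comparison begins at S[start+1]; pass start=0 to search all of S
	int next[255];
	GetNext(T,next);     // build the Next array of the pattern
	while(S[i]!='\0'&&T[j]!='\0')
	{
		if(j==0||S[i]==T[j])
		{
			i++;         // go on to compare the next character
			j++;
		}
		else 
			j=next[j];   // mismatch: slide the pattern to the right
	}
	if(T[j]=='\0') 
		return (i-T[0]);    // match found: return the position in S where it starts
	else 
		return -1;          // no match: return -1
}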

  From point one we know that the essence of the KMP algorithm is its Next array: for each position where a match can fail, the array records the position the pointer should fall back to.

So the questions are:

1. What is the next array?
  First, a term: the longest common prefix and suffix. Suppose there is a string P = "p0 p1 p2 ... pj-1 pj". If p0 p1 ... pk-1 pk = pj-k pj-k+1 ... pj-1 pj (with k < j), we say that P has a common prefix and suffix of length k + 1; the longest such one is the longest common prefix and suffix of P.
This raises another question:
2. How do we find the prefixes and suffixes?

  The prefixes are all the leading substrings that do not include the last character.

  The suffixes are all the trailing substrings that do not include the first character.
Some of you may still not see what this means, so here is a diagram to explain.

Take the string P = abaabca; its prefixes and suffixes are shown in the figure.
[Picture: prefixes and suffixes of abaabca]
Here are the rules for computing the Next array:
[Picture: rules for computing the Next array]
(Source: the book "大话数据结构" (Dahua Data Structures))
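The picture with the rules is not available in this copy. Assuming it shows the usual textbook definition for a pattern whose characters are numbered from 1 (the hand derivation in this article, including its "third case", is consistent with that), the rule can be written as:

```latex
\[
next[j] =
\begin{cases}
0, & j = 1 \\[4pt]
\max\{\, k \mid 1 < k < j,\ p_1 p_2 \cdots p_{k-1} = p_{j-k+1} \cdots p_{j-1} \,\}, & \text{if this set is not empty} \\[4pt]
1, & \text{otherwise}
\end{cases}
\]
```

In words: the first case is j = 1; the second case applies when the characters before position j have a common prefix and suffix, and the value is its length plus 1; the third case covers everything else.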
Then we can write out the Next array for the string P:
[Picture: Next array of abaabca]
Some of you may still be baffled by the Next array, so let's derive it by hand:
1. j = 1 (j is the pointer into the string P): by the rules above, Next[1] = 0.

2. j = 2, P[j] = b: looking back for a common substring, there is only the single character a in front, which is the third case. So Next[2] = 1.

3. j = 3: looking back at "ab", there is no common prefix and suffix, which is the third case. So Next[3] = 1.

4. j = 4: looking back at "aba", the common prefix and suffix is a, of length 1, so Next[4] = 1 + 1 = 2 (when a common substring exists, the value of Next is its length plus 1).

5. j = 5: looking back at "abaa", the common prefix and suffix is a, of length 1, so Next[5] = 1 + 1 = 2.

6. j = 6: looking back at "abaab", the common prefix and suffix is ab, of length 2, so Next[6] = 2 + 1 = 3.

7. j = 7: looking back at "abaabc", there is no common prefix and suffix, which is the third case. So Next[7] = 1.
  Here are a few more Next arrays for everyone to practice on:
[Picture] [Picture]
[Picture]

From the practice above we can conclude that for any string of length 2 or more, Next[1] = 0 and Next[2] = 1. That wraps up our theoretical derivation of the Next array.
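The hand derivation can be checked mechanically. The sketch below is not the efficient method (that comes in the next section); it simply applies the three cases directly, trying the longest candidate first. The name naive_next is made up for illustration, and the pattern is passed as an ordinary 0-based C string, so character j of the article is p[j-1] here.

```c
#include <string.h>

/* Fills next[1..len] straight from the definition, illustration only.
   p is an ordinary C string; the article's character j is p[j-1].
   next must have room for at least strlen(p) + 1 ints. */
void naive_next(const char *p, int next[])
{
    int len = (int)strlen(p);
    for (int j = 1; j <= len; j++) {
        if (j == 1) {
            next[j] = 0;        /* first case */
            continue;
        }
        next[j] = 1;            /* third case, unless something is found below */
        /* Second case: a common prefix and suffix of length k-1 among the
           first j-1 characters gives next[j] = k. Longest candidate first. */
        for (int k = j - 1; k > 1; k--) {
            if (memcmp(p, p + (j - k), (size_t)(k - 1)) == 0) {
                next[j] = k;
                break;
            }
        }
    }
}
```

Calling `naive_next("abaabca", next)` with an int array of at least 8 elements fills next[1..7] with 0 1 1 2 2 3 1, the same values as the seven steps above.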

Next we discuss how to compute the Next array in code.

Three: Code implementation of the Next array

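void GetNext(char T[],int *next)   /* T[0] holds the length; the characters start at T[1] */
{
	int i=1,j=0;
	next[1]=0;
	while(T[i]!='\0')    // the last pass also fills next[len+1], so next needs len+2 slots
	{
		if(j==0||T[i]==T[j])
		{
			++i;
			++j;
			next[i]=j;   // T[1..j-1] equals the j-1 characters just before position i
		}
		else 
			j=next[j];   // mismatch: fall back to a shorter common prefix and suffix
	}
}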

I suggest picking a string and "running" the code step by step on scratch paper.
  After running the code, the most surprising step is probably the fallback j = next[j]. One can't help asking: why does falling back with j = next[j] after a failed match find a shorter common prefix and suffix?
Let's settle this question.
  Look at this picture:
[Picture]
  When pk != pj, we set k = next[k] and go on comparing p_next[k] with pj. (The picture numbers characters from p0; its k plays the role of j in the code above, and its j plays the role of i.) Why not compare some other character with pj? Before pj there is a string of length k that matches the prefix p0 ... pk-1, and next[k] is the length of the longest common prefix and suffix of that prefix (the blue segments before pk in the picture). So the blue segment at the start of the pattern equals the blue segment just before pk, which in turn equals the last next[k] characters before pj. If pk != pj, then p0 p1 ... pk-1 pk != pj-k pj-k+1 ... pj-1 pj, and we can only look for a shorter common prefix and suffix. If p_next[k] matches pj, we have found the string we need (the segment from p0 to p_next[k]). If they still do not match, we continue with p_next[next[k]] and so on, until a shorter common prefix and suffix is found or no candidates remain.
  That concludes our discussion of the Next array.
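To try the matcher and the Next array together, here is a minimal self-contained sketch. It follows this article's layout (length in slot 0, characters from slot 1), but it is a variant, not the code above: the names make_str, build_next and kmp_index are made up for illustration, the loops stop on the stored length instead of '\0', the arrays are unsigned char so that lengths up to 255 fit in slot 0, and the search position is 1-based.

```c
#include <string.h>

#define MAXLEN 255   /* assumed maximum string length for this sketch */

/* Copies an ordinary C string into the length-prefixed layout:
   dst[0] = length, dst[1..len] = characters.
   Assumes strlen(src) <= MAXLEN and dst has MAXLEN + 1 slots. */
void make_str(unsigned char dst[], const char *src)
{
    size_t len = strlen(src);
    dst[0] = (unsigned char)len;
    memcpy(dst + 1, src, len);
}

/* Fills next[1..T[0]] for the pattern T. */
void build_next(const unsigned char T[], int next[])
{
    int i = 1, j = 0;
    next[1] = 0;
    while (i < T[0]) {
        if (j == 0 || T[i] == T[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];        /* fall back to a shorter prefix */
        }
    }
}

/* Returns the 1-based position of the first match of T in S that starts
   at or after position pos (pos >= 1), or -1 if there is none. */
int kmp_index(const unsigned char S[], const unsigned char T[], int pos)
{
    int next[MAXLEN + 1];
    int i = pos, j = 1;
    if (T[0] == 0)
        return -1;              /* empty pattern: treated as "not found" here */
    build_next(T, next);
    while (i <= S[0] && j <= T[0]) {
        if (j == 0 || S[i] == T[j]) {
            ++i;                /* i never moves backwards */
            ++j;
        } else {
            j = next[j];        /* only the pattern pointer falls back */
        }
    }
    return (j > T[0]) ? i - T[0] : -1;
}
```

For example, after `make_str(S, "ababcabcacbab")` and `make_str(T, "abcac")`, the call `kmp_index(S, T, 1)` returns 6. Note that i is never decreased, which is exactly the backtracking that the brute-force method cannot avoid.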

Four: Optimizing the KMP algorithm

Let's look at a picture:
[Picture: matching process]
The Next array for the pattern aaaaax in the picture is as follows:
[Picture: Next array of aaaaax]
  Looking at the matching process in the picture, we find that steps 2, 3, 4 and 5 can actually be skipped. This is the shortcoming of the Next array: some comparisons repeat a comparison whose outcome is already known, so they are unnecessary. So we introduce the Next_val array.

Deriving the Next_val array

  The rule for deriving Next_val from Next is as follows (T[j] is the character at position j): if T[j] equals T[Next[j]], replace Next[j] with Next[Next[j]], and repeat until T[j] differs from the character at the position reached, or that position is 0. The value reached is Next_val[j]. If the two characters differ from the start, Next_val[j] = Next[j].
Now let's use the Next array of the string aaaaax to derive its Next_val array:

1. Next[1] = 0, so Next_val[1] = 0.

2. Next[2] = 1; T[1] = a and T[2] = a are equal, so Next_val[2] = Next[1] = 0.

3. Next[3] = 2; T[2] = a and T[3] = a are equal, so we fall back to Next[2] = 1; T[1] = a and T[3] = a are equal too, so Next_val[3] = Next[1] = 0.

4. Next[4] = 3; in the same way, Next_val[4] = 0.

5. Next[5] = 4; in the same way, Next_val[5] = 0.

6. Next[6] = 5; T[5] = a and T[6] = x, a != x, so Next_val[6] = Next[6] = 5.
This gives us the Next_val array:
[Picture: Next_val array of aaaaax]

  The derivation may look a little hard to follow in words; working through many examples by hand helps. Here are a few more examples for reference:
[Picture] [Picture]
Finally, here is the code for Next_val (it differs considerably from our manual derivation). Trace the code by hand on paper with an example, such as the first Next_val picture above, and derive the array again with the code.

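void GetNext_val(char T[],int *nextval)   /* T[0] holds the length; the characters start at T[1] */
{
	int i=1,j=0;
	nextval[1]=0;
	while(T[i]!='\0')
	{
		if(j==0||T[i]==T[j])
		{
			++i;
			++j;
			if(T[i]!=T[j])
				nextval[i]=j;            // characters differ: same value as Next
			else
				nextval[i]=nextval[j];   // same character: reuse the earlier fallback
		}
		else 
			j=nextval[j];    // mismatch: fall back
	}
}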
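To compare the code with the manual rule, here is a minimal sketch that computes both arrays in two separate passes: first Next, then Next_val from Next. This two-pass form is an illustration of the hand rule, not the single-pass code above. The name next_and_nextval is made up, and the pattern is an ordinary 0-based C string, so character j of the article is p[j-1].

```c
#include <string.h>

/* Computes next[1..len] and nextval[1..len] for the pattern p,
   illustration only. Both arrays need room for at least len + 1 ints. */
void next_and_nextval(const char *p, int next[], int nextval[])
{
    int len = (int)strlen(p);
    int i = 1, j = 0;
    if (len == 0)
        return;

    /* Pass 1: the ordinary Next array. */
    next[1] = 0;
    while (i < len) {
        if (j == 0 || p[i - 1] == p[j - 1]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }

    /* Pass 2: the hand rule. If character k equals the character at its
       fallback position, falling back there must fail in the same way,
       so reuse that position's nextval; otherwise keep next[k]. */
    nextval[1] = 0;
    for (int k = 2; k <= len; k++) {
        if (p[k - 1] == p[next[k] - 1])
            nextval[k] = nextval[next[k]];
        else
            nextval[k] = next[k];
    }
}
```

For the pattern "aaaaax" this gives next = 0 1 2 3 4 5 and nextval = 0 0 0 0 0 5, matching the six steps of the manual derivation.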

Five: Exam tips

  Here is a link to an expert's blog post (blog link); his method lets you quickly work out the next and nextval arrays for KMP.

Six: Closing

  After four nights of effort, I have finally finished my first truly content-rich blog post (the earlier ones were rather dull). Thanks to the two experts mentioned above for their inspiration and technical support, and to the book "大话数据结构" (Dahua Data Structures) for inspiring some of my coding conventions. Finally, if this post gave you some inspiration (classmates are welcome to drop by my dorm and feed me snacks), you can like + follow to show support (shy). I will follow up with more tutorials on data structures. Thank you for reading this article. Thank you!

Origin blog.csdn.net/qq_45806146/article/details/104911112