The KMP algorithm: Zuo Shen's traditional approach, explained in full detail (Java)

This article is a summary of my own study. It may contain errors, and corrections are welcome. Reposted.

Problem: Given a string str1 and a string str2, find the index (counting from 0) of the first occurrence of str2 in str1. If str2 does not occur in str1, return -1.

str1 = aaaaabcabc

str2 = abcabcaa

Some time ago I came across Zuo Shen's algorithm lecture videos by chance. Over about three days I watched the KMP part three times and finally arrived at some understanding of my own. The traditional KMP string-matching algorithm is, in essence, the brute-force algorithm optimized with a next array. KMP can also be understood as a dynamic programming algorithm, but that view is not covered in detail here.

The explanation is divided into three parts.

  1. Brute-force solution
  2. KMP algorithm
  3. How to get the next array

Brute-force solution

The brute-force algorithm looks simple, but the actual code has a few details to handle, so it is worth writing it out yourself. Give str1 a pointer i and str2 a pointer j. The first starting position for i is 0 and the last is str1.length - 1.

  1. str1[i] and str2[j] are equal: move both i and j forward by one.

  2. str1[i] and str2[j] are not equal: reset j to 0, and restart the comparison with i at the next starting position.

If j reaches str2.length, every character of str2 from index 0 to str2.length - 1 has matched. Return i - j, the index at which str2 first occurs in str1.

If i runs past the last starting position, str1.length - 1, without a match, then str2 never matched along the way. Return -1.

Code:

public int strStr(String str1, String str2) {
    int length1 = str1.length();
    int length2 = str2.length();
    if(length2 == 0) return 0;
    if(length1 < length2) return -1;
    int i = 0;
    while(i < length1){
        int j = 0;
        while(i < length1 && j < length2
        && str1.charAt(i) == str2.charAt(j)){
            i++;
            j++;
        }
        if(j == length2){
            return i-j;
        }
        i = i - j + 1;
    }
    return -1;
}

I recommend writing this out by hand.

KMP algorithm

For now, set aside how the next array is computed. What you need to know is the information it stores about str2: for each position, the length of the longest prefix of the characters before that position that equals a suffix of those same characters. This is convoluted, so here is an example:

At index 6, the characters before it form the string a b c a b c.

Length 1: the prefix is a and the suffix is c. They differ, so next cannot be 1.

Length 2: the prefix is ab and the suffix is bc. They differ, so next cannot be 2.

Length 3: the prefix is abc and the suffix is abc. They are equal, so next can be 3.

Length 4: the prefix is abca and the suffix is cabc. They differ, so next cannot be 4.

Length 5: the prefix is abcab and the suffix is bcabc. They differ, so next cannot be 5.

Length 6 is not allowed, because the prefix and the suffix cannot be the whole string itself. So next[6] = 3.
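The enumeration above can be written down directly as a slow, definition-level check. This is only a sketch for verifying next values by hand, not the KMP construction itself; the class name BorderByDefinition and the method names longestBorder and naiveNext are made up for this illustration.

```java
import java.util.Arrays;

public class BorderByDefinition {
    // Length of the longest prefix of s that equals a suffix of s,
    // where neither may be the whole string. Tries every length, longest first.
    public static int longestBorder(String s) {
        for (int len = s.length() - 1; len > 0; len--) {
            if (s.substring(0, len).equals(s.substring(s.length() - len))) {
                return len;
            }
        }
        return 0;
    }

    // next[i] describes the i characters before index i; next[0] is -1 by convention.
    public static int[] naiveNext(String pattern) {
        int[] next = new int[pattern.length()];
        for (int i = 0; i < pattern.length(); i++) {
            next[i] = (i == 0) ? -1 : longestBorder(pattern.substring(0, i));
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(longestBorder("abcabc"));               // 3
        System.out.println(Arrays.toString(naiveNext("abcfabcy"))); // [-1, 0, 0, 0, 0, 1, 2, 3]
    }
}
```

This version does far more work than the real construction, but it is handy for checking a next array computed by hand.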

index:0 1 2 3 4 5 6 7 8 9

str1 = a a a a a b c a b c

str2 = a b c a b c a a

next:-1 0 0 0 1 2 3 4

Next comes the KMP matching process itself. As in the brute-force solution, we keep two pointers, i and j.

  1. The two elements are equal: move both i and j forward by one.
  2. The two elements are not equal: set j = next[j]. If next[j] equals -1, j is already at the very front of str2 and cannot jump, so move i forward by one instead.

Look carefully at the two sub-cases of the unequal case; this is the hard part.

Case next[j] != -1: j jumps directly to next[j], so the next comparison is against str2[next[j]]. Why? Here is an example.

index 0 1 2 3 4 5 6 7

str1 = a b c f a b c x

str2 = a b c f a b c y

next=-1 0 0 0 0 1 2 3

Indexes 0 through 6 all match. At i = j = 7 the two elements differ (x versus y), so j jumps to next[j], that is, j = 3. Up to this point the substring of str1 before x and the substring of str2 before y are identical, so they share the same next values. next[7] = 3 means that in the substring before x / y, the first three characters equal the last three. The last three characters before x therefore already match the first three characters of str2 and need not be compared again. We skip indexes 0, 1 and 2 of str2 and compare directly against the fourth character (index 3). This is the core idea of the next array. Zuo Shen's video explains it more intuitively.

str1 = a b c f a b c x

str2 = * * * * a b c f a b c y

Now compare whether x equals f.

Case next[j] == -1: j is already at the front of str2 and cannot move any further back, so the only option is to move i forward.
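To watch both unequal cases happen on the example above, here is a small tracing sketch. It runs the matching loop with the next array from the table hard-coded and logs every mismatch step. The class name KmpJumpTrace, the method trace and the log format are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class KmpJumpTrace {
    // Runs the KMP matching loop with a precomputed next array
    // and records what happens at every mismatch.
    public static List<String> trace(char[] str1, char[] str2, int[] next) {
        List<String> log = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < str1.length && j < str2.length) {
            if (str1[i] == str2[j]) {
                i++;
                j++;
            } else if (next[j] == -1) {
                log.add("i=" + i + " j=" + j + ": i moves forward");
                i++;
            } else {
                log.add("i=" + i + " j=" + j + ": j jumps to " + next[j]);
                j = next[j];
            }
        }
        log.add(j == str2.length ? "match at " + (i - j) : "no match");
        return log;
    }

    public static void main(String[] args) {
        int[] next = {-1, 0, 0, 0, 0, 1, 2, 3}; // next array of abcfabcy, from the table above
        for (String line : trace("abcfabcx".toCharArray(), "abcfabcy".toCharArray(), next)) {
            System.out.println(line);
        }
    }
}
```

On this input i never moves backward: j jumps from 7 to 3, then from 3 to 0, and only then does i advance.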

Code:

public static int getIndexOf(char str1[], char str2[]) {
    // An empty pattern matches at index 0, as in the brute-force version.
    if(str2.length == 0) {
        return 0;
    }
    if(str1.length < str2.length) {
        return -1;
    }
    int i = 0;
    int j = 0;
    int next[] = getNextArray(str2);
    // The three branches correspond to the three cases above.
    while(i < str1.length && j < str2.length) {
        if(str1[i] == str2[j]) {
            i++; // the two elements are equal
            j++;
        }else if(next[j] == -1) {
            i++; // next[j] == -1
        }else {
            j = next[j]; // next[j] != -1
        }
    }
    return (j == str2.length) ? i-j : -1;
}

How to get the next array

str2 = a b c f a b c y

next=-1 0 * * * * * *

The first entry defaults to -1, because there is no substring before the first element.

The second entry defaults to 0, because the substring before the second element has only one character, so its longest equal prefix and suffix can only have length 0.

Next is the third entry, j = 2. The substring before it is a b. This is where it gets harder: how do we find its next value?

Take the next value of j - 1, cn = next[j-1], and compare the element str2[cn] with str2[j-1]. Here cn = 0, so element 0 is compared with element 1. The comparison has two outcomes: equal or not equal. The unequal outcome splits into two further cases.

index 0 1 2 3 4 5 6 7

str2 = a b c f a b c y

next=-1 0 0 0 0 1 * *

To make this more intuitive, take another example, j = 6.

cn = next[j-1] = 1, str2[cn] = b

str2[j-1] = b

They are equal, so next[6] = ++cn = 2. Why?

What does cn represent? cn is the next value of position j-1, that is, the length of the longest equal prefix and suffix of the characters before position j-1. That length is 1, meaning the first character and the last character of that substring are equal. We then compare the character right after that prefix (str2[cn]) with the character right after that suffix (str2[j-1]). They are equal, so next[6] = ++cn = 2. What if they are not equal? There are two cases.

  1. cn > 0: cn = next[cn]
  2. cn <= 0: next[j] = 0

The reason is that we keep falling back to shorter and shorter prefixes, looking for a cn whose element str2[cn] equals str2[j-1]. If none is found, next[j] = 0.
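Neither example string above ever reaches the cn > 0 fallback, so here is a tracing sketch on a different pattern, aabaaac, chosen only for this illustration. It uses the same three cases as the description, with i playing the role of j, and logs each time cn falls back. The class name NextFallbackTrace and the method buildNext are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NextFallbackTrace {
    // Builds the next array with the three cases described above
    // and logs every fallback step cn = next[cn]. Assumes str is non-empty.
    public static int[] buildNext(char[] str, List<String> log) {
        if (str.length == 1) {
            return new int[] {-1};
        }
        int[] next = new int[str.length];
        next[0] = -1;
        next[1] = 0;
        int i = 2;
        int cn = 0;
        while (i < str.length) {
            if (str[i - 1] == str[cn]) {
                next[i++] = ++cn;            // equal: extend the prefix
            } else if (cn > 0) {
                log.add("i=" + i + ": cn " + cn + " -> " + next[cn]);
                cn = next[cn];               // unequal, cn > 0: try a shorter prefix
            } else {
                next[i++] = 0;               // unequal, cn == 0: give up
            }
        }
        return next;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        int[] next = buildNext("aabaaac".toCharArray(), log);
        System.out.println(Arrays.toString(next)); // [-1, 0, 1, 0, 1, 2, 2]
        System.out.println(log);                   // [i=3: cn 1 -> 0, i=6: cn 2 -> 1]
    }
}
```

At i = 6 the fallback succeeds: the prefix of length 2 cannot be extended, but after falling back to length 1 the comparison matches, giving next[6] = 2.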

Code:

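public static int[] getNextArray(char []str) {
    // Assumes str is non-empty; getIndexOf returns before calling this for an empty pattern.
    if(str.length == 1) {
        return new int [] {-1};
    }
    int next[] = new int [str.length];
    next[0] = -1;
    next[1] = 0;
    int i = 2;  // position whose next value is being computed
    int cn = 0; // length of the prefix currently being extended
    while( i < str.length) {
        if(str[i-1] == str[cn]) {
            next[i++] = ++cn; // equal: extend the prefix by one
        }else if(cn > 0) {
            cn = next[cn];    // unequal, cn > 0: fall back to a shorter prefix
        }else {
            next[i++] = 0;    // unequal, cn == 0: no prefix matches
        }
    }
    return next;
}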
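As a sanity check, the two methods can be put in one class and compared against Java's built-in String.indexOf on the example strings from this article. This is a minimal harness, not a thorough test; the class name KmpCheck and the choice of test cases are assumptions made for this illustration, and the empty-pattern check is placed first so that an empty str2 returns 0.

```java
public class KmpCheck {
    public static int[] getNextArray(char[] str) {
        if (str.length == 1) {
            return new int[] {-1};
        }
        int[] next = new int[str.length];
        next[0] = -1;
        next[1] = 0;
        int i = 2;
        int cn = 0;
        while (i < str.length) {
            if (str[i - 1] == str[cn]) {
                next[i++] = ++cn;
            } else if (cn > 0) {
                cn = next[cn];
            } else {
                next[i++] = 0;
            }
        }
        return next;
    }

    public static int getIndexOf(char[] str1, char[] str2) {
        if (str2.length == 0) {
            return 0;
        }
        if (str1.length < str2.length) {
            return -1;
        }
        int i = 0;
        int j = 0;
        int[] next = getNextArray(str2);
        while (i < str1.length && j < str2.length) {
            if (str1[i] == str2[j]) {
                i++;
                j++;
            } else if (next[j] == -1) {
                i++;
            } else {
                j = next[j];
            }
        }
        return (j == str2.length) ? i - j : -1;
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"aaaaabcabc", "abcabcaa"},
            {"aaaaabcabc", "abcabc"},
            {"abcfabcx", "abcfabcy"},
            {"abcfabcfabcy", "abcfabcy"},
        };
        for (String[] c : cases) {
            int got = getIndexOf(c[0].toCharArray(), c[1].toCharArray());
            int want = c[0].indexOf(c[1]); // reference answer from the standard library
            System.out.println(c[0] + " / " + c[1] + " -> " + got
                    + (got == want ? " (ok)" : " (MISMATCH, expected " + want + ")"));
        }
    }
}
```

For the problem's own input, str1 = aaaaabcabc and str2 = abcabcaa, both methods return -1, since str2 does not occur in str1.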

Summary

  1. Write the brute-force solution several times; after a couple of repetitions it becomes familiar.
  2. The KMP implementation has three cases: the elements are equal; the elements are unequal and next[j] is not -1; the elements are unequal and next[j] is -1.
  3. Computing next also has three cases: the elements at cn and j-1 are equal; they are unequal and cn > 0; they are unequal and cn <= 0.

My public account, stul, is updated in step with my algorithm studies; you are welcome to follow it.

Origin www.cnblogs.com/stul/p/11815913.html