Java data structures and algorithms: KMP

Summary

The KMP algorithm is an improved string matching algorithm proposed by D. E. Knuth, J. H. Morris and V. R. Pratt, which is why it is called the Knuth-Morris-Pratt algorithm (KMP for short). Its core idea is to use the information gained from a failed match to reduce the number of comparisons between the pattern string and the main string, which makes matching fast. In practice this is done with a next() function, which captures the pattern string's own partial-match information. The time complexity of the KMP algorithm is O(m + n).

Brief introduction

String matching is one of the basic string operations. The most direct approach is an exhaustive search with backtracking, but its complexity is high. As people looked for more efficient ways to match, the complexity was gradually brought down.
The most important step in that process was KMP, which reduces the complexity of the original algorithm to O(n + m), where n and m are the lengths of the two strings.

Details

The most direct approach: backtracking

Start matching from the left of the string. If s[sc] and p[j] are the same, advance both sc and j; once all of p has been traversed, return the starting index i. If the characters differ during matching, backtrack: move i forward by one and reset sc to i and j to 0.

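public class Match_kmp {
    public static void main(String[] args) {
        System.out.println(indexOf("aaavbvdd", "vbv")); // prints 3
    }

    // Brute-force search: returns the index of the first occurrence of p in s, or -1.
    private static int indexOf(String s, String p) {
        if (p.isEmpty()) {
            return 0; // an empty pattern matches at index 0
        }
        int i = 0;   // start of the current attempt in s
        int sc = i;  // position in s being compared
        int j = 0;   // position in p being compared
        while (sc < s.length()) {
            if (s.charAt(sc) == p.charAt(j)) {
                j++;
                sc++;
                if (j == p.length()) {
                    return i;
                }
            } else {
                // Mismatch: restart one position to the right of the previous start.
                i++;
                sc = i;
                j = 0;
            }
        }
        return -1;
    }
}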

The result is the index i at which the first match starts.

  • This approach has a time complexity of O(n * m). The KMP algorithm is described below.
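To make the O(n * m) bound more concrete, here is a minimal sketch that counts the character comparisons made by the brute-force search on a worst-case style input (a main string of all 'a' and a pattern that only fails on its last character). The class name BruteForceCost and the helpers countComparisons and repeat are hypothetical names introduced only for this illustration; the loop mirrors the brute-force logic of indexOf.

```java
import java.util.Arrays;

// Illustrative sketch only: BruteForceCost, countComparisons and repeat are
// hypothetical names, not part of the article's own code.
final class BruteForceCost {

    // Runs the brute-force search and returns how many character comparisons it made.
    static long countComparisons(String s, String p) {
        long comparisons = 0;
        int i = 0;   // start of the current attempt in s
        int sc = 0;  // position in s being compared
        int j = 0;   // position in p being compared
        while (sc < s.length() && j < p.length()) {
            comparisons++;
            if (s.charAt(sc) == p.charAt(j)) {
                sc++;
                j++;
            } else {
                // Mismatch: restart one position to the right of the previous start.
                i++;
                sc = i;
                j = 0;
            }
        }
        return comparisons;
    }

    // Builds a string consisting of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String s = repeat('a', 1000);
        String p = repeat('a', 9) + "b";
        System.out.println(countComparisons(s, p));
    }
}
```

For a main string of n 'a' characters and a pattern of m - 1 'a' characters followed by 'b', this loop performs (n - m + 1) * m + (m - 1) comparisons, which grows like n * m. For example, countComparisons("aaa", "ab") is 5.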

KMP matching

Diagram

① Match from left to right, starting with i = 0 and j = 0. If the two characters are the same, do i++ and j++; if they differ, j backtracks.

② On a mismatch, brute-force matching would move the start forward (i++) and reset j = 0.
KMP instead uses the part of the pattern string T that has already been matched to decide how far to back up, which is what reduces the time complexity. How far j backs up depends on the next array of the pattern string T; computing that array is covered in the section on solving the next array. For example, for the pattern abcdabd, next = {0,0,0,0,1,2,0}.
Size of the right shift = number of characters already matched - next[j-1], where j is the number of characters already matched.
So with six characters matched (j = 6), next[5] = 2 and the pattern shifts right by 6 - 2 = 4. At the same time j backs up:
j = next[j-1]
③ So now j = 2. Compare again; if the characters still differ, j backs up again with j = next[j-1].
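The shift rule can be sketched in a few lines. This is only an illustration under the assumption that j counts the characters already matched when the mismatch occurs; the class ShiftDemo and the methods buildNext and shift are hypothetical names, and the pattern abcdabd is chosen just as an example.

```java
import java.util.Arrays;

// Illustrative sketch only: ShiftDemo, buildNext and shift are hypothetical names.
final class ShiftDemo {

    // next[i] = length of the longest proper prefix of t[0..i] that is also its suffix.
    static int[] buildNext(String t) {
        int[] next = new int[t.length()];
        for (int i = 1, j = 0; i < t.length(); i++) {
            while (j > 0 && t.charAt(j) != t.charAt(i)) {
                j = next[j - 1];
            }
            if (t.charAt(i) == t.charAt(j)) {
                j++;
            }
            next[i] = j;
        }
        return next;
    }

    // How far the pattern slides right when a mismatch occurs after j matched characters (j > 0).
    static int shift(int[] next, int j) {
        return j - next[j - 1];
    }

    public static void main(String[] args) {
        int[] next = buildNext("abcdabd");
        System.out.println(Arrays.toString(next)); // [0, 0, 0, 0, 1, 2, 0]
        int j = 6;                                 // six characters matched, then a mismatch
        System.out.println(shift(next, j));        // 4: the pattern moves right by 6 - 2
        System.out.println(next[j - 1]);           // 2: the new value of j
    }
}
```

Sliding the pattern by j - next[j-1] and setting j = next[j-1] are two views of the same step: the index i in the main string stays where it is, and only the pattern position changes.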

KMP

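public class MatchKMP {
    public static void main(String[] args) {
        String t = "bababa";
        // The next array must be built from the same pattern that is searched for.
        int[] ne = getNext(t);
        int res = kmp("ssdfgasdbababa", t, ne);
        System.out.println(res); // prints 8
    }

    // Returns the index of the first occurrence of t in s, or -1 if there is none.
    private static int kmp(String s, String t, int[] next) {
        if (t.isEmpty()) {
            return 0;
        }
        for (int i = 0, j = 0; i < s.length(); i++) {
            // On a mismatch, fall back in the pattern; i never moves backwards.
            while (j > 0 && s.charAt(i) != t.charAt(j)) {
                j = next[j - 1];
            }
            if (s.charAt(i) == t.charAt(j)) {
                j++;
            }
            if (j == t.length()) {
                return i - j + 1;
            }
        }
        return -1; // returning 0 here would look like a match at index 0
    }

    // next[i] = length of the longest proper prefix of t[0..i] that is also its suffix.
    private static int[] getNext(String t) {
        int[] next = new int[t.length()];
        for (int i = 1, j = 0; i < t.length(); i++) {
            while (j > 0 && t.charAt(j) != t.charAt(i)) {
                j = next[j - 1];
            }
            if (t.charAt(i) == t.charAt(j)) {
                j++;
            }
            next[i] = j;
        }
        return next;
    }
}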
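One way to gain confidence in a KMP implementation is to compare its results with Java's built-in String.indexOf on a few inputs. The sketch below does that; KmpCheck, buildNext and search are hypothetical names, and search builds the next array itself so that the array always belongs to the pattern being searched for.

```java
// Illustrative sketch only: KmpCheck, buildNext and search are hypothetical names.
final class KmpCheck {

    // next[i] = length of the longest proper prefix of t[0..i] that is also its suffix.
    static int[] buildNext(String t) {
        int[] next = new int[t.length()];
        for (int i = 1, j = 0; i < t.length(); i++) {
            while (j > 0 && t.charAt(j) != t.charAt(i)) {
                j = next[j - 1];
            }
            if (t.charAt(i) == t.charAt(j)) {
                j++;
            }
            next[i] = j;
        }
        return next;
    }

    // Returns the index of the first occurrence of t in s, or -1 (same contract as String.indexOf).
    static int search(String s, String t) {
        if (t.isEmpty()) {
            return 0;
        }
        int[] next = buildNext(t);
        for (int i = 0, j = 0; i < s.length(); i++) {
            while (j > 0 && s.charAt(i) != t.charAt(j)) {
                j = next[j - 1];
            }
            if (s.charAt(i) == t.charAt(j)) {
                j++;
            }
            if (j == t.length()) {
                return i - j + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"ssdfgasdbababa", "bababa"},
            {"aaavbvdd", "vbv"},
            {"abc", "xyz"},
            {"abc", ""},
        };
        for (String[] c : cases) {
            // Print both results side by side; they should agree on every line.
            System.out.println(search(c[0], c[1]) + " " + c[0].indexOf(c[1]));
        }
    }
}
```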

Solving next array

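  • A prefix is any leading part of the string that leaves out the last character.
  • A suffix is any trailing part of the string that leaves out the first character.
    Prefixes and suffixes computed for abcabd (next[i] is the length of the longest string that is both a prefix and a suffix of the first i + 1 characters):
    a: a single character has no prefix or suffix, so next[0] = 0
    ab: prefixes [a], suffixes [b]; nothing in common, so next[1] = 0
    abc: prefixes [a, ab], suffixes [bc, c]; nothing in common, so next[2] = 0
    abca: prefixes [a, ab, abc], suffixes [bca, ca, a]; the longest common one is a, so next[3] = 1
    abcab: prefixes [a, ab, abc, abca], suffixes [bcab, cab, ab, b]; the longest common one is ab, so next[4] = 2
    abcabd: prefixes [a, ab, abc, abca, abcab], suffixes [bcabd, cabd, abd, bd, d]; nothing in common, so next[5] = 0
    The result is next = {0,0,0,1,2,0}.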
    private static int[] getNext(String t) {
        int next[] = new int[t.length()];
        next[0] = 0;
        // j is the length of the current longest prefix that is also a suffix.
        for (int i = 1, j = 0; i < t.length(); i++) {
            while (j > 0 && t.charAt(j) != t.charAt(i))
                j = next[j - 1];
            if (t.charAt(i) == t.charAt(j))
                j++;
            next[i] = j;
        }
        return next;
    }
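A brute-force version that applies the prefix/suffix definition directly can serve as a check on hand-computed values. The sketch below is not the efficient method; it simply tries every candidate length from longest to shortest. NextByDefinition and nextByDefinition are hypothetical names used only here.

```java
import java.util.Arrays;

// Illustrative sketch only: NextByDefinition is a hypothetical name.
final class NextByDefinition {

    // For each i, finds the longest string that is both a proper prefix
    // and a proper suffix of t.substring(0, i + 1), by trying every length.
    static int[] nextByDefinition(String t) {
        int[] next = new int[t.length()];
        for (int i = 0; i < t.length(); i++) {
            String sub = t.substring(0, i + 1);
            // A proper prefix or suffix is shorter than sub, so len is at most i.
            for (int len = i; len > 0; len--) {
                String suffix = sub.substring(sub.length() - len);
                if (sub.startsWith(suffix)) {
                    next[i] = len;
                    break;
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nextByDefinition("abcabd"))); // [0, 0, 0, 1, 2, 0]
    }
}
```

This version takes far more time than getNext on long patterns, so it is only useful for checking small examples.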

Summary

I spent a lot of time learning the KMP algorithm, and reading about it alone never made it click. Once I drew the diagrams myself, I understood it right away; some things you still have to work through by hand.

References

Baidu Encyclopedia: KMP algorithm

Ruan Yifeng's blog: The KMP algorithm for string matching

Origin blog.csdn.net/WeDon_t/article/details/102611537