Java Implementation of KMP Algorithm

Foreword

I still found the KMP algorithm confusing after reading several blog posts, mainly because it is hard to explain clearly. The two explanations I found clearest are the article on Ruan Yifeng's blog and one written by an expert on CSDN. They are worth reading first, so I will not repeat them here.
I think the most important thing is to get the concepts straight first; the code is much easier to follow after that. These are the key concepts in KMP:

  1. What are the prefixes and suffixes of a string?
  2. How is the partial match value derived from the prefixes and suffixes?
  3. Once the partial match values are known, how is the move (next) array derived from them?

    If you can answer these three questions, you should be able to understand KMP.
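
To make the first question concrete, here is a minimal sketch that lists the prefixes and suffixes of a string, assuming the usual convention that the whole string itself is excluded. The class and method names (PrefixSuffixDemo, prefixes, suffixes) are made up for illustration and are not used elsewhere in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
class PrefixSuffixDemo {
    // Every leading substring of s except s itself, shortest first.
    static List<String> prefixes(String s) {
        List<String> out = new ArrayList<>();
        for (int len = 1; len < s.length(); len++) {
            out.add(s.substring(0, len));
        }
        return out;
    }

    // Every trailing substring of s except s itself, shortest first.
    static List<String> suffixes(String s) {
        List<String> out = new ArrayList<>();
        for (int len = 1; len < s.length(); len++) {
            out.add(s.substring(s.length() - len));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(prefixes("ABCDA")); // [A, AB, ABC, ABCD]
        System.out.println(suffixes("ABCDA")); // [A, DA, CDA, BCDA]

        // Strings that appear in both lists; the longest one gives the partial match value.
        List<String> common = prefixes("ABCDA");
        common.retainAll(suffixes("ABCDA"));
        System.out.println(common); // [A]
    }
}
```

The only string in both lists is A, so the partial match value of ABCDA is 1.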

Talk is cheap, show me the code!

I chose Java because I expected to need some of its more advanced string functions, but in the end I only used the string length and substring. The same approach can therefore be implemented in C/C++ without much trouble.

Coding

Take String str = "BBC ABCDAB ABCDABCDABDE" and String pattern = "ABCDABD" as the example.

Counting the moving steps

First compute the partial match values: take the 7 substrings of the pattern that start at its first character (lengths 1 to 7) and compute the partial match value of each one. The partial match value of a string is the length of its longest prefix that is also a suffix, not counting the whole string itself. For the definitions of prefix and suffix, I suggest reading the two blog posts mentioned above first; I will not go into detail here.

Why split the pattern into 7 substrings starting from the beginning?
Because the 7 substrings correspond to the positions at which a mismatch can occur. Once we know how much of the pattern matched, we can move according to the prefix/suffix overlap of that matched part, which is what people online mean when they say KMP exploits the structure of the pattern itself.
For example, with the text ABCDABCDA and the pattern ABCDABD, a match has to start at an A. After ABCDAB matches and the next character fails, I can move the pattern 4 positions forward, straight to the next A, instead of trying every position in between.
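
The claim that moving 4 positions is safe can be checked by brute force. The sketch below is only an illustration under my own naming (ShiftIntuition and smallestSafeShift are hypothetical): it tries each shift in turn and returns the first one at which the already-matched text still lines up with the start of the pattern.

```java
// Hypothetical helper class for illustration only.
class ShiftIntuition {
    // pattern: the pattern string; matched: how many leading characters matched
    // before the mismatch. Returns the smallest shift >= 1 that is not ruled out
    // by the characters we have already seen.
    static int smallestSafeShift(String pattern, int matched) {
        // The text under the matched part is known to equal this prefix.
        String seen = pattern.substring(0, matched);
        for (int s = 1; s <= matched; s++) {
            // After sliding right by s, what is left of the seen text must be
            // a prefix of the pattern, otherwise this alignment cannot match.
            if (pattern.startsWith(seen.substring(s))) {
                return s;
            }
        }
        return 1; // nothing matched, so just move one position
    }

    public static void main(String[] args) {
        // ABCDAB matched, then a mismatch: shifts 1, 2 and 3 are ruled out.
        System.out.println(smallestSafeShift("ABCDABD", 6)); // 4
    }
}
```

Shifts 1, 2 and 3 would put B, C or D under the pattern's leading A, so they cannot match; shift 4 lines the trailing AB up with the leading AB.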

Briefly:
1. The first substring, A, has no common prefix and suffix, so its value is 0
2. The second, AB, has no common prefix and suffix, so its value is 0
3. The third, ABC, has no common prefix and suffix, so its value is 0
4. The fourth, ABCD, has no common prefix and suffix, so its value is 0
5. The fifth, ABCDA, has the longest common prefix and suffix A, so its value is 1
6. The sixth, ABCDAB, has the longest common prefix and suffix AB, so its value is 2
7. The seventh, ABCDABD, has no common prefix and suffix, so its value is 0
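
The seven values above can also be computed in a single pass. The sketch below uses the standard linear-time prefix-function technique, which is a different method from the substring comparison used in my own code later in this post; the names PartialMatchTable and partialMatchTable are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only.
class PartialMatchTable {
    // pmt[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] partialMatchTable(String pattern) {
        int[] pmt = new int[pattern.length()];
        int k = 0; // length of the overlap carried over from the previous position
        for (int i = 1; i < pattern.length(); i++) {
            // Fall back to shorter overlaps until one can be extended by pattern[i].
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = pmt[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            pmt[i] = k;
        }
        return pmt;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(partialMatchTable("ABCDABD")));
        // [0, 0, 0, 0, 1, 2, 0]
    }
}
```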
With the partial match values in hand, apply this formula:

Number of shifts = number of characters matched - corresponding partial match value

Applying it to each substring gives the number of positions to move when exactly that substring has matched:
1. The first substring, A, has length 1, so the move is 1-0 = 1
2. The second, AB, has length 2, so the move is 2-0 = 2
3. The third, ABC, has length 3, so the move is 3-0 = 3
4. The fourth, ABCD, has length 4, so the move is 4-0 = 4
5. The fifth, ABCDA, has length 5, so the move is 5-1 = 4
6. The sixth, ABCDAB, has length 6, so the move is 6-2 = 4
7. The seventh, ABCDABD, has length 7, so the move is 7-0 = 7
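
As a quick check of the arithmetic, here is a small sketch that applies the formula to the partial match values listed earlier. The class and method names (ShiftTable, shifts) are hypothetical, and the input array is typed in by hand rather than computed.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only.
class ShiftTable {
    // shift[i] = (i + 1) - partialMatch[i], i.e. matched length minus partial match value.
    static int[] shifts(int[] partialMatch) {
        int[] shift = new int[partialMatch.length];
        for (int i = 0; i < shift.length; i++) {
            shift[i] = (i + 1) - partialMatch[i];
        }
        return shift;
    }

    public static void main(String[] args) {
        int[] partialMatch = {0, 0, 0, 0, 1, 2, 0}; // values for ABCDABD
        System.out.println(Arrays.toString(shifts(partialMatch)));
        // [1, 2, 3, 4, 4, 4, 7]
    }
}
```

Note how the table is read: when m characters have matched before a mismatch, the move is entry m-1 of this array, and when nothing matched the pattern simply moves by 1.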

The code that builds the move array for the pattern is below. I made the method static because I was too lazy to create an object:

    /**
     * Builds the move array using moveLen = matchedLen - maxMatchLen.
     * @param pattern the pattern string
     * @return the move array for pattern; entry i is the move to apply when
     *         the first i+1 characters of pattern have matched
     */
    static int[] getMoveArr(String pattern){
        // Java initializes int arrays to 0, so no explicit fill is needed
        int[] moveArr = new int[pattern.length()];

        // Take each substring of pattern that starts at index 0
        for(int i = 0; i < pattern.length(); i++){
            String substr = pattern.substring(0, i+1);
            // Compare prefixes and suffixes of the same length, starting from length 1
            for(int j = 0; j < substr.length()-1; j++){
                String prefixStr = substr.substring(0, j+1);
                String suffixStr = substr.substring(substr.length()-1-j, substr.length());
                // Store the partial match value; a later, longer match overwrites a shorter one
                if(prefixStr.equals(suffixStr)){
                    moveArr[i] = prefixStr.length();
                }
            }
            // Apply moveLen = matchedLen - maxMatchLen
            moveArr[i] = substr.length() - moveArr[i];
        }

        return moveArr;
    }

Get the index of the first match

The next step is simple. Scan str from the beginning, each time extracting a substring of length 7 (pattern.length()) and comparing it with pattern. If it matches, return the index. If it does not, find the position of the first mismatch and advance using the move array above, which reduces the number of comparisons. If nothing is found, return -1.

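static int indexKMP(String str, String pattern){
        // Build the move array for pattern
        int[] moveArr = getMoveArr(pattern);

        // Scan str from the start; strIndex is advanced inside the loop body
        for(int strIndex = 0; strIndex+pattern.length() <= str.length(); ){
            // Extract a window of pattern's length from str
            String subStr = str.substring(strIndex, strIndex+pattern.length());
            // Match succeeded: return the result
            if(subStr.equals(pattern)){
                return strIndex;
            } else {
                // Match failed: find the first mismatch and advance strIndex using the move array
                for(int j = 0; j < pattern.length(); j++){
                    // The window differs from pattern, so this is guaranteed to be true for some j
                    if(subStr.charAt(j) != pattern.charAt(j)){
                        // j characters matched, and moveArr[j-1] is the move for a matched
                        // length of j. Using moveArr[j] here would skip past real matches.
                        // If nothing matched, advance by 1.
                        strIndex += (j == 0) ? 1 : moveArr[j-1];
                        break;
                    }
                }
            }
        }
        // Not found: return -1
        return -1;
    }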

Main function:

    public static void main(String[] args) {
        String str = "BBC ABCDAB ABCDABCDABDE";
        String pattern = "ABCDABD";
        System.out.println(indexKMP(str, pattern));
    }
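
For comparison, here is a minimal sketch of the textbook form of KMP, which walks the text one character at a time and never re-reads it. This is the standard formulation rather than the substring-based approach in this post, and the names KmpSearch, failure and indexOf are made up for illustration.

```java
// Hypothetical class for illustration only.
class KmpSearch {
    // fail[i] = partial match value of pattern[0..i].
    static int[] failure(String pattern) {
        int[] fail = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Returns the index of the first occurrence of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int[] fail = failure(pattern);
        int k = 0; // number of pattern characters currently matched
        for (int i = 0; i < text.length(); i++) {
            // On a mismatch, keep only the overlap given by the partial match value.
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                return i - k + 1; // start index of the match
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("BBC ABCDAB ABCDABCDABDE", "ABCDABD")); // 15
    }
}
```

Falling back to fail[k - 1] is the same idea as moving by "matched length minus partial match value", just expressed as how many characters stay matched instead of how far the pattern slides.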

Origin http://43.154.161.224:23101/article/api/json?id=325600911&siteId=291194637