KMP algorithm (Java implementation)

The KMP algorithm is an improved string matching method. Its core idea is to use the information that is still available after a match fails, so that the pattern string is compared against the main string as few times as possible and matching becomes fast. This is implemented through a next array (often described as a next() function), which encodes the local matching information of the pattern string itself.


* What is a "failed string match"?
To answer this, we first need to understand KMP's matching mechanism. It is an optimization of the BF algorithm, so what is the BF algorithm?
The BF (brute force) algorithm matches strings by brute force: every position of the main string is tried in turn as a starting point, and at each starting point the pattern string is compared with the main string character by character.

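    public boolean BF(String query, String target) {
        // Try every start position where the whole pattern still fits in the main string
        for (int i = 0; i + target.length() <= query.length(); i++) {
            int ptr = 0;
            // Compare character by character until a mismatch or the end of the pattern
            while (ptr < target.length()) {
                if (query.charAt(i + ptr) != target.charAt(ptr))
                    break;
                ptr++;
            }
            if (ptr == target.length())
                return true;
        }
        return false;
    }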

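A string matching failure means that, partway through a match attempt, a character of the pattern string does not equal the corresponding character of the main string. For example, the main string is "abcdcdgadk" and the pattern string is "cdcdk". When matching starts at the first character c of the main string (index 2), the first four characters "cdcd" match, but the fifth comparison fails: the main string has 'g' where the pattern string has 'k'.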
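The failed attempt in this example can be reproduced with a small sketch. The class name MismatchDemo and the helper matchedPrefixLength are made up for illustration and are not part of the original code; the helper only reports how many pattern characters match at a given alignment.

```java
class MismatchDemo {
    // Hypothetical helper: number of pattern characters that match
    // when target is aligned at query[start].
    static int matchedPrefixLength(String query, String target, int start) {
        int ptr = 0;
        while (ptr < target.length()
                && start + ptr < query.length()
                && query.charAt(start + ptr) == target.charAt(ptr)) {
            ptr++;
        }
        return ptr;
    }

    public static void main(String[] args) {
        String query = "abcdcdgadk";
        String target = "cdcdk";
        int start = query.indexOf('c'); // first 'c' of the main string, index 2
        int matched = matchedPrefixLength(query, target, start);
        // Prints: matched 4 characters, then 'g' != 'k'
        System.out.println("matched " + matched + " characters, then '"
                + query.charAt(start + matched) + "' != '"
                + target.charAt(matched) + "'");
    }
}
```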

* What is the "available information" after a failed match?
The available information is the length of the longest prefix of the already-matched part of the pattern string that is also a suffix of it, not counting the whole part itself (for the matched part "cdc" this common prefix/suffix length is 1, and for "cdcd" it is 2).
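To make the definition concrete, here is a deliberately naive sketch that computes this length by trying every candidate, longest first. The class name BorderDemo and the method borderLength are illustrative names, not from the original article, and this is not how KMP itself computes the value.

```java
class BorderDemo {
    // Length of the longest proper prefix of s that is also a suffix of s.
    static int borderLength(String s) {
        for (int len = s.length() - 1; len > 0; len--) {
            // Take the last len characters and check whether s starts with them
            if (s.startsWith(s.substring(s.length() - len))) {
                return len;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(borderLength("cdc"));   // 1  ("c")
        System.out.println(borderLength("cdcd"));  // 2  ("cd")
        System.out.println(borderLength("cdcdk")); // 0
    }
}
```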


* How is the available information used after a failed match to reduce the number of comparisons?
Take again the main string "abcdcdgadk" and the pattern string "cdcdk", with matching started at the first "c" of the main string. After the match fails, the next attempt can only succeed if a prefix of the matched part "cdcd" lines up with an equal suffix of it. So the pattern string is shifted until its prefix "cd" overlaps the suffix "cd" that was just matched, and the comparison continues from the mismatched character of the main string.

In short, when the BF algorithm finds that several consecutive characters of the pattern string equal those of the main string but a later character differs, it has to roll the comparison position in the main string back to the position right after the starting point of that attempt. In the same situation, the KMP algorithm never rolls back the main string position, which lowers the worst-case cost from O(n*m) character comparisons to O(n+m), where n and m are the lengths of the main string and the pattern string.
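The difference can be seen by counting character comparisons on a repetitive input. The sketch below is only an illustration under assumed names (ComparisonCountDemo, bfComparisons, kmpComparisons); it uses the standard failure-table construction for the KMP side.

```java
class ComparisonCountDemo {
    // Character comparisons made by brute force until the first match (or the end).
    static int bfComparisons(String query, String target) {
        int count = 0;
        for (int i = 0; i + target.length() <= query.length(); i++) {
            int ptr = 0;
            while (ptr < target.length()) {
                count++;
                if (query.charAt(i + ptr) != target.charAt(ptr)) break;
                ptr++;
            }
            if (ptr == target.length()) return count;
        }
        return count;
    }

    // Character comparisons made by KMP matching (table construction not counted).
    static int kmpComparisons(String query, String target) {
        int m = target.length();
        int[] next = new int[m];
        for (int i = 1; i < m; i++) {
            int k = next[i - 1];
            while (k > 0 && target.charAt(i) != target.charAt(k)) k = next[k - 1];
            next[i] = target.charAt(i) == target.charAt(k) ? k + 1 : k;
        }
        int count = 0, match = 0, i = 0;
        while (i < query.length() && match < m) {
            count++;
            if (query.charAt(i) == target.charAt(match)) {
                i++;      // the main string index only ever moves forward
                match++;
            } else if (match > 0) {
                match = next[match - 1]; // shift the pattern, keep i where it is
            } else {
                i++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String query = "aaaaaaaaab"; // nine 'a' followed by 'b'
        String target = "aaab";
        System.out.println(bfComparisons(query, target));  // 28
        System.out.println(kmpComparisons(query, target)); // 16
    }
}
```

The gap widens as the repeated run in the main string and the pattern string grow longer.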

So how should the KMP algorithm be implemented? We build an int array next in advance, where next[i] stores the common prefix/suffix length of the pattern substring that ends at index i (that is, characters 0 through i), not counting the substring itself.

Example: For the pattern string "cdcdk"
next[0]=0, next[1]=0, next[2]=1, next[3]=2, next[4]=0


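The next[] array can be filled with dynamic programming. If the new character extends the previous common prefix/suffix, the length grows by one; otherwise we fall back to the next shorter candidate, next[k-1], and try again until a candidate can be extended or none is left:

    // target is the pattern string (String target)
    int[] next = new int[target.length()];
    for (int i = 1; i < target.length(); i++) {
        int k = next[i - 1];
        // Fall back through shorter common prefix/suffix lengths on a mismatch
        while (k > 0 && target.charAt(i) != target.charAt(k))
            k = next[k - 1];
        next[i] = target.charAt(i) == target.charAt(k) ? k + 1 : k;
    }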
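As a quick check of the table construction, the following self-contained sketch builds the array for the example pattern and for a second pattern, "aabaaab", chosen here (not taken from the original article) because it needs more than one fallback step. The class name NextArrayDemo and the method buildNext are illustrative.

```java
import java.util.Arrays;

class NextArrayDemo {
    // next[i] = length of the longest proper prefix of target[0..i]
    // that is also a suffix of target[0..i].
    static int[] buildNext(String target) {
        int[] next = new int[target.length()];
        for (int i = 1; i < target.length(); i++) {
            int k = next[i - 1];
            // On a mismatch, retry with the next shorter candidate length
            while (k > 0 && target.charAt(i) != target.charAt(k)) {
                k = next[k - 1];
            }
            if (target.charAt(i) == target.charAt(k)) {
                k++;
            }
            next[i] = k;
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(buildNext("cdcdk")));   // [0, 0, 1, 2, 0]
        System.out.println(Arrays.toString(buildNext("aabaaab"))); // [0, 1, 0, 1, 2, 2, 3]
    }
}
```

For "aabaaab", index 5 is the interesting entry: the candidate of length 2 cannot be extended, but the shorter candidate of length 1 can, which gives 2 rather than 1.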

KMP algorithm complete code:

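    public boolean kmp(String query, String target) {
        int longLen = query.length();
        int shortLen = target.length();
        // An empty pattern string matches any main string
        if (shortLen == 0)
            return true;
        // Build the next array: for every prefix of the pattern string,
        // record the length of its common prefix/suffix
        int[] next = new int[shortLen];
        for (int i = 1; i < shortLen; i++) {
            int k = next[i - 1];
            // Fall back through shorter common prefix/suffix lengths on a mismatch
            while (k > 0 && target.charAt(i) != target.charAt(k))
                k = next[k - 1];
            next[i] = target.charAt(i) == target.charAt(k) ? k + 1 : k;
        }
        // match is the index of the pattern character currently being
        // compared with the main string
        int match = 0;
        for (int i = 0; i < longLen; i++) {
            if (query.charAt(i) == target.charAt(match)) {
                match++;
            } else if (match != 0) {
                /*
                 * The first next[match - 1] pattern characters are already
                 * known to match, so skip the useless comparisons.
                 * When match == 0 there is nothing to fall back to: the main
                 * string simply advances, which also avoids an out-of-bounds
                 * access and an infinite loop.
                 */
                match = next[match - 1];
                // i-- so that the current main string character is compared again
                i--;
            }
            // match has passed the last index of the pattern string,
            // so the whole pattern has been matched
            if (match == shortLen)
                return true;
        }
        return false;
    }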
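If the position of the match is needed rather than a yes/no answer, the same idea can be adapted. The variant below is a sketch, not the article's method: the name kmpIndexOf and the convention of returning -1 when there is no match (mirroring String.indexOf) are assumptions made for this example.

```java
class KmpIndexOfDemo {
    // Returns the index of the first occurrence of target in query, or -1.
    static int kmpIndexOf(String query, String target) {
        int m = target.length();
        if (m == 0) {
            return 0; // same convention as String.indexOf("")
        }
        int[] next = new int[m];
        for (int i = 1; i < m; i++) {
            int k = next[i - 1];
            while (k > 0 && target.charAt(i) != target.charAt(k)) {
                k = next[k - 1];
            }
            next[i] = target.charAt(i) == target.charAt(k) ? k + 1 : k;
        }
        int match = 0;
        for (int i = 0; i < query.length(); i++) {
            // Shift the pattern until the current character fits or nothing is matched
            while (match > 0 && query.charAt(i) != target.charAt(match)) {
                match = next[match - 1];
            }
            if (query.charAt(i) == target.charAt(match)) {
                match++;
            }
            if (match == m) {
                return i - m + 1; // start index of the match
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(kmpIndexOf("abcdcdcdk", "cdcdk"));  // 4
        System.out.println(kmpIndexOf("abcdcdgadk", "cdcdk")); // -1
    }
}
```

Using an inner while loop instead of decrementing the loop variable keeps the main string index strictly increasing, which some readers find easier to follow.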

Origin blog.csdn.net/CY2333333/article/details/108203076