KMP algorithm: the string matching problem

KMP algorithm

Application scenario: the string matching problem

Suppose we have a string str1 = "Henan Henan University of Software, Henan University of Science and Technology, Henan University of Science and Technology" and a substring str2 = "Henan University of Science and Technology".

We need to determine whether str1 contains str2. If it does, return the position of the first occurrence; if not, return -1.

This is a typical example of a string matching problem.

The first approach that comes to mind is brute-force matching.

With brute-force matching, suppose str1 has been matched up to position i and the substring str2 up to position j. Then:

1. If the current characters match (i.e. str1[i] == str2[j]), do i++ and j++ and continue with the next character.

2. If they do not match (i.e. str1[i] != str2[j]), set i = i - (j - 1) and j = 0. In other words, after every failed match i goes back to one position past where the current attempt started, and j is reset to 0.

3. Solving the problem this way causes a lot of backtracking: the starting position advances only one character at a time, and characters that were already compared are compared again. In the worst case this costs on the order of str1.length × str2.length comparisons, which is too slow for long inputs.

package 字符串匹配问题;

public class ViolenceMatch {

	public static void main(String[] args) {
		String str1 = "河南河南软件河南大学河南科技大学软件学院";
		String str2 = "河南科技大学";
		int index = violenceMatch(str1, str2);
		System.out.println(index); // prints 10
	}

	// Brute-force matching algorithm
	public static int violenceMatch(String str1, String str2) {
		char[] s1 = str1.toCharArray();
		char[] s2 = str2.toCharArray();

		int s1Len = s1.length;
		int s2Len = s2.length;

		int i = 0; // index into s1
		int j = 0; // index into s2
		// The loop condition keeps both indices in bounds
		while (i < s1Len && j < s2Len) {
			if (s1[i] == s2[j]) {
				// Characters match: advance both indices
				i++;
				j++;
			} else {
				// Mismatch: restart one position after the previous start
				i = i - (j - 1);
				j = 0;
			}
		}
		// All of s2 was matched if j reached its end
		if (j == s2Len) {
			return i - j;
		} else {
			return -1;
		}
	}
}
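
To make the cost of backtracking concrete, here is a minimal sketch that runs the same brute-force loop but counts character comparisons instead of returning the index. The class name BruteForceCost, the method countComparisons and the test strings are made up for illustration and are not part of the original code.

```java
final class BruteForceCost {

    // Same brute-force scan as above, but returns the number of
    // character comparisons performed before the loop stops.
    static int countComparisons(String text, String pattern) {
        int comparisons = 0;
        int i = 0; // index into text
        int j = 0; // index into pattern
        while (i < text.length() && j < pattern.length()) {
            comparisons++;
            if (text.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
            } else {
                // Backtrack: restart one position after the previous start
                i = i - (j - 1);
                j = 0;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // Hypothetical bad case: long runs of the same character
        System.out.println(countComparisons("aaaaaaaaab", "aaab")); // prints 28
    }
}
```

The text here has only 10 characters, yet the scan makes 28 comparisons, because each of the seven attempted starting positions re-reads characters that were already compared.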

KMP algorithm introduction

KMP (Knuth-Morris-Pratt) is a classic algorithm for deciding whether a pattern string occurs in a text string and, if it does, for finding the position of its earliest occurrence.

KMP reuses the information gained from comparisons it has already made. A next array stores, for each prefix of the pattern string, the length of the longest proper prefix that is also a suffix of it (the partial match value). Whenever a mismatch forces the algorithm to fall back, it looks up next to find how much of the pattern is already known to match, skips over those positions, and so saves a large amount of computation.
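
To show what the values stored in the next array mean, the sketch below computes them directly from the definition (longest proper prefix that is also a suffix) by plain substring comparison. It is deliberately naive and is not how KMP builds the table; the class name PartialMatchByDefinition and the method table are made up for illustration.

```java
import java.util.Arrays;

final class PartialMatchByDefinition {

    // For every prefix pattern[0..i], find the length of the longest
    // proper prefix of that prefix which is also its suffix.
    static int[] table(String pattern) {
        int[] next = new int[pattern.length()];
        for (int i = 0; i < pattern.length(); i++) {
            String prefix = pattern.substring(0, i + 1);
            // Try the longest candidate first; len <= i keeps it "proper"
            for (int len = i; len > 0; len--) {
                String suffix = prefix.substring(prefix.length() - len);
                if (prefix.startsWith(suffix)) {
                    next[i] = len;
                    break;
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(table("ABCDABD"))); // prints [0, 0, 0, 0, 1, 2, 0]
    }
}
```

For "ABCDABD", the prefix "ABCDAB" starts and ends with "AB", so its value is 2, while the full string has no prefix that is also a suffix, so its value is 0.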

Case study

We have a string str1 = "BBC ABCDAB ABCDABCDABDE" and a substring str2 = "ABCDABD".

We need to determine whether str1 contains str2. If it does, return the position of the first occurrence; if not, return -1.
Requirement: use the KMP algorithm for this, not simple brute-force matching.
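
As a quick sanity check on the answer any solution should produce, the JDK's String.indexOf has the same contract: the index of the first occurrence, or -1. The sketch below is only a cross-check and is not KMP; the class name CaseStudyCheck and the method expectedIndex are made up for illustration.

```java
final class CaseStudyCheck {

    // Returns the index of the first occurrence of pattern in text,
    // or -1 if there is none, by delegating to String.indexOf.
    static int expectedIndex(String text, String pattern) {
        return text.indexOf(pattern);
    }

    public static void main(String[] args) {
        System.out.println(expectedIndex("BBC ABCDAB ABCDABCDABDE", "ABCDABD")); // prints 15
        System.out.println(expectedIndex("BBC ABCDAB", "ABCDABD"));              // prints -1
    }
}
```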

I won't write out the step-by-step explanation here, since many experts have already explained it very clearly. If you are interested, you can search for it; here is a link to one such write-up: https://blog.csdn.net/dark_cy/article/details/88698736

package 字符串匹配问题;

import java.util.Arrays;

public class KMPAlgorithm {

	public static void main(String[] args) {
		String str1 = "BBC ABCDAB ABCDABCDABDE";
		String str2 = "ABCDABD";

		int[] next = kmpNext(str2); // [0, 0, 0, 0, 1, 2, 0]
		System.out.println("next" + Arrays.toString(next));
		int index = kmpSearch(str1, str2, next);
		System.out.println(index); // prints 15
	}

	// Compute the partial match values of a string (the substring)
	public static int[] kmpNext(String dest) {
		// next[i] holds the partial match value of dest[0..i].
		// Java zero-initializes the array, so next[0] is already 0:
		// a string of length 1 has a partial match value of 0.
		int[] next = new int[dest.length()];
		for (int i = 1, j = 0; i < dest.length(); i++) {
			// While dest.charAt(i) != dest.charAt(j), take the new j from next[j-1].
			// The loop exits once the two characters are equal or j reaches 0.
			while (j > 0 && dest.charAt(i) != dest.charAt(j)) {
				j = next[j - 1];
			}
			// When dest.charAt(i) == dest.charAt(j), the partial match value grows by 1
			if (dest.charAt(i) == dest.charAt(j)) {
				j++;
			}
			next[i] = j;
		}
		return next;
	}

	/**
	 * KMP search.
	 *
	 * @param str1 the original string
	 * @param str2 the substring to look for
	 * @param next the partial match table of the substring
	 * @return the position of the first occurrence, or -1 if there is no match
	 */
	public static int kmpSearch(String str1, String str2, int[] next) {
		// An empty substring matches at position 0 (same convention as String.indexOf);
		// without this guard str2.charAt(0) below would throw.
		if (str2.isEmpty()) {
			return 0;
		}
		for (int i = 0, j = 0; i < str1.length(); i++) {
			// Core of KMP: on a mismatch, fall back in the substring only
			while (j > 0 && str1.charAt(i) != str2.charAt(j)) {
				j = next[j - 1];
			}
			if (str1.charAt(i) == str2.charAt(j)) {
				j++;
			}
			if (j == str2.length()) {
				// Found: i is the last matched index, so the match starts at i - j + 1
				return i - j + 1;
			}
		}
		return -1;
	}
}

Personal understanding

The hard part of the KMP algorithm is the partial match value, and you have to work through it yourself to really understand it; if you are interested, there are many explanations online. What KMP shares with ordinary matching is that when the characters are equal, both indices move forward and the comparison continues. The difference from brute force shows up when two characters are not equal. Brute force moves the start of the match to the position right after the previous starting point and matches again from there, which is very wasteful. KMP instead uses the partial match value to find the position in the substring from which to resume matching. Once you understand where the partial match value comes from, you will also see why matching resumes at that position. In short, the core of the KMP algorithm is the partial match value together with the following code:

		while (j > 0 && str1.charAt(i) != str2.charAt(j)) {
			j = next[j - 1];
		}

In other words, when the substring has been matched up to an index j greater than 0 and the character at index i of the original string differs from the character at index j of the substring, we have to re-match, and the position in the substring to resume from is taken from the partial match table. The index i into the original string therefore only ever moves forward; unlike brute force, it never steps back.
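
The fallback behaviour can be observed directly. The sketch below runs the same search loop on the case-study strings and records every time j is reset from the partial match table. The class name KmpFallbackTrace, the method names and the event format are made up for illustration, and the sketch assumes a non-empty pattern.

```java
import java.util.ArrayList;
import java.util.List;

final class KmpFallbackTrace {

    // Partial match table: next[i] is the length of the longest proper
    // prefix of pattern[0..i] that is also a suffix of it.
    static int[] buildNext(String pattern) {
        int[] next = new int[pattern.length()];
        for (int i = 1, j = 0; i < pattern.length(); i++) {
            while (j > 0 && pattern.charAt(i) != pattern.charAt(j)) {
                j = next[j - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(j)) {
                j++;
            }
            next[i] = j;
        }
        return next;
    }

    // Runs the KMP search and records each fallback of j and the final match.
    // Note that i is never decreased. Assumes pattern is non-empty.
    static List<String> traceFallbacks(String text, String pattern) {
        int[] next = buildNext(pattern);
        List<String> events = new ArrayList<>();
        for (int i = 0, j = 0; i < text.length(); i++) {
            while (j > 0 && text.charAt(i) != pattern.charAt(j)) {
                int previous = j;
                j = next[j - 1];
                events.add("i=" + i + ": j " + previous + " -> " + j);
            }
            if (text.charAt(i) == pattern.charAt(j)) {
                j++;
            }
            if (j == pattern.length()) {
                events.add("match at " + (i - j + 1));
                break;
            }
        }
        return events;
    }

    public static void main(String[] args) {
        traceFallbacks("BBC ABCDAB ABCDABCDABDE", "ABCDABD").forEach(System.out::println);
    }
}
```

For the case-study strings this prints "i=10: j 6 -> 2", "i=10: j 2 -> 0", "i=17: j 6 -> 2" and finally "match at 15". At i=17 the already matched "AB" is reused, so matching continues from j = 2 without moving i back.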


Origin blog.csdn.net/qq_22155255/article/details/113888417