Solving the string matching problem with brute-force matching and the KMP algorithm

1. String matching problem

  • Given a text string S and a pattern string T, finding the position at which T occurs in S is the string matching problem.

2. Solutions

2.1 Brute-force matching algorithm

2.1.1 Algorithm steps

  • Use two pointers: pointer i traverses the text string S and pointer j traverses the pattern string T;
  • As soon as S[i]!=T[j], the current substring does not match and matching has to restart: i=i-j+1 (the new starting position in S, one past the previous start) and j=0 (T is matched from its beginning again);
  • When j=T.length, the match is successful, and T starts at position i-j in S;
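
The restart rule above can be followed on a small input. The sketch below is only an illustration: the class name `BruteForceTrace` and the method `startsTried` are hypothetical names, not part of the original code. It records every starting position in S that the two-pointer loop tries before it stops.

```java
import java.util.ArrayList;
import java.util.List;

class BruteForceTrace {
    // Returns the starting positions in s that the brute-force loop tries, in order,
    // until t is found or s is exhausted. (Hypothetical helper for illustration.)
    static List<Integer> startsTried(String s, String t) {
        List<Integer> starts = new ArrayList<>();
        int i = 0, j = 0;
        starts.add(0); // the first attempt always starts at index 0
        while (i < s.length() && j < t.length()) {
            if (s.charAt(i) == t.charAt(j)) {
                i++;
                j++;
            } else {
                i = i - j + 1; // restart one position after the previous start
                j = 0;         // the pattern is matched from its beginning again
                if (i < s.length()) {
                    starts.add(i);
                }
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Same strings as the example in this article
        System.out.println(startsTried("abdbcabcdef", "abc"));
    }
}
```

For the strings used in this article the program prints `[0, 1, 2, 3, 4, 5]`: the attempt at index 0 fails on the third character, the attempts at 1 to 4 fail immediately, and the attempt at index 5 succeeds.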

2.1.2 Code implementation

package com.northsmile.string;

/**
 * @author NorthSmile
 * @version 1.0
 * @date 2023/8/22&1:20
 * Brute-force matching algorithm for the string matching problem
 */
public class StrMatch {
    public static void main(String[] args) {
        String s = "abdbcabcdef";
        String t = "abc";
        System.out.println(match(s, t));
    }

    public static int match(String s, String t) {
        // A pattern longer than the text can never match
        if (t.length() > s.length()) {
            return -1;
        }
        if (s.equals(t)) {
            return 0;
        }
        // i scans the text s, j scans the pattern t
        int i = 0, j = 0;
        while (i < s.length() && j < t.length()) {
            if (s.charAt(i) == t.charAt(j)) {
                i++;
                j++;
            } else {
                // Mismatch: restart one position after the previous start
                i = i - j + 1;
                j = 0;
            }
        }
        return j == t.length() ? i - j : -1;
    }
}

2.2 KMP algorithm

  • Drawback of brute-force matching: whenever a match attempt fails, the text pointer moves back and matching restarts from the previous starting position + 1. When the text string and the pattern string are long this is slow; in the worst case it takes O(n·m) character comparisons for a text of length n and a pattern of length m;
  • The KMP algorithm removes these redundant comparisons: for each position of the pattern it uses the length of the longest proper prefix that is also a suffix (the longest common prefix and suffix) to decide where the pattern pointer falls back to, so the text pointer never moves backwards and matching is faster;
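
To make the cost difference concrete, the sketch below counts character comparisons for both approaches on a deliberately unfavourable input: a text of 1000 'a' characters and a pattern of 49 'a' characters followed by 'b'. The class name `ComparisonCount`, the helper names and the input are assumptions chosen for illustration. `buildNext` is one common way to build the next array with the -1 convention used later in this article, and is not necessarily identical to the author's version.

```java
import java.util.Arrays;

class ComparisonCount {
    // Brute-force matching, returning the number of character comparisons performed.
    static long bruteForce(String s, String t) {
        long comparisons = 0;
        int i = 0, j = 0;
        while (i < s.length() && j < t.length()) {
            comparisons++;
            if (s.charAt(i) == t.charAt(j)) {
                i++;
                j++;
            } else {
                i = i - j + 1;
                j = 0;
            }
        }
        return comparisons;
    }

    // KMP matching, returning the number of character comparisons performed.
    static long kmp(String s, String t) {
        int[] next = buildNext(t);
        long comparisons = 0;
        int i = 0, j = 0;
        while (i < s.length() && j < t.length()) {
            if (j == -1) {          // nothing left to fall back to: advance both pointers
                i++;
                j++;
                continue;
            }
            comparisons++;
            if (s.charAt(i) == t.charAt(j)) {
                i++;
                j++;
            } else {
                j = next[j];        // i stays where it is, only j falls back
            }
        }
        return comparisons;
    }

    // next[0] = -1; next[i] = length of the longest proper prefix of t[0..i-1]
    // that is also a suffix of it.
    static int[] buildNext(String t) {
        int n = t.length();
        int[] next = new int[n];
        if (n == 0) {
            return next;
        }
        next[0] = -1;
        int i = 0, k = -1;
        while (i < n - 1) {
            if (k == -1 || t.charAt(i) == t.charAt(k)) {
                i++;
                k++;
                next[i] = k;
            } else {
                k = next[k];
            }
        }
        return next;
    }

    // Builds a string consisting of count copies of c.
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String text = repeat('a', 1000);
        String pattern = repeat('a', 49) + "b";
        System.out.println("brute force: " + bruteForce(text, pattern));
        System.out.println("KMP:         " + kmp(text, pattern));
    }
}
```

On this input the brute-force count is in the tens of thousands, while the KMP count stays below twice the text length, because every comparison either advances i or moves j back.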

2.2.1 Algorithm steps

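  • Compute the next array of the pattern string;
  • Pointer i traverses the text string S and pointer j traverses the pattern string T; if j=-1 or S[i]=T[j], advance both pointers (i++, j++);
  • Otherwise keep i where it is and move j back to next[j];
  • When j=T.length, the match is successful, and T starts at position i-j in S.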

2.2.2 next array calculation

  • For every position of the pattern string, compute the length of the longest common prefix and suffix (the longest proper prefix that is also a suffix) of the substring ending at that position; this gives the maximum common length table;
  • Shift the table one position to the right and fill the first position with -1 to obtain the next array (if the next array should start at 0 instead, add 1 to every element);
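
The two steps above can be sketched directly in code. This is a minimal illustration under assumptions: the class name `NextFromPrefixTable` and the method names `prefixTable` and `shiftToNext` are hypothetical, and the example pattern "ABCDABD" is taken from the commented-out test strings in the implementation of this article.

```java
import java.util.Arrays;

class NextFromPrefixTable {
    // maxLen[i] = length of the longest proper prefix of p[0..i] that is also a suffix of it.
    static int[] prefixTable(String p) {
        int n = p.length();
        int[] maxLen = new int[n];
        for (int i = 1, k = 0; i < n; i++) {
            // k is the common length for p[0..i-1]; shrink it until p[i] can extend it
            while (k > 0 && p.charAt(i) != p.charAt(k)) {
                k = maxLen[k - 1];
            }
            if (p.charAt(i) == p.charAt(k)) {
                k++;
            }
            maxLen[i] = k;
        }
        return maxLen;
    }

    // Shift the table one position to the right and put -1 in the first position.
    static int[] shiftToNext(int[] maxLen) {
        int[] next = new int[maxLen.length];
        if (next.length == 0) {
            return next;
        }
        next[0] = -1;
        System.arraycopy(maxLen, 0, next, 1, maxLen.length - 1);
        return next;
    }

    public static void main(String[] args) {
        int[] table = prefixTable("ABCDABD");
        System.out.println(Arrays.toString(table));              // [0, 0, 0, 0, 1, 2, 0]
        System.out.println(Arrays.toString(shiftToNext(table))); // [-1, 0, 0, 0, 0, 1, 2]
    }
}
```

The last entry of the maximum common length table is dropped by the shift. It is only needed when matching continues after a full match, for example when counting all occurrences.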

2.2.3 Code implementation

package com.northsmile.string;

import java.util.Arrays;

/**
 * @author NorthSmile
 * @version 1.0
 * @date 2023/8/22&1:20
 * KMP algorithm
 * Goal: i never moves back; j falls back to a specific position
 */
public class KMP {
    public static void main(String[] args) {
        String str = "abdbcabcdef";
        String pattern = "abc";
//        String pattern = "abcababcabc";
//        String str = "BBC ABCDAB ABCDABCDABDE";
//        String pattern = "ABCDABD";
//        String pattern = "AAAB";
        System.out.println(Arrays.toString(calNext(pattern)));
        System.out.println(match(str, pattern, 0));
    }

    /**
     * Search for pattern in str, starting at index pos
     * @param str     text string
     * @param pattern pattern string
     * @param pos     index in str at which the search starts
     * @return index of the first match at or after pos, or -1 if there is none
     */
    public static int match(String str, String pattern, int pos) {
        if (str == null || pattern == null) {
            return -1;
        }
        if (pattern.length() > str.length()) {
            return -1;
        }
        // pos is an index into the text, so it is checked against str
        if (pos < 0 || pos >= str.length()) {
            return -1;
        }
        int[] next = calNext(pattern);
        // i points into the text string, j points into the pattern string
        int i = pos, j = 0;
        while (i < str.length() && j < pattern.length()) {
            // j == -1 means even the first pattern character did not match
            if (j == -1 || str.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
            } else {
                // Move j back
                j = next[j];
            }
        }
        return j == pattern.length() ? i - j : -1;
    }

    // Compute the next array of a string
    public static int[] calNext(String str) {
        int n = str.length();
        int[] next = new int[n];
        if (n == 0) {
            return next;
        }
        next[0] = -1;
        if (n == 1) {
            return next;
        }
        next[1] = 0;
        // k is the position that index i-1 falls back to, i.e. next[i-1]
        int i = 2, k = 0;
        while (i < n) {
            // k == -1 means the prefix and suffix have nothing in common
            if (k == -1 || str.charAt(i - 1) == str.charAt(k)) {
                next[i] = k + 1;
                k = next[i];
                i++;
            } else {
                k = next[k];
            }
        }
        return next;
    }
}



3. Practice problems

3.1 LeetCode 28. Find the Index of the First Occurrence in a String

class Solution {
    public int strStr(String haystack, String needle) {
        return kmp(haystack, needle);
    }

    public int kmp(String str, String pattern) {
        if (str == null || pattern == null) {
            return -1;
        }
        if (str.length() == 0 || pattern.length() == 0) {
            return -1;
        }
        if (pattern.length() > str.length()) {
            return -1;
        }
        if (pattern.equals(str)) {
            return 0;
        }
        // Compute the next array of the pattern string
        int[] next = getNext(pattern);
        // Search for a match
        int i = 0, j = 0;
        while (i < str.length() && j < pattern.length()) {
            if (j == -1 || str.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
            } else {
                // Move j back
                j = next[j];
            }
        }
        return j == pattern.length() ? i - j : -1;
    }

    public int[] getNext(String str) {
        int n = str.length();
        if (n == 1) {
            return new int[]{-1};
        }
        if (n == 2) {
            return new int[]{-1, 0};
        }
        int[] next = new int[n];
        next[0] = -1;
        next[1] = 0;
        // k is the position that index i-1 falls back to
        int i = 2, k = 0;
        while (i < n) {
            if (k == -1 || str.charAt(i - 1) == str.charAt(k)) {
                next[i] = k + 1;
                k++;
                i++;
            } else {
                // Move k back
                k = next[k];
            }
        }
        return next;
    }
}

3.2 LeetCode 459. Repeated Substring Pattern

class Solution {
    // String matching: s consists of a repeated block exactly when s occurs
    // in s+s at an index between 1 and s.length()-1
    public boolean repeatedSubstringPattern(String s) {
        return kmp(s + s, s, 1) != s.length();
    }

    public int kmp(String str, String pattern, int pos) {
        if (str == null || pattern == null) {
            return -1;
        }
        if (str.length() == 0 || pattern.length() == 0) {
            return -1;
        }
        if (pattern.length() > str.length()) {
            return -1;
        }
        if (pattern.equals(str)) {
            return 0;
        }
        // Compute the next array of the pattern string
        int[] next = getNext(pattern);
        // Search for a match, starting at index pos of the text
        int i = pos, j = 0;
        while (i < str.length() && j < pattern.length()) {
            if (j == -1 || str.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
            } else {
                // Move j back
                j = next[j];
            }
        }
        return j == pattern.length() ? i - j : -1;
    }

    public int[] getNext(String str) {
        int n = str.length();
        if (n == 1) {
            return new int[]{-1};
        }
        if (n == 2) {
            return new int[]{-1, 0};
        }
        int[] next = new int[n];
        next[0] = -1;
        next[1] = 0;
        // k is the position that index i-1 falls back to
        int i = 2, k = 0;
        while (i < n) {
            if (k == -1 || str.charAt(i - 1) == str.charAt(k)) {
                next[i] = k + 1;
                k++;
                i++;
            } else {
                // Move k back
                k = next[k];
            }
        }
        return next;
    }
}
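
The solution above searches for s inside s+s starting at index 1. If s is made of a block of length d repeated several times, s reappears in s+s at offset d, which is smaller than s.length(); otherwise the first occurrence from index 1 is exactly at s.length(). As a sanity check, the same test can be written with the standard `String.indexOf(String, int)` method instead of a hand-written KMP. The class name `RepeatedSubstringCheck` and the method name `isRepeated` are hypothetical and used only for this sketch.

```java
class RepeatedSubstringCheck {
    // True when s can be built by repeating a shorter block at least twice.
    // Uses the library search in place of the hand-written kmp(str, pattern, pos).
    static boolean isRepeated(String s) {
        return (s + s).indexOf(s, 1) != s.length();
    }

    public static void main(String[] args) {
        System.out.println(isRepeated("abab"));         // true  ("ab" repeated twice)
        System.out.println(isRepeated("aba"));          // false
        System.out.println(isRepeated("abcabcabcabc")); // true  ("abc" repeated four times)
    }
}
```

Comparing the results of this check with the KMP-based solution on a few inputs is a quick way to catch mistakes in the next array or in the handling of the start position.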

3.3 NowCoder NC149. KMP algorithm

import java.util.*;


public class Solution {
    /**
     * The class name, method name and parameter names are fixed; do not modify them, just return the value the method specifies
     *
     * Count how many times the pattern string S occurs in the text string T
     * @param S string  pattern string
     * @param T string  text string
     * @return int
     */
    public int kmp (String S, String T) {
        return kmp(T, S, 0);
    }

    // Count the occurrences of p in s, searching from index pos
    public int kmp (String s, String p, int pos) {
        int count = 0;
        if (s == null || p == null) {
            return count;
        }
        if (s.length() == 0 || p.length() == 0) {
            return count;
        }
        if (pos < 0 || pos >= s.length()) {
            return count;
        }
        int[] next = getNext(p);
        int i = pos, j = 0;
        while (i < s.length() && j < p.length()) {
            if (j == -1 || s.charAt(i) == p.charAt(j)) {
                i++;
                j++;
            } else {
                j = next[j];
            }
            if (j == p.length()) {
                // Full match: count it, then fall back so that overlapping matches are found
                count++;
                j = next[j];
            }
        }
        return count;
    }

    // next has n+1 entries; next[n] is where j falls back to after a full match
    public int[] getNext(String s) {
        int n = s.length();
        int[] next = new int[n + 1];
        next[0] = -1;
        next[1] = 0;
        int i = 2, k = 0;
        while (i <= n) {
            if (k == -1 || s.charAt(i - 1) == s.charAt(k)) {
                next[i] = k + 1;
                k++;
                i++;
            } else {
                k = next[k];
            }
        }
        return next;
    }
}

3.4 KMP algorithm

import java.util.*;

// Note: the class name must be Main, and there must be no package xxx statement
public class Main {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        // Note the difference between hasNext and hasNextLine
        while (in.hasNextLine()) { // the while loop handles multiple cases
            String str = in.nextLine();
            String match = in.nextLine();
            List<Integer> ans = kmp(str, match, 0);
            if (ans.size() == 0) {
                // No match in this case: print -1 and move on to the next case
                System.out.println(-1);
                continue;
            }
            for (int i = 0; i < ans.size(); i++) {
                System.out.print(ans.get(i));
                if (i != ans.size() - 1) {
                    System.out.print(" ");
                } else {
                    System.out.println();
                }
            }
        }
    }

    // Collect the start index of every occurrence of p in s, searching from index pos
    public static List<Integer> kmp (String s, String p, int pos) {
        List<Integer> ans = new ArrayList<>();
        if (s == null || p == null) {
            return ans;
        }
        if (s.length() == 0 || p.length() == 0) {
            return ans;
        }
        if (pos < 0 || pos >= s.length()) {
            return ans;
        }
        int[] next = getNext(p);
        int i = pos, j = 0;
        while (i < s.length() && j < p.length()) {
            if (j == -1 || s.charAt(i) == p.charAt(j)) {
                i++;
                j++;
            } else {
                j = next[j];
            }
            if (j == p.length()) {
                // Full match: record its start, then fall back so that overlapping matches are found
                ans.add(i - j);
                j = next[j];
            }
        }
        return ans;
    }

    // next has n+1 entries; next[n] is where j falls back to after a full match
    public static int[] getNext(String s) {
        int n = s.length();
        int[] next = new int[n + 1];
        next[0] = -1;
        next[1] = 0;
        int i = 2, k = 0;
        while (i <= n) {
            if (k == -1 || s.charAt(i - 1) == s.charAt(k)) {
                next[i] = k + 1;
                k++;
                i++;
            } else {
                k = next[k];
            }
        }
        return next;
    }
}

Reference: https://www.zhihu.com/question/21923021/answer/769606119 (this answer explains the next array calculation very clearly).


Origin blog.csdn.net/qq_43665602/article/details/132415349