String search algorithms: brute force search, KMP

Foreword nonsense

  Lately my head has been a little foggy, but after drinking some white wine steeped with red dates it cleared up miraculously and I felt very comfortable. Maybe a small amount of alcohol makes people more clear-headed, and drinking it over the long term even has a health-preserving effect? At this point I stopped writing and looked it up on Baidu, and red dates really do have a health-preserving effect. People who sit at a computer for long stretches end up in poor condition, not just in the eyes but in the whole body, so we still need to look after our health. Although the whole industry is exhausting and the Internet is going through a cold winter, I still try to set aside a little time to go out for a walk and exercise, which leaves me more energetic and more efficient. Some time ago a boss died of a myocardial infarction in his 40s. Was it because he did not look after his body, or had no energy left to deal with his health problems? To some extent programming is already a high-risk profession, with a mid-life unemployment crisis and a mid-life death crisis to go with it.
  Although this line of work comes with various crises, you can still get fun out of simply writing code. That is a small perk of being a coder.

Brute force search

  Enough rambling; back to the topic. When it comes to string search, the simplest and most direct approach anyone can think of is brute-force search. Everyone also knows that it is relatively inefficient: with a text of length N and a pattern of length M, its worst-case time complexity is O(M * N).

The brute-force search process is shown in the figure below:

[Figure: animation of the brute-force search process]

code:


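    public boolean searchStr(String text, String p) {
        // Nothing to search for, or nothing to search in.
        if (text == null || text.isEmpty() || p == null || p.isEmpty()) return false;
        int i = 0, j = 0; // i indexes the text, j indexes the pattern p
        while (i < text.length()) {
            if (text.charAt(i) == p.charAt(j)) {
                if (j == p.length() - 1) {
                    // Whole pattern matched; the match starts at i - j.
                    System.out.println("Found: " + text.substring(i - j, i + 1));
                    return true;
                }
                j++;
                i++;
            } else {
                // Mismatch: move the text pointer back to one past the previous
                // start and compare again from the first pattern character.
                i = i - j + 1;
                j = 0;
            }
        }
        return false;
    }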

  As the animation above shows, brute-force search does a lot of repeated work. Take the data from the animation as an example:

[Figure: mismatch on the last character of edc]
  During matching, the last of the three characters of edc fails to match. The comparison then has to restart from characters of the text that have already been examined, so a text search repeats a large amount of work. Searching for a specific string in a large text with this algorithm is a disaster.
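
  To make the cost concrete, here is a minimal sketch that counts character comparisons for a brute-force scan. The class `BruteForceCost` and the method `countComparisons` are made up for this illustration and are not part of the code above; the input is a deliberately bad case in which every start position fails only on the last pattern character.

```java
final class BruteForceCost {
    // Counts the character comparisons a brute-force scan performs.
    // Hypothetical helper for illustration only.
    static int countComparisons(String text, String p) {
        int comparisons = 0;
        for (int start = 0; start + p.length() <= text.length(); start++) {
            int j = 0;
            while (j < p.length()) {
                comparisons++;
                if (text.charAt(start + j) != p.charAt(j)) {
                    break; // mismatch: try the next start position
                }
                j++;
            }
            if (j == p.length()) {
                return comparisons; // full match: stop counting
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // Ten 'a' characters against "aaab": 7 start positions, 4 comparisons each.
        System.out.println(countComparisons("aaaaaaaaaa", "aaab")); // 28
    }
}
```

  With a text of length N and a pattern of length M this bad case costs (N - M + 1) * M comparisons, which is where the O(M * N) bound comes from.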

KMP algorithm

  The brute-force search above is inefficient because it repeatedly re-examines characters that have already been examined. Is there an algorithm that avoids this? The KMP algorithm is one. In KMP the pointer into the text only moves forward and never moves back, which removes the repeated work caused by moving the text pointer back in brute-force search.

[Figure: pointer k falling back while pointer i stays in place]
  When the KMP algorithm hits a mismatch before the pattern is fully matched, pointer k must fall back to the correct position, while pointer i, which points into the text, stays where it is. The comparison then continues with p[k] and text[i].

  Question: after the pattern string p fails to match, which position should pointer k fall back to?

[Figure: abab matched before the mismatch]
  In this example abab has matched, so how far should the pattern move? Looking at the figure above, the task is to find the longest stretch over which p[abab] and text[abab] can still line up. That is exactly the longest common prefix and suffix of the string abab. Why this is what we need is shown with an example later; first, briefly, what prefixes and suffixes are.

  Take abcab as an example:

  • Prefixes: a, ab, abc, abca. These are the prefixes of abcab. They have one thing in common: each must contain the first character and cannot contain the last character.

  • Suffixes: b, ab, cab, bcab. These are the suffixes of abcab. Each must contain the last character and cannot contain the first character.

  • Longest common prefix and suffix: among the strings that appear both as a prefix and as a suffix, the longest one. For abcab it is ab.
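
  The definition can be checked directly with a few lines of code. This is only a sketch that follows the definition literally, not the efficient method KMP uses; the class `PrefixSuffixDemo` and the method `longestCommonPrefixSuffix` are names made up for the illustration.

```java
final class PrefixSuffixDemo {
    // Length of the longest string that is both a prefix and a suffix of s,
    // where a prefix excludes the last character and a suffix excludes the first.
    // Hypothetical helper: tries candidates from longest to shortest.
    static int longestCommonPrefixSuffix(String s) {
        for (int len = s.length() - 1; len > 0; len--) {
            String prefix = s.substring(0, len);
            if (s.endsWith(prefix)) {
                return len;
            }
        }
        return 0; // no common prefix and suffix
    }

    public static void main(String[] args) {
        System.out.println(longestCommonPrefixSuffix("abcab")); // 2 ("ab")
        System.out.println(longestCommonPrefixSuffix("abab"));  // 2 ("ab")
        System.out.println(longestCommonPrefixSuffix("a"));     // 0
    }
}
```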

  As an example:

[Figure: mismatch at text[4] and p[4]]

  In this example text[4] != p[4], so the pointer into p has to fall back. Now suppose the pointer into p could fall back to index 3; then p[0,2] = text[1,3] would have to hold, because only if that part is identical can the comparison continue from there.

  The constraint in the KMP algorithm is that the pointer into the text must not move back, so only the pointer into p can fall back, to a position where a prefix of p[0,3] lines up with a suffix of text[0,3]. Since p[0,3] = text[0,3] has already matched, a suffix of text[0,3] is also a suffix of p[0,3]; so what we are really looking for is the longest common prefix and suffix of p[0,3].

  The paragraph above is rather convoluted, but it is one of the key points for understanding the KMP algorithm. Once you understand this point, the rest is easier.

  For any pattern p, any number of its characters may have matched the text when the comparison stops: 0, 1, 2, 3, ..., p.length(). If n characters have matched, the longest common prefix and suffix of p[0,n-1] is what is needed.

  For example, with p = ababc the strings to examine are a, ab, aba and abab; compute the longest common prefix and suffix of each. The string a has only one character, so it has no prefix or suffix under the definition above and therefore no common prefix and suffix. The resulting lengths are 0, 0, 1 and 2.

  


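    // Builds the table used by kmp(): next[i] is the length of the longest
    // common prefix and suffix of p[0,i].
    public int[] next(String p) {
        int[] next = new int[p.length()];
        if (p.length() < 2) return next;
        int i = 1, k = 0; // k = length of the prefix matched so far
        while (i < p.length()) {
            if (p.charAt(i) == p.charAt(k)) {
                k++;
                next[i++] = k;
            } else {
                if (k == 0) i++;       // no shorter candidate left; next[i] stays 0
                else k = next[k - 1];  // retry with the next shorter common prefix and suffix
            }
        }
        return next;
    }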

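    public boolean kmp(String text, String p) {
        // Same guard as searchStr: an empty pattern would fail on p.charAt(0).
        if (text == null || p == null || p.isEmpty()) return false;
        int[] next = next(p);
        int k = 0;            // index into the pattern p
        int searchCount = 0;  // number of character comparisons
        for (int i = 0; i < text.length(); ) {
            searchCount++;
            if (text.charAt(i) == p.charAt(k)) {
                if (k == p.length() - 1) {
                    System.out.println("KMP search count: " + searchCount);
                    return true;
                }
                k++;
                i++;
            } else {
                if (k != 0)
                    k = next[k - 1]; // fall back in the pattern; i does not move
                else
                    i++;             // nothing matched: advance in the text
            }
        }
        System.out.println("KMP search count: " + searchCount);
        return false;
    }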


  Building the next array is the tricky part, and it is hard to describe in words. I had to draw it out by hand to understand how next should be generated, and even after thinking about it for a long time I could not come up with an easy way to put it; drawing pictures is probably the best way to understand it. The main KMP search loop, by contrast, is simple: it is similar to the brute-force search, except that the text pointer i never moves back and the new position of k is read directly from the next array.
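
  Since pictures are not available here, a small runnable sketch may help instead. It is a compact variant written for this illustration, not the code above: the class `KmpSketch` and the methods `prefixTable` and `indexOf` are made-up names, and `indexOf` returns the start index of the first match (or -1) rather than a boolean. The key step in both loops is the same: on a mismatch with k > 0, k falls back to next[k - 1], the next shorter common prefix and suffix, and the text index never decreases.

```java
import java.util.Arrays;

final class KmpSketch {
    // next[i] = length of the longest common prefix and suffix of p[0,i].
    static int[] prefixTable(String p) {
        int[] next = new int[p.length()];
        int k = 0; // length of the currently matched prefix
        for (int i = 1; i < p.length(); i++) {
            while (k > 0 && p.charAt(i) != p.charAt(k)) {
                k = next[k - 1]; // fall back to the next shorter candidate
            }
            if (p.charAt(i) == p.charAt(k)) {
                k++;
            }
            next[i] = k;
        }
        return next;
    }

    // Start index of the first occurrence of p in text, or -1 if absent.
    static int indexOf(String text, String p) {
        if (p.isEmpty()) {
            return 0; // assumption: an empty pattern matches at index 0
        }
        int[] next = prefixTable(p);
        int k = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != p.charAt(k)) {
                k = next[k - 1]; // only the pattern pointer moves back
            }
            if (text.charAt(i) == p.charAt(k)) {
                k++;
            }
            if (k == p.length()) {
                return i - k + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(prefixTable("ababc"))); // [0, 0, 1, 2, 0]
        System.out.println(indexOf("aacaaacaaab", "aacaaab"));     // 4
    }
}
```

  The second call is a case where falling back to next[k - 1] matters: after six matched characters the comparison fails, k falls back to 2 rather than to 0, and the match starting at index 4 is still found.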


Origin blog.csdn.net/m0_37550986/article/details/130899436