Code Capriccio - String - Implementing strStr()

Implement strStr()

This section follows the Code Capriccio notes and its two explanatory videos on bilibili: "Help you learn the KMP algorithm thoroughly! (Theory)" and "Help you learn the KMP algorithm thoroughly! (Computing the next array in code)".

Problem

Question link: 28. Find the Index of the First Occurrence in a String - LeetCode

Given two strings haystack and needle, return the index of the first occurrence of needle in haystack (indices start from 0). If needle is not part of haystack, return -1.

Example 1:
Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at indices 0 and 6.
The first occurrence is at index 0, so we return 0.

Brute-force solution

The problem asks whether needle occurs in haystack and, if it does, for the position of its first occurrence.

The intuitive solution is to traverse haystack and, at each position i, compare the characters starting there with needle one by one. If all needle.size() characters are equal, i is the answer. The loop condition for traversing haystack is i + needle.size() <= haystack.size(): once fewer than needle.size() characters remain, no match is possible, and stopping there also keeps haystack[i + j] from going out of bounds.

#include <string>
using namespace std;

class Solution {
public:
    int strStr(string haystack, string needle) {
        int n = haystack.size(), m = needle.size();
        // Traverse haystack; i is the start position currently being checked
        for (int i = 0; i + m <= n; i++) {
            bool flag = true;
            // Compare each character of needle with haystack[i + j]
            for (int j = 0; j < m; j++) {
                if (haystack[i + j] != needle[j]) {
                    // A character differs: mark the failure and stop comparing
                    flag = false;
                    break;
                }
            }
            if (flag) {
                // All of needle matched, so return the start position i
                return i;
            }
        }
        return -1; // needle does not occur in haystack
    }
};
  • Time complexity: O(n * m), where n is the length of haystack and m is the length of needle. The outer loop runs n - m + 1 times and the inner loop runs at most m times per outer iteration, so the worst case is O((n - m + 1) * m) = O(n * m).
  • Space complexity: O(1). Beyond the input strings, only a constant number of integer and boolean variables are used, regardless of the input size.
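
To make the worst case concrete, here is a minimal sketch that runs the same brute-force scan but counts character comparisons. The names ScanResult and bruteForceCount are made up for this illustration, and a non-empty needle is assumed.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type for this illustration only.
struct ScanResult {
    int index;              // first match position, or -1
    long long comparisons;  // number of haystack[i + j] vs needle[j] checks
};

// Same scan as the brute-force solution, instrumented with a counter.
ScanResult bruteForceCount(const std::string& haystack, const std::string& needle) {
    assert(!needle.empty());  // assumption: needle is non-empty
    const int n = static_cast<int>(haystack.size());
    const int m = static_cast<int>(needle.size());
    long long comparisons = 0;
    for (int i = 0; i + m <= n; i++) {
        int j = 0;
        while (j < m) {
            comparisons++;
            if (haystack[i + j] != needle[j]) break;  // mismatch at offset j
            j++;
        }
        if (j == m) return {i, comparisons};  // every character matched
    }
    return {-1, comparisons};
}
```

With haystack made of 10 copies of 'a' and needle = "aaab", each of the 10 - 4 + 1 = 7 start positions fails only on its last character, so bruteForceCount reports index -1 after 7 * 4 = 28 comparisons.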

KMP solution

This problem is mainly a test of KMP. In the brute-force solution above, after a mismatch we shift needle forward by one position and start matching again. KMP optimizes this step by shifting further when possible. Put simply, as shown in the figure below, KMP moves the prefix part of the longest common prefix and suffix to where the suffix was, instead of moving just one position as in ①.

For the theory behind KMP, it helps to first watch the bilibili video "[Tianqin Postgraduate Entrance Examination] Easy-to-understand version of KMP algorithm" to understand the principle, and then the Code Capriccio videos to see how the code is written.

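[Figure: on a mismatch, KMP slides needle so that its prefix lines up with the matching suffix; ① marks the brute-force shift by one position]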

So how far should needle move on each mismatch? That is what the next array is for. Before matching, we build the next array for needle, where next[i] is the length of the longest prefix of needle[0..i] that is also a suffix of it (the whole substring itself does not count). On a mismatch, j jumps back according to the next array instead of restarting from 0.
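
To make the meaning of the next array concrete, the sketch below computes it directly from the definition by trying every candidate length, longest first. It is far slower than the getNext routine used in the solution and is only meant to show what the values mean. The names longestBorder and nextByDefinition are made up for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: length of the longest proper prefix of s[0..i]
// that is also a suffix of s[0..i].
int longestBorder(const std::string& s, std::size_t i) {
    assert(i < s.size());
    // Try every proper length k (1 <= k <= i), longest first.
    for (std::size_t k = i; k >= 1; k--) {
        // Compare prefix s[0..k-1] with suffix s[i-k+1..i].
        if (s.compare(0, k, s, i - k + 1, k) == 0) {
            return static_cast<int>(k);
        }
    }
    return 0;  // no non-empty proper prefix is also a suffix
}

// Hypothetical helper: the whole next array, one position at a time.
std::vector<int> nextByDefinition(const std::string& s) {
    std::vector<int> next(s.size());
    for (std::size_t i = 0; i < s.size(); i++) {
        next[i] = longestBorder(s, i);
    }
    return next;
}
```

For s = "aabaaf" this yields 0, 1, 0, 1, 2, 0. For example, at i = 4 the substring "aabaa" has "aa" as both prefix and suffix, so next[4] = 2.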

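#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    // Build the next array of s: next[i] is the length of the longest proper
    // prefix of s[0..i] that is also a suffix of s[0..i]
    void getNext(vector<int>& next, const string& s) {
        // ① Initialize: the first value of the next array is 0
        int j = 0;
        next[0] = 0;
        for (int i = 1; i < (int)s.size(); i++) {
            // ② Prefix and suffix differ: fall back to the previous state
            while (j > 0 && s[i] != s[j]) {
                j = next[j - 1];
            }
            // ③ Prefix and suffix agree: one more character matched
            if (s[i] == s[j]) {
                j++;
            }
            // ④ Record the matched length in the next array
            next[i] = j;
        }
    }
    // String matching using the next array
    int strStr(string haystack, string needle) {
        // Guard added here (assumption: an empty needle matches at index 0)
        if (needle.empty()) return 0;
        // vector replaces the original variable-length array, which is not standard C++
        vector<int> next(needle.size());
        getNext(next, needle);
        int j = 0; // number of characters of needle matched so far
        for (int i = 0; i < (int)haystack.size(); i++) {
            while (j > 0 && haystack[i] != needle[j]) {
                j = next[j - 1]; // fall back to the end of the longest common prefix and suffix
            }
            if (haystack[i] == needle[j]) {
                j++; // current character matched, move on to the next one
            }
            if (j == (int)needle.size()) {
                // Whole needle matched: return where the match starts in haystack
                return i - (int)needle.size() + 1;
            }
        }
        return -1; // no match
    }
};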
  • Time complexity: O(n + m), where n is the length of haystack and m is the length of needle. Although strStr has a while loop nested inside the for loop, j increases by at most 1 per iteration of the for loop and every pass of the while loop decreases j, so the while loop runs at most n times in total. Building the next array costs O(m) by the same argument.
  • Space complexity: O(m), for the next array, whose length equals the length of needle.
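
As a rough check of the linear bound, the sketch below repeats the KMP matcher with a counter for character comparisons made while scanning haystack. The names KmpResult and kmpCount are made up for this illustration, a non-empty needle is assumed, and the counter deliberately includes the repeated check of the same pair in the while and if conditions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper type for this illustration only.
struct KmpResult {
    int index;              // first match position, or -1
    long long comparisons;  // haystack/needle checks during the matching phase
};

KmpResult kmpCount(const std::string& haystack, const std::string& needle) {
    assert(!needle.empty());  // assumption: needle is non-empty
    const int n = static_cast<int>(haystack.size());
    const int m = static_cast<int>(needle.size());

    // Build the next array with the same loop as getNext.
    std::vector<int> next(m, 0);
    for (int i = 1, j = 0; i < m; i++) {
        while (j > 0 && needle[i] != needle[j]) j = next[j - 1];
        if (needle[i] == needle[j]) j++;
        next[i] = j;
    }

    long long comparisons = 0;
    // Counts every haystack[i] vs needle[j] check, repeated ones included.
    auto same = [&](int i, int j) {
        comparisons++;
        return haystack[i] == needle[j];
    };

    int j = 0;  // number of needle characters matched so far
    for (int i = 0; i < n; i++) {
        while (j > 0 && !same(i, j)) j = next[j - 1];
        if (same(i, j)) j++;
        if (j == m) return {i - m + 1, comparisons};
    }
    return {-1, comparisons};
}
```

Each haystack character triggers at most one while check that ends the loop and one if check, and the fallbacks are bounded by the number of times j was incremented, so the counter never exceeds 3 * n, where n is the length of haystack. On an input such as 1000 copies of 'a' searched for 499 copies of 'a' followed by 'b', that stays far below the roughly n * m comparisons of the brute-force scan.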

Origin blog.csdn.net/zss192/article/details/129958819