KMP algorithm: solving the string matching problem

Problems the KMP Algorithm Can Solve

KMP solves the string matching problem: given two strings, determine whether one contains the other, and if it does, return the subscript at which the match starts.
Example:

const char *str = "bacbababadababacambabacaddababacasdsd";
const char *ptr = "ababaca";

ptr occurs twice in str, starting at subscripts 10 and 26 (counting from 0).

Principle of the KMP Algorithm

In the straightforward approach, we start at the first subscript of the target string str (length n), take the substring with the same length as ptr (length m), and compare it with ptr. If the two are equal, we return the starting subscript. If they differ, we move to the next subscript of str and again compare a substring of length m, continuing until the end of str (in practice the starting subscript only needs to go up to n-m). The time complexity of this approach is O(n*m).
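The brute-force comparison described above can be sketched as follows. This is only an illustration: the function name naiveSearch is hypothetical, and it uses std::string instead of raw char pointers for brevity.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the original code): brute-force search.
// Returns the first subscript at which ptr occurs in str, or -1 if it never does.
int naiveSearch(const std::string& str, const std::string& ptr) {
    const std::size_t n = str.size();
    const std::size_t m = ptr.size();
    if (m > n) return -1;
    // The starting subscript i only needs to go up to n - m.
    for (std::size_t i = 0; i + m <= n; ++i) {
        std::size_t j = 0;
        // Compare the substring of length m starting at i with ptr.
        while (j < m && str[i + j] == ptr[j]) ++j;
        if (j == m) return static_cast<int>(i);
    }
    return -1;
}
```

Each of the up to n-m+1 starting subscripts may cost up to m character comparisons, which is where the O(n*m) bound comes from.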

The KMP algorithm solves the same problem in O(n+m) time.

The KMP algorithm relies on a next array. For each prefix of the pattern string, the array records how long the longest prefix that is also a suffix of that prefix is.

1. The longest equal prefix and suffix
cbc: the longest equal prefix and suffix is c.
abcbc: no equal prefix and suffix exists.
abcsdsdsdabc: the longest equal prefix and suffix is abc.

Note: a prefix here starts at the first character but does not include the last character, and a suffix ends at the last character but does not include the first. A single-character string therefore has no such prefix or suffix. For aaa, the longest equal prefix and suffix is aa.
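To check the examples above by hand, a direct (and deliberately slow) search over all candidate lengths is enough. The function name longestEqualPrefixSuffix is hypothetical and exists only for this illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the longest prefix of s that is also a suffix of s,
// where the prefix may not be the whole string. Returns "" if none exists.
std::string longestEqualPrefixSuffix(const std::string& s) {
    // Try candidate lengths from longest (size - 1) down to 1.
    for (std::size_t len = s.size() > 0 ? s.size() - 1 : 0; len > 0; --len) {
        // Compare the first len characters with the last len characters.
        if (s.compare(0, len, s, s.size() - len, len) == 0) {
            return s.substr(0, len);
        }
    }
    return "";
}
```

With this sketch, cbc gives c, abcbc gives the empty string, abcsdsdsdabc gives abc, and aaa gives aa, matching the list above.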

2. The next array
For the string ptr (ababaca), whose length is 7, next[0], next[1], next[2], next[3], next[4], next[5], next[6] correspond to the prefixes a, ab, aba, abab, ababa, ababac, ababaca. Their longest equal prefixes and suffixes are "", "", "a", "ab", "aba", "", "a", so the next array is [-1,-1,0,1,2,-1,0]. Here -1 means no equal prefix and suffix exists, 0 means one of length 1 exists, and 2 means one of length 3 exists. Each value is the length minus 1, which matches the convention used in the code.
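The values above can be reproduced by comparing prefixes and suffixes directly. The sketch below is for checking only: bruteForceNext is a hypothetical name, and the approach is far slower than the linear-time construction KMP actually uses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds the next array by direct comparison.
// next[q] = (length of the longest equal prefix and suffix of ptr[0..q]) - 1,
// or -1 when no such prefix exists.
std::vector<int> bruteForceNext(const std::string& ptr) {
    std::vector<int> next(ptr.size(), -1);
    for (std::size_t q = 0; q < ptr.size(); ++q) {
        // ptr[0..q] has q + 1 characters, so candidate lengths run from q down to 1.
        for (std::size_t len = q; len > 0; --len) {
            // The suffix of length len of ptr[0..q] starts at q + 1 - len.
            if (ptr.compare(0, len, ptr, q + 1 - len, len) == 0) {
                next[q] = static_cast<int>(len) - 1;
                break;
            }
        }
    }
    return next;
}
```

For ababaca this yields [-1,-1,0,1,2,-1,0], the same array as derived above.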

Code:

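#include <cstdio>
#include <cstring>

// Fill next[] for the pattern str of length len.
// next[q] is the length of the longest equal prefix and suffix of str[0..q], minus 1.
void cal_next(const char *str, int *next, int len)
{
     next[0] = -1; // -1 means no equal prefix and suffix exists
     int k = -1;   // k starts at -1
     for (int q = 1; q <= len - 1; q++) // compute next[q] from left to right
     {
          // If the next character differs, fall back to next[k].
          // Note that next[k] is always smaller than k.
          while (k > -1 && str[k + 1] != str[q])
          {
              k = next[k]; // backtrack
          }
          if (str[k + 1] == str[q]) // if the characters match, advance k
          {
              k = k + 1;
          }
          next[q] = k; // store k (length of the longest equal prefix and suffix, minus 1)
     }
}

// Return the subscript of the first occurrence of ptr in str, or -1 if there is none.
int KMP(const char *str, int slen, const char *ptr, int plen)
{
     if (plen <= 0)
          return 0; // added guard: an empty pattern matches at 0, and next[0] must not be written
     int *next = new int[plen];
     cal_next(ptr, next, plen); // compute the next array
     int k = -1;   // subscript of the last matched character of ptr
     int pos = -1; // result; stays -1 if no match is found
     for (int i = 0; i < slen; i++)
     {
          // ptr and str mismatch while k > -1 (part of ptr is already matched)
          while (k > -1 && ptr[k + 1] != str[i])
              k = next[k]; // backtrack
          if (ptr[k + 1] == str[i])
              k = k + 1;
          if (k == plen - 1) // k has reached the last character of ptr
          {
              pos = i - plen + 1; // starting subscript of the match
              break;
          }
     }
     delete[] next; // release the array allocated above
     return pos;
}

int main()
{
     const char *str = "bacbababadababacambabacaddababacasdsd";
     const char *ptr = "ababaca";
     // Use strlen instead of hard-coded lengths (str has 37 characters, not 36).
     int a = KMP(str, (int)strlen(str), ptr, (int)strlen(ptr));
     printf("%d\n", a); // prints 10, the first occurrence
     return 0;
}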
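The KMP function above stops at the first match, while the example string contains ptr at both subscript 10 and subscript 26. One possible way to collect every occurrence is sketched below. The name kmpFindAll and the use of std::string and std::vector are assumptions made for this illustration, and it keeps the same "length minus 1" convention for the next array.

```cpp
#include <string>
#include <vector>

// Hypothetical variant: returns the starting subscripts of all occurrences
// of ptr in str, including overlapping ones.
std::vector<int> kmpFindAll(const std::string& str, const std::string& ptr) {
    std::vector<int> positions;
    const int plen = static_cast<int>(ptr.size());
    if (plen == 0) return positions; // assumption: an empty pattern reports no matches

    // Build the next array exactly as cal_next does.
    std::vector<int> next(plen, -1);
    int k = -1;
    for (int q = 1; q < plen; ++q) {
        while (k > -1 && ptr[k + 1] != ptr[q]) k = next[k];
        if (ptr[k + 1] == ptr[q]) ++k;
        next[q] = k;
    }

    // Scan str once; after a full match, fall back via next[k] and keep going.
    k = -1;
    for (int i = 0; i < static_cast<int>(str.size()); ++i) {
        while (k > -1 && ptr[k + 1] != str[i]) k = next[k];
        if (ptr[k + 1] == str[i]) ++k;
        if (k == plen - 1) {
            positions.push_back(i - plen + 1);
            k = next[k]; // reuse the matched suffix instead of restarting from -1
        }
    }
    return positions;
}
```

For the example strings this returns 10 and 26. Falling back to next[k] after a match is what lets overlapping occurrences be found without rescanning any part of str.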
