The KMP Algorithm: Ten Algorithms Commonly Used by Programmers



This article covers the first of the ten algorithms commonly used by programmers. The remaining algorithms are summarized in later posts on this blog.

I. Application scenario

String matching problem:

  1. Given a string str1 = "硅硅谷 尚硅谷你尚硅 尚硅谷你尚硅谷你尚硅你好" and a substring str2 = "尚硅谷你尚硅".
  2. Determine whether str1 contains str2. If it does, return the position of the first occurrence; if not, return -1.

II. Brute-force matching algorithm

2.1 Analysis of the idea

With brute-force matching, suppose str1 has been matched up to position i and the substring str2 up to position j. Then:

  1. If the current characters match (i.e. str1[i] == str2[j]), then i++ and j++, and matching continues with the next character.
  2. If they do not match (i.e. str1[i] != str2[j]), set i = i - (j - 1) and j = 0. In other words, on every failed match i backtracks and j is reset to 0.
  3. Solving the problem by brute force involves a lot of backtracking: the search moves only one position at a time, and whenever the match fails it moves to the next position and checks again, which wastes a lot of time. (Not recommended!)

2.2 Code implementation

package algorithm;

/**
 * Brute-force string matching.
 *
 * @author 陌生人
 * @version V1.0
 */
public class ViolenceMatch {
   public static void main(String[] args) {
      String str1 = "硅硅谷 尚硅谷你尚硅 尚硅谷你尚硅谷你尚硅你好";
      String str2 = "尚硅谷你尚硅";
      int index = violenceMatch(str1, str2);
      System.out.println("index=" + index);
   }

   public static int violenceMatch(String s1, String s2) {
      char[] c1 = s1.toCharArray();
      char[] c2 = s2.toCharArray();
      int i = 0;
      int j = 0;
      int s1len = s1.length();
      int s2len = s2.length();
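      while (i < s1len && j < s2len) {
         if (c1[i] == c2[j]) {
            // Characters match: advance both pointers.
            i++;
            j++;
         } else {
            // Mismatch: i backtracks to one past the previous start, j restarts at 0.
            i = i - j + 1;
            j = 0;
         }
      }
      // j reaches the end of s2 only if all of s2 was matched.
      if (j == s2len) {
         return i - j;
      } else {
         return -1;
      }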

   }
}
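
To make the cost of backtracking concrete, the following is a minimal sketch that runs the same loop as violenceMatch while counting character comparisons. The class name BruteForceCost and the method matchCounting are illustrative names, not part of the original code, and the input is simply one hand-picked case with long runs that almost match.

```java
public class BruteForceCost {
    // Same loop as violenceMatch, plus a counter for character comparisons.
    // Returns {index of first match or -1, number of comparisons}.
    static int[] matchCounting(String s1, String s2) {
        int i = 0;
        int j = 0;
        int comparisons = 0;
        while (i < s1.length() && j < s2.length()) {
            comparisons++;
            if (s1.charAt(i) == s2.charAt(j)) {
                i++;
                j++;
            } else {
                i = i - j + 1; // backtrack to one past the previous start
                j = 0;
            }
        }
        int index = (j == s2.length()) ? i - j : -1;
        return new int[] {index, comparisons};
    }

    public static void main(String[] args) {
        // Build a text of twenty 'A' characters followed by one 'B'.
        StringBuilder text = new StringBuilder();
        for (int k = 0; k < 20; k++) {
            text.append('A');
        }
        text.append('B');
        int[] result = matchCounting(text.toString(), "AAAAB");
        System.out.println("index=" + result[0] + ", comparisons=" + result[1]);
    }
}
```

For this input every one of the 16 failing start positions costs 5 comparisons before the mismatch is found, and the successful one costs 5 more, so 85 comparisons are spent on a 21-character text. The repeated work comes from i moving back after each mismatch.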

III. Introduction to the KMP algorithm

  1. KMP is a classic algorithm for determining whether a pattern string occurs in a text string and, if it does, the position of its earliest occurrence.
  2. The Knuth-Morris-Pratt string search algorithm, "the KMP algorithm" for short, is often used to find where a pattern string P occurs in a text string S. It was published jointly in 1977 by Donald Knuth, Vaughan Pratt, and James H. Morris, and is named after the three of them.
  3. KMP makes use of information from earlier comparisons: a next array stores the length of the longest common prefix and suffix of the pattern string. On each fallback the search consults the next array to skip positions that have already been matched, which saves a great deal of computation.

IV. Applying the KMP algorithm

4.1 String matching problem

String matching problem:

  1. Given a string str1 = "BBC ABCDAB ABCDABCDABDE" and a substring str2 = "ABCDABD".
  2. Determine whether str1 contains str2. If it does, return the position of the first occurrence; if not, return -1.
  3. Requirement: solve the problem with the KMP algorithm; plain brute-force matching is not allowed.
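
The required behaviour (index of the first occurrence, or -1 when there is none) is the same contract as Java's standard String.indexOf, so a small baseline like the sketch below could be used to check the result of a hand-written KMP implementation. The class name IndexOfBaseline and the method firstIndex are illustrative names, not part of the article's code.

```java
public class IndexOfBaseline {
    // Reference behaviour: index of the first occurrence of pattern in text, or -1.
    static int firstIndex(String text, String pattern) {
        return text.indexOf(pattern);
    }

    public static void main(String[] args) {
        String str1 = "BBC ABCDAB ABCDABCDABDE";
        System.out.println(firstIndex(str1, "ABCDABD")); // pattern is present
        System.out.println(firstIndex(str1, "ABCDABX")); // pattern is absent
    }
}
```

This only gives the expected answer to compare against; it does not satisfy the requirement above, which asks for the search itself to be done with KMP.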

4.2 Illustrated analysis

As an example, does the string Str1 = "BBC ABCDAB ABCDABCDABDE" contain another string Str2 = "ABCDABD"?

  1. First, compare the first character of Str1 with the first character of Str2. They do not match, so the search word moves one position to the right.

  2. The characters still do not match, so the search word moves right again.

  3. This repeats until a character of Str1 matches the first character of Str2.

  4. Next, compare the following characters of the string and the search word; they also match.

  5. Continue until a character of Str1 does not match the corresponding character of Str2.

  6. At this point, the natural reaction is to move on to the next character of Str1 and repeat from step 1. This is unwise, because "BCD" has already been compared and there is no need to repeat that work. The basic fact is that when the space and D fail to match, you already know that the first six characters are "ABCDAB". The idea of the KMP algorithm is to use this known information: rather than moving the search position back over characters that were already compared, it keeps moving forward, which improves efficiency.

  7. How can the repeated steps be skipped? By computing a "partial match table" for Str2. For "ABCDABD" the partial match values are 0, 0, 0, 0, 1, 2, 0.

  8. We know that the space and D do not match, while the first six characters "ABCDAB" do match. The table shows that the "partial match value" of the last matched character B is 2, so the number of positions to shift is given by:
    shift = number of matched characters - partial match value of the last matched character
    Since 6 - 2 = 4, the search word moves four positions to the right.
  9. Because the space and C do not match, the search word has to keep moving. Now the number of matched characters is 2 ("AB") and the corresponding "partial match value" is 0. The shift is therefore 2 - 0 = 2, so the search word moves two positions to the right.

  10. Because the space does not match A, the search word moves one more position to the right.

  11. Compare character by character until C and D are found not to match. The shift is then 6 - 2 = 4, so the search word moves four more positions to the right.

  12. Compare character by character up to the last character of the search word. Everything matches, so the search is complete. To keep searching (i.e. to find all matches), the shift is 7 - 0 = 7 and the search word moves seven positions to the right; this is not repeated here.

  13. Next, how the "partial match table" is produced. This requires two concepts, "prefix" and "suffix": the prefixes of a string are all of its leading parts that leave out the last character, and the suffixes are all of its trailing parts that leave out the first character.

The "partial match value" is the length of the longest element shared by the prefixes and the suffixes.
Taking "ABCDABD" as an example:
"A": the prefixes and suffixes are both empty sets, so the length of the common element is 0;
"AB": the prefixes are [A] and the suffixes are [B], so the length of the common element is 0;
"ABC": the prefixes are [A, AB] and the suffixes are [BC, C], so the length of the common element is 0;
"ABCD": the prefixes are [A, AB, ABC] and the suffixes are [BCD, CD, D], so the length of the common element is 0;
"ABCDA": the prefixes are [A, AB, ABC, ABCD] and the suffixes are [BCDA, CDA, DA, A]; the common element is "A", of length 1;
"ABCDAB": the prefixes are [A, AB, ABC, ABCD, ABCDA] and the suffixes are [BCDAB, CDAB, DAB, AB, B]; the common element is "AB", of length 2;
"ABCDABD": the prefixes are [A, AB, ABC, ABCD, ABCDA, ABCDAB] and the suffixes are [BCDABD, CDABD, DABD, ABD, BD, D], so the length of the common element is 0.
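
The definition above can be checked mechanically. The following is a minimal sketch that computes each partial match value straight from the definition by comparing prefixes and suffixes of the same length; it is deliberately naive and only meant to reproduce the enumeration. The class name PartialMatchByDefinition and its method names are illustrative, not taken from the article's code.

```java
import java.util.Arrays;

public class PartialMatchByDefinition {
    // Length of the longest string that is both a prefix (without the last character)
    // and a suffix (without the first character) of s.
    static int partialMatchValue(String s) {
        for (int len = s.length() - 1; len > 0; len--) {
            String prefix = s.substring(0, len);
            String suffix = s.substring(s.length() - len);
            if (prefix.equals(suffix)) {
                return len; // longest candidates are tried first
            }
        }
        return 0;
    }

    // One partial match value for each leading part of the pattern.
    static int[] table(String pattern) {
        int[] values = new int[pattern.length()];
        for (int i = 0; i < pattern.length(); i++) {
            values[i] = partialMatchValue(pattern.substring(0, i + 1));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(table("ABCDABD"))); // [0, 0, 0, 0, 1, 2, 0]
    }
}
```

The printed values agree with the seven cases listed above. A practical implementation would build the table incrementally instead of re-comparing substrings for every length.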

  14. The essence of a "partial match" is that the head and the tail of a string sometimes repeat.
    For example, "ABCDAB" contains two occurrences of "AB", so its "partial match value" is 2 (the length of "AB"). When the search word is shifted, moving the first "AB" 4 positions to the right (string length - partial match value) brings it to the position of the second "AB".
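
The shift rule from the walkthrough (shift = number of matched characters - partial match value) can be traced step by step. The sketch below records every position at which the search word is aligned against the text until the first full match. It hard-codes the partial match values for "ABCDABD" derived above; the class name ShiftTrace and the method alignments are illustrative names, not part of the article's code.

```java
import java.util.ArrayList;
import java.util.List;

public class ShiftTrace {
    // Partial match values for "ABCDABD", as derived in the text.
    static final int[] TABLE = {0, 0, 0, 0, 1, 2, 0};

    // Returns each start offset at which the search word is aligned with the text,
    // ending with the offset of the first full match (if there is one).
    static List<Integer> alignments(String text, String word, int[] table) {
        List<Integer> starts = new ArrayList<>();
        int start = 0;   // where the search word currently begins in the text
        int matched = 0; // how many leading characters of the word are known to match
        starts.add(start);
        while (start + word.length() <= text.length()) {
            if (text.charAt(start + matched) == word.charAt(matched)) {
                matched++;
                if (matched == word.length()) {
                    return starts; // full match at the last recorded offset
                }
            } else if (matched == 0) {
                start += 1; // nothing matched yet: move one position
                starts.add(start);
            } else {
                int keep = table[matched - 1]; // partial match value
                start += matched - keep;       // the shift formula
                matched = keep;                // these characters need no re-check
                starts.add(start);
            }
        }
        return starts; // the word no longer fits: no match was found
    }

    public static void main(String[] args) {
        System.out.println(alignments("BBC ABCDAB ABCDABCDABDE", "ABCDABD", TABLE));
        // [0, 1, 2, 3, 4, 8, 10, 11, 15]
    }
}
```

The jumps 4 to 8, 8 to 10 and 11 to 15 are the shifts of 4, 2 and 4 computed in steps 8, 9 and 11, and the text position being compared never moves backwards.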

4.3 Code implementation

package algorithm;

import java.util.Arrays;

/**
 * String search using the KMP algorithm.
 *
 * @author 陌生人
 * @version V1.0
 */
public class KMPAlgorithm {
   public static void main(String[] args) {
      String str1 = "BBC ABCDAB ABCDABCDABDE";
      String str2 = "ABCDABD";
      // String str2 = "BBC";
      int[] next = kmpNext(str2); // [0, 0, 0, 0, 1, 2, 0]
      System.out.println("next=" + Arrays.toString(next));
      int index = kmpSearch(str1, str2, next);
      System.out.println("index=" + index); // index=15

   }

   public static int kmpSearch(String str1, String str2, int[] next) {
      // Traverse str1
      for (int i = 0, j = 0; i < str1.length(); i++) {
         // When str1.charAt(i) != str2.charAt(j), fall back by adjusting j.
         // This is the core of the KMP algorithm.
         while (j > 0 && str1.charAt(i) != str2.charAt(j)) {
            j = next[j - 1];
         }

         if (str1.charAt(i) == str2.charAt(j)) {
            j++;
         }
         if (j == str2.length()) { // found a full match
            return i - j + 1;
         }
      }
      return -1;
   }

   // Build the partial match table of a string (the substring being searched for)
   public static int[] kmpNext(String dest) {
      // The next array stores the partial match values
      int[] next = new int[dest.length()];
      next[0] = 0; // a string of length 1 has a partial match value of 0
      for (int i = 1, j = 0; i < dest.length(); i++) {
         // While dest.charAt(i) != dest.charAt(j), take the new j from next[j - 1]
         // until dest.charAt(i) == dest.charAt(j) holds or j reaches 0.
         // This is the core of the KMP algorithm.
         while (j > 0 && dest.charAt(i) != dest.charAt(j)) {
            j = next[j - 1];
         }

         // When dest.charAt(i) == dest.charAt(j), the partial match value grows by 1
         if (dest.charAt(i) == dest.charAt(j)) {
            j++;
         }
         next[i] = j;
      }
      return next;
   }
}
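
The walkthrough mentions that the search can continue after the first match in order to find all matches. The following is a minimal sketch of one way to do that, assuming the same table construction as kmpNext: after a full match, j falls back to next[j - 1] exactly as it would on a mismatch. The class name KmpFindAll and the methods buildNext and findAll are illustrative names, and returning an empty list for an empty pattern is an assumption made here, not something the article specifies.

```java
import java.util.ArrayList;
import java.util.List;

public class KmpFindAll {
    // Same construction as kmpNext: partial match value for each leading part.
    static int[] buildNext(String pattern) {
        int[] next = new int[pattern.length()];
        for (int i = 1, j = 0; i < pattern.length(); i++) {
            while (j > 0 && pattern.charAt(i) != pattern.charAt(j)) {
                j = next[j - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(j)) {
                j++;
            }
            next[i] = j;
        }
        return next;
    }

    // Start index of every occurrence of pattern in text, overlapping ones included.
    static List<Integer> findAll(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) {
            return hits; // assumption: an empty pattern yields no hits
        }
        int[] next = buildNext(pattern);
        for (int i = 0, j = 0; i < text.length(); i++) {
            while (j > 0 && text.charAt(i) != pattern.charAt(j)) {
                j = next[j - 1];
            }
            if (text.charAt(i) == pattern.charAt(j)) {
                j++;
            }
            if (j == pattern.length()) {
                hits.add(i - j + 1);
                j = next[j - 1]; // keep the longest reusable part and continue
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("BBC ABCDAB ABCDABCDABDE", "ABCDABD"));
        System.out.println(findAll("AAAA", "AA"));
    }
}
```

For "ABCDABD" the last partial match value is 0, so after a match j restarts at 0, which corresponds to the shift of 7 - 0 = 7 described in step 12. For a pattern such as "AA" the fallback keeps one matched character, which is what lets overlapping occurrences be reported.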

There is a long way to go, with Java as our companion!


Origin blog.csdn.net/qq_43688587/article/details/105153345