Pattern Matching (Java)

Disclaimer: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/qq_42267300/article/details/86695859


Pattern Matching

Pattern matching is a basic operation on the string data structure.
Most of the string operations we have covered are fairly straightforward, but pattern matching is somewhat harder, so we walk through it briefly here.
The operation is defined as follows: given a substring (also called the pattern string), find every occurrence of that substring in a main string.

We cover two common implementations here:

  1. Brute-force matching
  2. The KMP algorithm

Brute-force matching (BF algorithm)

The main idea: starting from the first element of the main string, compare the elements one by one with the pattern string, starting from the pattern's first element. If the elements are equal, move on to compare the next pair. If they differ, backtrack in the main string to the element after the one where this attempt started, go back to the first element of the pattern string, and repeat.
To keep the discussion simple, we consider the following case:
find the position where the pattern string first appears in the main string, and return -1 if it is not found.
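
For reference, the Java standard library offers the same contract: String.indexOf(String) returns the index of the first occurrence, or -1 if there is none. The sketch below only shows the expected results; the class name IndexOfBaseline and the helper firstOccurrence are placeholders introduced here, not part of the original article.

```java
// Baseline using the standard library.
// IndexOfBaseline and firstOccurrence are illustrative names, not from the article.
class IndexOfBaseline {
    // Returns the index of the first occurrence of pattern in str, or -1 if absent.
    static int firstOccurrence(String str, String pattern) {
        return str.indexOf(pattern);
    }

    public static void main(String[] args) {
        System.out.println(firstOccurrence("abdabcda", "abcd")); // prints 3
        System.out.println(firstOccurrence("abdabcda", "xyz"));  // prints -1
    }
}
```

The hand-written methods in this article should return the same values as this baseline for the same inputs.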


We divide this process into five steps:

  1. First, traverse the main string.
  2. For each element of the main string, compare it with the current element of the pattern string; if they are equal, compare the next pair.
  3. If the comparison reaches the end of the pattern string, the pattern has matched; return the index in the main string where this match started.
  4. If two elements differ, the match fails at this position; continue with the next element of the main string.
  5. If the main string has been fully traversed without a successful match, the main string does not contain the pattern string; return -1.

Let's walk through an example. Assume the main string is abdabcda and the pattern string is abcd; the result of the pattern match should be 3.

Start of attempt    Main string   Pattern string   Result
(element/index)     element       element
a/0                 a             a                Equal, compare the next element
a/0                 b             b                Equal, compare the next element
a/0                 d             c                Not equal, backtrack and continue traversing the main string
b/1                 b             a                Not equal, backtrack and continue traversing the main string
d/2                 d             a                Not equal, backtrack and continue traversing the main string
a/3                 a             a                Equal, compare the next element
a/3                 b             b                Equal, compare the next element
a/3                 c             c                Equal, compare the next element
a/3                 d             d                Equal, compare the next element
a/3                 a             (end)            End of the pattern string reached, match succeeds

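The complete code is as follows:

public static int bruteForceStringMatch(String str, String pattern) {
    // Only attempt a match if the main string is at least as long as the pattern string
    if (str.length() >= pattern.length()) {
        // Get the character arrays of both strings for traversal
        char[] strOfChars = str.toCharArray();
        char[] patternOfChars = pattern.toCharArray();

        // Two loop control variables
        int loopOfStr, loopOfPattern;
        // Traverse the main string; matching ends when either string is exhausted
        for (loopOfStr = 0, loopOfPattern = 0 ; loopOfStr < str.length() && loopOfPattern < pattern.length() ;) {
            // If the two elements are equal, compare the next pair
            if (strOfChars[loopOfStr] == patternOfChars[loopOfPattern]) {
                loopOfStr++;
                loopOfPattern++;
            } else {
                loopOfStr -= loopOfPattern - 1; // backtrack to one past the start of this attempt
                loopOfPattern = 0; // reset the pattern string index
            }
        }

        // Reaching the end of the pattern string means the match succeeded
        if (loopOfPattern == pattern.length()) {
            return loopOfStr - loopOfPattern; // position of the first occurrence in the main string
        }
    }

    // Pattern match failed
    return -1;
}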

Time complexity: let the lengths of the main string and the pattern string be m and n respectively; the worst-case time complexity is then O(m * n).
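
To make the O(m * n) bound concrete, the sketch below counts character comparisons for the same brute-force loop. The class name BruteForceCost, the comparisons counter, the repeat helper, and the test strings are assumptions added for illustration; the input is chosen so that almost every attempt fails only on the last pattern element.

```java
// Counts the character comparisons made by brute-force matching.
// BruteForceCost, comparisons, and repeat are illustrative names, not from the article.
class BruteForceCost {
    static long comparisons; // comparisons made by the most recent call to match

    static int match(String str, String pattern) {
        comparisons = 0;
        int i = 0, j = 0;
        while (i < str.length() && j < pattern.length()) {
            comparisons++;
            if (str.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
            } else {
                i = i - j + 1; // backtrack to one past the start of this attempt
                j = 0;
            }
        }
        return j == pattern.length() ? i - j : -1;
    }

    // Builds a string of count copies of c (avoids requiring Java 11's String.repeat).
    static String repeat(char c, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int k = 0; k < count; k++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = repeat('A', 1000) + "B";   // m = 1001
        String pattern = repeat('A', 10) + "B"; // n = 11
        int index = match(str, pattern);
        // Every start position from 0 to 990 costs n comparisons: (m - n + 1) * n in total.
        System.out.println("index = " + index + ", comparisons = " + comparisons);
    }
}
```

With these inputs the match is found at index 990 after 10901 comparisons, which is (m - n + 1) * n. On the small example from the table above (abdabcda and abcd), the same counter reports 9 comparisons.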


KMP algorithm

The KMP algorithm mainly solves the backtracking problem of the BF algorithm, which reduces the time complexity to O(m + n).

The main idea: the key to the KMP algorithm is to use the information gained before a match fails to reduce the number of comparisons between the two strings, and so match faster. It uses a next[] array, which records the longest equal prefix and suffix, to reduce the number of comparisons.


Let's take an example and see how the KMP algorithm works.
main string: AAAAAB
pattern string: AAAB

With the BF algorithm:
[Figure: BF matching attempt (image not included)]
In the first attempt, only the fourth element of the pattern string differs from the main string; the other elements are the same. The first three elements of the pattern string are all identical, so we might ask: in the second attempt, is it really necessary to compare the first two letters of the pattern string again?

Clearly those two comparisons are unnecessary, and this is where the next[] array helps.


The next[] array for the pattern string AAAB is {-1, 0, 1, 2}; we explain later how next[] is computed. In the first attempt the fourth element differs, so the pattern string cursor moves to position next[3], i.e. 2. The next comparison therefore starts from the third A of the pattern, skipping the first two A's and reducing the number of comparisons. Later mismatches are handled the same way.

We also divide the KMP matching process into five steps:

  1. First, traverse the main string.
  2. For each element of the main string, compare it with the current element of the pattern string; if they are equal, compare the next pair.
  3. If the comparison reaches the end of the pattern string, the pattern has matched; return the index in the main string where this match started.
  4. If two elements differ, the match fails at this position; set the pattern string index to the next[] value for that position and continue through the main string without backtracking.
  5. If the main string has been fully traversed without a successful match, the main string does not contain the pattern string; return -1.

[Figure: the KMP algorithm applied to this example (image not included)]

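The KMP code is as follows:

public static int KMP(String str, String pattern) {
    // Only attempt a match if the main string is at least as long as the pattern string
    if (str.length() >= pattern.length()) {
        // Build the next array
        int[] next = getNext(pattern);

        // Get the character arrays of both strings for traversal
        char[] strOfChars = str.toCharArray();
        char[] patternOfChars = pattern.toCharArray();

        // Two loop control variables
        int loopOfStr, loopOfPattern;
        // Traverse the main string; matching ends when either string is exhausted
        for (loopOfStr = 0, loopOfPattern = 0 ; loopOfStr < str.length() && loopOfPattern < pattern.length() ;) {
            // If the two elements are equal, or the pattern index has fallen back to -1
            // (nothing in the pattern matches here), move on to the next element
            if (loopOfPattern == -1 || strOfChars[loopOfStr] == patternOfChars[loopOfPattern]) {
                loopOfStr++;
                loopOfPattern++;
            } else {
                loopOfPattern = next[loopOfPattern]; // set the pattern string index to its next value
            }
        }

        // Reaching the end of the pattern string means the match succeeded
        if (loopOfPattern == pattern.length()) {
            return loopOfStr - loopOfPattern; // position of the first occurrence in the main string
        }
    }

    // Pattern match failed
    return -1;
}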

The next[] array

As the example above shows, computing the next[] array is the most important part of the KMP algorithm. So how exactly do we compute it?

For each element of the pattern string, the next[] array stores the maximum length of an equal prefix and suffix of the part of the pattern before that element (not including the element itself, and with the prefix and suffix shorter than that whole part). This is what tells the match where to jump when a comparison fails. We again use the pattern string from above to explain: AAAB

Note: by default we set the next value of the first element to -1.

Index   Prefixes   Suffixes   Maximum length
0       (none)     (none)     -1
1       (none)     (none)     0
2       A          A          1
3       A, AA      AA, A      2
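
The table can be checked mechanically by applying the definition directly: for each index, try every prefix length of the preceding part and keep the longest one that is also a suffix. This is a slow sketch meant only for verifying small examples, not the method KMP itself uses; the class name NextByDefinition and the method nextByDefinition are names introduced here.

```java
import java.util.Arrays;

// Builds next[] straight from its definition. Far slower than the real construction,
// so it is useful only for checking small examples by hand.
// NextByDefinition and nextByDefinition are illustrative names, not from the article.
class NextByDefinition {
    static int[] nextByDefinition(String p) {
        int[] next = new int[p.length()];
        for (int i = 0; i < p.length(); i++) {
            if (i == 0) {
                next[i] = -1; // convention used in this article
                continue;
            }
            String before = p.substring(0, i); // the part of the pattern before index i
            int best = 0;
            // k is a candidate length, strictly shorter than the whole part
            for (int k = 1; k < i; k++) {
                String suffix = before.substring(i - k); // last k characters
                if (before.startsWith(suffix)) {         // equal to the first k characters?
                    best = k;
                }
            }
            next[i] = best;
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nextByDefinition("AAAB"))); // prints [-1, 0, 1, 2]
    }
}
```

For AAAB this reproduces the last column of the table.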

Suppose next[loopOfPattern] = nextValue, and let p denote the pattern string. The value of next[loopOfPattern + 1] is then found recursively:

  1. If p[loopOfPattern] == p[nextValue], then next[loopOfPattern + 1] = nextValue + 1.
  2. If p[loopOfPattern] != p[nextValue], set nextValue = next[nextValue]; if now p[loopOfPattern] == p[nextValue], then next[loopOfPattern + 1] = nextValue + 1.
  3. If they are still not equal, keep falling back through the prefix index with nextValue = next[nextValue] until either nextValue == -1 (i.e. nextValue = next[0]) or p[loopOfPattern] == p[nextValue]. If nextValue reaches -1, no prefix matches and next[loopOfPattern + 1] = 0.

As usual, here is the code.

Implementation of the getNext method

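private static int[] getNext(String pattern)
{
    // Get the pattern string's character array for traversal
    char[] patternOfChars = pattern.toCharArray();
    // Create the next array
    int[] next = new int[pattern.length()];
    // An empty pattern has no next values; return early so next[0] is not out of bounds
    if (pattern.isEmpty()) {
        return next;
    }
    int nextValue = -1, loopOfPattern = 0; // initialize the next value and the pattern string index
    next[0] = -1; // -1 is used as the marker here
    while(loopOfPattern < pattern.length() -1)
    {
        // Either extend the current equal prefix and suffix, or fall back to a shorter one
        if(nextValue == -1 || patternOfChars[loopOfPattern] == patternOfChars[nextValue])
        {
            nextValue++;
            loopOfPattern++;
            next[loopOfPattern] = nextValue;
        } else {
            nextValue = next[nextValue];
        }
    }
    return next;
}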
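
To see the effect on the AAAAAB / AAAB example, the following self-contained sketch restates the two methods in compact form and counts character comparisons. The class name KmpDemo, the method names buildNext and kmp, and the comparisons counter are assumptions added for illustration; the logic is intended to follow the code above, with a guard for the empty pattern.

```java
// Compact restatement of the KMP code above, instrumented to count comparisons.
// KmpDemo, buildNext, kmp, and comparisons are illustrative names, not from the article.
class KmpDemo {
    static int comparisons; // character comparisons made by the most recent call to kmp

    // Same construction as getNext, using the -1 marker for the first element.
    static int[] buildNext(String p) {
        int[] next = new int[p.length()];
        if (p.isEmpty()) {
            return next;
        }
        next[0] = -1;
        int k = -1, j = 0;
        while (j < p.length() - 1) {
            if (k == -1 || p.charAt(j) == p.charAt(k)) {
                k++;
                j++;
                next[j] = k;
            } else {
                k = next[k];
            }
        }
        return next;
    }

    // Returns the index of the first occurrence of p in s, or -1 if there is none.
    static int kmp(String s, String p) {
        comparisons = 0;
        int[] next = buildNext(p);
        int i = 0, j = 0;
        while (i < s.length() && j < p.length()) {
            if (j == -1) { // nothing in the pattern matches at i: advance the main string
                i++;
                j = 0;
                continue;
            }
            comparisons++;
            if (s.charAt(i) == p.charAt(j)) {
                i++;
                j++;
            } else {
                j = next[j]; // the main string index i never moves backwards
            }
        }
        return j == p.length() ? i - j : -1;
    }

    public static void main(String[] args) {
        System.out.println(kmp("AAAAAB", "AAAB")); // prints 2
        System.out.println(comparisons);           // prints 8
    }
}
```

Tracing the brute-force loop by hand on the same input gives 12 comparisons, so the saving comes entirely from not re-comparing the leading A's after each mismatch.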
