Dynamic Programming: Regular Expression Matching

The previous article, "Dynamic Programming Detailed", was well received. Today's article covers a practical application of dynamic programming: regular expressions. If you are not yet familiar with dynamic programming, I suggest reading that article first.

Regular expression matching is an elegant algorithm, but it is not an easy one. This article implements two regular expression symbols: the dot "." and the asterisk "*". If you have used regular expressions you already know how they behave; if not, don't worry, they are introduced below. At the end, the article presents a trick for quickly spotting overlapping subproblems.

Another important purpose of this article is to teach readers how to design algorithms. When we read a solution, we usually see the complete answer all at once, cannot follow it, and conclude that the problem is too hard or that we are not good enough. I want to show that algorithm design is a spiral of stepwise refinement; nobody writes the correct algorithm in one step. This article walks through a fairly complex problem to show how to simplify it, break it down piece by piece, and build the final answer up from the simplest possible framework.

The framework mindset I have emphasized many times is trained gradually through exactly this kind of design process. Let's get started. First, the problem statement:

[Figure: problem statement]

First, warm-up

As a first step, ignore the regular expression symbols. If we only had to compare two ordinary strings, how would we match them? I think anyone could write this algorithm:

bool isMatch(string text, string pattern) {
    if (text.size() != pattern.size()) 
        return false;
    for (int j = 0; j < pattern.size(); j++) {
        if (pattern[j] != text[j])
            return false;
    }
    return true;
}

Next, I modify the code above slightly. It is a little more complicated, but the meaning is the same and it is still easy to understand:

bool isMatch(string text, string pattern) {
    int i = 0; // index into text
    int j = 0; // index into pattern
    while (j < pattern.size()) {
        if (i >= text.size()) 
            return false;
        if (pattern[j++] != text[i++])
            return false;
    }
    // pattern is exhausted; the match is complete only if text is exhausted too
    return i == text.size();
}

Rewritten this way, the algorithm is easy to convert into a recursive one (pseudocode):

def isMatch(text, pattern) -> bool:
    if pattern is empty: return (text is empty?)
    first_match = (text not empty) and pattern[0] == text[0]
    return first_match and isMatch(text[1:], pattern[1:])
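
The pseudocode above can be made runnable almost line for line. The following is a minimal Python sketch; the name is_match_plain is chosen here for illustration and is not part of the original article.

```python
def is_match_plain(text: str, pattern: str) -> bool:
    """Plain string comparison written recursively (no wildcards yet)."""
    # An empty pattern matches only an empty text.
    if not pattern:
        return not text
    # The first characters must exist and be equal.
    first_match = bool(text) and pattern[0] == text[0]
    # Handle the current character, leave the rest to the recursion.
    return first_match and is_match_plain(text[1:], pattern[1:])


print(is_match_plain("abc", "abc"))  # equal strings
print(is_match_plain("abc", "abd"))  # differ in the last character
print(is_match_plain("ab", "abc"))   # text is shorter than pattern
```

The three calls cover the equal case, a mismatching character, and a length mismatch; only the first one matches.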

If you can follow this code, congratulations: your recursive thinking is in place. The regular expression algorithm is somewhat more complicated, but it is derived step by step from this recursive code.

Second, handling the dot "." wildcard

The dot matches any single character; it is a cure-all. It is actually the simplest case and needs only a small modification:

def isMatch(text, pattern) -> bool:
    if not pattern: return not text
    first_match = bool(text) and pattern[0] in {text[0], '.'}
    return first_match and isMatch(text[1:], pattern[1:])

Third, handling the "*" wildcard

The asterisk wildcard lets the character before it repeat any number of times, including zero. So how many times should it repeat? That looks tricky, but don't worry, we can at least set up the framework first:

def isMatch(text, pattern) -> bool:
    if not pattern: return not text
    first_match = bool(text) and pattern[0] in {text[0], '.'}
    if len(pattern) >= 2 and pattern[1] == '*':
        # found the '*' wildcard
        pass  # to be filled in
    else:
        return first_match and isMatch(text[1:], pattern[1:])

How many times should the character before the asterisk repeat? That takes exhaustive search by the computer; suppose it repeats N times. As I have stressed before, the trick to writing recursion is to handle only the current step and leave everything after it to the recursion. Here, whatever N turns out to be, there are only two choices at the current step: match 0 times, or match 1 time. So it can be handled like this:

if len(pattern) >= 2 and pattern[1] == '*':
    return isMatch(text, pattern[2:]) or \
            first_match and isMatch(text[1:], pattern)
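# Explanation: when a character is combined with '*', either
#   match that character 0 times, skipping both it and the '*', or,
#   once pattern[0] matches text[0], advance text by one character.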

As you can see, by keeping the "*" in pattern and advancing text, the function lets "*" repeat the preceding character as many times as needed. A simple example makes this logic clear. Suppose pattern = a*, text = aaa; the figure shows the matching process:

[Figure: matching process for pattern = a*, text = aaa]
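
The matching process for pattern = a*, text = aaa can also be traced in code. The sketch below is the brute-force recursion from this section with one added print per call; the name is_match_traced and the depth parameter are introduced here only for illustration.

```python
def is_match_traced(text: str, pattern: str, depth: int = 0) -> bool:
    """Brute-force matcher that prints every recursive call, indented by depth."""
    print("  " * depth + f"isMatch(text={text!r}, pattern={pattern!r})")
    if not pattern:
        return not text
    first_match = bool(text) and pattern[0] in {text[0], "."}
    if len(pattern) >= 2 and pattern[1] == "*":
        # Either match the starred character 0 times (skip it and the '*'),
        # or consume one character of text and keep the pattern unchanged.
        return (is_match_traced(text, pattern[2:], depth + 1)
                or (first_match and is_match_traced(text[1:], pattern, depth + 1)))
    return first_match and is_match_traced(text[1:], pattern[1:], depth + 1)


result = is_match_traced("aaa", "a*")
print(result)
```

At each level the "match 0 times" branch is tried first and fails because text is not yet empty, so the recursion falls back to consuming one more "a". Once text is empty, skipping "a*" leaves an empty pattern and the match succeeds.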

At this point, the regular expression algorithm is basically complete.

Fourth, dynamic programming

I chose recursion with a "memo" to reduce the complexity. Given the brute-force solution, the optimization is straightforward: use two variables i and j to record the current match positions instead of slicing substrings, and store the result for each (i, j) in the memo to avoid repeated computation.

I put the brute-force solution and the optimized solution together so you can compare them. You will find that the optimized solution is just the brute-force solution "translated" once more, with a memo added, and nothing else.

# Recursion with a memo
def isMatch(text, pattern) -> bool:
    memo = dict()  # memo of results already computed for (i, j)
    def dp(i, j):
        if (i, j) in memo: return memo[(i, j)]
        if j == len(pattern): return i == len(text)

        first = i < len(text) and pattern[j] in {text[i], '.'}
        
        if j <= len(pattern) - 2 and pattern[j + 1] == '*':
            ans = dp(i, j + 2) or \
                    first and dp(i + 1, j)
        else:
            ans = first and dp(i + 1, j + 1)
            
        memo[(i, j)] = ans
        return ans
    
    return dp(0, 0)

# Brute-force recursion
def isMatch(text, pattern) -> bool:
    if not pattern: return not text

    first = bool(text) and pattern[0] in {text[0], '.'}

    if len(pattern) >= 2 and pattern[1] == '*':
        return isMatch(text, pattern[2:]) or \
                first and isMatch(text[1:], pattern)
    else:
        return first and isMatch(text[1:], pattern[1:])
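
To make the comparison concrete, one option is to count how many recursive calls each version makes on the same input. The sketch below wraps both solutions in a hypothetical helper count_calls (not from the original article). The sample input is an assumption chosen so that the match fails and the brute-force version has to explore every branch.

```python
def count_calls(text: str, pattern: str):
    """Return (brute_result, brute_calls, memo_result, memo_calls)."""
    brute_calls = 0

    def brute(t: str, p: str) -> bool:
        nonlocal brute_calls
        brute_calls += 1
        if not p:
            return not t
        first = bool(t) and p[0] in {t[0], "."}
        if len(p) >= 2 and p[1] == "*":
            return brute(t, p[2:]) or (first and brute(t[1:], p))
        return first and brute(t[1:], p[1:])

    memo = {}
    memo_calls = 0

    def dp(i: int, j: int) -> bool:
        nonlocal memo_calls
        memo_calls += 1
        if (i, j) in memo:
            return memo[(i, j)]
        if j == len(pattern):
            return i == len(text)
        first = i < len(text) and pattern[j] in {text[i], "."}
        if j <= len(pattern) - 2 and pattern[j + 1] == "*":
            ans = dp(i, j + 2) or (first and dp(i + 1, j))
        else:
            ans = first and dp(i + 1, j + 1)
        memo[(i, j)] = ans
        return ans

    return brute(text, pattern), brute_calls, dp(0, 0), memo_calls


# Several stacked "a*" groups followed by a character that never matches.
b_res, b_calls, m_res, m_calls = count_calls("a" * 12 + "b", "a*a*a*a*c")
print(b_res, m_res)      # both versions agree on the answer
print(b_calls, m_calls)  # the memoized version makes far fewer calls
```

Both versions return the same answer, but the memoized one visits each (i, j) state only a bounded number of times, while the brute-force one revisits the same suffixes along many different paths.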

Some readers may ask: how do you know this is a dynamic programming problem? How do you know it has "overlapping subproblems"? That does not seem easy to see.

The most direct way to answer this is to pick an input and draw the recursion tree; you will certainly find identical nodes. That is quantitative analysis. You do not actually need to go to that trouble: below I teach a qualitative analysis that lets you see "overlapping subproblems" at a glance.

Take the simplest example, the Fibonacci sequence, and abstract the framework of the recursive algorithm:

def fib(n):
    fib(n - 1) #1
    fib(n - 2) #2

Looking at this framework, how does the original problem fib(n) reach the subproblem fib(n - 2)? There are two routes: fib(n) -> #1 -> #1, and fib(n) -> #2. The former takes two recursive calls, the latter only one. Two different computation paths reach the same subproblem; that is an "overlapping subproblem". And once you find one repeated path, there are bound to be vastly more of them, which means a huge number of overlapping subproblems.

Similarly, for this problem, we first abstract the algorithm framework:
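
The claim about repeated paths can be checked by counting calls. The sketch below assumes the usual base case fib(1) = fib(2) = 1, which the abstracted framework above leaves out; the calls counter is added only for illustration.

```python
from collections import Counter

calls = Counter()  # calls[k] = how many times fib(k) was invoked


def fib(n: int) -> int:
    calls[n] += 1
    if n <= 2:  # assumed base case: fib(1) = fib(2) = 1
        return 1
    return fib(n - 1) + fib(n - 2)  # branches #1 and #2


print(fib(10))   # 55
print(calls[8])  # 2: reached once via #1 -> #1 and once via #2
print(calls[2])  # 34: the repetition grows quickly further down the tree
```

The subproblem two levels down is computed twice, exactly as the two routes predict, and the duplication compounds at every level below it.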

def dp(i, j):
    dp(i, j + 2)     #1
    dp(i + 1, j)     #2
    dp(i + 1, j + 1) #3

Ask the analogous question: how does the original problem dp(i, j) reach the subproblem dp(i + 2, j + 2)? There are at least two paths: dp(i, j) -> #3 -> #3, and dp(i, j) -> #1 -> #2 -> #2. So this problem certainly has overlapping subproblems, and it needs dynamic programming techniques to optimize it.
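
The same path counting can be done mechanically. The sketch below enumerates every sequence of the three abstract moves that leads from (0, 0) to (2, 2). It looks only at the structure of the framework and assumes every move is always available, whereas in the real algorithm the pattern decides which branches are taken. The names MOVES and paths are introduced here for illustration.

```python
# Index offsets of the three recursive calls in the abstract framework.
MOVES = {"#1": (0, 2), "#2": (1, 0), "#3": (1, 1)}


def paths(i: int, j: int, ti: int, tj: int) -> list:
    """All move sequences leading from state (i, j) to state (ti, tj)."""
    if (i, j) == (ti, tj):
        return [[]]
    if i > ti or j > tj:  # overshot the target, dead end
        return []
    found = []
    for label, (di, dj) in MOVES.items():
        for rest in paths(i + di, j + dj, ti, tj):
            found.append([label] + rest)
    return found


for p in paths(0, 0, 2, 2):
    print(" -> ".join(p))
```

Under that assumption there are four distinct routes to the same state, including the two named above, so the subproblem can be reached, and recomputed, more than once.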

Fifth, conclusion

In this article you gained an in-depth understanding of the algorithm behind two common regular expression wildcards. The dot "." is simple to implement. The key is the asterisk "*", which needs dynamic programming techniques and is somewhat more complicated, but it still gives way once we take the problem apart layer by layer and break it down piece by piece. In addition, you learned a trick for quickly analyzing whether a problem has "overlapping subproblems", so you can quickly judge whether a problem can be solved with dynamic programming.

Looking back at the whole problem-solving process, you should now understand how an algorithm is designed: start from a simple, similar problem, add new logic to the basic framework step by step, and eventually arrive at a more complex and refined algorithm. So there is no need to fear the more complex algorithm problems. Think more and draw more analogies, and even the most impressive-looking algorithm will turn out to be a pushover.

If this article helped you, feel free to follow my public account labuladong, which is dedicated to explaining algorithm problems clearly ~
