LeetCode Notes: Applying Dynamic Programming to String Matching

0 References

No.	Title
1	One technique for four LeetCode hard problems: dynamic programming applied to string matching
2	10.Regular Expression Matching

1. [10. Regular Expression Matching]

1.1 Problem

Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Note:

s could be empty and contains only lowercase letters a-z.
p could be empty and contains only lowercase letters a-z, and characters like . or *.

Example 1:

Input:
s = "aa"
p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

Example 2:

Input:
s = "aa"
p = "a*"
Output: true
Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".

Example 3:

Input:
s = "ab"
p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".

Example 4:

Input:
s = "aab"
p = "c*a*b"
Output: true
Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore it matches "aab".

Example 5:

Input:
s = "mississippi"
p = "mis*is*p*."
Output: false

1.2 Approach

The dp array

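First, build a two-dimensional array whose columns correspond to the string S and whose rows correspond to the pattern P. dp[i][j] records whether P[0:i] matches S[0:j], that is, whether the first i characters of P match the first j characters of S. So if P matches S as a whole, dp[len(P)][len(S)] == True. Note that dp[0][0] is the case where both S and P are empty. That always matches, so dp[0][0] = True.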
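
The layout just described can be sketched as follows. The helper name make_table is hypothetical and only used for illustration; the point is that the table has len(P) + 1 rows and len(S) + 1 columns, where the extra row and column stand for the empty prefix.

```python
def make_table(s, p):
    """Build the dp table: one row per prefix of p, one column per prefix of s."""
    rows = len(p) + 1  # row i stands for the prefix p[0:i]
    cols = len(s) + 1  # column j stands for the prefix s[0:j]
    dp = [[False] * cols for _ in range(rows)]
    dp[0][0] = True  # empty pattern against empty string always matches
    return dp


table = make_table("aab", "c*a*b")
print(len(table), len(table[0]))  # prints: 6 4
```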

The remaining work is to fill in this matrix cell by cell. To do that, first initialize dp[0][j] and dp[i][0], i.e. row 0 and column 0.

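For row 0, P is empty, so nothing matches except an empty S. Hence dp[0][j] = False for every j > 0.
For column 0, we need to decide whether P can match an empty S. With S empty, P matches only if P is itself empty or P contains "*", so only these two cases need handling.
1. P is empty: it matches S. Hence dp[0][0] = True.
2. P contains "*", for example "a*b*". Because "*" matches zero or more copies of the preceding character, here "*" can cancel the preceding character and reduce the pair to "". Hence dp[i][0] = dp[i-2][0] and P[i-1] == "*". Why test dp[i-2][0] rather than dp[i-1][0]? dp[i-1][0] says whether P[0:i-1] matches the empty string, but that prefix still ends with the character that "*" is about to cancel. Once "*" removes that character together with itself, what remains is P[0:i-2], so the relevant entry is dp[i-2][0]. In the example in the figure below, P = "a*b*" can match the empty string "": at i = 2, j = 0, we need dp[0][0] == True to obtain dp[2][0] == True.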

[Figure: column 0 of the dp matrix for P = "a*b*" against the empty string]
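
The column-0 rule can be checked in isolation with a minimal sketch. The function name init_first_column is an assumption made for this example; it returns only the first column rather than the whole matrix.

```python
def init_first_column(p):
    """Return dp[i][0] for i = 0..len(p): does p[0:i] match the empty string?"""
    col = [False] * (len(p) + 1)
    col[0] = True  # empty pattern matches the empty string
    for i in range(2, len(p) + 1):
        # "*" at p[i-1] removes itself and p[i-2], so look two rows back
        col[i] = p[i - 1] == "*" and col[i - 2]
    return col


print(init_first_column("a*b*"))  # prints: [True, False, True, False, True]
```

Starting the loop at i = 2 leaves dp[1][0] as False, since a single pattern character can never match the empty string.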

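At this point the dp matrix is initialized. Its values are shown in the figure: the green cells have been initialized and the blank cells are still to be filled.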

[Figure: the dp matrix after initialization; green cells are initialized, blank cells are still to be filled]

Next, fill in the rest of the dp matrix. For dp[i][j] (i > 0, j > 0) there are the following cases:

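P[i-1] == "*":

This case splits into two sub-cases:
1. "*" cancels the preceding character, i.e. "*" matches the empty string:

  This is handled exactly as described above: dp[i][j] = dp[i-2][j].
2. "*" matches the preceding character N times:

  Here we need (P[i-2] == "." or S[j-1] == P[i-2]) and dp[i][j-1] == True. Why? If "*" consumes S[j-1] as one more repetition of the preceding character, then the same pattern prefix P[0:i], which still ends in "*" and can therefore repeat again or stop, must match the shorter string S[0:j-1]. That is exactly dp[i][j-1].

  For example, "a.*" can match "abb".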
P[i-1] == "." or P[i-1] is an ordinary character:

This case is much simpler: dp[i][j] = (S[j-1] == P[i-1] or P[i-1] == ".") and dp[i-1][j-1].
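
To make the transitions concrete, here is a minimal sketch that fills the whole table and prints it row by row. The function name fill_table and the variables zero_times and one_more are introduced for this example only, and the code assumes a well-formed pattern in which "*" is never the first character.

```python
def fill_table(s, p):
    """Fill the whole dp table using the two transitions described above."""
    rows, cols = len(p) + 1, len(s) + 1
    dp = [[False] * cols for _ in range(rows)]
    dp[0][0] = True
    for i in range(2, rows):  # column 0: only "x*" pairs can vanish
        dp[i][0] = p[i - 1] == "*" and dp[i - 2][0]
    for i in range(1, rows):
        for j in range(1, cols):
            if p[i - 1] == "*":
                # assumption: "*" never comes first, so i >= 2 here
                zero_times = dp[i - 2][j]
                one_more = (p[i - 2] in (s[j - 1], ".")) and dp[i][j - 1]
                dp[i][j] = zero_times or one_more
            else:
                dp[i][j] = (p[i - 1] in (s[j - 1], ".")) and dp[i - 1][j - 1]
    return dp


# Rows are prefixes of the pattern "a.*", columns are prefixes of "abb".
for row in fill_table("abb", "a.*"):
    print(row)
```

The bottom-right cell of the printed table is the answer for the full pattern against the full string.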


1.3 Implementation

class Solution(object):
    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
        # dp[i][j]: whether the first i characters of p match the first j characters of s
        row = len(p) + 1
        col = len(s) + 1
        dp = [ [False for i in range( col ) ] for j in range( row ) ]
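        dp[0][0] = True # dp[0][0]: p is empty and s is empty
        # Column 0: s is empty, so work out which prefixes of p can match it.
        # This prepares for matching len(s) = 1, 2, 3, ..., n below.
        # When s is empty, only patterns of the form a*b* can match.
        # Index 0 stands for the empty string, so i - 1 is the actual position of the current character in p.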
        for i in range( 1, row):
            dp[i][0] = ( i > 1 ) and p[ i - 1 ] == "*" and dp[ i - 2 ][0]

        for i in range( 1, row ) :
            for j in range( 1, col ):
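                # When the current pattern character is "*", one of two conditions must hold:
                # 1. dp[i - 2][j] is True: "*" together with the character before it stands for the empty string
                # 2. otherwise, the character before "*" must equal the current character of s, or be ".",
                #    and p[0:i] must already match s[0:j-1], i.e. dp[i][j - 1]
                if p[ i - 1 ] =="*":
                    dp[i][j] = dp[ i - 2 ][j] or ( ( p[ i - 2  ] == s[ j - 1 ] or p[ i - 2 ] == "." ) and dp[i][j-1] )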

                else:
                    dp[i][j] = ( p[ i - 1 ] == "."  or p[ i - 1 ]  == s[ j - 1 ]) and dp[i-1][j-1]

        return dp[row-1][col-1]
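
One way to sanity-check this solution is to compare it with Python's own regex engine on the five examples from the problem statement. The sketch below assumes Python 3.4 or later (for re.fullmatch) and restates the table-filling logic as a standalone function named is_match, a name chosen for this example, so that the block runs on its own. For patterns built only from lowercase letters, "." and "*", the two are expected to agree.

```python
import re


def is_match(s, p):
    """Compact restatement of the table-filling approach, for testing only."""
    rows, cols = len(p) + 1, len(s) + 1
    dp = [[False] * cols for _ in range(rows)]
    dp[0][0] = True
    for i in range(1, rows):
        for j in range(cols):
            if p[i - 1] == "*":
                # assumption: "*" is never the first pattern character
                dp[i][j] = dp[i - 2][j] or (
                    j > 0 and p[i - 2] in (s[j - 1], ".") and dp[i][j - 1]
                )
            else:
                dp[i][j] = j > 0 and p[i - 1] in (s[j - 1], ".") and dp[i - 1][j - 1]
    return dp[-1][-1]


cases = [("aa", "a"), ("aa", "a*"), ("ab", ".*"),
         ("aab", "c*a*b"), ("mississippi", "mis*is*p*.")]
for s, p in cases:
    expected = re.fullmatch(p, s) is not None  # Python's own regex engine
    print(s, p, is_match(s, p), expected)
    assert is_match(s, p) == expected
```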

2. [44. Wildcard Matching]

#!/bin/python

class Solution(object):
    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
        row = len(p) + 1
        col = len(s) + 1
        dp = [ [False for i in range( 0, col )] for j in range( 0, row ) ]

        dp[0][0] = True
        for j in range(1, col ):
            dp[0][j] = False
        for i in range( 1, row):
            if p[i-1] == "*":
                dp[i][0] = dp[i-1][0]

        for i in range( 1, row ):
            for j in range( 1, col ):
                if p[i-1] == "*":
                    dp[i][j] = dp[i-1][j] or dp[i][j-1]
                else:
                    dp[i][j] = (s[j-1] == p[i-1] or p[i-1] == "?") and dp[i-1][j-1]
        return dp[row-1][col-1]

if __name__ == "__main__":
    m = Solution()
    print("s:[aa],p[a] ret:"+str(m.isMatch("aa","a")))
    print("s:[aa],p[*] ret:"+str(m.isMatch("aa","*")))
    print("s:[cb],p[?a] ret:"+str(m.isMatch("cb","?a")))
    print("s:[],p[] ret:"+str(m.isMatch("","")))
    print("s:[acdcb],p[a*c?b] ret:"+str(m.isMatch("acdcb","a*c?b")))
    print("s:[adceb],p[*a*b] ret:"+str(m.isMatch("adceb","*a*b")))
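
The recurrence here differs from problem 10 in that "*" stands alone: it either matches nothing (take the cell in the row above) or absorbs one more character of s (take the cell to the left). As a rough cross-check, the sketch below compares a standalone restatement, named wildcard_match for this example, with the standard library's fnmatch.fnmatchcase. This assumes the patterns contain only letters, "?" and "*", since fnmatch also gives "[" a special meaning.

```python
from fnmatch import fnmatchcase


def wildcard_match(s, p):
    """Same recurrence as the solution above, written as a plain function."""
    rows, cols = len(p) + 1, len(s) + 1
    dp = [[False] * cols for _ in range(rows)]
    dp[0][0] = True
    for i in range(1, rows):
        star = p[i - 1] == "*"
        dp[i][0] = star and dp[i - 1][0]  # only leading "*"s match the empty string
        for j in range(1, cols):
            if star:
                # "*" matches nothing (row above) or absorbs s[j-1] (cell to the left)
                dp[i][j] = dp[i - 1][j] or dp[i][j - 1]
            else:
                dp[i][j] = p[i - 1] in (s[j - 1], "?") and dp[i - 1][j - 1]
    return dp[-1][-1]


cases = [("aa", "a"), ("aa", "*"), ("cb", "?a"), ("", ""),
         ("acdcb", "a*c?b"), ("adceb", "*a*b")]
for s, p in cases:
    assert wildcard_match(s, p) == fnmatchcase(s, p)
    print(s, p, wildcard_match(s, p))
```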