Interview Prep: Dynamic Programming (1): Edit Distance and Its Backtracking Path

LeetCode solution code for edit distance: https://github.com/vivianLL/LeetCode

1. Computing the Edit Distance (LeetCode 72)

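Edit distance is the minimum number of edit operations required to turn one string into another. Only three operations on characters are allowed: substitution, insertion and deletion.
Edit distance is one of the important text-comparison algorithms in natural language processing (NLP), and a useful tool for picking out strings from groups of similar strings. It was proposed by the Russian scientist Levenshtein, so it is also called the Levenshtein Distance, and the algorithm is known as the LD algorithm.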
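To make the three operations concrete, here is a minimal sketch using the classic pair kitten → sitting, whose Levenshtein distance is 3. The helper names substitute, insert and delete are hypothetical, introduced only for this illustration; positions are 0-based.

```python
# Hypothetical helpers, one per allowed edit operation (0-based positions).
def substitute(s: str, pos: int, ch: str) -> str:
    """Replace the character at pos with ch."""
    return s[:pos] + ch + s[pos + 1:]


def insert(s: str, pos: int, ch: str) -> str:
    """Insert ch before position pos (pos == len(s) appends)."""
    return s[:pos] + ch + s[pos:]


def delete(s: str, pos: int) -> str:
    """Remove the character at pos."""
    return s[:pos] + s[pos + 1:]


s = "kitten"
s = substitute(s, 0, "s")  # "sitten"
s = substitute(s, 4, "i")  # "sittin"
s = insert(s, 6, "g")      # "sitting"
print(s)                   # sitting

# Deletion undoes an insertion: sitting -> sittin
print(delete(s, 6))        # sittin
```

Three operations suffice here, and no shorter sequence exists, which is exactly what an edit distance of 3 means.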
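Approach:
Define a function edit(i, j) as the edit distance from the length-i prefix of the first string to the length-j prefix of the second string.
It satisfies the following dynamic-programming recurrence: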

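  • if i == 0 and j == 0, edit(i, j) = 0
  • if i == 0 and j > 0, edit(i, j) = j
  • if i > 0 and j == 0, edit(i, j) = i
  • if i ≥ 1 and j ≥ 1, edit(i, j) = min{ edit(i-1, j) + 1, edit(i, j-1) + 1, edit(i-1, j-1) + f(i, j) }, where f(i, j) = 1 if the i-th character of the first string differs from the j-th character of the second string, and f(i, j) = 0 otherwise. The three terms correspond to deleting the i-th character of the first string, inserting the j-th character of the second string, and substituting (or, when f(i, j) = 0, simply keeping) the last character.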

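Filling in edit(i, j) for all 0 ≤ i ≤ m and 0 ≤ j ≤ n, where m and n are the lengths of the two strings, gives an (m+1) × (n+1) two-dimensional table whose bottom-right cell edit(m, n) is the answer.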
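The recurrence can also be evaluated top-down exactly as written, caching each edit(i, j) so it is computed only once. This is a minimal sketch assuming Python's functools.lru_cache for memoization; the wrapper name edit_distance is hypothetical. Recursion depth grows with len(a) + len(b), so very long strings can hit Python's default recursion limit (1000).

```python
from functools import lru_cache


def edit_distance(a: str, b: str) -> int:
    """Top-down evaluation of edit(i, j) with memoization."""

    @lru_cache(maxsize=None)
    def edit(i: int, j: int) -> int:
        if i == 0:          # also covers i == 0 and j == 0: j insertions
            return j
        if j == 0:          # i deletions
            return i
        f = 0 if a[i - 1] == b[j - 1] else 1
        return min(edit(i - 1, j) + 1,      # delete a[i-1]
                   edit(i, j - 1) + 1,      # insert b[j-1]
                   edit(i - 1, j - 1) + f)  # substitute, or keep if equal

    return edit(len(a), len(b))


print(edit_distance("horse", "ros"))            # 3
print(edit_distance("intention", "execution"))  # 5
```

Each (i, j) pair is solved once, so this runs in O(mn) time like the bottom-up table, unlike the plain recursion in Method 1 below.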

Python implementation:

import heapq

class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        # # Method 1: plain recursion, exponential time
        # if len(word1)==0:
        #     return len(word2)
        # elif len(word2)==0:
        #     return len(word1)
        # elif word1[-1]==word2[-1]:
        #     return self.minDistance(word1[:-1], word2[:-1])
        # else:
        #     return min(self.minDistance(word1,word2[:-1])+1,self.minDistance(word1[:-1],word2)+1,self.minDistance(word1[:-1],word2[:-1])+1)

        # # Method 2: double loop filling a 2-D table, O(mn) time, O(mn) space
        # if len(word1) == 0:
        #     return len(word2)
        # elif len(word2)==0:
        #     return len(word1)
        # M = len(word1)
        # N = len(word2)
        # output = [[0] * (N + 1) for _ in range(M + 1)]
        # for i in range(M + 1):
        #     for j in range(N + 1):
        #         if i == 0 and j == 0:
        #             output[i][j] = 0
        #         elif i == 0 and j != 0:
        #             output[i][j] = j
        #         elif i != 0 and j == 0:
        #             output[i][j] = i
        #         elif word1[i - 1] == word2[j - 1]:
        #             output[i][j] = output[i - 1][j - 1]
        #         else:
        #             output[i][j] = min(output[i - 1][j - 1] + 1, output[i - 1][j] + 1, output[i][j - 1] + 1)
        # return output[M][N]

        # # Method 3: a single rolling row, O(mn) time, O(n) extra space (n = len(word2))
        # if len(word1) == 0:
        #     return len(word2)
        # elif len(word2) == 0:
        #     return len(word1)
        # M = len(word1)
        # N = len(word2)
        # tmp = [i for i in range(N + 1)]
        # value = None
        #
        # for i in range(M):
        #     tmp[0] = i + 1
        #     last = i
        #     for j in range(N):
        #         if word1[i] == word2[j]:
        #             value = last
        #         else:
        #             value = 1 + min(last, tmp[j], tmp[j + 1])
        #         last = tmp[j + 1]
        #         tmp[j + 1] = value
        # return value

        # Method 4: uniform-cost search (Dijkstra with unit costs) over (prefix1, prefix2) pairs
        heap = [(0, word1, word2)]
        seen = set()

        while len(heap) > 0:
            dist, w1, w2 = heapq.heappop(heap)
            if w1 == w2:
                return dist
            if (w1, w2) not in seen:
                seen.add((w1, w2))
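                # Matching trailing characters cost nothing, so strip the common suffix first.
                while w1 and w2 and w1[-1] == w2[-1]:
                    w1 = w1[:-1]
                    w2 = w2[:-1]
                # Each remaining move costs 1: delete w1's last character,
                # insert w2's last character (drop it from w2), or substitute.
                heapq.heappush(heap, (dist + 1, w1[:-1], w2))
                heapq.heappush(heap, (dist + 1, w1, w2[:-1]))
                heapq.heappush(heap, (dist + 1, w1[:-1], w2[:-1]))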


sol = Solution()
ans = sol.minDistance("se","")
print(ans)
ans = sol.minDistance("intention","execution")
print(ans)
ans = sol.minDistance("dinitrophenylhydrazine","benzalphenylhydrazone")
print(ans)
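Method 3's rolling-row idea can be pushed to O(min(m, n)) extra space by keeping the row over the shorter string, since edit distance is symmetric. Below is a minimal sketch, assuming two rows (prev and cur) rather than Method 3's single array with a saved diagonal; both function names are hypothetical, and a randomized cross-check against the full table is included.

```python
import random


def min_distance_full(word1: str, word2: str) -> int:
    """Reference: full (m+1) x (n+1) table, O(mn) time and space."""
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if word1[i - 1] == word2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    return dp[m][n]


def min_distance_two_rows(word1: str, word2: str) -> int:
    """Same recurrence keeping only two rows: O(mn) time, O(min(m, n)) space."""
    if len(word2) > len(word1):
        word1, word2 = word2, word1      # edit distance is symmetric
    prev = list(range(len(word2) + 1))   # row 0: j insertions
    for i, c1 in enumerate(word1, start=1):
        cur = [i] + [0] * len(word2)     # column 0: i deletions
        for j, c2 in enumerate(word2, start=1):
            if c1 == c2:
                cur[j] = prev[j - 1]
            else:
                cur[j] = 1 + min(prev[j - 1], prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


# Randomized cross-check on short strings over a small alphabet.
rng = random.Random(0)
for _ in range(500):
    a = "".join(rng.choice("abc") for _ in range(rng.randint(0, 6)))
    b = "".join(rng.choice("abc") for _ in range(rng.randint(0, 6)))
    assert min_distance_full(a, b) == min_distance_two_rows(a, b)

print(min_distance_two_rows("intention", "execution"))  # 5
```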

2. Recovering the Edit Operations from the Backtracking Path

For example:

str1=bcdabcdef
str2=abcdefbcd

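The DP table for these two strings looks like this:
[Figure: edit-distance DP table for str1 and str2; the red arrows mark the backtracking path]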
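The table in the figure can be regenerated with a short script. This is a minimal sketch: dp_table and print_table are hypothetical helper names, and rows index str1 while columns index str2, as in the minDistance code of this section.

```python
from typing import List


def dp_table(str1: str, str2: str) -> List[List[int]]:
    """Full edit-distance table; rows index str1, columns index str2."""
    m, n = len(str1), len(str2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 or j == 0:
                dp[i][j] = i + j          # one prefix is empty: pure inserts or deletes
            elif str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1])
    return dp


def print_table(str1: str, str2: str, dp: List[List[int]]) -> None:
    # Header: blank corner, then the empty-prefix column and str2's characters.
    print("   " + " ".join(f"{c:>2}" for c in " " + str2))
    for i, row in enumerate(dp):
        label = " " if i == 0 else str1[i - 1]
        print(f"{label:>2} " + " ".join(f"{v:>2}" for v in row))


str1, str2 = "bcdabcdef", "abcdefbcd"
table = dp_table(str1, str2)
print_table(str1, str2, table)
print("edit distance:", table[-1][-1])  # 5
```

For this pair the bottom-right cell is 5. One optimal script: insert a at the front, insert e after the leading bcd, replace the original a with f, then delete the trailing e and f.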
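Approach:
To find the backtracking path, start from the bottom-right cell and repeatedly check how the current value was obtained. A cell can sometimes be obtained in more than one way, which means several different operation sequences reach the same minimum. The red arrows in the figure mark the backtracking path; reversing it gives the actual sequence of edit operations.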

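  • dp[m][n] = dp[m-1][n] + 1 (coming from the previous row, i.e. the cell above when rows index word1 as in the code below): delete the character word1[m-1]
  • diagonal step with the value unchanged, dp[m][n] = dp[m-1][n-1]: word1[m-1] == word2[n-1], no operation needed
  • diagonal step with the value increased by 1, dp[m][n] = dp[m-1][n-1] + 1: replace word1[m-1] with word2[n-1]
  • dp[m][n] = dp[m][n-1] + 1 (coming from the previous column, i.e. the cell to the left): insert the character word2[n-1]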
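Because a cell may be reachable from more than one neighbor, counting the valid moves into each cell gives the number of distinct optimal backtracking paths. A minimal sketch, assuming the same layout (rows index word1, columns index word2); count_optimal_paths is a hypothetical name.

```python
def count_optimal_paths(word1: str, word2: str) -> int:
    """Count paths (0,0) -> (m,n) that use only moves allowed by the recurrence."""
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 or j == 0:
                dp[i][j] = i + j
            else:
                f = 0 if word1[i - 1] == word2[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + f)

    ways = [[0] * (n + 1) for _ in range(m + 1)]
    ways[0][0] = 1
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            total = 0
            if i > 0 and dp[i - 1][j] + 1 == dp[i][j]:      # delete word1[i-1]
                total += ways[i - 1][j]
            if j > 0 and dp[i][j - 1] + 1 == dp[i][j]:      # insert word2[j-1]
                total += ways[i][j - 1]
            if i > 0 and j > 0:
                f = 0 if word1[i - 1] == word2[j - 1] else 1
                if dp[i - 1][j - 1] + f == dp[i][j]:        # match or replace
                    total += ways[i - 1][j - 1]
            ways[i][j] = total
    return ways[m][n]


print(count_optimal_paths("ab", "ba"))  # 3
```

For ab → ba the three optimal paths are: replace both characters; delete a, keep b, insert a; or insert b, keep a, delete b.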

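To build the aligned strings, let rows index string A and columns index string B. If A_i = B_j, move to the upper-left cell. If A_i ≠ B_j, move to whichever of the upper-left, upper and left neighbors has the smallest value; on a tie, prefer upper-left, then upper, then left.
If you move to the upper-left cell, prepend A_i to aligned string A and B_j to aligned string B;
if you move to the upper cell, prepend A_i to aligned string A and _ to aligned string B;
if you move to the left cell, prepend _ to aligned string A and B_j to aligned string B.
Once the whole path has been traced, the aligned strings are complete. The backtrackingPath function below checks the moves in a different order (left, then up, then diagonal); any order that only follows moves satisfying the recurrence still yields an optimal sequence, possibly a different one among ties.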
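The alignment procedure just described can be sketched as follows. This is a minimal sketch, assuming "_" as the gap symbol and that neither input contains it; align is a hypothetical name, and ties are broken upper-left, then upper, then left as in the text.

```python
def align(a: str, b: str, gap: str = "_"):
    """Return (aligned_a, aligned_b) for one minimum-cost alignment of a and b."""
    m, n = len(a), len(b)
    dp = [[i + j if i == 0 or j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1])

    top, bottom = [], []  # collected back to front, reversed at the end
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            move = "diag"                    # equal characters: free diagonal step
        elif i == 0:
            move = "left"                    # only insertions remain
        elif j == 0:
            move = "up"                      # only deletions remain
        else:
            best = min(dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1])
            if dp[i - 1][j - 1] == best:     # tie-break: upper-left first
                move = "diag"
            elif dp[i - 1][j] == best:       # then upper
                move = "up"
            else:                            # then left
                move = "left"
        if move == "diag":
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif move == "up":
            top.append(a[i - 1])
            bottom.append(gap)
            i -= 1
        else:
            top.append(gap)
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


print(align("ab", "b"))  # ('ab', '_b')
```

Every column where the two aligned strings differ (a substitution or a gap) costs one edit, so the number of such columns equals the edit distance.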

def minDistance(word1, word2):
    """Return the full (M+1) x (N+1) DP table; table[M][N] is the edit distance."""
    # No early return for empty strings: backtrackingPath always needs a table,
    # and the loops below already handle M == 0 or N == 0.
    M = len(word1)
    N = len(word2)
    output = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(M + 1):
        for j in range(N + 1):
            if i == 0 and j == 0:
                output[i][j] = 0
            elif i == 0 and j != 0:
                output[i][j] = j          # j insertions
            elif i != 0 and j == 0:
                output[i][j] = i          # i deletions
            elif word1[i - 1] == word2[j - 1]:
                output[i][j] = output[i - 1][j - 1]
            else:
                output[i][j] = min(output[i - 1][j - 1] + 1, output[i - 1][j] + 1, output[i][j - 1] + 1)
    return output

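def backtrackingPath(word1, word2):
    dp = minDistance(word1, word2)
    m = len(dp) - 1
    n = len(dp[0]) - 1
    operation = []   # one entry per aligned position: "a:b", "a:NULLHYP", "NULLREF:b" or "a"
    spokenstr = []   # word1 side of the alignment ("insert" marks an inserted position)
    writtenstr = []  # word2 side of the alignment ("delete" marks a deleted position)

    # Walk back from the bottom-right cell and stop at (0, 0); evaluating
    # dp[m-1][n-1] at (0, 0) would wrap around to dp[-1][-1].
    while m > 0 or n > 0:
        # Came from the previous column: insert word2[n-1]
        if n and dp[m][n-1] + 1 == dp[m][n]:
            print("insert %c" % (word2[n-1]))
            spokenstr.append("insert")
            writtenstr.append(word2[n-1])
            operation.append("NULLREF:" + word2[n-1])
            n -= 1
            continue
        # Came from the previous row: delete word1[m-1]
        if m and dp[m-1][n] + 1 == dp[m][n]:
            print("delete %c" % (word1[m-1]))
            spokenstr.append(word1[m-1])
            writtenstr.append("delete")
            operation.append(word1[m-1] + ":NULLHYP")
            m -= 1
            continue
        # Diagonal, value + 1: replace word1[m-1] with word2[n-1]
        if dp[m-1][n-1] + 1 == dp[m][n]:
            print("replace %c %c" % (word1[m-1], word2[n-1]))
            spokenstr.append(word1[m-1])
            writtenstr.append(word2[n-1])
            operation.append(word1[m-1] + ":" + word2[n-1])
            n -= 1
            m -= 1
            continue
        # Diagonal, value unchanged: the characters match, no operation
        if dp[m-1][n-1] == dp[m][n]:
            spokenstr.append(' ')
            writtenstr.append(' ')
            operation.append(word1[m-1])
        n -= 1
        m -= 1
    # The path was collected from the end backwards; reverse it into edit order
    spokenstr = spokenstr[::-1]
    writtenstr = writtenstr[::-1]
    operation = operation[::-1]
    return spokenstr, writtenstr, operation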
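A quick sanity check for the operation list is to replay it on word1 and confirm the result equals word2. This is a minimal sketch assuming exactly the string format used above (a bare character for a match, "x:y" for a replacement, "x:NULLHYP" for a deletion, "NULLREF:y" for an insertion) and that neither word contains ":"; replay is a hypothetical name, and the example list is hand-written for the pair from this section rather than copied from backtrackingPath's actual output, which may pick a different optimal path among ties.

```python
def replay(word1, operations):
    """Apply an operation list in backtrackingPath's format to word1."""
    out = []
    pos = 0  # next unconsumed index in word1
    for op in operations:
        if ":" not in op:                  # match: keep the character
            src, dst = op, op
        else:
            src, dst = op.split(":", 1)
        if src == "NULLREF":               # insertion consumes nothing from word1
            out.append(dst)
            continue
        if pos >= len(word1) or word1[pos] != src:
            raise ValueError(f"operation {op!r} does not match word1 at index {pos}")
        pos += 1
        if dst != "NULLHYP":               # match or replacement emits dst
            out.append(dst)
    if pos != len(word1):
        raise ValueError("operations did not consume all of word1")
    return "".join(out)


ops = ["NULLREF:a", "b", "c", "d", "NULLREF:e", "a:f",
       "b", "c", "d", "e:NULLHYP", "f:NULLHYP"]
print(replay("bcdabcdef", ops))  # abcdefbcd
```

The list above contains five non-match entries, matching the edit distance of 5 for this pair.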


Another text-comparison algorithm, the longest common subsequence (LCS), is also based on dynamic programming.
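LCS fills a table with the same shape as the edit-distance table. A minimal sketch of the LCS length, plus the standard identity that when only insertions and deletions are allowed (no substitutions), the minimum number of edits is len(a) + len(b) - 2 * LCS(a, b). Both function names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, using one rolling row."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)          # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))  # skip a char of a or of b
        prev = cur
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Edit distance when only insertions and deletions are allowed."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


print(lcs_length("abcde", "ace"))     # 3
print(indel_distance("abc", "abd"))   # 2
```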


Reposted from blog.csdn.net/vivian_ll/article/details/93168926