Copyright notice: This is an original article by the blogger. Please credit the source when reposting. https://blog.csdn.net/qq_32768743/article/details/89739394
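Edit distance: the cost of turning string A into string B
Dynamic programming (DP) implementation
Break the large problem into subproblems
Input string s, of length m
Template (target) string t, of length n
dp[i, j]: the distance between the first i characters of s (s[1..i]) and the first j characters of t (t[1..j])
We want dp[m, n]
Base cases: dp[0, j] = j (insert j characters), dp[i, 0] = i (delete i characters)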
- s[i] = t[j]:
dp[i, j] = dp[i-1, j-1]
- s[i] != t[j]:
replace: dp[i, j] = dp[i-1, j-1] + 1
insert: dp[i, j] = dp[i, j-1] + 1
delete: dp[i, j] = dp[i-1, j] + 1
take the minimum of the three
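Putting the base cases and the two cases above together, the whole recurrence can be written as one formula (same indices as above, with s and t indexed from 1):

```latex
dp[i, j] =
\begin{cases}
j, & i = 0 \\
i, & j = 0 \\
dp[i-1, j-1], & s[i] = t[j] \\
1 + \min\bigl(dp[i-1, j-1],\ dp[i, j-1],\ dp[i-1, j]\bigr), & s[i] \neq t[j]
\end{cases}
```

For example, with s = "kitten" and t = "sitting", dp[6, 7] = 3: replace k with s, replace e with i, and insert g.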
Code implementation
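def edit_dist(str1, str2):
    """Levenshtein distance between str1 and str2 using a full (m+1) x (n+1) DP table."""
    m, n = len(str1), len(str2)
    # dp[i][j] = edit distance between str1[:i] and str2[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0:
                dp[i][j] = j  # insert all j characters of str2
            elif j == 0:
                dp[i][j] = i  # delete all i characters of str1
            elif str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # last characters match: no extra cost
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j - 1],  # replace
                    dp[i][j - 1],      # insert
                    dp[i - 1][j],      # delete
                )
    return dp[m][n]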
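Since each row of the table only reads the previous row, the same recurrence can be computed with two rows instead of the full table. The following is a sketch of that variant (the name edit_dist_two_rows is hypothetical, not from the original post); it returns the same value as edit_dist but uses O(n) extra space:

```python
def edit_dist_two_rows(str1: str, str2: str) -> int:
    """Levenshtein distance keeping only two DP rows (O(n) extra space)."""
    m, n = len(str1), len(str2)
    # prev holds row i-1; row 0 is dp[0][j] = j (insert j characters)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        # cur[0] is dp[i][0] = i (delete i characters)
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                cur[j] = prev[j - 1]  # characters match: no extra cost
            else:
                cur[j] = 1 + min(
                    prev[j - 1],  # replace
                    cur[j - 1],   # insert str2[j-1]
                    prev[j],      # delete str1[i-1]
                )
        prev = cur
    return prev[n]


print(edit_dist_two_rows("kitten", "sitting"))  # 3
print(edit_dist_two_rows("flaw", "lawn"))       # 2
print(edit_dist_two_rows("", "abc"))            # 3
```

The trade-off is that only the final distance is kept; recovering the actual sequence of edits would need the full table.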
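The problem: if the dictionary is very large, computing the distance from the input to every dictionary word is too slow for real-time use.
Instead, generate candidate words from the input:
generate all words at distance 1
by applying the operations in reverse: replace, insert, delete.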
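A rough count shows why generating candidates is cheap. For an input word of length k, and assuming a 26-letter lowercase alphabet, the raw number of distance-1 candidates before duplicates are removed is:

```latex
\underbrace{k}_{\text{delete}}
+ \underbrace{26(k + 1)}_{\text{insert}}
+ \underbrace{26k}_{\text{replace}}
= 53k + 26
```

For a 4-letter word that is 238 strings. This number depends only on the word length, not on the size of the dictionary, so each candidate can simply be looked up in the dictionary.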
Code implementation
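def generate_dist_one(word):
    # All 26 lowercase letters
    letters = 'abcdefghijklmnopqrstuvwxyz'
    # Every way to split the word into a left part L and a right part R
    t_list = []
    for i in range(len(word) + 1):
        t_list.append((word[:i], word[i:]))
    inserts = []
    replaces = []
    deletes = []
    for L, R in t_list:
        if R:
            deletes.append(L + R[1:])  # delete the first character of R
        for c in letters:
            inserts.append(L + c + R)  # insert c between L and R
            if R:
                replaces.append(L + c + R[1:])  # replace the first character of R with c
    # A set removes duplicate candidates
    return set(inserts + replaces + deletes)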
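The generated candidates only become useful once they are checked against the dictionary. A minimal sketch of that step follows, assuming a hypothetical toy vocabulary VOCAB and a hypothetical helper known(); a real system would load its word list from a corpus:

```python
import string

# Hypothetical toy vocabulary; a real one would come from a word list or corpus
VOCAB = {"spelling", "spell", "apple", "correct"}


def generate_dist_one(word: str) -> set[str]:
    letters = string.ascii_lowercase  # all 26 lowercase letters
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    inserts = [L + c + R for L, R in splits for c in letters]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    return set(deletes + inserts + replaces)


def known(words, vocab):
    """Keep only the candidates that actually appear in the vocabulary (hypothetical helper)."""
    return {w for w in words if w in vocab}


print(known(generate_dist_one("speling"), VOCAB))  # {'spelling'}
print(known(generate_dist_one("corect"), VOCAB))   # {'correct'}
```

Membership tests on a Python set are hash lookups, so this step costs roughly one lookup per candidate.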
This code can then be shortened with list comprehensions:
def generate_dist_one(word):
    letters = 'abcdefghijklmnopqrstuvwxyz'  # all 26 lowercase letters
    t_list = [(word[:i], word[i:]) for i in range(len(word) + 1)]  # all (L, R) splits
    inserts = [L + c + R for L, R in t_list for c in letters]
    replaces = [L + c + R[1:] for L, R in t_list if R for c in letters]
    deletes = [L + R[1:] for L, R in t_list if R]
    return set(inserts + replaces + deletes)
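The lists are built before the final set(), so some candidates appear more than once. In particular, replacing a letter with itself gives back the original word. A small check of the raw count and the deduplication, assuming the full 26-letter alphabet and a hypothetical helper generate_dist_one_lists that returns the three lists separately:

```python
import string


def generate_dist_one_lists(word: str):
    """Return the raw insert/replace/delete lists before deduplication (hypothetical helper)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    inserts = [L + c + R for L, R in splits for c in letters]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    deletes = [L + R[1:] for L, R in splits if R]
    return inserts, replaces, deletes


word = "word"
inserts, replaces, deletes = generate_dist_one_lists(word)
raw = len(inserts) + len(replaces) + len(deletes)
print(raw, 53 * len(word) + 26)  # 238 238

unique = set(inserts + replaces + deletes)
print(word in unique)     # True: replacing a letter with itself yields the word
print(len(unique) < raw)  # True: set() removes the duplicates
```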
I was stunned.
To generate words at distance 2, just run the distance-1 generation again on every word at distance 1.
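A minimal sketch of that second round, using a hypothetical function name generate_dist_two. Because replacing a letter with itself is allowed, the result also contains the word itself and all distance-1 candidates (for a non-empty word):

```python
import string


def generate_dist_one(word: str) -> set[str]:
    letters = string.ascii_lowercase  # all 26 lowercase letters
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    inserts = [L + c + R for L, R in splits for c in letters]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    deletes = [L + R[1:] for L, R in splits if R]
    return set(inserts + replaces + deletes)


def generate_dist_two(word: str) -> set[str]:
    """Apply the distance-1 generator to every distance-1 candidate (hypothetical name)."""
    return {e2 for e1 in generate_dist_one(word) for e2 in generate_dist_one(e1)}


two = generate_dist_two("appel")
print("apple" in two)  # True: appel -> apple needs two replacements
```

The distance-2 set grows roughly with the square of the distance-1 set, so in practice it is usually filtered against the dictionary right away rather than kept in full.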
Mathematically
After studying all this, all I could think was: how clever, how clever.