Implementing Edit Distance in MATLAB and Computing String Similarity

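Edit distance, also known as Levenshtein distance, is the minimum number of edit operations needed to turn one string into another. The permitted operations are substituting one character for another, inserting a character, and deleting a character.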
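To make the three operations concrete, here is a small sketch that transforms the commonly used example pair 'kitten' and 'sitting' by hand (the example strings and the counter variable ops are illustrative and not from the original post). Three edits are enough, and the edit distance of this pair is 3.

```matlab
s = 'kitten';
ops = 0;
s(1) = 's';   ops = ops + 1;   % substitution: 'kitten' -> 'sitten'
s(5) = 'i';   ops = ops + 1;   % substitution: 'sitten' -> 'sittin'
s = [s 'g'];  ops = ops + 1;   % insertion:    'sittin' -> 'sitting'
fprintf('%s reached in %d operations\n', s, ops);
```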

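R2018a appears to ship an edit-distance API already, with the following syntaxes, but I do not have R2018a installed:

dist = edr(x,y,tol)
[dist,ix,iy] = edr(x,y,tol)
[___] = edr(x,y,maxsamp)
[___] = edr(___,metric)
edr(___)

So I cannot use it and have to fall back on a hand-written implementation.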
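Before falling back, it can help to check whether a function with that name is visible in the current installation. The snippet below is only a sketch: it tests for the name edr on the MATLAB path with which, and it does not call edr or verify that it is suitable for character strings (as far as I know, edr is documented as edit distance on real-valued signals, so that suitability is an open assumption).

```matlab
% which returns an empty char array when no function of that name is found.
hasEdr = ~isempty(which('edr'));
if hasEdr
    disp('A function named edr is available on the path.');
else
    disp('edr not found; using the hand-written EditDist instead.');
end
```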

The code is as follows:

function [V,m,n] = EditDist(string1,string2)
% EditDist  Edit (Levenshtein) distance between two strings.
% Edit distance is a standard dynamic programming problem. Given two
% strings s1 and s2, the edit distance between s1 and s2 is the minimum
% number of operations required to convert s1 into s2. The following
% operations are typically used:
%   Replacing one character of the string by another character.
%   Deleting a character from the string.
%   Adding a character to the string.
% Outputs:
%   V : edit distance between string1 and string2
%   m : length of string1
%   n : length of string2
% Example:
%   s1 = 'article';
%   s2 = 'ardipo';
%   EditDist(s1,s2)
%   > 4
% You need 4 actions to convert s1 to s2:
%   replace(t,d), replace(c,p), replace(l,o), delete(e)
%
% by: Reza Ahmadzadeh
% 14-11-2012

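m = length(string1);
n = length(string2);
% v(i+1,j+1) holds the distance between string1(1:i) and string2(1:j)
v = zeros(m+1,n+1);
for i = 1:m
    v(i+1,1) = i;   % delete i characters to reach the empty string
end
for j = 1:n
    v(1,j+1) = j;   % insert j characters starting from the empty string
end
for i = 1:m
    for j = 1:n
        if string1(i) == string2(j)
            v(i+1,j+1) = v(i,j);   % characters match: no extra cost
        else
            % one edit plus the cheapest of insertion, deletion, substitution
            v(i+1,j+1) = 1 + min([v(i+1,j), v(i,j+1), v(i,j)]);
        end
    end
end
V = v(m+1,n+1);   % distance between the two full strings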
end
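EditDist returns only the final distance and the two string lengths. To inspect the whole dynamic-programming table, one option is a variant that returns the matrix itself. The sketch below uses a hypothetical helper named editDistMatrix (not part of the original code) with the same recurrence. It is written as a script with the local function at the end of the file, which requires R2016b or later.

```matlab
D = editDistMatrix('article', 'ardipo');   % hypothetical helper, defined below
disp(D);                                   % full (m+1)-by-(n+1) table
fprintf('distance = %d\n', D(end, end));   % prints: distance = 4

function D = editDistMatrix(s1, s2)
% D(i+1,j+1) is the edit distance between s1(1:i) and s2(1:j).
m = length(s1);
n = length(s2);
D = zeros(m+1, n+1);
D(:, 1) = (0:m)';   % first column: delete i characters
D(1, :) = 0:n;      % first row: insert j characters
for i = 1:m
    for j = 1:n
        cost = double(s1(i) ~= s2(j));          % 0 on match, 1 otherwise
        D(i+1, j+1) = min([D(i, j+1) + 1, ...   % deletion
                           D(i+1, j) + 1, ...   % insertion
                           D(i, j) + cost]);    % match or substitution
    end
end
end
```

The bottom-right entry D(end,end) is the same value that EditDist returns as V.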

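How do we turn the minimum edit distance into a similarity score for the two strings? Divide the distance by the length of the longer string and subtract the result from 1: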

[mindist, m, n] = EditDist(final_code, final_code2);
% the similarity is a fraction in [0,1], so print it with %f rather than %d
fprintf('the similarity is: %f\n', 1 - mindist/max(m,n));
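The expression above divides by max(m,n), which is zero when both strings are empty. A possible way to guard against that is sketched below with a hypothetical helper named stringSimilarity (not part of the original code). It treats two empty strings as identical, which is an assumption of this sketch, and it keeps only two rows of the table instead of the full matrix.

```matlab
s = stringSimilarity('article', 'ardipo');   % hypothetical helper, defined below
fprintf('the similarity is: %.4f\n', s);      % 1 - 4/7, prints 0.4286

function s = stringSimilarity(a, b)
% Normalized similarity in [0,1]: 1 - distance / length of the longer string.
m = length(a);
n = length(b);
if max(m, n) == 0
    s = 1;          % assumption: two empty strings count as identical
    return;
end
prev = 0:n;         % table row for the empty prefix of a
for i = 1:m
    curr = [i, zeros(1, n)];   % first entry: delete i characters
    for j = 1:n
        cost = double(a(i) ~= b(j));
        curr(j+1) = min([prev(j+1) + 1, ...   % deletion
                         curr(j) + 1, ...     % insertion
                         prev(j) + cost]);    % match or substitution
    end
    prev = curr;
end
s = 1 - prev(n+1) / max(m, n);
end
```

A score of 1 means the strings are identical, and 0 means every position of the longer string had to be edited.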



Reposted from blog.csdn.net/zhaomengszu/article/details/80493517