Levenshtein algorithm for text similarity

The levenshtein() function returns the Levenshtein distance between two strings.

 

The Levenshtein algorithm computes the minimum edit distance between two strings: the minimum number of single-character insertions, deletions, and replacements needed to convert string A into string B. The Soviet mathematician Vladimir Levenshtein introduced the concept in 1965, and the distance is named after him.

 

Levenshtein distance is also commonly called edit distance. Each permitted operation (replacing one character with another, inserting a character, or deleting a character) counts as one edit. For example, the distance between "kitten" and "sitting" is 3: replace "k" with "s", replace "e" with "i", and insert "g" at the end.
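The definition above can also be written as a recurrence. The sketch below uses notation that is not in the original text and is assumed only for illustration: a and b are the two strings, a_i is the i-th character of a (counting from 1), and d(i, j) is the distance between the first i characters of a and the first j characters of b.

```latex
\begin{aligned}
d(i, 0) &= i, \qquad d(0, j) = j, \\
d(i, j) &= \min
\begin{cases}
d(i-1, j) + 1 & \text{(delete } a_i\text{)} \\
d(i, j-1) + 1 & \text{(insert } b_j\text{)} \\
d(i-1, j-1) + [a_i \neq b_j] & \text{(replace, or keep if equal)}
\end{cases}
\end{aligned}
```

Here [a_i \neq b_j] is 1 when the characters differ and 0 when they match, and the distance between the full strings is d(|a|, |b|).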

 

The steps of the Levenshtein algorithm:

1: Calculate the length n of strA and the length m of strB

2: If n=0, the minimum edit distance is m (every character of strB must be inserted); if m=0, the minimum edit distance is n (every character of strA must be deleted)

3: Construct an (m+1)*(n+1) matrix Arr with rows 0..m and columns 0..n. Initialize the first row to 0..n (Arr[0][i]=i) and the first column to 0..m (Arr[j][0]=j)

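4: Use a double loop: traverse strA with index i from 1 to n and, inside that loop, traverse strB with index j from 1 to m. If strA[i]=strB[j] (counting characters from 1), then cost=0, otherwise cost=1. Take the minimum of Arr[j-1][i]+1 (insertion), Arr[j][i-1]+1 (deletion), and Arr[j-1][i-1]+cost (replacement or match), and assign it to Arr[j][i].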

5: After the loops finish, the last element of the matrix, Arr[m][n], is the minimum edit distance.
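The five steps above can be sketched in Python as follows. This is a minimal illustration and not a reference implementation: the function name levenshtein_distance and the snake_case variable names are assumptions made for this example, and Python strings are indexed from 0, so the character comparison uses i-1 and j-1.

```python
def levenshtein_distance(str_a: str, str_b: str) -> int:
    """Return the minimum edit distance between str_a and str_b."""
    # Step 1: lengths of both strings.
    n, m = len(str_a), len(str_b)

    # Step 2: if one string is empty, the distance is the other's length.
    if n == 0:
        return m
    if m == 0:
        return n

    # Step 3: (m+1) x (n+1) matrix; first row is 0..n, first column is 0..m.
    arr = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(n + 1):
        arr[0][i] = i
    for j in range(m + 1):
        arr[j][0] = j

    # Step 4: fill the matrix, outer loop over str_a, inner loop over str_b.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Strings are 0-indexed, so character i of str_a is str_a[i - 1].
            cost = 0 if str_a[i - 1] == str_b[j - 1] else 1
            arr[j][i] = min(
                arr[j - 1][i] + 1,         # insertion
                arr[j][i - 1] + 1,         # deletion
                arr[j - 1][i - 1] + cost,  # replacement or match
            )

    # Step 5: the last element holds the minimum edit distance.
    return arr[m][n]


print(levenshtein_distance("kitten", "sitting"))  # 3
```

The full matrix needs (m+1)*(n+1) cells. Since each cell depends only on the current and previous column, storage could be cut to two columns, but the full matrix is kept here so the code mirrors the steps one to one.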



 
