[Dynamic programming] The edit distance problem, explained in the simplest and easiest-to-understand language on all of CSDN


The edit distance problem! It is a dynamic programming problem that looks difficult and really is not that simple, but once you grasp the idea, writing the code becomes extremely easy. This time, the blogger will explain the problem in the simplest and easiest-to-understand language on all of CSDN (that is, Chinese).


Nowcoder link

Problem description

Given two words word1 and word2, compute the minimum number of operations needed to convert word1 into word2. You can perform three kinds of operation on a word: a) insert a character, b) delete a character, c) replace a character.



Example:

word1: "a"

word2: "c"

Return value: 1

There are three ways to do the conversion (considering only the simplest operations):

  1. Delete the character a from word1, then insert the character c: two operations
  2. Insert the character c into word1, then delete the character a: two operations
  3. Replace the character a in word1 directly with the character c: one operation

Analysis of the approach

To find the minimum number of operations for converting all of word1 into all of word2, proceed step by step. First find the minimum number of operations for converting a prefix of word1 into a prefix of word2. Then, building on that result, make one transition step (an insert, a delete, or a replace), choose the option with the fewest operations, and work toward the final result one step at a time.

Let the state F(i,j) be the edit distance required to convert the first i characters of word1 into the first j characters of word2. Then the value of F(i,j) can be obtained by a one-step transition from three states: F(i,j-1), F(i-1,j), and F(i-1,j-1).

  • F(i,j-1) to F(i,j): the minimum edit distance for converting the first i characters of word1 into the first j-1 characters of word2 is already known to be F(i,j-1). To go one step further and convert the first i characters of word1 into the first j characters of word2, the only option is to insert the j-th character of word2 into word1, which costs one more operation: F(i,j-1) + 1.
  • F(i-1,j) to F(i,j): the minimum edit distance for converting the first i-1 characters of word1 into the first j characters of word2 is already known to be F(i-1,j). To obtain F(i,j), the only option is to delete the extra i-th character of word1, which costs one more operation: F(i-1,j) + 1.
  • F(i-1,j-1) to F(i,j): the minimum edit distance for converting the first i-1 characters of word1 into the first j-1 characters of word2 is already known to be F(i-1,j-1). To obtain F(i,j), check whether the i-th character of word1 and the j-th character of word2 are the same. If they are, no operation is needed and F(i,j) equals F(i-1,j-1). If they are not, replace the i-th character of word1 with the j-th character of word2, which gives F(i-1,j-1) + 1.
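The three transitions above can also be written almost word for word as a recursive function. The following is only a minimal sketch under a few assumptions: the class name `EditDistanceMemo` and the helper `f` are hypothetical names chosen for illustration, a memo table is added so that each state is computed only once, and an empty prefix is handled by inserting or deleting every remaining character. It is not the blogger's own solution.

```java
import java.util.Arrays;

// Hypothetical class name, used only for this illustration.
class EditDistanceMemo {

    // Returns F(word1.length(), word2.length()) by top-down recursion with a memo table.
    static int minDistance(String word1, String word2) {
        int[][] memo = new int[word1.length() + 1][word2.length() + 1];
        for (int[] memoRow : memo) {
            Arrays.fill(memoRow, -1); // -1 marks a state that has not been computed yet
        }
        return f(word1, word2, word1.length(), word2.length(), memo);
    }

    // F(i, j): minimum number of edits that turn the first i characters of word1
    // into the first j characters of word2.
    private static int f(String word1, String word2, int i, int j, int[][] memo) {
        if (i == 0) return j; // empty prefix of word1: insert j characters
        if (j == 0) return i; // empty prefix of word2: delete i characters
        if (memo[i][j] != -1) return memo[i][j];

        int insert = f(word1, word2, i, j - 1, memo) + 1;   // from F(i, j-1)
        int delete = f(word1, word2, i - 1, j, memo) + 1;   // from F(i-1, j)
        int cost = word1.charAt(i - 1) == word2.charAt(j - 1) ? 0 : 1;
        int replace = f(word1, word2, i - 1, j - 1, memo) + cost; // from F(i-1, j-1)

        memo[i][j] = Math.min(Math.min(insert, delete), replace);
        return memo[i][j];
    }

    public static void main(String[] args) {
        System.out.println(minDistance("a", "c"));     // the example from the problem description
        System.out.println(minDistance("as", "ears"));
    }
}
```

Note that the i-th character of word1 is `word1.charAt(i - 1)`, because Java strings are indexed from 0 while i and j count characters from 1.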

In summary, here are the four elements of the dynamic programming solution:

  1. State definition F(i,j): the edit distance required to convert the first i characters of word1 into the first j characters of word2

  2. State transition equation for F(i,j): the minimum over insert, delete, and replace (or no operation)

    F(i, j) = min(F(i, j-1) + 1, F(i-1, j) + 1, F(i-1, j-1) + (word1[i] == word2[j] ? 0 : 1)) (here the i and j in word1[i] and word2[j] mean the i-th and j-th character, not the array index)

  3. State initialization: F(i,0) = i, F(0,j) = j

(The former converts the first i characters of word1 into the empty string, and the minimum edit distance comes from deleting all i characters. The latter works the same way: to convert the empty string into the first j characters of word2, the minimum edit distance comes from inserting the j missing characters.)

  4. Return value: F(word1.length(), word2.length())
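For reference, the four elements can be collected into one recurrence, with the cost of one operation written explicitly on the insert and delete branches. The symbol c(i,j) is a shorthand introduced here for the replacement cost and does not appear in the original post. The last line evaluates one state by hand for word1 = "as" and word2 = "ears".

```latex
\begin{aligned}
F(i,0) &= i, \qquad F(0,j) = j, \\
F(i,j) &= \min\bigl\{\, F(i,j-1) + 1,\; F(i-1,j) + 1,\; F(i-1,j-1) + c(i,j) \,\bigr\}, \\
c(i,j) &=
  \begin{cases}
    0 & \text{if the $i$-th character of word1 equals the $j$-th character of word2,} \\
    1 & \text{otherwise,}
  \end{cases} \\
F(1,1) &= \min\{\, F(1,0) + 1,\; F(0,1) + 1,\; F(0,0) + 1 \,\} = \min\{2,\, 2,\, 1\} = 1
  \quad \text{(`a' versus `e').}
\end{aligned}
```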

Illustration

Example:

word1: "as"

word2: "ears"
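The state table for this example can be reproduced with a short program. This is a sketch only: the class name `EditDistanceTable` and the method `buildTable` are hypothetical names chosen for illustration. It fills the table with the transition equation described above and prints one row per prefix of word1.

```java
// Hypothetical helper class, used only for this illustration.
class EditDistanceTable {

    // Builds the full state table, where table[i][j] holds F(i, j).
    static int[][] buildTable(String word1, String word2) {
        int rows = word1.length();
        int cols = word2.length();
        int[][] table = new int[rows + 1][cols + 1];

        for (int i = 0; i <= rows; i++) table[i][0] = i; // F(i, 0) = i
        for (int j = 0; j <= cols; j++) table[0][j] = j; // F(0, j) = j

        for (int i = 1; i <= rows; i++) {
            for (int j = 1; j <= cols; j++) {
                int cost = word1.charAt(i - 1) == word2.charAt(j - 1) ? 0 : 1;
                int insert = table[i][j - 1] + 1;
                int delete = table[i - 1][j] + 1;
                int replace = table[i - 1][j - 1] + cost;
                table[i][j] = Math.min(Math.min(insert, delete), replace);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // Print the table for the example, one space-separated row per line.
        for (int[] row : buildTable("as", "ears")) {
            StringBuilder line = new StringBuilder();
            for (int value : row) {
                if (line.length() > 0) line.append(' ');
                line.append(value);
            }
            System.out.println(line);
        }
    }
}
```

For "as" and "ears" the rows are `0 1 2 3 4`, `1 1 1 2 3`, and `2 2 2 2 2`. The bottom-right value, 2, is the answer: inserting "e" and "r" turns "as" into "ears".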


Code

import java.util.*;

public class Solution {
    public int minDistance(String word1, String word2) {
        int row = word1.length(); // length of word1
        int col = word2.length(); // length of word2
        int[][] ret = new int[row + 1][col + 1]; // state table: ret[i][j] holds F(i, j)

        // Initialize the first column: F(i, 0) = i
        for (int i = 0; i <= row; i++) {
            ret[i][0] = i;
        }
        // Initialize the first row: F(0, j) = j
        for (int j = 1; j <= col; j++) {
            ret[0][j] = j;
        }

        for (int i = 1; i <= row; i++) {
            for (int j = 1; j <= col; j++) {
                if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
                    // The i-th character of word1 equals the j-th character of word2: no operation needed
                    ret[i][j] = ret[i - 1][j - 1];
                } else {
                    // Take the cheaper of insert and delete
                    ret[i][j] = Math.min(ret[i][j - 1] + 1, ret[i - 1][j] + 1);
                    // Then compare that result with replace and keep the smaller edit count
                    ret[i][j] = Math.min(ret[i][j], ret[i - 1][j - 1] + 1);
                }
            }
        }
        return ret[row][col]; // F(word1.length(), word2.length())
    }
}
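As a side note, each row of the state table depends only on the row directly above it, so the same result can be computed with two one-dimensional arrays instead of the full table. The sketch below is a space-saving variant added for illustration, not the blogger's code. The class name `EditDistanceTwoRows` and the array names `prev` and `curr` are assumptions.

```java
// Hypothetical class name; a rolling-array variant of the solution above.
class EditDistanceTwoRows {

    static int minDistance(String word1, String word2) {
        int cols = word2.length();
        int[] prev = new int[cols + 1]; // row i-1 of the state table
        int[] curr = new int[cols + 1]; // row i of the state table

        for (int j = 0; j <= cols; j++) {
            prev[j] = j; // F(0, j) = j
        }

        for (int i = 1; i <= word1.length(); i++) {
            curr[0] = i; // F(i, 0) = i
            for (int j = 1; j <= cols; j++) {
                if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
                    curr[j] = prev[j - 1]; // same character: no operation needed
                } else {
                    int insertOrDelete = Math.min(curr[j - 1], prev[j]);
                    curr[j] = Math.min(insertOrDelete, prev[j - 1]) + 1;
                }
            }
            // The finished row becomes the "previous" row for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[cols]; // after the final swap, prev holds the last computed row
    }
}
```

This keeps the O(row × col) running time but reduces the extra memory from O(row × col) to O(col). The full table is still the better choice when the intermediate states need to be inspected.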

Done!


Origin blog.csdn.net/weixin_46103589/article/details/122136228