Algorithms: the difference between two strings

Problems of this type can be solved with dynamic programming: build a table in which string 1 indexes one axis and string 2 indexes the other.

To summarize, the problems below differ only in which operations are allowed on the strings: inserting, deleting, or replacing a character.

1. Edit distance problem

72. Edit distance

Given two words word1 and word2, compute the minimum number of operations required to convert word1 into word2.

You may perform the following three operations on a word:

Insert a character
Delete a character
Replace a character

Example 1:

Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse -> rorse (replace 'h' with 'r')
rorse -> rose (delete 'r')
rose -> ros (delete 'e')

Example 2:

Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation:
intention -> inention (delete 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')

[Figure: illustration of the edit distance solution]

Edit distance supports insertion, deletion and replacement. Let dp[i][j] be the minimum number of operations needed to convert the first i characters of word1 (cs1) into the first j characters of word2 (cs2). The last characters compared are cs1[i-1] and cs2[j-1], which gives two cases.

cs1[i-1]==cs2[j-1]: the characters match and nothing needs to be edited, so:

dp[i][j]=dp[i-1][j-1]

cs1[i-1]!=cs2[j-1]: one operation is needed, either a replacement (coming from dp[i-1][j-1]), a deletion (coming from dp[i-1][j]) or an insertion (coming from dp[i][j-1]). We take the cheapest of the three and add 1 step:

dp[i][j]=Math.min(dp[i-1][j-1],Math.min(dp[i-1][j],dp[i][j-1]))+1;

How do we determine the initial state of the dynamic programming equation?
There is only one way to turn an empty string into a given string, which is to insert every character, and only one way to turn a string into the empty string, which is to delete every character. The first column of the array therefore equals the row index and the first row equals the column index.

...
dp[i][0]=i;
...
dp[0][j]=j;
...
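
As a cross-check of the recurrence and the base cases, they can be written almost word for word as a recursive function with memoization. This is only a sketch: the class name EditDistanceTopDown and the helper solve are made up for illustration and are not part of the original solution.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the original article.
class EditDistanceTopDown {
    // memo[i][j] caches the answer for the first i chars of a and the
    // first j chars of b; -1 means "not computed yet".
    private static int[][] memo;

    static int minDistance(String a, String b) {
        memo = new int[a.length() + 1][b.length() + 1];
        for (int[] row : memo) {
            Arrays.fill(row, -1);
        }
        return solve(a, b, a.length(), b.length());
    }

    private static int solve(String a, String b, int i, int j) {
        if (i == 0) return j; // empty prefix of a: insert j characters
        if (j == 0) return i; // empty prefix of b: delete i characters
        if (memo[i][j] != -1) return memo[i][j];
        int result;
        if (a.charAt(i - 1) == b.charAt(j - 1)) {
            // Last characters match: no edit needed.
            result = solve(a, b, i - 1, j - 1);
        } else {
            int replace = solve(a, b, i - 1, j - 1);
            int delete = solve(a, b, i - 1, j);
            int insert = solve(a, b, i, j - 1);
            result = Math.min(replace, Math.min(delete, insert)) + 1;
        }
        memo[i][j] = result;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(minDistance("horse", "ros"));           // 3, as in example 1
        System.out.println(minDistance("intention", "execution")); // 5, as in example 2
    }
}
```

The bottom-up array version fills the same values in a fixed order and avoids the recursion.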

At this point, the problem is solved:

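    public int minDistance(String word1, String word2) {
        char[] cs1=word1.toCharArray();
        char[] cs2=word2.toCharArray();
        // dp[i][j]: minimum edits to turn the first i chars of word1 into the first j chars of word2
        int[][] dp=new int[cs1.length+1][cs2.length+1];
        // Base case: deleting i characters yields the empty string
        for(int i=0;i<=cs1.length;i++){
            dp[i][0]=i;
        }
        // Base case: inserting j characters builds the prefix from the empty string
        for(int j=0;j<=cs2.length;j++){
            dp[0][j]=j;
        }

        for(int i=1;i<=cs1.length;i++){
            for(int j=1;j<=cs2.length;j++){
                if(cs1[i-1]==cs2[j-1]){
                    // Characters match: no edit needed
                    dp[i][j]=dp[i-1][j-1];
                }else{
                    // Cheapest of replace, delete, insert, plus one step
                    dp[i][j]=Math.min(dp[i-1][j-1],Math.min(dp[i-1][j],dp[i][j-1]))+1;
                }
            }
        }
        return dp[cs1.length][cs2.length];
    }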
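
To see what the array looks like, the small sketch below builds the same table for example 1 and prints it row by row. The class name EditDistanceTable and the method buildTable are illustrative names, not something from the original solution.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the original article.
class EditDistanceTable {
    // Builds the full dp table using the same recurrence as minDistance.
    static int[][] buildTable(String word1, String word2) {
        int n = word1.length();
        int m = word2.length();
        int[][] dp = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            dp[i][0] = i; // delete i characters
        }
        for (int j = 0; j <= m; j++) {
            dp[0][j] = j; // insert j characters
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];
                } else {
                    dp[i][j] = Math.min(dp[i - 1][j - 1],
                            Math.min(dp[i - 1][j], dp[i][j - 1])) + 1;
                }
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        // Rows correspond to prefixes of "horse", columns to prefixes of "ros".
        for (int[] row : buildTable("horse", "ros")) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

For "horse" and "ros" the rows are [0, 1, 2, 3], [1, 1, 2, 3], [2, 2, 1, 2], [3, 2, 2, 2], [4, 3, 3, 2] and [5, 4, 4, 3]. The bottom-right value 3 is the answer from example 1.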

2. The longest common subsequence problem

1143. Longest Common Subsequence

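Given two strings text1 and text2, return the length of their longest common subsequence.

A subsequence of a string is a new string formed from the original by deleting some characters (possibly none) without changing the relative order of the remaining characters.
For example, "ace" is a subsequence of "abcde", but "aec" is not. A common subsequence of two strings is a subsequence that both strings share.

If the two strings have no common subsequence, return 0.

Example 1:

Input: text1 = "abcde", text2 = "ace"
Output: 3
Explanation: the longest common subsequence is "ace", and its length is 3.

Example 2:

Input: text1 = "abc", text2 = "abc"
Output: 3
Explanation: the longest common subsequence is "abc", and its length is 3.

Example 3:

Input: text1 = "abc", text2 = "def"
Output: 0
Explanation: the two strings have no common subsequence, so return 0.

Constraints:

1 <= text1.length <= 1000
1 <= text2.length <= 1000
The input strings contain only lowercase English letters.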

The following picture, obtained from elsewhere, illustrates the solution to this problem. Unlike edit distance, characters cannot be inserted here; they can only be deleted. The first row and the first column of the array are therefore all 0, because an empty string has no common subsequence with anything. In edit distance, where characters can be inserted, the first row and the first column instead hold 0, 1, 2, ... (dp[i][0]=i and dp[0][j]=j).
[Figure: illustration of the longest common subsequence solution]
The only operation in this problem is deletion.

Let dp[i][j] be the length of the longest common subsequence of the first i characters of text1 (cs1) and the first j characters of text2 (cs2). If cs1[i-1]==cs2[j-1], the matching character extends the common subsequence, so dp[i][j]=dp[i-1][j-1]+1.
Otherwise, dp[i][j]=Math.max(dp[i-1][j],dp[i][j-1]), that is, delete one character from either string and keep the better result.

This gives the following solution:

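    public int longestCommonSubsequence(String text1, String text2) {
        char[] cs1=text1.toCharArray();
        char[] cs2=text2.toCharArray();
        // dp[i][j]: LCS length of the first i chars of text1 and the first j chars of text2
        // Row 0 and column 0 stay 0: an empty prefix shares nothing
        int[][] dp=new int[cs1.length+1][cs2.length+1];
        for(int i=1;i<=cs1.length;i++){
            for(int j=1;j<=cs2.length;j++){
                if(cs1[i-1]==cs2[j-1]){
                    // Matching character extends the common subsequence
                    dp[i][j]=dp[i-1][j-1]+1;
                }else{
                    // Drop one character from either string and keep the better result
                    dp[i][j]=Math.max(dp[i-1][j],dp[i][j-1]);
                }
            }
        }
        return dp[cs1.length][cs2.length];
    }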
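
The problem only asks for the length, but the filled table also contains enough information to recover one longest common subsequence by walking back from the bottom-right corner. The sketch below shows one way this might be done. The class name LcsReconstruct is an illustrative name, and when several subsequences of the same length exist it simply returns one of them.

```java
// Hypothetical illustration class; not part of the original article.
class LcsReconstruct {
    static String longestCommonSubsequence(String text1, String text2) {
        int n = text1.length();
        int m = text2.length();
        // Same table as in the length-only solution.
        int[][] dp = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        // Walk back from dp[n][m], collecting matched characters in reverse.
        StringBuilder reversed = new StringBuilder();
        int i = n;
        int j = m;
        while (i > 0 && j > 0) {
            if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
                reversed.append(text1.charAt(i - 1));
                i--;
                j--;
            } else if (dp[i - 1][j] >= dp[i][j - 1]) {
                i--; // the value came from dropping a character of text1
            } else {
                j--; // the value came from dropping a character of text2
            }
        }
        return reversed.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubsequence("abcde", "ace")); // ace
    }
}
```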

3. Distinct subsequences

115. Distinct Subsequences

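Given a string S and a string T, count the number of times T appears as a subsequence of S.

A subsequence of a string is a new string formed by deleting some characters (possibly none) without disturbing the relative positions of the remaining characters. (For example, "ACE" is a subsequence of "ABCDE", while "AEC" is not.)

Example 1:

Input: S = "rabbbit", T = "rabbit"
Output: 3
Explanation:

As shown below, there are 3 ways to obtain "rabbit" from S.
(The caret symbol ^ marks the chosen letters)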

rabbbit
^^^^ ^^
rabbbit
^^ ^^^^
rabbbit
^^^ ^^^
Example 2:

Input: S = "babgbag", T = "bag"
Output: 5
Explanation:

As shown below, there are 5 ways to obtain "bag" from S.
(The caret symbol ^ marks the chosen letters)

babgbag
^^ ^
babgbag
^^    ^
babgbag
^    ^^
babgbag
  ^  ^^
babgbag
    ^^^

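This problem follows the same pattern as the two above, but the only choice for each character of S is to delete it or to keep it. Let dp[i][j] be the number of ways to form the first i characters of T from the first j characters of S.

What happens if the last character of S is kept? It must match the last character of T, and both prefixes shrink by one:

dp[i][j]=dp[i-1][j-1];

What happens if it is deleted? Only the prefix of S shrinks:

dp[i][j]=dp[i][j-1];

If the characters at the two positions are the same, we can either keep or delete, so the two counts add up:

dp[i][j]=dp[i-1][j-1]+dp[i][j-1];

Otherwise the character of S can only be deleted, so dp[i][j]=dp[i][j-1]. The first row is dp[0][j]=1, because deleting every character yields the empty string in exactly one way.
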
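    public int numDistinct(String s, String t) {
        int len1=s.length();
        int len2=t.length();
        // dp[i][j]: number of ways to form the first i chars of t from the first j chars of s
        int[][] dp=new int[len2+1][len1+1];
        for(int i=0;i<=len1;i++){
            dp[0][i]=1; // first row is 1: deleting every character yields the empty string
        }
        char[] cs=s.toCharArray();
        char[] ct=t.toCharArray();
        for(int i=1;i<=len2;i++){
            for(int j=1;j<=len1;j++){
                if(cs[j-1]==ct[i-1]){
                    // Characters match: keep s[j-1] or delete it
                    dp[i][j]=dp[i-1][j-1]+dp[i][j-1];
                }else{
                    // No match: s[j-1] can only be deleted
                    dp[i][j]=dp[i][j-1];
                }
            }
        }
        return dp[len2][len1];
    }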
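
Each cell only depends on the previous column, so the table can be squeezed into a single array if the characters of t are visited from right to left. The sketch below is one possible variant under that observation, not the original author's code. The class name DistinctSubsequencesOneRow is illustrative, and long is used on the assumption that intermediate counts may exceed the int range even when the final answer fits.

```java
// Hypothetical illustration class; not part of the original article.
class DistinctSubsequencesOneRow {
    static long numDistinct(String s, String t) {
        int n = s.length();
        int m = t.length();
        // dp[i]: number of ways to form the first i chars of t
        // from the prefix of s processed so far.
        long[] dp = new long[m + 1];
        dp[0] = 1; // the empty string can be formed in exactly one way
        for (int j = 1; j <= n; j++) {
            char c = s.charAt(j - 1);
            // Go from right to left so that dp[i - 1] still holds the value
            // for the previous prefix of s when dp[i] is updated.
            for (int i = m; i >= 1; i--) {
                if (t.charAt(i - 1) == c) {
                    dp[i] += dp[i - 1]; // keep c (dp[i - 1]) or delete it (old dp[i])
                }
                // On a mismatch c can only be deleted, so dp[i] stays unchanged.
            }
        }
        return dp[m];
    }

    public static void main(String[] args) {
        System.out.println(numDistinct("rabbbit", "rabbit")); // 3, as in example 1
        System.out.println(numDistinct("babgbag", "bag"));    // 5, as in example 2
    }
}
```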


Origin blog.csdn.net/qq_23594799/article/details/105340048