Algorithm and Data Structure Interview Guide - Edit Distance Question

Edit distance problem

Edit distance, also known as Levenshtein distance, is the minimum number of edits required to convert one string into another. It is commonly used in information retrieval and natural language processing to measure the similarity of two sequences.

!!! question

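    Given two strings $s$ and $t$, return the minimum number of edits required to convert $s$ into $t$.

    You can perform three kinds of edits on a string: insert a character, delete a character, or replace a character with any other character.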

As shown in the figure below, converting `kitten` to `sitting` requires 3 edits, comprising 2 replacements and 1 insertion; converting `hello` to `algo` requires 3 edits, comprising 2 replacements and 1 deletion.

[Figure: example edits for `kitten` to `sitting` and `hello` to `algo`]
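The two conversions can be replayed step by step. The sketch below is only an illustration: the helper names `insert`, `delete`, and `replace` are hypothetical, and the particular edit sequences are one possible choice among several that reach the target in 3 steps.

```python
def insert(s: str, i: int, ch: str) -> str:
    """Insert character ch before index i of s."""
    return s[:i] + ch + s[i:]


def delete(s: str, i: int) -> str:
    """Delete the character at index i of s."""
    return s[:i] + s[i + 1:]


def replace(s: str, i: int, ch: str) -> str:
    """Replace the character at index i of s with ch."""
    return s[:i] + ch + s[i + 1:]


# kitten -> sitting: 2 replacements and 1 insertion
a = replace("kitten", 0, "s")  # "sitten"
a = replace(a, 4, "i")         # "sittin"
a = insert(a, 6, "g")          # "sitting"

# hello -> algo: 1 deletion and 2 replacements
b = delete("hello", 0)   # "ello"
b = replace(b, 0, "a")   # "allo"
b = replace(b, 2, "g")   # "algo"

print(a, b)
```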

The edit distance problem can be naturally explained with the decision tree model. A string corresponds to a tree node, and a decision round (an editing operation) corresponds to an edge of the tree.

As shown in the figure below, when operations are unrestricted, each node can derive many edges, each corresponding to one operation, which means there are many possible paths from `hello` to `algo`.

From the perspective of the decision tree, the goal of this problem is to find the shortest path between the node `hello` and the node `algo`.

[Figure: the edit distance problem represented as a decision tree]
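The shortest-path view can be made concrete with a breadth-first search over strings. This is a brute-force sketch for intuition only, not the method developed in the rest of this article. The function name `edit_distance_bfs` is hypothetical, and two pruning assumptions keep the search finite: only characters that occur in $t$ are inserted or substituted, and strings never grow beyond the longer of the two inputs. Both are safe because an optimal edit sequence can always be reordered as deletions, then replacements, then insertions.

```python
from collections import deque


def edit_distance_bfs(s: str, t: str) -> int:
    """Brute-force edit distance: BFS over the decision tree of edits."""
    if s == t:
        return 0
    alphabet = sorted(set(t))          # only target characters are useful
    max_len = max(len(s), len(t))      # never grow past the longer string
    visited = {s}
    queue = deque([(s, 0)])
    while queue:
        cur, steps = queue.popleft()
        neighbors = []
        for i in range(len(cur)):
            neighbors.append(cur[:i] + cur[i + 1:])            # delete
            for ch in alphabet:
                neighbors.append(cur[:i] + ch + cur[i + 1:])   # replace
        if len(cur) < max_len:
            for i in range(len(cur) + 1):
                for ch in alphabet:
                    neighbors.append(cur[:i] + ch + cur[i:])   # insert
        for nxt in neighbors:
            if nxt == t:
                return steps + 1       # first hit is the shortest path
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, steps + 1))
    return -1  # not expected: t is always reachable from s


print(edit_distance_bfs("hello", "algo"))
```

The number of nodes grows very quickly with string length, which is why the search is only practical for short inputs.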

Dynamic programming ideas

Step 1: Think about the decision in each round, define the state, and obtain the $dp$ table

Each round of decision-making performs one editing operation on the string $s$.

We want the problem size to shrink as edits are made, so that subproblems can be constructed. Let the lengths of the strings $s$ and $t$ be $n$ and $m$ respectively. We first consider the characters at the end of the two strings, $s[n-1]$ and $t[m-1]$.

  • If $s[n-1]$ and $t[m-1]$ are the same, we can skip them and go on to consider $s[n-2]$ and $t[m-2]$.
  • If $s[n-1]$ and $t[m-1]$ are different, we need to perform one edit on $s$ (insertion, deletion, or replacement) so that the characters at the end of the two strings match, after which they can be skipped and a smaller problem considered.

In other words, each round of decision-making (editing operation) on the string $s$ changes the characters that remain to be matched in $s$ and $t$. The state is therefore the $i$-th and $j$-th characters currently being considered in $s$ and $t$, denoted $[i, j]$.

The subproblem corresponding to state $[i, j]$ is: the minimum number of edits required to change the first $i$ characters of $s$ into the first $j$ characters of $t$.

Since $i$ ranges from $0$ to $n$ and $j$ from $0$ to $m$, this gives a two-dimensional $dp$ table of size $(n+1) \times (m+1)$.

Step 2: Find the optimal substructure and derive the state transition equation

Consider the subproblem $dp[i, j]$, whose two strings end with the characters $s[i-1]$ and $t[j-1]$. Depending on the editing operation, it falls into one of three cases, as shown in the figure below.

  1. Insert $t[j-1]$ after $s[i-1]$; the remaining subproblem is $dp[i, j-1]$.
  2. Delete $s[i-1]$; the remaining subproblem is $dp[i-1, j]$.
  3. Replace $s[i-1]$ with $t[j-1]$; the remaining subproblem is $dp[i-1, j-1]$.

[Figure: state transitions of the edit distance problem]

Based on the above analysis, we obtain the optimal substructure: the minimum number of edits for $dp[i, j]$ equals the minimum of $dp[i, j-1]$, $dp[i-1, j]$, and $dp[i-1, j-1]$, plus the $1$ edit made in this step. The corresponding state transition equation is:

$$
dp[i, j] = \min(dp[i, j-1], dp[i-1, j], dp[i-1, j-1]) + 1
$$

Note that when $s[i-1]$ and $t[j-1]$ are the same, the current character needs no edit. In this case, the state transition equation is:

$$
dp[i, j] = dp[i-1, j-1]
$$
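The two transition equations translate almost literally into a recursive function. The following is a minimal top-down sketch under stated assumptions: the name `edit_distance_memo` is hypothetical, `functools.lru_cache` is used for memoization, and the base cases assume that converting to or from an empty prefix costs as many edits as the other prefix has characters.

```python
from functools import lru_cache


def edit_distance_memo(s: str, t: str) -> int:
    """Top-down edit distance following the state transition equations."""

    @lru_cache(maxsize=None)
    def dp(i: int, j: int) -> int:
        # Assumed base cases: one prefix is empty
        if i == 0:
            return j
        if j == 0:
            return i
        # Tail characters match: no edit is needed for them
        if s[i - 1] == t[j - 1]:
            return dp(i - 1, j - 1)
        # Otherwise take the cheapest of insert, delete, replace, plus 1
        return min(dp(i, j - 1), dp(i - 1, j), dp(i - 1, j - 1)) + 1

    return dp(len(s), len(t))


print(edit_distance_memo("kitten", "sitting"))
```

Because the recursion depth grows with the string lengths, this form is best suited to short inputs.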

Step 3: Determine the boundary conditions and the state transition order

When both strings are empty, the number of edits is $0$, that is, $dp[0, 0] = 0$. When $s$ is empty but $t$ is not, the minimum number of edits equals the length of $t$, that is, the first row $dp[0, j] = j$. When $s$ is not empty but $t$ is empty, the minimum number of edits equals the length of $s$, that is, the first column $dp[i, 0] = i$.

From the state transition equation, solving $dp[i, j]$ depends on the solutions to the left, above, and to the upper left, so it is enough to traverse the entire $dp$ table in forward order, row by row.
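The boundary conditions amount to filling the first row and first column before any transition is applied. A small sketch, where the helper name `init_dp_table` is hypothetical:

```python
from typing import List


def init_dp_table(n: int, m: int) -> List[List[int]]:
    """Build an (n+1) x (m+1) table with only the boundary states filled."""
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i  # first column: delete all i characters of s
    for j in range(1, m + 1):
        dp[0][j] = j  # first row: insert all j characters of t
    return dp


for row in init_dp_table(3, 4):
    print(row)
# [0, 1, 2, 3, 4]
# [1, 0, 0, 0, 0]
# [2, 0, 0, 0, 0]
# [3, 0, 0, 0, 0]
```

The zeros in the interior are placeholders that the state transitions overwrite.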

Code

=== "Python"

```python title="edit_distance.py"
[class]{}-[func]{edit_distance_dp}
```

=== "C++"

```cpp title="edit_distance.cpp"
[class]{}-[func]{editDistanceDP}
```

=== "Java"

```java title="edit_distance.java"
[class]{edit_distance}-[func]{editDistanceDP}
```

=== "C#"

```csharp title="edit_distance.cs"
[class]{edit_distance}-[func]{editDistanceDP}
```

=== "Go"

```go title="edit_distance.go"
[class]{}-[func]{editDistanceDP}
```

=== "Swift"

```swift title="edit_distance.swift"
[class]{}-[func]{editDistanceDP}
```

=== "JS"

```javascript title="edit_distance.js"
[class]{}-[func]{editDistanceDP}
```

=== "TS"

```typescript title="edit_distance.ts"
[class]{}-[func]{editDistanceDP}
```

=== "Dart"

```dart title="edit_distance.dart"
[class]{}-[func]{editDistanceDP}
```

=== "Rust"

```rust title="edit_distance.rs"
[class]{}-[func]{edit_distance_dp}
```

=== "C"

```c title="edit_distance.c"
[class]{}-[func]{editDistanceDP}
```

=== "Zig"

```zig title="edit_distance.zig"
[class]{}-[func]{editDistanceDP}
```
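The code tabs above contain only placeholders, so here is a minimal Python sketch assembled from the three steps. It borrows the function name `edit_distance_dp` from the Python placeholder, but the body is an assumption reconstructed from the text rather than the original source file.

```python
def edit_distance_dp(s: str, t: str) -> int:
    """Edit distance by bottom-up dynamic programming."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    # Boundary conditions: first column and first row
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    # State transitions: fill the remaining rows and columns in forward order
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                # Tail characters match: skip them
                dp[i][j] = dp[i - 1][j - 1]
            else:
                # Minimum of insert, delete, replace, plus 1 for this edit
                dp[i][j] = min(dp[i][j - 1], dp[i - 1][j], dp[i - 1][j - 1]) + 1
    return dp[n][m]


print(edit_distance_dp("kitten", "sitting"))  # 3
print(edit_distance_dp("hello", "algo"))      # 3
```

Both time and space complexity are $O(nm)$, since every cell of the table is computed once.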

As shown in the figure below, the state transition process of the edit distance problem is very similar to the knapsack problem, and both can be regarded as the process of filling in a two-dimensional grid.

[Figures: steps <1> to <15> of filling in the $dp$ table]

Space optimization

Since $dp[i, j]$ is derived from the solution above $dp[i-1, j]$, the solution to the left $dp[i, j-1]$, and the solution to the upper left $dp[i-1, j-1]$, a forward traversal loses the upper-left $dp[i-1, j-1]$, while a reverse traversal cannot build $dp[i, j-1]$ in advance, so neither traversal order works on its own.

To solve this, we can use a variable `leftup` to temporarily store the upper-left solution $dp[i-1, j-1]$, so that only the left and upper solutions need to be considered. The situation is then the same as in the complete knapsack problem, and forward traversal can be used.

=== "Python"

```python title="edit_distance.py"
[class]{}-[func]{edit_distance_dp_comp}
```

=== "C++"

```cpp title="edit_distance.cpp"
[class]{}-[func]{editDistanceDPComp}
```

=== "Java"

```java title="edit_distance.java"
[class]{edit_distance}-[func]{editDistanceDPComp}
```

=== "C#"

```csharp title="edit_distance.cs"
[class]{edit_distance}-[func]{editDistanceDPComp}
```

=== "Go"

```go title="edit_distance.go"
[class]{}-[func]{editDistanceDPComp}
```

=== "Swift"

```swift title="edit_distance.swift"
[class]{}-[func]{editDistanceDPComp}
```

=== "JS"

```javascript title="edit_distance.js"
[class]{}-[func]{editDistanceDPComp}
```

=== "TS"

```typescript title="edit_distance.ts"
[class]{}-[func]{editDistanceDPComp}
```

=== "Dart"

```dart title="edit_distance.dart"
[class]{}-[func]{editDistanceDPComp}
```

=== "Rust"

```rust title="edit_distance.rs"
[class]{}-[func]{edit_distance_dp_comp}
```

=== "C"

```c title="edit_distance.c"
[class]{}-[func]{editDistanceDPComp}
```

=== "Zig"

```zig title="edit_distance.zig"
[class]{}-[func]{editDistanceDPComp}
```
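As with the two-dimensional version, the tabs above hold only placeholders. The following Python sketch shows one way the `leftup` idea might be written. The function name `edit_distance_dp_comp` is taken from the Python placeholder, while the body and the helper variable `temp` are assumptions, not the original source.

```python
def edit_distance_dp_comp(s: str, t: str) -> int:
    """Edit distance with a single one-dimensional dp array."""
    n, m = len(s), len(t)
    dp = [0] * (m + 1)
    # Boundary condition: first row, dp[0][j] = j
    for j in range(1, m + 1):
        dp[j] = j
    for i in range(1, n + 1):
        # Boundary condition: first column, dp[i][0] = i
        leftup = dp[0]   # holds dp[i-1][0] before it is overwritten
        dp[0] = i
        for j in range(1, m + 1):
            temp = dp[j]  # dp[i-1][j], the next cell's upper-left value
            if s[i - 1] == t[j - 1]:
                dp[j] = leftup
            else:
                # dp[j-1] is the left, dp[j] is the upper, leftup the upper left
                dp[j] = min(dp[j - 1], dp[j], leftup) + 1
            leftup = temp
    return dp[m]


print(edit_distance_dp_comp("kitten", "sitting"))  # 3
```

This keeps the $O(nm)$ running time while reducing the extra space to $O(m)$.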

Origin blog.csdn.net/zy_dreamer/article/details/132923802