Detailed explanation of the longest common subsequence problem (three methods)

To keep the explanation concrete, I will use a typical problem from Luogu as an example and walk through several common methods for the longest common subsequence problem. This problem guarantees that the two sequences have the same length. If the lengths differ, you only need to adjust the loop bounds slightly (for example, n for A and m for B); the algorithm itself stays the same.

Problem description

Given two permutations A and B of 1, 2, ..., n, find the length of their longest common subsequence.

Input format

The first line is a number n.

Each of the next two lines contains n numbers forming a permutation of the natural numbers 1, 2, ..., n.

Output format

A single number: the length of the longest common subsequence.

Sample input

5
3 2 1 4 5
1 2 3 4 5

Sample output 
3

Hint

- n <= 10^3 for 50% of the data;
- n <= 10^5 for 100% of the data.

Method 1: Conventional dynamic programming

This problem is solved with dynamic programming, so we first need a state transition equation. Let L[i][j] be the length of the longest common subsequence of the first i elements of A and the first j elements of B. The state transition equation is then:

$$
L[i][j] =
\begin{cases}
L[i-1][j-1] + 1, & \text{if } A[i] = B[j] \\
\max\left(L[i-1][j],\ L[i][j-1]\right), & \text{if } A[i] \neq B[j]
\end{cases}
$$

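The whole process can be shown as a table, using 3 2 1 4 5 and 1 2 3 4 5 as the example. Columns correspond to the elements of A = 3 2 1 4 5, rows to the elements of B = 1 2 3 4 5, and the row and column marked "-" stand for the empty prefix. Each cell holds the LCS length of the corresponding prefixes:

B\A   -  3  2  1  4  5
 -    0  0  0  0  0  0
 1    0  0  0  1  1  1
 2    0  0  1  1  1  1
 3    0  1  1  1  1  1
 4    0  1  1  1  2  2
 5    0  1  1  1  2  3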

Filling in the table is the same as solving the problem. Row 0 and column 0 are initialized to 0. Using row 0 as the reference, fill row 1 from left to right; then, using row 1 as the reference, fill row 2 from left to right; and so on. Each cell depends only on the cell above it, the cell to its left, and the cell diagonally up-left. Once the table is full, the answer is in the bottom-right cell, L[n][n].

The code is as follows:

#include <algorithm>  // std::max
#include <iostream>

using namespace std;

const int maxn = 1e3 + 10;
int n;
int A[maxn];
int B[maxn];
int L[maxn][maxn];

int main()
{
	cin >> n;
	for (int i = 1; i <= n; i++) {
		cin >> A[i];
	}
	for (int i = 1; i <= n; i++) {
		cin >> B[i];
	}
	for (int i = 1; i <= n; i++) {
		for (int j = 1; j <= n; j++) {
			// state transition equation
			if (A[i] == B[j]) {
				L[i][j] = L[i - 1][j - 1] + 1;
			}
			else {
				L[i][j] = max(L[i - 1][j], L[i][j - 1]);
			}
		}
	}
	cout << L[n][n] << endl;
	return 0;
}
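To cross-check the table above, the same recurrence can be wrapped in a small helper that returns the whole table instead of only L[n][n]. This is a sketch, not the author's code. The name lcsTable is hypothetical, the helper uses std::vector so the two sequences may have different lengths, and it indexes the inputs from 0. Passing B as the row sequence and A as the column sequence reproduces the layout of the table shown earlier.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the full (|rowSeq|+1) x (|colSeq|+1) LCS table.
// Row 0 and column 0 stay 0 (empty prefixes); every other cell is filled
// from the cell above, the cell to the left, and the upper-left cell.
std::vector<std::vector<int>> lcsTable(const std::vector<int>& rowSeq,
                                       const std::vector<int>& colSeq) {
    const std::size_t r = rowSeq.size(), c = colSeq.size();
    std::vector<std::vector<int>> L(r + 1, std::vector<int>(c + 1, 0));
    for (std::size_t i = 1; i <= r; ++i) {        // rows top to bottom
        for (std::size_t j = 1; j <= c; ++j) {    // each row left to right
            if (rowSeq[i - 1] == colSeq[j - 1]) {
                L[i][j] = L[i - 1][j - 1] + 1;    // match: extend the diagonal
            } else {
                L[i][j] = std::max(L[i - 1][j], L[i][j - 1]);
            }
        }
    }
    return L;
}
```

For example, lcsTable({1, 2, 3, 4, 5}, {3, 2, 1, 4, 5}) should return a 6 x 6 table whose last row is 0 1 1 1 2 3.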

This is the most basic method, and its time complexity is clearly O(n^2). Its drawback is memory: because of the two-dimensional array L, the space complexity is also O(n^2). For n = 10^5, L would need 10^10 int entries, about 40 GB at 4 bytes each, which is far too much to allocate. This is why the code above only sizes L for n up to 10^3. The next method saves this space.

Method 2: Improving conventional dynamic programming (saving space)

The idea is the same as before. The only change is that the two-dimensional array L becomes a one-dimensional array. While filling the table, notice that computing a row only requires the previous row as a reference; the rest of the table is never used again. So we keep a one-dimensional array L that holds the previous row, and a variable ans that holds the value just computed for the current row. Cell (i, j) needs three values: L[i-1][j-1] and L[i-1][j] from the previous row, and L[i][j-1] from the current row. The code therefore writes each new value back into L one column late. After computing cell j, it stores the previous cell's value (saved in t) into L[j-1], a slot the current row no longer needs. At the end of the row, it stores ans into L[n]. When the next row starts, L holds the whole current row and serves as the new reference row. After the last row, ans is the bottom-right cell of the original table, which is the answer.

The improved code is as follows:

#include <algorithm>  // std::max
#include <iostream>

using namespace std;

const int maxn = 1e5 + 10;
int n;
int A[maxn];
int B[maxn];
int L[maxn];

int main()
{
	cin >> n;
	for (int i = 1; i <= n; i++) {
		cin >> A[i];
	}
	for (int i = 1; i <= n; i++) {
		cin >> B[i];
	}
	int ans = 0, t;
	for (int i = 1; i <= n; i++) {
		ans = 0;
		for (int j = 1; j <= n; j++) {
			t = ans;  // save ans from the previous column, i.e. L[i][j-1]
			if (A[i] == B[j]) {
				ans = L[j - 1] + 1;
			}
			else {
				ans = max(ans, L[j]);
			}
			// overwrite the slot we have passed with the current row's value (one column behind)
			L[j - 1] = t;  
		}
		L[n] = ans;  
	}
	// after the loops, ans is the bottom-right cell of the original 2D table
	cout << ans << endl;
	return 0;
}
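The one-column-late write-back above is compact but easy to get wrong. A more common way to reach O(n) space keeps two rows, prev for row i-1 and cur for row i, and swaps them after each row. This is an alternative sketch, not the author's method. The function name lcsTwoRows is hypothetical, and the two inputs may have different lengths.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: LCS length using two rows of the table.
// prev holds row i-1 (the reference row); cur is row i being filled.
int lcsTwoRows(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = 0;                                   // column 0 is always 0
        for (std::size_t j = 1; j <= b.size(); ++j) {
            if (a[i - 1] == b[j - 1]) {
                cur[j] = prev[j - 1] + 1;
            } else {
                cur[j] = std::max(prev[j], cur[j - 1]);
            }
        }
        std::swap(prev, cur);                         // row i becomes the reference row
    }
    return prev[b.size()];                            // after the final swap, prev is the last row
}
```

std::swap on two vectors only exchanges their internal buffers, so switching rows costs no copying.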

Method 2 follows essentially the same idea as method 1 and also runs in O(n^2) time, but its space complexity is only O(n), which is clearly better than method 1. (Of course, when a problem does not need much space, method 1 is still a good choice because it is easier to write.)

However, both methods above take O(n^2) time. With n up to 10^5, that is on the order of 10^10 steps, which is too slow under a typical time limit. So we need a faster method.

Method 3: Cleverly using another dynamic programming problem (LIS)

The problem solved above is the longest common subsequence problem, LCS for short. There is another neat way to solve it: convert LCS into LIS. LIS is the longest increasing subsequence problem (in some variants, the longest non-decreasing subsequence), and its standard solution is also based on dynamic programming. Let's look at the conversion first:

The conversion requires that A and B contain exactly the same values, with each value appearing once in each sequence. That holds here because both are permutations of 1, 2, ..., n.

We again use 3 2 1 4 5 and 1 2 3 4 5 as the example:

A: 3 2 1 4 5

B: 1 2 3 4 5

Relabel the values of A as 1, 2, 3, 4, 5 in order of position (increasing order), that is, 3 -> 1, 2 -> 2, 1 -> 3, 4 -> 4, 5 -> 5. Then apply the same mapping to B, which gives:

A: 1 2 3 4 5
B: 3 2 1 4 5
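The relabeling step can be sketched as follows. The author's program later in this article uses std::map for the mapping. Because the values here are assumed to be exactly 1..n, a plain array indexed by value does the same job with O(1) lookups. The helper name relabelByA is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: relabels B by the positions of its values in A.
// Assumes A and B are permutations of 1..n, so pos can be indexed by value.
std::vector<int> relabelByA(const std::vector<int>& A, const std::vector<int>& B) {
    std::vector<int> pos(A.size() + 1, 0);
    for (std::size_t i = 0; i < A.size(); ++i) {
        pos[A[i]] = static_cast<int>(i) + 1;   // value A[i] gets label i+1
    }
    std::vector<int> out;
    out.reserve(B.size());
    for (int v : B) {
        out.push_back(pos[v]);                 // apply the same rule to B
    }
    return out;
}
```

Relabeling A with its own mapping always yields 1 2 ... n, which is exactly the property the next paragraph relies on.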

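Relabeling applies the same one-to-one mapping to both sequences, so it does not change the LCS length. The key property is this: every common subsequence of the two sequences is a subsequence of A, and A is now 1 2 ... n, which is increasing, so every common subsequence is increasing. Conversely, any increasing subsequence of the relabeled B is also a subsequence of A, because A contains every value in increasing order. Therefore the problem becomes finding the longest increasing subsequence of the relabeled B. In the example, the relabeled B is 3 2 1 4 5, and one longest increasing subsequence is 3 4 5, of length 3, which matches the sample output.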
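The equivalence argued above can also be spot-checked empirically. The sketch below (all names hypothetical) compares the plain O(n^2) LCS with an O(n^2) LIS of the relabeled B on random permutations. Passing such a check is evidence, not proof; the argument in the paragraph above is what establishes the result.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

namespace check {

// Plain table-based LCS length (same recurrence as method 1).
int lcs(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<std::vector<int>> L(a.size() + 1, std::vector<int>(b.size() + 1, 0));
    for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j)
            L[i][j] = (a[i - 1] == b[j - 1]) ? L[i - 1][j - 1] + 1
                                              : std::max(L[i - 1][j], L[i][j - 1]);
    return L[a.size()][b.size()];
}

// O(n^2) strictly increasing LIS length.
int lis(const std::vector<int>& s) {
    std::vector<int> f(s.size(), 1);
    int best = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        for (std::size_t j = 0; j < i; ++j)
            if (s[j] < s[i]) f[i] = std::max(f[i], f[j] + 1);
        best = std::max(best, f[i]);
    }
    return best;
}

}  // namespace check

// Hypothetical check: returns true if LCS(A, B) == LIS(relabeled B)
// for every one of `trials` random pairs of permutations of 1..n.
bool reductionHolds(int n, int trials, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<int> A(n), B(n);
    std::iota(A.begin(), A.end(), 1);
    std::iota(B.begin(), B.end(), 1);
    for (int t = 0; t < trials; ++t) {
        std::shuffle(A.begin(), A.end(), rng);
        std::shuffle(B.begin(), B.end(), rng);
        std::vector<int> pos(n + 1);
        for (int i = 0; i < n; ++i) pos[A[i]] = i + 1;   // labels taken from A
        std::vector<int> relabeled(n);
        for (int i = 0; i < n; ++i) relabeled[i] = pos[B[i]];
        if (check::lcs(A, B) != check::lis(relabeled)) return false;
    }
    return true;
}
```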

This transformation may look unnecessary, but the longest increasing subsequence problem can be solved in O(n log n) time. In other words, the conversion reduces the time complexity of this LCS problem to O(n log n), which is fast enough to stay within the time limit.
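For comparison, the direct dynamic programming formulation of LIS is the standard O(n^2) one. Let f[i] be the length of the longest increasing subsequence ending at position i; then f[i] = 1 + max f[j] over all j < i with a[j] < a[i], or 1 if no such j exists. The sketch below uses the hypothetical name lisQuadratic. On its own it is no faster than methods 1 and 2, which is why the O(n log n) technique matters.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: classic O(n^2) LIS length (strictly increasing).
// f[i] = length of the longest increasing subsequence that ends at a[i].
int lisQuadratic(const std::vector<int>& a) {
    std::vector<int> f(a.size(), 1);   // each element alone is an increasing subsequence
    int best = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = 0; j < i; ++j) {
            if (a[j] < a[i]) {
                f[i] = std::max(f[i], f[j] + 1);   // append a[i] after a[j]
            }
        }
        best = std::max(best, f[i]);
    }
    return best;
}
```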

This raises a new question: how do we solve the longest increasing subsequence problem in O(n log n) time? Here I borrow an explanation given by others.

Take the sequence 5 2 3 1 4 as an example:

Maintain an answer sequence C. First add 5, giving [5]. Next comes 2; since 2 < 5, replace 5 with 2, giving [2]. Then 3 > 2, so append 3 directly, giving [2, 3]. Next comes 1; since 1 < 3, find the smallest number in C that is not less than 1, which is 2, and replace 2 with 1, giving [1, 3]. Finally 4 > 3, so append it, giving [1, 3, 4]. The answer is 3 (for example, 2 3 4).

Why doesn't the replacement change the result? For each length, C stores the smallest possible last element of an increasing subsequence of that length seen so far, and C always stays sorted in increasing order. Replacing 2 with 1 keeps the length the same but lowers the ending value. If a later, smaller number can then replace 3, we may end up with a longer sequence. If no such number appears, replacing 2 with 1 contributes nothing and does not affect the optimality of the result. Note that C is not necessarily an actual increasing subsequence of the input ([1, 3, 4] is not, since 1 comes after 3), but its length is always the correct answer. Because C is sorted, you can use the STL function lower_bound to find the smallest number in C that is greater than or equal to a given value.
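The walkthrough can be reproduced with a small tracing helper. This sketch (hypothetical name lisWithTrace) uses a plain std::vector without the sentinel 0 that the author's code uses, and records the contents of C after each element so the replacement steps are visible.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: O(n log n) LIS length (strictly increasing) that also
// records the contents of the tails array C after each element is processed.
// C stays sorted, so std::lower_bound (first element >= x) finds the slot.
int lisWithTrace(const std::vector<int>& a, std::vector<std::vector<int>>& trace) {
    std::vector<int> C;   // C[k]: smallest tail of an increasing subsequence of length k+1
    trace.clear();
    for (int x : a) {
        auto it = std::lower_bound(C.begin(), C.end(), x);
        if (it == C.end()) {
            C.push_back(x);   // x extends the longest subsequence found so far
        } else {
            *it = x;          // x becomes a smaller tail for that length
        }
        trace.push_back(C);
    }
    return static_cast<int>(C.size());
}
```

For 5 2 3 1 4, the recorded states should be [5], [2], [2, 3], [1, 3], [1, 3, 4], matching the steps described above.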

The code is as follows:

#include <algorithm>  // std::lower_bound
#include <iostream>
#include <map>
#include <vector>

using namespace std;

const int maxn = 1e5 + 10;
int n;
map<int, int>m;
int B[maxn];

int main()
{
	cin >> n;
	int a;
	for (int i = 1; i <= n; i++) {
		cin >> a;
		m[a] = i;
	}
	int b;
	for (int i = 1; i <= n; i++) {
		cin >> b;
		// relabel B using A's mapping
		B[i] = m[b];
	}
	// C[k] holds the smallest last element of an increasing subsequence of length k; C[0] = 0 is a sentinel
	vector<int>C;
	C.push_back(0);
	int len = 0; // current LIS length, which is the final answer
	for (int i = 1; i <= n; i++) {
		if (B[i] > C[len]) {
			C.push_back(B[i]);
			len++;
		}
		else {
			C[lower_bound(C.begin(), C.end(), B[i]) - C.begin()] = B[i];
		}
	}
	cout << len << endl;
	return 0;
}

This brings the time complexity down to O(n log n). For the problem given above, only this method fits within the time limit; the first two can only get about half of the points (the n <= 10^3 cases).

Summary:

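Here I have given three methods for the longest common subsequence problem; choose the one that fits your actual problem. Method 3 is the fastest, but it relies on both sequences containing the same distinct values, as permutations do. These are my own notes, and I am happy to share them with you.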


Origin blog.csdn.net/CXR_XC/article/details/129865214