Subsequence problems in Python: the longest common subsequence (LCS)

Problem description: Given two sequences X = {x1, x2, x3, ..., xm} and Y = {y1, y2, y3, ..., yn}, find the longest common subsequence (LCS) of X and Y.

Analysis: A brute force search would enumerate every subsequence of X and compare each one against every subsequence of Y to pick out the LCS. X alone has 2^m subsequences, so brute force search takes exponential time, which is clearly impractical. Can we instead derive the LCS of X and Y from the LCS of their prefixes?
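To make the cost of brute force concrete, here is a minimal sketch of the exhaustive search described above. The function names all_subsequences and lcs_brute_force are made up for this illustration, and the approach is only usable on very short inputs because the number of subsequences doubles with every extra element.

```python
from itertools import combinations

def all_subsequences(seq):
    """Return the set of all subsequences of seq (as tuples), including the empty one."""
    subs = set()
    for length in range(len(seq) + 1):
        # combinations() keeps the original element order, so every result is a subsequence
        for combo in combinations(seq, length):
            subs.add(combo)
    return subs

def lcs_brute_force(X, Y):
    """Find one longest common subsequence by exhaustive search (exponential time)."""
    common = all_subsequences(X) & all_subsequences(Y)
    return max(common, key=len)

best = lcs_brute_force("ABCB", "BCAB")
print("".join(best), len(best))  # BCB 3
```

Even for two sequences of length 20, each set above would hold up to about a million entries, which is why the rest of the article looks for a better method.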

Suppose Xi = {x1, x2, x3, ..., xi} is a prefix of X and Yj = {y1, y2, y3, ..., yj} is a prefix of Y, and suppose the LCS length of Xi and Yj is already known to be kij. What is the LCS length of X(i+1) and Y(j+1)? Call it k(i+1)(j+1). A little thought shows that there are two cases:

(1) If x(i+1) = y(j+1), then k(i+1)(j+1) = kij + 1
(2) If x(i+1) != y(j+1), then k(i+1)(j+1) = max(k(i+1)j, ki(j+1))

Readers who are familiar with dynamic programming will notice that this matches the typical shape of a dynamic programming problem. Let's continue to analyze the problem along those lines:

Step 1. Sub-problem: To find the LCS of Xi and Yj, we must first find the LCS of X(i-1) and Y(j-1), the LCS of Xi and Y(j-1), and the LCS of X(i-1) and Yj, which makes the problem recursive.

Step 2. Find the state transition formula of dynamic programming

Suppose we use an array c[i, j] to record the LCS length of Xi and Yj. Then:

(1) c[i, j] = 0, if i = 0 or j = 0
(2) c[i, j] = c[i-1, j-1] + 1, if i, j > 0 and X[i] = Y[j]
(3) c[i, j] = max(c[i-1, j], c[i, j-1]), if i, j > 0 and X[i] != Y[j]
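The formula counts elements from 1 and reserves row 0 and column 0 for empty prefixes, while Python strings are indexed from 0. As a small worked example, the sketch below fills a table with one extra row and column so that the three cases map onto the code one to one. The name lcs_table is hypothetical and not part of the original code.

```python
def lcs_table(X, Y):
    """Fill the (m+1) x (n+1) table c, where c[i][j] is the LCS length of X[:i] and Y[:j]."""
    m, n = len(X), len(Y)
    # Row 0 and column 0 stay 0: case (1) of the formula
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                # Case (2): the i-th element of X equals the j-th element of Y
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                # Case (3): take the better of the two smaller sub-problems
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c

for row in lcs_table("ABC", "AC"):
    print(row)
# [0, 0, 0]
# [0, 1, 1]
# [0, 1, 1]
# [0, 1, 2]
```

The bottom-right entry, 2, is the LCS length of "ABC" and "AC".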

Step 3. Write the code according to the formula

(1) From the formula in step 2, we can easily write the recursive algorithm:

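#Recursively find the LCS length of X[0..i] and Y[0..j] (0-based indices)
#Call as LCS_Length(X, Y, len(X)-1, len(Y)-1)
def LCS_Length(X, Y, i, j):
    if i < 0 or j < 0: # Recursion exit: one of the prefixes is empty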
        return 0
    else:
        if X[i] == Y[j]:
            return (LCS_Length(X, Y, i-1, j-1) + 1)
        return max(LCS_Length(X, Y, i, j-1), LCS_Length(X, Y, i-1, j))
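
The plain recursion above solves the same (i, j) pairs over and over, so its running time is exponential in the worst case. One common remedy is to cache the results. The sketch below is one way to do that with functools.lru_cache from the standard library; the wrapper name lcs_length_memo is an assumption made for this example.

```python
from functools import lru_cache

def lcs_length_memo(X, Y):
    """Top-down LCS length; each (i, j) pair is solved at most once."""
    @lru_cache(maxsize=None)
    def solve(i, j):
        # Recursion exit: one of the prefixes is empty
        if i < 0 or j < 0:
            return 0
        if X[i] == Y[j]:
            return solve(i - 1, j - 1) + 1
        return max(solve(i, j - 1), solve(i - 1, j))

    return solve(len(X) - 1, len(Y) - 1)

print(lcs_length_memo("ABCBDAB", "BDCABA"))  # 4
```

This is still recursive, so very long inputs can hit Python's recursion limit, which is one reason to prefer the bottom-up version.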

(2) Convert the recursion into a bottom-up dynamic programming algorithm:

#Dynamic programming for the LCS length
def LCS_Length2(X, Y):
    m = len(X)
    n = len(Y)
    #record[i][j] stores the LCS length of X[0..i] and Y[0..j] (0-based, inclusive)
    record = [[0 for j in range(n)] for i in range(m)]
    #The outer loop starts from i = 0 and fills record[i][j] row by row,
    #in the order [0][0], [0][1], [0][2], ..., [1][0], [1][1], [1][2], ...
    #so when solving record[i][j], the values record[i-1][j-1], record[i][j-1] and record[i-1][j] are already stored (the key to solving the problem)
    for i in range(m):
        for j in range(n):
            if X[i] == Y[j]:
                if i>0 and j>0:
                    record[i][j] = record[i-1][j-1] + 1
                else:
                    record[i][j] = 1
            else:
                #Handle the boundaries explicitly: a neighbor outside the table counts as 0
                if i == 0 and j == 0:
                    record[i][j] = 0
                elif i == 0:
                    record[i][j] = record[i][j-1]
                elif j == 0:
                    record[i][j] = record[i-1][j]
                else:
                    record[i][j] = max(record[i-1][j], record[i][j-1])
    #Return the whole table; for non-empty X and Y the LCS length is record[m-1][n-1]
    return record

Step 4. Reconstruct the solution to the problem

After writing the code, we notice that something is missing: the code above only finds the length of the LCS. How do we reconstruct the LCS itself, that is, output the subsequence rather than just its length?

Let's re-examine the formula in Step 2. Depending on whether X[i] and Y[j] are equal, record[i, j] is derived from record[i-1, j-1], record[i, j-1] or record[i-1, j]. Can we work backwards, comparing record[i, j] with its neighbors to decide whether X[i] and Y[j] are equal? The answer is yes: if record[i, j] is larger than both record[i-1, j] and record[i, j-1], it cannot have come from case (3), so X[i] = Y[j] and X[i] belongs to the LCS. The code is below:

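#Print the LCS. The recursive call comes before the print, so the characters are output in their original left-to-right order
#Call as Print_LCS(record, X, len(X)-1, len(Y)-1) with the table returned by LCS_Length2
def Print_LCS(record, X, i, j):
    #Recursion exit: we have walked off the table
    if i < 0 or j < 0:
        return
    #A neighbor outside the table counts as 0
    up = record[i-1][j] if i > 0 else 0
    left = record[i][j-1] if j > 0 else 0
    #The two cases where record[i][j] was copied from a neighbor: keep following that neighbor
    if record[i][j] == up:
        Print_LCS(record, X, i-1, j)
    elif record[i][j] == left:
        Print_LCS(record, X, i, j-1)
    #Otherwise record[i][j] is larger than both neighbors, so X[i] = Y[j] and X[i] is in the LCS
    else:
        Print_LCS(record, X, i-1, j-1)
        print(X[i], end = '')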
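
Printing is convenient for a demo, but a caller often wants the LCS as a value. The following is a minimal, self-contained sketch that builds the table and then walks back through it iteratively, returning a string. It assumes X and Y are strings, uses a table padded with an extra row and column of zeros to avoid boundary checks, and the name lcs_string is made up for this example. When several LCSs of the same length exist, it returns just one of them.

```python
def lcs_string(X, Y):
    """Return one longest common subsequence of the strings X and Y."""
    m, n = len(X), len(Y)
    # c[i][j] is the LCS length of X[:i] and Y[:j]; row 0 and column 0 are all zeros
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    # Walk back from the bottom-right corner, collecting matched characters
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    # Characters were collected from the end, so reverse them
    return "".join(reversed(out))

print(lcs_string("ABCBDAB", "BDCABA"))  # BCBA
```

The loop replaces the recursion, so this version is not limited by Python's recursion depth.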

Problem-solving idea: Analyze the problem, divide the original problem into sub-problems, and derive the solution of the original problem from the solutions of the sub-problems; this shows that the problem can be solved by dynamic programming. Then follow the steps of dynamic programming: find the state transition formula, write the code from the formula, and reconstruct the solution of the original problem.

Algorithm optimization: Solved with the bottom-up dynamic programming algorithm, the problem takes O(m*n) time and O(m*n) space (O(n^2) when both sequences have length n). But the formula shows that solving record[i, j] only uses the two rows record[i-1] and record[i], so when only the LCS length is needed we can replace the original m*n list with a 2*n list.
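
The two-row idea can be sketched as follows. This is an illustration rather than the article's own code: the name lcs_length_two_rows is hypothetical, and each row carries one extra leading zero so that the boundary case needs no special handling.

```python
def lcs_length_two_rows(X, Y):
    """LCS length using only two rows of the table: O(m*n) time, O(n) extra space."""
    m, n = len(X), len(Y)
    prev = [0] * (n + 1)          # plays the role of record[i-1]
    for i in range(1, m + 1):
        curr = [0] * (n + 1)      # plays the role of record[i]
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr               # the finished row becomes the previous row
    return prev[n]

print(lcs_length_two_rows("ABCBDAB", "BDCABA"))  # 4
```

Note the trade-off: this version returns only the length. The backtracking in Step 4 reads earlier rows, so it still needs the full table.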
