Experiment 3: Dynamic programming algorithm and greedy algorithm

Design and implement a solution to the longest common subsequence problem using dynamic programming, and a solution to the activity arrangement problem using a greedy strategy. Run comparative experiments with different data volumes, analyze the time complexity of each algorithm, and write up the analysis as a report.

  1. Problem Description

(1) Dynamic programming algorithm

The dynamic programming algorithm (Dynamic Programming, DP for short) is usually based on a recurrence formula and one or more initial states. The solution to the current subproblem is derived from the solutions to previous subproblems. For problems that suit it, dynamic programming often needs only polynomial time, so it can be much faster than backtracking or brute-force enumeration.

 

Dynamic programming is often suitable for problems with overlapping subproblems and optimal substructure properties, and the time spent by dynamic programming methods is often much less than that of naive solutions.

The basic idea behind dynamic programming is very simple. Roughly, to solve a given problem, we need to solve its different parts (i.e., subproblems), and then combine the solutions of the subproblems to obtain the solution of the original problem.

 

Usually many subproblems are very similar, so the dynamic programming method tries to solve each subproblem only once, thereby reducing the amount of calculation: once the solution of a given subproblem has been calculated, it is stored, so that the next time the same subproblem comes up its solution can be looked up in the table directly. This approach is especially useful when the number of repeated subproblems grows exponentially with the size of the input.
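To make the "solve once, then look up" idea concrete, here is a minimal sketch. It uses the Fibonacci recurrence purely as an illustration (it is not part of this experiment), and the names fib_naive, fib_memo and calls are hypothetical. The counter shows how many times each version actually enters the function.

```python
from functools import lru_cache

# Counts how often each version is entered (illustrative bookkeeping only).
calls = {"naive": 0, "memo": 0}

def fib_naive(n):
    """Recompute every subproblem from scratch."""
    calls["naive"] += 1
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Same recurrence, but each subproblem is solved once and then cached."""
    calls["memo"] += 1
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)
```

For n = 20 the naive version enters the function tens of thousands of times, while the memoized version enters it once per distinct subproblem (21 times).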

(2) Longest Common Subsequence Problem (LCS)

The Longest Common Subsequence (LCS) problem is to find the longest subsequence common to all sequences in a set of sequences (usually two). A sequence that is a subsequence of two or more known sequences, and is the longest among all sequences satisfying this condition, is called a longest common subsequence of the known sequences. [1]

The longest common subsequence problem is a classic computer science problem. It is the basis of data comparison programs such as diff tools, and of bioinformatics applications. It is also widely used in version control systems such as Git to reconcile changes between files.

(3) Greedy algorithm

A greedy algorithm always makes the choice that looks best at the moment. That is, it does not consider overall optimality; each choice is only a locally optimal one in some sense.

A greedy algorithm has no fixed framework; the key to its design is the choice of greedy strategy. Note that a greedy algorithm does not yield a globally optimal solution for every problem, and the chosen greedy strategy must have no aftereffect (that is, how the process evolves after a given state depends only on that state, not on the states that preceded it).

(4) Activity Arrangement Problem

There are n activities that need to use the same resource, and only one activity can use the resource in the same period. Each activity i has a start time si and an end time ei (si < ei), and if activity i is scheduled, it occupies resources within the time interval [si, ei). If the interval [si, ei) does not intersect with the interval [sj, ej), then activity i is said to be compatible with activity j.

The activity scheduling problem is to arrange activities so that as many activities as possible are compatible. The essence is to select the largest compatible activity sub-set in the given activity set.
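The compatibility condition above can be sketched as a small check on half-open intervals. The function name compatible and the (start, end) pair representation are assumptions made for illustration.

```python
def compatible(a, b):
    """Return True if the half-open intervals [a[0], a[1]) and [b[0], b[1])
    do not intersect, i.e. one activity ends no later than the other starts."""
    return a[1] <= b[0] or b[1] <= a[0]
```

Because the intervals are half-open, an activity that ends at time 4 is compatible with one that starts at time 4: compatible((1, 4), (4, 7)) is True, while compatible((1, 4), (3, 5)) is False.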

  2. Purpose

Deepen the understanding of the basic ideas behind the dynamic programming and greedy design methods.

 

  3. Experimental principle

Describe the ideas of dynamic programming and greedy algorithms and how to use them to solve the two problems, including complexity analysis.

(1) Longest common subsequence problem

A subsequence of a given sequence is a sequence obtained by deleting zero or more elements from it without changing the order of the remaining elements.

A sequence that is a subsequence of several sequences at the same time is a common subsequence of those sequences.

In this experiment, the longest common subsequence problem is restricted to two sequences.

     ① Optimal substructure property: The longest common subsequence has the optimal substructure property, which leads to a recursive method:

    For X={x1,x2,x3,...,xm}, Y={y1,y2,y3,...,yn}, let Xm-1 and Yn-1 denote X and Y with their last element removed:

    When xm=yn, find the longest common subsequence of Xm-1 and Yn-1 and append xm (=yn) to it; the result is a longest common subsequence of X and Y.

    When xm!=yn, two subproblems must be solved, namely finding:

        the longest common subsequence of Xm-1 and Y

        the longest common subsequence of X and Yn-1

    The longer of the two is a longest common subsequence of X and Y.
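The recursive method above can be written down almost literally. The following is a minimal top-down sketch in Python, not the C program used in the experiment; the names lcs_length and c are chosen for illustration, and lru_cache stores each (i, j) subproblem so it is solved only once.

```python
from functools import lru_cache

def lcs_length(x, y):
    """Length of a longest common subsequence of x and y (top-down DP)."""

    @lru_cache(maxsize=None)
    def c(i, j):
        # c(i, j): LCS length of the first i elements of x and first j of y.
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            # Last elements match: extend the LCS of both shortened prefixes.
            return c(i - 1, j - 1) + 1
        # Otherwise take the longer result of the two subproblems.
        return max(c(i - 1, j), c(i, j - 1))

    return c(len(x), len(y))
```

For example, lcs_length("ABCBDAB", "BDCABA") returns 4. The recursion depth grows with len(x) + len(y), so for long inputs a bottom-up table is the safer choice in Python.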

② Solution:

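Data structure: Set up two two-dimensional arrays. c[i][j] stores the length of the longest common subsequence of the first i elements of X and the first j elements of Y, and b[i][j] records which of the three cases produced c[i][j], so that the subsequence can be reconstructed afterwards.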

Function: LCSLength is used to calculate the optimal value, and LCS is used to print the subsequence.

 

Time complexity: O(mn) for LCSLength to fill the tables, plus O(m+n) for LCS to trace back and print the subsequence.
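As a rough illustration of where the two terms come from, here is a Python sketch of the same two phases. It is an assumption-laden simplification rather than a port of the C program: the function name lcs is hypothetical, and instead of storing a separate b table it re-derives the direction from c during the traceback, using the same tie-breaking rule (prefer moving up).

```python
def lcs(x, y):
    """Return one longest common subsequence of the strings x and y."""
    m, n = len(x), len(y)

    # Phase 1, O(mn): fill the table of LCS lengths for all prefix pairs.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    # Phase 2, O(m+n): walk back from c[m][n]; each step decreases i or j.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

With the inputs "ABCBDAB" and "BDCABA" this returns "BCBA".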

(2) Activity arrangement problem

The input to the activity arrangement problem can take many forms; here I use a two-dimensional array (a nested list). In each sublist, the first number is the activity's start time and the second is its end time. The list is not given in sorted order.

 

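The solution embodies the greedy idea. After sorting the activities by end time, the first activity is selected. The remaining activities are then traversed one by one: each activity's start time is compared with the end time of the most recently selected activity, and if it starts no earlier than that end time, it is added to the schedule and its end time becomes the new reference. This continues until the last activity has been traversed.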

 

Time complexity: Sorting takes O(n^2) with the bubble sort used here (an efficient comparison sort would take O(n log n)). In addition, the activity arrangement function itself is O(n).
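The two cost components can be seen in a compact sketch. This version is an illustration, not the program used in the experiment: it assumes Python's built-in sorted() in place of the bubble sort, and the function name select_activities is hypothetical.

```python
def select_activities(activities):
    """Greedy selection of a largest set of mutually compatible activities.

    activities: iterable of [start, end] pairs, in any order.
    Returns the chosen activities in order of end time.
    """
    # Sorting step: order by end time (built-in sort, not bubble sort).
    ordered = sorted(activities, key=lambda a: a[1])

    # Arrangement step: one linear pass over the sorted activities.
    chosen = []
    last_end = float("-inf")
    for start, end in ordered:
        if start >= last_end:  # compatible with everything chosen so far
            chosen.append([start, end])
            last_end = end
    return chosen
```

For the unsorted input [[3, 5], [12, 16], [1, 4], [5, 7], [0, 6], [8, 11]] the result is [[1, 4], [5, 7], [8, 11], [12, 16]].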

  4. Experimental design

4.1 Longest Common Subsequence

LCS.c consists of 3 functions:

Their relationship is as follows:

4.2 Activity Arrangement

It consists of a sorting function, an event scheduling function, and a main function (including the random number generation part):

Their relationship is as follows:

  5. Experimental results and analysis

5.1 Longest Common Subsequence

I ran the program on the example from p. 54 of the textbook, and the results are as follows:

This input is obviously too small for testing, so I decided to increase the amount of data.

I added a function to this file that generates random strings. It works like the one I used before, except that this time uppercase letters, lowercase letters, and Arabic numerals, 62 characters in total, are put into one string. Random numbers are then generated to pick characters from it and splice them into the output string.

String length (characters)   LCSLength time (ms)   LCS time (ms)
50                           0                     0
100                          0                     0
150                          0                     0
200                          0                     0
250                          0                     0
300                          0                     0
350                          0                     0
400                          0                     0
450                          0                     0
500                          1                     0

The following is a screenshot of the test with 50 characters:

The following is a screenshot of the test with 500 characters:

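With 600 or more characters, the program fails with out-of-bounds pointer or memory overflow errors. A likely cause is that the two MAXLEN x MAXLEN int tables are local variables of main and no longer fit on the stack: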

5.2 Activity arrangement

The sorting time is not the focus here (and I used bubble sort, which is slow), so only the time of the activity arrangement function is recorded:

Number of activities   Activity arrangement time (ns)
100                    0
200                    0
300                    0
400                    0
500                    0
600                    0
700                    0
800                    0
900                    0
1000                   0

Here's the record at 200 events:

Inputs on the order of hundreds of activities did not reveal any difference in running time, so I ran the test again, starting at 1000 activities and increasing in steps of 500.

Number of activities   Run 1 (ns)   Run 2 (ns)   Run 3 (ns)   Mean (ns)
1000                   0            0            0            0
1500                   0            0            0            0
2000                   0            0            0            0
2500                   0            0            0            0
3000                   985400       0            0            328466.6667
3500                   0            1018700      0            339566.6667
4000                   998600       996900       0            665166.6667
4500                   0            995900       0            331966.6667
5000                   1001000      1008000      0            669666.6667
5500                   998100       964700       998000       986933.3333
6000                   1002500      1001000      996500       1000000
6500                   999000       965000       997100       987033.3333
7000                   930900       997500       991700       973366.6667
7500                   996100       999100       999800       998333.3333
8000                   997700       997000       1000000      998233.3333
8500                   997200       998200       1000100      998500
9000                   1995400      998000       1997300      1663566.667
9500                   1994000      1001400      1000100      1331833.333
10000                  996800       997700       995300       996600

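The measured time tends to grow as the number of activities increases, which is roughly consistent with O(n). Note that almost every reading is either 0 or close to a multiple of 1,000,000 ns, so the timer resolves only to about 1 ms and the fit is coarse.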

Here's the schedule output for 10 000 events with random start and end times:

  6. Conclusion

6.1 The Longest Common Subsequence Problem

The time complexity of the longest common subsequence solution is O(mn)+O(m+n), i.e. O(mn) overall. This solution applies the idea of dynamic programming.

With inputs this small, almost every measured time is 0 ms, so the experimental data cannot verify this conclusion accurately. The code needs to be improved to handle larger inputs before it can be verified.

In addition, the algorithm itself can be optimized further. If the goal is only the length of the longest common subsequence, without printing the subsequence itself, the full tables can be replaced with two rows of the array, reducing the space requirement to O(min(m,n)).
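The two-row idea can be sketched as follows. This is an illustrative Python version, not part of the experiment's code; the name lcs_length_two_rows is hypothetical. It swaps the inputs so that the rows are as long as the shorter sequence, which is where the O(min(m,n)) bound comes from.

```python
def lcs_length_two_rows(x, y):
    """LCS length using only two rows of the DP table."""
    if len(y) > len(x):
        x, y = y, x  # make y the shorter sequence so rows have min(m, n) + 1 cells

    prev = [0] * (len(y) + 1)  # row i-1 of the full table
    for xi in x:
        curr = [0]  # row i; column 0 is always 0
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

The running time is still O(mn); only the space changes, and the subsequence itself can no longer be recovered from the two rows alone.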

6.2 Activity Arrangement Problem

Because the C timing function I used (clock()) resolved only to about 1 ms in my environment, and the execution time of the program is usually below that order of magnitude, I decided to use Python instead of C for the activity arrangement problem.

The time complexity of the activity arrangement problem depends on the sort and on the arrangement function itself. In the experiment I used bubble sort, which is a relatively slow O(n^2) method. The test of the arrangement function itself went relatively smoothly: it makes a single pass over the sorted activities, so its time complexity is O(n).
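When a single call finishes faster than the timer can resolve, one common workaround is to time many repetitions and divide. The sketch below shows that idea with time.perf_counter_ns, a standard-library clock intended for measuring short durations; the helper name time_call_ns and the default of 1000 repetitions are assumptions, and this is not how the measurements in this report were taken.

```python
import time

def time_call_ns(func, *args, repeats=1000):
    """Average wall-clock time of func(*args) in nanoseconds over `repeats` calls."""
    start = time.perf_counter_ns()
    for _ in range(repeats):
        func(*args)
    elapsed = time.perf_counter_ns() - start
    return elapsed / repeats
```

For example, time_call_ns(ActivitiesArrange, activities, n) would report the mean time of the arrangement function on an already sorted list, since that function does not modify its input.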

  7. Program source code

7.1 Longest Common Subsequence Problem (c)

#include <stdio.h>
#include <string.h>
#include<time.h>
#include<stdlib.h>
#include<windows.h>
#define MAXLEN 601

void LCSLength(int m, int n, char *x, char *y, int c[][MAXLEN], int b[][MAXLEN])
{
    int i, j;

    for (i = 0; i <= m; i++)
        c[i][0] = 0;
    for (j = 1; j <= n; j++)
        c[0][j] = 0;
    for (i = 1; i <= m; i++)
    {
        for (j = 1; j <= n; j++)
        {
            if (x[i - 1] == y[j - 1])
            {
                c[i][j] = c[i - 1][j - 1] + 1;
                b[i][j] = 1;
            }
            else if (c[i - 1][j] >= c[i][j - 1])
            {
                c[i][j] = c[i - 1][j];
                b[i][j] = 3;
            }
            else
            {
                c[i][j] = c[i][j - 1];
                b[i][j] = 2;
            }
        }
    }
}

void LCS(int i, int j, char *x, int b[][MAXLEN])
{
    if (i == 0 || j == 0)
        return;
    if (b[i][j] == 1)
    {
        LCS(i - 1, j - 1, x, b);
        printf("%c ", x[i - 1]);
    }
    else if (b[i][j] == 3)
    {
        LCS(i - 1, j, x, b);
    }
    else
    {
        LCS(i, j - 1, x, b);
    }
}

void genRandomString(char* buff, int length)
{
    char metachar[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    int i = 0;
    srand((unsigned) time(NULL));   // seed with the current time so each run produces different random numbers
    for (i = 0; i < length; i++)
    {
        buff[i] = metachar[rand() % 62];
    }

    buff[length] = '\0';

}

int main()
{
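    char x[MAXLEN]={0};             // room for MAXLEN-1 characters plus the terminating '\0'
    genRandomString(x,MAXLEN-1);
    printf("%s",x);
    printf("\n============================\n");
    Sleep(1000);                    // wait one second so time(NULL) gives a different seed for y
    char y[MAXLEN]={0};
    genRandomString(y,MAXLEN-1);
    printf("%s",y);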
    printf("\n============================\n");

    //char x[MAXLEN] = {"ABCBDAB"};
    //char y[MAXLEN] = {"BDCABA"};

    // static storage keeps these large tables off the stack
    static int c[MAXLEN][MAXLEN];
    static int b[MAXLEN][MAXLEN];

    int len_x, len_y;

    len_x = strlen(x);
    len_y = strlen(y);

    //time test
    clock_t stime1 = clock();
    LCSLength(len_x, len_y, x, y, c, b);
    clock_t etime1 = clock();

    clock_t stime2=clock();
    LCS(len_x, len_y, x, b);
    clock_t etime2=clock();

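    // convert clock ticks to milliseconds and print with a matching format
    printf("\nLCSLength time is %ld ms\n",(long)((etime1-stime1)*1000/CLOCKS_PER_SEC));
    printf("LCS time is %ld ms",(long)((etime2-stime2)*1000/CLOCKS_PER_SEC));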

    return 0;
}

 

7.2 Activity scheduling problem (python)

import time
import random

def Order(activities,n):
    for i in range(0,n):
        for j in reversed(range(i+1,n)):
            if activities[j][1]<activities[j-1][1]:
                activities[j],activities[j-1]=activities[j-1],activities[j]

def ActivitiesArrange(a,n):
    b =[0]
    end=a[0][1]
    for i in range(1,n):
        if a[i][0]>=end:
            b.append(i)
            end=a[i][1]
    return b

if __name__ == '__main__':

    n=10000

    activities=[[0 for col in range(2)] for row in range(n)]
    for k in range(n):
        startt = random.randint(0,n)
        endtt=random.randint(0,n)
        # swap once per activity so the start time never exceeds the end time
        if startt>endtt:
            startt,endtt=endtt,startt
        activities[k][0]=startt
        activities[k][1]=endtt
    # for q in activities:
    #     print(q)
        
    start_order=time.time_ns()
    Order(activities,n)
    end_order=time.time_ns()

    start_aa=time.time_ns()
    b=ActivitiesArrange(activities,n)
    end_aa=time.time_ns()

    print("Total arranged activities: "+str(len(b))+".\n")
    for i in b:
        print(activities[i])
    print("\nOrder time is :%s ns"%(end_order-start_order))
    print("\nActivity arrange time is :%s ns"%(end_aa-start_aa))

Project source code: github address


Origin blog.csdn.net/qq_37387199/article/details/109722040