Maximum consecutive subsequence sum

Reprinted from: https://www.cnblogs.com/conw/p/5896155.html

 

 

The maximum consecutive subsequence sum is a classic algorithm problem. Given a sequence that may contain both positive and negative numbers, the task is to find a contiguous subsequence (the empty sequence is not allowed) whose sum is as large as possible. Together, we will solve this problem in several ways, optimizing step by step.

In order to understand the problem more clearly, first let's look at a set of data:
8
-2 6 -1 5 4 -7 2 3
The 8 on the first line means that the sequence has length 8, and the second line holds the 8 numbers of the sequence to be processed.
For this sequence the answer is 14: the selected subsequence runs from the 2nd to the 5th number (6 + (-1) + 5 + 4 = 14), and the sum of these 4 numbers is the largest among all contiguous subsequences.

The brute-force approach, complexity O(N^3)

The brute-force solution is easy to understand. In short, we enumerate the start and end points with two nested loops so that every contiguous subsequence is tried, compute the sum of each one, and keep the largest. The C code is as follows:

#include <stdio.h>

//N is the length of the sequence and num holds its elements (1-based, so N can be at most 1023);
//num is declared globally so that a large array can be allocated
int N, num[1024];

int main()
{
    //Input data
    scanf("%d", &N);
    for(int i = 1; i <= N; i++)
        scanf("%d", &num[i]);
    
    int ans = num[1]; //ans holds the maximum subsequence sum; it starts at num[1] rather than 0 so that an all-negative sequence still gives the right answer
    //i and j enumerate the start and end points of the subsequence, and the k loop computes the sum of each subsequence
    for(int i = 1; i <= N; i++) {
        for(int j = i; j <= N; j++) {
            int s = 0;
            for(int k = i; k <= j; k++) {
                s += num[k];
            }
            if(s > ans) ans = s;
        }
    }
    printf("%d\n", ans);

    return 0;
}

 

The time complexity of this algorithm is O(N^3). For how such complexities are calculated, see Chapter 1 of "Introduction to Algorithms". If our computer can perform 100 million operations per second, this algorithm can only compute the answer for a sequence of length about 500 in one second.
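To see where the cubic growth comes from, the following sketch counts how often the innermost statement of the brute-force program runs. The function names count_inner_steps and closed_form_steps are made up for this illustration; the closed form n(n+1)(n+2)/6 is the standard value of the double sum and is roughly one sixth of N^3.

```c
#include <assert.h>

/* Counts how many times the innermost statement `s += num[k]` of the
   brute-force program executes for a sequence of length n. */
long long count_inner_steps(int n)
{
    assert(n >= 0);
    long long steps = 0;
    for (int i = 1; i <= n; i++)
        for (int j = i; j <= n; j++)
            steps += j - i + 1;   /* the k loop runs from i to j */
    return steps;
}

/* Closed form of the same count: n(n+1)(n+2)/6. */
long long closed_form_steps(long long n)
{
    return n * (n + 1) * (n + 2) / 6;
}
```

For the sample with N = 8 both functions give 120, and for N = 500 they give 20,958,500 additions, so "about 500 in one second" should be read as a rough order-of-magnitude estimate.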

Prefix sum optimization

If you understood the program above, we can look at a simple optimization.
Suppose we have an array sum, where sum[i] is the sum of the 1st through i-th numbers. How can we quickly compute the sum of the i-th through j-th numbers? Right, just use sum[j] - sum[i-1]! This lets us drop the innermost loop and makes the program much more efficient. The C code is as follows:

#include <stdio.h>

//N is the length of the sequence, num holds its elements (1-based) and sum holds the prefix sums;
//the arrays are declared globally so that large arrays can be allocated
int N, num[16384], sum[16384];

int main()
{
    //Input data
    scanf("%d", &N);
    for(int i = 1; i <= N; i++)
        scanf("%d", &num[i]);
    
    // Calculate array prefix sum
    sum[0] = 0;
    for(int i = 1; i <= N; i++) {
        sum[i] = num[i] + sum[i - 1];
    }

    int ans = num[1]; //ans holds the maximum subsequence sum; it starts at num[1] rather than 0 so that an all-negative sequence still gives the right answer
    //i and j are the start and end points of the enumerated subsequence, respectively
    for(int i = 1; i <= N; i++) {
        for(int j = i; j <= N; j++) {
            int s = sum[j] - sum[i - 1];
            if(s > ans) ans = s;
        }
    }
    printf("%d\n", ans);

    return 0;
}

 

The time complexity of this algorithm is O(N^2). If our computer can perform 100 million operations per second, this algorithm can compute the answer for a sequence of length about 10,000 in one second, which is a great improvement over the previous program! Note also that the separate sum array is not actually necessary: the prefix sums can be computed directly in the num array, which saves some memory.
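The in-place variant mentioned above could look like the following sketch. It is written as a function (the name max_sum_inplace_prefix is hypothetical) that assumes the same 1-based layout as the programs in this article, with num[0] available as scratch space, and it overwrites num with its prefix sums.

```c
#include <assert.h>

/* Elements live in num[1..n]; num[0] is scratch space.
   On return, num[i] holds the sum of the first i elements. */
int max_sum_inplace_prefix(int *num, int n)
{
    assert(n >= 1);              /* the empty subsequence is not allowed */
    int ans = num[1];            /* safe start value for all-negative input */

    num[0] = 0;
    for (int i = 1; i <= n; i++)
        num[i] += num[i - 1];    /* num[i] becomes the prefix sum */

    /* i and j are the start and end points of the subsequence */
    for (int i = 1; i <= n; i++) {
        for (int j = i; j <= n; j++) {
            int s = num[j] - num[i - 1];
            if (s > ans) ans = s;
        }
    }
    return ans;
}
```

The running time is still O(N^2); only the extra array is gone, at the price of destroying the original input.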

Change your thinking and keep optimizing

You have probably heard of divide and conquer, which means exactly what the name says: divide the problem, then conquer the pieces. Suppose we have a large, complex problem that is hard to solve directly, but we notice that it can be split into sub-problems; if a sub-problem is still too large and can be split further, we keep splitting. Once the sub-problems are small enough to solve easily, we solve them all and then combine their results, which gives us the answer to the original problem. That is easy to say, but it may still be unclear how to do it in practice, so let's analyze this problem:
First, we split the whole sequence into a left half and a right half. The answer then falls into one of three cases:
1. The desired subsequence lies entirely in the left half.
2. The desired subsequence lies entirely in the right half.
3. The desired subsequence straddles the split point, so it takes a part from each half.
The first two cases are the same problem as the original, only on a smaller scale. If all three cases can be solved, the answer is the maximum of the three results. We focus on how to solve the third case:

We only need to compute two values: the maximum sum of a contiguous run that ends at the split point and extends to the left, and the maximum sum of a contiguous run that starts just after the split point and extends to the right. The sum of these two results is the answer to the third case. Since one endpoint of each run is fixed, both values can be computed in O(N) time.

The recursion keeps shrinking the problem until the sequence has length 1, in which case the answer is the single number in the sequence.
Putting it all together, a recursive implementation in C looks like this:

#include <stdio.h>

//N is the length of the sequence and num holds its elements (1-based); num is declared globally so that a large array can be allocated
int N, num[16777216];

int solve(int left, int right)
{
    //When the sequence length is 1
    if(left == right)
        return num[left];
    
    // Divide into two smaller problems
    int mid = (left + right) >> 1; //parenthesized for clarity: + already binds tighter than >>
    int lans = solve(left, mid);
    int rans = solve(mid + 1, right);
    
    // the case of crossing the split point
    int sum = 0, lmax = num[mid], rmax = num[mid + 1];
    for(int i = mid; i >= left; i--) {
        sum += num[i];
        if(sum > lmax) lmax = sum;
    }
    sum = 0;
    for(int i = mid + 1; i <= right; i++) {
        sum += num[i];
        if(sum > rmax) rmax = sum;
    }

    //The answer is the maximum of the three cases
    int ans = lmax + rmax;
    if(lans > ans) ans = lans;
    if(rans > ans) ans = rans;

    return ans;
}

int main()
{
    //Input data
    scanf("%d", &N);
    for(int i = 1; i <= N; i++)
        scanf("%d", &num[i]);

    printf("%d\n", solve(1, N));

    return 0;
}

 

It is not hard to see that the time complexity of this algorithm is O(N*logN) (think of merge sort). It can handle millions of numbers in one second, and even tens of millions will not feel slow! This is the beauty of algorithms. If you are not yet comfortable with recursion, this algorithm may seem puzzling, so take the time to think it through carefully.
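To make the merge sort comparison concrete, here is a sketch of the recurrence behind the O(N*logN) bound. It assumes that N is a power of two and that the work outside the two recursive calls (the two scans from the split point) costs cN for some constant c; T and c are names introduced here for illustration only.

```latex
\begin{aligned}
T(1) &= c, \\
T(N) &= 2\,T(N/2) + cN \\
     &= 4\,T(N/4) + 2cN \\
     &= \dots \\
     &= 2^{k}\,T(N/2^{k}) + k\,cN \\
     &= N\,T(1) + cN\log_2 N \qquad (k = \log_2 N) \\
     &= O(N \log N).
\end{aligned}
```

Each level of the recursion touches every element once, and there are about log2(N) levels.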

The charm of dynamic programming: an O(N) solution!

Many dynamic programming algorithms closely resemble recurrence relations in mathematics. If we can find a suitable recurrence, the problem becomes easy to solve.
Let dp[n] be the largest sum of a contiguous subsequence ending with the n-th number. Then we have the following recurrence:
dp[n] = max(0, dp[n-1]) + num[n]
A little thought shows that this recurrence is correct: the best subsequence ending at n either extends the best one ending at n-1 (when dp[n-1] is positive) or starts afresh at num[n]. The answer to the whole problem is then max(dp[m]) for m∈[1, N]. The C code is as follows:

#include <stdio.h>

//N is the length of the sequence and num holds its elements (1-based); num is declared globally so that a large array can be allocated
int N, num[134217728];

int main()
{
    //Input data
    scanf("%d", &N);
    for(int i = 1; i <= N; i++)
        scanf("%d", &num[i]);
    
    num[0] = 0;
    int ans = num[1];
    for(int i = 1; i <= N; i++) {
        //dp[i] = max(0, dp[i-1]) + num[i]; num[i-1] already holds dp[i-1]
        if(num[i - 1] > 0) num[i] += num[i - 1];
        if(num[i] > ans) ans = num[i];
    }

    printf("%d\n", ans);

    return 0;
}

 

Here we do not create a separate dp array: given the dependencies in the recurrence, the single num array is enough to solve the problem. After all, an array of 100 million ints takes hundreds of MB of memory! The time complexity of this algorithm is O(N), so in principle a sequence of length 100 million is no problem. But if you actually test the program with that much data it will be very slow, because most of the time is spent reading the input!
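Because dp[n] depends only on dp[n-1], the recurrence can also be evaluated with a single running variable, without touching the input at all. The following is a minimal sketch under that observation; the function name max_sum_dp is hypothetical, and the array is assumed to use the same 1-based layout as the programs in this article.

```c
#include <assert.h>

/* Elements live in num[1..n]; the array is left unmodified. */
int max_sum_dp(const int *num, int n)
{
    assert(n >= 1);       /* the empty subsequence is not allowed */
    int dp = 0;           /* plays the role of dp[i-1]; dp[0] = 0 */
    int ans = num[1];     /* safe start value for all-negative input */

    for (int i = 1; i <= n; i++) {
        /* dp[i] = max(0, dp[i-1]) + num[i] */
        dp = (dp > 0 ? dp : 0) + num[i];
        if (dp > ans) ans = dp;
    }
    return ans;
}
```

On the sample data the successive dp values are -2, 6, 5, 10, 14, 7, 9, 12, and the largest of them, 14, is the answer.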

Another way, another O(N) algorithm

Recall the earlier O(N^2) algorithm based on prefix sums. Is there any way to optimize it further? The answer is yes!
We have the array sum, where sum[i] is the sum of the 1st through i-th numbers, so sum[j] - sum[i-1] is the sum of the i-th through j-th numbers.
So what characterizes the largest subsequence sum ending with the n-th number? Suppose this subsequence starts at position m, so its sum is sum[n] - sum[m-1]. Then sum[m-1] must be the minimum of sum[0], sum[1], ..., sum[n-1]! So if we keep track of the smallest prefix sum seen so far while building the sum array, the answer falls out directly. To save memory, we again use only the num array. The C code is as follows:

#include <stdio.h>

//N is the length of the sequence and num holds its elements (1-based); num is declared globally so that a large array can be allocated
int N, num[134217728];

int main()
{
    //Input data
    scanf("%d", &N);
    for(int i = 1; i <= N; i++)
        scanf("%d", &num[i]);
    
    // Calculate the array prefix sum, and get the answer in the process
    num[0] = 0;
    int ans = num[1], lmin = 0;
    for(int i = 1; i <= N; i++) {
        num[i] += num[i - 1];
        if(num[i] - lmin > ans)
            ans = num[i] - lmin;
        if(num[i] < lmin)
            lmin = num[i];
    }

    printf("%d\n", ans);

    return 0;
}

 

It looks like we have solved the maximum consecutive subsequence sum problem perfectly, with both time and space complexity at O(N), but we can actually go further!

The way of simplicity: the perfect solution to the maximum consecutive subsequence sum problem

Clearly, the time complexity of any algorithm for this problem cannot be lower than O(N), because we must at least look at every number in the sequence once. But needing O(N) space as well is a bit unreasonable, so let's get rid of the num array too!

#include <stdio.h>

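int main()
{
    //n is the number just read, s is the running prefix sum, m is the smallest prefix sum seen so far (sum[0] = 0)
    int N, n, s, ans, m = 0;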

    scanf("%d%d", &N, &n); //Read the length of the array and the first number in the sequence
    ans = s = n; //initialize ans to the first number in the sequence
    for(int i = 1; i < N; i++) {
        if(s < m) m = s;
        scanf("%d", &n);
        s += n;
        if(s - m > ans)
            ans = s - m;
    }
    printf("%d\n", ans);

    return 0;
}

 

This program works on the same principle as the one in the section "Another way, another O(N) algorithm": it maintains the smallest prefix sum seen so far while computing the prefix sums. Its time complexity is O(N) and its space complexity is O(1), which reaches the theoretical lower bound! The only tricky part is the initial value of ans, which cannot simply be set to 0, because the sequence may consist entirely of negative numbers!

With that, the maximum consecutive subsequence sum problem has been solved perfectly! However, the algorithms introduced above only compute the value of the answer and do not tell us which subsequence achieves it. Solving that is not complicated either; how to do it is left for the reader to think about!
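As one possible answer to that exercise, here is a sketch based on the dp recurrence that also remembers where the current candidate subsequence began. The struct range and the function max_sum_range are hypothetical names chosen for this illustration, and the 1-based layout of the earlier programs is assumed; when several subsequences share the maximum sum, the first one found is reported.

```c
#include <assert.h>

/* Result: the maximum sum and the 1-based positions of its first and last element. */
struct range {
    int sum;
    int first;
    int last;
};

/* Elements live in num[1..n]; the array is left unmodified. */
struct range max_sum_range(const int *num, int n)
{
    assert(n >= 1);                        /* the empty subsequence is not allowed */
    struct range best = { num[1], 1, 1 };
    int cur = 0;        /* best sum of a subsequence ending at the previous position */
    int cur_first = 1;  /* where that subsequence starts */

    for (int i = 1; i <= n; i++) {
        if (cur > 0) {
            cur += num[i];      /* extend the current subsequence */
        } else {
            cur = num[i];       /* start a new subsequence at i */
            cur_first = i;
        }
        if (cur > best.sum) {
            best.sum = cur;
            best.first = cur_first;
            best.last = i;
        }
    }
    return best;
}
```

For the sample sequence -2 6 -1 5 4 -7 2 3 this reports sum 14 with first = 2 and last = 5, matching the 2nd to 5th numbers identified at the start of the article.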

 

2018-04-18
