"Data Structure and Algorithm Analysis" Study Notes-Chapter 2-Algorithm Analysis

Analysis of Algorithms

Once an algorithm for a problem has been chosen and proved correct, the next step is to estimate the algorithm's running time and the space it uses. This chapter mainly discusses

  • How to estimate a program's running time
  • How to reduce a program's running time
  • The risks of careless use of recursion
  • Efficient algorithms for raising a number to a power and for computing the greatest common divisor of two numbers

2.1 Mathematical foundation

  1. If there are positive constants c and n0 such that T(N) <= c·f(N) whenever N >= n0, then we write T(N) = O(f(N)). This says that the growth rate of T(N) does not exceed the growth rate of f(N); f(N) is called an upper bound on T(N). This is the definition most often used for time complexity
  2. If there are positive constants c and n0 such that T(N) >= c·g(N) whenever N >= n0, then we write T(N) = Ω(g(N)). This says that the growth rate of T(N) is no less than the growth rate of g(N); g(N) is called a lower bound on T(N)
  3. T(N) = Θ(h(N)) if and only if T(N) = O(h(N)) and T(N) = Ω(h(N)). This says that T(N) and h(N) have the same growth rate
  4. If T(N) = O(p(N)) and T(N) ≠ Θ(p(N)), then T(N) = o(p(N)). This says that the growth rate of T(N) is strictly less than the growth rate of p(N), with equality impossible

The statements above are abstract, so here is a simple example. When g(N) = N^2, both g(N) = O(N^3) and g(N) = O(N^4) are correct. g(N) = Ω(N) and g(N) = Ω(1) are also correct. g(N) = Θ(N^2) means that g(N) = O(N^2) and g(N) = Ω(N^2) hold at the same time; that is, Θ captures the growth rate of g(N) itself exactly.

There are three important rules to remember:

  1. If T1(N) = O(f(N)) and T2(N) = O(g(N)), then
    • T1(N) + T2(N) = O(max(f(N), g(N)))
    • T1(N) * T2(N) = O(f(N) * g(N))
  2. If T(N) is a polynomial of degree k, then T(N) = Θ(N^k)
  3. For any constant k, log^k N = O(N). This tells us that logarithms grow very slowly
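As a quick check of rule 1, suppose (these functions are illustrative, not from the book) T1(N) = O(N^2) and T2(N) = O(N). Then T1(N) + T2(N) = O(max(N^2, N)) = O(N^2), and T1(N) * T2(N) = O(N^2 * N) = O(N^3).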

When using Big-O notation, keep the highest-order term and discard constant factors and lower-order terms. Ordered by growth rate, the commonly encountered functions are: c, log N, log^2 N, N, N log N, N^2, N^3, 2^N.

We can always determine the relative growth rates of two functions f(N) and g(N) by computing the limit lim f(N)/g(N) as N -> ∞, using L'Hôpital's rule where necessary.

  • If the limit is 0, then f(N) = o(g(N))
  • If the limit is c with c != 0, then f(N) = Θ(g(N))
  • If the limit is ∞, then g(N) = o(f(N))
  • If the limit oscillates, there is no relation between the two

For example, to compare the growth rates of f(N) = N log N and g(N) = N^1.5, consider f(N)/g(N) = log N / N^0.5, or equivalently (after squaring) log^2 N / N. Since N grows faster than any power of log N, the limit is 0, so g(N) grows faster than f(N).

L'Hôpital's rule: if lim f(N) = ∞ and lim g(N) = ∞ as N -> ∞, then lim f(N)/g(N) = lim f'(N)/g'(N) as N -> ∞.
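As a sketch of the rule applied to the example above (my own computation, not from the book): lim log N / N^0.5 = lim (1/N) / (0.5·N^-0.5) = lim 2/N^0.5 = 0, which confirms that N log N = o(N^1.5).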

2.2 Model

To facilitate analysis, we assume a model computer: it executes any basic instruction in one unit of time, and it has unlimited memory.

2.3 Problems to be analyzed

  1. For small inputs, it is not worth spending a lot of time designing a clever algorithm
  2. Reading the data is often the bottleneck. Once the data is read in, a good algorithm will finish quickly. It is therefore important that the algorithm be efficient enough not to become the bottleneck of the problem

2.4 Run time calculation

2.4.1 Examples

  • If two algorithms take roughly the same time, the best way to decide which program is faster is to code both and run them
  • To simplify the analysis, we compute the running time with Big-O notation. Big-O is an upper bound, so the analysis guarantees that the program finishes within the stated bound even in the worst case; the program may finish early, but it will never be late
// Example from the book
// Compute the cumulative sum of i^3
int sum (int N)
{
    int i, PartialSum;
    PartialSum = 0;             /*1*/
    for(i = 1; i <= N; i++)     /*2*/
        PartialSum += i * i * i;/*3*/
    return PartialSum;          /*4*/
}

Here for each line of analysis:

  1. Costs 1 time unit: 1 assignment
  2. Costs 1 + (N + 1) + N = 2N + 2 time units: 1 initialization, N + 1 tests, N increments
  3. Costs N(2 + 1 + 1) = 4N time units: 2 multiplications, 1 addition, and 1 assignment, executed N times
  4. Costs 1 time unit: 1 return

The total cost is 1 + 2N + 2 + 4N + 1 = 6N + 4 time units.

In practice we do not carry out this line-by-line analysis every time; for a program of hundreds or thousands of lines it would be infeasible. We only compute the highest-order term. Here the for loop clearly dominates, so the time complexity is O(N).

2.4.2 General rules

  1. for loops: the running time of a for loop is at most the running time of the statements inside the loop (including the tests) multiplied by the number of iterations
  2. Nested for loops: analyze the loops from the inside out. The total running time of a statement inside a group of nested loops is the running time of the statement multiplied by the product of the sizes of all the loops in the group
for (i = 0; i < N; i++)
    for (j=0; j < N; j++)
        k++;    // 1 * N * N = N^2, so the time complexity is O(N^2)
  3. Consecutive statements: the running times of the statements add, so the maximum (the highest-order term) is the one that counts
for (i = 0; i < N; i++)
    A[i] = 0;   // O(N)
for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
        A[i] += A[j] + i + j;   // O(N^2)
// Total time is O(N) + O(N^2); keeping the highest-order term, the total time complexity is O(N^2)
  4. if/else statements: the running time is at most the time of the test plus the larger of the running times of the two branches, as the sketch below illustrates
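A minimal sketch of the if/else rule (my own illustration, with a hypothetical function; not from the book):

// The test costs O(1); the else branch dominates at O(N)
void markOrZero(int A[], int N)
{
    int i;
    if (A[0] != 0)
        A[0] = 0;                   /* O(1) branch */
    else
        for (i = 0; i < N; i++)     /* O(N) branch */
            A[i] = 1;
    /* Total: O(1) test + max(O(1), O(N)) = O(N) */
}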

We want to avoid doing redundant work in recursive calls, as the Fibonacci sketch below shows.
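A classic illustration, in the spirit of the book's Fibonacci example:

// Naive recursion: Fib(N) calls Fib(N - 1) and Fib(N - 2), which keep
// recomputing the same subproblems; the running time grows exponentially
long int Fib(int N)
{
    if (N <= 1)
        return 1;
    else
        return Fib(N - 1) + Fib(N - 2);
}

// Iterative version: each value is computed exactly once, so it is O(N)
long int FibIter(int N)
{
    long int last = 1, nextToLast = 1, answer = 1;
    int i;

    for (i = 2; i <= N; i++) {
        answer = last + nextToLast;
        nextToLast = last;
        last = answer;
    }
    return answer;
}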

2.4.3 Maximum subsequence and solution to the problem

Maximum subsequence sum problem: given integers A1, A2, ..., AN (possibly negative), find the maximum value of the sum of consecutive elements Ai + ... + Aj. If all the integers are negative, the maximum subsequence sum is defined to be 0. For example, for the input -2, 11, -4, 13, -5, -2, the answer is 20 (A2 through A4).

  1. Solution one, time complexity O(N^3): exhaustively compute the sum of every subsequence
// Example from the book
int
MaxSubsequenceSum(const int A[], int N)
{
    int ThisSum, MaxSum, i, j, k;
    
    MaxSum = 0;
    for (i = 0; i < N; i++) {
        for (j = i; j < N; j++) {
            ThisSum = 0;
            for (k = i; k <= j; k++) {
                ThisSum += A[k];
            }
            
            if (ThisSum > MaxSum) {
                MaxSum = ThisSum;
            }
        }
    }
    
    return MaxSum;
}
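Each of the three nested loops runs at most N times, so by the nested-loop rule the running time is O(N^3).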
  2. Solution two, time complexity O(N^2). Compared with solution one, the innermost loop is removed: the sum of the subsequence ending at j is obtained from the sum ending at j - 1 with a single addition
int
MaxSubsequenceSum(const int A[], int N)
{
    int ThisSum, MaxSum, i, j;
    
    MaxSum = 0;
    for (i = 0; i < N; i++) {
        ThisSum = 0;
        for (j = i; j < N; j++) {
            ThisSum += A[j];
            if (ThisSum > MaxSum) {
                MaxSum = ThisSum;
            }
        }
    }
    
    return MaxSum;
}
  3. Solution three, time complexity O(N log N), using a divide-and-conquer strategy. "Divide" splits the data into two halves, that is, the problem is broken into two roughly equal subproblems that are solved recursively; "conquer" computes the maximum subsequence sum of each half and then merges the results. In this problem the maximum subsequence can occur in three places: entirely in the left half, entirely in the right half, or spanning the middle (in which case it includes the last element of the left half and the first element of the right half). In the third case the answer is the maximum sum ending at the last element of the left half plus the maximum sum starting at the first element of the right half.
// Example from the book
int 
max3(int a, int b, int c)
{
    int x;
    x = a > b? a: b;
    return (x > c? x: c);    
}

int
MaxSubsequenceSum(const int A[], int Left, int Right)
{
    int MaxLeftSum, MaxRightSum;
    int MaxLeftBorderSum, MaxRightBorderSum;
    int MaxLeftThisSum, MaxRightThisSum;
    int Center;
    int cnt;
    
    if (Left == Right) {
        if (A[Left] > 0) {
            return A[Left];
        } else {
            return 0;
        }
    }
    
    Center = (Left + Right) / 2;
    MaxLeftSum = MaxSubsequenceSum(A, Left, Center);
    MaxRightSum = MaxSubsequenceSum(A, Center + 1, Right);
    
    MaxLeftBorderSum = 0;
    MaxLeftThisSum = 0;
    for (cnt = Center; cnt >= Left; cnt--) {
        MaxLeftThisSum += A[cnt];
        if (MaxLeftThisSum > MaxLeftBorderSum) {
            MaxLeftBorderSum = MaxLeftThisSum;
        }
    }
    
    MaxRightBorderSum = 0;
    MaxRightThisSum = 0;
    for (cnt = Center + 1; cnt <= Right; cnt++) {
        MaxRightThisSum += A[cnt];
        if (MaxRightThisSum > MaxRightBorderSum) {
            MaxRightBorderSum = MaxRightThisSum;
        }
    }
    
    return max3(MaxLeftSum, MaxRightSum, MaxRightBorderSum + MaxLeftBorderSum);
}
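A caller finds the answer for an N-element array with MaxSubsequenceSum(A, 0, N - 1). The running time satisfies the recurrence T(1) = O(1) and T(N) = 2T(N/2) + O(N), whose solution is T(N) = O(N log N).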
  4. Solution four, time complexity O(N). The data is scanned exactly once; once an element is read and processed, it does not need to be remembered. If the array is stored on disk, it can be read sequentially without holding any part of it in main memory. Furthermore, at any point in time the algorithm can give the correct answer for the data it has already read; algorithms with this property are called online algorithms. An online algorithm that needs only constant space and runs in linear time is just about as good as an algorithm can be
// Example from the book
int
MaxSubsequenceSum(const int A[], int N)
{
    int ThisSum, MaxSum, j;
    
    ThisSum = MaxSum = 0;
    for (j = 0; j < N; j++) {
        ThisSum += A[j];
        if (ThisSum > MaxSum) {
            MaxSum = ThisSum;
        } else if (ThisSum < 0) {
            /* A negative prefix can never begin the maximum
               subsequence, so restart the running sum here */
            ThisSum = 0;
        }
    }
    return MaxSum;
}

2.4.4 Logarithm in running time

If an algorithm takes constant time O(1) to reduce the size of the problem by a constant fraction (usually 1/2), then the algorithm is O(log N). If, on the other hand, constant time merely reduces the problem by a constant amount (for example, by 1), then the algorithm is O(N).
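A minimal sketch of the contrast (my own illustration, not from the book):

// Halving: N, N/2, N/4, ... reaches 1 after about log2(N) steps, so O(log N)
int halvingSteps(int N)
{
    int steps = 0;
    while (N > 1) {
        N /= 2;
        steps++;
    }
    return steps;
}

// Decrementing: N, N - 1, N - 2, ... takes N steps, so O(N)
int decrementSteps(int N)
{
    int steps = 0;
    while (N > 0) {
        N--;
        steps++;
    }
    return steps;
}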

  1. Binary search: binary search provides a search operation with time complexity O(log N), on the premise that the data is already sorted. Inserting a new element into the sorted array, however, takes O(N) time, so binary search is best suited to situations where the set of elements rarely changes
// Example from the book; time complexity O(log N)
#define NotFound -1

typedef int ElementType;    /* assumed element type, not shown in the original */

int BinarySearch(const ElementType A[], ElementType X, int N)
{
    int low, high, mid;
    low = 0;
    high = N - 1;

    while (low <= high) {
        mid = (low + high) / 2;    /* recompute the midpoint on every pass */
        if (A[mid] < X) {
            low = mid + 1;
        } else if (A[mid] > X) {
            high = mid - 1;
        } else {
            return mid;
        }
    }
    return NotFound;
}
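As a quick trace (my own, not from the book): searching for X = 3 in A = {1, 3, 5, 7, 9} first computes mid = 2 (A[2] = 5 > 3, so high = 1), then mid = 0 (A[0] = 1 < 3, so low = 1), then mid = 1 and A[1] = 3, so the function returns 1.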
  2. Euclid's algorithm: the name sounds lofty, but it is just what we commonly call successive division. To find the greatest common divisor of two integers, divide one by the other to obtain a remainder; then divide the previous divisor by that remainder to get a new remainder, and so on. When the remainder becomes 0, the last divisor is the greatest common divisor. After any two iterations the remainder is at most half of its original value, so the number of iterations is at most 2 log N = O(log N)
// Example from the book: successive division, time complexity O(log N)
unsigned int Gcd(unsigned int M, unsigned int N)
{
    unsigned int Rem;
    
    while (N > 0) {
        Rem = M % N;
        M = N;
        N = Rem;
    }
    return M;
}
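For example, Gcd(1989, 1590) produces the remainder sequence 399, 393, 6, 3, 0, so the greatest common divisor is 3.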
  • Theorem 2.1: If M > N, then M mod N < M / 2.
    Proof: if N <= M / 2, the remainder is less than N, so M mod N < M / 2; if N > M / 2, then M mod N = M - N < M / 2. This proves the theorem
  3. Exponentiation: computing X^N for an integer N. At most two multiplications (when N is odd) are needed to halve the problem, so the number of multiplications required is at most 2 log N
// Example from the book; time complexity O(log N)
#define isEven(N) (((N) % 2) == 0)    /* helper assumed here; not shown in the original */

long int Pow(long int X, unsigned int N)
{
    if (N == 0) {
        return 1;
    } else if (N == 1) {
        return X;
    }
    
    if (isEven(N)) {
        return Pow(X * X, N / 2);
    } else {
        return Pow(X * X, N / 2) * X;
    }
}
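As a quick trace (my own, not from the book): Pow(X, 10) calls Pow(X^2, 5), which returns Pow(X^4, 2) * X^2, and Pow(X^4, 2) calls Pow(X^8, 1) = X^8. The result X^8 * X^2 = X^10 uses only 4 multiplications, where a naive loop would use 9.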

2.4.5 Test your analysis

  1. Method one: code the program and check whether the observed running time matches the running time predicted by the analysis. When N doubles, the running time of a linear program is multiplied by 2, a quadratic program by 4, and a cubic program by 8. For a program running in logarithmic time, doubling N adds only a constant to the running time; for a program running in O(N log N), doubling N makes the running time slightly more than twice as long (if T(N) = N·X with X = log N, then T(2N) = 2N(X + 1)). If the coefficients of the lower-order terms are relatively large and N is not large enough, the running time is hard to read clearly; it is difficult to distinguish O(N) from O(N log N) by experiment alone
  2. Method two: compute the ratio T(N)/f(N) for a range of N (usually spaced by factors of 2), where T(N) is the empirically observed running time and f(N) is the theoretically derived running time. If the values converge to a positive constant, then f(N) is a tight estimate of the running time; if they converge to 0, then f(N) is an overestimate; if they diverge, then f(N) is an underestimate
// Example from the book; time complexity O(N^2)
#include <stdio.h>    /* for printf; Gcd is the routine defined above */

void test(int N)
{
    int Rel = 0, Tot = 0;
    int i, j;
    
    for( i = 1; i <= N; i++) {
        for (j = i + 1; j <= N; j++) {
            Tot++;
            
            if (Gcd(i,j) == 1) {
                Rel++;
            }
        }
    }
    
    printf("%f", (double)Rel / Tot);
}
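This routine estimates the probability that two randomly chosen positive integers are relatively prime (the value tends to 6/π^2 ≈ 0.6079), and its two nested loops make it a natural Θ(N^2) candidate for the ratio test above.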

2.4.6 Accuracy of analysis results

Sometimes the analysis estimates too high. Then either the analysis needs to be made more detailed, or the average-case running time is significantly less than the worst-case running time and the bound cannot be improved: in many algorithms the worst-case bound is achieved only by some bad input, yet in practice it is still a considerable overestimate. Unfortunately, for most of these problems the average-case analysis is extremely complex or unsolved, so the worst-case bound, though somewhat pessimistic, is the best analytical result known.

  • Simple programs do not necessarily have simple analyses
  • A lower-bound analysis applies not to a single algorithm but to a whole class of algorithms for a problem
  • The Gcd algorithm and the exponentiation algorithm are widely used in cryptography



This concludes Chapter 2. Chapter 3 begins to explain the implementation of specific data structures and algorithms.

CrazyCatJack
