Algorithmic Complexity - Introductory Notes on Algorithms and Data Structures (2)


This article is the second part of my study notes on algorithms and data structures, and it will continue to be updated. You are welcome to read and learn from it; if anything is unclear or wrong, please reach out and discuss.

What is algorithmic complexity?

Algorithmic complexity measures the "time usage" and "space usage" of an algorithm for an input of size N; it reflects how quickly the time and space required by the algorithm grow as the data size N increases.

Algorithm complexity can be evaluated from two perspectives of time and space :

  • Time : Assuming that the running time of each operation is a fixed constant, the "number of computing operations" performed by the algorithm is counted to represent the time required for the algorithm to run;
  • Space : Count the "maximum space" required for the algorithm to run in the worst case;

"Input data size" N refers to the amount of input data the algorithm processes; its exact definition depends on the algorithm, for example:

  • Sorting algorithm : N is the number of elements to be sorted;
  • Search algorithm : N is the total number of elements in the search range, such as array size, matrix size, number of binary-tree nodes, number of graph nodes and edges, etc.;

Next, we will introduce "time complexity" and "space complexity" from the perspectives of concept definition, symbolic representation, analysis rules, common types, example analysis, and space-time trade-offs.

Time Complexity

Concept Definition

By definition, time complexity refers to the time it takes the algorithm to run when the input data size is N. Note:

  • What is counted is the algorithm's "number of computing operations", not its "absolute running time". The two are positively correlated but not equal. The actual running time is affected by factors such as the programming language, processor speed, and operating environment. For example, the same algorithm will have different running times when implemented in Python or C++, run on a CPU or a GPU, or executed in a local IDE or on an online platform.
  • It reflects how the number of operations changes as the data size N grows. Whether an algorithm performs "1 operation" or "100 operations", its time complexity is constant O(1); likewise, "N operations" and "100N operations" both have linear time complexity O(N).

Symbolic Representation

According to the characteristics of the input data, the time complexity has three cases: "worst", "average", and "best", represented by the three symbols O, Θ (Theta), and Ω (Omega), respectively. The following search-algorithm example helps illustrate them.

Topic : given an integer array nums of length N, determine whether the number 7 exists in the array; return true if it does, otherwise return false.
Problem-solving idea: linear search, i.e., traverse the entire array and return true as soon as 7 is encountered.
C code:

#include <stdbool.h>

bool find_seven(int* nums, int length) {
    for (int i = 0; i < length; i++) {
        if (nums[i] == 7) {
            return true;
        }
    }
    return false;
}
  • Best case Ω(1) : nums = [7, a, b, c, …], i.e., when the first element is 7, the search ends after a single check no matter how many elements nums has;
  • Worst case O(N) : nums = [a, b, c, …] where none of the elements is 7, so the entire array is traversed, i.e., N checks;
  • Average case Θ : the distribution of the input data must be considered, and the time complexity is averaged over all possible inputs; for this topic, that means considering the array length, the value range of its elements, and so on;

Big O is the most commonly used symbol for evaluating time complexity, also known as the asymptotic upper bound: it describes the growth trend of the time cost T(N) as N gradually increases. In effect, the analysis guarantees that the program terminates within a certain amount of time. The program may finish early, but never late.

Time Complexity Analysis Rules

The following program helps in understanding time-complexity analysis; it is a simple fragment that computes the sum 1³ + 2³ + ⋯ + N³:

int sum(int N){
	int i, PartialSum;

	PartialSum = 0;
	for( i=1; i<=N; i++){
		PartialSum += i * i * i;
	}
	return PartialSum;
}

The analysis of this program is simple. The declarations count for no time. Lines 4 and 8 each count for one time unit. Line 6 counts for four time units per execution (two multiplications, one addition, and one assignment) and is executed N times, for a total of 4N time units. Line 5 has hidden costs in initializing i, testing i ≤ N, and incrementing i: the total is 1 time unit for the initialization, N + 1 time units for all the tests, and N time units for all the increments, i.e., 2N + 2 in all. Ignoring the cost of calling the function and returning, the total is 6N + 4, so we say this program is O(N).

If we had to demonstrate all this work every time we analyzed a program, the task would quickly become infeasible. Fortunately, since we are only after a big-O result, there are many shortcuts that can be taken without affecting the final answer. For example, line 6 is obviously an O(1) statement (per execution), so it is silly to count exactly whether it takes two, three, or four time units; it does not matter. Line 4 is obviously insignificant compared with the for loop, so it is not wise to spend time there either. This leads us to several general rules.

  • Rule 1—for loops :
    The running time of a for loop is at most the running time of the statements (including tests) in the for loop multiplied by the number of iterations.
  • Rule 2 - Nested for loops :
    Analyze these loops from the inside out. The total running time of a statement inside a set of nested loops is the product of the statement's running time times the size of all the for loops in the set.
    As an example, the following program fragment is O(N²):
    for( i=0; i<N; i++){
    	for( j=0; j<N; j++){
    		k++;
    	}
    }
    
  • Rule 3 - Sequential statements :
    Take the largest among consecutive segments: the total complexity equals the complexity of the segment with the largest magnitude.
    As an example, the following program fragment first spends O(N), then O(N²); the total cost is also O(N²):
    for( i=0; i<N; i++){
    	A[i] = 0;
    }
    for( i=0; i<N; i++){
    	for( j=0; j<N; j++){
    		A[i] += A[j] + i + j; 
    	}
    }
    
  • Rule 4 - IF/ELSE Statements :
    For Program Fragments
    if(Condition){
    	S1;
    }
    else{
    	S2;
    }
    
    The running time of an if/else statement never exceeds the running time of the predicate plus the larger of the running times of S1 and S2.
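As a concrete sketch of Rule 4 (the function name and operation counts below are made up for illustration): the if-branch costs O(N), the else-branch O(N²), so the whole statement is bounded by the predicate plus the larger branch, i.e., O(N²).

```c
#include <stdbool.h>

/* Illustrative Rule 4 example: the if-branch is O(N), the else-branch
   is O(N^2); the statement as a whole is therefore O(N^2). */
long branch_cost(int N, bool flag) {
    long count = 0;
    if (flag) {                       /* O(N) branch */
        for (int i = 0; i < N; i++)
            count++;
    } else {                          /* O(N^2) branch */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                count++;
    }
    return count;
}
```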

Common Types

The time complexity of an algorithm is ultimately expressed as a function of the input size N. Arranged from small to large, the time complexities of common algorithms are: O(1) < O(log N) < O(N) < O(N log N) < O(N²) < O(N^c) < O(2^N) < O(N!). The exponential and factorial levels are catastrophic; the other levels are acceptable.

Example Analysis

Here are a few examples of C code of varying complexity:

Constant level O(1) :
The number of operations is constant with respect to N, i.e., it does not change with the input data size N.
In the following code, no matter how large a is, the running time is unrelated to the input data size N, so the time complexity is still O(1).

int algorithm(int N) {
    int count = 0;
    int a = 10000;
    for (int i = 0; i < a; i++) {
        count += 1;
    }
    return count;
}


Linear level O(N) :
The number of operations is linear in N, so the time complexity is O(N).
The following code is a single loop that runs N times, so the time complexity is O(N).

int algorithm(int N) {
    int count = 0;
    for (int i = 0; i < N; i++) {
        count += 1;
    }
    return count;
}


Square level O(N²) :
Take two nested loops as an example: if the two loops are independent and each is linear in N, the total is quadratic in N, and the time complexity is O(N²).
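A minimal sketch of such a two-level loop, following the naming convention of the other examples in this article:

```c
/* O(N^2): two independent nested loops; the inner statement
   runs N * N times. */
int algorithm(int N) {
    int count = 0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            count += 1;
        }
    }
    return count;
}
```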

Polynomial level O(N^c) :
Here c is a constant; you can surely guess how to write a program with O(N³) time complexity.
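One possible answer, sketched with three independent nested loops (the function name follows the convention used above):

```c
/* O(N^3): three independent nested loops; the innermost statement
   runs N * N * N times. */
long algorithm(int N) {
    long count = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                count += 1;
    return count;
}
```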

Exponential level O(2^N) :
"Cell division" in biology is exponential growth: the initial state is 1 cell; after one round of division there are 2, after two rounds 4, …, and after N rounds there are 2^N cells.
In algorithms, the exponential level often appears in recursion, as in the code below.

int algorithm(int N) {
    if (N <= 0) {
        return 1;
    }
    int count_1 = algorithm(N - 1);
    int count_2 = algorithm(N - 1);
    return count_1 + count_2;
}


Logarithmic level O(log N) :
The logarithmic order is the opposite of the exponential order: where the exponential order "splits into two cases per round", the logarithmic order "excludes half of the cases per round". It often appears in algorithms such as binary search and divide and conquer, embodying the idea of "dividing one into two" or "dividing one into many".

int algorithm(int N) {
    int count = 1;
    while (count < N) {
        count *= 2;
    }
    return count;
}

count starts at 1 and is repeatedly doubled until it reaches N. Let the number of loop iterations be m; then N and 2^m are linearly related. Taking the base-2 logarithm, the number of iterations m is linear in log₂N, i.e., the time complexity is O(log N).
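The "binary search" mentioned above can be sketched as follows (an illustrative helper, not from the original article): each round halves the remaining search range, so the loop runs O(log N) times.

```c
/* Binary search over a sorted array: each iteration excludes half of
   the remaining range, so the loop runs O(log N) times.
   Returns the index of target, or -1 if it is absent. */
int binary_search(const int* nums, int length, int target) {
    int lo = 0, hi = length - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */
        if (nums[mid] == target) return mid;
        if (nums[mid] < target) lo = mid + 1;
        else                    hi = mid - 1;
    }
    return -1;
}
```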

Linearithmic level O(N log N) :
The two loop levels are independent: the outer loop is O(log N) and the inner loop is O(N), so the overall time complexity is O(N log N).

int algorithm(int N) {
    int count = 0;
    int i = N;
    while (i > 1) {
        i = i / 2;
        for (int j = 0; j < N; j++) {
            count += 1;
        }
    }
    return count;
}

The linearithmic order often appears in sorting algorithms such as "quicksort", "merge sort", and "heap sort".
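As a sketch of how an O(N log N) sort works, here is a minimal merge sort (an illustrative implementation, not taken from the original article): the recursion has about log N levels of splitting, and the merging at each level does O(N) total work.

```c
#include <stdlib.h>
#include <string.h>

/* Merge sort: about log N levels of recursion, O(N) merge work
   per level, hence O(N log N) time overall. */
static void merge_sort_range(int* a, int* tmp, int lo, int hi) {
    if (hi - lo <= 1) return;          /* 0 or 1 element: already sorted */
    int mid = lo + (hi - lo) / 2;
    merge_sort_range(a, tmp, lo, mid); /* sort left half  [lo, mid) */
    merge_sort_range(a, tmp, mid, hi); /* sort right half [mid, hi) */
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)          /* merge the two sorted halves */
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof(int));
}

void merge_sort(int* a, int n) {
    int* tmp = (int*)malloc((size_t)n * sizeof(int));
    if (tmp == NULL) return;           /* allocation failed */
    merge_sort_range(a, tmp, 0, n);
    free(tmp);
}
```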

Factorial level O(N!) :
The factorial level corresponds to the "full permutation" in mathematics: given N non-repeating elements, find all possible orderings; the number of schemes is N × (N−1) × (N−2) × ⋯ × 2 × 1 = N!. As shown in the code below, factorial complexity is often implemented with recursion. The principle: the first level branches N ways, the second level N−1 ways, …, until the Nth level, where it terminates and backtracks.

int algorithm(int N) {
    if (N <= 0) {
        return 1;
    }
    int count = 0;
    for (int i = 0; i < N; i++) {
        count += algorithm(N - 1);
    }
    return count;
}


Space Complexity

Concept Definition

The space types involved in space complexity are:

  • Input space : the size of the space required to store the input data;
  • Temporary storage space : During the operation of the algorithm, the space required to store all intermediate variables and objects and other data;
  • Output space : When the algorithm returns from running, the space required to store the output data;

Usually, space complexity refers to the combined size of the "temporary storage space" and the "output space" used by the algorithm when the input data size is N.
According to its source, the memory space used by an algorithm is divided into three categories:

  • Instruction space : the memory occupied by the program instructions after compilation;
  • Data space : the memory used by the algorithm's variables, including declared constants, variables, dynamic arrays, and dynamic objects;
  • Stack frame space : function calls are implemented on a stack; during a call, the function occupies a constant-size stack frame, which is released when the function returns. In the following code, test() is called in a loop, but each call's stack frame is released as soon as test() returns, so the space complexity remains O(1):

int test() {
    return 0;
}

void algorithm(int N) {
    for (int i = 0; i < N; i++) {
        test();
    }
}

In algorithms, stack frame space often accumulates through recursive calls. In the following code, the recursion creates N simultaneously un-returned algorithm() calls, cumulatively using O(N) of stack frame space.

int algorithm(int N) {
    if (N <= 1) {
        return 1;
    }
    return algorithm(N - 1) + 1;
}

Symbolic Representation

Usually, space complexity measures the space used in the "worst case", reflecting how much space must be reserved for the algorithm to run, and is denoted with the symbol O.
The worst case has two meanings: the "worst input data" and the "worst point during execution". Consider the following code:

Input: an integer N with value range N ≥ 1.

  • Worst input data: when N ≤ 10, the array nums has length 10, using O(10) = O(1) space; when N > 10, nums has length N, using O(N) space; thus the space complexity should be taken as O(N);
  • Worst run point: while executing int* nums = (int*)malloc(10 * sizeof(int)); the algorithm uses only O(1) space; after executing nums = (int*)malloc(N * sizeof(int)); it uses O(N) space; thus the space complexity should be taken as O(N);
void algorithm(int N) {
    int num = 5;                                 // O(1)
    int* nums = (int*)malloc(10 * sizeof(int));  // O(1)

    if (N > 10) {
        free(nums);  // free the previously allocated memory
        nums = (int*)malloc(N * sizeof(int));    // O(N)
    }
    free(nums);
}

Common Types

Arranged from small to large, common algorithm space complexities are: O(1) < O(log N) < O(N) < O(N²) < O(2^N).

Example Analysis

For all of the following examples, let the input data size be a positive integer N, and let the node struct Node and the function test() be defined as follows:

// node struct
struct Node {
    int val;
    struct Node* next;
};

// function to create a node
struct Node* createNode(int val) {
    struct Node* newNode = (struct Node*)malloc(sizeof(struct Node));
    newNode->val = val;
    newNode->next = NULL;
    return newNode;
}

// function test()
int test() {
    return 0;
}

Constant level O(1) :
Ordinary constants, variables, objects, and collections whose number of elements is independent of the input size N all use a constant amount of space.

int N = 0;                         // variables
int num = 0;
int nums[10000] = {0};             // array
struct Node* node = createNode(0); // dynamic object

In the following code, although test() is called N times, each call's stack frame space is released when test() returns, so the space complexity is still O(1):

void algorithm(int N) {
    for (int i = 0; i < N; i++) {
        test();
    }
}

Linear level O(N) :
Any collection whose number of elements is linear in N (commonly a one-dimensional array, linked list, hash table, etc.) uses a linear amount of space.

int* nums_1 = (int*)malloc(N * sizeof(int));
int* nums_2 = (int*)malloc((N / 2) * sizeof(int));
struct Node** nodes = (struct Node**)malloc(N * sizeof(struct Node*));

In the following code, during the recursion there are N simultaneously un-returned algorithm() calls, using O(N) of stack frame space.

int algorithm(int N) {
    if (N <= 1) return 1;
    return algorithm(N - 1) + 1;
}


Square level O(N²) :
Any collection whose number of elements is quadratic in N (commonly a matrix) uses a quadratic amount of space.

int** num_matrix = (int**)malloc(N * sizeof(int*));
struct Node*** node_matrix = (struct Node***)malloc(N * sizeof(struct Node**));

// initialize the two-dimensional array num_matrix
for (int i = 0; i < N; i++) {
    num_matrix[i] = (int*)malloc(N * sizeof(int));
    for (int j = 0; j < N; j++) {
        num_matrix[i][j] = 0;
    }
}

// create and initialize node objects
for (int i = 0; i < N; i++) {
    node_matrix[i] = (struct Node**)malloc(N * sizeof(struct Node*));
    for (int j = 0; j < N; j++) {
        node_matrix[i][j] = createNode(j);
    }
}

In the following code, the recursion creates N simultaneously un-returned algorithm() calls, using O(N) of stack frame space; each recursion level also declares an array, of average length N/2, using O(N) space; thus the overall space complexity is O(N²).

int algorithm(int N) {
    if (N <= 0) return 0;
    int* nums = (int*)malloc(N * sizeof(int));  // stays allocated across the recursive call
    return algorithm(N - 1);
}


Exponential level O(2^N) :
The exponential order is common in binary trees and multi-way trees. For example, a "perfect binary tree" of height N has on the order of 2^N nodes, occupying O(2^N) space; similarly, a full m-ary tree of height N has on the order of m^N nodes, occupying O(m^N) space.

Logarithmic level O(log N) :
The logarithmic order often appears in the stack frame accumulation of divide-and-conquer algorithms and in data-type conversions, for example:

  • Quicksort : the average space complexity is Θ(log N), and the worst-case space complexity is O(N);
  • Converting a number to a string : for a positive integer N, the space complexity of the resulting string is O(log N). Derivation: the number of decimal digits of N is ⌊log₁₀N⌋ + 1, so the converted string has length O(log₁₀N), and the space complexity is O(log N).
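The digit-count claim above can be checked with a small sketch (digit_count is a hypothetical helper introduced here; snprintf returns the number of characters that the conversion produces, excluding the terminating NUL):

```c
#include <stdio.h>

/* Converting a positive integer N to a decimal string uses O(log N)
   space: the number of digits is floor(log10 N) + 1. */
int digit_count(int N) {
    char buf[32];  /* large enough for any 32-bit int */
    return snprintf(buf, sizeof buf, "%d", N);
}
```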

Space-Time Trade-off

An algorithm's performance must be evaluated comprehensively from both its time and space usage. A good algorithm has two characteristics: low time complexity and low space complexity. In practice, it is very difficult to optimize both simultaneously for a given problem: reducing time complexity usually comes at the expense of increased space complexity, and vice versa.

Since contemporary computers have ample memory, algorithm design generally adopts the "space for time" approach: sacrificing some storage space to increase the algorithm's running speed.

Taking LeetCode's first problem, Two Sum, as an example, "brute-force enumeration" and "auxiliary hash table" are the "space-optimal" and "time-optimal" algorithms, respectively.

  • Method 1: Brute-force enumeration
    Time complexity O(N²), space complexity O(1); this is "time for space": it uses only a constant amount of extra space, but it runs too slowly.
  • Method 2: Auxiliary hash table
    Time complexity O(N), space complexity O(N); this is "space for time": with the help of an auxiliary hash table that stores the mapping from element values to indices, the algorithm's efficiency is improved. This is the best solution to the problem.
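Method 1 can be sketched in C as follows (the function name two_sum_brute is made up for illustration; since C has no built-in hash table, only the space-optimal variant is shown here):

```c
#include <stdbool.h>

/* Brute-force Two Sum: check every pair, O(N^2) time and O(1) extra
   space. Writes the two indices into out[0] and out[1] and returns
   true if a pair summing to target exists, false otherwise. */
bool two_sum_brute(const int* nums, int n, int target, int out[2]) {
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (nums[i] + nums[j] == target) {
                out[0] = i;
                out[1] = j;
                return true;
            }
        }
    }
    return false;
}
```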

Summary

The above covers the essentials of algorithmic complexity.
The next article will introduce nine commonly used data structures in detail; it will be continuously updated...

Origin blog.csdn.net/a2360051431/article/details/130736803