[Learn Data Structures] Algorithms and Complexity Analysis

Event address: CSDN 21-day learning challenge

Algorithms and their complexity

1. Algorithm Introduction


1.1 What is an algorithm

A commonly accepted definition of an algorithm today is:

An algorithm is a description of the steps for solving a specific problem. In a computer, it takes the form of a finite sequence of instructions, each of which represents one or more operations.

There are many kinds of problems in the real world, and algorithms are born to solve them, so there are many kinds of algorithms; no universal algorithm can solve every problem.

1.2 Characteristics of the algorithm

Algorithms have five basic properties.


Input and output: an algorithm has zero or more inputs and at least one output.

Finiteness: an algorithm terminates after a finite number of steps, without falling into an infinite loop, and each step completes in an acceptable amount of time.

Definiteness: each step of the algorithm has an exact meaning, with no ambiguity.

Feasibility: each step of the algorithm must be feasible, that is, achievable through a finite number of executions.

1.3 Requirements for Algorithm Design

There are four basic requirements for algorithm design.


Correctness: the algorithm should have no ambiguity in its input, output, and processing, should correctly reflect the needs of the problem, and should produce the correct answer.

Readability: another goal of algorithm design is ease of reading, understanding, and communication.

Robustness: when the input data is illegal, the algorithm can handle it appropriately rather than producing abnormal or inexplicable results.

High time efficiency and low storage requirements: use limited resources to maximize efficiency in both time and space.


2. Algorithm efficiency


2.1 Measurement method

2.1.1 Post-execution statistical method

This method uses computer timing to compare the running times of programs implementing different algorithms on designed test programs and data, and thereby judges their efficiency.

In fact, this method has serious flaws:

  • The algorithm must first be implemented as a program, which usually costs a lot of time and effort — and if the algorithm then turns out to be bad, that effort is wasted.
  • The measured times depend heavily on environmental factors such as computer hardware and software, which can mask the merits of the algorithm itself. A current quad-core machine and the 386s and 486s of earlier generations are simply not comparable in processing speed; even on the same computer, differences in CPU and memory usage cause variations. Simply comparing times is therefore not meaningful.
  • Designing test data for algorithms is difficult. How much data should we test with? How many runs are enough? These are hard questions to judge.

2.1.2 Pre-analysis and estimation method

Before any code is written, the algorithm's efficiency is estimated using statistical analysis.

The time it takes a program written in a high-level language to run on a computer depends mainly on the following factors:

(1) the strategy and method the algorithm adopts
(2) the quality of the code produced by compilation
(3) the input scale of the problem
(4) the speed at which the machine executes instructions

Item (1) is of course the foundation of the algorithm, item (2) depends on the supporting software, and item (4) depends on hardware performance. That is to say, leaving environmental factors aside, the running time of a program depends on the quality of the algorithm and the input scale of the problem.

During execution, each instruction takes a certain amount of time to run, so the more operations executed, the more time is consumed. In other words, the running time of a program is proportional to the number of times its basic operations are executed.

We do not care which language the program is written in, nor what kind of computer it will run on; we only care about the algorithm it implements. So, setting aside details such as loop indices, loop conditions, variable declarations, and result printing, when analyzing running time it is best to view the program as an algorithm — a series of steps independent of any programming language — and estimate the comparative running time from the number of times its basic operations are executed. This is how we measure the time efficiency of an algorithm.

For example:

#include <stdio.h>

int sum(int n)
{
    int i = 0;
    int sum = 0;
    
    for(i = 1; i <= n; i++)
    {
        sum += i;    // executed n times
    }
    
    return sum;
}

void print(int n)
{
    int i = 0;
    int j = 0;
    int m = 0;
    
    for(i = 0; i < n; i++)
    {
        for(j = 0; j < n; j++)
        {
            printf("%d ", m);    // executed n*n times
            m++;                 // executed n*n times
        }
        printf("\n");            // executed n times
    }
}

When analyzing the running time of an algorithm, the key is to relate the number of basic operations to the input scale: the number of basic operations must be expressed as a function of the input scale n, which we write as f(n).

For the two functions above: in sum, the statement sum += i runs n times, so f(n) = n; in print, the two inner statements each run n×n times plus n newline prints, so f(n) = 2n² + n.

2.2 Asymptotic growth of functions

Consider two algorithms, A and B, whose operation counts for input scale n are shown below. Which is faster? Not necessarily.

n       Algorithm A (2n+3)    Algorithm B (3n+1)
1       5                     4
2       7                     7
3       9                     10
10      23                    31
100     203                   301

When n=1, algorithm A is less efficient than B; when n=2 the two are equal; but as n grows, algorithm A becomes better and better than B. So we can say that algorithm A is better than B overall.

Asymptotic growth of functions: given two functions f(n) and g(n), if there exists an integer N such that f(n) is always greater than g(n) for all n > N, we say that f(n) grows asymptotically faster than g(n).

We find that as n increases, the trailing +3 and +1 have no real effect on the final efficiency, so we can safely ignore such additive constants.

Look at the second example:

n       Algorithm C (4n+8)    Algorithm D (2n²+1)
1       12                    3
2       16                    9
3       20                    19
4       24                    33
10      48                    201
100     408                   20001

When n ≤ 3, algorithm C does worse than D, but for n > 3 algorithm C does better and better, and in the end far outperforms D. Whether we drop the trailing constant or the constant multiplied by n, the efficiency gap between C and D barely changes. That is, the constant multiplying the highest-order term is not decisive either and can be ignored.

Take a look at the third example.

n       Algorithm E (2n²+3n+1)    Algorithm F (2n³+3n+1)
1       6                         6
2       15                        23
3       28                        64
10      231                       2031
100     20301                     2000301

When n=1, algorithms E and F produce the same count, but once n > 1 algorithm E's advantage shows, and it becomes more and more obvious as n grows. In other words, the larger the exponent of the highest-order term, the faster the number of operations grows and the lower the algorithm's time efficiency.

Take a look at the fourth example.

n       Algorithm G (2n²)    Algorithm H (3n+1)    Algorithm I (2n²+3n+1)
1       2                    4                     6
2       8                    7                     15
5       50                   16                    66
10      200                  31                    231
100     20000                301                   20301
1000    2000000              3001                  2003001

You can see that as n grows, algorithm H is no longer comparable with the other two — it is not even in the same order of magnitude. Meanwhile, algorithm G tends toward algorithm I as n increases, and the gap between them is negligible at that magnitude. That is, when judging an algorithm's efficiency, constants and other lower-order terms can be ignored; what matters is the order of the dominant term (the highest-order term).

To judge whether an algorithm is good, we cannot make an accurate judgment from a small amount of data. **As n increases, one algorithm becomes better and better than another, or worse and worse than it.** This is the theoretical basis for estimating the time efficiency of an algorithm.

2.3 Time Complexity

2.3.1 Definition

When performing algorithm analysis, the total number of statement executions T(n) is a function of the input scale n; we analyze how T(n) varies with n and determine its order of magnitude.

The time complexity of an algorithm — its time measure — is written T(n) = O(f(n)). It means that as n grows, the growth rate of the algorithm's execution time is the same as the growth rate of f(n); this is called the asymptotic time complexity of the algorithm, or time complexity for short. Here f(n) is some function of the input scale n.

This notation, which uses a capital O( ) to express time complexity, is called big O notation.

In general, the algorithm whose T(n) grows most slowly as n increases is the optimal algorithm.

2.3.2 How to derive the Big O order

In fact, what we summarized in the previous section is exactly the method.

Derivation method

1. Replace all additive constants in the running time with the constant 1

2. Keep only the highest-order term and drop the others

3. If the highest-order term exists and its coefficient is not 1, drop the coefficient

**Note that the "order" here is determined not just by the exponent but by the growth rate of f(n): the faster the growth, the higher the order; the slower, the lower.** So the growth-rate relationships among the common mathematical functions must be clear.

In fact, the low-order terms are not without effect — their effect is just negligible next to the high-order terms, because what we care about is the change in order of magnitude. For example, n and n² are not of the same magnitude, so the gap is significant, whereas n and 2n are of the same order of magnitude.
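As a worked example of the three rules: suppose counting basic operations gives T(n) = 3n² + 2n + 10. Replacing the additive constant with 1 gives 3n² + 2n + 1; keeping only the highest-order term gives 3n²; dropping its coefficient gives n². So T(n) = O(n²).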

2.3.3 Explanation of Common Big O Order

2.3.3.1 Constant order
#include <stdio.h>

int main()
{
    int sum = 0;            // executed once
    int i = 0;              // executed once
    for(i = 0; i < 10; i++) // condition tested 11 times
    {
        sum += i;           // executed 10 times
    }
    printf("%d", sum);      // executed once
    
    return 0;
}

The number of executions of the code above is a fixed constant that has nothing to do with n. We say its time complexity is O(1), also called constant order.

No matter what the constant is, we write it as O(1).

2.3.3.2 Linear order

To analyze the complexity of an algorithm, one of the keys is to analyze its loop structures.

int add(int n)
{
    int i = 0;
    int sum = 0;
    for(i = 0; i < n; i++)
    {
        sum += i;
    }
    
    return sum;
}

The loop body executes n times, and the highest-order term is n, so the time complexity is clearly O(n).

2.3.3.3 Logarithmic order
void mul(int n)
{
    int cnt = 1;
    while(cnt < n)
    {
        cnt *= 2;
    }
}

As long as cnt is less than n, the loop continues, multiplying cnt by 2 each time until it reaches n. Suppose the loop runs x times; then 2^x = n, that is, x = log₂n, so the time complexity is O(logn). (Note that logn is shorthand for log₂n.) For example, with n = 16, cnt takes the values 1, 2, 4, 8 inside the loop — exactly log₂16 = 4 iterations.

2.3.3.4 Square order

The following is a very simple example with time complexity O(n²).

for(i = 0; i < n; i++)
{
    for(j = 0; j < n; j++)
    {
        // an O(1) operation
    }
}

In general, the time complexity of a loop is equal to the complexity of the loop body multiplied by the number of times the loop runs.

for(i = 0; i < n; i++)
{
    for(j = 0; j < m; j++)
    {
        // an O(1) operation
    }
}

Is the time complexity here simply O(m*n)? The loop body runs exactly m*n times, so O(m*n) is always correct; how it is usually written depends on what we know about m and n:

If m and n are of the same order of magnitude, it can also be written O(n²);

If one of m or n is a known constant, only the other remains, giving O(m) or O(n);

Otherwise both are kept, and the time complexity is written O(m*n).
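By the same multiplication rule — body complexity times iteration count — nesting a logarithmic loop inside a linear one gives O(nlogn), an order we have not yet illustrated. A minimal sketch (the function name is ours):

void nlogn(int n)
{
    int i = 0;
    for (i = 0; i < n; i++)   // outer loop: n iterations
    {
        int cnt = 1;
        while (cnt < n)       // inner loop: about log2(n) iterations
        {
            cnt *= 2;
        }
    }
}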

Look at this example again:

for(i = 0; i < n; i++)
{
    for(j = 0; j <= i; j++)
    {
        // an O(1) operation
    }
}

When i=0 the inner loop executes once; when i=1, twice; ... and when i=n-1, n times. The total number of executions is therefore 1 + 2 + 3 + ... + n = n(n+1)/2 = (1/2)n² + (1/2)n, and the time complexity is O(n²).

Here is an example of calling a function:

#include <stdio.h>

void fun(int cnt)
{
    int j = 0;
    for(j = 0; j < cnt; j++)
    {
        // an O(1) operation
    }
}

int main()
{
    int i = 0;
    int j = 0;
    int n = 0;
    scanf("%d", &n);
    
    fun(n);                   // executed n times
    for(i = 0; i < n; i++)
    {
        fun(i);               // 0+1+...+(n-1) = n(n-1)/2 times in total
    }
    
    for(i = 0; i < n; i++)    // inner body runs n(n+1)/2 times in total
    {
        for(j = i; j < n; j++)
        {
            // an O(1) operation
        }
    }
    
    return 0;
}

At first glance it looks a bit complicated, but it is still square order. Let's analyze it.

The first call fun(n) executes its loop body n times. The first for loop then calls fun(i) for i = 0, 1, ..., n-1, and those calls execute 0 + 1 + ... + (n-1) = n(n-1)/2 times in total.

The second for loop is a nested loop. Note that the inner loop starts at j = i: when i=0 the inner loop executes n times, when i=1 it executes n-1 times, ... and when i=n-1 it executes once. The total is n + (n-1) + ... + 1 = n(n+1)/2.

So the program executes about n + n(n-1)/2 + n(n+1)/2 = n² + n operations, and the time complexity is O(n²).
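To sanity-check this count, here is a small counting sketch of the same structure (the global counter and the fixed n = 100 are ours, purely for illustration). For n = 100 it prints 10100, exactly n² + n:

#include <stdio.h>

static long long count = 0;   // counts basic operations

void fun(int cnt)
{
    int j = 0;
    for (j = 0; j < cnt; j++)
        count++;              // stand-in for the O(1) operation
}

int main()
{
    int i = 0;
    int j = 0;
    int n = 100;              // fixed input scale for the experiment
    
    fun(n);                   // n operations
    for (i = 0; i < n; i++)
        fun(i);               // n(n-1)/2 operations in total
    
    for (i = 0; i < n; i++)
        for (j = i; j < n; j++)
            count++;          // n(n+1)/2 operations in total
    
    printf("%lld\n", count);  // prints 10100 = 100*100 + 100
    return 0;
}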

2.3.3.5 Common time complexity

Example operation count    Big O order    Informal name
12                         O(1)           constant order
2n+3                       O(n)           linear order
3n²+2n+1                   O(n²)          square order
5log₂n+20                  O(logn)        logarithmic order
2n+3nlog₂n+19              O(nlogn)       nlogn order
6n³+2n²+3n+4               O(n³)          cubic order
2ⁿ                         O(2ⁿ)          exponential order

Sorted from low order to high order:

O(1) < O(logn) < O(n) < O(nlogn) < O(n²) < O(n³) < O(2ⁿ) < O(n!) < O(nⁿ)
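To make the gaps concrete, here are the (approximate) values of these functions at n = 10:

1       logn    n       nlogn   n²      n³      2ⁿ      n!          nⁿ
1       3.3     10      33      100     1000    1024    3628800     10¹⁰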


What we discuss most often in practice are the lower ones among these: O(1), O(logn), O(n), O(nlogn), and O(n²).

2.3.4 Worst case and average case

2.3.4.1 Three situations

In the examples so far, the number of executions could be estimated because it was essentially fixed. For some algorithms, however, the number of executions is uncertain, and several situations must be distinguished.

  • Best case: the minimum number of runs over all inputs of a given size (lower bound)
  • Worst case: the maximum number of runs over all inputs of a given size (upper bound)
  • Average case: the expected number of runs over all inputs of a given size

For example: searching for a value x in an array of length N.

Best case: found in 1 comparison
Worst case: found in N comparisons
Average case: N/2 comparisons

In practice we generally care about the worst case, so "time complexity" usually means the worst-case time complexity; for example, searching the array in the example above has time complexity O(N).

2.3.4.2 Average time complexity (only used in special cases)

Average time complexity is also called "weighted average time complexity" or "expected time complexity". Why weighted? Because computing an average time complexity usually has to take probability into account: each case must be weighted by the probability that it occurs.

Let's analyze an example:

// n is the length of the array arr
int find(int* arr, int n, int x)
{
    int i = 0;
    int pos = -1;
    for (i = 0; i < n; i++)
    {
        if (arr[i] == x)
        {
            pos = i;
            break;
        }
    }
    return pos;
}

The code is simple: it searches an array for the number x, returning its subscript if found and -1 otherwise. In the best case the time complexity is O(1); in the worst case it is O(n).

How is the average complexity calculated?

First, the simple average formula:

In the code above, the number of executions can be 1, 2, 3, ..., n, plus the case where x is not present at all, which also takes n executions to traverse arr. Adding up the executions over all possible cases gives 1 + 2 + 3 + ... + n + n, and there are n + 1 possible cases (the extra 1 being "not found"), so the result is:

(1 + 2 + 3 + ... + n + n) / (n + 1)

Expanding, the numerator is n(n+1)/2 + n, so the average is n/2 + n/(n+1); in big O notation this is O(n).

This formula simply sums the executions over all possible cases and divides by the number of cases — a plain average, which implicitly assumes each outcome occurs with probability 1/(n+1).

How do we calculate with the weighted average formula instead?

There are two probabilities involved:

  1. Whether x is in the array at all: two cases, present or absent, so the probability is 1/2
  2. Where x appears in the array: n positions, each occurring once, so the probability is 1/n

For x to be found at a particular position, it must be present and at that position, so we multiply the two probabilities to get 1/(2n). Each "found at position i" case is equally likely, so 1/(2n) is the weight attached to each of those execution counts.

How do we use these weights to compute the complexity of the code above?

1·(1/2n) + 2·(1/2n) + ... + n·(1/2n) + n·(1/2) — the last term is the "not found" case, which takes n executions and has probability 1/2 — and the sum works out to (n+1)/4 + n/2 = (3n+1)/4.

In big O notation, the average time complexity is still O(n).

Calculating an accurate average time complexity requires accurate weights, and the weights are affected by the data range and distribution, so in practice they have to be estimated from real data.

2.3.5 Example exercises

2.3.5.1 Binary Search
#include <assert.h>

int BinarySearch(int* a, int n, int x)
{
    assert(a);
    int begin = 0;
    int end = n-1;
    // [begin, end] is closed at both ends, hence the <=
    while (begin <= end)
    {
        int mid = begin + ((end-begin)/2);
        if (a[mid] < x)
            begin = mid+1;
        else if (a[mid] > x)
            end = mid-1;
        else
            return mid;
    }
    return -1;
}

The number of probes binary search makes is not fixed. In the best case the target is found on the first probe; in the worst case the range keeps halving until begin > end. With n numbers, each probe discards half of the remaining range until a single number is left: if x halvings are needed, then n/2^x = 1, so x = log₂n, and the time complexity is O(logn).
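A quick usage sketch (the array and targets are made up for illustration). Note that binary search requires the array to be sorted:

#include <stdio.h>

int main()
{
    int a[] = { 1, 3, 5, 7, 9, 11 };   // must be sorted
    int n = sizeof(a) / sizeof(a[0]);
    
    printf("%d\n", BinarySearch(a, n, 7));   // prints 3 (index of 7)
    printf("%d\n", BinarySearch(a, n, 4));   // prints -1 (not found)
    return 0;
}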

2.3.5.2 Factorial recursion
// What is the time complexity of the factorial recursion Fac?
long long Fac(size_t N)
{
    if(0 == N)
        return 1;
    else
        return Fac(N-1)*N;
}

Recursion has not come up before — what do you think the time complexity is?

Each call invokes the function again with N-1 until N reaches 0, so Fac(N-1), Fac(N-2), ..., Fac(1), Fac(0) are all called: N+1 calls in total, each doing a constant amount of work. The recurrence T(N) = T(N-1) + O(1) expands to O(N), so the time complexity is O(N).

(figure: the recursive call chain Fac(N) → Fac(N-1) → ... → Fac(1) → Fac(0))

2.3.5.3 Fibonacci Recursion
// What is the time complexity of the Fibonacci recursion Fib?
long long Fib(size_t N)
{
    if(N < 3)
        return 1;
    else
        return Fib(N-1) + Fib(N-2);
}

This one is more complicated. It helps to know what a binary tree is (covered later): simply put, a binary tree is a tree structure in which each node has at most two branches, as shown in the figure.

(figure: a binary tree — each node has at most two children)

What do binary trees have to do with Fibonacci recursion? The call process of the Fibonacci recursive function can be drawn as a binary tree, as shown in the figure:

(figure: the recursion tree of Fib(n) — each call branches into Fib(n-1) and Fib(n-2))

This makes it very intuitive: the algorithm obviously repeats a great deal of computation, so as n grows its efficiency should be very low. So how do we calculate the time complexity?

Look at the picture: the number of calls in each layer follows a pattern — they are powers of 2. The first layer has 2⁰ calls, the second 2¹, the third 2², and so on, so the nth layer has 2ⁿ⁻¹. Summing over all layers gives 2⁰ + 2¹ + ... + 2ⁿ⁻¹ = 2ⁿ − 1, so the time complexity is O(2ⁿ).
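The blow-up is easy to observe directly. Here is a small sketch (the global call counter is ours, purely for illustration) that counts the calls; since each call does constant work, the count tracks the running time. For N = 40 it already reports 204,668,309 calls:

#include <stdio.h>

static long long calls = 0;   // counts every invocation

long long FibCounted(size_t N)
{
    calls++;
    if (N < 3)
        return 1;
    return FibCounted(N-1) + FibCounted(N-2);
}

int main()
{
    printf("Fib(40) = %lld\n", FibCounted(40));   // 102334155
    printf("calls   = %lld\n", calls);            // 204668309
    return 0;
}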

2.4 Space Complexity

Time complexity mainly measures how fast an algorithm runs, while space complexity mainly measures the extra space an algorithm needs while running.

In the early days of computing, machines had very little storage, so space complexity mattered a great deal. With the rapid development of the industry, storage capacity has reached very high levels, so we usually no longer need to pay special attention to an algorithm's space complexity.

The space complexity of an algorithm is obtained by calculating the auxiliary space the algorithm requires, counted as a number of variables. Its calculation rules are essentially the same as those for time complexity, and big O asymptotic notation is used as well: S(n) = O(f(n)), where n is the input scale of the problem and f(n) is a function of n.

Generally speaking, when a program runs on a machine, besides storage for the program's own instructions, constants, variables, and input data, it also needs storage units for manipulating data. We only need to analyze the auxiliary units the algorithm's implementation requires.

The stack space a function needs at run time (parameters, local variables, some register information, etc.) is determined at compile time, so space complexity is mainly determined by the additional space the function requests explicitly at run time.

2.4.1 Example 1

// What is the space complexity of BubbleSort?
#include <assert.h>

// Swap helper, needed for the code to compile
void Swap(int* px, int* py)
{
    int tmp = *px;
    *px = *py;
    *py = tmp;
}

void BubbleSort(int* a, int n)
{
    assert(a);
    for (size_t end = n; end > 0; --end)
    {
        int exchange = 0;
        for (size_t i = 1; i < end; ++i)
        {
            if (a[i-1] > a[i])
            {
                Swap(&a[i-1], &a[i]);
                exchange = 1;
            }
        }
        if (exchange == 0)
            break;
    }
}

Is it O(n)? Doesn't a have n elements? Note that the array a is not auxiliary space opened up for the algorithm's needs — it is essential space: the bubble sort we designed operates on this array; it does not need to allocate one. Seen this way, the extra space we open up is at most a constant number of variables, so the space complexity is O(1).

2.4.2 Example 2

// What is the space complexity of Fibonacci?
// Returns the first n terms of the Fibonacci sequence
#include <stdlib.h>

long long* Fibonacci(size_t n)
{
    if(n == 0)
        return NULL;
    long long* fibArray = (long long*)malloc((n+1) * sizeof(long long));
    fibArray[0] = 0;
    fibArray[1] = 1;
    for (size_t i = 2; i <= n; ++i)
    {
        fibArray[i] = fibArray[i - 1] + fibArray[i - 2];
    }
    return fibArray;
}

This is the iterative way of computing the first n terms of the Fibonacci sequence; a fibArray of n+1 elements is allocated on the heap, so the space complexity is O(n).
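A brief usage sketch (the n = 10 is arbitrary): since the array is allocated on the heap inside the function, the caller is responsible for freeing it.

#include <stdio.h>
#include <stdlib.h>

int main()
{
    size_t n = 10;
    long long* fib = Fibonacci(n);   // terms 0..n of the sequence
    if (fib != NULL)
    {
        for (size_t i = 0; i <= n; ++i)
            printf("%lld ", fib[i]);
        free(fib);                   // release the O(n) auxiliary space
    }
    return 0;
}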

2.4.3 Example 3

// What is the space complexity of the factorial recursion Fac?
long long Fac(size_t N)
{
    if(N == 0)
        return 1;
    return Fac(N-1)*N;
}

We worked out the time complexity of this recursion earlier. During the recursion, up to N+1 stack frames exist at once — one per call — and each frame holds a constant number of variables (just the parameter N, ignoring stack details). So the space complexity is O(N).

2.4.4 Example 4

// What is the space complexity of the Fibonacci recursion Fib?
long long Fib(size_t N)
{
    if(N < 3)
        return 1;
    else
        return Fib(N-1) + Fib(N-2);
}

(figure: the Fib recursion tree shown earlier)

Looking at the same picture again: calling Fib(n) first calls Fib(n-1), then Fib(n-2). Calling Fib(n-1) creates a stack frame; when Fib(n-1) returns, that frame is destroyed and its space handed back to the system. The subsequent call to Fib(n-2) then creates its frame in that very space — in other words, the stack frames of Fib(n-1) and Fib(n-2) use the same block of memory, and the same holds all the way down, as shown in the figure:

(figure: recursive calls reusing the same stack space level by level)

So at most n stack frames are in use at any one time, and each frame holds a constant number of variables; the space complexity is therefore O(n).

If the auxiliary space an algorithm needs is constant with respect to the amount of input data, the algorithm is said to work in place, and its space complexity is O(1).
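For instance, here is a minimal in-place sketch (ours, for illustration): reversing an array touches all n elements, so its time complexity is O(n), but it uses only a constant number of auxiliary variables, so its space complexity is O(1).

void Reverse(int* a, int n)
{
    int left = 0;
    int right = n - 1;
    while (left < right)
    {
        int tmp = a[left];   // the only auxiliary variable: O(1) space
        a[left] = a[right];
        a[right] = tmp;
        left++;
        right--;
    }
}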


