Data Structures and Algorithms (1) Complexity Analysis (Part 1): Time Complexity and Space Complexity

Complexity analysis is the essence of learning data structures and algorithms. Once you have mastered it, you have essentially mastered half of the subject.

Why is complexity analysis needed?

You may be a little confused: if I run the code once, I can measure the algorithm's execution time and memory usage through statistics and monitoring. Why bother with time and space complexity analysis? Can such analysis be more accurate than data obtained from an actual run? First of all, this way of evaluating an algorithm's execution efficiency is perfectly valid. Many data structure and algorithm books even give it a name: post hoc statistics. However, this method has significant limitations.

  1. The test results are highly dependent on the test environment. Differences in hardware can greatly affect the results. For example, if we run the same piece of code on an Intel Core i9 processor and an Intel Core i3 processor, the i9 will obviously execute it much faster. Likewise, code a may run faster than code b on one machine, yet the result may be completely reversed on another machine.
  2. The test results are greatly affected by the scale of the data. Take sorting algorithms (covered later) as an example: for the same sorting algorithm, how ordered the input already is can make the execution time vary widely. In the extreme case where the data is already sorted, the algorithm has almost nothing to do and finishes very quickly. In addition, if the test data is too small, the results may not reflect the algorithm's true performance; for small inputs, insertion sort can actually be faster than quicksort!
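To make the "post hoc statistics" approach concrete, here is a minimal timing sketch (the helper timeCal is hypothetical, not from the original article). The number it reports will differ from machine to machine and from input to input, which is exactly the limitation described above:

 long timeCal(int n) {
   long start = System.nanoTime();     // timestamp before the run
   int sum = 0;
   for (int i = 1; i <= n; ++i) {
     sum = sum + i;
   }
   return System.nanoTime() - start;   // elapsed wall-clock time in nanoseconds
 }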

Therefore, we need a method to roughly estimate an algorithm's execution efficiency without running it on concrete test data. This is the time and space complexity analysis we are going to talk about today.

Time Complexity Representation Method

Example 1:

 int cal(int n) {
   int sum = 0;
   int i = 1;
   for (; i <= n; ++i) {
     sum = sum + i;
   }
   return sum;
 }

From the CPU's perspective, each line of this code performs similar operations: read data, operate on it, write data. Although the actual number of CPU operations and the execution time differ from line to line, we are only making a rough estimate here, so we can assume every line takes the same amount of time, one unit_time (the time consumed by executing one statement). Under this assumption, what is the total execution time of this code?
Lines 2 and 3 each take 1 unit_time. Lines 4 and 5 each run n times, so together they take 2n * unit_time. The total execution time of this code is therefore (2n+2) * unit_time. We can see that the execution time T(n) is proportional to the total number of times each line of code executes.

Example 2:

 int cal(int n) {
   int sum = 0;
   int i = 1;
   int j = 1;
   for (; i <= n; ++i) {
     j = 1;
     for (; j <= n; ++j) {
       sum = sum + i * j;
     }
   }
   return sum;
 }

Lines 2, 3, and 4 each take 1 unit_time. Lines 5 and 6 are executed n times, taking 2n * unit_time. Lines 7 and 8 are executed n² times in the nested loop, taking 2n² * unit_time. Therefore, the total execution time of the entire code is T(n) = (2n² + 2n + 3) * unit_time.

Although we don't know the concrete value of unit_time, from the derivation of these two examples we can extract a very important rule: the execution time T(n) of a piece of code is proportional to the total number of times its lines are executed. We can summarize this rule as a formula:

T(n)=O(f(n))

Here, T(n) is the execution time of the code, as before; n is the size of the data; and f(n) is the total number of times each line of code is executed. Since this total depends on n, we write it as f(n). The O in the formula means that the execution time T(n) is proportional to f(n).
So in the first example T(n) = O(2n+2), and in the second T(n) = O(2n²+2n+3). This is Big O time complexity notation.
Big O time complexity does not represent the real execution time of the code; it represents the trend of execution time as the data scale grows. It is therefore also called asymptotic time complexity, or time complexity for short.
When n is large (think 10,000 or 100,000), the low-order terms, constants, and coefficients in the formula do not affect the growth trend and can be ignored; we only need to keep the largest order of magnitude. Written in Big O notation, the time complexities of the two pieces of code above become T(n) = O(n) and T(n) = O(n²).
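To see numerically why only the highest-order term matters, consider this quick check (a hypothetical helper, not from the original article). It prints the ratio (2n² + 2n + 3) / n² from Example 2 for growing n; the ratio approaches the constant 2, confirming that the n² term alone determines the growth trend:

 void dominanceDemo() {
   for (int n = 10; n <= 1000000; n *= 10) {
     double t = 2.0 * n * n + 2.0 * n + 3.0;          // T(n) in units of unit_time
     System.out.println("n=" + n + "  T(n)/n^2 = " + (t / ((double) n * n)));
   }
 }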

How to analyze time complexity in practice

1. Only pay attention to the piece of code with the most loop executions.
When analyzing the time complexity of an algorithm or a piece of code, we only need to focus on the part whose loop runs the most times. In Example 1, lines 2 and 3 take constant-level time regardless of n, so they have no effect on the complexity. Lines 4 and 5 are executed the most; as we said earlier, they run n times, so the total time complexity is O(n).
Simply put, focus only on the dominant part of the code.
2. Addition rule: the total complexity equals the complexity of the piece of code with the largest order of magnitude.

Example 3:

int cal(int n) {
  int sum_1 = 0;
  int p = 1;
  for (; p < 100; ++p) {
    sum_1 = sum_1 + p;
  }

  int sum_2 = 0;
  int q = 1;
  for (; q < n; ++q) {
    sum_2 = sum_2 + q;
  }

  int sum_3 = 0;
  int i = 1;
  int j = 1;
  for (; i <= n; ++i) {
    j = 1;
    for (; j <= n; ++j) {
      sum_3 = sum_3 + i * j;
    }
  }

  return sum_1 + sum_2 + sum_3;
}

This code has three parts. The first part executes 100 times: no matter how large n is, it always runs exactly 100 times, a constant (fixed) number. Time complexity describes the trend of execution time as the data scale grows, so no matter how long a constant-time segment takes, we can ignore it; by itself it has no effect on the growth trend.
The time complexities of the second and third parts are O(n) and O(n²), respectively.
Combining the three parts, we take the largest order of magnitude, so the time complexity of the entire code is O(n²).
In other words: the total time complexity equals the time complexity of the piece of code with the largest order of magnitude.
We can abstract this rule into a formula: if T1(n) = O(f(n)) and T2(n) = O(g(n)), then T(n) = T1(n) + T2(n) = max(O(f(n)), O(g(n))) = O(max(f(n), g(n))). Simply put, only the largest, most frequently executed part matters.
3. Multiplication rule: the complexity of nested code equals the product of the complexities of the inner and outer code.
Assuming T1(n) = O(n), T2(n) = O(n²), then T1(n) * T2(n) = O(n³).

Example 4:

int cal(int n) {
  int ret = 0;
  int i = 1;
  for (; i < n; ++i) {
    ret = ret + f(i);
  }
  return ret;
}

int f(int n) {
  int sum = 0;
  int i = 1;
  for (; i < n; ++i) {
    sum = sum + i;
  }
  return sum;
}

When loops are nested, the total complexity is the complexity of the inner layer multiplied by the complexity of the outer layer.
In Example 4, the loop in cal() runs n times (T1(n) = O(n)), and each iteration calls f(), which is itself a loop with complexity T2(n) = O(n). So the time complexity of the whole cal() function is T(n) = T1(n) * T2(n) = O(n*n) = O(n²).

Common Time Complexity Metrics

Order                      Time complexity
constant order             O(1)
logarithmic order          O(logn)
linear order               O(n)
linear-logarithmic order   O(nlogn)
square order               O(n²)
cubic order                O(n³)
k-th power order           O(n^k)
exponential order          O(2^n)
factorial order            O(n!)

The time complexities above can be divided into two categories: polynomial and non-polynomial. There are only two non-polynomial ones: O(2^n) and O(n!).
As n grows, the execution time of non-polynomial algorithms increases sharply, and the time to solve a problem quickly becomes unbearable. Algorithms with non-polynomial time complexity are therefore very inefficient.
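To get a feel for how sharply the non-polynomial classes grow, the hypothetical snippet below (not from the original article) prints n², 2^n, and n! side by side for small n; by n = 20, n! already exceeds two quintillion while n² is only 400:

void growthDemo() {
  long factorial = 1;
  for (int n = 1; n <= 20; ++n) {
    factorial *= n;                          // n! still fits in a long up to n = 20
    System.out.println("n=" + n
        + "  n^2=" + ((long) n * n)
        + "  2^n=" + (1L << n)
        + "  n!=" + factorial);
  }
}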

Common Polynomial Complexities

1. O(1)

 int i = 8;
 int j = 6;
 int sum = i + j;

In general, as long as there are no loops or recursion in the code, its time complexity is O(1), even if it is thousands of lines long.
2. O(logn) / O(nlogn)

 int i = 1;
 while (i <= n) {
   i = i * 2;
 }

Using the analysis rules above, the third line of code is the one executed the most, so counting how many times it runs tells us the time complexity of the whole code.
Starting from 1, i is doubled on every iteration: 2⁰, 2¹, 2², …, 2^x. The loop stops as soon as i exceeds n, i.e. when 2^x > n, so the number of iterations x is log₂n.
Therefore, the time complexity of this code is O(log₂n), written simply as O(logn): logarithms of different bases differ only by a constant factor (for example, log₃n = log₃2 * log₂n), and constant coefficients are ignored in Big O notation. If a piece of O(logn) code is executed n times in an outer loop, the total is O(nlogn); merge sort and quicksort are typical O(nlogn) algorithms.
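As a minimal illustration of O(nlogn) (a hypothetical snippet, not from the original article), wrap the doubling loop above in an outer loop that runs n times; the inner loop contributes a factor of logn per outer iteration:

 int nLogN(int n) {
   int count = 0;
   for (int k = 1; k <= n; ++k) {   // outer loop: n iterations
     int i = 1;
     while (i <= n) {               // inner loop: about log2(n) iterations
       i = i * 2;
       ++count;
     }
   }
   return count;                    // roughly n * log2(n) in total
 }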
3. O(m+n), O(m*n)

int cal(int m, int n) {
  int sum_1 = 0;
  int i = 1;
  for (; i < m; ++i) {
    sum_1 = sum_1 + i;
  }

  int sum_2 = 0;
  int j = 1;
  for (; j < n; ++j) {
    sum_2 = sum_2 + j;
  }

  return sum_1 + sum_2;
}

Here the complexity depends on the sizes of two independent inputs, m and n. Since we cannot tell in advance which of m and n is larger, we cannot apply the addition rule and drop one of them. The time complexity of the code above is therefore O(m+n). The O(m*n) case is sketched below.
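For the O(m*n) case, the multiplication rule applies as usual. A minimal sketch (the function name calMulti is hypothetical, not from the original article):

int calMulti(int m, int n) {
  int sum = 0;
  for (int i = 1; i < m; ++i) {      // outer loop: about m iterations
    for (int j = 1; j < n; ++j) {    // inner loop: about n iterations each
      sum = sum + i * j;
    }
  }
  return sum;                        // the loop body runs about m*n times, so O(m*n)
}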

Space Complexity Analysis

The full name of time complexity is asymptotic time complexity; it describes how an algorithm's execution time grows with the data size. By analogy, the full name of space complexity is asymptotic space complexity, and it describes how an algorithm's storage usage grows with the data size.

void print(int n) {
  int i = 0;
  int[] a = new int[n];
  for (i = 0; i < n; ++i) {
    a[i] = i * i;
  }

  for (i = n-1; i >= 0; --i) {
    System.out.println(a[i]);
  }
}

Line 3 allocates an int array of size n; apart from that, the code uses only a constant amount of extra space, so the space complexity of the whole function is O(n).
The space complexities we meet most often are O(1), O(n), and O(n²); logarithmic space complexities such as O(logn) and O(nlogn) are rarely seen in practice.
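For contrast, here is a hypothetical variant (not from the original article) that prints the square of every index without storing them: it keeps only a fixed number of variables no matter how large n is, so its space complexity is O(1).

void printSquares(int n) {
  for (int i = 0; i < n; ++i) {
    System.out.println(i * i);   // no array of size n; constant extra space
  }
}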
