"Data Structures and Algorithms beauty" <01> complexity analysis (on): how to analyze the efficiency of algorithms and resource consumption statistics?

As we all know, data structures and algorithms exist to solve the problems of "fast" and "cheap": how to make your code run faster, and how to make it use less storage space. Execution efficiency is therefore a very important metric for evaluating an algorithm.

So how do we measure the execution efficiency of the code you write? This is where today's topic comes in: time and space complexity analysis. In fact, wherever data structures and algorithms are mentioned, time and space complexity analysis is never far behind. Personally, I think complexity analysis is the essence of learning algorithms: once you have mastered it, you have basically mastered half of data structures and algorithms.

Complexity analysis is so important that I am going to spend two lessons on it. I hope that after you finish them, you will be able to analyze the complexity of any code, in any scenario, with the effortless ease of the proverbial master butcher (庖丁解牛).

 

 

Why do we need complexity analysis?

You may have some doubts: if I just run the code once and collect statistics and monitoring data, I can get the algorithm's execution time and the memory it occupies. Why bother with time and space complexity analysis? Isn't data from an actual run more accurate than this kind of analysis?

First of all, I can say for certain that this way of assessing algorithm efficiency is correct. Many books on data structures and algorithms even give it a name: the post-hoc statistics method. However, this kind of statistical method has very big limitations.

 

1. Test results are highly dependent on the test environment

Differences in hardware have a great influence on test results. For example, take the same piece of code and run it on an Intel Core i9 processor and an Intel Core i3 processor; needless to say, the i9 runs it much faster than the i3. Also, code a may run faster than code b on one machine, while on another machine the results may be exactly the opposite.

 

2. Test results are greatly affected by the size of the data

Later we will talk about sorting algorithms, so let's use them as an example. For the same sorting algorithm, data with different degrees of pre-existing order will take very different amounts of time to sort. In the extreme case where the data is already sorted, the algorithm needs to do almost nothing, and the execution time will be very short. In addition, if the test data is too small, the results may not reflect the algorithm's true performance. For example, on small-scale data, insertion sort may actually be faster than quicksort!

 

So we need a method that can roughly estimate an algorithm's execution efficiency without testing it on concrete data. That is the time and space complexity analysis we will talk about today.

 

 

Big O complexity notation

The execution efficiency of an algorithm is, roughly speaking, the execution time of its code. But how can we estimate the execution time of a piece of code with the "naked eye", without actually running it?

Here is a very simple piece of code that computes the cumulative sum 1 + 2 + 3 + ... + n. Now I'll walk you through estimating its execution time.

int cal(int n) {
   int sum = 0;
   int i = 1;
   for (; i <= n; ++i) {
     sum = sum + i;
   }
   return sum; 
}

From the CPU's perspective, every line of this code does something similar: read data, operate on it, write data. Although each line corresponds to a different number of CPU instructions and a different execution time, we are only making a rough estimate here, so we can assume every line takes the same execution time, unit_time. On this assumption, what is the total execution time of this code?

Lines 2 and 3 each require one unit_time of execution time. Lines 4 and 5 each run n times, requiring 2n * unit_time. So the total execution time of this code is (2n + 2) * unit_time. We can see that the total execution time T(n) is proportional to the number of times each line of code is executed.
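The line-count reasoning above can be checked mechanically. Below is a small sketch of my own (the instrumented name `cal_counted` is not from the lesson): an `ops` counter is bumped once for every executed line, so for the summation code it should come out to exactly 2n + 2.

```c
#include <assert.h>

/* Instrumented copy of cal(): ops counts one "unit_time" per executed line.
   Lines 2-3 run once each; lines 4-5 run n times each, so ops == 2n + 2. */
static long ops;

int cal_counted(int n) {
    ops = 0;
    int sum = 0; ops++;           /* line 2: runs once */
    int i = 1;   ops++;           /* line 3: runs once */
    for (; i <= n; ++i) {
        ops++;                    /* line 4: runs n times */
        sum = sum + i; ops++;     /* line 5: runs n times */
    }
    return sum;
}
```

For n = 10 this yields ops = 22, matching the (2n + 2) * unit_time estimate.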

Following this line of analysis, let's look at another piece of code.

int cal(int n) {
   int sum = 0;
   int i = 1;
   int j = 1;
   for (; i <= n; ++i) {
     j = 1;
     for (; j <= n; ++j) {
       sum = sum + i * j;
     }
   }
   return sum;
}

We still assume that each statement takes unit_time to execute. What is the total execution time T(n) of this code?

Lines 2, 3, and 4 each require one unit_time. The loop on lines 5 and 6 executes n times, requiring 2n * unit_time. The loop on lines 7 and 8 executes n² times, requiring 2n² * unit_time. So the total execution time of the whole snippet is T(n) = (2n² + 2n + 3) * unit_time.
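The same instrumentation idea (again my own sketch, not part of the lesson) verifies the nested-loop count: three one-time lines, two lines that run n times, and two lines that run n² times, for 2n² + 2n + 3 in total.

```c
#include <assert.h>

/* Instrumented copy of the nested-loop cal(): ops2 should equal 2n^2 + 2n + 3. */
static long ops2;

int cal2_counted(int n) {
    ops2 = 0;
    int sum = 0; ops2++;               /* line 2: once */
    int i = 1;   ops2++;               /* line 3: once */
    int j = 1;   ops2++;               /* line 4: once */
    for (; i <= n; ++i) {
        ops2++;                        /* line 5: n times */
        j = 1; ops2++;                 /* line 6: n times */
        for (; j <= n; ++j) {
            ops2++;                    /* line 7: n^2 times */
            sum = sum + i * j; ops2++; /* line 8: n^2 times */
        }
    }
    return sum;
}
```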

Although we do not know the exact value of unit_time, from the derivation of these two snippets we can draw a very important conclusion: the total execution time T(n) of a piece of code is proportional to the total number of times each line of code is executed.

We can generalize this pattern into a single formula. Pay attention: big O is about to make its debut!

T(n) = O(f(n))

 

 

Let me explain this formula. T(n), as mentioned, represents the execution time of the code; n represents the data size; and f(n) represents the total number of times each line of code is executed. Since this is a formula, we write it as f(n). The big O in the formula means that the execution time T(n) is proportional to f(n).

So for the first example, T(n) = O(2n + 2), and for the second, T(n) = O(2n² + 2n + 3). This is big O time complexity notation. Big O time complexity does not represent the actual execution time of the code; rather, it represents the trend of execution time as the data size grows. Hence it is also called asymptotic time complexity, or time complexity for short.

When n is large, say 10,000 or 100,000, the low-order terms, constants, and coefficients in the formula do not affect the growth trend and can be ignored. We only need to record the largest order of magnitude. Writing the time complexity of the two earlier snippets in big O notation, we get: T(n) = O(n); T(n) = O(n²).

 

 

Time complexity analysis

The previous section introduced the origin and the notation of big O time complexity. Now let's look at how to analyze the time complexity of a piece of code. I have three practical tips to share with you.

 

1. Focus only on the code that is executed the most times

As I just said, big O notation only reflects a trend. We usually ignore the constants, low-order terms, and coefficients in the formula and record only the largest order of magnitude. So when analyzing the time complexity of an algorithm or a piece of code, focus only on the code with the highest execution count. The order of magnitude, in terms of n, of the number of times that core code executes is the time complexity of the whole piece of code.

To help you understand, let me use the earlier example again.

 

int cal(int n) {
   int sum = 0;
   int i = 1;
   for (; i <= n; ++i) {
     sum = sum + i;
   }
   return sum;
}

 

Lines 2 and 3 take constant-order execution time, unrelated to the size of n, so they have no effect on the complexity. The most frequently executed code is the loop on lines 4 and 5, so that is where the analysis should focus. As we discussed earlier, these two lines execute n times, so the total time complexity is O(n).

 

 

2. Addition rule: the total complexity equals the complexity of the highest-order part of the code

Here is another piece of code. Try to analyze it first, then read on and see whether your reasoning matches mine.

 

int cal(int n) {
   int sum_1 = 0;
   int p = 1;
   for (; p < 100; ++p) {
     sum_1 = sum_1 + p;
   }

   int sum_2 = 0;
   int q = 1;
   for (; q < n; ++q) {
     sum_2 = sum_2 + q;
   }
 
   int sum_3 = 0;
   int i = 1;
   int j = 1;
   for (; i <= n; ++i) {
     j = 1; 
     for (; j <= n; ++j) {
       sum_3 = sum_3 +  i * j;
     }
   }
 
   return sum_1 + sum_2 + sum_3;
 }

 

The code is divided into three parts: computing sum_1, sum_2, and sum_3. We can analyze the time complexity of each part separately, then take the largest order of magnitude among them as the complexity of the whole piece of code.

What is the time complexity of the first part? Its loop executes 100 times: a constant execution time, unrelated to the size of n.

Let me emphasize this point: even if that loop ran 10,000 or 100,000 times, as long as the count is a known constant unrelated to n, the execution time is still constant-order. When n tends to infinity it can be ignored. A constant may greatly affect the actual execution time, but recall the concept of time complexity: it represents the trend of an algorithm's efficiency as the data volume grows. So no matter how large the constant execution time is, we can ignore it, because it has no effect on the growth trend.

What about the time complexity of the second and third parts? The answers are O(n) and O(n²); you should be able to analyze them easily, so I won't belabor the point.

For the overall time complexity of the three parts, we take the largest order of magnitude. So the time complexity of the whole piece of code is O(n²). In other words: the total time complexity equals the complexity of the highest-order part of the code. We can abstract this rule into a formula: if T1(n) = O(f(n)) and T2(n) = O(g(n)), then T(n) = T1(n) + T2(n) = max(O(f(n)), O(g(n))) = O(max(f(n), g(n))).
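To see the addition rule numerically, here is a quick sketch (the helper name `total_iters` is my own, not from the lesson) that counts the total loop iterations of the three parts: 99 + (n - 1) + n². For large n the n² term dwarfs the rest, which is exactly why the total complexity collapses to O(n²).

```c
#include <assert.h>

/* Total loop iterations of the three parts of cal():
   constant part (99) + linear part (n - 1) + quadratic part (n * n). */
long total_iters(int n) {
    long c = 0;
    for (int p = 1; p < 100; ++p) c++;       /* sum_1 loop: 99 iterations */
    for (int q = 1; q < n; ++q) c++;         /* sum_2 loop: n - 1 iterations */
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= n; ++j) c++;    /* sum_3 loops: n * n iterations */
    return c;
}
```

For n = 1000 the count is 1,001,098, of which the n² term contributes 1,000,000: over 99.8% of the total.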

 

 

3. Multiplication rule: the complexity of nested code equals the product of the complexities of the outer and inner code

I just covered the addition rule; complexity analysis also has a multiplication rule. By analogy, you can probably "guess" what the formula is, right? If T1(n) = O(f(n)) and T2(n) = O(g(n)), then T(n) = T1(n) * T2(n) = O(f(n)) * O(g(n)) = O(f(n) * g(n)). That is, if T1(n) = O(n) and T2(n) = O(n²), then T1(n) * T2(n) = O(n³). In concrete code, we can treat the multiplication rule as applying to nested loops. Let me give an example.

 

int cal(int n) {
   int ret = 0; 
   int i = 1;
   for (; i < n; ++i) {
     ret = ret + f(i);
   } 
   return ret;
 } 
 
 int f(int n) {
  int sum = 0;
  int i = 1;
  for (; i < n; ++i) {
    sum = sum + i;
  } 
  return sum;
 }

 

Let's look at the cal() function on its own. If f() were just an ordinary statement, the time complexity of lines 4 to 6 would be T1(n) = O(n). But f() is not a simple statement; its own time complexity is T2(n) = O(n). Therefore, the time complexity of the whole cal() function is T(n) = T1(n) * T2(n) = O(n * n) = O(n²).

Those are my three complexity analysis tips. You do not need to memorize them deliberately, though. The key to complexity analysis is practice: look at more cases, do more analysis, and it will become second nature.

 

 

Some common examples of time complexity analysis

Although code varies widely, the common complexity classes are few. I have briefly summarized them below; these classes cover the complexity of almost all the code you will encounter in the future.

 

The complexity classes just listed, from low to high order O(1), O(logn), O(n), O(nlogn), O(n²), O(n³), ..., O(2^n), O(n!), can be roughly divided into two categories: polynomial and non-polynomial orders of magnitude. There are only two non-polynomial classes: O(2^n) and O(n!).

We call algorithmic problems whose time complexity is of non-polynomial magnitude NP (Non-Deterministic Polynomial) problems.

As the data size n grows larger and larger, the execution time of non-polynomial algorithms increases sharply, and the time to solve the problem grows without bound. So algorithms with non-polynomial time complexity are in fact very inefficient. For that reason, I won't expand on NP time complexity; let's look at a few common polynomial time complexities instead.

 

1. O(1)

First, you must be clear about one concept: O(1) is just a way of denoting constant-order time complexity; it does not mean only one line of code executes. For example, even though the following code has three lines, its time complexity is O(1), not O(3).

 int i = 8;
 int j = 6;
 int sum = i + j;

To summarize briefly: as long as the execution time of the code does not grow with n, we denote its time complexity as O(1). Put another way, in general, as long as an algorithm contains no loops or recursion, its time complexity is O(1) even if it has thousands of lines of code.

 

2. O(logn)、O(nlogn)

Logarithmic time complexity is very common, and it is also the hardest kind to analyze. Let me explain it with an example.

 int i = 1;
 while (i <= n)  {
   i = i * 2;
 }

By the complexity analysis method described earlier, the third line is the most frequently executed. So if we can work out how many times that line executes, we know the time complexity of the whole snippet.

As you can see from the code, the variable i starts at 1 and is multiplied by 2 on every loop iteration; the loop ends once i exceeds n. Remember the geometric sequences we learned in high school? The values taken by i form exactly such a sequence. Listing them out one by one, they look like this: 2^0, 2^1, 2^2, ..., 2^k, ..., 2^x = n.

So as long as we know the value of x, we know how many times this line of code executes. Solving 2^x = n for x is something we learned in high school, so I won't dwell on it: x = log₂n. Therefore, the time complexity of this code is O(log₂n).
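A quick way to convince yourself is to count the iterations directly. The sketch below (my own helper, not from the lesson) returns the number of times the doubling loop body runs, which is floor(log₂n) + 1: logarithmic in n.

```c
#include <assert.h>

/* Number of times the body of "while (i <= n) i = i * 2;" runs, starting from i = 1. */
int doubling_steps(int n) {
    int count = 0;
    for (int i = 1; i <= n; i = i * 2)
        count++;
    return count;
}
```

doubling_steps(8) is 4 (i takes the values 1, 2, 4, 8), and doubling n to 16 adds only one more step.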

Now let me change the code slightly. Take another look: what is its time complexity?

 int i = 1;
 while (i <= n)  {
   i = i * 3;
 }

Following the reasoning I just described, it's easy to see that the time complexity of this code is O(log₃n).

In fact, whether the base is 2, 3, or 10, we can write all logarithmic time complexities as O(logn). Why?

We know that logarithms are interconvertible: log₃n equals log₃2 * log₂n, so O(log₃n) = O(C * log₂n), where C = log₃2 is a constant. Based on the principle we established earlier, coefficients can be ignored in big O notation, i.e., O(Cf(n)) = O(f(n)). So O(log₂n) equals O(log₃n). Therefore, in logarithmic time complexity we ignore the "base" of the logarithm and write it uniformly as O(logn).

If you understood O(logn) above, then O(nlogn) is easy. Remember the multiplication rule we just covered? If a piece of code has time complexity O(logn) and we execute it in a loop n times, the time complexity is O(nlogn). And O(nlogn) is a very common algorithmic time complexity; for example, merge sort and quicksort both have time complexity O(nlogn).
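To see O(nlogn) as "the multiplication rule applied to an O(logn) loop", here is a small sketch (names assumed, not from the lesson) that runs the doubling loop n times and counts the total iterations: n * (floor(log₂n) + 1).

```c
#include <assert.h>

/* An O(logn) inner loop executed n times:
   total body executions = n * (floor(log2(n)) + 1). */
long nlogn_ops(int n) {
    long count = 0;
    for (int k = 0; k < n; ++k)           /* outer loop: n passes */
        for (int i = 1; i <= n; i = i * 2)
            count++;                      /* inner loop: ~log2(n) steps per pass */
    return count;
}
```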

 

3. O(m+n), O(m*n)

Now let's look at a kind of time complexity different from all the previous ones: code whose complexity is determined by the sizes of two pieces of data. As usual, look at the code first!

int cal(int m, int n) {
  int sum_1 = 0;
  int i = 1;
  for (; i < m; ++i) {
    sum_1 = sum_1 + i;
  }

  int sum_2 = 0;
  int j = 1;
  for (; j < n; ++j) {
    sum_2 = sum_2 + j;
  }

  return sum_1 + sum_2;
}

As you can see from the code, m and n represent two data sizes. We cannot tell in advance which of m and n has the larger magnitude, so when expressing the complexity we cannot simply apply the addition rule and omit one of them. The time complexity of the code above is therefore O(m+n).

For this case, the original addition rule no longer holds; we need to change it to: T1(m) + T2(n) = O(f(m) + g(n)). The multiplication rule, however, remains valid: T1(m) * T2(n) = O(f(m) * g(n)).

 

 

Space complexity analysis

We have spent a long time on big O notation and time complexity analysis. If you've understood all of that, space complexity analysis is very simple to learn.

As I mentioned earlier, the full name of time complexity is asymptotic time complexity, which represents the growth relationship between an algorithm's execution time and the data size. By analogy, the full name of space complexity is asymptotic space complexity, representing the growth relationship between an algorithm's storage usage and the data size.

Let me use a concrete example again. (This code is a bit "silly"; no one would normally write it this way. I wrote it like this only to make the explanation easier.)

 

void print(int n) {
  int i = 0;
  int[] a = new int[n];
  for (; i < n; ++i) {
    a[i] = i * i;
  }

  for (i = n-1; i >= 0; --i) {
    System.out.println(a[i]);
  }
}

 

Just as with time complexity analysis, we can see that line 2 allocates space for the variable i, but it is constant-order and unrelated to the data size n, so we can ignore it. Line 3 allocates an int array of size n; apart from that, no code occupies any more space. So the space complexity of the whole piece of code is O(n).

The common space complexities are O(1), O(n), and O(n²); logarithmic classes like O(logn) and O(nlogn) rarely come up in practice. Moreover, space complexity analysis is much simpler than time complexity analysis. So for space complexity, mastering what I've just covered is enough.

 

 

Summary

That covers the basics of complexity analysis. Let's summarize.

Complexity, also called asymptotic complexity, includes time complexity and space complexity. It is used to analyze the growth relationship between an algorithm's execution efficiency and the data size. Roughly speaking, the higher the order of an algorithm's complexity, the lower its execution efficiency. The common complexities are not many; from low to high order they are O(1), O(logn), O(n), O(nlogn), and O(n²). Once you finish this column, you will find that the complexity of almost every data structure and algorithm falls within these few classes.

 

Complexity analysis is not hard; the key is practice. In later lessons I will walk you through the time and space complexity of every data structure and algorithm in detail. If you follow my approach and keep practicing, you will soon be able to see the complexity of simple code at a glance, just as I can, and work out the harder cases with a little analysis.

 

 

 

 


Note: This article comes from Geek Time (The Beauty of Data Structures and Algorithms); please support teacher Wang Zheng. If there is any infringement, please notify me promptly.

 


Origin www.cnblogs.com/zzd0916/p/11926793.html