Data Structure and Algorithm II Algorithm Analysis

1. Algorithm Analysis

As introduced earlier, the ultimate goal of studying algorithms is to fulfill the same requirements while spending less time and occupying less memory, and we have already demonstrated through examples how different algorithms differ in time and space consumption. But we have not yet quantified that time and space, so next we will learn how to describe and analyze an algorithm's time and space consumption. Analyzing an algorithm's time consumption is called time complexity analysis, and analyzing its space consumption is called space complexity analysis.

1.1 Time complexity analysis of the algorithm

To calculate an algorithm's time consumption, we first have to measure its execution time. So how do we measure it?

After-the-fact analysis estimation method:

The most obvious approach is to run the algorithm a few times and time it with a stopwatch. This after-the-fact statistical method looks appealing, and we don't even need a real stopwatch, because computers provide timing functions. The method uses the computer's timer, together with designed test programs and test data, to compare the running times of programs implementing different algorithms and thereby judge their efficiency. But it has serious drawbacks: a working test program must first be implemented for each algorithm, which usually takes a lot of time and energy; if the test then reveals a poor algorithm, all that work was in vain; and different test environments (hardware environments) can produce very different results.

public static void main(String[] args) {
	long start = System.currentTimeMillis();
	int sum = 0;
	int n = 100;
	for (int i = 1; i <= n; i++) {
		sum += i;
	}
	System.out.println("sum=" + sum);
	long end = System.currentTimeMillis();
	System.out.println(end - start);
}
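For what it's worth, the same measurement can be taken with System.nanoTime(), which offers finer resolution than System.currentTimeMillis() (the class name and the larger n below are illustrative):

```java
public class TimingDemo {
	public static void main(String[] args) {
		// System.nanoTime() gives nanosecond-resolution elapsed time,
		// better suited to timing short code than currentTimeMillis()
		long start = System.nanoTime();
		long sum = 0;
		int n = 100_000_000;
		for (int i = 1; i <= n; i++) {
			sum += i;
		}
		long end = System.nanoTime();
		System.out.println("sum=" + sum);        // sum = n(n+1)/2
		System.out.println("elapsed ns: " + (end - start));
	}
}
```

The elapsed time still varies from run to run and machine to machine, which is exactly the defect of after-the-fact measurement described above.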

Pre-analysis and estimation method:
Before the program is written, estimate the algorithm statistically. Summarizing, we find that the time a program written in a high-level language takes to run on a computer depends on the following factors:

  1. The strategy and approach adopted by the algorithm;
  2. The quality of the code produced by the compiler;
  3. The input scale of the problem (that is, the amount of input);
  4. The speed at which the machine executes instructions.

It follows that, leaving aside the factors tied to computer hardware and software, a program's running time depends on the quality of the algorithm and the input scale of the problem.

If the algorithm is fixed, then the execution time of the algorithm is only related to the input size of the problem.
Let's take the previous summation case as an example again for analysis.

Requirements:
Calculate the sum from 1 to 100.
The first solution:

//If the input size n is 1, 1 computation is needed;
//if the input size n is 100 million, 100 million computations are needed.
public static void main(String[] args) {
	int sum = 0;  // executed once
	int n = 100;  // executed once
	for (int i = 1; i <= n; i++) {  // condition checked n+1 times
		sum += i;  // executed n times
	}
	System.out.println("sum=" + sum);
}

The second solution:

//If the input size n is 1, 1 computation is needed;
//if the input size n is 100 million, still only 1 computation is needed.
public static void main(String[] args) {
	int sum = 0;          // executed once
	int n = 100;          // executed once
	sum = (n+1)*n/2;      // executed once
	System.out.println("sum=" + sum);
}

Therefore, when the input size is n, the first algorithm is executed 1+1+(n+1)+n=2n+3 times; the second algorithm is executed 1+1+1=3 times. If we regard the loop body of the first algorithm as a whole and ignore the judgment of the end condition, then the difference between the running time of the two algorithms is actually the difference between n and 1.

Why is the loop judgment executed n+1 times in Algorithm 1, which seems to be a large number, but it can be ignored? Let's look at the next example:

Requirement:
Calculate the result of 100×1 + 100×2 + 100×3 + … + 100×100.
Code:

public static void main(String[] args) {
	int sum = 0;
	int n = 100;
	for (int i = 1; i <= n; i++) {
		for (int j = 1; j <= n; j++) {
			sum += i;
		}
	}
	System.out.println("sum=" + sum);
}

In the example above, precisely working out how many times the loop conditions execute is quite troublesome. Since the code that actually computes the sum is the body of the inner loop, when studying an algorithm's efficiency we consider only the execution count of the core code, which simplifies the analysis.

When we study an algorithm's complexity, we focus on an abstraction (a law) of how the algorithm grows as the input scale keeps increasing, rather than pinpointing exactly how many times it executes; otherwise we would also have to account for issues such as compile-time optimization, and it would be easy to mistake the secondary for the primary.

We don't care which language the program is written in, nor what kind of computer it will run on; we care only about the algorithm it implements. Accordingly, ignoring loop-index increments, loop termination conditions, variable declarations, printing of results, and so on, the key when analyzing running time is to view the program as an algorithm, or a series of steps, independent of any programming language, and to relate the number of core operations to the input scale.


1.1.1 Asymptotic growth of functions

Concept
Given two functions f(n) and g(n), if there exists an integer N such that f(n) is always greater than g(n) for all n>N, then we say f(n) grows asymptotically faster than g(n).

The concept seems a bit difficult to understand, so let's do a few tests next.

Test 1:
Assume that the input sizes of the four algorithms are all n:

  1. Algorithm A1 does 2n+3 operations, which can be understood as: execute n loop iterations, then another n iterations, and finally 3 more operations;
  2. Algorithm A2 does 2n operations;
  3. Algorithm B1 does 3n+1 operations, which can be understood as: execute n iterations three times in a row, and finally 1 more operation;
  4. Algorithm B2 does 3n operations.

So, which of the above algorithms is faster?
(Data table and line chart of execution counts omitted.)
From the data table, comparing Algorithm A1 and Algorithm B1:
when the input scale n=1, A1 executes 5 times and B1 executes 4 times, so A1 is less efficient than B1;
when n=2, A1 executes 7 times and B1 executes 7 times, so A1 and B1 are equally efficient;
when n>2, A1 always executes fewer times than B1, so A1 is more efficient than B1.

So we can conclude that:

when the input scale n>2, Algorithm A1 grows asymptotically more slowly than Algorithm B1.
Observing the line chart, we find that as the input scale increases, Algorithm A1 and Algorithm A2 gradually overlap, and Algorithm B1 and Algorithm B2 gradually overlap. So we conclude that
as the input size increases, an algorithm's constant-count operations are negligible.

Test 2:
Assume that the input sizes of the four algorithms are all n:

  1. Algorithm C1 needs to do 4n+8 operations
  2. Algorithm C2 needs to do n operations
  3. Algorithm D1 needs to do 2n^2 operations
  4. Algorithm D2 needs to do n^2 operations

So which of the above algorithms is faster?
(Data table and line chart of execution counts omitted.)
From the data table, comparing Algorithm C1 and Algorithm D1:
when the input scale n<=3, Algorithm C1 executes more times than Algorithm D1, so C1 is less efficient;
when n>3, Algorithm C1 executes fewer times than Algorithm D1, so D1 is less efficient;
so, overall, Algorithm C1 is better than Algorithm D1.

Through the line chart, compare and contrast algorithms C1 and C2:

As the input size increases, Algorithm C1 and Algorithm C2 almost overlap

From the line chart, comparing the C-series and D-series algorithms:
as the input scale increases, even with the constant factor in front of n^2 removed, the execution count of the D series is far higher than that of the C series.

Therefore, it can be concluded that
as the input size increases, the constant multiplied by the highest order term can be ignored

Test 3:
Suppose the input scale of the four algorithms is n:
Algorithm E1: 2n^2+3n+1;
Algorithm E2: n^2
Algorithm F1: 2n^3+3n+1
Algorithm F2: n^3
So which of the above algorithms is faster?

(Data table and line chart of execution counts omitted.)
From the data table, comparing Algorithm E1 and Algorithm F1:
when n=1, Algorithm E1 and Algorithm F1 execute the same number of times;
when n>1, Algorithm E1 executes far fewer times than Algorithm F1;
so Algorithm E1 is overall superior to Algorithm F1.

From the line chart, we can see that the F-series algorithms grow especially fast as n increases, while the E-series grows slowly by comparison. So we can conclude that
the larger the exponent of the highest-order term, the faster the result grows with n.

Test 4:
Assume that the input size of the five algorithms is n:
Algorithm G: n^3;
Algorithm H: n^2;
Algorithm I: n
Algorithm J: logn
Algorithm K: 1

So which of the above algorithms is more efficient?
(Data table and line chart of execution counts omitted.)
By observing the data tables and line charts, it is easy to draw a conclusion:
the smaller the highest power of n in the algorithm function, the higher the algorithm efficiency

In summary, when we compare the growth of the algorithm with the input scale, the following rules can be followed:

  1. Constants in algorithmic functions can be ignored;
  2. The constant factor of the highest power in the algorithm function can be ignored;
  3. The smaller the highest power in the algorithm function, the higher the algorithm efficiency.

1.1.2 Algorithm time complexity

1.1.2.1 Big O notation

Definition

When analyzing an algorithm, the total number of statement executions T(n) is a function of the problem size n. We analyze how T(n) varies with n and determine its order of magnitude. The time complexity of the algorithm is the algorithm's time measure, denoted T(n)=O(f(n)). It means that as the problem size n increases, the growth rate of the algorithm's execution time is the same as the growth rate of f(n); this is called the asymptotic time complexity of the algorithm, or time complexity for short, where f(n) is some function of the problem size n.

Here we need to make one thing clear: we treat the number of executions as equivalent to execution time.
The notation O() used to express an algorithm's time complexity is called big O notation. In general, as the input size n increases, the algorithm whose T(n) grows most slowly is the optimal algorithm.

Below we use big O notation to express the time complexity of some summation algorithms:

Algorithm one

public static void main(String[] args) {
	int sum = 0;          // executed once
	int n = 100;          // executed once
	sum = (n+1)*n/2;      // executed once
	System.out.println("sum=" + sum);
}

Algorithm two

public static void main(String[] args) {
	int sum = 0;  // executed once
	int n = 100;  // executed once
	for (int i = 1; i <= n; i++) {
		sum += i;  // executed n times
	}
	System.out.println("sum=" + sum);
}

Algorithm three

public static void main(String[] args) {
	int sum = 0;  // executed once
	int n = 100;  // executed once
	for (int i = 1; i <= n; i++) {
		for (int j = 1; j <= n; j++) {
			sum += i;  // executed n^2 times
		}
	}
	System.out.println("sum=" + sum);
}

If we ignore the execution counts of the loop conditions and the output statements, then for input size n the above algorithms execute:
Algorithm 1: 3 times
Algorithm 2: n+3 times
Algorithm 3: n^2+2 times

If the time complexity of each of the above algorithms is expressed in big O notation, how should it be expressed? Based on our analysis of the asymptotic growth of the function, the following rules can be used to derive the representation of the big O order:

  1. Replace all additive constants in runtime with constant 1;
  2. In the modified number of runs, only higher-order terms are kept;
  3. If the highest-order term exists, and the constant factor is not 1, remove the constant multiplied by this term;

Therefore, the big O notation of the above algorithms are:
Algorithm 1: O(1)
Algorithm 2: O(n)
Algorithm 3: O(n^2)
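To see the derivation rules in action on a mixed case, consider a hypothetical method combining a single loop with a nested loop (the class name and the operation counter are illustrative):

```java
public class BigOExample {
	// Hypothetical method mixing a single loop (n core operations)
	// with a nested loop (n^2 core operations): T(n) = n^2 + n.
	// Keeping only the highest-order term gives O(n^2).
	static long countOperations(int n) {
		long count = 0;
		int sum = 0;
		for (int i = 1; i <= n; i++) {      // n core operations
			sum += i;
			count++;
		}
		for (int i = 1; i <= n; i++) {      // n^2 core operations
			for (int j = 1; j <= n; j++) {
				sum += j;
				count++;
			}
		}
		return count;
	}

	public static void main(String[] args) {
		System.out.println(countOperations(100)); // n^2 + n = 10100
	}
}
```

Dropping the lower-order term n leaves n^2 with constant factor 1, so by the rules above the order is O(n^2).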

1.1.2.2 Common big-O order

1. Linear order
Non-nested loops generally involve the linear order: as the input scale grows, the number of operations grows linearly. For example:

public static void main(String[] args) {
	int sum = 0;
	int n = 100;
	for (int i = 1; i <= n; i++) {
		sum += i;
	}
	System.out.println("sum=" + sum);
}

For the above code, the time complexity of the loop is O(n), because the loop body executes n times.

2. Square order
Nested loops generally belong to this time complexity:

public static void main(String[] args) {
	int sum = 0, n = 100;
	for (int i = 1; i <= n; i++) {
		for (int j = 1; j <= n; j++) {
			sum += i;
		}
	}
	System.out.println(sum);
}

In the above code, n=100; that is, each time the outer loop runs once, the inner loop runs 100 times. To get out of both loops, the program must execute 100*100 times, the square of n, so the time complexity of this code is O(n^2).
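Note that the inner loop need not run a full n times on every pass for the code to be square order. In the following sketch the inner loop starts at i, so it runs n + (n-1) + ... + 1 = n(n+1)/2 times; after dropping the constant factor 1/2 and the lower-order term, that is still O(n^2):

```java
public class TriangularLoop {
	public static void main(String[] args) {
		int count = 0;
		int n = 100;
		for (int i = 1; i <= n; i++) {
			for (int j = i; j <= n; j++) {  // inner loop runs n-i+1 times
				count++;
			}
		}
		// total = n + (n-1) + ... + 1 = n(n+1)/2 = 5050, still O(n^2)
		System.out.println(count);
	}
}
```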

3. Cubic order
Three-level nested loops generally belong to this time complexity:

public static void main(String[] args) {
	int x = 0, n = 100;
	for (int i = 1; i <= n; i++) {
		for (int j = 1; j <= n; j++) {
			for (int k = 1; k <= n; k++) {
				x++;
			}
		}
	}
	System.out.println(x);
}

In the above code, n=100; that is, each iteration of the outer loop runs the middle loop 100 times, and each iteration of the middle loop runs the innermost loop 100 times. Getting out of all three loops therefore requires 100×100×100 executions, the cube of n, so the time complexity of this code is O(n^3).

4. Logarithmic order
Logarithms are high-school mathematics. Our analysis is driven mainly by the program itself, with mathematics as a supplement, so don't worry too much.

int i = 1, n = 100;
while (i < n) {
	i = i * 2;
}

Each multiplication by 2 brings i one step closer to n. Suppose the loop exits after x multiplications by 2, i.e., when 2^x = n; then x = log₂n, so the time complexity of this loop is O(logn).

For the logarithmic order, as the input scale n increases, the growth trend is the same no matter what the base is, so we ignore the base.
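A classic algorithm of logarithmic order is binary search on a sorted array: each comparison halves the remaining search range, so at most about log₂n iterations are needed. A minimal sketch (the array and targets are illustrative):

```java
public class BinarySearch {
	// Returns the index of target in the sorted array, or -1 if absent.
	// The search range halves every iteration, so the loop body runs
	// at most about log2(n) times: O(logn).
	static int search(int[] sorted, int target) {
		int lo = 0, hi = sorted.length - 1;
		while (lo <= hi) {
			int mid = lo + (hi - lo) / 2;
			if (sorted[mid] == target) return mid;
			if (sorted[mid] < target) lo = mid + 1;
			else hi = mid - 1;
		}
		return -1;
	}

	public static void main(String[] args) {
		int[] arr = {0, 7, 8, 9, 10, 11, 22, 23};
		System.out.println(search(arr, 22));  // found at index 6
		System.out.println(search(arr, 5));   // not present
	}
}
```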

5. Constant order
Code that involves no loops is generally constant order, because its number of operations does not increase as n grows. For example:

public static void main(String[] args) {
	int n = 100;
	int i = n + 2;
	System.out.println(i);
}

The above code executes a fixed number of times regardless of the input size n. By the big-O derivation rules, the constant is replaced by 1, so its time complexity is O(1).

Here is a summary of common time complexities:

Their complexity, from low to high, is:

 O(1)<O(logn)<O(n)<O(nlogn)<O(n^2)<O(n^3)

From the earlier line-chart analysis, we can see that starting from the square order, the time cost rises sharply as the input scale increases. Therefore our algorithms should pursue the complexities O(1), O(logn), O(n), and O(nlogn); if an algorithm turns out to be square order, cubic order, or even more complex, we can say the algorithm is not advisable and needs to be optimized.

1.1.2.3 Time Complexity Analysis of Function Call

Before, we analyzed the time complexity of the algorithm code in a single function, and then we analyzed the time complexity during the function call.

Case number one:

public static void main(String[] args) {
	int n = 100;
	for (int i = 0; i < n; i++) {
		show(i);
	}
}

private static void show(int i) {
	System.out.println(i);
}

In the main method, there is a for loop whose body calls the show method. Since the show method executes only one line of code, its time complexity is O(1), and the time complexity of the main method is O(n).

Case two:

public static void main(String[] args) {
	int n = 100;
	for (int i = 0; i < n; i++) {
		show(i);
	}
}

private static void show(int i) {
	for (int j = 0; j < i; j++) {
		System.out.println(i);
	}
}

In the main method, there is a for loop whose body calls the show method. Since the show method itself contains a for loop, its time complexity is O(n), and the time complexity of the main method is O(n^2).

Case three:

public static void main(String[] args) {
	int n = 100;
	show(n);
	for (int i = 0; i < n; i++) {
		show(i);
	}
	for (int i = 0; i < n; i++) {
		for (int j = 0; j < n; j++) {
			System.out.println(j);
		}
	}
}

private static void show(int i) {
	for (int j = 0; j < i; j++) {
		System.out.println(i);
	}
}

In the show method there is a for loop, so its time complexity is O(n). In the main method, the call show(n) performs n operations; the first for loop calls the show method, so it contributes roughly n^2 operations; the second, nested for loop executes a single line of code n^2 times. The total for the main method is therefore about n + n^2 + n^2 = 2n^2 + n. By the big-O derivation rules, we drop the lower-order term n and the constant factor 2 of the highest-order term, so the time complexity of the main method is O(n^2).
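The rough count above treats each call show(i) as n operations. If we are curious, the count can be checked empirically by replacing every println with a counter (a sketch; the class name is illustrative). The exact total, n + n(n-1)/2 + n^2, differs from the rough 2n^2 + n, but its highest-order term is still n^2, so the conclusion O(n^2) is unchanged:

```java
public class CallCount {
	static long count = 0;

	static void show(int i) {
		for (int j = 0; j < i; j++) {
			count++;  // stands in for the println in the original show method
		}
	}

	public static void main(String[] args) {
		int n = 100;
		show(n);                        // exactly n operations
		for (int i = 0; i < n; i++) {   // show(i) does i operations: 0+1+...+(n-1) = n(n-1)/2
			show(i);
		}
		for (int i = 0; i < n; i++) {   // exactly n^2 operations
			for (int j = 0; j < n; j++) {
				count++;
			}
		}
		// exact total: n + n(n-1)/2 + n^2 = 15050 for n=100;
		// the highest-order term is still n^2, so the order is O(n^2)
		System.out.println(count);
	}
}
```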

1.1.2.4 Worst case

From a psychological point of view, everyone has an expectation for what happens. For example, when seeing a half glass of water, someone will say: Wow, there is still half a glass of water! But some people will say: God, there is only half a glass of water. Most people are worried about future failures, and tend to plan for the worst when anticipating. In this way, even if the worst result occurs, the parties are psychologically prepared and it is easier to accept the result. If the worst outcome does not come, the parties will be very happy.

Algorithm analysis is similar. Suppose there is a requirement:
an array stores n random numbers; find a specified number in it.

public int search(int num){
	int[] arr = {11, 10, 8, 9, 7, 22, 23, 0};
	for (int i = 0; i < arr.length; i++) {
		if (num == arr[i]) {
			return i;
		}
	}
	return -1;
}

Best case:
the first number you look up is the expected number, then the time complexity of the algorithm is O(1)

Worst case:
the last number to find is the expected number, then the time complexity of the algorithm is O(n)

Average case:
on average, a lookup examines about n/2 elements, which is still O(n).

The worst case is a guarantee: it is the most basic assurance that, even in the worst case, the service can still be provided normally. Therefore, unless otherwise specified, the running time we refer to is the worst-case running time.
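The average case can also be checked empirically: if each element of the array is looked up once, the i-th lookup costs i+1 comparisons, so the average is (1+2+...+n)/n = (n+1)/2 ≈ n/2. A sketch using a comparison counter (the counter and class name are illustrative):

```java
public class AverageCase {
	static int comparisons = 0;

	static int search(int[] arr, int num) {
		for (int i = 0; i < arr.length; i++) {
			comparisons++;              // count each element examined
			if (num == arr[i]) return i;
		}
		return -1;
	}

	public static void main(String[] args) {
		int[] arr = {11, 10, 8, 9, 7, 22, 23, 0};
		// look up every element once: lookup of arr[i] costs i+1 comparisons
		for (int x : arr) {
			search(arr, x);
		}
		// average = (1+2+...+8)/8 = (n+1)/2 = 4.5 for n=8
		System.out.println((double) comparisons / arr.length);
	}
}
```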

1.2 Space complexity analysis of the algorithm

Computer hardware and software have gone through a relatively long evolution. Memory, the environment that supports computation, has grown from the early 512K through 1M, 2M, 4M, and so on, to today's 8G, or even 16G and 32G. In the early days, an algorithm's memory usage during execution was therefore a problem that often had to be considered. We use the space complexity of the algorithm to describe its memory usage.

1.2.1 Common memory usage in java

1. Memory occupied by basic data types: byte and boolean occupy 1 byte; short and char occupy 2 bytes; int and float occupy 4 bytes; long and double occupy 8 bytes.

2. The way a computer accesses memory is one byte at a time
3. A reference (machine address) takes 8 bytes to represent:
for example, given Date date = new Date(), the variable date occupies 8 bytes.

4. Creating an object, such as new Date(), incurs overhead beyond the memory for the data stored inside the Date object (year, month, day, etc.): each object carries 16 bytes of overhead to hold its header information.

5. In general, memory usage is padded up to a multiple of 8 bytes:
6. Arrays in Java are objects, and they generally require extra memory to record their length. An array of a primitive type generally needs 24 bytes of header information (16 bytes of object overhead, 4 bytes for the length, and 4 padding bytes) plus the memory needed to hold the values.
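Putting rules 4-6 together, the memory of an int[n] can be estimated as 24 bytes of header information plus 4n bytes of data, rounded up to a multiple of 8. A small sketch (the helper name is illustrative, not a JVM API):

```java
public class ArrayMemory {
	// Estimated bytes for an int[n], per the rules above:
	// 16 bytes object overhead + 4 bytes length + 4 padding bytes
	// + 4*n data bytes, rounded up to a multiple of 8.
	static long estimateIntArrayBytes(int n) {
		long bytes = 24 + 4L * n;
		return (bytes + 7) / 8 * 8;  // round up to a multiple of 8
	}

	public static void main(String[] args) {
		System.out.println(estimateIntArrayBytes(100));  // 24 + 400 = 424
	}
}
```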

1.2.2 Space complexity of the algorithm

Understanding Java's most basic memory mechanisms effectively helps us estimate the memory usage of many programs.
The space complexity of an algorithm is written as S(n)=O(f(n)), where n is the input size and f(n) is a function of n describing the storage space the statements occupy.

Case:
Reverse the elements of a given array and return the reversed result.
Solution one:

public static int[] reverse1(int[] arr){
	int n = arr.length;  // allocates 4 bytes
	int temp;            // allocates 4 bytes
	for (int start = 0, end = n - 1; start <= end; start++, end--) {
		temp = arr[start];
		arr[start] = arr[end];
		arr[end] = temp;
	}
	return arr;
}

Solution two:

public static int[] reverse2(int[] arr){
	int n = arr.length;       // allocates 4 bytes
	int[] temp = new int[n];  // allocates 4*n bytes plus 24 bytes of array header overhead
	for (int i = n - 1; i >= 0; i--) {
		temp[n - 1 - i] = arr[i];
	}
	return temp;
}

Ignoring the memory occupied by the judgment condition, we get the memory usage as follows:

Algorithm 1:
Regardless of the size of the incoming array, always apply for an additional 4+4=8 bytes;

Algorithm 2:
4+4n+24=4n+28;

According to the big O derivation rule, the space complexity of Algorithm 1 is O(1), and the space complexity of Algorithm 2 is O(n), so from the perspective of space occupation, Algorithm 1 is better than Algorithm 2.
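As a sanity check, the two solutions can be run side by side; both produce the same reversed array even though their space costs differ (reverse1 mutates the input in place, reverse2 allocates a new array):

```java
import java.util.Arrays;

public class ReverseDemo {
	// In-place reversal: O(1) extra space.
	public static int[] reverse1(int[] arr) {
		for (int start = 0, end = arr.length - 1; start < end; start++, end--) {
			int temp = arr[start];
			arr[start] = arr[end];
			arr[end] = temp;
		}
		return arr;
	}

	// Copy-based reversal: O(n) extra space.
	public static int[] reverse2(int[] arr) {
		int n = arr.length;
		int[] temp = new int[n];
		for (int i = n - 1; i >= 0; i--) {
			temp[n - 1 - i] = arr[i];
		}
		return temp;
	}

	public static void main(String[] args) {
		System.out.println(Arrays.toString(reverse1(new int[]{1, 2, 3, 4})));
		System.out.println(Arrays.toString(reverse2(new int[]{1, 2, 3, 4})));
	}
}
```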

Because Java has a garbage collection mechanism and the JVM also optimizes a program's memory usage (for example, via just-in-time compilation), we cannot evaluate a Java program's memory usage precisely, but knowing Java's basic memory layout lets us estimate it.

Since modern computers generally have plenty of memory (personal computers start at 8G, and large ones reach 32G or 64G), memory usage is generally not the bottleneck of our algorithms. So when we refer simply to "complexity," we mean the algorithm's time complexity by default.

However, if you are doing embedded development, especially built-in programs on sensor devices whose memory is very small (generally a few KB), then the space complexity of the algorithm does matter. But Java developers generally do server-side development, where this is usually not an issue.

Origin blog.csdn.net/qq_33417321/article/details/121945595