[Introduction to Data Structures] A detailed explanation of algorithm time complexity and space complexity


Foreword

In the C language stage we learned some sorting and searching algorithms: bubble sort, quicksort, binary search, and so on. Which algorithm is better, and how do we measure the quality of an algorithm? This article studies the time complexity and space complexity of algorithms; I believe you will understand after reading it.


(1) Algorithm efficiency

After an algorithm is written into an executable program, it consumes time resources and space (memory) resources when it runs. Therefore, the quality of an algorithm is generally measured along two dimensions, time and space, that is, time complexity and space complexity.

Time complexity mainly measures how fast an algorithm runs, while space complexity mainly measures the extra space required for an algorithm to run.

With the rapid development of the computer industry, the storage capacity of computers has reached a very high level, so we usually do not need to pay special attention to the space efficiency of an algorithm and instead focus on its time complexity.


(2) Calculation of time complexity

1) What is time complexity

The **number of times the basic operations are executed** is the time complexity of the algorithm.

Let's go directly to an example to explain the specific calculation method.

#include <stdio.h>

// How many times is the ++count statement in Func1 executed in total?
void Func1(int N)
{
	int count = 0;
	for (int i = 0; i < N; ++i)
	{
		for (int j = 0; j < N; ++j)
		{
			++count;
		}
	}
	for (int k = 0; k < 2 * N; ++k)
	{
		++count;
	}
	int M = 10;
	while (M--)
	{
		++count;
	}
	printf("%d\n", count);
}

Execution-count function of the algorithm: F(N) = N^2 + 2 * N + 10

N = 10, F(N) = 130; N = 100, F(N) = 10210; N = 1000, F(N) = 1002010

The calculation shows that the larger N is, the less the lower-order terms affect the result. So when we actually compute time complexity, we do not need the exact number of executions, only an approximate one: we use big O asymptotic notation (an estimate) and keep only the term that has the greatest impact on the result.


2) Big O asymptotic notation (estimation)

1. Method for deriving the big O order:

  1. Replace all additive constants in run time with the constant 1.

  2. In the modified run count function, only the highest order term is kept.

  3. If the highest-order term exists and its coefficient is not 1, remove the coefficient multiplying this term. The result is the big O order.

These rules may seem a little abstract, so let's look at a few examples:

  1. Execution-count function F(N) = 10: using big O asymptotic notation, the time complexity is O(1)
  2. Execution-count function F(N) = N^2 + 2 * N + 10: using big O asymptotic notation, the time complexity is O(N^2)
  3. Execution-count function F(N) = 2 * N + 10: using big O asymptotic notation, the time complexity is O(N)

2. The time complexity of some algorithms has the best, average and worst cases:

Worst case: maximum number of runs for any input size (upper bound)

Average case: expected number of runs for any input size

Best case: Minimum number of runs (lower bound) for any input size

Example: search for a value X in an array of length N

Best case: found after 1 comparison

Worst case: found after N comparisons (we generally go by the worst case)

Average case: found after about N/2 comparisons

In practice we generally focus on the worst-case behavior of an algorithm, which is a kind of bottom-line thinking: nothing can be worse than that, haha. So the time complexity of searching for a value in the array is O(N); a minimal sketch of such a linear search is shown below.
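As a quick illustration (not code from the original article; the function name Find is just a placeholder I chose), a linear scan over the array shows where the best, worst and average cases come from:

#include <stddef.h>

// Return the index of x in a[0..n-1], or -1 if it is not present.
// Best case: x == a[0], 1 comparison. Worst case: x is last or absent, N comparisons.
int Find(const int* a, size_t n, int x)
{
	for (size_t i = 0; i < n; ++i)
	{
		if (a[i] == x)
			return (int)i;
	}
	return -1;
}

Whichever position x sits in, the loop body runs at most n times, which is where the O(N) worst case comes from.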


3) Time complexity calculation examples

Example 1:

// Compute the time complexity of Func2
void Func2(int N)
{
	int count = 0;
	for (int k = 0; k < 2 * N; ++k)
	{
		++count;
	}
	int M = 10;
	while (M--)
	{
		++count;
	}
	printf("%d\n", count);
}

Execution times function F(N) = 2 * N + 10, after using the big O asymptotic notation, the time complexity is: O(N)

(keep the item with the greatest impact and remove the coefficient)

Example 2:

// Compute the time complexity of Func3
void Func3(int N, int M)
{
	int count = 0;
	for (int k = 0; k < M; ++k)
	{
		++count;
	}
	for (int k = 0; k < N; ++k)
	{
		++count;
	}
	printf("%d\n", count);
}

The execution-count function is F(M, N) = M + N. There are two unknowns, M and N, and we do not know which one is larger,

so the time complexity can be written as O(M + N) or O(max(M, N)).

With additional conditions:

If it can be shown that M is much larger than N, then O(M)

If it can be shown that N is much larger than M, then O(N)

If it can be shown that M and N are about the same size, then O(M) or O(N)

(This question depends on the specific scene and whether there are specific restrictions)

Example 3:

// Compute the time complexity of Func4
void Func4(int N)
{
	int count = 0;
	for (int k = 0; k < 100; ++k)
	{
		++count;
	}
	printf("%d\n", count);
}

Execution-count function F(N) = 100: using big O asymptotic notation, the time complexity is O(1)

(The 1 in O(1) does not mean 1 execution, but a constant number of executions)

Example 4:

// Compute the time complexity of strchr
// strchr - locates the first occurrence of a character in a string
const char* strchr(const char* str, int character);
// Rough internal logic of the function:
while (*str)
{
	if (*str == character)
		return str;
	else
		++str;
}
return NULL;  // character not found

In the worst case the if statement in the loop performs N comparisons (N being the string length), so the time complexity is O(N).

Example 5:

#include <assert.h>

// Compute the time complexity of bubble sort BubbleSort
// (Swap exchanges two ints and is defined elsewhere)
void BubbleSort(int* a, int n)
{
	assert(a);
	for (size_t end = n; end > 0; --end)
	{
		int exchange = 0;
		for (size_t i = 1; i < end; ++i)
		{
			if (a[i - 1] > a[i])
			{
				Swap(&a[i - 1], &a[i]);
				exchange = 1;  // an exchange happened, set the flag to 1
			}
		}
		if (exchange == 0)  // no exchange happened, the array is already sorted
			break;
	}
}

If the input is already in ascending order, the sort finishes after one pass, so the best-case time complexity of bubble sort is O(n).

If the input is in reverse order, then with n numbers it takes n-1 passes, and each pass walks through the part still to be sorted, comparing adjacent elements in turn. In this case the number of comparisons and exchanges reaches its maximum:

The 1st pass compares n-1 times,

the 2nd pass compares n-2 times,

the 3rd pass compares n-3 times,

……,

the (n-2)th pass compares 2 times,

the (n-1)th pass compares 1 time.

Over these n-1 passes, the cumulative number of times the if statement executes is:

(n-1) + (n-2) + (n-3) + …… + 3 + 2 + 1 = (n - 1)(1 + n - 1) / 2 = n(n - 1) / 2

The formulas for the sum of the first n terms of an arithmetic sequence are:

  • Sn = n * a1 + n(n - 1)d / 2

  • Sn = n(a1 + an) / 2

So the worst-case time complexity of bubble sort is O(n^2).

Example 6:

// Compute the time complexity of binary search BinarySearch (binary search requires a sorted array)
int BinarySearch(int* a, int n, int x)
{
	assert(a);
	int begin = 0;
	int end = n - 1;
	while (begin <= end)  // keep searching while the range [begin, end] is non-empty
	{
		//int mid = begin + (end - begin) / 2;
		int mid = begin + ((end - begin) >> 1);  // mid is the index of the middle element of the current range
		if (x > a[mid])       // greater than the middle element
			begin = mid + 1;
		else if (x < a[mid])  // less than the middle element
			end = mid - 1;
		else
			return mid;
	}
	return -1;
}

Binary (halved) search looks for a value among n numbers. Count how many times the range is halved before the value is found: that halving count gives the time complexity.

Graphical description:

[image: binary search repeatedly halving the search range]

Note: the logarithmic time complexity O(log₂N) is usually simplified to O(logN)
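To make the halving count explicit, here is a short derivation (my own working, not a formula from the original article). Suppose in the worst case the value is found after x halvings. Each halving divides the remaining range by 2, so after x halvings about N / 2^x elements remain, and the search stops when roughly one element is left:

N / 2^x ≈ 1, so 2^x ≈ N, so x ≈ log₂N

which is why binary search is O(log₂N), i.e. O(logN).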

Example 7:

// Compute the time complexity of the recursive factorial Fac
long long Fac(size_t N)
{
	if (1 == N)
		return 1;
	return Fac(N - 1) * N;
}

The recursion makes N calls (Fac(N), Fac(N-1), …, Fac(1)), and each call does a constant amount of work, so the time complexity is O(N).

[image: call chain of the recursive factorial Fac]

Example 8:

// Compute the time complexity of the recursive Fibonacci Fib
long long Fib(size_t N)
{
	if (N < 3)
		return 1;
	return Fib(N - 1) + Fib(N - 2);
}

Graphical description:

[image: recursion tree of Fib(N)]

F(N) = 1 + 2 + 4 + 8 + …… + 2^(N-3) + 2^(N-2) = (1 - 2^(N-2) * 2) / (1 - 2) = 2^(N-1) - 1

Supplement:

[image: the right side of the recursion tree is incomplete]

The right side of this recursion tree is actually incomplete: the rightmost branch does not go all the way down to level N - 1, so the real call count is F(N) minus the calls missing on the right side, which does not change the order of growth.

Using big O asymptotic notation, the time complexity is O(2^N).


4) Summary

To calculate time complexity,

count the number of times the basic operations in the algorithm are executed,

generally taking the worst case of the algorithm as the standard,

and write the final result in big O asymptotic notation (an estimate).


5) Some thoughts

  • In fact, this recursive algorithm for the Fibonacci sequence is very slow, with a lot of repeated calculation; its time complexity is O(2^N), which grows exponentially

N = 10, 2^N = 1024

N = 20, 2^N = 1 million+

N = 30, 2^N = 1 billion+

N = 40, 2^N = 1 trillion+

N = 50, 2^N = 1000 trillion+

So try to compute Fibonacci numbers non-recursively, using an array or the three-variable method (a sketch of this idea appears after this list):

keep three variables f1, f2, f3;

f1 and f2 store the two previous Fibonacci numbers Fib1 and Fib2,

the third variable f3 computes the next one, Fib3 = Fib1 + Fib2,

then shift forward: f1 stores Fib2, f2 stores Fib3, and Fib4 is computed and stored in f3;

repeat this process continuously.

  • The time complexity of binary (halved) search is O(log₂N), and its efficiency is very high

N = 1000, log₂N ≈ 10 (2^10 = 1024)

N = 1 million, log₂N ≈ 20 (2^20 = 1 million+)

N = 1 billion, log₂N ≈ 30 (2^30 = 1 billion+)

N = 1 trillion, log₂N ≈ 40 (2^40 = 1 trillion+)

From this we can see that binary search is a very good algorithm. Suppose the ID numbers of all Chinese people (about 1.4 billion) have been sorted and stored; at most how many lookups does it take to find one person?

log₂(1.4 billion) ≈ 31 lookups (because 2^30 = 1 billion+ and 2^31 = 2 billion+)

But binary search also has a fatal limitation: the data must be sorted, and keeping data sorted is itself a fairly large undertaking. So in practice binary search is not used that much; searching relies more on a data structure we will cover later: the search tree.
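To make the three-variable Fibonacci idea above concrete, here is a minimal sketch (not code from the original article; the function name FibIterative is my own, and it is written to return the same values as the recursive Fib above):

#include <stddef.h>  // size_t

// Iterative Fibonacci with three rolling variables: O(N) time, no recursion.
long long FibIterative(size_t N)
{
	if (N < 3)
		return 1;  // Fib(1) = Fib(2) = 1, matching the recursive Fib above
	long long f1 = 1, f2 = 1, f3 = 0;
	for (size_t i = 3; i <= N; ++i)
	{
		f3 = f1 + f2;  // Fib(i) = Fib(i-1) + Fib(i-2)
		f1 = f2;       // shift the window forward
		f2 = f3;
	}
	return f3;
}

Each Fibonacci number is computed exactly once, so the time complexity drops from O(2^N) to O(N), and only three variables are used.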


(3) Calculation of space complexity

Space complexity is a measure of **the size of the storage space an algorithm temporarily occupies while it runs**.

Space complexity is not the number of bytes the program occupies, because that is not very meaningful; space complexity is counted in terms of the number of variables.

The space complexity calculation rules are basically similar to the time complexity, and the big O asymptotic notation is also used.

Note: the stack space a function needs at run time (to store parameters, local variables, some register information, etc.) is determined during compilation, so space complexity is mainly determined by the extra space the function requests at run time.

Example 1:

// Compute the space complexity of BubbleSort
void BubbleSort(int* a, int n)
{
	assert(a);
	for (size_t end = n; end > 0; --end)
	{
		int exchange = 0;
		for (size_t i = 1; i < end; ++i)
		{
			if (a[i - 1] > a[i])
			{
				Swap(&a[i - 1], &a[i]);
				exchange = 1;
			}
		}
		if (exchange == 0)
			break;
	}
}

The array a is not created by this algorithm (it is passed in), so it does not participate in the calculation. While the algorithm runs it defines a constant number of variables (end, exchange, i), which temporarily occupy a constant amount of storage, so the space complexity is O(1).

At this point some people may have doubts: exchange is redefined in every iteration of the outer loop, so why does it only take constant space?

Because the space can be reused: each function call opens up one stack frame, which stores the local variables, parameters, and so on that the function needs while it runs, and a variable declared inside a loop reuses space within that same frame.

[image: stack frame illustration]

Example 2:

#include <stdlib.h>

// Compute the space complexity of Fibonacci
// Returns the first n terms of the Fibonacci sequence
long long* Fibonacci(size_t n)
{
	if (n == 0)
		return NULL;
	long long* fibArray = (long long*)malloc((n + 1) * sizeof(long long));
	fibArray[0] = 0;
	fibArray[1] = 1;
	for (int i = 2; i <= n; ++i)
	{
		fibArray[i] = fibArray[i - 1] + fibArray[i - 2];
	}
	return fibArray;
}

While running, the algorithm uses malloc to dynamically allocate N + 1 temporary storage slots, so the space complexity is O(N).

Example 3:

// Compute the space complexity of the recursive factorial Fac
long long Fac(size_t N)
{
	if (N == 1)
		return 1;
	return Fac(N - 1) * N;
}

The recursion makes N calls (Fac(N) down to Fac(1)), so up to N stack frames are open at the same time; each stack frame uses a constant amount of space, so the space complexity is O(N).

[image: stack frames opened by the recursive calls of Fac]

Consider this question:

// Compute the space complexity of the recursive Fibonacci Fib
long long Fib(size_t N)
{
	if (N < 3)
		return 1;
	return Fib(N - 1) + Fib(N - 2);
}

Stack space can be reused: a frame is opened when a call is made and destroyed when it returns. At any moment at most N - 1 levels of recursion are alive, i.e. at most N - 1 stack frames exist at once; each stack frame uses a constant amount of space, so the space complexity is O(N).

[image: maximum recursion depth reached by Fib]

(4) Common complexity comparison

[image: growth curves of common time complexities]

Time complexity comparison: O(1) < O(logn) < O(n) < O(nlogn) < O(n^2) < O(n^3) < O(2^n)
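To give a concrete feel for this ordering, here are rough illustrative counts for n = 1000 (my own figures, not from the original article):

n = 1000: O(1) → 1, O(logn) → about 10, O(n) → 1000, O(nlogn) → about 10000, O(n^2) → 1 million, O(n^3) → 1 billion, O(2^n) → about 10^301, astronomically large.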


That's all for complexity. See you next time!


Origin blog.csdn.net/weixin_48025315/article/details/119156170