Lower time complexity but slower program execution?

I worked on an algorithm problem yesterday and found that the program with the lower time complexity executed more slowly. Digging into why gave me a deeper understanding of program and algorithm performance. Everything below refers to the Go implementation. I submitted on the English LeetCode site; the results differ from the Chinese site, and I don't know whether the two measure differently.

LeetCode Problem 239: Sliding Window Maximum

I could only come up with the brute force solution: after each slide discards one value and adds a new one, find the maximum in the window, for a time complexity of O((n-k+1)*k). The official solution recommends a deque to achieve linear time complexity, so as an algorithm novice I expected my brute force solution to score badly.

However, after my deque solution was accepted, the fastest submission I saw turned out to be a brute force method. Is LeetCode's measurement inaccurate? Let me run a benchmark to verify.


This is the fastest brute force solution among the submissions:

func maxSlidingWindow(nums []int, k int) []int {
	res := make([]int, len(nums)-k+1)
	max := -10000 // sentinel below the problem's value range

	for i := 0; i <= len(nums)-k; i++ {
		// If the previous maximum is still valid (the value that left the
		// window wasn't it, and the new value doesn't beat it), reuse it.
		if max > -10000 && nums[i-1] != max && nums[i+k-1] < max {
			res[i] = max
			continue
		} else {
			max = -10000
		}
		// Otherwise rescan the whole window.
		for _, v := range nums[i : i+k] {
			if v > max {
				max = v
			}
		}
		res[i] = max
	}
	return res
}

I tested it with a simple case:

func Benchmark_maxSlidingWindow(b *testing.B) {
	a := []int{9, 10, 9, -7, -4, -8, 2, -6}
	for i := 0; i < b.N; i++ {
		_ = maxSlidingWindow(a, 5)
	}
}

The result is 38.4 ns/op

Benchmark_maxSlidingWindow-8   	27957754	        38.4 ns/op	      32 B/op	       1 allocs/op

This is the deque-based algorithm:

func maxSlidingWindow(nums []int, k int) []int {
	res := make([]int, len(nums)-k+1)
	win := make([]int, 0, k) // values kept in decreasing order
	for i, v := range nums {
		// Pop smaller values off the back; they can never be a maximum.
		for len(win) > 0 && win[len(win)-1] < v {
			win = win[:len(win)-1]
		}
		win = append(win, v)
		// Drop the front if it is the value that just slid out of the window.
		if i >= k && win[0] == nums[i-k] {
			win = win[1:]
		}
		if i+1 >= k {
			res[i-k+1] = win[0] // the front is the window maximum
		}
	}
	return res
}

The result is 172 ns/op

Benchmark_maxSlidingWindow-8   	 6647554	       172 ns/op	     112 B/op	       6 allocs/op

The brute force algorithm is indeed much faster. The benchmark results gave me a clue: the deque version makes 6 memory allocations per call while the brute force version makes only one. This not only means higher memory consumption; it is also why the algorithm with the lower time complexity takes longer to run.

The deque version creates two slices and adds elements with append, so the program performs repeated memory allocations and copies. The brute force version creates a single slice with its length declared up front, assigns directly to the elements, and needs no further allocations from append.
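
A minimal sketch of that difference, as a pair of benchmarks (the names and the toy loop are mine, not from the original submissions; the element count mirrors the test slice above):

package main

import "testing"

func Benchmark_growWithAppend(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var s []int // nil slice: append must allocate and copy as it grows
		for j := 0; j < 8; j++ {
			s = append(s, j)
		}
		_ = s
	}
}

func Benchmark_preSized(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s := make([]int, 8) // a single allocation, then direct assignment
		for j := 0; j < 8; j++ {
			s[j] = j
		}
		_ = s
	}
}

Running these with -benchmem should show several allocations per op for the first and exactly one for the second.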


Here are some optimizations to the deque method:

func maxSlidingWindow(nums []int, k int) []int {
	res := make([]int, len(nums)-k+1)
	win := make([]int, 0, k) // stores indices instead of values
	for i, v := range nums {
		// Drop the front index once it falls outside the window.
		if i >= k && win[0] == i-k {
			win = win[1:]
		}
		// Pop indices whose values can no longer be a maximum.
		for len(win) > 0 && nums[win[len(win)-1]] <= v {
			win = win[:len(win)-1]
		}
		win = append(win, i)
		if i+1 >= k {
			res[i-k+1] = nums[win[0]]
		}
	}
	return res
}

The result is 67.0 ns/op, cutting the time by about 60%:

Benchmark_maxSlidingWindow-8   	17957082	        67.0 ns/op	      80 B/op	       2 allocs/op

That is much faster, but it is still slower than the brute force method because it creates one extra slice.
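
One way to get rid of that extra allocation on the hot path is to let the caller pass in a reusable scratch buffer and advance a head index instead of re-slicing the front. This is my own sketch, not from the original submissions; it assumes the caller hands in a buffer with capacity of at least len(nums):

func maxSlidingWindowBuf(nums []int, k int, win []int) []int {
	res := make([]int, len(nums)-k+1)
	win = win[:0] // reuse the caller's buffer
	head := 0     // logical front of the deque
	for i, v := range nums {
		if head < len(win) && win[head] == i-k {
			head++ // front index slid out of the window
		}
		for len(win) > head && nums[win[len(win)-1]] <= v {
			win = win[:len(win)-1]
		}
		win = append(win, i)
		if i+1 >= k {
			res[i-k+1] = nums[win[head]]
		}
	}
	return res
}

Allocating the buffer once with make([]int, 0, len(nums)) and reusing it across calls leaves only the res allocation per call.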

Profiling the unoptimized version with pprof shows that memory allocation and slice growth consume more CPU time than the algorithm code itself; the algorithm's complexity matters less to overall execution time than its memory behavior.

  flat  flat%   sum%        cum   cum%
 0.33s 23.24% 23.24%      0.83s 58.45%  runtime.mallocgc
 0.26s 18.31% 41.55%      1.18s 83.10%  runtime.growslice
 0.22s 15.49% 57.04%      1.42s   100%  algorithm/sliding_window.maxSlidingWindow
 0.15s 10.56% 67.61%      0.15s 10.56%  runtime.nextFreeFast (inline)
 0.07s  4.93% 72.54%      0.07s  4.93%  runtime.memclrNoHeapPointers
 0.05s  3.52% 76.06%      0.05s  3.52%  runtime.pageIndexOf (inline)
 0.04s  2.82% 78.87%      0.04s  2.82%  runtime.memmove
 0.04s  2.82% 81.69%      0.04s  2.82%  runtime.releasem (inline)
 0.02s  1.41% 83.10%      0.02s  1.41%  runtime.(*mspan).objIndex (inline)
 0.02s  1.41% 84.51%      0.02s  1.41%  runtime.(*spanSet).pop
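
For reference, a profile like the one above can be collected with the standard Go toolchain (assuming the benchmark name used earlier):

go test -bench=Benchmark_maxSlidingWindow -benchmem -cpuprofile=cpu.out
go tool pprof -top cpu.out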

Just yesterday I read an article on the Go language Chinese website, "Learning Rob Pike's 6 Programming Principles". Its take on the relationship between algorithmic complexity and the amount of data processed sums this up well:

Ken Thompson, the father of Unix and a co-creator of Go, condensed Principles 3 and 4 further: when in doubt, use brute force. Simple, brute force methods are often the best choice. We usually consider quicksort the "fastest" sorting algorithm, but the problems we meet day to day generally involve few elements to sort, and there a simple bubble sort is a better fit. Interested readers will find that programming-language sort implementations choose different algorithms for different data sizes; for example, Go's sort implementation only uses quicksort when the number of elements exceeds 12.
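
As a toy illustration of that size-based dispatch (my own simplified sketch, not the actual standard-library code):

package main

import "fmt"

const threshold = 12 // small-slice cutoff, echoing the figure quoted above

func insertionSort(a []int) {
	for i := 1; i < len(a); i++ {
		for j := i; j > 0 && a[j] < a[j-1]; j-- {
			a[j], a[j-1] = a[j-1], a[j]
		}
	}
}

func hybridSort(a []int) {
	if len(a) < threshold {
		insertionSort(a) // cheap and cache-friendly for small n
		return
	}
	// Plain quicksort: Lomuto partition around the last element, then recurse.
	p, lo := a[len(a)-1], 0
	for i := 0; i < len(a)-1; i++ {
		if a[i] < p {
			a[i], a[lo] = a[lo], a[i]
			lo++
		}
	}
	a[lo], a[len(a)-1] = a[len(a)-1], a[lo]
	hybridSort(a[:lo])
	hybridSort(a[lo+1:])
}

func main() {
	xs := []int{5, 2, 9, 1, 7, 3, 8, 4, 6, 0, 13, 11, 10, 12}
	hybridSort(xs)
	fmt.Println(xs) // sorted ascending
}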

Origin: blog.csdn.net/qq_35753140/article/details/108033700