[Data Structure] Binary Tree - Heap Sort

The array-based heap sort from the previous post has a time complexity of O(NlogN) and a space complexity of O(N).

After optimization, the time complexity is O(NlogN), and the space complexity is O(1).

Two ways of building the heap are considered:

1. Build the heap by adjusting upward.

2. Build the heap by adjusting downward.

The heap is built only so that the largest or smallest element can be read from its root.

1. Building the heap by adjusting upward

Use upward adjustment, following the idea of building a heap by inserting the data one element at a time.

The essence of the optimization scheme is to process on the original array without opening up new space, so that the space complexity becomes O(1).

void HeapSort2(int* a,int n)
{
	for (int i = 1; i < n; i++)
	{
		AdjustUp(a, i);
	}
	for(int i= 0;i<n;i++)
		printf("%d ", a[i]);
	printf("\n");
}
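HeapSort2 relies on an AdjustUp helper that this post does not show. A minimal sketch, assuming a large heap and the signature implied by the call AdjustUp(a, i), might look like the following. Swap and BuildHeapUp are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two ints in place (assumed helper). */
static void Swap(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

/* Move a[child] up until its parent is no smaller (large heap). */
void AdjustUp(int* a, int child)
{
	assert(a != NULL);
	while (child > 0)
	{
		int parent = (child - 1) / 2;
		if (a[child] <= a[parent])
			break;
		Swap(&a[child], &a[parent]);
		child = parent;
	}
}

/* Build a large heap in place by "inserting" a[1..n-1] one at a time. */
void BuildHeapUp(int* a, int n)
{
	for (int i = 1; i < n; i++)
		AdjustUp(a, i);
}
```

Because each element is sifted up inside the same array, no second buffer is needed, which is where the O(1) space comes from.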

Now calculate its time complexity. To keep the arithmetic simple, assume a full binary tree with h layers and N = 2^h - 1 nodes.

In the worst case, every node moves all the way up to the root:

The second layer: 2^1 nodes, each moves up 1 layer;

The third layer: 2^2 nodes, each moves up 2 layers;

……

The hth layer: 2^(h-1) nodes, each moves up h-1 layers.

So the total number of moves is:

T(N) = 2^1*1 + 2^2*2 + … + 2^(h-1)*(h-1)

Summing this series with the dislocation subtraction method (multiply by 2, shift, and subtract) gives T(N) = (h-2)*2^h + 2. Substituting h = log(N+1), with logarithms in base 2:

T(N) = (N+1) * (log(N+1) - 2) + 2, which grows like N*logN as N -> infinity.

So the time complexity of building the heap by adjusting upward is O(N*logN).
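For readers who want the dislocation subtraction step written out, here is one way to do it, using the same h (number of layers) and N = 2^h - 1 as in the layer counts above:

```latex
\begin{aligned}
T  &= 1\cdot 2^{1} + 2\cdot 2^{2} + \dots + (h-1)\cdot 2^{h-1} \\
2T &= 1\cdot 2^{2} + 2\cdot 2^{3} + \dots + (h-2)\cdot 2^{h-1} + (h-1)\cdot 2^{h} \\
2T - T &= (h-1)\cdot 2^{h} - \left(2^{1} + 2^{2} + \dots + 2^{h-1}\right) \\
       &= (h-1)\cdot 2^{h} - \left(2^{h} - 2\right) \\
T  &= (h-2)\cdot 2^{h} + 2 = (N+1)\left(\log_2(N+1) - 2\right) + 2
\end{aligned}
```

A quick check with h = 3: the sum is 2 + 8 = 10, and (3-2)*2^3 + 2 = 10.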

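2. Building the heap by adjusting downward

Prerequisite: before a node is adjusted downward, its left and right subtrees must both already be large heaps (or both small heaps).

So the array has to be adjusted from the bottom up.

Start with the last non-leaf node (the parent of the last node) and work backward to the root.

This works because a leaf node by itself can be regarded as a large heap or a small heap.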

[Figure: worked example of adjusting downward to build the heap]

Number of adjustments in the example: 4

Code:

void HeapSort3(int* a, int n)
{
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(a, n, i);
	}

	for (int i = 0; i < n; i++)
		printf("%d ", a[i]);
	printf("\n");
}
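HeapSort3 calls an AdjustDown helper that is not listed here. The sketch below is one possible version, assuming a large heap and the signature implied by AdjustDown(a, n, i). BuildHeapDown is a hypothetical wrapper around the loop used above.

```c
#include <assert.h>
#include <stddef.h>

/* Sift a[parent] down within a[0..n-1].
   Assumes both subtrees of parent are already large heaps. */
void AdjustDown(int* a, int n, int parent)
{
	assert(a != NULL);
	int child = parent * 2 + 1;             /* left child */
	while (child < n)
	{
		/* pick the larger of the two children */
		if (child + 1 < n && a[child + 1] > a[child])
			child++;
		if (a[child] <= a[parent])
			break;                          /* heap property holds */
		int tmp = a[child];
		a[child] = a[parent];
		a[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Build a large heap in place, starting from the last non-leaf node. */
void BuildHeapDown(int* a, int n)
{
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
		AdjustDown(a, n, i);
}
```

Starting at index (n - 1 - 1) / 2 skips every leaf, since a single leaf already satisfies the heap property.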

Space complexity: O(1); time complexity: O(N).

Now prove the time complexity, again assuming a full binary tree with h layers and N = 2^h - 1 nodes:

The first layer: 2^0 nodes, each moves down at most h-1 layers.

The second layer: 2^1 nodes, each moves down at most h-2 layers.

The third layer: 2^2 nodes, each moves down at most h-3 layers.

……

The (h-1)th layer: 2^(h-2) nodes, each moves down at most 1 layer.

So the total number of moves is:

T(N) = 2^0*(h-1) + 2^1*(h-2) + … + 2^(h-2)*1

Applying the dislocation subtraction method gives:

T(N) = 2^h - 1 - h

Since N = 2^h - 1, we have h = log(N+1) (base 2), so

T(N) = N - log(N+1)

As N grows, the log term becomes negligible and T(N) is approximately N, so building the heap by adjusting downward is O(N).

The proof is complete.
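The subtraction step in this proof can be written out as follows, with h the number of layers as above:

```latex
\begin{aligned}
T  &= (h-1)\cdot 2^{0} + (h-2)\cdot 2^{1} + \dots + 1\cdot 2^{h-2} \\
2T &= (h-1)\cdot 2^{1} + (h-2)\cdot 2^{2} + \dots + 2\cdot 2^{h-2} + 1\cdot 2^{h-1} \\
2T - T &= -(h-1) + \left(2^{1} + 2^{2} + \dots + 2^{h-1}\right) \\
       &= -(h-1) + \left(2^{h} - 2\right) \\
T  &= 2^{h} - 1 - h
\end{aligned}
```

A quick check with h = 3: the sum is 1*2 + 2*1 = 4, and 2^3 - 1 - 3 = 4.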

Now that the heap can be built either upward or downward, the next question is which kind of heap to build for each sort order: a large heap or a small heap?

Ascending -- build a large heap

Reason: if a small heap is built for ascending order, the smallest number is already in the first position, but once it is set aside the parent-child relationships among the remaining numbers are all broken, and the heap has to be rebuilt to select the next number. Each rebuild costs O(N), which is no better than simply traversing the array to find the minimum.

In short, building a small heap for ascending order is inefficient and complicated.

Descending -- build a small heap

Then how is a large heap used to complete the sort?

The key to using a large heap is to keep the heap structure intact as much as possible.

After the largest number is at the root, swap the root with the last number, then adjust downward from the root within the first n-1 numbers to bring the next largest number to the root. Swap the root with the number in the (n-1)th position, and repeat until one number remains. The array ends up in ascending order.

Relevant code:

void HeapSort4(int* a, int n)
{
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		// build a large heap
		AdjustDown(a, n, i);
	}
	// index of the last element
	int end = n - 1;
	while (end > 0)
	{
		// swap the root with the value in the last position
		Swap(&a[0], &a[end]);
		AdjustDown(a, end, 0);
		end--;
		
	}
	for (int i = 0; i < n; i++)
		printf("%d ", a[i]);
	printf("\n");
}
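The post states that descending order calls for a small heap but gives no code for it. The following is a sketch of that mirrored case under the same swap-and-adjust pattern as HeapSort4. SwapInt, AdjustDownSmall and HeapSortDescending are hypothetical names, not taken from the original code.

```c
#include <assert.h>
#include <stddef.h>

static void SwapInt(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

/* Small-heap sift down: move a[parent] below any smaller child. */
static void AdjustDownSmall(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		/* pick the smaller of the two children */
		if (child + 1 < n && a[child + 1] < a[child])
			child++;
		if (a[child] >= a[parent])
			break;
		SwapInt(&a[child], &a[parent]);
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Sort a[0..n-1] into descending order using a small heap. */
void HeapSortDescending(int* a, int n)
{
	assert(a != NULL || n == 0);
	/* build the small heap by adjusting downward: O(N) */
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
		AdjustDownSmall(a, n, i);
	/* move the current minimum to the end, then shrink the heap */
	for (int end = n - 1; end > 0; end--)
	{
		SwapInt(&a[0], &a[end]);
		AdjustDownSmall(a, end, 0);
	}
}
```

Each pass parks the smallest remaining value at the end of the unsorted range, so the smallest values collect at the back and the array reads from largest to smallest.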

Top-k problem

Find the k largest or the k smallest elements in a data set. In general, the amount of data is relatively large.

The simplest approach to Top-k is sorting. If the amount of data is very large, though, sorting runs into problems (the data cannot all be loaded into memory at once). A heap solves this.

1. Use the first k elements in the array to build a heap.

To find the k largest, build a small heap; to find the k smallest, build a large heap.

This mirrors the principle of building a large heap for ascending order and a small heap for descending order.

2. Compare each of the remaining n-k elements with the top of the heap, and replace the top when the new element belongs in the top k.

Remember to adjust downward after each replacement.

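#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// AdjustDown here must be the small-heap version.
void TopK(int* a,int n,int k)
{
	// build a small heap from the first k elements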
	int* topk = (int*)malloc(sizeof(int) * k);
	assert(topk);
	for (int i = 0; i < k; i++)
	{
		topk[i] = a[i];
	}
	// build the heap by adjusting downward
	for (int i = (k - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(topk, k, i);
	}
	for (int j = k; j < n; j++)
	{
		if (topk[0] < a[j])
		{
			topk[0] = a[j];
			AdjustDown(topk, k, 0);
		}
	}
	for (int i = 0; i < k; i++)
	{
		printf("%d ", topk[i]);
	}
	printf("\n");
	free(topk);
}
int main()
{    
	int n = 10000;
	int* a = (int*)malloc(sizeof(int) * n);
	assert(a);
	srand(time(0));
	for (int i = 0; i < n; i++)
	{
		a[i] = rand() % 10000;
	}
	a[0] = 100001;
	a[101] = 100002;
	a[159] = 123456;
	a[7459] = 100003;
	a[7412] = 100009;
	a[7826] = 111111;
	a[7712] = 222222;
	a[9635] = 999999;
	TopK(a,n,8);
	return 0;
}

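Time complexity: O(K + (N-K)*logK): O(K) to build the heap, then at most one O(logK) adjustment for each of the remaining N-K elements.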

Space complexity: O(K)

When K<<N, the efficiency is also quite high.
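TopK above keeps the k largest values with a small heap. For the mirrored case mentioned earlier (k smallest values, large heap), a sketch could look like this. TopKSmallest and AdjustDownLarge are hypothetical names, and writing the result into a caller-supplied out buffer instead of printing is an assumption made to keep the example easy to check.

```c
#include <assert.h>
#include <stddef.h>

/* Large-heap sift down within a[0..n-1]. */
static void AdjustDownLarge(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] > a[child])
			child++;
		if (a[child] <= a[parent])
			break;
		int tmp = a[child];
		a[child] = a[parent];
		a[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Copy the k smallest values of a[0..n-1] into out[0..k-1].
   The result is in heap order, not sorted; out[0] is the largest of them. */
void TopKSmallest(const int* a, int n, int k, int* out)
{
	assert(a != NULL && out != NULL && k > 0 && k <= n);
	/* step 1: build a large heap from the first k elements */
	for (int i = 0; i < k; i++)
		out[i] = a[i];
	for (int i = (k - 1 - 1) / 2; i >= 0; i--)
		AdjustDownLarge(out, k, i);
	/* step 2: a smaller element replaces the current top */
	for (int j = k; j < n; j++)
	{
		if (a[j] < out[0])
		{
			out[0] = a[j];
			AdjustDownLarge(out, k, 0);
		}
	}
}
```

The heap top is always the worst candidate currently kept, so a single comparison against out[0] decides whether a new element belongs among the k smallest.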

If there is something wrong, please point it out. 

Origin blog.csdn.net/weixin_61932507/article/details/124082628