Detailed Heap Sort + TOP-K Problem

Previous article: heap implementation + heap sort



Contents

Review of heap sort from the previous article
    Idea
    Time and space complexity
Optimized heap sort
    Building the heap by adjusting upward
        Analysis
        Diagram
        Time complexity
        Code
    Building the heap by adjusting downward
        Analysis
        Diagram
        Time complexity
        Code
    Summary
Sorting
    Analysis
    Sorting idea
    Why descending order needs a small heap
    Code
    Turning the small heap into a large heap
TOP-K problem
    What is the TOP-K problem?
    Idea
    Time complexity
    Space complexity
    Code
    Random data test
    Adding a sort step


Review of heap sort from the previous article

The previous article introduced binary trees and ended with a simple heap sort implementation:

Idea

First build a heap and use the property of the heap top: it is always the largest element (large heap) or the smallest element (small heap).

Then use the property of removing the heap top: after each removal, the new top is the next largest or the next smallest element.

Repeating this writes the elements back into the array in sorted order.


Time and space complexity

A single insertion or deletion costs O(logN): in the worst case the element travels the full height of the binary tree.

All N elements are inserted and then deleted one by one, so the time complexity of the sort is O(N*logN).

The space complexity is O(N), because a separate heap has to be created to hold a copy of the array data, so its size grows with the size of the array.

void HeapSort(int* a,int size)
{
	HP hp;
	HeapInit(&hp);

	// Insert every element into the heap: time complexity O(N*logN)
	for (int i = 0;i<size;i++)
	{
		HeapPush(&hp,a[i]);
	}
	HeapPrint(&hp);
	int j = 0;

	// Take the heap top and pop it until the heap is empty: time complexity O(N*logN)
	while (!HeapEmpty(&hp))
	{
		a[j] = HeapTop(&hp);
		j++;
		HeapPop(&hp);
	}
	HeapDestroy(&hp);
}


Optimized heap sort

Optimization goal: time complexity O(N*logN), space complexity O(1)

Previously, a separate heap was created and the array elements were inserted into it. This time the heap is built directly inside the array, turning the array itself into a heap, so the space complexity of heap sort drops to O(1).

There are two ways to build a heap inside an array: adjusting upward and adjusting downward.

Building the heap by adjusting upward

For ease of explanation, upward adjustment is demonstrated here by building a small heap.

Analysis

The heap is built directly in the array; size is the number of array elements.

Upward adjustment requires that the elements before the starting node already form a heap.

The first element on its own is a heap (it is the heap top), so start from the second element and adjust upward.

Proceed from front to back, adjusting each element upward in turn.


diagram


time complexity
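
A sketch of the usual counting argument, assuming a full binary tree of height h with N = 2^h - 1 nodes (the symbols h, i and T are introduced here for illustration only). Level i holds 2^(i-1) nodes, and a node on level i can move up at most i-1 levels:

```latex
\begin{aligned}
T(N) &= \sum_{i=1}^{h} 2^{\,i-1}\,(i-1) \\
     &= (h-2)\,2^{h} + 2 \\
     &= (N+1)\bigl(\log_2(N+1) - 2\bigr) + 2 \\
     &= O(N \log N)
\end{aligned}
```

The bottom level holds about half of all nodes, and those are exactly the nodes that may travel the farthest, which is why the total does not drop below N*logN.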

Code

// Adjust upward
// Example: building a small heap
void Up(HPDataType* a,size_t child)
{
	size_t parent = (child - 1) / 2;
	while (child>0)
	{
		if (a[child] < a[parent])
		{
			swap(&a[child], &a[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break;
		}
	}
}

void HeapSort(int* a,int size)
{
	// Build the heap by adjusting upward
	// The time complexity works out to O(N*logN)
	for (int i = 1;i<size;i++)
	{
		Up(a,i);
	}
}

Building the heap by adjusting downward

Analysis

Downward adjustment requires that the left and right subtrees of the starting node are already heaps.

Start from the last non-leaf node (the parent of the last element, subscript (size-1-1)/2), treat it as a heap top, and adjust it downward.

Proceed from back to front until the root has been adjusted.

diagram

time complexity 
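
A sketch of the usual counting argument, under the same assumption of a full binary tree of height h with N = 2^h - 1 nodes (h, i and T are illustrative symbols). Level i holds 2^(i-1) nodes, and a node on level i can move down at most h-i levels:

```latex
\begin{aligned}
T(N) &= \sum_{i=1}^{h-1} 2^{\,i-1}\,(h-i) \\
     &= 2^{h} - 1 - h \\
     &= N - \log_2(N+1) \\
     &= O(N)
\end{aligned}
```

Here the crowded bottom levels move the least and only the few nodes near the root can travel the full height, so the total stays linear in N.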

Code

// Adjust downward
// Example: building a small heap
void Down(HPDataType* a, size_t parent, size_t size)
{
	size_t child = parent * 2 + 1;
	while (child < size)
	{
		if (child + 1 <size && a[child+1] < a[child])
		{
			child++;
		}
		if (a[child] < a[parent])
		{
			swap(&a[child],&a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

void HeapSort(int* a,int size)
{
	// Build the heap by adjusting downward
	// The time complexity works out to O(N)

	for (int i = (size-1-1)/2; i>=0; i--)
	{
		Down(a,i, size);
	}
}

Summary


Building the heap by adjusting upward has time complexity O(N*logN).

Building the heap by adjusting downward has time complexity O(N).

Therefore, building the heap by adjusting downward is more efficient.
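
To make the difference concrete, here is a minimal sketch that counts the swaps each build performs. The names SwapInt, BuildUpCountSwaps and BuildDownCountSwaps are made up for this illustration and are not part of the article's code; both functions build a small heap in place.

```c
#include <stddef.h>

/* Hypothetical helper: exchange two ints. */
static void SwapInt(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

/* Build a small heap by adjusting upward; returns the number of swaps. */
size_t BuildUpCountSwaps(int* a, size_t size)
{
	size_t swaps = 0;
	for (size_t i = 1; i < size; i++)
	{
		size_t child = i;
		while (child > 0)
		{
			size_t parent = (child - 1) / 2;
			if (a[child] < a[parent])
			{
				SwapInt(&a[child], &a[parent]);
				child = parent;
				swaps++;
			}
			else
			{
				break;
			}
		}
	}
	return swaps;
}

/* Build a small heap by adjusting downward; returns the number of swaps. */
size_t BuildDownCountSwaps(int* a, size_t size)
{
	size_t swaps = 0;
	if (size < 2)
	{
		return 0;
	}
	/* Visit the last non-leaf node first, then walk back to the root. */
	for (size_t i = (size - 2) / 2 + 1; i-- > 0; )
	{
		size_t parent = i;
		size_t child = parent * 2 + 1;
		while (child < size)
		{
			if (child + 1 < size && a[child + 1] < a[child])
			{
				child++;
			}
			if (a[child] < a[parent])
			{
				SwapInt(&a[child], &a[parent]);
				parent = child;
				child = parent * 2 + 1;
				swaps++;
			}
			else
			{
				break;
			}
		}
	}
	return swaps;
}
```

For 1023 values stored in descending order (the worst case for building a small heap), the upward build performs 8194 swaps and the downward build performs 1013.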

Sorting

For ascending order, build a large heap.

For descending order, build a small heap.


Analysis

1. Note: we only adjust the array upward or downward to turn it into a heap.

   No heap interface functions (removing the heap top, inserting into the heap, and so on) have been created.

   Therefore HeapTop (reading the heap top into the array) and HeapPop (deleting the heap top) cannot be used.

2. Some readers may ask: can't I just write those two functions and use them as before?

   No. If you do, you inevitably need a new array to receive the heap-top elements, because HeapPop swaps the top with the last element of the heap you built and then deletes it.

   The space complexity becomes O(N), which does not meet our optimization goal.

3. So to keep the space complexity at O(1), the sort must be done inside the original array.


Sorting idea

1. Swap the heap top with the last node of the heap. The largest (large heap) or smallest (small heap) element is now in its final position at the end of the array.

2. Treat only the first n-1 elements as the heap and adjust downward from the heap top.

3. The heap top is now the largest or smallest of the remaining elements.

4. Swap the heap top with the last element of the shrunken heap (the (n-1)th element).

5. Repeat the process until one element remains; the array is then in ascending or descending order.

Note: the subscript of the last node must be updated after every swap.


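Why descending order needs a small heap

With a small heap, every swap moves the current smallest element to the end of the unsorted part, so the array ends up ordered from largest to smallest. If a large heap were used instead and the largest element were left in place at the front, the remaining elements would no longer form a heap and the heap would have to be rebuilt every time.

The same analysis applies to building a large heap for ascending order.

Conclusion: build a large heap for ascending order and a small heap for descending order.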


Code

Build the heap by adjusting downward first (more efficient).

i starts at the subscript of the last non-leaf node.

end records the subscript of the last element.

The swap-and-adjust loop stops once end reaches 0.

Note the parameters of the downward-adjustment function:

a is the starting address of the array

parent is the subscript of the parent node to adjust from

size is the number of elements in the heap being adjusted

void Down(HPDataType* a, size_t parent, size_t size);

In the following code, pay attention to the two meanings of end:

In the swap, end is the subscript of the last element of the heap.

In the call to Down, end is the number of elements left in the heap.

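void HeapSort(int* a,int size)
{
	// Nothing to sort for an empty or one-element array
	// (this also keeps size - 1 below from wrapping around)
	if (size < 2)
	{
		return;
	}
	// Ascending order: build a large heap
	// Descending order: build a small heap (Down builds a small heap here)
	for (int i = (size-1-1)/2; i>=0; i--)
	{
		Down(a,i, size);
	}
	// Subscript of the last element
	size_t end = size - 1;
	while (end>0)
	{
		swap(&a[0],&a[end]);
		Down(a,0,end);
		end--;
	}
}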

This completes heap sort.

Whether the result is ascending or descending is decided by building a large heap or a small heap.
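
Since swap and HPDataType come from the previous article, here is a minimal self-contained sketch of the same procedure that compiles on its own. SwapInt, AdjustDownMin and HeapSortDescending are names invented for this example, and plain int stands in for HPDataType.

```c
#include <stddef.h>

/* Hypothetical helper: exchange two ints. */
static void SwapInt(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

/* Adjust a[parent] downward within the first size elements (small heap). */
static void AdjustDownMin(int* a, size_t parent, size_t size)
{
	size_t child = parent * 2 + 1;
	while (child < size)
	{
		/* Pick the smaller of the two children. */
		if (child + 1 < size && a[child + 1] < a[child])
		{
			child++;
		}
		if (a[child] < a[parent])
		{
			SwapInt(&a[child], &a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

/* Sort a in descending order in place using a small heap. */
void HeapSortDescending(int* a, size_t size)
{
	if (size < 2)
	{
		return;
	}
	/* Build the small heap from the last non-leaf node back to the root. */
	for (size_t i = (size - 2) / 2 + 1; i-- > 0; )
	{
		AdjustDownMin(a, i, size);
	}
	/* Move the current smallest element to the end, then shrink the heap. */
	for (size_t end = size - 1; end > 0; end--)
	{
		SwapInt(&a[0], &a[end]);
		AdjustDownMin(a, 0, end);
	}
}
```

Calling HeapSortDescending on {3, 9, 1, 7, 5, 9, 0} leaves the array as {9, 9, 7, 5, 3, 1, 0}.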

Turning the small heap into a large heap:

Flip the less-than signs into greater-than signs in the adjustment function.

Two comparisons are affected: child against child+1, and child against parent.
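
As an illustration of that change, the sketch below flips both comparisons to get a large heap and uses it to sort in ascending order. SwapInt, AdjustDownMax and HeapSortAscending are hypothetical names chosen for this example.

```c
#include <stddef.h>

/* Hypothetical helper: exchange two ints. */
static void SwapInt(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

/* Adjust a[parent] downward within the first size elements (large heap). */
static void AdjustDownMax(int* a, size_t parent, size_t size)
{
	size_t child = parent * 2 + 1;
	while (child < size)
	{
		/* First flipped comparison: pick the larger child. */
		if (child + 1 < size && a[child + 1] > a[child])
		{
			child++;
		}
		/* Second flipped comparison: the child must not exceed the parent. */
		if (a[child] > a[parent])
		{
			SwapInt(&a[child], &a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

/* Sort a in ascending order in place using a large heap. */
void HeapSortAscending(int* a, size_t size)
{
	if (size < 2)
	{
		return;
	}
	for (size_t i = (size - 2) / 2 + 1; i-- > 0; )
	{
		AdjustDownMax(a, i, size);
	}
	for (size_t end = size - 1; end > 0; end--)
	{
		SwapInt(&a[0], &a[end]);
		AdjustDownMax(a, 0, end);
	}
}
```

The sorting loop itself is unchanged; only the two comparisons inside the adjustment function differ from the small-heap version.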


TOP-K problem

What is the TOP-K problem?

It means finding the K largest or K smallest elements in a data set, where the amount of data is usually large.
For example: the top 10 in a profession, the world's top 500 companies, a rich list, the 100 most active players in a game, and so on.


The simplest and most direct approach to the Top-K problem is to sort the data.
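
For reference, a minimal sketch of that sort-based baseline using the standard library's qsort. CompareDescending and TopKBySort are names made up for this example; the function sorts the whole input in place and assumes k <= n.

```c
#include <stdlib.h>

/* Comparator for qsort: orders ints from largest to smallest. */
static int CompareDescending(const void* x, const void* y)
{
	int lhs = *(const int*)x;
	int rhs = *(const int*)y;
	return (rhs > lhs) - (rhs < lhs);
}

/* Sorts a in descending order, then copies the first k values into out. */
void TopKBySort(int* a, size_t n, size_t k, int* out)
{
	qsort(a, n, sizeof(int), CompareDescending);
	for (size_t i = 0; i < k; i++)
	{
		out[i] = a[i];
	}
}
```

This needs all n elements in memory at once and orders every one of them, even though only k are wanted.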

If the amount of data is very large (tens of GB), sorting is not practical: the data may not all fit in memory, and the efficiency is very low.

A better way is to solve the problem with a heap.

Idea

1. Build a heap from the first K elements of the data set:

    for the K largest elements, build a small heap

    for the K smallest elements, build a large heap

2. Compare each of the remaining N-K elements with the heap top in turn:

    With a small heap, an element larger than the heap top replaces the heap top

    Adjust downward to restore the heap structure

    With a large heap, an element smaller than the heap top replaces the heap top

    Adjust downward to restore the heap structure

After all comparisons, the heap holds the K largest or K smallest elements of the whole data set.

The data only needs to be traversed once.
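
The article's code handles the K largest elements with a small heap. As a complement, here is a minimal sketch of the other case from the steps above: the K smallest elements with a large heap. SwapInt, AdjustDownMax and SmallestK are hypothetical names, and the function assumes 1 <= k <= n and that out has room for k ints.

```c
#include <stddef.h>

/* Hypothetical helper: exchange two ints. */
static void SwapInt(int* x, int* y)
{
	int tmp = *x;
	*x = *y;
	*y = tmp;
}

/* Adjust a[parent] downward within the first size elements (large heap). */
static void AdjustDownMax(int* a, size_t parent, size_t size)
{
	size_t child = parent * 2 + 1;
	while (child < size)
	{
		if (child + 1 < size && a[child + 1] > a[child])
		{
			child++;
		}
		if (a[child] > a[parent])
		{
			SwapInt(&a[child], &a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

/* Writes the k smallest values of a into out, in heap order (not sorted). */
void SmallestK(const int* a, size_t n, size_t k, int* out)
{
	/* 1. Build a large heap from the first k elements. */
	for (size_t i = 0; i < k; i++)
	{
		out[i] = a[i];
	}
	for (size_t i = k / 2; i-- > 0; )
	{
		AdjustDownMax(out, i, k);
	}
	/* 2. An element smaller than the heap top replaces the heap top. */
	for (size_t i = k; i < n; i++)
	{
		if (a[i] < out[0])
		{
			out[0] = a[i];
			AdjustDownMax(out, 0, k);
		}
	}
}
```

The heap top is always the largest of the K candidates kept so far, so a single comparison against it decides whether a new element belongs in the result.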


Time complexity

Building the heap from K elements costs O(K); in the worst case each of the remaining N-K elements triggers a downward adjustment.

Each adjustment costs at most logK, so the adjustments cost logK*(N-K) in total.

O(K+logK*(N-K))

The size of K is not known in advance, so the K terms cannot be dropped.
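
As a rough worked example, take N = 10000 and K = 10 and compare the bound with the N*logN cost of sorting everything (base-2 logarithms, values rounded):

```latex
\begin{aligned}
K + (N-K)\log_2 K &= 10 + 9990 \cdot \log_2 10 \approx 10 + 9990 \cdot 3.32 \approx 3.3 \times 10^{4} \\
N \log_2 N &= 10000 \cdot \log_2 10000 \approx 10000 \cdot 13.29 \approx 1.33 \times 10^{5}
\end{aligned}
```

These are operation-count estimates only, but they show the heap approach doing roughly a quarter of the work when K is small compared with N.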

Space complexity

Only K slots need to be allocated to build the heap.

O(K)

Code

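// TOP-K problem: print the k largest elements of a (assumes 1 <= k <= n)
void PrintTopK(int* a, int n, int k)
{
	// 1. Build the heap from the first k elements of a
	int* kHeap = (int*)malloc(sizeof(int)*k);
	if (kHeap == NULL)
	{
		printf("malloc fail\n");
		exit(-1);
	}

	// Copy the first k elements into kHeap
	for (int i = 0;i<k;i++)
	{
		kHeap[i] = a[i];
	}

	// Build a small heap inside kHeap (not inside a)
	for (int i = (k - 1 - 1) / 2; i >= 0; i--)
	{
		Down(kHeap, i, k);
	}

	// 2. Compare each of the remaining n-k elements with the heap top;
	//    an element larger than the heap top replaces it
	for (int i = k;i<n;i++)
	{
		if (a[i]>kHeap[0])
		{
			kHeap[0] = a[i];
			Down(kHeap,0,k);
		}
	}

	// 3. Print the k largest elements (in heap order)
	for (int j = 0;j<k;j++)
	{
		printf("%d ",kHeap[j]);
	}
	printf("\n");


	free(kHeap);
}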

Random data test

Generate 10,000 random numbers below 1,000,000, then overwrite 10 positions with numbers larger than 1,000,000.

Find the ten largest numbers among the 10,000.

Run the code:

void TestTopk()
{
	int n = 10000;
	int* a = (int*)malloc(sizeof(int)*n);
	if (a == NULL)
	{
		printf("malloc fail\n");
		exit(-1);
	}
	srand((unsigned int)time(0));
	// Fill the array with random values below 1000000
	for (int i = 0; i < n; ++i)
	{
		a[i] = rand() % 1000000;
	}
	// Plant ten values above 1000000 so the expected result is known
	a[5] = 1000000 + 1;
	a[1231] = 1000000 + 2;
	a[531] = 1000000 + 3;
	a[5121] = 1000000 + 4;
	a[115] = 1000000 + 5;
	a[2335] = 1000000 + 6;
	a[9999] = 1000000 + 7;
	a[76] = 1000000 + 8;
	a[423] = 1000000 + 9;
	a[3144] = 1000000 + 10;
	PrintTopK(a, n, 10);
	free(a);
}


int main()
{
	TestTopk();
	return 0;
}

Result:

The 10 largest numbers are found, but they are printed in heap order, not sorted.

Adding a sort step

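// TOP-K problem: print the k largest elements of a, sorted (assumes 1 <= k <= n)
void PrintTopK(int* a, int n, int k)
{
	// 1. Build the heap from the first k elements of a
	int* kHeap = (int*)malloc(sizeof(int)*k);
	if (kHeap == NULL)
	{
		printf("malloc fail\n");
		exit(-1);
	}

	// Copy the first k elements into kHeap
	for (int i = 0;i<k;i++)
	{
		kHeap[i] = a[i];
	}

	// Build a small heap inside kHeap (not inside a)
	for (int i = (k - 1 - 1) / 2; i >= 0; i--)
	{
		Down(kHeap, i, k);
	}

	// 2. Compare each of the remaining n-k elements with the heap top;
	//    an element larger than the heap top replaces it
	for (int i = k;i<n;i++)
	{
		if (a[i]>kHeap[0])
		{
			kHeap[0] = a[i];
			Down(kHeap,0,k);
		}
	}

	// 3. Sort in place; a small heap yields descending order
	// Subscript of the last element
	size_t end = k - 1;
	while (end>0)
	{
		swap(&kHeap[0], &kHeap[end]);
		Down(kHeap, 0, end);
		end--;
	}


	// 4. Print the k largest elements after sorting
	for (int j = 0;j<k;j++)
	{
		printf("%d ",kHeap[j]);
	}
	printf("\n");


	free(kHeap);
}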

Result: the ten largest numbers are now printed in descending order.


That concludes these notes on heap sort and the TOP-K problem. Everyone is welcome to discuss in the comment area, and likes are appreciated!


Origin blog.csdn.net/weixin_53316121/article/details/124065512