[Data structure] Binary tree - how to implement the heap

Table of contents

1. The sequential structure of a binary tree

2. The concept and structure of the heap

3. The implementation of the heap

4. The application of the heap

4.1 Heap sort

4.1.1 Heap building

4.1.2 Sorting with the heap deletion idea

4.2 TOP-K problem

Many times, our competitors are ourselves, not others.


1. The sequential structure of a binary tree

  An ordinary binary tree is not well suited to array storage, because a lot of space may be wasted. A complete binary tree is a much better fit for sequential storage. In practice, a heap (a kind of binary tree) is usually stored in an array. Note that this heap and the heap in the virtual address space of an operating system process are two different things: one is a data structure, the other is a region of memory managed by the operating system.
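
As a rough sketch of why a complete binary tree fits an array so well: with the root at index 0, parent and child positions are plain index arithmetic, so no pointers are needed. The helper names below (LeftChild, RightChild, ParentIndex) are made up for illustration and are not part of the heap code later in this article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers (not from the original article): index arithmetic
   for a complete binary tree stored in an array with the root at index 0. */
size_t LeftChild(size_t parent)  { return 2 * parent + 1; }
size_t RightChild(size_t parent) { return 2 * parent + 2; }

size_t ParentIndex(size_t child)
{
	assert(child > 0);          /* the root (index 0) has no parent */
	return (child - 1) / 2;
}
```

For the array { 2, 3, 9, 4, 5, 10 }, the element at index 1 (value 3) has its children at indices 3 and 4 (values 4 and 5), and its parent at index 0.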

2. The concept and structure of the heap

If there is a set of keys K = {k0, k1, k2, ..., k(n-1)} [0, 1, 2, ..., n-1 are subscripts], all of its elements are stored in a one-dimensional array in the order of a complete binary tree, and they satisfy k(i) <= k(2*i+1) and k(i) <= k(2*i+2) [or k(i) >= k(2*i+1) and k(i) >= k(2*i+2)] for i = 0, 1, 2, ... (for every child subscript that exists), then it is called a small heap [or a large heap]. The heap whose root node is the largest is called the max heap or large root heap, and the heap whose root node is the smallest is called the min heap or small root heap.
The nature of the heap: 1) The value of a node in the heap is always not greater than (large heap) or not less than (small heap) the value of its parent node; 2) The heap is always a complete binary tree.

Understanding: heaps are divided into large heaps and small heaps. Large heap/large root heap: every parent in the tree is greater than or equal to its children. Small heap/small root heap: every parent in the tree is less than or equal to its children.
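
One way to make the definition concrete is a small checker that walks the array and tests every child against its parent. This is only an illustrative sketch; the function name IsSmallHeap is an assumption and does not appear in the article's heap.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checker (not from the original article): returns true when
   every parent in a[0..n-1] is <= both of its children, i.e. a small heap. */
bool IsSmallHeap(const int* a, size_t n)
{
	assert(a != NULL || n == 0);
	for (size_t child = 1; child < n; ++child)
	{
		size_t parent = (child - 1) / 2;
		if (a[parent] > a[child])
		{
			return false;   /* a child smaller than its parent breaks the rule */
		}
	}
	return true;
}
```

Flipping the comparison to a[parent] < a[child] would give the matching check for a large heap.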

Problems solved by the heap: heap sort, TOP-K

3. The implementation of the heap

heap.h

#pragma once

#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <stdbool.h>

typedef int HPDataType;

typedef struct Heap
{
	HPDataType* a;
	size_t size;
	size_t capacity;
}HP;

void HeapInit(HP* php);
void HeapDestory(HP* php);
void HeapPrint(HP* php);
void Swap(HPDataType* pa, HPDataType* pb);
void HeapPush(HP* php, HPDataType x);
void HeapPop(HP* php);
bool HeapEmpty(HP* php);
size_t HeapSize(HP* php);
HPDataType HeapTop(HP* php);

heap.c


#include "heap.h"

void HeapInit(HP* php)
{
	assert(php);
	php->a = NULL;
	php->size = php->capacity = 0;
}
void HeapDestory(HP* php)
{
	assert(php);
	free(php->a);
	php->a = NULL;
	php->size = php->capacity = 0;
}

// Print the elements in array order
void HeapPrint(HP* php)
{
	assert(php);
	for (size_t i = 0; i < php->size; ++i)
	{
		printf("%d ", php->a[i]);
	}
	printf("\n");
}

void Swap(HPDataType* pa, HPDataType* pb)
{
	HPDataType tmp = *pa;
	*pa = *pb;
	*pb = tmp;
}

bool HeapEmpty(HP* php)
{
	assert(php);
	return php->size == 0;
}
// Number of elements in the heap
size_t HeapSize(HP* php)
{
	assert(php);
	return php->size;
}
HPDataType HeapTop(HP* php)
{
	assert(php);
	assert(php->size > 0);
	return php->a[0];
}
void AdjustUp(HPDataType* a, size_t child)
{
	size_t parent = (child - 1) / 2;
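	// The comparison below decides whether this is a small heap or a large heap.
	// Here: small heap.
	// The last possible comparison is against parent 0. After that swap child is 0
	// and the loop ends. The condition must be child > 0 rather than parent >= 0:
	// size_t is unsigned, so parent is never negative, and (child - 1) / 2 wraps
	// around to a huge value when child is 0.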
	while (child > 0)
	{
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break; // heap property already holds, stop
		}
	}
}

void HeapPush(HP* php, HPDataType x)
{
	assert(php);
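	// Insert the data at the end of the array.
	// First check whether the array is full and needs to grow.
	if (php->size == php->capacity)
	{
		size_t newCapacity = php->capacity == 0 ? 4 : (2 * (php->capacity));
		// Use a temporary pointer: if realloc fails it returns NULL, and assigning
		// that directly to php->a would lose the existing data.
		HPDataType* tmp = (HPDataType*)realloc(php->a, sizeof(HPDataType) * newCapacity);
		if (tmp == NULL)
		{
			printf("realloc fail\n");
			exit(-1);
		}
		php->a = tmp;
		php->capacity = newCapacity;
	}
	php->a[php->size] = x;
	(php->size)++; // insert first, then size++; index size itself holds no value yet
	// Adjust upward so the array is a heap again
	size_t child = (php->size) - 1;
	AdjustUp(php->a, child);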
}

 Heap insertion: first append the new value to the end of the array [after this, the heap property may no longer hold], then run the upward adjustment algorithm until the heap property holds again.

void AdjustDown(HPDataType* a, size_t root, size_t size)
{
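	// Find the smaller of the two children.
	// Note: there may be no right child.
	size_t parent = root;
	size_t child = parent * 2 + 1;
	while (child < size)
	{
		// child + 1 < size avoids reading past the end of the array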
		if (child + 1 < size && a[child] > a[child + 1])
		{
			child++;
		}
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break; // heap property already holds, stop
		}
	}
}

void HeapPop(HP* php)
{
	assert(php);
	// Before deleting, make sure the heap is not empty
	assert(php->size > 0);
	Swap(&php->a[0], &php->a[php->size - 1]);
	php->size--;
	AdjustDown(php->a, 0, php->size);
}

Deletion from the heap: deleting from a heap means deleting the data at the top of the heap [the smallest or largest value]. Swap the top with the last element, remove the last element of the array, then run the downward adjustment algorithm. [Swap first, then delete, then adjust downward]

Downward adjustment algorithm: first find the smaller (larger) of the two child nodes, then compare it with the parent node and swap if needed, so that the parent is always less than or equal to (greater than or equal to) its children. Then continue downward from the child that was swapped. [This assumes both subtrees of the starting node are already heaps.]

The time complexity of heap insertion and deletion is O(logN)  
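
To make the O(logN) bound a little more concrete: one insertion or deletion moves a value along a single path between the root and a leaf, so the number of swaps is limited by the number of levels below the root. The helper below is only an illustrative sketch (the name MaxSwaps is an assumption, not part of the article's code); it computes that limit, floor(log2(n)), for a heap of n elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the largest number of swaps a single AdjustUp or
   AdjustDown can perform in a heap of n elements, which is floor(log2(n)). */
size_t MaxSwaps(size_t n)
{
	assert(n > 0);
	size_t levels = 0;
	while (n > 1)
	{
		n /= 2;      /* each halving corresponds to one level of the tree */
		++levels;
	}
	return levels;
}
```

Under this count, a heap holding one million elements needs at most 19 swaps per push or pop.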

4. The application of the heap

4.1 Heap sort

Heap sort uses the idea of the heap to sort, in two steps:
1. Build a heap (the heap is built in place on the array, so the space complexity of heap sort is O(1))
Ascending order: build a large heap
Descending order: build a small heap
2. Sort using the idea of heap deletion

4.1.1  Heap building

 There are two ways to build a heap: (1) Use upward adjustment, i.e. the idea of inserting data: treat the first element as a heap and "insert" the remaining elements one by one, adjusting each upward, so the front part of the array is a heap at every step [Code 1]. (2) Use downward adjustment [starting from the last non-leaf node, which is the parent of the last node, i.e. subscript (size - 1 - 1) / 2]: adjust that node downward, then decrease the subscript by one and adjust again [so every small subtree becomes a heap first], until the root has been adjusted and the whole array is a heap [Code 2].

[After heap building, the array itself is a heap]

Code 1 shows:

void Swap(HPDataType* pa, HPDataType* pb)
{
	HPDataType tmp = *pa;
	*pa = *pb;
	*pb = tmp;
}


void AdjustUp(HPDataType* a, size_t child)
{
	size_t parent = (child - 1) / 2;
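	// The comparison below decides whether this is a small heap or a large heap.
	// Here: small heap.
	// The last possible comparison is against parent 0. After that swap child is 0
	// and the loop ends. The condition must be child > 0 rather than parent >= 0:
	// size_t is unsigned, so parent is never negative, and (child - 1) / 2 wraps
	// around to a huge value when child is 0.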
	while (child > 0)
	{
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break; // heap property already holds, stop
		}
	}
}

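void HeapSort(int* a, int n)
{
	// Build a heap by upward adjustment: treat a[0] as a one-element heap,
	// then "insert" a[1], a[2], ... one at a time.
	// With the "<" comparison in AdjustUp this builds a small heap;
	// for ascending order, change it to ">" so that a large heap is built.
	for (int i = 1; i < n; ++i)
	{
		AdjustUp(a, i);
	}
}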

int main()
{
	int a[] = { 4, 3, 10 , 2, 5, 9 };
	HeapSort(a, sizeof(a) / sizeof(int));
	for (int i = 0; i < sizeof(a) / sizeof(int); i++)
	{
		printf("%d ", a[i]);
	}
	printf("\n");
	return 0;
}

Code 2 shows:

void HeapSort(int* a, int n)
{
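	// Alternative: build the heap by upward adjustment
	/*int i = 0;
	for (i = 1; i < n; ++i)
	{
		AdjustUp(a, i);
	}*/
	// Build the heap by downward adjustment, starting from the last non-leaf node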
	int i = 0;
	for (i = (n - 2) / 2; i >= 0; --i)
	{
		AdjustDown(a, i, n);
	}
}

 The time complexity of building a heap:

Building the heap upward: in a full binary tree of height h, layer i has 2^(i-1) nodes. Upward building inserts data starting from the second layer. The second layer has 2^(2-1) nodes, each moving up at most 1 level, so it costs at most 2^(2-1)*1 adjustments; the third layer has 2^(3-1) nodes, each moving up at most 2 levels, so it costs at most 2^(3-1)*2; and so on. The total number of upward adjustments is at most 2^(2-1)*1 + 2^(3-1)*2 + 2^(4-1)*3 + ... + 2^(h-1)*(h-1). Each term is an arithmetic sequence times a geometric sequence, so the sum can be evaluated by dislocation subtraction (multiply the sum by 2 and subtract the original), which gives 2^h*(h-2) + 2. Since N = 2^h - 1, the final time complexity is O(N*logN).

Building the heap downward: again, layer i has 2^(i-1) nodes. Building starts from the last non-leaf node [this node is not necessarily the last node of the penultimate layer, but the heap can be treated as a full binary tree without changing the time complexity much, and then it is the last node of the penultimate layer]. Adjustment starts at the penultimate layer and ends after the first layer has been adjusted. A node in layer i moves down at most h-i levels, so layer i costs at most 2^(i-1)*(h-i) adjustments. The total number of downward adjustments is at most 2^0*(h-1) + 2^1*(h-2) + 2^2*(h-3) + ... + 2^(h-2)*1. This is again an arithmetic sequence times a geometric sequence; dislocation subtraction gives 2^h - 1 - h. Because 2^h - 1 = N, this is N - h, and the final time complexity is O(N).
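
The two closed forms can be checked numerically. The sketch below simply adds up the per-layer worst cases described above for a full binary tree of height h; the function names UpBuildSteps and DownBuildSteps are assumptions made for this illustration.

```c
#include <assert.h>

/* Illustrative check (not from the original article) of the two worst-case
   counts for a full binary tree of height h, i.e. N = 2^h - 1 nodes. */

/* Upward build: layer i (2 <= i <= h) has 2^(i-1) nodes,
   each moving up at most i-1 levels. */
unsigned long long UpBuildSteps(unsigned h)
{
	unsigned long long total = 0;
	for (unsigned i = 2; i <= h; ++i)
	{
		total += (1ULL << (i - 1)) * (i - 1);
	}
	return total;
}

/* Downward build: layer i (1 <= i <= h-1) has 2^(i-1) nodes,
   each moving down at most h-i levels. */
unsigned long long DownBuildSteps(unsigned h)
{
	unsigned long long total = 0;
	for (unsigned i = 1; i + 1 <= h; ++i)
	{
		total += (1ULL << (i - 1)) * (h - i);
	}
	return total;
}
```

For h = 4 (N = 15) the upward build needs at most 2*1 + 4*2 + 8*3 = 34 adjustments, while the downward build needs at most 1*3 + 2*2 + 4*1 = 11, matching 2^h*(h-2) + 2 and 2^h - 1 - h.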

Summary: building the heap downward is the better choice.

Which heap to build: for ascending order build a large heap, for descending order build a small heap. [If you build a small heap for ascending order, the smallest number is already in the first position, but the remaining elements no longer form a heap, so the heap has to be rebuilt to select the next smallest number. The total time complexity is then O(N^2), no better than selecting by direct traversal, which is also O(N^2).]

4.1.2 Sorting with the heap deletion idea

Ascending order with a large heap as the example: after the large heap is built, the maximum is at the front. Swap the maximum with the last value [subscript n-1], then adjust downward from the root over the first n-1 elements only, leaving subscript n-1 out of the heap. Then swap the new maximum with the last value of the heap [subscript n-2], and so on. The array is sorted once the element at subscript 0 has been swapped with the element at subscript 1. [Time complexity: O(N*logN)]

void HeapSort(int* a, int n)
{
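	// Build the heap by downward adjustment, starting from the last non-leaf node.
	// Note: AdjustDown above keeps a small heap, so this code as written sorts in
	// descending order. For ascending order, flip both comparisons in AdjustDown
	// so that it keeps a large heap.
	if (n < 2)
	{
		return; // nothing to sort; also keeps n - 1 below from wrapping around
	}
	for (int i = (n - 2) / 2; i >= 0; --i)
	{
		AdjustDown(a, i, n);
	}
	// Swap the heap top with the last element of the heap, then restore the heap
	// on the remaining elements (end is the number of elements still in the heap)
	size_t end = n - 1;
	while (end > 0)
	{
		Swap(&a[0], &a[end]);
		AdjustDown(a, 0, end);
		--end;
	}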
}
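
Since the AdjustDown shown earlier maintains a small heap, an ascending sort needs a large-heap counterpart. The following is a minimal self-contained sketch under that assumption; AdjustDownBig and HeapSortAscending are hypothetical names introduced here, not functions from the article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical large-heap version of the downward adjustment: the same
   shape as AdjustDown, with both comparisons flipped. */
void AdjustDownBig(int* a, size_t root, size_t size)
{
	size_t parent = root;
	size_t child = parent * 2 + 1;
	while (child < size)
	{
		/* pick the larger child, if a right child exists */
		if (child + 1 < size && a[child + 1] > a[child])
		{
			child++;
		}
		if (a[child] > a[parent])
		{
			int tmp = a[child];
			a[child] = a[parent];
			a[parent] = tmp;
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

/* Sorts a[0..n-1] into ascending order in place. */
void HeapSortAscending(int* a, size_t n)
{
	assert(a != NULL || n == 0);
	if (n < 2)
	{
		return;
	}
	/* 1. build a large heap, from the last non-leaf node (n - 2) / 2 down to 0 */
	for (size_t i = (n - 2) / 2 + 1; i-- > 0; )
	{
		AdjustDownBig(a, i, n);
	}
	/* 2. move the current maximum to the end, then shrink the heap by one */
	for (size_t end = n - 1; end > 0; --end)
	{
		int tmp = a[0];
		a[0] = a[end];
		a[end] = tmp;
		AdjustDownBig(a, 0, end);
	}
}
```

Calling HeapSortAscending on { 4, 3, 10, 2, 5, 9 } with n = 6 leaves the array as { 2, 3, 4, 5, 9, 10 }. The build loop counts down with an unsigned index, which is why it is written as i-- > 0 instead of i >= 0.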

4.2 TOP-K problem

 Find the K largest/smallest among N numbers

TOP-K problem: find the K largest or K smallest elements in a data set, where the amount of data is generally large.
For example: the top 10 professional players, the world's top 500, the rich list, the top 100 active players in a game, etc.
For the Top-K problem, the simplest and most direct approach is sorting. But if the amount of data is very large, sorting is not advisable (the data may not fit into memory all at once). A better way is to use a heap. The basic idea is as follows:
1. Build a heap from the first K elements of the data set
To find the K largest elements, build a small heap
To find the K smallest elements, build a large heap
2. Compare each of the remaining N-K elements with the heap top in turn, and replace the heap top (then adjust downward) if the element belongs in the result
After all of the remaining N-K elements have been compared with the heap top, the K elements left in the heap are the K smallest or largest elements sought.

The time complexity is O(K + (N-K)*logK); the space complexity is O(K).

void PrintTopK(int* a, int n, int k)
{
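	// 1. Build a heap from the first k elements of a (assumes 0 < k <= n)
	int* kminHeap = (int*)malloc(sizeof(int) * k);
	if (kminHeap == NULL)
	{
		printf("malloc fail \n");
		exit(-1);
	}
	// Copy the first k elements into the heap array
	for (int i = 0; i < k; ++i)
	{
		kminHeap[i] = a[i];
	}

	// Build a small heap, starting from the last non-leaf node
	for (int j = (k - 2) / 2; j >= 0; --j)
	{
		AdjustDown(kminHeap, j, k); // k is the number of elements in the heap
	}

	// 2. Compare each of the remaining n-k elements with the heap top;
	//    an element larger than the top replaces it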
	for (int i = k; i < n; ++i)
	{
		if (a[i] > kminHeap[0])
		{
			kminHeap[0] = a[i];
			AdjustDown(kminHeap, 0, k);
		}
	}

	for (int j = 0; j < k; ++j)
	{
		printf("%d ", kminHeap[j]);
	}
	printf("\n");
	free(kminHeap);
}

// Needs #include <time.h> for time()
void TestTopk()
{
	int n = 10000;
	int* a = (int*)malloc(sizeof(int) * n);
	if (a == NULL)
	{
		printf("malloc fail \n");
		exit(-1);
	}
	srand((unsigned int)time(NULL));
	for (int i = 0; i < n; ++i)
	{
		a[i] = rand() % 1000000;
	}
	// Plant 10 values above the random range, so the expected top 10 is known
	a[5] = 1000000 + 1;
	a[1231] = 1000000 + 2;
	a[531] = 1000000 + 3;
	a[5121] = 1000000 + 4;
	a[115] = 1000000 + 5;
	a[2305] = 1000000 + 6;
	a[99] = 1000000 + 7;
	a[76] = 1000000 + 8;
	a[423] = 1000000 + 9;
	a[0] = 1000000 + 1000;
	PrintTopK(a, n, 10);
	free(a);
}
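
The code above covers the K largest elements with a small heap. The mirrored case from the list of steps, the K smallest elements with a large heap, could be sketched roughly as follows. SiftDownBig and SmallestK are hypothetical names chosen for this example, and writing the result into a caller-supplied buffer instead of printing it is a design assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: downward adjustment for a large heap. */
void SiftDownBig(int* a, size_t parent, size_t size)
{
	size_t child = parent * 2 + 1;
	while (child < size)
	{
		if (child + 1 < size && a[child + 1] > a[child])
		{
			child++;                 /* use the larger child */
		}
		if (a[child] <= a[parent])
		{
			break;                   /* heap property already holds */
		}
		int tmp = a[child];
		a[child] = a[parent];
		a[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Hypothetical function: copies the k smallest values of a[0..n-1] into
   out[0..k-1] (in heap order, not sorted). Requires 0 < k <= n. */
void SmallestK(const int* a, size_t n, int* out, size_t k)
{
	assert(a != NULL && out != NULL);
	assert(k > 0 && k <= n);
	for (size_t i = 0; i < k; ++i)
	{
		out[i] = a[i];
	}
	/* build a large heap over the first k values */
	for (size_t i = k / 2; i-- > 0; )
	{
		SiftDownBig(out, i, k);
	}
	/* a value smaller than the heap top replaces it */
	for (size_t i = k; i < n; ++i)
	{
		if (a[i] < out[0])
		{
			out[0] = a[i];
			SiftDownBig(out, 0, k);
		}
	}
}
```

The heap top out[0] is always the largest of the K values kept so far, which is why a single comparison against it is enough to decide whether a new element belongs in the result.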

Origin blog.csdn.net/m0_57388581/article/details/131629324