Implementation of heap in binary tree

complete binary tree

Full binary tree: every level is completely filled, so every non-leaf node has the maximum degree (2). With k levels, the total number of nodes follows from a geometric sum: 2^0 + 2^1 + ... + 2^(k-1) = 2^k - 1.

Complete binary tree: every level except the last is completely filled. The last level may or may not be full, but its nodes must be contiguous from left to right. A full binary tree is therefore a special case of a complete binary tree. With k levels, the total number of nodes ranges from 2^(k-1) to 2^k - 1.

Sequential structure and implementation of binary trees

A heap is a special binary tree structure (a complete binary tree). Heap sort is much faster than bubble sort, and a heap can also quickly find the few largest or smallest values in a set of data. A heap is implemented with a sequential-list array, so let's first look at the sequential structure of a binary tree.

The sequential structure of a binary tree

A complete binary tree maps well onto an array. Because its nodes are contiguous, with no gaps between them, it can be stored in a sequential-list array, and the child and parent of any node can be found directly from the array subscripts: for a node at index i, the parent is at (i - 1) / 2, the left child at 2 * i + 1, and the right child at 2 * i + 2.

Other forms of binary trees are not well suited to array storage: the slots left empty by missing nodes waste space.

The concept and structure of heap 

The heap is structured as a complete binary tree, and it comes in exactly two forms: the large (root) heap and the small (root) heap. A tree that satisfies neither condition is not a heap.

Large (root) heap: the value stored in any node is less than or equal to the value stored in its parent, so the root holds the maximum.

Small (root) heap: the value stored in any node is greater than or equal to the value stored in its parent, so the root holds the minimum.

Basic implementation of heap

A heap can be implemented with a sequential-list array, so the approach can borrow from the earlier stack implementation. The following implements a large (root) heap:

Functions that need to be implemented

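#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

typedef struct Heap
{
	int sz;       // number of stored elements; also the index of the next free slot
	int capacity; // number of elements the array can currently hold
	int* arr;     // underlying array
}Heap;

void Init(Heap* hp);        // initialize the heap
void Push(Heap* hp, int x); // insert a value
void Pop(Heap* hp);         // remove the root value
int GetTop(Heap* hp);       // return the root value
void Destroy(Heap* hp);     // release the storage

void Init(Heap* hp)
{
	hp->arr = (int*)malloc(sizeof(int) * 3);
	assert(hp->arr);
	hp->capacity = 3;
	hp->sz = 0; // index one past the last stored element
}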

void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 =  *p2;
	*p2 = tmp;
}

void Adjust_up(int* arr,int child)
{
	while (child > 0)
	{
		// parent cannot be the loop condition: when child == 0, (child - 1) / 2 is still 0, never negative
		int parent = (child - 1) / 2;
		if (arr[child] > arr[parent])
		{
			Swap(&arr[child], &arr[parent]);
			child = parent;
		}
		else
			return;
	}
}
void Push(Heap* hp, int x)
{
	if (hp->sz == hp->capacity) // array is full: double the capacity
	{
		hp->capacity *= 2;
		int* tmp = (int*)realloc(hp->arr, sizeof(int) * hp->capacity);
		assert(tmp);
		hp->arr = tmp;
	}
	hp->arr[hp->sz] = x;
	hp->sz++;
	// adjust upward so the array is still a heap
	Adjust_up(hp->arr,hp->sz-1);
}

void Adjust_down(int* arr,int last)
{
	int parent = 0;
	int child = parent * 2 + 1; // assume the left child is the larger one
	while (child<last)
	{
		// check the bound first, then compare: with only a left child, child + 1 would be out of range
		if (child + 1 < last && arr[child] < arr[child + 1])
			child += 1;

		if (arr[child] > arr[parent])
		{
			Swap(&arr[child], &arr[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
			return;
	}
}
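void Pop(Heap* hp) // remove the value at the root
{
	assert(hp->sz); // there must be data to remove
	// moving the last element to the root leaves the parent-child order of all other data unchanged
	hp->arr[0] = hp->arr[hp->sz - 1];
	hp->sz--;

	// adjust downward from the root
	Adjust_down(hp->arr,hp->sz);
}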

int GetTop(Heap* hp)
{
	assert(hp->sz);
	return hp->arr[0];
}

void Destroy(Heap* hp)
{
	free(hp->arr);
	hp->arr = NULL; // avoid leaving a dangling pointer
	hp->capacity = hp->sz = 0;
}

The key parts here are the two functions Adjust_up and Adjust_down: inserting data uses upward adjustment, and deleting the root uses downward adjustment. Both rely on a precondition: apart from the element being added or removed, the parent-child relationships of the existing data must be unchanged, that is, the rest of the tree must already be a heap.

Sorting with a heap

Once data is arranged as a heap, the root holds the maximum (or minimum) value, so repeating the extraction yields the values in order. With a large (root) heap, first insert the array's numbers one by one, adjusting upward, to build the heap, so that the root is the largest number. From here there are two options:

1. Rebuild the heap from the remaining numbers by upward adjustment to find the second-largest number. Removing the root breaks the parent-child relationships among the other numbers, so they all have to be re-inserted.

2. Swap the last number with the root (the maximum), then adjust downward (as in the Pop function) over the remaining numbers. This leaves the parent-child relationships of the other data intact, so both subtrees of the root are still large heaps.

The second option is much cheaper.


Building the heap by adjusting upward

#include <stdio.h>

void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

void Push(int* arr,int child)
{
	while (child > 0)
	{
		int parent = (child - 1) / 2;
		if (arr[child] > arr[parent])
		{
			Swap(&arr[child], &arr[parent]);
			child = parent;
		}
		else
			return;

	}
}
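void Pop(int* arr,int len)
{
	int parent = 0;
	int child = parent * 2 + 1; // assume the left child is the larger one
	while (child<len)
	{
		// check the bound first, then compare: with only a left child, child + 1 would be out of range
		if (child + 1 < len && arr[child] < arr[child + 1])
			child += 1; // the right child is larger

		if (arr[child] > arr[parent])
		{
			Swap(&arr[child], &arr[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
			return;
	}
}
int main()
{
	int arr[] = { 6,1,7,0,3,5,8,2,9,4 };
	int len = sizeof(arr) / sizeof(int);
	for (int i = 1; i < len; i++) // build the heap by adjusting upward
	{
		Push(arr, i); // pass the index of the element being inserted
	}

	while(len>0)
	{
		Swap(&arr[0], &arr[len-1]); // move the current maximum to the end
		Pop(arr, len - 1); // pass the size of the remaining heap (one past its last index)
		printf("%d ", arr[len-1]);
		len--; // each iteration puts one more number in its final place
	}

	return 0;
}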

Note that each pass moves the current maximum to the end of the array, and the downward adjustment then brings the next maximum to the root. The unsorted range shrinks by one each time, so when the loop finishes the array is in ascending order. In short: a large root heap yields ascending order, and a small root heap yields descending order.

Building the heap by adjusting downward

The heap sort above builds the heap by upward adjustment, then swaps the first and last numbers and adjusts downward to restore the heap. Can the heap itself also be built by downward adjustment? Downward adjustment at a node is valid only when the left and right subtrees of that node are already large heaps (or small heaps).

Suppose we want to build a small heap, using the array from the code above. We need a node that can be adjusted downward, that is, one whose left and right subtrees are both small heaps. The last branch node, the one closest to the leaves, qualifies: its children are leaves, and a single leaf is trivially a heap. In this array that is the node holding 3, at index (len - 2) / 2 = 4. We start adjusting downward from that position and then work backward through the earlier branch nodes, toward index 0. This guarantees that whenever a node is adjusted downward, its left and right subtrees are already heaps.

The steps are: adjust downward at index 4, then at indices 3, 2, 1, and finally at index 0 (the root).

Code

#include <stdio.h>

void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

void Adjust_down(int* arr, int i,int len)
{
	int parent = i;
	int child = i * 2 + 1;

	while (child < len)
	{
		if (child + 1 < len && arr[child] > arr[child + 1]) // the right child may not exist, so check the bound first
			child++; // pick the smaller child
		if (arr[parent] > arr[child]) // if the parent is already the smaller one, no swap is needed
		{
			Swap(&arr[parent], &arr[child]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
			return;
	}
}
int main()
{
	int arr[]= { 6,1,7,0,3,5,8,2,9,4 };
	int len = sizeof(arr) / sizeof(int);
	// first build the heap by adjusting downward
	for (int i = (len - 1 - 1) / 2; i >= 0; i--) // len - 1 is the last index; its parent is the last branch node
	{
		Adjust_down(arr, i,len); // len is one past the last index
	}
	// then repeatedly adjust downward to extract the minimum
	while (len > 0)
	{
		Swap(&arr[0], &arr[len - 1]); // each swap moves the current minimum to the end
		Adjust_down(arr, 0, len - 1); // the last element is no longer part of the heap
		printf("%d ", arr[len - 1]); // print from the back of the array toward the front
		len--;
	}

	return 0;
}

Complexity analysis

Building the heap by adjusting upward: the lower a level is, the more nodes it contains, and the farther each of those nodes may have to move up. The most numerous nodes therefore need the most adjustments, and the time complexity is O(n log n).

Building the heap by adjusting downward: this avoids that problem. The most numerous nodes, those in the lowest levels, need the fewest adjustments, and only the few nodes near the root may travel the full height. For a full tree the total is about n - log n moves, so the time complexity is O(n). The calculation is left to you; it follows the same pattern as the upward case.
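For readers who want to check both results, the worst-case move counts can be written out as sums. This is a sketch under an assumption not stated in the original: the tree is full, with h levels and n = 2^h - 1 nodes, and level i (root at level 0) holds 2^i nodes.

```latex
% Upward build: a node on level i moves up at most i levels.
T_{\text{up}}(n) = \sum_{i=0}^{h-1} 2^{i} \cdot i
                 = (h-2)\,2^{h} + 2
                 = (n+1)\bigl(\log_2(n+1) - 2\bigr) + 2
                 = O(n \log n)

% Downward build: a node on level i moves down at most h-1-i levels;
% the leaves on level h-1 are skipped.
T_{\text{down}}(n) = \sum_{i=0}^{h-2} 2^{i}\,(h-1-i)
                   = 2^{h} - 1 - h
                   = n - \log_2(n+1)
                   = O(n)
```

The lowest levels dominate the node count, so weighting them by the largest distance (upward) gives n log n, while weighting them by the smallest distance (downward) keeps the total linear.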


The time complexity of heap sort implemented as above is O(n + n log n), that is, O(n log n).

TOP-K problem in the heap

We have seen that heap sort is much more efficient than bubble sort, and that building a heap by downward adjustment is more efficient than building it by upward adjustment. This leads to the TOP-K problem: find the K largest (or smallest) numbers among all the data.

The direct approach is to build all the given data into one heap and sort by downward adjustment. But: 1. What if there is too much data to fit in memory? 2. If only the K largest values are wanted, isn't building a heap over all the data a serious waste of space?

Suppose we want the 5 largest values. Build a small heap that holds only 5 numbers, then compare each subsequent number with the value at the top of the heap. If it is greater, replace the top with it and adjust downward to restore the heap. Repeat for every remaining number, and the heap ends up holding the 5 largest values. (The code below uses K = 7.)

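#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <assert.h>

void Swap(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}
void Adjust_down(int* arr, int i, int len) // i is the starting index, len is the heap size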
{
	int parent = i;
	int child = parent * 2 + 1;
	while (child<len)
	{
		if (child + 1 < len && arr[child] > arr[child + 1])
			child += 1;
		if (arr[parent] > arr[child])
		{
			Swap(&arr[parent], &arr[child]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
			return;
	}
}
int main()
{
	srand((unsigned int)time(NULL));
	FILE* pf = fopen("data.txt", "w");
	if (pf == NULL)
	{
		perror("pf=NULL");
		return 1;
	}
	for (int i = 0; i < 1000; i++)
	{
		int x = rand() % 1000; // write 1000 random numbers in [0, 999] to the file
		fprintf(pf, "%d\n", x);
	}
	fclose(pf); // close the file promptly after writing, otherwise buffered data may be lost

	// allocate space for exactly k values
	int k = 7;
	int* arr  = (int*)malloc(sizeof(int) * k);
	assert(arr);
	// read the first k values from the file into the array
	FILE* po = fopen("data.txt", "r");
	assert(po);

	for(int i=0;i<k;i++)
	{
		fscanf(po, "%d", &arr[i]);
	}
	for (int i = (k - 2) / 2; i >= 0; i--) // build a small heap from the k values by adjusting downward
	{
		Adjust_down(arr, i, k);
	}
	// read the remaining values from the file
	int val = 0;
	// loop on the fscanf result: testing feof() first would process the last value twice
	while (fscanf(po, "%d", &val) == 1)
	{
		if (val > arr[0])
		{
			arr[0] = val;
			Adjust_down(arr, 0, k); // after each adjustment the top of the heap is again the minimum
		}
	}
	fclose(po);

	for (int i = 0; i < k; i++)
		printf("%d ", arr[i]);

	free(arr);
	return 0;
}

When there is too much data to hold in memory, we store it in a file and read it back one value at a time. A small heap keeps the minimum of the K kept values at the top, so each new value is compared with the top of the heap. If it is larger, it replaces that minimum directly, and one downward adjustment restores the heap with the new minimum at the top. The whole data set never has to be in memory, which is far more convenient.

Origin blog.csdn.net/C_Rio/article/details/130966556