Data structure - heap (C language implementation)

What is a heap

A heap is a special data structure: a complete binary tree that satisfies the heap property, meaning that the value of every parent node is always greater than or equal to (or always less than or equal to) the values of its child nodes. If every parent is no smaller than its children, we call it a large root heap (max-heap); conversely, if every parent is no larger than its children, we call it a small root heap (min-heap). The root node therefore holds the largest value (large root heap) or the smallest value (small root heap), and it is also called the top of the heap. Heaps are often used for sorting and for TopK problems.
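To make the heap property concrete, here is a minimal sketch that checks whether an array is a large root heap. The names ParentOf, LeftChildOf, RightChildOf, and IsLargeRootHeap are made up for this illustration and are not part of the article's interface; the sketch assumes the usual 0-based array layout of a complete binary tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Index arithmetic for a complete binary tree stored in a 0-based array. */
int ParentOf(int child)      { return (child - 1) / 2; }
int LeftChildOf(int parent)  { return parent * 2 + 1; }
int RightChildOf(int parent) { return parent * 2 + 2; }

/* Returns true if data[0..n) satisfies the large root heap property:
   no child is greater than its parent. */
bool IsLargeRootHeap(const int* data, int n)
{
	assert(data || n == 0);
	for (int parent = 0; LeftChildOf(parent) < n; ++parent)
	{
		int left = LeftChildOf(parent);
		int right = RightChildOf(parent);
		if (data[left] > data[parent])
			return false;
		if (right < n && data[right] > data[parent])
			return false;
	}
	return true;
}
```

For example, {9, 7, 8, 3, 5, 6} passes the check, while {9, 7, 8, 3, 10, 6} fails because 10 at subscript 4 is greater than its parent 7 at subscript 1.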

Implementation of the heap

The implementation in this article is written in C and split into a header file and a source file. Each interface is introduced in turn, with its implementation idea and reference code.

Heap structure definition

The heap is stored as a special sequence table (a dynamic array), much like a stack. The structure therefore needs a pointer to the dynamically allocated memory, a size that records the next subscript available for insertion (which is also the number of stored elements), and a capacity that records how many elements the allocated memory can hold.

Heap initialization interface

The heap initialization interface works as follows. To modify a heap we must pass its address, so the parameter is written as Hp*. At the start of the interface, assert that the pointer is valid. Then allocate the dynamic memory and check that the allocation succeeded. Finally, initialize the structure members.

Heap destruction interface

We should get into the good habit of freeing dynamically allocated memory and setting the pointer to NULL immediately after free. Finally, set the size and capacity to zero.
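The structure, initialization, and destruction steps described so far can be sketched together. This is only an illustration under assumptions: MiniHeap, MiniHeapInit, and MiniHeapDestroy are hypothetical names, the element type is fixed to int, and, unlike the article's HpInit, the init function reports allocation failure through a bool return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct MiniHeap
{
	int* data;     /* dynamically allocated storage */
	int size;      /* number of stored elements, also the next free subscript */
	int capacity;  /* number of elements the storage can hold */
} MiniHeap;

/* Returns false if the allocation fails; the heap is then left empty. */
bool MiniHeapInit(MiniHeap* h, int capacity)
{
	assert(h && capacity > 0);
	h->data = (int*)malloc(sizeof(int) * (size_t)capacity);
	h->size = 0;
	h->capacity = h->data ? capacity : 0;
	return h->data != NULL;
}

/* Frees the storage and resets every member. */
void MiniHeapDestroy(MiniHeap* h)
{
	assert(h);
	free(h->data);
	h->data = NULL;
	h->size = h->capacity = 0;
}
```

Because the pointer is reset to NULL, calling the destroy function a second time is harmless: free(NULL) does nothing.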

Heap insert data interface

The heap insertion interface works as follows. First, assert that the pointer is valid; this is a good programming habit, and it is worth adopting in everyday code. Next, check whether the storage is full, and if it is, expand the capacity. Inserting the data is then much like a sequence table: store the value at subscript size and increment size. Finally, call the upward adjustment interface so that the array is still a heap after the insertion.
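The capacity check can be isolated into a small helper. The sketch below uses a hypothetical function named EnsureRoom that works on a bare int buffer; it is not the article's code. One case worth thinking about is a capacity of 0 (for example after the heap has been destroyed), where doubling alone would never grow the buffer, so the sketch assumes falling back to 4 elements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Makes sure *data has room for at least one more element than size.
   Doubles *capacity when the buffer is full. Returns false and leaves
   the buffer untouched if realloc fails. */
bool EnsureRoom(int** data, int size, int* capacity)
{
	assert(data && capacity && size <= *capacity);
	if (size < *capacity)
		return true;                       /* still room, nothing to do */

	int newCapacity = (*capacity == 0) ? 4 : *capacity * 2;
	int* tmp = (int*)realloc(*data, sizeof(int) * (size_t)newCapacity);
	if (tmp == NULL)
		return false;                      /* the old buffer is still valid */

	*data = tmp;
	*capacity = newCapacity;
	return true;
}
```

Assigning the result of realloc to a temporary first matters: if the call fails, the original pointer is not lost and can still be freed.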

Upward adjustment interface

First, compute the parent's subscript from the child's subscript: parent = (child - 1) / 2. Then adjust upward in a loop. The loop continues while the child's subscript is greater than 0, that is, while the child is not yet the root. Inside the loop, if the child is greater than its parent (for a large root heap), swap the two values and then move both subscripts up one level; otherwise the heap property already holds and the loop ends.
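Here is a small trace of upward adjustment on a plain array, as a sketch. SiftUp and PushNoGrow are hypothetical names used only here, and the caller is assumed to guarantee that the array has room for one more element.

```c
#include <assert.h>

/* Moves data[child] up until its parent is not smaller (large root heap). */
void SiftUp(int* data, int child)
{
	assert(data && child >= 0);
	while (child > 0)
	{
		int parent = (child - 1) / 2;
		if (data[child] <= data[parent])
			break;                     /* heap property holds, stop */
		int tmp = data[child];
		data[child] = data[parent];
		data[parent] = tmp;
		child = parent;                /* continue one level up */
	}
}

/* Appends x to a heap of *size elements and restores the heap property.
   Assumption: data has room for at least one more element. */
void PushNoGrow(int* data, int* size, int x)
{
	assert(data && size);
	data[*size] = x;
	++*size;
	SiftUp(data, *size - 1);
}
```

Pushing 9 onto the heap {8, 5, 6, 3, 4} first places it at subscript 5, swaps it with its parent 6 at subscript 2, and then swaps it with the root 8, giving {9, 5, 8, 3, 4, 6}.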

Check if the heap is empty

Checking whether the heap is empty is simple and works the same way as for a sequence table: when the next subscript available for insertion (size) is 0, the heap is empty.

Heap delete data interface

When deleting data from the heap, should we delete the top or the bottom? The answer is the top, because deleting the last element is of little value, whereas repeatedly deleting the top yields the elements in order, which is useful for sorting or for collecting the top K items. For example, when choosing a computer in a shopping app we can sort by sales volume, which is one scenario where a heap applies. Back to the topic: deleting the top of the heap works as follows. Swap the top element with the last element, then decrement size (size--) so that the old top is no longer part of the heap. This avoids shifting every remaining element forward by one position, which would cost O(N) and would also scramble the parent-child relationships. Finally, adjust the new top element downward.

Downward adjustment interface

Downward adjustment works as follows. It is a loop that continues while the child's subscript is less than size, that is, while the current parent still has at least one child. The loop body holds the core idea: the parent must be no smaller (large root heap) or no larger (small root heap) than both of its children. This article implements a large root heap. One important point: because the heap is stored in a sequence table, the left and right children of the same parent are adjacent, so the left child's subscript + 1 is the right child's subscript. Compare the parent with the larger of its two children; if the parent is smaller, swap them and continue from the child's position, otherwise stop. Note: downward adjustment only works if both the left and right subtrees are already heaps.
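Deletion and downward adjustment fit together as in the following sketch on a plain array. SiftDown and PopTop are hypothetical names for this illustration, and the heap is assumed to be a large root heap of int values.

```c
#include <assert.h>

/* Moves data[parent] down within data[0..size) until neither child is larger. */
void SiftDown(int* data, int size, int parent)
{
	assert(data);
	int child = parent * 2 + 1;            /* left child */
	while (child < size)
	{
		if (child + 1 < size && data[child + 1] > data[child])
			++child;                       /* the right child is larger */
		if (data[child] <= data[parent])
			break;                         /* heap property holds, stop */
		int tmp = data[child];
		data[child] = data[parent];
		data[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Removes and returns the top of a non-empty large root heap. */
int PopTop(int* data, int* size)
{
	assert(data && size && *size > 0);
	int top = data[0];
	data[0] = data[*size - 1];   /* move the last element to the root */
	--*size;
	SiftDown(data, *size, 0);    /* the heap now holds *size elements */
	return top;
}
```

Calling PopTop repeatedly on {9, 5, 8, 3, 4, 6} returns 9, 8, 6, 5, 4, 3. Note that the size passed to the downward adjustment is the element count after the deletion, not the subscript of the last element.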

Get heap top data

This simply accesses the first element of the sequence table. Even so, providing it as an interface keeps the heap's set of operations consistent and makes the calling code more readable. The heap must not be empty when it is called.

Get the number of valid data in the heap

Since size starts at 0 and is incremented on every insertion, it always equals the number of elements, so just return size directly.

Complete implementation code

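// Heap.h
#pragma once

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <stdbool.h>

// Default initial capacity
#define DefaultCapacity 4

// Type of the stored data
typedef int HpDataType;

typedef struct Heap
{
	HpDataType* data;
	int size;     // next subscript available for insertion (also the element count)
	int capacity; // capacity
}Hp;

// Initialize the heap
void HpInit(Hp* pHp);

// Destroy the heap
void HpDestroy(Hp* pHp);

// Swap two elements (declared here because other files also use it)
void Swap(HpDataType* p1, HpDataType* p2);

// Insert data
void HpPush(Hp* pHp, HpDataType x);

// Upward adjustment
void AdjustUp(HpDataType* data, int child);

// Check whether the heap is empty
bool HpEmpty(Hp* pHp);

// Delete the data at the top of the heap
void HpPop(Hp* pHp);

// Downward adjustment
void AdjustDown(HpDataType* data, int size, int parent);

// Get the data at the top of the heap
HpDataType HpTop(Hp* pHp);

// Number of elements in the heap
int HpSize(Hp* pHp);

// Heap.c
#include "Heap.h"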

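// Initialize the heap
void HpInit(Hp* pHp)
{
	// Check that the pointer is valid
	assert(pHp);

	// Start from a well-defined empty state in case the allocation fails
	pHp->data = NULL;
	pHp->size = 0;
	pHp->capacity = 0;

	// Allocate the dynamic storage
	HpDataType* tmp = (HpDataType*)malloc(sizeof(HpDataType) * DefaultCapacity);
	if (tmp == NULL) // check that the allocation succeeded
	{
		perror("malloc fail");
		return;
	}

	// Initialize the members
	pHp->data = tmp;
	pHp->capacity = DefaultCapacity;
}

// Destroy the heap
void HpDestroy(Hp* pHp)
{
	// Check that the pointer is valid
	assert(pHp);

	// Free the memory and reset the members
	free(pHp->data);
	pHp->data = NULL;
	pHp->size = pHp->capacity = 0;
}

// Swap two elements
void Swap(HpDataType* p1, HpDataType* p2)
{
	HpDataType tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}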

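// Upward adjustment (large root heap)
void AdjustUp(HpDataType* data, int child)
{
	// Check that the pointer is valid
	assert(data);
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		// Move the child up while it is greater than its parent
		if (data[child] > data[parent])
		{
			Swap(&data[child], &data[parent]);
		}
		else
		{
			break;
		}
		// Iterate
		child = parent;
		parent = (child - 1) / 2;
	}
}

// Insert data
void HpPush(Hp* pHp, HpDataType x)
{
	// Check that the pointer is valid
	assert(pHp);

	// Expand the storage if it is full
	if (pHp->size == pHp->capacity)
	{
		// A capacity of 0 (for example after HpDestroy) cannot be doubled
		int newCapacity = pHp->capacity == 0 ? DefaultCapacity : pHp->capacity * 2;
		HpDataType* tmp = (HpDataType*)realloc(pHp->data, sizeof(HpDataType) * newCapacity);
		if (tmp == NULL) // check that the allocation succeeded
		{
			perror("realloc fail");
			return;
		}
		// Expansion succeeded
		pHp->data = tmp;
		pHp->capacity = newCapacity;
	}

	// Put the data into the heap
	pHp->data[pHp->size] = x;
	pHp->size++;

	// Adjust the new element upward
	AdjustUp(pHp->data, pHp->size - 1);
}
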
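// Downward adjustment (large root heap); size is the number of elements
void AdjustDown(HpDataType* data, int size, int parent)
{
	// Assertion check
	assert(data);

	int child = parent * 2 + 1;

	while (child < size)
	{
		// Pick the larger of the left and right children
		if (child + 1 < size && data[child + 1] > data[child])
		{
			child++;
		}
		// If the parent is smaller than that child, swap them
		if (data[child] > data[parent])
		{
			Swap(&data[child], &data[parent]);
			// Iterate
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

// Delete the data at the top of the heap
void HpPop(Hp* pHp)
{
	// Assertion checks: valid pointer and non-empty heap
	assert(pHp);
	assert(pHp->size > 0);

	// Swap the top with the last element and drop it from the heap
	Swap(&pHp->data[0], &pHp->data[pHp->size - 1]);
	pHp->size--;

	// size has already been decremented, so it is the number of elements left
	AdjustDown(pHp->data, pHp->size, 0);
}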

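// Check whether the heap is empty
bool HpEmpty(Hp* pHp)
{
	assert(pHp);

	return pHp->size == 0;
}

// Get the data at the top of the heap
HpDataType HpTop(Hp* pHp)
{
	assert(pHp);
	assert(pHp->size > 0); // the heap must not be empty

	return pHp->data[0];
}

// Number of elements in the heap
int HpSize(Hp* pHp)
{
	assert(pHp);

	return pHp->size;
}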

Summary

Operating on a heap is a bit like eating a wife cake (a sweet pastry): the cake is real, but the wife in the name exists only in your imagination. Likewise, the tree you operate on is only the logical structure, while the underlying storage is a sequence table. This is a fairly abstract point, and it tests our ability to draw diagrams, read code, and debug.

Heap sort

Heap sort is a common application of the heap. Its core idea is to reuse the heap deletion procedure to put the elements in order. Heap sort is an unstable sort with time complexity O(N*logN). Sorting stability will be explained in a later blog post.

Implementation of heap sort

Heap sort works as follows. First, decide the sort order and build the data into a heap: build a large root heap for ascending order and a small root heap for descending order. It is recommended to build the heap by downward adjustment, because that takes O(N) time, whereas building it by upward adjustment takes O(N*logN). Spending O(N*logN) just to bring the largest or smallest value to the top of the heap would be too expensive; a direct traversal of the array, which is O(N), would do better.

Then sort using the idea of heap deletion: swap the top of the heap with the last element of the unsorted range, shrink the range by one, and adjust the new top downward. The following example sorts in ascending order.

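// Heap sort: build a large root heap to sort in ascending order
void HeapSort(int* arr, int n)
{
	// Build the heap by downward adjustment (more efficient), starting
	// from the last non-leaf node. The heap holds all n elements.
	for (int i = (n - 1 - 1) / 2; i >= 0; --i)
	{
		AdjustDown(arr, n, i);
	}

	// Sort using the idea of heap deletion
	int end = n - 1;
	while (end > 0)
	{
		// Move the current maximum to the end of the unsorted range
		int tmp = arr[0];
		arr[0] = arr[end];
		arr[end] = tmp;
		// Re-adjust the remaining heap, which now holds end elements
		AdjustDown(arr, end, 0);
		end--;
	}
}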
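For the other direction mentioned above, descending order with a small root heap, a self-contained sketch could look like this. SiftDownSmall and HeapSortDescending are hypothetical names chosen for the illustration; only the comparison direction differs from the ascending version.

```c
#include <assert.h>

/* Downward adjustment for a small root heap: the parent sinks while
   one of its children is smaller. size is the number of elements. */
static void SiftDownSmall(int* data, int size, int parent)
{
	assert(data);
	int child = parent * 2 + 1;
	while (child < size)
	{
		if (child + 1 < size && data[child + 1] < data[child])
			++child;                       /* the right child is smaller */
		if (data[child] >= data[parent])
			break;
		int tmp = data[child];
		data[child] = data[parent];
		data[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Sorts arr[0..n) in descending order using a small root heap. */
void HeapSortDescending(int* arr, int n)
{
	assert(arr || n == 0);

	/* Build the heap from the last non-leaf node down to the root. */
	for (int i = (n - 2) / 2; i >= 0; --i)
		SiftDownSmall(arr, n, i);

	/* Move the current minimum to the end, then shrink the heap. */
	for (int end = n - 1; end > 0; --end)
	{
		int tmp = arr[0];
		arr[0] = arr[end];
		arr[end] = tmp;
		SiftDownSmall(arr, end, 0);        /* the heap now holds end elements */
	}
}
```

Sorting {3, 9, 1, 7, 5, 9, 2} this way gives {9, 9, 7, 5, 3, 2, 1}.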

Analysis of the time complexity of heap building and heap sort

Building the heap by downward adjustment

The heap sort implementation above recommends building the heap by downward adjustment because it takes O(N) time. The reason, briefly: most nodes sit near the bottom of the tree and can move down only a few levels, while the few nodes near the top are the only ones that can move far.
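A worked version of that argument, as a sketch under one simplifying assumption: the tree is a full binary tree with h levels, so N = 2^h - 1. The symbols T(N), h, and k are introduced here for the derivation. Level k (the root is level 1) has 2^(k-1) nodes, and in the worst case each of them moves down h - k levels.

```latex
\begin{aligned}
T(N)    &= \sum_{k=1}^{h-1} 2^{k-1}\,(h-k) \\
2\,T(N) &= \sum_{k=1}^{h-1} 2^{k}\,(h-k) \;=\; \sum_{k=2}^{h} 2^{k-1}\,(h-k+1) \\
T(N)    &= 2\,T(N) - T(N) \;=\; \sum_{k=2}^{h} 2^{k-1} \;-\; (h-1) \\
        &= (2^{h} - 2) - (h - 1) \;=\; 2^{h} - 1 - h \\
        &= N - \log_2 (N+1) \;=\; O(N)
\end{aligned}
```

As a quick check, h = 3 gives 1*2 + 2*1 = 4 moves, which matches 2^3 - 1 - 3 = 4.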

Building the heap by upward adjustment

Building the heap by upward adjustment takes O(N*logN) time. Here the situation is reversed: the bottom level holds about half of the nodes, and each of them may have to move up the full height of the tree.
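The same kind of count can be written out for upward adjustment, again as a sketch assuming a full binary tree with h levels and N = 2^h - 1 (T(N), h, and k are symbols introduced for this derivation). A node on level k moves up at most k - 1 levels.

```latex
\begin{aligned}
T(N)    &= \sum_{k=2}^{h} 2^{k-1}\,(k-1) \\
2\,T(N) &= \sum_{k=2}^{h} 2^{k}\,(k-1) \;=\; \sum_{k=3}^{h+1} 2^{k-1}\,(k-2) \\
T(N)    &= 2\,T(N) - T(N) \;=\; (h-1)\,2^{h} \;-\; \sum_{k=2}^{h} 2^{k-1} \\
        &= (h-1)\,2^{h} - (2^{h} - 2) \;=\; (h-2)\,2^{h} + 2 \\
        &= (N+1)\bigl(\log_2 (N+1) - 2\bigr) + 2 \;=\; O(N \log N)
\end{aligned}
```

As a quick check, h = 3 gives 2*1 + 4*2 = 10 moves, which matches (3 - 2)*2^3 + 2 = 10.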

Heap sort

The time complexity of heap sort is O(N*logN). Building the heap by downward adjustment, starting from the last non-leaf node, is O(N), as analyzed above. The sorting phase performs N - 1 deletions, each followed by a downward adjustment of at most logN levels, so it is O(N*logN), and this phase dominates the total.

Summary

For the time complexity of heap building and heap sort described above, it is enough to remember the conclusions. That said, the efficiency gap between the two ways of building a heap is easy to see from the implementation. Downward adjustment starts from the last non-leaf node, so it skips the leaves altogether, which is about half of all nodes, and the nodes it does process are mostly close to the bottom. Upward adjustment has to process those leaves, and they are exactly the nodes with the longest way to climb.
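The gap can also be observed by counting swaps. The sketch below is an experiment, not part of the article's interface: BuildByAdjustDown and BuildByAdjustUp are hypothetical names, and each builds a large root heap in place and returns the number of swaps it performed.

```c
#include <assert.h>

static void SwapInt(int* a, int* b)
{
	int tmp = *a;
	*a = *b;
	*b = tmp;
}

/* Builds a large root heap by adjusting every non-leaf node downward,
   from the last non-leaf node to the root. Returns the swap count. */
long BuildByAdjustDown(int* data, int n)
{
	assert(data || n == 0);
	long swaps = 0;
	for (int i = (n - 2) / 2; i >= 0; --i)
	{
		int parent = i;
		int child = parent * 2 + 1;
		while (child < n)
		{
			if (child + 1 < n && data[child + 1] > data[child])
				++child;
			if (data[child] <= data[parent])
				break;
			SwapInt(&data[child], &data[parent]);
			++swaps;
			parent = child;
			child = parent * 2 + 1;
		}
	}
	return swaps;
}

/* Builds a large root heap by adjusting each element upward in turn,
   as repeated insertion would. Returns the swap count. */
long BuildByAdjustUp(int* data, int n)
{
	assert(data || n == 0);
	long swaps = 0;
	for (int i = 1; i < n; ++i)
	{
		int child = i;
		while (child > 0)
		{
			int parent = (child - 1) / 2;
			if (data[child] <= data[parent])
				break;
			SwapInt(&data[child], &data[parent]);
			++swaps;
			child = parent;
		}
	}
	return swaps;
}
```

On an ascending input, where every new element has to climb to the root, the upward build performs several times more swaps than the downward build for an array of about a thousand elements, and the ratio keeps growing with N.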

Introduction to TopK problems

The TopK problem is the problem of finding the K largest or K smallest items in a set of data. Common solutions sort the data first, using heap sort, quick sort, merge sort, and so on. The problem often comes up in fields such as data analysis and machine learning. There is one scenario where a heap is especially well suited to TopK screening. Suppose there are 10 billion integers and we need the 50 largest. We can build a small root heap from the first 50 values and then traverse the rest: whenever a value is larger than the top of the heap, it replaces the top and is adjusted downward. Only 50 values need to be kept in memory at any time, and when the traversal ends the heap holds the 50 largest numbers. Let's look at a simple example.

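#include "Heap.h"
#include <time.h> // time(), used to seed rand() in CreateNDate

// Downward adjustment for a small root heap
void AdjustDownSH(HpDataType* data, int size, int parent)
{
	// Assertion check
	assert(data);

	int child = parent * 2 + 1;

	while (child < size)
	{
		// Pick the smaller of the left and right children
		if (child + 1 < size && data[child + 1] < data[child])
		{
			child++;
		}
		// If the parent is larger than that child, swap them
		if (data[child] < data[parent])
		{
			Swap(&data[child], &data[parent]);
			// Iterate
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}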

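void PrintTopK(const char* file, int k)
{
	assert(file && k > 0);

	// 1. Allocate space for a small root heap of k elements
	int* topk = (int*)malloc(sizeof(int) * k);
	if (topk == NULL)
	{
		perror("malloc fail");
		return;
	}

	FILE* fin = fopen(file, "r");
	if (fin == NULL)
	{
		perror("fopen error");
		free(topk);
		return;
	}

	// Read the first k values; the file must contain at least k integers
	for (int i = 0; i < k; ++i)
	{
		if (fscanf(fin, "%d", &topk[i]) != 1)
		{
			fprintf(stderr, "expected at least %d integers in %s\n", k, file);
			free(topk);
			fclose(fin);
			return;
		}
	}

	// Build the small root heap by downward adjustment
	for (int i = (k - 2) / 2; i >= 0; --i)
	{
		AdjustDownSH(topk, k, i);
	}

	// 2. Compare each remaining value with the top of the heap;
	//    a larger value replaces the top and is adjusted downward
	int val = 0;
	while (fscanf(fin, "%d", &val) == 1)
	{
		if (val > topk[0])
		{
			topk[0] = val;
			AdjustDownSH(topk, k, 0);
		}
	}

	// Print the k largest values (in heap order, not sorted)
	for (int i = 0; i < k; i++)
	{
		printf("%d ", topk[i]);
	}
	printf("\n");

	free(topk);
	fclose(fin);
}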

// Generate test data: n random integers, one per line
void CreateNDate()
{
	int n = 10000;
	srand((unsigned int)time(0));
	const char* file = "data.txt";
	FILE* fout = fopen(file, "w");
	if (fout == NULL)
	{
		perror("fopen error");
		return;
	}

	for (int i = 0; i < n; ++i)
	{
		int x = rand() % 10000;
		fprintf(fout, "%d\n", x);
	}

	fclose(fout);
}

int main()
{
	CreateNDate();
	PrintTopK("data.txt", 10);

	return 0;
}
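Because the file-based demo uses random data, its output is hard to check by eye. The same idea can be exercised on an in-memory array with a known answer. This is a sketch only: TopKLargest and SiftDownSmall are hypothetical names, and the function assumes 0 < k <= n.

```c
#include <assert.h>

/* Downward adjustment for a small root heap of size elements. */
static void SiftDownSmall(int* data, int size, int parent)
{
	assert(data);
	int child = parent * 2 + 1;
	while (child < size)
	{
		if (child + 1 < size && data[child + 1] < data[child])
			++child;                       /* the right child is smaller */
		if (data[child] >= data[parent])
			break;
		int tmp = data[child];
		data[child] = data[parent];
		data[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Copies the k largest values of src[0..n) into out[0..k).
   out is left arranged as a small root heap, so out[0] is the
   smallest of the k largest values. Requires 0 < k <= n. */
void TopKLargest(const int* src, int n, int* out, int k)
{
	assert(src && out && k > 0 && k <= n);

	/* Build a small root heap from the first k values. */
	for (int i = 0; i < k; ++i)
		out[i] = src[i];
	for (int i = (k - 2) / 2; i >= 0; --i)
		SiftDownSmall(out, k, i);

	/* A value larger than the top replaces it and sinks into place. */
	for (int i = k; i < n; ++i)
	{
		if (src[i] > out[0])
		{
			out[0] = src[i];
			SiftDownSmall(out, k, 0);
		}
	}
}
```

For the input {5, 1, 9, 3, 7, 8, 2, 6} with k = 3, the heap ends up holding 7, 8, and 9, with 7 at the top. The small root heap is the right choice here because its top is the weakest of the current candidates, which is the only value a newcomer has to beat.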

Origin blog.csdn.net/m0_71927622/article/details/131070174