[Data structure] Heap: heap implementation, heap sort, and the TOP-K problem

The concept and structure of heap

Given a set of keys $K = \{k_0, k_1, k_2, \ldots, k_{n-1}\}$, store all of its elements in a one-dimensional array in the level order of a complete binary tree. If $k_i \le k_{2i+1}$ and $k_i \le k_{2i+2}$ (or $k_i \ge k_{2i+1}$ and $k_i \ge k_{2i+2}$) for $i = 0, 1, 2, \ldots$, the array is called a small heap (or a large heap). The heap with the largest value at the root is called the maximum heap or large root heap, and the heap with the smallest value at the root is called the minimum heap or small root heap.
Properties of heap:
  • The value of a node in the heap is always no greater than its parent's value (large root heap) or always no less than its parent's value (small root heap);
  • The heap is always a complete binary tree.

Small root heap: the parent node is less than or equal to its child nodes

Large root heap: the parent node is greater than or equal to its child nodes
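
The array layout described above fixes where each node's parent and children sit. A minimal sketch, assuming 0-based indices; the helper names ParentIndex, LeftChildIndex, RightChildIndex and IsMinHeap are hypothetical and do not appear in the heap interface that follows:

```c
#include <assert.h>
#include <stdbool.h>

// Index arithmetic for a heap stored in a 0-based array.
int ParentIndex(int i)     { return (i - 1) / 2; }
int LeftChildIndex(int i)  { return 2 * i + 1; }
int RightChildIndex(int i) { return 2 * i + 2; }

// Returns true if a[0..n-1] satisfies the small-root-heap property:
// every element other than the root is >= its parent.
bool IsMinHeap(const int* a, int n)
{
	for (int i = 1; i < n; i++)
	{
		if (a[i] < a[ParentIndex(i)])
		{
			return false;
		}
	}
	return true;
}
```

For instance, the array {1, 3, 2, 7, 4} passes this check, while {1, 3, 2, 0, 4} does not, because the 0 at index 3 is smaller than its parent 3 at index 1.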

Heap implementation 

Implement the interface of the heap

#define _CRT_SECURE_NO_WARNINGS 1
#pragma once
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<assert.h>
#include<stdbool.h>

// Binary tree: heap
typedef int HPDataType;
typedef struct Heap
{
	HPDataType* a;
	int size;
	int capacity;
}HP;


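// Adjust the element at index child upward
void AdjustUp(HPDataType* a, int child);

// Adjust the element at index parent downward within a[0..n-1]
void AdjustDown(HPDataType* a, int n, int parent);

// Swap two values
void Swap(HPDataType* p1, HPDataType* p2);
// Print the heap
void HeapPrint(HP* php);
// Initialize an empty heap
void HeapInit(HP* php);
// Initialize a heap from an existing array of n elements
void HeapInitArray(HP* php, int* a, int n);
// Destroy the heap
void HeapDestroy(HP* php);
// Insert a value
void HeapPush(HP* php, HPDataType x);
// Delete the top element
void HeapPop(HP* php);
// Return the top element
HPDataType HeapTop(HP* php);
// Check whether the heap is empty
bool HeapEmpty(HP* php);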

Heap initialization

// Initialize an empty heap
void HeapInit(HP* php)
{
	assert(php);

	php->a = NULL;
	php->size = 0;
	php->capacity = 0;
}

Printing the heap

void HeapPrint(HP* php)
{
	assert(php);

	// The last element is at index size-1
	for (int i = 0; i < php->size; i++)
	{
		printf("%d ", php->a[i]);
	}
	printf("\n");
}

Heap destruction

// Destroy the heap
void HeapDestroy(HP* php)
{
	assert(php);

	free(php->a);
	php->a = NULL;
	php->size = php->capacity = 0;
}

Get the top root data

// Get the root data
HPDataType HeapTop(HP* php)
{
	assert(php);
	assert(php->size > 0);

	return php->a[0];
}

Swap

void Swap(HPDataType* p1, HPDataType* p2)
{
	HPDataType tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

Insertion into the heap (insert at the end)

First grow the array if it is full, then place the new value at the end, and finally adjust it upward.

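// Insert a value
void HeapPush(HP* php, HPDataType x)
{
	assert(php);

	// Grow the array when the element count equals the capacity
	if (php->size == php->capacity)
	{
		// Two cases: 1. capacity is 0, so start with 4 elements   2. the array is full, so double it
		int newCapacity = php->capacity == 0 ? 4 : php->capacity * 2;
		// realloc returns the address of the resized block; the second argument is the new size in bytes
		HPDataType* tmp = (HPDataType*)realloc(php->a, sizeof(HPDataType) * newCapacity);

		if (tmp == NULL)
		{
			perror("realloc fail");
			exit(-1);
		}

		// Address of the resized block
		php->a = tmp;
		// New capacity
		php->capacity = newCapacity;
	}

	// Put x at index php->size (the end) first, then adjust
	php->a[php->size] = x;
	// One more valid element
	php->size++;
	// Adjust upward; size-1 is the index of the newly inserted value
	AdjustUp(php->a, php->size - 1);

}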

Adjust upward (a small heap is used this time)

The premise of upward adjustment: the elements before the newly inserted one already form a heap. The time complexity is O(logN).

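// Adjust upward; child is the index of the newly inserted value
void AdjustUp(HPDataType* a, int child)
{
	// Index of the parent node
	int parent = (child - 1) / 2;
	// Loop until the value reaches the root
	while (child > 0)
	{
		// Swap if the child is smaller than the parent (small heap)
		if (a[child] < a[parent])
		{
			// Swap the values
			Swap(&a[child], &a[parent]);
			// Move both indices one level up
			child = parent;
			parent = (parent - 1) / 2;
		}
		else
		{
			break;
		}
	}
}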
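
Calling the upward adjustment once per element also builds a heap from an arbitrary array, by treating a[0..i-1] as a heap and "inserting" a[i] at its end. A minimal self-contained sketch under that reading; SiftUpMin and BuildMinHeapByInsertion are hypothetical names, and SiftUpMin repeats the logic of AdjustUp only so that the sketch compiles on its own:

```c
#include <assert.h>

// Same idea as AdjustUp (small heap): move a[child] up while it is
// smaller than its parent.
void SiftUpMin(int* a, int child)
{
	while (child > 0)
	{
		int parent = (child - 1) / 2;
		if (a[child] >= a[parent])
		{
			break; // heap property holds, stop
		}
		int tmp = a[child];
		a[child] = a[parent];
		a[parent] = tmp;
		child = parent;
	}
}

// Builds a small heap in place: a[0..i-1] is already a heap,
// so adjusting a[i] upward extends the heap by one element.
void BuildMinHeapByInsertion(int* a, int n)
{
	for (int i = 1; i < n; i++)
	{
		SiftUpMin(a, i);
	}
}
```

Starting from {5, 3, 8, 1}, the array becomes {3, 5, 8, 1} after the first step, is unchanged by the second, and ends as {1, 3, 8, 5} once the 1 has moved up two levels.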

Deletion of heap (delete root)

First check that there is an element to delete. Then swap the root with the last element, decrease the size by one so that the old root is no longer part of the heap, and adjust the new root downward.

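// Delete the top element
void HeapPop(HP* php)
{
	assert(php);
	// Make sure there is an element to delete
	assert(php->size > 0);

	// Swap the root to be deleted with the last element
	Swap(&php->a[0], &php->a[php->size - 1]);
	// Decrementing size removes the old root, which now sits at the end
	--php->size;

	// Adjust downward after the deletion
	AdjustDown(php->a, php->size, 0);

}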

Adjust downward (a small heap is used this time)

The premise of downward adjustment: the left and right subtrees are heaps. The time complexity is O(logN).

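// Adjust downward
void AdjustDown(HPDataType* a, int n, int parent)
{
	int child = parent * 2 + 1;
	// Index n is past the last valid element
	while (child < n)
	{
		// Pick the smaller child to float up (small heap)
		if (child + 1 < n && a[child + 1] < a[child])
		{
			++child;
		}
		// Swap if the smaller child is smaller than the parent
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			// The value that moved down becomes the new parent; keep adjusting downward
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			// The heap property holds here; without this break the loop never ends
			break;
		}
	}
}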
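
Repeated deletion from a small heap returns the values in ascending order, which is the idea that heap sort builds on. A minimal self-contained sketch working on a plain int array; SiftDownMin and PopMin are hypothetical names. Unlike HeapPop, PopMin overwrites the root with the last element instead of swapping, since the removed value is returned to the caller directly:

```c
#include <assert.h>

// Small-heap downward adjustment on a[0..n-1], starting at index parent.
void SiftDownMin(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		// Pick the smaller of the two children
		if (child + 1 < n && a[child + 1] < a[child])
		{
			++child;
		}
		if (a[child] >= a[parent])
		{
			break; // heap property holds, stop
		}
		int tmp = a[child];
		a[child] = a[parent];
		a[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

// Removes and returns the minimum of the heap a[0..*n-1]; *n shrinks by one.
int PopMin(int* a, int* n)
{
	assert(*n > 0);
	int top = a[0];
	a[0] = a[*n - 1];
	--*n;
	SiftDownMin(a, *n, 0);
	return top;
}
```

Calling PopMin four times on the heap {1, 3, 8, 5} returns 1, 3, 5 and 8 in that order and leaves the heap empty.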

Heap sort

1. Build a heap
Ascending order: build a large heap
Descending order: build a small heap
2. Sort using the heap-deletion idea
Building a heap with upward adjustment has time complexity O(N*logN).
Building a heap with downward adjustment has time complexity O(N).
The sort itself uses the heap-deletion idea with downward adjustment: swap the top of the heap with the last element, shrink the heap by one, and adjust the new top downward. With a large heap this moves the largest element to the end, then the second largest, and so on, which produces ascending order.
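
The two build costs quoted above can be checked with a short count. As a simplifying assumption, take a perfect binary tree with h levels (so N = 2^h - 1), number the levels from 1 at the root, and count the worst-case number of moves for the 2^(i-1) nodes on level i:

```latex
% Build by downward adjustment: a node on level i sinks at most h - i levels.
T_{\text{down}}(N) = \sum_{i=1}^{h-1} 2^{\,i-1}\,(h - i) = 2^{h} - 1 - h = N - \log_2(N+1) = O(N)

% Build by upward adjustment: a node on level i rises at most i - 1 levels.
T_{\text{up}}(N) = \sum_{i=2}^{h} 2^{\,i-1}\,(i - 1) = (h - 2)\,2^{h} + 2 = (N+1)\bigl(\log_2(N+1) - 2\bigr) + 2 = O(N \log N)
```

The difference comes from where the many nodes are: the bottom levels hold most of the nodes, and they move the least when adjusting downward but the most when adjusting upward.
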
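void HeapSort(int* a, int n)
{
	// Build the heap; whether it is a large or small heap depends on AdjustDown.
	// The AdjustDown above is the small-heap version, so this sorts in descending order.
	// Building by downward adjustment, starting from the parent of the last element
	// O(N)
	for (int i = (n-1-1)/2; i >= 0; i--)
	{
		AdjustDown(a, n, i);
	}

	// Move the current top to the end, then restore the heap on the remaining elements
	int end = n - 1;
	while (end > 0)
	{
		Swap(&a[0], &a[end]);
		AdjustDown(a, end, 0);
		--end;
	}
}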
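
Because the AdjustDown shown earlier is the small-heap version, HeapSort as written yields descending order. A minimal sketch of the ascending variant, assuming a large-heap downward adjustment; SiftDownMax and HeapSortAscending are hypothetical names and not part of the original code:

```c
#include <assert.h>

// Large-heap downward adjustment: the larger child floats up.
void SiftDownMax(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		// Pick the larger of the two children
		if (child + 1 < n && a[child + 1] > a[child])
		{
			++child;
		}
		if (a[child] <= a[parent])
		{
			break; // heap property holds, stop
		}
		int tmp = a[child];
		a[child] = a[parent];
		a[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

// Sorts a[0..n-1] in ascending order.
void HeapSortAscending(int* a, int n)
{
	// Build a large heap by downward adjustment, starting at the last parent
	for (int i = (n - 2) / 2; i >= 0; i--)
	{
		SiftDownMax(a, n, i);
	}
	// Move the current maximum to the end, then shrink the heap by one
	for (int end = n - 1; end > 0; end--)
	{
		int tmp = a[0];
		a[0] = a[end];
		a[end] = tmp;
		SiftDownMax(a, end, 0);
	}
}
```

The only change from the small-heap version is the direction of the two comparisons; the build loop and the swap-and-shrink loop are the same.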

TOP-K problem

TOP-K problem: find the K largest or smallest elements in a data set, where the amount of data is generally large.
For example: the top 10 in a major, the world's top 500 companies, a rich list, the 100 most active players in a game, etc.
The simplest and most direct approach to the Top-K problem is sorting. However, if the amount of data is very large, sorting is not advisable (the data may not fit into memory all at once). The best way is to use a heap. The basic idea is as follows:
1. Build a heap from the first K elements of the data set
To find the K largest elements, build a small heap
To find the K smallest elements, build a large heap
2. Compare each of the remaining N-K elements with the top of the heap in turn. If the element is a better candidate (larger than the top of a small heap, or smaller than the top of a large heap), replace the top with it and adjust downward
After the remaining N-K elements have all been compared with the top of the heap, the K elements left in the heap are the K largest or K smallest elements
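
The steps above can be sketched on an in-memory array before moving on to a file. This is a minimal sketch for the "K largest" case, assuming 0 < k <= n and a caller-provided buffer of k ints; TopKSiftDownMin and TopKLargest are hypothetical names:

```c
#include <assert.h>

// Small-heap downward adjustment on a[0..n-1], starting at index parent.
void TopKSiftDownMin(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] < a[child])
		{
			++child;
		}
		if (a[child] >= a[parent])
		{
			break; // heap property holds, stop
		}
		int tmp = a[child];
		a[child] = a[parent];
		a[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

// Leaves the k largest values of data[0..n-1] in heap[0..k-1].
// They are in heap order, so heap[0] is the smallest of the k.
void TopKLargest(const int* data, int n, int k, int* heap)
{
	assert(data && heap && k > 0 && k <= n);

	// Step 1: build a small heap from the first k elements
	for (int i = 0; i < k; i++)
	{
		heap[i] = data[i];
	}
	for (int i = (k - 2) / 2; i >= 0; i--)
	{
		TopKSiftDownMin(heap, k, i);
	}

	// Step 2: a larger value replaces the top and is adjusted downward
	for (int i = k; i < n; i++)
	{
		if (data[i] > heap[0])
		{
			heap[0] = data[i];
			TopKSiftDownMin(heap, k, 0);
		}
	}
}
```

The heap never holds more than k values, so the extra memory is O(K) and the running time is O(N*logK), which is why this approach still works when the full data set does not fit in memory.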

First create a data.txt text file containing 10,000,000 numbers.

#include <time.h> // for time(), used to seed rand()

void CreateNDate()
{
	// Generate the data
	int n = 10000000;
	srand((unsigned int)time(0));
	const char* file = "data.txt";
	FILE* fin = fopen(file, "w");
	if (fin == NULL)
	{
		perror("fopen error");
		return;
	}

	for (int i = 0; i < n; ++i)
	{
		int x = (rand() + i) % 10000000;
		fprintf(fin, "%d\n", x);
	}

	fclose(fin);
}

Build a small heap from the first k items (the top of the heap is the smallest of the k). Then compare each of the remaining n-k elements with the top of the heap in turn. If an element is larger than the top, it replaces the top and is adjusted downward. After the whole file has been read, the k elements in the heap are printed.

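void PrintTopK(const char* filename, int k)
{
	// 1. Build a heap from the first k elements in the file
	FILE* fout = fopen(filename, "r");
	if (fout == NULL)
	{
		perror("fopen fail");
		return;
	}

	// Allocate space for the heap
	int* minheap = (int*)malloc(sizeof(int) * k);
	if (minheap == NULL)
	{
		perror("malloc fail");
		fclose(fout);
		return;
	}

	for (int i = 0; i < k; i++)
	{
		// If the file holds fewer than k numbers, work with what was read
		if (fscanf(fout, "%d", &minheap[i]) != 1)
		{
			k = i;
			break;
		}
	}

	// Build a small heap from the first k numbers
	for (int i = (k - 2) / 2; i >= 0; --i)
	{
		AdjustDown(minheap, k, i);
	}


	// 2. Compare each remaining element with the top of the heap; replace the top if the element is larger
	int x = 0;
	while (fscanf(fout, "%d", &x) == 1)
	{
		if (x > minheap[0])
		{
			// Replace the top with x and restore the heap
			minheap[0] = x;
			AdjustDown(minheap, k, 0);
		}
	}


	// The k largest values, in heap order
	for (int i = 0; i < k; i++)
	{
		printf("%d ", minheap[i]);
	}
	printf("\n");

	free(minheap);
	fclose(fout);
}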

With k equal to 5, the program prints the 5 largest values in the file (in heap order, not sorted).


Origin blog.csdn.net/m0_64476561/article/details/134385491