【Data structure】What is a heap

Table of contents

1. The sequential structure of the binary tree

2. The concept and structure of the heap

3. Implementation of the heap

3.1 Up-adjustment algorithm and down-adjustment algorithm

3.2 Heap creation

3.3 Time complexity of building a heap

3.4 Heap insertion

3.5 Heap deletion

3.6 Implementation of the heap code

4. Application of the heap

4.1 Heap sort

4.2 TOP-K problem


A heap is a data structure: a special complete binary tree that is stored in a sequential structure (an array). Before studying the heap, we first look at the sequential storage of binary trees, and then move on to the focus of this article: the heap.

1. The sequential structure of the binary tree

Ordinary binary trees are not well suited to array storage, because a lot of space may be wasted. A complete binary tree is a much better fit for sequential storage. In practice, a heap (a kind of binary tree) is usually stored in an array. Note that the heap discussed here and the heap in an operating system's virtual process address space are two different things: one is a data structure, the other is a region of memory managed by the operating system.

Sequential storage of a complete binary tree:

Sequential storage of incomplete binary trees:

As can be seen, storing an incomplete binary tree sequentially may waste a lot of space.
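With the root stored at index 0, a complete binary tree has no gaps in the array, so the positions of a node's parent and children can be computed from its index alone. A minimal sketch of that index arithmetic follows; the helper names are assumptions made for illustration and do not appear in the article.

```c
#include <assert.h>

/* Index arithmetic for a complete binary tree stored in an array,
   with the root at index 0. Helper names are illustrative only. */
int left_child(int i)
{
	return 2 * i + 1;
}

int right_child(int i)
{
	return 2 * i + 2;
}

/* Parent of node i; the root (i == 0) has no parent. */
int parent_of(int i)
{
	assert(i > 0);
	return (i - 1) / 2;
}
```

A child index is only valid when it is smaller than the number of elements in the array, which is why the adjustment code later in the article always checks the index against n.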

2. The concept and structure of the heap

Given a set of keys $K = \{k_0, k_1, k_2, \dots, k_{n-1}\}$, store all of its elements in a one-dimensional array in the order of a complete binary tree. If $k_i \le k_{2i+1}$ and $k_i \le k_{2i+2}$ (respectively $k_i \ge k_{2i+1}$ and $k_i \ge k_{2i+2}$) for $i = 0, 1, 2, \dots$, wherever those child indices exist, the array is called a small heap (respectively a large heap). A heap whose root node is the largest is called a max heap or large root heap, and a heap whose root node is the smallest is called a min heap or small root heap.

Put simply:

Heaps are divided into large root heaps and small root heaps, called large heaps and small heaps for short:

Large heap: every parent node in the heap is greater than or equal to its child nodes, so the root node is the largest.

Small heap: every parent node in the heap is less than or equal to its child nodes, so the root node is the smallest.

Properties of the heap:

  • The value of a node in the heap is never greater than the value of its parent node (large heap), or never less than it (small heap);
  • The heap is always a complete binary tree.

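Example:

1. Which of the following key sequences is a heap? ( )
A 100,60,70,50,32,65
B 60,70,65,50,32,100
C 65,100,70,32,50,60
D 70,65,100,32,50,60
E 32,50,100,70,65,60
F 50,100,70,65,60,32

Answer: A. In A every parent is at least as large as its children (100 ≥ 60, 70; 60 ≥ 50, 32; 70 ≥ 65), so it is a large heap; each of the other sequences violates both the large-heap and the small-heap ordering somewhere.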
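The answer can also be checked mechanically. The following is a minimal sketch, not part of the original article: the function names are hypothetical, and the check simply compares every element with its parent at index (child - 1) / 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if a[0..n-1] is a large heap: no child is greater than its parent. */
bool is_large_heap(const int* a, int n)
{
	assert(a != NULL);
	for (int child = 1; child < n; child++)
	{
		if (a[child] > a[(child - 1) / 2])
		{
			return false;
		}
	}
	return true;
}

/* True if a[0..n-1] is a small heap: no child is less than its parent. */
bool is_small_heap(const int* a, int n)
{
	assert(a != NULL);
	for (int child = 1; child < n; child++)
	{
		if (a[child] < a[(child - 1) / 2])
		{
			return false;
		}
	}
	return true;
}
```

Applied to the six options, only option A passes either check (it passes is_large_heap).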

3. Implementation of the heap

3.1 Up-adjustment algorithm and down-adjustment algorithm

Two adjustment algorithms are used to build a heap: the upward adjustment algorithm and the downward adjustment algorithm. Suppose we are given an array, which we regard logically as a complete binary tree.

Upward adjustment algorithm:

Prerequisite: the elements before the one being adjusted must already form a heap.

For example, insert 5 at the end of the small heap below;

the inserted data is then adjusted upward until it reaches a suitable position.

Code:

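// Element type stored in the heap (int is assumed here; the type is not defined in this snippet)
typedef int HeapDataType;

// Swap two elements
void Swap(HeapDataType* p1, HeapDataType* p2)
{
	HeapDataType tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}
// Upward adjustment for a small heap; arr[0..child-1] must already be a heap
void AdjustUp(HeapDataType* arr, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		if (arr[child] < arr[parent])
		{
			// The child is smaller than its parent: swap and continue from the parent
			Swap(&arr[child], &arr[parent]);

			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break;
		}
	}
}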

The large-heap and small-heap versions of upward adjustment differ only in the comparison: for a large heap, the child moves up while it is greater than its parent.
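To make that difference concrete, here is a hedged sketch of the large-heap variant. The names AdjustUpLarge and SwapItems are hypothetical, and int elements are assumed; only the comparison operator changes relative to the small-heap code above.

```c
#include <assert.h>
#include <stddef.h>

typedef int HeapDataType; /* assumption: int elements */

static void SwapItems(HeapDataType* p1, HeapDataType* p2)
{
	HeapDataType tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* Large-heap variant of upward adjustment (hypothetical name).
   arr[0..child-1] must already be a large heap. */
void AdjustUpLarge(HeapDataType* arr, int child)
{
	assert(arr != NULL && child >= 0);
	int parent = (child - 1) / 2;
	/* Only the comparison differs from the small-heap version: > instead of < */
	while (child > 0 && arr[child] > arr[parent])
	{
		SwapItems(&arr[child], &arr[parent]);
		child = parent;
		parent = (child - 1) / 2;
	}
}
```

For example, take the made-up array {70, 30, 56, 25, 10, 40, 90}, whose first six elements form a large heap. Calling AdjustUpLarge(arr, 6) moves 90 past 56 and then past 70, giving {90, 30, 70, 25, 10, 40, 56}.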

Downward adjustment algorithm:

Starting from the root node, the downward adjustment algorithm can turn the array below into a small heap.

The downward adjustment algorithm has a premise: the left and right subtrees of the node being adjusted must already be heaps.

int array[] = {27,15,19,18,28,34,65,49,25,37};

The left and right subtrees of the root 27 both satisfy the small-heap property; only the root itself does not. So it is enough to move the root node down to a suitable position to obtain a heap.

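Code:

// Downward adjustment (small heap)
// In a complete binary tree, a node without a left child has no right child either; n is the number of elements in the array
void AdjustDown(HeapDataType* arr, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		// Pick the smaller of the left and right children
		if (child + 1 < n && arr[child] > arr[child + 1])
		{
			child++;
		}
		if (arr[child] < arr[parent])
		{
			// Swap, then continue from the child's position
			Swap(&arr[child], &arr[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}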
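The premise of downward adjustment can be checked on the example array {27,15,19,18,28,34,65,49,25,37}. The sketch below is an illustration under assumptions: IsSmallHeapFrom and SiftDownSmall are hypothetical names, and SiftDownSmall restates the small-heap downward adjustment so that the block compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the subtree rooted at index root is a small heap (hypothetical helper). */
bool IsSmallHeapFrom(const int* arr, int n, int root)
{
	assert(arr != NULL);
	if (root >= n)
	{
		return true; /* an empty subtree is trivially a heap */
	}
	int left = 2 * root + 1;
	int right = 2 * root + 2;
	if (left < n && arr[left] < arr[root])
	{
		return false;
	}
	if (right < n && arr[right] < arr[root])
	{
		return false;
	}
	return IsSmallHeapFrom(arr, n, left) && IsSmallHeapFrom(arr, n, right);
}

/* Small-heap downward adjustment, restated for a self-contained example. */
void SiftDownSmall(int* arr, int n, int parent)
{
	assert(arr != NULL);
	int child = parent * 2 + 1;
	while (child < n)
	{
		/* pick the smaller child */
		if (child + 1 < n && arr[child + 1] < arr[child])
		{
			child++;
		}
		if (arr[child] >= arr[parent])
		{
			break; /* the parent is already in place */
		}
		int tmp = arr[child];
		arr[child] = arr[parent];
		arr[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}
```

On the example array, IsSmallHeapFrom is true for the subtrees rooted at indices 1 and 2 but false for index 0. After SiftDownSmall(array, 10, 0), the value 27 has moved down past 15, 18 and 25, and the array becomes {15,18,19,25,28,34,65,49,27,37}.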

3.2 Heap creation

Here the heap is created with the downward adjustment algorithm, because building it with upward adjustment has a higher time complexity, as discussed later.

Below we are given an array that can logically be regarded as a complete binary tree but is not yet a heap, and we want to turn it into one. The left and right subtrees of the root node are not heaps, so how do we adjust it? We start from the subtree rooted at the last non-leaf node and adjust each subtree in turn, working backward until the tree rooted at the root node has been adjusted. The result is a heap.

int a[] = {1,5,3,8,7,6};

Proceed as follows: 
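A minimal sketch of this procedure in code follows. Note that {1,5,3,8,7,6} already satisfies the small-heap ordering, so the sketch builds a large heap to make the adjustments visible; that choice, and the names BuildLargeHeap and AdjustDownLarge, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Downward adjustment for a large heap: the larger child moves up. */
static void AdjustDownLarge(int* arr, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		/* pick the larger child */
		if (child + 1 < n && arr[child + 1] > arr[child])
		{
			child++;
		}
		if (arr[child] <= arr[parent])
		{
			break;
		}
		int tmp = arr[child];
		arr[child] = arr[parent];
		arr[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Build a large heap in place (hypothetical name). */
void BuildLargeHeap(int* arr, int n)
{
	assert(arr != NULL);
	/* n - 1 is the last index; its parent (n - 1 - 1) / 2 is the last non-leaf node */
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDownLarge(arr, n, i);
	}
}
```

For {1,5,3,8,7,6} the loop visits indices 2, 1 and 0. The array goes through {1,5,6,8,7,3}, then {1,8,6,5,7,3}, and ends as {8,7,6,5,1,3}.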

3.3 Time complexity of building a heap

Because a heap is a complete binary tree, and a full binary tree is also a complete binary tree, we use a full binary tree in the proof for simplicity (time complexity is an asymptotic estimate anyway, so a few extra nodes do not affect the final result):

Building the heap with the downward adjustment algorithm:

Therefore, the time complexity of building a heap with the downward adjustment algorithm is O(N).
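The count behind this result can be sketched as follows. This is a reconstruction of the standard argument under the full-binary-tree assumption, not necessarily the author's exact steps; the symbols h (height of the tree), k (level number, with the root at level 1) and T(N) (worst-case number of moves) are notation introduced here. Level k has 2^(k-1) nodes, and each of them moves down at most h - k levels:

```latex
\begin{aligned}
N    &= 2^{h} - 1 \\
T(N) &= \sum_{k=1}^{h-1} 2^{k-1}\,(h-k) \\
     &= 2^{h} - h - 1 \\
     &= N - \log_2(N+1) \\
     &= O(N)
\end{aligned}
```

The sum stays linear because the crowded bottom levels move only a few steps, while the long moves happen only near the root, where there are few nodes.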

Building the heap with the upward adjustment algorithm:

The upward adjustment algorithm starts at the second node and adjusts it upward. Once that is done, the third node is adjusted upward, and so on in order until the last node has been adjusted.

Therefore, the time complexity of building a heap with the upward adjustment algorithm is O(N*logN).
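A sketch of the corresponding count, again a reconstruction under the full-binary-tree assumption rather than the author's exact steps. Here h is the height of the tree, k is the level number with the root at level 1, and T(N) is the worst-case number of moves; these symbols are introduced for illustration. Level k has 2^(k-1) nodes, and each of them moves up at most k - 1 levels:

```latex
\begin{aligned}
N    &= 2^{h} - 1 \\
T(N) &= \sum_{k=2}^{h} 2^{k-1}\,(k-1) \\
     &= (h-2)\,2^{h} + 2 \\
     &= (N+1)\bigl(\log_2(N+1) - 2\bigr) + 2 \\
     &= O(N \log N)
\end{aligned}
```

Here the crowded bottom levels are exactly the ones that may travel the full height of the tree, which is where the extra logarithmic factor comes from.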

So it is recommended to use the downward adjustment algorithm here.

3.4 Heap insertion

To insert an element, for example 10, first append it to the end of the array, then run the upward adjustment algorithm until the heap property is satisfied again.

3.5 Heap deletion

Deleting from a heap means deleting the data at the top of the heap: swap the data at the top with the last data in the array, remove the last element of the array, and then run the downward adjustment algorithm from the root.
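The deletion steps can be sketched on a plain int array as follows. This is a minimal illustration under assumptions: PopSmallHeap is a hypothetical helper, the heap is a small heap, and the caller tracks the element count through the size pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Remove and return the top of a small heap stored in arr[0..*size-1]
   (hypothetical helper). */
int PopSmallHeap(int* arr, int* size)
{
	assert(arr != NULL && size != NULL && *size > 0);
	int n = *size - 1;

	/* 1. swap the top with the last element */
	int top = arr[0];
	arr[0] = arr[n];
	arr[n] = top;

	/* 2. drop the last element from the heap */
	*size = n;

	/* 3. adjust downward from the root */
	int parent = 0;
	int child = 1;
	while (child < n)
	{
		if (child + 1 < n && arr[child + 1] < arr[child])
		{
			child++;
		}
		if (arr[child] >= arr[parent])
		{
			break;
		}
		int tmp = arr[child];
		arr[child] = arr[parent];
		arr[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
	return top;
}
```

Starting from the small heap {15,18,19,25,28,34,65,49,27,37}, one call returns 15 and leaves the nine-element heap {18,25,19,27,28,34,65,49,37}. Repeated calls return the remaining values in ascending order, which is the idea heap sort builds on.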

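3.6 Implementation of the heap code

#include<assert.h>
#include<stdio.h>
#include<stdlib.h>

typedef int HPDataType;
typedef struct Heap
{
	HPDataType* a;	// array that stores the elements
	int size;	// number of elements currently stored
	int capacity;	// allocated capacity of the array
}Heap;

// Build a heap from an array of n elements
void HeapCreate(Heap* hp, HPDataType* a, int n);
// Destroy the heap
void HeapDestory(Heap* hp);
// Insert into the heap
void HeapPush(Heap* hp, HPDataType x);
// Delete the top of the heap
void HeapPop(Heap* hp);
// Get the data at the top of the heap
HPDataType HeapTop(Heap* hp);
// Number of elements in the heap
int HeapSize(Heap* hp);
// Whether the heap is empty
int HeapEmpty(Heap* hp);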

Implementation:

// Swap two elements
void Swap(HPDataType* p1, HPDataType* p2)
{
	HPDataType t = *p1;
	*p1 = *p2;
	*p2 = t;
}
// Downward adjustment (small heap)
void AdjustDown(HPDataType* arr, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		// Pick the smaller child
		if (child + 1 < n && arr[child] > arr[child + 1])
		{
			child++;
		}

		if (arr[child] < arr[parent])
		{
			Swap(&arr[child], &arr[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			return;
		}
	}
}
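// Build a heap from an array of n elements
void HeapCreate(Heap* hp, HPDataType* a, int n)
{
	assert(hp);
	hp->a = (HPDataType*)malloc(sizeof(HPDataType) * n);
	if (hp->a == NULL)
	{
		perror("malloc fail");
		hp->size = hp->capacity = 0;
		return;
	}
	hp->capacity = n;
	hp->size = n;
	for (int i = 0; i < n; i++)
	{
		hp->a[i] = a[i];
	}

	// Adjust downward from the last non-leaf node back to the root
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(hp->a, n, i);
	}
}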
// Destroy the heap
void HeapDestory(Heap* hp)
{
	assert(hp);
	free(hp->a);
	hp->a = NULL;
	hp->size = hp->capacity = 0;
}

// Upward adjustment (small heap)
void AdjustUp(HPDataType* arr, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		if (arr[child] < arr[parent])
		{
			Swap(&arr[child], &arr[parent]);

			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			return;
		}
	}
}
// Insert into the heap
void HeapPush(Heap* hp, HPDataType x)
{
	assert(hp);
	if (hp->size == hp->capacity)
	{
		// Grow the array; test the capacity so that a heap created with n == 0 also grows
		int newcapacity = hp->capacity == 0 ? 4 : hp->capacity * 2;
		HPDataType* ptr = (HPDataType*)realloc(hp->a, sizeof(HPDataType) * newcapacity);
		if (ptr == NULL)
		{
			perror("realloc fail");
			return;
		}
		hp->a = ptr;
		hp->capacity = newcapacity;
	}
	hp->a[hp->size] = x;
	hp->size++;
	// Adjust the new element upward
	AdjustUp(hp->a, hp->size - 1);
}

// Delete the top of the heap
void HeapPop(Heap* hp)
{
	assert(hp);
	assert(!HeapEmpty(hp));
	Swap(&hp->a[0], &hp->a[hp->size - 1]);
	hp->size--;
	// Adjust downward from the root
	AdjustDown(hp->a, hp->size, 0);

}
// Get the data at the top of the heap
HPDataType HeapTop(Heap* hp)
{
	assert(hp);
	assert(!HeapEmpty(hp));
	return hp->a[0];
}
// Number of elements in the heap
int HeapSize(Heap* hp)
{
	assert(hp);
	return hp->size;
}
// Whether the heap is empty
int HeapEmpty(Heap* hp)
{
	assert(hp);
	return hp->size == 0;
}

4. Application of the heap

4.1 Heap sort

Heap sort uses the idea of the heap to sort. It has two steps:
1. Build a heap

  • Ascending order: build a large heap
  • Descending order: build a small heap

2. Sort using the idea of heap deletion.

Downward adjustment is used in both heap building and heap deletion, so mastering downward adjustment is enough to implement heap sort.

Example: ascending order:

#include<stdio.h>
// Swap two ints
void swap(int* p1, int* p2)
{
	int t = *p1;
	*p1 = *p2;
	*p2 = t;
}

// Downward adjustment (large heap)
void Adjustdown(int* arr, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && arr[child] < arr[child + 1])
		{
			child++;
		}
		if (arr[child] > arr[parent])
		{
			swap(&arr[child], &arr[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

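void HeapSort(int* a, int n)
{
	// Build the heap: ascending order needs a large heap
	// Adjust downward starting from the last non-leaf node
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		Adjustdown(a, n, i);
	}

	// Sort: move the current maximum to the end, then restore the heap on the rest
	// (end > 0 also keeps the loop from running when n == 0)
	int end = n - 1;
	while (end > 0)
	{
		swap(&a[end], &a[0]);
		Adjustdown(a, end, 0);
		end--;
	}
}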

int main()
{
	
	int arr[] = { 10,50,40,20,30,60,70 };
	int sz = sizeof(arr) / sizeof(int);
	HeapSort(arr, sz);
	for (int i = 0; i < sz; i++)
	{
		printf("%d ", arr[i]);
	}
	
	return 0;
}
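The descending case works the same way with a small heap. A hedged sketch follows; HeapSortDescending and AdjustDownSmall are hypothetical names, and the only changes from the ascending example are the two comparisons in the downward adjustment.

```c
#include <assert.h>
#include <stddef.h>

/* Downward adjustment for a small heap: the smaller child moves up. */
static void AdjustDownSmall(int* arr, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && arr[child + 1] < arr[child])
		{
			child++;
		}
		if (arr[child] >= arr[parent])
		{
			break;
		}
		int tmp = arr[child];
		arr[child] = arr[parent];
		arr[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Sort a[0..n-1] in descending order using a small heap (hypothetical name). */
void HeapSortDescending(int* a, int n)
{
	assert(a != NULL || n == 0);
	/* build a small heap */
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDownSmall(a, n, i);
	}
	/* repeatedly move the current minimum to the end of the unsorted part */
	for (int end = n - 1; end > 0; end--)
	{
		int tmp = a[0];
		a[0] = a[end];
		a[end] = tmp;
		AdjustDownSmall(a, end, 0);
	}
}
```

Each pass puts the smallest remaining value at the end of the unsorted part, so the array {10,50,40,20,30,60,70} ends up as 70 60 50 40 30 20 10.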

4.2 TOP-K problem

TOP-K problem: find the K largest or K smallest elements in a data set, where the amount of data is usually large. For example: the top 10 professional players, the world's top 500, the rich list, the 100 most active players in a game, and so on.

For the Top-K problem, the simplest and most direct approach is sorting. However, if the amount of data is very large, sorting is not advisable (the data may not all fit into memory at once). A better way is to use a heap. The basic idea is as follows:

1. Use the first K elements of the data set to build a heap

  • For the K largest elements, build a small heap
  • For the K smallest elements, build a large heap

2. Compare each of the remaining N-K elements with the top of the heap in turn. If the element should displace the top (it is larger than the top of a small heap, or smaller than the top of a large heap), replace the top with it and then adjust downward.

After all of the remaining N-K elements have been compared with the top of the heap, the K elements left in the heap are the K largest or K smallest elements sought.
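The two steps above can be sketched for data that is already in memory. This is an illustration under assumptions, separate from the file-based sample: TopKLargest and AdjustDownSmall are hypothetical names, the function finds the K largest values, and it requires 0 < k <= n.

```c
#include <assert.h>
#include <stddef.h>

/* Downward adjustment for a small heap. */
static void AdjustDownSmall(int* arr, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && arr[child + 1] < arr[child])
		{
			child++;
		}
		if (arr[child] >= arr[parent])
		{
			break;
		}
		int tmp = arr[child];
		arr[child] = arr[parent];
		arr[parent] = tmp;
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Write the k largest values of data[0..n-1] into out[0..k-1] (hypothetical name).
   On return, out is a small heap, so out[0] is the smallest of the k largest. */
void TopKLargest(const int* data, int n, int k, int* out)
{
	assert(data != NULL && out != NULL);
	assert(k > 0 && k <= n);

	/* step 1: build a small heap from the first k elements */
	for (int i = 0; i < k; i++)
	{
		out[i] = data[i];
	}
	for (int i = (k - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDownSmall(out, k, i);
	}

	/* step 2: a larger element replaces the top, then the heap is restored */
	for (int i = k; i < n; i++)
	{
		if (data[i] > out[0])
		{
			out[0] = data[i];
			AdjustDownSmall(out, k, 0);
		}
	}
}
```

The heap never holds more than K elements, so memory use depends on K rather than on N. Each of the remaining N-K elements costs at most one downward adjustment of a K-element heap.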

Sample code: first generate the data file, then open it and change a few numbers to larger values to simulate the maximum values in the data. After that, comment out the call to CreateNDate() before running the TOP-K search, because writing the file again would overwrite the modified data.

#include<stdio.h>
#include<stdlib.h>
#include<time.h>
void swap(int* p1, int* p2)
{
	int t = *p1;
	*p1 = *p2;
	*p2 = t;
}

// Downward adjustment (small heap)
void Adjustdown(int* arr, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && arr[child] > arr[child + 1])
		{
			child++;
		}
		if (arr[child] < arr[parent])
		{
			swap(&arr[child], &arr[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

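void CreateNDate()
{
	// Generate the test data
	int n = 10000;
	srand((unsigned int)time(NULL));
	FILE* file = fopen("data.txt", "w");
	if (file == NULL)
	{
		perror("fopen fail");
		return;
	}
	for (int i = 0; i < n; i++)
	{
		fprintf(file, "%d\n", rand() % 10000); // random number below 10000
	}

	fclose(file);
}

void PrintTopK(int k) // print the k largest numbers
{
	CreateNDate(); // generates the data; comment this call out after editing data.txt by hand
	FILE* file = fopen("data.txt", "r");
	if (file == NULL)
	{
		perror("fopen fail");
		return;
	}

	// Build a small heap from the first k numbers
	int* arr = (int*)malloc(sizeof(int) * k);
	if (arr == NULL)
	{
		perror("malloc fail");
		fclose(file);
		return;
	}
	for (int i = 0; i < k; i++)
	{
		fscanf(file, "%d", &arr[i]);
	}
	for (int i = (k - 1 - 1) / 2; i >= 0; i--)
	{
		Adjustdown(arr, k, i);
	}

	// A number larger than the top replaces it, then the heap is restored
	int a = 0;
	while (fscanf(file, "%d", &a) == 1)
	{
		if (arr[0] < a)
		{
			arr[0] = a;
			Adjustdown(arr, k, 0);
		}
	}
	for (int i = 0; i < k; i++)
	{
		printf("%d ", arr[i]);
	}
	fclose(file);
	free(arr);
}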


int main()
{
	PrintTopK(5);
	return 0;
}


Origin blog.csdn.net/qq_72916130/article/details/131980184