Data Structures: Implementing a Heap from Scratch (Illustrated) and Applying It to TopK

Important concepts

Before discussing heaps, let's review two concepts related to binary trees.

  1. Full binary tree: a binary tree in which every level holds the maximum possible number of nodes.
    (Figure: a full binary tree)

  2. Complete binary tree: a very efficient data structure derived from the full binary tree. A binary tree of depth k with n nodes is complete if and only if each of its nodes corresponds to the node with the same number, from 1 to n, in a full binary tree of depth k. In other words, every level except possibly the last is full, and the nodes on the last level are packed to the left. A full binary tree is therefore a special case of a complete binary tree.
    (Figure: a full binary tree and a complete binary tree side by side)
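To make the definition concrete, here is a small C sketch (not from the original article) that checks completeness for a tree laid out by position, where node i's children sit at positions 2i+1 and 2i+2. The EMPTY marker for a missing position and the function name is_complete are hypothetical. Because a complete tree uses exactly positions 1 to n of the full tree, it is complete precisely when its nodes fill a prefix of the positions, with no node after the first gap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker meaning "no node at this position". */
#define EMPTY (-1)

/* Returns true if the nodes in pos[0..len-1] form a complete binary tree:
   every occupied position must come before the first empty one. */
bool is_complete(const int* pos, size_t len)
{
    assert(pos != NULL || len == 0);
    bool seen_gap = false;
    for (size_t i = 0; i < len; i++)
    {
        if (pos[i] == EMPTY)
            seen_gap = true;
        else if (seen_gap)
            return false; /* a node after a gap breaks completeness */
    }
    return true;
}
```

For example, {1, 2, 3, 4, 5, EMPTY, EMPTY} is complete, while {1, 2, 3, EMPTY, 5} is not, because node 5 follows a missing position.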

How a tree is stored

Sequential storage

Every data structure has to be laid out in memory somehow. A binary tree can be stored in two ways.

The first is sequential storage: the nodes are kept in an array (a sequential list), level by level from top to bottom and, within each level, from left to right, as shown below:

(Figure: a binary tree stored sequentially in an array)
The drawback is obvious: for a binary tree that is not complete, this layout leaves many array slots empty and can waste a great deal of memory.
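A quick calculation shows how fast that waste grows. The sketch below (the function name slots_for_right_chain is hypothetical) follows the right-child index 2*i + 2 down a right-skewed chain, a tree that has only one node per level, and returns the array length needed to hold its deepest node.

```c
#include <assert.h>

/* Array length needed to store a right-skewed chain of `depth` nodes in
   sequential storage, where the right child of index i lives at 2*i + 2. */
int slots_for_right_chain(int depth)
{
    assert(depth >= 1);
    int idx = 0;                     /* the root sits at index 0 */
    for (int level = 1; level < depth; level++)
        idx = 2 * idx + 2;           /* step down to the right child */
    return idx + 1;                  /* slots 0..idx must all exist */
}
```

A chain of just 4 nodes already needs 15 slots (11 of them empty), and 10 nodes need 1023, because the required length is 2^depth - 1.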

Linked storage

Linked storage has its own advantages over sequential storage. It works as follows:

(Figure: linked storage of a binary tree)

Define a structure with three members: the node's data, a pointer to its left child and a pointer to its right child. Together, these three members are enough to represent all of the information in a tree.
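A minimal sketch of such a node in C might look like the following. The names BTDataType, BTNode, BuyNode, TreeSize and TreeDestroy are hypothetical, chosen for illustration; the article itself does not define them.

```c
#include <assert.h>
#include <stdlib.h>

typedef int BTDataType; /* hypothetical element type */

/* One node of a linked binary tree: the data plus two child pointers. */
typedef struct BinaryTreeNode
{
    BTDataType data;
    struct BinaryTreeNode* left;
    struct BinaryTreeNode* right;
} BTNode;

/* Allocate a leaf node holding x (returns NULL if malloc fails). */
BTNode* BuyNode(BTDataType x)
{
    BTNode* node = (BTNode*)malloc(sizeof(BTNode));
    if (node == NULL)
        return NULL;
    node->data = x;
    node->left = NULL;
    node->right = NULL;
    return node;
}

/* Count the nodes reachable from root. */
int TreeSize(const BTNode* root)
{
    if (root == NULL)
        return 0;
    return 1 + TreeSize(root->left) + TreeSize(root->right);
}

/* Free every node of the tree, children before their parent. */
void TreeDestroy(BTNode* root)
{
    if (root == NULL)
        return;
    TreeDestroy(root->left);
    TreeDestroy(root->right);
    free(root);
}
```

Unlike sequential storage, a missing child here costs only a NULL pointer, so a sparse tree wastes no array slots.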

The rest of this article focuses on how the sequential (array-based) structure of a binary tree is implemented.

The heap concept

First, be clear that the heap discussed here is not the same thing as the heap that malloc allocates from. The former is a data structure, while the latter is a region of a program's memory used for dynamic allocation.

A heap is a binary tree that satisfies the following properties:

  1. The value of every node is always not greater than, or always not less than, the value of its parent node.
  2. The heap is always a complete binary tree.

Why "not greater than or not less than"? Because there are two kinds of heap: the large heap (max-heap) and the small heap (min-heap).

Before introducing the two kinds, let's look at how a heap is stored sequentially, using the following figure as an example:

(Figure: a small heap drawn as a tree, with its array layout)

The figure shows a complete binary tree in which every parent node is no greater than its children, so it is a small heap. In memory it really is stored in an array, with the nodes placed from top to bottom and, within each level, from left to right.

A large heap is the same, except that every parent node is no smaller than its children.
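Because the tree is just an array, the heap property can be checked with index arithmetic alone. The sketch below (is_min_heap is a hypothetical helper, not part of the article's code) compares every element with its parent at index (child - 1) / 2, a formula explained in the next section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if a[0..n-1], read as a complete binary tree stored level by
   level, is a small heap: no child is smaller than its parent. */
bool is_min_heap(const int* a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t child = 1; child < n; child++)
    {
        size_t parent = (child - 1) / 2;
        if (a[child] < a[parent])
            return false;
    }
    return true;
}
```

To check a large heap instead, flip the comparison to a[child] > a[parent].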

Implementing the heap

Like any data structure, the heap exists to be used, so let's analyze how to implement it.

The storage layout above shows that any array can be viewed as a complete binary tree. Because of the heap's ordering rule, the first problem is how to arrange the numbers in an array so that they satisfy the heap property.

The upward adjustment algorithm

This algorithm is mainly used when inserting an element into a heap. Because of the large/small heap ordering rule, a newly inserted element may violate the heap property, and upward adjustment is used to fix that.

More precisely, when an element is appended to a heap, this algorithm moves it into place so that the resulting tree is still a heap. The precondition is that the tree already satisfied the heap property before the insertion.

The algorithm works like this:

(Figure: inserting 12 into a small heap with upward adjustment)

We start with an existing small heap and a new element, 12, to insert. It is first placed as a child of 15. Because this is a small heap and 12 is smaller than 15, the two swap places. Then 12 is compared with its new parent, 10. Since 12 is greater than 10, the small-heap rule is satisfied, the adjustment stops, and the new heap is the one shown on the right. The insertion is complete.

Note that an inserted element can rise at most as far as the root. It keeps swapping with its parent as long as the pair violates the heap rule, and stops either when the rule is satisfied or when it becomes the root.

A trick used in the implementation

Given the index of a child node, how do we find its parent?

Because integer division in C truncates, the parent index is parent = (child - 1) / 2. Conversely, the children of parent are at indices 2 * parent + 1 and 2 * parent + 2.
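These formulas can be wrapped in tiny helpers and checked against each other. The names ParentIndex, LeftChildIndex and RightChildIndex are hypothetical; the article's code writes the arithmetic inline.

```c
#include <assert.h>

/* Index arithmetic for a heap stored in an array starting at index 0. */
int ParentIndex(int child)      { assert(child > 0);   return (child - 1) / 2; }
int LeftChildIndex(int parent)  { assert(parent >= 0); return 2 * parent + 1; }
int RightChildIndex(int parent) { assert(parent >= 0); return 2 * parent + 2; }
```

Both children of index 2 (indices 5 and 6) map back to parent 2, because (5 - 1) / 2 and (6 - 1) / 2 both truncate to 2. The root has no parent. C division truncates toward zero, so (0 - 1) / 2 evaluates to 0 rather than -1, which is why AdjustUp below loops on child > 0 instead of waiting for the parent index to go negative.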

Building the heap

With the two pieces above, we can start building the heap.
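The code in this article uses the type names HP and HPDataType, the fields a, size and capacity, and the function HeapInit without showing their definitions. The sketch below gives definitions consistent with that usage. The element type int, the struct tag Heap, and the helpers HeapDestroy, HeapTop and HeapEmpty are assumptions for illustration, not the author's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef int HPDataType;       /* element type; assumed to be int */

typedef struct Heap
{
    HPDataType* a;            /* dynamically allocated array */
    int size;                 /* number of elements stored */
    int capacity;             /* number of allocated slots */
} HP;

/* Start with an empty heap and no allocation. */
void HeapInit(HP* php)
{
    assert(php);
    php->a = NULL;
    php->size = 0;
    php->capacity = 0;
}

/* Release the array and reset the fields. */
void HeapDestroy(HP* php)
{
    assert(php);
    free(php->a);
    php->a = NULL;
    php->size = php->capacity = 0;
}

/* Top element (the minimum of a small heap); the heap must not be empty. */
HPDataType HeapTop(const HP* php)
{
    assert(php && php->size > 0);
    return php->a[0];
}

bool HeapEmpty(const HP* php)
{
    assert(php);
    return php->size == 0;
}
```

Starting from a NULL array with capacity 0 matches HeapPush, which allocates 4 slots on the first insertion.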

The first step is to insert the elements of an array into the heap one at a time:

int main()
{
	HP hp;
	HeapInit(&hp);
	int arr[] = { 9,8,6,5,43,2,1 };
	int sz = sizeof(arr) / sizeof(arr[0]);
	for (int i = 0; i < sz; i++)
	{
		HeapPush(&hp, arr[i]);
	}

	return 0;
}

If the elements were simply appended without any adjustment, the result would look like this:
(Figure: the tree obtained by inserting without adjustment)
With upward adjustment after each insertion, the result looks like this:

(Figure: the heap obtained with upward adjustment)

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

// Exchange two heap elements
void Swap(HPDataType* child, HPDataType* parent)
{
	HPDataType tmp = *child;
	*child = *parent;
	*parent = tmp;
}

// Move a[child] up while it is smaller than its parent (small heap)
void AdjustUp(HP* php, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		if (php->a[child] < php->a[parent])
		{
			Swap(&php->a[child], &php->a[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break;
		}
	}
}

void HeapPush(HP* php, HPDataType x)
{
	assert(php);

	// Grow the array when it is full; realloc keeps the existing elements
	if (php->size == php->capacity)
	{
		int newcapacity = php->capacity == 0 ? 4 : php->capacity * 2;
		HPDataType* tmp = (HPDataType*)realloc(php->a, sizeof(HPDataType) * newcapacity);
		if (tmp == NULL)
		{
			perror("realloc fail");
			return;
		}
		php->a = tmp;
		php->capacity = newcapacity;
	}

	// Append at the end, then restore the heap property
	php->a[php->size] = x;
	php->size++;

	AdjustUp(php, php->size - 1);
}

This shows that the algorithm arranges the elements into a valid heap, so the heap is now built.

Next, let's implement the other heap operations.

Implementing the pop operation

Once the heap holds data, we need a way to take data out of it. How does popping work?

First, be clear about which element is popped. Beginners may assume it is the last element of the heap, but removing that would be pointless for a heap. The element that is popped is the top of the heap (the minimum of a small heap, or the maximum of a large heap).

At first glance this looks easy: just shift the rest of the array forward to overwrite the top. That idea is wrong, because the shifted array generally no longer forms a heap. Former parent-child pairs become siblings, former siblings are split up because one element is missing, and the whole structure changes. This is why a second algorithm, downward adjustment, is introduced.
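A small sketch makes the problem visible. NaivePop (hypothetical) performs the "overwrite by shifting" idea with memmove, and IsSmallHeap (also hypothetical) checks the small-heap rule before and after.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Small-heap check: no child may be smaller than its parent. */
bool IsSmallHeap(const int* a, int n)
{
    for (int child = 1; child < n; child++)
        if (a[child] < a[(child - 1) / 2])
            return false;
    return true;
}

/* The naive "pop": delete a[0] by shifting everything left one slot.
   Returns the new size. */
int NaivePop(int* a, int n)
{
    assert(n > 0);
    memmove(a, a + 1, (size_t)(n - 1) * sizeof(int));
    return n - 1;
}
```

Starting from the valid small heap {1, 5, 2, 6, 7, 3}, the naive pop leaves {5, 2, 6, 7, 3}: the old sibling 2 is now a child of 5, so the rule is broken at the very first pair.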

The design of this algorithm is quite clever. Suppose we are working with a small heap, so the top element is the smallest. We swap the top with the last element of the heap. The top now holds a different element, but the rest of the tree still obeys the small-heap rule (the old minimum, now in the last position, no longer counts as part of the heap because it has been popped). We can then use downward adjustment to move the new top element down until the tree is a heap again.

The figure below illustrates this principle.

(Figure: popping the top by swapping it with the last element and adjusting down)
So next we need to understand the downward adjustment algorithm.

The downward adjustment algorithm

First, the precondition: the algorithm applies when every part of the tree except the top already satisfies the small-heap (or large-heap) rule, that is, when the left and right subtrees of the top are both heaps. Put simply, it is the tool to use when popping the top of the heap.

The principle is straightforward. Take a small heap and pop its top. The second smallest element in the heap must be one of the top's children. We let the last leaf of the heap take over as the new top, which removes the old top while keeping the tree complete. Then we compare the new top with its children: if the smaller child is smaller than it, the two swap, which brings the second smallest element to the top. If the tree is tall, the moved element may need to keep swapping downward level by level until it is no larger than both of its children or reaches the bottom level, and a loop handles this naturally. With downward adjustment we can pop the top while turning the rest back into a heap, and so keep extracting the minimum (or maximum) value.

Now let's implement the algorithm.

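// Move a[parent] down within a[0..n-1] until no child is smaller (small heap).
// Takes the raw array so that HeapPop, HeapSort and PrintTopK can all use it.
void AdjustDown(HPDataType* a, int n, int parent)
{
	assert(a);
	int child = parent * 2 + 1; // left child
	while (child < n)
	{
		// Pick the smaller of the two children
		if (child + 1 < n && a[child + 1] < a[child])
		{
			child++;
		}
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

void HeapPop(HP* php)
{
	assert(php);
	assert(php->size > 0); // cannot pop from an empty heap
	// Move the top to the end, shrink the heap, then repair from the root
	Swap(&php->a[0], &php->a[php->size - 1]);
	php->size--;

	AdjustDown(php->a, php->size, 0);
}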
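To see push and pop working together without depending on the HP definitions, here is a self-contained sketch on a plain int array with a caller-supplied buffer. SwapInt, SiftUp, SiftDown and PushThenPopAll are hypothetical names; SiftUp and SiftDown follow the same logic as AdjustUp and AdjustDown above. Because every pop removes the current minimum, the values come out in ascending order.

```c
#include <assert.h>

static void SwapInt(int* x, int* y) { int t = *x; *x = *y; *y = t; }

/* Same idea as AdjustUp: rise while smaller than the parent. */
static void SiftUp(int* a, int child)
{
    while (child > 0 && a[child] < a[(child - 1) / 2])
    {
        SwapInt(&a[child], &a[(child - 1) / 2]);
        child = (child - 1) / 2;
    }
}

/* Same idea as AdjustDown: sink below the smaller child. */
static void SiftDown(int* a, int n, int parent)
{
    int child = 2 * parent + 1;
    while (child < n)
    {
        if (child + 1 < n && a[child + 1] < a[child])
            child++;                        /* take the smaller child */
        if (a[child] >= a[parent])
            break;                          /* heap property restored */
        SwapInt(&a[child], &a[parent]);
        parent = child;
        child = 2 * parent + 1;
    }
}

/* Pushes in[0..n-1] into heap (capacity >= n), then pops them all into out. */
void PushThenPopAll(const int* in, int n, int* heap, int* out)
{
    assert(n >= 0);
    int size = 0;
    for (int i = 0; i < n; i++)
    {
        heap[size] = in[i];
        SiftUp(heap, size);
        size++;
    }
    for (int i = 0; i < n; i++)
    {
        out[i] = heap[0];                   /* current minimum */
        SwapInt(&heap[0], &heap[size - 1]); /* move it to the end */
        size--;                             /* drop it from the heap */
        SiftDown(heap, size, 0);            /* repair from the root */
    }
}
```

For example, pushing 9, 8, 6, 5, 43, 2, 1 (the array from the main function above) and popping everything yields 1, 2, 5, 6, 8, 9, 43.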

Heap sort

Another use of the heap is sorting, known as heap sort.

The principle: suppose we have 10 numbers and build them into a small heap. The top element is then the minimum of the 10 numbers. Swap it with the last element, so the minimum moves to the last position. Then treat the first 9 elements as the heap and use downward adjustment to bring the second smallest element to the top. Repeating this places each successive minimum just before the previous one, so the array ends up in descending order. (To sort in ascending order, build a large heap instead.)

The implementation is as follows:

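void HeapSort(HPDataType* a, int size)
{
	assert(a);

	// Build a small heap: adjust down from the last parent node up to the root
	for (int i = (size - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(a, size, i);
	}

	// Sort: move the current minimum to the end, then shrink the heap by one
	int end = size - 1;
	while (end > 0)
	{
		Swap(&a[0], &a[end]);
		AdjustDown(a, end, 0);
		end--;
	}
}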

This sort works correctly:

(Figure: output of HeapSort)
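Since a small heap yields descending order, the usual way to sort ascending is to use a large heap. The sketch below is a hypothetical variant (SwapInts, AdjustDownMax and HeapSortAsc are illustrative names); it has the same structure as HeapSort above with the comparisons flipped.

```c
#include <assert.h>

static void SwapInts(int* x, int* y) { int t = *x; *x = *y; *y = t; }

/* Sift-down for a large heap: the larger child moves up. */
static void AdjustDownMax(int* a, int n, int parent)
{
    int child = 2 * parent + 1;
    while (child < n)
    {
        if (child + 1 < n && a[child + 1] > a[child])
            child++;                            /* pick the larger child */
        if (a[child] <= a[parent])
            break;
        SwapInts(&a[child], &a[parent]);
        parent = child;
        child = 2 * parent + 1;
    }
}

/* Ascending heap sort: build a large heap, then move the max to the end. */
void HeapSortAsc(int* a, int size)
{
    assert(a != 0 || size == 0);
    for (int i = (size - 2) / 2; i >= 0; i--)   /* last parent down to root */
        AdjustDownMax(a, size, i);
    for (int end = size - 1; end > 0; end--)
    {
        SwapInts(&a[0], &a[end]);               /* current max to its final slot */
        AdjustDownMax(a, end, 0);               /* repair the shrunken heap */
    }
}
```

Running HeapSortAsc on {3, 1, 4, 1, 5, 9, 2, 6} leaves {1, 1, 2, 3, 4, 5, 6, 9}.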

So what makes heap sort attractive? Its time complexity is only O(N log N), which is efficient overall, and it sorts the array in place.

TopK

The heap's truly powerful application is finding the largest or smallest K values (for example, the top 10) among a very large amount of data. If there are 100 million or even billions of numbers, sorting all of them just to pick out the largest or smallest few is very costly in time and space, and the computer may not even have enough memory to hold them all.

A heap solves this problem well, because it lets you filter out just the data you want. The principle of TopK is as follows.

Suppose we have 10,000 numbers and want the largest 5. How can a heap do this?

First, build a heap from the first five numbers. Since we want the five largest, we build a small heap: its top is the smallest of the five current candidates, which is exactly the one to evict when a bigger number arrives. Then each remaining number is compared with the top to see whether it may enter the heap. If it is greater than the top element, it replaces the top and is adjusted down; then the next number is compared with the top, and so on.

Following this idea, once every number has been processed the heap holds the five largest elements of the whole set, which achieves the goal. Only K elements are ever kept in memory, so the work takes O(N log K) time and O(K) extra space.
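Before reading from a file, the same idea can be sketched on an in-memory array. TopK, SiftDownMin and SwapTopK are hypothetical names; the output buffer itself serves as the small heap of size k.

```c
#include <assert.h>

static void SwapTopK(int* x, int* y) { int t = *x; *x = *y; *y = t; }

/* Small-heap sift-down used to keep the k largest values seen so far. */
static void SiftDownMin(int* h, int k, int parent)
{
    int child = 2 * parent + 1;
    while (child < k)
    {
        if (child + 1 < k && h[child + 1] < h[child])
            child++;
        if (h[child] >= h[parent])
            break;
        SwapTopK(&h[child], &h[parent]);
        parent = child;
        child = 2 * parent + 1;
    }
}

/* Copies the k largest values of data[0..n-1] into out[0..k-1]
   (requires 1 <= k <= n). out is kept as a small heap throughout. */
void TopK(const int* data, int n, int k, int* out)
{
    assert(data && out && k >= 1 && k <= n);
    for (int i = 0; i < k; i++)
        out[i] = data[i];                      /* first k values */
    for (int i = (k - 2) / 2; i >= 0; i--)
        SiftDownMin(out, k, i);                /* build the small heap */
    for (int i = k; i < n; i++)
    {
        if (data[i] > out[0])                  /* beats the smallest kept */
        {
            out[0] = data[i];
            SiftDownMin(out, k, 0);
        }
    }
}
```

For the input {5, 1, 9, 3, 7, 2, 8, 6, 4, 10} with k = 3, out ends up holding 8, 9 and 10, with 8 (the smallest of the three) at out[0]. The values are in heap order, not sorted.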

Let's simulate this process.

First we need 10,000 numbers to test with. The following function generates them and writes them to a file:

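#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Write n random integers in [0, 9999] to test.txt, one per line
void CreateData()
{
	int n = 10000;
	srand((unsigned int)time(NULL));
	FILE* pf = fopen("test.txt", "w");
	if (pf == NULL)
	{
		perror("fopen fail");
		return;
	}
	for (int i = 0; i < n; i++)
	{
		int x = rand() % 10000;
		fprintf(pf, "%d\n", x);
	}
	fclose(pf);
}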

With the data ready, we can implement TopK:

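// Print the 5 largest numbers in test.txt.
// kmaxheap is a small heap of size k: its top is the smallest of the k largest
// values seen so far, so it is the one a bigger value should replace.
// Assumes HPDataType is int, so AdjustDown can be called on an int array.
void PrintTopK()
{
	int k = 5;
	FILE* pf = fopen("test.txt", "r");
	if (pf == NULL)
	{
		perror("fopen fail");
		return;
	}
	int* kmaxheap = (int*)malloc(sizeof(int) * k);
	if (kmaxheap == NULL)
	{
		perror("malloc fail");
		fclose(pf);
		return;
	}
	// Read the first k numbers
	for (int i = 0; i < k; i++)
	{
		if (fscanf(pf, "%d", &kmaxheap[i]) != 1)
		{
			printf("not enough data\n");
			free(kmaxheap);
			fclose(pf);
			return;
		}
	}
	// Build them into a small heap
	for (int i = (k - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(kmaxheap, k, i);
	}
	// Compare every remaining number with the top (stops cleanly at end of file)
	int val = 0;
	while (fscanf(pf, "%d", &val) == 1)
	{
		if (val > kmaxheap[0])
		{
			kmaxheap[0] = val;
			AdjustDown(kmaxheap, k, 0);
		}
	}
	for (int i = 0; i < k; i++)
	{
		printf("%d ", kmaxheap[i]);
	}
	printf("\n");
	free(kmaxheap);
	fclose(pf);
}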


Origin blog.csdn.net/qq_73899585/article/details/131702630