Important concepts
Before talking about heaps, we need two concepts about binary trees:
- Full binary tree: if every level of a binary tree holds the maximum possible number of nodes, the tree is a full binary tree. Counting the root as level 1, a full binary tree of depth k has 2^k - 1 nodes.
- Complete binary tree: a complete binary tree is derived from a full binary tree and is a very efficient structure to store. A binary tree of depth k with n nodes is complete if and only if each of its nodes corresponds one-to-one to the nodes numbered 1 to n in the full binary tree of depth k. A full binary tree is therefore a special case of a complete binary tree.
The figure above shows a full binary tree and a complete binary tree.
How the tree is stored
sequential storage
Every data structure has to be laid out in memory somehow, so how is a tree stored? There are two common approaches.
The first is sequential storage: the nodes are kept in a sequential list (an array), laid out as follows:
Obviously, for a binary tree that is not complete, this layout wastes a great deal of memory, because the slots of the missing nodes must still be reserved.
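To put a number on that waste, here is a minimal sketch. It assumes the usual level-order layout with the root at index 0 and the children of index i at 2i + 1 and 2i + 2; the function names are made up for this illustration and are not from the article.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the right child in a level-order array whose root is at index 0. */
size_t RightChildIndex(size_t i)
{
    return 2 * i + 2;
}

/* Number of array slots needed to store a degenerate tree of `nodes` nodes
 * in which every node has only a right child (hypothetical worst case). */
size_t SlotsForRightChain(size_t nodes)
{
    assert(nodes > 0);
    size_t last = 0;                     /* index of the deepest node so far */
    for (size_t k = 1; k < nodes; k++)
    {
        last = RightChildIndex(last);    /* step down to the right child */
    }
    return last + 1;                     /* indices 0..last must all exist */
}
```

For such a chain of 10 nodes, `SlotsForRightChain(10)` returns 1023, so only 10 of the 1023 reserved slots would hold data. A complete binary tree of n nodes, by contrast, needs exactly n slots.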
chained storage
Compared with sequential storage, chained storage has its own advantages. It works as follows:
Define a structure with three members: the value of the node, a pointer to its left child, and a pointer to its right child. These three members are enough to describe an entire tree.
The rest of this article focuses on the sequential (array) representation of a binary tree.
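The three members are presumably the node's value plus pointers to its left and right children. A minimal sketch under that assumption follows; the type and function names (`BTNode`, `BuyNode`, `TreeSize`, `TreeDestroy`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef int BTDataType;              /* element type, assumed to be int */

typedef struct BinaryTreeNode
{
    BTDataType data;                 /* value stored in this node */
    struct BinaryTreeNode* left;     /* left child, NULL if absent */
    struct BinaryTreeNode* right;    /* right child, NULL if absent */
} BTNode;

/* Allocate a node with no children; returns NULL if malloc fails. */
BTNode* BuyNode(BTDataType x)
{
    BTNode* node = (BTNode*)malloc(sizeof(BTNode));
    if (node != NULL)
    {
        node->data = x;
        node->left = NULL;
        node->right = NULL;
    }
    return node;
}

/* Count the nodes by following the child pointers. */
int TreeSize(const BTNode* root)
{
    return root == NULL ? 0 : 1 + TreeSize(root->left) + TreeSize(root->right);
}

/* Free the whole tree, children before their parent. */
void TreeDestroy(BTNode* root)
{
    if (root == NULL)
    {
        return;
    }
    TreeDestroy(root->left);
    TreeDestroy(root->right);
    free(root);
}
```

Because every node carries its own pointers, a sparse or lopsided tree costs memory only for the nodes that exist, which is the main advantage over the array layout.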
heap concept
First of all, be clear that the heap discussed here is not the heap that malloc allocates from. The former is a data structure; the latter is a region of a process's memory used for dynamic allocation.
A heap is a complete binary tree that satisfies the following properties:
- The value of every node is always not greater than (or always not less than) the value of its parent node.
- The heap is always a complete binary tree.
Why "not greater than or not less than"? Because heaps come in two kinds: the large heap (max-heap) and the small heap (min-heap).
Before introducing the two kinds, let's look at how a heap is stored sequentially. Take the following figure as an example.
The picture above is a complete binary tree in which every parent node is smaller than its children, so it is a small heap. Its layout in memory is shown in the figure below: it is stored in an array, with the nodes written level by level from top to bottom and, within each level, from left to right.
A large heap looks the same, except that every parent node is larger than its children.
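The small-heap rule can be checked directly on the array. The sketch below assumes the level-order layout with the root at index 0, in which the parent of index i is (i - 1) / 2; the function name `IsSmallHeap` and the sample values in the usage note are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if a[0..n-1], read as a level-order complete binary tree,
 * satisfies the small-heap rule: no child is smaller than its parent. */
bool IsSmallHeap(const int* a, int n)
{
    for (int child = 1; child < n; child++)
    {
        int parent = (child - 1) / 2;    /* integer division truncates */
        if (a[child] < a[parent])
        {
            return false;
        }
    }
    return true;
}
```

For instance, {10, 15, 56, 25, 30, 70} passes the check, while {10, 15, 56, 12} fails because 12 sits below 15. Flipping the comparison to `>` gives the corresponding check for a large heap.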
implementation of the heap
As a data structure, the heap has its own uses. Let's look at how it is implemented.
From the storage layout above, any array can be viewed as a complete binary tree. What makes a heap special is the ordering between parents and children, so the first problem is how to arrange the numbers in an array so that they satisfy the heap property.
Upward adjustment algorithm
This algorithm is used when inserting an element into a heap. The new element is appended at the end of the array, where it may violate the ordering of the large or small heap; the upward adjustment algorithm moves it into place so that the tree is still a heap afterwards. The precondition is that the tree was a valid heap before the insertion.
The algorithm works like this:
Start with an existing small heap and insert a new element, 12. It is first placed as a child of 15. Since 12 is smaller than 15, this breaks the small-heap rule, so 12 and 15 are swapped. Then 12 is compared with its new parent, 10. Here 12 is not smaller than 10, so the small-heap rule holds and the adjustment stops. The new heap is shown on the right of the figure, and the insertion is complete.
Note that the farthest an inserted element can travel is the root. As long as it violates the rule with its current parent it keeps swapping upward, until either the rule holds or it has become the root, the ancestor of every other node.
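Some tricks in the implementation process
Knowing the index of a child node, how do we find its parent node?
Because integer division in C discards the remainder, parent = (child - 1) / 2 gives the right answer for both the left and the right child. In the other direction, the children of a parent are at parent * 2 + 1 and parent * 2 + 2.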
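The index arithmetic can be written out as three small helpers. This is only a sketch: the helper names are hypothetical, and the root is assumed to be at index 0.

```c
#include <assert.h>

/* Parent of a non-root node; the division truncates, so both children
 * of the same node map back to the same parent index. */
int Parent(int child)
{
    assert(child > 0);
    return (child - 1) / 2;
}

/* Left child of a node. */
int LeftChild(int parent)
{
    return parent * 2 + 1;
}

/* Right child of a node. */
int RightChild(int parent)
{
    return parent * 2 + 2;
}
```

For example, indices 3 and 4 are the two children of index 1: (3 - 1) / 2 is exactly 1, and (4 - 1) / 2 is 1.5, which truncates to 1.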
Implementing heap construction
With these pieces in place, we can start building the heap.
The first step is to insert the elements of an array into the heap one by one:
int main()
{
    HP hp;
    HeapInit(&hp);
    int arr[] = { 9,8,6,5,43,2,1 };
    int sz = sizeof(arr) / sizeof(arr[0]);
    for (int i = 0; i < sz; i++)
    {
        HeapPush(&hp, arr[i]);
    }
    return 0;
}
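The code above uses an `HP` type and a `HeapInit` function that the article does not show. Judging from how `php->a`, `php->size` and `php->capacity` are used in the functions that follow, they might look like the sketch below. The exact definitions, the element type `int`, and the `HeapDestroy` helper are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

typedef int HPDataType;    /* assumed element type */

typedef struct Heap
{
    HPDataType* a;         /* dynamically allocated array holding the heap */
    int size;              /* number of elements currently stored */
    int capacity;          /* number of elements the array can hold */
} HP;

/* Start with an empty heap; the array is allocated on the first push. */
void HeapInit(HP* php)
{
    assert(php);
    php->a = NULL;
    php->size = 0;
    php->capacity = 0;
}

/* Release the array and return the heap to its empty state. */
void HeapDestroy(HP* php)
{
    assert(php);
    free(php->a);
    php->a = NULL;
    php->size = 0;
    php->capacity = 0;
}
```

Starting with a NULL array and zero capacity keeps `HeapInit` trivial and leaves all allocation to the push operation, which has to handle growth anyway.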
If the elements were simply appended without any adjustment, the result would look like this:
With the upward adjustment algorithm applied on every insertion, the result looks like this:
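#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

// Exchange the values of two heap elements.
void Swap(HPDataType* child, HPDataType* parent)
{
    HPDataType tmp = *child;
    *child = *parent;
    *parent = tmp;
}

// Move the element at index child upward until its parent is no larger (small heap).
void AdjustUp(HP* php, int child)
{
    int parent = (child - 1) / 2;
    while (child > 0)
    {
        if (php->a[child] < php->a[parent])
        {
            Swap(&php->a[child], &php->a[parent]);
            child = parent;
            parent = (child - 1) / 2;
        }
        else
        {
            break;
        }
    }
}

// Append x at the end of the array, then adjust it upward into place.
void HeapPush(HP* php, HPDataType x)
{
    assert(php);
    if (php->size == php->capacity)
    {
        int newcapacity = php->capacity == 0 ? 4 : php->capacity * 2;
        // realloc keeps the existing elements; a fresh malloc would lose them
        HPDataType* tmp = (HPDataType*)realloc(php->a, sizeof(HPDataType) * newcapacity);
        if (tmp == NULL)
        {
            perror("realloc fail");
            return;
        }
        php->a = tmp;
        php->capacity = newcapacity;
    }
    php->a[php->size] = x;
    php->size++;
    AdjustUp(php, php->size - 1);
}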
As this shows, the algorithm arranges the elements correctly, and the heap is built.
Next, we implement the other operations of the heap.
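To see the effect on concrete data, the same upward adjustment can be run in place on a plain array, treating each element in turn as a newly inserted one. This is a sketch with hypothetical names (`SiftUp`, `BuildSmallHeapByInsertion`), not the article's `HP`-based code.

```c
#include <assert.h>

/* Move a[child] upward while it is smaller than its parent (small heap). */
void SiftUp(int* a, int child)
{
    while (child > 0)
    {
        int parent = (child - 1) / 2;
        if (a[child] >= a[parent])
        {
            break;                   /* rule holds, stop */
        }
        int tmp = a[child];
        a[child] = a[parent];
        a[parent] = tmp;
        child = parent;
    }
}

/* Turn a[0..n-1] into a small heap: a[0..i-1] is already a heap,
 * and a[i] plays the role of the element being inserted. */
void BuildSmallHeapByInsertion(int* a, int n)
{
    assert(a);
    for (int i = 1; i < n; i++)
    {
        SiftUp(a, i);
    }
}
```

Applied to the array from `main`, {9, 8, 6, 5, 43, 2, 1}, this produces {1, 6, 2, 9, 43, 8, 5}: the minimum 1 is at the top, and every parent is no larger than its children.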
Implement the heap pop operation
Once the heap holds data, elements also need to be removed from it. How does data leave the heap?
First, decide which element leaves. Beginners may think it is the last element, but removing the last element is pointless. The element to remove is the top of the heap, because handing out the current minimum (or maximum) is exactly what a heap is for.
At first glance this looks simple: just shift the rest of the array forward by one to overwrite the top. That idea is wrong, because the shifted array no longer keeps the original relationships. Nodes that were parent and child become siblings, sibling relationships change as well because one element is missing, and the result is generally not a heap. So a second algorithm is introduced here: the downward adjustment algorithm.
The design is quite ingenious. Suppose we are maintaining a small heap, so the top element is the smallest. Swap the top with the last element of the heap and shrink the heap by one, so that the old minimum no longer counts as part of the heap (it has been popped). The top is now a different element, but both subtrees below it still obey the small-heap rule, so the downward adjustment algorithm can sink this new top element until the whole tree is a heap again.
The figure below can explain this principle well.
So now we need to work out what the downward adjustment algorithm is.
downward adjustment algorithm
First, the precondition. The algorithm applies when everything except the top of the heap already satisfies the small-heap or large-heap rule, that is, when the left and right subtrees of the starting node are themselves heaps. Put simply, it is what you use when popping the top of the heap.
The principle is also simple. Suppose we have a small heap and pop its top. The second smallest element must be one of the top's children. We let the last leaf of the heap take over as the new top, which removes the old top while keeping the shape of the tree complete. Then we compare the new top with the smaller of its two children; if that child is smaller, the two are swapped, and the second smallest element arrives at the top. If the tree is tall, the element that moved down may need to keep swapping with the smaller child at each level, until neither child is smaller or it reaches a leaf. This is done with a loop. With the downward adjustment algorithm, popping the top leaves a valid heap behind, so the minimum or maximum can be extracted again and again.
Now let's implement the algorithm:
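// Sink a[parent] within the first n elements of a until neither child is
// smaller (small heap). The subtrees below parent must already be heaps.
// It takes a plain array so that it can also be called on arrays that are
// not wrapped in an HP.
void AdjustDown(HPDataType* a, int n, int parent)
{
    assert(a);
    int child = parent * 2 + 1;
    while (child < n)
    {
        // pick the smaller of the two children
        if (child + 1 < n && a[child + 1] < a[child])
        {
            child++;
        }
        if (a[child] < a[parent])
        {
            Swap(&a[child], &a[parent]);
            parent = child;
            child = parent * 2 + 1;
        }
        else
        {
            break;
        }
    }
}

void HeapPop(HP* php)
{
    assert(php);
    assert(php->size > 0);
    // move the top to the end, drop it, then restore the heap from the root
    Swap(&php->a[0], &php->a[php->size - 1]);
    php->size--;
    AdjustDown(php->a, php->size, 0);
}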
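A quick way to convince yourself that popping works is to drain a small heap and watch the values come out in ascending order. The sketch below is self-contained and uses a plain `int` array with hypothetical names (`SiftDown`, `PopTop`) rather than the article's `HP` type.

```c
#include <assert.h>

/* Sink a[parent] within a[0..n-1] until neither child is smaller. */
void SiftDown(int* a, int n, int parent)
{
    int child = parent * 2 + 1;
    while (child < n)
    {
        if (child + 1 < n && a[child + 1] < a[child])
        {
            child++;                 /* use the smaller child */
        }
        if (a[child] >= a[parent])
        {
            break;                   /* rule holds, stop */
        }
        int tmp = a[child];
        a[child] = a[parent];
        a[parent] = tmp;
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Remove and return the top of a small heap holding *n elements. */
int PopTop(int* a, int* n)
{
    assert(a && n && *n > 0);
    int top = a[0];
    a[0] = a[*n - 1];                /* last leaf becomes the new top */
    (*n)--;
    SiftDown(a, *n, 0);
    return top;
}
```

Starting from the small heap {1, 6, 2, 9, 43, 8, 5}, seven calls to `PopTop` return 1, 2, 5, 6, 8, 9, 43 in that order.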
heap sort
The heap has another use: it can be used for sorting.
The principle of heap sort: suppose there are 10 numbers and we build them into a small heap. The top of the heap is then the minimum of the 10 numbers. Swap it with the last element, so that the minimum lands in the last position, and exclude that position from the heap. The downward adjustment algorithm then brings the second smallest element to the top. Repeating the process places each successive minimum just before the previous one, so the array ends up in descending order.
The specific operation process is as follows:
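// Sort a[0..size-1] in descending order using a small heap.
void HeapSort(HPDataType* a, int size)
{
    assert(a);
    // build the heap: adjust downward from the last non-leaf node to the root
    for (int i = (size - 1 - 1) / 2; i >= 0; i--)
    {
        AdjustDown(a, size, i);
    }
    // sort: move the current minimum to the end, then shrink the heap
    int end = size - 1;
    while (end > 0)
    {
        Swap(&a[0], &a[end]);
        AdjustDown(a, end, 0);
        end--;
    }
}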
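Sorting this way works as expected.
So what is good about heap sort? In terms of time complexity it is O(N log N) in every case, since building the heap costs O(N) and each of the N - 1 downward adjustments costs O(log N). It also sorts in place without an extra array, so its overall efficiency is quite good.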
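A small heap yields descending order. If ascending order is wanted, the same procedure works with a large heap, obtained by flipping the comparisons in the downward adjustment. The following is a minimal self-contained sketch; the names `AdjustDownLarge` and `HeapSortAscending` are hypothetical.

```c
#include <assert.h>

/* Sink a[parent] within a[0..n-1] until neither child is larger (large heap). */
void AdjustDownLarge(int* a, int n, int parent)
{
    int child = parent * 2 + 1;
    while (child < n)
    {
        if (child + 1 < n && a[child + 1] > a[child])
        {
            child++;                 /* use the larger child */
        }
        if (a[child] <= a[parent])
        {
            break;
        }
        int tmp = a[child];
        a[child] = a[parent];
        a[parent] = tmp;
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Sort a[0..n-1] in ascending order using a large heap. */
void HeapSortAscending(int* a, int n)
{
    assert(a);
    /* build the large heap from the last non-leaf node up to the root */
    for (int i = (n - 1 - 1) / 2; i >= 0; i--)
    {
        AdjustDownLarge(a, n, i);
    }
    /* repeatedly move the current maximum to the end of the shrinking heap */
    for (int end = n - 1; end > 0; end--)
    {
        int tmp = a[0];
        a[0] = a[end];
        a[end] = tmp;
        AdjustDownLarge(a, end, 0);
    }
}
```

Sorting {9, 8, 6, 5, 43, 2, 1} this way gives {1, 2, 5, 6, 8, 9, 43}.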
TopK
Where the heap really shines is in finding the largest or smallest 10 values among a huge amount of data. Suppose there are 100 million or even a billion numbers. Sorting all of them just to pick out the few largest or smallest is wasteful: it costs O(N log N) time and requires holding every number in memory, and the machine may simply not have enough memory for that much data.
The heap solves this problem well, because it can filter out just the data you want. The principle of Top-K is introduced below.
Suppose we now have 10,000 numbers and want the largest 5 of them. How do we use a heap for this?
First, build a heap from the first five numbers. Because we are looking for the five largest, we build a small heap, so that the top is the smallest of the current candidates. Then compare each subsequent number with the top of the heap to see whether it can enter. If the number is greater than the top, it replaces the top and is adjusted downward; then the next number is compared with the new top, and so on.
When all numbers have been processed, the heap holds the five largest of them, which is the goal. With N numbers and a heap of size K, this takes O(N log K) time and only O(K) extra space.
Let's simulate this process.
First we need the 10,000 numbers. The following shows one way to generate the data:
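The idea can be sketched on an in-memory array first, which makes it easy to check by hand. This is only an illustration under stated assumptions: the names `SiftDownSmall` and `TopKLargest` are hypothetical, and the caller must supply an output buffer of k elements with 0 < k <= n.

```c
#include <assert.h>

/* Sink a[parent] within a[0..n-1] until neither child is smaller (small heap). */
void SiftDownSmall(int* a, int n, int parent)
{
    int child = parent * 2 + 1;
    while (child < n)
    {
        if (child + 1 < n && a[child + 1] < a[child])
        {
            child++;
        }
        if (a[child] >= a[parent])
        {
            break;
        }
        int tmp = a[child];
        a[child] = a[parent];
        a[parent] = tmp;
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Store the k largest values of data[0..n-1] in out[0..k-1].
 * out ends up as a small heap, so out[0] is the smallest of the k values
 * and the rest are not in sorted order. */
void TopKLargest(const int* data, int n, int k, int* out)
{
    assert(data && out && k > 0 && k <= n);
    for (int i = 0; i < k; i++)
    {
        out[i] = data[i];            /* first k values are the initial candidates */
    }
    for (int i = (k - 1 - 1) / 2; i >= 0; i--)
    {
        SiftDownSmall(out, k, i);    /* build the small heap */
    }
    for (int i = k; i < n; i++)
    {
        if (data[i] > out[0])        /* beats the weakest candidate */
        {
            out[0] = data[i];
            SiftDownSmall(out, k, 0);
        }
    }
}
```

With data {7, 3, 99, 12, 56, 1, 42, 88, 5, 23} and k = 3, the output holds 56, 88 and 99, with 56 at index 0. Keeping the smallest candidate at the top is what makes each comparison a single check against `out[0]`.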
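#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Write 10000 random numbers in the range 0..9999 to test.txt, one per line.
void CreateData()
{
    int n = 10000;
    srand((unsigned int)time(NULL));
    FILE* pf = fopen("test.txt", "w");
    if (pf == NULL)
    {
        perror("fopen fail");
        return;
    }
    for (int i = 0; i < n; i++)
    {
        int x = rand() % 10000;
        fprintf(pf, "%d\n", x);
    }
    fclose(pf);
}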
After generating the data, we implement Top-K:
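void PrintTopK()
{
    const int k = 5;
    FILE* pf = fopen("test.txt", "r");
    if (pf == NULL)
    {
        perror("fopen fail");
        return;
    }
    int* kmaxheap = (int*)malloc(sizeof(int) * k);
    if (kmaxheap == NULL)
    {
        perror("malloc fail");
        fclose(pf);
        return;
    }
    // read the first k numbers as the initial candidates
    for (int i = 0; i < k; i++)
    {
        if (fscanf(pf, "%d", &kmaxheap[i]) != 1)
        {
            fprintf(stderr, "test.txt holds fewer than %d numbers\n", k);
            free(kmaxheap);
            fclose(pf);
            return;
        }
    }
    // build a small heap so that kmaxheap[0] is the smallest candidate
    for (int i = (k - 1 - 1) / 2; i >= 0; i--)
    {
        AdjustDown(kmaxheap, k, i);
    }
    // loop on the result of fscanf instead of feof, which would process
    // the last value twice
    int val = 0;
    while (fscanf(pf, "%d", &val) == 1)
    {
        if (val > kmaxheap[0])
        {
            kmaxheap[0] = val;
            AdjustDown(kmaxheap, k, 0);
        }
    }
    for (int i = 0; i < k; i++)
    {
        printf("%d ", kmaxheap[i]);
    }
    printf("\n");
    free(kmaxheap);
    fclose(pf);
}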