Data Structure | Heap Implementation, Upward Adjustment Algorithm, Downward Adjustment Algorithm, Heap Sort


1. The concept and structure of the heap

Given a key set $K = \{k_0, k_1, k_2, \dots, k_{n-1}\}$, store all of its elements in a one-dimensional array in the level order of a complete binary tree.
If $k_i \le k_{2i+1}$ and $k_i \le k_{2i+2}$ (or $k_i \ge k_{2i+1}$ and $k_i \ge k_{2i+2}$) for every $i = 0, 1, 2, \dots$ whose children exist, the array is called a small heap (or a large heap). The heap whose root node is the largest is called the max heap or large root heap, and the heap whose root node is the smallest is called the min heap or small root heap.

Properties of the heap:

  1. The value of a node in the heap is never greater than the value of its parent node (large root heap), or never less than it (small root heap);
  2. The heap is always a complete binary tree .

Now that the concept is clear, a natural question is: what exactly are the large root heap and the small root heap?

[Figure: a heap drawn as its logical structure (a complete binary tree, left) and as its storage structure (an array, right)]

  • The logical structure shows that the heap is a complete binary tree. On top of the complete binary tree, heaps are divided into two kinds: the large root heap and the small root heap.
  • In memory, however, the heap really looks like the form on the right, the storage structure: it is stored as an array. The tree on the left, the logical structure, is only something we imagine to make the heap easier to understand.
  • It follows that the nodes of the heap are related to each other. A tree has father nodes and child nodes, and in the array we can express the relationship between the two with **subscripts**.

Use the subscript of the parent node to find the subscript of the child node:

LeftChild = Parent * 2 + 1; //The node subscript of the left child

RightChild = Parent * 2 + 2; //The node subscript of the right child

Use the subscript of the child node to find the subscript of the parent node:

Parent = (Child - 1) / 2;

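[Figure: finding the child subscripts from the parent and the parent subscript from a child]

If this is hard to see, plug in a few subscripts and compute; the formulas always hold. The division is integer division, so the left child and the right child both map back to the same parent.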

2. Upward adjustment algorithm

Now that we understand the concept of the heap, we need to learn an algorithm called the upward adjustment algorithm (AdjustUp).

1. Algorithm diagram analysis

Upward adjustment means, literally, repeatedly comparing a value with the value above it and adjusting. What exactly happens?
[Figure: inserting a value at the end of a large root heap: no adjustment needed (top), and upward adjustment after inserting 100 (bottom)]

  • In the upper case of the figure, the newly inserted number is smaller than its father, so no adjustment is made and the heap is still a large root heap.
  • In the lower case, the number 100 is inserted at the free position at the end of the heap. To keep the heap a large root heap, we compare the newly inserted number with its parent node; if it is greater than the father, the two exchange positions.
  • How do we find the parent node? With the formula from earlier for finding the parent's subscript from the child's subscript: Parent = (Child - 1) / 2;
  • Here 100 is greater than 30, so we exchange the two numbers. After the exchange, we keep comparing 100 with its new father and keep adjusting until 100 is no longer greater than its father or it reaches the top of the heap.

2. Specific code implementation

Now that we know the idea of the algorithm, let's see how to express it in code.

Here a is the array that stores the heap's data (the heap structure itself is introduced later), and child is the subscript of the element we just inserted. We compare the element at child with its father to decide whether to adjust.

void AdjustUp(HPDataType* a, int child)

Given the child node, we first find its parent node with the formula introduced above.

int parent = (child - 1) / 2;

With the father found, we exchange the two whenever the child is greater than the father.

while (child > 0)
{
	if (a[child] > a[parent]) // if the child is greater than the father, exchange them
	{
		Swap(&a[child], &a[parent]); // exchange the child and the father
	}
	// not complete yet: child and parent still have to be updated after the swap,
	// and the loop has to stop when no swap is needed
}

  • Is one swap enough? Obviously not. The element may have to move up several levels, so we swap repeatedly, which is why a loop is needed.
  • What is the loop condition? It is child > 0. Upward adjustment compares a node with its father and exchanges them if the node is greater, so a father must exist. When child is 0, the node is already at the top of the heap and has no father to compare with, so the loop runs only while child > 0.

After the exchange, we need to update the position of the child node and continue to make adjustments.

child = parent;
parent = (child - 1) / 2;

Complete code implementation

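// HPDataType is the element type of the heap (typedef int HPDataType; see the heap structure below).

// Swap two elements. Defined before AdjustUp so that AdjustUp can call it.
void Swap(HPDataType* p1, HPDataType* p2)
{
	HPDataType x = *p1;
	*p1 = *p2;
	*p2 = x;
}

// Upward adjustment for a large root heap
void AdjustUp(HPDataType* a, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0) // loop while the child still has a father
	{
		if (a[child] > a[parent]) // if the child is greater than the father, exchange them
		{
			Swap(&a[child], &a[parent]);

			// update the positions of the child and the father
			child = parent;
			parent = (child - 1) / 2;
		}
		else // the child is not greater than the father: no adjustment needed, leave the loop
		{
			break;
		}
	}
}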

3. Downward adjustment algorithm

The downward adjustment algorithm is used more often in practice, so it deserves particular attention.

1. Algorithm diagram analysis

Likewise, downward adjustment literally means comparing a value with its children and adjusting accordingly. We again use a large root heap as the example.
[Figure: downward adjustment from the root of a large root heap, with root 100 (first case) and root 20 (second case)]
Note: the downward adjustment algorithm has a precondition: the left and right subtrees must already be heaps.
That is, before adjusting downward from the root node, its left and right subtrees must both be small root heaps or both be large root heaps.

  • In the first case of the figure, 100 is larger than its child nodes. Since we want a large root heap, no adjustment is made.
  • In the second case the root node is 20, which is smaller than its children, so it must exchange positions with a child. Which one? The larger of the two children. To find the subscripts of the child nodes, we use the formulas from earlier for finding a child from its father:
    LeftChild = Parent * 2 + 1; //The node subscript of the left child
    RightChild = Parent * 2 + 2; //The node subscript of the right child

2. Specific code implementation

As before, once we know the idea of the algorithm, we implement it in code.

The downward adjustment algorithm takes the subscript of the parent node. Because it adjusts downwards, that is, against the node's own children, we pass in the father rather than the child. n is the size of the array, that is, the size of the heap (in the figure above, n == 6), and it is what we use to end the loop.

void AdjustDown(HPDataType* a, int n, int parent)

The child is found from the parent node. But why is there only one child variable, when we just said we must exchange with the larger of the two children? We first assume the left child is the larger one; inside the loop we compare the right child with the left child, pick the larger, and update child.

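int child = parent * 2 + 1; // assume for now that the left child is the larger child

// child + 1 < n checks whether the right child exists; if child + 1 == n, it would be out of bounds
if (child + 1 < n && a[child + 1] > a[child])
{
	++child;
}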

If the larger child is greater than the father, we exchange them, update the father's position, and continue adjusting. The loop condition is child < n: the largest valid value of child is n - 1, the last element of the heap, and any child greater than or equal to n would be an out-of-bounds access.

while (child < n)
{
	// pick the larger of the left and right children
	if (child + 1 < n && a[child + 1] > a[child])
	{
		++child;
	}

	if (a[child] > a[parent])
	{
		Swap(&a[child], &a[parent]);
		parent = child;
		child = parent * 2 + 1;
	}
	else
	{
		break; // the father is not smaller than its larger child: done
	}
}

Complete code implementation

// Downward adjustment for a large root heap.
// Precondition: the left and right subtrees of parent are already heaps.
void AdjustDown(HPDataType* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		// pick the larger of the left and right children
		if (child + 1 < n && a[child + 1] > a[child])
		{
			++child;
		}

		if (a[child] > a[parent])
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}

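4. Implementation of the heap

The heap is essentially an array. size is the number of elements currently in the heap, and capacity is the number of elements the array can hold.

typedef int HPDataType;
typedef struct Heap
{
	HPDataType* a;   // array that stores the heap
	int size;        // number of elements in the heap
	int capacity;    // allocated capacity, in elements
}HP;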

1. Heap initialization

We initially allocate space for four elements (not four bytes).

// requires <assert.h>, <stdio.h>, <stdlib.h>
void HeapInit(HP* php)
{
	assert(php);

	php->a = (HPDataType*)malloc(sizeof(HPDataType) * 4);
	if (php->a == NULL)
	{
		perror("malloc fail");
		return;
	}

	php->size = 0;
	php->capacity = 4;
}

2. Heap insertion

Inserting into the heap means inserting a number at the end of the array. When the number of elements equals the capacity, we first expand the array. Besides expansion, insertion also needs the upward adjustment algorithm: after the insertion the array may no longer be a large root heap (or small root heap), so we adjust the newly inserted number upward. Here php->size - 1 is the subscript of the last element of the array, which is the element we just inserted.

void HeapPush(HP* php, HPDataType x)
{
	assert(php);

	// the array is full: double the capacity
	if (php->size == php->capacity)
	{
		HPDataType* tmp = (HPDataType*)realloc(php->a, sizeof(HPDataType) * php->capacity * 2);
		if (tmp == NULL)
		{
			perror("realloc fail");
			return;
		}
		php->a = tmp;
		php->capacity *= 2;
	}

	// append the new element, then adjust it upward
	php->a[php->size] = x;
	php->size++;

	AdjustUp(php->a, php->size - 1);
}

3. Heap deletion

Where there is insertion there is deletion. But should we delete the data at the top of the heap or the data at the end of the heap? Let's look at the code first.

void HeapPop(HP* php)
{
	assert(php);
	assert(!HeapEmpty(php)); // HeapEmpty is shown further below and must be declared before this function

	// delete the top: move it to the end, then shrink the heap by one
	Swap(&php->a[0], &php->a[php->size - 1]);
	php->size--;

	AdjustDown(php->a, php->size, 0);
}

We delete the data at the top of the heap. The code exchanges the top element with the last element, php->size-- then removes the last element (the old top), and AdjustDown(php->a, php->size, 0) adjusts the new top element, at subscript 0, downward. Both subtrees of the root are still heaps after the exchange, so the precondition of the downward adjustment holds.
[Figure: heap deletion: exchange the top with the last element, shrink the heap, then adjust downward]

4. Take the top element of the heap

php->a[0] is the first element of the array, which is the top of the heap.

HPDataType HeapTop(HP* php)
{
	assert(php);
	assert(php->size > 0); // the heap must not be empty
	return php->a[0];
}

5. Determine whether the heap is empty

The heap is empty when php->size == 0.

// bool requires <stdbool.h>
bool HeapEmpty(HP* php)
{
	assert(php);
	return php->size == 0;
}

6. Return the number of data in the heap

int HeapSize(HP* php)
{
	assert(php);
	return php->size;
}

7. Destruction of the heap

void HeapDestroy(HP* php)
{
	assert(php);

	free(php->a);
	php->a = NULL;
	php->capacity = php->size = 0;
}

8. Heap construction

There are two ways to build a heap from an array: by upward adjustment and by downward adjustment.

First, building by upward adjustment. We insert the elements one at a time, and HeapPush performs an upward adjustment on each inserted element.

/* build the heap by upward adjustment */
void HeapInitArray(HP* php, HPDataType* a, int n)
{
	assert(php);
	HeapInit(php);
	for (int i = 0; i < n; ++i)
	{
		HeapPush(php, a[i]);
	}
}

Next, building by downward adjustment (an alternative implementation of the same function). Instead of adjusting after each insertion as in the previous method, we copy the whole array and then adjust it in place.

void HeapInitArray(HP* php, HPDataType* a, int n)
{
	assert(php);

	php->a = (HPDataType*)malloc(sizeof(HPDataType) * n);
	if (php->a == NULL)
	{
		perror("malloc fail");
		return;
	}

	// copy the data into the heap's array (requires <string.h>)
	memcpy(php->a, a, sizeof(HPDataType) * n);
	php->size = n;
	php->capacity = n;

	// build the heap: adjust downward from the last non-leaf node back to the root
	for (int i = ((n - 1) - 1) / 2; i >= 0; --i)
	{
		AdjustDown(php->a, php->size, i);
	}
}

But be careful when using downward adjustment: the algorithm has a precondition, namely that the left and right subtrees must already be heaps.
[Figure: an array that is not yet a heap]

In the figure, the left and right subtrees of 2 are not heaps, and the left and right subtrees of 1 and 5 are not heaps either. So we start adjusting from the second-to-last layer, at the last non-leaf node, which is the position of 5, and work backwards to the root. How do we find this node? The last element is at subscript n-1, and the formula Parent = (Child - 1) / 2 from above gives the parent of a child, so the subscript of the last non-leaf node is ((n-1)-1)/2.

5. The time complexity of the two adjustment algorithms

Here we do not go through the full analysis, but give the conclusions directly.
[Building a heap by upward adjustment]: the time complexity is O(NlogN);
[Building a heap by downward adjustment]: the time complexity is O(N);
therefore, downward adjustment should be preferred when building a heap. (A single upward or downward adjustment is O(logN), the height of the tree.)
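For readers who want to see where the two bounds come from, here is a short counting sketch. It assumes a perfect binary tree with levels $0$ to $h-1$ and $N = 2^h - 1$ nodes, and counts worst-case exchanges; the symbols $h$, $T_{\text{up}}$ and $T_{\text{down}}$ are introduced here for illustration and do not come from the article.

```latex
% Level i contains 2^i nodes.

% Building by upward adjustment: a node on level i moves up at most i levels.
T_{\text{up}} = \sum_{i=0}^{h-1} 2^i \cdot i
             = (h-2)\,2^h + 2
             = (N+1)\bigl(\log_2(N+1) - 2\bigr) + 2
             = O(N \log N)

% Building by downward adjustment: a node on level i moves down at most h-1-i levels.
T_{\text{down}} = \sum_{i=0}^{h-1} 2^i \,(h-1-i)
               = 2^h - 1 - h
               = N - \log_2(N+1)
               = O(N)
```

The difference lies in which level does the most work. With upward adjustment, the bottom level holds about half of all nodes and each of them may travel the full height; with downward adjustment, those same nodes do not move at all, and only the few nodes near the root travel far.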

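6. Heap application

1. Heap sort

Heap sort: time complexity O(N*log2N);
Ascending order: build a large heap
Descending order: build a small heap
For ascending order, each step moves the top of the large heap (the current largest element) to the end of the array, so the sorted part grows from the back.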

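// Ascending order: build a large heap. O(N*logN)
void HeapSort(int* a, int n)
{
	// Build the heap by upward adjustment: O(N*logN)
	/*for (int i = 1; i < n; ++i)
	{
		AdjustUp(a, i);
	}*/

	// Build the heap by downward adjustment: O(N)
	for (int i = (n - 1 - 1) / 2; i >= 0; --i)
	{
		AdjustDown(a, n, i);
	}

	// Move the current largest element to the end,
	// then restore the heap on the remaining a[0..end-1]
	int end = n - 1;
	while (end > 0)
	{
		Swap(&a[end], &a[0]);
		AdjustDown(a, end, 0);

		--end;
	}
}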

2. TOP-K problem

TOP-K problem: find the K largest or K smallest elements in a data set, where the amount of data is generally large.
For example: the top 10 professional players, the world's top 500, the rich list, the top 100 active players in the game, etc.

The simplest and most direct approach to the Top-K problem is sorting. But if the amount of data is very large, sorting is not advisable (the data may not even fit into memory at once). The best way is to use a heap. The basic idea is as follows:

  • Build a heap from the first K elements of the data set.
    To find the K largest elements, build a small heap.
    To find the K smallest elements, build a large heap.

  • Compare each of the remaining N-K elements with the top of the heap in turn, and replace the top (then adjust downward) when the new element belongs in the result: for the K largest, when it is greater than the top of the small heap.
    After all of the remaining N-K elements have been compared, the K elements left in the heap are the K largest (or K smallest) elements sought.

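// Requires <assert.h>, <stdio.h>, <stdlib.h>.
// Note: this function keeps a small heap, so the AdjustDown it calls must be the
// small-heap version (both '>' comparisons in AdjustDown changed to '<').
void PrintTopK(const char* file, int k)
{
	// 1. Build the heap: a small heap from the first k elements
	FILE* fout = fopen(file, "r");
	if (fout == NULL)
	{
		perror("fopen error");
		return;
	}

	int* topk = (int*)malloc(sizeof(int) * k);
	assert(topk);

	// read the first k values and build a small heap from them
	for (int i = 0; i < k; ++i)
	{
		fscanf(fout, "%d", &topk[i]);
	}

	for (int i = (k - 2) / 2; i >= 0; --i)
	{
		AdjustDown(topk, k, i);
	}

	// 2. Compare each of the remaining n-k elements with the top of the heap;
	//    if the element is larger, it replaces the top
	int val = 0;
	while (fscanf(fout, "%d", &val) == 1) // stop at end of file or on malformed input
	{
		if (val > topk[0])
		{
			topk[0] = val;
			AdjustDown(topk, k, 0);
		}
	}

	for (int i = 0; i < k; i++)
	{
		printf("%d ", topk[i]);
	}
	printf("\n");

	free(topk);
	fclose(fout);
}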

// Randomly generate 10000000 numbers and store them in a file,
// then find the top K of them.
// Requires <stdio.h>, <stdlib.h>, <time.h>.
void CreateNDate()
{
	// generate the data
	int n = 10000000;
	srand((unsigned int)time(0));
	const char* file = "data.txt";
	FILE* fin = fopen(file, "w");
	if (fin == NULL)
	{
		perror("fopen error");
		return;
	}

	for (int i = 0; i < n; ++i)
	{
		int x = rand() % 10000;
		fprintf(fin, "%d\n", x);
	}

	fclose(fin);
}

Summary

  • This time we learned about the heap, which is essentially an array.
  • Then we learned the [Upward Adjustment Algorithm] and the [Downward Adjustment Algorithm] and the cost of building a heap with each: O(NlogN) with upward adjustment and O(N) with downward adjustment, so downward adjustment should be preferred for building a heap.
  • Then we learned heap sort, which is one of the eight sorting algorithms, and its time complexity is O(NlogN);
  • Finally, we learned the TOP-K problem, that is, finding the K largest or K smallest elements in a data set.

That is all for this article. If you have any questions, please leave a message in the comment area. If you found it helpful, please like, favorite, and follow.

Origin blog.csdn.net/2301_77412625/article/details/129925668