【Data structure】Heap implementation, heap sort, and Top-K

1. The sequential structure of a complete binary tree

Logically, a heap is a complete binary tree; physically, it is that complete binary tree stored in an array, subject to an ordering condition described below. Thinking in terms of the complete binary tree makes the heap much easier to learn.

An ordinary binary tree is not well suited to array storage, because missing nodes leave a lot of wasted slots. A complete binary tree has no such gaps, so a sequential structure (an array) stores it compactly.
insert image description here

As shown in the figure above, the nodes of the complete binary tree are stored in the array level by level, from top to bottom and from left to right. When such an array also satisfies the ordering condition described below, it is called a heap; the array in the figure does, so it is a heap.

  • Note that the heap here is not the heap in an operating system's process virtual address space. One is a data structure; the other is a region of memory managed by the operating system.

2. The concept and structure of the heap

Does that mean any complete binary tree stored in an array in this order is a heap? Of course not.

There are two types of heaps:

  1. If the value of every node is never greater than the value of its parent, the heap is called a large heap (max-heap).
  2. If the value of every node is never less than the value of its parent, the heap is called a small heap (min-heap).

insert image description here
From this we get the properties of a heap:

  • The value of every node is never greater than (large heap) or never less than (small heap) the value of its parent;
  • The heap is always a complete binary tree (that is, the array can be read logically as a complete binary tree).

Only an array that satisfies both properties can be called a heap.
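The two properties can be checked directly on an array. The following is a minimal sketch, not part of the original article: the function names is_max_heap and is_min_heap are hypothetical, and the check relies on the parent subscript formula (child - 1) / 2 that is derived later in this article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if a[0..n-1] is a large heap (max-heap):
   no element is greater than its parent. */
bool is_max_heap(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t child = 1; child < n; child++) {
        size_t parent = (child - 1) / 2;
        if (a[child] > a[parent])
            return false;
    }
    return true;
}

/* Returns true if a[0..n-1] is a small heap (min-heap):
   no element is less than its parent. */
bool is_min_heap(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t child = 1; child < n; child++) {
        size_t parent = (child - 1) / 2;
        if (a[child] < a[parent])
            return false;
    }
    return true;
}
```

Because the array is filled level by level with no gaps, the complete-binary-tree property holds automatically; only the ordering between each element and its parent needs to be tested.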

3. Implementation of the heap

Now suppose we are given an array and regard it logically as a complete binary tree. We already know there are two kinds of heap, the large heap and the small heap, so how do we turn the array into a heap?

int arr[8] = { 7, 4, 2, 9, 3, 4 };

insert image description here

  • The hard parts of the heap data structure are creating the heap, inserting into it, and deleting from it, so these three operations are discussed together; the remaining functions are covered at the end.

To build a heap, we first need one technique: downward adjustment.

1. Downward adjustment of the heap

Following the logical structure of the complete binary tree, start at the parent of the last node. Compare that node with its two children (there may be only one) and swap if necessary: for a large heap, swap with the larger child when it is larger than the parent; for a small heap, swap with the smaller child when it is smaller than the parent. After a swap, keep adjusting the moved value downward in the same way. Then repeat for the previous node, moving backward through the array until the root has been adjusted. The result is a large heap or a small heap.

  • Precondition of the downward adjustment algorithm: the left and right subtrees of the node being adjusted must already be heaps. Otherwise the subtrees are unordered, and comparing the node's value with its children cannot find the right position for it. This is why heap building starts from the parent of the last node (the last non-leaf node) and works backward: by the time any node is adjusted, including the root, both of its subtrees are already heaps.

Let's build a large heap from this array and walk through the process:
insert image description here

  • Start at the last parent node and compare downward. Since we are building a large heap, the larger value moves up; the array changes as shown above.

insert image description here

  • Then take the previous parent node, compare it with the larger of its children, and swap if that child is larger.

insert image description here

  • After a parent has been swapped with its larger child, check whether the value that moved down is still at least as large as its new children. If not, keep adjusting downward. Once the root has been adjusted, the heap is built.

As mentioned, the heap is stored in an array, and the last node is the element with the largest subscript. So how do we find its parent?

There is a fixed formula for this. Given the subscript of a child node we can compute the subscript of its parent, and given the subscript of a parent we can compute the subscripts of its children.

The subscripts are related as follows:

insert image description here

  • Given the subscript of the father node, find the subscripts of the child nodes
    father = 0:
    left child = father * 2 + 1 = 1;
    right child = father * 2 + 2 = 2;
    father = 3:
    left child = father * 2 + 1 = 7;
    right child = father * 2 + 2 = 8;
    So, given the father node, the formulas for the child nodes are:
    left child = father * 2 + 1;
    right child = father * 2 + 2;

  • Given the subscript of a child node, find the subscript of the father node
    If the child is a left child, the formula above gives: father = (left child - 1) / 2;
    If the child is a right child, the formula above gives: father = (right child - 2) / 2;
    A left child always has an odd subscript and a right child always has an even one. In C, integer division discards the remainder, so for an even subscript, subtracting 1 and dividing by 2 gives the same result as subtracting 2 and dividing by 2. For example, (8 - 1) / 2 and (8 - 2) / 2 are both 3.
    Therefore we do not need to know whether the child is a left child or a right child.
    Given the subscript of a child node, the formula for the father node is:
    father = (child - 1) / 2;
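The formulas above can be written as three small helpers. This is only an illustrative sketch: the names left_child, right_child, and parent_of are hypothetical and are not used elsewhere in this article.

```c
#include <assert.h>

/* Subscript of the left child of the node at subscript parent. */
int left_child(int parent)
{
    return parent * 2 + 1;
}

/* Subscript of the right child of the node at subscript parent. */
int right_child(int parent)
{
    return parent * 2 + 2;
}

/* Subscript of the father of the node at subscript child.
   The root (subscript 0) has no father, so child must be positive.
   Integer division makes the same formula work for left and right children. */
int parent_of(int child)
{
    assert(child > 0);
    return (child - 1) / 2;
}
```

For the node at subscript 3 this gives children 7 and 8, and both 7 and 8 map back to 3, matching the worked values in the list above.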

2. Building the heap by downward adjustment

We now know how a heap is built. Given the following array, let's write the code that turns it into a heap.

int arr[8] = { 7, 4, 2, 9, 3, 4, 5, 6 };

The code is as follows:

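void swap(int* a1, int* a2)
{
	int tmp = *a1;
	*a1 = *a2;
	*a2 = tmp;
}

// Downward adjustment (large-heap version)
void AdjustDown(int* arr, int n, int father)
{
	int child = father * 2 + 1; // subscript of the left child

	// child only grows while adjusting downward; stop once it leaves the array
	while (child < n)
	{
		// If a right child exists and is larger, pick it
		// (for a small heap, pick the smaller child instead)
		if (child + 1 < n && arr[child] < arr[child + 1])
			child = child + 1;
		// If the chosen child is larger than the father, swap and keep adjusting downward
		// (for a small heap, swap when the child is smaller)
		if (arr[father] < arr[child])
		{
			swap(&arr[father], &arr[child]);
			father = child;         // continue from the child's position
			child = father * 2 + 1; // left child of the new father
		}
		else
			break;
	}
}

// arr is the array to be turned into a heap, n is its length
void CreatHeap(int* arr, int n)
{
	// Start from the last non-leaf node, i.e. the father of the last node (n - 1)
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(arr, n, i);
	}
}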

3. Time complexity of building the heap by downward adjustment

Time complexity counts the basic operations a function performs in the worst case. Here we compute it for the heap-building function. A heap is a complete binary tree, and a full binary tree (every node other than the leaves has two children) is also a complete binary tree, so we can use a full binary tree to estimate the time complexity of building a heap.

insert image description here
Because we consider the worst case, we assume every node above the last layer has to be moved all the way down to the bottom layer.
insert image description here
Therefore, the time complexity of building a heap by downward adjustment is O(N).
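The worst-case count can be written out as a sum. This is a sketch under the same full-binary-tree assumption; the symbols are notation chosen here rather than taken from the figures: h is the height with the root counted as layer 1, N = 2^h - 1 is the number of nodes, and T(N) is the total number of moves. Layer i holds 2^(i-1) nodes, and each of them moves down at most h - i layers.

```latex
\begin{align*}
T(N) &= \sum_{i=1}^{h-1} 2^{\,i-1}\,(h-i) \\
     &= 2^{h} - 1 - h \\
     &= N - \log_2(N+1) \\
     &= O(N)
\end{align*}
```

As a quick check, for h = 3 the sum is 1*2 + 2*1 = 4, and 2^3 - 1 - 3 = 4 as well.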

4. Heap insertion (upward adjustment)

As shown in the figure below, we append a 10 to the end of the existing complete binary tree and then adjust it upward:
insert image description here

This heap is a small heap. Find the parent of the newly inserted node and compare the two. If the parent is larger, swap them, so the new value moves into the parent's position, and compare again with its new parent; continue until the root is reached. If the parent is not larger than the new value, stop: the insertion is complete.

Note: downward adjustment is not a good fit here. Insertion happens on an array that is already a heap, so only the path from the new node up to the root can violate the heap property. Upward adjustment walks just that one path, whereas downward adjustment would have to revisit every parent node from the new node's parent back to the start of the array, which costs more.

The formula for finding the parent from a child was given earlier, so we can go straight to the code (this is insertion into a small heap).

The upward adjustment code is as follows:

void AdjustUp(int* arr, int child)
{
	int father = (child - 1) / 2; // subscript of the father node

	while (child) // stop once the new value reaches the root
	{
		// Small heap: if the child is smaller than the father, swap them
		if (arr[father] > arr[child])
		{
			swap(&arr[father], &arr[child]);
			child = father;
			father = (child - 1) / 2;
		}
		else
			break;
	}
}

  • Upward adjustment can also be used to build a heap, but its time complexity is higher than that of downward adjustment. The complete code appears in the heap implementation below.

5. Building the heap by upward adjustment

We met upward adjustment when inserting into the heap. It can also be used to build a heap, but its time complexity is higher than that of downward adjustment, so it is not the method of choice in practice. Here is a brief introduction.
insert image description here

  • Use the array in the picture above to build a large heap

insert image description here

  • To build the heap by upward adjustment, insert the array elements into the heap one at a time. Whenever an inserted element is larger than its parent, adjust it upward. Continue until every element is in the heap.

insert image description here

  • An inserted value that is larger than its parent keeps moving up until it meets a parent that is at least as large or reaches the root.
    insert image description here
    The code is as follows:
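// Upward adjustment (large-heap version, matching the walkthrough above)
void AdjustUp(int* arr, int child)
{
	int father = (child - 1) / 2; // subscript of the father node

	while (child)
	{
		// Large heap: if the child is larger than the father, swap them
		if (arr[father] < arr[child])
		{
			swap(&arr[father], &arr[child]);
			child = father;
			father = (child - 1) / 2;
		}
		else
			break;
	}
}

// Build a heap by inserting the elements one at a time
void CreatHeap(int* a, int n)
{
	for (int i = 0; i < n; i++)
	{
		AdjustUp(a, i); // a[0..i-1] is already a heap; move a[i] up into place
	}
}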

6. Time complexity of building the heap by upward adjustment

Again consider the worst case; the count differs from downward adjustment. If the height is h, every node in the bottom layer may have to move up h-1 times, and the nodes in the other layers move up according to their layer number.

insert image description here
Adding up these moves gives:
insert image description here
Therefore, the time complexity of building a heap by upward adjustment is O(N*log(N)).

Building a heap by upward adjustment costs more than building it by downward adjustment.
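The worst-case count for upward adjustment can be written out as a sum as well. This is a sketch for a full binary tree; the symbols are notation chosen here rather than taken from the figures: h is the height with the root counted as layer 1, N = 2^h - 1 is the number of nodes, and T(N) is the total number of moves. Layer i holds 2^(i-1) nodes, and each of them moves up at most i - 1 layers.

```latex
\begin{align*}
T(N) &= \sum_{i=1}^{h} 2^{\,i-1}\,(i-1) \\
     &= (h-2)\,2^{h} + 2 \\
     &= (N+1)\bigl(\log_2(N+1) - 2\bigr) + 2 \\
     &= O(N \log N)
\end{align*}
```

The difference from downward adjustment is where the expensive moves fall: here the bottom layer, which holds about half of all nodes, pays the largest cost, while with downward adjustment the bottom layer pays nothing.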

7. Heap deletion

Deleting from a heap means removing the element at the top of the heap.

The steps: swap the element at the top of the heap with the element at the last node, drop the last element of the array, and then adjust the new top element downward, as shown in the figure below.

insert image description here
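The three steps can be sketched on a bare array before any structure is introduced. This is an illustrative sketch only: it assumes a large heap, and the names swap_int, sift_down_max, and pop_max are hypothetical, not the functions used in the implementation later in this article.

```c
#include <assert.h>

void swap_int(int *x, int *y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

/* Downward adjustment for a large heap stored in a[0..n-1]. */
void sift_down_max(int *a, int n, int parent)
{
    int child = parent * 2 + 1;
    while (child < n) {
        /* Pick the larger of the two children, if the right one exists. */
        if (child + 1 < n && a[child] < a[child + 1])
            child = child + 1;
        if (a[parent] >= a[child])
            break;
        swap_int(&a[parent], &a[child]);
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Removes and returns the top of the large heap a[0..*n-1]. */
int pop_max(int *a, int *n)
{
    assert(a && n && *n > 0);
    int top = a[0];
    swap_int(&a[0], &a[*n - 1]); /* step 1: swap the top with the last node */
    (*n)--;                      /* step 2: drop the last element */
    sift_down_max(a, *n, 0);     /* step 3: adjust the new top downward */
    return top;
}
```

Swapping with the last element keeps the array a complete binary tree, and only the root can then violate the ordering, so a single downward adjustment restores the heap.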

8. Heap code implementation

Operations such as insertion and deletion grow and shrink the array, which is awkward to manage with a bare array. So we define a structure to hold the heap; deletion also relies on the fields of this structure. The structure is defined as follows:

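typedef int HPDataType;
typedef struct Heap
{
	HPDataType* a; // dynamically allocated array holding the heap
	int size;      // number of elements currently stored
	int capacity;  // number of elements the array can hold
}Heap;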

The other heap functions are not hard to implement, so the code is shown as a whole without explaining each function.
Note: in the code below some types are replaced by the custom structure types; the logic is the same as in the code above.
All the heap functions are as follows:

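#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Forward declarations for helpers that are defined further down
void swap(HPDataType* a1, HPDataType* a2);
void AdjustUp(HPDataType* arr, int child);
void AdjustDown(HPDataType* arr, int n, int parent);
void HeapPush(Heap* hp, HPDataType x);

// Print the heap
void PrintHeap(Heap* hp)
{
	assert(hp);
	for (int i = 0; i < hp->size; i++)
	{
		printf("%d ", hp->a[i]);
	}
	printf("\n");
}

// Initialize the heap
void HeapInit(Heap* hp)
{
	assert(hp);
	hp->a = NULL;
	hp->capacity = 0;
	hp->size = 0;
}

// Destroy the heap
void HeapDestory(Heap* hp)
{
	assert(hp);
	free(hp->a);
	hp->a = NULL;
	hp->capacity = 0;
	hp->size = 0;
}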

// Heap construction, the lazy way: build the heap with repeated HeapPush (upward adjustment)
//void HeapCreat(Heap* hp, HPDataType* a, int n)
//{
//	hp->a = (HPDataType*)malloc(sizeof(HPDataType) * n);
//	if (!hp->a)
//	{
//		perror("malloc fail!");
//		exit(-1);
//	}
//	hp->capacity = n;
//	hp->size = 0;
//
//	for (int i = 0; i < n; i++)
//	{
//		HeapPush(hp, a[i]);
//	}
//}

// Heap construction using the heap-building algorithm
void HeapCreat(Heap* hp, HPDataType* a, int n)
{
	assert(hp && a && n > 0);

	hp->a = (HPDataType*)malloc(sizeof(HPDataType) * n);
	if (!hp->a)
	{
		perror("malloc fail!");
		exit(-1);
	}

	memcpy(hp->a, a, sizeof(HPDataType) * n);
	hp->capacity = hp->size = n;

	// Adjust downward from the last parent node back to the root
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(hp->a, n, i);
	}
}

// Swap two values
void swap(HPDataType* a1, HPDataType* a2)
{
	HPDataType tmp = *a1;
	*a1 = *a2;
	*a2 = tmp;
}

// Upward adjustment (small-heap version, consistent with AdjustDown below)
void AdjustUp(HPDataType* arr, int child)
{
	int father = (child - 1) / 2; // subscript of the father node

	while (child)
	{
		// If the child is smaller than the father, swap them
		if (arr[child] < arr[father])
		{
			swap(&arr[child], &arr[father]);
			child = father;
			father = (child - 1) / 2;
		}
		else
			break;
	}
}

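// Insert into the heap
void HeapPush(Heap* hp, HPDataType x)
{
	assert(hp);
	if (hp->capacity == hp->size)
	{
		// Start with room for 4 elements, then double each time the array is full
		int newCapacity = hp->capacity == 0 ? 4 : hp->capacity * 2;
		// realloc takes a size in bytes, not a number of elements
		HPDataType* newArr = (HPDataType*)realloc(hp->a, sizeof(HPDataType) * newCapacity);
		if (!newArr)
		{
			perror("realloc fail!");
			exit(-1);
		}
		hp->capacity = newCapacity;
		hp->a = newArr;
	}

	hp->a[hp->size++] = x;

	// The new element sits at the last position; adjust it upward from there
	AdjustUp(hp->a, hp->size - 1);
}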

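// Downward adjustment (small-heap version)
void AdjustDown(HPDataType* arr, int n, int parent)
{
	int child = parent * 2 + 1; // subscript of the left child

	while (child < n)
	{
		// If a right child exists and is smaller, pick it
		if (child + 1 < n && arr[child] > arr[child + 1])
			child = child + 1;

		// If the smaller child is smaller than the parent, swap and keep adjusting downward
		if (arr[parent] > arr[child])
		{
			swap(&arr[parent], &arr[child]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
			break;
	}
}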

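// Delete the top of the heap
void HeapPop(Heap* hp)
{
	assert(hp);
	assert(hp->size);

	// Overwrite the top with the last element and shrink the heap by one
	hp->a[0] = hp->a[--hp->size];

	// Adjust the new top downward
	AdjustDown(hp->a, hp->size, 0);
}

// Get the element at the top of the heap
HPDataType HeapTop(Heap* hp)
{
	assert(hp);
	assert(hp->size > 0);

	return hp->a[0];
}

// Number of elements in the heap
int HeapSize(Heap* hp)
{
	assert(hp);

	return hp->size;
}

// Returns nonzero if the heap is empty
int HeapEmpty(Heap* hp)
{
	assert(hp);

	return hp->size == 0;
}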

4. Top-K questions

Problem: from a large data set, extract the k largest or the k smallest values.

For example: the top 10 professional players, the world's top 500, the rich list, the 100 most active players in a game, and so on.

The first approach that comes to mind is sorting. But when the data set is very large and cannot all be loaded into memory at once, sorting is not practical.

A better way is to use a heap. The basic idea is as follows:

1. Build a heap from the first k elements of the data set

  • To find the k largest elements, build a small heap
  • To find the k smallest elements, build a large heap

2. Compare each of the remaining N-k elements with the top of the heap. If an element belongs in the result (for the k largest: it is greater than the top of the small heap), replace the top with it and adjust downward.

Once all the remaining N-k elements have been compared with the top of the heap, the k elements left in the heap are the k largest or k smallest elements we wanted.

The following figure walks through finding the 6 largest elements:

insert image description here

The specific code is as follows:

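// Print the k largest values of arr[0..n-1].
// AdjustDown must be the small-heap version here.
void TopK(int* arr, int n, int k)
{
	if (k <= 0 || k > n)
		return;

	int* tmpArr = (int*)malloc(sizeof(int) * k);
	if (!tmpArr)
	{
		perror("malloc fail!");
		exit(-1);
	}

	// Option 1: copy the first k values with memcpy
	//memcpy(tmpArr, arr, sizeof(int) * k);

	// Option 2: copy the first k values with a loop
	for (int i = 0; i < k; i++)
	{
		tmpArr[i] = arr[i];
	}

	// Build a small heap of size k by downward adjustment
	for (int i = (k - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(tmpArr, k, i);
	}
	for (int i = k; i < n; i++)
	{
		// A value larger than the heap top belongs among the k largest
		if (arr[i] > tmpArr[0])
		{
			tmpArr[0] = arr[i];       // replace the heap top; the input array is left unchanged
			AdjustDown(tmpArr, k, 0); // adjust the new heap top downward
		}
	}
	// Print the k values; they are in heap order, not sorted
	for (int i = 0; i < k; i++)
	{
		printf("%d ", tmpArr[i]);
	}
	printf("\n");
	free(tmpArr);
}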
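TopK prints its result, which makes it awkward to check. The following self-contained variant is a sketch of the same idea that writes the k largest values into a caller-provided buffer instead. The names swap_vals, sift_down_min, and top_k_largest are hypothetical, and the small-heap downward adjustment is repeated here only so the block compiles on its own.

```c
#include <assert.h>
#include <string.h>

void swap_vals(int *x, int *y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

/* Downward adjustment for a small heap stored in a[0..n-1]. */
void sift_down_min(int *a, int n, int parent)
{
    int child = parent * 2 + 1;
    while (child < n) {
        /* Pick the smaller of the two children, if the right one exists. */
        if (child + 1 < n && a[child] > a[child + 1])
            child = child + 1;
        if (a[parent] <= a[child])
            break;
        swap_vals(&a[parent], &a[child]);
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Writes the k largest values of arr[0..n-1] into out[0..k-1].
   The values come out in heap order, so out[0] is the smallest of them. */
void top_k_largest(const int *arr, int n, int k, int *out)
{
    assert(arr && out && k > 0 && k <= n);

    /* Step 1: build a small heap from the first k elements. */
    memcpy(out, arr, sizeof(int) * (size_t)k);
    for (int i = (k - 2) / 2; i >= 0; i--)
        sift_down_min(out, k, i);

    /* Step 2: each remaining element that beats the heap top replaces it. */
    for (int i = k; i < n; i++) {
        if (arr[i] > out[0]) {
            out[0] = arr[i];
            sift_down_min(out, k, 0);
        }
    }
}
```

The heap never holds more than k values, so only O(k) extra memory is needed no matter how large n is, which is the point of using a heap when the data does not fit in memory.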

5. Heap sort

Once we have mastered heap building, downward adjustment, and the conversion between parent and child subscripts, implementing heap sort is straightforward.

First, the idea:

To sort an array, we first build it into a heap.

  • Ascending order: large heap
  • Descending order: small heap

Suppose we want to sort an array of size n in descending order.
First, the top of the small heap is the smallest number. Swap it with the nth number, shrink the heap to size n-1, and adjust the top of the heap downward.
Next, the top of the heap is now the second smallest number. Swap it with the (n-1)th number, shrink the heap to size n-2, and adjust the top of the heap downward.
Repeating this, the n numbers end up sorted in descending order.

The following figure shows the first few rounds of heap sort:

insert image description here
The remaining steps repeat the ones above and are not shown here.

Following this idea, the heap sort code is as follows:

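// Downward adjustment (small-heap version)
void AdjustDown(int* arr, int n, int father)
{
	int child = father * 2 + 1; // subscript of the left child

	while (child < n)
	{
		// If a right child exists and is smaller, pick it
		if (child + 1 < n && arr[child] > arr[child + 1])
			child = child + 1;
		// If the smaller child is smaller than the father, swap and keep adjusting downward
		if (arr[father] > arr[child])
		{
			swap(&arr[father], &arr[child]);
			father = child;         // continue from the child's position
			child = father * 2 + 1; // left child of the new father
		}
		else
			break;
	}
}

// Sorts arr in descending order using a small heap
int* HeapSort(int* arr, int n)
{
	// Build the heap first
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(arr, n, i);
	}

	for (int i = 0; i < n - 1; i++)
	{
		// Swap the heap top (the smallest remaining value) with the last element of the heap
		swap(&arr[0], &arr[n - i - 1]);
		// The heap shrinks by one; adjust the new top downward
		AdjustDown(arr, n - i - 1, 0);
	}

	return arr;
}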
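The code above covers the descending case. For the ascending case mentioned in the list earlier, the same loop works with a large heap instead. The following is a self-contained sketch; the names swap_elems, sift_down_max, and heap_sort_ascending are hypothetical and do not appear elsewhere in this article.

```c
#include <assert.h>

void swap_elems(int *x, int *y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

/* Downward adjustment for a large heap stored in a[0..n-1]. */
void sift_down_max(int *a, int n, int parent)
{
    int child = parent * 2 + 1;
    while (child < n) {
        /* Pick the larger of the two children, if the right one exists. */
        if (child + 1 < n && a[child] < a[child + 1])
            child = child + 1;
        if (a[parent] >= a[child])
            break;
        swap_elems(&a[parent], &a[child]);
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Sorts a[0..n-1] in ascending order using a large heap. */
void heap_sort_ascending(int *a, int n)
{
    assert(a || n == 0);

    /* Build the large heap from the last parent back to the root. */
    for (int i = (n - 2) / 2; i >= 0; i--)
        sift_down_max(a, n, i);

    /* Move the current maximum to the end, then shrink the heap by one. */
    for (int end = n - 1; end > 0; end--) {
        swap_elems(&a[0], &a[end]);
        sift_down_max(a, end, 0);
    }
}
```

Each round places the largest remaining value at the end of the unsorted part, so the array fills up with sorted values from right to left and no extra array is needed.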

Origin blog.csdn.net/m0_52094687/article/details/128425725