Sequential Structure of Binary Trees: Heap, Heap Sort, and Top-K

Trying to improve yourself is always more meaningful than looking up to others


Table of contents

1 Sequential structure of binary trees

2 The concept and structure of the heap

3 Heap implementation

3.1 Heap downward adjustment algorithm

3.2 Heap upward adjustment algorithm

3.3 Heap insertion

3.4 Heap deletion

3.5 Code implementation of the heap

4 Applications of the heap

4.1 Heap sort

4.2 TOP-K problem

Summary


1 Sequential structure of binary trees

An ordinary binary tree is not well suited to array storage, because the positions of missing nodes can waste a lot of space. A complete binary tree, whose levels are filled from left to right with no gaps, is well suited to sequential (array) storage. In practice, we usually store a heap, which is a complete binary tree, in an array. Note that the heap discussed here and the heap in a process's virtual address space are two different things: one is a data structure, the other is a region of memory managed by the operating system.


2 The concept and structure of the heap

Given a set of keys $K = \{k_0, k_1, k_2, \dots, k_{n-1}\}$, store all of its elements in a one-dimensional array in the level order of a complete binary tree. If $K_i \le K_{2i+1}$ and $K_i \le K_{2i+2}$ for every $i = 0, 1, 2, \dots$ (wherever those children exist), it is called a small heap (min-heap); if instead $K_i \ge K_{2i+1}$ and $K_i \ge K_{2i+2}$, it is called a large heap (max-heap). A heap whose root node holds the largest value is called a max-heap or large root heap, and a heap whose root node holds the smallest value is called a min-heap or small root heap.
Properties of the heap:
  • The value of every node is always no greater than its parent's value (max-heap), or always no less than its parent's value (min-heap);
  • A heap is always a complete binary tree.


3 Heap implementation

3.1 Heap downward adjustment algorithm

Take the array below and view it logically as a complete binary tree. We can turn it into a small heap (min-heap) by running the downward adjustment algorithm from the root node. The algorithm has one precondition: the left and right subtrees of the starting node must already be heaps. Here the subtrees rooted at 15 and 19 are already min-heaps, so only the root 27 is out of place.

int array[] = { 27, 15, 19, 18, 28, 34, 65, 49, 25, 37 };

Specific code:

void AdjustDown(int* a, int parent, int sz)
{
	assert(a);
	int child = parent * 2 + 1;  // start with the left child
	while (child < sz)
	{
		// pick the larger child; for a min-heap use a[child + 1] < a[child]
		if (child + 1 < sz && a[child + 1] > a[child])
			child++;

		// for a min-heap use a[child] < a[parent]
		if (a[child] > a[parent])
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
			break;  // the heap property already holds from here down
	}
}

The code above builds a large heap (max-heap); the comments show the two comparisons to flip for a small heap (min-heap).

3.2 Heap upward adjustment algorithm

The upward adjustment algorithm is usually paired with push: a newly pushed element is placed at the end of the array and then adjusted upward, which keeps the heap structure intact.

Code:

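void AdjustUp(int* a, int child)
{
	assert(a);
	int parent = (child - 1) / 2;
	// parent >= 0 would also work as the condition, but since (0 - 1) / 2 == 0 in C,
	// the loop would then only ever end through break, not through its condition
	while (child > 0)
	{
		if (a[child] > a[parent])  // for a min-heap use <
		{
			Swap(&a[child], &a[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
			break;
	}
}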

Each upward or downward adjustment moves an element along at most one path between the root and a leaf, so a single adjustment takes O(logN) time, where N is the number of elements.

3.3 Heap insertion

 Code:

void HeapPush(Heap* php, HeapDataType x)
{
	assert(php);
	if (php->capacity == php->sz)
	{
		int newcapacity = php->a == NULL ? 4 : php->capacity * 2;
		HeapDataType* tmp = (HeapDataType*)realloc(php->a, sizeof(HeapDataType) * newcapacity);
		if (tmp == NULL)
		{
			perror("realloc fail:");
			exit(-1);
		}
		php->a = tmp;
		php->capacity = newcapacity;
	}

	php->a[php->sz] = x;
	php->sz++;
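	// adjust upward so the array stays a heap (a min-heap is used as the example here,
	// so AdjustUp must compare with <)
	AdjustUp(php->a, php->sz - 1);  // the second argument is the index of the element just pushed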
}

3.4 Heap deletion

Assume a small heap (min-heap) has been built, so popping removes the smallest value, which sits at the top of the heap. We cannot simply shift the remaining data forward by one position: that would scramble the parent/child relationships and destroy the heap structure. The correct approach is to swap the top element with the last element, shrink the size by one (which removes the old top), and then adjust the new top element downward.

Code:

void HeapPop(Heap* php)
{
	assert(php);
	assert(php->sz > 0);
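	// Assume a min-heap: popping removes the smallest value (the top). Shifting the remaining
	// data forward by one would destroy the heap structure, so instead swap the top with the
	// last element, shrink the size by one, and adjust the new top downward.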

	Swap(&php->a[0], &php->a[php->sz - 1]);
	php->sz--;
	AdjustDown(php->a, 0, php->sz);
}

3.5 Code implementation of the heap

#include<stdio.h>
#include<stdlib.h>
#include<stdbool.h>
#include<assert.h>

typedef int HeapDataType;
typedef struct Heap
{
	HeapDataType* a;
	int sz;
	int capacity;
}Heap;

void HeapInit(Heap* php);
void HeapPush(Heap* php, HeapDataType x);
void HeapPop(Heap* php);
HeapDataType HeapTop(Heap* php);
int HeapSize(Heap* php);
bool HeapEmpty(Heap* php);
void HeapDestroy(Heap* php);
void HeapPrint(Heap* php);
void AdjustDown(int* a, int parent, int sz);
void AdjustUp(int* a, int child);
void Swap(HeapDataType* p1, HeapDataType* p2);

void HeapInit(Heap* php)
{
	assert(php);
	php->a = NULL;
	php->capacity = php->sz = 0;
}


void Swap(HeapDataType* p1, HeapDataType* p2)
{
	HeapDataType tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}


void AdjustUp(int* a, int child)
{
	assert(a);
	int parent = (child - 1) / 2;
	while (child > 0)  // parent >= 0 would also work, but the loop would then end only through break, since (0 - 1) / 2 == 0 in C
	{
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
			break;
	}
}


void HeapPush(Heap* php, HeapDataType x)
{
	assert(php);
	if (php->capacity == php->sz)
	{
		int newcapacity = php->a == NULL ? 4 : php->capacity * 2;
		HeapDataType* tmp = (HeapDataType*)realloc(php->a, sizeof(HeapDataType) * newcapacity);
		if (tmp == NULL)
		{
			perror("realloc fail:");
			exit(-1);
		}
		php->a = tmp;
		php->capacity = newcapacity;
	}

	php->a[php->sz] = x;
	php->sz++;
	// adjust upward so the array stays a heap (a min-heap is used as the example here)
	AdjustUp(php->a, php->sz - 1);  // the second argument is the index of the element just pushed
}


void AdjustDown(int* a, int parent, int sz)
{
	assert(a);
	int child = parent * 2 + 1;
	while (child < sz)
	{
		// min-heap, to match AdjustUp above: pick the smaller child
		if (child + 1 < sz && a[child + 1] < a[child])
			child++;

		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
			break;
	}

}


void HeapPop(Heap* php)
{
	assert(php);
	assert(php->sz > 0);
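	// Assume a min-heap: popping removes the smallest value (the top). Shifting the remaining
	// data forward by one would destroy the heap structure, so instead swap the top with the
	// last element, shrink the size by one, and adjust the new top downward.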

	Swap(&php->a[0], &php->a[php->sz - 1]);
	php->sz--;
	AdjustDown(php->a, 0, php->sz);
}


HeapDataType HeapTop(Heap* php)
{
	assert(php);
	assert(php->sz > 0);

	return php->a[0];
	
}


int HeapSize(Heap* php)
{
	assert(php);
	return php->sz;
}


bool HeapEmpty(Heap* php)
{
	assert(php);
	return php->sz == 0;
}


void HeapDestroy(Heap* php)
{
	assert(php);
	free(php->a);
	php->a = NULL;  // do not leave a dangling pointer behind
	php->capacity = php->sz = 0;
}


void HeapPrint(Heap* php)
{
	assert(php);
	for (int i = 0; i < php->sz; i++)
	{
		printf("%d ", php->a[i]);
	}
	printf("\n");
}


4 Applications of the heap

4.1 Heap sort

First question: when sorting, should the heap be built with upward adjustment or with downward adjustment?

Rather than guess, we can compute the time complexity of each approach:

1 Building the heap by upward adjustment:

Here we take a full binary tree as the example. Since time complexity is only an estimate, a full binary tree can stand in for a complete binary tree. Assume the height of the tree is h.

Layer 1 has 2^0 nodes, each of which moves up 0 levels;

Layer 2 has 2^1 nodes, each of which moves up at most 1 level;

Layer 3 has 2^2 nodes, each of which moves up at most 2 levels;

………………………

Layer h-1 has 2^(h-2) nodes, each of which moves up at most (h-2) levels;

Layer h has 2^(h-1) nodes, each of which moves up at most (h-1) levels;

So the total is:

$$T(h) = 2^1 \cdot 1 + 2^2 \cdot 2 + \cdots + 2^{h-2}(h-2) + 2^{h-1}(h-1)$$

This is easy to evaluate with the shift-and-subtract method (compute 2T(h) - T(h)):

$$T(h) = 2^h(h-2) + 2$$
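The shift-and-subtract step, written out as a sketch with the same T(h) as above (each term k·2^k covers the 2^k nodes of layer k+1, which move up at most k levels):

```latex
\begin{aligned}
T(h) &= \sum_{k=1}^{h-1} k\,2^{k} \\
2T(h) &= \sum_{k=1}^{h-1} k\,2^{k+1} = \sum_{k=2}^{h} (k-1)\,2^{k} \\
T(h) = 2T(h) - T(h) &= (h-1)\,2^{h} - 1 \cdot 2^{1} - \sum_{k=2}^{h-1} 2^{k} \\
&= (h-1)\,2^{h} - 2 - \left(2^{h} - 4\right) \\
&= 2^{h}(h-2) + 2
\end{aligned}
```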

Since a full binary tree of height h has N = 2^h - 1 nodes, h = log2(N+1) ≈ log2 N (an approximation is fine here). Substituting gives T(N) = (N+1)(log2(N+1) - 2) + 2.

Therefore, building the heap by upward adjustment takes time on the order of:

T(N) = O(N*logN).

2 Building the heap by downward adjustment:

I worked out this calculation in an earlier article on sorting; see:

Insertion and selection sorting of the eight major sorts

That calculation shows that building the heap by downward adjustment takes time on the order of:

T(N) = O(N).
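For reference, a sketch of that calculation under the same full-binary-tree assumption as above: layer k holds 2^(k-1) nodes, each of which moves down at most h-k levels (the last layer never moves).

```latex
\begin{aligned}
T(h) &= 2^{0}(h-1) + 2^{1}(h-2) + \cdots + 2^{h-2} \cdot 1 = \sum_{k=1}^{h-1} 2^{k-1}(h-k) \\
2T(h) &= 2^{1}(h-1) + 2^{2}(h-2) + \cdots + 2^{h-1} \cdot 1 \\
T(h) = 2T(h) - T(h) &= -(h-1) + 2^{1} + 2^{2} + \cdots + 2^{h-1} \\
&= 2^{h} - 1 - h \\
&= N - \log_2(N+1) \qquad (N = 2^{h} - 1)
\end{aligned}
```

The upward build is dominated by the bottom layer (about half of all nodes, each moving up about h levels), whereas in the downward build those same nodes do not move at all, which is why it comes out linear.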

So we build the heap with downward adjustment.

The second question: to sort in ascending order, should we build a large heap (max-heap) or a small heap (min-heap)?

With a small heap, the minimum is selected, but it cannot simply be popped off the front: that breaks the heap structure, so the remaining elements would have to be rebuilt into a heap every time, which is inefficient. Instead we build a large heap: swap the top element (the current maximum) with the last element, reduce the number of elements still in the heap by one, and adjust the new top downward. Repeating this places the largest values at the end of the array one by one.

Specific code:

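// Ascending sort. Requires the max-heap AdjustDown from section 3.1 (comparisons use >);
// with the min-heap version the result comes out in descending order.
void HeapSort(HeapDataType* a, int sz)
{
	// build the heap starting from the parent of the last node
	for (int i = (sz - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(a, i, sz);
	}

	// move the current maximum to position end, then re-heapify a[0..end-1]
	for (int end = sz - 1; end > 0; end--)
	{
		Swap(&a[0], &a[end]);
		AdjustDown(a, 0, end);
	}
}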

4.2 TOP-K problem

TOP-K problem: find the K largest or K smallest elements in a data set, where the data set is usually large.

For example: the top 10 professional players, the world's top 500, a rich list, the 100 most active players in a game, and so on.
For the Top-K problem, the simplest and most direct approach is sorting. However, if the amount of data is very large, sorting is not advisable (it may not even be possible to load all the data into memory at once). The best approach is to use a heap. The basic idea is as follows:
1. Build a heap from the first K elements of the data set:
For the K largest elements, build a small heap (min-heap)
For the K smallest elements, build a large heap (max-heap)
2. Compare each of the remaining N-K elements with the top of the heap in turn, and whenever an element belongs in the top K (for the K largest: it is larger than the top), replace the top with it and adjust downward:
After all remaining N-K elements have been compared, the K elements left in the heap are the K largest or smallest elements sought.

Specific code:

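	// array is the input data and sz its length (assumed >= 5); K = 5 here.
	// Build a min-heap of K elements, then scan the array: any element larger than the top
	// replaces it and is adjusted downward. At the end the heap holds the top K.
	// Requires the min-heap AdjustDown (comparisons use <).
	// Time complexity: O(K + (N - K) * logK), i.e. about N*logK; space complexity: O(K)
	int topk[5] = {0};
	int i;
	for (i = 0; i < 5; i++)
	{
		topk[i] = array[i];
	}
	// build the min-heap
	for (i = (5 - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(topk, i, 5);
	}
	// scan the remaining elements and replace the top when needed
	for (i = 5; i < sz; i++)
	{
		if (array[i] > topk[0])
		{
			topk[0] = array[i];
			AdjustDown(topk, 0, 5);
		}
	}

	for (i = 0; i < 5; i++)
		printf("%d ", topk[i]);
	// this approach keeps only K elements in memory, which makes it a good choice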

Summary:

In this article we introduced the heap as the sequential (array) structure of a binary tree, implemented it, and covered two important applications: heap sort and the Top-K problem. The most important pieces are the upward and downward adjustment algorithms. The next article will cover linked binary trees and related OJ problems. See you next time!

 

Origin blog.csdn.net/m0_68872612/article/details/127957682