Data structure - basic concepts of binary trees and sequential storage (heap)


Table of contents

1. Introduction

2. Tree concept and structure

2.1 The concept of tree

2.2 Related concepts of trees

2.3 Tree representation

2.4 Practical application of trees (representing the directory tree structure of the file system)

3. Concept and structure of binary tree

3.1 Concept

3.2 Special binary trees

3.3 Properties of binary trees

3.4 Storage structure of binary tree

3.4.1 Sequential storage

3.4.2 Chain storage

4. Binary tree sequential structure and implementation

4.1 Sequential structure of binary tree

4.2 Concept and structure of heap

4.3 Implementation of heap

4.3.1 Heap downward adjustment algorithm (omitted)

4.3.2 Creation of heap (omitted)

4.3.3 Time complexity of heap construction (omitted)

4.3.4 Heap insertion (omitted)

4.3.5 Heap deletion (omitted)

4.3.6 Implementation of heap code (details)

4.3.6.1 Initialization function

4.3.6.2 Destruction function

4.3.6.3 Insertion function

4.3.6.4 Upward adjustment function

4.3.6.5 Swap function

4.3.6.6 Print function

4.3.6.7 Delete function

4.3.6.8 Downward adjustment function

4.3.6.9 Get root value function

4.3.6.10 Empty check function

4.3.6.11 Heap sort

4.3.6.12 Building the heap by adjusting downward

4.3.6.13 Proof of the complexity of building a heap

4.3.6.14 Small exercise (optional)

5. All codes

6. Conclusion


1. Introduction

Friendly reminder: the concepts covered in this article run fairly deep. If you already understand the basic concepts of binary trees and want to see the core code implementation, you can jump straight from the table of contents to section 4, "Binary tree sequential structure and implementation", and start reading there. Writing is not easy; I hope everyone will support me (a like, a favorite, a comment and a follow would mean the world)!

2. Tree concept and structure

2.1 The concept of tree

A tree is a non-linear data structure. It is a hierarchical collection made up of n (n>=0) finite nodes. It is called a tree because it looks like an upside-down tree: the root faces up and the leaves face down.

  • There is a special node called the root node; the root has no predecessor node.
  • Apart from the root node, the remaining nodes are divided into m (m>0) disjoint sets T1, T2, ..., Tm, where each set Ti (1<=i<=m) is itself a subtree with a structure similar to the tree. The root of each such subtree has one and only one predecessor, and may have 0 or more successors.
  • A tree is therefore defined recursively.

Note: in a tree structure the subtrees must not intersect; otherwise the structure is not a tree.

2.2 Related concepts of trees

Here are some common terms about trees; if anything is unclear later, come back and check this list.

Degree of a node : the number of subtrees a node contains is called the degree of the node; as shown in the figure above, the degree of A is 6

Leaf node or terminal node : A node with degree 0 is called a leaf node; as shown in the figure above: B, C, H, I... and other nodes are leaf nodes.

Non-terminal node or branch node : a node with a degree other than 0; as shown in the figure above: nodes such as D, E, F, G... are branch nodes

Parent node or father node : if a node has child nodes, it is called the parent (father) node of those children; as shown in the figure above: A is the parent node of B

Child node or son node : the root of a subtree contained by a node is called a child node of that node; as shown above: B is a child node of A

Sibling (brother) nodes : nodes that share the same parent are called sibling nodes; as shown in the figure above: B and C are sibling nodes.

Degree of tree : In a tree, the degree of the largest node is called the degree of the tree; as shown in the figure above: the degree of the tree is 6

The level of the node : starting from the definition of the root, the root is the first level, the child nodes of the root are the second level, and so on.

The height or depth of the tree : the maximum level of nodes in the tree; as shown above: the height of the tree is 4

Cousin nodes : nodes whose parents are on the same level are cousin nodes of each other; as shown in the figure above: H and I are cousin nodes of each other.

Ancestors of a node : all nodes on the branches from the root to the node; as shown above: A is the ancestor of all nodes

Descendants : Any node in the subtree rooted at a node is called a descendant of that node. As shown above: all nodes are descendants of A

Forest : A collection of m (m>0) disjoint trees is called a forest.

2.3 Tree representation

Left child right brother (child-sibling) notation : when A has 3 children, B, C and D, A always points to its leftmost child (B); B then takes care of pointing to its brother C, and C takes care of pointing to D.

A has only these 3 children, so the last child D's brother pointer is NULL, and since A has no brothers, A's brother pointer is NULL as well.

Finally, look at E and F: B has two children, E and F. By the left-child rule, B's firstchild points to E; E has a brother F, so by the right-brother rule E points to F, and F's brother pointer is NULL.

So in the end, how do we decide whether a node is a leaf? Just check whether its firstchild pointer is NULL.

A tree structure is more complicated than a linear list, and storing and representing it is more troublesome: we must save both each node's value (its data field) and the relationships between nodes. There are in fact many ways to represent a tree, such as the parent representation, the child representation, the child-parent representation and the child-sibling representation. Here we briefly look at the most commonly used one, the child-sibling (left child, right brother) representation.

//Optimal design of a tree node (child-sibling representation)
struct TreeNode
{
	int val;                     //data field of the node
	struct TreeNode* firstchild; //pointer to the first (leftmost) child node
	struct TreeNode* nextbrother;//pointer to the next sibling node
};
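As a tiny illustration of the leaf check mentioned above, here is a minimal sketch that walks the children of a node with this representation (assuming the struct above plus <stdio.h>; IsLeaf and PrintChildren are illustrative helper names, not part of the original code):

#include <stdio.h>

//A node is a leaf exactly when it has no first child
int IsLeaf(struct TreeNode* node)
{
	return node->firstchild == NULL;
}

//Visit all children of a node: start at the first child, then follow the nextbrother chain
void PrintChildren(struct TreeNode* node)
{
	struct TreeNode* cur = node->firstchild;
	while (cur)
	{
		printf("%d ", cur->val);
		cur = cur->nextbrother;
	}
}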

Demonstration diagram of the child-sibling (left child, right brother) representation:

Extension: demonstration diagram of the parent representation

  • To determine how many trees there are, count how many nodes store -1 as their parent subscript. Because A and B cannot find a father's subscript, they are given the value -1.
  • To determine whether two nodes are in the same tree, walk up the parent subscripts to the roots and compare whether the two root subscripts are the same.

Note: the chained structure links nodes through pointers, while the parent representation works through array subscripts. A minimal sketch of the parent representation follows.
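Here is that sketch (the names PTNode, FindRoot and SameTree are illustrative, not from the original code): each node stores its value plus the subscript of its parent, with -1 marking a root.

#define MAX_SIZE 100

typedef struct
{
	int val;     //data field
	int parent;  //subscript of the parent; -1 if this node is a root
} PTNode;

//Walk up the parent subscripts until we reach a node whose parent is -1
int FindRoot(PTNode a[], int i)
{
	while (a[i].parent != -1)
		i = a[i].parent;
	return i;
}

//Two nodes are in the same tree exactly when they reach the same root subscript
int SameTree(PTNode a[], int i, int j)
{
	return FindRoot(a, i) == FindRoot(a, j);
}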

2.4 Practical application of trees (representing the directory tree structure of the file system)

3. Concept and structure of binary tree

3.1 Concept

A binary tree is a finite set of nodes, which is:

  • or empty
  • or it consists of a root node plus two disjoint binary trees, called the left subtree and the right subtree respectively.

As can be seen from the picture above:

  • There is no node with degree greater than 2 in a binary tree
  • The subtrees of a binary tree can be divided into left and right subtrees, and the order cannot be reversed, so the binary tree is an ordered tree.

Note: any binary tree is a combination of the following cases: the empty tree, a root only, a root with only a left subtree, a root with only a right subtree, or a root with both subtrees.

3.2 Special binary trees

  • Full binary tree : a binary tree in which every layer has the maximum possible number of nodes is a full binary tree. In other words, if a binary tree has K levels and 2^K - 1 nodes in total, it is a full binary tree.
  • Complete binary tree : a complete binary tree is a very efficient data structure, derived from the full binary tree. A binary tree of depth K with n nodes is a complete binary tree if and only if its nodes correspond one-to-one to the nodes numbered 1 through n in the full binary tree of depth K. Note that a full binary tree is a special kind of complete binary tree.

A full binary tree: suppose its height is h; then every one of its h layers is full.

A complete binary tree: suppose its height is h; the first h-1 layers are full, and the last layer is not necessarily full but must be continuous from left to right.

For example, if we move one node so that the last layer is no longer continuous from left to right, then it is no longer a complete binary tree.

Let's count the number of nodes of a binary tree with height h.

Conversely, we can also use the number of nodes to infer the height h.

As for the node-count range of a complete binary tree of height h: the maximum number of nodes is the same as that of a full binary tree, 2^h - 1.

What about the minimum? Split off a full binary tree of height h-1, which has 2^(h-1) - 1 nodes, then add a single node on layer h: the minimum is 2^(h-1) nodes.
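Putting the two bounds together (a short sketch using the notation above): a complete binary tree of height h has at least 2^(h-1) nodes and at most 2^h - 1 nodes, so the height can be recovered from the node count N.

2^(h-1) <= N <= 2^h - 1
=> h - 1 <= log2(N) < h
=> h = floor(log2(N)) + 1   (equivalently, h = ceil(log2(N+1)))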

3.3 Properties of binary trees

  • If the level of the root node is defined as 1, then the i-th level of a non-empty binary tree has at most 2^(i-1) nodes.
  • If the level of the root node is defined as 1, then a binary tree of depth h has at most 2^h - 1 nodes.
  • For any binary tree, if the number of leaf nodes (degree 0) is n0 and the number of branch nodes of degree 2 is n2, then n0 = n2 + 1 (a short derivation follows this list).
  • If the level of the root node is defined as 1, the depth of a full binary tree with n nodes is h = log2(n+1).
  • For a complete binary tree with n nodes, if all nodes are numbered starting from 0, from top to bottom and left to right, then for the node numbered i:
  1. If i > 0, the parent of the node at position i has number (i-1)/2; if i = 0, node i is the root and has no parent.
  2. If 2i+1 < n, the left child number is 2i+1; otherwise (2i+1 >= n) there is no left child.
  3. If 2i+2 < n, the right child number is 2i+2; otherwise (2i+2 >= n) there is no right child.
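Here is the counting argument behind the third property (a short sketch): count the nodes by degree, and count the edges, since every node except the root hangs below exactly one edge.

n = n0 + n1 + n2
edges = n - 1 = 0*n0 + 1*n1 + 2*n2
=> n0 + n1 + n2 - 1 = n1 + 2*n2
=> n0 = n2 + 1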

Here are some practice questions~ 

First question: leaf nodes are nodes with degree 0. By the formula n0 = n2 + 1, there are 200 of them, so the answer is A.

Second question: a non-complete binary tree is not suitable because it would leave empty positions in the array, so the answer is A.

Third question:

n1 can only be 1, because the total number of nodes must be an integer, so the answer is A.

Question 4:

We can judge the value of h from the node-count range of a complete binary tree (substitute h and check whether the count falls inside the range), so the answer is B.

Fifth question:

Same idea as the third question, except that 767 is an odd number, so n1 can only be 0. Therefore the answer is B.

3.4 Storage structure of binary tree

Binary trees can generally be stored using two structures, a sequential structure and a chain structure.

3.4.1 Sequential storage

Sequential storage uses an array. In general, arrays are only suitable for representing complete binary trees, because a non-complete binary tree would waste space. In practice, only heaps are stored in arrays. The sequential storage of a binary tree is physically an array and logically a binary tree.

The left child of a node is at its father's subscript * 2 + 1, and the right child is at its father's subscript * 2 + 2. Conversely, the father's subscript can be deduced from the child's: parent = (child - 1) / 2.

Note: right children sit at even subscripts, but the same formula still works thanks to integer division: for the right child at subscript 6, (6 - 1) / 2 = 5 / 2 = 2.

You can find the father or child at any position by subscripting
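A minimal sketch of these subscript calculations (the helper names are illustrative, not from the original code; n is the number of stored nodes):

//Parent/child subscript helpers for a complete binary tree stored in an array
int ParentIndex(int child)        { return (child - 1) / 2; }
int LeftChildIndex(int parent)    { return parent * 2 + 1; }
int RightChildIndex(int parent)   { return parent * 2 + 2; }

//A child subscript only refers to a real node if it is smaller than n,
//the number of elements currently stored
int HasLeftChild(int parent, int n)  { return parent * 2 + 1 < n; }
int HasRightChild(int parent, int n) { return parent * 2 + 2 < n; }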

If it is not a complete binary tree, is this storage structure still suitable? No, array storage is not suitable in that case.

Full binary trees and complete binary trees suit array storage; incomplete binary trees are better stored with a chained structure.

3.4.2 Chain storage

The chained storage structure of a binary tree uses a linked structure to represent the tree, that is, pointers express the logical relationships between elements. The usual method is for each node to consist of three fields: a data field and left and right pointer fields, where the left and right pointers store the addresses of the nodes holding the node's left child and right child respectively. Chained structures are divided into binary chains and trifurcated (three-pointer) chains; for now we generally study the binary chain.
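A minimal sketch of such a binary-chain node (the names BTDataType and BTNode are illustrative, not from the original code; a trifurcated chain would additionally store a parent pointer):

typedef int BTDataType;

//binary chain: a data field plus left and right pointer fields
typedef struct BinaryTreeNode
{
	BTDataType data;               //data field
	struct BinaryTreeNode* left;   //address of the node holding the left child
	struct BinaryTreeNode* right;  //address of the node holding the right child
}BTNode;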

4. Binary tree sequential structure and implementation

4.1 Sequential structure of binary tree

Ordinary binary trees are not suitable for array storage because a lot of space may be wasted; a complete binary tree is much better suited to sequential storage. In practice we usually store the heap (a complete binary tree) in a sequentially structured array. Note that the heap discussed here and the heap in the operating system's virtual process address space are two different things: one is a data structure, the other is a region of memory managed by the operating system.

4.2 Concept and structure of heap

If there is a set of keys K = {k0, k1, k2, ..., k(n-1)}, store all of its elements in a one-dimensional array in the order of a complete binary tree. If it satisfies ki <= k(2i+1) and ki <= k(2i+2) (or ki >= k(2i+1) and ki >= k(2i+2)) for i = 0, 1, 2, ..., it is called a small heap (or large heap). The heap whose root holds the largest element is called the maximum heap or big-root heap; the heap whose root holds the smallest element is called the minimum heap or small-root heap.

Properties of heap:

  • The value of a node in the heap is always no greater than or no less than the value of its parent node;
  • The heap is always a complete binary tree.

Note: I am talking about numerical values ​​rather than subscripts.

Let’s use exercises to understand the rules~

Option A: we first lay the array out as a complete binary tree according to the definition (note the left-to-right continuity), and then find that it is a large heap (every father in the tree >= its children).

Option D: we find a small heap near the top and a large heap near the bottom, which is contradictory, so it is judged wrong.

Underlying view:

  • physical structure: an array
  • logical structure: a complete binary tree

Question: if it is a small heap, is the underlying array necessarily in ascending order? Not necessarily~

But what is certain is that the root of a small heap is the minimum value of the entire tree.
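A minimal sketch of checking the heap property straight from the definition above (the function name IsSmallHeap is illustrative, not from the original code); the example in the comment also shows an array that is a small heap without being sorted in ascending order:

#include <stdbool.h>

//check ki <= k(2i+1) and ki <= k(2i+2) for every parent i
bool IsSmallHeap(const int* a, int n)
{
	for (int i = 0; 2 * i + 1 < n; i++)
	{
		if (a[i] > a[2 * i + 1])
			return false;
		if (2 * i + 2 < n && a[i] > a[2 * i + 2])
			return false;
	}
	return true;
}

//Example: {10, 20, 15, 25, 30, 16} is a small heap, yet the array is not in ascending order.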

Use of heap:

  • topk problem
  • Heap sort (time complexity: O(N*logN))

——————————————————————————————————The above are all concepts.

4.3 Implementation of heap

4.3.1 Heap downward adjustment algorithm (omitted)

4.3.2 Creation of heap (omitted)

4.3.3 Time complexity of heap construction (omitted)

4.3.4 Heap insertion (omitted)

4.3.5 Heap deletion (omitted)

4.3.6 Implementation of heap code (details)

4.3.6.1 Initialization function
//Initialization function
void HeapInit(HP* php)
{
	assert(php);

	php->a = NULL;
	php->size = 0;
	php->capacity = 0;

}

This is our old friend; it appears in almost every one of my articles, haha~

Another initialization function

//Alternative initialization: build the heap directly from an array
void HeapInitArray(HP* php, int* a, int n)
{
	assert(php);
	assert(a);
	php->a = (HPDataType*)malloc(sizeof(HPDataType) * n);
	if (php->a == NULL)
	{
		perror("malloc fail");
		exit(-1);
	}
	php->size = n;      //the heap now holds all n copied elements
	php->capacity = n;
	memcpy(php->a, a, sizeof(HPDataType) * n);
	for (int i = 1; i < n; i++)
	{
		AdjustUp(php->a, i);
	}
}

Here we are handed an array directly and insert the whole array into the heap at once, whereas the original initialization gives no values and the data is then pushed into the heap slowly, one element at a time.

4.3.6.2 Destruction function
//Destruction function
void HeapDestroy(HP* php)
{
	assert(php);
	free(php->a);
	php->a = NULL;
	php->size = php->capacity = 0;
}
4.3.6.3 Insertion function

Assume that the number we insert is 90: it can be inserted directly into the small heap without any other change.

What if we insert 50 instead? That would break the small-heap structure, so we need to adjust its position. By the rules of the small heap it cannot be adjusted downward, so we have to look at the nodes above it~

So now we have to find the father of the node; using the formula (child - 1) / 2 we deduce that the father is 56, at subscript 2.

After finding the father, if the new node is larger than its father it stays put; if it is smaller, the two positions are swapped. That is equivalent to letting 50 become the father and 56 the son, so the small-heap structure is preserved.

Of course, this is not the end: after the swap we still have to compare with the root, 10. Only after these operations finish is the small-heap structure restored.

The worst case scenario is that it ends when you reach the root.

//Insertion function
void HeapPush(HP* php, HPDataType x)
{
	assert(php);
	//expand capacity when full
	if (php->size == php->capacity)
	{
		int newCapacity = php->capacity == 0 ? 4 : php->capacity * 2;
		HPDataType* tmp = (HPDataType*)realloc(php->a, sizeof(HPDataType) * newCapacity);
		if (tmp == NULL)
		{
			perror("realloc fail");
			exit(-1);
		}
		php->a = tmp;
		php->capacity = newCapacity;
	}
	php->a[php->size] = x;
	php->size++;
	AdjustUp(php->a, php->size-1);
	//pass not only the array but also the child's subscript; the newly inserted child sits at subscript size-1 (the tail)
   
}

Expanding and inserting are very routine operations. After a successful (tail) insertion we perform an upward adjustment, since there is no guarantee that the original small-heap (or large-heap) structure survives the insertion.

PS: the reason we pass the tail subscript is that the newly inserted data sits at the tail, so the adjustment has to start from where the data was inserted.

4.3.6.4 Upward adjustment function

When the inserted child is found to be smaller than its parent, a swap is performed. Then 5 becomes the child and continues to be compared with its new father, 10.

So when does the adjustment end? In the worst case the child is swapped all the way up: child ends up pointing at the root (5) and parent would point outside the array, so one might think "parent < 0" is the condition that says no further adjustment is needed.

But note: when the child is already the root (child == 0), the formula (child - 1) / 2 evaluates to 0 under integer division rather than -0.5, so parent never becomes negative and that end condition conflicts with reality: the loop would keep running even though the adjustment is already over.

Therefore the end condition of the adjustment is written as (child > 0): when child reaches 0, the adjustment is also over.

//Upward adjustment function
void AdjustUp(HPDataType* a, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			child = parent;
			parent = (parent - 1) / 2;
		}
		else
		{
			break;
		}
	}
}
4.3.6.5 Swap function
//Swap function
void Swap(HPDataType* p1, HPDataType* p2)
{
	HPDataType tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}
4.3.6.6 Print function
//Print function
void HeapPrint(HP* php)
{
	assert(php);

	for (size_t i = 0; i < php->size; i++)
	{
		printf("%d ", php->a[i]);
	}
	printf("\n");

}

After completing the above functions, let’s test it:

int main()
{
	int a[] = { 65, 100, 70, 32, 50, 60 };
	HP hp;
	HeapInit(&hp);
	for (size_t i = 0; i < sizeof(a) / sizeof(int); i++)
	{
		HeapPush(&hp, a[i]);
	}
	HeapDestroy(&hp);

	return 0;
}

After the test we can see that adjusting after every insertion keeps the structure a small heap~

PS: the insertion here opens up a separate block of space and copies the values from the array into that space one by one.

4.3.6.7 Delete function

Here is a question: in heap deletion, which element makes the most sense to delete?

Deleting just the tail is not very meaningful; the most meaningful deletion is deleting the root. As for why, let's demonstrate below.

Note: shifting the rest of the array forward to cover the root is not advisable, because it scrambles the parent-child relationships and cannot guarantee the original small-heap structure.

We can do this instead: swap the root with the tail element and then delete the tail. The advantage is that the original small-heap structure of the left and right subtrees is not destroyed.

Then we adjust downward from the new root to restore the small-heap structure. Note: whether adjusting downward or upward, the precondition is that the left and right subtrees are already small heaps (or large heaps).

Downward adjustment: first find the left and right children under the root and pick the smaller of the two. Here the left child 50 is smaller than the right child 60, so the root 70 is compared with the left child 50; the left child is smaller than the root, so by the rules of the small heap the two are swapped.

And so on, until the value is finally compared with the leaf 65. Downward adjustment is a process in which the small values float up and the large value sinks.

In fact, Pop here has another meaning: it retrieves the smallest value in the current binary tree.

Time complexity: O(logN), an extremely efficient operation.

//Delete function
void HeapPop(HP* php)
{
	assert(php);
	assert(php->size > 0);

	Swap(&php->a[0], &php->a[php->size - 1]);
	php->size--;//delete the data (the old root, now sitting at the tail)

	AdjustDown(php->a, php->size, 0);
}
4.3.6.8 Downward adjustment function

Here we do not bother to work out in advance which of the left and right children is smaller; we first assume the smaller one is the left child, and if the right child is actually smaller we switch to it. The loop stops once we move past the leaves. In short: assume first, then correct the assumption.

//Downward adjustment function
void AdjustDown(HPDataType* a, int n, int parent)
{
	//assume the smaller of the two children is the left child
	int child = parent * 2 + 1;

	while (child<n)//stop once we move past the leaves; after the loop child already points outside the array
	{
		//find the smaller child
		if (child+1<n&&a[child + 1] < a[child])//the right child is smaller than the left child
		{
			child++;//the smaller child is actually the right one
		}

		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			//continue adjusting downward
			parent = child;
			child = parent * 2 + 1;
			//as with upward adjustment, after the swap move parent and child down one level to prepare for the next iteration
			
		}
		else
		{
			break;
		}
	}
}

One thing to note: even though the loop condition is child < n, we must also check that child + 1 is within range:

For example, when we adjust the 80 on the second level (the 80 at the tail has been swapped with the 32 at the top of the heap, and then 80 has been swapped with its right child 40): according to our code we would go looking at child + 1, but the circled node has no right child at all. To prevent the code from reading past the boundary and comparing against a garbage value, we add the extra condition child + 1 < n.

4.3.6.9 Get root value function
//Get the value at the heap top (root)
HPDataType HeapTop(HP* php)
{
	assert(php);
	assert(php->size > 0);

	return php->a[0];
}
4.3.6.10 Empty check function
//Empty check function
bool HeapEmpty(HP* php)
{
	assert(php);

	return php->size == 0;
}

Test it: Let’s get the smallest, second smallest, etc. data

int main()
{
	int a[] = { 65, 100, 70, 32, 50, 60 };
	HP hp;
	HeapInit(&hp);
	for (size_t i = 0; i < sizeof(a) / sizeof(int); i++)
	{
		HeapPush(&hp, a[i]);
	}
	HeapPrint(&hp);


	while (!HeapEmpty(&hp))
	{
		printf("%d ", HeapTop(&hp));
		HeapPop(&hp);

	}


	HeapDestroy(&hp);

	return 0;
}

In the end you will find that the values taken out really are the smallest, second smallest, and so on, so the final output is in ascending order. By switching between a large heap and a small heap we can get descending or ascending output~ The sketch below shows what the large-heap versions of the adjustment functions look like.
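For reference, a large heap only needs the comparisons in the two adjustment functions flipped. This is a sketch, not part of the original code: AdjustUpBig and AdjustDownBig are illustrative names, and they reuse HPDataType and Swap from the code above. The in-place ascending-order heap sort shown later relies on large-heap adjustments of exactly this kind.

//Large-heap upward adjustment: a child that is LARGER than its parent floats up
void AdjustUpBig(HPDataType* a, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		if (a[child] > a[parent])   //'<' in the small-heap version
		{
			Swap(&a[child], &a[parent]);
			child = parent;
			parent = (parent - 1) / 2;
		}
		else
		{
			break;
		}
	}
}

//Large-heap downward adjustment: pick the LARGER child and let smaller values sink no further
void AdjustDownBig(HPDataType* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] > a[child]) //'<' in the small-heap version
			child++;
		if (a[child] > a[parent])   //'<' in the small-heap version
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else
		{
			break;
		}
	}
}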

4.3.6.11 Heap sort

Of course, this is not yet true heap sorting, because we are only printing the elements of the heap in sorted order. If you want to use the heap to sort the array itself, see the following analysis:

void HeapSort(int* a, int n)
{
	HP hp;
	HeapInit(&hp);
	for (int i = 0; i < n; i++)
	{
		HeapPush(&hp, a[i]);
	}
	int i = 0;

	while (!HeapEmpty(&hp))
	{
		//printf("%d ", HeapTop(&hp));
        //key code: write the heap minimum back into the array, truly sorting the array itself
		a[i++] = HeapTop(&hp);
		HeapPop(&hp);

	}


	HeapDestroy(&hp);
}



int main()
{
	int a[] = { 65, 100, 70, 32, 50, 60 };
	HeapSort(a, sizeof(a) / sizeof(int));

	return 0;
}

But this way of writing has two big flaws:

  • You must first have a ready-made heap data structure.
  • It costs extra space (an additional array is opened up just for sorting).

So we still need to optimize:

We do not need to open up extra space and create a heap first; we can view the array itself as a heap.

At this point the array is merely laid out like a heap, but it is not yet a heap, so we need to adjust it, that is, build the heap.

The most important part!!!

In the upward adjustment function we wrote earlier, we normally pass the subscript of the last element and the whole heap gets adjusted, because an insertion is a tail insertion and only the tail element needs adjusting. In the picture above, imagine the array holds only one number, 70 (which forms a heap by itself), and then we insert 65 and adjust it upward (say we want a large heap). After the adjustment it becomes 70, 65 (the two elements form a heap), and so on for every inserted value. This is the key to building the heap: traverse the elements after the heap top and treat each one as if it had just been tail-inserted, adjusting it upward.

Finally, let’s verify:

void HeapSort(int* a, int n)
{
	//build the heap (upward adjustment); this function only builds the heap, the sorting comes next
	for (int i = 0; i < n; i++)
	{
		AdjustUp(a, i);
	}
}

int main()
{
	//int a[] = { 65, 100, 70, 32, 50, 60 };
	int a[] = { 70, 65, 100, 32, 50, 60 };
	HeapSort(a, sizeof(a) / sizeof(int));

	return 0;
}

Finally, the heap is built successfully, which also fixes the two big flaws above~ Next, let's implement the actual sorting~

If we want ascending order, the first thought is to build a small heap~ Let's change the array and build the heap.

With a small heap, selecting the root (2) does give us the smallest value. But to select the next smallest we would have to treat the remaining data as a heap again, and we find that the remaining data no longer forms a small heap; rebuilding it every time is too expensive~ So we change our thinking: if a small heap does not work for ascending order, let's try a large heap~

After building a large heap, use the downward adjustment method: first swap the heap top with the tail element (60). After the swap, the tail data does not need to be deleted; it is simply excluded from the heap (its position in the array is kept, but the heap ignores it).

Then treat the remaining data as a heap. How do we select the next largest? Just keep using the downward adjustment method.

Note: for the code below to produce ascending order, AdjustUp and AdjustDown must be the large-heap versions (comparisons flipped to '>', as sketched earlier); the small-heap versions would yield descending order.

Each adjustment costs logN and there are N elements in total, so the time complexity is O(N*logN).

void HeapSort(int* a, int n)
{
	//build the heap (a large heap here, so the comparison in AdjustUp uses '>')
	for (int i = 0; i < n; i++)
	{
		AdjustUp(a, i);
	}
	//large heap -> ascending order
	int end = n - 1;//subscript of the last array element
	while (end > 0)
	{
		Swap(&a[0], &a[end]);
		AdjustDown(a, end, 0);//re-select the maximum among the remaining elements
		end--;//next round swaps the heap top with the element just before the sorted tail
	}
}

int main()
{
	//int a[] = { 65, 100, 70, 32, 50, 60 };
	//int a[] = { 70, 65, 100, 32, 50, 60 };
	int a[] = { 2, 3, 5, 7, 4, 6, 8 };
	HeapSort(a, sizeof(a) / sizeof(int));

	return 0;
}

Debugging content: After continuous debugging, the final result is in ascending order 

4.3.6.12 Building the heap by adjusting downward

Besides building a heap by inserting and adjusting upward, there is another idea: build the heap by adjusting downward.

If we want to build a large heap, can we just start at the root (2) and adjust it downward?

No~ The precondition of downward adjustment is that the left and right subtrees are already large heaps, which is clearly not the case here.

Well then, if the left and right subtrees of 3 were both large heaps, could 3 be adjusted downward? Unfortunately those two subtrees are not large heaps either.

So we might as well think in the opposite direction: start adjusting downward from the bottom and work upward. That way, by the time we adjust any node, its left and right subtrees are guaranteed to already be large heaps~ One thing to note: the leaves at the bottom do not need to be adjusted (a leaf by itself can already be regarded as a heap).

So the first node to find is the last non-leaf node (6), and we adjust it downward (find its larger child, 60, and swap them).

Their subscripts are also easy to find. The subscript of the tail element 60 is n-1, so the subscript of its father is (n-1-1)/2. After adjusting 6 downward, the next node to adjust is 4, whose subscript is just before 6's. By analogy, we then adjust starting at 7.

The subscript of 5 is found the same way: just decrement the subscript and adjust downward again.

Finally the downward adjustments are finished~

The implementation of the code part is also very simple:

void HeapSort(int* a, int n)
{
	//build the heap: upward-adjustment version
	//for (int i = 0; i < n; i++)
	//{
	//	AdjustUp(a, i);
	//}
	// 
	//build the heap: downward-adjustment version, starting from the last parent
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(a, n, i);
	}
	//large heap -> ascending order
	int end = n - 1;//subscript of the last array element
	while (end > 0)
	{
		Swap(&a[0], &a[end]);
		AdjustDown(a, end, 0);//re-select the maximum among the remaining elements
		end--;//next round swaps the heap top with the element just before the sorted tail
	}
}
4.3.6.13 Proof of the complexity of building a heap

 

Generally we recommend building the heap by adjusting downward, because it is more efficient.

T(h) is the total number of adjustment steps: each node on the second-to-last layer moves down at most 1 level, each node on the third-to-last layer at most 2 levels, ..., and the root on the first layer moves down at most h-1 levels.

Summing these contributions (the blue boxes in the figure) with the misaligned-subtraction trick gives roughly the total number of nodes of the binary tree that we counted before.

Therefore the actual cost of building a heap by adjusting downward is even a little smaller than N, i.e. O(N). (When converting to time complexity we want the result in terms of N; since N is awkward to manipulate directly, we express the sum in terms of h and convert back to N at the end.)
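Written out for a full tree of height h with N = 2^h - 1 nodes (a sketch of the summation that the blue boxes illustrated): the i-th level holds 2^(i-1) nodes and each of them moves down at most h - i levels, so

T(h) = 2^0*(h-1) + 2^1*(h-2) + ... + 2^(h-2)*1
     = 2^h - h - 1        (by misaligned subtraction)
     ≈ N

so building the heap by adjusting downward costs O(N).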

———————————————————————————————————————————

Now let's analyse building the heap by upward adjustment~

We will find that, compared with the downward method, the multipliers in the upward-adjustment sum are much larger: the last layer alone holds about half of the data and each of those nodes may climb about logN levels, which already converts to roughly (N/2)*logN. This is the reason we give priority to the downward-adjustment method.

Apply the misaligned subtraction to evaluate the sum~

The actual cost of the upward-adjustment method comes out to roughly O(N*logN - N). In short, the last layer has far too much data to adjust upward, and those nodes also have many layers to climb, so it is not as efficient as the downward-adjustment method.
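The corresponding sum (a sketch): a node on the i-th level moves up at most i - 1 levels, and the lower levels hold the vast majority of the nodes, so

T(h) = 2^1*1 + 2^2*2 + ... + 2^(h-1)*(h-1)
     = (h-2)*2^h + 2
     ≈ N*log2(N)

so building the heap by adjusting upward costs O(N*logN).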

Heap sort itself can be regarded as N elements each being adjusted downward once, so its cost is, unsurprisingly, O(N*logN).

4.3.6.14 Small exercise (optional)

Note: if you still remember the C language file-handling material, you can digest this part with confidence~

Assume there are 1 billion data values, too many to fit in memory. Find the top K largest values in the file.

The basic idea:

  • Read the first K values (100 in this example) from the file and build a small heap with them in an in-memory array.
  • Then read the remaining data one value at a time and compare each with the heap top. If a value is greater than the heap top, it replaces the heap top and is adjusted downward.
  • After all the data has been read, the K values left in the heap are the top K (here, top 100) largest.

We traverse N numbers; in the worst case each one enters the heap and is pushed down logK levels, giving O(N*logK). Since K is tiny compared with N, logK is essentially a constant, so this is effectively O(N). Space complexity: O(K).

void CreateNDate()
{
	//generate test data
	int n = 1000;
	srand(time(0));
	const char* file = "data.txt";
	FILE* fin = fopen(file, "w");//open for writing
	if (fin == NULL)
	{
		perror("fopen error");
		return;
	}
	for (int i = 0; i < n; i++)
	{
		int x = rand() % 1000000;
		fprintf(fin, "%d\n", x);
	}

	fclose(fin);
}



void PrintTopK(const char* filename, int k)
{
	//1. Build a small heap from the first k values in the file
	FILE* fout = fopen(filename, "r");//"r": open for reading
	if (fout == NULL)
	{
		perror("fopen fail");
		return;
	}
	int* minheap = (int*)malloc(sizeof(int) * k);//allocate space for the k-element heap
	if (minheap == NULL)
	{
		perror("malloc fail");
		return;
	}
	for (int i = 0; i < k; i++)
	{
		fscanf(fout, "%d", &minheap[i]);//read the first k values from the file into the array
	}
	//build a small heap out of these first k values (start from the last parent and adjust down)
	for (int i = (k - 2) / 2; i >= 0; i--)
	{
		AdjustDown(minheap, k, i);
	}

	//2. Compare each of the remaining n-k values with the heap top:
	//   if it is larger than the heap top, replace the top and adjust downward
	int x = 0;
	while (fscanf(fout, "%d", &x) != EOF)
	{
		if (x > minheap[0])
		{
			minheap[0] = x;
			AdjustDown(minheap, k, 0);
		}
	}
	//we have written these heap operations before; the part to be careful with is the file-handling code
	for (int i = 0; i < k; i++)
	{
		printf("%d ", minheap[i]);
	}
	printf("\n");
	fclose(fout);
}



int main()
{
	CreateNDate();
	PrintTopK("data.txt", 10);
	return 0;
}

To verify that we really found the 10 largest numbers, note that every generated value is taken modulo 1,000,000, so every number in the file is below one million. We manually insert 10 numbers larger than one million into the file (these bypass the modulo) and then check whether the program finds exactly those.

Successfully realized~

5. All codes

//Heap.h
#pragma once
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
typedef int HPDataType;
typedef struct Heap
{
	HPDataType* a;
	int size;
	int capacity;
}HP;


//Initialization function
void HeapInit(HP* php);

//Alternative initialization: build the heap directly from an array
void HeapInitArray(HP* php, int* a, int n);

//Destruction function
void HeapDestroy(HP* php);

//Insertion function
void HeapPush(HP* php, HPDataType x);

//Swap function
void Swap(HPDataType* p1, HPDataType* p2);

//Print function
void HeapPrint(HP* php);

//Upward adjustment function
void AdjustUp(HPDataType* a, int child);

//Delete function
void HeapPop(HP* php);

//Get the value at the heap top (root)
HPDataType HeapTop(HP* php);

//Downward adjustment function
void AdjustDown(HPDataType* a, int n, int parent);

//Empty check function
bool HeapEmpty(HP* php);

————————————————————————————————————————

//Heap.c
#include "Heap.h"


//Initialization function
void HeapInit(HP* php)
{
	assert(php);

	php->a = NULL;
	php->size = 0;
	php->capacity = 0;

}

//Alternative initialization: build the heap directly from an array
void HeapInitArray(HP* php, int* a, int n)
{
	assert(php);
	assert(a);
	php->a = (HPDataType*)malloc(sizeof(HPDataType) * n);
	if (php->a == NULL)
	{
		perror("malloc fail");
		exit(-1);
	}
	php->size = n;      //the heap now holds all n copied elements
	php->capacity = n;
	memcpy(php->a, a, sizeof(HPDataType) * n);
	for (int i = 1; i < n; i++)
	{
		AdjustUp(php->a, i);//adjust each element upward as if it had just been tail-inserted
	}
}

//Destruction function
void HeapDestroy(HP* php)
{
	assert(php);
	free(php->a);
	php->a = NULL;
	php->size = php->capacity = 0;
}

//Swap function
void Swap(HPDataType* p1, HPDataType* p2)
{
	HPDataType tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

//Upward adjustment function
void AdjustUp(HPDataType* a, int child)
{
	int parent = (child - 1) / 2;//use the formula to compute the parent's subscript from the child's
	while (child > 0)
	{
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			child = parent;
			parent = (parent - 1) / 2;
		}
		else
		{
			break;
		}
	}
}

//Insertion function
void HeapPush(HP* php, HPDataType x)
{
	assert(php);
	//expand capacity when full
	if (php->size == php->capacity)
	{
		int newCapacity = php->capacity == 0 ? 4 : php->capacity * 2;
		HPDataType* tmp = (HPDataType*)realloc(php->a, sizeof(HPDataType) * newCapacity);
		if (tmp == NULL)
		{
			perror("realloc fail");
			exit(-1);
		}
		php->a = tmp;
		php->capacity = newCapacity;
	}
	php->a[php->size] = x;
	php->size++;
	AdjustUp(php->a, php->size - 1);
	//pass not only the array but also the child's subscript; the newly inserted child sits at subscript size-1 (the tail)
}

//Print function
void HeapPrint(HP* php)
{
	assert(php);

	for (size_t i = 0; i < php->size; i++)
	{
		printf("%d ", php->a[i]);
	}
	printf("\n");

}

//Downward adjustment function
void AdjustDown(HPDataType* a, int n, int parent)
{
	//assume the smaller of the two children is the left child
	int child = parent * 2 + 1;

	while (child < n)//stop once we move past the leaves; after the loop child already points outside the array
	{
		//find the smaller child
		if (child + 1 < n && a[child + 1] < a[child])//the right child is smaller than the left child
		{
			child++;//the smaller child is actually the right one
		}

		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			//continue adjusting downward
			parent = child;
			child = parent * 2 + 1;
			//as with upward adjustment, after the swap move parent and child down one level to prepare for the next iteration

		}
		else
		{
			break;
		}
	}
}

//Delete function
void HeapPop(HP* php)
{
	assert(php);
	assert(php->size > 0);

	Swap(&php->a[0], &php->a[php->size - 1]);
	php->size--;//delete the data (the old root, now sitting at the tail)

	AdjustDown(php->a, php->size, 0);
}

//Get the value at the heap top (root)
HPDataType HeapTop(HP* php)
{
	assert(php);
	assert(php->size > 0);

	return php->a[0];
}


//Empty check function
bool HeapEmpty(HP* php)
{
	assert(php);

	return php->size == 0;
}


————————————————————————————————————————

//Test.c
#include "Heap.h"

//Optimal design of a tree node (child-sibling representation)
//struct  TreeNode
//{
//	int val;
//	struct TreeNode* firstchild;
//	struct TreeNode* nextbrother;
//};
//
//void HeapSort(int* a, int n)
//{
//	HP hp;
//	HeapInit(&hp);
//	for (size_t i = 0; i < sizeof(a) / sizeof(int); i++)
//	{
//		HeapPush(&hp, a[i]);
//	}
//	int i = 0;
//
//	while (!HeapEmpty(&hp))
//	{
//		//printf("%d ", HeapTop(&hp));
//		a[i++] = HeapTop(&hp);
//		HeapPop(&hp);
//
//	}
//
//
//	HeapDestroy(&hp);
//}

//void HeapSort(int* a, int n)
//{
//	//build the heap (upward adjustment)
//	for (int i = 0; i < n; i++)
//	{
//		AdjustUp(a, i);
//	}
//	//large heap -> ascending order
//	int end = n - 1;//subscript of the last array element
//	while (end > 0)
//	{
//		Swap(&a[0], &a[end]);
//		AdjustDown(a, end, 0);//re-select the maximum
//		end--;//next round uses the element just before the sorted tail
//	}
//}

void HeapSort(int* a, int n)
{
	//build the heap: upward-adjustment version
	//for (int i = 0; i < n; i++)
	//{
	//	AdjustUp(a, i);
	//}
	// 
	//build the heap: downward-adjustment version, starting from the last parent
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(a, n, i);
	}
	//large heap -> ascending order
	int end = n - 1;//subscript of the last array element
	while (end > 0)
	{
		Swap(&a[0], &a[end]);
		AdjustDown(a, end, 0);//re-select the maximum among the remaining elements
		end--;//next round swaps the heap top with the element just before the sorted tail
	}
}


//int main()
//{
//	//int a[] = { 65, 100, 70, 32, 50, 60 };
//	int a[] = { 70, 65, 100, 32, 50, 60 };
//	HeapSort(a, sizeof(a) / sizeof(int));
//
//	return 0;
//}
void CreateNDate()
{
	//generate test data
	int n = 1000;
	srand(time(0));
	const char* file = "data.txt";
	FILE* fin = fopen(file, "w");//open for writing
	if (fin == NULL)
	{
		perror("fopen error");
		return;
	}
	for (int i = 0; i < n; i++)
	{
		int x = rand() % 1000000;
		fprintf(fin, "%d\n", x);
	}

	fclose(fin);
}



void PrintTopK(const char* filename, int k)
{
	//1. Build a small heap from the first k values in the file
	FILE* fout = fopen(filename, "r");//"r": open for reading
	if (fout == NULL)
	{
		perror("fopen fail");
		return;
	}
	int* minheap = (int*)malloc(sizeof(int) * k);//allocate space for the k-element heap
	if (minheap == NULL)
	{
		perror("malloc fail");
		return;
	}
	for (int i = 0; i < k; i++)
	{
		fscanf(fout, "%d", &minheap[i]);//read the first k values from the file into the array
	}
	//build a small heap out of these first k values (start from the last parent and adjust down)
	for (int i = (k - 2) / 2; i >= 0; i--)
	{
		AdjustDown(minheap, k, i);
	}

	//2. Compare each of the remaining n-k values with the heap top:
	//   if it is larger than the heap top, replace the top and adjust downward
	int x = 0;
	while (fscanf(fout, "%d", &x) != EOF)
	{
		if (x > minheap[0])
		{
			minheap[0] = x;
			AdjustDown(minheap, k, 0);
		}
	}
	//we have written these heap operations before; the part to be careful with is the file-handling code
	for (int i = 0; i < k; i++)
	{
		printf("%d ", minheap[i]);
	}
	printf("\n");
	fclose(fout);
}



int main()
{
	CreateNDate();
	PrintTopK("data.txt", 10);
	return 0;
}

6. Conclusion

Besides introducing the concepts of binary trees and heaps, this article is, more importantly, about the code implementation of the heap; the techniques in it help improve the efficiency of our algorithms. Finally, thank you all for reading. It is a great honor to learn new things together with you, and I look forward to seeing you next time~


Origin blog.csdn.net/fax_player/article/details/133394741