Detailed explanation of the concept, structure, and implementation of trees and binary trees (Part 1)

Table of contents

1. Tree

1.2 Related concepts of trees

1.3 Tree Representation

1.4 The use of trees in practice (representing the directory tree structure of the file system)

2. Binary tree

2.1 Binary tree concept

3. Special binary trees

 1. Full binary tree

2. Complete binary tree

3.1 Properties of binary tree

3.2 Storage structure of binary tree

1. Sequential storage

3.3 The concept and structure of the heap

3.4 Implementation of the heap (taking the large heap as an example)

1. Heap.h structure creation + function declaration

2. Heap.c function implementation

3. HeapText.c test

4. Upward adjustment algorithm (taking the large heap as an example)

2. Deleting data from the heap (large heap)

3. Downward adjustment algorithm (large heap)

3.5 Application of Heap

1. Top-K questions

2. Heap sort

3. Heap sort: proof that building the heap is O(N)

1. Tree

     A tree is a non-linear data structure: a finite set of n (n>=0) nodes organized in a hierarchy. It is called a tree because it looks like a tree turned upside down, with the root at the top and the leaves at the bottom.
  • There is one special node called the root node, and the root node has no predecessor.
  • Apart from the root node, the remaining nodes are divided into M (M>0) disjoint sets T1, T2, ..., Tm, and each set Ti (1<= i <= m) is itself a subtree with the same structure as a tree. The root of each subtree has exactly one predecessor and can have zero or more successors. A tree is therefore defined recursively.

 

Note: in a tree structure, subtrees must not intersect; if they do, the structure is not a tree.

 

1.2 Related concepts of trees

Degree of a node: the number of subtrees a node has; in the figure above, the degree of A is 6
Leaf node or terminal node: a node with degree 0; in the figure above, B, C, H, I... are leaf nodes
Non-terminal node or branch node: a node whose degree is not 0; in the figure above, D, E, F, G... are branch nodes
Parent node: a node that has child nodes is the parent of those children; in the figure above, A is the parent of B
Child node: the root of a subtree of a node is a child of that node; in the figure above, B is a child of A
Sibling nodes: nodes with the same parent; in the figure above, B and C are siblings
Degree of the tree: the largest degree of any node in the tree; in the figure above, the degree of the tree is 6
Level of a node: counted from the root, which is level 1; the root's children are level 2, and so on
Height or depth of the tree: the maximum level of any node in the tree; in the figure above, the height of the tree is 4
Cousin nodes: nodes whose parents are on the same level; in the figure above, H and I are cousins
Ancestors of a node: all nodes on the branch from the root to that node; in the figure above, A is the ancestor of all nodes
Descendants: every node in the subtree rooted at a node is a descendant of that node; in the figure above, all nodes are descendants of A
Forest: a collection of m (m>0) disjoint trees

1.3 Tree Representation

    A tree structure is more complicated than a linear list, so storing and representing it takes more work: both the data of each node and the relationships between nodes have to be saved. In practice there are many ways to represent a tree, such as the parent representation, the child representation, the child-parent representation, and the child-sibling representation. Here we only look briefly at the most commonly used one, the child-sibling representation.
typedef int DataType;
struct Node
{
   struct Node* _firstChild1;    // first child node
   struct Node* _pNextBrother;   // next sibling node
   DataType _data;               // data stored in the node
};

As shown in the picture:
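
To make the child-sibling idea a bit more concrete, here is a minimal sketch built on the struct above. The helpers BuyNode, AddChild, NodeDegree and TreeSize are names made up for this illustration, not part of the original post.

```c
#include <assert.h>
#include <stdlib.h>

typedef int DataType;
struct Node
{
	struct Node* _firstChild1;    // first child node
	struct Node* _pNextBrother;   // next sibling node
	DataType _data;               // data stored in the node
};

// Hypothetical helper: allocate a node with no child and no sibling.
struct Node* BuyNode(DataType x)
{
	struct Node* node = (struct Node*)malloc(sizeof(struct Node));
	assert(node);
	node->_firstChild1 = NULL;
	node->_pNextBrother = NULL;
	node->_data = x;
	return node;
}

// Hypothetical helper: append child as the last child of parent.
void AddChild(struct Node* parent, struct Node* child)
{
	assert(parent && child);
	if (parent->_firstChild1 == NULL)
	{
		parent->_firstChild1 = child;
		return;
	}
	struct Node* cur = parent->_firstChild1;
	while (cur->_pNextBrother != NULL)
	{
		cur = cur->_pNextBrother;
	}
	cur->_pNextBrother = child;
}

// Degree of a node = number of children = length of its child list.
int NodeDegree(const struct Node* node)
{
	assert(node);
	int degree = 0;
	for (const struct Node* cur = node->_firstChild1; cur != NULL; cur = cur->_pNextBrother)
	{
		degree++;
	}
	return degree;
}

// Counts the node, everything below it, and everything hanging off its
// later siblings. Called on the real root (which has no sibling), that is
// the number of nodes in the whole tree.
int TreeSize(const struct Node* root)
{
	if (root == NULL)
	{
		return 0;
	}
	return 1 + TreeSize(root->_firstChild1) + TreeSize(root->_pNextBrother);
}
```

However many children a node has, it only ever needs two pointers, which is why this representation is so commonly used.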

1.4 The use of trees in practice (representing the directory tree structure of the file system)

 

2. Binary tree

2.1 Binary tree concept

A binary tree is a finite set of nodes that is either:
1. empty, or
2. made up of a root node plus two binary trees, called the left subtree and the right subtree
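
The recursive definition maps directly onto a linked node type. The sketch below is only an illustration: BTNode, BTSize and BTDepth are assumed names, and an empty tree is represented by NULL.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical linked binary tree node: a root plus a left and a right subtree.
typedef struct BTNode
{
	struct BTNode* left;
	struct BTNode* right;
	int data;
} BTNode;

// Number of nodes: an empty tree has 0, otherwise root + left subtree + right subtree.
int BTSize(const BTNode* root)
{
	if (root == NULL)
	{
		return 0;
	}
	return 1 + BTSize(root->left) + BTSize(root->right);
}

// Depth with the root on level 1; an empty tree has depth 0.
int BTDepth(const BTNode* root)
{
	if (root == NULL)
	{
		return 0;
	}
	int l = BTDepth(root->left);
	int r = BTDepth(root->right);
	return 1 + (l > r ? l : r);
}
```

Both functions follow the two cases of the definition: the empty tree is the base case, and the non-empty case recurses into the two subtrees.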

 As can be seen:

1. There is no node with degree greater than 2 in the binary tree.

2. The subtrees of the binary tree are divided into left and right, and the order cannot be reversed, so the binary tree is an ordered tree
Note: any binary tree is a combination of the following cases:

3. Special binary trees

 1. Full binary tree

A full binary tree is a binary tree in which every layer holds the maximum number of nodes. In other words, if a binary tree has K layers and 2^K - 1 nodes in total, it is a full binary tree.

2. Complete binary tree

A complete binary tree is a very efficient data structure derived from the full binary tree. A binary tree of depth K with n nodes is a complete binary tree if and only if its nodes correspond one-to-one with the nodes numbered 1 to n in the full binary tree of depth K. Note that a full binary tree is a special case of a complete binary tree.

 As shown in the picture:

3.1  Properties of Binary Tree

1. If the root is on layer 1, the i-th layer of a non-empty binary tree holds at most 2^(i - 1) nodes.

2. If the root is on layer 1, a binary tree of depth h holds at most 2^h - 1 nodes.

3. If the root is on layer 1, a full binary tree with n nodes has depth h = log2(n + 1).

4. For any binary tree, if the number of leaf nodes (degree 0) is K and the number of nodes with degree 2 is Z, then K = Z + 1.

5. For a complete binary tree with n nodes, if the nodes are numbered from 0 in array order (top to bottom, left to right), then for the node numbered i:

  • If i>0, the parent of node i is numbered (i-1)/2; if i=0, node i is the root and has no parent
  • If 2i+1<n, the left child is numbered 2i+1; if 2i+1>=n there is no left child
  • If 2i+2<n, the right child is numbered 2i+2; if 2i+2>=n there is no right child
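
The three index rules can be written down as small helper functions. This is just a sketch: the names ParentIndex, LeftChildIndex and RightChildIndex are made up here, and returning -1 for "no such node" is a convention chosen for the example.

```c
#include <assert.h>

// Index of the parent of node i, or -1 if i is the root.
int ParentIndex(int i)
{
	assert(i >= 0);
	return i > 0 ? (i - 1) / 2 : -1;
}

// Index of the left child of node i in a tree of n nodes, or -1 if there is none.
int LeftChildIndex(int i, int n)
{
	int left = 2 * i + 1;
	return left < n ? left : -1;
}

// Index of the right child of node i in a tree of n nodes, or -1 if there is none.
int RightChildIndex(int i, int n)
{
	int right = 2 * i + 2;
	return right < n ? right : -1;
}
```

For a tree of 6 nodes, for example, node 2 has a left child at index 5 but no right child, because 2*2+2 = 6 is not less than n.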

As shown in the picture:

 

 3.2   Storage structure of binary tree

Binary trees can generally be stored in two ways: a sequential structure or a linked (chain) structure.

1. Sequential storage

Sequential storage means storing the tree in an array. In general an array only suits a complete binary tree, because a binary tree that is not complete leaves unused slots in the array and wastes space. In practice it is mostly heaps that are stored in arrays. A binary tree in sequential storage is physically an array and logically a binary tree.

 3.3 The concept and structure of the heap

   Concept: put simply, if every parent's value is greater than or equal to its children's values, the heap is a large heap (max-heap); if every parent's value is less than or equal to its children's values, it is a small heap (min-heap).

Properties of the heap:
  • The value of any node in the heap is never greater than its parent's value (large heap), or never less than its parent's value (small heap);
  • The heap is always a complete binary tree.
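
One way to check the first property on an array is to compare every node with its parent. The function below is a sketch; IsLargeHeap is a name invented for this example and is not used elsewhere in the post.

```c
#include <assert.h>
#include <stdbool.h>

// Returns true if a[0..size-1], read as a complete binary tree,
// is a large heap: no child is greater than its parent.
bool IsLargeHeap(const int* a, int size)
{
	assert(a || size == 0);
	for (int child = 1; child < size; child++)
	{
		int parent = (child - 1) / 2;
		if (a[child] > a[parent])
		{
			return false;
		}
	}
	return true;
}
```

The second property comes for free here: any array with no gaps is a complete binary tree under the index rules above.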

 

3.4 Implementation of the heap (taking the large heap as an example)

Note: The upward and downward adjustment algorithm and the deletion of heap data are explained in detail separately

1. Heap.h structure creation + function declaration

#pragma once
#include<stdio.h>
#include<stdlib.h>
#include<assert.h>
#include<stdbool.h>
#include<time.h>    // time(), used to seed rand() in the Top-K test

typedef int HeapDateType;
typedef struct Heap {
	HeapDateType* a;
	int size;
	int capacity;
}HP;

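//                                        large heap (flip the comparisons for a small heap)
// Initialize the heap
void HeapInit(HP* hp);
// Insert a value and readjust the heap
void HeapPush(HP* hp, HeapDateType x);
// Grow the heap's storage
void Heap_add_room(HP* hp);
// Remove the top of the heap
void HeapPop(HP* hp);
// Release the heap's storage
void HeapDestroy(HP* hp);
// Print the heap's data in array order
void HeapPrint(HP* hp);
// Downward adjustment
void HeapAjustDown(int* a, int size, int parent);
// Upward adjustment
void HeapAjustUp(int* a, int child);
// Swap two values
void Swap(int* n1, int* n2);
// Is the heap empty?
bool HeapEmpty(HP* hp);
// Return the top of the heap
HeapDateType HeapTop(HP* hp);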

2. Heap.c function implementation

#include"Heap.h"
// Initialize the heap
void HeapInit(HP* hp) {
	assert(hp);
	hp->a = NULL;
	hp->size = hp->capacity = 0;
}

// Release the heap's storage
void HeapDestroy(HP* hp)
{
	assert(hp);
	free(hp->a);
	hp->a = NULL;   // avoid leaving a dangling pointer behind
	// hp itself is not freed here: it belongs to the caller
	// (in the test below it is a local variable on the stack)
	hp->size = hp->capacity = 0;
}
// Print the heap in array order
void HeapPrint(HP* hp)
{
	assert(hp);
	assert(!HeapEmpty(hp));
	for (int i = 0; i < hp->size; i++)
	{
		printf("%d  ", hp->a[i]);
	}
	printf("\n");
}

// Remove the top of the heap
void HeapPop(HP* hp)
{
	assert(hp);
	assert(!HeapEmpty(hp));
	// swap the top of the heap with the last element
	Swap(&hp->a[0], &hp->a[hp->size - 1]);
	hp->size--;                               // the old top is now outside the heap
	// then adjust downward from the root
	HeapAjustDown(hp->a, hp->size, 0);
}


// Insert a value and readjust the heap
void HeapPush(HP* hp, HeapDateType x) {
	assert(hp);
	if (hp->size == hp->capacity)
	{
		Heap_add_room(hp);
	}
	hp->a[hp->size++] = x;
	// adjust upward
	HeapAjustUp(hp->a, hp->size - 1);// pass the index of the last valid element
}

// Downward adjustment
void HeapAjustDown(int *a, int size, int parent)
{
	assert(a);
	int child = 2 * parent + 1;
	while (child < size)
	{
		if (child + 1 < size && a[child + 1] > a[child])                       // large heap: pick the larger child
		{
			child++;
		}

		if (a[child] > a[parent])                       // the larger value moves up
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = 2 * parent + 1;
		}
		else
		{
			break;
		}
	}
}
// Upward adjustment
void HeapAjustUp(int * a, int child) // index of the child
{
	assert(a);
	int parent = (child - 1) / 2;
	while (child > 0) // must not go negative
	{
		if (a[child] > a[parent])    // the larger value moves up
		{
			// swap
			Swap(&a[child], &a[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break;
		}
	}
}
// Swap two values
void Swap(int* n1, int* n2) 
{
	int tmp = *n1;
	*n1 = *n2;
	*n2 = tmp;
}
// Is the heap empty?
bool HeapEmpty(HP* hp)
{
	assert(hp);
	return hp->size == 0;
}

void Heap_add_room(HP* hp)
{
	int newcapacity = hp->capacity == 0 ? 4 : hp->capacity * 2;
	HeapDateType* tmp = (HeapDateType*)realloc(hp->a, sizeof(HeapDateType) * newcapacity);
	if (tmp == NULL)
	{
		perror("realloc");
		exit(-1);
	}
	hp->a = tmp;
	hp->capacity = newcapacity;
}

HeapDateType HeapTop(HP* hp)
{
	assert(hp && !HeapEmpty(hp));
	return hp->a[0];
}

3. HeapText.c test

#include"Heap.h"
void text()
{
	int b[6] = {34, 32, 31, 12, 3, 28};
	HP hp;
	HeapInit(&hp);
	for (int i = 0; i < 6; i++)
	{
		HeapPush(&hp, b[i]);
	}
	HeapPrint(&hp);
	HeapPush(&hp, 56);
	HeapPrint(&hp);
	HeapPush(&hp, 16);
	HeapPrint(&hp);
	HeapDestroy(&hp);
}

int main()
{
	text();
	return 0;
}

4. Upward adjustment algorithm (taking the large heap as an example)

     We know the heap is physically stored as an array. To preserve the heap property, new data can only be inserted at the end of the array; the inserted value's position then has to be adjusted upward so that the array is still a large (or small) heap.

Parent index: (child - 1) / 2

 

 code:

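// Upward adjustment
void HeapAjustUp(int * a, int child) // index of the child
{
	assert(a);
	int parent = (child - 1) / 2;
	while (child > 0) // must not go negative; at 0 we are already at the top of the heap
	{   // only two cases: either a swap is needed, or the value stays where it is
		if (a[child] > a[parent])    // the larger value moves up
		{
			// swap
			Swap(&a[child], &a[parent]);
			child = parent;          // the child moves to its parent's position
			parent = (child - 1) / 2; // the parent index moves to its own parent
		}
		else               
		{
			break;
		}
	}
}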
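
To watch the upward adjustment at work without the full HP struct, here is a small self-contained sketch on a plain array. Swap and HeapAjustUp repeat the article's logic; PushFixed is a helper invented for this example and assumes the caller has left room in the array.

```c
#include <assert.h>

void Swap(int* n1, int* n2)
{
	int tmp = *n1;
	*n1 = *n2;
	*n2 = tmp;
}

// Upward adjustment for a large heap, same logic as above.
void HeapAjustUp(int* a, int child)
{
	assert(a);
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		if (a[child] > a[parent])
		{
			Swap(&a[child], &a[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break;
		}
	}
}

// Hypothetical helper: append x to the heap stored in a[0..*size-1]
// and restore the heap. The caller guarantees there is room for one more.
void PushFixed(int* a, int* size, int x)
{
	assert(a && size);
	a[(*size)++] = x;
	HeapAjustUp(a, *size - 1);
}
```

Starting from the heap 34 32 31 12 3 28 that the test builds, pushing 56 swaps it first with 31 and then with 34, which gives 56 32 34 12 3 28 31.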

2. Deleting data from the heap (large heap)

     Deleting from a heap means deleting the data at the top of the heap: swap the top with the last element, drop the last element of the array, and then run the downward adjustment algorithm from the root.

3. Downward adjustment algorithm (large heap)

// Downward adjustment
void HeapAjustDown(int *a, int size, int parent)  
{
	assert(a);
	int child = 2 * parent + 1;
	while (child < size)
	{
		if (child + 1 < size && a[child + 1] > a[child])// there may be a left and a right child; pick the larger one
		{
			child++;
		}

		if (a[child] > a[parent])      // if the child is larger, swap; otherwise stop
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = 2 * parent + 1;
		}
		else
		{
			break;
		}
	}
}
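
Deletion and downward adjustment can be traced on a plain array in the same way. Again this is only a sketch: Swap and HeapAjustDown repeat the article's logic, and PopFixed is a name made up for the example.

```c
#include <assert.h>

void Swap(int* n1, int* n2)
{
	int tmp = *n1;
	*n1 = *n2;
	*n2 = tmp;
}

// Downward adjustment for a large heap, same logic as above.
void HeapAjustDown(int* a, int size, int parent)
{
	assert(a);
	int child = 2 * parent + 1;
	while (child < size)
	{
		if (child + 1 < size && a[child + 1] > a[child])
		{
			child++;
		}
		if (a[child] > a[parent])
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = 2 * parent + 1;
		}
		else
		{
			break;
		}
	}
}

// Hypothetical helper: remove the top of the large heap stored in
// a[0..*size-1] and return it.
int PopFixed(int* a, int* size)
{
	assert(a && size && *size > 0);
	int top = a[0];
	Swap(&a[0], &a[*size - 1]);   // the old top moves to the end
	(*size)--;                    // and drops out of the heap
	HeapAjustDown(a, *size, 0);   // the value now at the root sinks to its place
	return top;
}
```

Popping from 56 32 34 12 3 28 31 returns 56 and leaves 34 32 31 12 3 28: the 31 that was swapped to the root is exchanged with 34, its larger child, and then stops because 28 is smaller.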

 Now that we can build a large heap, how do we build a small heap? Think of it this way: upward adjustment carries a larger child up, and downward adjustment brings the larger child toward the top of the heap. For a small heap, it's enough to flip the comparisons in both functions from greater-than to less-than.
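
If you'd rather not keep two copies of each function, one option is to pass the comparison in as a function pointer. This is a different technique from the one the article uses, and HeapCmp, Greater, Less, AdjustUpCmp and AdjustDownCmp are all names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical comparator: returns true when x should sit above y in the heap.
typedef bool (*HeapCmp)(int x, int y);

bool Greater(int x, int y) { return x > y; }   // gives a large heap
bool Less(int x, int y)    { return x < y; }   // gives a small heap

void Swap(int* n1, int* n2)
{
	int tmp = *n1;
	*n1 = *n2;
	*n2 = tmp;
}

// Upward adjustment with the comparison supplied by the caller.
void AdjustUpCmp(int* a, int child, HeapCmp above)
{
	assert(a && above);
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		if (above(a[child], a[parent]))
		{
			Swap(&a[child], &a[parent]);
			child = parent;
			parent = (child - 1) / 2;
		}
		else
		{
			break;
		}
	}
}

// Downward adjustment with the comparison supplied by the caller.
void AdjustDownCmp(int* a, int size, int parent, HeapCmp above)
{
	assert(a && above);
	int child = 2 * parent + 1;
	while (child < size)
	{
		// pick whichever child belongs higher up
		if (child + 1 < size && above(a[child + 1], a[child]))
		{
			child++;
		}
		if (above(a[child], a[parent]))
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = 2 * parent + 1;
		}
		else
		{
			break;
		}
	}
}
```

Calling these with Less builds and maintains a small heap, and calling them with Greater reproduces the large-heap behaviour of HeapAjustUp and HeapAjustDown.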

3.5 Application of Heap

1. Top-K questions

       TOP-K problem: find the K largest or K smallest elements in a data set, usually a fairly large one.

For example: the top 10 professional players, the world's top 500, the rich list, the top 100 active players in a game, and so on.
For the Top-K problem, the simplest and most direct approach is sorting. But if the data set is very large, sorting isn't advisable (the data may not all fit in memory at once). The best approach is to use a heap. The basic idea is as follows:
1. Build a heap from the first K elements of the data set
  • For the K largest elements, build a small heap
  • For the K smallest elements, build a large heap

2. Compare each of the remaining N-K elements with the top of the heap, and replace the top whenever the element should displace it (it is larger than the top of a small heap, or smaller than the top of a large heap). Once all the remaining N-K elements have been compared, the K elements left in the heap are the K largest or K smallest.

 Example: find the 10 largest numbers among 1000 values.

  • Step 1: build a small heap from the first 10 values of the array.

code:

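// Create a heap (the heap functions must be the small-heap versions here)
	HP hp;
	HeapInit(&hp);
	// Build the heap from the first K values
	for (int i = 0; i < K; i++)
	{
		HeapPush(&hp, ps[i]);  // moves the smaller value up
	}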
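  • Step 2: if a value in the array is greater than the top of the heap, it enters the heap and the heap is readjusted. There are two ways to do this:

1. Method 1: overwrite the top of the heap with the new value, then adjust downward once.

2. Method 2: pop the top of the heap, then push the new value.

Code for both methods: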

// Replace the top whenever a larger value shows up
	for (int i = K; i < n; i++)
	{
		if (ps[i] > HeapTop(&hp))
		{
			hp.a[0] = ps[i];  // Method 1: a single function call (better)
			HeapAjustDown(hp.a, hp.size, 0);
			/*HeapPop(&hp);   // Method 2: pop, then push
			HeapPush(&hp, ps[i]);*/
		}
	}

 The final code is as follows:

void PrintTok(HeapDateType *ps, int n, int K)
{
	// Create a heap (the heap functions must be the small-heap versions here)
	HP hp;
	HeapInit(&hp);
	// Build the heap from the first K values
	for (int i = 0; i < K; i++)
	{
		HeapPush(&hp, ps[i]);  // moves the smaller value up
	}
	// Replace the top whenever a larger value shows up
	for (int i = K; i < n; i++)
	{
		if (ps[i] > HeapTop(&hp))
		{
			hp.a[0] = ps[i];  // Method 1: a single function call (better)
			HeapAjustDown(hp.a, hp.size, 0);
			/*HeapPop(&hp);   // Method 2: pop, then push
			HeapPush(&hp, ps[i]);*/
		}
	}
	// Print the K values that remain, then release the heap
	HeapPrint(&hp);
	HeapDestroy(&hp);
}
void text2() {  // test function
	int n = 10000;   // find the top 10 among 10000 values
	HeapDateType* a = (HeapDateType*)malloc(sizeof(HeapDateType) * n);
	if (a == NULL)
	{
		printf("malloc fail");
		exit(-1);
	}
	srand((unsigned int)time(NULL));    // seed the random numbers (needs <time.h>)
	int K = 10;
	for (int  i = 0; i < n; i++)
	{
		a[i] = rand() % 10000;  // fill the test array with random values below 10000
	}
	// plant 10 values above 10000 so the expected answer is known
	a[2] = 10000 + 10;
	a[3] = 10000 + 9;
	a[2353] = 10000 + 8;
	a[5678] = 10000 + 7;
	a[2324] = 10000 + 6;
	a[9999] = 10000 + 5;
	a[3435] = 10000 + 4;
	a[3432] = 10000 + 3;
	a[234] = 10000 + 2;
	a[34] = 10000 + 1;
	PrintTok(a, n, K);
	free(a);
}
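
Since the heap functions shown earlier are the large-heap versions, here is a self-contained sketch of the same Top-K idea with an explicit small heap, so it can be compiled on its own. AdjustDownSmall and TopKLargest are names made up for this example, and the heap is built bottom-up with downward adjustment rather than with HeapPush.

```c
#include <assert.h>

void Swap(int* n1, int* n2)
{
	int tmp = *n1;
	*n1 = *n2;
	*n2 = tmp;
}

// Downward adjustment for a SMALL heap: the smaller child moves up.
void AdjustDownSmall(int* a, int size, int parent)
{
	assert(a);
	int child = 2 * parent + 1;
	while (child < size)
	{
		if (child + 1 < size && a[child + 1] < a[child])
		{
			child++;
		}
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = 2 * parent + 1;
		}
		else
		{
			break;
		}
	}
}

// Hypothetical helper: copies the K largest values of ps[0..n-1] into out[0..K-1].
// out ends up as a small heap, so out[0] is the K-th largest value;
// the other entries are in heap order, not sorted.
void TopKLargest(const int* ps, int n, int K, int* out)
{
	assert(ps && out && K > 0 && K <= n);
	// 1. build a small heap from the first K values
	for (int i = 0; i < K; i++)
	{
		out[i] = ps[i];
	}
	for (int parent = (K - 1 - 1) / 2; parent >= 0; parent--)
	{
		AdjustDownSmall(out, K, parent);
	}
	// 2. every later value that is larger than the top replaces it
	for (int i = K; i < n; i++)
	{
		if (ps[i] > out[0])
		{
			out[0] = ps[i];
			AdjustDownSmall(out, K, 0);
		}
	}
}
```

The top of the small heap is always the weakest of the current K candidates, which is why a single comparison with out[0] is enough to decide whether a new value gets in.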

2. Heap sort

    The Top-K algorithm finds the top 10, but it doesn't tell us how those 10 rank against each other. Heap sort solves this problem nicely.

Ideas:

1. Build a heap
  • Ascending order: build a large heap
  • Descending order: build a small heap
2. Sort using the idea of heap deletion
     Downward adjustment is used both for building the heap and for heap deletion, so once you have mastered downward adjustment you can write heap sort.
Taking ascending order as an example, the process is as follows:
  •     Step 1: build a heap. Suppose we have just used the Top-K algorithm to find the 5 largest values, so the array is already a small heap. We now need to turn that small heap into a large heap, which completes the heap-building step.
Time complexity of building a heap: O(N) (proved later)

 The process diagram is as follows:

 Code:

// size - 1 is the last index, so (size - 1 - 1) / 2 is the last node that has a child
for (int parent = (size - 1 - 1) / 2; parent >= 0; parent--)
{
	HeapAjustDown(a, size, parent);
}
  • Step 2: delete the data and adjust downward (this step is much easier to follow with a diagram)

From the perspective of logical structure: 

 From the perspective of physical structure:

 Full code:

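// Sort in ascending order, 0 -> 10
void HeapSort(HeapDateType* a, int size)
{   // 1. Build a large heap, starting from the last node that has a child
	for (int parent = (size - 1 - 1) / 2; parent >= 0; parent--)
	{
		HeapAjustDown(a, size, parent);
	}
    // 2. Sort: move the current maximum to the end, then shrink the heap by one
	for (int end = size - 1; end > 0; end--) 
	{
		Swap(&a[end], &a[0]);
		HeapAjustDown(a, end, 0);
	}
}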
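
The code above sorts in ascending order with a large heap. For the other case listed in the ideas (descending order, small heap), a mirror-image sketch might look like this. AdjustDownSmall and HeapSortDesc are names made up for the example.

```c
#include <assert.h>

void Swap(int* n1, int* n2)
{
	int tmp = *n1;
	*n1 = *n2;
	*n2 = tmp;
}

// Downward adjustment for a SMALL heap: the smaller child moves up.
void AdjustDownSmall(int* a, int size, int parent)
{
	assert(a);
	int child = 2 * parent + 1;
	while (child < size)
	{
		if (child + 1 < size && a[child + 1] < a[child])
		{
			child++;
		}
		if (a[child] < a[parent])
		{
			Swap(&a[child], &a[parent]);
			parent = child;
			child = 2 * parent + 1;
		}
		else
		{
			break;
		}
	}
}

// Hypothetical descending-order heap sort: build a small heap, then keep
// moving the current minimum to the end of the shrinking heap.
void HeapSortDesc(int* a, int size)
{
	assert(a || size == 0);
	// 1. build a small heap
	for (int parent = (size - 1 - 1) / 2; parent >= 0; parent--)
	{
		AdjustDownSmall(a, size, parent);
	}
	// 2. the minimum goes to position end, then the heap shrinks by one
	for (int end = size - 1; end > 0; end--)
	{
		Swap(&a[0], &a[end]);
		AdjustDownSmall(a, end, 0);
	}
}
```

Each round parks the smallest remaining value at the back, so the array ends up from largest to smallest.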

 3. Heap sort: proof that building the heap is O(N)

    A heap is a complete binary tree, and a full binary tree is also a complete binary tree, so to keep things simple we prove the result for a full binary tree (time complexity is an approximation anyway, and a few extra nodes don't change the final result):
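
The original figure with the derivation isn't reproduced here, so what follows is a reconstruction under the usual assumptions: a full binary tree of height h with the root on level 1, N = 2^h - 1 nodes, and bottom-up heap building with downward adjustment. Level i holds 2^(i-1) nodes, and each of them moves down at most h - i levels, so the total number of moves T(N) is:

```latex
\begin{aligned}
T(N)  &= 2^{0}(h-1) + 2^{1}(h-2) + \cdots + 2^{h-2}\cdot 1 \\
2T(N) &= 2^{1}(h-1) + 2^{2}(h-2) + \cdots + 2^{h-1}\cdot 1 \\
T(N)  &= 2T(N) - T(N)
       = -(h-1) + \left(2^{1} + 2^{2} + \cdots + 2^{h-1}\right) \\
      &= -(h-1) + \left(2^{h} - 2\right)
       = 2^{h} - 1 - h \\
      &= N - \log_2(N+1) \qquad \text{since } N = 2^{h} - 1
\end{aligned}
```

The logarithmic term is negligible next to N, so building the heap takes O(N) time.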

   

 Epilogue

That's it for this section. Thanks for reading. If you have any suggestions, feel free to leave them in the comments, and if this post helped you, a like or a follow is what keeps me writing.

Origin blog.csdn.net/qq_72112924/article/details/130324565