[Data Structure] The concepts of trees and binary trees and the sequential structure and implementation of binary trees

Foreword:

Previously we learned about sequential lists, linked lists, stacks, and queues, all of which are linear (one-to-one) data structures. Next we turn to a nonlinear data structure: the tree (and in particular the binary tree). Its structure is more complex than the ones we have seen so far. Without further ado, let's get straight to the point.

1. Concept and structure of tree

1. The concept of tree

A tree is a non-linear data structure that expresses one-to-many relationships (a one-to-one relationship is also possible). It is a finite set of n (n >= 0) nodes organized in a hierarchy. It is called a tree because it looks like an upside-down tree: the root is at the top and the leaves point down.

A tree has one special node, called the root node, which has no predecessor.
Apart from the root, the remaining nodes are divided into M (M > 0) disjoint sets T1, T2, ..., Tm, where each set Ti (1 <= i <= m) is itself a subtree with the same structure as a tree. The root of each subtree has exactly one predecessor and may have 0 or more successors, so a tree is defined recursively.

Trees in reality and trees in data structures, such as:
Insert image description here

Note: In the tree structure, there cannot be intersection between subtrees, otherwise it will not be a tree structure.
Insert image description here

2. Concepts related to trees

The terms used for trees describe the relationships between the root, branches, leaves, and so on of a tree, much like the relationships in a human family.
Insert image description here

Degree of a node: The number of subtrees a node has is called the degree of the node; as shown in the figure above, A has 6 subtrees, so the degree of A is 6
Leaf node or terminal node: A node with degree 0 is called a leaf node; as shown in the figure above, nodes such as B, C, H, I... are leaf nodes
Non-terminal node or branch node: A node whose degree is not 0; as shown in the figure above, nodes such as D, E, F, G... are branch nodes
Parent node: If a node has child nodes, it is called the parent node of those children; as shown in the figure above, A is the parent node of B
Child node: The root of a subtree of a node is called a child node of that node; as shown in the figure above, B is a child node of A
Sibling node: Nodes with the same parent node are called sibling (brother) nodes; as shown in the figure above, B and C are sibling nodes
Degree of the tree: The largest degree of any node in the tree; as shown in the figure above, the degree of the tree is 6
Level of a node: Defined starting from the root: the root is level 1, the children of the root are level 2, and so on
Height or depth of the tree: The maximum level of any node in the tree; as shown in the figure above, the height of the tree is 4
Cousin node: Nodes whose parents are on the same level are cousins of each other; as shown in the figure above, H and I are cousin nodes
Ancestors of a node: All nodes on the path from the root down to that node; as shown in the figure above, A is the ancestor of all other nodes
Descendants: Any node in the subtree rooted at a given node is a descendant of that node; as shown in the figure above, all other nodes are descendants of A
Forest: A collection of m (m > 0) disjoint trees is called a forest

Note: Node levels can be counted from the root downward starting at either 0 or 1, but starting at 1 is recommended.
A tree may be empty, have a single node, or have many nodes. If counting starts at 0, a tree with only one node has height 0, which is hard to tell apart from an empty tree. If counting starts at 1, an empty tree has height 0 and a single-node tree has height 1, so starting at 1 makes the height easier to work with.

Insert image description here

3. Tree storage

A tree is more complicated than a linear list, and storing it is more troublesome: both the value of each node and the relationships between nodes must be saved. There are many ways to represent a tree, such as the parent representation, the child representation, the child-parent representation, and the child-sibling representation. Here we take a brief look at the most commonly used one, the child-sibling representation (also known as the left-child right-sibling representation).

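struct TreeNode
{
	int val;                        // value stored in the node
	struct TreeNode* firstchild;    // first (leftmost) child, or NULL
	struct TreeNode* nextbrother;   // next sibling to the right, or NULL
};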

A node can have any number of children. Its firstchild pointer points to its first child (or is a null pointer if it has none), and its nextbrother pointer points to its next sibling (or is a null pointer if it has none).

Insert image description here
To visit all the children of a node, first find its first child (for example, B is the first child of A), then follow that child's nextbrother pointers from sibling to sibling until a null pointer is reached.

// Anode is assumed to point to node A
struct TreeNode* child = Anode->firstchild;   // B, the first child of A
while (child)                                 // stop at the null pointer
{
	// ............
	child = child->nextbrother;
}
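
The traversal above can be tried out with a small self-contained sketch. The struct repeats the layout shown earlier so that the snippet compiles on its own; CountChildren is a hypothetical helper name, not something from the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the struct above, repeated so the sketch compiles on its own. */
struct TreeNode
{
	int val;
	struct TreeNode* firstchild;
	struct TreeNode* nextbrother;
};

/* Hypothetical helper: walk the sibling chain that starts at
   node->firstchild and count how many children node has. */
int CountChildren(const struct TreeNode* node)
{
	assert(node);
	int count = 0;
	const struct TreeNode* child = node->firstchild;
	while (child != NULL)
	{
		count++;
		child = child->nextbrother;   /* move on to the next sibling */
	}
	return count;
}
```

A node with children B, C, and D is built by pointing its firstchild at B and chaining B, C, and D through nextbrother; CountChildren then visits each of them exactly once.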


Insert image description here

4. Application of trees in practice

The directory structure of a computer's file system is a tree. When you open This PC, you see the C drive, the D drive, and so on, and each level below contains multiple files and folders.
Insert image description here

2. Binary tree concept and structure

1. Concept

A binary tree is a finite set of nodes in which:
1. Each node has at most two children (it may also have one or none)
2. The tree consists of a root node plus two binary trees, called the left subtree and the right subtree

Insert image description here
As the figure above shows, no node in a binary tree has a degree greater than 2, and the subtrees of a node are distinguished as left and right. Their order cannot be swapped, so a binary tree is an ordered tree.

Note: Every binary tree is a combination of the following cases:
Insert image description here
A binary tree in real life:
Insert image description here

2. Special binary tree

(1) Full binary tree

Concept: If every level of a binary tree holds the maximum possible number of nodes, the tree is a full binary tree. In other words, a binary tree with h levels and 2^h - 1 nodes in total is a full binary tree.
Insert image description here

Every level is full: level i holds 2^(i-1) nodes (i is the number of the level, counting from 1)
F(h) = 2^0 + 2^1 + ... + 2^(h-2) + 2^(h-1) = 2^h - 1 (sum of a geometric sequence)

Assume that this binary tree has N nodes
N = 2^h - 1 => h = log2(N+1)
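
As a quick check of these two formulas, here is a minimal sketch in C. The function names FullTreeNodeCount and FullTreeHeight are hypothetical, and the height is computed by repeated doubling instead of a floating-point logarithm to avoid rounding issues.

```c
#include <assert.h>

/* Number of nodes in a full binary tree with h levels: 2^h - 1. */
long long FullTreeNodeCount(int h)
{
	assert(h >= 0 && h < 62);
	return (1LL << h) - 1;
}

/* Height h of a full binary tree with n nodes, i.e. the h with 2^h - 1 == n.
   Returns -1 if n cannot be the size of a full binary tree. */
int FullTreeHeight(long long n)
{
	assert(n >= 0 && n <= (1LL << 61));
	int h = 0;
	long long count = 0;           /* nodes in a full tree with h levels */
	while (count < n)
	{
		count = count * 2 + 1;     /* one more level: 2^(h+1) - 1 = 2 * (2^h - 1) + 1 */
		h++;
	}
	return count == n ? h : -1;
}
```

For example, a full binary tree with 3 levels has 7 nodes, and 15 nodes correspond to a height of 4.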

(2) Complete binary tree

Concept: A complete binary tree is derived from a full binary tree. If its height is h, its first h-1 levels are full, the last level may or may not be full, and the nodes of the last level must be filled in continuously from left to right. A full binary tree is a special kind of complete binary tree.

Insert image description here
Therefore, the total number of nodes in a complete binary tree of height h falls in a range. It is largest when the tree is a full binary tree, with 2^h - 1 nodes. It is smallest when the last level has only one node: the first h-1 levels hold 2^(h-1) - 1 nodes, plus that one node, for 2^(h-1) nodes in total. So the range is [2^(h-1), 2^h - 1].

3. Properties of binary trees

1. If the root is at level 1, level h of a non-empty binary tree has at most 2^(h-1) nodes;
2. If the root is at level 1, a binary tree of depth h has at most 2^h - 1 nodes;
3. For any binary tree, if the number of leaf nodes (degree 0) is n0 and the number of branch nodes with degree 2 is n2, then n0 = n2 + 1;
4. If the root is at level 1, the depth of a full binary tree with n nodes is h = log2(n+1)
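
Property 3 is stated without a reason. A short derivation, assuming n1 denotes the number of nodes with degree 1 (a symbol not used in the original) and counting the edges of the tree in two ways:

```latex
\begin{align*}
n &= n_0 + n_1 + n_2
  && \text{(every node has degree 0, 1, or 2)} \\
n - 1 &= n_1 + 2 n_2
  && \text{(every node except the root has one edge to its parent)} \\
n_0 + n_1 + n_2 - 1 &= n_1 + 2 n_2
  \;\Longrightarrow\; n_0 = n_2 + 1
\end{align*}
```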

4. Storage of binary trees

The storage methods of binary trees are divided into sequential storage and chain storage.

(1) Sequential storage

Sequential storage uses an array. In general, an array is only suitable for representing a complete binary tree, because for any other binary tree the positions of the missing nodes are wasted. A sequentially stored binary tree is physically an array and logically a binary tree.

There is a rule when using array storage:
Insert image description here

The parent or the children of the node at any position can be found from its subscript alone
Note: This rule only works if the tree is a full binary tree or a complete binary tree
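
The rule itself is only shown in the figure. Assuming the root is stored at subscript 0, which is what the heap code later in this article relies on, the subscript arithmetic can be sketched as follows (the function names are hypothetical):

```c
#include <assert.h>

/* Subscript helpers for a complete binary tree stored in an array
   with the root at subscript 0. */
int ParentIndex(int child)
{
	assert(child > 0);             /* the root (subscript 0) has no parent */
	return (child - 1) / 2;
}

int LeftChildIndex(int parent)  { return parent * 2 + 1; }
int RightChildIndex(int parent) { return parent * 2 + 2; }

/* A child subscript is only valid if it is smaller than the node count n. */
int HasLeftChild(int parent, int n)  { return LeftChildIndex(parent) < n; }
int HasRightChild(int parent, int n) { return RightChildIndex(parent) < n; }
```

In a tree of 6 nodes, for instance, the node at subscript 2 has its left child at subscript 5 and no right child, because subscript 6 is past the end of the array.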

Insert image description here
Therefore, for non-complete binary trees, it is more suitable to use chained structure storage.

Summary: Full binary trees or complete binary trees are suitable for array storage

(2) Chain storage

The linked structure represents a binary tree with linked nodes. Each node is divided into three fields (a data field and left and right pointer fields); the left and right pointers store the addresses of the left and right children.
Insert image description here
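
A minimal sketch of such a node in C is given below. The names BTDataType, BTNode, BuyNode, LinkChildren, and TreeSize are assumptions made for illustration; the original article does not show this code.

```c
#include <assert.h>
#include <stdlib.h>

typedef int BTDataType;               /* assumed element type */

typedef struct BinaryTreeNode
{
	BTDataType data;                  /* data field */
	struct BinaryTreeNode* left;      /* address of the left child, or NULL */
	struct BinaryTreeNode* right;     /* address of the right child, or NULL */
} BTNode;

/* Allocate a node with no children; returns NULL if malloc fails. */
BTNode* BuyNode(BTDataType x)
{
	BTNode* node = (BTNode*)malloc(sizeof(BTNode));
	if (node == NULL)
	{
		return NULL;
	}
	node->data = x;
	node->left = NULL;
	node->right = NULL;
	return node;
}

/* Attach the left and right children (either may be NULL). */
void LinkChildren(BTNode* parent, BTNode* left, BTNode* right)
{
	assert(parent);
	parent->left = left;
	parent->right = right;
}

/* Count the nodes of the tree rooted at root (an empty tree has 0). */
int TreeSize(const BTNode* root)
{
	return root == NULL ? 0 : 1 + TreeSize(root->left) + TreeSize(root->right);
}
```

Because every node stores only the addresses of its own children, any binary tree shape can be represented without wasted slots, at the cost of two pointers per node.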

3. Sequential structure and implementation of binary tree

1. Sequential structure of binary tree

Ordinary binary trees are not suitable for storage in arrays, because a lot of space may be wasted. A complete binary tree is a much better fit for sequential storage. In practice, the heap (a kind of binary tree) is usually stored in an array. Note that the heap here and the heap in an operating system's virtual process address space are two different things: one is a data structure, and the other is a region of memory managed by the operating system.

2. Concept and structure of heap

Heap: a binary tree (a non-linear structure), more precisely a complete binary tree, which makes it suitable for array storage.

Heaps come in two kinds: a heap whose root is the largest element is called a max heap or large root heap (large heap for short), and a heap whose root is the smallest element is called a min heap or small root heap (small heap for short).
Small heap: every parent in the tree is <= its children (the values of the nodes are compared)
Large heap: every parent in the tree is >= its children
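
The small-heap rule can be checked mechanically on an array. The sketch below assumes the root is stored at subscript 0, so that the parent of subscript child is (child - 1) / 2; IsSmallHeap is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a[0..n-1] satisfies the small-heap rule:
   every parent is <= each of its children. */
bool IsSmallHeap(const int* a, int n)
{
	assert(a || n == 0);
	for (int child = 1; child < n; child++)
	{
		int parent = (child - 1) / 2;
		if (a[parent] > a[child])
		{
			return false;          /* a parent larger than its child breaks the rule */
		}
	}
	return true;
}
```

Reversing the comparison (a[parent] < a[child]) gives the corresponding check for a large heap.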

Insert image description here

3. Implementation of heap

(1) Initialize the heap

There are two ways to initialize the heap.
The first method: leave the array pointed to by the structure empty (no elements), and set the element count and the capacity to 0

void HPInit(HP* php)
{
	assert(php);
	php->a = NULL;       // no array allocated yet
	php->capacity = 0;
	php->size = 0;
}
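
The heap functions in this section use HP, HPDataType, and Swap without showing their declarations. Judging from how the fields a, size, and capacity are used, they presumably look something like the following; this is an assumed reconstruction, not the author's original header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int HPDataType;      /* assumed element type */

typedef struct Heap
{
	HPDataType* a;           /* dynamically allocated array that stores the heap */
	int size;                /* number of valid elements */
	int capacity;            /* number of elements the array can hold */
} HP;

/* Exchange the two values that p1 and p2 point to. */
void Swap(HPDataType* p1, HPDataType* p2)
{
	assert(p1 && p2);
	HPDataType tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}
```

The headers listed cover the library calls used by the functions in this section: assert, bool, perror and printf, malloc, realloc, free and exit, and memcpy.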

The second method: receive an external array and its size n, dynamically allocate space for n elements, copy the data into the array pointed to by the structure, and then adjust that array into a heap.

void HPInitArray(HP* php, HPDataType* a, int n)
{
	assert(php);
	assert(a);
	php->a = (HPDataType*)malloc(sizeof(HPDataType) * n);
	if (php->a == NULL)
	{
		perror("malloc fail");
		exit(-1);
	}
	php->size = n;
	php->capacity = n;
	memcpy(php->a, a, sizeof(HPDataType) * n);   // memcpy is declared in <string.h>
	// Build the heap: adjust each element upward in turn, as if inserted one by one
	for (int i = 1; i < n; i++)
	{
		AdjustUp(php->a, i);
	}
}

AdjustUp is an upward adjustment algorithm, which will be analyzed below.

(2) Destroy the heap

The heap is destroyed in the same way as a sequential list.

void HPDestroy(HP* php)
{
	assert(php);
	free(php->a);        // free(NULL) is allowed, so an empty heap is fine
	php->a = NULL;
	php->capacity = php->size = 0;
}

(3) Insertion into the heap

Insertion into the heap appends at the tail, as in a sequential list (elements are inserted one at a time), because appending to the end of an array is efficient (removing from the end is efficient too, which will be used later). Expansion must be considered when inserting, because the capacity is 0 after initialization. Here a variable newcapacity is set with the conditional operator: if the capacity is 0 the initial capacity is 4, otherwise it is twice the current capacity. After any expansion, the new element is stored and the element count is incremented.

void HPPush(HP* php, HPDataType x)
{
	assert(php);
	if (php->size == php->capacity)
	{
		// Full (or not yet allocated): start at 4, then double
		int newcapacity = php->capacity == 0 ? 4 : php->capacity * 2;
		HPDataType* ptr = (HPDataType*)realloc(php->a, sizeof(HPDataType) * newcapacity);
		if (ptr == NULL)
		{
			perror("realloc fail");
			exit(-1);
		}
		php->a = ptr;
		php->capacity = newcapacity;
	}
	php->a[php->size] = x;
	php->size++;
	AdjustUp(php->a, php->size - 1);   // adjust upward
}

Is it enough to simply insert the external data one by one? No. After each insertion, the array pointed to by the structure must be turned back into a heap. This calls for the following algorithm.

Upward adjustment algorithm

Adjustment starts from the last leaf node and moves upward. Take building a small heap as an example: the new element is inserted as the last leaf node. In a small heap every parent node <= its child nodes, so the parent of the new node must be found first.

Formula: parent = (child - 1) / 2

Although a heap is logically a binary tree, it is stored in an array whose space is contiguous, so the parent can be found from the child's subscript. The child is then compared with its parent. If the parent is greater than the child, the two are swapped; child then moves up to the parent's position, and parent moves up to the position of its own parent. Take care with the range of child: once child reaches the top of the heap (subscript 0) it has no parent, so the loop condition is child > 0. If the parent and child already satisfy the small-heap rule, the loop exits, and the array is a small heap again.
Insert image description here

Prerequisite for upward adjustment: the array (not counting the new element) is already a small heap or a large heap

How do we know that the array is a small heap or a large heap before an element is inserted? The array starts out empty. Elements are inserted one at a time, and the array is adjusted after every insertion. So by the time the next element is inserted, the earlier elements already form a small heap, and only the relationship between the new element and its ancestors needs to be adjusted.

void AdjustUp(HPDataType* a, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0)
	{
		if (a[parent] > a[child])      // small heap: the parent must not be larger
		{
			Swap(&a[parent], &a[child]);
			child = parent;            // move up one level
			parent = (child - 1) / 2;
		}
		else
		{
			break;                     // the heap rule already holds
		}
	}
}
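
To see the upward adjustment at work, here is a self-contained sketch that appends one value to an existing small heap. Swap and AdjustUp follow the logic above; AppendToSmallHeap is a hypothetical helper, and the caller is assumed to have left room for one more element in the array.

```c
#include <assert.h>

typedef int HPDataType;   /* assumed element type */

static void Swap(HPDataType* p1, HPDataType* p2)
{
	HPDataType tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* Small-heap upward adjustment, same logic as AdjustUp above. */
void AdjustUp(HPDataType* a, int child)
{
	int parent = (child - 1) / 2;
	while (child > 0 && a[parent] > a[child])
	{
		Swap(&a[parent], &a[child]);
		child = parent;
		parent = (child - 1) / 2;
	}
}

/* Hypothetical helper: a[0..n-1] is already a small heap and the array
   has room for one more element; store x at the tail and restore the heap. */
void AppendToSmallHeap(HPDataType* a, int n, HPDataType x)
{
	assert(a && n >= 0);
	a[n] = x;
	AdjustUp(a, n);
}
```

Appending 5 to the small heap {10, 20, 30} first swaps 5 with its parent 20 and then with the root 10, leaving {5, 10, 30, 20}.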

(4) Deletion of heap

Deleting from the heap uses tail deletion, as in a sequential list, but simply deleting the last element of the heap is meaningless. In a small heap the top element is always the smallest of all elements, so it is the top element that should be deleted. Deleting the head of an array would require shifting all the remaining data, which is costly, so instead the top element is swapped with the last element and then the tail is deleted. This removes the smallest element. After the deletion, however, the new top element may be of any size, so the array is not necessarily a small heap any more. Another algorithm is needed here.

void HPPop(HP* php)
{
	assert(php);
	assert(php->size > 0);
	Swap(&php->a[0], &php->a[php->size - 1]);   // move the top to the tail
	php->size--;                                // tail deletion
	AdjustDown(php->a, php->size, 0);           // restore the heap from the top
}

Downward adjustment algorithm

Adjustment starts from the top of the heap and moves downward. The preceding swap changed only the top element, so the top element is the parent node, and its children can be found with the following formulas:

Left child: child = parent * 2 + 1
Right child: child = parent * 2 + 2

So which child should be chosen, the left or the right? Select the left child by default, then add a check: if the right child is smaller than the left child, add 1 to child so that it refers to the right child. This is because, when adjusting downward in a small heap, the parent must be swapped with the smaller of its two children. Then compare the parent with that child. If the parent is larger, swap them; parent then moves down to the child's position, and child moves to the left child of the new parent. Otherwise the array is already a small heap and the loop exits.

Note: One detail must be controlled here. A child subscript is only valid if it is less than the number of elements n. If a parent has only one child (only the left child), then the subscript of the left child plus 1 is out of bounds.
child + 1 < n: if this condition is true, the right child exists; if it is false, there is only a left child and no right child

Insert image description here

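void AdjustDown(HPDataType* a, int n, int parent)
{
	int child = parent * 2 + 1;        // start with the left child
	while (child < n)
	{
		// If the right child exists and is smaller, use it instead
		if (child + 1 < n && a[child + 1] < a[child])
		{
			child++;
		}
		if (a[parent] > a[child])      // small heap: the parent must not be larger
		{
			Swap(&a[parent], &a[child]);
			parent = child;            // move down one level
			child = parent * 2 + 1;
		}
		else
		{
			break;                     // the heap rule already holds
		}
	}
}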
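
The swap, tail deletion, and downward adjustment can be tried on a plain array with the following self-contained sketch. Swap and AdjustDown follow the logic above; PopSmallest is a hypothetical helper that works on a raw array and an element count instead of the HP structure.

```c
#include <assert.h>

typedef int HPDataType;   /* assumed element type */

static void Swap(HPDataType* p1, HPDataType* p2)
{
	HPDataType tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* Small-heap downward adjustment, same logic as AdjustDown above. */
void AdjustDown(HPDataType* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] < a[child])
		{
			child++;                   /* pick the smaller child */
		}
		if (a[parent] <= a[child])
		{
			break;                     /* the heap rule already holds */
		}
		Swap(&a[parent], &a[child]);
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Hypothetical helper: remove and return the top of the small heap
   a[0..*n-1], shrinking *n by one. */
HPDataType PopSmallest(HPDataType* a, int* n)
{
	assert(a && n && *n > 0);
	Swap(&a[0], &a[*n - 1]);   /* the smallest value moves to the tail */
	(*n)--;                    /* tail deletion */
	AdjustDown(a, *n, 0);      /* restore the heap among the remaining elements */
	return a[*n];              /* the removed value is still stored just past the end */
}
```

Popping from the small heap {5, 10, 30, 20} returns 5 and leaves {10, 20, 30}: after the swap the top is 20, which is larger than its smaller child 10, so the two trade places.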

(5) Get the top element of the heap/the number of elements in the heap/empty judgment

// Get the top element of the heap
HPDataType HPTop(HP* php)
{
	assert(php);
	assert(php->size > 0);
	return php->a[0];
}
// Get the number of elements in the heap
int HPSize(HP* php)
{
	assert(php);
	return php->size;
}
// Check whether the heap is empty (bool requires <stdbool.h>)
bool HPEmpty(HP* php)
{
	assert(php);
	return php->size == 0;
}

4. Heap sort

We want to sort data with a heap (taking ascending order as the example). There are two approaches: one prints the data out in order, and the other sorts the array in place.

(1) Heap sort version 1

Insert the data into a heap one element at a time. Then, as long as the heap is not empty, print the top element and delete it, until all elements have been printed in ascending order.

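void HeapSort(int* a, int n)
{
	HP hp;
	HPInit(&hp);
	// Insert every element; HPPush keeps the small heap intact
	for (int i = 0; i < n; i++)
	{
		HPPush(&hp, a[i]);
	}
	HPPrint(&hp);   // HPPrint is not shown in this article; it is assumed to print the heap's array

	// Repeatedly print and remove the smallest element
	while (!HPEmpty(&hp))
	{
		printf("%d ", HPTop(&hp));
		HPPop(&hp);
	}
	HPDestroy(&hp);
}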

This method has disadvantages:
1. The heap needs its own array of N elements, and the repeated expansion while inserting adds overhead
2. A complete heap data structure must be implemented first

(2) Heap sort version 2 - sort the array in place

If the array is to be used after it has been sorted, merely printing the data in order is not enough. The array itself must be heap-sorted in place so that its contents end up in order.

For example:
Original array: 65, 100, 70, 32, 50, 60
Sorted array: 32, 50, 60, 65, 70, 100

We need to build a heap before sorting, so should it be a small heap or a large heap?
Let's try a small heap first:
Taking ascending order as the example, the printing version above built a small heap with the upward adjustment method, and the same method could be used here. Once the small heap is built, its top element is the smallest, so that element is already in the first position of the array. The next smallest element would then have to be placed in the second position, and so on from front to back, until the whole array is in ascending order.

But building a small heap first has a flaw:
Analysis: The top element of the heap is the smallest and sits in the first position of the array, where it belongs. Each time an element is fixed in place, the next smallest must be found among the remaining data. The problem is that the element right after the fixed ones is not necessarily the next smallest, and the remaining data no longer forms a heap, so a heap has to be rebuilt from the remaining data to bring the next smallest to the top. Rebuilding the heap every time raises the time complexity and makes this approach very inefficient.

Adjusting one element upward costs: O(logN)
There are N elements, so building a heap with the upward adjustment method costs: O(N * logN)
Rebuilding the heap each time an element is fixed in place costs in total: O(N * (N * logN))
This is equivalent to building a small heap N times

So a large heap should be built instead. (AdjustUp and AdjustDown as written above maintain a small heap; for a large heap, their comparisons must be reversed so that the larger value moves up.) The top of a large heap is the largest element, but it sits in the first position of the array. How can it be moved to the last position?

This is where the idea behind heap deletion comes in. Swap the top element with the last element, and the largest element is where it belongs. The next step is critical: the last element must not really be deleted, or the largest element that was just put in place would be lost. Instead, define a variable end that holds the subscript of the last element. After the swap, adjust downward from the top over the first end elements (subscripts 0 to end - 1); the element at subscript end is outside the adjustment range. Then decrease end by 1 and loop while end > 0. When only one element is left it needs no adjustment: it is the smallest.
Insert image description here

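	// Build the heap (for ascending order, AdjustUp must maintain a large heap)
	for (int i = 1; i < n; i++)
	{
		AdjustUp(a, i);
	}
	// Sort: move the top to position end, then restore the heap in a[0..end-1]
	int end = n - 1;
	while (end > 0)
	{
		Swap(&a[0], &a[end]);
		AdjustDown(a, end, 0);
		end--;
	}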

(3) Comparison of optimized heap sort version 2 and time complexity

Compared with building a heap by upward adjustment, building it by downward adjustment has a better time complexity. So far the heap was built with the upward adjustment method; how is it built with the downward adjustment method?

Recall that building a heap by upward adjustment treats each element in turn as the newly added last node, finds its parent, and compares the two, until the whole array is a heap.

Total number of adjustments: for each level, the number of nodes in that level * the number of levels a node may move up (or down), summed over all levels

Calculation of time complexity of upward adjustment method
As shown in the figure:
Insert image description here
Building a heap with the downward adjustment method:
Previously, downward adjustment started from the top of the heap and compared level by level on the way down. To build a heap, it starts from the bottom instead. Find the last parent node (a parent node is any node that has at least one child) and adjust downward from it; then move to the previous parent node and repeat, all the way back to the root. The result is a heap.

Find the last parent node:
The last child node: child = n-1
Given a child, its parent is: (child - 1) / 2
So the last parent node is: (n - 1 - 1) / 2

Calculation of time complexity of downward adjustment method
As shown in the figure:
Insert image description here
By comparison: the upward adjustment method performs more adjustments in total than the downward adjustment method, because it also has to process the last level. The last level holds 2^(h-1) nodes, about half of all the nodes, and each of them may move up as many as h-1 levels. The downward adjustment method skips the last level entirely, so it is better to build the heap by downward adjustment.
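
The two totals can be written out explicitly. The following is a worst-case sketch under the usual simplifying assumption of a full binary tree with h levels and N = 2^h - 1 nodes, where every node moves as far as it possibly can:

```latex
% Downward adjustment: level i has 2^{i-1} nodes, each moves down at most h-i levels
T_{\text{down}}(h) = \sum_{i=1}^{h-1} 2^{i-1}\,(h-i) = 2^{h} - 1 - h = N - \log_2(N+1)

% Upward adjustment: level i has 2^{i-1} nodes, each moves up at most i-1 levels
T_{\text{up}}(h) = \sum_{i=2}^{h} 2^{i-1}\,(i-1) = (h-2)\,2^{h} + 2 = (N+1)\bigl(\log_2(N+1) - 2\bigr) + 2
```

For h = 3 these give 4 and 10 adjustments respectively. In general the downward method is O(N) and the upward method is O(N * logN).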

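void HeapSort(int* a, int n)
{
	// Build the heap
	// Option 1: upward adjustment, O(N*log(N))
	/*for (int i = 1; i < n; i++)
	{
		AdjustUp(a, i);
	}*/
	// Option 2: downward adjustment, O(N)
	// (n - 1 - 1) / 2 is the subscript of the last parent node
	for (int i = (n - 1 - 1) / 2; i >= 0; i--)
	{
		AdjustDown(a, n, i);
	}
	// Sort: swap the top with a[end], then restore the heap in a[0..end-1]
	// Note: with the small-heap AdjustDown shown earlier the result is in
	// descending order; ascending order needs the large-heap comparisons
	int end = n - 1;
	while (end > 0)
	{
		Swap(&a[0], &a[end]);
		AdjustDown(a, end, 0);
		end--;
	}
}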
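
Since the AdjustDown shown earlier maintains a small heap, an ascending sort needs the comparisons reversed. The following self-contained sketch does that; AdjustDownLarge, SwapInt, and HeapSortAscending are hypothetical names chosen so that they do not clash with the functions above.

```c
#include <assert.h>

static void SwapInt(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* Large-heap downward adjustment: the parent is swapped with its
   larger child for as long as it is smaller than that child. */
static void AdjustDownLarge(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] > a[child])
		{
			child++;                   /* pick the larger child */
		}
		if (a[parent] >= a[child])
		{
			break;                     /* the large-heap rule already holds */
		}
		SwapInt(&a[parent], &a[child]);
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Sort a[0..n-1] in ascending order, in place. */
void HeapSortAscending(int* a, int n)
{
	assert(a || n == 0);
	/* Build a large heap, starting from the last parent node. */
	for (int i = (n - 2) / 2; i >= 0; i--)
	{
		AdjustDownLarge(a, n, i);
	}
	/* Move the current maximum to a[end], then restore the heap in a[0..end-1]. */
	for (int end = n - 1; end > 0; end--)
	{
		SwapInt(&a[0], &a[end]);
		AdjustDownLarge(a, end, 0);
	}
}
```

Applied to the example array 65, 100, 70, 32, 50, 60 from earlier, this yields 32, 50, 60, 65, 70, 100.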

(4) TopK problem

Suppose there are N pieces of data; find the K largest

1. Read the first K values from the file and build a small heap from them in an in-memory array
2. Read the remaining values one by one and compare each with the top of the heap. If a value is larger than the top, it replaces the top and is then adjusted downward
3. After all the data has been read, the heap holds the K largest values

This method is very clever. In a small heap the top is the smallest element of the heap. Any value larger than the top replaces it, and a large value that enters the heap sinks toward the bottom. A value among the K largest is never pushed out: only the top, the smallest value in the heap, can be replaced, and only by something larger, so the value that leaves has K values at least as large as it and cannot belong to the K largest. Even if the heap already holds K - 1 of the K largest values early on, the downward adjustment keeps it a small heap, so the one element that does not belong sits at the top, where it is smaller than some later value. When that value is read, it replaces the top and is adjusted downward. At that point the K largest values have been found.
What if a large heap were built instead? The top of a large heap is the largest element in the heap, and only a larger value could enter. If the initial heap happened to contain the largest of all the elements, it would sit at the top and block every other value.

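#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void PrintTopK(const char* file, int k)
{
	FILE* fout = fopen(file, "r");
	if (fout == NULL)
	{
		perror("fopen fail");
		return;
	}
	int* minheap = (int*)malloc(sizeof(int) * k);
	if (minheap == NULL)
	{
		perror("malloc fail");
		fclose(fout);              // do not leak the open file
		return;
	}
	// Read the first k values (the file must hold at least k integers)
	for (int i = 0; i < k; i++)
	{
		if (fscanf(fout, "%d", &minheap[i]) != 1)
		{
			printf("fewer than %d values in file\n", k);
			free(minheap);
			fclose(fout);
			return;
		}
	}
	// Build a small heap from the first k values by downward adjustment
	for (int i = (k - 2) / 2; i >= 0; i--)
	{
		AdjustDown(minheap, k, i);
	}
	// A larger value replaces the top and sinks to its place;
	// stop at end of file or at the first value that cannot be read
	int x = 0;
	while (fscanf(fout, "%d", &x) == 1)
	{
		if (minheap[0] < x)
		{
			minheap[0] = x;
			AdjustDown(minheap, k, 0);
		}
	}
	// The heap now holds the k largest values (in heap order, not sorted)
	for (int i = 0; i < k; i++)
	{
		printf("%d ", minheap[i]);
	}
	free(minheap);
	fclose(fout);
}
void CreateNDate()
{
	// Generate test data: n random integers, one per line
	int n = 10000;
	srand((unsigned int)time(0));
	const char* file = "data.txt";
	FILE* fin = fopen(file, "w");
	if (fin == NULL)
	{
		perror("fopen error");
		return;
	}
	for (int i = 0; i < n; ++i)
	{
		int x = rand() % 1000000;
		fprintf(fin, "%d\n", x);
	}
	fclose(fin);
}
int main()
{
	CreateNDate();
	PrintTopK("data.txt", 5);
	return 0;
}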
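
Because the file contains random data, the result above is hard to verify by eye. The same steps can be applied to an in-memory array, which makes them easy to check. This is a sketch only: TopKFromArray, AdjustDownSmall, and SwapInt are hypothetical names, and the function requires 0 < k <= n.

```c
#include <assert.h>

static void SwapInt(int* p1, int* p2)
{
	int tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}

/* Small-heap downward adjustment, same logic as AdjustDown earlier. */
static void AdjustDownSmall(int* a, int n, int parent)
{
	int child = parent * 2 + 1;
	while (child < n)
	{
		if (child + 1 < n && a[child + 1] < a[child])
		{
			child++;                   /* pick the smaller child */
		}
		if (a[parent] <= a[child])
		{
			break;
		}
		SwapInt(&a[parent], &a[child]);
		parent = child;
		child = parent * 2 + 1;
	}
}

/* Write the k largest values of data[0..n-1] into out[0..k-1].
   The result is in small-heap order, so out[0] is the smallest of them. */
void TopKFromArray(const int* data, int n, int* out, int k)
{
	assert(data && out && k > 0 && k <= n);
	for (int i = 0; i < k; i++)            /* step 1: take the first k values */
	{
		out[i] = data[i];
	}
	for (int i = (k - 2) / 2; i >= 0; i--) /* and build a small heap from them */
	{
		AdjustDownSmall(out, k, i);
	}
	for (int i = k; i < n; i++)            /* step 2: scan the remaining values */
	{
		if (data[i] > out[0])
		{
			out[0] = data[i];              /* replace the top, then sink it */
			AdjustDownSmall(out, k, 0);
		}
	}
}
```

For the data 7, 3, 99, 12, 58, 1, 42, 77 and k = 3, the heap ends up holding 58, 77, and 99, with 58 at the top.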

Insert image description here
Thanks for watching~


Origin blog.csdn.net/2301_77459845/article/details/132777524