[C language] data structure - tree

        A tree is a non-linear data structure: a finite set of n (n>=0) nodes organized in a hierarchy. It is called a tree because it looks like an upside-down tree, with the root at the top and the leaves at the bottom.

1. The concept of tree

concept: 

The degree of a node : the number of subtrees contained in a node is called the degree of the node; as shown in the figure above: the degree of A is 2.
Leaf node or terminal node : A node with a degree of 0 is called a leaf node ; as shown in the figure above: E, F, G, H... and other nodes are leaf nodes.
Non-terminal nodes or branch nodes : nodes whose degree is not 0; as shown in the figure above: nodes such as B, C, D, etc. are branch nodes
Parent node : If a node has child nodes, it is called the parent node of those children; as shown above: A is the parent node of B.
Child node : the root of a subtree contained in a node is called a child node of that node; as shown above: B is a child node of A.
Sibling nodes : Nodes with the same parent node are called sibling nodes ; as shown above: B and C are sibling nodes .
Degree of the tree : the largest degree of any node in the tree is called the degree of the tree; as shown in the figure above: the degree of the tree is 2.
The level of a node : levels are counted from the root: the root is the first level, the children of the root are the second level, and so on.
The height or depth of the tree : the maximum level of nodes in the tree ; as shown above: the height of the tree is 4
Cousin nodes : Nodes whose parents are on the same layer are cousins; as shown in the figure above: D and F are cousin nodes.
Ancestor of a node : all nodes on the branch from the root to the node; as shown in the figure above: A is the ancestor of all nodes.
Descendants : Any node in the subtree rooted at a node is called a descendant of the node. As shown above: all nodes are descendants of A.
Forest : A collection of m (m>0) disjoint trees is called a forest;
PS: Some texts number the root level as 0 instead; under that convention the height of an empty tree is -1.

PS: The subtrees in the tree structure cannot have intersections! Otherwise it is not a tree structure.

(Note that a structure whose subtrees intersect is not invalid; it is simply a different data structure, not a tree.)

Left-child right-sibling notation:

typedef int DataType;
struct tree
{
	struct tree* firstchild1;  // points to the first (leftmost) child
    struct tree* pNextBrother; // points to the next sibling
    DataType _data;
};

        The first pointer points only to the node's first (leftmost) child, and the second pointer points to the node's next sibling; together they are enough to represent the whole tree. The siblings are chained much like a linked list.

        Here A's child pointer points to B. B's child pointer points to D and its sibling pointer points to C. C's child pointer points to F (G is reached through F's sibling pointer), and C's sibling pointer is NULL.

        The child pointer only ever points to the first child: B points only to D, and E is reached through D's sibling pointer.

(The parent only takes care of the eldest child; the eldest takes care of the second child, the second takes care of the third, and so on.)

PS: If there is no child node or sibling node behind the node, its pointer points to null.
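The pointer wiring described above can be sketched in code. This is only an illustration: it assumes the figure's tree is A with children B and C, B with children D and E, and C with children F and G, it switches the payload to char so nodes can carry those labels, and build_example, child_count and tree_size are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

typedef char DataType;  /* assumption: char payload so nodes can carry the labels A..G */

struct tree {
    struct tree *firstchild1;   /* first (leftmost) child, or NULL */
    struct tree *pNextBrother;  /* next sibling to the right, or NULL */
    DataType _data;
};

/* Hypothetical helper: wire up the seven-node example inside caller-provided storage. */
struct tree *build_example(struct tree n[7])
{
    const char labels[] = "ABCDEFG";
    for (int i = 0; i < 7; i++) {
        n[i]._data = labels[i];
        n[i].firstchild1 = NULL;
        n[i].pNextBrother = NULL;
    }
    n[0].firstchild1 = &n[1];   /* A's child pointer   -> B */
    n[1].firstchild1 = &n[3];   /* B's child pointer   -> D */
    n[1].pNextBrother = &n[2];  /* B's sibling pointer -> C */
    n[2].firstchild1 = &n[5];   /* C's child pointer   -> F */
    n[3].pNextBrother = &n[4];  /* D's sibling pointer -> E */
    n[5].pNextBrother = &n[6];  /* F's sibling pointer -> G */
    return &n[0];
}

/* Count the direct children of a node by walking the sibling chain of its first child. */
int child_count(const struct tree *node)
{
    assert(node != NULL);
    int count = 0;
    for (const struct tree *c = node->firstchild1; c != NULL; c = c->pNextBrother) {
        count++;
    }
    return count;
}

/* Count every node in the subtree rooted at node (the node itself included). */
int tree_size(const struct tree *node)
{
    if (node == NULL) {
        return 0;
    }
    int total = 1;
    for (const struct tree *c = node->firstchild1; c != NULL; c = c->pNextBrother) {
        total += tree_size(c);
    }
    return total;
}
```

Calling build_example on an array of seven nodes and then child_count on the root visits B and then C through the sibling chain, which is the "eldest takes care of the next" idea in code form.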

2. The binary tree

concept:

A binary tree is an ordered tree in which no node has a degree greater than 2. It is the simplest and most important kind of tree. A binary tree is either:

1. empty, or
2. a root node plus two binary trees, called the left subtree and the right subtree.

PS:

  • 1. There is no node with degree greater than 2 in the binary tree
  • 2. The subtrees of the binary tree are divided into left and right, and the order cannot be reversed, so the binary tree is an ordered tree

Full binary tree : every level holds the maximum number of nodes, i.e. every node except those on the lowest level has two child nodes. A full binary tree of height h has 2^h-1 nodes.

Complete binary tree : Except for the last layer, the number of nodes in each layer reaches the maximum number, and the nodes in the last layer exist continuously from left to right.

PS: A full binary tree is a special kind of complete binary tree.

Properties of binary trees:

The subscript (array) calculates the relationship between parent and child:

                                     leftchild = parent*2+1;

                                     rightchild = parent*2+2;

                                     parent = (child-1)/2; (Integer division discards the fractional part, so both children map to the same parent.)
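The subscript formulas can be wrapped in small helpers. This is a minimal sketch for a 0-based array; the function names left_child, right_child and parent_of are hypothetical.

```c
#include <assert.h>

/* Index helpers for a binary tree stored level by level in a 0-based array. */
int left_child(int parent)
{
    assert(parent >= 0);
    return parent * 2 + 1;
}

int right_child(int parent)
{
    assert(parent >= 0);
    return parent * 2 + 2;
}

/* Only meaningful for child > 0: the root (index 0) has no parent.
   Integer division truncates, so indices 2p+1 and 2p+2 both map back to p. */
int parent_of(int child)
{
    assert(child > 0);
    return (child - 1) / 2;
}
```

For instance, the children of index 2 are at indices 5 and 6, and parent_of returns 2 for both of them.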

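PS: h is the height of the binary tree (the root is level 1), n is the number of nodes.

  • Level i holds at most 2^(i-1) nodes.
  • A binary tree of height h has at most 2^h-1 nodes.
  • For any non-empty binary tree, n0 = n2+1, where n0 is the number of leaf nodes and n2 is the number of nodes with degree 2.
  • A full binary tree with n nodes has height h = log2(n+1).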

Q: In a complete binary tree with 2k nodes, what is the number of leaf nodes?

A:  Let n0, n1 and n2 be the number of nodes with degree 0, 1 and 2. Then n0+n1+n2=2k, and n2=n0-1, so n0+(n0-1)+n1=2k;

       that is, 2n0-1+n1=2k. In a complete binary tree the number of degree-1 nodes (n1) is either 0 or 1. If n1 were 0, then n0=k+1/2, which is not an integer, and a node count must be an integer, so n1 is 1.

       So 2n0=2k and n0=k: the number of leaf nodes is k.
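The result can be checked by brute force on the array layout of a complete binary tree. The sketch below assumes the nodes sit at indices 0..n-1, so a node is a leaf exactly when its left-child index 2*i+1 falls outside the array; count_leaves and count_degree_one are hypothetical helper names.

```c
#include <assert.h>

/* Count the leaves of a complete binary tree with n nodes stored at indices 0..n-1. */
int count_leaves(int n)
{
    assert(n >= 0);
    int leaves = 0;
    for (int i = 0; i < n; i++) {
        if (2 * i + 1 >= n) {   /* no left child, hence no children at all */
            leaves++;
        }
    }
    return leaves;
}

/* Count the nodes with exactly one child (n1 in the answer above). */
int count_degree_one(int n)
{
    assert(n >= 0);
    int count = 0;
    for (int i = 0; i < n; i++) {
        int has_left = 2 * i + 1 < n;
        int has_right = 2 * i + 2 < n;
        if (has_left && !has_right) {
            count++;
        }
    }
    return count;
}
```

With n = 2k the first function returns k and the second returns 1, matching the derivation; with an odd node count such as 7 there is no degree-1 node.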

Sequential storage:

       Sequential storage keeps the nodes in an array. Positions of missing nodes must still be reserved, which wastes space, so this layout is best suited to full binary trees and complete binary trees. 

Chain storage:

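typedef int BTDataType;
// Binary linked structure
struct BinaryTreeNode
{ 
 struct BinaryTreeNode* _pLeft; // points to the left child of the current node
 struct BinaryTreeNode* _pRight; // points to the right child of the current node
 BTDataType _data; // value stored in the current node
};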

         The linked storage structure represents a binary tree with linked nodes: the pointers stored in each node express the logical relationship between elements.
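As a rough illustration of linked storage, the sketch below builds a four-node tree by hand and measures it. The typedef BTNode and the helpers BuyNode, SetChildren, TreeSize, TreeHeight and TreeDestroy are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef int BTDataType;

typedef struct BinaryTreeNode {
    struct BinaryTreeNode *_pLeft;   /* left child, or NULL */
    struct BinaryTreeNode *_pRight;  /* right child, or NULL */
    BTDataType _data;
} BTNode;

/* Allocate a node with no children; exits on allocation failure. */
BTNode *BuyNode(BTDataType x)
{
    BTNode *node = malloc(sizeof *node);
    if (node == NULL) {
        perror("malloc failed");
        exit(EXIT_FAILURE);
    }
    node->_pLeft = NULL;
    node->_pRight = NULL;
    node->_data = x;
    return node;
}

/* Attach the given subtrees (either may be NULL) to parent. */
void SetChildren(BTNode *parent, BTNode *left, BTNode *right)
{
    assert(parent != NULL);
    parent->_pLeft = left;
    parent->_pRight = right;
}

/* Number of nodes: the root plus the sizes of both subtrees. */
int TreeSize(const BTNode *root)
{
    return root == NULL ? 0 : 1 + TreeSize(root->_pLeft) + TreeSize(root->_pRight);
}

/* Height with the root counted as level 1; an empty tree has height 0. */
int TreeHeight(const BTNode *root)
{
    if (root == NULL) {
        return 0;
    }
    int l = TreeHeight(root->_pLeft);
    int r = TreeHeight(root->_pRight);
    return 1 + (l > r ? l : r);
}

/* Free the tree in post-order so children are released before their parent. */
void TreeDestroy(BTNode *root)
{
    if (root == NULL) {
        return;
    }
    TreeDestroy(root->_pLeft);
    TreeDestroy(root->_pRight);
    free(root);
}
```

Building a root 1 with children 2 and 3, then giving node 2 a left child 4, yields a tree of size 4 and height 3.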

 

3. Data structure - heap

Small root heap (min-heap) :

            1. It is a complete binary tree.

            2. Every parent in the tree is less than or equal to its children.

Big root heap (max-heap) :

            1. It is a complete binary tree.

            2. Every parent in the tree is greater than or equal to its children.

        Here an array is used to simulate the structure of the heap.

Insert data:

        Because the heap is stored in a sequence table (a dynamic array), the new value can simply be appended at the end physically. But how do we make the array a heap again logically?

        Now we give an array, which is logically regarded as a complete binary tree. Add a new piece of data, and use the upward adjustment algorithm to adjust it into a small heap.

int array[] = {10,15,19,18,28,34,65,49,25,37};

(A drawing is a quick way to simulate the heap.)

        Assume the array above is already a valid heap and a node 27 is inserted at the end; the heap property no longer holds, so the upward adjustment algorithm is needed.

        The inserted node is compared with its parent node. If the parent is larger, the two are exchanged. The comparison then continues with the new parent until a parent that is not larger is found or the root is reached.

 

        This way the heap lines up correctly.

PS: While adjusting, a node only needs to be compared with its ancestors (the nodes on its path to the root); the relative order of sibling nodes does not affect the heap.       

code:

// Upward adjustment (min-heap); Swap is defined in the full implementation
void AdjustUp(int* arr,int pos)
{

	int child = pos;
	int parent = (child - 1) / 2; // parent index formula

	while ( child > 0 ) {

		parent = (child - 1) / 2;
		
		if (arr[child] < arr[parent]) {
			Swap(&arr[child],&arr[parent]);
		}
		else {
			break;
		}

		child = parent;
	}
}

        At the beginning there is no data; each inserted number is adjusted upward in turn, and this is how the heap is built.

        The parent subscript is computed with integer division, so the fractional part is discarded. For example, the computed parent of both child nodes 5 and 6 is 2, exactly as required.

        In the worst case the new node is exchanged all the way up to the root, so the loop ends once the root is reached (child > 0 fails). Be careful not to use parent>=0 as the loop condition: when child is 0, parent = (0-1)/2, which is 0 in C, so parent never becomes negative and that condition can never end the loop.

PS: A small heap is created here. If you want to create a large heap, just replace the less-than sign in the parent-child comparison with a greater-than sign.
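The insertion of 27 can be traced with a small standalone sketch. It repeats the upward adjustment under hypothetical names (swap_int, sift_up, heap_insert) and assumes the caller has reserved room for one more element.

```c
#include <assert.h>

static void swap_int(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Move the element at index pos up until its parent is not larger (min-heap). */
void sift_up(int *arr, int pos)
{
    assert(arr != 0 && pos >= 0);
    int child = pos;
    while (child > 0) {                 /* stop at the root */
        int parent = (child - 1) / 2;
        if (arr[child] >= arr[parent]) {
            break;                      /* heap property already holds */
        }
        swap_int(&arr[child], &arr[parent]);
        child = parent;
    }
}

/* Append x to a min-heap holding *size elements, then restore the heap.
   Assumption: the array has capacity for at least *size + 1 elements. */
void heap_insert(int *arr, int *size, int x)
{
    assert(arr != 0 && size != 0 && *size >= 0);
    arr[*size] = x;
    sift_up(arr, *size);
    (*size)++;
}
```

Starting from {10,15,19,18,28,34,65,49,25,37}, 27 lands at index 10, is swapped with its parent 28 at index 4, and stops there because the next parent is 15. The array becomes {10,15,19,18,27,34,65,49,25,37,28}.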

Delete data:

        The root node of the heap cannot simply be removed directly, otherwise the logical structure breaks and the heap is no longer valid.

        First exchange the root node with the last leaf node, then remove the last element. Then adjust the new root node downward.

       The idea of downward adjustment: select the smaller child node; if it is smaller than the parent, exchange them and continue from the child's position, otherwise stop. (For a large heap, select the larger child and exchange if it is larger than the parent.)

        Not every node has a right child; if there is only a left child, compare with it alone. If the subscript of the child is beyond the size of the heap, the current node is already a leaf and the loop must end.

PS: The downward adjustment algorithm has a prerequisite: the left and right subtrees of the node being adjusted must already be heaps.

code:
void AdjustDown(int *arr ,int size,int parent)
{
	int child = parent * 2 + 1;

	while ( child < size ) {

		// Guard against out-of-bounds access: a left child does not guarantee a right child.
		if ( child + 1 < size && arr[child + 1] <arr[child]) {
			child++;
		}

		if (arr[child] < arr[parent]) {
			Swap(&arr[child], &arr[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else {
			break;
		}
	}
}
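Deleting the root can be traced the same way. The following standalone sketch uses hypothetical names (swap_int, sift_down, heap_pop) and works on a plain array plus a size counter.

```c
#include <assert.h>

static void swap_int(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Move the element at index parent down until neither child is smaller (min-heap). */
void sift_down(int *arr, int size, int parent)
{
    assert(arr != 0 && parent >= 0);
    int child = parent * 2 + 1;
    while (child < size) {
        /* pick the smaller child; the right child may not exist */
        if (child + 1 < size && arr[child + 1] < arr[child]) {
            child++;
        }
        if (arr[child] >= arr[parent]) {
            break;
        }
        swap_int(&arr[child], &arr[parent]);
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Remove and return the minimum of a non-empty min-heap. */
int heap_pop(int *arr, int *size)
{
    assert(arr != 0 && size != 0 && *size > 0);
    int top = arr[0];
    swap_int(&arr[0], &arr[*size - 1]);  /* old root parks in the last slot */
    (*size)--;                           /* ...and leaves the heap */
    sift_down(arr, *size, 0);
    return top;
}
```

For the heap {10,15,19,18,28,34,65,49,25,37}, the call returns 10. The value 37 moves to the root and sinks past 15, 18 and 25, leaving {15,18,19,25,28,34,65,49,37} in the first nine slots.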

        With insertion and deletion, the structure of the heap can be written.

 Heap code implementation:

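#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef int HPDataType;
typedef struct Heap
{
	HPDataType* _a;   // dynamically allocated array holding the heap
	int _size;        // number of elements currently stored
	int _capacity;    // number of elements the array can hold
}Heap;

// Declared up front because HeapCreate and HeapPop call them before their definitions
void HeapPush(Heap* hp, HPDataType x);
bool HeapEmpty(Heap* hp);

// Swap two values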
void Swap(HPDataType* a, HPDataType* b)
{
	HPDataType temp = *a;
	*a = *b;
	*b =  temp;
}

// Upward adjustment
void AdjustUp(int* arr,int pos)
{
	// Build a small heap: swap while the new element is smaller than its parent, stop otherwise.
	int child = pos;
	int parent = (child - 1) / 2;

	while ( child > 0 ) {

		parent = (child - 1) / 2;
		
		if (arr[child] < arr[parent]) {
			Swap(&arr[child],&arr[parent]);
		}
		else {
			break;
		}

		child = parent;
	}
}

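// Downward adjustment
void AdjustDown(int *arr ,int size,int parent)
{
	// Swap with the smaller of the two children.
	int child = parent * 2 + 1;

	while ( child < size ) {

		// Guard against out-of-bounds access: a left child does not guarantee a right child.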
		if ( child + 1 < size && arr[child + 1] <arr[child]) {
			child++;
		}

		if (arr[child] < arr[parent]) {
			Swap(&arr[child], &arr[parent]);
			parent = child;
			child = parent * 2 + 1;
		}
		else {
			break;
		}
	}

}

// Initialize
void HeapInit(Heap* hp)
{
	assert(hp);
	hp->_a = NULL;
	hp->_capacity = 0;
	hp->_size = 0;
}

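// Build a heap from the n elements of array a
void HeapCreate(Heap* hp, HPDataType* a, int n)
{
	assert(hp);
	assert(a);

	// Start from an empty heap so HeapPush never reallocs an uninitialized pointer
	HeapInit(hp);
	for (int i = 0; i < n;i++) {
		HeapPush(hp, a[i]);
	}
}

// Destroy the heap and release its array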
void HeapDestory(Heap* hp)
{
	assert(hp);
	free(hp->_a);
	hp->_a = NULL;
	hp->_capacity = 0;
	hp->_size = 0;
}

// Insert data
void HeapPush(Heap* hp, HPDataType x)
{
	assert(hp);
	int newCpacity = 0;

	// Grow the array when it is full: 4 slots at first, then double each time
	if (hp->_capacity==hp->_size) {
		newCpacity = hp->_capacity == 0 ? 4 : hp->_capacity * 2;
		HPDataType* temp=(HPDataType*)realloc(hp->_a, sizeof(HPDataType) * newCpacity);
		if (temp==NULL) {
			perror("realloc failed");
			exit(-1);
		}

		hp->_a = temp;
		hp->_capacity = newCpacity;
	}

	hp->_a[hp->_size] = x;
	hp->_size++;


	// Adjust the new element upward to restore the heap
	AdjustUp(hp->_a,hp->_size-1);
}

// Delete data (remove the root)
void HeapPop(Heap* hp)
{
	assert(hp);
	assert(!HeapEmpty(hp));

	// Swap the first and last elements, shrink the size by one, then adjust the new root downward.

	Swap(&hp->_a[0],&hp->_a[hp->_size-1]);
	hp->_size--;

	AdjustDown(hp->_a,hp->_size,0);

}

// Return the root element
HPDataType HeapTop(Heap* hp)
{
	assert(hp);
	assert(!HeapEmpty(hp));

	return hp->_a[0];
}

// Number of elements in the heap
int HeapSize(Heap* hp)
{
	assert(hp);
	return hp->_size;
}

// Whether the heap is empty
bool HeapEmpty(Heap* hp)
{
	assert(hp);
	return hp->_size==0;
}

test: 

       

        Draw a picture to quickly judge whether the structure is correct.

        Since the data is actually stored in a sequence table, most of the code resembles a sequence table implementation. What matters here is the logic of inserting and deleting data.

Heap sort:

        The advantage of the heap is that the smallest (or largest) element can be selected very quickly, so a heap can be used for sorting.

        The first version directly uses the heap written above.

void HeapSort(int* a, int n)
{
	Heap hp;
	HeapInit(&hp);
	for (int i = 0; i < n;i++) {
		HeapPush(&hp,a[i]);
	}

	int i = 0;
	while (!HeapEmpty(&hp)) {
		a[i++] = HeapTop(&hp);
		HeapPop(&hp);
	}

	// Release the temporary heap's array to avoid a memory leak
	HeapDestory(&hp);
}


         This way of writing has disadvantages. Heap sort does not need the whole heap data structure; an array plus the adjustment algorithms is enough to simulate the heap.

        Moreover, two arrays are used here, so the space complexity rises to O(n).

        Instead, the upward or downward adjustment algorithm can be applied directly to the array.

// Method 1:
// Adjust every element upward, starting from the second element of the array, to build the heap; the time complexity is O(n*logn)
for (int i = 1; i < n;i++) {
	  AdjustUp(a, i);
}

         Adjust each element of the incoming array upward in turn; afterwards the array has the structure of a heap, and it can then be sorted.

PS: To build a heap here, you need to adjust upwards from the second element until the end.

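// Method 2:
// Starting from the parent of the last leaf node, adjust downward toward the root to build the heap; the time complexity is O(n)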
for (int i = (n - 1 - 1) / 2; i >= 0; i--) {
	  AdjustDown(a, n, i);
}

        Downward adjustment requires the left and right subtrees of a node to be heaps already. Leaves are trivially heaps, so the first node that needs adjusting is the parent of the last leaf node. From there, move toward the root one index at a time until the whole array is a heap.

 PS: The subscript of the last node is the length of the array -1, and its parent node is the first node to start adjusting.
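Both ways of building the heap can be compared side by side. In this sketch build_heap_up, build_heap_down and is_min_heap are hypothetical names; the two builders may arrange the elements differently, but each result should satisfy the small-heap property.

```c
#include <assert.h>

static void swap_int(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

static void sift_up(int *a, int child)
{
    while (child > 0) {
        int parent = (child - 1) / 2;
        if (a[child] >= a[parent]) break;
        swap_int(&a[child], &a[parent]);
        child = parent;
    }
}

static void sift_down(int *a, int size, int parent)
{
    int child = parent * 2 + 1;
    while (child < size) {
        if (child + 1 < size && a[child + 1] < a[child]) child++;
        if (a[child] >= a[parent]) break;
        swap_int(&a[child], &a[parent]);
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Method 1: treat a[0] as a one-element heap and insert a[1..n-1] one by one. */
void build_heap_up(int *a, int n)
{
    assert(a != 0 || n == 0);
    for (int i = 1; i < n; i++) {
        sift_up(a, i);
    }
}

/* Method 2: start at the parent of the last element, (n-1-1)/2, and walk back to the root. */
void build_heap_down(int *a, int n)
{
    assert(a != 0 || n == 0);
    for (int i = (n - 1 - 1) / 2; i >= 0; i--) {
        sift_down(a, n, i);
    }
}

/* Returns 1 when no element is smaller than its parent, 0 otherwise. */
int is_min_heap(const int *a, int n)
{
    for (int i = 1; i < n; i++) {
        if (a[i] < a[(i - 1) / 2]) return 0;
    }
    return 1;
}
```

Running either builder on a descending array such as {65,49,37,34,28,25,19,18,15,10} leaves 10 at index 0 and makes is_min_heap return 1.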

        Finally, sort the heap in place, directly inside the array.

        The idea is to exchange the root with the last node and then shrink the heap by one (so the last node is no longer part of the heap). The value moved to the root breaks the heap, so adjust the new root downward, then repeat the exchange until only one element remains in the heap.

        Suppose a small heap is built: its root is the smallest number, which is moved to the end of the array; the heap is then repaired, the next smallest number is found and placed just before it, and so on. In the end the array is in descending order. (So a small heap produces a descending array, and a large heap produces an ascending array.)

void HeapSort(int* a, int n)
{
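    // i starts at the parent of the last leaf node
	for (int i = (n - 1 - 1) / 2; i >= 0; i--) {
		AdjustDown(a, n, i);
	}

	// Note: a small heap yields descending order, a large heap yields ascending order.
	// AdjustDown builds a small heap, so this function sorts the array in descending order.
	// Swap the first and last elements, then adjust the new root downward.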
	int size = n-1;
	while (size > 0) {
		Swap(&a[0], &a[size]);
		AdjustDown(a,size,0);
		size--;
	}

}
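For ascending order the text calls for a large heap. The sketch below is one possible variant, not the article's code: sift_down_max mirrors the downward adjustment with the comparisons flipped, and heap_sort_ascending is a hypothetical name.

```c
#include <assert.h>

static void swap_int(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Downward adjustment for a large heap: pick the larger child, swap if it beats the parent. */
static void sift_down_max(int *a, int size, int parent)
{
    int child = parent * 2 + 1;
    while (child < size) {
        if (child + 1 < size && a[child + 1] > a[child]) {
            child++;
        }
        if (a[child] <= a[parent]) {
            break;
        }
        swap_int(&a[child], &a[parent]);
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Sort a[0..n-1] into ascending order in place. */
void heap_sort_ascending(int *a, int n)
{
    assert(a != 0 || n == 0);

    /* Build a large heap bottom-up, starting at the parent of the last element. */
    for (int i = (n - 1 - 1) / 2; i >= 0; i--) {
        sift_down_max(a, n, i);
    }

    /* Move the current maximum to the end, shrink the heap, repair the root. */
    for (int end = n - 1; end > 0; end--) {
        swap_int(&a[0], &a[end]);
        sift_down_max(a, end, 0);
    }
}
```

Each pass parks the largest remaining value at the end of the shrinking heap, so the array fills from the back with 65, 49, 37 and so on, and ends up ascending.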

Time complexity of heap sort:

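Time complexity of building the heap by upward adjustment:

    Upward adjustment starts at the left child of the root (the root itself needs no adjustment) and then proceeds through the array in order. The last layer holds about half of all nodes, and each of them may climb the furthest, so it dominates the cost.

Total (worst-case) number of adjustments T(h) for a full binary tree of height h, where layer i (the root is layer 1) has 2^(i-1) nodes and each may move up at most i-1 layers:

    T(h) = 2^1*1 + 2^2*2 + 2^3*3 + ... + 2^(h-1)*(h-1)

Each term is a geometric factor times an arithmetic factor, so the sum is simplified with the shifted subtraction method: multiply by 2 and subtract the original sum.

    2*T(h) = 2^2*1 + 2^3*2 + ... + 2^(h-1)*(h-2) + 2^h*(h-1)

    T(h) = 2*T(h) - T(h) = -(2^1 + 2^2 + ... + 2^(h-1)) + 2^h*(h-1)

 The bracketed part is a geometric series, which sums to 2^h-2 by the geometric formula:

    T(h) = 2^h*(h-1) - (2^h-2) = 2^h*(h-2) + 2

The number of nodes in a full binary tree is n = 2^h-1, so the height of the binary tree is h = log2(n+1), and: 

    T(n) = (n+1)*(log2(n+1) - 2) + 2

 Here the n*logn term is the biggest influencing factor, so the time complexity is O(n*logn);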

Time complexity of building the heap by downward adjustment:

        Downward adjustment starts at the last non-leaf node and then moves toward the root. So the nodes of layer h need no adjustment, and the work starts from layer h-1.

        Layer 1 has 2^0 nodes, and each can move down at most h-1 layers. Layer h-1 has 2^(h-2) nodes, and each can move down at most 1 layer.

Total (worst-case) number of adjustments T(h): 

        T(h) = 2^0*(h-1) + 2^1*(h-2) + ... + 2^(h-3)*2 + 2^(h-2)*1

Each term is again a geometric factor times an arithmetic factor, so the shifted subtraction method applies:

        2*T(h) = 2^1*(h-1) + 2^2*(h-2) + ... + 2^(h-2)*2 + 2^(h-1)*1

        T(h) = 2*T(h) - T(h) = (1-h) + 2^1 + 2^2 + ... + 2^(h-2) + 2^(h-1)

        The first term is 0-1*(h-1), which gives 1-h. The coefficients of the middle terms all reduce to 1. The last term is 2^(h-1)-0, which gives 2^(h-1).

         The powers of two form a geometric series, which sums to 2^h-2 by the geometric formula:

        T(h) = 2^h - 2 + 1 - h = 2^h - 1 - h

        The number of nodes in a full binary tree is n = 2^h-1, so the height of the binary tree is h = log2(n+1), and: 

        T(n) = n - log2(n+1)

         Here n is the biggest influencing factor, so the time complexity is O(n).
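The two worst-case totals can be checked numerically. The sketch below is an assumption-laden reconstruction rather than the article's own working: it sums the per-layer costs for a full binary tree of height h and compares them with the closed forms 2^h*(h-2)+2 for upward building and 2^h-1-h for downward building. All four function names are hypothetical.

```c
#include <assert.h>

/* Upward building: layer i (root = layer 1) has 2^(i-1) nodes, each climbing at most i-1 layers. */
long long sift_up_build_cost(int h)
{
    assert(h >= 1 && h <= 40);
    long long total = 0;
    for (int layer = 2; layer <= h; layer++) {
        total += (1LL << (layer - 1)) * (layer - 1);
    }
    return total;
}

/* Downward building: layer i has 2^(i-1) nodes, each sinking at most h-i layers. */
long long sift_down_build_cost(int h)
{
    assert(h >= 1 && h <= 40);
    long long total = 0;
    for (int layer = 1; layer <= h - 1; layer++) {
        total += (1LL << (layer - 1)) * (h - layer);
    }
    return total;
}

/* Closed form obtained by shifted subtraction for the upward sum. */
long long sift_up_closed_form(int h)
{
    return (1LL << h) * (h - 2) + 2;
}

/* Closed form obtained by shifted subtraction for the downward sum. */
long long sift_down_closed_form(int h)
{
    return (1LL << h) - 1 - h;
}
```

For h = 3 (7 nodes) the upward total is 10 and the downward total is 4; the gap widens quickly as h grows, which is the difference between O(n*logn) and O(n).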


Extension:

TOP-K problem: find the K largest or K smallest elements of a data set, where the amount of data is generally large.

Ideas:

        Build a heap from the first K elements. To find the K largest, build a small heap. Then compare each remaining element with the root: if it is larger than the root, replace the root with it and adjust downward. When all elements have been compared, the heap holds the K largest numbers. Only K elements are kept in the heap at any time, so the method stays efficient for large data sets.

PS: To find the largest K, build a small heap and admit elements larger than the root; to find the smallest K, build a large heap and admit elements smaller than the root.

code:

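// Requires <assert.h>, <stdio.h>, <stdlib.h> and the AdjustDown function above
void PrintTopK(int* a, int n, int k)
{
	assert(a);
	assert(k > 0 && k <= n);

	// Copy the first K elements to build the heap.
	int* b = (int*)malloc(sizeof(int)*k);
	if (b == NULL) {
		perror("malloc failed");
		exit(-1);
	}
	for (int j = 0; j < k;j++) {
		b[j] = a[j];
	}

	// PS: to select the K largest numbers, use a small heap
	for (int i = (k - 1 - 1) / 2; i >= 0;i--) {
		AdjustDown(b,k,i);
	}

	for (int p = k; p < n; p++) {
		if (a[p] > b[0]) {
			b[0] = a[p];
			AdjustDown(b, k, 0); // re-adjust after every replacement
		}
	}

	for (int y = 0; y < k;y++) {
		printf("%d\n",b[y]);
	}

	free(b);
}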

// Requires <time.h> for time()
void TestTopk()
{
	int n = 10000;
	int* a = (int*)malloc(sizeof(int) * n);
	if (a == NULL) {
		perror("malloc failed");
		exit(-1);
	}
	srand((unsigned)time(NULL));
	for (int i = 0; i < n;i++) {
		a[i] = rand() % 10000;
	}
	// Plant ten values above 10000 so the expected top 10 is known in advance
	a[1500] = 10000 + 1;
	a[1231] = 10000 + 2;
	a[531] = 10000 + 3;
	a[100] = 10000 + 4;
	a[1000] = 10000 + 5;
	a[2000] = 10000 + 6;
	a[3000] = 10000 + 7;
	a[4000] = 10000 + 8;
	a[399] = 10000 + 9;
	a[1999] = 10000 + 10;

	PrintTopK(a,n,10);
	free(a);
}


int main()
{
    TestTopk();
    return 0;
}
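The mirror case from the PS above, finding the K smallest with a large heap, might look like the following sketch. The names smallest_k and sift_down_max are hypothetical, and the result is written to a caller-provided buffer in heap order rather than printed.

```c
#include <assert.h>

static void swap_int(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Downward adjustment for a large heap: pick the larger child, swap if it beats the parent. */
static void sift_down_max(int *a, int size, int parent)
{
    int child = parent * 2 + 1;
    while (child < size) {
        if (child + 1 < size && a[child + 1] > a[child]) {
            child++;
        }
        if (a[child] <= a[parent]) {
            break;
        }
        swap_int(&a[child], &a[parent]);
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Write the k smallest values of a[0..n-1] into out[0..k-1].
   out ends up as a large heap, so out[0] is the largest of the k smallest. */
void smallest_k(const int *a, int n, int k, int *out)
{
    assert(a != 0 && out != 0);
    assert(k > 0 && k <= n);

    /* Seed the heap with the first k elements. */
    for (int j = 0; j < k; j++) {
        out[j] = a[j];
    }
    for (int i = (k - 1 - 1) / 2; i >= 0; i--) {
        sift_down_max(out, k, i);
    }

    /* Admit only elements smaller than the current root. */
    for (int p = k; p < n; p++) {
        if (a[p] < out[0]) {
            out[0] = a[p];
            sift_down_max(out, k, 0);
        }
    }
}
```

With the input {9,4,7,1,8,2,6,3,5,0} and k = 3, the buffer ends up holding 0, 1 and 2 with 2 at the root.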


Origin blog.csdn.net/weixin_45423515/article/details/124916001