Foreword
1. Parent representation
Since each node has exactly one parent node (except the root, which has none), a tree can be represented by recording the parent of every node. Concretely, this is done with an array.
Index rules
- The index of the root node is 0
- Nodes are numbered layer by layer, from top to bottom
- Within each layer, indexes increase from left to right
Representation:
Storage method
- A two-dimensional array with one row per node
- Column 0 stores the data, so the row index alone is enough to locate a node
- Column 1 stores the parent, recorded as the row index of the parent node
- The parent index of the root node is -1
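As a rough illustration of the parent representation, the sketch below stores each node as one row of a two-column array and walks the parent links to find a node's depth. The function name `DepthOf` and the sample tree are assumptions made for this example, not part of the original article.

```c
#include <assert.h>

/* Row i describes node i: column 0 is the data, column 1 is the row
   index of the parent (-1 for the root). */
int DepthOf(int tree[][2], int i)
{
    int depth = 0;
    assert(i >= 0);
    while (tree[i][1] != -1) {   /* follow parent links up to the root */
        i = tree[i][1];
        depth++;
    }
    return depth;
}
```

For a tree declared as `int t[][2] = {{10,-1},{20,0},{30,0},{40,1}};`, node 3 has parent 1, whose parent is the root, so its depth is 2.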
2. Complete binary tree conclusions
Computing the child nodes in a complete binary tree
- Premise: the child in question exists (its index is less than the number of nodes)
- Let the index of the node be N
- Left child index = 2*N+1
- Right child index = 2*N+2
Computing the parent node in a complete binary tree
- Premise: the node is not the root, so its parent exists
- Let the index of the node be N
- Parent node index = (N-1)/2, using integer division
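The index relationships above can be written as three small helpers. This is only a sketch; the names `LeftChild`, `RightChild` and `Parent` are made up for the example.

```c
#include <assert.h>

/* Index of the left child of node n in a 0-based complete binary tree. */
int LeftChild(int n)  { return 2 * n + 1; }

/* Index of the right child of node n. */
int RightChild(int n) { return 2 * n + 2; }

/* Index of the parent of node n; the root (n == 0) has no parent. */
int Parent(int n)
{
    assert(n > 0);
    return (n - 1) / 2;          /* integer division rounds down here */
}
```

Both children map back to the same parent: nodes 5 and 6 are the children of node 2, and `(5 - 1) / 2` and `(6 - 1) / 2` both give 2.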
1. Sequential structure
1. Concept
- The sequential structure stores the tree in an array
- Storing an ordinary binary tree this way is inconvenient
- A heap is a complete binary tree, which makes it well suited to storage in a one-dimensional array
Note: an ordinary binary tree can also be stored in a one-dimensional array, but every missing node still occupies a slot, so space is badly wasted.
Graphic:
Description: the deeper the tree, the more space is wasted.
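To get a feel for how quickly the waste grows, consider the worst case: a tree in which every node has only a right child. The sketch below computes how many array slots such a chain needs under the index rule "right child = 2*N+2". The function name `SlotsForRightChain` is an assumption for this example.

```c
#include <assert.h>

/* Array slots needed to store a chain of n nodes in which every node
   has only a right child (right child of index i is 2*i+2). */
long SlotsForRightChain(int n)
{
    long last = 0;                /* index of the root */
    assert(n >= 1 && n <= 30);    /* keep the result within a 32-bit long */
    for (int i = 1; i < n; i++) {
        last = 2 * last + 2;      /* step down to the right child */
    }
    return last + 1;              /* indexes 0..last must all exist */
}
```

A chain of only 10 nodes already needs 1023 slots, that is 2^10 - 1, so all but 10 of them are empty.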
Supplement
- Logical structure: how people understand the data; an abstract model.
- Physical (storage) structure: how the computer represents the data; the mapping of that model in a programming language.
- Therefore: binary trees do not have to be stored with pointers.
2. Heap
Concept:
If there is a set of key codes [1] $K = \{k_0, k_1, k_2, \ldots\}$ whose elements are all stored in a one-dimensional array in the order of a complete binary tree, and they satisfy $k_i \le k_{2i+1}$ and $k_i \le k_{2i+2}$ (or $k_i \ge k_{2i+1}$ and $k_i \ge k_{2i+2}$) for $i = 0, 1, 2, \ldots$, then the array is called a small (or large) heap. Here $i$ is the index of the node. A heap whose root node is the largest is called a max-heap or large root heap, and a heap whose root node is the smallest is called a min-heap or small root heap.
Small root heap
- The root node of every subtree is no larger than its child nodes
- A tree can be divided into a root and its subtrees, and each subtree is again a small root heap
Large root heap
- The root node of every subtree is no smaller than its child nodes
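The definition can be checked mechanically: an array is a heap exactly when every non-root element stands in the right order relative to its parent at index (i-1)/2. The following sketch assumes `int` elements, and the names `IsMinHeap` and `IsMaxHeap` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* True if every element is >= its parent (small root heap). */
bool IsMinHeap(const int *arr, int size)
{
    for (int i = 1; i < size; i++) {
        if (arr[(i - 1) / 2] > arr[i]) {
            return false;
        }
    }
    return true;
}

/* True if every element is <= its parent (large root heap). */
bool IsMaxHeap(const int *arr, int size)
{
    for (int i = 1; i < size; i++) {
        if (arr[(i - 1) / 2] < arr[i]) {
            return false;
        }
    }
    return true;
}
```

For example, `{1, 3, 2, 7, 5}` is a small root heap and `{9, 7, 3, 1, 5}` is a large root heap. Note that neither array is fully sorted; a heap only orders each parent against its own children.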
3. Implementation of the heap
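- An array, i.e. a sequence table (dynamic array)
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

// Large root heap (max-heap)
typedef int HPDateType;
typedef struct Heap
{
    HPDateType* arr; // dynamically allocated storage
    int size;        // number of elements currently stored
    int capacity;    // number of elements arr can hold
}Heap;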
- Key idea: inserting and deleting data
- Inserting data: append at the bottom of the heap, then adjust upward
- Deleting data: remove the top of the heap, then adjust downward
What I implement here is a large root heap.
1. Initialize the heap
There are two possible conventions for size:
- size starts at 0: it is the number of elements currently stored, and also the index where the next element goes
- size starts at -1: it is the index of the current last element
The code here uses the first convention.
void HeapCreate(Heap* hp)
{
    HPDateType* tmp = (HPDateType*)malloc(sizeof(HPDateType) * 4);
    if (tmp == NULL)
    {
        perror("malloc fail");
        exit(-1);
    }
    hp->arr = tmp;
    hp->size = 0;
    hp->capacity = 4;
}
Helper function - swap elements
- Remember to pass a pointer (here, the array itself), otherwise only local copies are swapped!
void Swap(HPDateType* arr, int child, int parent)
{
    HPDateType tmp = arr[child];
    arr[child] = arr[parent];
    arr[parent] = tmp;
}
2. Build the heap - insert data
- Insert at the bottom of the heap, then adjust the data upward
- Use the index relationship between child and parent in a complete binary tree to find what to compare
- The new node is compared only with its ancestors, and stops when it is no greater than an ancestor or when it reaches the root.
Example: insert 1, 9, 3, 5, 7 into the heap in turn
Process diagram:
Upward adjustment code:
void AdjustUp(HPDateType* arr, int child)
{
    int parent = (child - 1) / 2;
    while (child > 0)
    {
        if (arr[parent] < arr[child]) // keep comparing with ancestors
        {
            Swap(arr, child, parent);
            child = parent;
            parent = (child - 1) / 2;
        }
        else // stop when the node is no greater than its ancestor (the loop ends at the root)
        {
            break;
        }
    }
}
Add data to the heap
void HeapPush(Heap* hp, HPDateType x)
{
    assert(hp);
    if (hp->size == hp->capacity) // grow the array when it is full
    {
        HPDateType* tmp = (HPDateType*)realloc(hp->arr,
            sizeof(HPDateType) * hp->capacity * 2);
        if (tmp == NULL)
        {
            perror("realloc fail");
            exit(-1);
        }
        hp->arr = tmp;
        hp->capacity *= 2;
    }
    hp->arr[hp->size] = x;       // place the new element at the bottom of the heap
    AdjustUp(hp->arr, hp->size); // restore the heap property
    hp->size++;
}
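To make the insertion example concrete, here is a small self-contained sketch that pushes 1, 9, 3, 5, 7 one at a time into a fixed array using the same upward adjustment. The names `SiftUp` and `BuildByPushing` are assumptions for this example, and the fixed-size array replaces the growable `Heap` struct to keep the sketch short.

```c
#include <assert.h>

/* Upward adjustment for a large root heap, same idea as AdjustUp. */
static void SiftUp(int *arr, int child)
{
    while (child > 0) {
        int parent = (child - 1) / 2;
        if (arr[parent] >= arr[child]) {
            break;                        /* no greater than the ancestor */
        }
        int tmp = arr[parent];
        arr[parent] = arr[child];
        arr[child] = tmp;
        child = parent;
    }
}

/* Insert values[0..n-1] one by one; heap must have room for n elements. */
void BuildByPushing(const int *values, int n, int *heap)
{
    for (int i = 0; i < n; i++) {
        heap[i] = values[i];              /* append at the bottom */
        SiftUp(heap, i);                  /* then adjust upward */
    }
}
```

Tracing the array after each insertion gives [1], [9,1], [9,1,3], [9,5,3,1] and finally [9,7,3,1,5]: 5 rises one layer past 1, and 7 rises one layer past 5, but neither passes 9.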
3. Delete data
- Deletion removes the data at the top of the heap, since what we usually need is the largest or smallest value
- Do not delete the top by shifting the remaining data forward; that destroys the structure of the heap
- Instead, swap the data at the top of the heap with the data at the bottom, then reduce size by one to achieve the effect of deletion
- Use the index relationship between parent and children to locate the left and right child nodes
- Starting from the root, swap downward with the larger child until neither child is bigger or the end of the data is reached
- When size is 0 there is nothing left to delete!
Dynamic Diagram:
Process Diagram:
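Downward adjustment
void AdjustDown(HPDateType* arr, int parent, int size)
{
    // Assume the left child is the larger child
    int child = parent * 2 + 1;
    // size is the number of elements after the deletion,
    // i.e. one past the index of the last element
    while (child < size)
    {
        // Pick the larger of the two children
        if (child + 1 < size && arr[child + 1] > arr[child])
        {
            child++;
        }
        // Wrong version:
        // if (arr[child + 1] > arr[child] && child + 1 < size)
        // {
        //     child++;
        // }
        // Note: && evaluates from left to right, so the wrong version reads
        // arr[child + 1] out of bounds before the range check runs.
        // Checking after the mistake has been made is too late.
        if (arr[child] > arr[parent])
        {
            Swap(arr, child, parent);
            parent = child;
            child = parent * 2 + 1;
        }
        else
        {
            break;
        }
    }
}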
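Delete heap data
void HeapPop(Heap* hp)
{
    assert(hp);
    assert(hp->size > 0);             // cannot pop from an empty heap
    Swap(hp->arr, hp->size - 1, 0);   // move the top to the bottom
    hp->size--;                       // drop it from the heap
    AdjustDown(hp->arr, 0, hp->size); // restore the heap property from the root
}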
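As a quick check of the deletion logic, the sketch below repeatedly removes the top of a large root heap held in a plain array. The names `SiftDown` and `PopMax` are assumptions for this example. Unlike `HeapPop`, it copies the last element over the root instead of swapping, which is enough when the removed value is returned to the caller.

```c
#include <assert.h>

/* Downward adjustment for a large root heap, same idea as AdjustDown. */
static void SiftDown(int *arr, int parent, int size)
{
    int child = parent * 2 + 1;
    while (child < size) {
        if (child + 1 < size && arr[child + 1] > arr[child]) {
            child++;                      /* pick the larger child */
        }
        if (arr[child] <= arr[parent]) {
            break;
        }
        int tmp = arr[parent];
        arr[parent] = arr[child];
        arr[child] = tmp;
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Remove and return the largest element; *size shrinks by one. */
int PopMax(int *heap, int *size)
{
    assert(heap && size && *size > 0);
    int top = heap[0];
    heap[0] = heap[*size - 1];            /* bottom element replaces the top */
    (*size)--;
    SiftDown(heap, 0, *size);
    return top;
}
```

Starting from the heap [9,7,3,1,5], five calls to `PopMax` return 9, 7, 5, 3, 1 in that order, which is exactly the property heap sort relies on.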
4. Take the top element of the heap
- This only reads the data; nothing is removed
- The pointer passed in must not be NULL, and the heap must not be empty!
// Get the top element of the heap
HPDateType HeapTop(Heap* hp)
{
    assert(hp);
    assert(hp->size > 0);
    return hp->arr[0];
}
4. Heap sort
- The data in the array is out of order, so first build a heap
- To sort the array in place, use a large root heap for ascending order and a small root heap for descending order
- No extra space is used
Building the heap bottom-up (with downward adjustment)
- Start from the penultimate layer, at the last node that has a child
- Each node is compared with the larger of its child nodes
- Stop when the node is not less than that child, or when the last layer is reached
Dynamic Diagram:
Process Diagram:
Time complexity - building the heap bottom-up
- Let the height be $h$
- Nodes are adjusted from layer $h-1$ up to layer 1
- Each node in layer $h-1$ is compared at most 1 time, and the node in layer 1 is compared at most $h-1$ times
Layer | Maximum number of comparisons | Number of nodes |
---|---|---|
1 | $h-1$ | $2^0$ |
2 | $h-2$ | $2^1$ |
…… | …… | …… |
$h-1$ | 1 | $2^{h-2}$ |
Total times
- Suppose a total of $T(N)$ comparisons are made, and the total number of binary tree nodes is $N$
- The method used: shifted subtraction (multiply the sum by 2, then subtract)
- $T(N) = 2^0(h-1) + 2^1(h-2) + \cdots + 2^{h-2} \cdot 1$
- $2T(N) = 2^1(h-1) + 2^2(h-2) + \cdots + 2^{h-2} \cdot 2 + 2^{h-1} \cdot 1$
- Subtract the first expression from the second expression.
- Get: $T(N) = -2^0(h-1) + 2^1 + 2^2 + \cdots + 2^{h-2} + 2^{h-1}$
- Arranged, using $-(h-1) = 2^0 - h$: $T(N) = 2^0 + 2^1 + 2^2 + \cdots + 2^{h-1} - h$
- According to the sum of the first $n$ terms of a geometric sequence: $S_n = a_1(1-q^n)/(1-q)$, where $a_1$ is the first term and $q$ is the ratio
- Here $q = 2$, $a_1 = 1$ and $n = h$, so the geometric part sums to $2^h - 1$
- Therefore: $T(N) = 2^h - 1 - h$
- And because the number of nodes is $N = 2^h - 1$ (full binary tree), $h = \log_2(N+1)$
- So $T(N) = N - \log_2(N+1)$; as $N$ grows, the logarithmic term can be ignored.
- Therefore: the time complexity is $O(N)$, where $N$ is the number of nodes
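The closed form can be sanity-checked numerically by adding up the table row by row. The sketch below does that for a full binary tree of height h; the function name `BottomUpBuildCost` is an assumption for this example.

```c
#include <assert.h>

/* Sum over layers k = 1..h-1 of (nodes on layer k) * (max comparisons
   for one node on layer k), i.e. 2^(k-1) * (h-k). */
long BottomUpBuildCost(int h)
{
    long total = 0;
    assert(h >= 1 && h <= 25);           /* keep the sum within a 32-bit long */
    for (int k = 1; k <= h - 1; k++) {
        long nodes = 1L << (k - 1);      /* 2^(k-1) nodes on layer k */
        total += nodes * (h - k);
    }
    return total;
}
```

For h = 3 the sum is 1*2 + 2*1 = 4, which matches 2^3 - 1 - 3 = 4.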
Code implementation:
void AdjustDown(HPDateType* arr, int parent, int size)
{
    // Left child; assume the left child is the larger child
    int child = parent * 2 + 1;
    while (child < size)
    {
        // Pick the larger of the two children
        if (child + 1 < size && arr[child + 1] > arr[child])
        {
            child++;
        }
        if (arr[child] > arr[parent])
        {
            Swap(arr, child, parent); // defined earlier
            parent = child;
            child = parent * 2 + 1;
        }
        else
        {
            break;
        }
    }
}
void HeapCreat(int* arr, int size)
{
    // Downward adjustment
    // (size - 1) is the index of the last node, so ((size - 1) - 1) / 2 is its parent:
    // the last node of the penultimate layer that has a child
    for (int i = ((size - 1) - 1) / 2; i >= 0; i--)
    {
        AdjustDown(arr, i, size);
    }
}
Building the heap top-down (with upward adjustment)
- Go from the second layer to the last layer
- Each node is compared with its parent node
- Stop when the node is not greater than its parent, or when the root is reached
(Figure omitted.)
Time complexity - building the heap top-down
- Let the height be $h$
- Nodes are adjusted from layer 2 down to layer $h$
- Each node in layer $h$ is compared at most $h-1$ times, and each node in layer 2 is compared at most 1 time
Layer | Maximum number of comparisons | Number of nodes |
---|---|---|
2 | 1 | $2^1$ |
3 | 2 | $2^2$ |
…… | …… | …… |
$h$ | $h-1$ | $2^{h-1}$ |
- Method: same as above
- Conclusion: $T(N) = 2^h(h-2) + 2$
- Combined with the number of nodes $N = 2^h - 1$ (full binary tree), so $2^h = N+1$ and $h = \log_2(N+1)$
- Arranged: $T(N) = (N+1)(\log_2(N+1) - 2) + 2$
- When $N$ tends to infinity, the magnitude of $T(N)$ tends to $N \log_2 N$
- Therefore: the time complexity is $O(N \log_2 N)$
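The worst case for top-down building is easy to provoke: with ascending input, every newly added element is larger than everything already in the heap and climbs all the way to the root. The sketch below counts the swaps in that situation; the function name `CountSwapsTopDown` is an assumption for this example, and it counts swaps rather than comparisons.

```c
#include <assert.h>

/* Build a large root heap by upward adjustment and return the number
   of swaps that were needed. */
long CountSwapsTopDown(int *arr, int size)
{
    long swaps = 0;
    for (int i = 1; i < size; i++) {
        int child = i;
        while (child > 0) {
            int parent = (child - 1) / 2;
            if (arr[parent] >= arr[child]) {
                break;
            }
            int tmp = arr[parent];
            arr[parent] = arr[child];
            arr[child] = tmp;
            swaps++;
            child = parent;
        }
    }
    return swaps;
}
```

For the input 1..7 (h = 3) this gives 10 swaps, and for 1..15 (h = 4) it gives 34, both equal to 2^h(h-2) + 2.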
Code implementation:
void AdjustUp(HPDateType* arr, int child)
{
    int parent = (child - 1) / 2;
    while (child > 0)
    {
        if (arr[parent] < arr[child])
        {
            Swap(arr, child, parent);
            child = parent;
            parent = (child - 1) / 2;
        }
        else
        {
            break;
        }
    }
}
void HeapCreat(int* arr, int size)
{
    // Upward adjustment: treat each element in turn as a newly inserted node
    for (int i = 1; i < size; i++)
    {
        AdjustUp(arr, i);
    }
}
Sorting idea
- Swap the data at the top of the heap with the data at the bottom, without deleting anything
- Reduce size by 1, so the swapped-out value stays in the array but is no longer part of the heap
- Adjust the remaining heap downward from the root, then repeat
Dynamic Diagram:
Process Diagram:
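Code implementation:
void HeapSort(int* arr, int size)
{
    // Step 1: build the heap
    // A large root heap gives ascending order
    // Option 1: upward adjustment
    //for (int i = 1; i < size; i++)
    //{
    //    AdjustUp(arr, i);
    //}
    // Option 2: downward adjustment
    for (int i = ((size - 1) - 1) / 2; i >= 0; i--)
    {
        AdjustDown(arr, i, size);
    }
    int tmp = size - 1; // index of the last element
    // Step 2: time complexity n*logn
    while (tmp > 0)
    {
        Swap(arr, tmp, 0); // move the current maximum to the end
        tmp--;
        AdjustDown(arr, 0, tmp + 1); // tmp + 1 is the number of elements still in the heap
    }
}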
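For readers who want to try the sort without assembling the helper functions from the earlier sections, here is a compact self-contained sketch of the same idea: build a large root heap bottom-up, then repeatedly move the maximum to the end. The names `SiftDownMax` and `HeapSortAscending` are assumptions for this example.

```c
#include <assert.h>

/* Downward adjustment for a large root heap over arr[0..size-1]. */
static void SiftDownMax(int *arr, int parent, int size)
{
    int child = parent * 2 + 1;
    while (child < size) {
        if (child + 1 < size && arr[child + 1] > arr[child]) {
            child++;
        }
        if (arr[child] <= arr[parent]) {
            break;
        }
        int tmp = arr[parent];
        arr[parent] = arr[child];
        arr[child] = tmp;
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Sort arr[0..size-1] in ascending order, in place. */
void HeapSortAscending(int *arr, int size)
{
    assert(size >= 0);
    for (int i = (size - 2) / 2; i >= 0; i--) {   /* build the heap */
        SiftDownMax(arr, i, size);
    }
    for (int end = size - 1; end > 0; end--) {
        int tmp = arr[0];                         /* maximum goes to the end */
        arr[0] = arr[end];
        arr[end] = tmp;
        SiftDownMax(arr, 0, end);                 /* heap now has `end` elements */
    }
}
```

Sorting `{3, 1, 4, 1, 5, 9, 2, 6}` this way yields `{1, 1, 2, 3, 4, 5, 6, 9}`. Arrays of size 0 or 1 are left untouched because both loops are skipped or do nothing.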
5. TopK problem
- Purpose: pick the K largest (or K smallest) numbers out of massive data
- Memory available: about 2GB
- Limitation: assuming 4-byte integers, 2GB can hold at most about 500 million integers (ideal conditions)
- Breakthrough: a 512GB hard disk can hold more than 100 billion 4-byte integers (ideal conditions), which is enough, so the data stays on disk and is read a little at a time
- Idea: take the first K pieces of data, build a small root heap (or large root heap) from them, then keep reading the rest of the data from the hard disk and comparing each value with the top of the heap.
- Explanation: to get the K largest values, use a small root heap of size K. Its top is the smallest of the K largest values seen so far and acts as the threshold: any value larger than the top replaces it and is adjusted downward. In the end the heap holds the K largest values, and the top of the heap is the K-th largest. For the K smallest values, do the reverse with a large root heap.
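Before bringing files into it, the idea can be tried on an in-memory array. The following sketch keeps the K largest values in a small root heap of size K; the names `SiftDownMin` and `TopKLargest` are assumptions for this example, and the input array stands in for the data on disk.

```c
#include <assert.h>

/* Downward adjustment for a small root heap over arr[0..size-1]. */
static void SiftDownMin(int *arr, int parent, int size)
{
    int child = parent * 2 + 1;
    while (child < size) {
        if (child + 1 < size && arr[child + 1] < arr[child]) {
            child++;                              /* pick the smaller child */
        }
        if (arr[child] >= arr[parent]) {
            break;
        }
        int tmp = arr[parent];
        arr[parent] = arr[child];
        arr[child] = tmp;
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Leave the k largest values of data[0..n-1] in heap[0..k-1].
   heap[0] ends up as the k-th largest value. */
void TopKLargest(const int *data, int n, int k, int *heap)
{
    assert(k > 0 && k <= n);
    for (int i = 0; i < k; i++) {
        heap[i] = data[i];                        /* first k values */
    }
    for (int i = (k - 2) / 2; i >= 0; i--) {
        SiftDownMin(heap, i, k);                  /* build the small root heap */
    }
    for (int i = k; i < n; i++) {
        if (data[i] > heap[0]) {                  /* beats the current threshold */
            heap[0] = data[i];
            SiftDownMin(heap, 0, k);
        }
    }
}
```

With the data `{5, 1, 9, 3, 7, 8, 2, 6}` and K = 3, the heap ends as `{7, 8, 9}`, with 7 (the third largest) on top. Each value costs at most one comparison plus an O(log K) adjustment, so only K values ever need to be in memory.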
What I implement here: from 1,000,000 values, take out the 10 largest.
- Create a text file in the directory of the source file
- Write one million numbers (each less than 10000) to the text file
Functions used:
- fopen
  - Return value: a FILE* pointer (NULL on failure)
  - Parameter 1: file name - const char*
  - Parameter 2: open mode - const char*; write ("w") when creating the file, read ("r") when reading it back
- fprintf
  - Return value: the number of characters written (negative on error)
  - Parameter 1: file pointer - FILE*
  - Parameter 2: format string - const char*
  - Parameter 3: variable argument list - the data to write
// Needs <stdio.h>, <stdlib.h> and <time.h>
void DatasCreat()
{
    FILE* p = fopen("datas.txt", "w");
    if (p == NULL)
    {
        perror("fopen fail");
        exit(-1);
    }
    srand((unsigned int)time(NULL)); // seed the random number generator
    int i = 0;
    for (i = 0; i < 1000000; i++)
    {
        int ret = rand() % 10000; // produces a number from 0 to 9999
        fprintf(p, "%d\n", ret);
    }
    fclose(p); // close the file when done
}
- Manually change 10 of the values in the text file to numbers greater than 10000, so the expected result is known in advance
Note: comment out the call to this function after using it! Running it again regenerates the file and overwrites the data, don't ask me how I know.
- Take out the first 10 elements in the file and build a small root heap
Here are the function declaration and the functions used to build the small root heap:
void DataSort(const char* fname, int k);
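// Small root heap
void AdjustUpTwo(HPDateType* arr, int child)
{
    int parent = (child - 1) / 2;
    while (child > 0)
    {
        if (arr[parent] > arr[child])
        {
            Swap(arr, child, parent);
            child = parent;
            parent = (child - 1) / 2;
        }
        else
        {
            break;
        }
    }
}
// AdjustDownTwo is called later but its body is not shown in the original;
// this version is assumed to mirror AdjustDown with the comparisons reversed.
void AdjustDownTwo(HPDateType* arr, int parent, int size)
{
    int child = parent * 2 + 1;
    while (child < size)
    {
        if (child + 1 < size && arr[child + 1] < arr[child])
        {
            child++;
        }
        if (arr[child] < arr[parent])
        {
            Swap(arr, child, parent);
            parent = child;
            child = parent * 2 + 1;
        }
        else
        {
            break;
        }
    }
}
void HeapCreat(int* arr, int size)
{
    // Upward adjustment
    for (int i = 1; i < size; i++)
    {
        AdjustUpTwo(arr, i);
    }
}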
- fscanf
  - End-of-input flag: EOF
  - Return value: the number of items successfully read and assigned (EOF if the input ends before any item is read)
  - Parameter 1: file pointer - FILE*
  - Parameter 2: format string - const char*
  - Parameter 3: the address of the variable to read into
FILE* fp = fopen(fname, "r");
if (fp == NULL)
{
    perror("fopen:fail");
    return; // nothing to read if the file could not be opened
}
int i = 0;
int arr[10] = { 0 }; // could also be allocated on the heap
for (i = 0; i < 10; i++)
{
    fscanf(fp, "%d", &arr[i]);
}
// Build the small root heap
HeapCreat(arr, sizeof(arr) / sizeof(arr[0]));
- Read the rest of the file and compare each value with the top element of the heap
int ret = 0;
while (fscanf(fp, "%d", &ret) == 1) // stop when no more integers can be read
{
    if (ret > arr[0]) // larger than the smallest of the current top 10
    {
        arr[0] = ret;
        AdjustDownTwo(arr, 0, sizeof(arr) / sizeof(arr[0]));
    }
}
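A note on the loop condition: `fscanf` returns 0, not EOF, when it meets text that is not a number, so a loop that only tests for EOF can spin forever on a damaged file. The sketch below shows the safer pattern of testing for exactly one converted item. The function name `ReadAllInts` is an assumption for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Read whitespace-separated integers from fp until one cannot be read.
   Returns how many were read and stores their sum in *sum. */
int ReadAllInts(FILE *fp, long *sum)
{
    int value = 0;
    int count = 0;
    assert(fp && sum);
    *sum = 0;
    while (fscanf(fp, "%d", &value) == 1) {  /* 1 means one item converted */
        *sum += value;
        count++;
    }
    return count;
}
```

Given a file containing `3`, `14`, `15` and then a stray word, the function reads three numbers, reports a sum of 32, and stops at the word instead of looping.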
- Sort in ascending order for nicer output (optional)
HeapSort(arr, 10);
- Print the data
for (i = 0; i < 10; i++)
{
    printf("arr[%d]:%d\n", i, arr[i]);
}
- Close the file
fclose(fp);
Complete code for TopK:
void DataSort(const char* fname, int k)
{
    FILE* fp = fopen(fname, "r");
    if (fp == NULL)
    {
        perror("fopen:fail");
        return; // nothing to read if the file could not be opened
    }
    int i = 0;
    int arr[10] = { 0 }; // k is fixed at 10 in this version
    for (i = 0; i < 10; i++)
    {
        fscanf(fp, "%d", &arr[i]);
    }
    // Build the small root heap
    HeapCreat(arr, sizeof(arr) / sizeof(arr[0]));
    int ret = 0;
    while (fscanf(fp, "%d", &ret) == 1) // stop when no more integers can be read
    {
        if (ret > arr[0])
        {
            arr[0] = ret;
            AdjustDownTwo(arr, 0, sizeof(arr) / sizeof(arr[0]));
        }
    }
    HeapSort(arr, 10);
    for (i = 0; i < 10; i++)
    {
        printf("arr[%d]:%d\n", i, arr[i]);
    }
    fclose(fp);
}
void DatasCreat()
{
    int i = 0;
    FILE* p = fopen("datas.txt", "w");
    if (p == NULL)
    {
        perror("fopen fail");
        exit(-1);
    }
    srand((unsigned int)time(NULL));
    for (i = 0; i < 1000000; i++)
    {
        int ret = rand() % 10000;
        fprintf(p, "%d\n", ret);
    }
    fclose(p);
}
Summary
Hope this helps!
[1] Key code: a data item that can identify a data element.