[Data structure] Binary tree - sequential structure

Foreword

1. Parent representation

Since each node has only one parent, a tree can be represented by recording the parent of every node. In practice this is done with an array.

Index rules

  • The index of the root node is 0
  • Nodes are numbered layer by layer, from top to bottom
  • Within each layer, indices increase from left to right

Storage method

  • A two-dimensional array with one row per node
  • Column 0 holds the node's data, so the row index alone is enough to locate a node
  • Column 1 holds the parent: the value stored there is the row index of the parent node
  • The root has no parent, so its parent index is stored as -1
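
As a rough illustration of the storage method above, each row can hold a value together with the row index of its parent. The names `PNode` and `depth_of` are made up for this sketch, and a struct array stands in for the two-column array.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the table: column 0 is the data, column 1 is the parent's row index. */
typedef struct PNode {
    int data;
    int parent;   /* -1 marks the root */
} PNode;

/* Count the edges between node i and the root by following parent links. */
int depth_of(const PNode *tree, int i)
{
    assert(tree != NULL && i >= 0);
    int depth = 0;
    while (tree[i].parent != -1) {
        i = tree[i].parent;
        depth++;
    }
    return depth;
}
```

Finding a parent is a single lookup in this layout, while finding the children of a node means scanning the whole table.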

2. Conclusions for a complete binary tree


Computing the child nodes in a complete binary tree

  1. Premise: both the left and right children of the node exist
  2. Let the index of the node be N
  3. Left child index = 2*N+1
  4. Right child index = 2*N+2

Computing the parent node in a complete binary tree

  1. Premise: the node is not the root, so its parent exists
  2. Let the index of the node be N
  3. Parent index = (N-1)/2, using integer division
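
The index arithmetic can be written as three small helpers. The function names are only for illustration. With the root at index 0, the parent of N is (N - 1) / 2 with integer division, which sends both 2*N+1 and 2*N+2 back to N.

```c
#include <assert.h>

/* Index helpers for a complete binary tree stored in an array, root at index 0. */
int left_child(int n)  { return 2 * n + 1; }
int right_child(int n) { return 2 * n + 2; }

/* Integer division rounds down, so both children map back to the same parent. */
int parent_of(int n)
{
    assert(n > 0);   /* the root has no parent */
    return (n - 1) / 2;
}
```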

1. Sequential structure

1. Idea

  • A sequential structure stores the tree in an array
  • An ordinary binary tree is inconvenient to store this way
  • A heap is a complete binary tree, which makes it well suited to a one-dimensional array

Note: an ordinary binary tree can be stored in a one-dimensional array, but the slots reserved for missing nodes waste a lot of space.
Explanation: the deeper the tree, the more space is wasted.
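
To get a feel for the waste, consider the worst case: a chain of n nodes in which every node is the right child of the previous one. The sketch below (the function name is mine) follows the child index rule 2*i+2 and reports how long the array has to be.

```c
#include <assert.h>

/* Array length needed to store a chain of n nodes in which every node is the
   right child of the previous one, using child index 2*i + 2. */
long long slots_for_right_chain(int n)
{
    assert(n >= 1 && n < 62);
    long long index = 0;              /* the root sits at index 0 */
    for (int i = 1; i < n; i++) {
        index = 2 * index + 2;        /* step to the right child */
    }
    return index + 1;                 /* last index + 1 = slots required */
}
```

For 10 nodes this already requires 1023 slots, of which only 10 are used.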

Supplement

  1. Logical structure: how people understand the data, an abstract model
  2. Physical (storage) structure: how the computer holds the data, i.e. how the model is mapped into a programming language.
  3. Therefore: a binary tree does not have to be stored with pointers.

2. Heap

Concept:

Given a set of key codes [1] $K = \{k_0, k_1, k_2, \ldots, k_{n-1}\}$, store all its elements in a one-dimensional array in the order of a complete binary tree. If $k_i \le k_{2i+1}$ and $k_i \le k_{2i+2}$ (or $k_i \ge k_{2i+1}$ and $k_i \ge k_{2i+2}$) for $i = 0, 1, 2, \ldots$, wherever those children exist, the array is called a small (or large) heap. Here $i$ is the index of the node. A heap whose root is the largest element is called a max heap or large root heap, and a heap whose root is the smallest element is called a min heap or small root heap.
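
The definition translates almost directly into a check over the array. This is a minimal sketch for the small root heap case; `is_min_heap` is a name chosen here, not something from the implementation later in this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if arr[0..n-1] satisfies the small root heap property:
   every parent is <= each of its children that exists. */
bool is_min_heap(const int *arr, int n)
{
    assert(arr != NULL || n == 0);
    for (int i = 0; 2 * i + 1 < n; i++) {
        int left = 2 * i + 1;
        int right = 2 * i + 2;
        if (arr[i] > arr[left]) return false;
        if (right < n && arr[i] > arr[right]) return false;
    }
    return true;
}
```

Flipping the two comparisons gives the corresponding check for a large root heap.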

Small root heap

  • The root of every subtree is no larger than its child nodes
  • A tree can be divided into a root and its subtrees

Large root heap

  • The root of every subtree is no smaller than its child nodes

3. Implementation of the heap

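  • Stored as an array, in the form of a sequence table (dynamic array)
// large root heap (max heap)
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef int HPDateType;
typedef struct Heap
{
    HPDateType* arr; // dynamically allocated storage
    int size;        // number of elements currently stored
    int capacity;    // number of elements arr can hold
}Heap;
  • Key idea: adding and deleting data

  • Adding data: append at the bottom of the heap, then adjust upward

  • Deleting data: remove the top of the heap, then adjust downward

What I implement here is a large root heap.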

1. Initialize the heap

  • If size starts at 0, it is the number of elements currently stored and also the index where the next element will go
  • If size started at -1 instead, it would be the index of the last stored element
  • The code below uses the first convention
void HeapCreate(Heap* hp)
{
    HPDateType* tmp = (HPDateType*)malloc(sizeof(HPDateType) * 4);
    if (tmp == NULL)
    {
        perror("malloc fail");
        exit(-1);
    }
    hp->arr = tmp;
    hp->size = 0;
    hp->capacity = 4;
}

Helper function - swap elements

  • Remember to pass a pointer to the array, otherwise only copies are swapped!
void Swap(HPDateType* arr, int child, int parent)
{
    HPDateType tmp = arr[child];
    arr[child] = arr[parent];
    arr[parent] = tmp;
}

2. Build the heap - adding data

  • Append the new element at the bottom of the heap and adjust it upward
  • The adjustment relies on the index relationship between child and parent in a complete binary tree
  • The new element is compared only with its ancestors, until it is no greater than its parent or it reaches the root.
    Example: put 1, 9, 3, 5, 7 into the heap in turn

Upward adjustment code:

void AdjustUp(HPDateType* arr, int child)
{
    int parent = (child - 1) / 2;
    while (child > 0)
    {
        if (arr[parent] < arr[child]) // keep comparing with the ancestors
        {
            Swap(arr, child, parent);
            child = parent;
            parent = (child - 1) / 2;
        }
        else // stop once the node is no greater than its parent (the loop itself ends at the root)
        {
            break;
        }
    }
}

Adding data to the heap

void HeapPush(Heap* hp, HPDateType x)
{
    if (hp->size == hp->capacity) // check whether the array needs to grow
    {
        HPDateType* tmp = (HPDateType*)realloc(hp->arr,
            sizeof(HPDateType) * hp->capacity * 2);
        if (tmp == NULL)
        {
            perror("realloc fail");
            exit(-1);
        }
        hp->arr = tmp;
        hp->capacity *= 2;
    }
    hp->arr[hp->size] = x;       // append at the bottom of the heap
    AdjustUp(hp->arr, hp->size); // restore the heap property
    hp->size++;
}

3. Delete data

  • Deletion removes the element at the top of the heap, since that is usually the one we need: the largest or the smallest
  • Do not delete the top by shifting the remaining elements forward; that destroys the structure of the heap
  • Instead, swap the top element with the last element of the heap, then reduce size by one to achieve the effect of deletion
  • Use the parent-child index relationship to locate the left and right children
  • Starting from the root, adjust downward until neither child is larger than the current node or the end of the data is reached
  • When the heap holds 0 elements, nothing more can be deleted!

Downward adjustment
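void AdjustDown(HPDateType* arr, int parent, int size)
{
    // assume the left child is the larger child
    int child = parent * 2 + 1;
    // size is the element count after deletion, i.e. one past the index of the last element
    while (child < size)
    {
        // pick the larger of the two children
        if (child + 1 < size && arr[child + 1] > arr[child])
        {
            child++;
        }
        // Wrong version:
        // if (arr[child + 1] > arr[child] && child + 1 < size)
        // {
        //     child++;
        // }
        // Explanation: && evaluates left to right, so arr[child + 1] would be read
        // out of bounds before the range check runs. Checking afterwards is too late.
        if (arr[child] > arr[parent])
        {
            Swap(arr, child, parent);
            parent = child;
            child = parent * 2 + 1;
        }
        else
        {
            break;
        }
    }
}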
Deleting data from the heap
void HeapPop(Heap* hp)
{
    assert(hp);
    assert(hp->size > 0);
    Swap(hp->arr, hp->size - 1, 0);   // move the top to the end
    hp->size--;                       // drop it from the heap
    AdjustDown(hp->arr, 0, hp->size); // restore the heap property
}

4. Take the top element of the heap

  • This only reads the data; the heap itself is not changed
  • The pointer passed in must not be null!
// get the top element of the heap
HPDateType HeapTop(Heap* hp)
{
    assert(hp);
    assert(hp->size > 0);
    return hp->arr[0];
}
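
To see the pieces working together, here is a compact, self-contained restatement of the large root heap with a small driver. `HeapDestroy` and `DrainDescending` are hypothetical helpers added for this sketch and are not part of the code above; the other functions follow the same logic as the post, only written more tersely.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef int HPDateType;
typedef struct Heap {
    HPDateType *arr;
    int size;
    int capacity;
} Heap;

static void Swap(HPDateType *arr, int i, int j)
{
    HPDateType tmp = arr[i];
    arr[i] = arr[j];
    arr[j] = tmp;
}

/* Move arr[child] up while it is larger than its parent. */
static void AdjustUp(HPDateType *arr, int child)
{
    while (child > 0) {
        int parent = (child - 1) / 2;
        if (arr[parent] >= arr[child]) break;
        Swap(arr, child, parent);
        child = parent;
    }
}

/* Move arr[parent] down while it is smaller than its larger child. */
static void AdjustDown(HPDateType *arr, int parent, int size)
{
    int child = parent * 2 + 1;
    while (child < size) {
        if (child + 1 < size && arr[child + 1] > arr[child]) child++;
        if (arr[child] <= arr[parent]) break;
        Swap(arr, child, parent);
        parent = child;
        child = parent * 2 + 1;
    }
}

void HeapCreate(Heap *hp)
{
    assert(hp);
    hp->arr = (HPDateType *)malloc(sizeof(HPDateType) * 4);
    if (hp->arr == NULL) { perror("malloc fail"); exit(-1); }
    hp->size = 0;
    hp->capacity = 4;
}

void HeapPush(Heap *hp, HPDateType x)
{
    assert(hp);
    if (hp->size == hp->capacity) {
        HPDateType *tmp = (HPDateType *)realloc(hp->arr,
            sizeof(HPDateType) * hp->capacity * 2);
        if (tmp == NULL) { perror("realloc fail"); exit(-1); }
        hp->arr = tmp;
        hp->capacity *= 2;
    }
    hp->arr[hp->size] = x;
    AdjustUp(hp->arr, hp->size);
    hp->size++;
}

void HeapPop(Heap *hp)
{
    assert(hp && hp->size > 0);
    Swap(hp->arr, hp->size - 1, 0);
    hp->size--;
    AdjustDown(hp->arr, 0, hp->size);
}

HPDateType HeapTop(Heap *hp)
{
    assert(hp && hp->size > 0);
    return hp->arr[0];
}

/* Hypothetical helper, not in the post: release the storage. */
void HeapDestroy(Heap *hp)
{
    assert(hp);
    free(hp->arr);
    hp->arr = NULL;
    hp->size = hp->capacity = 0;
}

/* Hypothetical demo: push in[0..n-1], then pop everything into out[]. */
void DrainDescending(const HPDateType *in, int n, HPDateType *out)
{
    Heap hp;
    HeapCreate(&hp);
    for (int i = 0; i < n; i++) HeapPush(&hp, in[i]);
    for (int i = 0; i < n; i++) {
        out[i] = HeapTop(&hp);
        HeapPop(&hp);
    }
    HeapDestroy(&hp);
}
```

Because every pop removes the current maximum, feeding in 1, 9, 3, 5, 7 fills `out` in descending order.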

4. Heap sort

  1. The data in the array is unordered, so first build a heap
  2. Then sort the elements in place: use a large root heap for ascending order, a small root heap for descending order
  3. No extra space is used

Building the heap by downward adjustment

  • Start from the penultimate layer, at the parent of the last node, and work back to the root
  • Each node is compared with the larger of its children and swapped if it is smaller
  • The node keeps sinking until it is not less than its children or it reaches the last layer


Time complexity - building the heap by downward adjustment
  • Let the height be h
  • Nodes are adjusted from the penultimate layer up to the first layer: layers [h-1, 1]
  • Each node in layer h-1 moves down at most 1 time, and the node in layer 1 moves down at most h-1 times

Layer    Maximum moves per node    Number of nodes
1        h-1                       $2^0$
2        h-2                       $2^1$
...      ...                       ...
h-1      1                         $2^{h-2}$

Total
  1. Suppose T(N) moves are made in total, and the binary tree has N nodes
  2. Method: shifted subtraction (multiply by 2, then subtract)
  3. $2T(N) = 2^1(h-1) + 2^2(h-2) + \cdots + 2^{h-2} \cdot 2 + 2^{h-1} \cdot 1$
  4. $T(N) = 2^0(h-1) + 2^1(h-2) + \cdots + 2^{h-2} \cdot 1$
  5. Subtract the second expression from the first.
  6. This gives: $T(N) = -2^0(h-1) + 2^1 + 2^2 + \cdots + 2^{h-2} + 2^{h-1}$
  7. Rearranged, using $-2^0(h-1) = 2^0 - h$: $T(N) = 2^0 + 2^1 + 2^2 + \cdots + 2^{h-2} + 2^{h-1} - h$
  8. Sum of the first n terms of a geometric sequence: $S_n = a_1(1-q^n)/(1-q)$, where $a_1$ is the first term and $q$ is the common ratio
  9. Here $q = 2$, $a_1 = 1$, and there are $h$ terms
  10. Therefore: $T(N) = 2^h - 1 - h$
  11. For a full binary tree, the number of nodes is $N = 2^h - 1$
  12. So $T(N) = N - \log_2(N+1)$; as N grows, the logarithmic term becomes negligible.
  13. Therefore: the time complexity is O(N), where N is the number of nodes
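
The closed form can be checked numerically. The sketch below (function names are mine) adds up the table row by row for a full binary tree of height h and compares the total with 2^h - 1 - h.

```c
#include <assert.h>

/* Worst-case number of downward moves when building a heap on a full binary
   tree of height h: layer i (1-based) has 2^(i-1) nodes, and each of them
   can sink at most h - i layers. */
long long build_down_cost(int h)
{
    assert(h >= 1 && h < 60);
    long long total = 0;
    for (int i = 1; i <= h - 1; i++) {
        total += (1LL << (i - 1)) * (h - i);
    }
    return total;
}

/* Closed form from the derivation: T = 2^h - 1 - h. */
long long build_down_closed(int h)
{
    assert(h >= 1 && h < 60);
    return (1LL << h) - 1 - h;
}
```

For h = 3 both functions give 4: the root may sink twice and each of the two nodes in layer 2 may sink once.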

Code implementation:

void AdjustDown(HPDateType* arr, int parent, int size)
{
    // left child; assume the left child is the larger child
    int child = parent * 2 + 1;
    while (child < size)
    {
        // pick the larger of the two children
        if (child + 1 < size && arr[child + 1] > arr[child])
        {
            child++;
        }
        if (arr[child] > arr[parent])
        {
            Swap(arr, child, parent); // defined above
            parent = child;
            child = parent * 2 + 1;
        }
        else
        {
            break;
        }
    }
}
void HeapCreat(int* arr, int size)
{
    // downward adjustment
    // (size - 1) is the index of the last node, so
    // ((size - 1) - 1) / 2 is its parent, the last node of the penultimate layer that has a child
    for (int i = ((size - 1) - 1) / 2; i >= 0; i--)
    {
        AdjustDown(arr, i, size);
    }
}

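Building the heap by upward adjustment

  • Start from the second layer and work down to the last layer
  • Each node is compared with its parent and swapped if it is larger
  • The node keeps rising until it is not greater than its parent or it reaches the root

Figure omitted.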

Time complexity - building the heap by upward adjustment
  • Let the height be h
  • Nodes are adjusted from the second layer down to the last layer: layers [2, h]
  • Each node in layer h moves up at most h-1 times, and each node in layer 2 moves up at most 1 time

Layer    Maximum moves per node    Number of nodes
2        1                         $2^1$
3        2                         $2^2$
...      ...                       ...
h        h-1                       $2^{h-1}$

  1. Method: same as above
  2. Conclusion: $T(N) = 2^h(h-2) + 2$
  3. For a full binary tree, the number of nodes is $N = 2^h - 1$
  4. Rearranged: $T(N) = (N+1)(\log_2(N+1) - 2) + 2$
  5. As N tends to infinity, T(N) grows like $N \log_2 N$
  6. Therefore: the time complexity is $O(N \log_2 N)$
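
As with the downward case, the conclusion can be checked by summing layer by layer. In this sketch (names are mine) layer i of a full binary tree has 2^(i-1) nodes and each of them can rise at most i - 1 layers.

```c
#include <assert.h>

/* Worst-case number of upward moves when building a heap on a full binary
   tree of height h by inserting the nodes one after another. */
long long build_up_cost(int h)
{
    assert(h >= 1 && h < 56);
    long long total = 0;
    for (int i = 2; i <= h; i++) {
        total += (1LL << (i - 1)) * (i - 1);
    }
    return total;
}

/* Closed form from the conclusion: T = 2^h * (h - 2) + 2. */
long long build_up_closed(int h)
{
    assert(h >= 1 && h < 56);
    return (1LL << h) * (h - 2) + 2;
}
```

The last layer alone holds about half of the nodes and each of them may travel the full height, which is why this way of building the heap costs more than the downward one.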

Code implementation:

void AdjustUp(HPDateType* arr, int child)
{
    int parent = (child - 1) / 2;
    while (child > 0)
    {
        if (arr[parent] < arr[child])
        {
            Swap(arr, child, parent);
            child = parent;
            parent = (child - 1) / 2;
        }
        else
        {
            break;
        }
    }
}
void HeapCreat(int* arr, int size)
{
    // upward adjustment: treat each element in turn as newly inserted
    for (int i = 1; i < size; i++)
    {
        AdjustUp(arr, i);
    }
}
Sorting idea
  1. Swap the element at the top of the heap with the last element of the heap; nothing is deleted
  2. Reduce the heap size by 1, so the swapped element stays in the array but outside the heap
  3. Adjust downward from the root to restore the heap, and repeat until one element remains

Code implementation:

void HeapSort(int* arr, int size)
{
    // step 1: build the heap
    // large root heap for ascending order
    // option 1: upward adjustment
    //for (int i = 1; i < size; i++)
    //{
    //    AdjustUp(arr, i);
    //}
    // option 2: downward adjustment
    for (int i = ((size - 1) - 1) / 2; i >= 0; i--)
    {
        AdjustDown(arr, i, size);
    }
    int tmp = size - 1; // index of the last element
    // time complexity: n*logn
    while (tmp > 0)
    {
        Swap(arr, tmp, 0);
        tmp--;
        AdjustDown(arr, 0, tmp + 1); // tmp + 1 is the number of elements still in the heap
    }
}
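
For readers who want to try the sort on its own, here is a self-contained sketch of the same procedure. The names `swap_at`, `sift_down_max` and `heap_sort_ascending` are chosen for this example; the logic mirrors the downward-adjustment version described above.

```c
#include <assert.h>
#include <stddef.h>

static void swap_at(int *arr, int i, int j)
{
    int tmp = arr[i];
    arr[i] = arr[j];
    arr[j] = tmp;
}

/* Large root heap: sink arr[parent] until it is not smaller than its children. */
static void sift_down_max(int *arr, int parent, int size)
{
    int child = parent * 2 + 1;
    while (child < size) {
        if (child + 1 < size && arr[child + 1] > arr[child]) child++;
        if (arr[child] <= arr[parent]) break;
        swap_at(arr, child, parent);
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Sort arr[0..size-1] in ascending order, in place. */
void heap_sort_ascending(int *arr, int size)
{
    assert(arr != NULL || size == 0);
    /* Build the heap starting from the parent of the last node. */
    for (int i = (size - 2) / 2; i >= 0; i--) {
        sift_down_max(arr, i, size);
    }
    /* Repeatedly move the current maximum behind the shrinking heap. */
    for (int end = size - 1; end > 0; end--) {
        swap_at(arr, 0, end);
        sift_down_max(arr, 0, end);
    }
}
```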

5. TopK questions

  • Purpose: take the K largest (or K smallest) numbers out of a massive data set
  • Heap memory: about 2GB
  • Limitation: with 4-byte integers, 2GB can hold at most about 500 million integers (ideal conditions)
  • Workaround: a 512GB hard disk can hold about 137 billion 4-byte integers (ideal conditions), which is enough, so the data stays on disk.
  • Idea: take K values, build a small root heap (or large root heap) from them, then keep reading values from the hard disk and comparing each one with the top of the heap.
  • Explanation: to find the K largest, use a small root heap. Its top is the smallest of the K candidates and acts as the boundary: a value larger than the top replaces it and is then adjusted downward. In the end the heap holds the K largest values, with the smallest of them on top. For the K smallest, use a large root heap in the same way.

What I implement here: from 1,000,000 values, take out the 10 largest
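
The idea can be tried in memory before bringing files into it. The following is a minimal sketch, assuming the data fits in an array; `top_k_largest` and its helpers are illustrative names, not functions from this post.

```c
#include <assert.h>
#include <stddef.h>

static void swap_int(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Small root heap: sink heap[parent] until it is not larger than its children. */
static void sift_down_min(int *heap, int parent, int size)
{
    int child = parent * 2 + 1;
    while (child < size) {
        if (child + 1 < size && heap[child + 1] < heap[child]) child++;
        if (heap[child] >= heap[parent]) break;
        swap_int(&heap[child], &heap[parent]);
        parent = child;
        child = parent * 2 + 1;
    }
}

/* Leave the k largest values of data[0..n-1] in out[0..k-1].
   out[0] ends up as the smallest of them. Requires 0 < k <= n. */
void top_k_largest(const int *data, int n, int *out, int k)
{
    assert(data != NULL && out != NULL && k > 0 && k <= n);
    for (int i = 0; i < k; i++) out[i] = data[i];
    /* Build a small root heap from the first k values. */
    for (int i = (k - 2) / 2; i >= 0; i--) sift_down_min(out, i, k);
    /* A value beats the boundary only if it is larger than the heap top. */
    for (int i = k; i < n; i++) {
        if (data[i] > out[0]) {
            out[0] = data[i];
            sift_down_min(out, 0, k);
        }
    }
}
```

Only k values are held at any time, which is what makes the approach usable when the full data set lives on disk.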

  1. Create a text file in the directory of the source file

  2. Write one million numbers (each less than 10000) to the text file

  Functions used:

  • fopen
  • Return value: a FILE* pointer, or NULL on failure
  • Parameter 1: file name - const char*
  • Parameter 2: open mode - "w" (write) here, "r" (read) when the data is read back
  • fprintf
  • Return value: the number of characters written
  • Parameter 1: file pointer - FILE*
  • Parameter 2: format string - const char*
  • Parameter 3: variable argument list - the data
void DatasCreat()
{
    // needs <time.h> for time()
    FILE* p = fopen("datas.txt", "w");
    if (p == NULL)
    {
        perror("fopen fail");
        exit(-1);
    }
    srand((unsigned int)time(NULL)); // seed the random number generator
    int i = 0;
    for (i = 0; i < 1000000; i++)
    {
        int ret = rand() % 10000; // a number from 0 to 9999
        fprintf(p, "%d\n", ret);
    }
    fclose(p); // close the file when done
}
  3. Manually change 10 values in the text file so that they are greater than 10000, so the expected result is known in advance

Note: comment out the call to this function after using it! Running it again regenerates the file and overwrites the edits, don't ask me how I know.

  4. Read the first 10 elements from the file and build a small root heap

Here is the complete function declaration first, followed by the functions that build the small root heap:
void DataSort(const char* fname, int k);

// small root heap
void AdjustUpTwo(HPDateType* arr, int child)
{
    int parent = (child - 1) / 2;
    while (child > 0)
    {
        if (arr[parent] > arr[child])
        {
            Swap(arr, child, parent);
            child = parent;
            parent = (child - 1) / 2;
        }
        else
        {
            break;
        }
    }
}
void HeapCreat(int* arr, int size)
{
    // upward adjustment
    for (int i = 1; i < size; i++)
    {
        AdjustUpTwo(arr, i);
    }
}
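
The TopK code calls AdjustDownTwo, which is never listed in this post. Presumably it is the small root heap counterpart of AdjustDown; the version below is a sketch under that assumption, with Swap repeated so that the block compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

typedef int HPDateType;

static void Swap(HPDateType *arr, int child, int parent)
{
    HPDateType tmp = arr[child];
    arr[child] = arr[parent];
    arr[parent] = tmp;
}

/* Assumed small root heap version of AdjustDown: sink arr[parent] until it
   is not larger than the smaller of its children. */
void AdjustDownTwo(HPDateType *arr, int parent, int size)
{
    assert(arr != NULL);
    /* assume the left child is the smaller child */
    int child = parent * 2 + 1;
    while (child < size) {
        /* pick the smaller of the two children, checking the range first */
        if (child + 1 < size && arr[child + 1] < arr[child]) {
            child++;
        }
        if (arr[child] < arr[parent]) {
            Swap(arr, child, parent);
            parent = child;
            child = parent * 2 + 1;
        } else {
            break;
        }
    }
}
```

The only differences from AdjustDown are the two comparison directions.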
  • fscanf
  • End-of-input flag: EOF
  • Return value: the number of items successfully read
  • Parameter 1: file pointer - FILE*
  • Parameter 2: format string - const char*
  • Parameter 3: address of the variable to read into
    FILE* fp = fopen(fname, "r");
    if (fp == NULL)
    {
        perror("fopen:fail");
        return; // nothing to read, so stop here
    }
    int i = 0;
    int arr[10] = { 0 }; // could also be allocated on the heap
    for (i = 0; i < 10; i++)
    {
        fscanf(fp, "%d", &arr[i]);
    }
    // build the small root heap
    HeapCreat(arr, sizeof(arr) / sizeof(arr[0]));
  5. Read the remaining data from the file and compare each value with the top of the heap
    int ret = 0;
    while (fscanf(fp, "%d", &ret) == 1) // stop when the file is exhausted or a read fails
    {
        if (ret > arr[0])
        {
            arr[0] = ret;
            AdjustDownTwo(arr, 0, sizeof(arr) / sizeof(arr[0]));
        }
    }
  6. Sort in ascending order - only to make the output look nicer (can be omitted)
    HeapSort(arr, 10);
  7. Print the data
    for (i = 0; i < 10; i++)
    {
        printf("arr[%d]:%d\n", i, arr[i]);
    }
  8. Close the file
    fclose(fp);

Complete TopK code:

void DataSort(const char* fname, int k)
{
    // note: k is not used yet; this version is fixed at 10 elements
    FILE* fp = fopen(fname, "r");
    if (fp == NULL)
    {
        perror("fopen:fail");
        return; // nothing to read, so stop here
    }
    int i = 0;
    int arr[10] = { 0 };
    for (i = 0; i < 10; i++)
    {
        fscanf(fp, "%d", &arr[i]);
    }
    // build the small root heap
    HeapCreat(arr, sizeof(arr) / sizeof(arr[0]));
    int ret = 0;
    while (fscanf(fp, "%d", &ret) == 1)
    {
        if (ret > arr[0])
        {
            arr[0] = ret;
            AdjustDownTwo(arr, 0, sizeof(arr) / sizeof(arr[0]));
        }
    }
    HeapSort(arr, 10);
    for (i = 0; i < 10; i++)
    {
        printf("arr[%d]:%d\n", i, arr[i]);
    }
    fclose(fp);
}
void DatasCreat()
{
    int i = 0;
    FILE* p = fopen("datas.txt", "w");
    if (p == NULL)
    {
        perror("fopen fail");
        exit(-1);
    }
    srand((unsigned int)time(NULL));
    for (i = 0; i < 1000000; i++)
    {
        int ret = rand() % 10000;
        fprintf(p, "%d\n", ret);
    }
    fclose(p);
}

Summary

I hope this helps!


  [1] Key code: a data item that can identify a data element


Origin blog.csdn.net/Shun_Hua/article/details/129782619