Notes on Kingly Data Structure


Chapter 1 Introduction


1.1 Basic concepts

Data is the carrier of information: the collection of numbers, characters, and all other symbols that can be input into a computer and recognized and processed by programs to describe the attributes of objective things.

A data element is the basic unit of data.

A data element can be composed of several data items. A data item is the smallest indivisible unit of a data element.

A data structure is a collection of data elements that have one or more specific relationships with each other.

A data object is a collection of data elements with the same properties and is a subset of data.

Data type: a collection of values and a set of operations defined on that collection.

Atomic type: the value cannot be subdivided
Structural type: the value can be subdivided
Abstract data type: abstract data organization and related operations

1.2 Three elements of data structure


Logical structure: the logical relationship between data elements, which has nothing to do with storage and is independent of computers (the design of an algorithm)

Linear structure: one-to-one

All elements except the first element have a unique predecessor; all elements except the last element have a unique successor

Tree structure: one-to-many

Network/graph: many-to-many

Set: the elements have no relationship other than belonging to the same set

Storage structure: the representation of the data structure in the computer, also called the mapping/physical structure (governs the implementation of an algorithm)

Sequential storage: Store logically adjacent elements in storage units that are also physically adjacent. The relationship between elements is reflected by the adjacency relationship of the storage units.

Advantages: Random access, elements occupy the least storage space

Disadvantages: Only a whole adjacent block of storage units can be used, which produces more external fragmentation

Chain storage: Logically adjacent elements do not need to be physically adjacent. The logical relationship between elements is represented by a pointer indicating the storage address of the element.

Advantages: No fragmentation

Disadvantages: Storage pointers occupy additional storage space; can only be accessed sequentially

Index storage: Create additional index tables. Each item in the index table is called an index item, and the general form of an index item is (keyword, address)

Advantages: Fast retrieval

Disadvantages: takes up more storage space; adding and deleting data requires modifying the index table, which takes more time

Hash storage: compute the storage address of an element directly from its keyword, also called Hash storage

Advantages: Retrieval, adding and deleting nodes are fast

Disadvantages: If the hash function is not good, element storage unit conflicts will occur, which will increase the cost of time and space.

Data operations

The definition of operations is for logical structures and points out the functions of operations

The implementation of the operation is for the storage structure, and the specific operation steps of the operation are pointed out.

Common pitfalls

An ordered list is a concept of the logical structure.

A circular queue is a queue implemented with a sequence table; it is a data structure, not an abstract data type.

The storage spaces of different nodes can be discontinuous, but the storage spaces within the nodes must be continuous.

Two different data structures can have exactly the same logical structure and physical structure but differ in their data operations.

1.3 Concept of algorithm

Algorithm: A description of the steps to solve a specific problem, a finite sequence of instructions, where each instruction represents one or more operations

Characteristics of algorithms

Finiteness: An algorithm must always end after executing finite steps, and each step can be completed in finite time.

An algorithm must be finite, whereas a program may run forever.

Determinism: Each instruction in the algorithm must have an exact meaning, and the same input can only produce the same output.

Feasibility: The operations described in the algorithm can be implemented by executing the basic operations a limited number of times

Input: An algorithm has zero or more inputs, which are taken from a specific collection of objects.

Output: An algorithm has one or more outputs, which are quantities that have a specific relationship with the input.

Qualities of a good algorithm

1. Correctness
2. Readability
3. Robustness: when illegal data is input, the algorithm reacts or handles it appropriately instead of producing inexplicable output
4. High efficiency and low storage requirement (low time complexity and low space complexity)

1.4 Measurement of algorithm efficiency

The time complexity of the algorithm

Definition: an advance estimate of the relationship between the time cost T(n) of an algorithm and the problem size n

Measures how quickly the algorithm execution time increases as the problem size increases.

For the same algorithm, the higher the level of the implemented language, the lower the execution efficiency.

O(1) < O(log₂n) < O(n) < O(nlog₂n) < O(n²) < O(n³) < O(2ⁿ) < O(n!) < O(nⁿ)  (mnemonic: constant, logarithmic, polynomial, exponential, factorial)

Algorithm space complexity

Measures how fast the space required by the algorithm increases as the size of the problem increases.


Chapter 2 Linear Table

2.1 Definition and basic operations of linear tables


Definition of linear table

A linear table is a finite sequence of n (n ≥ 0) data elements of the same data type, where n is the length of the table. When n = 0, the linear table is an empty table. If L names a linear table, it is generally written as L = (a₁, a₂, …, aᵢ, aᵢ₊₁, …, aₙ)

aᵢ is the i-th element in the linear list; i is called the element's bit order (position) in the list.

a₁ is the head element; aₙ is the tail element.

Every element except the first element has one and only one direct predecessor; every element except the last element has one and only one direct successor.

Basic operations of linear tables

InitList(&L): Initialization list. Construct an empty linear table L and allocate memory space.

DestroyList(&L): Destroy operation. Destroy the linear table and release the memory space occupied by linear table L.

ListInsert(&L,i,e): Insertion operation. Insert the specified element e at the i-th position in table L.

ListDelete(&L,i,&e): Delete operation. Delete the element at position i in table L and use e to return the value of the deleted element.

LocateElem(L,e): Find operation by value. Find elements with a given key value in table L.

GetElem(L,i): Bitwise search operation. Get the value of the element at position i in table L.

& represents a reference in C++. If a pointer variable is passed in and needs to be changed within the function body, a reference to the pointer variable must be used (a pointer to a pointer can also be used in C)

2.2 Sequential representation of linear tables


Definition of sequence table

Sequential table —implementing a linear table using sequential storage

Sequential storage : Store logically adjacent elements in storage units that are also physically adjacent. The relationship between elements is reflected by the adjacency relationship of the storage units.

Implementation of sequence table : static allocation, dynamic allocation

Dynamic allocation statement:

 L.data=(ElemType*)malloc(sizeof(ElemType)*InitSize)
//malloc allocates a contiguous block of storage
//free releases the previously allocated block

Dynamic allocation is not chain storage, it is also a sequential storage structure, and the physical structure does not change: it is a random access method, but the allocated space size can be determined at runtime.
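A minimal sketch of a dynamically allocated sequence list in C (the names SqList, InitSize and ElemType follow common textbook convention and are assumptions here):

```c
#include <stdlib.h>

#define InitSize 10          // initial capacity (assumed)
typedef int ElemType;        // element type (assumed)

typedef struct {
    ElemType *data;          // pointer to the dynamically allocated array
    int MaxSize;             // current capacity
    int length;              // current number of elements
} SqList;

// Construct an empty sequence list with dynamically allocated storage
void InitList(SqList *L) {
    L->data = (ElemType *)malloc(sizeof(ElemType) * InitSize);
    L->MaxSize = InitSize;
    L->length = 0;
}
```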

Features : Random access, high storage density, insertion and deletion require moving a large number of elements

Operations on sequence tables

1. Insertion: average time complexity O(n)
2. Deletion: average time complexity O(n)
3. Search: by value, by position
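A sketch of the insertion operation on the SqList defined earlier (1-based position i, as in the interface above); on average about half of the elements must be shifted, which gives the O(n) bound:

```c
// Insert e at position i (1 <= i <= length + 1); returns 1 on success, 0 on failure
int ListInsert(SqList *L, int i, ElemType e) {
    if (i < 1 || i > L->length + 1) return 0;   // illegal position
    if (L->length >= L->MaxSize) return 0;      // table full (no expansion in this sketch)
    for (int j = L->length; j >= i; j--)        // shift elements at and after position i backward
        L->data[j] = L->data[j - 1];
    L->data[i - 1] = e;                         // position i lives at index i - 1
    L->length++;
    return 1;
}
```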

2.3 Chained representation of linear tables

Singly linked list


The difference between the head pointer and the head node:

(1) Whether or not the list has a head node, the head pointer always points to the first node of the linked list.

(2) The head node is the first node of a linked list that has one, and it usually stores no information.

Advantage of introducing a head node: whether or not the list is empty, the head pointer points to the (non-null) head node, so empty and non-empty lists are handled uniformly.


Single linked list operations

Create singly linked list

Core: initialization operation, post-insertion operation of specified nodes

(1) Head insertion; can be used to reverse a linked list

(2) Tail insertion; remember to keep a pointer to the tail node of the list

Insert node operation

(1) Insert by position (with a head node)

(2) Insert by position (without a head node)

(3) Insert before a specified node: first find its predecessor; time complexity O(n)

(4) Alternatively, convert the pre-insertion into a post-insertion and then swap the data of the two nodes; time complexity O(1)
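A sketch of the node type and the post-insertion operation mentioned above (LNode/LinkList are the usual textbook names; ElemType is assumed):

```c
#include <stdlib.h>

typedef int ElemType;        // assumed

typedef struct LNode {
    ElemType data;
    struct LNode *next;
} LNode, *LinkList;

// Insert element e immediately after node p; O(1)
int InsertNextNode(LNode *p, ElemType e) {
    if (p == NULL) return 0;
    LNode *s = (LNode *)malloc(sizeof(LNode));
    if (s == NULL) return 0;     // allocation failed
    s->data = e;
    s->next = p->next;           // hook s in after p; this order matters
    p->next = s;
    return 1;
}
```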

Delete node operation

(1) Delete by position (with a head node)

(2) Delete a specified node: first find its predecessor, then delete; O(n)

Find node operation

(1) Search by position

(2) Search by value


Double linked list


circular linked list

(1) Circular singly linked list: the next pointer of the end node of the list points to the head node

When operating on both the head and the tail of a circular singly linked list: keeping only a tail pointer (no head pointer) is more efficient.

You can traverse the entire linked list starting from any node

(2) Circular doubly linked list: the prior pointer of the head node points to the tail node, and the next pointer of the tail node points to the head node.


static linked list

An array is used to describe the linked storage structure, which also has a data field and a pointer field. The pointer is the relative address of the node (array subscript), also called a cursor.

Insertion and deletion only require modifying the pointer and do not require moving elements.


Comparison of sequence list and linked list

  1. Logical structure: both are linear tables (linear structures).

  2. Storage structure: the sequence table uses sequential storage; the linked list uses chained storage.

  3. Basic operations: initialization, insertion/deletion, and lookup differ in cost between the two.
  4. How to choose

    (1) Storage considerations: use a linked list when the length and scale are difficult to estimate, but note that the storage density of a linked list is low.

    (2) Operation considerations: if elements are often accessed by position (serial number), a sequence table is preferable.

    (3) Environment considerations: choose a sequence table if the table is fairly stable, a linked list if it is highly dynamic.

Chapter 3 Stack, Queue and Array

3.1 Stack

definition

A linear table that only allows insertion or deletion at one end

Features: last in, first out (LIFO)

Top of stack, bottom of stack, empty stack

Basic operations

InitStack(&S): Initialize the stack. Construct an empty stack S and allocate memory space.

DestroyStack(&S): Destroy the stack. Destroy and release the memory space occupied by stack S.

Push(&S,x): Push into the stack. If the stack S is not full, add x to make it the top of the new stack.

Pop(&S,&x): Pop the stack. If stack S is not empty, pop the top element of the stack and return with x.

GetTop(S, &x): Read the top element of the stack. If stack S is not empty, use x to return the top element of the stack

The sequential storage structure of the stack

accomplish

Top-of-stack pointer: S.top; top element: S.data[S.top]

Push: first increment the top pointer by 1, then write the value into the new top position.

Pop: first read the value of the top element, then decrement the top pointer by 1.
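A sketch of both operations on a sequential stack (top initialized to -1; MaxSize and ElemType assumed):

```c
#define MaxSize 50
typedef int ElemType;        // assumed

typedef struct {
    ElemType data[MaxSize];
    int top;                 // index of the top element; -1 when empty
} SqStack;

int Push(SqStack *S, ElemType x) {
    if (S->top == MaxSize - 1) return 0;   // stack full
    S->data[++S->top] = x;                 // increment the pointer first, then store
    return 1;
}

int Pop(SqStack *S, ElemType *x) {
    if (S->top == -1) return 0;            // stack empty
    *x = S->data[S->top--];                // read the top first, then decrement
    return 1;
}
```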

shared stack

Definition: Set the bottoms of the two stacks at both ends of the shared space, and the tops of the two stacks extend toward the middle

Empty condition: top0 == -1 and top1 == MaxSize

Full condition: top1 - top0 == 1

Push: top0 is incremented by 1 before assignment; top1 is decremented by 1 before assignment. Popping is the reverse.

Chain storage structure of the stack

Advantages : It facilitates multiple stacks to share storage space, improves its efficiency, and prevents stack overflow.

Features : All operations are performed at the head of the table. There is usually no head node. The head pointer is used as the top pointer of the stack to facilitate node insertion/deletion.


3.2 Queue

Definition

A queue is a linear list that only allows insertion (enqueue) at one end and deletion (dequeue) at the other end.

Head of queue, tail of queue, empty queue

Features: first in, first out

Basic operations of queues

InitQueue(&Q): Initialize the queue and construct an empty queue Q.

DestroyQueue(&Q): Destroy the queue. Destroy and release the memory space occupied by queue Q.

EnQueue(&Q,x): Enter the queue. If the queue Q is not full, add x to make it the new end of the queue.

DeQueue(&Q,&x): Dequeue. If queue Q is not empty, delete the head element and return with x.

GetHead(Q,&x): Read the head element of the queue. If the queue Q is not empty, assign the head element to x

The sequential storage structure of the queue

accomplish

  1. Two pointers: front points to the head element of the queue; rear points to the position just past the tail element.

  2. Initial state (empty queue): Q.front == Q.rear == 0

  3. Enqueue: first write the value at the rear position, then increment the rear pointer by 1.

  4. Dequeue: first read the value of the head element, then increment the front pointer by 1.

  5. False overflow exists: rear can reach MaxSize while free cells remain at the front.
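False overflow is usually remedied with a circular queue, in which the indices wrap around modulo MaxSize and one cell is sacrificed to distinguish full from empty. A sketch (MaxSize and ElemType assumed):

```c
#define MaxSize 50
typedef int ElemType;        // assumed

typedef struct {
    ElemType data[MaxSize];
    int front, rear;         // head index; index one past the tail
} SqQueue;

void InitQueue(SqQueue *Q) { Q->front = Q->rear = 0; }

int EnQueue(SqQueue *Q, ElemType x) {
    if ((Q->rear + 1) % MaxSize == Q->front) return 0;  // full: one cell stays unused
    Q->data[Q->rear] = x;
    Q->rear = (Q->rear + 1) % MaxSize;                  // wrap around
    return 1;
}

int DeQueue(SqQueue *Q, ElemType *x) {
    if (Q->front == Q->rear) return 0;                  // empty
    *x = Q->data[Q->front];
    Q->front = (Q->front + 1) % MaxSize;
    return 1;
}
```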

Queue chain storage structure

Chained storage suits situations where the number of data elements varies greatly; queue overflow does not occur, and multiple queues never conflict over a shared preallocated storage area.


Deque (double-ended queue)

A linear table that only allows insertion and deletion from both ends

Input-restricted deque: a linear list that only allows insertions from one end and deletions from both ends.

Output-restricted deque: a linear list that only allows insertions from both ends and deletions from one end.


3.3 Application of stack and queue

3.3.1 Application of the stack in bracket matching

The last-occurring left parenthesis is matched first.

  1. Initialize an empty stack and read the brackets one at a time.

  2. If ')' is read, pop the top of the stack to pair with it; if the stack is empty or the pair does not match, the sequence is illegal.

  3. If '(' is read, push it onto the stack as a new, more urgent expectation.

  4. When the algorithm ends, the stack must be empty; otherwise the bracket sequence does not match.
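A sketch of the algorithm for round brackets only; since there is a single bracket type, the stack degenerates into a depth counter (a checker for (), [] and {} would push the actual characters):

```c
// Returns 1 if the sequence of '(' and ')' in str matches, 0 otherwise
int bracketCheck(const char *str) {
    int top = -1;                        // stack depth; -1 means empty
    for (int i = 0; str[i] != '\0'; i++) {
        if (str[i] == '(') top++;        // push: a new, more urgent expectation
        else if (str[i] == ')') {
            if (top == -1) return 0;     // nothing to pair with: illegal
            top--;                       // pop: matched with the top '('
        }
    }
    return top == -1;                    // the stack must be empty at the end
}
```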
3.3.2 Application of stack in expression evaluation

Convert infix to suffix

Manual calculation method of converting infix to suffix

① Determine the order of operation of each operator in the infix expression

② Select the next operator and combine it into a new operand according to == "left operand right operand operator" ==

③ If there are still operators that have not been processed, continue ②

"Left-first" principle: As long as the operator on the left can be calculated first, the operator on the left will be calculated first to ensure that the order of operations is unique.

Hand calculation method of postfix expressions: Scan from left to right. Whenever an operator is encountered, the two nearest operands in front of the operator perform the corresponding operation and combine into one operand.

Computer method for converting infix to suffix

Initialize a stack to save operators whose order of operations cannot yet be determined.

Each element is processed from left to right until the end. Three situations may occur:

① The operand is encountered. Add the suffix expression directly.

② Encountered a boundary symbol. When "(" is encountered, it is pushed directly onto the stack; when ")" is encountered, the operators in the stack are popped up and suffix expressions are added until "(" is popped up. Note: "(" does not add suffix expressions.

③ Operator is encountered. Pop all operators in the stack with a priority higher than or equal to the current operator in sequence, and add suffix expressions. Stop if "(" is encountered or the stack is empty. Then push the current operator onto the stack

Postfix expression calculation (algorithm implementation)

Use the stack to implement the calculation of postfix expressions:

① Scan the next element from left to right until all elements are processed

② If an operand is scanned, push it onto the stack and return to ①; otherwise go to ③

③ If an operator is scanned, pop the top two elements of the stack (the first one popped is the right operand), perform the corresponding operation, push the result back onto the stack, and return to ①
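A sketch for postfix expressions over single-digit operands, written without separators for brevity (both assumptions of this sketch):

```c
#include <ctype.h>

// Evaluate a postfix expression of single-digit numbers and the operators + - * /
int evalPostfix(const char *expr) {
    int stack[100], top = -1;                  // operand stack
    for (int i = 0; expr[i] != '\0'; i++) {
        char c = expr[i];
        if (isdigit((unsigned char)c)) {
            stack[++top] = c - '0';            // operand: push
        } else {
            int b = stack[top--];              // popped first: right operand
            int a = stack[top--];              // popped second: left operand
            switch (c) {
                case '+': stack[++top] = a + b; break;
                case '-': stack[++top] = a - b; break;
                case '*': stack[++top] = a * b; break;
                case '/': stack[++top] = a / b; break;
            }
        }
    }
    return stack[top];                         // the final result is on top
}
```

For example, evalPostfix("34+5*") evaluates (3 + 4) * 5 = 35.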

Infix expression evaluation (stack implementation) = infix-to-postfix conversion + postfix expression evaluation

Initialize two stacks: an operand stack and an operator stack. If an operand is scanned, push it onto the operand stack. If an operator or delimiter is scanned, push or pop the operator stack following the same logic as infix-to-postfix conversion; whenever an operator is popped, also pop the top two elements of the operand stack, perform the operation, and push the result back onto the operand stack.

Infix to prefix

Manual calculation method of converting infix to prefix:

① Determine the order of operation of each operator in the infix expression

② Select the next operator and combine it into a new operand according to == "operator left operand right operand" ==

③ If there are still operators that have not been processed, continue ②

"Right priority" principle: as long as the operator on the right can be calculated first, the operator on the right will be calculated first.

Prefix expression calculation (algorithm implementation)

Use the stack to implement the calculation of prefix expressions:

①Scan the next element from right to left until all elements are processed

②If the operand is scanned, push it onto the stack and return to ①; otherwise, execute ③

③ If an operator is scanned, pop the top two elements of the stack (the first one popped is the left operand), perform the corresponding operation, push the result back onto the stack, and return to ①

3.3.3 Application of stack in recursion

Characteristics of function calls: The last called function ends first (LIFO)

When a function is called, a stack is needed to store: ① call return address ② actual parameters ③ local variables

Recursion: the original problem can be transformed into a smaller problem with the same properties

Two conditions 1. Recursive expression (recursive body) 2. Boundary condition (recursive exit)

When calling recursively, the function call stack can be called the "recursive work stack." Each time a level of recursion is entered, the information required for the recursive call is pushed onto the top of the stack. Each time a level of recursion is exited, the corresponding information is popped from the top of the stack.

Disadvantages: low efficiency, too many levels of recursion may cause stack overflow; may include many repeated calculations

3.3.4 Application of Queue

Application in hierarchical traversal

  1. Level-order traversal of a tree
  2. Breadth-first traversal of a graph

Applications in computer systems

  1. FCFS first come first served
  2. Resolve speed mismatch between host and external device
  3. Solve resource competition problems caused by multiple users

3.4 Arrays and special matrices

3.4.1 Array

Array : A finite sequence composed of n (n>=1) data elements of the same type. Each data element is called an array element.

Arrays are a generalization of linear tables

Array address calculation

  1. One-dimensional array

  2. Two-dimensional array – row-major

  3. Two-dimensional array – column-major
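The address formulas shown in the figures can be reconstructed as follows (0-indexed arrays, element size L, and an M×N two-dimensional array are assumed):

```latex
\mathrm{LOC}(a_i) = \mathrm{LOC}(a_0) + i \times L                          % one-dimensional
\mathrm{LOC}(a_{i,j}) = \mathrm{LOC}(a_{0,0}) + (i \times N + j) \times L   % row-major
\mathrm{LOC}(a_{i,j}) = \mathrm{LOC}(a_{0,0}) + (j \times M + i) \times L   % column-major
```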

3.4.2 Compressed storage of special matrices

Compressed storage: allocate only one storage unit for multiple elements with the same value, and allocate no space for zero elements

Compressed storage of symmetric matrices

If every element aᵢⱼ of an n-order square matrix satisfies aᵢⱼ = aⱼᵢ, then the matrix is a symmetric matrix


Compressed storage of triangular matrices

Lower triangular matrix: all elements outside the main diagonal and the lower triangular region (i.e., the upper triangle) are the same constant.

Upper triangular matrix: all elements outside the main diagonal and the upper triangular region (i.e., the lower triangle) are the same constant.

Compressed storage of tridiagonal matrices


Compressed storage of sparse matrices

Compression storage strategy:

  1. Sequential storage - the triple <row, column, value>

  2. Chained storage - the cross (orthogonal) linked list method

Chapter 4 String

4.1 Definition and implementation

4.1.1 Definition

A string is a finite sequence of zero or more characters.

T=‘iPhone 11 Pro Max?’

Substring: A subsequence composed of consecutive characters in the string. Eg: 'iPhone' and 'Pro M' are substrings of string T

Main string: A string containing substrings. Eg: T is the main string of the substring 'iPhone'

The position of the character in the main string: the sequence number of the character in the string. Eg: The position of '1' in T is 8 (first occurrence)

The position of the substring in the main string: the position of the first character of the substring in the main string. Eg: The position of '11 Pro' in T is 8

String data objects are limited to character sets (such as Chinese characters, English characters, numeric characters, punctuation characters, etc.)

Basic operations on strings, such as addition, deletion, modification, etc., usually use substrings as the operation objects.

Basic string operations

StrAssign(&T,chars): assignment operation. Assign the string T to chars.

StrCopy(&T,S): copy operation. String T is obtained by copying string S.

StrEmpty(S): Empty operation. If S is an empty string, it returns TRUE, otherwise it returns FALSE.

StrLength(S): Find the string length. Returns the number of elements of string S.

ClearString(&S): Clear operation. Clear S to an empty string.

DestroyString(&S): Destroy string. Destroy string S (reclaim storage space).

Concat(&T,S1,S2): series connection. Use T to return a new string formed by concatenating S1 and S2

SubString(&Sub,S,pos,len): Find substring. Use Sub to return a substring of length len starting from the pos-th character of string S.

Index(S,T): Positioning operation. If there is a substring with the same value as string T in the main string S, the position where it first appears in the main string S is returned; otherwise, the function value is 0.

StrCompare(S,T): comparison operation. If S > T, the return value > 0; if S = T, the return value = 0; if S < T, the return value < 0.

4.1.2 String storage structure

sequential storage

4.1.3 Basic operations
  1. Find substring

  2. Compare

  3. Locate

4.2 String pattern matching

4.2.1 Simple pattern matching algorithm

String pattern matching: Find the substring that is the same as the pattern string in the main string and return its location

n is the main string length and m is the pattern string length.

Naive pattern matching algorithm: Compare all substrings of length m in the main string with the pattern string in sequence until a complete match is found or all substrings do not match.

The current substring match fails: the main string pointer i points to the first position of the next substring, and the pattern string pointer j returns to the first position of the pattern string.

The current substring matches successfully: returns the position of the first character of the current substring

Up to (n-m+1)*m comparisons are required until the match succeeds/the match fails.

Worst time complexity: O(nm)

4.2.2 KMP algorithm

Worst time complexity: O(m+n)

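The derivation lived in the images above; as a reference, a minimal sketch of the algorithm over 0-indexed C strings (the book's version uses 1-indexed strings, so the next values here differ by one):

```c
#include <stdlib.h>
#include <string.h>

// Return the first match position of pattern p in text t, or -1 if none
int kmp(const char *t, const char *p) {
    int n = (int)strlen(t), m = (int)strlen(p);
    if (m == 0) return 0;
    int *next = (int *)malloc(sizeof(int) * m);
    // next[j]: end index of the longest proper prefix of p[0..j] that is also its suffix, or -1
    next[0] = -1;
    for (int j = 1, k = -1; j < m; j++) {  // build next from the pattern itself
        while (k >= 0 && p[k + 1] != p[j]) k = next[k];
        if (p[k + 1] == p[j]) k++;
        next[j] = k;
    }
    for (int i = 0, k = -1; i < n; i++) {  // scan the text; i never moves backward
        while (k >= 0 && p[k + 1] != t[i]) k = next[k];
        if (p[k + 1] == t[i]) k++;
        if (k == m - 1) { free(next); return i - m + 1; }
    }
    free(next);
    return -1;
}
```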

Chapter 5 Trees and Binary Trees

5.1 Basic concepts of trees

5.1.1 Definition of tree

A tree is a finite set of n (n≥0) nodes. When n = 0, it is called an empty tree, which is a special case.

In any non-empty tree it should satisfy:

1) There is and is only one specific node called the root.

2) When n > 1, the remaining nodes can be divided into m (m > 0) disjoint finite sets T1, T2, …, Tm, where each set is itself a tree and is called a subtree of the root node.

Characteristics of non-empty trees:

There is and is only one root node.
Nodes without successors are called "leaf nodes" (or terminal nodes).
Nodes with successors are called "branch nodes" (or non-terminal nodes).
Except for the root node, any A node has one and only one predecessor.
Each node can have 0 or more successors.
A tree is a recursively defined data structure

5.1.2 Basic terminology

Degree of a node: the number of children of the node.
Degree of a tree: the maximum degree of any node in the tree.
Depth of a node: accumulated level by level from the root downward.
Height of a node: accumulated level by level from the leaves upward.
Height (depth) of a tree: the maximum level of any node in the tree.
Path between two nodes: the sequence of nodes passed between them. Note that branches in a tree are directed (parent points to child), so paths run top-down and there is no path between two children.
Path length: the number of edges passed on the path.
Ordered tree: logically, the subtrees of each node are ordered from left to right and cannot be interchanged.

Unordered tree: logically, the subtrees of each node are unordered from left to right and can be interchanged.

**Forest:** A forest is a collection of m (m≥0) disjoint trees

5.1.3 Properties of trees
  1. Number of nodes = total degree + 1

  2. The difference between a tree of degree m and an m-ary tree: a tree of degree m contains at least one node with m children and cannot be empty, while an m-ary tree merely requires every node to have at most m children and may be empty.

The i-th level of a tree of degree m has at most m^(i-1) nodes (i ≥ 1)

The i-th level of an m-ary tree has at most m^(i-1) nodes (i ≥ 1)

An m-ary tree with height h has at least h nodes.

A tree with height h and degree m has at least h + m - 1 nodes.

An m-ary tree with height h has at most (m^h - 1)/(m - 1) nodes.

The minimum height of an m-ary tree with n nodes is ⌈log_m(n(m - 1) + 1)⌉

5.2 Concept of binary tree

5.2.1 Definition and main characteristics of binary tree

A binary tree is a finite set of n (n≥0) nodes:

① Or it is an empty binary tree, that is, n = 0.

② Or it consists of a root node and two disjoint binary trees called the left subtree and the right subtree; the left and right subtrees are each binary trees.

Features: ① Each node has at most two subtrees ② The left and right subtrees cannot be reversed (a binary tree is an ordered tree)

The difference between a binary tree and an ordered tree of degree 2

A tree with degree 2 has at least 3 nodes, and a binary tree can be empty.

In an ordered tree of degree 2, a node's single child need not be distinguished as left or right; in a binary tree, even a single child must be designated as the left or the right child.

Binary tree is an ordered tree

special binary tree

Full binary tree: a binary tree with height h and 2^h - 1 nodes

Each level in the tree contains the most nodes. Only the last level has leaf nodes and there are no nodes with degree 1.

Complete binary tree: a binary tree with height h and n nodes is called complete if and only if each of its nodes corresponds one-to-one with the nodes numbered 1 to n in the full binary tree of height h.

Leaf nodes appear only on the two lowest levels. If there is a node of degree 1, there is only one such node, and its child can only be a left child.

Binary sorting tree

(1) The keywords of all nodes on the left subtree are less than the root node

(2) The keywords of all nodes on the right subtree are greater than the root node

(3) The left and right subtrees are each a binary sorting tree.

Balanced binary tree: the depth difference between the left and right subtrees of any node in the tree does not exceed 1 (which gives high search efficiency)

5.2.2 Properties of binary trees

Assume the numbers of nodes with degree 0, 1, and 2 in a non-empty binary tree are n0, n1, and n2 respectively; then n0 = n2 + 1 (there is one more leaf node than there are degree-2 nodes)

The i-th level of a binary tree has at most 2^(i-1) nodes (i ≥ 1)

The i-th level of an m-ary tree has at most m^(i-1) nodes (i ≥ 1)

An m-ary tree with height h has at most (m^h - 1)/(m - 1) nodes.

A binary tree of height h has at most 2^h - 1 nodes.

Common test points for complete binary trees

The height h of a complete binary tree with n (n > 0) nodes is ⌈log₂(n + 1)⌉ or ⌊log₂n⌋ + 1

For a complete binary tree, it can be deduced from the number of nodes n that the number of nodes with degrees 0, 1 and 2 is n0, n1 and n2

If a complete binary tree has 2k (even) nodes, then there must be n1=1, n0 = k, n2 = k-1

If a complete binary tree has 2k-1 (odd number) nodes, then there must be n1=0, n0 = k, n2 = k-1

5.2.3 Storage structure of binary tree

sequential storage

Several important basic operations that are often tested:

The left child of i: 2i
The right child of i: 2i + 1
The parent node of i: ⌊i/2⌋
The level of i: ⌊log₂i⌋ + 1

For a complete binary tree with n nodes in total:

Does i have a left child? 2i ≤ n
Does i have a right child? 2i + 1 ≤ n
Is i a leaf/branch node? i is a leaf iff i > ⌊n/2⌋

In the sequential storage of a general binary tree, node numbering must correspond to that of a complete binary tree.

Worst case: a single-branch tree of height h with only h nodes (every node has only a right child) still requires at least 2^h - 1 storage units.

Conclusion: The sequential storage structure of binary trees is only suitable for storing complete binary trees.

chain storage

Binary linked list has 3 fields: data, lchild, rchild

A binary linked list with n nodes has n + 1 empty link fields (of the 2n link fields, only n - 1 are used, since no pointer points to the root); these null fields can be used to form a threaded linked list.

5.3 Binary tree traversal and threaded binary trees

5.3.1 Binary tree traversal

Traversal : visit all nodes in a certain order

Preorder traversal: around the root (NLR)


In-order traversal: left root right (LNR)


Postorder traversal: left and right roots (LRN)

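Minimal sketches of the three recursive traversals over a standard binary-link node type (visit() stands for whatever processing is required and is assumed to be defined elsewhere):

```c
typedef struct BiTNode {
    int data;
    struct BiTNode *lchild, *rchild;
} BiTNode, *BiTree;

void visit(BiTNode *p);            // process one node (assumed)

void PreOrder(BiTree T) {          // root, left, right (NLR)
    if (T == NULL) return;
    visit(T);
    PreOrder(T->lchild);
    PreOrder(T->rchild);
}

void InOrder(BiTree T) {           // left, root, right (LNR)
    if (T == NULL) return;
    InOrder(T->lchild);
    visit(T);
    InOrder(T->rchild);
}

void PostOrder(BiTree T) {         // left, right, root (LRN)
    if (T == NULL) return;
    PostOrder(T->lchild);
    PostOrder(T->rchild);
    visit(T);
}
```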

level-order traversal

Algorithm idea:

① Initialize an auxiliary queue

② Enqueue the root node

③ If the queue is not empty, dequeue the head node, visit it, and enqueue its left and right children (if any)

④ Repeat ③ until the queue is empty
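A sketch using a plain array as the auxiliary queue (BiTNode/BiTree and visit() as in the traversal sketch above; the capacity MAXN is an assumed upper bound):

```c
#define MAXN 1000   // assumed upper bound on the number of nodes

void LevelOrder(BiTree T) {
    BiTNode *queue[MAXN];
    int front = 0, rear = 0;           // rear is one past the last element
    if (T == NULL) return;
    queue[rear++] = T;                 // the root joins the queue
    while (front < rear) {             // while the queue is not empty
        BiTNode *p = queue[front++];   // dequeue the head node
        visit(p);
        if (p->lchild) queue[rear++] = p->lchild;  // enqueue the children, if any
        if (p->rchild) queue[rear++] = p->rchild;
    }
}
```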

Construct a binary tree from a traversal sequence

If only one of the front/middle/last/level-order traversal sequences of a binary tree is given, a binary tree cannot be uniquely determined.

Preorder and inorder

(1) In the preorder sequence: the first node is the root.

(2) In the inorder sequence: the root divides the sequence into two subsequences, the left subtree in front and the right subtree behind.

(3) Back in the preorder sequence: locate those two subsequences; the first node of each is again a root.

Postorder and inorder: the last node of the postorder sequence plays the same role as the first node of the preorder sequence (it is the root)


Level order combined with postorder is not sufficient: any combination that lacks the inorder sequence cannot uniquely determine a binary tree.

5.3.2 Threaded binary tree (didn't understand)

Purpose: speed up finding a node's predecessor and successor

Thread: a pointer to a predecessor or successor

Threading: traversing the binary tree in some order so as to turn it into a threaded binary tree

If a node has no left subtree, make lchild point to its predecessor; if it has no right subtree, make rchild point to its successor. Predecessor and successor are determined by the specific traversal order.


Binary tree threading


5.4 Trees and forests

5.4.1 Tree storage structure

Parental notation (sequential storage)

  1. Definition: nodes are stored in contiguous space; each node carries a pseudo pointer indicating the position of its parent in the array. The root has subscript 0 and pseudo pointer -1

  2. Features: finding the parent is fast, but finding a child requires traversing the whole structure


Child representation (sequential + chained storage)

  1. Definition: Store each node sequentially, and store the head pointer of the child list in each node.

  2. Features: It is very convenient to ask for children, but inconvenient to ask for parents.


Child-sibling representation (chained storage)

  1. Definition: the left pointer points to the first child, the right pointer points to the next sibling; a binary linked list is used as the storage structure

  2. Advantages: convenient for converting a tree into a binary tree; easy to find children

  3. Disadvantages: finding the parent is troublesome; adding a parent pointer makes that convenient too
5.4.2 Conversion of trees, forests and binary trees

Convert tree to binary tree

The left pointer points to the first child and the right pointer to the next sibling; since the root has no siblings, the resulting binary tree has no right subtree.


Convert forest to binary tree

The trees in the forest are converted into binary trees in turn, and the root of each binary tree is used as the right subtree of the previous binary tree.


Convert binary tree to forest

  1. The root of the binary tree together with its left subtree forms the binary-tree shape of the first tree, which is then converted back into a tree (right children become siblings)
  2. The root of the right subtree together with its own left subtree forms the second tree, that root's right child forms the third tree, and so on.
5.4.3 Tree and forest traversal

Root-first traversal of the tree (depth-first traversal)

Visit the root first, and then traverse each subtree from left to right, which is the same as the preorder sequence of the corresponding binary tree.

Post-root traversal of the tree (depth-first traversal)

Traverse each subtree from left to right, and then visit the root, which is the same as the in-order sequence of the corresponding binary tree of this tree.

Tree level traversal (breadth-first traversal)

①If the tree is not empty, the root node is added to the queue

② If the queue is not empty, the head element is dequeued and accessed, and the children of the element are added to the queue in sequence.

③ Repeat ② until the queue is empty

Pre-order traversal of the forest == Pre-order traversal of each subtree in sequence

If the forest is non-empty, it is traversed according to the following rules:

(1) Visit the root node of the first tree in the forest.

(2) Pre-order traverse the subtree forest of the root node in the first tree.

(3) Pre-order traverse the forest consisting of the remaining trees after removing the first tree.

In-order traversal of the forest == Post-order traversal of each subtree in sequence

If the forest is non-empty, it is traversed according to the following rules:

(1) In-order traverse the sub-tree forest of the root node of the first tree in the forest.

(2) Visit the root node of the first tree.

(3) In-order traversal of the forest consisting of the remaining trees after removing the first tree

5.5 Application of trees and binary trees

5.5.1 Huffman tree and Huffman coding

The weight of the node: a numerical value with some practical meaning (such as: indicating the importance of the node, etc.)

The weighted path length of a node: the product of the path length (number of edges passed) from the root of the tree to the node and the weight of the node

The weighted path length of the tree: the sum of the weighted path lengths of all leaf nodes in the tree

Definition: In a binary tree containing n weighted leaf nodes, the binary tree with the smallest weighted path length (WPL) is called the Huffman tree, also called the optimal binary tree.

The structure of Huffman tree

Given n nodes with weights w1, w2,..., wn, the algorithm for constructing a Huffman tree is described as follows:

(1) Treat these n nodes as n binary trees containing only one node to form a forest F.

(2) Construct a new node, select from F the two trees whose roots have the smallest weights as the left and right subtrees of the new node, and set the weight of the new node to the sum of the weights of the roots of its left and right subtrees.

(3) Delete the two trees just selected from F and add the newly obtained tree to F.

(4) Repeat steps 2) and 3) until there is only one tree left in F.

Features

  1. Each initial node eventually becomes a leaf node, and the smaller its weight, the greater its path length from the root.
  2. The total number of nodes in a Huffman tree with n leaves is 2n - 1.
  3. There is no node of degree 1 in a Huffman tree.
  4. The Huffman tree is not unique, but the WPL is always the same and minimal.

Huffman coding

**Fixed-length encoding:** each character is represented by the same number of binary bits

Variable length encoding: allows different characters to be represented by unequal lengths of binary bits

Prefix encoding: no codeword is a prefix of any other codeword

Construct Huffman code:

(1) Each character in the character set is used as a leaf node, and the frequency of occurrence of each character is used as the weight of the node. A Huffman tree is constructed according to the method introduced previously.

(2) Mark the sequence on the path from the root node to the leaf node, 0 goes to the left child, 1 goes to the right child

5.5.2 Disjoint sets (union-find)

Chapter 6 Graph

6.1 Basic concepts of graphs

Definition of graph

The graph G consists of a vertex set V and an edge set E, denoted G = (V, E), where V(G) is the finite non-empty set of vertices of graph G and E(G) is the set of relationships (edges) between vertices. If V = {v1, v2, …, vn}, then ==|V| denotes the number of vertices in graph G==, also called the order of G; E = {(u, v) | u ∈ V, v ∈ V}, and ==|E| denotes the number of edges in graph G==.

Note: The linear table can be an empty table, and the tree can be an empty tree, but the graph cannot be empty, that is, V must be a non-empty set, and E can be an empty set.

Directed graph: If E is a finite set of directed edges (also called arcs), then the graph G is a directed graph. An arc is an ordered pair of vertices, denoted ==<v,w>, where v and w are vertices; v is called the tail of the arc and w the head of the arc==, read as an arc from vertex v to vertex w; we also say v is adjacent to w, or w is adjacent from v. <v,w> ≠ <w,v>

Undirected graph: If E is a finite set of undirected edges (abbreviated as edges), then the graph G is an undirected graph. An edge is an unordered pair of vertices, denoted (v, w) or (w, v), because == (v, w) = (w, v), where v, w are vertices ==. It can be said that vertex w and vertex v are adjacent points to each other. Edge (v, w) is attached to vertices w and v, or edge (v, w) is associated with vertices v and w.

**Simple graph:**① There are no duplicate edges; ② There are no edges from the vertex to itself.

Multigraph: The number of edges between two nodes in graph G is more than one, and the vertices are allowed to be associated with themselves through the same edge, then G is a multigraph.

Vertex degree, in-degree, out-degree

Undirected graph: The degree of a vertex v refers to the number of edges attached to the vertex, recorded as TD(v).

The sum of the degrees of all vertices of an undirected graph is equal to twice the number of edges.

Directed graph: In-degree is the number of directed edges ending at vertex v, recorded as ID(v);

Out-degree is the number of directed edges starting from vertex v, denoted as OD(v).

The ==degree of vertex v== equals the sum of its in-degree and out-degree: TD(v) = ID(v) + OD(v).
Vertex-vertex relationship description

Path: a path from vertex vp to vertex vq is a sequence of vertices.
Cycle (loop): a path whose first and last vertices are the same.
Simple path: a path in which no vertex appears more than once.
Simple cycle: a cycle in which no vertex repeats except the first and last.
Path length: the number of edges on the path.
Distance between two vertices: the length of the shortest path from u to v, if one exists; if there is no path from u to v at all, the distance is recorded as infinity (∞).
In an undirected graph, if there is a path from vertex v to vertex w, then v and w are said to be connected.
In a directed graph, if there are paths both from v to w and from w to v, then the two vertices are said to be strongly connected.
If any two vertices in graph G are connected, G is called a connected graph; otherwise it is a disconnected graph.
If any pair of vertices in the graph is strongly connected, the graph is called a strongly connected graph.

Subgraph: Given two graphs G = (V, E) and G' = (V', E'), if V' is a subset of V and E' is a subset of E, then G' is a subgraph of G.

Spanning subgraph: a subgraph G' that satisfies ==V(G') = V(G)==

Connected component: The maximal connected subgraph in an undirected graph is called a connected component.

Maximally connected subgraph: The subgraph must be connected and contain as many vertices and edges as possible

Strongly connected component: a maximal strongly connected subgraph of a directed graph is called a strongly connected component of the directed graph.

The spanning tree of a connected graph (undirected) is a minimal connected subgraph that contains all the vertices in the graph.

If the number of vertices in the graph is n, then its spanning tree contains n-1 edges.

In a disconnected graph, the spanning tree of connected components constitutes the spanning forest of the disconnected graph.

Edge weights, weighted graphs/networks

Edge weight - In a graph, each edge can be marked with a numerical value that has a certain meaning, which is called the weight of the edge.

Weighted graph/net - A graph with weighted edges is called a weighted graph, also called a net.

Weighted path length - When the graph is a weighted graph, the sum of the weights of all edges on a path is called the weighted path length of the path.

Special forms of graphs

Undirected complete graph - there is an edge between any two vertices in the undirected graph

Directed complete graph - there are two arcs in opposite directions between any two vertices in the directed graph.

A graph with few edges is called a sparse graph, and the opposite is called a dense graph.

Tree - an undirected graph that has no loops and is connected

A tree with n vertices must have n-1 edges.

Directed tree - a directed graph in which the in-degree of one vertex is 0 and the in-degree of the other vertices is 1, is called a directed tree


6.2 Storage and basic operations of graphs

6.2.1 Adjacency matrix method

The degree of the i-th node = the number of non-zero elements in the i-th row (or i-th column)

The out-degree of the i-th node = the number of non-zero elements in the i-th row

The in-degree of the i-th node = the number of non-zero elements in the i-th column

The degree of the i-th node = the sum of the number of non-zero elements in the i-th row and i-th column

The time complexity of finding the degree/out-degree/in-degree of a vertex using the adjacency matrix method is O(|V|)

Space complexity: O(|V|²) (it depends only on the number of vertices, not on the actual number of edges)

Properties of the adjacency matrix method

Assume the adjacency matrix of graph G is A (with 0/1 elements); then element A^n[i][j] of A^n equals the number of paths of length n from vertex i to vertex j.

6.2.2 Adjacency list method

6.2.3 Adjacency multiple lists

6.2.4 Cross linked list
6.2.5 Basic operations on graphs

• Adjacent(G,x,y): Determine whether edge <x, y> or (x, y) exists in graph G.

• Neighbors(G,x): List the edges adjacent to node x in graph G.

• InsertVertex(G,x): Insert vertex x in graph G.

• DeleteVertex(G,x): Delete vertex x from graph G.

• AddEdge(G,x,y): If undirected edge (x, y) or directed edge <x, y> does not exist, add that edge to graph G.

• RemoveEdge(G,x,y): If undirected edge (x, y) or directed edge <x, y> exists, delete that edge from graph G.

• FirstNeighbor(G,x): Find the first neighbor point of vertex x in graph G, and return the vertex number if there is one. If x has no adjacent points or x does not exist in the graph, -1 is returned.

• NextNeighbor(G,x,y): Assuming vertex y in graph G is adjacent to vertex x, return the vertex number of the next vertex adjacent to x after y; if y is the last adjacent point of x, return -1.

• Get_edge_value(G,x,y): Get the weight corresponding to edge (x, y) or <x, y> in graph G.

• Set_edge_value(G,x,y,v): Set the weight corresponding to edge (x, y) or <x, y> in graph G to v.

6.3 Graph traversal

6.3.1 Breadth-first traversal of BFS

Steps:

  1. Find all vertices adjacent to a given vertex

  2. Mark which vertices have been visited

  3. An auxiliary queue is required
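A sketch over an adjacency matrix; MaxVertexNum and the MGraph layout are assumptions in the spirit of the textbook's definitions:

```c
#define MaxVertexNum 100

typedef struct {
    char vex[MaxVertexNum];                    // vertex table
    int edge[MaxVertexNum][MaxVertexNum];      // adjacency matrix, 0/1
    int vexnum, arcnum;
} MGraph;

int visited[MaxVertexNum];                     // global visited marks

void BFS(MGraph *G, int v) {                   // traverse one connected component from v
    int queue[MaxVertexNum], front = 0, rear = 0;
    visited[v] = 1;
    queue[rear++] = v;
    while (front < rear) {
        int u = queue[front++];                // dequeue and visit u here
        for (int w = 0; w < G->vexnum; w++)    // every unvisited neighbor joins the queue
            if (G->edge[u][w] && !visited[w]) {
                visited[w] = 1;
                queue[rear++] = w;
            }
    }
}

void BFSTraverse(MGraph *G) {                  // covers non-connected graphs:
    for (int i = 0; i < G->vexnum; i++) visited[i] = 0;
    for (int i = 0; i < G->vexnum; i++)        // one BFS call per connected component
        if (!visited[i]) BFS(G, i);
}
```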

The adjacency matrix representation of a given graph is unique, so the breadth-first traversal sequence is unique.

The adjacency list representation of a given graph is not unique, so the breadth-first traversal sequence is not unique.

Existing problem : If it is a non-connected graph, all nodes cannot be traversed


Performance analysis : Space complexity: O(|V|)

Time complexity: Adjacency list: O(|V|+|E|) Adjacency matrix: O(|V|²)

breadth-first spanning tree

Definition : A traversal tree obtained during breadth traversal

Features : Unique in adjacency matrix, not unique in adjacency list

A breadth-first traversal of a non-connected graph yields a breadth-first spanning forest

6.3.2 Depth-first traversal of DFS

Steps:

  1. First visit the starting vertex v

  2. Visit any unvisited vertex w adjacent to v

  3. Then visit any unvisited vertex w2 adjacent to w

  4. Repeat until no further vertex can be visited, then backtrack to the most recently visited vertex
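A recursive sketch over the same MGraph and visited definitions as in the BFS sketch above:

```c
void DFS(MGraph *G, int v) {
    visited[v] = 1;                            // visit v first
    for (int w = 0; w < G->vexnum; w++)        // go deeper along any unvisited neighbor
        if (G->edge[v][w] && !visited[w])
            DFS(G, w);                         // backtracking happens on return
}

void DFSTraverse(MGraph *G) {
    for (int i = 0; i < G->vexnum; i++) visited[i] = 0;
    for (int i = 0; i < G->vexnum; i++)        // one call per connected component
        if (!visited[i]) DFS(G, i);
}
```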

Existing problem: If it is a non-connected graph, all nodes cannot be traversed

Performance analysis:

Space complexity: from the function call stack, in the worst case, the recursion depth is O(|V|)

Time complexity = time required to visit each node + time required to explore each edge

Adjacency list: O(|V|+|E|), adjacency matrix: O(|V|²)

The adjacency matrix representation of the same graph is unique, so the depth-first traversal sequence is unique, and the depth-first spanning tree is also unique.

The adjacency list representation of the same graph is not unique, so the depth-first traversal sequence is not unique, and the depth-first spanning tree is not unique either.

6.3.3 Graph traversal and graph connectivity

For BFS/DFS traversal of an undirected graph, the number of calls to the BFS/DFS function = the number of connected components.

For connected graphs, BFS/DFS only needs to be called once

For strongly connected graphs, BFS/DFS only needs to be called once from any node.

6.4 Application of diagrams

6.4.1 Minimum spanning tree

For a weighted connected undirected graph G = (V, E), different spanning trees may have different weights (the weight of a tree is the sum of the weights of all its edges). Let R be the set of all spanning trees of G. If T is the spanning tree with the smallest sum of edge weights in R, then T is called a minimum spanning tree (Minimum-Spanning-Tree, MST) of G.

• There may be multiple minimum spanning trees, but the sum of edge weights is always unique and smallest.

• Number of edges of the minimum spanning tree = number of vertices - 1. If you cut off an edge, there will be no connection; if you add an edge, a loop will appear.

• If a connected graph itself is a tree, then its minimum spanning tree is itself

• Only connected graphs can generate trees, and unconnected graphs can only generate forests.

Prim algorithm (Prim)

Start building a spanning tree from a certain vertex; each time the new vertex with the smallest cost is included in the spanning tree until all vertices are included.

step:

Initialization: select any vertex as the initial tree.
Loop (until all vertices are included): among the edges joining the current tree to outside vertices, select the edge with the smallest weight that does not form a cycle, and take the new vertex into the tree.

Features: time complexity O(|V|²), independent of |E|; suitable for graphs with dense edges.

Kruskal algorithm (Kruskal)

Select an edge with the smallest weight each time to connect both ends of this edge (the ones that are already connected will not be selected) until all nodes are connected.

step:

Initialization: include all vertices and no edges.
Loop (until a tree is formed): select edges in increasing order of weight, skipping any edge that would form a cycle, until n - 1 edges are included.

Features: with the edge set stored in a heap, the time complexity is O(|E|log|E|); suitable for graphs with sparse edges and many vertices.

6.4.2 Shortest path

BFS finds the single-source shortest path of an unweighted graph

An unweighted graph can be regarded as a special kind of weighted graph, except that the weight of each edge is 1.

The BFS algorithm to find the single-source shortest path is only suitable for unweighted graphs, or graphs where all edge weights are the same.

Dijkstra's algorithm for finding the shortest path from a single source

Initialization: the set S is {0}; dist[] holds the distance from the initial vertex 0 to each vertex (∞ if there is no edge); path[] holds the predecessor on the shortest path, with path[0] = -1 (always unchanged).
Loop: among the vertices not yet in S, select the vertex j with the smallest dist[j] and add it to S; for every remaining vertex k, if dist[j] + arcs[j][k] < dist[k], update dist[k] and set path[k] = j.
Repeat on the vertices still outside S until S contains all vertices.
The single-source time complexity is O(|V|²); running it from every vertex for all pairs is O(|V|³).
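A sketch matching the description above (adjacency matrix arcs[][] with INF for absent edges and arcs[k][k] = 0 assumed; source vertex 0; dist/path named as in the text):

```c
#include <limits.h>

#define N 100
#define INF INT_MAX

// Single-source shortest paths from vertex 0 of an n-vertex graph
void Dijkstra(int arcs[N][N], int n, int dist[N], int path[N]) {
    int final[N];                         // final[k] != 0 means k is already in S
    for (int k = 0; k < n; k++) {
        final[k] = 0;
        dist[k] = arcs[0][k];
        path[k] = (arcs[0][k] < INF) ? 0 : -1;
    }
    final[0] = 1; path[0] = -1;           // S = {0}
    for (int round = 1; round < n; round++) {
        int j = -1;                       // unfinished vertex with the smallest dist
        for (int k = 0; k < n; k++)
            if (!final[k] && (j == -1 || dist[k] < dist[j])) j = k;
        if (j == -1 || dist[j] == INF) break;   // the rest are unreachable
        final[j] = 1;                     // add j to S
        for (int k = 0; k < n; k++)       // relax the edges leaving j
            if (!final[k] && arcs[j][k] < INF && dist[j] + arcs[j][k] < dist[k]) {
                dist[k] = dist[j] + arcs[j][k];
                path[k] = j;
            }
    }
}
```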

Floyd algorithm to find the shortest path between vertices

Recursively generate a sequence of n-order square matrices A⁽⁻¹⁾, A⁽⁰⁾, …, A⁽ⁿ⁻¹⁾.
Initially (A⁽⁻¹⁾): if there is an edge between two vertices, its weight is taken as the shortest path; otherwise the entry is infinity.
Then take each vertex k (k from 0 to n - 1) in turn as an intermediate vertex; whenever going through k shortens a path, replace the original path.
A⁽ᵏ⁾[i][j]: the length of the shortest path from vertex i to vertex j whose intermediate vertices all have numbers not greater than k.

Features:

Time complexity: O(|V|³)
Edges with negative weights are allowed, but cycles formed by negative-weight edges are not.
Applicable to weighted undirected graphs, which are treated as directed graphs with a pair of opposite edges.

6.4.3 Directed Acyclic Graph DAG

Directed acyclic graph: If there is no cycle in a directed graph, it is called a directed acyclic graph, or DAG graph for short.

Final steps:

Step 1: Arrange each operand in a row without duplication

Step 2: Mark the order in which each operator takes effect (it doesn’t matter if the order is slightly different)

Step 3: Add operators in order, paying attention to "layering"

Step 4: Check whether operators at the same level can be combined layer by layer from bottom to top

6.4.4 Topological sorting

AOV network (Activity On Vertex NetWork, using vertices to represent active networks): Use a DAG graph (directed acyclic graph) to represent a project. The vertices represent activities, and the directed edges represent that activity Vi must be performed before activity Vj

Topological sorting: In graph theory, a sequence composed of the vertices of a directed acyclic graph is called a topological sort of the graph if and only if the following conditions are met: ① each vertex appears exactly once; ② if vertex A precedes vertex B in the sequence, there is no path from vertex B to vertex A in the graph.

Or defined as: Topological sorting is a sorting of the vertices of a directed acyclic graph such that if there is a path from vertex A to vertex B, then vertex B appears after vertex A in the sorting. Each AOV network has one or more topological sorting sequences.

Implementation of topological sorting:

① Select a vertex with no predecessor ==(in-degree 0)== from the AOV network and output it.

② Delete the vertex and all directed edges starting from it from the network.

③ Repeat ① and ② until the current AOV network is empty or there are no predecessor-less vertices in the current network.
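A sketch of the algorithm with an in-degree array and a stack of in-degree-0 vertices (adjacency matrix for simplicity; returns 0 when a cycle makes the sort impossible):

```c
#define MaxVertexNum 100

// Topological sort of an n-vertex graph stored in edge[][]; writes the order into topo[]
int TopologicalSort(int edge[MaxVertexNum][MaxVertexNum], int n, int topo[]) {
    int indegree[MaxVertexNum], stack[MaxVertexNum], top = -1, count = 0;
    for (int v = 0; v < n; v++) {                  // compute all in-degrees
        indegree[v] = 0;
        for (int u = 0; u < n; u++) indegree[v] += edge[u][v];
    }
    for (int v = 0; v < n; v++)
        if (indegree[v] == 0) stack[++top] = v;    // vertices with no predecessor
    while (top != -1) {
        int v = stack[top--];
        topo[count++] = v;                         // output v
        for (int w = 0; w < n; w++)                // "delete" the edges leaving v
            if (edge[v][w] && --indegree[w] == 0)
                stack[++top] = w;
    }
    return count == n;                             // count < n means the graph has a cycle
}
```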

For an AOV network, if the following steps are used for sorting, it is called reverse topological sorting:

① Select a vertex with no successor (out-degree 0) from the AOV network and output it.

② Delete the vertex and all directed edges ending with it from the network.

③ Repeat ① and ② until the current AOV network is empty.

6.4.5 Critical path

AOE network (Activity On Edge Network): a weighted directed graph in which vertices represent events, directed edges represent activities, and the weight on an edge represents the cost of completing the activity (such as the time required); it is called a network that uses edges to represent activities, AOE for short.

AOE network has the following two properties:

① Only after the event represented by a vertex occurs, the activities represented by the directed edges starting from the vertex can start;

② The event represented by a vertex can only occur when the activities represented by each directed edge entering a vertex have ended. In addition, some activities can be carried out in parallel

In the AOE network, there is only one vertex with in-degree 0, called the start vertex (source), which represents the beginning of the entire project; there is also only one vertex with out-degree 0, called the end vertex (sink), which represents the end of the entire project.

There may be multiple directed paths from the source to the sink. Among all paths, the path with the largest path length is called the critical path, and the activities on the critical path are called critical activities.

Find critical path steps

① Find the earliest occurrence time ve() of all events: ve(k) determines the earliest time at which the activities starting from vk can begin

② Find the latest occurrence time vl() of all events: the latest time the event may occur without delaying the completion of the entire project

③ Find the earliest start time e() of all activities: the earliest occurrence time of the event at the start of the activity's arc

④ Find the latest start time l() of all activities: the latest occurrence time of the event at the end of the activity's arc minus the time required by the activity

⑤ Find the time margin d() of all activities: d(i) = l(i) - e(i)

Activities with d(i) = 0 are key activities, and the critical path is formed from the key activities.

Find the earliest occurrence time of all events ve()

Add weights starting from the source point and take the maximum value of different paths.

Find the latest occurrence time vl() of all events

Vl (sink point) = Ve (sink point), subtract the weights from the sink point forward, and take the minimum value of different paths

Find the earliest occurrence time of all activities e()

If edge <vk, vj> represents activity ai, then e(i) = ve(k)

Find the latest occurrence time l() of all activities

If edge <vk, vj> represents activity ai, then l(i) = vl(j) - Weight(vk, vj)

Find the time margin d() for all activities

d(i) = l(i) - e(i)

If the time required by a key activity increases, the duration of the entire project increases.

Shortening the time of key activities can shorten the duration of the entire project

When shortened to a certain extent, critical activities may become non-critical activities

Chapter 7 Search

7.1 Basic concepts of search

Search - The process of finding data elements that meet certain conditions in a data set is called search.

Lookup table (lookup structure) - The data set used for lookup is called a lookup table, which consists of data elements (or records) of the same type

Keyword - the value of a data item in a data element that uniquely identifies the element. A search by keyword should yield a unique result.

Common operations on lookup tables

① Find data elements that meet the conditions ② Insert or delete a data element

If only operation ① is needed - a static lookup table; only search speed matters.

If operation ② is also needed - a dynamic lookup table; besides search speed, we must also consider whether insertion/deletion is easy to implement.

Suitable for static lookup tables: sequential search, binary search, hash search

Suitable for dynamic lookup tables: binary sorting trees and hash search; balanced binary trees and B-trees are improvements of the binary sorting tree

Evaluation metrics for a search algorithm

Search length - In the search operation, the number of times that keywords need to be compared is called the search length.

Average Search Length (ASL) - the average number of keyword comparisons over all search processes

7.2 Sequential search and binary search

7.2.1 Sequential search

Sequential search, also called "linear search", is usually used in linear tables

ASL(success) = (n+1)/2; ASL(failure) = n+1
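A minimal C sketch of sequential search with a sentinel at position 0, the variant behind the ASL(failure) = n+1 figure above (the a[1..n] layout is an assumption following the usual convention of these notes):

```c
/* Elements live in a[1..n]; a[0] is reserved for the sentinel. */
int SeqSearch(int a[], int n, int key) {
    a[0] = key;              /* sentinel: guarantees the loop terminates */
    int i = n;
    while (a[i] != key)      /* no need to test i >= 1 on every step */
        i--;
    return i;                /* 0 means search failure (n+1 comparisons) */
}
```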

7.2.2 Half search (binary search)

Half search, also known as "binary search", is only applicable to ordered sequence lists.

Construction of decision tree using binary search

If the number of elements between the current low and high is odd, then after mid splits them, the left and right parts contain equally many elements.

If the number of elements between the current low and high is even, then after mid splits them, the left part contains one element fewer than the right part.

In the decision tree of binary search, if mid = ⌊(low + high)/2⌋, then for any node: (number of nodes in the right subtree) - (number of nodes in the left subtree) = 0 or 1

The decision tree of half search must be a balanced binary tree

In the decision tree of binary search, only the bottom level may be incomplete. Therefore, with n elements, the tree height is h = ⌈log₂(n + 1)⌉

Time complexity of half search = O(log₂n)
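A minimal C sketch of binary search on an ascending sequence list (0-indexed array; names are illustrative):

```c
/* Returns the index of key in ascending array a[0..n-1], or -1 on failure. */
int BinarySearch(int a[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = (low + high) / 2;   /* mid = floor((low + high) / 2) */
        if (a[mid] == key)
            return mid;
        else if (a[mid] < key)
            low = mid + 1;            /* continue in the right half */
        else
            high = mid - 1;           /* continue in the left half */
    }
    return -1;                        /* stops when low > high: failure */
}
```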

7.2.3 Block search

Block search, also known as index sequential search. The algorithm: ① determine, in the index table, the block to which the target record belongs (by sequential or binary search); ② search sequentially within that block.

Features: Unordered within blocks, ordered between blocks

Algorithmic idea:

The table is divided into several sub-blocks. Elements within a block may be unordered, but the blocks are ordered among themselves: the largest keyword of the first block is smaller than all records of the second block, and so on.
An index table is built containing the largest keyword of each block and the address of the first element of each block, arranged in ascending keyword order.
If the index table does not contain the target keyword, a binary search of the index table eventually stops with low > high, and the search must continue in the block pointed to by low.

If low exceeds the range of the index table, the search fails.

Let the average search lengths of the index search and the intra-block search be LI and LS respectively; then the average search length of block search is ASL = LI + LS
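For example, if the n elements are split evenly into b blocks of s = n/b elements each, and both the index lookup and the in-block lookup are sequential, then ASL = (b+1)/2 + (s+1)/2; this is minimized when s = √n, giving ASL = √n + 1.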

7.3 Tree search

7.3.1 Binary sorting tree BST

Binary sorting tree, also known as binary search tree (BST, Binary Search Tree)

Binary sorting tree can be used for ordered organization and search of elements

It has the following properties: the keywords of all nodes in the left subtree are smaller than the keyword of the root node; the keywords of all nodes in the right subtree are greater than the keyword of the root node; and the left and right subtrees are each a binary sorting tree.

Left subtree node value < root node value < right subtree node value

By performing in-order traversal, an increasing ordered sequence can be obtained.
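A minimal C sketch of a BST node and its search operation (names are illustrative):

```c
#include <stdlib.h>

typedef struct BSTNode {
    int key;
    struct BSTNode *lchild, *rchild;
} BSTNode;

/* Iterative search: go left if key is smaller, right if larger. */
BSTNode *BSTSearch(BSTNode *t, int key) {
    while (t != NULL && key != t->key) {
        if (key < t->key)
            t = t->lchild;   /* target can only be in the left subtree */
        else
            t = t->rchild;   /* target can only be in the right subtree */
    }
    return t;                /* NULL means search failure */
}
```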
Deletion of a binary sort tree: ① If the deleted node z is a leaf node, it will be deleted directly without destroying the properties of the binary sort tree.

② If node z has only one left subtree or right subtree, let z’s subtree become the subtree of z’s parent node, replacing z’s position.

③ If node z has both a left and a right subtree, replace z with its direct successor (or direct predecessor), then delete that successor (or predecessor) from the binary sorting tree, thereby reducing to the first or second case.

Search efficiency analysis:

Average search length: ASL = Σ (number of nodes on each level × level number) / total number of nodes
Worst case: the tree degenerates into an ordered singly linked list - O(n)
Best case: the tree is balanced - O(log₂n)
The search process resembles binary search, but the decision tree of binary search is unique, whereas the shape of a binary sorting tree depends on the order of insertion.

7.3.2 Balanced Binary Tree AVL

Balanced binary tree, also called an AVL tree - the height difference between the left and right subtrees of any node is at most 1

Balance factor of node = height of left subtree - height of right subtree

Insertion into a balanced binary tree: find the first unbalanced node on the path back up from the insertion point and adjust the subtree rooted at that node. The object of each adjustment is the "minimum unbalanced subtree".

LL balanced rotation (right single rotation). A new node was inserted into the left subtree (L) of the left child (L) of node A, so A's balance factor increases from 1 to 2 and the subtree rooted at A loses balance; one right rotation is needed. Rotate A's left child B up to the right to replace A as the subtree root, rotate A down to the right to become the root of B's right subtree, and attach B's original right subtree as A's left subtree.

RR balanced rotation (left single rotation). A new node was inserted into the right subtree (R) of the right child (R) of node A, so A's balance factor decreases from -1 to -2 and the subtree rooted at A loses balance; one left rotation is needed. Rotate A's right child B up to the left to replace A as the subtree root, rotate A down to the left to become the root of B's left subtree, and attach B's original left subtree as A's right subtree.
img

LR balanced rotation (left-then-right double rotation). A new node was inserted into the right subtree (R) of A's left child (L), so A's balance factor increases from 1 to 2 and the subtree rooted at A loses balance; two rotations are needed, first left and then right. First rotate C, the root of the right subtree of A's left child B, up to the left into B's position; then rotate C up to the right into A's position.
img

RL balanced rotation (right-then-left double rotation). A new node was inserted into the left subtree (L) of A's right child (R), so A's balance factor decreases from -1 to -2 and the subtree rooted at A loses balance; two rotations are needed, first right and then left. First rotate C, the root of the left subtree of A's right child B, up to the right into B's position; then rotate C up to the left into A's position.

A balanced binary tree containing n nodes has maximum depth O(log₂n), so its average search length is O(log₂n).
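A minimal C sketch of the LL case (right single rotation); the node layout and the height bookkeeping are illustrative assumptions, and the RR case is symmetric:

```c
typedef struct AVLNode {
    int key, height;
    struct AVLNode *lchild, *rchild;
} AVLNode;

static int H(AVLNode *p)        { return p ? p->height : 0; }
static int Max(int a, int b)    { return a > b ? a : b; }

/* a is the root of the minimum unbalanced subtree (balance factor +2). */
AVLNode *RotateRight(AVLNode *a) {
    AVLNode *b = a->lchild;    /* B: left child of A */
    a->lchild = b->rchild;     /* B's old right subtree becomes A's left subtree */
    b->rchild = a;             /* A becomes the root of B's right subtree */
    a->height = Max(H(a->lchild), H(a->rchild)) + 1;  /* fix heights bottom-up */
    b->height = Max(H(b->lchild), H(b->rchild)) + 1;
    return b;                  /* B is the new root of the subtree */
}
```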

img
7.3.3 Red-black trees

Balanced binary tree AVL: insertion/deletion can easily destroy the "balanced" property and requires frequent adjustment of the tree's shape. For example, if an insertion causes imbalance, you must first compute balance factors to find the minimum unbalanced subtree (a large time overhead) and then perform an LL/RR/LR/RL adjustment.

Red-black tree RBT: Insertion/deletion will not destroy the "red-black" characteristics in many cases, and there is no need to frequently adjust the shape of the tree. Even if adjustments are needed, they can generally be completed within a constant time.

Balanced binary tree: suitable for scenarios where search is the main task and insertion/deletion is rare

Red-black tree: suitable for frequent insertion and deletion scenarios, more practical

A red-black tree is a binary sorting tree: left subtree node values ≤ root node value ≤ right subtree node values.

Definition:

①Each node is either red or black

②The root node is black

③Leaf nodes (external nodes, NULL nodes, failed nodes) are all black

④ There are no two adjacent red nodes (that is, the parent node and child node of the red node are both black)

⑤For each node, the simple path from the node to any leaf node contains the same number of black nodes (the number of black node levels in the left and right subtrees of each node is equal)
img

The black height bh of a node - the total number of black nodes on the path starting from a node (excluding this node) to any empty leaf node

Properties: the longest path from the root to a leaf is at most twice the shortest such path. A red-black tree with n internal nodes has height h ≤ 2log₂(n+1)

Time complexity of red-black tree search = O(log₂n)

Insertion of red-black trees

insertion process
img

img img

7.4 B-tree and B+ tree

7.4.1 B-tree

B-tree, also known as a multi-way balanced search tree. The maximum number of children of any node in a B-tree is called the order of the B-tree, usually denoted m. An m-order B-tree is either an empty tree or an m-ary tree satisfying the following:

1) Each node in the tree has at most m subtrees, that is, it contains at most m-1 keywords.

2) If the root node is not a terminal node, there are at least two subtrees.

3) All non-leaf nodes except the root have at least ⌈m/2⌉ subtrees, that is, at least ⌈m/2⌉ - 1 keywords.

4) All leaf nodes appear on the same level and carry no information (they can be regarded as external nodes, or search-failure nodes analogous to those of the binary search decision tree; in fact these nodes do not exist, and the pointers to them are null).

img

5) Structure of all non-leaf nodes: keywords are arranged in ascending order; all keywords in the subtree to the left of a keyword are smaller than it, and all keywords in the subtree to its right are greater than it.

The core characteristics of m-order B-tree:

1) The number of subtrees of the root node is ∈ [2, m], and its number of keywords is ∈ [1, m-1]. For other nodes, the number of subtrees is ∈ [⌈m/2⌉, m] and the number of keywords is ∈ [⌈m/2⌉ - 1, m-1].

2) For any node, all its subtrees have the same height

3) Keyword ordering: subtree 0 < keyword 1 < subtree 1 < keyword 2 < subtree 2 < … (analogous to a binary search tree: left < middle < right)

When calculating the height of a B-tree, the leaf (failure) nodes are usually not counted.
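For an m-order B-tree containing n keywords, the standard bounds on this height h follow from counting nodes level by level: since every node has at most m-1 keywords, n ≤ m^h - 1, so h ≥ log_m(n+1); and since the n+1 failure nodes on level h+1 number at least 2⌈m/2⌉^(h-1), we get n+1 ≥ 2⌈m/2⌉^(h-1), i.e. h ≤ log_⌈m/2⌉((n+1)/2) + 1.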

img

Insertion into B-tree

A new keyword must be inserted into a terminal node at the lowest level (a non-leaf node, since the "leaves" here are the failure nodes).

After inserting a keyword, if the number of keywords in the node exceeds the upper limit, the node is split around its middle position ⌈m/2⌉: the keywords of the left part stay in the original node, the keywords of the right part move into a new node, and the middle keyword (position ⌈m/2⌉) is inserted into the parent of the original node.

If this makes the parent's keyword count exceed the upper limit, the splitting continues upward; it may reach the root, in which case the height of the B-tree increases by 1.

img

Deletion of B-tree

Number of keywords n per node: ⌈m/2⌉ - 1 ≤ n ≤ m - 1

If the deleted keyword is in a terminal node, delete the keyword directly (note whether the number of node keywords is lower than the lower limit ⌈m/2⌉ − 1)

If the deleted keyword is in a non-terminal node, use the direct predecessor or direct successor to replace the deleted keyword.

Direct predecessor: the "lower-right" element in the subtree pointed to by the left pointer of the current keyword

Direct successor: the "lower-left" element in the subtree pointed to by the pointer to the right of the current keyword

Sibling can lend. If, after the deletion, the node's keyword count would fall below the lower limit, and a right (or left) sibling of the node still has keywords to spare, adjust the node, the sibling, and their parent (parent-child transposition method).

Parent-child transposition method: a keyword from the sibling moves up into the parent, a keyword from the parent moves down into the deficient node, and then the target keyword is deleted.

Essence: Always ensure that subtree 0<keyword 1<subtree 1<keyword 2<subtree 2

Sibling cannot lend. If, after the deletion, the node's keyword count would fall below the lower limit, and both adjacent siblings hold exactly ⌈m/2⌉ - 1 keywords, then delete the keyword and merge the node with a left (or right) sibling together with the separating keyword from the parent.

Merging reduces the parent's keyword count by 1. If the parent is the root and its keyword count drops to 0 (a root with 1 keyword has 2 subtrees), the root is deleted and the merged node becomes the new root. If the parent is not the root and its keyword count drops below the lower limit ⌈m/2⌉ - 1, it must in turn borrow from or merge with its own siblings; repeat these steps upward until the B-tree requirements are met.

img
7.4.2 B+ tree

A B+ tree of order m must meet the following conditions:

1) Each branch node has at most m subtrees (child nodes).

2) A non-leaf root node has at least two subtrees; every other branch node has at least ⌈m/2⌉ subtrees.

3) The number of subtrees of a node is equal to the number of keywords. (The biggest difference from B-tree)

4) All leaf nodes contain all keywords and pointers to corresponding records. The keywords are arranged in the leaf nodes in order of size, and adjacent leaf nodes are linked to each other in order of size.

5) All branch nodes only contain the maximum value of the keywords in each of its sub-nodes and pointers to its sub-nodes.

B+ tree search: either sequentially from the leaf holding the smallest keyword, or by a multi-way search starting from the root.

In a B+ tree, whether the search succeeds or fails, it always ends at a bottom-level (leaf) node.

img

7.5 Hash table

7.5.1 Basic concepts of hash tables

A hash table (Hash Table) is a data structure.

Features: Keywords of data elements are directly related to their storage addresses

If different keywords are mapped to the same value through a hash function, they are called "synonyms"

If other elements are already stored at the location determined by the hash function, this situation is called a "collision"

Method to deal with conflicts - zipper method (also known as link method, chain address method): store all "synonyms" in a linked list

7.5.2 Construction method of hash function

Division with remainder method - H(key) = key % p

Choice of p: a prime number not greater than the table length m but closest to or equal to m

Direct addressing method - H(key) = key or H(key) = a*key + b

Here a and b are constants. This method is the simplest to compute and causes no collisions. It suits keywords whose distribution is basically continuous; if the distribution is discontinuous and leaves many gaps, storage space is wasted.

Number analysis method - select a few digits with a relatively even distribution of numbers as the hash address

Assume the keyword is a number in base r (for example, a decimal number). The r digit values do not necessarily occur with the same frequency in every position: in some positions they are distributed fairly evenly, with every digit equally likely, while in other positions the distribution is uneven and only certain digits appear frequently. In that case, select the positions where the digits are distributed relatively evenly as the hash address. This method suits a known, fixed keyword set; if the keywords change, a new hash function must be constructed.

Middle-square method - take the middle digits of the square of the keyword as the hash address

The exact number of digits taken depends on the situation. The hash address obtained this way depends on every digit of the keyword, so hash addresses are distributed fairly evenly. It suits cases where the digits of the keyword are not distributed evenly enough, or where the keyword has fewer digits than the hash address requires.

Folding method - divide the keyword into several parts with the same number of digits, and take the superposition sum as the hash location

Suitable when the keyword has many digits and the digits in every position are roughly evenly distributed

7.5.3 Methods of handling conflicts

Use the zipper method (also known as the link method and the chain address method) to handle "conflicts": store all "synonyms" in a linked list

The open addressing method means that the free address that can store a new entry is open to both its synonym entry and its non-synonym entry.

The mathematical recurrence formula is: Hi = (H(key) + di) % m

i = 0, 1, 2,…, k (k≤m - 1), m represents the length of the hash table; di is the incremental sequence; i can be understood as "the i-th conflict"

① Linear probing - di = 0, 1, 2, 3, …, m-1; that is, when a collision occurs, probe the next adjacent unit each time until an empty one is found (see the sketch after this list).

Linear probing easily produces "clustering (accumulation)" of synonyms and non-synonyms, seriously degrading search efficiency.

Reason - elements probed after a collision must be placed in consecutive positions.

② Quadratic probing. When di = 0², 1², -1², 2², -2², …, k², -k² (with k ≤ m/2), it is called quadratic (square) probing.

Advantage: avoids the accumulation problem. Disadvantage: only half of the table's cells can be probed.

③Pseudo-random sequence method. di is a pseudo-random sequence, such as di= 0, 5, 24, 11, …
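A minimal C sketch of open addressing with linear probing, matching Hi = (H(key) + di) % m above (the table length, key type, and EMPTY marker are illustrative assumptions; deletion is deliberately omitted, see the notice below):

```c
#define M 13          /* illustrative table length (a prime) */
#define EMPTY (-1)    /* marker for an unused slot */

int table[M];         /* assumed initialized so every slot holds EMPTY */

/* Insert key; returns its slot index, or -1 if the table is full. */
int Insert(int key) {
    for (int i = 0; i < M; i++) {
        int h = (key % M + i) % M;       /* probe sequence H, H+1, H+2, ... */
        if (table[h] == EMPTY) { table[h] = key; return h; }
        if (table[h] == key) return h;   /* already present */
    }
    return -1;
}

/* Search for key; returns its slot index, or -1 on failure. */
int Search(int key) {
    for (int i = 0; i < M; i++) {
        int h = (key % M + i) % M;
        if (table[h] == EMPTY) return -1;  /* empty slot ends the probe: failure */
        if (table[h] == key) return h;
    }
    return -1;
}
```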

img

Notice:

  1. Existing elements cannot simply be physically deleted, since that would cut off the probe sequences of other elements with the same hash address.

  2. Instead, mark the element as deleted (logical deletion).

  3. Side effect: after many deletions the table looks very full while actually containing many unused slots, so regular maintenance is needed.

    img
7.5.4 Hash search and performance analysis

Search efficiency:

Depends on: the hash function, the method of handling collisions, and the load factor.
Load (filling) factor α = n (number of records in the table) / m (hash table length). The average search length depends on α, not directly on n or m.

Common mistakes:

Inserting K synonyms into a hash table with linear probing requires K(K+1)/2 probes in total.
The probability of collision grows with the load factor: the fuller the table, the easier collisions occur.
The hash function cannot be built from a random-number generator, since searches could then not be carried out normally.

Points to note:

For a successful search, the number of comparisons = the number of collisions + 1.
A search computes the position from the hash function and compares position by position, moving on according to the collision-handling method, until the key is found or an empty position shows the search has failed.

Chapter 8 Sorting

8.1 Basic concepts of sorting

Sorting (Sort) is the process of rearranging the elements in the table so that the elements in the table are ordered by keywords.

Algorithm stability: suppose the list to be sorted contains two elements Ri and Rj with the same keyword (keyi = keyj), and Ri precedes Rj before sorting. If Ri still precedes Rj after sorting with a given algorithm, that algorithm is said to be stable; otherwise it is unstable.
img

8.2 Insertion sort

8.2.1 Direct insertion sort

Algorithm idea: Each time a record to be sorted is inserted into the previously sorted subsequence according to its key size, until all records are inserted.

img img

Space complexity: O(1)

Best time complexity - O(n)

Worst time complexity - O(n²)

Average time complexity - O(n²)
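A minimal C sketch of direct insertion sort with a sentinel at a[0] (elements in a[1..n], per the convention used in these notes):

```c
void InsertSort(int a[], int n) {
    for (int i = 2; i <= n; i++) {
        if (a[i] < a[i - 1]) {       /* a[i] must be inserted further left */
            a[0] = a[i];             /* sentinel doubles as a temporary copy */
            int j;
            for (j = i - 1; a[j] > a[0]; j--)
                a[j + 1] = a[j];     /* shift larger elements one step right */
            a[j + 1] = a[0];         /* drop the element into its slot */
        }
    }
}
```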

8.2.2 Half-way insertion sort

First use binary search to find the position where the element should be inserted, then shift elements to make room.

When low > high, the binary search stops; all elements in [low, i-1] are shifted one position to the right, and A[0] (the element being inserted) is copied to the position pointed to by low.

When A[mid] == A[0], to keep the algorithm "stable", continue searching for the insertion position to the right of mid.

img

Compared with direct insertion sort, only the number of comparisons is reduced; the number of element moves is unchanged.

img
8.2.3 Shell sort

**Shell sort:** first divide the table into "special" sub-tables of the form L[i, i + d, i + 2d, …, i + kd] and perform direct insertion sort on each sub-table separately; then reduce the increment d and repeat the process until d = 1.

img

Space complexity: O(1)

Time complexity: it depends on the chosen increment sequence d1, d2, d3, …; the exact time complexity has so far not been established mathematically.

Stability: Unstable

Applicability: only applicable to sequence lists, not applicable to linked lists
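A minimal C sketch, assuming the common increment sequence d = n/2, n/4, …, 1 (other sequences exist); a[0] serves only as a temporary, not as a sentinel:

```c
void ShellSort(int a[], int n) {
    for (int d = n / 2; d >= 1; d /= 2)        /* shrink the increment */
        for (int i = d + 1; i <= n; i++)       /* insertion sort each sub-table */
            if (a[i] < a[i - d]) {
                a[0] = a[i];                   /* temporary copy only */
                int j;
                for (j = i - d; j > 0 && a[j] > a[0]; j -= d)
                    a[j + d] = a[j];           /* shift within the sub-table */
                a[j + d] = a[0];
            }
}
```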

img

8.3 Exchange sort

8.3.1 Bubble sort

**Bubble sort:** compare the values of adjacent elements from back to front (or from front to back) and swap them if they are in inverted order (i.e. A[i-1] > A[i]), until the whole sequence has been compared.

Implementation steps:

Starting from the last element, compare adjacent pairs and swap any inverted pair.
In one bubbling pass, the smallest element is swapped to the first position.
In the next pass, the smallest element determined by the previous pass no longer participates, so the sequence to be sorted shrinks by one element.
Each pass places the smallest remaining element at its final position; at most n-1 passes are needed (see the sketch below).
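A minimal C sketch of the steps above, with the usual early-exit flag when a pass performs no swap:

```c
#include <stdbool.h>

void BubbleSort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {       /* at most n-1 passes */
        bool swapped = false;
        for (int j = n - 1; j > i; j--)     /* compare neighbours back to front */
            if (a[j - 1] > a[j]) {          /* inverted pair: swap */
                int t = a[j - 1]; a[j - 1] = a[j]; a[j] = t;
                swapped = true;
            }
        if (!swapped) return;               /* no swap: already ordered */
    }
}
```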

img img img
8.3.2 Quick sort

Smaller elements are swapped to the left and larger elements are swapped to the right.

Implementation steps:

Each pass takes the first element of the current table as the pivot to partition the table.
i points to the first element (the pivot's slot) and j points to the last element.
Starting from j, scan from back to front for the first element smaller than the pivot and copy it into the position pointed to by i.
Then starting from i, scan from front to back for the first element larger than the pivot and copy it into the position pointed to by j.
Repeat from j again until i and j meet; place the pivot at the meeting position. The sequence is now divided into two parts: everything before the pivot is smaller, everything after it is larger. Take the first element of each of the two subsequences as a new pivot and repeat the operation (see the sketch below).
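A minimal C sketch of the partition process described above, using the first element as the pivot:

```c
int Partition(int a[], int low, int high) {
    int pivot = a[low];                  /* pivot: first element of the table */
    while (low < high) {
        while (low < high && a[high] >= pivot) high--;
        a[low] = a[high];                /* smaller element moves to the left end */
        while (low < high && a[low] <= pivot) low++;
        a[high] = a[low];                /* larger element moves to the right end */
    }
    a[low] = pivot;                      /* pivot lands in its final position */
    return low;
}

void QuickSort(int a[], int low, int high) {
    if (low < high) {
        int p = Partition(a, low, high);
        QuickSort(a, low, p - 1);        /* left part: elements < pivot */
        QuickSort(a, p + 1, high);       /* right part: elements > pivot */
    }
}
```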
img

img

Time complexity = O(n × number of recursion levels)

Best time complexity = O(nlog₂n)

Worst time complexity = O(n²)

Space complexity = O(number of recursion levels)

Best space complexity = O(log₂n)

Worst space complexity = O(n)

Quicksort is the sorting algorithm with the best average performance among all internal sorting algorithms.

Stability: Unstable
img

8.4 Selection sorting

8.4.1 Simple selection sorting

Selection sort: in each pass, select the element with the smallest (or largest) keyword from the elements not yet sorted and append it to the ordered subsequence.

Simple selection sort: in each pass, select the element with the smallest keyword among the elements not yet sorted and append it to the ordered subsequence.

img

Space complexity: O(1)

Time complexity: O(n²)

Stability: unstable

Applicability: can be used for both sequence lists and linked lists

img
8.4.2 Heap sort

If the keyword sequence L[1…n] satisfies one of the following properties, it is called a heap:

① If L(i) ≥ L(2i) and L(i) ≥ L(2i+1) for 1 ≤ i ≤ ⌊n/2⌋ - big root heap (max heap)

② If L(i) ≤ L(2i) and L(i) ≤ L(2i+1) for 1 ≤ i ≤ ⌊n/2⌋ - small root heap (min heap)
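The adjustment procedure itself appears only in the figures here, so the following is a minimal C sketch of the standard sift-down step and heap construction for a big root heap (elements in a[1..len]; the children of i are 2i and 2i+1):

```c
/* Sift the element at position k down until the heap property holds. */
void HeadAdjust(int a[], int k, int len) {
    a[0] = a[k];                          /* a[0] temporarily holds the element */
    for (int i = 2 * k; i <= len; i *= 2) {
        if (i < len && a[i] < a[i + 1])
            i++;                          /* i now points to the larger child */
        if (a[0] >= a[i])
            break;                        /* heap property restored */
        a[k] = a[i];                      /* pull the larger child up */
        k = i;                            /* continue sifting down */
    }
    a[k] = a[0];
}

/* Build the heap by sifting down every non-leaf node, from ⌊len/2⌋ to 1. */
void BuildMaxHeap(int a[], int len) {
    for (int i = len / 2; i >= 1; i--)
        HeadAdjust(a, i, len);
}
```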

img img img img

Time complexity of heap sort = O(n) (building the heap) + O(nlog₂n) = O(nlog₂n)

Space complexity of heap sort = O(1)

Stability: Unstable

img
8.4.3 Heap insertion and deletion

img

img

img

8.5 Merge sort and radix sort

8.5.1 Merge sort

Merge : merge two or more ordered sequences into one
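A minimal C sketch of the two-way merge step on which merge sort is built (an auxiliary buffer of the run's length is the usual O(n) space cost):

```c
#include <stdlib.h>

/* Merge the ordered runs a[low..mid] and a[mid+1..high]. */
void Merge(int a[], int low, int mid, int high) {
    int n = high - low + 1;
    int *b = malloc(n * sizeof(int));     /* auxiliary buffer */
    int i = low, j = mid + 1, k = 0;
    while (i <= mid && j <= high)
        b[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];  /* <= keeps the sort stable */
    while (i <= mid)  b[k++] = a[i++];    /* copy whichever run has leftovers */
    while (j <= high) b[k++] = a[j++];
    for (k = 0; k < n; k++)
        a[low + k] = b[k];                /* copy the merged run back */
    free(b);
}
```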

img img img img img
8.5.2 Radix sort
img img

Space complexity = O(r)

Time complexity = O(d(n+r))

Stability: stable

Problems that radix sort is good at solving:

① The keywords of the data elements can be conveniently split into d groups, with d small

② The value range of each group of keywords is small, i.e. r is small

③ The number of data elements n is large

img

8.6 Comparison and application of internal sorting algorithms

img

8.7 External sorting

8.7.1 Basic concepts of external sorting

External sorting: the data elements are too many to be read into memory all at once, so the sort must work on data held in external storage.

**"Merge Sort"** requires each subsequence to be in order, reading the contents of two blocks at a time, performing internal sorting and writing them back to the disk.

External sorting time overhead = time to read and write external storage + time required for internal sorting + time required for internal merging

img img img
8.7.2 Loser Tree

K-way balanced merge: ① at most k segments are merged into one at a time; ② in each merge pass, if m merge segments participate, then ⌈m/k⌉ new merge segments result from the pass.

The loser tree can be regarded as a complete binary tree (plus one extra head node). The k leaf nodes hold the elements currently being compared; each internal node remembers the "loser" of the comparison below it and lets the winner continue to compare upward, until the overall winner is recorded at the root.

Leaf nodes: the k records currently participating in the comparison.
Internal nodes: remember the segment number of the loser of their comparison and send the winner upward.
The root (head node): holds the segment number, not the value itself, of the current minimum/maximum (the overall winner).

img

After the loser tree yields the segment number of the minimum, output that element, place the next keyword from the same segment at its leaf, and compare upward again to rebuild the loser tree.

With a loser tree, the total number of keyword comparisons is independent of k, so k can be increased to reduce the height of the merge tree (the number of merge passes).

However, k should not be made arbitrarily large: as k increases, the number of input buffers increases, each buffer's capacity decreases, and the number of data exchanges between memory and external storage increases.

8.7.3 Replacement-selection sort

Assume the initial file to be sorted is FI, the output file of initial merge segments is FO, and the memory workspace is WA; FO and WA are initially empty, and WA can hold w records. The steps of the replacement-selection algorithm are as follows:

1) Input w records from FI to workspace WA.

2) Select the record with the minimum keyword value from WA and record it as the MINIMAX record.

3) Output MINIMAX records to FO.

4) If FI is not empty, input the next record from FI into WA.

5) Select the smallest keyword record from all records in WA with keywords larger than the keywords of the MINIMAX record as the new MINIMAX record.

6) Repeat 3) to 5) until no new MINIMAX record can be selected in WA, thus obtaining an initial merge segment and outputting an end flag of the merge segment to FO.

7) Repeat 2) to 6) until WA is empty. From this, all initial merged segments are obtained.

8.7.4 Optimal merge tree

Idea: Let the merged segments with fewer records be merged first, and those with more records be merged last.

Important conclusion: The number of disk I/Os during the merge process = WPL of the merge tree * 2

To minimize the number of disk I/Os, the WPL of the merge tree must be minimized - Huffman tree!

Note: For k-ary merging, if the number of initial merging segments cannot constitute a strict k-ary merging tree, you need to add several "virtual segments" with a length of 0, and then construct a k-ary Huffman tree.
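For example (made-up numbers): with k = 3 and 8 initial merge segments, a strict 3-ary merge tree requires the number of leaves n₀ to satisfy (n₀ - 1) mod (k - 1) = 0; since (8 - 1) mod 2 = 1, one virtual segment of length 0 is added to give 9 leaves before the Huffman tree is built.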

Correspondences in the k-ary Huffman (merge) tree:

Leaf node - an initial merge segment participating in the merge
Weight of a leaf node - the number of records in that initial merge segment
Path length from a leaf to the root - the number of merge passes that segment goes through
Non-leaf node - a new merge segment produced by a merge
WPL of the merge tree - the total number of records read in one pass over the data

img
