Frequently asked data structure interview questions -- postgraduate entrance examination and re-examination

Preface: Hello everyone, my name is Dream. This year I was admitted to the Artificial Intelligence major of Shandong University (experience post). Here I have organized the knowledge points I used to prepare for the professional courses and I am sharing them with everyone; I hope they help! This is a summary of the key knowledge. If you want to see all the content, I have packed it up for you here: Full set of materials for postgraduate re-examination + 408 professional course knowledge summary and mind map (click to get it). A quick look at the contents:
1. A complete set of materials for the postgraduate re-examination, including summer camp preparation materials (contact-teacher templates, self-introduction for interviews, submission materials and recommendation letter templates) and pre-recommendation (预推免) preparation materials;
2. 408 professional course review materials and mind maps;
3. Advanced mathematics, linear algebra, probability and discrete mathematics review materials and mind maps;
4. A summary of frequently asked questions about machine learning algorithms;
5. A summary of algorithms and training tips for the machine test;
6. Must-read knowledge before the interview, a summary of 408 frequently asked questions and common interview questions.

Data structure interview frequently asked questions directory:

Chapter 1, Introduction

Knowledge framework:


1. Time complexity

The frequency of a statement is the number of times the statement is executed in the algorithm. The sum of the frequencies of all statements in the algorithm is denoted T(n), which is a function of the problem size n. Time-complexity analysis is mainly concerned with the order of growth of T(n). The frequency of the basic operation in the algorithm is of the same order of magnitude as T(n), so the frequency f(n) of the basic operation is usually used to analyze the time complexity of the algorithm, which is written as T(n) = O(f(n)).
The O notation has a strict mathematical definition: if T(n) and f(n) are two functions defined on the set of positive integers, then T(n) = O(f(n)) means that there exist positive constants C and n0 such that 0 <= T(n) <= C·f(n) for all n >= n0.
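As a toy illustration (the function below is hypothetical, not from the original text), the basic operation of the double loop is the comparison in the inner loop; it executes n(n-1)/2 times in total, so f(n) = n(n-1)/2 and the time complexity is T(n) = O(n^2):

```python
def count_pairs(a):
    """Count pairs (i, j) with i < j and a[i] > a[j] (brute-force inversion count)."""
    n = len(a)
    count = 0
    for i in range(n):             # outer loop runs n times
        for j in range(i + 1, n):  # inner comparison runs n(n-1)/2 times in total
            if a[i] > a[j]:        # basic operation: one comparison
                count += 1
    return count                   # f(n) = n(n-1)/2, so T(n) = O(n^2)
```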

2. Space complexity

The space complexity S(n) of an algorithm is defined as the storage space consumed by the algorithm; it is a function of the problem size n and is written as S(n) = O(g(n)).
When a program executes, it needs storage space for its own instructions, constants, variables and input data, plus some working cells for operating on the data and some auxiliary space for the information needed to carry out the computation. If the space occupied by the input data depends only on the problem itself and has nothing to do with the algorithm, then only the extra space beyond the input and the program needs to be analyzed.
An algorithm is said to work in place if the auxiliary space it requires is constant, that is, O(1).

3. The logical structure of data

The logical structure refers to the logical relationships between data elements; it has nothing to do with how the data are stored and is independent of the computer. Logical structures are divided into linear structures (such as linear lists, stacks and queues) and non-linear structures (such as sets, trees and graphs).

4. The storage structure of data

The storage structure refers to the representation of the data structure in the computer, also called the physical structure. There are mainly the following four types:
1) Sequential storage. Logically adjacent elements are stored in physically adjacent storage units, and the relationships between elements are reflected by the adjacency of the storage units. The advantages are that random access is possible and each element occupies the least storage space; the disadvantage is that it can only use a whole block of adjacent storage units, so more external fragmentation may be produced.
2) Chain storage. It is not required that logically adjacent elements are also physically adjacent. The logical relationship between elements is represented by a pointer indicating the storage address of the element. The advantage is that there will be no fragmentation and all storage units can be fully utilized; the disadvantage is that each element occupies additional storage space due to the storage of pointers, and only sequential access can be achieved.
3) Index storage. While storing element information, additional index tables are also created. Each item in the index table is called an index item, and the general form of an index item is (keyword, address). The advantage is that the retrieval speed is fast; the disadvantage is that the additional index table takes up additional storage space. In addition, the index table must be modified when adding and deleting data, which will take more time.
4) Hash storage. The storage address of an element is computed directly from its keyword; this is also called hashed storage. The advantage is that retrieval, insertion and deletion of nodes are all very fast; the disadvantage is that a poorly chosen hash function may cause conflicts (collisions) between element storage units, and resolving these conflicts increases the time and space overhead.

5. Is looping more efficient than recursion?

Loops and recursion are interchangeable, and it cannot be said categorically that a loop is more efficient than recursion.
The advantages of recursion are that the code is concise and clear and its correctness is easy to check; the disadvantages are that when the number of recursive calls is large, extra stack handling is needed, which may cause stack overflow and has some impact on execution efficiency.
The advantages of a loop are its simple structure and speed; the disadvantage is that it is not suited to every problem; some problems are more naturally solved by recursion than by loops.

6. What is the difference between greedy algorithm, dynamic programming and divide and conquer method?

As the name suggests, the greedy algorithm always makes the choice that looks best at the moment without considering the overall situation, i.e. it takes the locally optimal solution. It works top-down, making one greedy choice after another to reach the final result. It cannot guarantee a globally optimal solution, and its quality depends on the choice of greedy strategy.
Dynamic programming decomposes a problem into subproblems. These subproblems may overlap, so the results of earlier subproblems are recorded to avoid repeated computation. The solution of an earlier subproblem influences the later ones; during the process, the locally optimal candidates are kept and the others discarded, until the last subproblem is solved, whose solution is the solution of the original problem. Dynamic programming finds the global optimum step by step from the bottom up. (The subproblems overlap.)
Divide-and-conquer: divide the original problem into n smaller subproblems with the same structure as the original problem, solve these subproblems recursively, and then combine their results to obtain the solution of the original problem. (The subproblems are independent.)
The divide-and-conquer mode has three steps at each level of recursion:
Decomposition (Divide): Decompose the original problem into a series of sub-problems;
Conquer: Solve each sub-problem recursively. If the sub-problem is small enough, solve it directly;
Combine: Combine the results of the sub-problem into the solution of the original problem.
For example, merge sort
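As a concrete illustration of the three steps, here is a minimal merge sort sketch (the function name and Python style are my own, not from the original text):

```python
def merge_sort(a):
    if len(a) <= 1:                      # small enough: solve directly
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])           # divide + conquer the left half
    right = merge_sort(a[mid:])          # divide + conquer the right half
    merged, i, j = [], 0, 0              # combine: merge the two sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 4, 7, 1, 3]))    # [1, 2, 3, 4, 5, 7]
```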

Chapter 2, Linear Table

Knowledge framework:


7. Comparison between sequence list and linked list

1. Access (read and write) method

A sequential list can be accessed either sequentially or randomly, while a linked list can only be accessed sequentially starting from the head. For example, to read or write the element at position i, the sequential list needs only one access, whereas the linked list must be traversed i times starting from the head of the list.

2. Logical structure and physical structure

With sequential storage, logically adjacent elements also occupy physically adjacent storage locations. With linked storage, the physical storage locations of logically adjacent elements are not necessarily adjacent, and the logical relationships between elements are expressed through pointer links.

3. Search, insert and delete operations

For search by value, if the sequential list is unordered, the time complexity of both is O(n); if the sequential list is ordered, binary (half-interval) search can be used, and the time complexity is then O(log2n).
For search by position, the sequential list supports random access with time complexity O(1), while the average time complexity of the linked list is O(n). Insertion and deletion in the sequential list require moving half of the elements of the list on average; insertion and deletion in the linked list only require modifying the pointer fields of the relevant nodes. However, since every node of the linked list carries a pointer field, its storage density is lower.

4. Space allocation

With sequential storage under static allocation, once the storage space is full it cannot be extended, and adding new elements causes memory overflow, so sufficient storage space must be allocated in advance. The node space of linked storage is allocated only when it is needed; as long as there is memory available it can be allocated, which makes the operations flexible and efficient.

8. What is the difference between head pointer and head node?

Head pointer: the pointer to the storage location of the first node of the list; it identifies the linked list. The head pointer is a necessary element of the linked list and exists whether or not the list is empty.
Head node: a node placed before the first element node; it makes insertion and deletion before the first element node convenient. The head node is not a necessary element of the linked list; it is optional, and its data field may store no information.
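A minimal sketch of a singly linked list with a head (dummy) node, showing why insertion before the first element needs no special case (the class and function names are illustrative, not from the original):

```python
class ListNode:
    def __init__(self, data=None):
        self.data = data
        self.next = None

def insert_after_head(head, value):
    """Insert a new node right after the head node, i.e. before the first element."""
    node = ListNode(value)
    node.next = head.next
    head.next = node

head = ListNode()            # head node: its data field stores no information
for v in [3, 2, 1]:
    insert_after_head(head, v)
p = head.next                # head pointer -> head node -> first element
while p:
    print(p.data, end=' ')   # 1 2 3
    p = p.next
```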

Chapter 3, Stack and Queue

Knowledge framework:


9. What is the difference between stack and queue?

A queue is a linear list that allows insertion at one end and deletion at the other. As the name suggests, a queue works like queuing up: elements are processed in "first in, first out" order, deleted at the head and inserted at the tail. Since queues are inserted into and deleted from frequently, a fixed-length array is usually chosen to store the queue elements for efficiency, and before operating on the queue one must check whether it is empty or full. If a dynamic length is wanted, a linked list can also be used to store the queue; in that case, keep pointers to both the head and the tail of the queue.
A stack is a linear list in which insertion and deletion are allowed only at one end, the top. Elements pushed onto the stack are processed in "last in, first out" order, and both push and pop happen at the top of the stack. As with queues, a fixed-length array is generally used to store the stack elements, together with a size variable recording the current number of elements: when pushing, size must not exceed the array length and is then increased by 1; when popping, the stack must not be empty and size is decreased by 1. A sketch is given below.
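A minimal sketch of such a fixed-length, array-based stack (names are illustrative, not from the original text):

```python
class ArrayStack:
    def __init__(self, capacity):
        self.data = [None] * capacity   # fixed-length array storing the elements
        self.size = 0                   # number of elements currently in the stack

    def push(self, x):
        if self.size == len(self.data): # size cannot exceed the array length
            raise OverflowError("stack full")
        self.data[self.size] = x
        self.size += 1

    def pop(self):
        if self.size == 0:              # pop only when the stack is not empty
            raise IndexError("stack empty")
        self.size -= 1
        return self.data[self.size]

s = ArrayStack(4)
s.push(1); s.push(2)
print(s.pop(), s.pop())                 # 2 1 (last in, first out)
```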

10. Shared stack

Taking advantage of the fact that the stack bottom positions are relatively fixed, two sequential stacks can share a single one-dimensional array: the bottoms of the two stacks are placed at the two ends of the shared space, and the two stack tops grow toward the middle. In this way the storage space is used more effectively: the two stacks adjust their space against each other, and overflow occurs only when the entire shared space is full.

11. How to distinguish whether the circular queue is empty or full?

Under normal circumstances, the conditions for the circular queue being empty and being full are the same, namely Q.front == Q.rear.
Note: the head pointer of the queue points to the first element; the tail pointer points to the position after the last element, i.e. where the next element will be enqueued.
Method 1: sacrifice one storage unit to distinguish an empty queue from a full queue. The queue is full when (Q.rear + 1) % MaxSize == Q.front, and empty when Q.front == Q.rear.
Method 2: add a data member recording the number of elements. The queue is empty when Q.size == 0 and full when Q.size == MaxSize.
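A minimal sketch of method 1, sacrificing one storage unit (class and method names are illustrative):

```python
class CircularQueue:
    def __init__(self, max_size):
        self.data = [None] * max_size
        self.front = 0                       # index of the first element
        self.rear = 0                        # index of the next free slot

    def is_empty(self):
        return self.front == self.rear

    def is_full(self):                       # one unit is sacrificed to tell full from empty
        return (self.rear + 1) % len(self.data) == self.front

    def enqueue(self, x):
        if self.is_full():
            raise OverflowError("queue full")
        self.data[self.rear] = x
        self.rear = (self.rear + 1) % len(self.data)

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue empty")
        x = self.data[self.front]
        self.front = (self.front + 1) % len(self.data)
        return x
```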

12. What is the algorithmic idea of using a stack for bracket matching?

Bracket matching algorithm idea
(1) Whenever a left bracket appears, push it onto the stack.
(2) When a right bracket appears, first check whether the stack is empty. If it is, the right bracket is redundant; otherwise compare the right bracket with the top element of the stack: if they match, pop the left bracket from the top of the stack, otherwise report a mismatch.
(3) When the end of the expression is reached, if the stack is empty the brackets in the expression match correctly; otherwise there are surplus left brackets.
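A minimal sketch of this idea (the set of bracket types handled is my own choice):

```python
def brackets_match(expr):
    """Return True if (), [] and {} in expr are properly matched."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in expr:
        if ch in '([{':                      # left bracket: push
            stack.append(ch)
        elif ch in ')]}':                    # right bracket: check the stack top
            if not stack or stack.pop() != pairs[ch]:
                return False                 # redundant right bracket or mismatch
    return not stack                         # leftover left brackets mean failure

print(brackets_match('{[()]}'))              # True
print(brackets_match('([)]'))                # False
```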

13. What is the algorithmic idea of evaluating a postfix expression with a stack?

Scan the items of the expression in order and, according to the type of each item, do the following: if the item is an operand, push it onto the stack; if the item is an operator, pop the two operands y and x from the stack, perform the operation x <operator> y, and push the result back onto the stack. When all items of the expression have been scanned and processed, the final result is on the top of the stack.
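A minimal sketch of postfix evaluation with a stack, assuming the expression is given as whitespace-separated tokens (the function name is my own):

```python
def eval_postfix(tokens):
    """Evaluate a postfix (reverse Polish) expression given as a list of tokens."""
    stack = []
    for t in tokens:
        if t in ('+', '-', '*', '/'):
            y = stack.pop()                  # second operand
            x = stack.pop()                  # first operand
            if t == '+':   stack.append(x + y)
            elif t == '-': stack.append(x - y)
            elif t == '*': stack.append(x * y)
            else:          stack.append(x / y)
        else:
            stack.append(float(t))           # operand: push onto the stack
    return stack[-1]                         # final result is on the top of the stack

print(eval_postfix('3 4 + 5 *'.split()))     # (3 + 4) * 5 = 35.0
```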

14. What is the application of stack in recursion?

Recursion is an important programming method. If a function, procedure or data structure is used within its own definition, the function, procedure or data structure is said to be defined recursively, or recursive for short.
Recursion usually transforms a large, complex problem into a smaller problem similar to the original one. A recursive strategy can describe the solution process with only a small amount of code, greatly reducing the amount of code in the program; however, because of the repeated computations and call overhead involved, its efficiency is usually not very high.
To convert a recursive algorithm into a non-recursive one, a stack is usually used to carry out the conversion.

15. What is the role of queue in hierarchical traversal?

A large class of problems in information processing needs to be handled layer by layer or line by line. The usual solution is that, while processing the current layer or line, the next layer or line is prepared and its processing order arranged, so that once the current layer or line is finished the next one can be processed. A queue is used to hold this next processing sequence. The following sketch uses binary tree level-order traversal as an example of this use of a queue.
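A minimal Python sketch of level-order traversal, with the queue holding the nodes that still have to be processed (class and function names are illustrative):

```python
from collections import deque

class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def level_order(root):
    """Visit the tree level by level using a queue."""
    result, q = [], deque([root] if root else [])
    while q:
        node = q.popleft()               # process the node at the head of the queue
        result.append(node.val)
        if node.left:                    # enqueue the next level for later processing
            q.append(node.left)
        if node.right:
            q.append(node.right)
    return result

root = TreeNode(1, TreeNode(2, TreeNode(4)), TreeNode(3))
print(level_order(root))                 # [1, 2, 3, 4]
```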

16. What is the application of queue in computer system?

Queues are widely used in computer systems. Their role is briefly described here from two aspects: the first is solving the speed mismatch between the host and peripheral devices, and the second is solving the resource contention caused by multiple users.
For the first aspect, we only briefly explain the problem of speed mismatch between the host and the printer . The host outputs data to the printer for printing, and the speed of outputting data is much faster than the speed of printing data. Due to the speed mismatch, it is obviously not possible to directly send the output data to the printer for printing. The solution is to set up a print data buffer. The host writes the data to be printed into this buffer in sequence. When it is full, it pauses the output and switches to other things. The printer takes out the data from the buffer and prints it sequentially according to the first-in, first-out principle. After printing, it sends a request to the host. The host writes the print data to the buffer after receiving the request. This not only ensures the accuracy of the printed data, but also improves the efficiency of the host. It can be seen that the data stored in the print data buffer is a queue.
For the second aspect, competition for the CPU (the central processing unit, consisting of the arithmetic unit and the controller) is a typical example. On a computer system with multiple terminals, several users each need the CPU to run their own programs, and they submit CPU requests to the operating system through their terminals. The operating system usually places the requests in a queue in order of arrival and each time allocates the CPU to the user at the head of the queue; when that program ends or the specified time interval expires, the request is dequeued and the CPU is allocated to the user at the new head of the queue. This both satisfies every user's request and keeps the CPU working normally.

17. Compressed storage of matrices

The data structure provides compressed storage structures for some special matrices. The special matrices mentioned here are mainly divided into the following two categories:
matrices containing a large number of identical data elements, such as symmetric matrices;
matrices containing a large number of 0 elements , such as sparse matrices and upper (lower) triangular matrices;
For the above two types of matrices, the idea of compressed storage is that identical data elements in the matrix (including the element 0) are stored only once.
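A minimal sketch of compressed storage for a symmetric matrix: only the lower triangle (including the diagonal) is kept, and with 0-based indices element (i, j) with i >= j lands at position i(i+1)/2 + j of the one-dimensional array (the helper names are illustrative):

```python
def pack_symmetric(A):
    """Store only the lower triangle (including the diagonal) of a symmetric matrix."""
    n = len(A)
    return [A[i][j] for i in range(n) for j in range(i + 1)]

def get(packed, i, j):
    """Read element (i, j) of the original matrix from the packed 1D array (0-based)."""
    if i < j:                       # use symmetry: A[i][j] == A[j][i]
        i, j = j, i
    return packed[i * (i + 1) // 2 + j]

A = [[1, 2, 3],
     [2, 4, 5],
     [3, 5, 6]]
p = pack_symmetric(A)               # [1, 2, 4, 3, 5, 6] -- 6 values instead of 9
print(get(p, 0, 2), get(p, 2, 0))   # 3 3
```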

Chapter 4, Strings

Knowledge framework:


18. String pattern matching

The positioning operation of substring is usually called string pattern matching. What it seeks is the position of the substring (often called the pattern string) in the main string.
The idea of the brute-force pattern matching algorithm is: starting from the first character of the main string, compare it with the first character of the pattern; if they are equal, continue comparing the following characters; if not, restart the comparison from the next position of the main string, and repeat until the match finally succeeds or fails.
For example, take the pattern 'abcac'; the matching proceeds position by position as in the sketch below.
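A minimal sketch of the brute-force matching just described (the function name and example main string are my own); matching the pattern 'abcac' against 'abcabcacab' succeeds at index 3:

```python
def brute_force_match(main, pattern):
    """Return the first index where pattern occurs in main, or -1 if it does not."""
    n, m = len(main), len(pattern)
    for start in range(n - m + 1):      # try each starting position in the main string
        j = 0
        while j < m and main[start + j] == pattern[j]:
            j += 1
        if j == m:                      # all m characters matched
            return start
    return -1

print(brute_force_match('abcabcacab', 'abcac'))   # 3
```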
Improved pattern matching algorithm ----- the KMP algorithm:
The structure of the pattern itself is analyzed first. If, within the part that has already matched, there is a suffix that is also a prefix of the pattern, the pattern can be slid backward so that this prefix is aligned with those already-matched characters; the pointer i into the main string never backtracks, and the comparison continues from the current position. The number of positions the pattern slides depends only on the structure of the pattern itself and has nothing to do with the main string.

Chapter 5, Trees and Binary Trees

Knowledge framework:


19. What are the related concepts of trees and binary trees?

A tree is a non-linear structure with obvious hierarchical relationships between its elements. In a tree, every node has at most one predecessor, called its parent node; the node with no predecessor is the root of the tree. Each node may have several successors, called its child nodes; nodes with no successors are called leaf nodes.
In a tree, the number of children a node has is called the degree of the node, the largest degree of any node in the tree is the degree of the tree, and the maximum level of the tree is called the depth of the tree.
Binary tree: a binary tree is another kind of tree structure. Its characteristic is that each node has at most two subtrees, and the subtrees are distinguished as the left subtree and the right subtree; their order cannot be reversed arbitrarily. Like trees, binary trees are defined recursively. A binary tree is a finite set of n (n >= 0) nodes that:
1) either is the empty binary tree, i.e. n = 0;
2) or consists of a root node and two disjoint binary trees called the left subtree and the right subtree of the root, each of which is itself a binary tree.
A binary tree is an ordered tree: if its left and right subtrees are swapped, it becomes a different binary tree. Even if a node has only one subtree, it must be specified whether it is the left subtree or the right subtree.
Full binary tree: a binary tree in which every level contains the maximum number of nodes; equivalently, every node except the leaves on the last level has two subtrees.
Complete binary tree: a binary tree in which every level except possibly the last contains the maximum number of nodes, and on the last level nodes are missing only on the right, i.e. all nodes are packed to the left.
Storage of binary trees: a binary tree can be stored with a linked (binary linked list) storage structure; full binary trees and complete binary trees can also be stored with a sequential storage structure.
Binary tree traversal: binary trees have pre-order traversal (root, left, right), in-order traversal (left, root, right) and post-order traversal (left, right, root); there is also level-order traversal, which requires a queue. In the three recursive traversal algorithms, the order of recursively traversing the left and right subtrees is fixed, and only the moment at which the root is visited differs. Whichever traversal algorithm is used, each node is visited once and only once, so the time complexity is O(n). In recursive traversal, the depth of the recursion working stack is exactly the depth of the tree, so in the worst case (a single-branch tree with n nodes and depth n) the space complexity of the traversal algorithm is O(n).
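Minimal recursive sketches of the three traversal orders; nodes are represented here as tuples (value, left, right) purely for brevity, which is my own convention:

```python
# A node is a tuple (value, left_subtree, right_subtree); None means an empty tree.
def preorder(t):
    return [] if t is None else [t[0]] + preorder(t[1]) + preorder(t[2])    # root, left, right

def inorder(t):
    return [] if t is None else inorder(t[1]) + [t[0]] + inorder(t[2])      # left, root, right

def postorder(t):
    return [] if t is None else postorder(t[1]) + postorder(t[2]) + [t[0]]  # left, right, root

tree = ('A', ('B', ('D', None, None), None), ('C', None, None))
print(preorder(tree))    # ['A', 'B', 'D', 'C']
print(inorder(tree))     # ['D', 'B', 'A', 'C']
print(postorder(tree))   # ['D', 'B', 'C', 'A']
```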

20.How to construct a binary tree from a traversal sequence?

1)A binary tree can be uniquely determined by its preorder sequence and inorder sequence.
In the pre-order traversal sequence, the first node must be the root node of the binary tree; while in the in-order traversal, the root node must split the in-order sequence into two subsequences. The former subsequence is the in-order sequence of the left subtree of the root node, and the latter subsequence is the in-order sequence of the right subtree of the root node. Based on these two subsequences, find the corresponding left subsequence and right subsequence in the preorder sequence. In the preorder sequence, the first node of the left subsequence is the root node of the left subtree, and the first node of the right subsequence is the root node of the right subtree. By proceeding recursively in this way, this binary tree can be uniquely determined.
2)A binary tree can also be uniquely determined by the post-order sequence and in-order sequence of the binary tree.
Because the last node of the post-order sequence plays the same role as the first node of the pre-order sequence (it is the root), the in-order sequence can again be divided into two subsequences, which are then divided recursively in the same way to obtain the binary tree.
3) A binary tree can also be uniquely determined by its level-order sequence and its in-order sequence. Note that knowing only the pre-order sequence and the post-order sequence of a binary tree is not enough to determine it uniquely.
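A minimal sketch of reconstructing a binary tree from its pre-order and in-order sequences, assuming all values are distinct; nodes are represented as nested tuples for brevity (my own convention):

```python
def build_tree(preorder, inorder):
    """Rebuild a binary tree (as nested tuples) from its preorder and inorder sequences."""
    if not preorder:
        return None
    root = preorder[0]                       # first node of the preorder sequence is the root
    k = inorder.index(root)                  # the root splits the inorder sequence in two
    left = build_tree(preorder[1:k + 1], inorder[:k])       # left-subtree sequences
    right = build_tree(preorder[k + 1:], inorder[k + 1:])   # right-subtree sequences
    return (root, left, right)

print(build_tree(list('ABDC'), list('DBAC')))
# ('A', ('B', ('D', None, None), None), ('C', None, None))
```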

21. What is the concept of a threaded binary tree?

For a binary tree with n nodes stored as a binary linked list, there are n + 1 empty link fields. These empty link fields can be used to store pointers to the predecessor and successor of a node in a certain traversal order; such pointers are called threads, and a binary tree with threads added is called a threaded binary tree.
A binary linked list with threads is called a threaded linked list, and the corresponding binary tree is called a threaded binary tree (ThreadedBinaryTree). According to the traversal order on which the threads are based, threaded binary trees are divided into pre-order threaded, in-order threaded and post-order threaded binary trees.
Note: the threaded list solves the problem that the predecessor and successor of a node in a given traversal order cannot be found directly, just as the binary linked list solves the problem of finding the left and right children.
Binary tree traversal essentially converts a complex non-linear structure into a linear one, so that each node has a unique predecessor and successor (the first node has no predecessor and the last node has no successor). For a node of a binary tree it is easy to find its left and right children, but its predecessor and successor can only be obtained during a traversal. To make finding the predecessor and successor easy, there are two methods: one is to add forward and backward pointers to the node structure, which increases the storage overhead and is not advisable; the other is to make use of the empty link pointers of the binary tree.

22. What is the storage structure of tree?

1. Parent representation:

This storage method uses a set of continuous spaces to store each node, and adds a pseudo pointer to each node to indicate the position of its parent node in the array.
This storage structure makes use of the fact that every node (except the root) has exactly one parent, so the parent of a node can be obtained quickly; but to find the children of a node, the entire structure has to be scanned.

2. Child representation:

In the child representation, the children of each node are linked together with a singly linked list, forming a linear structure; thus n nodes have n child linked lists (the child list of a leaf node is empty). With this storage method the operation of finding children is very straightforward, while finding the parent of a node may require traversing the child linked lists of all n nodes.

3. Child-sibling representation:

The child-sibling representation, also called the binary tree representation, uses a binary linked list as the storage structure of the tree. Each node contains three parts: the node value, a pointer to the node's first child, and a pointer to the node's next sibling (following this field, all the siblings of the node can be found).
This storage representation is quite flexible. Its greatest advantage is that it makes it easy to convert a tree into a binary tree and easy to find the children of a node; its disadvantage is that finding the parent of a node is troublesome. If a parent field pointing to the parent node is added to each node, finding the parent also becomes very convenient.
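A minimal sketch of the child-sibling representation (field names such as firstchild and nextsibling are illustrative):

```python
class CSNode:
    """Child-sibling (binary tree) representation of a general tree."""
    def __init__(self, value):
        self.value = value
        self.firstchild = None    # pointer to the first child
        self.nextsibling = None   # pointer to the next sibling

def children(node):
    """Collect all children of a node by walking the sibling chain of its first child."""
    result, p = [], node.firstchild
    while p:
        result.append(p.value)
        p = p.nextsibling
    return result

# Tree: R has children A, B, C; A has child D
r, a, b, c, d = (CSNode(x) for x in 'RABCD')
r.firstchild = a
a.nextsibling = b
b.nextsibling = c
a.firstchild = d
print(children(r))   # ['A', 'B', 'C']
```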

23. Binary sorting tree

1. Definition of binary sorting tree:

A binary sorting tree (also called a binary search tree) is either an empty tree or a binary tree with the following properties:

  1. If the left subtree is not empty, the values of all nodes on the left subtree are less than the value of the root node.
  2. If the right subtree is not empty, the values of all nodes on the right subtree are greater than the value of the root node.
  3. The left and right subtrees are themselves binary sorting trees.
    According to this definition, left subtree node values < root node value < right subtree node values, so an in-order traversal of a binary sorting tree yields an increasing ordered sequence.

2. Search in binary sorting tree:

Searching in a binary sorting tree starts from the root node and proceeds down one branch, comparing level by level. If the tree is not empty, the given value is first compared with the keyword of the root: if they are equal, the search succeeds; if the value is smaller, the search continues in the left subtree of the root; otherwise it continues in the right subtree. This is clearly a recursive process.
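A minimal sketch of search and insertion in a binary sorting tree, following the comparison rule described above (class and function names are my own; duplicate keys are simply ignored):

```python
class BSTNode:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def bst_search(root, key):
    """Compare with the root; go left if smaller, right if larger."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root                       # the node found, or None if the search fails

def bst_insert(root, key):
    """Insert a key while keeping the binary-sorting-tree property."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root

root = None
for k in [50, 30, 70, 20, 40]:
    root = bst_insert(root, k)
print(bst_search(root, 40) is not None)   # True
print(bst_search(root, 60) is not None)   # False
```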

24. Balanced Binary Tree

In order to prevent the height of the tree from growing too fast and reducing the performance of the binary sorting tree, it is stipulated that when inserting and deleting binary tree nodes, it is necessary to ensure that the absolute value of the height difference between the left and right subtrees of any node does not exceed 1. Such a binary tree is called a balanced binary tree, or a balanced tree for short. Define the height difference between the left subtree and the right subtree of a node as the node's balance factor. Then the value of the balance factor of a balanced binary tree node can only be -1, 0 or 1. Therefore, a balanced binary tree can be defined as either an empty tree, or a binary tree with the following properties: its left subtree and right subtree are both balanced binary trees, and the absolute value of the height difference between the left subtree and the right subtree No more than 1.

25. Huffman tree and Huffman coding:

1. Definition of Huffman tree
Among all binary trees whose n leaf nodes carry the given weights W1, W2, ..., Wn, the tree with the smallest weighted path length (WPL) is called the Huffman tree, also known as the optimal binary tree.
2. Construction of the Huffman tree
Given n nodes with weights W1, W2, ..., Wn, the algorithm for constructing the Huffman tree is as follows:

  1. Treat these n nodes as n binary trees each containing only one node, forming a forest F.
  2. Construct a new node, choose from F the two trees whose root nodes have the smallest weights as the left and right subtrees of the new node, and set the weight of the new node to the sum of the weights of the roots of its left and right subtrees.
  3. Delete the two chosen trees from F and add the newly built tree to F.
  4. Repeat steps 2 and 3 until only one tree remains in F.
From the above construction process it can be seen that the Huffman tree has the following characteristics:
  1. Every initial node ends up as a leaf node, and the smaller its weight, the longer the path from that node to the root.
  2. n - 1 new (double-branch) nodes are created during the construction, so the total number of nodes of the Huffman tree is 2n - 1.
  3. Each construction step chooses two trees as the children of the new node, so the Huffman tree contains no node of degree 1.
3. Huffman coding:
In data communication, if every character is represented by binary bits of equal length, the encoding is called fixed-length encoding. If different characters may be represented by binary bit strings of different lengths, the encoding is called variable-length encoding. Variable-length encoding can be much better than fixed-length encoding: frequently occurring characters get short codes and rarely occurring characters get longer codes, which shortens the average code length and thus compresses the data. Huffman coding is a widely used and very effective data compression code. If no code word is a prefix of another code word, the encoding is called a prefix code.
Obtaining the Huffman code from the Huffman tree is a natural process. First, each character that appears is treated as an independent node whose weight is its frequency (or number of occurrences), and the corresponding Huffman tree is constructed; clearly, all character nodes appear as leaves. The code of a character is then read as the sequence of edge labels on the path from the root to that character, where an edge label 0 means "go to the left child" and an edge label 1 means "go to the right child".
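A minimal sketch of Huffman tree construction and code generation using Python's heapq; the node-numbering scheme and the 0/1 edge-label convention are my own choices, so the exact codes may differ from other implementations, although the code lengths are optimal:

```python
import heapq

def huffman_codes(weights):
    """Build a Huffman tree from {symbol: weight} and return {symbol: code}."""
    heap = [(w, i, sym) for i, (sym, w) in enumerate(weights.items())]  # i breaks ties
    heapq.heapify(heap)
    next_id = len(heap)
    parents = {}                                    # child node -> (parent node, bit)
    while len(heap) > 1:
        w1, _, n1 = heapq.heappop(heap)             # the two trees with the smallest weights
        w2, _, n2 = heapq.heappop(heap)
        parents[n1] = (next_id, '0')                # edge to the left child labelled 0
        parents[n2] = (next_id, '1')                # edge to the right child labelled 1
        heapq.heappush(heap, (w1 + w2, next_id, next_id))   # new internal node
        next_id += 1
    codes = {}
    for sym in weights:                             # read each code by walking leaf -> root
        node, code = sym, ''
        while node in parents:
            node, bit = parents[node]
            code = bit + code
        codes[sym] = code
    return codes

print(huffman_codes({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}))
# {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}
```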

Chapter 6, Graphs

Knowledge framework:


26. Some related definitions of graphs

  1. Undirected Graph: Each edge has no direction and can be traversed in both directions.
  2. Directed Graph: Each edge has a direction and can only be traversed in the specified direction.
  3. Weighted Graph: Each edge has a weight or cost, which is used to represent the association strength or cost of the edge.
  4. Directed Acyclic Graph (DAG for short): A directed graph without cycles.
  5. Connected Graph: In an undirected graph, there is a path between any two vertices.
  6. Strongly Connected Graph: In a directed graph, there is a bidirectional path between any two vertices.
  7. Subgraph: A graph composed of a part of the vertices and edges of the original graph.
  8. Complete Graph: A graph in which any two vertices are connected by edges.
  9. Regular Graph: A graph in which all vertices have equal degrees.
  10. Minimum Spanning Tree: The tree with the smallest total weight of the edges connecting all vertices on the graph.
  11. Eulerian Graph: a graph containing an Eulerian circuit, i.e. a circuit that traverses every edge exactly once.
  12. Hamiltonian Graph: a graph containing a Hamiltonian cycle, i.e. a cycle that passes through every vertex exactly once.

27. Storage structure of graph:

1. Adjacency matrix method:

The so-called adjacency matrix storage refers to using a one-dimensional array to store the information of the vertices in the graph, and using a two-dimensional array to store the information of the edges in the graph (that is, the adjacency relationship between the vertices), A two-dimensional array that stores adjacency relationships between vertices is called an adjacency matrix. The adjacency matrix example diagrams corresponding to directed graphs, undirected graphs and nets are as follows:
(Figure: adjacency matrix examples for a directed graph, an undirected graph and a weighted net.)
Suitable for dense graphs.

2. Adjacency list method:

When a graph is sparse, the adjacency matrix method obviously wastes a lot of storage space, whereas the adjacency list method combines sequential storage with linked storage and largely avoids this waste. In an adjacency list, a singly linked list is built for every vertex Vi of the graph G; the nodes in the i-th list represent the edges attached to vertex Vi (for a directed graph, the arcs whose tail is Vi). This list is called the edge list of vertex Vi (for a directed graph, the out-edge list). The head pointers of the edge lists and the data of the vertices are stored sequentially in a vertex table, so an adjacency list contains two kinds of nodes: vertex table nodes and edge table nodes.
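A minimal Python sketch of an adjacency list; real implementations usually store the edge nodes in linked lists, but plain Python lists convey the same idea (the function name is my own):

```python
def build_adjacency_list(n, edges, directed=False):
    """Adjacency list of a graph with vertices 0..n-1 given as a list of (u, v) edges."""
    adj = {v: [] for v in range(n)}     # one (initially empty) edge list per vertex
    for u, v in edges:
        adj[u].append(v)                # edge-table node attached to vertex u
        if not directed:
            adj[v].append(u)            # undirected graph: record the edge at both ends
    return adj

g = build_adjacency_list(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
print(g)   # {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
```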

3. Cross linked list method:

The cross linked list method is a linked storage structure of directed graphs. In a cross-linked list, there is a node corresponding to each arc in the directed graph, and a node corresponding to each vertex.

4. Adjacency multiple list:

Adjacency multiple list is another linked storage structure of undirected graph.
In an adjacency list it is easy to obtain information about vertices and edges, but to check whether an edge exists between two vertices, or to delete an edge, the edge lists of the two vertices have to be traversed separately, which is inefficient. Similar to the cross linked list, in the adjacency multiple list each edge is represented by one node and each vertex is also represented by one node.

28. Graph traversal

Graph traversal means starting from a certain vertex in the graph and following a certain search method along the edges of the graph to visit all the vertices in the graph once and only once. Note that a tree is a special kind of graph, so tree traversal can actually be regarded as a special kind of graph traversal. The graph traversal algorithm is the basis for algorithms such as solving graph connectivity problems, topological sorting, and finding critical paths. Graph traversal is much more complicated than tree traversal, because any vertex of the graph may be adjacent to other vertices, so after visiting a vertex, you may search along a certain path and return to the vertex. In order to avoid the same vertex being visited multiple times, during the process of traversing the graph, each visited vertex must be recorded. For this purpose, an auxiliary array visited[] can be set up to mark whether the vertex has been visited. There are two main graph traversal algorithms: breadth-first search and depth-first search.

1. Breadth-First-Search (BFS):

Breadth-first search is similar to the level-order traversal of a binary tree. The basic idea is: first visit the starting vertex V; then, starting from V, visit each of its unvisited adjacent vertices W1, W2, ..., Wn in turn; then visit all unvisited vertices adjacent to W1, W2, ..., Wn; and continue from the vertices just visited, visiting all their unvisited adjacent vertices, until every vertex in the graph has been visited. If unvisited vertices still remain, another unvisited vertex is chosen as a new starting point and the process is repeated. Dijkstra's single-source shortest path algorithm and Prim's minimum spanning tree algorithm use a similar idea.

2. Depth-First-Search (DFS):

Its basic idea is: first visit a starting vertex V of the graph; then, starting from V, visit any unvisited vertex W1 adjacent to V, then any unvisited vertex W2 adjacent to W1, and so on, repeating this process. When no further step downward is possible, backtrack to the most recently visited vertex that still has unvisited adjacent vertices and continue the search from there, until all vertices of the graph have been visited.
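Minimal Python sketches of BFS (with a queue) and DFS (recursive) over an adjacency list; they cover a single connected component, so for a disconnected graph they would be restarted from each unvisited vertex as described above:

```python
from collections import deque

def bfs(adj, start):
    """Breadth-first search: visit vertices in order of distance from start."""
    visited = {start}
    order, queue = [], deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:                 # every unvisited neighbour enters the queue
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return order

def dfs(adj, v, visited=None, order=None):
    """Depth-first search: go as deep as possible before backtracking."""
    if visited is None:
        visited, order = set(), []
    visited.add(v)
    order.append(v)
    for w in adj[v]:
        if w not in visited:
            dfs(adj, w, visited, order)
    return order

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(bfs(adj, 0))   # [0, 1, 2, 3]
print(dfs(adj, 0))   # [0, 1, 3, 2]
```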

29. Minimum spanning tree and shortest path:

A minimum spanning tree of a weighted connected graph is a subgraph that contains all the vertices of the original graph and whose total edge weight is as small as possible; in other words, it is a tree containing all the vertices of the original graph in which the sum of the edge weights is minimal. Minimum spanning trees can be used to solve problems such as network planning and circuit wiring.

The shortest path refers to finding the shortest path connecting two vertices in a weighted graph, that is, the sum of the weights of the edges on the path is the smallest. The shortest path algorithm is often used to solve problems such as network routing, navigation, and resource allocation. Classic examples of shortest path algorithms are Dijkstra's algorithm and Floyd-Warshall algorithm.

Neither the minimum spanning tree nor the shortest path is necessarily unique: a graph may have several different minimum spanning trees, and there may be several shortest paths of equal length between two vertices. Which one to choose depends on the specific situation.
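As a concrete illustration of the shortest-path idea, here is a minimal sketch of Dijkstra's algorithm with a binary heap (non-negative edge weights assumed; the adjacency format is my own choice):

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths for non-negative edge weights.
    adj: {u: [(v, weight), ...]}; returns {vertex: shortest distance from source}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):    # stale heap entry, already improved
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w              # relax the edge (u, v)
                heapq.heappush(heap, (dist[v], v))
    return dist

adj = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
print(dijkstra(adj, 0))   # {0: 0, 1: 3, 2: 1, 3: 4}
```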

30. Critical path:

Critical Path is a concept in project management that is used to determine the longest path in the entire project and the key tasks on that path. The critical path contains the tasks in the project that cannot be delayed because delaying these tasks will delay the completion of the entire project.

Chapter 7, Search

Knowledge framework:


31. Summary of various search methods?

Searching is divided into static lookup tables and dynamic lookup tables. Static lookup includes sequential search, binary (half-interval) search and block search; dynamic lookup includes the binary sorting tree and the balanced binary tree.
(1) Sequential search: place the keyword to be searched for at the sentinel position (i = 0), then compare the elements of the table with the key from back to front. If the returned value is 0, the search fails and the key is not in the table; if the returned value is the position i of an element (i != 0), the search succeeds. The sentinel is set to speed up execution. Its time complexity is O(n). Its characteristics are a simple structure and applicability to both sequential and linked storage, but the search efficiency is low.
(2) Binary (half-interval) search: the lookup table must use sequential storage and be ordered. If the keyword is in the table, its position is returned; a typical sign for stopping the search when the keyword is not in the table is that the upper bound of the search range becomes smaller than the lower bound. A sketch is given below.
(3) Block search: first divide the lookup table into several sub-tables (blocks), requiring every element of a block to be smaller than every element of the following block, i.e. the blocks are ordered between themselves (the elements inside a block are not necessarily ordered). The largest keyword of each block, together with the starting address of the block, forms an index table. Its characteristics are: ordered between blocks, unordered within blocks; index search is used between blocks and sequential search within blocks.
(4) Binary sorting tree: a binary sorting tree is either an empty tree or a tree with the following properties: if it has a left subtree, the values of all nodes of the left subtree are smaller than the value of the root; if it has a right subtree, the values of all nodes of the right subtree are larger than the value of the root; and its left and right subtrees are themselves binary sorting trees. Nodes can be inserted dynamically during the search, and every inserted node must conform to the definition of the binary sorting tree. This is the difference between dynamic lookup and static lookup: static lookup does not allow dynamic insertion.
(5) Balanced binary tree: a balanced binary tree, also called an AVL tree, is either an empty tree or a tree with the following properties: the absolute value of the height difference between its left and right subtrees is at most 1, and its left and right subtrees are also balanced binary trees. The balance factor is the height of the left subtree minus the height of the right subtree, and its value can only be 1, 0 or -1. Inserting a node into a balanced binary tree may destroy the balance, in which case the tree must be adjusted by balance rotations. There are four cases: inserting into the left subtree of the left subtree requires a single right rotation; inserting into the right subtree of the right subtree requires a single left rotation; inserting into the right subtree of the left subtree requires a left rotation followed by a right rotation; inserting into the left subtree of the right subtree requires a right rotation followed by a left rotation.











32. B-tree and B+ tree:

1. The B-tree, also called a multi-way balanced search tree: the maximum number of children over all nodes of a B-tree is called the order of the B-tree, usually denoted m. A B-tree of order m is either an empty tree or an m-ary tree satisfying the following properties:

  1. Every node in the tree has at most m subtrees, i.e. contains at most m - 1 keywords.
  2. If the root node is not a terminal node, it has at least two subtrees.
  3. Every non-leaf node other than the root has at least ⌈m/2⌉ subtrees, i.e. contains at least ⌈m/2⌉ - 1 keywords.
  4. All leaf nodes appear on the same level and carry no information (they can be regarded as external nodes, or search-failure nodes, similar to the ones in the decision tree of binary search; in fact these nodes do not exist, and the pointers to them are null).

B tree is a multi-way balanced search tree in which the balance factors of all nodes are equal to 0.
2. The B+ tree is a variant of the B-tree that arose from the needs of database systems.
A B+ tree of order m must meet the following conditions:

  1. Every branch node has at most m subtrees (child nodes).
  2. The non-leaf root node has at least two subtrees, and every other branch node has at least ⌈m/2⌉ subtrees.
  3. The number of subtrees of a node is equal to its number of keywords.
  4. The leaf nodes contain all the keywords, together with pointers to the corresponding records; the keywords are arranged in order of size within each leaf node, and adjacent leaf nodes are linked to each other in order of size.
  5. Every branch node (which can be regarded as an index) contains only the largest keyword of each of its child nodes (i.e. of the next-level index blocks) and pointers to those child nodes.
The main differences between an m-order B+ tree and an m-order B-tree are as follows:
  1. In the B+ tree, a node with n keywords has exactly n subtrees, i.e. each keyword corresponds to one subtree; in the B-tree, a node with n keywords has n + 1 subtrees.
  2. In the B+ tree, the number n of keywords of each non-root internal node satisfies ⌈m/2⌉ <= n <= m (for the root node, 1 <= n <= m); in the B-tree, the number n of keywords of each non-root internal node satisfies ⌈m/2⌉ - 1 <= n <= m - 1 (for the root node, 1 <= n <= m - 1).
  3. In the B+ tree, the leaf nodes contain the information, and all non-leaf nodes serve only as indexes; each index entry of a non-leaf node contains only the largest keyword of the corresponding subtree and the pointer to that subtree, not the storage address of the record corresponding to the keyword.
  4. In the B+ tree, the leaf nodes contain all the keywords, i.e. keywords that appear in non-leaf nodes also appear in leaf nodes; in the B-tree, the keywords contained in leaf nodes and those contained in other nodes do not overlap.

Chapter 8, Sorting

Knowledge framework:


33. Summary of the various internal sorting methods

Sorting means arranging a sequence of arbitrary elements into an ordered sequence according to the keyword. Internal sorting includes insertion sort, selection sort, exchange sort, merge sort and radix sort. Insertion sorts include direct insertion sort, binary (half) insertion sort and Shell sort; selection sorts include simple selection sort and heap sort; exchange sorts include bubble sort and quick sort.
(1) Direct insertion sort (stable): the basic idea is to divide the sequence into an ordered part and an unordered part, take an element from the unordered part, compare it with the ordered part to find the proper position, move the larger elements back and insert the element there (see the sketch at the end of this section). The time complexity is O(n^2) and the space complexity is O(1).
(2) Binary (half) insertion sort (stable): the basic idea is to set three variables low, high and mid with mid = (low + high) / 2; if a[mid] > key, let high = mid - 1, otherwise let low = mid + 1, and stop when low > high. Each element of the sequence is handled this way to find its proper position, and the other elements are moved back to make room for the insertion. The total number of comparisons is only O(nlog2n), but because elements still have to be moved back, the time complexity is O(n^2) and the space complexity is O(1). Its advantage is that the number of comparisons is greatly reduced.
(3) Shell sort (unstable): the basic idea is to first divide the sequence into several subsequences and perform direct insertion sort on each of them; when the whole sequence is basically in order, a final direct insertion sort is applied to the entire sequence. Its advantage is that elements with small keys move quickly toward the front, and direct insertion sort is very efficient on a nearly ordered sequence. The space complexity is O(1).
(4) Simple selection sort (unstable): the basic idea is to divide the sequence into two parts and, on each pass, find the minimum of the unordered part and swap it with the first element of the unordered part. Its advantage is that it is very simple to implement; its disadvantage is that each pass fixes the position of only one element, so the time efficiency is low. The time complexity is O(n^2) and the space complexity is O(1).
(5) Heap sort (unstable): an arbitrary sequence k1, k2, ..., kn is called a heap if, when arranged as a complete binary tree, every node is greater than or equal to (or less than or equal to) both of its children; the root of this tree is then the maximum (or minimum) value. Heap sort gives a clear improvement for large files, but the improvement is not obvious for small files. The time complexity is O(nlog2n) and the space complexity is O(1).
(6) Bubble sort (stable): the basic idea is to compare adjacent elements in pairs on each pass and exchange them according to the rule "smaller in front, larger behind". Each pass not only places the largest remaining element at the end of the sequence but also tidies up the other elements; if a pass performs no exchange, the sort can finish early. The time complexity is O(n^2) and the space complexity is O(1).
(7) Quick sort (unstable): the basic idea is to choose an element of the sequence as the pivot; elements larger than it are moved behind it and elements smaller than it are moved in front of it, forming a left and a right subsequence, which are then processed in the same way, until every subsequence has only one element and the sequence is ordered. Its advantage is that each pass fixes the position of at least one element, and the time efficiency is high. The average time complexity is O(nlog2n) and the space complexity is O(log2n).
(8) Merge sort (stable): the basic idea is to merge two or more ordered lists into one new ordered list. The time complexity is O(nlogn) and the space complexity is O(n), the same as the number of elements to be sorted.
(9) Radix sort (stable): for chained radix sort on n records with d passes and radix rd, the time complexity is O(d(n + rd)); each distribution pass takes O(n) and each collection pass takes O(rd).
The various sorting methods are summarized in the following table:

| Sorting method | Average time | Worst time | Space | Stable |
|---|---|---|---|---|
| Direct insertion sort | O(n^2) | O(n^2) | O(1) | Yes |
| Binary insertion sort | O(n^2) | O(n^2) | O(1) | Yes |
| Shell sort | about O(n^1.3) | O(n^2) | O(1) | No |
| Simple selection sort | O(n^2) | O(n^2) | O(1) | No |
| Heap sort | O(nlog2n) | O(nlog2n) | O(1) | No |
| Bubble sort | O(n^2) | O(n^2) | O(1) | Yes |
| Quick sort | O(nlog2n) | O(n^2) | O(log2n) | No |
| Merge sort | O(nlog2n) | O(nlog2n) | O(n) | Yes |
| Radix sort | O(d(n+rd)) | O(d(n+rd)) | O(rd) | Yes |
Direct insertion sort, bubble sort and simple selection sort are basic sorting methods, mainly used when the number of elements n is not very large (n < 10000).
For element sequences of medium size (n <= 1000), Shell sort is a good choice.
When the number of elements n is very large, quick sort, heap sort, merge sort or radix sort can be used; among these, quick sort and heap sort are unstable, while merge sort and radix sort are stable sorting algorithms.
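As promised in (1) above, minimal sketches of two of the simpler methods, direct insertion sort and simple selection sort (my own Python illustrations, not code from the original text):

```python
def insertion_sort(a):
    """Direct insertion sort (stable): grow the ordered part one element at a time."""
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:   # shift larger elements of the ordered part back
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                 # insert the element at its proper position
    return a

def selection_sort(a):
    """Simple selection sort (unstable): each pass fixes the position of one element."""
    for i in range(len(a) - 1):
        m = min(range(i, len(a)), key=a.__getitem__)   # index of the smallest remaining
        a[i], a[m] = a[m], a[i]
    return a

print(insertion_sort([49, 38, 65, 97, 13]))   # [13, 38, 49, 65, 97]
print(selection_sort([49, 38, 65, 97, 13]))   # [13, 38, 49, 65, 97]
```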

34. Basic principles of quick sort:

Quick Sort is one of the commonly used sorting algorithms. The basic principle is as follows:

1. Select an element as the pivot (pivot), usually select the first element of the array or randomly select an element as the pivot.
2. Split the array into two sub-arrays, so that the elements in the left sub-array are less than or equal to the pivot, and the elements in the right sub-array are greater than or equal to the pivot.
3. Quick sort the left subarray and right subarray recursively.
4. Merge the sorted left subarray, pivot element and right subarray to obtain the final ordered array.

Quick sort is an efficient sorting algorithm with an average time complexity of O(nlogn), but its worst-case time complexity is O(n^2), i.e. the time complexity can degrade in the worst case. To avoid the worst case, optimization strategies can be adopted, such as choosing the pivot element at random or using the median-of-three method to choose the pivot.
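A minimal quick sort sketch following the steps above, using a randomly chosen pivot as suggested; the fill-the-hole partition scheme is one common textbook variant, not the only possibility:

```python
import random

def quick_sort(a, low=0, high=None):
    """In-place quick sort; a random pivot helps avoid the O(n^2) worst case."""
    if high is None:
        high = len(a) - 1
    if low >= high:
        return a
    pivot_index = random.randint(low, high)          # randomly chosen pivot
    a[low], a[pivot_index] = a[pivot_index], a[low]
    pivot = a[low]
    i, j = low, high
    while i < j:                                     # partition around the pivot
        while i < j and a[j] >= pivot:
            j -= 1
        a[i] = a[j]
        while i < j and a[i] <= pivot:
            i += 1
        a[j] = a[i]
    a[i] = pivot                                     # pivot is now in its final place
    quick_sort(a, low, i - 1)                        # sort the left subarray
    quick_sort(a, i + 1, high)                       # sort the right subarray
    return a

print(quick_sort([49, 38, 65, 97, 76, 13, 27, 49]))  # [13, 27, 38, 49, 49, 65, 76, 97]
```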

