Summary of data structure retest knowledge points

data structure

Red-black tree

A red-black tree is a self-balancing binary search tree that relaxes the strict balance condition of an AVL (balanced binary) tree. An AVL tree needs many rotations to keep its balance when nodes are inserted or deleted, which hurts performance. The red-black tree gives up strict balance in exchange for cheaper updates: rebalancing after an insertion takes at most two rotations, and rebalancing after a deletion takes at most three.

Features:

  • Every node is either red or black

  • The root node is black

  • Leaf nodes (the empty NIL nodes) are black

  • If a node is red, both of its child nodes must be black (no path from the root to a leaf contains two consecutive red nodes)

  • All paths from the root node to the leaf nodes contain the same number of black nodes (the same black height)

How processes are scheduled

Memory fragmentation management (memory is related to red-black trees)

Network card data

B-tree and B+ tree

The difference between them?

A B-tree is also called a multi-way balanced search tree. The maximum number of children a node may have is called the order of the tree, usually written m. A B-tree of order m satisfies the following conditions:

  • A node in the tree has at most m subtrees and at most m-1 keywords.
  • If the root node is not a terminal node, it must have at least two subtrees.
  • All non-leaf nodes except the root node have at least ⌈m/2⌉ subtrees and at most m subtrees, so the number of keywords ranges from ⌈m/2⌉-1 to m-1.
  • Each non-leaf node stores its keywords together with pointers to its child nodes.
  • All leaf nodes appear on the same level and carry no information.
  • The balance factor of every node is 0.

Searching a B-tree takes two steps. The first step locates a node, which is done on disk. The keywords of that node are then read into memory and searched with binary (half) search. If the keyword is not found in the node, the search follows the corresponding child pointer and repeats on that node. If a leaf node is reached, the keyword does not exist.

B+tree

  • A node in the tree has at most m subtrees and m keywords.
  • Every non-root node has between ⌈m/2⌉ and m subtrees, and the number of keywords is in the same range.
  • The root node still has at least two subtrees.

The difference:

  • The non-leaf nodes of a B+ tree only serve as an index. Each index item of a non-leaf node contains only the maximum keyword of the corresponding subtree and a pointer to that subtree; it does not hold the storage address of the record for that keyword. The leaf nodes therefore contain all keywords, which means keywords in non-leaf nodes also appear in leaf nodes. In a B-tree, the keywords contained in leaf nodes and the keywords contained in other nodes do not repeat.

Minimum spanning tree

Kruskal's algorithm: based on the greedy idea. The steps are:

Sort the edges between the vertices by weight in ascending order in an array, then take the edges out of the array one at a time to build the tree. If an edge would form a cycle, discard it; otherwise add it. Once n-1 edges have been added, the spanning tree is complete.

This algorithm is suitable for sparse graphs with relatively few edges.

Prim's algorithm: also based on the greedy idea. It works as follows:

Pick an arbitrary starting point and add it to the set. Then repeatedly find the point outside the set that is closest to the set (connected by the cheapest edge) and add it. When the set contains all n points, the spanning tree is complete.

This algorithm is suitable for dense graphs with many edges.
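The Prim steps above can be sketched roughly as follows. This is a minimal sketch, assuming a connected undirected graph stored as an adjacency matrix; the names `prim`, `N`, `INF`, `dist`, and `in_tree` are illustrative and not from the original notes. The O(n^2) scan over the matrix is why this form suits dense graphs.

```c
#include <limits.h>

#define N 5          /* number of vertices in this sketch (assumed) */
#define INF INT_MAX  /* marks "no edge" in the matrix (assumed convention) */

/* Returns the total weight of a minimum spanning tree.
   Assumes the graph is connected and g[u][v] == g[v][u]. */
int prim(int g[N][N]) {
    int dist[N];           /* cheapest edge from the tree to each vertex */
    int in_tree[N] = {0};  /* 1 once a vertex has joined the tree */
    int total = 0;

    for (int i = 0; i < N; i++) dist[i] = INF;
    dist[0] = 0;  /* start from vertex 0 */

    for (int round = 0; round < N; round++) {
        /* Pick the vertex outside the tree that is closest to it. */
        int u = -1;
        for (int v = 0; v < N; v++)
            if (!in_tree[v] && (u == -1 || dist[v] < dist[u])) u = v;

        in_tree[u] = 1;
        total += dist[u];

        /* The new vertex may offer cheaper edges to the remaining ones. */
        for (int v = 0; v < N; v++)
            if (!in_tree[v] && g[u][v] != INF && g[u][v] < dist[v])
                dist[v] = g[u][v];
    }
    return total;
}
```

For a 5-vertex graph with edges 0-1 (2), 0-3 (6), 1-2 (3), 1-3 (8), 1-4 (5), 2-4 (7), 3-4 (9), the tree keeps the edges of weight 2, 3, 5 and 6, so the function returns 16.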

Binary sorting tree and balanced binary tree

A binary sort tree (binary search tree) is a data structure that makes searching easy. Every keyword in a node's left subtree is smaller than the node, and every keyword in its right subtree is larger. An in-order traversal yields the keywords in sorted order. Insertion, search, and deletion take O(log2 N) time on average.

A balanced binary tree requires that, for every node, the heights of the left and right subtrees differ by at most 1. Balanced binary trees exist to keep the tree shape under control and the search fast, and to prevent the sort tree from degenerating into a linked list.

AVL adjustment process:

To delete a node: if it is a leaf node, delete it directly. If it has only one subtree, replace it with that subtree. If it has both left and right children, take the in-order predecessor or successor from the taller subtree, exchange it with the node to be deleted, and then delete it.

Both insertion and deletion then check the balance factors along the path from the modified node back up to the root, find the smallest unbalanced subtree, and adjust three nodes starting from the root of that subtree.

Several adjustment methods for balanced binary trees:

  • LL (single right rotation): a node is inserted into the left subtree of the left child, resulting in imbalance.
  • RR (single left rotation): a node is inserted into the right subtree of the right child, resulting in imbalance.
  • LR (left rotation, then right rotation): a node is inserted into the right subtree of the left child, resulting in imbalance.
  • RL (right rotation, then left rotation): a node is inserted into the left subtree of the right child, resulting in imbalance.
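The four adjustments reduce to two primitive rotations. The following is a minimal sketch, assuming a node layout with a cached `height` field; the type `AvlNode` and the function names are hypothetical, chosen only for illustration. Each function returns the new root of the adjusted subtree.

```c
#include <stddef.h>

typedef struct AvlNode {
    int key;
    int height;  /* height of the subtree rooted here; a leaf has height 1 */
    struct AvlNode *left, *right;
} AvlNode;

static int height_of(const AvlNode *n) { return n ? n->height : 0; }

static void update_height(AvlNode *n) {
    int hl = height_of(n->left), hr = height_of(n->right);
    n->height = 1 + (hl > hr ? hl : hr);
}

/* Right rotation: fixes the case where the left child's left subtree grew. */
AvlNode *rotate_right(AvlNode *y) {
    AvlNode *x = y->left;
    y->left = x->right;
    x->right = y;
    update_height(y);  /* y is now below x, so update it first */
    update_height(x);
    return x;
}

/* Left rotation: fixes the case where the right child's right subtree grew. */
AvlNode *rotate_left(AvlNode *x) {
    AvlNode *y = x->right;
    x->right = y->left;
    y->left = x;
    update_height(x);
    update_height(y);
    return y;
}

/* Left child's right subtree grew: rotate the child left, then the root right. */
AvlNode *rotate_left_right(AvlNode *z) {
    z->left = rotate_left(z->left);
    return rotate_right(z);
}

/* Right child's left subtree grew: rotate the child right, then the root left. */
AvlNode *rotate_right_left(AvlNode *z) {
    z->right = rotate_right(z->right);
    return rotate_left(z);
}
```

A full insertion routine would call one of these after detecting a balance factor of +2 or -2; that bookkeeping is left out here.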

8 major sorting algorithms


Insertion sort:
  • Direct insertion sort

The average time complexity is O(N^2). The idea is to insert each element to be sorted into the already sorted sequence in front of it. When the elements are already in order, the time complexity reaches its best case, O(N).

  • Shell sort:

This is really a special insertion sort. It divides the elements into groups using an increment (gap) and sorts each group; the increment is then halved and the groups are sorted again. When the increment is reduced to 1, it degenerates into direct insertion sort.
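Both insertion sorts can be sketched with one helper. This is a minimal sketch; the function names are illustrative, and the halving gap sequence is the one described above rather than a tuned one. Calling the helper with a gap of 1 is exactly direct insertion sort.

```c
/* Insertion sort over elements that are `gap` positions apart.
   With gap == 1 this is direct insertion sort. */
void gapped_insertion_sort(int a[], int n, int gap) {
    for (int i = gap; i < n; i++) {
        int key = a[i];
        int j = i - gap;
        /* Shift larger elements of the same group one slot to the right. */
        while (j >= 0 && a[j] > key) {
            a[j + gap] = a[j];
            j -= gap;
        }
        a[j + gap] = key;
    }
}

/* Shell sort: start with gap n/2 and halve it until it reaches 1. */
void shell_sort(int a[], int n) {
    for (int gap = n / 2; gap >= 1; gap /= 2)
        gapped_insertion_sort(a, n, gap);
}
```

On already ordered input the inner `while` loop never runs, which is where the O(N) best case of direct insertion sort comes from.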

Exchange sort:
  • Bubble sort

    As the name suggests, adjacent elements are compared in sequence and the larger one keeps moving toward the back. After each pass, the largest remaining element has reached its final position. The average time complexity is also O(N^2), and the best case is O(N) when a pass with no exchanges ends the sort early.

  • Quick sort

    This is an unstable sorting method with an average time complexity of O(N log N), based on the idea of divide and conquer. Select any element as the pivot and split the list to be sorted into two independent parts with one partitioning pass. Each pass fixes the final position of one element: the elements in front of it are smaller than it, and the elements behind it are larger. The two sublists are then quick-sorted recursively.
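The two exchange sorts can be sketched as follows. This is a minimal sketch with illustrative function names; the pivot is simply the first element of the range, which is one common choice and an assumption here, and the `swapped` flag is what gives bubble sort its O(N) best case.

```c
/* Bubble sort with early exit when a pass makes no exchange. */
void bubble_sort(int a[], int n) {
    for (int end = n - 1; end > 0; end--) {
        int swapped = 0;
        for (int i = 0; i < end; i++) {
            if (a[i] > a[i + 1]) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                swapped = 1;
            }
        }
        if (!swapped) break;  /* already ordered */
    }
}

/* Partition a[lo..hi] around the pivot a[lo]; returns the pivot's final index. */
int partition(int a[], int lo, int hi) {
    int pivot = a[lo];
    while (lo < hi) {
        while (lo < hi && a[hi] >= pivot) hi--;
        a[lo] = a[hi];  /* move a smaller element to the left part */
        while (lo < hi && a[lo] <= pivot) lo++;
        a[hi] = a[lo];  /* move a larger element to the right part */
    }
    a[lo] = pivot;      /* the pivot lands in its final position */
    return lo;
}

/* Sorts a[lo..hi] (both bounds inclusive). */
void quick_sort(int a[], int lo, int hi) {
    if (lo >= hi) return;
    int p = partition(a, lo, hi);
    quick_sort(a, lo, p - 1);
    quick_sort(a, p + 1, hi);
}
```

A whole array of n elements is sorted with `quick_sort(a, 0, n - 1)`.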

Selection sort:
  • Simple selection sort

    As the name suggests, the smallest remaining element is selected each time and placed in its position, so that the sequence gradually becomes ordered.

  • Heap sort

    There are min-heaps (small root heaps) and max-heaps (big root heaps). In a min-heap every child is larger than its parent, while a max-heap satisfies the opposite. As a result, the top element of the heap is the minimum or the maximum. When the heap is stored in an array starting at index 0, the left child of the node at index n is at index 2n+1 and the right child is at index 2n+2.

    The adjustment process in a max-heap compares a node with its children and exchanges it with the larger child if that child is greater. The exchange is repeated down the tree until the node is no smaller than both of its children. This process is called sifting down (falling).

    Inserting an element means putting it at the end and then sifting it up: it is compared with its parent and exchanged while it is greater than the parent.

    Deleting the top element of the heap means exchanging the top element with the last element, removing the last position, and then sifting down the element that is now at the top.
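The sift-down step and the resulting sort can be sketched as follows. This is a minimal sketch of a max-heap stored in a 0-based array, using the 2n+1 and 2n+2 child indices from the text; the function names are illustrative.

```c
/* Sift the element at index i down within the first n elements of a max-heap. */
void sift_down(int a[], int n, int i) {
    for (;;) {
        int largest = i;
        int l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && a[l] > a[largest]) largest = l;
        if (r < n && a[r] > a[largest]) largest = r;
        if (largest == i) return;  /* no smaller than both children: done */
        int t = a[i]; a[i] = a[largest]; a[largest] = t;
        i = largest;               /* continue from the child's position */
    }
}

/* Heap sort in ascending order. */
void heap_sort(int a[], int n) {
    /* Build the max-heap from the last non-leaf node upward. */
    for (int i = n / 2 - 1; i >= 0; i--) sift_down(a, n, i);

    /* Repeatedly move the top (maximum) to the end and shrink the heap. */
    for (int end = n - 1; end > 0; end--) {
        int t = a[0]; a[0] = a[end]; a[end] = t;
        sift_down(a, end, 0);
    }
}
```

The second loop is the delete-top operation applied n-1 times, which is why the array ends up in ascending order.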

Merge sort

The idea of merge sort is to merge ordered lists into a larger ordered list. When merging, you only need to compare the first elements of the two (or more) ordered lists and move the smallest one into the new ordered list. Merge sort initially treats each element as an ordered list of length one, merges the lists two by two, and recursively repeats this until one large ordered list remains. This is the divide-and-conquer idea: recursively split a large problem into many small problems and solve those.
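A two-way merge sort along these lines might look like the sketch below. It is a minimal sketch, assuming a top-down recursive form with one scratch buffer; the names `merge_sort`, `sort_range`, and `merge_runs` are illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the ordered runs a[lo..mid] and a[mid+1..hi] using tmp as scratch. */
static void merge_runs(int a[], int tmp[], int lo, int mid, int hi) {
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];  /* <= keeps it stable */
    while (i <= mid) tmp[k++] = a[i++];
    while (j <= hi) tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo + 1) * sizeof(int));
}

static void sort_range(int a[], int tmp[], int lo, int hi) {
    if (lo >= hi) return;  /* a single element is already ordered */
    int mid = lo + (hi - lo) / 2;
    sort_range(a, tmp, lo, mid);
    sort_range(a, tmp, mid + 1, hi);
    merge_runs(a, tmp, lo, mid, hi);
}

/* Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int merge_sort(int a[], int n) {
    if (n < 2) return 0;
    int *tmp = malloc((size_t)n * sizeof(int));
    if (tmp == NULL) return -1;
    sort_range(a, tmp, 0, n - 1);
    free(tmp);
    return 0;
}
```

The scratch buffer is the O(N) extra space that merge sort pays for its stable O(N log N) running time.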

Bucket sort

It can be seen as a generalization of counting sort. It uses the idea of divide and conquer: put the elements into buckets that each cover a range of values, sort the elements inside each bucket, and then take the elements out bucket by bucket.

Radix sort

Briefly describe the process of binary search

Binary search, also called half search, applies to ordered lists. The idea is to compare the target with the middle element first. If the target is smaller, the search continues in the first half; if it is larger, in the second half. The search range keeps shrinking this way, and the average search time complexity is O(log2 N).
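The process can be sketched as an iterative loop. This is a minimal sketch for an ascending `int` array; the function name and the convention of returning -1 for "not found" are assumptions made for illustration.

```c
/* Returns the index of target in the ascending array a[0..n-1], or -1. */
int binary_search(const int a[], int n, int target) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;  /* avoids overflow of lo + hi */
        if (a[mid] == target) return mid;
        if (target < a[mid])
            hi = mid - 1;  /* continue in the first half */
        else
            lo = mid + 1;  /* continue in the second half */
    }
    return -1;  /* the range is empty: target is not present */
}
```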

What is the difference between stack and queue?

A stack is a linear structure (an ordered collection of data items) with a top pointer. Its characteristic is that elements are inserted and deleted at the same end (the top): first in, last out.

A queue is a linear structure with a head pointer and a tail pointer. The characteristic is that elements are inserted at one end and deleted at the other end, first in, first out.

What data structure gives the fastest lookup of a name by student number?

A hash table, with a time complexity of O(1). A hash table works by running the key through a hash algorithm to obtain a hash code, which is used as the subscript in storage, and then placing the value at that position. A hash table is essentially an array.

To sort a lot of data, use

Half search needs elements to be moved when data is deleted or inserted. Is there a more efficient way?

The decision process of half search corresponds to a binary sort tree, so we can store the sequence in a binary sort tree instead, which makes adding and deleting much more convenient.

Sorting the results of millions of test takers

Use bucket sort. Its disadvantage is that it consumes a lot of extra space, but the time complexity is low: O(N+M), where N is the number of elements and M is the number of buckets. Radix sort has a similarly low time complexity, O(k*N), where k is the number of digits in the keywords.

If the machine is fast enough that only the average time complexity matters, heap sort, bucket sort, and merge sort are still the choices with lower time complexity.

What if you design it yourself?

Briefly describe the KMP algorithm

Full binary tree: as the name suggests, every node except the leaf nodes has two child nodes, and all leaf nodes are on the bottom level.

When the input is already ordered, the recursion tree of quick sort degenerates into a single-branch tree, and the time complexity becomes O(N^2).

What is a data structure and the relationship between data structures and algorithms (unfamiliar)

Program = data structure + algorithm.

The relationship between data structures and algorithms is: the data structure is the bottom layer and the algorithm is the upper layer. The data structure serves the algorithm, and the algorithm operates on the data structure. An algorithm chooses the appropriate data structure for the application scenario. For example, a linked list suits scenarios with many deletions and insertions, while an array suits scenarios with many reads.

Data structure: Data does not exist independently. There is a relationship between data. This relationship is called a structure. Data structure includes three aspects: logical structure, storage structure, and data operation. The design of the algorithm depends on the selected logical structure, and the implementation of the algorithm depends on the storage structure used.

What are the applications of data structures in computer networks and operating systems?

  • Operating system: process scheduling uses stacks and queues, file management uses trees, graphs and sorting, and information storage uses linear lists and strings.
  • Computer network: packet forwarding uses queues and graphs, DNS resolution uses trees, and linear lists are used to represent information.

What are the similarities and differences between linked lists and arrays?

Similar points: Both can represent linear structures. The linked list points to the next node through the previous node, while the array is stored sequentially in memory.

The difference: the physical address of the data stored in the array is continuous, while the physical location of the data stored in the linked list is not continuous, but points to the next node through the pointer of each node. Linked lists are suitable for inserting and deleting data, and arrays are suitable for searching data.

What is a structure?

A user-defined data type composed of basic data types.

What is the difference between a binary tree and a tree with only two children?

The subtrees of each node in a binary tree are ordered, that is, they are distinguished as the left subtree and the right subtree. In a tree whose nodes have two children, there is no left or right between the two children.

Find factorial recursively

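/* Computes n! recursively. With a 32-bit int the result overflows for n > 12. */
int fact(int n){
  /* Base case: 0! = 1! = 1. Testing n<=1 also stops the recursion for n=0. */
  if(n<=1){
    return 1;
  }else{
    /* Recursive step: n! = n * (n-1)! */
    return n*fact(n-1);
  }
}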

Perform a hierarchical traversal of the tree

Use a queue, which gives a breadth-first traversal. Put the root node into the queue, then repeatedly dequeue a node and enqueue its child nodes in order. When every node has been dequeued and no more nodes are put into the queue, the level-order traversal is complete.
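The queue-based traversal can be sketched as follows. This is a minimal sketch for a binary tree; the `Node` type, the `MAX_NODES` capacity, and the choice to write the visited values into an output array are all assumptions made for illustration.

```c
#include <stddef.h>

#define MAX_NODES 64  /* assumed upper bound on the number of tree nodes */

typedef struct Node {
    int val;
    struct Node *left, *right;
} Node;

/* Writes the node values in level order into out[] and returns how many
   were written. The tree must not have more than MAX_NODES nodes. */
int level_order(const Node *root, int out[]) {
    const Node *queue[MAX_NODES];  /* simple array-backed queue */
    int head = 0, tail = 0, count = 0;

    if (root != NULL) queue[tail++] = root;
    while (head < tail) {              /* queue not empty */
        const Node *cur = queue[head++];  /* dequeue */
        out[count++] = cur->val;
        if (cur->left != NULL) queue[tail++] = cur->left;    /* enqueue children */
        if (cur->right != NULL) queue[tail++] = cur->right;  /* left to right */
    }
    return count;
}
```

Every node is enqueued exactly once, so the array never needs more slots than there are nodes.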

What method is used to sort numeric arrays

  • If n is relatively small, use direct insertion sort or simple selection sort.
  • If the array is basically in order, use direct insertion or bubble sort.
  • If n is relatively large, use quick sort, heap sort and merge sort.
  • If n is very large and the keywords have few digits, use radix sort.

What is a Huffman tree? What is it used for?

Huffman tree: The binary tree with the shortest weighted path length among all binary trees composed of n weighted leaf nodes.

Huffman coding: For a Huffman tree with n leaf nodes, assign 0 to each left branch and 1 to each right branch. The path from the root to each leaf then forms a binary string, which is called the Huffman code of that leaf. Huffman coding is an optimal prefix code (no code is a prefix of another code) and is often used for data compression.

Heap storage structure

The physical storage structure is an array, and the logical storage structure is a tree.

Hash conflicts and how to resolve them

A hash conflict is the situation where different keywords map to the same hash address.

Conflict resolution methods include the zipper method (separate chaining) and the open address method.

  • Open address:
    • Linear probing: place the conflicting value in the next free storage unit that follows
    • Quadratic probing: starting from the conflicting unit, search for a free storage unit at square-number offsets on both sides
    • Rehashing: use a second hash function to handle the conflict
  • Zipper method: use pointers to link the conflicting values into a list
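Linear probing, the simplest open-address method, can be sketched as follows. This is a minimal sketch that stores non-negative integer keys only; the table size, the `EMPTY` marker, the hash function `key % TABLE_SIZE`, and all names are assumptions made for illustration, and deletion is left out.

```c
#define TABLE_SIZE 11  /* assumed table size */
#define EMPTY (-1)     /* marks an unused slot; keys must be non-negative */

typedef struct {
    int keys[TABLE_SIZE];
} HashTable;

void ht_init(HashTable *t) {
    for (int i = 0; i < TABLE_SIZE; i++) t->keys[i] = EMPTY;
}

/* Inserts key and returns the slot used, or -1 if the table is full. */
int ht_insert(HashTable *t, int key) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (key % TABLE_SIZE + i) % TABLE_SIZE;  /* probe the next slot */
        if (t->keys[pos] == EMPTY || t->keys[pos] == key) {
            t->keys[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Returns the slot holding key, or -1 if it is not in the table. */
int ht_find(const HashTable *t, int key) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (key % TABLE_SIZE + i) % TABLE_SIZE;
        if (t->keys[pos] == EMPTY) return -1;  /* an empty slot ends the probe */
        if (t->keys[pos] == key) return pos;
    }
    return -1;
}
```

The keys 3, 14 and 25 all hash to slot 3 in this table, so they end up in slots 3, 4 and 5 respectively.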

The height of a full binary tree: there are n nodes, the height is log2(n+1)

Find the middle node of a singly linked list

Use a fast pointer and a slow pointer. The slow pointer moves one node at a time and the fast pointer moves two nodes at a time. When the fast pointer reaches the end, the slow pointer is at the middle position.
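The two-pointer walk can be sketched as follows. This is a minimal sketch; the `ListNode` type and function name are illustrative, and for a list with an even number of nodes this version returns the second of the two middle nodes, which is one possible convention.

```c
#include <stddef.h>

typedef struct ListNode {
    int val;
    struct ListNode *next;
} ListNode;

/* Returns the middle node of the list, or NULL for an empty list. */
ListNode *middle_node(ListNode *head) {
    ListNode *slow = head, *fast = head;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;        /* one step */
        fast = fast->next->next;  /* two steps */
    }
    return slow;
}
```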

The difference between binary search tree and binary search implemented by array

Binary search tree is a dynamic search that can facilitate the insertion and deletion of data. Binary search is a static search and is not convenient for deleting data.

Shortest path?

Dijkstra (based on greedy thinking)

Requires: visited[], D[], path[], arcs[][];

visited[] marks the nodes that have already been visited (the rest are the remaining nodes), D[] holds the current shortest distance from the source to each node, path[] records the path by storing the predecessor of each node, and the two-dimensional array arcs[][] represents the relationship between nodes, that is, the edge lengths.

The algorithm works in rounds. In each round it looks at the distances to the nodes that can currently be reached, adds the unvisited node with the shortest distance to the visited set, and then updates the distances in D for the remaining nodes, because adding a new node may open shorter paths. The node with the shortest distance is selected each time until the set of remaining nodes is empty, at which point every recorded distance is a shortest path.
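The rounds can be sketched with the four arrays named above. This is a minimal sketch, assuming an adjacency-matrix graph with non-negative edge lengths; `N`, `INF`, and the function signature are assumptions made for illustration.

```c
#include <limits.h>

#define N 5          /* number of nodes in this sketch (assumed) */
#define INF INT_MAX  /* marks "no edge" or "not reached yet" (assumed) */

/* Fills D[] with shortest distances from src and path[] with predecessors.
   Assumes non-negative edge lengths whose sums fit in an int. */
void dijkstra(int arcs[N][N], int src, int D[N], int path[N]) {
    int visited[N] = {0};
    for (int i = 0; i < N; i++) { D[i] = INF; path[i] = -1; }
    D[src] = 0;

    for (int round = 0; round < N; round++) {
        /* Select the unvisited reachable node with the shortest distance. */
        int u = -1;
        for (int v = 0; v < N; v++)
            if (!visited[v] && D[v] != INF && (u == -1 || D[v] < D[u])) u = v;
        if (u == -1) break;  /* the remaining nodes are unreachable */
        visited[u] = 1;

        /* Going through u may shorten the distance to the remaining nodes. */
        for (int v = 0; v < N; v++) {
            if (!visited[v] && arcs[u][v] != INF && D[u] + arcs[u][v] < D[v]) {
                D[v] = D[u] + arcs[u][v];
                path[v] = u;
            }
        }
    }
}
```

Following `path[]` backward from a node to `src` reconstructs the shortest path itself.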

Floyd's algorithm

The idea is simpler. It uses dynamic programming, and the time complexity is O(N^3).

A two-dimensional array recording the distances is updated through a three-level loop.

The transfer equation is a[i][j] = min(a[i][j], a[i][k] + a[k][j]);
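The three-level loop can be sketched as follows. This is a minimal sketch; `N` and the `INF` value are assumptions, with `INF` chosen small enough that adding two of them does not overflow an `int`. The intermediate node k must be the outermost loop.

```c
#define N 4
#define INF 1000000  /* "no path"; two INFs added together still fit in an int */

/* Updates a[][] in place so that a[i][j] becomes the shortest distance from i to j. */
void floyd(int a[N][N]) {
    for (int k = 0; k < N; k++)          /* allow k as an intermediate node */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (a[i][k] + a[k][j] < a[i][j])
                    a[i][j] = a[i][k] + a[k][j];
}
```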

Minimum spanning tree:

Prim (the point-adding method): first pick an arbitrary point and put it into the set of visited nodes. Then, from the remaining nodes, select the one closest to the set, that is, the one connected by the edge with the smallest weight, and add it. Repeat until all nodes have been visited.

Kruskal's algorithm (the edge-adding method): it is the process of repeatedly merging a forest into a single tree. Sort the edges of the graph from small to large, select the smallest edge, and add its endpoints to the visited node set, until all nodes have been added. During this process each candidate edge must be checked to see whether it would form a cycle. This is done with union-find: if the two endpoints already have the same ancestor, they are already in the same tree and the edge is discarded. Initially the ancestor of each node is itself.
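Kruskal with union-find can be sketched as follows. This is a minimal sketch; the `Edge` type, the `MAX_V` bound, the global `parent` array, and the convention of returning -1 for a disconnected graph are assumptions made for illustration.

```c
#include <stdlib.h>

#define MAX_V 100  /* assumed upper bound on the number of nodes */

typedef struct { int u, v, w; } Edge;

static int parent[MAX_V];  /* parent[x] == x means x is the ancestor of its tree */

/* Finds the ancestor of x, shortening the path along the way. */
static int find_root(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

static int by_weight(const void *a, const void *b) {
    const Edge *x = a, *y = b;
    return (x->w > y->w) - (x->w < y->w);
}

/* Sorts edges[] in place. Returns the total weight of the minimum spanning
   tree of a graph with n nodes and m edges, or -1 if it is not connected. */
int kruskal(Edge edges[], int m, int n) {
    int total = 0, used = 0;
    for (int i = 0; i < n; i++) parent[i] = i;  /* every node is its own ancestor */
    qsort(edges, (size_t)m, sizeof(Edge), by_weight);

    for (int i = 0; i < m && used < n - 1; i++) {
        int ru = find_root(edges[i].u), rv = find_root(edges[i].v);
        if (ru == rv) continue;  /* same tree already: the edge would form a cycle */
        parent[ru] = rv;         /* merge the two trees */
        total += edges[i].w;
        used++;
    }
    return used == n - 1 ? total : -1;
}
```

The sort dominates the running time, which is why this method fits sparse graphs with few edges.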

KMP algorithm

This is a string matching algorithm that searches for a pattern string inside a main string.

The traditional brute-force method keeps one pointer for each string, starts from the beginning, and compares character by character. As soon as the characters differ, the pattern pointer returns to the head of the pattern and the main-string pointer returns to one position after where the attempt started. The time complexity of this algorithm is O(N*M).

KMP has a time complexity of O(M+N). It exploits the repeated prefixes inside the pattern: when a mismatch occurs, it uses the information from the failed match to avoid starting the comparison over every time. The key is to build an array in which each entry records the position from which matching should resume after a mismatch.

The next array is computed with dynamic programming.

The method to find the longest common prefix and suffix and determine the common prefix of strings is to compare them in sequence. If they are the same

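The key recurrence for the longest common prefix and suffix ending at the k-th element:

S represents the character array of the string, and next represents the array being recorded.

next[k] = S[k]==S[next[k-1]] ? next[k-1]+1 : 0

This is the simplified form. When the characters differ, the candidate length has to keep falling back (len = next[len-1]) and be compared again before it settles on 0.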
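A version of the next array that includes the fall-back step, together with the search that uses it, can be sketched as follows. This is a minimal sketch; here `next[k]` is taken to mean the length of the longest proper prefix of S[0..k] that is also its suffix, and the function names and the `MAX_PATTERN` bound are assumptions made for illustration.

```c
#include <string.h>

#define MAX_PATTERN 256  /* assumed upper bound on the pattern length */

/* next[k] = length of the longest proper prefix of s[0..k] that is also a suffix. */
void build_next(const char *s, int next[]) {
    int m = (int)strlen(s);
    if (m == 0) return;
    next[0] = 0;
    for (int k = 1, len = 0; k < m; k++) {
        /* On a mismatch, fall back to the next shorter candidate prefix. */
        while (len > 0 && s[k] != s[len]) len = next[len - 1];
        if (s[k] == s[len]) len++;
        next[k] = len;
    }
}

/* Returns the index of the first occurrence of pat in text, or -1. */
int kmp_search(const char *text, const char *pat) {
    int n = (int)strlen(text), m = (int)strlen(pat);
    int next[MAX_PATTERN];
    if (m == 0) return 0;
    if (m > MAX_PATTERN) return -1;  /* pattern too long for this sketch */
    build_next(pat, next);

    for (int i = 0, j = 0; i < n; i++) {
        /* The main-string pointer i never moves back; only j falls back. */
        while (j > 0 && text[i] != pat[j]) j = next[j - 1];
        if (text[i] == pat[j]) j++;
        if (j == m) return i - m + 1;
    }
    return -1;
}
```

For the string "aabaaab" the array comes out as 0, 1, 0, 1, 2, 2, 3; the entry at index 5 is 2 because of the fall-back, where the simplified recurrence would give 0.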

Origin blog.csdn.net/yang12332123321/article/details/126444468