Data structures and algorithms: a summary of key points

This article collects the main fundamentals of data structures and algorithms in one place, as an aid for organizing and reviewing the material.

For detailed introductions to some of the topics, see the author's column.

Table of contents

I. Overview

II. Linear lists

III. Stacks

IV. Queues

V. Strings

VI. Multidimensional arrays and generalized lists

VII. Trees and binary trees

VIII. Graphs

IX. Searching

X. Sorting


I. Overview

A data structure has three aspects: logical structure, storage structure, and algorithms (operations).
data item ∈ data element (record) ∈ data.

Data element (node): the basic unit of data.
Data item: the smallest, indivisible unit of data.
Data object: a collection of data elements of the same nature; a subset of data.
1. Logical structure (linear and nonlinear)
A data structure is a collection of data elements that have one or more specific relationships with each other. There are four basic kinds:

  1. Set: membership in the same set is the only relationship between data elements.
  2. Linear structure: a "one-to-one" relationship; each element has at most one immediate predecessor and one immediate successor.
  3. Tree structure: a "one-to-many" relationship; every node except the root has exactly one immediate predecessor, and every node can have zero or more immediate successors.
  4. Graph structure: a "many-to-many" relationship; a node can have multiple immediate predecessors and multiple immediate successors.

2. Storage structure

  1. Sequential storage (one-dimensional array)
  2. Chained storage (linked list)
  3. Index storage (index table)
  4. Hash storage (the storage address is computed from the key by a hash function)

3. Algorithm and efficiency

  • Properties of an algorithm: finiteness, definiteness, feasibility, input, output.
  • Design goals: correctness, readability, robustness, high efficiency, low storage use.
  • Measuring efficiency: a priori estimation (analysis before running) and a posteriori statistics (measurement after running).
  • Evaluation (order of magnitude): time complexity and space complexity.

II. Linear lists

1. Linear list
Basic operations: create, get length, search (by value), insert, delete, display.
2. Sequential storage
Sequence list
        The physical order of the elements matches their logical order.
Features: random access by element index.
Advantages: storage space is saved (no space is spent on pointers).
Disadvantages:
        · Insertion and deletion are time-consuming because elements must be shifted.
        · Space for the maximum size is pre-allocated, so storage may be wasted.
        · The capacity of the list is hard to expand.
3. Chained storage
Linear linked list

  1. Singly linked list: each node has one data field and one pointer field.
  2. Circular linked list: the pointer of the last node points back to the head node.
  3. Doubly linked list: each node has one data field and two pointer fields (predecessor and successor).

III. Stacks

Last in, first out (LIFO).
Operations: push, pop, test for empty, test for full, read the top element, and display the stack elements.

  1. Sequential stack (sequential storage)
  2. Linked stack (chained storage)

Applications:
        · Number base conversion
                To convert a decimal number N to base k, loop (push the remainder N mod k onto the stack, then set N = N / k using integer division) until N becomes 0. Popping the stack then yields the digits of the result, most significant digit first.
        · Expression evaluation
                1. Infix notation
                        The operator is placed between its two operands. (Priority must be handled, so processing is slower.)
                2. Prefix notation
                        The operator is placed before its two operands. (No priority problem; scan from right to left.)
                3. Postfix notation
                        The operator is placed after its two operands. (No priority problem; scan from left to right.)

        Converting an infix expression to a postfix expression:

        (1) When an operand is read, output it directly to the postfix expression.
        (2) When an operator is read, it goes onto the operator stack.
            ① If the incoming operator has higher priority than the operator on top of the stack, push it.
            ② Otherwise, pop the operators on the stack whose priority is higher than or equal to that of the incoming operator, in order, output them to the postfix expression, and then push the incoming operator.
        (3) Bracket handling.
            ① On an opening bracket "(", push it onto the operator stack.
            ② On a closing bracket ")", pop operators and output them to the postfix expression until the nearest opening bracket "(" is reached, then discard that bracket (brackets themselves are not output).
        (4) On the terminator "#", pop all remaining operators from the operator stack one by one and output them to the postfix expression.
        (5) If + or - appears as a unary operator, insert the operand 0 in front of it.
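
The conversion steps can be sketched in Python as follows. This is a minimal illustration, not a complete parser: it assumes the expression is already split into tokens, is well formed, and uses only the four binary operators; the unary case in step (5) is not handled, and the end of the token list stands in for the terminator "#". The names `PRIORITY` and `infix_to_postfix` are made up for this example.

```python
# Operator priorities; only the four basic binary operators are covered in this sketch.
PRIORITY = {"+": 1, "-": 1, "*": 2, "/": 2}

def infix_to_postfix(tokens):
    """Convert a list of infix tokens into a list of postfix tokens."""
    output = []      # the postfix expression being built
    operators = []   # the operator stack
    for token in tokens:
        if token == "(":
            operators.append(token)
        elif token == ")":
            # Pop back to the nearest "(", which is discarded.
            while operators[-1] != "(":
                output.append(operators.pop())
            operators.pop()
        elif token in PRIORITY:
            # Pop operators of higher or equal priority, then push the new one.
            while (operators and operators[-1] != "("
                   and PRIORITY[operators[-1]] >= PRIORITY[token]):
                output.append(operators.pop())
            operators.append(token)
        else:
            output.append(token)   # operand: output directly
    # End of input plays the role of the terminator "#".
    while operators:
        output.append(operators.pop())
    return output

print(" ".join(infix_to_postfix("a + b * ( c - d ) / e".split())))
# prints: a b c d - * e / +
```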

        · Postfix expression evaluation
        · Interrupt handling and context saving
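
As an illustration of the base-conversion application listed above, here is a small Python sketch that uses a list as the stack. It assumes a non-negative integer and a base between 2 and 16; the names `DIGITS` and `to_base` are made up for this example.

```python
DIGITS = "0123456789ABCDEF"

def to_base(n, k):
    """Convert a non-negative decimal integer n to a base-k string (2 <= k <= 16)."""
    if n == 0:
        return "0"
    stack = []
    while n > 0:
        stack.append(DIGITS[n % k])   # push the remainder
        n //= k                       # N = N / k (integer division)
    # Popping reverses the remainders: most significant digit first.
    result = []
    while stack:
        result.append(stack.pop())
    return "".join(result)

print(to_base(10, 2))    # prints: 1010
print(to_base(255, 16))  # prints: FF
```

The loop stops when the quotient reaches 0, not when a remainder happens to be 0; otherwise a number such as 8 in base 8 would lose its leading digit.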

IV. Queues

First in, first out (FIFO).
Operations: enqueue, dequeue, read the front element, display the queue elements, test for empty, test for full, get the queue length.
1. Sequential queue
        1. Plain sequential queue
                Suffers from "false overflow": the rear index reaches the end of the array while free slots remain at the front.
        2. Circular queue
                Solves "false overflow" by letting the indices wrap around the array.
2. Linked queue
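
A circular queue can be sketched as follows. This is one common variant, chosen here as an assumption: it keeps one array slot unused so that "full" and "empty" can be told apart using only the front and rear indices. The class name `CircularQueue` and its method names are made up for this example.

```python
class CircularQueue:
    """Fixed-capacity circular queue; one slot stays unused to tell full from empty."""

    def __init__(self, capacity):
        self.size = capacity + 1          # one spare slot
        self.data = [None] * self.size
        self.front = 0                    # index of the front element
        self.rear = 0                     # index one past the last element

    def is_empty(self):
        return self.front == self.rear

    def is_full(self):
        return (self.rear + 1) % self.size == self.front

    def __len__(self):
        return (self.rear - self.front + self.size) % self.size

    def enqueue(self, item):
        if self.is_full():
            raise OverflowError("queue is full")
        self.data[self.rear] = item
        self.rear = (self.rear + 1) % self.size   # wrap around

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        item = self.data[self.front]
        self.front = (self.front + 1) % self.size  # wrap around
        return item

q = CircularQueue(3)
for x in (1, 2, 3):
    q.enqueue(x)
q.dequeue()     # frees a slot at the front
q.enqueue(4)    # reuses it by wrapping around: no false overflow
print(len(q))   # prints: 3
```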
Applications:
        · Input and output management
        · CPU allocation management
        · Priority queue
                Elements are served in order of weight (priority).
        · Double-ended queue (deque)
                Operating-system scheduling uses double-ended queues.

V. Strings

Terms: length, empty string, blank string (consisting only of spaces), string equality, substring, main string, pattern matching (the main string is the target; the substring is the pattern).
Operations: get length, concatenate, extract a substring, compare, insert a substring, delete a substring, and pattern matching.

  1. Sequential storage (similar to linear lists)
  2. Linked storage
  3. Heap-allocated storage (a block of contiguous addresses allocated dynamically)

VI. Multidimensional arrays and generalized lists

1. Multidimensional arrays
        An array is an ordered collection with a fixed format and a fixed number of elements; each data element is identified by a unique set of subscripts. A multidimensional array is a non-linear structure: an element of an n-dimensional array has at most n direct predecessors and n direct successors.
        Storage order:
        · Row-major order (rows first)
        · Column-major order (columns first)
2. Compressed storage of special matrices
        Symmetric matrix: n(n+1)/2 storage units.
        Triangular matrix: n(n+1)/2 + 1 storage units (the extra unit holds the constant shared by the other triangle).
        · Upper triangular matrix
        · Lower triangular matrix
3. Sparse matrices
1) Storage
        · Triple table
                Stores the row, column, and value of each non-zero element as (i, j, v).
        · Linked list with row pointers
                The non-zero elements of each row (in triple form) are chained into a linked list reached through that row's pointer.
        · Cross-linked list (orthogonal list)
                Addresses a weakness of triple storage: operations such as addition and subtraction change the positions and the number of non-zero elements.
                Idea: store each non-zero element as a node (i, j, v, down, right), that is, the triple plus a column pointer field and a row pointer field.
                Column pointer field (down): points to the next non-zero element in the same column.
                Row pointer field (right): points to the next non-zero element in the same row.
                Row head nodes use only the right pointer field.
                Column head nodes use only the down pointer field.
                The overall head node stores the number of rows and columns of the original matrix.
2) Algorithms
        · Creation
        · Addition
4. Generalized lists
A generalized list is an extension of the linear list. It is usually written enclosed in parentheses, with elements separated by commas.
        Length: the number of elements (atoms or sublists) in the outermost layer.
        Depth: the maximum nesting level of parentheses after the list is fully expanded.
Head-tail linked storage

  • List node: flag field, head pointer field, and tail pointer field.
  • Atom node: flag field, value field.

Features: chained storage is used because the elements of a generalized list can have different structures.
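
Length and depth can be illustrated with nested Python lists standing in for a generalized list. This is only a sketch of the two definitions, not of the head-tail storage structure; treating any non-list value as an atom is an assumption of this example.

```python
def length(glist):
    """Number of elements (atoms or sublists) in the outermost layer."""
    return len(glist)

def depth(glist):
    """Maximum nesting level of parentheses; an atom has depth 0, an empty list depth 1."""
    if not isinstance(glist, list):
        return 0
    return 1 + max((depth(item) for item in glist), default=0)

g = ["a", ["b", "c"], [["d"], "e"]]   # the generalized list (a, (b, c), ((d), e))
print(length(g))   # prints: 3
print(depth(g))    # prints: 3
```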

VII. Trees and binary trees

1. Trees
Terms:

Node (data plus branches), degree of a node (the number of its subtrees), degree of a tree (the maximum node degree in the tree), leaf (degree zero), branch node (degree not zero), siblings, level, depth (height) of the tree, forest (a set of zero or more disjoint trees), ordered tree (the subtrees of each node are ordered from left to right), and unordered tree.
A non-empty tree has exactly one root node, which has no predecessor.
Notation

  • Nested set notation (Venn diagram)
  • Parenthesis notation
  • Indentation (concave) notation

2. Binary trees
A binary tree is a special ordered tree.
Property 1: There are at most 2^(i-1) nodes (i>=1) on the i-th layer of a non-empty binary tree.
Property 2: In a binary tree with a depth of h, there are at most 2^h-1 nodes (h>=1).

Property 3: For a complete binary tree with n nodes, if the nodes are numbered in the same way as in a full binary tree (level by level, left to right, starting from 1), then for any node with number i:

  1. (Parent node): If i=1, node i is the root. If i>1, the parent of node i is node ⌊i/2⌋ (i/2 rounded down).
  2. (Left child): If 2i<=n, the left child of node i is node 2i. If 2i>n, node i has no left child.
  3. (Right child): If 2i+1<=n, the right child of node i is node 2i+1. If 2i+1>n, node i has no right child.
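
Property 3 is what makes sequential storage of a complete binary tree practical. A short Python sketch of the index arithmetic, using 1-based node numbers and returning None where no such node exists (the function names are made up for this example):

```python
def parent(i):
    """Number of the parent of node i, or None if i is the root."""
    return i // 2 if i > 1 else None

def left_child(i, n):
    """Number of the left child of node i in a tree of n nodes, or None."""
    return 2 * i if 2 * i <= n else None

def right_child(i, n):
    """Number of the right child of node i in a tree of n nodes, or None."""
    return 2 * i + 1 if 2 * i + 1 <= n else None

n = 6                      # a complete binary tree with nodes numbered 1..6
print(parent(5))           # prints: 2
print(left_child(3, n))    # prints: 6
print(right_child(3, n))   # prints: None
```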

Property 4: The depth h of a complete binary tree (including a full binary tree) with n (n>0) nodes is ⌊log2 n⌋+1.

Property 5: For a non-empty binary tree, let n0, n1, and n2 be the numbers of nodes of degree 0, 1, and 2 respectively; then n0=n2+1.
Full binary tree
        A binary tree with depth h and 2^h-1 nodes.
Complete binary tree
        A binary tree with depth h and n nodes is complete if and only if its nodes correspond one-to-one to the nodes numbered 1 to n in the full binary tree of depth h; any missing nodes must be at the right end of the last level.
Storage
        General binary tree:

                Chained storage (binary linked list: left child, data, right child; ternary linked list: left child, data, right child, parent)
        Complete binary tree or full binary tree:

                Sequential storage (it saves space, and a node's position in the tree can be determined from its subscript)
3. Binary tree traversal and threaded binary trees
1. Traversing a binary tree

  1. Preorder traversal (DLR: root, left, right)
  2. Inorder traversal (LDR: left, root, right)
  3. Postorder traversal (LRD: left, right, root)
  4. Level-order traversal (level by level, top to bottom, left to right)
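
The four traversal orders can be compared on a small tree with the following Python sketch. The `Node` class and the sample tree are assumptions made for this example; the recursive versions return lists for easy comparison, and the level-order version uses a queue.

```python
from collections import deque

class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def preorder(node):    # DLR: root, left, right
    return [] if node is None else [node.data] + preorder(node.left) + preorder(node.right)

def inorder(node):     # LDR: left, root, right
    return [] if node is None else inorder(node.left) + [node.data] + inorder(node.right)

def postorder(node):   # LRD: left, right, root
    return [] if node is None else postorder(node.left) + postorder(node.right) + [node.data]

def level_order(root):
    result = []
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        result.append(node.data)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return result

#       A
#      / \
#     B   C
#    / \   \
#   D   E   F
tree = Node("A", Node("B", Node("D"), Node("E")), Node("C", None, Node("F")))
print("".join(preorder(tree)))     # prints: ABDECF
print("".join(inorder(tree)))      # prints: DBEACF
print("".join(postorder(tree)))    # prints: DEBFCA
print("".join(level_order(tree)))  # prints: ABCDEF
```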

2. Reconstructing a binary tree
From preorder + inorder (the preorder sequence determines the root node, and the inorder sequence determines the left and right subtrees)

  1. Determine the root node from the preorder sequence, then use the inorder sequence to split the remaining nodes into the left subtree and the right subtree.
  2. Find the root nodes of the left subtree and the right subtree in the same way, and attach them to their parent node.
  3. Repeat the two steps above for the left and right subtrees until a subtree has only 1 or 2 nodes or is empty.

From inorder + postorder (the postorder sequence determines the root node, and the inorder sequence determines the left and right subtrees)

        The procedure is the same as above.
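
The preorder + inorder reconstruction can be sketched recursively in Python. This example assumes that all node values are distinct (otherwise the split point in the inorder sequence is ambiguous) and represents a tree as nested tuples (data, left, right), with None for an empty tree; both choices are made only for illustration.

```python
def build_tree(preorder, inorder):
    """Rebuild a binary tree from its preorder and inorder sequences."""
    if not preorder:
        return None
    root = preorder[0]                 # the preorder sequence gives the root
    k = inorder.index(root)            # the inorder sequence splits left from right
    left = build_tree(preorder[1:k + 1], inorder[:k])
    right = build_tree(preorder[k + 1:], inorder[k + 1:])
    return (root, left, right)

def postorder(tree):
    """Postorder sequence of a tuple tree, used here to check the result."""
    if tree is None:
        return []
    data, left, right = tree
    return postorder(left) + postorder(right) + [data]

tree = build_tree(list("ABDECF"), list("DBEACF"))
print("".join(postorder(tree)))   # prints: DEBFCA
```

The slicing keeps the sketch short at the cost of extra copying; an index-based version would avoid that.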
3. Threaded binary trees
        A threaded binary tree is a binary tree with threads (pointers to a node's immediate predecessor or immediate successor in a traversal order).
Advantages:
        · In-order traversal needs no stack, so traversal is fast and storage space is saved.
        · From any node, its immediate predecessor and immediate successor in the traversal order can be found directly.
Disadvantages:
        · Inserting and deleting nodes is cumbersome and slow.
        · Threaded subtrees cannot be shared.
Threading: the process of traversing a binary tree in a given order and turning it into a threaded binary tree.
Kinds: preorder-threaded, inorder-threaded (the most commonly used), and postorder-threaded binary trees.
4. Conversions between trees, forests, and binary trees
Converting a general tree into a binary tree
        Make the eldest (leftmost) child of each node the left child of that node, and make each following sibling the right child of the sibling before it.

Method:

  1. Add links: connect every pair of adjacent siblings in the tree.
  2. Delete links: keep the link between each parent and its eldest child, and remove the links between the parent and its other children.
  3. Rotate: taking the root as the axis, rotate the whole tree clockwise by a suitable angle so that the levels are clear.

Converting a forest into a binary tree

        A forest is a collection of trees. If the roots of the trees in the forest are treated as siblings, and each tree can be represented by a binary tree, then the forest can also be represented by a binary tree.
Method:
(1) Convert each tree in the forest into its corresponding binary tree.
(2) Leave the first binary tree unchanged. Starting from the second binary tree, attach the root of each binary tree in turn as the right child of the root of the preceding binary tree, until the last binary tree has been attached.

Converting a binary tree back into a tree or a forest

        After a tree is converted into a binary tree, the root has no right subtree; after a forest is converted into a binary tree, the root has a right branch. The conversion is reversible: depending on whether the root of the binary tree has a right subtree, the binary tree can be restored to a tree or to a forest.
Method:
(1) If a node is the left child of its parent, connect that node's right child, the right child of that right child, and so on down the chain of right children, to the node's parent.
(2) Delete all links between parents and their right children in the original binary tree.
(3) Tidy up the result of (1) and (2) so that the levels are clear.

5. Huffman trees
Terms:

Path length (the number of branches on the path between two nodes), path length of a tree (the sum of the path lengths from the root to every node), weighted path length of a node (the path length from the root to the node multiplied by the node's weight), weighted path length of a tree (the sum of the weighted path lengths of all leaf nodes).
Note: in decision problems, a Huffman tree yields an optimal decision procedure.
Basic idea

(1) From the n given weights {W1, W2,...,Wn}, construct n binary trees that each consist of a single leaf node, giving a set of binary trees F={T1, T2,..., Tn}.
(2) In F, select the two binary trees whose roots have the smallest and second-smallest weights and make them the left and right subtrees of a new binary tree. The weight of the new root is the sum of the weights of the roots of its left and right subtrees.
(3) Delete those two binary trees from F and add the newly built binary tree to F.
(4) Repeat steps (2) and (3) until only one binary tree is left in F; that tree is the Huffman tree.

Huffman coding

        In data communication, the text to be transmitted often has to be converted into a binary code made up of the characters 0 and 1; this is called encoding. If character frequencies are taken into account, frequent characters can be given codes that are as short as possible and infrequent characters somewhat longer codes. The resulting variable-length code can make the encoded message shorter.
        Huffman coding is a coding scheme that minimizes the total encoded length of the message.
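
Steps (1) to (4) and the derivation of codes can be sketched with Python's `heapq` module, which makes it cheap to pick the two smallest weights. The function name `huffman_codes`, the tuple representation of trees, and the sample weights are assumptions made for this example; symbols are assumed to be strings. Ties between equal weights may be broken differently by other implementations, so individual codes can differ while the total length stays minimal.

```python
import heapq
from itertools import count

def huffman_codes(weights):
    """Return {symbol: code} for a {symbol: weight} mapping."""
    tiebreak = count()   # keeps heap entries comparable when weights are equal
    # Each heap entry is (weight, tiebreak, tree); a tree is a symbol or a (left, right) pair.
    heap = [(w, next(tiebreak), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # smallest root weight
        w2, _, t2 = heapq.heappop(heap)   # second-smallest root weight
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))

    codes = {}

    def assign(tree, prefix):
        if isinstance(tree, tuple):       # internal node: 0 to the left, 1 to the right
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # a lone symbol still needs one bit

    if heap:
        assign(heap[0][2], "")
    return codes

weights = {"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}
codes = huffman_codes(weights)
wpl = sum(weights[s] * len(codes[s]) for s in weights)
print(wpl)   # prints: 224
```

The printed value is the weighted path length of the tree, which is also the total number of bits needed to encode a message with these character frequencies.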

VIII. Graphs

Edge: an undirected direct connection between two vertices.
Arc: a directed direct connection (it runs from the arc tail to the arc head).
Terms:

Undirected graph, directed graph, undirected complete graph (every pair of vertices is joined by an edge; n(n-1)/2 edges), directed complete graph (every pair of vertices is joined by two arcs in opposite directions; n(n-1) arcs), dense graph, sparse graph, degree of a vertex (the number of incident edges; in a directed graph, in-degree and out-degree), weight, network (weighted graph: undirected network, directed network), path, path length (the number of edges on the path), cycle, simple path, simple cycle, subgraph, connected graph (every pair of vertices is connected), connected component (a maximal connected subgraph), strongly connected graph (a directed graph in which every vertex is reachable from every other vertex), strongly connected component (of a directed graph), weakly connected graph (a directed graph that is not connected when arc directions are respected but is connected when they are ignored), spanning tree (a subgraph of a connected graph that is a tree containing all vertices of the graph).
1. Storage
Adjacency matrix
A matrix that represents the adjacency relationships between vertices.
Adjacency list
A graph storage method that combines sequential storage and chained storage.

  • Vertex node structure: vertex data field, pointer field (pointing to the first adjacent edge)
  • Edge node structure: subscript field (the adjacent vertex), pointer field (pointing to the next adjacent edge)

Advantages: starting from a vertex, it is easy to find the arc nodes that have this vertex as their arc tail.
Disadvantages: finding the arc nodes that have this vertex as their arc head requires traversing the entire adjacency list.
Note: the advantages and disadvantages of the inverse adjacency list are the opposite of those of the adjacency list.
Cross-linked list (orthogonal list)

Advantages: starting from a vertex, it is easy to find both the arc nodes that have this vertex as their arc tail and those that have it as their arc head.
2. Traversal

  • Breadth-first search (BFS, similar to level-order traversal of a tree)
  • Depth-first search (DFS, similar to preorder traversal of a tree)
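
Both traversals can be sketched on an adjacency-list representation, here a Python dict that maps each vertex to the list of its neighbors. The sample graph and the function names are assumptions made for this example; the visiting order depends on the order in which neighbors are listed.

```python
from collections import deque

def bfs(graph, start):
    """Breadth-first search: visit vertices level by level using a queue."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return order

def dfs(graph, start, visited=None):
    """Depth-first search: go as deep as possible before backtracking."""
    if visited is None:
        visited = set()
    visited.add(start)
    order = [start]
    for w in graph[start]:
        if w not in visited:
            order.extend(dfs(graph, w, visited))
    return order

graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
print("".join(bfs(graph, "A")))   # prints: ABCDE
print("".join(dfs(graph, "A")))   # prints: ABDCE
```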

3. Connectivity
Minimum spanning tree
        Construction algorithms: Prim's algorithm, Kruskal's algorithm
4. Shortest path
Algorithm: Dijkstra's algorithm
5. Directed acyclic graph
(A directed graph without cycles)
Applications:
        · Topological sorting
        · Critical path
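
Topological sorting can be sketched with the in-degree based method (often called Kahn's algorithm), which is one standard way to do it and is chosen here as an assumption, since the text does not name a specific method. The graph is a dict mapping each vertex to the vertices its arcs point to, and every vertex must appear as a key.

```python
from collections import deque

def topological_sort(graph):
    """Return the vertices in an order where every arc goes from earlier to later."""
    indegree = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            indegree[w] += 1
    queue = deque(v for v in graph if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:            # "remove" v and its outgoing arcs
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    if len(order) != len(graph):      # some vertices never reached in-degree 0
        raise ValueError("graph has a cycle")
    return order

dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(topological_sort(dag))   # prints: ['a', 'b', 'c', 'd']
```

A cycle is detected as a side effect: if not all vertices are output, the graph is not a directed acyclic graph.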

IX. Searching

1. Static search tables
Sequential search

Basic idea:

        Scan the linear list from one end, comparing the given value with each key in turn. If a key equals the given value, the search succeeds and the element's position in the list is returned; if the whole list has been scanned without finding an equal key, the search fails and a failure indication is returned.
Time complexity: O(n), the order of the average search length.
Advantages: there are no requirements on how the data elements are stored in the list.
Disadvantages: when n is large, the average search length is large and efficiency is low. For a linear linked list, sequential search is the only option.

Binary search

Basic idea:

        In an ordered list, take the middle element as the comparison object. If the given value equals the key of the middle element, the search succeeds; if the given value is less than that key, continue searching in the half to the left of the middle element; if it is greater, continue in the half to the right. Repeat until the search succeeds or the search range contains no elements, in which case the search fails.
Time complexity: O(log2 n)
Advantages: high efficiency.
Disadvantages:
    · The list must be sorted by key, and sorting can itself be time-consuming.
    · It applies only to sequential storage, where insertion and deletion require moving many elements.
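
A minimal Python sketch of binary search on a list sorted in ascending order. Returning -1 on failure is a convention chosen for this example.

```python
def binary_search(table, key):
    """Return the index of key in the sorted list table, or -1 if it is absent."""
    low, high = 0, len(table) - 1
    while low <= high:                 # the search range is not empty
        mid = (low + high) // 2
        if table[mid] == key:
            return mid
        if key < table[mid]:
            high = mid - 1             # continue in the left half
        else:
            low = mid + 1              # continue in the right half
    return -1

data = [3, 8, 15, 21, 34, 55]
print(binary_search(data, 21))   # prints: 3
print(binary_search(data, 4))    # prints: -1
```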

Block search

Basic idea:

        Divide the main table of n elements into m blocks (also called sub-tables). The elements within a block may be unordered, but the blocks themselves must be ordered: every key in a block is smaller than every key in the next block. An index table is built with two fields per entry: a key field (the maximum key in the corresponding block) and a pointer field (the starting address of the corresponding block). The search proceeds as follows:
        (1) Search the key fields of the index table (binary search can be used) to determine the block that may contain the value kx being sought.
        (2) Starting from the address given by the index table, search sequentially within that block.
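
The two steps can be sketched as follows, using `bisect_left` from the standard library for the binary search over the index table. The layout of the index as a list of (maximum key, start position) pairs and the sample data are assumptions made for this example.

```python
from bisect import bisect_left

def block_search(table, index, key):
    """index: list of (max_key, start) pairs, one per block, ascending by max_key."""
    max_keys = [m for m, _ in index]
    b = bisect_left(max_keys, key)         # first block whose maximum key >= key
    if b == len(index):
        return -1                          # key is larger than every key in the table
    start = index[b][1]
    end = index[b + 1][1] if b + 1 < len(index) else len(table)
    for pos in range(start, end):          # sequential search inside the block
        if table[pos] == key:
            return pos
    return -1

# Three blocks: unordered inside, ordered between blocks.
table = [8, 3, 12, 20, 15, 18, 30, 25, 27]
index = [(12, 0), (20, 3), (30, 6)]
print(block_search(table, index, 15))   # prints: 4
print(block_search(table, index, 26))   # prints: -1
```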

2. Dynamic search tables

Binary sort tree (binary search tree)
        A binary sort tree is either an empty tree or a binary tree with the following properties:
        (1) If the left subtree is not empty, the values of all nodes in the left subtree are less than the value of the root node.
        (2) If the right subtree is not empty, the values of all nodes in the right subtree are greater than the value of the root node.
        (3) The left and right subtrees are themselves binary sort trees.
Basic idea:

  1. If the tree being searched is empty, the search fails.
  2. If the tree is not empty, compare the given value with the key of its root node.
  3. If they are equal, the search succeeds and ends; otherwise:

        · If the given value is less than the key of the root node, continue the search in the left subtree (go to step 1).
        · If the given value is greater than the key of the root node, continue the search in the right subtree (go to step 1).
Average search length in the worst case (the tree degenerates into a single chain): (n+1)/2, that is, O(n).
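
Insertion and search in a binary sort tree can be sketched as below, together with a small demonstration of the degenerate worst case. The class and function names are made up for this example, and ignoring duplicate keys is an assumption.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root

def bst_search(root, key):
    """Return the node holding key, or None if the search fails."""
    node = root
    while node is not None:
        if key == node.key:
            return node
        node = node.left if key < node.key else node.right
    return None

def height(root):
    return 0 if root is None else 1 + max(height(root.left), height(root.right))

root = None
for k in [50, 30, 70, 20, 40]:
    root = bst_insert(root, k)
print(bst_search(root, 40) is not None)   # prints: True
print(height(root))                       # prints: 3

chain = None
for k in [1, 2, 3, 4, 5]:                 # sorted input degenerates into a chain
    chain = bst_insert(chain, k)
print(height(chain))                      # prints: 5
```

Inserting keys in sorted order produces a tree whose height equals the number of nodes, which is the situation behind the worst-case figure above.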
Balanced binary tree
        A balanced binary tree is a binary tree in which the heights of the left and right subtrees of every node are approximately equal. There are many kinds of balanced binary trees; the best known is the AVL tree. A balanced binary tree is either an empty tree or a binary sort tree with the following properties:
        (1) The absolute value of the difference between the heights of its left and right subtrees (the difference is called the balance factor) does not exceed 1.
        (2) Its left subtree and right subtree are both balanced binary trees.
Restoring balance after an insertion depends on the shape of the unbalanced subtree:

  1. LL type: fixed by a single right rotation.
  2. LR type: fixed by a left rotation followed by a right rotation.
  3. RR type: fixed by a single left rotation.
  4. RL type: fixed by a right rotation followed by a left rotation.

3. Hash tables

        Hash search (hashing) is both a search method and a storage method (hash storage). The in-memory form of hash storage is called a hash table.
        Hash search differs from the methods described earlier. In those methods there is no fixed relationship between an element's storage location and its key, so a series of key comparisons is needed. Hash search instead computes the element's position directly from its key: a correspondence is established between keys and storage positions, and through it the position of an element can be obtained quickly from its key.
How to construct a hash function

1) Direct addressing method

2) Division-remainder method

Methods for handling collisions
1) Open addressing

  • Linear probing
  • Quadratic probing (square probing)

2) Separate chaining (chain address method)
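
The division-remainder method and two of the collision strategies can be compared with a short Python sketch. The table size `M = 7`, the sample keys, and the function names are assumptions chosen so that every key collides; deletion is not covered.

```python
M = 7   # table size, chosen for illustration

def h(key):
    return key % M   # division-remainder method

def insert_linear_probing(table, key):
    """Open addressing: try h(key), h(key)+1, ... (mod M) until a slot is free."""
    for step in range(M):
        pos = (h(key) + step) % M
        if table[pos] is None or table[pos] == key:
            table[pos] = key
            return pos
    raise OverflowError("hash table is full")

def insert_chaining(table, key):
    """Chain address method: each slot holds a list of the keys that hash to it."""
    if key not in table[h(key)]:
        table[h(key)].append(key)

open_table = [None] * M
chained = [[] for _ in range(M)]
for key in [10, 3, 17, 24]:      # all four keys hash to slot 3
    insert_linear_probing(open_table, key)
    insert_chaining(chained, key)

print(open_table)   # prints: [None, None, None, 10, 3, 17, 24]
print(chained[3])   # prints: [10, 3, 17, 24]
```

With linear probing the colliding keys spill into the following slots, while with chaining they all stay in the list at slot 3.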

X. Sorting

Comparison of ten common sorting algorithms

Algorithm        Average      Best           Worst          Space      Placement   Stability
Bubble sort      O(n^2)       O(n)           O(n^2)         O(1)       in-place    stable
Selection sort   O(n^2)       O(n^2)         O(n^2)         O(1)       in-place    unstable
Insertion sort   O(n^2)       O(n)           O(n^2)         O(1)       in-place    stable
Shell sort       O(n log n)   O(n log^2 n)   O(n log^2 n)   O(1)       in-place    unstable
Merge sort       O(n log n)   O(n log n)     O(n log n)     O(n)       out-place   stable
Quick sort       O(n log n)   O(n log n)     O(n^2)         O(log n)   in-place    unstable
Heap sort        O(n log n)   O(n log n)     O(n log n)     O(1)       in-place    unstable
Counting sort    O(n+k)       O(n+k)         O(n+k)         O(k)       out-place   stable
Bucket sort      O(n+k)       O(n+k)         O(n^2)         O(n+k)     out-place   stable
Radix sort       O(n×k)       O(n×k)         O(n×k)         O(n+k)     out-place   stable

The first three complexity columns are time complexity; "Space" is space complexity. The Shell sort figures depend on the gap sequence used.


Origin blog.csdn.net/qq_41750911/article/details/125041841