[Soft Exam - Notes on Essential Knowledge Points for Software Designers] Chapter 3 Data Structure

Preface

The styling of these notes was lost when they were copied to CSDN, and I don't have the energy to check and restore all of it. Readers with CSDN points can download the Word, PDF, and Youdao Cloud Notes versions.
The downloaded content is the same as the content shared in this article; only the styling differs (for example, key points to memorize and frequently tested content are marked with colors, font sizes, and weights, the directory structure is more complete, and tables are real tables rather than images).
Download address of this chapter:
https://download.csdn.net/download/chengsw1993/85505572

If you notice anything that reads oddly or displays abnormally, please let me know in the comment area so that I can fix it. Such problems are most likely caused by CSDN's markdown rendering.

Series of articles

Previous article: [Soft Exam - Notes on Essential Knowledge Points for Software Designers] Chapter 2 Basic Knowledge of Programming Languages

Next article: [Soft Exam - Notes on Essential Knowledge Points for Software Designers] Chapter 4 Operating System Knowledge

Data structure

Linear structure

Linear structure: Each element has at most one predecessor (in-degree) and at most one successor (out-degree), so the elements form a line. Linear lists are divided into sequential lists and linked lists according to how they are stored.

Storage structure:
Sequential storage: A set of storage units with consecutive addresses stores the data elements of the linear list in order, so that logically adjacent elements are also physically adjacent.
Linked storage: The nodes that store the data elements are not required to have consecutive addresses. The data elements are logically adjacent but not necessarily physically adjacent.

Linear list

Performance comparison between sequential storage and linked storage:
[Figure: comparison table; image not available]

Singly linked list

[Figure: singly linked list; image not available]

To insert the node pointed to by s after the node pointed to by p in the figure above, the operation is:
s->next = p->next;   /* s takes over p's old successor */
p->next = s;         /* p now points to s */
Similarly, to delete q, the successor of the node pointed to by p in a singly linked list, the operation is:
q = p->next;         /* q is the node to delete */
p->next = q->next;   /* bypass q */
free(q);
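The two pointer operations above can be wrapped into small functions. This is only a minimal sketch: the type name Node and the helpers node_new, insert_after, and delete_after are illustrative names, not something defined in these notes.

```c
#include <assert.h>
#include <stdlib.h>

/* Node of a singly linked list; the field names are illustrative. */
typedef struct Node {
    int data;
    struct Node *next;
} Node;

/* Allocate a node holding value; returns NULL if allocation fails. */
Node *node_new(int value) {
    Node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->data = value;
        n->next = NULL;
    }
    return n;
}

/* Insert node s directly after node p. */
void insert_after(Node *p, Node *s) {
    assert(p != NULL && s != NULL);
    s->next = p->next;   /* s takes over p's old successor first */
    p->next = s;         /* only then does p point at s */
}

/* Unlink and free the successor of p, if there is one. */
void delete_after(Node *p) {
    assert(p != NULL);
    Node *q = p->next;
    if (q == NULL) {
        return;          /* p is the last node: nothing to delete */
    }
    p->next = q->next;   /* bypass q */
    free(q);
}
```

The order of the two assignments in insert_after matters: if p->next were overwritten first, the rest of the list would be lost.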

Stacks and queues

[Figure: stacks and queues; image not available]

In a circular queue, the head pointer points to the first element and the tail pointer points to the position just after the last element. With this convention head=tail when the queue is empty, but head=tail would also hold when the queue is completely full, so the two states could not be told apart. The usual fix is to store one element fewer than the array can hold, so that the full condition becomes tail+1=head. Because the queue is circular, the result must be taken modulo the array size, giving (tail+1)%size=head; both conditions are shown on the right side of the figure above. The length of the circular queue is (Q.tail-Q.head+size)%size, where adding size keeps the result non-negative after tail has wrapped around.

Priority queue: Each element is assigned a priority, and the element with the highest priority is removed first. A heap is used for storage, because the removal order is determined by priority rather than by the order in which elements were put into the queue.

String

A string is a special linear list whose data elements are characters.

Empty string: A string of length 0 without any characters.

Blank string: A string consisting of one or more spaces. A space is a whitespace character, and each space counts as one character of length.

Substring: Any sequence of consecutive characters within a string is called a substring. The string that contains the substring is called the main string, and the empty string is a substring of every string.

String pattern matching: the substring positioning operation, that is, an algorithm that finds the position of the first occurrence of a substring (the pattern string) in the main string.

Basic pattern matching algorithm: Also known as the brute-force algorithm. Its basic idea is to compare the first character of the main string with the first character of the pattern string. If they are equal, the following characters are compared one by one; otherwise, comparison restarts from the second character of the main string against the first character of the pattern string. This continues until every character of the pattern string equals a run of consecutive characters in the main string, in which case the match succeeds; otherwise the match fails.
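A minimal sketch of the basic (brute-force) matching just described, using 0-based positions. The function name bf_find is an illustrative choice, and returning -1 for a failed match is a convention assumed here.

```c
#include <assert.h>
#include <string.h>

/* Return the 0-based index of the first occurrence of pattern in text,
 * or -1 if the match fails. An empty pattern matches at index 0. */
int bf_find(const char *text, const char *pattern) {
    assert(text != NULL && pattern != NULL);
    size_t n = strlen(text);
    size_t m = strlen(pattern);
    if (m > n) {
        return -1;
    }
    /* Try every starting position in the main string in turn. */
    for (size_t start = 0; start + m <= n; start++) {
        size_t j = 0;
        while (j < m && text[start + j] == pattern[j]) {
            j++;
        }
        if (j == m) {
            return (int)start;   /* every pattern character matched */
        }
        /* Mismatch: the main-string position backs up to start + 1. */
    }
    return -1;
}
```

In the worst case each of the n starting positions costs up to m comparisons, which is what KMP avoids.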

KMP algorithm: An improvement on the basic pattern matching algorithm. Whenever the compared characters differ during matching, the position pointer of the main string does not need to back up; instead, the "partial match" result already obtained is used to "slide" the pattern string as far to the right as possible before the comparison continues.
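The sliding idea can be sketched as follows. This is one common formulation, assumed here rather than taken from the notes: it precomputes, for each prefix of the pattern, the length of the longest proper prefix that is also a suffix (the array called fail below). The names build_fail and kmp_find are illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* fail[j] = length of the longest proper prefix of p[0..j]
 * that is also a suffix of p[0..j]. */
static void build_fail(const char *p, size_t m, size_t *fail) {
    size_t k = 0;
    fail[0] = 0;
    for (size_t j = 1; j < m; j++) {
        while (k > 0 && p[j] != p[k]) {
            k = fail[k - 1];
        }
        if (p[j] == p[k]) {
            k++;
        }
        fail[j] = k;
    }
}

/* Return the 0-based index of the first match, or -1 if there is none
 * (or if the work array cannot be allocated). */
int kmp_find(const char *text, const char *pattern) {
    assert(text != NULL && pattern != NULL);
    size_t n = strlen(text);
    size_t m = strlen(pattern);
    if (m == 0) {
        return 0;
    }
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL) {
        return -1;
    }
    build_fail(pattern, m, fail);

    int result = -1;
    size_t k = 0;                        /* pattern characters matched so far */
    for (size_t i = 0; i < n; i++) {     /* i never moves backwards */
        while (k > 0 && text[i] != pattern[k]) {
            k = fail[k - 1];             /* slide the pattern to the right */
        }
        if (text[i] == pattern[k]) {
            k++;
        }
        if (k == m) {
            result = (int)(i + 1 - m);
            break;
        }
    }
    free(fail);
    return result;
}
```

The main-string index i only ever increases, which is the difference from the basic algorithm; the pattern index k is what falls back on a mismatch.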

Arrays, matrices, and generalized lists

Matrix

Special matrix: The distribution of the elements (or of the non-zero elements) in the matrix follows a regular pattern. Common special matrices include symmetric matrices, triangular matrices, and diagonal matrices.

Sparse matrix: A matrix in which the number of non-zero elements is much smaller than the number of zero elements and the non-zero elements are distributed irregularly. It is stored as a list of triples, that is, the (row, column, value) of each non-zero element is stored.
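A small sketch of the triple representation. The matrix dimensions ROWS and COLS, the type Triple, and both function names are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 3
#define COLS 4

/* One non-zero element: its position and its value. */
typedef struct {
    int row;
    int col;
    int value;
} Triple;

/* Store the non-zero elements of m in out, in row-major order, and
 * return how many were stored. out needs room for ROWS * COLS triples
 * in the worst case. */
size_t to_triples(int m[ROWS][COLS], Triple *out) {
    assert(out != NULL);
    size_t count = 0;
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            if (m[r][c] != 0) {
                out[count].row = r;
                out[count].col = c;
                out[count].value = m[r][c];
                count++;
            }
        }
    }
    return count;
}

/* Look up element (row, col); positions not in the list are zero. */
int triple_get(const Triple *t, size_t count, int row, int col) {
    for (size_t i = 0; i < count; i++) {
        if (t[i].row == row && t[i].col == col) {
            return t[i].value;
        }
    }
    return 0;
}
```

A 3x4 matrix with three non-zero elements needs only three triples instead of twelve cells; the saving grows with the size of the matrix.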

Generalized list

A generalized list is a generalization of a linear list and is a finite sequence composed of 0 or more single elements or sublists.

The difference between a generalized list and a linear list: The elements of a linear list are structurally indivisible single elements, while the elements of a generalized list can be either single elements or lists with their own structure.

A generalized list is generally written as: LS=(α1, α2,…, αn)

Here LS is the name of the list and αi is an element of the list, which can be a list (called a sublist) or a single data element (called an atom). n is the length of the generalized list, that is, the number of elements in the outermost layer; the generalized list with n=0 is the empty list. The depth of a generalized list is the number of levels of nesting in its recursive definition, that is, the maximum nesting level of the brackets in the definition (counting only one side of each pair); the depth of an atom is 0, and the depth of an empty list is 1.

head() and tail(): head() takes the head of the list, which is the first element of a non-empty generalized list and can be a sublist or a single element. tail() takes the tail of the list, which is the list made up of all elements other than the first. The tail of a non-empty generalized list is always a list, even when only a single element remains. For example, for LS=(a,(b,c)), head(LS)=a and tail(LS)=((b,c)).

Trees and Binary Trees

The tree structure is a non-linear structure in which each node has at most one predecessor and may have multiple successors.

A tree is a finite set of n nodes (n>=0). A tree with n=0 is called an empty tree. Any non-empty tree has exactly one root node.

Binary tree

A node can have at most two children.

Full binary tree: every level is completely filled with nodes.

Complete binary tree: for a tree of depth n, levels 1 to n-1 are completely filled, and the nodes on level n are packed from left to right with no gaps.

Binary tree storage structure

Sequential storage uses a set of contiguous storage units to store the nodes of the binary tree, level by level from top to bottom and from left to right within each level.

For a complete binary tree of depth k, every level except the k-th has twice as many nodes as the level above it. The numbers of a node's parent, left child, and right child can therefore be derived from the node's own number. For a node numbered i in a tree of n nodes (numbering starts at 1):

If i=1, the node is the root node and has no parent.
If i>1, the node's parent is numbered i/2 (rounded down).
If 2i<=n, the node's left child is numbered 2i; otherwise it has no left child.
If 2i+1<=n, the node's right child is numbered 2i+1; otherwise it has no right child.
If i is odd and not 1, the node's left sibling is numbered i-1; otherwise it has no left sibling.
If i is even and less than n, the node's right sibling is numbered i+1; otherwise it has no right sibling.
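These numbering rules translate directly into integer arithmetic. In this sketch nodes are numbered from 1, the parent of node i is taken as i/2 rounded down (so nodes 2 and 3 both have parent 1), and a return value of 0 means "no such node"; the function names are illustrative.

```c
#include <assert.h>

/* All functions use 1-based numbering in a complete binary tree of
 * n nodes and return 0 when the requested node does not exist. */

int cbt_parent(int i) {
    assert(i >= 1);
    return i > 1 ? i / 2 : 0;          /* integer division rounds down */
}

int cbt_left_child(int i, int n) {
    assert(i >= 1);
    return 2 * i <= n ? 2 * i : 0;
}

int cbt_right_child(int i, int n) {
    assert(i >= 1);
    return 2 * i + 1 <= n ? 2 * i + 1 : 0;
}

int cbt_left_sibling(int i) {
    assert(i >= 1);
    return (i > 1 && i % 2 == 1) ? i - 1 : 0;   /* odd and not the root */
}

int cbt_right_sibling(int i, int n) {
    assert(i >= 1);
    return (i % 2 == 0 && i < n) ? i + 1 : 0;   /* even and not the last */
}
```

Because of these rules a complete binary tree can be stored in a plain array with no pointer fields at all.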

Linked storage structure of a binary tree: A binary linked list is generally used to store the nodes. In addition to the node's own data, each list node stores a pointer to the left child and a pointer to the right child, that is, one data field plus two pointers. Each list node stores one node of the binary tree, and the head pointer points to the root node.

Binary tree traversal

A non-empty binary tree consists of three parts: the root node, the left subtree, and the right subtree. Traversing these three parts traverses the whole binary tree. The left subtree is always traversed before the right subtree, but the position of the root in the order varies. This gives three traversal methods, named after when the root is visited:

Preorder traversal: root, left, right.
Inorder traversal: left, root, right.
Postorder traversal: left, right, root.

There is also level-order traversal:

Level-order traversal: level by level, from top to bottom and from left to right.

Example:
Level-order: 12345678 Preorder: 12457836 Inorder: 42785136 Postorder: 48752631
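The four traversal orders can be sketched as below. The tree in the example can be recovered from the preorder and inorder sequences listed: 1 has children 2 and 3; 2 has children 4 and 5; 5 has left child 7; 7 has right child 8; 3 has right child 6. The type TreeNode, the constant MAX_NODES, and the function names are assumptions made for this sketch. Each function appends node labels to a character buffer and returns the position just past the last label written.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64   /* capacity of the queue used by levelorder */

typedef struct TreeNode {
    char data;
    struct TreeNode *left;
    struct TreeNode *right;
} TreeNode;

/* Root, left, right. */
char *preorder(const TreeNode *t, char *out) {
    if (t == NULL) return out;
    *out++ = t->data;
    out = preorder(t->left, out);
    return preorder(t->right, out);
}

/* Left, root, right. */
char *inorder(const TreeNode *t, char *out) {
    if (t == NULL) return out;
    out = inorder(t->left, out);
    *out++ = t->data;
    return inorder(t->right, out);
}

/* Left, right, root. */
char *postorder(const TreeNode *t, char *out) {
    if (t == NULL) return out;
    out = postorder(t->left, out);
    out = postorder(t->right, out);
    *out++ = t->data;
    return out;
}

/* Level by level, left to right, using an array as a simple queue. */
char *levelorder(const TreeNode *root, char *out) {
    const TreeNode *queue[MAX_NODES];
    size_t head = 0, tail = 0;
    if (root != NULL) queue[tail++] = root;
    while (head < tail) {
        const TreeNode *t = queue[head++];
        *out++ = t->data;
        if (t->left != NULL) {
            assert(tail < MAX_NODES);
            queue[tail++] = t->left;
        }
        if (t->right != NULL) {
            assert(tail < MAX_NODES);
            queue[tail++] = t->right;
        }
    }
    return out;
}
```

The caller supplies a buffer large enough for all labels and can terminate the string by writing '\0' at the returned position.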

Threaded binary tree

The threaded binary tree is introduced to record each node's predecessor and successor in a traversal of the binary tree. The linked storage of a binary tree gives direct access only to a node's left and right children, not to its predecessor and successor in a traversal. Two extra pointer fields could be added to each node to point to the predecessor and successor, but this wastes storage space. Consider the following approach instead:

If a binary tree with n nodes is stored as a binary linked list, there are exactly n+1 null pointer fields. These null pointer fields can be used to store the node's predecessor and successor information. To do so, two flags must be added to each node to indicate whether each pointer field holds a child node or a traversal neighbor, giving the following node layout:

ltag | lchild | data | rchild | rtag

A binary linked list that uses this node structure is called a threaded linked list. The pointers to predecessor and successor nodes are called threads, and a binary tree with threads added is called a threaded binary tree.

Optimal binary tree

The optimal binary tree, also called a Huffman tree, is the binary tree with the smallest weighted path length. The related concepts are as follows:

Path: The sequence of branches leading from one node of the tree to another.

Path length: the number of branches on the path.

The path length of the tree: the sum of the path lengths from the root node to each leaf node.

Weight: The value represented by the node.

The weighted path length of a node: the length of the path from the node to the root node multiplied by the weight of the node.

The weighted path length of the tree (the cost of the tree): the sum of the weighted path lengths of all leaf nodes of the tree.

How to construct a Huffman tree: Given a set of weights, take the two smallest weights as leaf nodes and make their sum the parent node, forming a binary tree. Then remove those two weights from the set and add the parent's weight to the set. Repeat these steps until all values have been used. By convention, the smaller node goes on the left and the larger on the right, and left branches are labeled 0 and right branches 1.

In the finished Huffman tree, all the initially given weights are leaf nodes. Computing the weighted path length of each leaf node and adding them up gives the weighted path length of the tree, and this length is the minimum possible.
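The merge procedure can be sketched without building the tree explicitly. The sketch relies on a standard identity: the weighted path length of the tree equals the sum of the weights of all parent nodes created during the merging, because each leaf weight is counted once for every ancestor it has. The function name huffman_wpl is illustrative, and a simple linear scan is used to find the two smallest weights instead of a heap.

```c
#include <assert.h>
#include <stddef.h>

/* Return the weighted path length of the Huffman tree built from the
 * n weights in w. The array w is used as working storage and is
 * overwritten. Zero or one weight gives a path length of 0. */
long huffman_wpl(long *w, size_t n) {
    assert(w != NULL || n == 0);
    long total = 0;
    while (n > 1) {
        /* Find the smallest weight (index a) and second smallest (index b). */
        size_t a = 0, b = 1;
        if (w[b] < w[a]) {
            a = 1;
            b = 0;
        }
        for (size_t i = 2; i < n; i++) {
            if (w[i] < w[a]) {
                b = a;
                a = i;
            } else if (w[i] < w[b]) {
                b = i;
            }
        }
        long merged = w[a] + w[b];      /* weight of the new parent node */
        total += merged;

        /* Replace the two weights by their sum and shrink the set by one. */
        size_t lo = a < b ? a : b;
        size_t hi = a < b ? b : a;
        w[lo] = merged;
        w[hi] = w[n - 1];
        n--;
    }
    return total;
}
```

For the weights 1, 2, 3, 4 the merges produce parents 3, 6, and 10, so the result is 19, which matches 4*1 + 3*2 + 1*3 + 2*3 computed leaf by leaf.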

Huffman coding: Once the Huffman tree is built, the sequence of branch labels on the path from the root node to a leaf node is the code of that leaf node.

Trees and forests

Forest: Two or more trees together are called a forest.

Tree storage structure [understand]

Parent representation: A set of consecutive storage units stores the nodes of the tree, and each node also records the array subscript of its parent node.

Child representation: Each child of a node is indicated by a pointer in the storage structure, and a linked list is created for the children of each node in the tree.

Child-sibling representation: Also known as binary linked list representation. Each storage node has two pointer fields, pointing to the node's first child and to its next sibling respectively.

Traversing trees and forests [understand]

Since each node in the tree may have multiple subtrees, there are two ways to traverse the tree:

Root-first traversal: visit the root node first, and then traverse each subtree of the root in order.

Root-last traversal: traverse each subtree of the root in order first, and then visit the root node.

A forest contains several trees, and forest traversal is likewise divided into two types. As with tree traversal, each tree in the forest is traversed in turn, either root-first or root-last.

Conversion between tree and binary tree
The rule is: for each node, its leftmost child in the tree becomes its left child in the binary tree, and its next sibling in the tree becomes its right child in the binary tree.

An example is shown below. Using the connection method, link each leftmost child to its siblings in sequence, and remove the original links between the parent and all of its children other than the leftmost. This method is the simplest and must be mastered.
[Figure: converting a tree to a binary tree; image not available]

Binary search tree

A binary search (sort) tree stores a value in each node, such that every value in a node's left subtree is less than the node's value and every value in its right subtree is greater. It is a binary tree with a regular arrangement, which makes operations such as search and insertion convenient. An inorder traversal yields the values in ascending order.

The search efficiency of a binary search tree depends on its depth. Among binary search trees with the same number of nodes, a balanced binary tree has the smallest depth, while a single-branch tree has the largest depth and therefore the worst efficiency.
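A minimal sketch of insertion, search, and inorder output for a binary search tree. The type BstNode and the function names are illustrative, and ignoring duplicate keys is a choice made here, not something the notes specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct BstNode {
    int key;
    struct BstNode *left;
    struct BstNode *right;
} BstNode;

/* Insert key and return the (possibly new) root. Duplicates are ignored.
 * If allocation fails the tree is left unchanged. */
BstNode *bst_insert(BstNode *root, int key) {
    if (root == NULL) {
        BstNode *n = malloc(sizeof *n);
        if (n != NULL) {
            n->key = key;
            n->left = NULL;
            n->right = NULL;
        }
        return n;
    }
    if (key < root->key) {
        root->left = bst_insert(root->left, key);     /* smaller keys go left */
    } else if (key > root->key) {
        root->right = bst_insert(root->right, key);   /* larger keys go right */
    }
    return root;
}

/* Each comparison discards one subtree, so the cost is bounded by the depth. */
bool bst_contains(const BstNode *root, int key) {
    while (root != NULL) {
        if (key == root->key) return true;
        root = key < root->key ? root->left : root->right;
    }
    return false;
}

/* Write the keys in ascending order into out starting at index count;
 * returns the new count. */
size_t bst_inorder(const BstNode *t, int *out, size_t count) {
    assert(out != NULL);
    if (t == NULL) return count;
    count = bst_inorder(t->left, out, count);
    out[count++] = t->key;
    return bst_inorder(t->right, out, count);
}

void bst_free(BstNode *t) {
    if (t == NULL) return;
    bst_free(t->left);
    bst_free(t->right);
    free(t);
}
```

Inserting keys that are already sorted produces the single-branch tree mentioned above, where every search degrades to a linear scan.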

Balanced binary tree

As mentioned earlier, a binary search (sort) tree is characterized by all values in the left subtree being less than the root node's value and all values in the right subtree being greater. The same set of values can form many different binary search trees, so the result is not unique. The balanced binary tree is therefore introduced: on top of the binary search tree property, it requires that the balance factor of every node be 0, 1, or -1.

The depth of a node's left (or right) subtree is the number of levels in that subtree. Subtracting the depth of the left subtree from the depth of the right subtree gives the node's balance factor. A balanced binary tree is therefore one in which, for every node, the depths of the left and right subtrees differ by at most 1.

Example (second half of 2013, question 59): The preorder traversal sequence of a certain binary tree is cabfedg, and the inorder traversal sequence is abcdefg. The binary tree is a (C).
A. Complete binary tree B. Optimal binary tree C. Balanced binary tree D. Full binary tree

Analysis: The binary tree can be reconstructed from the preorder and inorder sequences. The preorder sequence identifies the root of each subtree, and the inorder sequence then splits the remaining nodes into the root's left subtree and right subtree. The resulting tree is clearly neither complete nor full, which rules out A and D; it carries no weights, so optimality does not apply. It can only be a balanced binary tree, because for every node the depths of the left and right subtrees differ by at most 1.
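The reconstruction and the balance check in this analysis can be sketched as follows. The names BNode, rebuild, balanced_height, is_balanced, and tree_free are illustrative, and the code assumes that all node labels are distinct single characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct BNode {
    char data;
    struct BNode *left;
    struct BNode *right;
} BNode;

/* Rebuild the tree from preorder and inorder sequences of length n.
 * Assumes distinct labels. Returns NULL for n == 0 or on allocation failure. */
BNode *rebuild(const char *pre, const char *in, size_t n) {
    if (n == 0) return NULL;
    BNode *root = malloc(sizeof *root);
    if (root == NULL) return NULL;
    root->data = pre[0];                       /* preorder gives the root */
    const char *pos = memchr(in, pre[0], n);   /* inorder splits left/right */
    assert(pos != NULL);
    size_t k = (size_t)(pos - in);             /* size of the left subtree */
    root->left = rebuild(pre + 1, in, k);
    root->right = rebuild(pre + 1 + k, in + k + 1, n - k - 1);
    return root;
}

/* Return the depth of t, or -1 if some node's subtree depths differ
 * by more than 1. */
int balanced_height(const BNode *t) {
    if (t == NULL) return 0;
    int l = balanced_height(t->left);
    int r = balanced_height(t->right);
    if (l < 0 || r < 0 || abs(l - r) > 1) return -1;
    return 1 + (l > r ? l : r);
}

bool is_balanced(const BNode *t) {
    return balanced_height(t) >= 0;
}

void tree_free(BNode *t) {
    if (t == NULL) return;
    tree_free(t->left);
    tree_free(t->right);
    free(t);
}
```

For the sequences in the question, the rebuilt tree has root c, with a (right child b) on the left and f (children e and g, with d under e) on the right, and the check reports it as balanced.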

Graph

The graph is also a non-linear structure, and any two nodes in a graph may be directly related. The relevant definitions are as follows:

Undirected graph: The connecting lines between the nodes of the graph have no arrows and no direction.

Directed graph: The connecting lines between the nodes of the graph are arrows, so the line from A to B and the line from B to A are two distinct lines.

Complete graph: In an undirected complete graph, every pair of nodes is connected. The number of connections for n nodes is (n-1)+(n-2)+…+1=n*(n-1)/2. In a directed complete graph, every pair of nodes is connected by two arrows, one in each direction, and the number of connections for n nodes is n*(n-1).

Degree, out-degree, and in-degree: The degree of a vertex is the number of edges associated with that vertex. In a directed graph, the degree of a vertex is the sum of out-degree and in-degree. Out-degree is the number of directed edges (pointing out) starting from the vertex. Indegree is the number of directed edges (pointing to itself) that end at this vertex.

Path: There is a path that can reach from one vertex to another vertex. The path of a directed graph also has a direction.

Connected graphs and connected components: for undirected graphs. If there is a path from vertex v to vertex u, it means that v and u are connected. If any two vertices in an undirected graph are connected, it is called a connected graph. The maximal connected subgraph of an undirected graph G is called its connected component.

Strongly connected graphs and strongly connected components: for directed graphs. If there are paths between any two vertices of a directed graph, that is, there are paths from v to u and from u to v, it is called a strongly connected graph. The maximal strongly connected subgraph of a directed graph is called a strongly connected component.

Network: A graph with edge weights is called a network.

Note: A weighted graph refers to a graph with a weight on each edge, and is often used to represent the cost of a path.

Graph storage

Adjacency matrix: If a graph has n nodes, an n-order matrix R stores the relationships between the nodes. The rule is that if there is a connection from node i to node j, then R[i][j]=1; otherwise R[i][j]=0. The matrix of an undirected graph is therefore always symmetric about the diagonal, so only the upper or lower triangle needs to be stored, whereas the matrix of a directed graph is not necessarily symmetric. An example is shown below:
[Figure: adjacency matrix example; image not available]

Adjacency linked list: Two data structures are used. First, a one-dimensional array stores all the vertices of the graph. Then, for each vertex in the array, a linked list holds the number and weight of every node connected to it. An example is shown in the figure below:
[Figure: adjacency linked list example; image not available]
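A sketch of the adjacency linked list: an array with one entry per vertex, each entry heading a linked list of arcs. The names AdjList, ArcNode, MAX_V, and the adj_ functions are assumptions for illustration; new arcs are added at the front of the list, which is only one possible choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_V 16   /* maximum number of vertices in this sketch */

/* One entry in a vertex's list: the neighbor's number and the edge weight. */
typedef struct ArcNode {
    int to;
    int weight;
    struct ArcNode *next;
} ArcNode;

typedef struct {
    int n;                    /* number of vertices in use */
    ArcNode *first[MAX_V];    /* first[v] heads the list of arcs leaving v */
} AdjList;

void adj_init(AdjList *g, int n) {
    assert(g != NULL && n >= 0 && n <= MAX_V);
    g->n = n;
    for (int v = 0; v < n; v++) g->first[v] = NULL;
}

/* Add a directed arc from -> to; returns false if allocation fails. */
bool adj_add_arc(AdjList *g, int from, int to, int weight) {
    assert(from >= 0 && from < g->n && to >= 0 && to < g->n);
    ArcNode *a = malloc(sizeof *a);
    if (a == NULL) return false;
    a->to = to;
    a->weight = weight;
    a->next = g->first[from];   /* insert at the front of the list */
    g->first[from] = a;
    return true;
}

/* An undirected edge is stored as two arcs, one in each direction. */
bool adj_add_edge(AdjList *g, int u, int v, int weight) {
    return adj_add_arc(g, u, v, weight) && adj_add_arc(g, v, u, weight);
}

/* The out-degree of v is the length of its list. */
int adj_out_degree(const AdjList *g, int v) {
    int count = 0;
    for (const ArcNode *a = g->first[v]; a != NULL; a = a->next) count++;
    return count;
}

void adj_free(AdjList *g) {
    for (int v = 0; v < g->n; v++) {
        ArcNode *a = g->first[v];
        while (a != NULL) {
            ArcNode *next = a->next;
            free(a);
            a = next;
        }
        g->first[v] = NULL;
    }
}
```

The number of list nodes grows with the number of edges, while an adjacency matrix always takes n*n cells regardless of how many edges exist.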

Example question:
The basic storage structures of graphs are the adjacency matrix representation and the adjacency linked list representation. The number of vertices in the graph determines the order of the adjacency matrix and the number of singly linked lists in the adjacency list. For both directed and undirected graphs, the number of edges determines the number of nodes in the singly linked lists but does not affect the size of the adjacency matrix, so complete graphs are well suited to adjacency matrix storage.

Graph traversal

Graph traversal means starting from any node in the graph and visiting every node in the graph exactly once along some search path. There are two methods:

Depth-first traversal: Start from any vertex and follow one path as deep as possible; when no unvisited neighbor remains, back up and continue from the most recent vertex that still has one. If unvisited nodes remain afterwards, pick any of them as a new start and repeat until the entire graph has been traversed.

Breadth-first traversal: First visit all the vertices adjacent to a vertex, and then visit all the vertices adjacent to those vertices in turn, similar to level-order traversal.
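Both traversals can be sketched on an adjacency matrix. The vertex count N_VERT and the function names are assumptions for this example; neighbors are tried in increasing vertex number, so the exact visiting order depends on that choice.

```c
#include <assert.h>
#include <stdbool.h>

#define N_VERT 6   /* number of vertices in this sketch */

/* Visit v, then go as deep as possible through unvisited neighbors. */
static void dfs_visit(int g[N_VERT][N_VERT], int v, bool visited[N_VERT],
                      int *order, int *count) {
    assert(v >= 0 && v < N_VERT);
    visited[v] = true;
    order[(*count)++] = v;
    for (int u = 0; u < N_VERT; u++) {
        if (g[v][u] != 0 && !visited[u]) {
            dfs_visit(g, u, visited, order, count);
        }
    }
}

/* Depth-first traversal of the whole graph; restarts from the next
 * unvisited vertex if the graph is not connected. Returns the number
 * of vertices written to order. */
int dfs_all(int g[N_VERT][N_VERT], int *order) {
    bool visited[N_VERT] = {false};
    int count = 0;
    for (int v = 0; v < N_VERT; v++) {
        if (!visited[v]) dfs_visit(g, v, visited, order, &count);
    }
    return count;
}

/* Breadth-first traversal of the whole graph using an array as a queue. */
int bfs_all(int g[N_VERT][N_VERT], int *order) {
    bool visited[N_VERT] = {false};
    int queue[N_VERT];
    int count = 0;
    for (int s = 0; s < N_VERT; s++) {
        if (visited[s]) continue;
        int head = 0, tail = 0;
        visited[s] = true;
        queue[tail++] = s;
        while (head < tail) {
            int v = queue[head++];
            order[count++] = v;
            for (int u = 0; u < N_VERT; u++) {
                if (g[v][u] != 0 && !visited[u]) {
                    visited[u] = true;     /* mark when enqueued */
                    queue[tail++] = u;
                }
            }
        }
    }
    return count;
}
```

Marking a vertex as visited at the moment it is enqueued, rather than when it is dequeued, is what keeps each vertex in the queue at most once.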

Minimum spanning tree of graph

If the graph has n nodes, its minimum spanning tree has n-1 edges (it contains no cycle, so it is a tree rather than a general graph). These n-1 edges connect all the vertices into a tree, and the sum of their weights is the smallest possible, which is why it is called a minimum spanning tree.

Prim's algorithm (understand): Start from any vertex and find the smallest-weight edge incident to it; the vertex at the other end of that edge is added to the tree set. Then, among all edges that connect a vertex in the tree set to a vertex outside it, find the one with the smallest weight and add its outside vertex to the tree set. Repeat until every vertex of the graph is in the tree set; the resulting tree is a minimum spanning tree of the graph.

Kruskal's algorithm (recommended): This algorithm works from the edges, since the goal is to select n-1 edges with the smallest total weight. Sort the edges by weight and select them in ascending order until all nodes are connected. After each selection, check that the new edge does not form a cycle; an edge that would form one is skipped.
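A sketch of Kruskal's algorithm. The cycle check is done here with a union-find (disjoint set) structure, which is a common technique but an assumption of this sketch rather than something the notes prescribe. The names Edge, cmp_edge, find_root, and kruskal_mst are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int u;
    int v;
    int weight;
} Edge;

/* Comparison function for qsort: ascending by weight. */
static int cmp_edge(const void *a, const void *b) {
    const Edge *x = a;
    const Edge *y = b;
    return (x->weight > y->weight) - (x->weight < y->weight);
}

/* Find the representative of x's component, shortening the path on the way. */
static int find_root(int *parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Sort edges in place and return the total weight of a minimum spanning
 * tree, or -1 if the graph is not connected or memory runs out. */
int kruskal_mst(Edge *edges, int edge_count, int vertex_count) {
    assert(edges != NULL && edge_count >= 0 && vertex_count > 0);
    int *parent = malloc((size_t)vertex_count * sizeof *parent);
    if (parent == NULL) return -1;
    for (int i = 0; i < vertex_count; i++) parent[i] = i;

    qsort(edges, (size_t)edge_count, sizeof *edges, cmp_edge);

    int total = 0;
    int used = 0;
    for (int i = 0; i < edge_count && used < vertex_count - 1; i++) {
        int ru = find_root(parent, edges[i].u);
        int rv = find_root(parent, edges[i].v);
        if (ru == rv) continue;   /* both ends already connected: a cycle */
        parent[ru] = rv;          /* merge the two components */
        total += edges[i].weight;
        used++;
    }
    free(parent);
    return used == vertex_count - 1 ? total : -1;
}
```

Two vertices that already share a representative are already connected, so an edge between them would close a cycle and is skipped.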

Both algorithms make the locally optimal choice first, so they are greedy algorithms. The total weight of the minimum spanning tree is the same for both, but the spanning trees themselves are not necessarily the same. As the descriptions show, the cost of Kruskal's algorithm depends on the number of edges: the more edges there are, the more work the selection takes, so Prim's algorithm is the better choice when the network is dense. Neither algorithm is always the more efficient one.

Topological sequence of a graph

AOV network (a network in which vertices represent activities): In a directed graph, vertices represent activities, and directed edges represent the precedence relationships between activities.

An AOV network is used to represent the execution plan of a large engineering project, so it must not contain a directed cycle. A cycle would mean that some activity depends on its own completion. To check whether a project is feasible, first check whether its AOV network contains a cycle. The detection method is to construct a topologically ordered sequence of the vertices of the directed graph; if the sequence cannot include every vertex, the network contains a cycle.

Construction method: Treat the directed edges of the graph as the order in which activities start. If a node has in-degree 0, its activity can be executed first; output it, then delete the node and its outgoing edges. Then find another node whose in-degree is now 0, execute its activity, and continue in the same way. An example follows:
[Figure: AOV network example; image not available]

In the figure above, the ordering can start from node 1 or node 6, so the topological sequence is not necessarily unique.