[Soft Exam - Notes on Essential Knowledge Points for Software Designers] Chapter 8 Algorithm Analysis and Design

Preface

Since the notes copied to CSDN lose their styling, I do not have the energy to re-check and re-apply all the styles. Readers with points can download the Word, PDF, and Youdao Cloud Notes versions.
Note that the downloadable content is identical to what is shared in this article; the only difference is styling [for example, key points and frequently tested content have colors, font sizes, and weights, the table of contents is more complete, tables are not pictures, etc.].

Location of this chapter:
https://download.csdn.net/download/chengsw1993/86016249

If you find reading differences, abnormal display, or similar problems in the article, please let us know in the comments so that we can fix them; these are likely caused by CSDN's Markdown syntax.

Series of articles

Previous article: [Soft Exam - Notes on Essential Knowledge Points for Software Designers] Chapter 7 Object-Oriented Technology

Next article: [Soft Exam - Notes on Essential Knowledge Points for Software Designers] Chapter 9 Basics of Database Technology

Algorithm basics

Basic knowledge of algorithms

Five important characteristics of the algorithm:

  • Finiteness
  • Certainty
  • Feasibility
  • Input: zero or more inputs
  • Output: one or more outputs

Algorithm analysis basics

Algorithm complexity covers two aspects: a measure of the algorithm's running time (time complexity) and a measure of the computer resources, chiefly storage, that the algorithm requires (space complexity). Both are important criteria for evaluating the quality of an algorithm.

time complexity

The time complexity of a program is the time it takes to run from start to finish. The usual method of analysis is to pick an operation that is basic to the problem under study and use the number of times that operation is repeated as the time measure of the algorithm. In general, the number of repetitions of the basic operation is some function T(n) of the problem size n. Since T(n) is often difficult to compute exactly, asymptotic time complexity is introduced to estimate the execution time of an algorithm quantitatively.

Time complexity is usually written in "O()" notation, defined as follows: if there exist two positive constants c and m such that f(n) ≤ c·g(n) for all n ≥ m, then f(n) = O(g(n)). That is, as n increases, f(n) grows asymptotically no faster than g(n). For example, if the actual running time of a program is 3n³ + 2n² + n, then T(n) = O(n³): the values of T(n) and n³ grow at asymptotically the same rate as n increases. Common asymptotic time complexities, from lowest to highest, are:
O(1) < O(log₂n) < O(n) < O(n·log₂n) < O(n²) < O(n³) < O(2ⁿ) < O(n!)

space complexity

The space complexity of a program refers to the amount of storage required to run the program from start to finish. It usually consists of two parts: a fixed part and a variable part.

In the analysis and design of algorithms, time complexity and space complexity are often subtly related and can frequently be traded for each other: you can trade space for time, or time for space.

Find

Sequential search: compare the key to be found with the elements of the table one by one, from beginning to end. If an element with that key exists, the search succeeds; otherwise it fails.
The time complexity is O(n).
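As a minimal sketch, sequential search can be written in a few lines of Python (the function name is illustrative):

```python
def sequential_search(table, key):
    """Scan the table from beginning to end; return the index of key, or -1 on failure."""
    for i, element in enumerate(table):
        if element == key:   # an element with the keyword key exists: success
            return i
    return -1                # reached the end without a match: failure
```

In the worst case every element is examined once, which is where the O(n) bound comes from.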

Binary (half-interval) search: the precondition is that the data is sorted in ascending/descending order.
Suppose the elements of the lookup table are stored in a one-dimensional array r[1...n] and are sorted in ascending order of keys. A binary search then proceeds as follows:
1. Compare the key of the element being searched for with the key of the record in the middle position of table r (subscript mid). If they are equal, the search succeeds.
2. If key > r[mid].key, the target record can only be in the second half r[mid+1...n], so the next step is to search that sub-table.
3. If key < r[mid].key, the target record can only be in the first half r[1...mid-1], so the next step is to search that sub-table.
4. Repeat the above steps, gradually narrowing the range, until the search succeeds or fails because the sub-table is empty.
Two points deserve attention: if the middle position works out to a fraction, round it down (4.5 becomes 4; do not round to the nearest integer); and since the middle element has already been compared and found unequal, the next comparison interval need not include position mid. The more data there is to search, the more efficient binary search becomes.
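The four steps above can be sketched in Python (0-based indexing is used here instead of the r[1...n] of the text):

```python
def binary_search(r, key):
    """Binary search over an ascending list r; return the index of key, or -1."""
    low, high = 0, len(r) - 1
    while low <= high:                # sub-table not yet empty
        mid = (low + high) // 2       # round the middle position down
        if r[mid] == key:
            return mid                # search succeeds
        elif key > r[mid]:
            low = mid + 1             # search the second half, excluding mid
        else:
            high = mid - 1            # search the first half, excluding mid
    return -1                         # sub-table became empty: search fails
```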

In the previous search methods, since the relative position of a record in the storage structure is arbitrary, the search must go through a series of key comparisons to locate the record; in other words, these searches are based on comparing keys. A hash table, by contrast, applies a function of the record's key (called the hash function) to obtain the record's storage address. To search a hash table, compute the storage address of the target record with the same hash function, then go to the corresponding storage unit and check whether the search succeeds.
Hash table: based on a chosen hash function H(key) and a method of handling conflicts, a set of keys is mapped onto a finite, contiguous address set, and the "image" of each key in that address set is used as the record's storage location in the table.

When the hash function maps two keys to the same address, a conflict occurs; linear probing is one way to resolve it. The common conflict-resolution methods are:

  • Linear probing: take the next free storage cell in physical address order.
  • Pseudo-random probing: store the conflicting record at a free address chosen pseudo-randomly.
  • Rehashing: when the original hash function conflicts, apply another hash function to the same key to resolve the conflict.
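A minimal Python sketch of a hash table with linear probing; the class name and the hash function H(key) = key mod size are illustrative assumptions:

```python
class LinearProbingHashTable:
    """Hash table sketch resolving conflicts by linear probing.
    Assumes the table is never completely full when inserting."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size

    def _hash(self, key):
        return key % self.size                  # illustrative hash function H(key)

    def insert(self, key):
        addr = self._hash(key)
        while self.slots[addr] is not None:     # conflict: take the next address in order
            addr = (addr + 1) % self.size
        self.slots[addr] = key

    def search(self, key):
        addr = self._hash(key)                  # same hash function locates the record
        for _ in range(self.size):
            if self.slots[addr] == key:
                return addr                     # search succeeds
            if self.slots[addr] is None:
                return -1                       # free slot reached: key is absent
            addr = (addr + 1) % self.size
        return -1
```

For example, with size 7, the keys 7 and 14 both hash to address 0; the second one is probed forward to address 1.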

sort

Sorted categories:

Stable vs. unstable sorting: the criterion is whether two equal values keep their relative order after sorting. For example, if one 21 precedes another 21 at the start, the sort is stable if the first 21 still precedes the second after sorting; in a stable sort the relative order of equal values does not change.
Internal vs. external sorting: depending on whether the sort is performed entirely in memory or requires external storage.

There are many sorting algorithms, which can be roughly classified as follows:

  • Insertion sorts: direct insertion sort, Shell sort.
  • Exchange sorts: bubble sort, quick sort.
  • Selection sorts: simple selection sort, heap sort.
  • Merge sort.
  • Radix sort.

Direct insertion sort - insertion sort

Direct insertion sort is a simple sorting method. Concretely: when inserting the i-th record Ri, the records R1, R2, ..., Ri-1 are already sorted. Compare Ri's key Ki with Ki-1, Ki-2, ... in turn to find the position where Ri should be inserted, then insert it; the records from the insertion position onward shift back one place.
It is simple and intuitive, but slow.
Note: any method whose precondition is that the first i-1 elements are already sorted is essentially insertion sort.
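The description above can be sketched in Python (names are illustrative):

```python
def insertion_sort(a):
    """Direct insertion sort: insert a[i] into the already sorted prefix a[0..i-1]."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:   # shift larger records back one place
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                 # insert at the found position
    return a
```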

Shell sort (Hill sort) - insertion sort

The basic idea of Shell sort: first divide the whole sequence of records into several subsequences and perform direct insertion sort on each; then, when the records of the entire sequence are basically in order, perform one final direct insertion sort over all records.
It is an improvement on direct insertion sort, suited to sorting larger data sets, and improves efficiency.
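A Python sketch of Shell sort using the common gap sequence n/2, n/4, ..., 1 (this gap sequence is one of several possible choices):

```python
def shell_sort(a):
    """Shell sort: insertion sort over gap-separated subsequences, shrinking the gap."""
    gap = len(a) // 2
    while gap > 0:
        for i in range(gap, len(a)):      # insertion sort within each subsequence
            key = a[i]
            j = i - gap
            while j >= 0 and a[j] > key:
                a[j + gap] = a[j]
                j -= gap
            a[j + gap] = key
        gap //= 2                         # final pass (gap = 1) is a plain insertion sort
    return a
```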

Simple selection sorting - selection sorting

The essence is to select the smallest remaining element on each pass and swap it into place; the work lies mainly in the selection, and there is only one exchange per pass.
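A minimal sketch of simple selection sort in Python:

```python
def selection_sort(a):
    """Simple selection sort: find the smallest remaining element, then one swap per pass."""
    for i in range(len(a) - 1):
        m = i
        for j in range(i + 1, len(a)):   # selection: scan for the smallest element
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]          # exchange: only once per pass
    return a
```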

Heap sort - selection class sort

Heap sort is well suited to finding the top few elements among many, because heap sort is a selection sort and selecting just the first few elements is very efficient.

Summary: build the initial heap on a complete binary tree. Starting from the last non-leaf node, compare each child node with its parent and move the larger one up. Repeatedly removing the top of the resulting heap then yields the elements in order from largest to smallest.
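The summary above (build a max-heap, then repeatedly move the top to the tail) can be sketched in Python; the sift-down helper name is illustrative:

```python
def heap_sort(a):
    """Heap sort: build a max-heap, then repeatedly swap the top to the tail and re-sift."""
    n = len(a)

    def sift_down(root, end):
        # Restore the max-heap property for the subtree rooted at `root`, within a[0..end].
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and a[child] < a[child + 1]:
                child += 1                       # pick the larger child
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return

    for start in range(n // 2 - 1, -1, -1):      # build the initial heap from the last non-leaf
        sift_down(start, n - 1)
    for end in range(n - 1, 0, -1):              # move the current maximum to the tail
        a[0], a[end] = a[end], a[0]
        sift_down(0, end - 1)
    return a
```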

Bubble sort - exchange sort

The essence is to start comparing from the last two elements, swapping the smaller one toward the front, comparing and swapping in sequence; the smaller "bubbles" rise to the surface.
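A minimal sketch of bubble sort matching the description above (comparisons start from the tail):

```python
def bubble_sort(a):
    """Bubble sort: compare adjacent pairs from the tail; smaller elements bubble forward."""
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1, i, -1):            # compare from the last two elements forward
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]  # the smaller "bubble" rises
    return a
```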

Quick sort - exchange sort

The basic idea of quick sort: through one pass of partitioning, divide the records to be sorted into two independent parts such that the keys of one part are no greater than the keys of the other; then continue to quick sort the two parts separately, and the entire sequence becomes ordered.

Example:
One pass of quick sort: take 57 as the pivot and set two pointers i=1, j=n. Starting from the n-th element, which j points to, compare it with the pivot; if it is smaller, exchange it with the pivot and decrement j. Then compare the element that i points to with the pivot; if it is larger, exchange it with the pivot and increment i. Then resume comparing from the j side, and so on.

Finally 57 serves as the boundary: all elements to its left are less than 57 and all elements to its right are greater. This completes one pass of quick sort; then recurse on the two parts separately.
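The partition-then-recurse procedure can be sketched in Python (the first element is taken as the pivot, matching the 57 in the example):

```python
def quick_sort(a, low=0, high=None):
    """Quick sort: one pass of partitioning around a pivot, then recurse on both parts."""
    if high is None:
        high = len(a) - 1
    if low >= high:
        return a
    pivot = a[low]                       # reference value, e.g. 57 in the example
    i, j = low, high
    while i < j:
        while i < j and a[j] >= pivot:   # scan from the right for a smaller element
            j -= 1
        a[i] = a[j]
        while i < j and a[i] <= pivot:   # scan from the left for a larger element
            i += 1
        a[j] = a[i]
    a[i] = pivot                         # pivot lands at its final boundary position
    quick_sort(a, low, i - 1)            # recurse on the left part
    quick_sort(a, i + 1, high)           # recurse on the right part
    return a
```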

merge sort

The sorting method that repeatedly merges two ordered runs into one ordered run is called two-way merge sort.
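A sketch of two-way merge sort in Python: recursively split, then merge two ordered runs:

```python
def merge_sort(a):
    """Two-way merge sort: split in half, sort each half, merge the ordered runs."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # merge two ordered runs into one
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]      # append whichever run has elements left
```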

Radix sort

Radix sort distributes the records into buckets digit by digit, starting from the least significant digit and collecting the buckets in order after each pass; after the pass on the most significant digit, the sequence is sorted.
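A least-significant-digit radix sort sketch for non-negative integers (ten buckets assume decimal keys):

```python
def radix_sort(a):
    """LSD radix sort: distribute by each decimal digit, least significant first."""
    exp = 1
    while max(a, default=0) // exp > 0:
        buckets = [[] for _ in range(10)]
        for x in a:
            buckets[(x // exp) % 10].append(x)   # distribute by the current digit
        a = [x for b in buckets for x in b]      # collect the buckets in order
        exp *= 10
    return a
```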

★Summary of sorting algorithm★

| Category | Algorithm | Average time | Worst time | Extra space | Stable |
| --- | --- | --- | --- | --- | --- |
| Insertion | Direct insertion sort | O(n²) | O(n²) | O(1) | Yes |
| Insertion | Shell sort | O(n^1.3) | O(n²) | O(1) | No |
| Exchange | Bubble sort | O(n²) | O(n²) | O(1) | Yes |
| Exchange | Quick sort | O(n·log₂n) | O(n²) | O(log₂n) | No |
| Selection | Simple selection sort | O(n²) | O(n²) | O(1) | No |
| Selection | Heap sort | O(n·log₂n) | O(n·log₂n) | O(1) | No |
| Merge | Merge sort | O(n·log₂n) | O(n·log₂n) | O(n) | Yes |
| Radix | Radix sort | O(d(n+r)) | O(d(n+r)) | O(r) | Yes |

As mentioned before, stability means that the relative positions of two equal elements remain unchanged before and after sorting.

In terms of space complexity, most sorts compare and exchange in place and need no extra space. Quick sort needs to store the pivot value for each pass, merge sort needs a new table, and radix sort needs new tables (buckets) plus space for the key digits.

For time complexity: the algorithms related to heaps, trees, or halving are all O(n·log₂n), while the straightforward algorithms are O(n²). This conclusion is easy to reach by analyzing each algorithm's principle.

For basically ordered data, such as {1, 1, 2, 4, 7, 5}, the best sorting choice is insertion sort, and the worst choice is quick sort.

Insertion sort compares sequentially, and since the data is already basically ordered, each element needs to be compared only against the last sorted element; the time complexity equals the number of comparisons, i.e. O(n). Only the element currently being compared is needed as auxiliary storage, so the space complexity is O(1).

Quick sort must compare sequentially starting from the i-th element, and since the data is basically ordered, each pass compares all the way to the last element j, so the time complexity is O(n²) and the space complexity is O(n).

Commonly used algorithms

divide and conquer

The principle underlying divide and conquer is recursion.

Recursion: a subroutine calling itself, directly or indirectly.

Basic elements of recursion: the boundary condition (when the recursion ends, i.e. the recursion exit) and the recursion pattern (how the large problem is decomposed into smaller ones, i.e. the recursive body).

Divide and conquer method: for a problem of size n, if it can be solved easily, solve it directly; otherwise decompose it into k smaller sub-problems that are independent of one another and have the same form as the original problem. Solve these sub-problems recursively, then combine their solutions to obtain the solution of the original problem.

Steps: Decompose (decompose the original problem into a series of sub-problems) - Solve (recursively solve each sub-problem, and if the sub-problem is small enough, solve it directly) - Merge (merge the solutions of the sub-problems into the solution of the original problem).

Anything that involves solving by splitting into groups is divide and conquer (binary search, merge sort, quick sort, Shell sort, etc.).
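The decompose-solve-merge steps can be illustrated with a toy divide-and-conquer maximum in Python (names are illustrative):

```python
def dc_max(a, low, high):
    """Divide-and-conquer maximum of a[low..high]: decompose, solve recursively, merge."""
    if low == high:                    # small enough: solve directly
        return a[low]
    mid = (low + high) // 2            # decompose into two independent sub-problems
    left = dc_max(a, low, mid)         # solve each sub-problem recursively
    right = dc_max(a, mid + 1, high)
    return max(left, right)            # merge the sub-solutions
```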

Backtracking

Backtracking method: known as the "universal problem-solving method", it can systematically search for all solutions, or any one solution, to a problem. In the solution-space tree containing all solutions to the problem, the search starts from the root node following a depth-first strategy. At each node, first determine whether the node can possibly contain a solution; if it definitely cannot, skip the search of the subtree rooted at that node and backtrack toward its ancestor nodes level by level; otherwise, enter the subtree and continue the depth-first search.

It can be understood as: first search depth-first, exploring downward; when the path is blocked, return to the previous level and explore other branches, repeating this step. That is backtracking: keep probing forward, and back up one level whenever the attempt fails.
It is typically used for maze-like problems.
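As a sketch, a depth-first backtracking maze solver in Python; the grid encoding (0 = open, 1 = wall) and start/goal corners are illustrative assumptions:

```python
def solve_maze(maze, pos=(0, 0), path=None):
    """Depth-first backtracking in a grid maze (0 = open, 1 = wall).
    Returns a path from the top-left to the bottom-right corner, or None."""
    if path is None:
        path = [pos]
    rows, cols = len(maze), len(maze[0])
    if pos == (rows - 1, cols - 1):
        return path                                        # reached the goal
    r, c = pos
    for dr, dc in ((1, 0), (0, 1), (-1, 0), (0, -1)):
        nr, nc = r + dr, c + dc
        nxt = (nr, nc)
        if 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] == 0 and nxt not in path:
            result = solve_maze(maze, nxt, path + [nxt])   # explore downward
            if result:
                return result                              # this branch succeeded
    return None                                            # dead end: backtrack one level
```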

Real question: Question 4 in the afternoon of the first half of 2019

dynamic programming

Dynamic programming method: when solving the problem, for each decision step, list the various possible local solutions, then use certain judgment criteria to discard the local solutions that definitely cannot lead to the optimal solution. Screening at every step in this way makes each step optimal and thereby guarantees that the global solution is optimal.

In essence it also decomposes a complex problem into sub-problems. The difference from the divide-and-conquer method is that the sub-problems are not independent of one another and are not all of the same form.

It is commonly used to solve problems that have some optimality property.

This method puts a lot of effort up front into building a table: at each step it lists the various possible answers and stores them, and the final result is obtained by looking up this table. Dynamic programming is not only optimal at each step but also globally optimal.

The time complexity is O(nW); for the knapsack problem, W is the knapsack capacity.
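A minimal 0-1 knapsack table in Python illustrating the O(nW) bound (names are illustrative):

```python
def knapsack_01(weights, values, W):
    """Classic 0-1 knapsack DP: dp[j] holds the best value achievable within capacity j.
    The two nested loops give the O(nW) running time noted above."""
    dp = [0] * (W + 1)
    for w, v in zip(weights, values):
        for j in range(W, w - 1, -1):            # reverse order: each item used at most once
            dp[j] = max(dp[j], dp[j - w] + v)    # skip the item, or take it
    return dp[W]
```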

greedy method

Greedy method: always make the choice that looks best at the moment, without considering the whole. Each step makes only the locally optimal choice for that step, which may not be optimal overall. Since it does not exhaust all possible solutions in search of the optimum, it is less time-consuming and can generally obtain a satisfactory solution quickly, but not necessarily the optimal one.

Locally greedy: it takes the optimum only for the current step rather than considering the whole.

To recognize this type of algorithm, check whether it takes the optimal choice at every individual step while the problem statement does not suggest that the final result is globally optimal.

Time complexity O(n²)

0-1 knapsack: each item is a whole, either taken entirely or not at all. For example, with items of weight {2, 2, 6, 5, 4}, each number represents one item; in the 0-1 knapsack the item of weight 6 is either put in whole or not put in at all, never partially. In the fractional knapsack, an item may be split and taken in part.
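To contrast with the 0-1 case, here is a sketch of the greedy strategy on the fractional knapsack, where items may be split (the function name is illustrative). The greedy choice, taking the best value-to-weight ratio first, is optimal for the fractional knapsack but not for the 0-1 knapsack:

```python
def fractional_knapsack(weights, values, W):
    """Greedy fractional knapsack: always take the best value/weight ratio first."""
    items = sorted(zip(weights, values), key=lambda wv: wv[1] / wv[0], reverse=True)
    total = 0.0
    for w, v in items:
        take = min(w, W)          # take the whole item, or the fraction that still fits
        total += v * take / w
        W -= take
        if W == 0:
            break
    return total
```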

branch and bound method

Branch and bound method: Similar to the backtracking method, it is also an algorithm that searches for a solution to the problem on the solution space tree of the problem.

Search principle: in branch and bound, each live node has only one chance to become an expansion node. Once a live node becomes the expansion node, all of its child nodes are generated at once. Among these children, those leading to infeasible or non-optimal solutions are discarded, and the rest are added to the live-node list. A node is then taken from the live-node list to become the current expansion node, and the expansion process above repeats. This continues until the required solution is found or the live-node list is empty.

The difference from the backtracking method:

  1. The solution goal is to find a solution that satisfies the constraints or, among all solutions satisfying the constraints, one that maximizes or minimizes the value of some objective function, i.e. an optimal solution in a certain sense.
  2. Search the solution space tree in a breadth-first manner or in a minimum cost (maximum benefit) first manner.

Ways to select the next expansion node from the live-node list:

  • Queued (FIFO) branch and bound method: Select the next node as the expansion node according to the queue first-in-first-out (FIFO) principle.

  • Priority queue branch and bound method: Select the node with the highest priority to become the current expansion node according to the priority specified in the priority queue.

Probabilistic algorithm

Probabilistic algorithm: at certain steps, the algorithm may choose randomly how to proceed, and the result is allowed to be wrong with a small probability; in exchange for this, the running time of the algorithm is substantially reduced (the complexity of the algorithm is lowered).

Basic characteristic: solving the same instance of a problem twice with the same probabilistic algorithm may produce completely different results.

Applicable scenarios: If a problem does not have an effective deterministic algorithm that can provide a solution in a reasonable time, but the problem can accept a small probability error, then a probabilistic algorithm can be used to quickly find the solution to the problem.

Four types of probabilistic algorithms:

  • Numerical probabilistic algorithms (approximate solutions to numerical problems)
  • Monte Carlo algorithms (seek an exact solution to the problem, allowing a small probability of error)
  • Las Vegas algorithms (never return an incorrect solution, but may fail to find one)
  • Sherwood algorithms (always find a correct solution to the problem)
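As one illustration of trading a small error for speed, here is a numerical probabilistic sketch that estimates π by random sampling (the sample count and seed are arbitrary choices):

```python
import random

def estimate_pi(samples=100_000, seed=0):
    """Numerical probabilistic estimate of pi: sample random points in the unit
    square and count the fraction that falls inside the quarter circle."""
    rng = random.Random(seed)          # the random number sequence is part of the input
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:       # point lies inside the quarter circle
            inside += 1
    return 4 * inside / samples        # area ratio approaches pi/4
```

Different seeds give different results for the same input, and the error shrinks as the number of samples grows.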

Characteristics of probabilistic algorithms:

  1. The input of the probabilistic algorithm includes two parts, one is the input of the original problem, and the other is a random number sequence for the algorithm to randomly select.
  2. During the running process of the probabilistic algorithm, one or more random selections are included, and the running path of the algorithm is determined based on the random values.
  3. The results of probabilistic algorithms cannot be guaranteed to be correct, but they can limit the probability of errors.
  4. Probabilistic algorithms can have different results for the same input instance during different runs. Therefore, for the same input instance, the execution time of the probabilistic algorithm may be different.

approximation algorithm

Approximation algorithm: an effective strategy for solving hard problems. The basic idea is to give up seeking the optimal solution and accept an approximately optimal one instead, in exchange for simpler algorithm design and lower time complexity.

Although it may not find an optimal solution, it will always provide a solution to the problem to be solved.

To be practical, an approximation algorithm must bound the difference, or the ratio, between the solution it produces and the optimal solution; this guarantees the degree to which the approximately optimal solution deviates from the optimal solution on any instance. Obviously, the smaller the difference, the more practical the approximation algorithm.

There are two criteria for measuring the performance of approximation algorithms:

  1. The time complexity of the algorithm. The time complexity of the approximation algorithm must be of polynomial order, which is the basic goal of the approximation algorithm.
  2. The degree of approximation of the solution. The degree of approximation to the approximate optimal solution is also an important goal in designing approximation algorithms. The degree of approximation is related to the approximation algorithm itself, the size of the problem, and even different input instances.

Data Mining Algorithms [New Knowledge Points]

Overview: technology for analyzing the explosively growing volumes of data of all types in order to discover the valuable information and knowledge hidden within them. Data mining applies learning methods to analyze and mine many kinds of data; its core is the algorithm, and its main functions include classification, regression, association rules, and clustering.

Classification: a supervised learning process that builds a model from historical data to predict future data.

Categorical data object attributes: general attributes, classification attributes, or target attributes.

Data for classification design: training data set, test data set, unknown data.

Two steps of data classification: learn the model (use a classification algorithm to build a model from the training data set), then apply the model (feed the test data set into the model and evaluate the model's quality from its output, then feed unknown data into the model to predict its class).
Classification algorithms: decision tree induction (a top-down recursive tree algorithm), the Naive Bayes algorithm, backpropagation (BP), and support vector machines (SVM).

Mining frequent patterns and association rules: mining frequent patterns and association rules from massive data can effectively guide enterprises in discovering cross-selling opportunities, performing decision analysis, business management, and so on (the Walmart "beer and diapers" story).

First, we need to find the frequent patterns in the data set, and then generate association rules from the frequent patterns.

Association rule mining algorithms: Apriori-like algorithms, methods based on frequent-pattern growth such as FP-growth, and algorithms using the vertical data format, such as ECLAT.

Clustering: an unsupervised learning process. Based on the characteristics of the data, similar data objects are grouped into one class, and dissimilar objects into different classes. Birds of a feather flock together.

Typical algorithms: partition-based method, hierarchy-based method, density-based method, grid-based method, statistical model-based method.

Applications of data mining: credit analysis, classification and clustering of customers for targeted promotions, etc.

Intelligent optimization algorithm [new knowledge point]

Overview: optimization technology is an applied technology, grounded in mathematics, for finding optimal solutions to various engineering problems.

Artificial neural network (ANN): a dynamic system whose topology is a directed graph, which processes information by responding to continuous or intermittent state inputs. It abstracts the human brain's neuron network from the perspective of information processing, builds a simple model, and forms different networks with different connection schemes.

The concept of deep learning originates from the research of artificial neural networks and is a new field in machine learning research.

Genetic algorithm (GA): derived from simulating Darwin's evolutionary theory of "natural selection, survival of the fittest" and the Mendel-Morgan theory of genetics and mutation. During iteration it retains existing good structures while searching for better ones. Its original intent was to design an artificial adaptive system based on nature's evolutionary mechanisms.

Simulated annealing (SA): an algorithm for global optimization. The basic idea comes from the physical annealing process, which has three stages: heating, isothermal, and cooling.
Heat a solid to a sufficiently high temperature and then let it cool slowly. During heating, the particles inside the solid become increasingly disordered as the temperature rises, and the internal energy increases. During slow cooling, the particles gradually become ordered, reaching an equilibrium state at each temperature, and finally reach the ground state at room temperature with the internal energy reduced to a minimum.

Tabu search (TS): a global search algorithm that simulates the human intellectual process; it is an extension of local neighborhood search.

Starting from an initial feasible solution, a series of specific search directions (moves) is tried, and the move that most improves the value of the specific objective function is chosen. To avoid getting trapped in a local optimum, TS uses a flexible "memory" technique that records and selects among the optimization steps already performed to guide the next search direction: this is the tabu list.

Ant colony algorithm: a probabilistic algorithm for finding optimized paths.

The behavior of a single ant is relatively simple, but the colony as a whole can exhibit intelligent behavior; for example, ant colonies can find the shortest path to a food source in different environments. This is because ants can pass information through a certain mechanism: each ant releases a substance called a "pheromone" along the path it travels, ants can sense pheromones and tend to walk along paths with higher pheromone concentration, and every passing ant leaves more pheromone on the road. This forms a positive-feedback mechanism, so after a period of time the entire colony follows the shortest path to the food source.

The walking path of an ant represents a feasible solution to the problem being optimized, and all the paths of the whole colony constitute the problem's solution space. Ants on shorter paths release more pheromone; as time passes, the accumulated pheromone concentration on shorter paths gradually increases, and the number of ants choosing them grows. Eventually, under the action of positive feedback, the whole colony converges on the best path, which corresponds to the optimal solution of the problem.

Particle swarm optimization (PSO): also called the bird flocking foraging algorithm. When birds fly in search of food they often change direction suddenly, scatter, and regroup; their behavior is unpredictable, yet the flock always maintains overall consistency, and individuals keep the optimal spacing between one another. Studying the behavior of similar biological groups revealed an information-sharing mechanism within the group that gives the group an advantage in searching; this is the basis of the basic particle swarm algorithm.

Consider this scenario: a flock of birds searches randomly for food, and there is only one piece of food in the area. None of the birds knows where the food is, but each knows how far the food is from its current position. What, then, is the optimal strategy for finding the food? The simplest and most effective is to search the area around the bird that is currently closest to the food.

PSO takes inspiration from this model and is used to solve optimization problems. In PSO, the solution to each optimization problem is a bird in the search space. We call them "particles". All particles have a fitness value determined by the optimized function, and each particle also has a speed that determines the direction and distance in which they fly. The particles then search in the solution space following the current optimal particle.

PSO is initialized with a group of random particles (random solutions) and then searches for the optimal solution by iteration. In each iteration, a particle updates itself by tracking two "extreme values": the first is the best solution found so far by the particle itself, called the individual extremum pBest; the other is the best solution found so far by the entire population, the global extremum gBest. Alternatively, instead of the whole population, only a part of it may be used as a particle's neighbors, in which case the best value among all neighbors is the local extremum.


Origin blog.csdn.net/chengsw1993/article/details/125714839