[Algorithm Analysis and Design] Greedy Algorithm (Part 2)


1. Single-source shortest path

  Given a weighted directed graph G = (V, E) in which the weight of every edge is a non-negative real number, and a vertex in V called the source, compute the length of the shortest path from the source to every other vertex. The length of a path here is the sum of the weights of the edges on the path. This problem is usually called the single-source shortest path problem.

1.1 Basic idea of the algorithm

  Dijkstra's algorithm is a greedy algorithm for solving the single-source shortest path problem.
  Relevant concepts of Dijkstra's algorithm:
  Shortest path from s to u relative to S: the shortest path from s to u that passes only through vertices in S
  dist[u]: the length of the shortest path from s to u relative to S
  short[u]: the length of the shortest path from s to u
  dist[u] >= short[u]






1.2 Algorithm design ideas

  Input: directed graph G=(V,E), V={1,2,…,n}, s=1
  Output: shortest path from s to each vertex
  1. Initially S={1}
  2. For each i ∈ V-S, compute the length dist[i] of the shortest path from 1 to i relative to S; if no such path exists, record dist[i] as ∞ or maxint
  3. Select the vertex j in V-S with the smallest dist value, add j to S, and update the dist values of the vertices remaining in V-S
  4. Repeat the above process until S=V

  The basic idea is to maintain a vertex set S and to expand it by repeated greedy selection. A vertex belongs to S if and only if the length of the shortest path from the source to that vertex is already known.
  Initially, S contains only the source. Let u be a vertex of G. A path from the source to u that passes only through vertices in S is called a special path from the source to u, and the array dist records the length of the current shortest special path for each vertex. In each step, Dijkstra's algorithm takes the vertex u in V-S with the shortest special path length, adds u to S, and makes the necessary updates to the array dist. Once S contains all vertices in V, dist holds the lengths of the shortest paths from the source to all other vertices.
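  The procedure described above can be sketched in Python roughly as follows. This is a minimal illustration, assuming the graph is given as a weighted adjacency matrix with math.inf for missing edges and vertices numbered from 0; the function name dijkstra and the 5-vertex example graph are hypothetical and are not taken from the original figure.

```python
import math

def dijkstra(c, s):
    """Return dist, where dist[j] is the shortest path length from s to j.

    c is a weighted adjacency matrix: c[i][j] is the non-negative weight of
    edge i -> j, or math.inf if there is no such edge (assumed input format).
    """
    n = len(c)
    in_s = [False] * n           # in_s[j] is True once j has been added to S
    dist = list(c[s])            # shortest special path lengths relative to S
    dist[s] = 0
    in_s[s] = True
    for _ in range(n - 1):
        # Greedy selection: the vertex u in V-S with the smallest dist value.
        u, best = -1, math.inf
        for j in range(n):
            if not in_s[j] and dist[j] < best:
                u, best = j, dist[j]
        if u == -1:
            break                # every remaining vertex is unreachable
        in_s[u] = True
        # Update dist for the vertices still in V-S.
        for j in range(n):
            if not in_s[j] and dist[u] + c[u][j] < dist[j]:
                dist[j] = dist[u] + c[u][j]
    return dist

INF = math.inf
# Hypothetical example graph with 5 vertices (0 is the source).
example = [
    [INF, 10, INF, 30, 100],
    [INF, INF, 50, INF, INF],
    [INF, INF, INF, INF, 10],
    [INF, INF, 20, INF, 60],
    [INF, INF, INF, INF, INF],
]
print(dijkstra(example, 0))
```

  Scanning all vertices to find the minimum keeps the sketch close to the adjacency-matrix description; a priority queue could be used instead for sparse graphs.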

  For example, for the directed graph in the figure, the process of applying Dijkstra's algorithm to compute the shortest paths from source vertex 1 to the other vertices is listed in the table that follows.
[Figure: example weighted directed graph]
  The iterative process of Dijkstra's algorithm:
[Table: iterations of Dijkstra's algorithm on the example graph]


1.3 Correctness and computational complexity of the algorithm

  (1) Greedy selection property
  (2) Optimal substructure property
  (3) Computational complexity
  For a weighted directed graph with n vertices and e edges that is represented by a weighted adjacency matrix, the body of the main loop of Dijkstra's algorithm takes O(n) time. The loop is executed n-1 times, so the loop takes O(n^2) time in total. The rest of the algorithm takes no more than O(n^2) time.


1.4 Inductive proof ideas

  Proposition: When the algorithm reaches step k, dist[i]=short[i] for every vertex i in S.
  Induction basis:
  k=1, S={s}, dist[s]=short[s]=0
  Induction step:
  Prove that if the proposition is true for k, then it is also true for k+1.


1.5 Proof of the induction step

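  Assume that the proposition is true for k, and consider step k+1, in which the algorithm selects vertex v (through edge <u,v>). We need to prove that dist[v]=short[v].
  Suppose there is another path L from s to v. Let x be the last vertex of L in S before L first leaves S, let y be the first vertex of L in V-S, and let L then reach v from y. Since the algorithm selected v rather than y, dist[v]<=dist[y]. Since the part of L from s to y passes only through vertices in S before reaching y, and all weights are non-negative, dist[y] is at most the length of L. Hence dist[v] is at most the length of L, so dist[v]=short[v].
[Figure: path L leaving S at x, entering V-S at y, and reaching v]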


2. Minimum spanning tree

  Let G = (V, E) be an undirected connected weighted graph, that is, a network. The weight of each edge (v,w) in E is c[v][w]. If a subgraph G' of G is a tree containing all the vertices of G, then G' is called a spanning tree of G. The sum of the weights of the edges of a spanning tree is called the cost of the spanning tree. Among all the spanning trees of G, a spanning tree with the least cost is called a minimum spanning tree of G.
  Minimum spanning trees of networks are widely used in practice. For example, when designing a communication network, let the vertices of the graph represent cities and let the weight c[v][w] of edge (v, w) represent the cost of establishing a communication line between city v and city w. A minimum spanning tree then gives the most economical plan for establishing the communication network.


2.1 Minimum spanning tree properties

  Using the greedy design strategy, we can design efficient algorithms for constructing a minimum spanning tree. The Prim and Kruskal algorithms introduced in this section can both be regarded as examples of applying the greedy design strategy. Although the two algorithms make their greedy selections in different ways, they both rely on the following property of minimum spanning trees:
  Let G = (V, E) be a connected weighted graph, and let U be a proper subset of V. If (u,v) ∈ E with u ∈ U and v ∈ V-U, and among all such edges the weight c[u][v] of (u,v) is the smallest, then there must exist a minimum spanning tree of G that has (u, v) as one of its edges. This property is sometimes called the MST property.


2.1.1 Properties of Spanning Tree

  Suppose G is a connected graph with n vertices. Then:
  T is a spanning tree of G if and only if T has no cycle and has n-1 edges.
  If T is a spanning tree of G and e does not belong to T, then T∪{e} contains a cycle C.
  Removing any edge of the cycle C yields another spanning tree T' of G.


2.1.2 Application of Spanning Tree Properties

  Algorithm step: select an edge
  Restriction: the selected edges must not form a cycle
  Termination condition: the number of edges reaches n-1
  Method to improve a spanning tree T: add a non-tree edge e to T to form a cycle C, then remove a tree edge ei from C to form a new spanning tree T'
  W(T')-W(T)=W(e)-W(ei)
  If W(e)<=W(ei), then W(T')<=W(T)


2.2 Prim’s algorithm

  Suppose G=(V,E) is a connected weighted graph with V={1,2,…,n}. The basic idea of Prim's algorithm for constructing a minimum spanning tree of G is: first set S={1}; then, as long as S is a proper subset of V, make the following greedy selection: among all edges (i, j) with i ∈ S and j ∈ V-S, select the one with the smallest c[i][j] and add vertex j to S. This process continues until S=V. The edges selected in this process form a minimum spanning tree of G.

  Using the MST property and mathematical induction, it is easy to prove that the edge set T built by the above algorithm is always contained in some minimum spanning tree of G. Therefore, when the algorithm ends, the edges in T form a minimum spanning tree of G.
  For example, for the weighted graph in the figure, the process of selecting edges according to Prim's algorithm is shown in the figure that follows it.
[Figure: example connected weighted graph]
[Figure: edges selected step by step by Prim's algorithm]


2.2.1 Proof of correctness

  Proposition: For any k<n, there exists a minimum spanning tree containing the edges selected in the first k steps of the algorithm.
  Induction basis: k=1. There is a minimum spanning tree T containing the edge e={1,i}, where {1,i} has the smallest weight among all edges incident to vertex 1.
  Induction step: Assume that the edges selected in the first k steps of the algorithm are edges of some minimum spanning tree. Then the edges selected in the first k+1 steps are also edges of some minimum spanning tree.


2.2.2 Inductive basis

  Claim: There exists a minimum spanning tree T that contains the minimum-weight edge e={1,i} incident to vertex 1.
  Proof: Let T be a minimum spanning tree and assume that T does not contain {1,i}. Then T∪{1,i} contains a cycle, and this cycle contains another edge {1,j} incident to vertex 1. Replacing {1,j} with {1,i} gives a tree T'. T' is also a spanning tree, and W(T')<=W(T) because the weight of {1,i} is no larger than the weight of {1,j}. Hence T' is a minimum spanning tree containing {1,i}.
[Figure: replacing edge {1,j} with edge {1,i} on the cycle]


2.2.3 Induction steps

  Assume that the algorithm has performed k steps, the selected edges are e_1, e_2, ..., e_k, and the endpoints of these edges form the set S. By the induction hypothesis, there is a minimum spanning tree T of G containing these edges.
  In step k+1 the algorithm selects vertex i_{k+1}, whose edge to a vertex in S has the smallest weight among all edges between S and V-S. Let this edge be e_{k+1}={i_{k+1}, i_l}. If e_{k+1} ∈ T, step k+1 is obviously correct.

  Now assume that T does not contain e_{k+1}. Adding e_{k+1} to T forms a cycle. This cycle contains another edge e that connects a vertex in S with a vertex in V-S.
  Let T*=(T-{e})∪{e_{k+1}}. Then T* is a spanning tree of G containing e_1, e_2, ..., e_{k+1}, and W(T*)<=W(T), so the edges selected by the algorithm after step k+1 are still contained in a minimum spanning tree.
[Figure: exchanging edge e for edge e_{k+1} on the cycle]
  In the above Prim algorithm, we should also consider how to efficiently find the edge (i, j) with i ∈ S, j ∈ V-S and the smallest weight c[i][j]. A simple way to do this is to maintain two arrays, closest and lowcost. For each j in V-S, closest[j] is the vertex in S nearest to j, and lowcost[j] is the weight of the edge (j, closest[j]).
  During the execution of Prim's algorithm, first find the vertex j in V-S with the smallest lowcost value, then select the edge (j, closest[j]) according to the array closest, and finally add j to S and make the necessary updates to closest and lowcost.
  The computation time required by Prim's algorithm implemented in this way is O(n^2).
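  A minimal Python sketch of this closest/lowcost implementation is given below. It assumes the graph is a symmetric weighted adjacency matrix with math.inf for missing edges and vertices numbered from 0; the function name prim and the 6-vertex example graph are hypothetical and only serve as an illustration.

```python
import math

def prim(c):
    """Return the list of MST edges (i, j, weight) of a connected graph.

    c is a symmetric weighted adjacency matrix with math.inf for missing
    edges (assumed input format). Vertex 0 plays the role of the start vertex.
    """
    n = len(c)
    in_s = [False] * n
    in_s[0] = True
    closest = [0] * n            # closest[j]: vertex in S nearest to j
    lowcost = list(c[0])         # lowcost[j]: weight of edge (j, closest[j])
    tree = []
    for _ in range(n - 1):
        # Find the vertex j in V-S with the smallest lowcost value.
        j, best = -1, math.inf
        for k in range(n):
            if not in_s[k] and lowcost[k] < best:
                j, best = k, lowcost[k]
        if j == -1:
            raise ValueError("graph is not connected")
        tree.append((closest[j], j, lowcost[j]))
        in_s[j] = True
        # Update closest and lowcost for the vertices still in V-S.
        for k in range(n):
            if not in_s[k] and c[j][k] < lowcost[k]:
                lowcost[k] = c[j][k]
                closest[k] = j
    return tree

def matrix_from_edges(n, edges):
    """Build a symmetric adjacency matrix from (u, v, weight) triples."""
    c = [[math.inf] * n for _ in range(n)]
    for u, v, w in edges:
        c[u][v] = c[v][u] = w
    return c

# Hypothetical example graph with 6 vertices.
example_edges = [(0, 1, 6), (0, 2, 1), (0, 3, 5), (1, 2, 5), (1, 4, 3),
                 (2, 3, 5), (2, 4, 6), (2, 5, 4), (3, 5, 2), (4, 5, 6)]
mst = prim(matrix_from_edges(6, example_edges))
print(mst, sum(w for _, _, w in mst))
```

  Both inner loops scan all n vertices and the outer loop runs n-1 times, which matches the O(n^2) bound stated above.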


2.3 Kruskal algorithm

  The basic idea of Kruskal's algorithm for constructing a minimum spanning tree of G is as follows. First treat the n vertices of G as n isolated connected components, and sort all edges by weight from small to large. Then, starting from the first edge, examine the edges in order of increasing weight and connect different connected components as follows: when examining the k-th edge (v, w), if the endpoints v and w belong to two different current connected components T1 and T2, use the edge (v, w) to join T1 and T2 into one connected component, and then continue with edge k+1; if v and w are in the same connected component, go directly to edge k+1. This process continues until only one connected component remains.
  For example, for the connected weighted graph used earlier, the edges of the minimum spanning tree obtained by Kruskal's algorithm are shown in the figure below.
[Figure: minimum spanning tree edges selected by Kruskal's algorithm]
  Proposition: For any n, the algorithm finds a minimum spanning tree of any connected weighted graph with n vertices.

2.3.1 Proof idea

  Induction basis: n=2. The algorithm is correct: G has only one edge, and the minimum spanning tree is G itself.
  Induction step: Assume that the algorithm is correct for graphs with n vertices, where n>1. Then the algorithm also obtains a minimum spanning tree for any graph with n+1 vertices.
  Short-circuit (contraction) operation: given any graph G with n+1 vertices, let e={i,j} be the minimum-weight edge of G. Merge i and j in G to obtain the graph G'.


2.3.2 Inductive step proof

  For any graph G with n+1 vertices, short-circuit its minimum-weight edge e to obtain a graph G' with n vertices.
  By the induction hypothesis, the algorithm obtains a minimum spanning tree T' of G'.
  "Stretch" the short-circuited edge e back to its original length to obtain the tree T.
  It remains to prove that T is a minimum spanning tree of G.
[Figure: short-circuiting edge e and stretching it back]


2.3.3 T is the minimum spanning tree of G

  T=T'∪{e} is a minimum spanning tree of G. Otherwise, there is a minimum spanning tree T* of G that contains edge e and satisfies W(T*)<W(T). (If e does not belong to T*, add e to T* to form a cycle and remove any other edge of the cycle; since e has minimum weight, the resulting spanning tree still has minimum weight.) Short-circuiting e in T* gives a spanning tree T*-{e} of G', and
  W(T*-{e})=W(T*)-w(e)<W(T)-w(e)=W(T'), contradicting the optimality of T'.

  Kruskal's algorithm can be implemented with a few basic operations on sets.
  Examining the edges in increasing order of weight is equivalent to repeatedly performing a removeMin operation on a priority queue. This priority queue can be implemented with a heap.
  Maintaining the collection of connected components as they are merged requires the basic operations supported by the UnionFind abstract data type.
  When the graph has e edges, the computation time required by Kruskal's algorithm is O(e log e). When e = Ω(n^2), Kruskal's algorithm is worse than Prim's algorithm, but when e = o(n^2), Kruskal's algorithm is much better than Prim's algorithm.
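  The combination of a heap and a union-find structure can be sketched in Python as follows. This is an illustrative sketch, assuming edges are given as (u, v, weight) triples with vertices numbered from 0; the UnionFind class and the kruskal function are hypothetical names written for this example, with heapq from the standard library serving as the priority queue.

```python
import heapq

class UnionFind:
    """Disjoint sets with path compression and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets of a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True

def kruskal(n, edges):
    """Return the MST edges (u, v, weight) of a connected graph."""
    heap = [(w, u, v) for u, v, w in edges]
    heapq.heapify(heap)                     # priority queue keyed by weight
    components = UnionFind(n)
    tree = []
    while heap and len(tree) < n - 1:
        w, u, v = heapq.heappop(heap)       # removeMin
        if components.union(u, v):          # endpoints in different components
            tree.append((u, v, w))
    return tree

# Hypothetical example graph with 6 vertices.
example_edges = [(0, 1, 6), (0, 2, 1), (0, 3, 5), (1, 2, 5), (1, 4, 3),
                 (2, 3, 5), (2, 4, 6), (2, 5, 4), (3, 5, 2), (4, 5, 6)]
mst = kruskal(6, example_edges)
print(mst, sum(w for _, _, w in mst))
```

  Stopping as soon as n-1 edges have been accepted corresponds to the moment when only one connected component remains.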


2.4 Application: Data Grouping Problem

  A set of data items (photos, documents, etc.) needs to be grouped according to how closely they are related.
  A similarity function or "distance" is used to describe the differences between individual items.
  Given the number of groups, how should the items be divided so that the items within each group are as close as possible and the items in different groups are as "far apart" as possible?


2.5 Single-link clustering

  Similar to Kruskal's algorithm.
  Sort the edges in ascending order of edge length.
  Examine the current shortest edge e. If e does not form a cycle with the edges already selected, add e to the set; otherwise skip e.
  Keep count of the number of connected components of the graph, and stop when exactly k connected components remain.
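  The steps above can be sketched in Python as follows. This is a minimal illustration, assuming the items are numbered from 0 and the pairwise distances are given as (u, v, distance) triples; the function name single_link_clusters and the one-dimensional example points are hypothetical.

```python
def single_link_clusters(n, edges, k):
    """Group n items into k clusters by Kruskal-style merging.

    edges is a list of (u, v, distance) triples (assumed input format).
    Returns the clusters as sorted lists of item indices.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = n
    # Examine edges from shortest to longest.
    for d, u, v in sorted((d, u, v) for u, v, d in edges):
        if components == k:
            break                           # k connected components remain
        ru, rv = find(u), find(v)
        if ru != rv:                        # edge does not close a cycle
            parent[ru] = rv
            components -= 1

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())

# Hypothetical example: six points on a line, distance = absolute difference.
points = [0, 1, 2, 10, 11, 20]
pairs = [(i, j, abs(points[i] - points[j]))
         for i in range(len(points)) for j in range(i + 1, len(points))]
print(single_link_clusters(len(points), pairs, 3))
```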


3. Multi-machine scheduling problem

  The multi-machine scheduling problem asks for a job schedule such that n given jobs are processed by m machines in the shortest possible time.
  Each job can be processed on any machine, but once started, a job cannot be interrupted before it is completed, and jobs cannot be split into smaller sub-jobs.
  This problem is NP-complete, and no efficient algorithm for it is known so far. For problems of this type, a good approximation algorithm can sometimes be designed with a greedy selection strategy.
  For multi-machine scheduling, a good approximation algorithm is obtained from the greedy strategy of giving priority to the job with the longest processing time.
  According to this strategy, when n <= m, simply allocate the time interval [0, ti] of machine i to job i; the algorithm only requires O(1) time.
  When n > m, the n jobs are first sorted by required processing time from largest to smallest. Jobs are then assigned in this order to whichever machine becomes idle first. The computation time required by the algorithm is O(nlogn).

  For example, assume that 7 independent jobs {1, 2, 3, 4, 5, 6, 7} are processed by 3 machines M1, M2 and M3. The processing times required by the jobs are {2,14,4,16,6,5,3} respectively. The job schedule generated by the greedy algorithm is shown in the figure below, and the total processing time required is 17.
[Figure: job schedule on machines M1, M2 and M3]
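  The longest-processing-time-first strategy can be sketched in Python as follows. This is a minimal illustration, assuming jobs and machines are numbered from 0; the function name lpt_schedule is hypothetical, and a heap from the standard heapq module is used to find the machine that becomes idle first.

```python
import heapq

def lpt_schedule(times, m):
    """Greedy schedule: longest job first, onto the least loaded machine.

    times[j] is the processing time of job j. Returns (makespan, assignment),
    where assignment[i] lists the jobs given to machine i in order.
    """
    n = len(times)
    assignment = [[] for _ in range(m)]
    if n <= m:
        # Each job gets its own machine.
        for j in range(n):
            assignment[j].append(j)
        return max(times, default=0), assignment

    # Jobs sorted by processing time, largest first.
    order = sorted(range(n), key=lambda j: times[j], reverse=True)
    heap = [(0, i) for i in range(m)]       # (current load, machine index)
    for j in order:
        load, i = heapq.heappop(heap)       # machine that becomes idle first
        assignment[i].append(j)
        heapq.heappush(heap, (load + times[j], i))
    makespan = max(load for load, _ in heap)
    return makespan, assignment

# The example from the text: 7 jobs on 3 machines.
makespan, plan = lpt_schedule([2, 14, 4, 16, 6, 5, 3], 3)
print(makespan, plan)   # makespan is 17
```

  Sorting dominates the running time, giving O(nlogn) for sorting plus O(nlogm) for the heap operations. The schedule is an approximation and is not guaranteed to be optimal in general.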


4. Summary

  Greedy algorithms are usually used to find optimal solutions.
  A greedy algorithm always makes the locally optimal choice in the current situation and proceeds step by step, with the aim of obtaining a globally optimal solution.
  The current best choice is usually easy to find.
  The correctness of a greedy algorithm must be proved. Mathematical induction is generally used: the first step is usually obvious, and the induction step is usually proved by contradiction.
  A greedy strategy is shown to be incorrect by giving a counterexample.

Origin blog.csdn.net/m0_65748531/article/details/133443035