Greedy algorithms: minimum spanning trees with the Kruskal and Prim algorithms

Preface

When discussing graphs, we used DFS and BFS to build spanning trees. In a weighted undirected graph, however, the spanning tree produced by DFS or BFS generally does not have the minimum cost (the cost being the sum of the weights of the tree's edges).

Minimum spanning trees do have real-life applications, such as deciding where a water delivery station or a courier depot should be built in a city. Our predecessors designed algorithms to solve this practical problem, and the two introduced here, Kruskal's and Prim's, are named after their authors.

The logic of both algorithms is very simple. The key point is to understand why they work, that is, to verify their correctness. Also pay attention to the choice of data structures.

Kruskal algorithm

1) Problem description

We briefly introduced the spanning tree problem in the section on graphs, so I won't repeat it here.

Given a weighted undirected graph with n vertices, design an algorithm that selects (n-1) edges forming a minimum cost spanning tree.

2) Algorithm idea

Kruskal's algorithm uses the greedy idea, with the following greedy criterion.

Greedy criterion: select one edge at each step. From the remaining edges, pick the one with the least cost that does not form a cycle with the edges already selected, and add it to the selected edge set.

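/**
Kruskal
Find a minimum cost spanning tree in a network with n vertices
Let T be the set of selected edges, initially empty
Let E be the set of network edges
while(!E.empty() && |T| != n-1)
{
    Let (u,v) be a least-cost edge in E
    Remove edge (u,v) from E
    if((u,v) does not create a cycle in T)
        Add edge (u,v) to T
}
if(|T| == n-1)
    T is a minimum cost spanning tree
else
    The graph is not connected and has no spanning tree

*/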


3) The correctness of the algorithm

The correctness proof has two parts. The first shows that the algorithm produces a spanning tree whenever one exists, and that when the algorithm fails the graph is disconnected and has no spanning tree. The second shows that the tree produced has the minimum cost.

The first part:
First consider the case where the algorithm fails: it runs out of edges that can join the selected edge set without forming a cycle, and it ends with $|T| < n-1$. The edges in T then form a forest with at least two components. Any edge between two different components would have been accepted, so no such edge exists. The graph is therefore not connected, and no spanning tree exists.

If the graph is connected, we show that the returned edge set is a spanning tree. By the greedy criterion, adding any rejected edge to the final edge set T would form a cycle. In other words, in a connected graph the only edges not selected are edges that would close a cycle. A connected graph that contains a cycle is still connected after one edge of the cycle is removed, so T still connects all n vertices. A connected edge set without cycles that reaches every vertex is a spanning tree.

The second part :

I first read this part in "Data Structures, Algorithms, and Applications", and one reading was not enough. After several more readings I still did not understand it. I then opened the lecture slides and read the English several times, but without the teacher's explanation the English defeated me. Finally, I opened my Chinese edition of "Introduction to Algorithms".

In short, the explanation in "Introduction to Algorithms" is the clearest and most fundamental. "Data Structures, Algorithms, and Applications" does not prove the underlying theorems and relies on a verbal derivation, which I found really hard to follow.

I will try to present it concisely. Please correct me if I have misunderstood something.

The following argument is based on the figure below:
[Figure: a cut $(S, V-S)$ of the graph with three edges crossing it, one of which is $(u,v)$]

  • The minimum spanning tree is not necessarily unique. A weighted undirected graph may have several minimum spanning trees.
  • Let the set $A$ be a subset of the edges of some minimum spanning tree. $A$ is empty at the beginning.
  • If $A \cup \{(u,v)\}$ is also a subset of some minimum spanning tree, then $(u,v)$ can be added to $A$, and $(u,v)$ is called a safe edge for $A$.
  • A cut $(S, V-S)$ of an undirected graph $G=(V,E)$ is a partition of the vertex set $V$. If the two endpoints of an edge $(u,v)$ lie in different parts, as in the figure above, the edge $(u,v)$ crosses the cut. If no edge of $A$ crosses the cut, the cut is said to respect $A$. Among all edges crossing the cut, one with the smallest weight is called a light edge. In the figure above, if the weight of $(u,v)$ is the smallest among the three crossing edges, then $(u,v)$ is a light edge.
  • Theorem: let $A$ be a subset of some minimum spanning tree. If $(S, V-S)$ is a cut that respects $A$ and $(u,v)$ is a light edge crossing the cut, then $(u,v)$ is a safe edge for $A$, that is, $A \cup \{(u,v)\}$ is also a subset of some minimum spanning tree.
  • Proof of the theorem: let $T$ be a minimum spanning tree containing $A$. If $(u,v) \in T$ we are done. Otherwise the path from $u$ to $v$ in $T$ must cross the cut at some edge $(x,y)$, and $(x,y) \notin A$ because the cut respects $A$. Replacing $(x,y)$ by $(u,v)$ gives another spanning tree $T'$ with $w(T') = w(T) - w(x,y) + w(u,v) \le w(T)$, since $(u,v)$ is a light edge. So $T'$ is also a minimum spanning tree, and it contains $A \cup \{(u,v)\}$.
  • At any time during the algorithm, $A$ is a forest, and each connected component of $A$ is a tree (possibly a single vertex).
  • To merge the forest $A$ into a single tree, $|V|-1$ safe edges must be added. Each edge added by Kruskal's algorithm is a safe edge: when $(u,v)$ is selected, let $S$ be the vertex set of the tree in $A$ that contains $u$. This cut respects $A$, and because edges are examined in non-decreasing order of cost, $(u,v)$ is a light edge crossing it.

4) Data structure selection and time complexity

  • Since edges must be selected in non-decreasing order of cost, we use a min heap. If the graph has e edges, initializing the heap takes $O(e)$ and each extraction of an edge takes $O(\log e)$.
  • To decide whether the edge $(u,v)$ would create a cycle, we check whether u and v belong to the same vertex set, which calls for union-find. With union by rank plus path compression, a graph with e edges and n vertices needs at most 2e find() operations and (n-1) unite() operations. Their total cost is $O((n+e)\,\alpha(n))$, where $\alpha$ is the inverse Ackermann function, so it is $O(n+e)$ for all practical purposes.
  • Finally, storing the n-1 selected edges takes $O(n)$ time.
  • The total time complexity is $O(n + e\log e)$.
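
The union-find structure mentioned above can be sketched as follows. This is a minimal illustration of union by rank with path compression, not the UnionFind class used in the listing in this article; the class name DisjointSets and the 0-based vertex numbering are assumptions made for the example.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical disjoint-set class with 0-based elements.
class DisjointSets {
public:
    explicit DisjointSets(int n) : parent_(n), rank_(n, 0) {
        std::iota(parent_.begin(), parent_.end(), 0);  // every element is its own root
    }

    // Return the root of x's set and compress the path that was walked.
    int find(int x) {
        assert(x >= 0 && x < static_cast<int>(parent_.size()));
        int root = x;
        while (parent_[root] != root) root = parent_[root];
        while (parent_[x] != root) {   // second pass: point each node at the root
            int next = parent_[x];
            parent_[x] = root;
            x = next;
        }
        return root;
    }

    // Merge the sets containing a and b; return false if they were already one set.
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        if (rank_[a] < rank_[b]) std::swap(a, b);  // hang the lower-rank root under the other
        parent_[b] = a;
        if (rank_[a] == rank_[b]) ++rank_[a];
        return true;
    }

private:
    std::vector<int> parent_;
    std::vector<int> rank_;
};
```

In Kruskal's algorithm, an edge (u, v) closes a cycle exactly when find(u) and find(v) return the same root, so unite() returning false can serve as the rejection test.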

5) C++ implementation

/**
Kruskal
The weightedEdge<T> class defines a type conversion to the data type T
that returns the weight of the edge.
*/

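bool kruskal(weightedEdge<T>* spanningTreeEdges)
{
    // Assumed to be a member function of the weighted graph class.
    // On success, spanningTreeEdges[0 : n-2] holds the selected edges.
    int n = numberOfVertices();
    int e = numberOfEdges();

    // Collect all edges of the graph into edge[1 : e]
    weightedEdge<T>* edge = new weightedEdge<T>[e + 1];
    int k = 0; // cursor into edge[]
    for (int i = 1; i <= n; i++)
    {
        // Assumes iterator(i) is the graph's member function that returns a
        // heap-allocated iterator over the vertices adjacent to vertex i
        vertexIterator<T>* ii = iterator(i);
        int j;
        T w;
        while ((j = ii->next(w)) != 0) // next() stores the edge weight in w
        {
            if (i < j) // record each undirected edge only once
            {
                edge[++k] = weightedEdge<T>(i, j, w);
            }
        }
        delete ii;
    }

    // Initialize the min heap with the e edges
    minHeap<weightedEdge<T>> heap(1);
    heap.initialize(edge, e);

    // Union-find structure over the n vertices
    UnionFind uf(n);

    k = 0; // now counts the selected edges
    while (e > 0 && k < n - 1)
    {
        // Extract the cheapest remaining edge from the min heap
        weightedEdge<T> x = heap.top();
        heap.pop();
        e--;
        int a = uf.find(x.vertex1());
        int b = uf.find(x.vertex2());
        if (a != b) // endpoints lie in different sets, so no cycle is formed
        {
            spanningTreeEdges[k++] = x;
            uf.unite(a, b);
        }
    }
    return (k == n - 1);
}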
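
The listing above depends on the textbook's graph, heap, and union-find classes. For experimenting without them, here is a self-contained sketch of the same idea that uses only the standard library. It swaps the min heap for a single std::sort and uses a bare union-find with path halving and no ranks; the names Edge and kruskalSketch and the 0-based vertex numbering are assumptions of this example.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Edge { int u, v; long long w; };  // hypothetical edge type, 0-based endpoints

// Returns the selected edges. The result has n - 1 edges exactly when
// the graph is connected; otherwise it is a minimum spanning forest.
std::vector<Edge> kruskalSketch(int n, std::vector<Edge> edges) {
    assert(n >= 0);
    // Sorting once replaces repeated extraction from a min heap.
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.w < b.w; });

    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);
    auto findRoot = [&parent](int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    };

    std::vector<Edge> tree;
    for (const Edge& e : edges) {
        if (static_cast<int>(tree.size()) == n - 1) break;  // tree is complete
        int a = findRoot(e.u);
        int b = findRoot(e.v);
        if (a != b) {        // endpoints in different components: no cycle
            parent[a] = b;
            tree.push_back(e);
        }
    }
    return tree;
}
```

Calling kruskalSketch(4, edges) on a 4-vertex graph and comparing the size of the result with 3 plays the role of the bool return value in the listing above.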

Prim algorithm

1) Problem description

Same as Kruskal.

2) Algorithm idea

Prim's algorithm also uses the greedy idea, with the following greedy criterion.

Greedy criterion: from the remaining edges, select a least-cost edge whose addition keeps the selected edge set a tree, and add it to the selected edge set.

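/**
// Assume the network has at least one vertex
Let T be the set of selected edges, initially empty
Let TV be the set of vertices already in the tree; start from an arbitrary vertex, say TV = {1}
Let E be the set of network edges

while(!E.empty() && |T| != n-1)
{
	Let (u,v) be a least-cost edge with u ∈ TV and v ∉ TV
	if(there is no such edge)
		break;
	Remove edge (u,v) from E
	Add edge (u,v) to T
	Add vertex v to TV
}
if(|T| == n-1)
	T is a minimum cost spanning tree
else
	The graph is not connected and has no spanning tree
*/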
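
The pseudocode above can be turned into a short heap-based sketch with std::priority_queue. This is one possible reading, often called "lazy" Prim: edges stay in the heap after their far endpoint has joined the tree and are skipped when popped. The names Adjacency and primHeapSketch, the 0-based vertices, and the choice of vertex 0 as the start are assumptions of this example.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Item = std::pair<long long, int>;            // (edge cost, vertex)
using Adjacency = std::vector<std::vector<Item>>;  // adj[u] = list of (cost, neighbour)

// Stores the cost of a minimum spanning tree in totalCost and returns true,
// or returns false when the graph is not connected.
bool primHeapSketch(const Adjacency& adj, long long& totalCost) {
    const int n = static_cast<int>(adj.size());
    assert(n > 0);                        // the pseudocode assumes at least one vertex
    std::vector<bool> inTree(n, false);   // membership in TV
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;

    totalCost = 0;
    int added = 0;
    heap.push(Item(0, 0));                // start from vertex 0 at cost 0
    while (!heap.empty() && added < n) {
        const Item top = heap.top();
        heap.pop();
        const int v = top.second;
        if (inTree[v]) continue;          // stale entry: v joined TV earlier
        inTree[v] = true;
        totalCost += top.first;
        ++added;
        for (const Item& edge : adj[v])   // offer every edge leaving the tree through v
            if (!inTree[edge.second]) heap.push(edge);
    }
    return added == n;
}
```

Each edge is pushed at most twice, so this version runs in O(e log e) time, which suits sparse graphs.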

3) The correctness of the algorithm

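As before, we prove that ① the algorithm produces a spanning tree, and ② the tree produced is a minimum spanning tree.

The first part: similar to Kruskal.
The second part: same as Kruskal. The edges selected by the greedy criterion are also safe edges, because each selected edge is a light edge crossing the cut (TV, V − TV), and that cut respects T.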

4) Data structure selection and time complexity of the algorithm

  • Because each step must select a least-cost edge that keeps the selected edges a tree, a min heap can again be used to maintain the candidates.
  • The order in which the vertices join the tree can be stored in an array.
  • We also need to store, for each vertex v not yet in the tree, its predecessor (the tree vertex nearest to v) and the weight of the cheapest edge between v and the tree. This takes two arrays.
  • Each time a vertex joins the tree, all of its adjacent vertices must be examined. Over all n vertices this takes $O(n^2)$ time with an adjacency matrix.
  • When the next vertex is found by scanning these arrays, the total time complexity of the algorithm is $O(n^2)$.

5) Pseudocode with data structures

[Figure: pseudocode for Prim's algorithm with its data structures]
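
As a stand-in for the pseudocode, here is a hedged C++ sketch of the array-based version described in section 4. It keeps two arrays per vertex, the predecessor and the cheapest edge into the tree, and finds the next vertex by a linear scan, which gives the O(n²) bound. The names primArraySketch, kNoEdge, best, and parent, the adjacency-matrix input, and the 0-based start vertex are assumptions of this example, not taken from the original figure.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// cost[u][v] is the edge weight, or kNoEdge when u and v are not adjacent.
constexpr long long kNoEdge = std::numeric_limits<long long>::max();

// Fills parent[v] with the tree vertex that v is attached to (-1 for the
// start vertex) and returns true when a spanning tree exists.
bool primArraySketch(const std::vector<std::vector<long long>>& cost,
                     std::vector<int>& parent) {
    const int n = static_cast<int>(cost.size());
    assert(n > 0);
    std::vector<bool> inTree(n, false);
    std::vector<long long> best(n, kNoEdge);  // cheapest known edge from v into the tree
    parent.assign(n, -1);
    best[0] = 0;                              // start from vertex 0

    for (int step = 0; step < n; ++step) {
        int v = -1;
        for (int i = 0; i < n; ++i)           // O(n) scan for the nearest outside vertex
            if (!inTree[i] && best[i] != kNoEdge && (v == -1 || best[i] < best[v]))
                v = i;
        if (v == -1) return false;            // no edge leaves the tree: not connected
        inTree[v] = true;
        for (int u = 0; u < n; ++u)           // O(n) update of the two arrays
            if (!inTree[u] && cost[v][u] < best[u]) {
                best[u] = cost[v][u];
                parent[u] = v;
            }
    }
    return true;
}
```

After a successful call, the tree edges are the pairs (parent[v], v) for every vertex v other than the start vertex.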


Origin blog.csdn.net/qq_41882686/article/details/107907338