DSAA Graph Theory: Kruskal's Algorithm (Part 6)

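1. Review

Kruskal's algorithm relies on the disjoint set (union-find) structure, so before reading this post it is worth reviewing "DSAA: The Disjoint Set ADT (Part 1)" and "DSAA: The Disjoint Set ADT (Part 2)". Both the concept and the implementation of disjoint sets are fairly simple.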

2. Kruskal’s Algorithm 　　

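A second greedy strategy is continually to select the edges in order of smallest weight and accept an edge if it does not cause a cycle. If we treat connectivity as an equivalence relation, an edge between two vertices that are not yet connected cannot close a cycle. In other words, an edge is accepted when its two endpoints are not yet connected; if they are already connected, adding the edge necessarily creates a cycle.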
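The cycle test described above maps directly onto disjoint-set operations. Below is a minimal sketch in C, assuming vertices are numbered 0 to n-1; the names `DisjSet`, `ds_create`, `ds_find`, `ds_union`, and `ds_free` are made up for illustration and are not from DSAA. It uses union by size plus path compression, which is one common choice rather than the only one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal disjoint set: parent[i] < 0 marks a root
   and stores the negated size of that root's tree. */
typedef struct {
    int *parent;
    int n;
} DisjSet;

/* Every vertex starts as its own single-node tree. */
DisjSet ds_create(int n) {
    DisjSet s;
    s.n = n;
    s.parent = malloc((size_t)n * sizeof *s.parent);
    assert(s.parent != NULL);
    for (int i = 0; i < n; i++)
        s.parent[i] = -1;
    return s;
}

/* Return the root of x's set, compressing the path on the way back. */
int ds_find(DisjSet *s, int x) {
    if (s->parent[x] < 0)
        return x;
    return s->parent[x] = ds_find(s, s->parent[x]);
}

/* Merge the sets containing u and v.
   Returns 1 if they were merged, 0 if u and v were already connected
   (i.e. the edge (u, v) would close a cycle). */
int ds_union(DisjSet *s, int u, int v) {
    int ru = ds_find(s, u);
    int rv = ds_find(s, v);
    if (ru == rv)
        return 0;
    if (s->parent[ru] > s->parent[rv]) { /* ru is the smaller tree: swap */
        int tmp = ru;
        ru = rv;
        rv = tmp;
    }
    s->parent[ru] += s->parent[rv]; /* sizes add up (both are negative) */
    s->parent[rv] = ru;             /* smaller tree hangs under larger */
    return 1;
}

void ds_free(DisjSet *s) {
    free(s->parent);
    s->parent = NULL;
}
```

With this sketch, a return value of 0 from `ds_union` corresponds to rejecting an edge, and 1 corresponds to accepting it.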

Formally, Kruskal’s algorithm maintains a forest – a collection of trees.

Initially, there are |V| single-node trees. Adding an edge merges two trees into one. When the algorithm terminates, there is only one tree, and this is the minimum spanning tree.

The algorithm terminates when enough edges are accepted. It turns out to be simple to decide whether edge (u,v) should be accepted or rejected.

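The edges could be sorted to facilitate the selection, but building a heap in linear time is a much better idea. Then delete_mins give the edges to be tested in order. Typically, only a small fraction of the edges needs to be tested before the algorithm can terminate, although it is always possible that all the edges must be tried. This deserves a moment's thought: a minimum spanning tree has $|V|-1$ edges, while a connected undirected graph has $|E| \ge |V|-1$ edges, so in the worst case (for example, when the heaviest edge is the only one reaching some vertex) the algorithm has to examine every edge before it terminates.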
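To make the heap-based selection concrete, here is a small sketch of a binary min-heap of edges in C. The `Edge` struct and the 0-based array layout are assumptions of this sketch (DSAA's heaps are 1-based with a sentinel); `build_heap` and `delete_min` only borrow their names from the pseudocode.

```c
#include <stddef.h>

/* Hypothetical edge record: endpoints plus an integer weight. */
typedef struct {
    int u, v, weight;
} Edge;

/* Restore heap order below index i in a 0-based min-heap of n edges. */
static void percolate_down(Edge *h, size_t n, size_t i) {
    Edge tmp = h[i];
    for (;;) {
        size_t child = 2 * i + 1;
        if (child >= n)
            break;
        /* pick the lighter of the two children */
        if (child + 1 < n && h[child + 1].weight < h[child].weight)
            child++;
        if (h[child].weight >= tmp.weight)
            break;
        h[i] = h[child];
        i = child;
    }
    h[i] = tmp;
}

/* Linear-time construction: percolate down from the last internal node. */
void build_heap(Edge *h, size_t n) {
    for (size_t i = n / 2; i-- > 0; )
        percolate_down(h, n, i);
}

/* Remove and return the lightest edge; *n shrinks by one.
   The caller must ensure *n > 0. */
Edge delete_min(Edge *h, size_t *n) {
    Edge min = h[0];
    h[0] = h[--*n];
    if (*n > 0)
        percolate_down(h, *n, 0);
    return min;
}
```

The point of the heap is that only as many `delete_min` calls are paid for as edges are actually tested, whereas a full sort pays for ordering every edge up front.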

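When I first studied disjoint sets, I honestly had no idea where they would be used; now they show up in the minimum spanning tree problem. The idea behind Kruskal's algorithm is not complicated, but without the disjoint set structure it would be hard to implement.
To summarize the above: at each step Kruskal's algorithm takes the remaining edge of smallest weight. If the edge's two endpoints already belong to the same tree (set), the edge is discarded. Otherwise the sets containing the two endpoints are merged, which amounts to joining two trees into one. After $|V|-1$ merges (which does not mean delete_min was called only that many times), S (the disjoint set) contains a single tree, and that tree is the minimum spanning tree.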

3. Pseudocode

DSAA gives the following pseudocode:

void kruskal( graph G ){
    unsigned int edges_accepted;
    DISJ_SET S;
    PRIORITY_QUEUE H;
    vertex u, v;
    set_type u_set, v_set;
    edge e;
    // every vertex starts as its own set (each set is a single-node tree)
    initialize( S );
    // as noted in an earlier post, a heap can be built in linear time
    read_graph_into_heap_array( G, H );
    // this step is O(|E|); see the earlier heap post for the detailed analysis
    build_heap( H );
    // number of edges accepted into the spanning tree so far
    edges_accepted = 0;
    while( edges_accepted < NUM_VERTEX-1 ){
        // the lightest remaining edge
        // note that the heap shrinks with every call
        e = delete_min( H ); /* e = (u, v) */
        u_set = find( u, S );
        v_set = find( v, S );
        if( u_set != v_set ){
            /* accept the edge */
            edges_accepted++;
            set_union( S, u_set, v_set );
        }
    }
}
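For comparison, here is a compact runnable sketch of the same loop in C. It is not the book's code: for brevity it sorts the edges with `qsort` instead of using a heap, and it inlines a bare-bones disjoint set (path halving, no union by size). The `Edge` struct, `cmp_edge`, `find_root`, and the `kruskal` signature are all assumptions of this sketch; vertices are assumed to be numbered 0 to num_vertices-1, with num_vertices at least 1.

```c
#include <stdlib.h>

/* Hypothetical edge record: endpoints plus an integer weight. */
typedef struct {
    int u, v, weight;
} Edge;

/* qsort comparator: ascending by weight, written to avoid overflow. */
static int cmp_edge(const void *a, const void *b) {
    int wa = ((const Edge *)a)->weight;
    int wb = ((const Edge *)b)->weight;
    return (wa > wb) - (wa < wb);
}

/* Find the root of x's set, halving the path along the way. */
static int find_root(int *parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Sorts `edges` in place and writes the accepted edges to `mst`, which
   must have room for num_vertices - 1 entries.
   Returns the number of accepted edges (num_vertices - 1 exactly when
   the graph is connected), or -1 if allocation fails. */
int kruskal(Edge *edges, size_t num_edges, int num_vertices, Edge *mst) {
    int *parent = malloc((size_t)num_vertices * sizeof *parent);
    if (parent == NULL)
        return -1;
    for (int i = 0; i < num_vertices; i++)
        parent[i] = i; /* every vertex is its own tree */

    qsort(edges, num_edges, sizeof *edges, cmp_edge);

    int accepted = 0;
    for (size_t i = 0; i < num_edges && accepted < num_vertices - 1; i++) {
        int ru = find_root(parent, edges[i].u);
        int rv = find_root(parent, edges[i].v);
        if (ru != rv) {            /* endpoints in different trees */
            parent[ru] = rv;       /* merge the two trees */
            mst[accepted++] = edges[i];
        }                          /* otherwise the edge closes a cycle */
    }
    free(parent);
    return accepted;
}
```

Unlike the pseudocode, this loop also stops when the edges run out, so a disconnected input yields a spanning forest with fewer than num_vertices - 1 edges instead of calling delete_min on an empty heap.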

4. Time Complexity

The worst-case running time of this algorithm is $O(|E| \log |E|)$, which is dominated by the heap operations. Notice that since $|E| = O(|V|^2)$, this running time is actually $O(|E| \log |V|)$.

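The book does not spell out the reasoning, but it is easy to derive. Assume the disjoint set uses union by size or height, so that a union takes $O(1)$ and a find (ignoring path compression) takes $O(\log |V|)$; build_heap takes $O(|E|)$ and each delete_min takes $O(\log |E|)$.
In the worst case all $|E|$ edges are removed from the heap, with two finds per edge, so Kruskal's algorithm runs in $O(|V| + |E| + |E| \log |E| + 2|E| \log |V|) = O(|E| \log |E|)$. Because $|E| \le |V|^2$ gives $\log |E| \le 2 \log |V|$, this is also $O(|E| \log |V|)$. So far the graph posts have relied on pseudocode, which is admittedly a bit hand-wavy. Before moving on to DFS, I will briefly discuss how to represent a graph in code and work through a complete implementation of Kruskal's algorithm (since it involves both disjoint sets and a graph). Once the DFS posts are done, I will pick a few LeetCode problems to show that some special graphs can be represented more concisely.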