Algorithm-part3-week2

MST Review
Input: Undirected graph G = (V, E), edge costs c_e.
Output: Min-cost spanning tree (no cycles, connected).
Assumptions: G is connected, edge costs are distinct.
Cut Property: If e is the cheapest edge crossing some cut (A, B), then e belongs to the MST.
Kruskal’s MST Algorithm
Sort edges in order of increasing cost.
[Rename edges 1, 2, …, m so that c_1 < c_2 < … < c_m.]
T = ∅
For i = 1 to m:  [O(m) iterations]
    If T ∪ {i} has no cycles:  [O(n) time to check, using BFS or DFS in the graph (V, T), which contains at most n − 1 edges]
        Add i to T.
Return T.
Running time of straightforward implementation:
[m = # of edges, n = # of vertices] O(m log n) + O(mn) = O(mn).
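For concreteness, a minimal sketch of this straightforward implementation in Python (the function name and the input format, a list of (cost, u, v) tuples over vertices 0..n−1, are my own assumptions, not from the lecture):

```python
from collections import defaultdict

def kruskal_naive(n, edges):
    """Straightforward Kruskal: O(m log n) sort + O(n) DFS cycle check per edge."""
    T = []                              # chosen edges
    adj = defaultdict(list)             # adjacency list of the graph (V, T)
    for cost, u, v in sorted(edges):    # process edges in increasing cost order
        # DFS from u in (V, T): if v is reachable, adding (u, v) creates a cycle.
        stack, seen = [u], {u}
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if v not in seen:               # u, v in different components: safe to add
            T.append((u, v, cost))
            adj[u].append(v)
            adj[v].append(u)
    return T
```

The per-edge DFS is what costs O(n) each time, giving the O(mn) total above.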
Plan: A data structure for O(1)-time cycle checks ⇒ O(m log n) time.

The Union-Find Data Structure
Maintains a partition of a set of objects.
FIND(x): Return the name of the group that x belongs to.
UNION(C_i, C_j): Fuse groups C_i and C_j into a single one.
In Kruskal's algorithm:
Objects = vertices; groups = connected components w.r.t. the chosen edges T.
Adding a new edge (u, v) to T fuses two components into one (a UNION).
Motivation: O(1)-time cycle checks in Kruskal’s algorithm.
Idea #1: Maintain one linked structure per connected component of (V, T).
Each component has an arbitrary leader vertex.
Invariant: Each vertex points to the leader of its component.
Key point: Given edge (u, v), can check whether u and v are already in the same component in O(1) time
[if and only if the leader pointers of u and v match, i.e., FIND(u) = FIND(v)] ⇒ O(1)-time cycle checks!
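A sketch of Idea #1 in Python, with my own names, using the "relabel the smaller group" rule that the running-time analysis below relies on:

```python
class UnionFind:
    """Leader-pointer union-find with union-by-size (a sketch; names are mine)."""
    def __init__(self, n):
        self.leader = list(range(n))            # each object starts as its own leader
        self.members = [[i] for i in range(n)]  # objects grouped under each leader

    def find(self, x):
        return self.leader[x]                   # O(1): a single pointer lookup

    def union(self, i, j):
        a, b = self.find(i), self.find(j)
        if a == b:
            return
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a                         # relabel the smaller group
        for x in self.members[b]:
            self.leader[x] = a                  # rewrite each leader pointer
        self.members[a].extend(self.members[b])
        self.members[b] = []
```

Each time a vertex's leader pointer is rewritten, the size of its group at least doubles, so any one vertex is updated at most log2(n) times; this is where the O(n log n) bound below comes from.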

Running Time of the Fast Implementation
Scorecard:
O(m log n) time for sorting.
O(m) time for cycle checks [O(1) per check].
O(n log n) time overall for leader-pointer updates.

⇒ O(m log n) time total [matching Prim's algorithm].
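Putting the pieces together, a sketch of the fast implementation (same assumed (cost, u, v) edge format as before, reusing the UnionFind sketch above):

```python
def kruskal_fast(n, edges):
    """Kruskal with union-find: O(m log n) sort dominates the running time."""
    uf = UnionFind(n)                    # assumed class from the sketch above
    T = []
    for cost, u, v in sorted(edges):     # O(m log n) sort
        if uf.find(u) != uf.find(v):     # O(1) cycle check via leader pointers
            T.append((u, v, cost))
            uf.union(u, v)               # fuse the two components
    return T
```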

Clustering [unsupervised learning]
Informal goal: Given n "points" [Web pages, images, genome fragments, etc.], classify them into "coherent groups".
Assumptions:
(1) As input, given a (dis)similarity measure: a distance d(p, q) between each pair of points.
(2) Symmetric [d(p, q) = d(q, p)].
Examples: Euclidean distance, genome similarity, etc.
Goal: points in the same cluster should be "nearby".
Max-Spacing k-Clusterings
Assume: We know k := # of clusters desired.
[In practice, can experiment with a range of values]
Call points p & q separated if they're assigned to different clusters.
Definition: The spacing of a k-clustering is min_{separated p, q} d(p, q).
Problem statement: Given a distance measure d and k, compute the k-clustering with maximum spacing.
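As a concrete reading of the definition, a small helper (hypothetical, not from the lecture) that computes the spacing of a given clustering by brute force:

```python
def spacing(points, labels, d):
    """Spacing of a clustering: min d(p, q) over all separated pairs p, q.
    labels[i] is the cluster label of points[i]; d is the distance function."""
    n = len(points)
    return min(d(points[i], points[j])
               for i in range(n) for j in range(i + 1, n)
               if labels[i] != labels[j])       # only separated pairs count
```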
A Greedy Algorithm
Initially, each point is in a separate cluster.
Repeat until only k clusters remain:
    Let p, q = the closest pair of separated points.
    (This pair determines the current spacing.)
    Merge the clusters containing p and q into a single cluster.
Note: Just like Kruskal's MST algorithm, but stopped early.
Points ⟺ vertices, distances ⟺ edge costs, point pairs ⟺ edges.
Called single-link clustering.
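A sketch of the greedy algorithm as Kruskal-stopped-early, reusing the UnionFind sketch from above (the function name and return format are my own assumptions):

```python
def single_link_clustering(points, d, k):
    """Merge the closest separated pair until k clusters remain.
    Returns (max spacing, cluster label of each point)."""
    n = len(points)
    # All point pairs play the role of edges, sorted by distance (O(n^2) pairs).
    pairs = sorted((d(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    uf = UnionFind(n)                    # assumed class from the sketch above
    clusters = n
    for dist, i, j in pairs:
        if uf.find(i) != uf.find(j):     # points i and j are separated
            if clusters == k:            # stopped early: next merge distance
                return dist, [uf.find(x) for x in range(n)]
            uf.union(i, j)               # merge the two clusters
            clusters -= 1
```

The distance of the first separated pair found after reaching k clusters is exactly the spacing, since pairs are scanned in increasing order of distance.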


Reposted from blog.csdn.net/qq_31805127/article/details/80492809