Algorithm - union-find algorithm

Problem: The input is a sequence of integer pairs. Each integer represents an object of some type, and a pair p q is read as "p is connected to q".

When the program reads a pair p q from the input, it checks whether the pairs seen so far already imply that p and q are connected. If they do not, the program writes the pair to the output. If they do, the program ignores the pair and moves on to the next one. This is known as the dynamic connectivity problem.

One application: the integers may represent computers in a large network, and the pairs represent connections between them. The program then tells us whether p and q can already communicate through existing connections, or whether a new connection between them is needed.
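To make the required input/output behaviour concrete, here is a deliberately naive sketch that is not one of the three algorithms discussed below. It stores every printed pair as an edge of an undirected graph and answers each new pair with a depth-first search. The class and method names (`BruteForceConnectivity`, `newConnections`) are hypothetical. Each query can cost time proportional to the number of edges, which is exactly the cost the union-find implementations are designed to avoid.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference solution for the dynamic connectivity problem:
// keeps the pairs printed so far as an undirected graph and answers each
// new pair with an iterative depth-first search.
class BruteForceConnectivity {
    // Returns the pairs NOT implied by earlier pairs, i.e. exactly the
    // pairs the program described in the problem statement would print.
    static List<int[]> newConnections(int n, int[][] pairs) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        List<int[]> output = new ArrayList<>();
        for (int[] pair : pairs) {
            int p = pair[0], q = pair[1];
            if (reachable(adj, p, q)) continue;   // already connected: ignore the pair
            adj.get(p).add(q);                    // remember the new connection
            adj.get(q).add(p);
            output.add(pair);                     // "write the pair to the output"
        }
        return output;
    }

    // True if q can be reached from p through the edges stored so far.
    private static boolean reachable(List<List<Integer>> adj, int p, int q) {
        boolean[] seen = new boolean[adj.size()];
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        stack.push(p);
        seen[p] = true;
        while (!stack.isEmpty()) {
            int v = stack.pop();
            if (v == q) return true;
            for (int w : adj.get(v)) {
                if (!seen[w]) {
                    seen[w] = true;
                    stack.push(w);
                }
            }
        }
        return false;
    }
}
```

For the eleven pairs 4 3, 3 8, 6 5, 9 4, 2 1, 8 9, 5 0, 7 2, 6 1, 1 0, 6 7 on ten objects, this returns eight pairs. The pairs 8 9, 1 0 and 6 7 are dropped because earlier pairs already connect them.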

API for union-find algorithm

```
public class UF
               UF(int N)                 initialize N contacts with integer names (0 to N-1)
          void union(int p, int q)       add a connection between p and q
           int find(int p)               component identifier for p (0 to N-1)
       boolean connected(int p, int q)   return true if p and q are in the same component
           int count()                   number of components
```

We will discuss three implementations. All of them use an array id[], indexed by contact, to decide whether two contacts are in the same connected component.

1. quick-find algorithm

Quick-find maintains the invariant that p and q are connected if and only if id[p] equals id[q]. In other words, all contacts in the same connected component have the same value in id[]. The algorithm is as follows:

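```java
// Quick-find methods; id[] and count are fields of the enclosing UF class
// (as in the complete WeightedQuickUnionUF class further down).
public int find(int p){
	return id[p];
}
public void union(int p,int q) {
	// Merge p and q into the same component
	int pID=find(p);
	int qID=find(q);

	// Nothing to do if p and q are already in the same component
	if(pID==qID) return;

	// Rename p's component to q's name
	for(int i=0;i<id.length;i++)
		if(id[i]==pID)id[i]=qID;
	count--;
}
```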

Analysis: find() is clearly fast, because it accesses the id[] array only once. However, quick-find generally cannot handle large problems, because every call to union() must scan the entire id[] array.

Each call to find() accesses the array once, while each call to union() that merges two components accesses it between N+3 and 2N+1 times: two accesses for the two calls to find(), N reads in the loop, and between 1 and N-1 writes.
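The N+3 and 2N+1 figures can be checked by instrumenting quick-find. The sketch below is a hypothetical `CountingQuickFindUF` (the class name and the `accesses` counter are assumptions added for illustration). It repeats the find()/union() logic of the quick-find code above, without the count bookkeeping, and increments a counter on every read or write of id[]. The number of writes equals the size of p's component, which is what moves the total between the two bounds.

```java
// Hypothetical instrumented quick-find: same find()/union() logic as the
// quick-find code above (component count omitted), but every read or write
// of id[] increments `accesses`.
class CountingQuickFindUF {
    private final int[] id;
    long accesses = 0;   // running count of id[] reads and writes

    CountingQuickFindUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;   // initialization is not counted
    }

    int find(int p) {
        accesses++;          // one read: id[p]
        return id[p];
    }

    void union(int p, int q) {
        int pID = find(p);   // 1 access
        int qID = find(q);   // 1 access
        if (pID == qID) return;
        for (int i = 0; i < id.length; i++) {
            accesses++;                  // read id[i]
            if (id[i] == pID) {
                id[i] = qID;
                accesses++;              // write id[i]
            }
        }
    }
}
```

With N = 10, merging two single-contact components costs 13 accesses (N+3), while renaming a nine-contact component into the remaining contact costs 21 (2N+1).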

Suppose we use quick-find to solve the dynamic connectivity problem and end up with a single connected component. Then union() must be called at least N-1 times, for at least (N+3)(N-1) ~ N² array accesses. So for typical applications, which end up with only a few connected components, quick-find's running time is quadratic.
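Spelling out the arithmetic behind that estimate: each of the at least N-1 merging calls to union() costs at least N+3 array accesses, so the total is at least

$$
(N-1)(N+3) = N^2 + 2N - 3 \sim N^2 .
$$

The lower-order terms 2N - 3 become negligible as N grows, which is why the cost is described as quadratic.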

2. quick-union algorithm

This algorithm focuses on speeding up union(), so it is complementary to quick-find.

It uses the same id[] array, but interprets it differently: the id[] entry of each contact is the name of another contact in the same component (possibly itself). We call this connection a link.

To implement find(), we start at the given contact, follow its link to a second contact, follow that contact's link to a third, and so on, until we reach a root contact, that is, a contact whose link points to itself. Two contacts are in the same connected component if and only if following links from each of them leads to the same root contact.
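The link-following step can be sketched on its own. The helper below (`LinkTrace.pathToRoot`, a hypothetical name) takes an id[] array in the quick-union interpretation and records every contact visited from p until it reaches a root.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (hypothetical): follows the links stored in id[]
// starting at contact p and records every contact visited, ending at the root.
class LinkTrace {
    static List<Integer> pathToRoot(int[] id, int p) {
        List<Integer> path = new ArrayList<>();
        path.add(p);
        while (p != id[p]) {   // a root is a contact that links to itself
            p = id[p];
            path.add(p);
        }
        return path;
    }
}
```

With id = {1, 1, 1, 8, 3, 0, 5, 1, 8, 8}, the path from 6 is 6, 5, 0, 1 and the path from 4 is 4, 3, 8. The roots (1 and 8) differ, so contacts 6 and 4 are in different components.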

Quick-union is easiest to understand in terms of a forest: each component is a tree whose root node is the root contact. The algorithm is as follows:

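```java
// Quick-union methods; id[] and count are fields of the enclosing UF class
// (as in the complete WeightedQuickUnionUF class further down).
public int find(int p){
	// Find the component name, i.e. the name of the root contact
	while(p!=id[p])p=id[p];	// while the stored link is not the contact itself, follow it
	return p;
}
public void union(int p,int q) {
	// Merge p and q into the same component by giving them the same root
	int pRoot=find(p);
	int qRoot=find(q);

	// Nothing to do if p and q are already in the same component
	if(pRoot==qRoot) return;

	// Point p's root at q's root
	id[pRoot]=qRoot;

	count--;
}
```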

Analysis: quick-union appears to be faster than quick-find, but its cost is harder to analyze because it depends on the input. In the best case, find() needs only one array access to get the component identifier of a contact; in the worst case it needs 2N+1 array accesses. It follows that, for best-case input, the running time for solving the dynamic connectivity problem is linear (on the order of N), while in the worst case it is quadratic (on the order of N(2N+1)).
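A worst case is easy to construct: the input 0-1, 0-2, ..., 0-(N-1). With the union() rule above, each call hangs the whole existing tree under a new single-contact root, producing a chain of height N-1. The hypothetical `QuickUnionChainDemo` below reuses the quick-union rules (count omitted) and adds a `depth()` helper for measurement. As written, the find() loop reads id[] twice per link followed plus once for the final test, so find(0) at depth N-1 costs 2(N-1)+1 = 2N-1 reads, within the conservative 2N+1 figure above. Because each union(0, i) first walks the whole chain built so far, the total work over the N-1 unions grows quadratically.

```java
// Hypothetical experiment: plain quick-union (same find()/union() rules as
// the code above, component count omitted) fed the input 0-1, 0-2, ..., 0-(N-1).
class QuickUnionChainDemo {
    private final int[] id;

    QuickUnionChainDemo(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
    }

    int find(int p) {
        while (p != id[p]) p = id[p];
        return p;
    }

    void union(int p, int q) {
        int pRoot = find(p);
        int qRoot = find(q);
        if (pRoot == qRoot) return;
        id[pRoot] = qRoot;   // p's root now points at q's root
    }

    // Measurement helper: number of links followed from p to its root.
    int depth(int p) {
        int d = 0;
        while (p != id[p]) {
            p = id[p];
            d++;
        }
        return d;
    }
}
```

For N = 10, after union(0, 1) through union(0, 9), contact 0 sits at depth 9 below root 9.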

3. Weighted quick-union algorithm

Quick-union can be seen as an improvement over quick-find, because it removes quick-find's main problem: union() always takes linear time, since it must traverse the entire array to change the values belonging to one connected component.

But quick-union still has a problem: we cannot guarantee that it is substantially faster than quick-find in every case. Because each find() must climb its tree level by level up to the root, its running time depends on the depth of the node in the tree, and an unlucky input can produce a tree of depth N-1. We therefore need a way to keep the trees shallow.

Weighted quick-union achieves this by always linking the root of the smaller tree to the root of the larger tree. This requires a small change to the data structure: an extra array that records the size of each tree, that is, the number of contacts in each component. The algorithm is as follows:

```java
public class WeightedQuickUnionUF {
	private int[] id;		// parent links (indexed by contact)
	private int[] sz;		// size of the component for each root
	private int count;		// number of connected components
	public WeightedQuickUnionUF(int N) {
		count=N;
		id=new int[N];
		for(int i=0;i<N;i++)id[i]=i;	// every contact starts as its own root
		sz=new int[N];
		for(int i=0;i<N;i++)sz[i]=1;	// every tree starts with one contact
	}
	public int count() {
		return count;
	}
	public boolean connected(int p,int q) {
		return find(p)==find(q);
	}
	public int find(int p) {
		while(p!=id[p])p=id[p];	// follow links up to the root
		return p;
	}
	public void union(int p,int q) {
		int i=find(p);
		int j=find(q);
		if(i==j)return;
		// Link the root of the smaller tree to the root of the larger tree
		if (sz[i]<sz[j]) {
			id[i]=j;
			sz[j]+=sz[i];
		}
		else {
			id[j]=i;
			sz[i]+=sz[j];
		}
		count--;
	}
}
```

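For the dynamic connectivity problem, weighted quick-union is the only one of the three algorithms suitable for large practical problems. The reason is that in a forest built by weighted quick-union on N contacts, the depth of any node is at most lg N, so find(), connected(), and union() take logarithmic time.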
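The lg N depth bound can be observed directly. The hypothetical `WeightedDepthDemo` below uses the same union rule as `WeightedQuickUnionUF` and adds a `maxDepth()` helper that exists only for measurement. It runs two inputs: the chain-building input 0-1, 0-2, ... that degrades plain quick-union, and a sequence of pairwise merges of equal-sized trees, which attains the lg N bound.

```java
// Hypothetical experiment: weighted quick-union (same union rule as
// WeightedQuickUnionUF) plus a helper that measures the deepest node.
class WeightedDepthDemo {
    private final int[] id;
    private final int[] sz;

    WeightedDepthDemo(int n) {
        id = new int[n];
        sz = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
            sz[i] = 1;
        }
    }

    int find(int p) {
        while (p != id[p]) p = id[p];
        return p;
    }

    void union(int p, int q) {
        int i = find(p);
        int j = find(q);
        if (i == j) return;
        // Smaller tree goes under the larger one; ties put q's root under p's.
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    // Measurement helper: largest number of links from any contact to its root.
    int maxDepth() {
        int max = 0;
        for (int p = 0; p < id.length; p++) {
            int d = 0;
            for (int x = p; x != id[x]; x = id[x]) d++;
            max = Math.max(max, d);
        }
        return max;
    }
}
```

For N = 16, the chain-building input yields a tree of depth 1. Merging equal-sized trees pairwise (sizes 1, 2, 4, 8) reaches depth exactly lg 16 = 4.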

optimal algorithm

To state the conclusion first: weighted quick-union with path compression is the optimal algorithm.

Path compression links every node examined on the way to the root directly to the root. This almost completely flattens the trees, bringing them very close to the ideal produced by quick-find, where every contact's id[] entry names its component directly.
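A minimal sketch, assuming the full path-compression variant: find() first locates the root, then makes a second pass over the path and points every node it visited directly at the root. Everything else is unchanged from `WeightedQuickUnionUF`. The class name and the `parentOf()` inspection helper are hypothetical. With both weighting and path compression, the amortized cost per operation is known to be bounded by the inverse Ackermann function, which is less than 5 for any practical N, though not strictly constant.

```java
// Hypothetical sketch: weighted quick-union with full path compression.
// Only find() differs from WeightedQuickUnionUF.
class WeightedQuickUnionPathCompressionUF {
    private final int[] id;   // parent links (indexed by contact)
    private final int[] sz;   // component sizes, valid for roots
    private int count;        // number of connected components

    WeightedQuickUnionPathCompressionUF(int n) {
        count = n;
        id = new int[n];
        sz = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
            sz[i] = 1;
        }
    }

    int count() { return count; }

    boolean connected(int p, int q) { return find(p) == find(q); }

    int find(int p) {
        int root = p;
        while (root != id[root]) root = id[root];   // first pass: locate the root
        while (p != root) {                          // second pass: compress the path
            int next = id[p];
            id[p] = root;
            p = next;
        }
        return root;
    }

    void union(int p, int q) {
        int i = find(p);
        int j = find(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }   // smaller tree under larger
        else               { id[j] = i; sz[i] += sz[j]; }
        count--;
    }

    // Hypothetical inspection helper, not part of the UF API: stored parent of p.
    int parentOf(int p) { return id[p]; }
}
```

If 16 contacts are merged by equal-size unions (sizes 1, 2, 4, 8), contact 15 ends up four links below root 0, on the path 15, 14, 12, 8. A single call to find(15) leaves all four pointing directly at 0.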


Origin http://43.154.161.224:23101/article/api/json?id=325293639&siteId=291194637