1. The principle of union-find
Union-find (also called a disjoint-set structure) is mainly used to solve element grouping problems. It manages a collection of disjoint sets and supports two operations:
- Union: combines two disjoint sets into one set
- Find (query): checks whether two elements are in the same set
Of course, such a definition is rather academic, and on its own it may not make clear what the structure is used for. So let's look at the most direct application scenario for union-find: the relatives problem.
Description
If a family is very large, it is not easy to judge whether two people are relatives. Given a diagram of kinship relations, determine whether two arbitrarily chosen people are relatives.
Rule: if x and y are relatives and y and z are relatives, then x and z are relatives too. If x and y are relatives, then all relatives of x are relatives of y, and all relatives of y are relatives of x.
Input
The first line contains three integers n, m, p (n <= 5000, m <= 5000, p <= 5000): there are n people, m kinship relations, and p queries.
The next m lines each contain two numbers Mi, Mj (1 <= Mi, Mj <= n), indicating that Mi and Mj are relatives.
The next p lines each contain two numbers Pi, Pj, asking whether Pi and Pj are relatives.
Output
p lines, each containing 'Yes' or 'No'. Line i is the answer to the i-th query: whether the two people are relatives or not.
A preliminary analysis suggests that this is the graph-theory problem of deciding whether two vertices lie in the same connected component. We can model it by dividing all people into disjoint sets, where the people in each set are all relatives of one another. To tell whether two people are relatives, just check whether they belong to the same set. This is exactly the kind of bookkeeping that union-find maintains.
2. Introducing union-find
The key idea of union-find is to represent each set by one of its elements. I once saw an interesting metaphor that compares a set to a gang, with the representative element as the gang leader. Let's use this analogy to see how union-find works.
In the beginning, every hero fights alone, so each one's leader is naturally himself. (For a set with only one element, the representative is that element.)
Now No. 1 and No. 3 duel. Assuming No. 1 wins (it doesn't matter who wins here), No. 3 accepts No. 1 as leader. (We merge the sets containing No. 1 and No. 3, and No. 1 becomes the representative element.)
Now No. 2 wants to duel with No. 3 (merging the sets containing No. 3 and No. 2), but No. 3 says: don't fight me, I'll have my leader deal with you (the merge is carried out between the representative elements). Let's assume No. 1 wins again, so No. 2 also accepts No. 1 as leader.
Now suppose that No. 4, No. 5, and No. 6 have also gone through some gang mergers, and the situation in the martial world becomes as follows:
Now suppose No. 2 wants to duel with No. 6. As before, the gang leaders No. 1 and No. 4 are called out to fight (being a gang leader is hard work). After No. 1 wins, No. 4 accepts No. 1 as leader, and of course No. 4's subordinates come along too.
Well, that's the end of the metaphor. If you have some background in graph theory, you have probably noticed that this is a tree structure. To find the representative element of a set, you only need to follow the parent links (the circles the arrows point to in the figure) layer by layer until you reach the root of the tree (the orange circle in the figure). The parent of the root is the root itself. We can redraw it directly as a tree:
In this way, we can write the simplest version of the union-find code.
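A minimal sketch of that simplest version might look like the following. The parent array is called fa and the merge operation merge to match the notation used in the rest of this article. The struct name SimpleUnionFind and the helper same are assumptions made for illustration.

```cpp
#include <vector>

// Simplest union-find: no path compression, no merge heuristics.
// SimpleUnionFind and same() are illustrative names, not from the article.
struct SimpleUnionFind {
    std::vector<int> fa;  // fa[i] is the parent of i; a root is its own parent

    // Elements are numbered 0..n. Initially everyone is their own leader.
    explicit SimpleUnionFind(int n) : fa(n + 1) {
        for (int i = 0; i <= n; ++i) fa[i] = i;
    }

    // Follow parent links until reaching the root (the representative).
    int find(int x) const {
        while (fa[x] != x) x = fa[x];
        return x;
    }

    // Attach the root of i's set under the root of j's set.
    void merge(int i, int j) {
        fa[find(i)] = find(j);
    }

    // Two elements are in the same set iff they share a representative.
    bool same(int i, int j) const {
        return find(i) == find(j);
    }
};
```

For instance, after merge(3, 1) and merge(2, 3) on a fresh structure, elements 1, 2 and 3 all report 1 as their representative, which mirrors the gang story above.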
3. Path compression
The simplest version of union-find is relatively inefficient. For example, consider the following scenario:
Now we want to merge(2,3), so we find 1 from 2 and set fa[1]=3, and it becomes like this:
Then we get another element 4, and we need to execute merge(2,4):
From 2 we find 1, then 3, then set fa[3]=4, so it becomes like this:
As you can probably sense, this may form a long chain. As the chain gets longer and longer, finding the root node from the bottom takes more and more steps.
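To make the chain concrete, here is a small sketch that repeats the pattern above: the set containing element 0 is merged into a fresh single-element set each time, using the naive merge with no compression. The helper names buildChain and stepsToRoot are made up for illustration.

```cpp
#include <vector>

// Number of parent links followed from x up to its root (no compression).
int stepsToRoot(const std::vector<int>& fa, int x) {
    int steps = 0;
    while (fa[x] != x) {
        x = fa[x];
        ++steps;
    }
    return steps;
}

// Hypothetical helper: performs the naive merge(0, j) for j = 1..n-1,
// i.e. fa[find(0)] = j, where j is still a single-element set.
std::vector<int> buildChain(int n) {
    std::vector<int> fa(n);
    for (int i = 0; i < n; ++i) fa[i] = i;
    for (int j = 1; j < n; ++j) {
        int root = 0;
        while (fa[root] != root) root = fa[root];  // plain find
        fa[root] = j;                              // old root now hangs under j
    }
    return fa;
}
```

With n elements, the walk from element 0 to the root takes n - 1 steps, so every later query that starts at the bottom pays for the whole chain again.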
How do we solve this? We can use path compression. Since we only care about the root node of an element, we want the path from each element to the root to be as short as possible, ideally a single step, like this:
In fact, this is easy to implement. During a query, we simply set the parent of every node along the path to the root node. That saves a lot of work on the next lookup.
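One compact way to express this is a recursive find, sketched below under the same convention as before (fa[i] is the parent of i, and a root is its own parent). The function name findRoot is an assumption chosen for illustration.

```cpp
#include <vector>

// Find with path compression: every node visited on the way up
// is re-pointed directly at the root before returning.
int findRoot(std::vector<int>& fa, int x) {
    if (fa[x] == x) return x;         // x is the root
    fa[x] = findRoot(fa, fa[x]);      // compress: parent becomes the root
    return fa[x];
}
```

On a chain 0 -> 1 -> 2 -> 3, a single findRoot(fa, 0) leaves 0, 1 and 2 all pointing straight at 3. The recursion depth equals the path length, so for very long chains an iterative two-pass version may be preferable.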
However, since path compression happens only during queries, and each query compresses only one path, the final structure of the union-find may still be complicated.
For example, suppose we now have a fairly complex tree that needs to be merged with a single-element set:
If we now want to merge(7,8) and we are free to choose, should we set the parent of 7 to 8, or the parent of 8 to 7?
The latter, of course. If the parent of 7 is set to 8, the depth of the tree (the length of the longest chain in it) increases, the distance from every element of the original tree to the root grows by one, and finding the root takes correspondingly longer. We do have path compression, but path compression costs time too. Setting the parent of 8 to 7 avoids the problem, because it does not affect any unrelated node.
This suggests a rule: merge the simpler tree into the more complex one, not the other way around. After merging this way, relatively few nodes end up farther from the root.
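One common way to apply this rule is merge by rank, sketched below on the assumption that each root stores an upper bound on the height of its tree. The struct name RankedUnionFind and the rank array are illustrative and do not come from the text above. Merging by set size is an equally common alternative.

```cpp
#include <utility>
#include <vector>

// Union-find with path compression and merge by rank (illustrative sketch).
struct RankedUnionFind {
    std::vector<int> fa;    // fa[i]: parent of i; a root is its own parent
    std::vector<int> rank;  // rank[r]: upper bound on the height of r's tree

    explicit RankedUnionFind(int n) : fa(n), rank(n, 0) {
        for (int i = 0; i < n; ++i) fa[i] = i;
    }

    int find(int x) {
        if (fa[x] != x) fa[x] = find(fa[x]);  // path compression
        return fa[x];
    }

    void merge(int i, int j) {
        int x = find(i);
        int y = find(j);
        if (x == y) return;                      // already in the same set
        if (rank[x] < rank[y]) std::swap(x, y);  // x is now the taller tree
        fa[y] = x;                               // hang the shorter tree under it
        if (rank[x] == rank[y]) ++rank[x];       // equal heights: result is one taller
    }
};
```

With this version the argument order no longer matters: if 7 is the root of a larger tree and 8 is a single element, both merge(7, 8) and merge(8, 7) set the parent of 8 to 7.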
4. Complete implementation
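#pragma once
#include <algorithm>
#include <cstdlib>
#include <utility>
#include <vector>
using namespace std;

class UnionFindSet
{
public:
    // Encoding of _ufs:
    //   _ufs[i] <  0 : i is a root, and -_ufs[i] is the size of its set
    //   _ufs[i] >= 0 : _ufs[i] is the index of i's parent
    UnionFindSet(size_t n)
        : _ufs(n, -1)
    {
    }

    // Merge the sets containing x1 and x2
    void Union(int x1, int x2)
    {
        int root1 = FindRoot(x1);
        int root2 = FindRoot(x2);

        // Merge the smaller set into the larger one:
        // after the swap, root1 is the root of the larger set
        if (abs(_ufs[root1]) < abs(_ufs[root2]))
        {
            swap(root1, root2);
        }

        // If root1 and root2 are equal, both elements are already
        // in the same set and there is nothing to merge
        if (root1 != root2)
        {
            // Merge root2 into root1: add the size of root2's set to root1's
            _ufs[root1] += _ufs[root2];
            // Make root1 the parent of root2
            _ufs[root2] = root1;
        }
    }

    // Find the root of the set containing x
    int FindRoot(int x)
    {
        int root = x;
        while (_ufs[root] >= 0)
        {
            root = _ufs[root];
        }

        // Path compression: point every node on the path
        // from x to the root directly at the root
        while (_ufs[x] >= 0)
        {
            // Save the current node's parent before overwriting it
            int parent = _ufs[x];
            _ufs[x] = root;
            x = parent;
        }
        return root;
    }

    // Check whether x1 and x2 are in the same set
    bool Inset(int x1, int x2)
    {
        return FindRoot(x1) == FindRoot(x2);
    }

    // Count the number of disjoint sets (one negative entry per root)
    size_t SetSize()
    {
        size_t count = 0;
        for (size_t i = 0; i < _ufs.size(); ++i)
        {
            if (_ufs[i] < 0)
            {
                count++;
            }
        }
        return count;
    }

private:
    vector<int> _ufs;
};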
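To tie this back to the relatives problem from section 1, here is a self-contained sketch of how the queries could be answered. It reuses the same encoding as _ufs (a negative entry marks a root and stores the negated set size) but keeps it in a local array so that the example stands alone. The function name answerQueries and its parameter layout are assumptions for illustration; reading n, m and p from standard input is left out.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: people are numbered 1..n, relations lists the m
// known kinship pairs, queries lists the p pairs to test.
std::vector<std::string> answerQueries(
    int n,
    const std::vector<std::pair<int, int>>& relations,
    const std::vector<std::pair<int, int>>& queries)
{
    // parent[i] < 0: i is a root and -parent[i] is the set size.
    std::vector<int> parent(n + 1, -1);

    // Find with two-pass path compression, as in FindRoot above.
    auto findRoot = [&parent](int x) {
        int root = x;
        while (parent[root] >= 0) root = parent[root];
        while (parent[x] >= 0) {
            int next = parent[x];
            parent[x] = root;
            x = next;
        }
        return root;
    };

    for (const auto& r : relations) {
        int a = findRoot(r.first);
        int b = findRoot(r.second);
        if (a == b) continue;                        // already relatives
        if (parent[a] > parent[b]) std::swap(a, b);  // a is the larger set
        parent[a] += parent[b];                      // accumulate sizes
        parent[b] = a;                               // smaller set joins larger
    }

    std::vector<std::string> answers;
    for (const auto& q : queries) {
        answers.push_back(findRoot(q.first) == findRoot(q.second) ? "Yes" : "No");
    }
    return answers;
}
```

For example, with six people and the relations (1,3), (2,3), (4,5), (5,6), the queries (1,2), (3,6), (4,6) yield "Yes", "No", "Yes". With n, m and p all at most 5000, this runs comfortably fast.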