Algorithm Analysis Course Design (4): Using Divide and Conquer to Find the Point Pairs and Paths in a Tree Whose Distance Is at Most K

Disclaimer

This article is only a personal study note, so please use it with caution. If you find any mistakes, corrections are welcome.

Reference articles

The first article is mainly useful for understanding the center of gravity of a tree.

The second article solves exactly the same problem as this one.

https://blog.csdn.net/a_forever_dream/article/details/81778649

https://blog.csdn.net/jackypigpig/article/details/69808594

Requirements:

(1) Describe the algorithm for finding the center of gravity of a tree in pseudocode.

(2) Using the tree below as input, write out the solution process and the result for the problem above, including how the main variables change during the solution.

(3) Write a program to solve the problem and analyze the time complexity of the algorithm.

Analysis

The overall process:

1. Find the center of gravity of the tree.

2. Calculate the distance array from each point to the center of gravity.

3. Count all point pairs that pass through the center of gravity, then subtract the pairs that pass through a child of the center of gravity, to get the number of valid point pairs.

4. Repeat steps 1, 2, and 3 on the subtrees rooted at the children of the center of gravity (recursion).

1. Find the center of gravity of the tree.

The center of gravity of a tree is also called the centroid of the tree. It is the node for which the largest subtree left after deleting it has the fewest nodes, compared with deleting any other node, as the picture shows:

To find the center of gravity, the first idea might be to traverse all nodes, treat each one as a candidate center of gravity, count the nodes in each of its subtrees, and compare. This brute-force approach is wasteful because it computes the same paths many times. A single scan of the whole tree is enough to get, for every node, the size of the largest subtree left after deleting it. This is where the point divide-and-conquer method on trees comes in.

I understand this step as a recursive post-order traversal. But a recursion can only start from one node, so how can it give the largest subtree size for every node? Take the tree above as an example, with these variables:

- n is the number of nodes in the whole tree (here n=5).
- size[i] is the number of nodes in the tree rooted at i.
- max_child[i] is the number of nodes in the largest subtree left after deleting i.
- min is the smallest value of max_child[i] seen so far. It is used to update the center of gravity and starts as a large number.
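For comparison, the brute-force method dismissed above can be written directly from the definition. This is a minimal C++ sketch. It assumes nodes are numbered 1..n and uses a vector-based adjacency list. The names `componentSize` and `bruteCentroid` are made up for illustration. It runs in O(n^2), so it is mainly useful as a checker for the one-pass version on small trees.

```cpp
#include <vector>

// Brute force taken straight from the definition of the center of gravity:
// delete each node c in turn, measure every component that remains, and
// keep the c whose largest component is smallest. O(n^2) overall.
// Assumption: nodes are 1..n and adj is a vector-based adjacency list
// (not the first[]/edge[] tables used in the article).

// Number of nodes reachable from u without stepping back to `parent`.
int componentSize(const std::vector<std::vector<int>>& adj, int u, int parent) {
    int cnt = 1;
    for (int v : adj[u])
        if (v != parent) cnt += componentSize(adj, v, u);
    return cnt;
}

// Returns a center of gravity and stores its largest remaining component in bestPart.
int bruteCentroid(const std::vector<std::vector<int>>& adj, int n, int& bestPart) {
    int best = 0;
    bestPart = n + 1;                          // larger than any real component
    for (int c = 1; c <= n; ++c) {
        int largest = 0;
        for (int v : adj[c]) {                 // each neighbour of c roots one component
            int s = componentSize(adj, v, c);
            if (s > largest) largest = s;
        }
        if (largest < bestPart) { bestPart = largest; best = c; }
    }
    return best;
}
```

The example tree can be inferred from the walkthrough below as edges 1-2, 1-3, 2-4, 2-5. On that tree this returns node 2, whose largest remaining part has 2 nodes.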

Following the post-order recursion, we first get size[4]=1 for point 4, and the red part contains n-size[4]=4 nodes. We divide it this way because point 4 has only one parent. Everything outside point 4's own subtree (the red part) can therefore always be treated as one more subtree of point 4 once it is deleted. Point 4 has no children, so comparing with n-size[4] gives max_child[4]=4, and min is updated to max_child[4]=4.

Then we get size[5]=1 for point 5. As with point 4, the red part has n-size[5]=4 nodes, so max_child[5]=4, and min is unchanged.

Next comes point 2. Its size is itself plus the sizes of its subtrees at points 4 and 5. The green part is its subtrees, with size[4]=1 and size[5]=1 nodes, so size[2]=3. The red part has n-size[2]=2 nodes. Therefore max_child[2]=2, and min is updated to 2.

Next comes point 3, which is a single node. The red part has n-size[3]=4 nodes, so max_child[3]=4, and min is unchanged.

Finally, point 1's size is itself plus its subtrees at points 2 and 3. The green part is the two subtrees, with size[2]=3 and size[3]=1 nodes, so size[1]=5. There is no red part, since n-size[1]=0. Therefore max_child[1]=3, and min is unchanged.

Whenever min is updated, the center of gravity is also updated to the corresponding node. That step is not written out here, but it appears in the code. In this example the center of gravity ends up being point 2, with max_child[2]=2.

So it does not matter which node the recursion starts from. Starting from any node, we can compute the largest remaining subtree size for every node and then find the center of gravity.
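The walkthrough can be reproduced with a short C++ sketch. It assumes a vector adjacency list and a hypothetical struct `CentroidFinder`. Here `size`, `max_child` and `n` play the same roles as in the text, and `best` plays the role of `min`.

Running it from different starting nodes is a quick way to check that the starting point does not matter. `size[]` changes with the root, but `max_child[]` does not, because max_child[i] is simply the largest part left after deleting i.

```cpp
#include <vector>

// One-pass version of the walkthrough: a post-order DFS fills size[] and
// max_child[] for every node, and the node with the smallest max_child is
// the center of gravity. Vector adjacency and the struct name are assumptions.
struct CentroidFinder {
    const std::vector<std::vector<int>>& adj;
    int n;                                  // nodes in the whole tree
    std::vector<int> size, max_child;
    int best = 1 << 30, gravity = 0;        // smallest max_child so far ("min") and its node

    CentroidFinder(const std::vector<std::vector<int>>& a, int nodes)
        : adj(a), n(nodes), size(nodes + 1, 0), max_child(nodes + 1, 0) {}

    void dfs(int start, int parent) {
        size[start] = 1;                    // start itself
        max_child[start] = 0;
        for (int end : adj[start]) {
            if (end == parent) continue;
            dfs(end, start);                // finish the child's subtree first (post-order)
            size[start] += size[end];
            if (size[end] > max_child[start]) max_child[start] = size[end];
        }
        // the "red part": every node outside the subtree rooted at start
        if (n - size[start] > max_child[start]) max_child[start] = n - size[start];
        if (max_child[start] < best) { best = max_child[start]; gravity = start; }
    }
};
```

On the example tree, calling dfs(1, 0) gives size[2] = 3, max_child[2] = 2, max_child[1] = 3 and gravity = 2. This matches the steps above, including min being updated first to 4 (at point 4) and then to 2 (at point 2).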

The code for finding the center of gravity of the tree is as follows:

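// Global variables
#define MAXN 10010              // array capacity (assumed upper bound on the number of nodes)
struct Edge { int start, end, weight, next; };
int n=17;                       // total number of nodes
int size[MAXN];                 // size[i]: number of nodes in the tree rooted at i
int max_child[MAXN];            // max_child[i]: nodes in the largest subtree left after deleting i
int min=0x3f3f3f3f;             // smallest value in max_child so far (starts as a large number)
int first[MAXN];                // vertex table
Edge edge[2*MAXN];              // edge table
int visited[MAXN];              // nodes already removed as centers of gravity
int gravity=0;                  // the node chosen as the center of gravity

// Parameters:
// start  is the current node
// parent is the parent of start; it keeps the traversal of start's children
//        from walking back into the parent
void getGravity(int start, int parent)
{
    // size[start] counts the nodes in start's subtree; it starts at 1 for start itself
    size[start]=1;
    // max_child[start] is the largest subtree of start found so far
    max_child[start]=0;
    // traverse every edge leaving start (except the one to the parent)
    for(int i=first[start]; i; i=edge[i].next)
    {
        // end is the end point of this edge (a child of start)
        int end=edge[i].end;
        // skip end if it has already been removed or is the parent
        if(visited[end] || end==parent){
            continue;
        }
        // recurse into end: this counts all nodes of one subtree of start
        getGravity(end, start);
        // once that subtree is finished, add its node count
        size[start]+=size[end];
        // update max_child if this subtree is larger
        if(size[end] > max_child[start]){
            max_child[start]=size[end];
        }
    }
    // The loop compared every subtree of start; now handle the "red part",
    // the n-size[start] nodes outside start's subtree, to get the final maximum
    if(n-size[start] > max_child[start]){
        max_child[start]=n-size[start];
    }
    // keep the smallest max_child to find the center of gravity
    if(max_child[start] < min){
        min=max_child[start];
        gravity=start;
    }
}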

The tree is stored with a vertex table and an edge table. first[i] holds the index of the first edge that starts at node i (0 means there is none). Each entry of the edge table edge describes one edge: its start point, end point, weight, and the index of the next edge with the same start point. Because the edges are undirected, each edge is stored twice, once in each direction.

The code to add tree nodes and edges is as follows:

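struct Edge { int start, end, weight, next; };
int first[MAXN];   // vertex table (MAXN: maximum number of nodes)
Edge edge[2*MAXN]; // edge table: each undirected edge is stored twice
int num=0;         // number of edges stored so far
void addNodeAndEdge(int start,int end,int weight)
{
    num++;                        // edge index; yes, it starts from 1, because 0 marks "no edge"
    edge[num].start=start;        // start point
    edge[num].end=end;            // end point
    edge[num].weight=weight;      // weight
    edge[num].next=first[start];  // next edge with the same start point
    first[start]=num;             // this edge becomes the first edge of start
}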
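Here is a self-contained C++ sketch of the vertex and edge tables in action. The capacity `MAXN` and the helper `neighbours` are assumptions for illustration. It also shows a detail that is easy to miss: the loop `for (i = first[v]; i; i = edge[i].next)` visits edges in the reverse of the order they were added.

```cpp
#include <vector>

// Vertex table / edge table ("chained forward star"). Edge indices start
// at 1 so that 0 can mean "no more edges"; that is why loops test `i` directly.
// MAXN and neighbours() are assumptions made for this sketch.
const int MAXN = 100;

struct Edge { int start, end, weight, next; };

int first[MAXN];          // first[v]: index of the most recently added edge leaving v (0 = none)
Edge edge[2 * MAXN];      // every undirected edge occupies two slots
int num = 0;              // number of edge slots used so far

void addNodeAndEdge(int start, int end, int weight) {
    ++num;
    edge[num] = {start, end, weight, first[start]};  // link in front of the old first edge
    first[start] = num;
}

// Neighbours of v in the order the edge loop visits them (reverse insertion order).
std::vector<int> neighbours(int v) {
    std::vector<int> out;
    for (int i = first[v]; i; i = edge[i].next) out.push_back(edge[i].end);
    return out;
}
```

For example, after adding the undirected edges 2-1, 2-4 and 2-5 in that order, neighbours(2) returns 5, 4, 1.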

2. Calculate the distance array from each point to the center of gravity.

After choosing the center of gravity, we compute the distance (i.e., the total edge weight) from every point to it, again recursively. This step is straightforward and needs little explanation.

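int t=0;
// start:  the current point
// parent: the parent of start; it keeps the traversal of start's children from
//         walking back into the parent
// weight: the distance accumulated so far from start to the center of gravity;
//         it is passed in because distances add up from the top down
// The function records every point's distance to the center of gravity, so the
// first call passes start = the center of gravity and parent = weight = 0.
// (dis[] and visited[] are globals, see the complete code below.)
void getDistance(int start, int parent, int weight)
{
    // dis stores each point's distance (weight) to the center of gravity. It is indexed
    // by t rather than by node number because later we only count pairs and do not
    // care which points they are. t starts from 1.
    dis[++t]=weight;
    for(int i=first[start]; i; i=edge[i].next)
    {
        int end=edge[i].end;
        // Skip points already visited (removed centers) and the parent.
        // This matters: when start is a child of the center rather than the center itself,
        // the center is not traversed again.
        if(visited[end] || end==parent){
            continue;
        }
        getDistance(end, start, weight+edge[i].weight);
    }
}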
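The figure with the example weights is not reproduced here, so the sketch below uses hypothetical weights w(1,2)=1, w(1,3)=2, w(2,4)=3, w(2,5)=1. As in the text, point 1 is the center. The sketch uses a vector adjacency list of (neighbour, weight) pairs, so the visiting order follows insertion order (1, 2, 4, 5, 3). That is the order the dis indices in the discussion below assume.

```cpp
#include <utility>
#include <vector>

// getDistance with vector adjacency: adj[u] holds (neighbour, weight) pairs.
// visited marks centers that were already removed. Results go into a vector
// instead of the global dis[]/t, so index 0 here corresponds to dis[1].
using Adj = std::vector<std::vector<std::pair<int, int>>>;

void getDistance(const Adj& adj, const std::vector<int>& visited,
                 int start, int parent, int weight, std::vector<int>& dis) {
    dis.push_back(weight);                       // distance from start to the center
    for (const auto& e : adj[start]) {
        int end = e.first, w = e.second;
        if (visited[end] || end == parent) continue;
        getDistance(adj, visited, end, start, weight + w, dis);
    }
}
```

With those weights, getDistance(adj, visited, 1, 0, 0, dis) yields 0, 1, 4, 2, 2 for points 1, 2, 4, 5, 3. With visited[1] set, starting from child 2 with weight 1 yields 1, 4, 2, which are still distances to point 1.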

This step produces the dis array, which stores each point's distance to the center of gravity. Again, the subscripts of dis have nothing to do with node numbers. For the previous example tree, taking point 1 as the center, the result is shown in the following figure:

3. Count all point pairs that pass through the center of gravity, then subtract the pairs that pass through a child of the center of gravity, to get the number of valid point pairs.

Next, consider how many paths have length at most K. My first idea was to pick out the entries of dis that are at most K, i.e., the points within distance K of the center of gravity. From those I would count the pairs that pass through the center of gravity with total distance at most K. According to the second reference article, however, that is not enough. The second reference article says:

After getting the dis array, a path in this subtree that passes through the center of gravity contributes to the answer exactly when the point pair (i, j), with i < j, satisfies two conditions: dis[i] + dis[j] <= K, and i and j are not in the same connected component after the center of gravity is removed.

Checking "not in the same connected component" directly is awkward, so there is a small trick. Ignore that condition and count all qualifying pairs in the current tree. Then, for each child of the center of gravity, subtract the number of pairs in the subtree rooted at that child whose distance, measured through the center of gravity, is at most K.

In other words, once we have each point's distance to the center of gravity, we sort dis in ascending order, which makes it efficient to count pairs whose sum is at most K. We then consider the sum of every pair of entries, i.e., every combination of two points, for example:

dis[2]+dis[5] corresponds to 2——1——3;

dis[3]+dis[5] corresponds to 4——2——1——3;

But some combinations are invalid, for example dis[3]+dis[4], which corresponds to 4——2——1——2——5.

The path between points 4 and 5 does not actually pass through the center of gravity, point 1. How can such pairs be eliminated? Look at what they have in common: both points lie in the same subtree. Points 4 and 5 both reach the center of gravity through point 2. So if both points of a pair reach the center of gravity through the same child, the pair is invalid, and this criterion lets us eliminate it.
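The elimination idea can be checked with a brute-force C++ sketch. It reuses the hypothetical weights w(1,2)=1, w(1,3)=2, w(2,4)=3, w(2,5)=1, takes point 1 as the center, and sets K = 4; all of these are assumptions for illustration. `countPairs` is a hypothetical O(m^2) stand-in for the work function described next.

```cpp
#include <vector>

// Brute-force illustration of "all pairs through the center minus pairs
// inside one child subtree". Every value is a distance to the center.
// countPairs is a hypothetical O(m^2) helper; work() does the same count
// in O(m log m) with sorting and two pointers.
int countPairs(const std::vector<int>& dis, int K) {
    int cnt = 0;
    for (int i = 0; i < (int)dis.size(); ++i)
        for (int j = i + 1; j < (int)dis.size(); ++j)
            if (dis[i] + dis[j] <= K) ++cnt;
    return cnt;
}

// all:      distances of every node in the component (the center itself is 0)
// children: for each child of the center, the distances of the nodes in its subtree
int legalPairsThroughCentre(const std::vector<int>& all,
                            const std::vector<std::vector<int>>& children, int K) {
    int ans = countPairs(all, K);
    for (const auto& sub : children)
        ans -= countPairs(sub, K);            // pairs that never really reach the center
    return ans;
}
```

Here the five distances {0, 1, 4, 2, 2} give 7 pairs. The subtree of point 2 ({1, 4, 2}) contributes 1 invalid pair (points 2 and 5, counted as 2-1-2-5), and the subtree of point 3 contributes none, leaving 6. Listing the pairs that really pass through point 1 with length at most 4 (1-2, 1-3, 1-4, 1-5, 2-3, 3-5) also gives 6.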

With the elimination method in mind, let's look at the work function. Its parameters are a node start and the distance (weight) from start to its parent.

It counts all combinations through start, both valid and invalid. In the dfs function, once we have the center of gravity, work is called with start set to the center of gravity and weight set to 0. This counts all point pairs through the center of gravity.

Back in dfs, we traverse the children of the center of gravity and call work on each one, with start set to the child and weight set to the edge weight from the child to the center of gravity. This counts all pairs through that child. Even for these pairs, the distances are still measured to the center of gravity. Both kinds of pairs are compared with K in the same way, so their distances must be measured from the same point.

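// start is either the center of gravity (weight = 0)
// or a child of the center (weight = edge weight between the center and that child)
int work(int start,int weight) {
    t=0;
    // If start is the center: compute the distance from every node of the tree rooted at the
    // center to start, combine them in pairs, and keep the pairs whose sum is at most K.
    // If start is a child of the center: compute the distance from every node of the subtree
    // rooted at that child to the center, and combine them in pairs the same way.
    // e.g. if a is a child of center S, we pair up all nodes of the subtree rooted at a, but
    // measure their distances to S, so the comparison with K matches the pairs through the center.
    // That is why getDistance takes weight: the dis of a's descendants is their distance to the center.
    getDistance(start, 0, weight);
    // sort dis in ascending order
    // note: t is a global and is no longer 0 here; getDistance advanced it once per node reached from start
    std::sort(dis+1,dis+1+t);
    // pair_num counts the pairs found (through the center, or inside the child's subtree)
    int pair_num=0;
    int i=1,j=t;
    // two pointers: count the pairs with dis[i]+dis[j] <= K (K is a global)
    while (i<j){
        while (i<j && dis[i]+dis[j]>K) 
            j--;
        // for this i, every partner in positions i+1..j satisfies the bound
        pair_num+=j-i;
        i++;
    }
    return pair_num;
}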
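The counting loop in work() can be tested on its own. Below is a C++ sketch that takes a `std::vector` instead of the global `dis[1..t]` (an assumption for the example), together with an O(m^2) reference. For each i, the largest valid j only moves left as i grows, so after sorting the scan is linear.

```cpp
#include <algorithm>
#include <vector>

// Stand-alone version of the counting loop inside work(): after sorting,
// the scan is O(m), so the whole count is O(m log m).
long long countPairsTwoPointer(std::vector<int> dis, int K) {
    std::sort(dis.begin(), dis.end());
    long long pairs = 0;
    int i = 0, j = (int)dis.size() - 1;
    while (i < j) {
        while (i < j && dis[i] + dis[j] > K) --j;   // shrink until the pair fits
        pairs += j - i;                              // partners i+1..j all fit with i
        ++i;
    }
    return pairs;
}

// O(m^2) reference used to cross-check the two-pointer count.
long long countPairsBrute(const std::vector<int>& dis, int K) {
    long long pairs = 0;
    for (int i = 0; i < (int)dis.size(); ++i)
        for (int j = i + 1; j < (int)dis.size(); ++j)
            if (dis[i] + dis[j] <= K) ++pairs;
    return pairs;
}
```

For instance, countPairsTwoPointer({0, 1, 4, 2, 2}, 4) returns 7, the same value the brute-force count gives.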

4. Repeat steps 1, 2, and 3 on the subtrees rooted at the children of the center of gravity (recursion)

The first call to dfs can pass in any node, such as point 1 in the figure above, to find the center of gravity of the whole tree. Taking point 1 as the center of gravity for illustration, dfs calls work on point 1 to get ans, the number of all point pairs through point 1, both valid and invalid. The for loop then traverses the children of point 1, which are points 2 and 3 in the figure. It calls work on each of them to count the pairs through points 2 and 3 respectively. Subtracting those counts from ans leaves the valid pairs.

You might worry that subtracting removes the pair (4, 5) for good, even if points 4 and 5 are within distance K of each other. It does not. After the subtraction, points 2 and 3 are each passed into dfs recursively, and the pair (4, 5) is counted there. ans is a global variable, so it keeps accumulating the additions and subtractions from every dfs call after the first.

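// start is the starting node; the recursion eventually reaches every node
void dfs(int start){
    // Find the center of gravity starting from start. As noted earlier, any starting node
    // finds the center of gravity; here parent=0 and visited[] keep the search inside the
    // component that contains start.
    // Pass 1: after this call, size[start] is the number of nodes in the component
    getGravity(start,0);
    // Pass 2: with n set to the component size and min reset, find the real center of gravity
    n=size[start];
    min=MAX;
    getGravity(start,0);
    // save the center locally, because the recursive calls overwrite the global gravity
    int g=gravity;
    // count all paths through this center of gravity (valid and invalid)
    ans += work(g,0);
    // mark this center of gravity as visited
    visited[g]=1;
    // visit the children of the center of gravity (not of start)
    for(int i=first[g]; i; i=edge[i].next){
        int end=edge[i].end;
        if (visited[end]){
            continue;
        }
        // subtract the pairs that stay inside the subtree rooted at this child
        ans -= work(end, edge[i].weight);
        // continue recursively from this child
        dfs(end);
    }
    return;
}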
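The flow above depends on two details that are easy to get wrong:

- The component size must be recomputed before each center-of-gravity search.
- The loop must go over the children of the center, not of `start`.

To check both, here is a compact, self-contained C++ sketch of the whole recursion. It uses vector adjacency and a hypothetical struct `TreePairs` instead of the global arrays, and it returns the count rather than accumulating a global `ans`.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Compact sketch of dfs() -> work() -> dfs(child) with vector adjacency.
struct TreePairs {
    std::vector<std::vector<std::pair<int, int>>> adj;   // (neighbour, weight)
    std::vector<int> sz, visited;
    std::vector<long long> dis;
    long long K;
    int compSize = 0, best = 0, gravity = 0;

    TreePairs(int n, long long k) : adj(n + 1), sz(n + 1), visited(n + 1), K(k) {}

    void addEdge(int u, int v, int w) { adj[u].push_back({v, w}); adj[v].push_back({u, w}); }

    void getGravity(int u, int p) {
        sz[u] = 1;
        int mx = 0;
        for (auto& e : adj[u]) {
            int v = e.first;
            if (v == p || visited[v]) continue;
            getGravity(v, u);
            sz[u] += sz[v];
            mx = std::max(mx, sz[v]);
        }
        mx = std::max(mx, compSize - sz[u]);              // the "red part"
        if (mx < best) { best = mx; gravity = u; }
    }

    void getDistance(int u, int p, long long d) {
        dis.push_back(d);
        for (auto& e : adj[u])
            if (e.first != p && !visited[e.first]) getDistance(e.first, u, d + e.second);
    }

    long long work(int start, long long weight) {
        dis.clear();
        getDistance(start, 0, weight);
        std::sort(dis.begin(), dis.end());
        long long pairs = 0;
        int i = 0, j = (int)dis.size() - 1;
        while (i < j) {
            while (i < j && dis[i] + dis[j] > K) --j;
            pairs += j - i;
            ++i;
        }
        return pairs;
    }

    long long dfs(int start) {
        compSize = 1 << 30; best = 1 << 30;
        getGravity(start, 0);                  // pass 1: sz[start] = component size
        compSize = sz[start]; best = 1 << 30;
        getGravity(start, 0);                  // pass 2: center of gravity of this component
        int g = gravity;                       // save: recursion overwrites gravity
        long long ans = work(g, 0);            // all pairs through g, valid or not
        visited[g] = 1;
        for (auto& e : adj[g]) {
            if (visited[e.first]) continue;
            ans -= work(e.first, e.second);    // pairs that stay inside one child subtree
            ans += dfs(e.first);               // pairs counted deeper in the recursion
        }
        return ans;
    }
};
```

Take the 5-node example with hypothetical weights w(1,2)=1, w(1,3)=2, w(2,4)=3, w(2,5)=1 and K = 4. Here dfs(1) returns 9, and the first center of gravity it picks is point 2, not point 1.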

Then the main function:

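int main()
{
    // Read the number of nodes n and the distance bound K
    // (assumed input format: "n K" on the first line)
    scanf("%d %d",&n,&K);
    // Read the n-1 edges, each as: start end weight
    for(int i=1;i<n;i++)
    {
        int start, end, weight;
        scanf("%d %d %d",&start,&end,&weight);
        addNodeAndEdge(start,end,weight);
        addNodeAndEdge(end,start,weight);
    }
    dfs(1);
    printf("%d\n", ans);
    return 0;
}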

Finally, here is the complete code. I have not tested it, mainly for lack of time, but I understand every part of it, which is enough for the exam.

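#include <stdio.h>
#include <algorithm>   // std::sort (this makes the program C++)

#define MAXN 10010     // array capacity: assumed upper bound on the number of nodes
#define MAX 0x3f3f3f3f // a large number used as the initial value of min

// One entry of the edge table
struct Edge {
    int start, end, weight, next;
};

// Global variables
int n;                 // number of nodes (later: size of the current component)
int K;                 // distance bound
int size[MAXN];        // size[i]: number of nodes in the tree rooted at i
int max_child[MAXN];   // max_child[i]: nodes in the largest subtree left after deleting i
int min=MAX;           // smallest value in max_child so far
int first[MAXN];       // vertex table
Edge edge[2*MAXN];     // edge table (each undirected edge is stored twice)
int visited[MAXN];     // marks nodes already removed as centers of gravity
int dis[MAXN];         // dis[1..t]: distance from each point to the center of gravity
int gravity=0;         // center of gravity
int num=0,t=0;         // num: edges stored so far; t: entries used in dis
int ans=0;             // final result: number of pairs with distance <= K

void addNodeAndEdge(int start, int end, int weight)
{
    num++;                        // edge index, starting from 1 (0 means "no edge")
    edge[num].start=start;        // start point
    edge[num].end=end;            // end point
    edge[num].weight=weight;      // weight
    edge[num].next=first[start];  // next edge with the same start point
    first[start]=num;             // this edge becomes the first edge of start
}

// parent is the parent of start; weight is the distance from start to the target point
void getDistance(int start, int parent, int weight)
{
    // dis stores each point's distance (weight) to the center of gravity; it is indexed
    // by t rather than by node number because we only count pairs
    dis[++t]=weight;
    for(int i=first[start]; i; i=edge[i].next)
    {
        int end=edge[i].end;
        // skip removed centers and the parent; this matters because when start is a
        // child of the center, the center must not be traversed again
        if(visited[end] || end==parent){
            continue;
        }
        getDistance(end, start, weight+edge[i].weight);
    }
}

// start is the center of gravity (weight=0) or a child of it (weight = edge weight to the center)
int work(int start,int weight) {
    t=0;
    // distance from every point reachable from start to the center of gravity
    getDistance(start, 0, weight);
    // sort dis[1..t] in ascending order; t was advanced inside getDistance
    std::sort(dis+1,dis+1+t);
    int pair_num=0;               // number of pairs
    int i=1,j=t;
    // two pointers: count the pairs with dis[i]+dis[j] <= K
    while (i<j){
        while (i<j && dis[i]+dis[j]>K) j--;
        pair_num+=j-i;
        i++;
    }
    return pair_num;
}

// start is the current node; parent keeps the parent from being treated as a child
void getGravity(int start, int parent)
{
    size[start]=1;                // count start itself
    max_child[start]=0;           // largest subtree of start found so far
    for(int i=first[start]; i; i=edge[i].next)
    {
        int end=edge[i].end;      // end point of this edge (a child of start)
        if(visited[end] || end==parent){
            continue;
        }
        getGravity(end, start);   // count the nodes of this subtree
        size[start]+=size[end];
        if(size[end] > max_child[start]){
            max_child[start]=size[end];
        }
    }
    // the "red part": the n-size[start] nodes outside start's subtree
    if(n-size[start] > max_child[start]){
        max_child[start]=n-size[start];
    }
    // keep the node with the smallest max_child as the center of gravity
    if(max_child[start] < min){
        min=max_child[start];
        gravity=start;
    }
}

// Recursively count, for each component, the valid pairs through its center of gravity
void dfs(int start){
    getGravity(start,0);          // pass 1: size[start] becomes the component size
    n=size[start];
    min=MAX;
    getGravity(start,0);          // pass 2: center of gravity of this component only
    int g=gravity;                // save it: the recursion overwrites the global
    ans += work(g,0);             // all paths through the center (valid and invalid)
    visited[g]=1;                 // remove the center from later searches
    for(int i=first[g]; i; i=edge[i].next){
        int end=edge[i].end;
        if (visited[end]){
            continue;
        }
        ans -= work(end, edge[i].weight);  // pairs that stay inside this child's subtree
        dfs(end);                          // find the center and count paths in that subtree
    }
}

int main()
{
    // read n and K (assumed input format: "n K" on the first line)
    scanf("%d %d",&n,&K);
    // read the n-1 edges: start end weight
    for(int i=1;i<n;i++)
    {
        int start, end, weight;
        scanf("%d %d %d",&start,&end,&weight);
        addNodeAndEdge(start,end,weight);
        addNodeAndEdge(end,start,weight);
    }
    dfs(1);
    printf("%d\n", ans);
    return 0;
}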
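Since the program has not been tested, a brute-force reference is handy for comparing answers on small inputs. This C++ sketch computes every pairwise distance with one DFS per node, O(n^2) in total. The edge-list struct `WEdge` and the function names are assumptions for illustration.

```cpp
#include <utility>
#include <vector>

// O(n^2) reference answer: one DFS from every node records its distance to
// every other node, and each unordered pair is counted once.
struct WEdge { int u, v, w; };

static void distancesFrom(const std::vector<std::vector<std::pair<int, int>>>& adj,
                          int u, int parent, long long d, std::vector<long long>& dist) {
    dist[u] = d;
    for (const auto& e : adj[u])
        if (e.first != parent) distancesFrom(adj, e.first, u, d + e.second, dist);
}

long long bruteCountPairs(int n, const std::vector<WEdge>& edges, long long K) {
    std::vector<std::vector<std::pair<int, int>>> adj(n + 1);
    for (const auto& e : edges) {
        adj[e.u].push_back({e.v, e.w});
        adj[e.v].push_back({e.u, e.w});
    }
    long long pairs = 0;
    std::vector<long long> dist(n + 1);
    for (int s = 1; s <= n; ++s) {
        distancesFrom(adj, s, 0, 0, dist);
        for (int t = s + 1; t <= n; ++t)        // t > s: each pair counted once
            if (dist[t] <= K) ++pairs;
    }
    return pairs;
}
```

With edges 1-2 (1), 1-3 (2), 2-4 (3), 2-5 (1) and K = 4 it returns 9, since only the pair (3, 4), at distance 6, is excluded. The corrected program above should print the same number for the input "5 4" followed by those four edges.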

I have not worked out the time complexity yet...
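For reference, here is the usual analysis as a sketch. It was not worked out in the original, and it relies on dfs searching for a fresh center of gravity in every component.

- Every part left after removing a center of gravity has at most half the nodes of its component, so the recursion depth is O(log n).
- On each level the components are disjoint, so getGravity and getDistance touch O(n) nodes in total.
- The sorts inside work() cost O(n log n) per level. Each node is sorted once for its center and once for its child subtree.

```latex
\begin{aligned}
\text{depth} &\le \lfloor \log_2 n \rfloor + 1 \\
T_{\text{level}} &= \underbrace{O(n)}_{\texttt{getGravity},\ \texttt{getDistance}}
  + \underbrace{O(n \log n)}_{\text{sorting in } \texttt{work}} = O(n \log n) \\
T(n) &= O(\log n) \cdot O(n \log n) = O(n \log^2 n)
\end{aligned}
```

The arrays take O(n) memory.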


Origin: blog.csdn.net/qq_33514421/article/details/112379820