Combining dynamic programming with data structures: tree DP

This article first appeared on my personal public account, TechFlow. Original writing takes effort, so a follow is appreciated.


Today's article is the 15th in the algorithms and data structures series and the 4th in the dynamic programming series.

The previous articles were all about the knapsack problem, and you may be getting a little tired of it. The classic "Nine Lectures on the Knapsack Problem" covers nine variants, but unless you compete in programming contests, understanding everything up to the monotonic queue optimization is already very good. Variants such as the knapsack with dependencies and the mixed knapsack are rather niche, so I won't cover them here. If you are interested, you can search Baidu for the nine knapsack lectures yourself. Today we look at an interesting problem, and through it, at how dynamic programming works on a tree structure.

The problem is simple to state. We are given a tree, not necessarily a binary tree. Each edge has a weight, which we can think of as its length. What is the length of the longest path in the tree?
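To pin down what is being asked, here is a minimal brute-force sketch: start from every node, walk the whole tree, and keep the largest distance seen. The function names and the tiny `toy_edges` tree are made up for illustration and are not from the original article. It runs in quadratic time, so it is only a baseline to check faster methods against.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Turn (u, v, length) triples into an undirected adjacency list."""
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    return adj

def farthest_distance(adj, start):
    """Length of the longest path that starts at `start` (iterative DFS)."""
    best = 0
    stack = [(start, None, 0)]  # (node, parent, distance from start)
    while stack:
        node, parent, dist = stack.pop()
        best = max(best, dist)
        for nxt, length in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, dist + length))
    return best

def longest_path_brute_force(edges):
    """Try every node as a starting point: O(n^2) on a tree with n nodes."""
    adj = build_adjacency(edges)
    return max(farthest_distance(adj, node) for node in list(adj))

# Hypothetical tree: node 0 with two leaves at distances 5 and 6.
toy_edges = [(0, 1, 5), (0, 2, 6)]
print(longest_path_brute_force(toy_edges))  # 11
```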

For example, here is a tree I drew by hand. It may be ugly, so please bear with me:

If we inspect it by eye, the longest path should be the red one in the picture below:

But how would an algorithm find it?

There is actually a very clever way to solve this problem, but let's set it aside for now and first see how dynamic programming handles it.

Tree DP

Dynamic programming is not limited to arrays. It can be applied to any data structure as long as the problem admits state transitions and satisfies the no-aftereffect property (once a state's value is fixed, later decisions do not change it). Trees are no exception. With that understood, only two questions remain: what is the state, and how do we transfer between states?

In the knapsack problems, the state was the capacity of the knapsack used so far, and the transition was the decision to take a new item. On a tree, the state and its transitions are less obvious. That's fine: I will walk through the reasoning from the beginning, step by step.

First, a state transition is essentially a way of computing the whole from its parts: we combine sub-states that are easy to compute to obtain the overall result. That is the essence of dynamic programming. In this respect it resembles divide and conquer, since both rely on a logical relationship between the big problem and smaller ones. So when we face a big problem, we can borrow from divide and conquer and start by thinking about small problems.

So let's go from small to large and start with the simplest case:

Here there is obviously only one path, with length 5 + 6 = 11, which is therefore also the longest. Nothing difficult so far. Let's make things a little more complicated by adding one more layer to the tree:

This picture is a bit more complicated, but the longest path is still easy to find: it is E-B-F-H, with a total length of 12:

But what if we change the edge lengths, for example by lengthening F-G and F-H?

Now the answer changes: G-F-H is the longest.

This example illustrates a simple point: the longest path in a tree does not necessarily pass through the root. In the example above, a path that must pass through B can be at most 4 + 2 + 16 = 22 long, while a path that avoids B can reach 31.

This conclusion may seem useless, but it helps us organize our thinking. Since we cannot guarantee that the longest path passes through the root of the tree, we cannot transfer the answer itself from subtrees to their parent. What should we do?

To answer this question, we need to observe more closely and think a bit deeper.
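The two numbers can be checked by listing every path in the small tree and separating those that touch B from the rest. This is only a sketch: the lengths B-D = 1, F-G = 15 and F-H = 16 are assumptions chosen to match the 22 and 31 quoted above, since the figure itself is not reproduced here, and `path_between` is a helper invented for this check.

```python
from itertools import combinations

# Assumed edge lengths (see the note above); B is the root of the tree.
edges = [("B", "D", 1), ("B", "E", 4), ("B", "F", 2),
         ("F", "G", 15), ("F", "H", 16)]

adj = {}
for u, v, length in edges:
    adj.setdefault(u, []).append((v, length))
    adj.setdefault(v, []).append((u, length))

def path_between(src, dst, parent=None):
    """Return (nodes, length) of the unique tree path from src to dst, or None."""
    if src == dst:
        return [src], 0
    for nxt, length in adj[src]:
        if nxt == parent:
            continue
        found = path_between(nxt, dst, src)
        if found is not None:
            nodes, total = found
            return [src] + nodes, total + length
    return None

# Every unordered pair of nodes defines exactly one path in a tree.
paths = [path_between(a, b) for a, b in combinations(adj, 2)]
longest_overall = max(total for _, total in paths)
longest_through_b = max(total for nodes, total in paths if "B" in nodes)
print(longest_through_b, longest_overall)  # 22 31
```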

Transfer process

Let us look at the following two pictures:

Did you find any patterns?

Because the data structure is a tree, the longest path, whichever two nodes it connects, must pass through the root of some subtree, namely the node on the path that is closest to the root of the whole tree. Don't underestimate this humble observation; it is very important. With it, we can cut the path in two at that subtree root.

After the cut, we get two chains that each run from the subtree root down to a leaf. But there are many chains from the root to a leaf. Why these two?

The answer is simple: these two chains are the longest and the second longest, going down through two different children. Adding them together therefore gives the longest path through that root. Both chains run from a leaf to A, so their sum is the longest path through A within the subtree rooted at A.

Our earlier analysis showed that the longest path itself cannot be transferred, but the longest distance to a leaf can. Let's take an example:

The longest distance from F to a leaf is clearly the larger of 5 and 6, that is, 6. B is a little more complicated. Its children D and E are both leaves, which is easy to handle. Its child F is not a leaf, but we have already computed that the longest distance from F to a leaf is 6, so the longest distance from B to a leaf through F is 2 + 6 = 8. This gives us the state transition, but what we transfer is not the final answer: it is the longest and the second longest distance from the current node to a leaf.

The longest distance alone is not enough. To get the longest path through a subtree root, we add that root's longest distance and its second longest distance, where the two come from different children. As we said, every path passes through the root of some subtree. That may sound trivial, but it is exactly what makes the method work: if we compute, for every subtree, the longest path through its root, the largest of those values is the answer.
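The F and B calculation above can be written as a short recursion. This is a sketch rather than the article's final code: the `children` dictionary is a hypothetical encoding of the example tree, and the length B-D = 1 is an assumption (the text only fixes B-E = 4, B-F = 2, F-G = 5 and F-H = 6).

```python
# Rooted view of the example: each internal node lists (child, edge length).
children = {
    "B": [("D", 1), ("E", 4), ("F", 2)],  # B-D = 1 is assumed
    "F": [("G", 5), ("H", 6)],
}

def longest_to_leaf(node):
    """Longest distance from `node` down to any leaf below it (0 for a leaf)."""
    return max(
        (length + longest_to_leaf(child) for child, length in children.get(node, [])),
        default=0,
    )

print(longest_to_leaf("F"))  # 6
print(longest_to_leaf("B"))  # 8
```

The value for B reuses the value already computed for F, which is exactly the "transfer" the text describes.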

Below we demonstrate this process:

The transfer process is marked with a pink pen in the figure above. For a leaf node, the longest and second longest distances are both 0; the real work happens at the internal nodes.

The transition itself is easy to work out. For an internal node i, we traverse all its children j and maintain the largest and second largest values. The state transition equations are:

$$\text{Max}_i = \max_{j}\big(dis(i, j) + \text{Max}_j\big)$$

$$\text{SecMax}_i = \operatorname{secmax}_{j}\big(dis(i, j) + \text{Max}_j\big)$$
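In code, both equations reduce to one pass over the candidates dis(i, j) + Max_j that keeps the two largest values. The helper below is a sketch with a made-up name, `top_two`; it is not part of the original article.

```python
def top_two(candidates):
    """Return (largest, second largest) of the candidates; missing values count as 0."""
    first = second = 0
    for value in candidates:
        if value > first:
            # The old maximum is demoted to second place.
            first, second = value, first
        elif value > second:
            second = value
    return first, second

# Node B from the earlier example: its children D, E, F contribute
# dis(B, j) + Max_j = 1 + 0, 4 + 0 and 2 + 6 (B-D = 1 is an assumed length).
print(top_two([1 + 0, 4 + 0, 2 + 6]))  # (8, 4)
```

Note that ties are kept on purpose: two children that both offer a distance of 5 yield `(5, 5)`, because the two chains still go through different children and can be joined into one path.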

Once the state transition is clear, the rest is a matter of coding. Doing state transitions on a tree may feel counterintuitive, especially when recursion is involved, but it is not hard. Let's start with the code that builds the tree. To keep things simple, every node is identified by an int id. Each node keeps a list of all edges incident to it, including the edge to its parent.

We only care about the lengths of paths in the tree, not about its shape, so once the tree is built the result is the same no matter which node we treat as the root. We can therefore pick any node as the root and recurse from it. This property is important: a tree is essentially a connected, undirected, acyclic graph, so every node can be reached from whichever node we choose as the root.

We create a class to store each node's information: its id, plus the longest and second longest distances. Let's look at the code; it is probably simpler than you expect.
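The claim that the root does not matter can be checked directly. The sketch below is a compact, self-contained variant of the DP (the function names `longest_path` and `down` are made up here) that is run once per possible root on the same edge list the article's code uses.

```python
from collections import defaultdict

EDGES = [(0, 1, 3), (0, 2, 1), (1, 3, 1), (1, 4, 4), (1, 5, 2),
         (5, 6, 5), (5, 7, 6), (2, 8, 7), (7, 9, 2), (7, 10, 8)]

def longest_path(edges, root):
    """Longest path in the tree, computed by a DP rooted at `root`."""
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))

    best = 0

    def down(node, parent):
        """Return the longest distance from node to a leaf, updating best."""
        nonlocal best
        first = second = 0
        for child, length in adj[node]:
            if child == parent:
                continue
            candidate = length + down(child, node)
            if candidate > first:
                first, second = candidate, first
            elif candidate > second:
                second = candidate
        # Longest path whose highest node (relative to root) is `node`.
        best = max(best, first + second)
        return first

    down(root, None)
    return best

# Try each of the 11 nodes as the root and collect the distinct answers.
answers = {root: longest_path(EDGES, root) for root in range(11)}
print(set(answers.values()))  # {27}
```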

class Node(object):
    def __init__(self, id):
        self.id = id
        # Longest distance from this node down to a leaf of its subtree
        self.max1 = 0
        # Second longest distance down to a leaf
        self.max2 = 0
        # Edges incident to this node, stored as (neighbor id, length)
        self.edges = []

    # Add a new edge to neighbor v with length l
    def add_edge(self, v, l):
        self.edges.append((v, l))


# Create the list that stores all nodes
nodes = [Node(id) for id in range(12)]

edges = [(0, 1, 3), (0, 2, 1), (1, 3, 1), (1, 4, 4),
         (1, 5, 2), (5, 6, 5), (5, 7, 6), (2, 8, 7), (7, 9, 2), (7, 10, 8)]

# Create the edges; each one is stored on both of its endpoints
for edge in edges:
    u, v, l = edge
    nodes[u].add_edge(v, l)
    nodes[v].add_edge(u, l)

Since the goal is only to convey the idea, much of the object-oriented scaffolding is omitted, but this should be enough to follow the approach.

Below, we look at the code for dynamic programming on the tree:

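def dfs(u, f, ans):
    nodeu = nodes[u]
    # Traverse every edge incident to node u
    for edge in nodes[u].edges:
        v, l = edge
        # The edge list includes the edge back to the parent,
        # so skip v when it is the parent's id
        if v == f:
            continue
        # Recurse into the child and update the answer
        ans = max(ans, dfs(v, u, ans))
        nodev = nodes[v]
        # Transfer the largest and second largest values
        if nodev.max1 + l > nodeu.max1:
            nodeu.max2 = nodeu.max1
            nodeu.max1 = nodev.max1 + l
        elif nodev.max1 + l > nodeu.max2:
            nodeu.max2 = nodev.max1 + l
    # Return the best answer found so far
    return max(ans, nodeu.max1 + nodeu.max2)

# Root the tree at node 0; -1 means "no parent"
print(dfs(0, -1, 0))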

It looks like a complicated tree DP, yet the code is only a dozen or so lines. Surprisingly simple, isn't it?

As always, though, a caveat: these dozen lines look simple, but they hide some details, especially around the recursion. Readers who are not yet comfortable with recursion may find it a bit hard. I recommend tracing the code by hand on paper using the earlier figure.
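For readers who prefer to avoid recursion, the same DP can be written with an explicit stack. This is an alternative sketch, not the article's method: `longest_path_iterative` and its internal dictionaries are names invented here. It is also useful in practice, because Python's default recursion limit (1000 frames) is easily exceeded by a long, chain-shaped tree.

```python
from collections import defaultdict

def longest_path_iterative(edges, root=0):
    """Tree DP for the longest path, using an explicit stack instead of recursion."""
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))

    # First pass: record a visiting order in which parents precede children.
    parent = {root: None}
    parent_len = {root: 0}
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for nxt, length in adj[node]:
            if nxt != parent[node]:
                parent[nxt] = node
                parent_len[nxt] = length
                stack.append(nxt)

    # Second pass: walk the order backwards so children are finished first.
    max1 = defaultdict(int)  # longest distance down to a leaf
    max2 = defaultdict(int)  # second longest distance down to a leaf
    best = 0
    for node in reversed(order):
        best = max(best, max1[node] + max2[node])
        p = parent[node]
        if p is None:
            continue
        candidate = max1[node] + parent_len[node]
        if candidate > max1[p]:
            max1[p], max2[p] = candidate, max1[p]
        elif candidate > max2[p]:
            max2[p] = candidate
    return best

ARTICLE_EDGES = [(0, 1, 3), (0, 2, 1), (1, 3, 1), (1, 4, 4), (1, 5, 2),
                 (5, 6, 5), (5, 7, 6), (2, 8, 7), (7, 9, 2), (7, 10, 8)]
print(longest_path_iterative(ARTICLE_EDGES))  # 27

# A chain of 100,001 nodes, far deeper than the default recursion limit.
chain = [(i, i + 1, 1) for i in range(100_000)]
print(longest_path_iterative(chain))  # 100000
```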

Another approach

The article is not over yet: there is a small bonus. This problem can also be solved in another way, which is very clever, so I will introduce it here as well.

Earlier we said that a tree only records how nodes are connected, so whichever node we choose as the root, the lengths and structure of the paths in the tree do not change. With a little imagination, can we picture the tree as pieces of string or wooden sticks tied together?

Let's look at the picture below:

If we move point C closer to point B, the structure of the tree is unaffected. After all, a tree is an abstract structure, and we don't care about the angles between its branches. Now imagine picking up the tree by point A: the other points sag under gravity, and the whole thing is pulled straight.

In the picture above, for example, we pick up point A, and B, C and D hang down. The lowest point is now D. Then we pick the tree up by D instead, and the lowest point becomes C. The distance between D and C is the longest path in the tree:

Let's summarize the whole process. First, we pick an arbitrary node as the root and find the node farthest from it. Then we take that farthest node as the new root and again find the node farthest from it. The distance between these two nodes is the answer.

This approach is very intuitive, but I haven't come up with a rigorous proof of it. Readers who have one are welcome to leave me a message. If you are not convinced, you can tie a few pieces of string together and try the experiment yourself, to see whether the two points found this way really are the two farthest apart in the tree.

Finally, let's look at the code:

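def dfs(u, f, dis, max_dis, nd):
    nodeu = nodes[u]
    for edge in nodes[u].edges:
        v, l = edge
        if v == f:
            continue
        nodev = nodes[v]
        # Update the largest distance and the node that attains it
        if dis + l > max_dis:
            max_dis, nd = dis+l, nodev
        # Recurse into the child
        _max, _nd = dfs(v, u, dis+l, max_dis, nd)
        # If the recursion found a larger distance, take it
        if _max > max_dis:
            max_dis, nd = _max, _nd
    # Return the largest distance and the node that attains it
    return max_dis, nd

# First pass: find the node farthest from node 0
_, nd = dfs(0, -1, 0, 0, None)
# Second pass: the farthest distance from that node is the answer
dis, _ = dfs(nd.id, -1, 0, 0, None)
print(dis)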
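The string experiment suggested above can also be run in code: generate many random trees and compare the two-pass method against a brute-force search over all starting nodes. This is a sketch with made-up helper names, and it is evidence rather than a proof. All random edge lengths here are positive, so the experiment says nothing about trees with negative lengths.

```python
import random
from collections import defaultdict

def random_tree(n, rng):
    """Attach each node i >= 1 to a random earlier node with a random length."""
    return [(rng.randrange(i), i, rng.randint(1, 20)) for i in range(1, n)]

def distances_from(adj, start):
    """Distance from start to every node (iterative DFS)."""
    dist = {start: 0}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt, length in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + length
                stack.append(nxt)
    return dist

def two_pass(adj):
    """Farthest node from node 0, then the farthest distance from that node."""
    first = distances_from(adj, 0)
    far = max(first, key=first.get)
    return max(distances_from(adj, far).values())

def brute_force(adj):
    """Largest distance over every possible starting node."""
    return max(max(distances_from(adj, node).values()) for node in list(adj))

rng = random.Random(0)  # fixed seed so the experiment is repeatable
for _ in range(200):
    edges = random_tree(rng.randint(2, 30), rng)
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    assert two_pass(adj) == brute_force(adj)
print("two-pass matched brute force on 200 random trees")
```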

That wraps up this interesting problem. Have you got both methods from the article? It may seem a bit confusing the first time, and having questions is normal, but the core idea is not hard. Draw a picture, work through it by hand, and you will get the correct result.

That's all for today's article. If you found it rewarding, please follow or share it. Your support means a lot to me.


Origin blog.csdn.net/TechFlow/article/details/105407065