Data Structure: A Detailed Explanation of the Tree Array (Binary Indexed Tree)

1. Background

Why do we use tree arrays?

Consider the problem of summing over intervals. Given an array A, we want an update function that modifies the value of a single element, and a sum function that returns the sum of the elements whose subscripts lie between the parameters l and r. The key point is that both functions may be called a very large number of times, so the complexity of each must be kept small.

The two methods we usually use for this kind of problem are: 

1> Brute-force algorithm
The update function modifies the array directly, which is O(1). The sum function adds up the elements of the array between index left and index right (inclusive), which is O(n). When the number of operations is large, this times out.
2> Prefix-sum algorithm
For the sum function, consider the prefix-sum method: in the style of dynamic programming, build a new array sums that stores the prefix sums.
The basic idea is that sums[i] means A[0]+A[1]+A[2]+...+A[i-1]. To build it, we only need to set sums[0] = 0 and sums[i] = sums[i-1]+A[i-1].
To calculate the interval sum of array subscripts from l to r, you only need to calculate sums[r+1]-sums[l], which makes the sum function O(1). However, the update function must now be reconsidered: when we modify a value in the array A, the sums array has to be modified as well.
For example:
Original array: A = {1,2,3,4,5}
Prefix-sum array: sums = {0,1,3,6,10,15}
Analysis: Suppose we modify A[2], changing its value from 3 to 2. Then sums[3], sums[4] and sums[5] all contain A[2] and must each decrease by 1, giving sums = {0,1,3,5,9,14}.
So the complexity of the update function rises to O(n). Even though the sum function is now O(1), the cost per operation is still O(n) in the worst case.
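The prefix-sum approach described above can be sketched roughly as follows. This is only an illustration: the class name PrefixSumDemo and its method names are made up for this example and do not come from the original article.

```java
import java.util.Arrays;

// Hypothetical sketch of the prefix-sum approach: O(1) range query, O(n) point update.
class PrefixSumDemo {
    private final int[] a;     // copy of the original array A
    private final int[] sums;  // sums[i] = A[0] + ... + A[i-1], so sums[0] = 0

    PrefixSumDemo(int[] input) {
        a = input.clone();
        sums = new int[a.length + 1];
        for (int i = 1; i <= a.length; i++) {
            sums[i] = sums[i - 1] + a[i - 1];
        }
    }

    // Sum of A[l..r], both ends inclusive: O(1).
    int sumRange(int l, int r) {
        return sums[r + 1] - sums[l];
    }

    // Set A[index] = newValue; every later prefix sum must be shifted: O(n).
    void update(int index, int newValue) {
        int delta = newValue - a[index];
        a[index] = newValue;
        for (int i = index + 1; i < sums.length; i++) {
            sums[i] += delta;
        }
    }

    // Returns a copy of the prefix-sum array for inspection.
    int[] prefixSums() {
        return sums.clone();
    }

    public static void main(String[] args) {
        PrefixSumDemo demo = new PrefixSumDemo(new int[] {1, 2, 3, 4, 5});
        System.out.println(Arrays.toString(demo.prefixSums())); // [0, 1, 3, 6, 10, 15]
        demo.update(2, 2); // change A[2] from 3 to 2
        System.out.println(Arrays.toString(demo.prefixSums())); // [0, 1, 3, 5, 9, 14]
        System.out.println(demo.sumRange(1, 3)); // 2 + 2 + 4 = 8
    }
}
```

The loop inside update is where the O(n) cost comes from: one changed element forces every later entry of sums to be rewritten.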
Therefore, problems of this kind have two requirements: interval sums must be fast to calculate, and after a value in the array is modified, the number of operations needed to bring the related data structure up to date must be as small as possible. This leads to the tree array.

2. Tree array

A tree array is a data structure whose query and modification operations both have O(log n) complexity. It is mainly used for fast single-point modification of an array together with fast interval summation.
The diagram is as follows:

The bottom row of green nodes is the original array A from left to right, and the numbered black nodes are the nodes of the tree array C. Take the C[8] node as an example: its child nodes are C[4], C[6], C[7] and A[7] (the eighth green node), and C[4], C[6] and C[7] each have child nodes of their own. According to this picture, the C[8] node can be expressed as:

C[8] = C[4] + C[6] +C[7] +A[7]
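The identity above can be checked numerically. The sketch below assumes the usual definition of the tree array, in which the 1-indexed node C[i] stores the sum of the last k elements ending at A[i-1], where k is the largest power of two that divides i. The class name TreeArrayLayout and the sample data are made up for illustration, and the construction is deliberately the slow, definitional one.

```java
// Hypothetical sketch: builds the tree array C directly from its definition.
// This is for checking the structure only; it is not the efficient way to build C.
class TreeArrayLayout {
    // C is 1-indexed. C[i] holds the sum of the last k elements ending at A[i-1],
    // where k is the lowest set bit of i (the largest power of two dividing i).
    static int[] buildByDefinition(int[] a) {
        int[] c = new int[a.length + 1];
        for (int i = 1; i <= a.length; i++) {
            int k = Integer.lowestOneBit(i);
            for (int j = i - k; j < i; j++) {
                c[i] += a[j];
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] c = buildByDefinition(a);
        System.out.println(c[8]);                      // 36
        System.out.println(c[4] + c[6] + c[7] + a[7]); // 10 + 11 + 7 + 8 = 36
    }
}
```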
Another picture makes this easier to understand:
All the numbered nodes in this picture belong to the tree array C, and the bottom row again represents the original array A (including empty nodes) from left to right. The figure shows two processes: the update process, covering every node whose value must change when the fifth element of the original array is modified, and the query process, which calculates the sum of the elements with subscripts 0~14 in the original array.
<1> Update process
From the structure of the tree array above, and referring to this picture, we know that C[5] is a child node of C[6], C[6] is a child node of C[8], and C[8] is a child node of C[16]. Because C[5] = A[4], modifying A[4] changes C[5], which changes C[6], which changes C[8], which in turn changes C[16]. Each of them changes by the same amount: the new value of A[4] minus the old value of A[4].
The idea is somewhat similar to the prefix sum discussed earlier. With prefix sums, we must modify sums[5] and every node after it (because all of those nodes contain A[4]). Here we only need to modify C[5], C[6], C[8] and C[16] (because only these nodes contain A[4]), so updating this data structure is generally faster than updating a prefix-sum array.
The drawing shows that C[5], C[6], C[8] and C[16] need to be updated. But how do we determine which nodes to modify in general? Here we look at it from a binary point of view: redraw the diagram with the decimal numbers on all nodes changed to their binary representation:

Correspondingly, we find that a change in the value of C[101] affects C[110], which then affects C[1000], which then affects C[10000]. So what is the relationship between 101, 110, 1000 and 10000?
It can be concluded that 110 is obtained by adding 1 to 101, 1000 is obtained by adding 10 to 110, and 10000 is obtained by adding 1000 to 1000. In each case, the number added is the lowest 1 bit of the current subscript.
Here we design a function lowbit(int x) that, for a given subscript x, returns the number obtained by keeping only the lowest 1 bit of its binary representation. This is what explains how 101 reaches 110:
    public int lowbit(int x){
        return x & (-x);
    }
Take x=5 as an example:
The original code (sign-magnitude), inverse code (ones' complement) and complement code (two's complement) of a positive number are all the same.
Example: 5
Original code: 0000 0101
Inverse code: 0000 0101
Complement code: 0000 0101
For a negative number the three differ (the first bit is the sign bit; 1: negative number, 0: positive number).
Example: -5
Original code: 1000 0101
Inverse code: 1111 1010 (invert every bit of the original code except the sign bit)
Complement code: 1111 1011 (inverse code + 1)
Computers store integers in complement code, so -5 is 1111 1011.
5 & -5 = 0000 0101 & 1111 1011 = 0000 0001 = 1
So after changing tr[5], the next node to change is 5+1 = 6, which is tr[6].
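To see lowbit on a few more subscripts, here is a small sketch that prints each value next to its binary form. The class name LowbitDemo and the chosen sample values are only for illustration.

```java
// Hypothetical demo class; lowbit itself is the function from the article.
class LowbitDemo {
    // Keeps only the lowest 1 bit of x, relying on two's complement negation.
    static int lowbit(int x) {
        return x & (-x);
    }

    public static void main(String[] args) {
        for (int x : new int[] {5, 6, 8, 12}) {
            int low = lowbit(x);
            System.out.println(x + " (" + Integer.toBinaryString(x) + ") -> lowbit "
                    + low + " (" + Integer.toBinaryString(low) + ")");
        }
        // Prints:
        // 5 (101) -> lowbit 1 (1)
        // 6 (110) -> lowbit 2 (10)
        // 8 (1000) -> lowbit 8 (1000)
        // 12 (1100) -> lowbit 4 (100)
    }
}
```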
So after the value of A[4] (that is, C[101]) has changed, we calculate the lowbit of 101 to get 1 and modify C[110]; then calculate the lowbit of 110 to get 10 and modify C[1000]; then calculate the lowbit of 1000 to get 1000 and modify C[10000]; and so on, repeating until the subscript of the node to be modified is out of bounds. This gives us our update function:
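    // Update operation: index is the 1-based subscript in tr,
    // val is the change to apply (new value minus old value)
    public void update(int index,int val){
        // Step by the lowbit of the current subscript i, not of the starting index
        for (int i = index; i < this.tr.length; i += lowbit(i)) {
            tr[i] += val;
        }
    }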
The overall complexity is O(log n): each step adds lowbit(i) to i, which moves the lowest 1 bit of i at least one position to the left, so the loop runs at most about log2(n) times before i goes out of bounds.
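The upward walk can be traced by collecting the subscripts it visits, stepping each time by the lowbit of the current subscript. The sketch below is an illustration only: the class UpdatePathDemo and the helper updatePath are hypothetical names, and n stands for the largest valid subscript of the tree array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that records which tree-array nodes an update touches.
class UpdatePathDemo {
    static int lowbit(int x) {
        return x & (-x);
    }

    // Subscripts visited when the element at 1-based position index changes,
    // for a tree array whose valid subscripts are 1..n.
    static List<Integer> updatePath(int index, int n) {
        List<Integer> path = new ArrayList<>();
        for (int i = index; i <= n; i += lowbit(i)) {
            path.add(i);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(updatePath(5, 16)); // [5, 6, 8, 16]
    }
}
```

Only 4 of the 16 nodes are visited, which matches the C[5], C[6], C[8], C[16] chain from the picture.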
<2> Query process
The query process calculates the sum of all values from A[0] to A[i]. Suppose we need the sum of A[0]~A[14]. Looking at the current graph, we find that node 8 covers A[0]~A[7], node 12 covers A[8]~A[11], node 14 covers A[12]~A[13], and node 15 covers A[14].
So we only need to sum C[8], C[12], C[14] and C[15].
This also uses the lowbit function:
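    // Summation: returns the sum of the first index elements, A[0] ~ A[index-1]
    public int sumRange(int index){
        int sum = 0;
        // Step by the lowbit of the current subscript i, not of the starting index
        for (int i = index; i > 0; i -= lowbit(i)) {
            sum += this.tr[i];
        }
        return sum;
    }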
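The downward walk of the query can be traced in the same way, subtracting the lowbit of the current subscript at each step. As before, this is only a sketch: QueryPathDemo and queryPath are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that records which tree-array nodes a prefix query reads.
class QueryPathDemo {
    static int lowbit(int x) {
        return x & (-x);
    }

    // Subscripts read when summing the first count elements, A[0] ~ A[count-1].
    static List<Integer> queryPath(int count) {
        List<Integer> path = new ArrayList<>();
        for (int i = count; i > 0; i -= lowbit(i)) {
            path.add(i);
        }
        return path;
    }

    public static void main(String[] args) {
        // Sum of A[0] ~ A[14] means the first 15 elements.
        System.out.println(queryPath(15)); // [15, 14, 12, 8]
    }
}
```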
The query process is equivalent to computing a prefix sum, so the sum of any interval can be obtained as the difference of two prefix sums.
The complexity here is also O(log n): each step subtracts lowbit(i), which clears the lowest 1 bit of i, so the loop runs at most once per 1 bit in index, that is, at most about log2(n) times.
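Putting the pieces together, one possible way to wrap the two loops into a class with 0-based subscripts and an inclusive interval sum is sketched below. This is an assumption about how the parts might be combined, not the article's own class: the names TreeArray, set and prefixSum are made up, and the constructor simply calls set once per element, which costs O(n log n).

```java
// Hypothetical wrapper around the update and query loops described above.
class TreeArray {
    private final int[] a;   // current values of the original array (0-based)
    private final int[] tr;  // tree array, valid subscripts 1..n

    TreeArray(int[] input) {
        a = new int[input.length];
        tr = new int[input.length + 1];
        for (int i = 0; i < input.length; i++) {
            set(i, input[i]);
        }
    }

    private static int lowbit(int x) {
        return x & (-x);
    }

    // Set A[index] = newValue (0-based index): O(log n).
    void set(int index, int newValue) {
        int delta = newValue - a[index];
        a[index] = newValue;
        for (int i = index + 1; i < tr.length; i += lowbit(i)) {
            tr[i] += delta;
        }
    }

    // Sum of the first count elements, A[0] + ... + A[count-1]: O(log n).
    int prefixSum(int count) {
        int sum = 0;
        for (int i = count; i > 0; i -= lowbit(i)) {
            sum += tr[i];
        }
        return sum;
    }

    // Sum of A[l..r], both ends inclusive, as a difference of two prefix sums.
    int sumRange(int l, int r) {
        return prefixSum(r + 1) - prefixSum(l);
    }

    public static void main(String[] args) {
        TreeArray t = new TreeArray(new int[] {1, 2, 3, 4, 5});
        System.out.println(t.sumRange(0, 4)); // 15
        t.set(2, 2);                          // change A[2] from 3 to 2
        System.out.println(t.sumRange(0, 4)); // 14
        System.out.println(t.sumRange(1, 3)); // 2 + 2 + 4 = 8
    }
}
```

Keeping a copy of the current values in a lets set accept the new value directly and derive the difference itself, at the cost of O(n) extra memory.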

 


Origin blog.csdn.net/weixin_71243923/article/details/131128766