[Algorithm] Tree array (Binary Indexed Tree / Fenwick tree) data structure

Part.I Preliminary knowledge

Reference:
Simple and easy-to-understand detailed explanation of tree arrays

Chap.I Some premises and concepts

  • How computers represent negative numbers in binary (two's complement)
  • Prefix sum: the prefix sum at a subscript is the sum of all array elements up to and including that subscript. Prefix sums come in one-dimensional and two-dimensional forms, and they are an important preprocessing step that can reduce an algorithm's time complexity. For example, the one-dimensional prefix sum is sum[i] = sum[i-1] + arr[i], where sum is the prefix-sum array and arr is the original array. With the prefix-sum array, any interval sum can be found in O(1) time: arr[l] + ... + arr[r] = sum[r] - sum[l-1].
  • Suffix sum: the mirror image of the prefix sum, i.e. the sum of all elements from a subscript (inclusive) to the end of the array, e.g. suf[i] = suf[i+1] + arr[i].
  • Discretization: map a finite number of values drawn from a huge (effectively infinite) range onto a small finite range, to improve the algorithm's time and space efficiency. In plain terms, discretization shrinks the data while preserving their relative order. It applies whenever only the relative order of the values matters, not the values themselves. For example, given the four numbers 1234567, 123456789, 12345678, 123456, sort them first: 123456 < 1234567 < 12345678 < 123456789 → 1 < 2 < 3 < 4, so the original data can be mapped to 2, 4, 3, 1.
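A minimal Python sketch of the two preprocessing ideas above (the names prefix_sums, range_sum and discretize are illustrative, not from the original). It stores a 0-padded prefix array, pre[i] = arr[0] + ... + arr[i-1], which is the same recurrence as sum[i] = sum[i-1] + arr[i] shifted to 0-based storage, and it ranks values with bisect_left, assuming only their relative order matters.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List


def prefix_sums(arr: List[int]) -> List[int]:
    """pre[0] = 0 and pre[i] = arr[0] + ... + arr[i-1]."""
    return [0] + list(accumulate(arr))


def range_sum(pre: List[int], l: int, r: int) -> int:
    """Sum of arr[l..r] (0-based, inclusive) in O(1) from the prefix sums."""
    return pre[r + 1] - pre[l]


def discretize(values: List[int]) -> List[int]:
    """Replace each value by its 1-based rank among the distinct sorted values."""
    ranks = sorted(set(values))
    return [bisect_left(ranks, v) + 1 for v in values]


pre = prefix_sums([3, 1, 4, 1, 5])
print(range_sum(pre, 1, 3))                                 # 6  (1 + 4 + 1)
print(discretize([1234567, 123456789, 12345678, 123456]))   # [2, 4, 3, 1]
```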

Chap.II lowbit function

Leaving its purpose aside for now, first look at how it is computed. As the name suggests, lowbit(x) gives the value of the lowest 1 bit in the binary representation of x. For example, x = 6 is 110 in binary, so lowbit(6) returns 2, because the lowest 1 stands for 2.

How is lowbit computed? There are generally two ways:

  • Clear the lowest 1 first with x & (x - 1) (subtracting 1 only flips the lowest 1 and the 0s to its right, so every 1 to the left of it is unaffected), then subtract the result from the original number: x - (x & (x - 1)). For example, x = 12 is 00001100 in 8-bit binary, x - 1 is 00001011, x & (x - 1) is 00001000, so x - (x & (x - 1)) is 00000100, which is the lowbit we want (4).
  • Using the way computers represent negative numbers (two's complement), AND the number with its negation: x & -x. For example, x = 12 is 00001100 in 8-bit binary, -x is 11110100, and x & -x is 00000100, which is the lowbit we want.
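Both methods can be checked quickly in Python; this is an illustrative sketch with hypothetical function names. Python integers behave like two's complement numbers of unlimited width under bitwise operators, so x & -x works directly, and masking with 0xFF shows the 8-bit pattern used in the example.

```python
def lowbit_clear(x: int) -> int:
    """Method 1: x & (x - 1) clears the lowest 1; the difference is that 1."""
    return x - (x & (x - 1))


def lowbit_neg(x: int) -> int:
    """Method 2: AND x with its two's complement negation."""
    return x & -x


for x in (6, 12, 24):
    print(x, bin(x), lowbit_clear(x), lowbit_neg(x))
# 6 0b110 2 2
# 12 0b1100 4 4
# 24 0b11000 8 8

print(format(-12 & 0xFF, "08b"))  # 11110100, the 8-bit pattern of -12
```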

Part.II Tree array

A tree array is a data structure; why build such a structure? Because it has unique advantages for certain problems. Consider this problem: we have an array a[n] of length n and want to perform two kinds of operations on it: "query" (the sum of all elements in some interval) and "update" (change the value of one element). Now suppose we perform q updates and q queries, and the updates and queries are interleaved!


If we use the plain array, each update is O(1) (we just set a[i] = value), but each query is O(n) (summing up to n numbers takes a loop of length n), so q updates plus q queries cost O(qn) in the worst case.


With a tree array, each query drops to O(log n), at the cost of each update also becoming O(log n), for O(q log n) in total. Why? We will analyze this in detail below.


Chap.I Idea of the tree array

[Figure: tree array structure diagram (from Zhihu @orangebird)]
[Figure: alternative tree array diagram, if the first one is unclear (from CSDN @FlushHip)]

The tree-like array (so called because its structure looks like a tree while it is stored as an array) is built on binary representation. The figures above convey the idea, but why is it divided this way? This is where lowbit comes in. Consider an array a of length 8 and a new array c, obtained from a by the organization shown in the figures: c[i] stores the sum of the lowbit(i) elements ending at a[i], that is, c[i] = a[i-lowbit(i)+1] + ... + a[i]. For example, c[6] = a[5] + a[6] and c[8] = a[1] + ... + a[8].
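Since the figures are not reproduced here, the rule behind them can be printed instead. Assuming the convention that c[i] covers the lowbit(i) elements ending at a[i], this sketch (covered_range is a hypothetical helper) lists, for n = 8, which range of a each c[i] sums, next to the binary form of i.

```python
from typing import Tuple


def lowbit(x: int) -> int:
    return x & -x


def covered_range(i: int) -> Tuple[int, int]:
    """1-based range (lo, hi) of a whose sum is stored in c[i]."""
    return (i - lowbit(i) + 1, i)


for i in range(1, 9):
    lo, hi = covered_range(i)
    print(f"c[{i}] ({i:04b}) = sum of a[{lo}..{hi}]")
```

For instance, it reports c[6] (0110) = sum of a[5..6] and c[8] (1000) = sum of a[1..8]: the number of trailing zeros of i decides how far back c[i] reaches.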

  • Query: suppose we want sum(1:7). 7 is 111 in binary, and $\sum_{i=1}^{7} a_i = (a_1+a_2+a_3+a_4) + (a_5+a_6) + a_7$. In pseudocode with binary subscripts this is sum(001:111) = c[111] + c[110] + c[100], i.e. sum(1:7) = c[7] + c[7-lowbit(7)] + c[6-lowbit(6)] = c[7] + c[6] + c[4]. Each step removes the lowest 1 of the index, so sum(1:m) needs at most $\lfloor \log_2 m \rfloor + 1$ terms, i.e. O(log n).
  • Update: suppose we want to change a[3]. 3 is 0011 in binary, so for an array of length 8 we must update c[3], c[4], c[8]; with binary subscripts, c[0011], c[0100], c[1000]; in other words c[3], c[3+lowbit(3)], c[4+lowbit(4)]. Each step adds lowbit to the index, which moves its lowest 1 at least one position higher, so at most $\lfloor \log_2 n \rfloor + 1$ entries are touched: again O(log n).
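The two index walks can be traced with a small Python sketch (query_path and update_path are hypothetical helpers): query strips the lowest 1 from the index at each step, update adds it. For m = 7, and for an update of a[3] with n = 8, they reproduce the indices listed above.

```python
from typing import List


def lowbit(x: int) -> int:
    return x & -x


def query_path(m: int) -> List[int]:
    """Indices of c summed for sum(1:m): m, m - lowbit(m), ... while > 0."""
    path = []
    while m > 0:
        path.append(m)
        m -= lowbit(m)
    return path


def update_path(i: int, n: int) -> List[int]:
    """Indices of c changed when a[i] changes: i, i + lowbit(i), ... while <= n."""
    path = []
    while i <= n:
        path.append(i)
        i += lowbit(i)
    return path


print(query_path(7))      # [7, 6, 4]
print(update_path(3, 8))  # [3, 4, 8]
```

The length of query_path(m) equals the number of 1 bits in m, which is where the logarithmic bound comes from.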

This explains why both "query" and "update" on a tree array take O(log n) time.
I once wondered: why not keep both a and c? Do updates directly on a in O(1), and queries on c in O(log n). The catch is that updates and queries alternate. After an update on a, c only reflects the new information once it has been rebuilt, and rebuilding c costs O(n), so every query would effectively cost O(n) again and the "optimization" would gain nothing.
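Rebuilding c cannot be cheaper than O(n), since all n entries may change. One common linear-time construction (a technique not shown in the original; the class below builds c with n calls to update, which is O(n log n)) lets every node add its finished sum into its parent i + lowbit(i). A Python sketch, with the hypothetical name build_linear and c stored 0-based:

```python
from typing import List


def build_linear(a: List[int]) -> List[int]:
    """Build the tree array in O(n); c[i-1] holds node i (1-based)."""
    n = len(a)
    c = list(a)                    # start with c[i] = a[i]
    for i in range(1, n + 1):
        parent = i + (i & -i)      # the next node whose range contains node i's range
        if parent <= n:
            c[parent - 1] += c[i - 1]   # node i is complete: all its children are < i
    return c


print(build_linear([1, 2, 3, 4, 5, 6, 7]))  # [1, 3, 3, 10, 5, 11, 7]
```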

Chap.II Construction of Tree Array

Following the discussion above, we construct a class with these functions:

  • lowbit: return the lowbit of an integer
  • BIT: constructor, initializes the tree from a vector<int>
  • update: add val to the i-th number (i > 0)
  • query: return the sum of the first m numbers
  • print: output tree
#include <cstdio>
#include <iostream>
#include <vector>
using namespace std;

class BIT {
private:
    int n;              // the length of the tree
    vector<int> tree;   // the data tree; tree[i-1] holds c[i]

public:
    int lowbit(int x) { return x & -x; }

    // Build the tree from a by adding each a[i] in turn (O(n log n))
    BIT(const vector<int>& a)
    {
        n = static_cast<int>(a.size());
        tree.assign(n, 0);
        for (int i = 0; i < n; i++)
        {
            update(i + 1, a[i]);
        }
    }

    /**
     * @brief  update the tree array
     * @param[in] i         the index, >=1
     * @param[in] val       the value to add, = new value - old value
     * @return              none
     */
    void update(int i, int val)
    {
        // Walk up: i, i+lowbit(i), ... while i <= n
        for (; i <= n; i += lowbit(i)) tree[i - 1] += val;
    }

    /**
     * @brief  query the sum of the first m terms
     * @param[in] m         the index, >=0
     * @return              a[1] + ... + a[m]
     */
    int query(int m)
    {
        int sum = 0;
        // Walk down: m, m-lowbit(m), ... while m > 0
        for (; m > 0; m -= lowbit(m)) sum += tree[m - 1];
        return sum;
    }

    void print()
    {
        for (int i = 0; i < n; i++) cout << tree[i] << "    ";
        cout << endl;
    }
};

Usage example:

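int main()
{
    int test[7] = {1, 2, 3, 4, 5, 6, 7};
    vector<int> origin(test, test + 7);
    BIT bt(origin);
    bt.print();                     // print the contents of tree
    cout << bt.query(5) << endl;    // output the sum of the first 5 items
    bt.update(3, 6);                // add 6 to item 3
    bt.print();                     // print the contents of tree after the update
    cout << bt.query(5) << endl;    // output the sum of the first 5 items after the update
    getchar();                      // keep the console window open
    return 0;
}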
// ----------------- output ------------------
1    3    3    10    5    11    7
15
1    3    9    16    5    11    7
21
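For readers following along in Python, here is a minimal port of the C++ class above (a sketch, not the author's code; print is replaced by reading the tree list directly). It keeps the 1-based interface with the tree stored 0-based, and reproduces the output above.

```python
from typing import List


class BIT:
    """Tree array over n numbers; public indices are 1-based."""

    def __init__(self, a: List[int]):
        self.n = len(a)
        self.tree = [0] * self.n        # tree[i-1] holds c[i]
        for i, v in enumerate(a, start=1):
            self.update(i, v)

    def update(self, i: int, val: int) -> None:
        """Add val to the i-th number (i >= 1)."""
        while i <= self.n:
            self.tree[i - 1] += val
            i += i & -i

    def query(self, m: int) -> int:
        """Return the sum of the first m numbers (m >= 0)."""
        total = 0
        while m > 0:
            total += self.tree[m - 1]
            m -= m & -m
        return total


bt = BIT([1, 2, 3, 4, 5, 6, 7])
print(bt.tree, bt.query(5))   # [1, 3, 3, 10, 5, 11, 7] 15
bt.update(3, 6)
print(bt.tree, bt.query(5))   # [1, 3, 9, 16, 5, 11, 7] 21
```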


Part.III Application of tree array

Chap.I LeetCode: 2426. Number of Pairs Satisfying Inequality

That's right: I wrote this note because I ran into problem 2426 while grinding LeetCode, and this is where the tree array finally shows its fangs (RUA!!).


Sec.I Problem description and analysis

First, the problem description:

You are given two 0-indexed integer arrays nums1 and nums2, both of size n, and an integer diff. Count the pairs (i, j) that satisfy:

  • 0 <= i < j <= n - 1
  • nums1[i] - nums1[j] <= nums2[i] - nums2[j] + diff

Please return the number of pairs that satisfy the condition.
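Before optimizing, it helps to have a direct O(n^2) check of the definition to validate faster solutions against. A Python sketch (number_of_pairs_brute is a hypothetical name), run on two small hand-checked inputs:

```python
from typing import List


def number_of_pairs_brute(nums1: List[int], nums2: List[int], diff: int) -> int:
    """Count i < j with nums1[i] - nums1[j] <= nums2[i] - nums2[j] + diff in O(n^2)."""
    n = len(nums1)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if nums1[i] - nums1[j] <= nums2[i] - nums2[j] + diff
    )


print(number_of_pairs_brute([3, 2, 5], [2, 2, 1], 1))   # 3
print(number_of_pairs_brute([3, -1], [-2, 2], -1))      # 0
```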

Problem-solving video: bilibili@林茶山艾府


Analysis (based on the Python solution):

  • First, rearrange the inequality: nums1[i] - nums2[i] <= nums1[j] - nums2[j] + diff. Letting nums[i] = nums1[i] - nums2[i], we only need to count the pairs (i, j) with 0 <= i < j <= n - 1 and nums[i] <= nums[j] + diff.
  • nums may well contain repeated values, so we deduplicate it with a set and sort the result into b.
  • Discretization: build a tree array bt (all elements initialized to 0) whose length equals the number of distinct values in nums, len(set(nums)). This amounts to splitting nums into that many "grades" (only the relative order of the data matters, not their magnitude; this is discretization), and each element of the underlying array counts how many values fall into its grade. The tree array has two main functions: add(x) adds one to the count at index x (the counts play the role of a from the earlier discussion, but the tree stores c, so more than one element changes), and query(x) returns the sum of the counts at indices 1 to x.
  • We traverse nums with a pointer i and fill bt along the way, so that bt holds, per grade, the number of elements to the left of x = nums[i]. First, index = bisect_right(b, x + diff) gives the first position in b whose element is greater than x + diff, which equals the number of grades <= x + diff. Then query(index) counts the elements to the left of nums[i] that are less than or equal to x + diff, i.e. the number of m < i with nums[m] <= nums[i] + diff.
  • Next, index2 = bisect_left(b, x) finds the position of x in b (its grade, 0-based), and add(index2 + 1) records x in the tree array (whose indices start at 1), ready for the queries at later positions.
  • Summing all the query(index) results gives the answer.
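To make these steps concrete, the following sketch (trace_pairs is a hypothetical name; the tree array is inlined as a plain list so every step is visible) prints index, the query result and the add call at each position. It uses the small input nums1 = [3, 2, 5], nums2 = [2, 2, 1], diff = 1, for which nums = [1, 0, 4] and b = [0, 1, 4].

```python
from bisect import bisect_left, bisect_right
from typing import List


def trace_pairs(nums1: List[int], nums2: List[int], diff: int) -> int:
    """Count pairs following the steps above, printing each step (illustrative only)."""
    nums = [p - q for p, q in zip(nums1, nums2)]
    b = sorted(set(nums))        # distinct "grades" in increasing order
    tree = [0] * len(b)          # tree array of counts; tree[k-1] is 1-based node k
    ans = 0
    for x in nums:
        index = bisect_right(b, x + diff)   # number of grades <= x + diff
        got, k = 0, index
        while k > 0:                        # query(index)
            got += tree[k - 1]
            k -= k & -k
        index2 = bisect_left(b, x)          # 0-based grade of x
        print(f"x={x} index={index} query={got} add({index2 + 1})")
        ans += got
        k = index2 + 1
        while k <= len(tree):               # add(index2 + 1)
            tree[k - 1] += 1
            k += k & -k
    return ans


print(trace_pairs([3, 2, 5], [2, 2, 1], 1))
# x=1 index=2 query=0 add(2)
# x=0 index=2 query=1 add(1)
# x=4 index=3 query=2 add(3)
# 3
```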

Note that although this problem uses a tree array, the array stores counts of elements rather than element values. Also, the tree array is not built all at once: it is filled in gradually while traversing, querying first and then adding each element. With these two points in mind, the video explanation should be easy to follow. I have tried to lay out the idea as clearly as possible, but looking back it is still a bit of a mouthful orz


Sec.II Code Implementation

The following is the C++ implementation:

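#include <algorithm>
#include <vector>
using namespace std;

class BIT {
private:
    int length = 0;
    vector<int> tree;   // tree[i-1] holds the count node for 1-based index i

public:
    BIT(int n) : length(n), tree(n, 0) {}

    int lowbit(int x) { return x & -x; }

    // Add one at position i (i = index + 1, >= 1)
    void add(int i)
    {
        while (i <= length) {
            tree[i - 1]++;
            i += lowbit(i);
        }
    }

    // Return the total count at positions 1..i (i = index + 1; i = 0 gives 0)
    int query(int i)
    {
        int sum = 0;
        while (i > 0) {
            sum += tree[i - 1];
            i -= lowbit(i);
        }
        return sum;
    }
};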

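class Solution {
public:
    long long numberOfPairs(vector<int>& nums1, vector<int>& nums2, int diff) {
        int n = nums1.size();
        // nums[i] = nums1[i] - nums2[i]; count i < j with nums[i] <= nums[j] + diff
        vector<int> nums(n, 0);
        for (int i = 0; i < n; i++) {
            nums[i] = nums1[i] - nums2[i];
        }
        // Discretize: b holds the distinct values of nums in increasing order
        vector<int> b(nums);
        sort(b.begin(), b.end());
        b.erase(unique(b.begin(), b.end()), b.end());
        BIT bt(b.size());
        // Up to n(n-1)/2 pairs; long is only 32 bits on some platforms (e.g. Windows)
        long long ans = 0;
        for (int i = 0; i < n; i++)
        {
            // number of earlier elements with value <= nums[i] + diff
            ans += bt.query(upper_bound(b.begin(), b.end(), nums[i] + diff) - b.begin());
            // record nums[i] at its 1-based rank
            bt.add(lower_bound(b.begin(), b.end(), nums[i]) - b.begin() + 1);
        }
        return ans;
    }
};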

Noteworthy points:

  • upper_bound(b.begin(), b.end(), val) finds, in the sorted container b, the first element greater than val and returns an iterator to it (think of it as a pointer; it is b.end() if there is no such element). *upper_bound(...) is that element's value, and upper_bound(...) - b.begin() is its index, which also equals the number of elements <= val.
  • lower_bound(b.begin(), b.end(), val) finds the first element greater than or equal to val; otherwise it is used the same way as upper_bound. Its index equals the number of elements < val.
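In the Python version, bisect_left and bisect_right from the standard bisect module play the roles of lower_bound and upper_bound, but return indices directly instead of iterators. A quick check on a small sorted list (duplicates kept here only to show the difference):

```python
from bisect import bisect_left, bisect_right

b = [0, 1, 1, 4]
for val in (1, 2, 5):
    # bisect_left  ~ lower_bound: first index with b[idx] >= val (= count of elements < val)
    # bisect_right ~ upper_bound: first index with b[idx] > val  (= count of elements <= val)
    print(val, bisect_left(b, val), bisect_right(b, val))
# 1 1 3
# 2 3 3
# 5 4 4
```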

The following is the Python implementation:

from bisect import bisect_left, bisect_right
from typing import List


class BIT:
    def __init__(self, n: int):
        self.length = n
        self.tree = [0] * n          # tree[i-1] holds the node for 1-based index i

    def add(self, i: int):
        # Add one at position i (>= 1)
        while i <= self.length:
            self.tree[i - 1] += 1
            i += i & -i

    def query(self, i: int) -> int:
        # Sum of the counts at positions 1..i
        total = 0
        while i > 0:
            total += self.tree[i - 1]
            i -= i & -i
        return total


class Solution:
    def numberOfPairs(self, nums1: List[int], nums2: List[int], diff: int) -> int:
        n = len(nums1)
        nums = [nums1[i] - nums2[i] for i in range(n)]
        b = sorted(set(nums))
        bt = BIT(len(b))
        ans = 0
        for i in range(n):
            ans += bt.query(bisect_right(b, nums[i] + diff))
            bt.add(bisect_left(b, nums[i]) + 1)
        return ans

Chap.II LeetCode: Jianzhi Offer 51. Reverse Pairs in an Array

This one is a classic; it is even included in "Jianzhi Offer". A reverse pair is a pair (i, j) with i < j and nums[i] > nums[j]. The problem is very similar to the one above but simpler, so I won't analyze it and will just post a solution.


Here is the Python code:

from bisect import bisect_left
from typing import List


class Solution:
    def reversePairs(self, nums: List[int]) -> int:
        b = sorted(set(nums))
        ans = 0
        n = len(b)
        bt = BIT(n)
        for x in nums:
            temp = n - bisect_left(b, x)   # reversed rank: largest value -> 1
            ans += bt.query(temp - 1)      # earlier elements strictly greater than x
            bt.add(temp)
        return ans


class BIT:
    def __init__(self, n: int):
        self.length = n
        self.tree = [0] * n

    def add(self, i: int):
        while i <= self.length:
            self.tree[i - 1] += 1
            i += i & -i

    def query(self, i: int) -> int:
        total = 0
        while i > 0:
            total += self.tree[i - 1]
            i -= i & -i
        return total
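The solution relies on a reversed rank: temp = n - bisect_left(b, x) gives the largest distinct value rank 1 and the smallest rank n, so query(temp - 1) counts the earlier elements that are strictly greater than x. The sketch below (reverse_pairs_bit and reverse_pairs_brute are hypothetical names; the tree array is inlined) cross-checks this idea against a direct O(n^2) count on random small arrays.

```python
import random
from bisect import bisect_left
from typing import List


def reverse_pairs_bit(nums: List[int]) -> int:
    """Count i < j with nums[i] > nums[j] using a tree array over reversed ranks."""
    b = sorted(set(nums))
    n = len(b)
    tree = [0] * n
    ans = 0
    for x in nums:
        r = n - bisect_left(b, x)   # largest value -> 1, smallest -> n
        i = r - 1                   # query(r - 1): earlier values strictly greater than x
        while i > 0:
            ans += tree[i - 1]
            i -= i & -i
        i = r                       # add(r)
        while i <= n:
            tree[i - 1] += 1
            i += i & -i
    return ans


def reverse_pairs_brute(nums: List[int]) -> int:
    n = len(nums)
    return sum(1 for i in range(n) for j in range(i + 1, n) if nums[i] > nums[j])


print(reverse_pairs_bit([7, 5, 6, 4]))  # 5
random.seed(0)
for _ in range(200):               # randomized cross-check, duplicates included
    arr = [random.randint(-5, 5) for _ in range(random.randint(0, 12))]
    assert reverse_pairs_bit(arr) == reverse_pairs_brute(arr)
```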


Origin blog.csdn.net/Gou_Hailong/article/details/127717773