How the chairman tree works, using HDU 2665: finding the K-th smallest value in a static interval. A summary with detailed illustrations and code explanations.

I have just learned a little about the chairman tree and want to write my own detailed explanation of it, focused on using it to find the K-th smallest value in a static (unmodified) interval. (Many write-ups call this the "K-th largest" value; what is meant, and what the code computes, is the K-th value in ascending order.) You can refer to HDU 2665. Other problems the chairman tree can solve will be added once I understand them myself. If you find any errors below, please point them out. Thank you very much!


Thanks to the following blog posts for explaining the chairman tree:

1. Chairman Tree 1

2. Chairman Tree 2

3. Chairman Tree 3

 

Prerequisite skills:

1. Segment trees.

2. Prefix sums.

3. How to use the sort, unique and lower_bound functions.

 

The chairman tree, also called a persistent (functional) segment tree, is really a data structure formed by many segment trees linked together. The name "chairman tree" has nothing to do with what the structure does; you can call it whatever you like. Building it takes O(n log n) time and O(n log n) space, and each query takes O(log n) time.

 

Back to the point. Let's first look at the problem of finding the K-th smallest value in a static interval. The usual approach is to sort the interval and then read off the K-th value. When this has to be done for many intervals, sorting once per query becomes far too slow. This is where the chairman tree comes in.

 

Let's simplify the problem first: find the K-th smallest value of the whole array. A segment tree can solve this. Let the i-th leaf of the segment tree store how many numbers in the original array have rank i (that is, equal the i-th smallest distinct value), and let every other node store how many numbers have a rank from L to R, where [L,R] is the range that node covers. For example, take the numbers 1, 2, 2, 2, 4, 4, 8 and look for the 5th smallest. The distinct values ranked 1st to 4th are 1, 2, 4, 8, and they occur 1, 3, 2, 1 times respectively. The first two leaves hold 1+3=4 numbers, fewer than 5, while the first three hold 6, so the 5th smallest number falls in the leaf whose count is 2 (the red 2 in the original figure), which is the value 4.
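To make the whole-array case concrete, here is a minimal sketch of a segment tree of counts and the descent that finds the K-th smallest value. The names CountTree and kthSmallestWhole are made up for this illustration and do not appear in the HDU 2665 solution; the sketch assumes a non-empty array and 1 <= k <= n.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper type: an ordinary (non-persistent) segment tree of counts.
struct CountTree {
    std::vector<int> cnt;  // cnt[node] = how many numbers have a rank in the node's range
    explicit CountTree(int d) : cnt(4 * d, 0) {}

    // Record one occurrence of the value whose rank is pos (1-based).
    void insert(int pos, int node, int l, int r) {
        ++cnt[node];
        if (l == r) return;
        int mid = (l + r) / 2;
        if (pos <= mid) insert(pos, 2 * node, l, mid);
        else insert(pos, 2 * node + 1, mid + 1, r);
    }

    // Return the rank of the k-th smallest number.
    int kth(int k, int node, int l, int r) const {
        if (l == r) return l;
        int mid = (l + r) / 2;
        int left = cnt[2 * node];  // how many numbers fall in the left half
        if (k <= left) return kth(k, 2 * node, l, mid);
        return kth(k - left, 2 * node + 1, mid + 1, r);
    }
};

// Hypothetical wrapper: k-th smallest value of the whole array.
int kthSmallestWhole(const std::vector<int>& a, int k) {
    assert(k >= 1 && k <= static_cast<int>(a.size()));
    std::vector<int> vals(a);  // sorted distinct values; vals[i-1] has rank i
    std::sort(vals.begin(), vals.end());
    vals.erase(std::unique(vals.begin(), vals.end()), vals.end());
    int d = static_cast<int>(vals.size());
    CountTree t(d);
    for (int x : a) {
        int pos = static_cast<int>(std::lower_bound(vals.begin(), vals.end(), x) - vals.begin()) + 1;
        t.insert(pos, 1, 1, d);
    }
    return vals[t.kth(k, 1, 1, d) - 1];
}
```

With the numbers 1, 2, 2, 2, 4, 4, 8 from the example, kthSmallestWhole(a, 5) walks right at the root (the left half holds only 4 numbers) and then left, ending at rank 3, the value 4.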


Now consider the K-th smallest value of an arbitrary interval. This is where prefix sums come in. For example, to find the sum of a set of numbers over the interval [L,R], a simple method is to precompute the prefix sums sum[i]=a[1]+a[2]+…+a[i] and return sum[R]-sum[L-1]. The same idea applies here. Assuming there are n numbers, we build n segment trees with exactly the same structure (the number and position of the nodes are the same; only the values stored in the nodes differ). That is, for the first i values of the original array we build one segment tree, analogous to a prefix sum: the i-th segment tree stores, for the interval [1,i], how many numbers have each of the ranks 1 to d, where d is the number of distinct values in the original array. The previous paragraph shows how such a tree is filled and searched. To compute the K-th smallest value of the interval [L,R], subtract each node of the (L-1)-th segment tree from the corresponding node of the R-th segment tree, and then search the resulting tree.

Why does this work? As noted above, all the segment trees are isomorphic, so a node at a given position means the same thing in every tree: in the i-th tree it is the count of numbers ranked L to R among the first i numbers. Subtracting the (j-1)-th tree from the i-th tree therefore gives the count of numbers ranked L to R among the j-th to i-th numbers. That may sound a little convoluted, so let us look at the figures below.
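Before any space saving, the subtraction idea can be tried with plain arrays instead of trees. The sketch below keeps one full count array per prefix, which costs O(n·d) memory and is only meant to illustrate the idea; the function name kthInRangeNaive is an assumption of this example, not part of the original solution.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical naive version: cnt[i][v] = how many of a[1..i] have rank v.
// Positions l, r and the index k are 1-based, as in the article.
int kthInRangeNaive(const std::vector<int>& a, int l, int r, int k) {
    int n = static_cast<int>(a.size());
    assert(1 <= l && l <= r && r <= n && 1 <= k && k <= r - l + 1);

    std::vector<int> vals(a);  // sorted distinct values; vals[v-1] has rank v
    std::sort(vals.begin(), vals.end());
    vals.erase(std::unique(vals.begin(), vals.end()), vals.end());
    int d = static_cast<int>(vals.size());

    // One count array per prefix, each copied from the previous one.
    std::vector<std::vector<int>> cnt(n + 1, std::vector<int>(d + 1, 0));
    for (int i = 1; i <= n; ++i) {
        cnt[i] = cnt[i - 1];
        int pos = static_cast<int>(std::lower_bound(vals.begin(), vals.end(), a[i - 1]) - vals.begin()) + 1;
        ++cnt[i][pos];
    }

    // "Prefix R minus prefix L-1" gives the counts inside [l, r].
    for (int v = 1; v <= d; ++v) {
        int inRange = cnt[r][v] - cnt[l - 1][v];
        if (k <= inRange) return vals[v - 1];
        k -= inRange;
    }
    return -1;  // not reached when the precondition holds
}
```

For the data 1, 4, 2, 3, the query (l=2, r=4, k=2) looks at 4, 2, 3 and returns 3. The chairman tree computes the same differences, but on tree nodes that are mostly shared between prefixes.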

 

1. Building the tree

The figures below show the trees built for the data 1, 4, 2, 3.


 

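The figure above shows the state at initialization, before any element has been inserted into the tree. Each rectangle is a node. The green number in the middle is the count of numbers ranked L to R in the segment tree formed by the numbers inserted so far, where [L,R] is the range this node covers; the numbers at the left and right ends of the rectangle are L and R. The number outside each node can be read as the node's index, starting from 1 and assigned in pre-order (the order in which the build function creates the nodes). The red number below each leaf is the rank that leaf stands for; the i-th leaf naturally stands for rank i.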

 


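The two figures above show the segment trees formed after inserting the 1st number and after inserting the 2nd number. If we subtract the corresponding nodes of the two trees, don't we get exactly the segment tree that would result from inserting only the 2nd number? I want to stress one point again: the i-th leaf of the segment tree does not store a value. It stores how many numbers in the original array have rank i, and every other node stores how many numbers have a rank from L to R, where [L,R] is the range that node covers.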

 

 

2. Update

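As mentioned above, if the original array has n elements we need n segment trees, which is obviously a huge waste of space. Notice, though, that the i-th segment tree is obtained from the (i-1)-th by changing only a few values. So can the two trees share the values that did not change? Of course they can.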

 

 

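As the figure above shows, the only part that differs between the initial tree and the tree formed by inserting the 1st number is the part circled in red. So we only need to add these nodes on top of the original tree and share all the other nodes.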

 


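As the figure above shows, the red part is the segment tree formed by inserting the 1st number; it shares part of the previous tree. Note that the new nodes are not renumbered from 1; their indices continue from the last index used.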

 

 

Likewise, in the figure above the blue part is the segment tree formed by inserting the 2nd number; it again shares part of the previous tree.


Inserting one number changes only O(log n) nodes, one per level of the tree, so only those nodes need to be added. That is how the space is saved.
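The claim that one insertion touches only one node per level can be checked with a short sketch. The helper below does not build any tree; it only lists the ranges [l, r] of the nodes that an insertion of rank pos would have to copy in a tree over [1, d]. The name copiedRanges is invented for this illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: ranges of the nodes on the root-to-leaf path for rank pos,
// root first and leaf last. These are exactly the nodes a new version must create.
std::vector<std::pair<int, int>> copiedRanges(int d, int pos) {
    assert(1 <= pos && pos <= d);
    std::vector<std::pair<int, int>> path;
    int l = 1, r = d;
    while (true) {
        path.emplace_back(l, r);
        if (l == r) break;           // reached the leaf for this rank
        int mid = (l + r) / 2;
        if (pos <= mid) r = mid;     // the rank lies in the left child
        else l = mid + 1;            // the rank lies in the right child
    }
    return path;
}
```

For the data 1, 4, 2, 3 (d = 4), inserting the first number (rank 1) copies the nodes for [1,4], [1,2] and [1,1], and inserting the second number (rank 4) copies [1,4], [3,4] and [4,4]. Every other node is shared with the previous tree.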

 

 

3. Query

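As mentioned above, to query the interval [L,R] we first subtract the (L-1)-th segment tree from the R-th segment tree and then search the resulting tree. In practice the subtraction and the search can be done at the same time, because the search for the K-th smallest number follows a single fixed path of nodes. Suppose we already had the subtracted tree and the left subtree of its root held num numbers. If num>=k, the number we want is in the left subtree; otherwise it is in the right subtree, where we look for the (k-num)-th smallest. Searching recursively in this way, we have found the answer once the range has length 1.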
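The "subtract while searching" step can be sketched as a small self-contained class. This is an illustrative rewrite using vectors, not the array-based solution given at the end of the article; the names KthQuery, newNode, insert, walk and kth are assumptions of this sketch, and it assumes a non-empty input array.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical wrapper around a persistent segment tree of counts.
struct KthQuery {
    std::vector<int> lc, rc, sum;  // left child, right child and count of each node
    std::vector<int> root;         // root[i] = root of the tree for the prefix a[1..i]
    std::vector<int> vals;         // sorted distinct values; vals[v-1] has rank v
    int d = 0;                     // number of distinct values

    explicit KthQuery(const std::vector<int>& a) : vals(a) {
        std::sort(vals.begin(), vals.end());
        vals.erase(std::unique(vals.begin(), vals.end()), vals.end());
        d = static_cast<int>(vals.size());
        root.push_back(build(1, d));  // version 0: the empty tree
        for (int x : a) {
            int pos = static_cast<int>(std::lower_bound(vals.begin(), vals.end(), x) - vals.begin()) + 1;
            root.push_back(insert(root.back(), 1, d, pos));
        }
    }

    int newNode(int l, int r, int s) {
        lc.push_back(l); rc.push_back(r); sum.push_back(s);
        return static_cast<int>(sum.size()) - 1;
    }

    int build(int l, int r) {
        int node = newNode(0, 0, 0);
        if (l < r) {
            int mid = (l + r) / 2;
            int left = build(l, mid);
            int right = build(mid + 1, r);
            lc[node] = left; rc[node] = right;
        }
        return node;
    }

    // Copy the path for rank pos; everything else is shared with version pre.
    int insert(int pre, int l, int r, int pos) {
        int node = newNode(lc[pre], rc[pre], sum[pre] + 1);
        if (l < r) {
            int mid = (l + r) / 2;
            if (pos <= mid) { int c = insert(lc[pre], l, mid, pos); lc[node] = c; }
            else { int c = insert(rc[pre], mid + 1, r, pos); rc[node] = c; }
        }
        return node;
    }

    // Walk version u (prefix L-1) and version v (prefix R) side by side.
    int walk(int u, int v, int l, int r, int k) const {
        if (l == r) return l;
        int mid = (l + r) / 2;
        int num = sum[lc[v]] - sum[lc[u]];  // numbers of [L, R] that fall in the left half
        if (k <= num) return walk(lc[u], lc[v], l, mid, k);
        return walk(rc[u], rc[v], mid + 1, r, k - num);
    }

    // k-th smallest value among a[L..R], all 1-based.
    int kth(int L, int R, int k) const {
        assert(1 <= L && L <= R && R < static_cast<int>(root.size()) && 1 <= k && k <= R - L + 1);
        return vals[walk(root[L - 1], root[R], 1, d, k) - 1];
    }
};
```

For the data 1, 4, 2, 3, the call kth(2, 4, 2) compares the trees for prefixes 1 and 4: the left half (ranks 1 to 2) holds 2 - 1 = 1 number of the interval, so the search moves right with k = 1 and stops at rank 3, the value 3.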

 

 

Implementation details:

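The arrays L and R store, for each node, the indices of its left and right children (the range [l,r] that a node covers is not stored; it is passed down as function parameters). The sum array stores how many numbers have a rank inside the node's range. tol is the node counter; entries of L, R and sum with the same index describe the same node. A struct would of course work just as well.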

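The array a stores the original array, the Hash array stores the sorted (and deduplicated) array, and the T array stores the index of the root of the segment tree formed after inserting each element.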

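If the original array contains n distinct numbers, a segment tree with n leaves is enough; the leaves stand for ranks 1 to n. The number of distinct values can be obtained with the unique function, and the rank of the current number can be found with the lower_bound function.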
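As a minimal sketch of this ranking step (often called coordinate compression), the helper below turns every value into its 1-based rank among the distinct values. The name ranksOf is hypothetical; the solution at the end of the article does the same thing inline with the Hash array.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: fills vals with the sorted distinct values of a and
// returns, for every element of a, its 1-based rank in vals.
std::vector<int> ranksOf(const std::vector<int>& a, std::vector<int>& vals) {
    vals = a;
    std::sort(vals.begin(), vals.end());
    // unique moves the distinct values to the front; erase drops the leftovers.
    vals.erase(std::unique(vals.begin(), vals.end()), vals.end());

    std::vector<int> pos;
    pos.reserve(a.size());
    for (int x : a) {
        // lower_bound finds the first element not less than x, which is x itself here.
        int p = static_cast<int>(std::lower_bound(vals.begin(), vals.end(), x) - vals.begin()) + 1;
        assert(vals[p - 1] == x);  // the rank maps back to the original value
        pos.push_back(p);
    }
    return pos;
}
```

For 1, 2, 2, 2, 4, 4, 8 this gives the distinct values 1, 2, 4, 8 and the ranks 1, 2, 2, 2, 3, 3, 4, so the segment tree needs only 4 leaves.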


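To summarize: the chairman tree builds, for the first i numbers of the original array, a segment tree that stores how many of those i numbers have each of the ranks 1 to n, where n is the number of distinct values in the original array. Because inserting a number changes the values of only log n nodes, the previous tree can be reused, which saves a great deal of space. A query uses the prefix-sum property: for the interval [L,R], subtract the (L-1)-th tree from the R-th tree to get the rank counts for that interval, and then search. If the left subtree holds at least K numbers, the answer is in the left subtree; otherwise continue the search in the right subtree.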


Below is the accepted (AC) code for HDU 2665. If anything in the comments is wrong, please point it out.

#include<stdio.h>
#include<string.h>
#include<iostream>
#include<algorithm>
#define MAXN 100010
using namespace std;

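int tol; // node counter: index of the most recently created node
// Entries of L, R and sum with the same index describe the same node.
// L[x] and R[x] are the indices of the left and right children of node x;
// sum[x] is how many inserted numbers have a rank inside the range covered by x.
int L[MAXN<<5],R[MAXN<<5],sum[MAXN<<5];
int a[MAXN],T[MAXN],Hash[MAXN]; // T[i] is the root of the tree built from the first i elements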

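// Build function: creates an empty tree and returns the index of its root
int build(int l,int r)
{ // l and r are the endpoints of the rank range this node covers
    int mid,root=++tol;
    sum[root]=0; // nothing inserted yet, so the count is 0
    if(l<r)
    {
        mid=(l+r)>>1;
        L[root]=build(l,mid);   // build the left subtree and store its root index in L
        R[root]=build(mid+1,r); // build the right subtree and store its root index in R
    }
    return root;
}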

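// Update function: returns the root of a new tree equal to the previous tree plus one number
int update(int pre,int l,int r,int pos)
{ // pre: matching node in the previous tree; l, r: range endpoints; pos: rank of the inserted number
    // Walk from the root down to the leaf and create a new node at every step;
    // together with the shared old nodes, these new nodes form the new tree.
    int mid,root=++tol;
    L[root]=L[pre]; // start as a copy of the previous tree's node
    R[root]=R[pre]; // start as a copy of the previous tree's node
    sum[root]=sum[pre]+1; // every node on the path gains one number
    if(l<r)
    {
        mid=(l+r)>>1;
        if(pos<=mid) L[root]=update(L[pre],l,mid,pos); // insert into the left subtree
        else R[root]=update(R[pre],mid+1,r,pos); // insert into the right subtree
    }
    return root;
}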

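// Query function: returns the rank (index into Hash) of the k-th smallest number
int query(int u,int v,int l,int r,int k)
{ // u, v: matching nodes of the two trees (prefix l-1 and prefix r); l, r: range endpoints; k: which number to find
    // Only the nodes on one root-to-leaf path are visited.
    int mid,num;
    if(l>=r) return l;
    mid=(l+r)>>1;
    num=sum[L[v]]-sum[L[u]]; // how many numbers of the queried interval fall in the left subtree
    // If the left child holds at least k numbers, the answer is in the left subtree
    if(num>=k) return query(L[u],L[v],l,mid,k); 
    // Otherwise it is in the right subtree, as the (k-num)-th smallest there
    else return query(R[u],R[v],mid+1,r,k-num);
}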

int main()
{
    int i,n,m,t,d,pos;
    scanf("%d",&t);
    while(t--)
    {
        scanf("%d%d",&n,&m);
        for(i=1;i<=n;i++)
        {
            scanf("%d",&a[i]);
            Hash[i]=a[i];
        }
        sort(Hash+1,Hash+n+1);
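        d=unique(Hash+1,Hash+n+1)-Hash-1; // d is the number of distinct values
        tol=0; // reset the node counter
        T[0]=build(1,d); // the empty tree covers ranks 1..d
        for(i=1;i<=n;i++)
        { // in effect this builds one segment tree per prefix and stores its root
            pos=lower_bound(Hash+1,Hash+d+1,a[i])-Hash;
            // pos is the rank of a[i] among the distinct values
            T[i]=update(T[i-1],1,d,pos);
        }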
        int l,r,k;
        while(m--)
        {
            scanf("%d%d%d",&l,&r,&k);
            pos=query(T[l-1],T[r],1,d,k);
            printf("%d\n",Hash[pos]);
        }
    }
    return 0;
}


Origin blog.csdn.net/zuzhiang/article/details/78173412