[Solution notes] CF 1073G. Yet Another LCP Problem

https://codeforces.com/problemset/problem/1073/G
Problem statement: Given a string \(s\) of length \(n\) and \(q\) queries. Each query gives two sequences \(\{a_i\}\) (of length \(k\)) and \(\{b_j\}\) (of length \(l\)); output \(\sum_{i=1}^{k} \sum_{j=1}^{l} \operatorname{LCP}(s[a_i, n], s[b_j, n])\).

Constraints: \(1 \leq n, \sum k, \sum l \leq 2 \times 10^5\).

Approach:

The most convenient tool for LCP queries is a suffix array. Once the height array has been built, \(\operatorname{LCP}(i, j) = \min(height[i+1 \sim j])\), where \(i < j\) are ranks, i.e. the suffixes that come \(i\)-th and \(j\)-th in sorted order. Preprocessing the height array with a sparse table (ST) for range minimum makes each such query \(O(1)\).
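A minimal sketch of the rank-based LCP formula above, for small inputs only. The struct name `SuffixLcp` and its 0-indexed layout are assumptions made for this illustration, and the suffix array is built with a naive comparison sort rather than the doubling method used in the full program further down.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: 0-indexed positions and ranks.
struct SuffixLcp {
    int n;
    std::vector<int> sa, rk, height;   // height[r] = LCP of the suffixes ranked r-1 and r
    std::vector<std::vector<int>> st;  // sparse table over height

    explicit SuffixLcp(const std::string& s)
        : n((int)s.size()), sa(n), rk(n), height(n, 0) {
        std::iota(sa.begin(), sa.end(), 0);
        // Naive O(n^2 log n) suffix sort: fine for a demo, far too slow for n = 2e5.
        std::sort(sa.begin(), sa.end(), [&](int x, int y) {
            return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
        });
        for (int r = 0; r < n; ++r) rk[sa[r]] = r;
        // Kasai's algorithm: height in O(n).
        for (int i = 0, k = 0; i < n; ++i) {
            if (rk[i] == 0) { k = 0; continue; }
            if (k) --k;
            int j = sa[rk[i] - 1];
            while (i + k < n && j + k < n && s[i + k] == s[j + k]) ++k;
            height[rk[i]] = k;
        }
        // Sparse table for range minimum over height.
        st.push_back(height);
        for (int j = 1; (1 << j) <= n; ++j) {
            st.emplace_back(n - (1 << j) + 1);
            for (int i = 0; i + (1 << j) <= n; ++i)
                st[j][i] = std::min(st[j - 1][i], st[j - 1][i + (1 << (j - 1))]);
        }
    }

    // LCP of the suffixes starting at 0-indexed positions x and y.
    int lcp(int x, int y) const {
        assert(0 <= x && x < n && 0 <= y && y < n);
        if (x == y) return n - x;                       // a suffix against itself
        int l = std::min(rk[x], rk[y]) + 1;             // min over height[l..r]
        int r = std::max(rk[x], rk[y]);
        int j = 0;
        while ((1 << (j + 1)) <= r - l + 1) ++j;        // floor(log2(length))
        return std::min(st[j][l], st[j][r - (1 << j) + 1]);
    }
};
```

For "abab", the suffixes at positions 0 and 2 ("abab" and "ab") are adjacent in sorted order, so their LCP is a single height entry.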

But with \(1 \leq \sum k, \sum l \leq 2 \times 10^5\), querying every pair by brute force costs \(O(k \cdot l)\) per query (\(O(k^2)\) when the two sequences have similar lengths), which naturally gets TLE.

Following this idea, the problem can actually be restated as follows:

Given \(\{a_i\}\) and \(\{b_j\}\), find the sum of the minima of all intervals whose endpoints are taken from these two sequences.
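To pin down what the restated problem asks for, here is a brute-force reference, assuming a 1-indexed array `h` (standing in for the height array) with left endpoints taken from `a` and right endpoints from `b`. The function name `bruteSumOfMins` is made up for this sketch; it is only meant as a correctness baseline, not as something fast enough to submit.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// h is 1-indexed (h[0] is unused). For every pair with l <= r, where l comes
// from a and r comes from b, add min(h[l..r]) to the total.
long long bruteSumOfMins(const std::vector<int>& h,
                         const std::vector<int>& a,
                         const std::vector<int>& b) {
    const int L = (int)h.size() - 1;
    long long total = 0;
    for (int l : a) {
        for (int r : b) {
            assert(1 <= l && l <= L && 1 <= r && r <= L);
            if (l > r) continue;  // not a valid interval in this direction
            total += *std::min_element(h.begin() + l, h.begin() + r + 1);
        }
    }
    return total;
}
```

With `h = {_, 3, 1, 2, 5}`, `a = {1, 3}` and `b = {1, 4}`, the valid intervals are [1,1], [1,4] and [3,4], with minima 3, 1 and 2.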

So... this is obviously a data structure problem. What does it even have to do with strings!?

For a counting problem like this, if brute-force enumeration is too slow, then the enumeration must be mergeable: identical or similar parts get added up in one go, which cuts down the number of steps.

Following that idea, which intervals share the same minimum? Simply the ones that contain the same minimum element. (Just get the feel of it; this sentence is not very rigorous.)

So if we pick the minimum of the entire range, every interval containing that position has that value as its minimum.

Can we go on to the second smallest value? Of course! Just make sure the chosen intervals do not contain the position of the minimum we just used; then the minimum of those intervals is the second smallest value.

And so on: enumerate the values from small to large, each time count the intervals that contain the current position but none of the previously enumerated positions, and multiply that count by the current value to get this step's contribution. Adding up all the contributions gives the final answer.

So how do we count the intervals at each step?

Let the length of the whole range be \(L\). Maintain a set that stores every position enumerated so far; initially it contains \(0\) and \(L+1\). Then process the positions of the minimum, the second smallest value, the third smallest value, and so on. For the current position \(i\), find in the set the first number to the left of \(i\), call it \(mn\), and the first number to the right of \(i\), call it \(mx\). Count the elements \(a_x\) of \(\{a\}\) with \(mn < a_x \leq i\) and the elements \(b_y\) of \(\{b\}\) with \(i < b_y < mx\) (for both counts, mark which positions occur and take prefix sums, so each lookup is \(O(1)\)). The product of the two counts is the number of intervals that:

  • contain position \(i\);
  • contain none of the previously enumerated positions;
  • have their left endpoint \(a_x\) from \(\{a\}\) and their right endpoint \(b_y\) from \(\{b\}\)

(not counting the interval that consists of position \(i\) alone). Intervals whose left endpoint comes from \(\{b\}\) and whose right endpoint comes from \(\{a\}\) can be handled by a symmetric analysis. The interval consisting only of position \(i\) can be counted separately.

This solves the transformed problem in \(O(L \log L)\) time.
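The counting procedure described above can be sketched roughly as follows. The function name `sumOfMinsBySet` and the 1-indexed array `h` are assumptions for this illustration. One deliberate difference from the prose: right endpoints are counted over \([i, mx)\) instead of \((i, mx)\), so the single-position interval is folded in rather than handled separately, which mirrors the `numb[mx-1]-numb[t-1]` term in the full program below.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <set>
#include <vector>

// h is 1-indexed (h[0] unused), positions are 1..L.
// Returns the sum of min(h[l..r]) over all l in a, r in b with l <= r.
long long sumOfMinsBySet(const std::vector<int>& h,
                         const std::vector<int>& a,
                         const std::vector<int>& b) {
    assert(!h.empty());
    const int L = (int)h.size() - 1;
    // Prefix counts of how many a / b endpoints sit at positions <= i.
    std::vector<long long> cntA(L + 1, 0), cntB(L + 1, 0);
    for (int x : a) { assert(1 <= x && x <= L); ++cntA[x]; }
    for (int x : b) { assert(1 <= x && x <= L); ++cntB[x]; }
    for (int i = 1; i <= L; ++i) {
        cntA[i] += cntA[i - 1];
        cntB[i] += cntB[i - 1];
    }

    // Positions ordered by increasing value; ties may come in any order.
    std::vector<int> order(L);
    std::iota(order.begin(), order.end(), 1);
    std::sort(order.begin(), order.end(),
              [&](int x, int y) { return h[x] < h[y]; });

    std::set<int> used = {0, L + 1};  // sentinels plus already-processed positions
    long long ans = 0;
    for (int t : order) {
        auto it = used.upper_bound(t);
        int mx = *it;                 // nearest processed position to the right
        int mn = *std::prev(it);      // nearest processed position to the left
        long long lefts = cntA[t] - cntA[mn];           // a endpoints in (mn, t]
        long long rights = cntB[mx - 1] - cntB[t - 1];  // b endpoints in [t, mx)
        ans += lefts * rights * h[t];
        used.insert(t);
    }
    return ans;
}
```

When several positions hold the same value, an interval containing more than one of them is charged to whichever is processed first; the later ones exclude it because the earlier position is already in the set.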

But that is still not enough, because \(L\) is the length of the range, whereas the problem only bounds the number of given endpoints by \(2 \times 10^5\). If every query has just the two points \(1\) and \(2 \times 10^5\), the worst case becomes \(10^5 \times 2 \times 10^5 \log (2 \times 10^5)\).

So what we need is... discretization (coordinate compression): merge each unused stretch into a single point. The basic idea is as follows (if it is unclear, just get the gist):

  • Put all \(\{a_i\}\) and \(\{b_j\}\) of the current query into one sequence, sorted from small to large.
  • For two adjacent distinct original coordinates \(i\) and \(j\) (\(i < j\)): if \(i < j-1\), create new points for \(i\), for \([i+1, j-1]\), and for \(j\). For example, 3 and 200 become a point for 3, a point holding the minimum over 4 to 199, and a point for 200.
  • If \(i = j-1\), create only the two points for \(i\) and \(j\). For example, 3 and 4 have nothing in between, so no extra point is added.
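The compression rules in the list above might look like this in isolation. `Compressed` and `compressPoints` are names invented for this sketch, coordinates are assumed to be at least 1 and already sorted, and the gap minimum is found with a linear scan as a stand-in for the \(O(1)\) sparse-table query used in the real solution.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Compressed {
    std::vector<int> value;  // 1-indexed: value[p] is the minimum that point p stands for
    std::vector<int> where;  // where[i] = compressed index of the i-th input coordinate
};

// h is 1-indexed (h[0] unused); pts holds sorted original coordinates (>= 1).
Compressed compressPoints(const std::vector<int>& h, const std::vector<int>& pts) {
    assert(std::is_sorted(pts.begin(), pts.end()));
    Compressed c;
    c.value.push_back(0);  // index 0 is unused
    int last = -1;         // last original coordinate that received a point
    for (int x : pts) {
        if (x == last) {   // same coordinate again: reuse the existing point
            c.where.push_back((int)c.value.size() - 1);
            continue;
        }
        if (last != -1 && x > last + 1) {
            // One extra point for the whole gap [last+1, x-1].
            // Linear scan here; the real program would use the sparse table.
            c.value.push_back(*std::min_element(h.begin() + last + 1, h.begin() + x));
        }
        c.value.push_back(h[x]);
        c.where.push_back((int)c.value.size() - 1);
        last = x;
    }
    return c;
}
```

For coordinates 1, 2 and 6, points 1 and 2 are adjacent and get no gap point, while the stretch 3 to 5 collapses into one point carrying its minimum.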

At most one new point is added between two adjacent original endpoints, so the compressed length is at most about twice the number of endpoints, i.e. on the order of \(2 \times 2 \times 10^5\). Applying the method above to the compressed sequence then gives the right complexity.

The program uses a set and sorting, so the asymptotic complexity is \(O(n \log n + (\sum k + \sum l) \log (\sum k + \sum l))\), but in practice it sorts many times, so the constant factor is huge.

Summary:

  • I write solutions too slowly: writing this one took a day, debugging it took a year.
  • Too many details. Two places were the most troublesome. First, when converting LCP into range-minimum queries, the computation differs by case, so it has to be split into three parts: \(\operatorname{LCP}(i, i)\) is counted up front; for \(\operatorname{LCP}(i, j)\) with \(i < j\), add 1 to every \(i\) and run the procedure once; for \(i > j\), add 1 to every \(j\) and run it again. Second, the discretization is hard to write. The final code is almost 200 lines...
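Given how much debugging this took, a direct brute-force checker can be handy for stress-testing the three-part split on small random inputs. This is only a sketch: `bruteQuery` is a hypothetical name, and it compares characters pair by pair, so it is far too slow for the real limits.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Positions in a and b are 1-indexed, as in the problem statement.
// Returns the sum of LCP(s[x..n], s[y..n]) over all x in a and y in b.
long long bruteQuery(const std::string& s,
                     const std::vector<int>& a,
                     const std::vector<int>& b) {
    const int n = (int)s.size();
    long long total = 0;
    for (int x : a) {
        for (int y : b) {
            assert(1 <= x && x <= n && 1 <= y && y <= n);
            int len = 0;
            // Extend the common prefix one character at a time.
            while (x + len <= n && y + len <= n &&
                   s[x - 1 + len] == s[y - 1 + len])
                ++len;
            total += len;
        }
    }
    return total;
}
```

For "abacaba" with `a = b = {1, 5}`, the four pairs contribute 7, 3, 3 and 3.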

AC code: 841 ms, 43836 KB. In some places (especially the discretization part) I spelled things out for ease of understanding and to avoid confusing myself, so this is not the most optimized version and there is a lot of duplicated code.

#include<bits/stdc++.h>
#define LL long long 
using namespace std;
const int M=2e5+50,P=1e9+7;
char s[M];
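int rk[M],sa[M],tp[M],tax[M],height[M],len,m=150,a[M],b[M];//rk[i]: rank of the suffix starting at i (first key); sa[i]: start position of the suffix ranked i; tp: helper array for the second key; tax: bucket array
void Rsort(){//from the second-key order (tp) for length k*2 and the length-k result (rk, i.e. the first key for length k*2), compute the length-k*2 order sa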
    for(int i=1;i<=m;i++) 
        tax[i]=0;
    for(int i=1;i<=len;i++)//put the first keys into buckets
        tax[rk[tp[i]]]++;
    for(int i=1;i<=m;i++) 
        tax[i]+=tax[i-1];
    for(int i=len;i>=1;i--)//walk tp from the largest second key down; within a bucket, entries placed earlier land further back in sa
        sa[tax[rk[tp[i]]]--]=tp[i];
}
void Suffix(){
    for(int i=1;i<=len;i++)
        rk[i]=s[i],tp[i]=i;//rk for the 2^0 (single-character) sort; tp is arbitrary here
    Rsort();//sa for the 2^0 sort
    for(int k=1;k<=len;k<<=1){
        //turn the length-k order into the second key for the length-k*2 sort
        int num=0;
        for(int i=len-k+1;i<=len;i++)
            tp[++num]=i;
        for(int i=1;i<=len;i++)
            if(sa[i]>k)
                tp[++num]=sa[i]-k;
        Rsort(),swap(rk,tp);//tp now holds the length-k rk; below, the length-k*2 sa yields the length-k*2 rk, and the old rk is used to detect equal parts
        rk[sa[1]]=num=1;
        for(int i=2;i<=len;i++)
            rk[sa[i]]=(tp[sa[i]]==tp[sa[i-1]]&&tp[sa[i]+k]==tp[sa[i-1]+k])?num:++num;//their indices in sa differ, but suffixes that compare equal must get the same rk
        if(num==len)
            break;
        m=num;
    }
}
void getheight(){
    int k=0;
    for(int i=1,j;i<=len;i++){
        if(rk[i]==1)
            k=0;
        else{
            if(k)
                --k;
            j=sa[rk[i]-1];
            while(i+k<=len&&j+k<=len&&s[i+k]==s[j+k])
                ++k;
        }
        height[rk[i]]=k;
    }
}
int mi2[35],logg[M],n,f[M][35];
void initst(int n){
    mi2[0]=1,logg[0]=-1;
    for(int i=1;i<=30;++i)
        mi2[i]=mi2[i-1]*2;
    for(int i=1;i<=n;++i)
        logg[i]=logg[i/2]+1;
    for(int i=1;i<=n;++i)
        f[i][0]=height[i];
    for(int i=1;i<=logg[n];++i)
        for(int j=1;j<=n+1-mi2[i];++j)
            f[j][i]=min(f[j][i-1],f[j+mi2[i-1]][i-1]);
}
int query(int l,int r){
    int lg=logg[r-l+1];
    return min(f[l][lg],f[r-mi2[lg]+1][lg]);
}
int id[M],numb[M],numa[M];
vector<int> nh;
bool cmp(int x,int y){
    return nh[x]<nh[y];
};
LL sum(int *a,int la,int *b,int lb){//nh holds the height minimum for each compressed coordinate
    nh.clear();
    nh.push_back(0);
    int pa=1,pb=1,p=0,ls;
    for(int i=1;i<=la;++i)
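        a[i]+=1;//shift a by 1 so a pair covers height[rank_a+1..rank_b]; p: current largest compressed coordinate; ls: last original coordinate inserted, used to decide whether a gap point is needed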
    if (a[pa]<=b[pb])
        numb[++p]=0,numa[p]=1,nh.push_back(height[a[pa]]),ls=a[pa],++pa;
    else
        numb[++p]=1,numa[p]=0,nh.push_back(height[b[pb]]),ls=b[pb],++pb;
    while (pa<=la&&pb<=lb)
        if (a[pa]<=b[pb]){
            if (ls==a[pa]){
                numa[p]=1,++pa;
                continue;
            }
            else if (ls==a[pa]-1)//only a[pa] needs to be inserted
                numb[++p]=0,numa[p]=1,nh.push_back(height[a[pa]]),ls=a[pa],++pa;
            else//otherwise insert the gap interval first, then a[pa]
                nh.push_back(query(ls+1,a[pa]-1)),numb[++p]=0,numa[p]=0,
                numb[++p]=0,numa[p]=1,nh.push_back(height[a[pa]]),ls=a[pa],++pa;
        }
        else{
            if (ls==b[pb]){
                numb[p]=1,++pb;
                continue;
            }
            else if (ls==b[pb]-1)//only b[pb] needs to be inserted
                numb[++p]=1,numa[p]=0,nh.push_back(height[b[pb]]),ls=b[pb],++pb;
            else//otherwise insert the gap interval first, then b[pb]
                nh.push_back(query(ls+1,b[pb]-1)),numb[++p]=0,numa[p]=0,
                numb[++p]=1,numa[p]=0,nh.push_back(height[b[pb]]),ls=b[pb],++pb;
        }
    while (pa<=la)
        if (ls==a[pa]){
            numa[p]=1,++pa;
            continue;
        }
        else if (ls==a[pa]-1)//only a[pa] needs to be inserted
            numb[++p]=0,numa[p]=1,nh.push_back(height[a[pa]]),ls=a[pa],++pa;
        else//otherwise insert the gap interval first, then a[pa]
            nh.push_back(query(ls+1,a[pa]-1)),numb[++p]=0,numa[p]=0,
            numb[++p]=0,numa[p]=1,nh.push_back(height[a[pa]]),ls=a[pa],++pa;
    while (pb<=lb)
        if (ls==b[pb]){
            numb[p]=1,++pb;
            continue;
        }
        else if (ls==b[pb]-1)//only b[pb] needs to be inserted
            numb[++p]=1,numa[p]=0,nh.push_back(height[b[pb]]),ls=b[pb],++pb;
        else//otherwise insert the gap interval first, then b[pb]
            nh.push_back(query(ls+1,b[pb]-1)),numb[++p]=0,numa[p]=0,
            numb[++p]=1,numa[p]=0,nh.push_back(height[b[pb]]),ls=b[pb],++pb;
    for (int i=1;i<=p;++i)
        id[i]=i,numa[i]+=numa[i-1],numb[i]+=numb[i-1];
    sort(id+1,id+p+1,cmp);
    set<int> st;
    st.insert(0),st.insert(p+1);
    LL ans=0;
    for (int i=1;i<=p;++i){
        int &t=id[i];
        set<int>::iterator mn1=st.lower_bound(t);
        --mn1;
        int mn=*mn1,mx=*(st.upper_bound(t));
        ans+=(numa[t]-numa[mn])*1LL*(numb[mx-1]-numb[t-1])*nh[t];
        st.insert(t);
    }
    for(int i=1;i<=la;++i)
        a[i]-=1;
    return ans;
}
int main(){
    int n,q;
    scanf("%d%d",&len,&q); 
    scanf("%s",s+1);
    Suffix(),getheight();
    initst(len);
    for (int z=1;z<=q;++z){
        int k,l,c;
        scanf("%d%d",&k,&l);
        for (int i=1;i<=k;++i)
            scanf("%d",&a[i]);
        for (int i=1;i<=l;++i)
            scanf("%d",&b[i]);
        LL ans=0;
        int p1=1,p2=1;
        while (p1<=k&&p2<=l)//count pairs with equal positions (LCP of a suffix with itself)
            if (a[p1]==b[p2])
                ans+=len-a[p1]+1,++p1,++p2;
            else if (a[p1]>b[p2])
                ++p2;
            else
                ++p1;
        for (int i=1;i<=k;++i)
            a[i]=rk[a[i]];
        for (int i=1;i<=l;++i)
            b[i]=rk[b[i]];
        sort(a+1,a+k+1);
        sort(b+1,b+l+1);
        ans=ans+sum(a,k,b,l)+sum(b,l,a,k);
        printf("%lld\n",ans);
    }
    return 0;
}

Origin www.cnblogs.com/diorvh/p/11821333.html