BZOJ4650 [NOI2016] Excellent split [suffix array]

Problem

If a string can be split into the form AABB, where A and B are arbitrary non-empty strings, we call that split of the string excellent. For example, for the string aabaabaa, taking A=aab and B=a gives one way to split it into the form AABB. A string may have no excellent split, or it may have more than one. For example, taking A=a and B=baa also expresses the string above as AABB, whereas the string abaabaa has no excellent split. Now, given a string S of length n, find the total number of excellent splits over all splits of all its substrings. A substring here is a contiguous segment of the string. Note the following: identical substrings that occur at different positions are treated as different substrings, and the excellent splits of each are counted in the answer. Within a split, A=B is allowed; for example, cccc has the split A=B=c. The string itself is also one of its substrings.

Input format

Each input file contains multiple data sets. The first line of the input file contains a single integer T, the number of data sets, with 1≤T≤10. Each of the next T lines contains a string S consisting only of lowercase English letters, with the meaning described in the problem statement.

Output format

Output T lines, each containing one integer: the total number of excellent splits among all splits of all substrings of the string S.

Sample input

4
aabbbb
cccccc
aabaabaabaa
bbaabaababaaba

Sample output

3
5
4
7

Hint

We write S[i,j] for the substring of S from the i-th to the j-th character (positions are counted from 1).

In the first data set, 3 substrings have excellent splits: S[1,4]=aabb with A=a, B=b; S[3,6]=bbbb with A=b, B=b; and S[1,6]=aabbbb with A=a, B=bb. The remaining substrings have no excellent split, so the answer for the first data set is 3.

In the second data set, substrings of two kinds, 4 in total, have excellent splits. The substrings S[1,4]=S[2,5]=S[3,6]=cccc all share the same excellent split A=c, B=c, but because they occupy different positions it is counted 3 times. The substring S[1,6]=cccccc has 2 excellent splits, A=c, B=cc and A=cc, B=c; these are different splits of the same substring and both count toward the answer. So the answer for the second data set is 3+2=5.

In the third data set, S[1,8] and S[4,11] each have 2 excellent splits (S[1,8] is the example from the problem statement), so the answer is 2+2=4.

In the fourth data set, S[1,4], S[6,11], S[7,12], S[2,11] and S[1,8] each have 1 excellent split, and S[3,14] has 2 excellent splits, so the answer is 5+2=7.

Solution

Call a substring of the form \(AA\) an \(AA\) string. Let \(f[i]\) be the number of \(AA\) strings ending at position \(i\), and let \(g[i]\) be the number of \(AA\) strings starting at position \(i\).

Then
\[ans = \sum\limits_{i = 2}^{n - 2} f[i] * g[i + 1]\]
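
To make the formula concrete, here is a minimal brute-force sketch that fills \(f\) and \(g\) by comparing the two halves of every candidate substring directly and then evaluates the sum. It is only a reference for small inputs, not the method used in this post; the name `countExcellentBrute` is made up for illustration, and positions are 0-based here, unlike the 1-based positions in the text.

```cpp
#include <string>
#include <vector>

// Brute-force reference (hypothetical helper, small inputs only).
// f[i]: number of AA strings ending at 0-based index i.
// g[i]: number of AA strings starting at 0-based index i.
long long countExcellentBrute(const std::string& s) {
    int n = (int)s.size();
    std::vector<long long> f(n + 1, 0), g(n + 1, 0);
    for (int start = 0; start < n; ++start) {
        for (int len = 1; start + 2 * len <= n; ++len) {
            // Compare s[start, start+len) with s[start+len, start+2*len).
            if (s.compare(start, len, s, start + len, len) == 0) {
                ++g[start];
                ++f[start + 2 * len - 1];
            }
        }
    }
    // An excellent split is an AA string ending at i followed by one starting at i+1.
    long long ans = 0;
    for (int i = 0; i + 1 < n; ++i) ans += f[i] * g[i + 1];
    return ans;
}
```

Calling it on the four sample strings reproduces the sample output, which makes it a handy checker for the faster solution.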

So we only need to find all \(AA\) strings.
Following the standard suffix-array technique, we enumerate the length \(L\) of \(A\) and place a checkpoint at every \(L\)-th position, that is, at positions \(L, 2L, 3L, \dots\) (Figure: the circled range marks the substrings governed by the middle checkpoint for \(len = 3\).)

For a fixed \(L\), every \(AA\) string of length \(2L\) contains exactly one checkpoint in each copy of \(A\), and these two checkpoints are adjacent. So we enumerate each pair of adjacent checkpoints \(a\) and \(b = a + L\), compute the lcp of the suffixes starting at \(a\) and \(b\) and the "previous lcp" (the longest common suffix of the prefixes ending at \(a\) and \(b\)), each capped at \(L\), and from these two lengths determine which \(AA\) strings this pair governs, that is, which start positions give a match.
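
The checkpoint idea can be sketched without any suffix array by extending the match character by character. The following is an illustrative sketch under that assumption: the two `while` loops are a naive stand-in for the \(O(1)\) lcp queries, and the function name `countSquaresWithPeriod` is hypothetical. Positions are 1-based as in the text.

```cpp
#include <string>

// Count AA strings with |A| = L using checkpoints at L, 2L, 3L, ...
// The extension loops are a naive O(L) replacement for suffix-array lcp queries.
long long countSquaresWithPeriod(const std::string& str, int L) {
    int n = (int)str.size();
    std::string s = " " + str;  // shift to 1-based indexing
    long long total = 0;
    for (int a = L, b = 2 * L; b <= n; a += L, b += L) {
        // lenl: common suffix length of s[1..a] and s[1..b], capped at L.
        int lenl = 0;
        while (lenl < L && a - lenl >= 1 && s[a - lenl] == s[b - lenl]) ++lenl;
        // lenr: common prefix length of s[a..n] and s[b..n], capped at L.
        int lenr = 0;
        while (lenr < L && b + lenr <= n && s[a + lenr] == s[b + lenr]) ++lenr;
        int len = lenl + lenr - 1;  // position a is counted on both sides
        // Valid start positions are a-lenl+1 .. a+lenr-L, i.e. len-L+1 of them.
        if (len >= L) total += len - L + 1;
    }
    return total;
}
```

For example, in cccccc the pair \((2,4)\) with \(L = 2\) gives lenl = lenr = 2, so len = 3 and the pair governs the two \(AA\) strings starting at positions 1 and 2; the pair \((4,6)\) contributes the one starting at position 3.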

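Concretely, build the suffix array once for the string and once for its reverse (or concatenate the two with a separator and build it once), together with a sparse table over the height array, so that an lcp query takes \(O(1)\).
Each pair of checkpoints adds 1 to a contiguous range of start positions and to a contiguous range of end positions, so we maintain \(f[i]\) and \(g[i]\) with difference arrays.
Finally, take prefix sums and evaluate the sum above to obtain the answer.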
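
The difference-array step on its own looks like this. It is a small generic sketch, with `coverage` as a made-up helper name: each closed interval \([l, r]\) of 1-based positions costs two updates, and one prefix-sum pass recovers how many intervals cover each position, which is how \(f\) and \(g\) are accumulated.

```cpp
#include <utility>
#include <vector>

// Returns cnt[0..n], where cnt[i] is the number of closed intervals [l, r]
// (1-based, 1 <= l <= r <= n) that contain position i. cnt[0] is unused.
std::vector<long long> coverage(int n,
                                const std::vector<std::pair<int, int>>& intervals) {
    std::vector<long long> diff(n + 2, 0);
    for (const auto& iv : intervals) {
        ++diff[iv.first];       // coverage starts at l
        --diff[iv.second + 1];  // and stops after r
    }
    for (int i = 1; i <= n; ++i) diff[i] += diff[i - 1];  // prefix sums
    diff.resize(n + 1);
    return diff;
}
```

Every update is \(O(1)\), so the total cost is proportional to the number of checkpoint pairs plus one linear pass.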

Time complexity: \(O(n\log n + \sum\limits_{L = 1}^{n} \frac{n}{L}) = O(n\log n + n \sum\limits_{L = 1}^{n} \frac{1}{L}) = O(n\log n)\)
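
The last equality relies on the standard bound for the harmonic sum, which follows from comparing the sum with an integral:

\[\sum\limits_{L = 1}^{n} \frac{1}{L} \le 1 + \int_{1}^{n} \frac{dx}{x} = 1 + \ln n = O(\log n)\]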

#include<iostream>
#include<cstdio>
#include<cmath>
#include<cstring>
#include<algorithm>
#define LL long long int
#define Redge(u) for (int k = h[u],to; k; k = ed[k].nxt)
#define REP(i,n) for (int i = 1; i <= (n); i++)
#define cls(s) memset(s,0,sizeof(s))
using namespace std;
const int maxn = 100005,maxm = 100005,INF = 1000000000;
inline int read(){
    int out = 0,flag = 1; char c = getchar();
    while (c < 48 || c > 57){if (c == '-') flag = -1; c = getchar();}
    while (c >= 48 && c <= 57){out = (out << 3) + (out << 1) + c - 48; c = getchar();}
    return out * flag;
}
char s[maxn];
// rk (not rank): a global named rank can clash with std::rank under
// `using namespace std` on C++11 and later compilers.
int N,n,m,sa[maxn],rk[maxn],height[maxn],t1[maxn],t2[maxn],bac[maxn];
int mn[maxn][18],bin[30],Log[maxn]; // mn: sparse table over height
// Build the suffix array (prefix doubling with radix sort), the height
// array and the sparse table for s[1..n].
void getsa(){
    int *x = t1,*y = t2; m = 1000;
    for (int i = 0; i <= m; i++) bac[i] = 0;
    for (int i = 1; i <= n; i++) bac[x[i] = s[i]]++;
    for (int i = 1; i <= m; i++) bac[i] += bac[i - 1];
    for (int i = n; i; i--) sa[bac[x[i]]--] = i;
    for (int k = 1; k <= n; k <<= 1){
        int p = 0;
        // y: suffixes ordered by their second key
        for (int i = n - k + 1; i <= n; i++) y[++p] = i;
        for (int i = 1; i <= n; i++) if (sa[i] - k > 0) y[++p] = sa[i] - k;
        for (int i = 0; i <= m; i++) bac[i] = 0;
        for (int i = 1; i <= n; i++) bac[x[y[i]]]++;
        for (int i = 1; i <= m; i++) bac[i] += bac[i - 1];
        for (int i = n; i; i--) sa[bac[x[y[i]]]--] = y[i];
        swap(x,y);
        x[sa[1]] = p = 1;
        for (int i = 2; i <= n; i++)
            x[sa[i]] = (y[sa[i]] == y[sa[i - 1]] && y[sa[i] + k] == y[sa[i - 1] + k] ? p : ++p);
        if (p >= n) break;
        m = p;
    }
    for (int i = 1; i <= n; i++) rk[sa[i]] = i;
    // height[i] = lcp of the suffixes sa[i - 1] and sa[i]
    for (int i = 1,k = 0; i <= n; i++){
        if (k) k--;
        int j = sa[rk[i] - 1];
        while (s[i + k] == s[j + k]) k++;
        height[rk[i]] = k;
    }
    for (int i = 1; i <= n; i++) mn[i][0] = height[i];
    REP(j,17) REP(i,n){
        if (i + bin[j] - 1 > n) break;
        mn[i][j] = min(mn[i][j - 1],mn[i + bin[j - 1]][j - 1]);
    }
}
// lcp of the suffixes starting at a and b (a != b), answered in O(1)
int lcp(int a,int b){
    int l = rk[a],r = rk[b];
    if (l > r) swap(l,r);
    l++;
    int t = Log[r - l + 1];
    return min(mn[l][t],mn[r - bin[t] + 1][t]);
}
// longest common suffix of the prefixes ending at a and b, computed as an
// lcp in the reversed copy, where position p maps to N - p + 1
int pre_lcp(int a,int b){
    int l = rk[N - a + 1],r = rk[N - b + 1];
    if (l > r) swap(l,r);
    l++;
    int t = Log[r - l + 1];
    return min(mn[l][t],mn[r - bin[t] + 1][t]);
}
LL f[maxn],g[maxn];
void solve(){
    memset(f,0,sizeof(f));
    memset(g,0,sizeof(g));
    for (int L = 1; L <= (n >> 1); L++){
        for (int a = L,b = a + L,l,r,lenl,lenr,len; b <= n; a += L,b += L){
            lenl = min(pre_lcp(a,b),L);
            lenr = min(lcp(a,b),L);
            len = lenl + lenr - 1;
            l = a - lenl + 1; r = l + len - L;
            if (l <= r) g[l]++,g[r + 1]--;
            l = b - lenl + L; r = l + len - L;
            if (l <= r) f[l]++,f[r + 1]--;
        }
    }
    REP(i,n) g[i] += g[i - 1],f[i] += f[i - 1];
    //REP(i,n) printf("%lld",f[i]); puts("");
    //REP(i,n) printf("%lld",g[i]); puts("");
    LL ans = 0;
    for (int i = 2; i < n - 1; i++){
        ans += f[i] * g[i + 1];
    }
    printf("%lld\n",ans);
}
int main(){
    bin[0] = 1; for (int i = 1; i <= 25; i++) bin[i] = bin[i - 1] << 1;
    Log[0] = -1; for (int i = 1; i < maxn; i++) Log[i] = Log[i >> 1] + 1;
    int T = read();
    while (T--){
        cls(s); cls(t1); cls(t2);
        scanf("%s",s + 1); n = strlen(s + 1);
        s[n + 1] = '#';
        for (int i = 1; i <= n; i++) s[n + 1 + i] = s[n - i + 1];
        N = n = n << 1 | 1;
        getsa();
        n >>= 1;
        solve();
    }
    return 0;
}
