Distinct Substrings (suffix array)

Problem

Given a string, count the number of distinct substrings it contains.

Ideas

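Every substring is a prefix of some suffix, so the problem is equivalent to counting the distinct prefixes over all suffixes of the string. Counted with repetition, a string of length len has len*(len+1)/2 substrings. How many of those are repeats? Exactly the sum of the height array. When the suffixes are visited in sorted order, height[k] is the length of the longest common prefix of suffix(sa[k]) and the suffix just before it, and no earlier suffix shares a longer prefix with suffix(sa[k]) than that neighbour does. So height[k] of the prefixes of suffix(sa[k]) have already been counted, and the answer is len*(len+1)/2 minus the sum of height.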
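As a sanity check on the counting argument above, here is a minimal brute-force sketch in C++. It is not the method used in this post: it simply stores every substring in a std::set, so it is only practical for short strings. The function names countDistinctBrute and countAllOccurrences are made up for this illustration.

```cpp
#include <set>
#include <string>

// Brute-force reference: insert every substring into a set and count the
// distinct ones. There are O(n^2) substrings of length up to n, so this is
// only meant for cross-checking small inputs.
long long countDistinctBrute(const std::string& s) {
    std::set<std::string> seen;
    for (std::size_t i = 0; i < s.size(); ++i)
        for (std::size_t len = 1; i + len <= s.size(); ++len)
            seen.insert(s.substr(i, len));
    return static_cast<long long>(seen.size());
}

// Number of (start, length) pairs, i.e. substrings counted with repetition.
long long countAllOccurrences(const std::string& s) {
    long long n = static_cast<long long>(s.size());
    return n * (n + 1) / 2;
}
```

For "aaa" there are 6 substrings counted with repetition and 3 distinct ones (a, aa, aaa). The difference, 3, matches the longest common prefixes between neighbouring sorted suffixes a, aa, aaa, which are 1 and 2.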

The following is excerpted from Luo Suiqian's paper "Suffix Array-A Powerful Tool for String Processing", a copy of which is hosted on GitHub.

If the suffixes are processed in the order suffix(sa[1]), suffix(sa[2]), suffix(sa[3]), ..., suffix(sa[n]), it is not hard to see that each newly added suffix suffix(sa[k]) produces n-sa[k]+1 new prefixes, of which height[k] are the same as prefixes of the preceding suffix. So suffix(sa[k]) "contributes" n-sa[k]+1-height[k] distinct substrings. Adding up these contributions gives the answer to the original problem. The time complexity of this approach is O(n).

Note that the excerpt uses 1-based positions, and its O(n) bound covers only the summation once sa and height are known. The code below uses 0-based positions and builds the suffix array by prefix doubling with std::sort, so the whole program runs in O(n log^2 n).
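The per-suffix contributions described in the excerpt can be sketched as follows. This is a deliberately naive illustration, not the implementation used in this post: it sorts the suffix start positions by direct string comparison and computes each longest common prefix character by character. Positions are 0-based here, so the contribution of the k-th sorted suffix becomes (n - sa[k]) - height[k]. The names lcpOf, contributions and countDistinctBySuffixes are hypothetical.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Length of the longest common prefix of the suffixes starting at a and b.
static int lcpOf(const std::string& s, int a, int b) {
    const int n = static_cast<int>(s.size());
    int k = 0;
    while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
    return k;
}

// Element k is the number of distinct substrings first introduced by the
// k-th smallest suffix: all of its prefixes minus those shared with the
// previous suffix in sorted order.
std::vector<long long> contributions(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    // Naive suffix sort: compare the suffixes themselves.
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    std::vector<long long> out(n);
    for (int k = 0; k < n; ++k) {
        const int h = (k > 0) ? lcpOf(s, sa[k - 1], sa[k]) : 0;
        out[k] = (n - sa[k]) - h;
    }
    return out;
}

// Sum of all contributions = number of distinct substrings.
long long countDistinctBySuffixes(const std::string& s) {
    const std::vector<long long> c = contributions(s);
    return std::accumulate(c.begin(), c.end(), 0LL);
}
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana. Their contributions are 1, 2, 2, 6, 2, 2, which sum to 15.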


#pragma GCC optimize(2)
#include<bits/stdc++.h>

using namespace std;

typedef long long ll;
// Array capacity: the input string must be shorter than maxn.
const ll maxn=2e4+7;
// The rank array is named rk because, with "using namespace std",
// the name rank is ambiguous with std::rank from <type_traits>.
ll T,N,K,rk[maxn],sa[maxn],height[maxn],tmp[maxn];
string s;
// Compare suffixes i and j by their first 2*K characters,
// given that rk[] already orders all suffixes by their first K characters.
bool cmp(ll i,ll j){
	if(rk[i]!=rk[j])
		return rk[i]<rk[j];
	ll r1=i+K<=N?rk[i+K]:-1;
	ll r2=j+K<=N?rk[j+K]:-1;
	return r1<r2;
}
// Build the suffix array by prefix doubling.
// Position N is the empty suffix; it always sorts first, so sa[0]==N.
void do_sa(){
	ll i;
	for(i=0;i<=N;i++){
		sa[i]=i;
		rk[i]=(i!=N?s[i]:-1);
	}
	for(K=1;K<=N;K<<=1){
		sort(sa,sa+1+N,cmp);
		// Recompute ranks for the first 2*K characters.
		tmp[sa[0]]=0;
		for(i=1;i<=N;i++){
			tmp[sa[i]]=tmp[sa[i-1]]+(cmp(sa[i-1],sa[i])?1:0);
		}
		for(i=0;i<=N;i++)
			rk[i]=tmp[i];
	}
}
// height[r] = length of the longest common prefix of the suffixes
// ranked r-1 and r (the linear-time construction of Kasai et al.).
void get_height(){
	ll i,j,k=0;
	for(i=0;i<N;i++){
		if(k)
			k--;
		// rk[i]>=1 for i<N because the empty suffix has rank 0.
		j=sa[rk[i]-1];
		while(i+k<N&&j+k<N&&s[i+k]==s[j+k])
			k++;
		height[rk[i]]=k;
	}
}
int main()
{
	ios::sync_with_stdio(false);
	cin>>T;
	while(T--){
		cin>>s;
		N=s.length();
		do_sa();
		get_height();
		// All substrings counted with repetition, minus the repeats.
		ll res=N*(N+1)/2;
		for(ll i=1;i<=N;i++)
			res-=height[i];
		cout<<res<<endl;
	}
	return 0;
}

Origin blog.csdn.net/weixin_43311695/article/details/107675926