POJ 1743 Musical Theme (suffix array learning + solution to the problem)

Problem statement: A piece of music consists of N notes, each an integer from 1 to 88. Find the longest theme, that is, a substring of the note sequence that satisfies the following three conditions:

1. It is composed of at least five notes.
2. The theme appears at least twice in the music. The two occurrences do not have to be identical: if one can be obtained from the other by transposition, they count as the same theme. Transposition means adding the same number to, or subtracting it from, every note of the sequence.
3. Any two occurrences of the theme must not overlap.

If there were no transposition requirement, this would simply be the longest non-overlapping repeated substring problem, which a suffix array can solve. Looking at the definition of transposition, a little preprocessing reduces our problem to that one. Suppose one substring is a, b, c, d, e and another is a+k, b+k, c+k, d+k, e+k. They are clearly the same theme, and the differences between adjacent characters are identical:
a, b, c, d, e -> b-a, c-b, d-c, e-d
a+k, b+k, c+k, d+k, e+k -> b-a, c-b, d-c, e-d
So we can replace the original sequence by the sequence of differences between adjacent notes and build the suffix array on that. Because the new sequence is one character shorter, the theme we are looking for now needs at least four characters. There is one catch, for example:
1 1 1 1 1 1 1 1 1 -----> 0 0 0 0 0 0 0 0
The difference sequence contains 0 0 0 0 at positions 0-3 and again at positions 4-7, but these correspond to notes 0-4 and notes 4-8 of the original sequence, which share a note and therefore overlap. So the two occurrences in the difference sequence must be separated by at least one character. With that extra requirement the problem is solved.
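
The transformation described above can be sketched as follows. This is a minimal illustration, not part of the solution code; the helper name `differences` is made up for this example, and the +88 shift mirrors what the full solution does to keep every value positive (notes are in 1..88, so raw differences lie in -87..87).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: map a note sequence to its adjacent-difference
// sequence, shifted by +88 so that every value is at least 1.
std::vector<int> differences(const std::vector<int>& notes) {
    assert(!notes.empty());
    std::vector<int> d;
    for (std::size_t i = 1; i < notes.size(); ++i)
        d.push_back(notes[i] - notes[i - 1] + 88);
    return d;  // one element shorter than the input
}
```

Two sequences that differ only by a transposition, such as 1 3 6 10 15 and 11 13 16 20 25, map to the same difference sequence, so ordinary substring matching on the result finds transposed themes.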

Solution to the "longest non-overlapping repeated substring" problem (taken from Luo Suiqian's "Suffix Array-A Powerful Tool for Processing Strings"):

First binary search on the answer, which turns the problem into a decision problem: are there two identical, non-overlapping substrings of length k? The key is the height array (height is explained in the code). Split the sorted suffixes into groups so that within each group every height value between adjacent suffixes is at least k. For example, for the string "aabaaaab" and k=2, the suffixes fall into 4 groups: {aaaab, aaab, aab, aabaaaab}, {ab, abaaaab}, {b}, {baaaab}.

It is easy to see that two suffixes whose longest common prefix is at least k must lie in the same group. So for each group it suffices to check whether the difference between the largest and smallest sa value in the group is at least k. If some group satisfies this, such substrings exist; otherwise they do not. The whole approach runs in O(n log n) time. For Musical Theme, where the search runs on the difference sequence, the difference must be strictly greater than k because of the one-character gap required above; that is what check() in the code tests.
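
To make the grouping concrete, here is a small sketch of the decision step and the binary search around it. It assumes a naive suffix array (sorting suffixes by direct comparison), which is fine for a demonstration but far slower than the doubling method used in the solution; the names `Groups`, `groupByHeight` and `longestNonOverlappingRepeat` are invented for this example. It implements the general version with "at least k", not the strict variant Musical Theme needs.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Groups { int count; bool found; };

// Split the sorted suffixes into groups with height >= k and report
// whether some group has max(sa) - min(sa) >= k.
Groups groupByHeight(const std::string& s, int k) {
    assert(k >= 1);
    int n = (int)s.size();
    std::vector<int> sa(n);
    for (int i = 0; i < n; ++i) sa[i] = i;
    // Naive suffix array: compare suffixes directly (illustration only).
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    // height[i] = longest common prefix of suffixes sa[i-1] and sa[i].
    std::vector<int> height(n, 0);
    for (int i = 1; i < n; ++i) {
        int a = sa[i - 1], b = sa[i], h = 0;
        while (a + h < n && b + h < n && s[a + h] == s[b + h]) ++h;
        height[i] = h;
    }
    Groups g{0, false};
    int lo = 0, hi = 0;
    for (int i = 0; i < n; ++i) {
        if (i == 0 || height[i] < k) { ++g.count; lo = hi = sa[i]; }  // new group
        else { lo = std::min(lo, sa[i]); hi = std::max(hi, sa[i]); }
        if (hi - lo >= k) g.found = true;
    }
    return g;
}

// Binary search on the answer: the largest k for which the decision succeeds.
int longestNonOverlappingRepeat(const std::string& s) {
    int lo = 1, hi = (int)s.size() / 2, best = 0;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (groupByHeight(s, mid).found) { best = mid; lo = mid + 1; }
        else hi = mid - 1;
    }
    return best;
}
```

For "aabaaaab" and k=2 this yields 4 groups, and the binary search settles on 3 ("aab" at positions 0 and 5). The binary search is valid because the decision is monotone: if two non-overlapping copies of length k exist, their prefixes of length k-1 are also non-overlapping copies.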

#pragma GCC optimize(2)
//#include<bits/stdc++.h>
#include<cstdio>
#include<iostream>
#include<algorithm>
#include<cstring>
using namespace std;

typedef long long ll;
typedef unsigned long ul;
typedef unsigned long long ull;
#define pi acos(-1.0)
#define e exp(1.0)
#define pb push_back
#define mk make_pair
#define fir first
#define sec second
//#define scf scanf
//#define prf printf
typedef pair<ll,ll> pa;
const ll INF=0x3f3f3f3f3f3f3f3f;
const ll maxn=2e5+7;
ll height[maxn],sa[maxn],rank[maxn],tmp[maxn],r[maxn],K,N;
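// Compare suffixes i and j by their first 2K characters,
// using the ranks of the first K characters from the previous round
bool cmp(ll i,ll j){
	// i and j are the starting positions of the two suffixes
	if(rank[i]!=rank[j])
		return rank[i]<rank[j];
	ll r1=i+K<=N?rank[i+K]:-1; // i+K==N corresponds to the empty string
	ll r2=j+K<=N?rank[j+K]:-1;
	return r1<r2;
}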
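// Build the suffix array with the doubling method
void do_sa(){
	ll i;
	// initialize rank[] and sa[]
	for(i=0;i<=N;i++){
		sa[i]=i;
//		rank[i]=(i==N?-1:r[i]); // alternative: treat position N as the empty string
		rank[i]=r[i];
	}
	// doubling: after each round, rank[] orders the suffixes by their first 2K characters
	for(K=1;K<=N;K<<=1){
		sort(sa,sa+1+N,cmp);
		tmp[sa[0]]=0; // the smallest suffix is always the empty string
		for(i=1;i<=N;i++)
			tmp[sa[i]]=tmp[sa[i-1]]+(cmp(sa[i-1],sa[i])?1:0);
		// Never swap sa[i-1] and sa[i] here: the order matches the definition of cmp above.
		// With the arguments reversed the result is not equivalent when r1==r2.
		for(i=0;i<=N;i++)
			rank[i]=tmp[i];
	}
	return ;
}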
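// height[x] stores the length of the longest common prefix of the suffix
// of rank x (starting at sa[x]) and the suffix ranked just before it (starting at sa[x-1])
void get_height(){
	ll i,j,k=0;
	for(i=0;i<N;i++){
		// the value for suffix i is at least the value for suffix i-1 minus 1,
		// so the comparison can resume from k-1
		if(k)
			k--;
		j=sa[rank[i]-1];
		// the sentinel r[N]=0 is unique, so this loop stops before running past the end
		while(r[i+k]==r[j+k])
			k++;
		height[rank[i]]=k;
	}
	return ;
}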
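// Decision step: is there a substring of length mid in the difference sequence
// that occurs twice with at least one character between the two occurrences?
bool check(ll mid){
	ll i;
	ll maxx=-INF,minn=INF;
	for(i=1;i<=N;i++){
		if(height[i]>=mid){
			// sa[i-1] and sa[i] belong to the same group
			minn=min(minn,min(sa[i],sa[i-1]));
			maxx=max(maxx,max(sa[i],sa[i-1]));
			// strictly greater: the two themes must not share a note
			if(maxx-minn>mid) return true;
		}
		else{
			// height dropped below mid: start a new group
			maxx=-INF;
			minn=INF;
		}
	}
	return false;
}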
int main()
{
	// ==1 also stops the loop at end of input, where scanf returns EOF
	while(scanf("%lld",&N)==1&&N){
		ll i;
		// build the difference sequence; r[] plays the role of the string
		for(i=0;i<N;i++){
			scanf("%lld",&r[i]);
			if(i)
				r[i-1]=r[i]-r[i-1]+88; // +88 keeps every value positive
		}
		N--;
		r[N]=0; // sentinel, smaller than every shifted difference
		do_sa();
		get_height();
		// binary search on the length of the repeated difference substring
		ll L=0,R=N/2,mid,res=0;
		while(L<=R){
			mid=(R-L)/2+L;
			if(check(mid)){
				L=mid+1;
				res=max(res,mid);
			}
			else{
				R=mid-1;
			}
		}
		// fewer than 4 differences means fewer than 5 notes: no valid theme
		if(res<4){
			printf("0\n");
			continue;
		}
		printf("%lld\n",res+1);
	}
	return 0;
}
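
When debugging a solution like the one above, a slow but simple reference is handy for comparing answers on small inputs. The following is a sketch of such a checker, not the suffix-array method: it uses an O(n^2) dynamic program over pairs of positions in the difference sequence. The function name `longestThemeBrute` is invented for this example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical brute-force reference for Musical Theme (O(n^2) time).
// Returns the length of the longest theme in notes, or 0 if none has 5+ notes.
int longestThemeBrute(const std::vector<int>& notes) {
    int n = (int)notes.size();
    if (n < 2) return 0;
    int m = n - 1;
    std::vector<int> d(m);
    for (int i = 0; i < m; ++i) d[i] = notes[i + 1] - notes[i];
    int best = 0;
    // next[j] = common prefix length of d[i+1..] and d[j..] (row i+1);
    // cur[j]  = common prefix length of d[i..]   and d[j..] (row i).
    std::vector<int> next(m + 2, 0), cur(m + 2, 0);
    for (int i = m - 1; i >= 0; --i) {
        for (int j = m - 1; j > i; --j) {
            cur[j] = (d[i] == d[j]) ? next[j + 1] + 1 : 0;
            // Cap at j-i-1 so one difference is left between the two copies,
            // which keeps the two themes from sharing a note.
            best = std::max(best, std::min(cur[j], j - i - 1));
        }
        std::swap(cur, next);
    }
    // best counts differences; a theme of best+1 notes needs best >= 4.
    return best >= 4 ? best + 1 : 0;
}
```

For instance, ten equal notes give 5 (notes 0-4 and 5-9), while nine equal notes give 0, which matches the catch discussed earlier for the all-ones sequence.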

The suffix array template used above comes from the book "Challenge Programming Contest"; see the book for its explanation of suffix arrays.

When calling cmp, I first passed the two arguments in the wrong order, which led to the suffixes being sorted incorrectly and left me stuck for a long time...

Origin blog.csdn.net/weixin_43311695/article/details/107645737