String series: SA (suffix array)

Retire before the first year of high school.

example

UOJ#35. Suffix sorting
This is a template problem.

Read a string of length $n$ consisting of lowercase English letters. Sort all non-empty suffixes of this string in lexicographic order from smallest to largest, then output, in that order, the position in the original string of each suffix's first character. Positions are numbered from $1$ to $n$.

In addition, to further prove that you really do have the superpower of sorting suffixes, also output $n-1$ integers giving the length of the longest common prefix of each pair of suffixes adjacent in sorted order.

Input format
One line containing a string of length $n$ consisting only of lowercase English letters.

Output format
The first line contains $n$ integers; the $i$-th integer is the position in the original string of the first character of the suffix with rank $i$.

The second line contains $n-1$ integers; the $i$-th integer is the length of the longest common prefix of the suffixes with rank $i$ and rank $i+1$.

Sample 1
input
ababa

output
5 3 1 4 2
1 3 0 2

explanation
The result after sorting is:

a
aba
ababa
ba
baba

Restrictions and Conventions
$1 \leq n \leq 10^5$
Time limit: 1 s
Memory limit: 256 MB

SA

SA is not a terribly useful tool:
almost anything SA can do, SAM (suffix automaton) can do as well, apart from answering suffix LCP queries in O(1),
but some problems simply cannot be solved that way.

Some definitions:
st: string
rank[i]: lexicographic rank of st[i…n] (all ranks are distinct, since no two suffixes are equal)
sa[i]: starting position of the suffix with rank i (the inverse array of rank), that is, sa[rank[i]]=i
hi[i] (i.e. height): LCP length of st[sa[i-1]…n] and st[sa[i]…n]; hi[i]=h[sa[i]]
h[i]: LCP length of st[sa[rank[i]-1]…n] and st[i…n]; h[i]=hi[rank[i]]
In other words, h[i] is the LCP length of the suffix starting at i and the suffix ranked immediately before it
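
To make the definitions concrete, here is a minimal brute-force sketch that builds all four arrays straight from their definitions. The names `NaiveSA`, `naiveLcp` and `buildNaive` are hypothetical helpers introduced only for illustration; the approach is quadratic or worse and is meant for checking small cases, not for the actual problem.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// All arrays are 1-indexed to match the definitions above; index 0 is unused.
struct NaiveSA {
    std::vector<int> sa, rank, hi, h;
};

// LCP length of the suffixes starting at 1-based positions a and b.
int naiveLcp(const std::string& s, int a, int b) {
    assert(a >= 1 && b >= 1);
    int n = static_cast<int>(s.size()), len = 0;
    while (a + len <= n && b + len <= n && s[a + len - 1] == s[b + len - 1]) ++len;
    return len;
}

NaiveSA buildNaive(const std::string& s) {
    int n = static_cast<int>(s.size());
    NaiveSA r;
    r.sa.assign(n + 1, 0);
    r.rank.assign(n + 1, 0);
    r.hi.assign(n + 1, 0);
    r.h.assign(n + 1, 0);
    // sa: starting positions sorted by comparing the suffixes directly.
    std::iota(r.sa.begin() + 1, r.sa.end(), 1);
    std::sort(r.sa.begin() + 1, r.sa.end(), [&](int a, int b) {
        return s.compare(a - 1, std::string::npos, s, b - 1, std::string::npos) < 0;
    });
    // rank is the inverse of sa: sa[rank[i]] = i.
    for (int i = 1; i <= n; ++i) r.rank[r.sa[i]] = i;
    // hi[i] = LCP of the suffixes ranked i-1 and i.
    for (int i = 2; i <= n; ++i) r.hi[i] = naiveLcp(s, r.sa[i - 1], r.sa[i]);
    // h[i] = hi[rank[i]].
    for (int i = 1; i <= n; ++i) r.h[i] = r.hi[r.rank[i]];
    return r;
}
```

For the sample string ababa, `buildNaive("ababa")` gives sa = 5 3 1 4 2 and hi[2…5] = 1 3 0 2, matching the sample output, and h[1…5] = 3 2 1 0 0.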

rank

First, how to compute rank.
Brute-force sorting is certainly too slow here; consider computing the ranks by doubling.
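Each round, take two adjacent segments of length 2^k together with their ranks, and use a two-key bucket sort to obtain the new ranks (ties are allowed in intermediate rounds, but the final ranks must all be distinct).
Specifically: first sort by the low-order key (the rank of the second segment), then by the high-order key (the rank of the first segment). Because the buckets are stored as adjacency lists, which return their elements in reverse insertion order, the elements must be extracted in reverse.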
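
The two-pass bucket sort and the reversal remark can be illustrated in isolation. The following is only a sketch: `twoKeyBucketSort` and its parameters are hypothetical names, keys are assumed to lie in [0, maxKey], and indices are 1-based with index 0 unused.

```cpp
#include <cassert>
#include <vector>

// Sorts indices 1..n by (first[i], second[i]) with two stable bucket passes.
// Each bucket is a singly linked list with head insertion, so popping a
// bucket yields its elements in reverse insertion order.
std::vector<int> twoKeyBucketSort(const std::vector<int>& first,
                                  const std::vector<int>& second, int maxKey) {
    int n = static_cast<int>(first.size()) - 1;
    assert(static_cast<int>(second.size()) == n + 1);
    std::vector<int> order(n + 1, 0), head(maxKey + 1, 0), next(n + 1, 0);
    for (int i = 1; i <= n; ++i) order[i] = i;

    auto pass = [&](const std::vector<int>& key) {
        // Insert in the current order at the head of each bucket's list.
        for (int i = 1; i <= n; ++i) {
            int x = order[i];
            next[x] = head[key[x]];
            head[key[x]] = x;
        }
        // Visit buckets from the largest key down and fill the output from
        // the back: the two reversals cancel, so the pass is stable.
        int pos = n;
        for (int b = maxKey; b >= 0; --b)
            while (head[b]) {
                order[pos--] = head[b];
                head[b] = next[head[b]];
            }
    };

    pass(second);  // low-order key first
    pass(first);   // then the high-order key
    return order;
}
```

For ababa with k=1, the first keys are 1 2 1 2 1 and the second keys are 2 1 2 1 0 (0 meaning "past the end"), and the resulting order is 5 1 3 2 4. Positions 1 and 3 still tie at this stage, as do 2 and 4; the later rounds separate them.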

height

Computing height directly is inconvenient, so we introduce the h array,
which has an obvious and important property:

h[i]≥h[i-1]-1

Proof:
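Clearly h[i] is at least what is left of the h[i-1] match after deleting the character at position i-1, that is, at least h[i-1]-1. In more detail: let st[j…n] be the suffix ranked just before st[i-1…n]. If they share h[i-1]≥2 characters, then st[j+1…n] is ranked before st[i…n] and shares h[i-1]-1 characters with it, and every suffix ranked between those two shares at least that many with st[i…n] as well (when h[i-1]≤1 the bound is trivial).
Simple and natural.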

With this property, the h array can be computed in linear time, and height then follows from it
(note that sa[rank[i]-1] may be greater than i, so both indices must be checked against the end of the string)
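
The linear computation can be sketched as follows, assuming sa has already been built. `heightFromSa` is a hypothetical name; the single counter `cur` plays the role of h[i-1] carried over to position i, and both indices are bounds-checked as noted above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Given the 1-indexed suffix array sa of s (sa[0] unused), returns hi where
// hi[r] is the LCP length of the suffixes ranked r-1 and r (hi[1] = 0).
std::vector<int> heightFromSa(const std::string& s, const std::vector<int>& sa) {
    int n = static_cast<int>(s.size());
    assert(static_cast<int>(sa.size()) == n + 1);
    std::vector<int> rank(n + 1, 0), hi(n + 1, 0);
    for (int r = 1; r <= n; ++r) rank[sa[r]] = r;

    int cur = 0;  // h[i-1] on entry to iteration i, h[i] on exit
    for (int i = 1; i <= n; ++i) {
        if (rank[i] == 1) {  // no predecessor: h[i] = 0
            cur = 0;
            continue;
        }
        int p = sa[rank[i] - 1];  // suffix ranked just before suffix i
        if (cur > 0) --cur;       // start from h[i-1] - 1, never below 0
        while (i + cur <= n && p + cur <= n && s[i + cur - 1] == s[p + cur - 1])
            ++cur;
        hi[rank[i]] = cur;
    }
    return hi;
}
```

Since `cur` drops by at most one per iteration and never exceeds n, the inner loop runs O(n) times in total.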

Properties of height

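A property of height that can be exploited:
the LCP length of st[sa[i]…n] and st[sa[j]…n] (i<j) equals min(height[i+1…j]).
Proof:
Let x be the LCP length of st[sa[i]…n] and st[sa[j]…n]. Every pair of adjacent suffixes ranked between i and j shares at least min(height[i+1…j]) characters, so x≥min(height[i+1…j]).
If x>min(height[i+1…j]), then some suffix ranked between i and j differs from st[sa[i]…n] within the first x characters while the suffixes ranked i and j agree there, which violates the lexicographic order; so x≤min(height[i+1…j]).
In summary, x=min(height[i+1…j]).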
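
As a small check of this property, the formula can be evaluated directly by scanning the height array. This is only a sketch; `lcpByRank` is a hypothetical helper and takes ranks, not positions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// hi is 1-indexed: hi[r] = LCP of the suffixes ranked r-1 and r (hi[1] unused).
// Returns the LCP length of the suffixes ranked i and j, for i < j,
// as min(hi[i+1 .. j]).
int lcpByRank(const std::vector<int>& hi, int i, int j) {
    assert(1 <= i && i < j && j < static_cast<int>(hi.size()));
    return *std::min_element(hi.begin() + i + 1, hi.begin() + j + 1);
}
```

For ababa the height values are hi[2…5] = 1 3 0 2, so the suffixes ranked 1 and 3 (a and ababa) have LCP min(1,3)=1, and the suffixes ranked 2 and 5 (aba and baba) have LCP min(3,0,2)=0.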


In terms of the original string, the LCP length of st[i…n] and st[j…n] is min(height[rank[i]+1…rank[j]]) (assuming rank[i]<rank[j]).
However, this problem does not use this property;
to be filled in later.
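
Although the problem does not need it, the O(1) suffix LCP mentioned earlier is usually obtained by putting a range-minimum structure over height. The sketch below uses a sparse table, which is one common choice and not something the text prescribes; `LcpTable` and its members are hypothetical names, and all arrays are 1-indexed with index 0 unused.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct LcpTable {
    int n;
    std::vector<int> rank;             // rank[pos] for 1-based positions
    std::vector<std::vector<int>> mn;  // mn[k][r] = min(hi[r .. r + 2^k - 1])
    std::vector<int> lg;               // lg[x] = floor(log2(x))

    // sa and hi are 1-indexed; hi[r] = LCP of the suffixes ranked r-1 and r.
    LcpTable(const std::vector<int>& sa, const std::vector<int>& hi)
        : n(static_cast<int>(sa.size()) - 1), rank(n + 1, 0), lg(n + 2, 0) {
        assert(static_cast<int>(hi.size()) == n + 1);
        for (int r = 1; r <= n; ++r) rank[sa[r]] = r;
        for (int x = 2; x <= n + 1; ++x) lg[x] = lg[x / 2] + 1;
        mn.push_back(hi);
        for (int k = 1; (1 << k) <= n; ++k) {
            mn.push_back(std::vector<int>(n + 1, 0));
            for (int r = 1; r + (1 << k) - 1 <= n; ++r)
                mn[k][r] = std::min(mn[k - 1][r], mn[k - 1][r + (1 << (k - 1))]);
        }
    }

    // LCP length of the suffixes starting at 1-based positions a and b.
    int lcp(int a, int b) const {
        if (a == b) return n - a + 1;  // a suffix compared with itself
        int l = rank[a], r = rank[b];
        if (l > r) std::swap(l, r);
        ++l;  // the answer is min(hi[l+1 .. r]) for the original l
        int k = lg[r - l + 1];
        return std::min(mn[k][l], mn[k][r - (1 << k) + 1]);
    }
};
```

Building the table takes O(n log n) time and memory, after which each `lcp(a, b)` call is O(1). For ababa, `lcp(1, 3)` is 3 (ababa and aba) and `lcp(1, 2)` is 0 (ababa and baba).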

code

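#include <algorithm>
#include <cstdio>

const int N = 100001;

int n, k;
int A[N];         // positions in the current sorted order
int pre[N];       // "next" pointer inside a bucket's linked list
int Ls[N];        // head of each bucket's list (0 = empty)
int st[N];        // the string, 1-indexed, letters mapped to 0..25
int h[N];         // h[i]=hi[rank[i]]
int hi[N];        // the LCP of sa[i-1] and sa[i]   hi[i]=h[sa[i]]
int rank[2 * N];  // twice as long, so rank[i+k] reads 0 past the end
int sa[2 * N];
int Rank[2 * N];  // copy of rank from the previous round
int Bz[26];       // Bz[c]=1 if letter c occurs, later turned into prefix sums

int main()
{
	// Read the string and mark which letters occur.
	int ch = getchar();
	while (ch >= 'a' && ch <= 'z')
	{
		st[++n] = ch - 'a';
		Bz[st[n]] = 1;
		ch = getchar();
	}

	// Prefix sums give the initial ranks (starting at 1; 0 means "past the end").
	// Start at 1: starting at 0 would read Bz[-1].
	for (int i = 1; i <= 25; i++)
		Bz[i] += Bz[i - 1];
	for (int i = 1; i <= n; i++)
		rank[i] = Bz[st[i]];

	for (k = 1; k <= n; k += k)
	{
		// Pass 1: bucket by the second key rank[i+k].
		for (int i = 1; i <= n; i++)
		{
			pre[i] = Ls[rank[i + k]];
			Ls[rank[i + k]] = i;
		}
		// Lists come out in reverse insertion order, so fill A from the back.
		int l = n;
		for (int i = n; i >= 0; i--)
			while (Ls[i])
			{
				A[l--] = Ls[i];
				Ls[i] = pre[Ls[i]];
			}

		// Pass 2: bucket by the first key rank[i], keeping the order from pass 1.
		for (int i = 1; i <= n; i++)
		{
			pre[A[i]] = Ls[rank[A[i]]];
			Ls[rank[A[i]]] = A[i];
		}
		l = n;
		for (int i = n; i >= 0; i--)
			while (Ls[i])
			{
				A[l--] = Ls[i];
				Ls[i] = pre[Ls[i]];
			}

		// Re-rank: equal (first key, second key) pairs share a rank.
		for (int i = 1; i <= n; i++)
			Rank[i] = rank[i];
		int j = 0;
		for (int i = 1; i <= n; i++)
		{
			if (i == 1 || Rank[A[i]] != Rank[A[i - 1]] || Rank[A[i] + k] != Rank[A[i - 1] + k])
				++j;
			rank[A[i]] = j;
		}
	}

	for (int i = 1; i <= n; i++)
		sa[rank[i]] = i;

	// h[i] >= h[i-1]-1, so start from there and extend; h[i] stays 0 when rank[i]==1.
	for (int i = 1; i <= n; i++)
		if (rank[i] > 1)
		{
			int p = sa[rank[i] - 1];  // suffix ranked just before suffix i
			h[i] = std::max(h[i - 1] - 1, 0);
			// p may be greater than i, so check both indices against n.
			while (i + h[i] <= n && p + h[i] <= n && st[i + h[i]] == st[p + h[i]])
				++h[i];
		}

	for (int i = 2; i <= n; i++)
		hi[i] = h[sa[i]];

	for (int i = 1; i <= n; i++)
		printf("%d ", sa[i]);
	printf("\n");
	for (int i = 2; i <= n; i++)
		printf("%d ", hi[i]);
	printf("\n");
}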

Reference

https://www.cnblogs.com/heyujun/p/10300582.html
https://www.cnblogs.com/cjyyb/p/8335194.html
https://blog.csdn.net/cold_chair/article/details/62909232


Origin blog.csdn.net/gmh77/article/details/100049069