On the suffix array (SA)

This post is not going to be very detailed; the suffix array tutorials already online explain it far better than I can. I wrote this post myself mainly to deepen my own understanding.

I have shared plenty with you before, so let me be selfish for once ~

Reference: The dalao's blog

First, the definitions of the arrays a suffix array needs:

1. sa[i] = j: the suffix ranked i-th starts at position j.

(It stores an index.)

2. rnk[i] = j: the suffix starting at position i is ranked j-th.

(It is the inverse of sa; it stores a rank value.)

3. tp[i] = j: the suffix ranked i-th by second keyword starts at position j.

(Think of it as the sa of the second keyword; it stores an index.)
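To make the definitions of sa and rnk concrete, here is a minimal sketch that builds them the slow way, by comparing whole suffixes. The helper names naiveSA and inverseOf and the 1-indexed vector layout are assumptions made for illustration; this is only a checking tool, not the doubling method discussed in this post.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Naive O(n^2 log n) construction, only for checking the definitions.
// Positions and ranks are 1-indexed; index 0 of each vector is unused.
std::vector<int> naiveSA(const std::string& s) {
    int n = (int)s.size();
    std::vector<int> sa(n + 1, 0);
    std::iota(sa.begin() + 1, sa.end(), 1);  // sa[1..n] = 1..n
    std::sort(sa.begin() + 1, sa.end(), [&](int x, int y) {
        // Compare the suffixes starting at 1-indexed positions x and y.
        return s.compare(x - 1, std::string::npos, s, y - 1, std::string::npos) < 0;
    });
    return sa;
}

// rnk is the inverse permutation of sa: rnk[sa[i]] = i.
std::vector<int> inverseOf(const std::vector<int>& sa) {
    std::vector<int> rnk(sa.size(), 0);
    for (int i = 1; i < (int)sa.size(); i++) rnk[sa[i]] = i;
    return rnk;
}
```

For the string "ababa" the sorted suffixes are a, aba, ababa, ba, baba, so sa[1..5] = 5 3 1 4 2 and rnk[1..5] = 3 5 2 4 1.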

A quick aside to explain the first keyword and the second keyword:

We want to sort all the suffixes, so how do we order them?

Initially, each suffix is represented only by its own first character, so a suffix's order is determined by that character's ASCII code.

We treat each character as a pair (S[i], i). If we simply throw these into pair<int, int> and call std::sort in every round,

the total time complexity is O(n log^2 n), which is clearly not good enough.
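As a rough sketch of the std::sort approach just described (not the version this post ends up with), the function below re-sorts the suffixes in each doubling round by the pair (rank of the first w characters, rank of the next w characters). The name sortDoublingSA and the 1-indexed vector layout are assumptions made for illustration.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Doubling with std::sort: O(log n) rounds, each an O(n log n) comparison sort.
// Returns sa[1..n]; index 0 is unused.
std::vector<int> sortDoublingSA(const std::string& s) {
    int n = (int)s.size();
    std::vector<int> sa(n + 1, 0), rnk(n + 1, 0), tmp(n + 1, 0);
    if (n == 0) return sa;
    for (int i = 1; i <= n; i++) {
        sa[i] = i;
        rnk[i] = (unsigned char)s[i - 1] + 1;  // initial rank = character code (kept >= 1)
    }
    for (int w = 1;; w <<= 1) {
        // Key of suffix i: (rank of its first w chars, rank of the next w chars).
        // A suffix that ends before position i + w gets 0, which sorts first.
        auto key = [&](int i) {
            return std::make_pair(rnk[i], i + w <= n ? rnk[i + w] : 0);
        };
        std::sort(sa.begin() + 1, sa.end(),
                  [&](int x, int y) { return key(x) < key(y); });
        // Re-rank: equal keys share a rank.
        tmp[sa[1]] = 1;
        for (int i = 2; i <= n; i++)
            tmp[sa[i]] = tmp[sa[i - 1]] + (key(sa[i - 1]) < key(sa[i]) ? 1 : 0);
        rnk = tmp;
        if (rnk[sa[n]] == n) break;  // all ranks distinct: the order is final
    }
    return sa;
}
```

Calling sortDoublingSA("ababa") should give sa[1..5] = 5 3 1 4 2. Each round costs a full comparison sort, which is where the extra log factor comes from.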

So we need radix sort (RadixSort); if you don't know it, Baidu it yourself.

Combined with the doubling method, each round's sort becomes O(n), so the total sorting time drops to O(n log n).

Next we sort by the first two characters of each suffix; the relative order by first character is already known.

The second character of suffix i is the first character of suffix i + 1, so using this relationship we also know the relative order of the second characters.

The tp array is what records this, while rnk[i] holds the rank of suffix i from the previous round.
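The way tp can be read off from the previous round's sa can be sketched as below. The function name secondKeyOrder and its parameters are assumptions for illustration; w is the length already sorted (1 in the two-character round described above), and 1 <= w <= n is assumed.

```cpp
#include <vector>

// Order of the suffixes by second keyword, derived from the previous round's sa.
// Arrays are 1-indexed with index 0 unused; assumes 1 <= w <= n.
std::vector<int> secondKeyOrder(const std::vector<int>& sa, int n, int w) {
    std::vector<int> tp(n + 1, 0);
    int p = 0;
    // Suffixes n-w+1..n have no characters at offset w, so their second
    // keyword is empty and they come first.
    for (int i = n - w + 1; i <= n; i++) tp[++p] = i;
    // Suffix sa[i] is the second keyword of suffix sa[i] - w, so walking sa
    // in rank order lists the remaining suffixes by second keyword.
    for (int i = 1; i <= n; i++)
        if (sa[i] > w) tp[++p] = sa[i] - w;
    return tp;
}
```

For "ababa", sorting by first character alone gives sa[1..5] = 1 3 5 2 4, and with w = 1 the function returns tp[1..5] = 5 2 4 1 3: suffix 5 has no second character, suffixes 2 and 4 have 'a' as their second character, and suffixes 1 and 3 have 'b'.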

To quote the expert attack here, whose explanation I think hits the mark:

For a suffix of length w, you can picture it like this:

The first keyword is the string formed by its first w/2 characters, and the second keyword is the string formed by its last w/2 characters.

Then we sort by the first four characters of each suffix, then the first eight, the first 16, and so on. This is the doubling method for computing SA.

Here is the RadixSort code:

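void RadixSort(int a[],int b[]){// radix sort: a = first keyword (rank), b = positions ordered by second keyword
    for(int i=0;i<=m;i++)tax[i]=0;                 // clear the buckets
    for(int i=1;i<=n;i++)tax[a[i]]++;              // count each first-keyword value
    for(int i=1;i<=m;i++)tax[i]+=tax[i-1];         // prefix sums: tax[k] = last slot of bucket k
    for(int i=n;i>=1;i--)sa[tax[a[b[i]]]--]=b[i];  // walk b backwards so ties keep second-keyword order
}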
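To see what one pass of this sort does, here is a self-contained sketch of the same counting sort with the globals turned into parameters. The name radixPass and its signature are assumptions for illustration; the three loops mirror the routine above.

```cpp
#include <vector>

// One stable counting-sort pass: order by rnk[], ties broken by the order in tp[].
// rnk[1..n] holds values in [0, m]; tp[1..n] lists positions by second keyword.
// Returns sa[1..n]; index 0 is unused.
std::vector<int> radixPass(const std::vector<int>& rnk,
                           const std::vector<int>& tp, int n, int m) {
    std::vector<int> tax(m + 1, 0), sa(n + 1, 0);
    for (int i = 1; i <= n; i++) tax[rnk[i]]++;         // bucket sizes
    for (int i = 1; i <= m; i++) tax[i] += tax[i - 1];  // tax[k] = last slot of bucket k
    // Fill each bucket from the back while walking tp from the back, so that
    // inside a bucket the positions end up in tp order.
    for (int i = n; i >= 1; i--) sa[tax[rnk[tp[i]]]--] = tp[i];
    return sa;
}
```

Take "ababa" with first-character ranks rnk[1..5] = 1 2 1 2 1 (a = 1, b = 2). With tp[1..5] = 1 2 3 4 5 the pass returns 1 3 5 2 4, which is the order by first character. With tp[1..5] = 5 2 4 1 3 (the order by second character) it returns 5 1 3 2 4, which is the order by the first two characters.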

If you really can't follow RadixSort, don't worry; the code is very short.

Next, the code for computing SA:

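bool cmp(int *r,int a,int b,int k){
    // true if suffixes a and b had equal first and second keywords in the previous round
    // (r[a+k] can index past n, so the arrays should have room for about 2n entries)
    return r[a]==r[b]&&r[a+k]==r[b+k];
}
void getSA(int a[],int b[]){// a plays the role of rnk, b the role of tp
    for(int i=1;i<=n;i++)
        m=max(m,a[i]=s[i]-'0'),b[i]=i;// initial rank = character code, second keyword = position
    RadixSort(a,b);
    for(int p=0,j=1;p<n;j<<=1,m=p){// j = length already sorted; stop once all n ranks are distinct
        p=0;
        for(int i=1;i<=j;i++)b[++p]=n-j+i;// the last j suffixes have an empty second keyword, so they come first
        for(int i=1;i<=n;i++)if(sa[i]>j)b[++p]=sa[i]-j;// suffix sa[i] is the second keyword of suffix sa[i]-j
        RadixSort(a,b);
        int *t=a;a=b;b=t;// swap: b now holds the old ranks, a receives the new ones
        a[sa[1]]=p=1;
        for(int i=2;i<=n;i++)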
            a[sa[i]]=cmp(b,sa[i],sa[i-1],j)?p:++p;
    }
}
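For trying this out without setting up the globals, here is a self-contained sketch of the same doubling plus radix sort idea. The name buildSA, the use of std::vector, and the fixed alphabet bound of 256 are assumptions for illustration, not part of the routine above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Doubling + radix sort, packaged as one function.
// Returns sa[1..n]; index 0 is unused.
std::vector<int> buildSA(const std::string& s) {
    int n = (int)s.size();
    if (n == 0) return std::vector<int>(1, 0);
    int m = 256;  // assumed alphabet bound: byte values shifted to 1..256
    // Size 2n+1 so that reads at index i + w past n see 0 (an empty keyword).
    std::vector<int> sa(n + 1, 0), rnk(2 * n + 1, 0), tp(2 * n + 1, 0), tax;
    auto radixSort = [&]() {
        tax.assign(m + 1, 0);
        for (int i = 1; i <= n; i++) tax[rnk[i]]++;
        for (int i = 1; i <= m; i++) tax[i] += tax[i - 1];
        for (int i = n; i >= 1; i--) sa[tax[rnk[tp[i]]]--] = tp[i];
    };
    for (int i = 1; i <= n; i++) {
        rnk[i] = (unsigned char)s[i - 1] + 1;
        tp[i] = i;
    }
    radixSort();  // order by first character
    for (int w = 1, p = 0; p < n; w <<= 1, m = p) {
        p = 0;
        for (int i = n - w + 1; i <= n; i++) tp[++p] = i;  // empty second keyword
        for (int i = 1; i <= n; i++)
            if (sa[i] > w) tp[++p] = sa[i] - w;
        radixSort();
        std::swap(rnk, tp);  // tp now holds the previous ranks
        rnk[sa[1]] = p = 1;
        for (int i = 2; i <= n; i++) {
            bool same = tp[sa[i]] == tp[sa[i - 1]] &&
                        tp[sa[i] + w] == tp[sa[i - 1] + w];
            rnk[sa[i]] = same ? p : ++p;  // equal keys share a rank
        }
    }
    return sa;
}
```

Calling buildSA("ababa") should return sa[1..5] = 5 3 1 4 2, matching the suffix order a, aba, ababa, ba, baba.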

As for explaining the code, I'll fill in that pit when I have time. This konjac (beginner) still has a lot to learn about the algorithm... a rough understanding of SA will do for now.

Origin www.cnblogs.com/light-house/p/11784966.html