【CodeForces】528D Fuzzy Search

Problem description

Leonid works for a small and promising start-up that works on decoding the human genome. His duties include solving complex problems of finding certain patterns in long strings consisting of letters ‘A’, ‘T’, ‘G’ and ‘C’.
Let’s consider the following scenario. There is a fragment of a human DNA chain, recorded as a string S. To analyze the fragment, you need to find all occurrences of string T in a string S. However, the matter is complicated by the fact that the original chain fragment could contain minor mutations, which, however, complicate the task of finding a fragment. Leonid proposed the following approach to solve this problem.
Let’s write down an integer k ≥ 0, the error threshold. We will say that string T occurs in string S at position i (1 ≤ i ≤ |S| - |T| + 1) if, after placing string T at this position, each character of string T corresponds to some character of the same value in string S at a distance of at most k. More formally, for any j (1 ≤ j ≤ |T|) there must exist a p (1 ≤ p ≤ |S|) such that |(i + j - 1) - p| ≤ k and S[p] = T[j].
For example, corresponding to the given definition, string “ACAT” occurs in string “AGCAATTCAT” in positions 2, 3 and 6.
Note that at k=0 the given definition transforms to a simple definition of the occurrence of a string in a string.
Help Leonid by calculating in how many positions the given string T occurs in the given string S with the given error threshold.

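Problem summary

Given two strings S and T, count the start positions in S at which T "matches". Position i of S "matches" T[j] if there is an x with i-k ≤ x ≤ i+k (and 1 ≤ x ≤ |S|) such that S[x]==T[j]. T matches at a start position if every T[j] matches the position of S it is aligned with.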
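To make the definition concrete, here is a minimal brute-force sketch that checks it literally. The function name countFuzzyBrute is hypothetical and not part of the original solution, and with three nested loops it is far too slow for the real limits. It is only meant as a reference to compare a faster solution against on small inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Brute-force reference (hypothetical helper, not the author's code).
// Counts start positions i (0-indexed internally) such that every T[j]
// has an equal character of S within distance k of position i + j.
int countFuzzyBrute(const std::string& S, const std::string& T, int k) {
    assert(T.size() <= S.size());
    const int n = (int)S.size(), m = (int)T.size();
    int count = 0;
    for (int i = 0; i + m <= n; ++i) {
        bool ok = true;
        for (int j = 0; j < m && ok; ++j) {
            // Window of S positions allowed for T[j], clamped to the string.
            const int lo = std::max(0, i + j - k);
            const int hi = std::min(n - 1, i + j + k);
            bool found = false;
            for (int p = lo; p <= hi && !found; ++p) found = (S[p] == T[j]);
            ok = found;
        }
        if (ok) ++count;
    }
    return count;
}
```

Calling countFuzzyBrute("AGCAATTCAT", "ACAT", 1) reproduces the sample answer of 3.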

Input format

The first line contains three integers |S|, |T|, k (1 ≤ |T| ≤ |S| ≤ 200 000, 0 ≤ k ≤ 200 000): the lengths of strings S and T and the error threshold.
The second line contains string S.
The third line contains string T.
Both strings consist only of uppercase letters ‘A’, ‘T’, ‘G’ and ‘C’.

Output format

Print a single number — the number of occurrences of T in S with the error threshold k by the given definition.

Sample input

10 4 1
AGCAATTCAT
ACAT

Sample output

3


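Solution

Since the alphabet has only the four letters A, C, G and T, we can handle each letter separately. For each letter, first mark every position of S that has that letter within distance k: convolve the 0/1 indicator array of the letter in S with a window of 2k+1 ones and clamp the result to 0/1. Then convolve this marked array with the indicator array of the same letter in the reversed T. The result counts, for each start position, how many characters of T equal to that letter are matched. A start position is an occurrence exactly when the counts for the four letters add up to |T|. Both convolutions are done with FFT.
It seems the four letters could also be handled together, but the code below treats them separately.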
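The per-letter idea can be sketched without any FFT. The version below is an illustration under stated assumptions, not the author's method: it swaps the first FFT for a difference array and the second FFT for a direct double loop, so it runs in O(|S|·|T|) and is only suitable for small inputs. The names smear and countFuzzyPerLetter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// near[p] = 1 if letter c occurs in S within distance k of position p (0-indexed).
// Uses a difference array instead of the FFT-based window sum.
std::vector<int> smear(const std::string& S, char c, int k) {
    const int n = (int)S.size();
    std::vector<int> diff(n + 1, 0);
    for (int p = 0; p < n; ++p) {
        if (S[p] != c) continue;
        diff[std::max(0, p - k)] += 1;
        diff[std::min(n, p + k + 1)] -= 1;  // k may exceed n, hence the clamp
    }
    std::vector<int> near(n, 0);
    int cover = 0;
    for (int p = 0; p < n; ++p) {
        cover += diff[p];
        near[p] = cover > 0 ? 1 : 0;
    }
    return near;
}

// Counts start positions where the per-letter match counts sum to |T|.
int countFuzzyPerLetter(const std::string& S, const std::string& T, int k) {
    assert(T.size() <= S.size());
    const int n = (int)S.size(), m = (int)T.size();
    std::vector<int> matched(n - m + 1, 0);
    for (char c : std::string("ATGC")) {
        const std::vector<int> near = smear(S, c, k);
        // Direct correlation; the FFT solution computes the same sums faster.
        for (int i = 0; i + m <= n; ++i)
            for (int j = 0; j < m; ++j)
                if (T[j] == c && near[i + j]) ++matched[i];
    }
    int count = 0;
    for (int i = 0; i + m <= n; ++i)
        if (matched[i] == m) ++count;
    return count;
}
```

On the sample, countFuzzyPerLetter("AGCAATTCAT", "ACAT", 1) gives 3, with the matching 0-indexed starts 1, 2 and 5 (positions 2, 3 and 6 in the statement's numbering).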


Code

#include <cstdio>
#include <cmath>
#include <complex>
#include <algorithm>
#include <iostream>
#define db double
#define cd complex<db>
using namespace std;
const int Q=1048576;
const double pi=3.1415926535897932;
cd w[Q],ti[Q];
int n;
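// In-place iterative radix-2 FFT over the first n entries of a.
// flag=1: forward transform; flag=-1: inverse transform (scaled by 1/n).
void fly(cd a[],int flag)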
{
    int i,j,l,now;
    for(i=j=0;i<n;i++){
        if(i<j)swap(a[i],a[j]);
        for(l=(n>>1);(j^=l)<l;l>>=1);
    }
    w[0]=1;
    for(now=1;now<n;now<<=1)
    {
        cd ha=exp(cd(0,pi*(db)flag/(db)now));
        for(i=1;i<now;i++)w[i]=w[i-1]*ha;
        for(j=0;j<n;j+=(now<<1))
            for(l=0;l<now;l++)
            {
                cd p=a[j+l],q=a[now+j+l]*w[l];
                a[j+l]=p+q,a[j+l+now]=p-q;
            }
    }
    if(flag==1)return;
    cd temp=1.0/(db)n;
    for(i=0;i<n;i++)a[i]*=temp;
}
// Convolution via FFT: a becomes the length-n cyclic convolution of a and b; b is left unchanged.
void FFT(cd a[],cd b[])
{
    for(int i=0;i<n;i++)ti[i]=b[i];
    fly(a,1),fly(ti,1);
    for(int i=0;i<n;i++)a[i]*=ti[i];
    fly(a,-1);
}
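// Round the real part of x to the nearest integer.
int gg(cd x)
{return (int)floor(x.real()+0.5);}
// a[t], b[t]: 0/1 indicator arrays of letter t (1=A, 2=T, 3=G, 4=C) in S and in reversed T.
// ano: window of 2k+1 ones.
cd a[5][Q],b[5][Q],ano[Q];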
int main()
{
    char o;
    int ans=0,i,c,d,k,j;
    scanf("%d%d%d",&c,&d,&k);  // c=|S|, d=|T|
    // Smallest power of two that exceeds the length of both convolutions.
    for(n=1;n<=max(c+2*k,c+d);n<<=1);
    for(i=0;i<n;i++)
        for(j=1;j<=4;j++)
            a[j][i]=b[j][i]=0;
    for(i=1;i<=c;i++)
        while(true){
            o=getchar();
            if(o=='A'){
                a[1][i]=1;
                break;
            }
            if(o=='T'){
                a[2][i]=1;
                break;
            }
            if(o=='G'){
                a[3][i]=1;
                break;
            }
            if(o=='C'){
                a[4][i]=1;
                break;
            }
        }
    for(i=1;i<=d;i++)
        while(true){
            o=getchar();
            if(o=='A'){
                b[1][d-i+1]=1;
                break;
            }
            if(o=='T'){
                b[2][d-i+1]=1;
                break;
            }
            if(o=='G'){
                b[3][d-i+1]=1;
                break;
            }
            if(o=='C'){
                b[4][d-i+1]=1;
                break;
            }
        }
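    // Window of 2k+1 ones: convolving with it sums the indicator over [j-k, j+k].
    for(i=0;i<=2*k;i++)ano[i]=1;
    for(i=2*k+1;i<n;i++)ano[i]=0;
    for(i=1;i<=4;i++){
        FFT(a[i],ano);
        // Entry j+k holds the number of occurrences of letter i in S[j-k..j+k]; clamp it to 0/1.
        for(j=1;j<=c;j++)
            if(gg(a[i][j+k])>0)a[i][j]=1;
            else a[i][j]=0;
        for(j=c+1;j<n;j++)a[i][j]=0;
        // After this, entry s+d counts the characters of T equal to letter i that match when T starts at s.
        FFT(a[i],b[i]);
    }
    // T occurs at start position i when the four per-letter counts sum to |T|.
    for(i=1;i+d<=c+1;i++)
        if(gg(a[1][i+d])+gg(a[2][i+d])+gg(a[3][i+d])+gg(a[4][i+d])==d)ans++;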
    printf("%d",ans);
    return 0;
}
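
The index i+d in the final loop comes from storing T reversed: reversing the pattern turns the sliding dot product into an ordinary convolution, and the alignment for a given start position lands at one fixed index of the result. The sketch below shows the same trick with 0-indexed arrays and a plain O(n·m) convolution in place of the FFT. The names convolve and alignedHits are hypothetical; with 0-indexing the result for start s sits at index s+m-1 rather than s+d.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain O(n*m) convolution: out[p] = sum of x[u] * y[v] over all u + v == p.
std::vector<long long> convolve(const std::vector<int>& x, const std::vector<int>& y) {
    if (x.empty() || y.empty()) return {};
    std::vector<long long> out(x.size() + y.size() - 1, 0);
    for (std::size_t u = 0; u < x.size(); ++u)
        for (std::size_t v = 0; v < y.size(); ++v)
            out[u + v] += (long long)x[u] * y[v];
    return out;
}

// hits[s] = sum over j of text[s + j] * pat[j], for every 0-indexed start s
// at which pat fits inside text. Computed by convolving with the reversed pattern.
std::vector<long long> alignedHits(const std::vector<int>& text, const std::vector<int>& pat) {
    const std::vector<int> rev(pat.rbegin(), pat.rend());
    const std::vector<long long> conv = convolve(text, rev);
    const std::size_t m = pat.size();
    std::vector<long long> hits;
    for (std::size_t s = 0; m > 0 && s + m <= text.size(); ++s)
        hits.push_back(conv[s + m - 1]);  // rev[m-1-j] pairs with text[s+j]
    return hits;
}
```

For example, alignedHits({1,0,1,1}, {1,1}) yields 1, 1, 2: the pattern overlaps one marked position at starts 0 and 1, and two at start 2.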

