DNA sequence HDU - 1560 (IDA*)

DNA sequence

HDU - 1560

The twenty-first century is a century of developing biotechnology. We know that a gene is made of DNA. The nucleotide bases from which DNA is built are A (adenine), C (cytosine), G (guanine), and T (thymine). Finding the longest common subsequence between DNA/protein sequences is one of the basic problems in modern computational molecular biology. But this problem is a little different. Given several DNA sequences, you are asked to make a shortest sequence from them so that each of the given sequences is a subsequence of it.

For example, given "ACGT", "ATGC", "CGTT" and "CAGT", you can make a sequence in the following way. It is the shortest but may not be the only one.


Input

The first line is the number of test cases t. Then t test cases follow. In each case, the first line is an integer n (1 <= n <= 8), the number of DNA sequences. The following n lines contain the n sequences, one per line. The length of each sequence is between 1 and 5.

Output

For each test case, print a line containing the length of the shortest sequence that can be made from these sequences.

Sample Input
1
4
ACGT
ATGC
CGTT
CAGT
Sample Output
8
The task: given n DNA sequences, find the length of the shortest sequence that contains all n of them as subsequences.
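To make "contains as a subsequence" concrete, here is a minimal sketch of a two-pointer check. The function name isSubsequence is a hypothetical helper for illustration and is not part of the solution below.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not in the original solution).
// Returns true if every character of `s` appears in `t` in the same order,
// not necessarily contiguously.
bool isSubsequence(const std::string& s, const std::string& t) {
    std::size_t i = 0;  // index of the next character of s still to be found
    for (std::size_t k = 0; k < t.size() && i < s.size(); ++k) {
        if (t[k] == s[i]) ++i;  // found it, move on to the next character of s
    }
    return i == s.size();  // true only if all of s was found
}
```

A candidate answer is valid when this check succeeds for every input sequence; the search below only has to find the shortest such candidate.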

code:

#include <iostream>
#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;
int n,deep;
char c[10] = "ACGT";
struct node{
    char s[10];
    int len;
}a[10];
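int pos[10];// pos[i]: index of the next unmatched character in sequence i
// For example, for the string ACGT, pos[i]=2 means the first two characters AC are already matched, so the next match looks at the third character
// Equivalently, it is the length of sequence i that has already been matched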
int get_h(){
    int ans = 0;
    for(int i = 1; i <= n; i++){
        ans = max(ans,a[i].len-pos[i]);// heuristic estimate: the longest unmatched remainder among all sequences
    }
    return ans;
}
int dfs(int step){
    if(step + get_h() > deep)// prune: current length plus the estimate exceeds deep, so searching further is pointless
        return 0;
    if(!get_h())
        return 1;
    int temp[10];
    for(int i = 0; i < 4; i++){
        int flag = 0;
        for(int j = 1; j <= n; j++){
            temp[j] = pos[j];// save pos before modifying it
        }
        for(int j = 1; j <= n; j++){
            if(a[j].s[pos[j]] == c[i]){// sequence j's next character is c[i], so advance its pointer; the next match looks at the following character
                flag = 1;
                pos[j]++;
            }
        }
        if(flag){// at least one sequence matched c[i], so keep searching deeper, trying to finish matching the sequences
           if(dfs(step+1))
              return 1;
           for(int j = 1; j <= n; j++){// backtrack: restore pos
              pos[j] = temp[j];
           }

        }
    }
    return 0;
}
int main(){
    int t,maxn;
    cin >> t;
    while(t--){
        cin >> n;
        maxn = 0;
        for(int i = 1; i <= n; i++){
            cin >> a[i].s;
            a[i].len = strlen(a[i].s);
            maxn = max(maxn,a[i].len);
            pos[i] = 0;
        }
        deep = maxn;
        while(1){
            if(dfs(0)) break;
            deep++;
        }
        cout << deep << endl;
    }
    return 0;
}
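
The pruning in dfs() depends on get_h() never overestimating the remaining length. As a side note, the sketch below shows another lower bound with the same property: for each base, take the largest number of times it still occurs in any one unmatched suffix, then sum over the four bases. Any common supersequence needs at least that many copies of each base, and the sum is never smaller than the longest unmatched remainder. The names countHeuristic and lengthHeuristic, and the use of std::vector and 0-based indices, are assumptions for illustration and are not taken from the code above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Same estimate as get_h() in the post: the longest unmatched remainder.
int lengthHeuristic(const std::vector<std::string>& seqs, const std::vector<int>& pos) {
    int best = 0;
    for (std::size_t j = 0; j < seqs.size(); ++j) {
        best = std::max(best, static_cast<int>(seqs[j].size()) - pos[j]);
    }
    return best;
}

// Hypothetical alternative: per-base maximum count over all unmatched suffixes,
// summed over the four bases. It is also a lower bound on the remaining length.
int countHeuristic(const std::vector<std::string>& seqs, const std::vector<int>& pos) {
    const std::string bases = "ACGT";
    int total = 0;
    for (char b : bases) {
        int most = 0;  // largest remaining count of base b in any single sequence
        for (std::size_t j = 0; j < seqs.size(); ++j) {
            int cnt = static_cast<int>(
                std::count(seqs[j].begin() + pos[j], seqs[j].end(), b));
            most = std::max(most, cnt);
        }
        total += most;
    }
    return total;
}
```

On the sample with nothing matched yet, lengthHeuristic gives 4 while countHeuristic gives 5 (one A, one C, one G, and two T because of CGTT), so iterative deepening could start from a higher bound. Whether this makes a measurable difference at these input sizes is not something the post measures.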

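Simulating the sample gives the following.

The "|" marks the position that the pos array points to in each sequence; everything to its left is already matched.

Initial  1        2        3        4        5        6        7        8

|ACGT    A|CGT    AC|GT    AC|GT    ACG|T    ACGT|    ACGT|    ACGT|    ACGT|

|ATGC    A|TGC    A|TGC    A|TGC    A|TGC    AT|GC    ATG|C    ATGC|    ATGC|

|CGTT    |CGTT    C|GTT    C|GTT    CG|TT    CGT|T    CGT|T    CGT|T    CGTT|

|CAGT    |CAGT    C|AGT    CA|GT    CAG|T    CAGT|    CAGT|    CAGT|    CAGT|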

1:ans:A

2:ans:AC

3:ans:ACA

4:ans:ACAG

5:ans:ACAGT

6:ans:ACAGTG

7:ans:ACAGTGC

8:ans:ACAGTGCT

The length is 8.
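
The trace above can be replayed mechanically. The following is a minimal sketch that applies each character of a candidate answer to all sequences at once, mirroring the inner loop of dfs(). The function name replay and the 0-based std::vector layout are assumptions for illustration; the original code uses 1-based global arrays.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns pos[] after appending every character of
// `answer` in order. A sequence advances only when its next unmatched
// character equals the appended one, exactly as in dfs().
std::vector<int> replay(const std::vector<std::string>& seqs, const std::string& answer) {
    std::vector<int> pos(seqs.size(), 0);
    for (char ch : answer) {
        for (std::size_t j = 0; j < seqs.size(); ++j) {
            bool unfinished = pos[j] < static_cast<int>(seqs[j].size());
            if (unfinished && seqs[j][pos[j]] == ch) ++pos[j];
        }
    }
    return pos;
}
```

For the sample, replaying "ACAGT" leaves pos at 4, 2, 3, 4, and replaying the full "ACAGTGCT" leaves every entry at 4, which means all four sequences are fully matched.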


Reposted from blog.csdn.net/codeswarrior/article/details/80329633