2016 7th Blue Bridge Cup Java Programming Contest, Undergraduate Group B Finals: Base (Programming Problem)

Summary of my solutions to the 2016 7th Blue Bridge Cup Java Programming Contest, Undergraduate Group B Finals:

https://blog.csdn.net/daixinliangwyx/article/details/90169154


Question 5

Title: Base

Online judge: https://www.docpp.com/oj/problem1835.html

Biologists are studying n species.
The DNA sequence of the i-th species is s[i]; its j-th base is s[i][j], which is always one of A, T, G and C.
Biologists want to find what some of these organisms have in common, and they are now focusing on contiguous base sequences of length k that occur in at least m organisms. Precisely, a sequence of interest is represented by a 2m-tuple (i1,p1,i2,p2,...,im,pm) that satisfies:
1 <= i1 < i2 < ... < im <= n;
and for all q (0 <= q < k), s[i1][p1+q] = s[i2][p2+q] = ... = s[im][pm+q].

Given the DNA sequences of all organisms, tell the scientists how many 2m-tuples are of interest. Two 2m-tuples are considered different if they differ in any position.
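To make the definition concrete, here is a hypothetical brute-force counter (the class and method names are my own, not from the original post) that enumerates every 2m-tuple literally: it picks indices i1 < ... < im, then a start position in each chosen sequence, and checks that all m substrings of length k are equal. It is exponential, so it is only a sketch for checking tiny cases such as the samples below. It uses 0-based indices and positions (the statement numbers from 1) and skips the modulus because the counts stay small.

```java
class BruteForceByDefinition {
    // Hypothetical helper: counts 2m-tuples (i1,p1,...,im,pm) directly from the definition.
    // Sequence indices and positions are 0-based here; the statement numbers them from 1.
    static long count(String[] s, int m, int k) {
        return chooseSequences(s, m, k, new int[m], 0, 0);
    }

    // Choose sequence indices idx[0] < idx[1] < ... < idx[m-1].
    static long chooseSequences(String[] s, int m, int k, int[] idx, int depth, int from) {
        if (depth == m) {
            return choosePositions(s, k, idx, new int[m], 0);
        }
        long total = 0;
        for (int i = from; i < s.length; i++) {
            idx[depth] = i;
            total += chooseSequences(s, m, k, idx, depth + 1, i + 1);
        }
        return total;
    }

    // Choose a start position in every chosen sequence, then test the condition
    // s[i1][p1+q] = ... = s[im][pm+q] for all 0 <= q < k.
    static long choosePositions(String[] s, int k, int[] idx, int[] pos, int depth) {
        if (depth == idx.length) {
            String first = s[idx[0]].substring(pos[0], pos[0] + k);
            for (int j = 1; j < idx.length; j++) {
                if (!s[idx[j]].startsWith(first, pos[j])) {
                    return 0;
                }
            }
            return 1;
        }
        long total = 0;
        String t = s[idx[depth]];
        for (int p = 0; p + k <= t.length(); p++) {
            pos[depth] = p;
            total += choosePositions(s, k, idx, pos, depth + 1);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(count(new String[] {"ATC", "TCG", "ACG"}, 2, 2));        // prints 2 (sample 1)
        System.out.println(count(new String[] {"AAA", "AAAA", "AAA", "AAA"}, 3, 3)); // prints 7 (sample 2)
    }
}
```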

[Input format]
The first line of input contains three integers n, m and k, separated by single spaces; their meanings are as described above.
Each of the next n lines contains a string: the DNA sequence of one organism.
DNA sequences are numbered from 1 to n, and the bases in each sequence are numbered sequentially from 1. The length of DNA sequences of different organisms may vary.

[Output format]
Output one integer: the number of tuples of interest.
The answer may be large, so output it modulo 1000000007.

[Sample input]
3 2 2
ATC
TCG
ACG

[Sample output]
2

Another example:
[Sample input]
4 3 3
AAA
AAAA
AAA
AAA

[Sample output]
7


[Data scale and convention]
For 20% of the data, k <= 5 and the total length L of all strings satisfies L <= 100;
for 30% of the data, L <= 10000;
for 60% of the data, L <= 30000;
for 100% of the data, n <= 5, m <= 5, 1 <= k <= L <= 100000.
It is guaranteed that every DNA sequence is non-empty and contains only the four letters 'A', 'G', 'C' and 'T'.

Resource limits:
peak memory consumption < 256M
CPU consumption < 1000ms

Please output strictly as required, and do not print superfluous content such as "Please enter...".

All code must be placed in a single source file. After debugging, copy and submit the source code.
Note: Do not use the package statement. Do not use features of JDK 1.7 or above.
Note: The name of the main class must be: Main, otherwise it will be processed as invalid code.


Solution: It is important to read the problem carefully. We need to count combinations of m DNA sequences in which every sequence contains the same contiguous substring of length k. If that substring occurs at several different positions in one sequence, each position gives a separate tuple instead of being counted only once (see the explanation of Example 2 below).
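Written as a formula, the counting rule above reads as follows, where occ(t, w) is a name introduced here for the number of (possibly overlapping) occurrences of the length-k string w in t. The second line is the equivalent form the code below evaluates: instead of looping over distinct strings w, it loops over every start position p of the first chosen sequence, and each w is visited exactly occ(s[i1], w) times that way.

```latex
\text{answer} \equiv \sum_{1 \le i_1 < \cdots < i_m \le n} \;
  \sum_{w \in \{\mathtt{A},\mathtt{C},\mathtt{G},\mathtt{T}\}^k} \;
  \prod_{j=1}^{m} \operatorname{occ}(s[i_j], w) \pmod{1000000007}

\sum_{w} \prod_{j=1}^{m} \operatorname{occ}(s[i_j], w)
  = \sum_{p} \prod_{j=2}^{m} \operatorname{occ}\bigl(s[i_j],\; s[i_1][p \,..\, p+k-1]\bigr)
```

Here p ranges over the start positions of the length-k substrings of s[i1].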

For example, in Example 1 the two tuples use the sequence pairs (1,2) and (2,3): "TC" occurs in strings 1 and 2, and "CG" occurs in strings 2 and 3.

In Example 2, the 7 tuples use the sequence triples (1,2,3), (1,2,4), (1,3,4), (1,2,3), (1,2,4), (2,3,4), (2,3,4). Some triples appear twice because the length-k substring "AAA" occurs at two different positions in string 2 ("AAAA"). The seven tuples are listed below; [a,b) denotes the 0-based index range of the substring:

(1,2,3): "AAA" in string 1, "AAA" at [0,3) in string 2, and "AAA" in string 3;

(1,2,3): "AAA" in string 1, "AAA" at [1,4) in string 2, and "AAA" in string 3;

(1,2,4): "AAA" in string 1, "AAA" at [0,3) in string 2, and "AAA" in string 4;

(1,2,4): "AAA" in string 1, "AAA" at [1,4) in string 2, and "AAA" in string 4;

(2,3,4): "AAA" at [0,3) in string 2, "AAA" in string 3, and "AAA" in string 4;

(2,3,4): "AAA" at [1,4) in string 2, "AAA" in string 3, and "AAA" in string 4;

(1,3,4): "AAA" in string 1, "AAA" in string 3, and "AAA" in string 4.
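The same total follows from counting occurrences instead of listing tuples: "AAA" occurs once in strings 1, 3 and 4 and twice (overlapping) in string 2, and each triple contributes the product of its three occurrence counts:

```latex
\underbrace{1 \cdot 2 \cdot 1}_{(1,2,3)}
+ \underbrace{1 \cdot 2 \cdot 1}_{(1,2,4)}
+ \underbrace{1 \cdot 1 \cdot 1}_{(1,3,4)}
+ \underbrace{2 \cdot 1 \cdot 1}_{(2,3,4)}
= 2 + 2 + 1 + 2 = 7
```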

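Once the problem is understood, the approach is simple: n and m are both at most 5, so a brute-force search is enough. First enumerate every choice of m DNA sequences (indices i1 < i2 < ... < im). Then take the first sequence of the choice, enumerate each of its length-k substrings, and count how many times (overlaps included) that substring occurs in each of the remaining m-1 sequences. The product of those counts is the number of tuples that start at that position of the first sequence, and summing the products over all positions and all choices gives the answer.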

Code:

import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.*;

public class Main {
    public static InputReader in = new InputReader(new BufferedInputStream(System.in));
    public static PrintWriter out = new PrintWriter(System.out);
    public static int n, m, k;
    public static long ans, tmp, mod = 1000000007;
    public static String str;
    public static String[] s = new String[10];
    public static int[] a = new int[10];

    public static void main(String[] args) {
        n = in.nextInt();
        m = in.nextInt();
        k = in.nextInt();
        for (int i = 1; i <= n; i++)
            s[i] = in.next(); // read one token, so stray spaces or blank lines do not break input
        ans = 0;
        dfs(1, 1);
        out.println(ans%mod);
        out.flush();
        out.close();
    }

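    // Choose sequence indices a[1] < a[2] < ... < a[m]: kk is the slot being filled and
    // p is the smallest index still allowed. Once all m slots are filled, take every
    // length-k substring of s[a[1]] and multiply its occurrence counts in the others.
    static void dfs(int kk, int p) {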
        if (kk > m) {
            int len = s[a[1]].length();
            for (int i = 0; i < len-k+1; i++) {
                str = s[a[1]].substring(i, i+k);
                tmp = 1;
                for (int j = 2; j <= m; j++) {
                    tmp = (tmp * getStrCount(s[a[j]], str)) % mod;
                    if (tmp == 0) break; // one chosen sequence does not contain str, so the remaining ones need not be checked
                }
                ans = (ans + tmp) % mod;
            }
            return;
        }
        for (int i = p; i <= n; i++) {
            a[kk] = i;
            dfs(kk+1, i+1);
        }
    }

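    // Count the (possibly overlapping) occurrences of s2 in s1, e.g. "AAA" occurs twice in "AAAA".
    // indexOf(s2, from) resumes the search one character past the previous match,
    // so s1 is never copied.
    static long getStrCount(String s1, String s2) {
        long sum = 0;
        int index = s1.indexOf(s2);
        while (index != -1) {
            sum++;
            index = s1.indexOf(s2, index + 1);
        }
        return sum;
    }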

    static class InputReader {
        public BufferedReader reader;
        public StringTokenizer tokenizer;

        public InputReader(InputStream stream) {
            reader = new BufferedReader(new InputStreamReader(stream), 32768);
            tokenizer = null;
        }

        public String next() {
            while (tokenizer == null || !tokenizer.hasMoreTokens()) {
                try {
                    tokenizer = new StringTokenizer(reader.readLine());
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
            return tokenizer.nextToken();
        }

        public String nextLine() {
            String str = null;
            try {
                str = reader.readLine();
            } catch (IOException e) {
                e.printStackTrace();
            }
            return str;
        }

        public int nextInt() {
            return Integer.parseInt(next());
        }

        public long nextLong() {
            return Long.parseLong(next());
        }

        public Double nextDouble() {
            return Double.parseDouble(next());
        }

        public BigInteger nextBigInteger() {
            return new BigInteger(next());
        }

        public BigDecimal nextBigDecimal() {
            return new BigDecimal(next());
        }

    }
}
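The brute force above is correct, but every start position of the first chosen sequence triggers a full scan of the other sequences, so its worst case grows roughly with the square of L (for example, long runs of 'A' with a small k) and it may exceed the 1000 ms limit on the largest tests. One possible alternative, which is not the author's method, precomputes for every sequence how often each length-k window occurs, keyed by a double polynomial rolling hash, and then multiplies counts window by window for each choice of m sequences. The class and method names, the hash moduli 1000000007 and 998244353, and the base 131 are my own assumptions; a hash collision could in principle produce a wrong count, though with two moduli this is very unlikely. The sketch avoids diamond syntax and other JDK 1.7+ features to match the contest rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BaseHashSketch {
    static final long MOD = 1000000007L; // answer modulus from the statement
    static final long H1 = 1000000007L;  // first hash modulus (assumed choice)
    static final long H2 = 998244353L;   // second hash modulus (assumed choice)
    static final long B = 131L;          // hash base (assumed choice)

    // Count how often each length-k window of t occurs, keyed by a double rolling hash.
    static Map<Long, Integer> windowCounts(String t, int k) {
        Map<Long, Integer> counts = new HashMap<Long, Integer>();
        long pow1 = 1, pow2 = 1; // B^k modulo H1 and H2
        for (int i = 0; i < k; i++) {
            pow1 = pow1 * B % H1;
            pow2 = pow2 * B % H2;
        }
        long h1 = 0, h2 = 0;
        for (int i = 0; i < t.length(); i++) {
            h1 = (h1 * B + t.charAt(i)) % H1; // append character i
            h2 = (h2 * B + t.charAt(i)) % H2;
            if (i >= k) { // remove character i-k so the window keeps length k
                h1 = ((h1 - t.charAt(i - k) * pow1) % H1 + H1) % H1;
                h2 = ((h2 - t.charAt(i - k) * pow2) % H2 + H2) % H2;
            }
            if (i >= k - 1) { // window t[i-k+1..i] is complete
                long key = h1 * H2 + h2; // packs the pair (h1, h2) into one long
                Integer c = counts.get(key);
                counts.put(key, c == null ? 1 : c + 1);
            }
        }
        return counts;
    }

    // Sum over all choices i1 < ... < im of: sum over windows w of prod_j occ(s[i_j], w).
    static long solve(String[] s, int m, int k) {
        List<Map<Long, Integer>> maps = new ArrayList<Map<Long, Integer>>();
        for (String t : s) {
            maps.add(windowCounts(t, k));
        }
        return choose(maps, m, new int[m], 0, 0);
    }

    static long choose(List<Map<Long, Integer>> maps, int m, int[] idx, int depth, int from) {
        if (depth == m) {
            long total = 0;
            for (Map.Entry<Long, Integer> e : maps.get(idx[0]).entrySet()) {
                long prod = e.getValue() % MOD; // occurrences in the first chosen sequence
                for (int j = 1; j < m && prod != 0; j++) {
                    Integer c = maps.get(idx[j]).get(e.getKey());
                    prod = (c == null) ? 0 : prod * c % MOD;
                }
                total = (total + prod) % MOD;
            }
            return total;
        }
        long total = 0;
        for (int i = from; i < maps.size(); i++) {
            idx[depth] = i;
            total = (total + choose(maps, m, idx, depth + 1, i + 1)) % MOD;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(solve(new String[] {"ATC", "TCG", "ACG"}, 2, 2));        // prints 2 (sample 1)
        System.out.println(solve(new String[] {"AAA", "AAAA", "AAA", "AAA"}, 3, 3)); // prints 7 (sample 2)
    }
}
```

To submit it, the class would still need to be renamed Main and the hard-coded samples replaced by input reading, for instance with the InputReader from the code above. Building all window counts takes time linear in L, and there are at most C(5,2) = 10 choices of sequences, each costing one hash-map lookup per window of the first chosen sequence.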
