2018 ICPC Beijing Regional: Approximate Matching (AC automaton + DP)

Copyright notice: this is an original article by the blogger and may not be reposted without the blogger's permission. https://blog.csdn.net/Cymbals/article/details/84143547

String matching, a common problem in DNA sequence analysis and text editing, is to find the occurrences of one certain string (called pattern) in a larger string (called text). In some cases, the pattern is not required to be exactly in the text, and minor differences are acceptable (due to possible typing mistakes). When given a pattern string and a text string, we say pattern P is approximately matched within text S, if there is a substring of S which is at most one letter different from P. Note that the length of this substring and the pattern must be identical. For example, pattern “abb” is approximately matched in text “babc” but not matched in “bbac”.

It is easy to check if a pattern is approximately matched in a text. So your task is to count the number of all text strings of length m in which the given pattern can be approximately matched, and both of the patterns and texts are binary strings in order not to handle big integers.

Input
The first line of input is a single integer T (1 ≤ T ≤ 666), the number of test cases. Each test case begins with a line of two integers n,m (1 ≤ n,m ≤ 40), denoting the length of pattern string and text string. Then a single line of binary string P follows, which denotes the pattern. Note that there will be at most 15 test cases in which n ≥ 16.

Output
For each test case, output a single line with one integer, representing the answer.

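The problem defines a notion of "similar": two strings are similar if they differ in at most one position. Given a pattern of length n, count the strings of length m that contain some length-n substring similar to the pattern. Both the pattern and the texts are binary (0/1) strings.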
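To make the definition concrete, here is a minimal brute-force sketch. The names `approxMatched` and `bruteCount` are hypothetical helpers introduced only for illustration; they are not part of the solution further down, and `bruteCount` is only feasible for small m because it enumerates all 2^m texts.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if some length-|P| substring of S
// differs from P in at most one position.
bool approxMatched(const std::string& P, const std::string& S) {
    if (S.size() < P.size()) return false;
    for (std::size_t start = 0; start + P.size() <= S.size(); start++) {
        int diff = 0;
        for (std::size_t k = 0; k < P.size() && diff <= 1; k++) {
            if (S[start + k] != P[k]) diff++;
        }
        if (diff <= 1) return true;
    }
    return false;
}

// Hypothetical brute force: try every binary text of length m.
// Exponential in m, so use it only to cross-check small cases.
long long bruteCount(const std::string& P, int m) {
    long long cnt = 0;
    for (long long mask = 0; mask < (1LL << m); mask++) {
        std::string S(m, '0');
        for (int b = 0; b < m; b++) {
            if ((mask >> b) & 1) S[b] = '1';
        }
        if (approxMatched(P, S)) cnt++;
    }
    return cnt;
}
```

A checker like this is handy for comparing against the automaton solution on small n and m before submitting.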

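This is said to be a standard template-style problem, but it was the first time I had seen one. It changed how I think about the AC automaton. The AC automaton is awesome!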

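Because the strings are binary and very short (at most 40 characters), we can simply enumerate every string that differs from the given pattern in exactly one position, insert these strings (together with the pattern itself) into an AC automaton, and then run a DP on top of it.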
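The enumeration step can be sketched on its own as follows. `variants` is a hypothetical helper name; the solution further down does the same thing in place by XOR-ing one character of `s` at a time instead of building a list.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the pattern itself plus every string obtained
// by flipping exactly one bit, n + 1 strings in total.
std::vector<std::string> variants(const std::string& P) {
    std::vector<std::string> out;
    out.push_back(P);                      // zero differences
    for (std::size_t i = 0; i < P.size(); i++) {
        std::string w = P;
        w[i] = (w[i] == '0') ? '1' : '0';  // flip position i only
        out.push_back(w);
    }
    return out;
}
```

For n = 40 this yields only 41 strings of length 40, which is why the resulting trie stays small.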

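This works because of a property of the trie: if two inserted strings differ in any position, they must end at different nodes, so whenever the DP reaches a terminal state it can add 1 to the result.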

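The AC automaton is the tool of choice because, once it has been turned into a trie graph, moving forward (following next) and falling back (jumping along fail) are unified into a single transition, which makes the DP very easy to write. A generalized suffix automaton (SAM) could probably support the same DP, but it would be harder to write.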

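Then comes a neat trick: link the terminal state of every string in the AC automaton to a slot of the dp array that is otherwise unused. During the DP, everything that reaches a terminal state is collected there. In the code below that slot is node 0, whose two edges both point back to itself, so a text that has matched once stays counted.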

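Finally, the DP is dp[i][j], where i is the length of the text built so far and j is a state of the AC automaton. Start with dp[0][root] = 1 and push this 1 forward; by the end, the edges out of the terminal states will have carried it into the answer-collecting node.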
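The DP itself does not care how the transition table was built. As a minimal sketch, assuming a table `nxt` in which state 0 is the absorbing collector, the push-forward step looks like this. `countAbsorbed` and `exact11` are hypothetical names, and the hand-built table recognises the exact pattern "11" rather than an approximate match, purely to keep the example small.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of binary strings of length m whose walk
// from `root` ends in state 0. Assumes state 0 loops to itself.
long long countAbsorbed(const std::vector<std::array<int, 2>>& nxt,
                        int root, int m) {
    std::vector<long long> cur(nxt.size(), 0), nx;
    cur[root] = 1;                       // dp[0][root] = 1
    for (int i = 0; i < m; i++) {
        nx.assign(nxt.size(), 0);
        for (std::size_t j = 0; j < nxt.size(); j++) {
            nx[nxt[j][0]] += cur[j];     // append '0'
            nx[nxt[j][1]] += cur[j];     // append '1'
        }
        cur.swap(nx);
    }
    return cur[0];                       // dp[m][0]
}

// Hand-built table for the exact pattern "11":
// state 0 = collector, state 1 = root, state 2 = "last character was 1".
std::vector<std::array<int, 2>> exact11() {
    return { {0, 0}, {1, 2}, {1, 0} };
}
```

With this table, `countAbsorbed(exact11(), 1, 3)` counts the length-3 texts 011, 110 and 111. The sketch keeps only two rows of the table at a time; the solution below stores the full dp[41][2005] array instead, which is equally fine at this size.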

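One caveat: these extra links can create cycles among the automaton's next edges (node 0 points to itself), so the BFS in getFail would loop forever if it kept pushing node 0. A check is needed to skip it.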

AC automaton: the automaton that still runs without fail.jpg

#include<bits/stdc++.h>
using namespace std;
typedef long long ll;

const int maxn = 50005;
int t, n, m;
char s[maxn];

struct AC_Automaton {
	int next[maxn][2];
	int fail[maxn];
	ll dp[41][2005];
	int sz, root;

	int newNode() {
		for(int i = 0; i < 2; i++) {
			next[sz][i] = -1;
		}
		fail[sz] = -1;
		return sz++;
	}

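	void init() {
		sz = 1;                      // node 0 is reserved as the answer collector
		memset(dp, 0, sizeof(dp));
		root = newNode();            // root becomes node 1
		next[0][0] = next[0][1] = 0; // node 0 is absorbing: both edges loop back
	}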

	void add() {
		int p = root, c;
		for(int i = 0, len = strlen(s); i < len; i++) {
			c = s[i] - '0';
			if(i == len - 1) {
				// the last character completes a pattern: link straight to node 0
				next[p][c] = 0;
				return;
			}
			if(next[p][c] == -1) {
				next[p][c] = newNode();
			}
			p = next[p][c];
		}
	}

	void getFail() {
		queue<int> q;
		fail[root] = root;
		for(int i = 0; i < 2; i++) {
			if(~next[root][i]) {
				fail[next[root][i]] = root;
				q.push(next[root][i]);
			} else {
				next[root][i] = root;
			}
		}
		while(!q.empty()) {
			int p = q.front();
			q.pop();
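			for(int i = 0; i < 2; i++) {
				if(~next[p][i]) {
					fail[next[p][i]] = next[fail[p]][i];
					// never enqueue node 0: its self-loops would keep the BFS running forever
					if(next[p][i]) {
						q.push(next[p][i]);
					}
				} else {
					// missing edge: borrow the transition of the fail node (trie graph)
					next[p][i] = next[fail[p]][i];
				}
			}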
		}
	}

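	void build() {
		init();
		add();                       // the pattern itself (zero differences)
		char xorer = '0' ^ '1';      // XOR mask that swaps '0' and '1'
		for(int i = 0; s[i]; i++) {
			s[i] ^= xorer;           // flip position i
			add();
			s[i] ^= xorer;           // restore it
		}
		getFail();
	}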

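	void solve() {
		build();
		dp[0][root] = 1;
		for(int i = 0; i < m; i++) {
			for(int j = 0; j < sz; j++) {
				// extend every text of length i ending in state j by '0' and by '1'
				dp[i + 1][next[j][0]] += dp[i][j];
				dp[i + 1][next[j][1]] += dp[i][j];
			}
		}
		// node 0 has collected every text that matched at some point
		printf("%lld\n", dp[m][0]);
	}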

} ac;

int main() {
	scanf("%d", &t);
	while(t--) {
		scanf("%d%d%s", &n, &m, s);
		ac.solve();
	}
	return 0;
}
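
For testing, it can help to have the same idea packaged as a function that returns the count instead of printing it. The following is a sketch under that assumption, not the code above: `countTexts` is a hypothetical name, it uses vectors instead of the fixed-size arrays, and it keeps only two DP rows.

```cpp
#include <array>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical function-style variant of the solution above.
long long countTexts(const std::string& P, int m) {
    const int n = (int)P.size();
    std::vector<std::array<int, 2>> nxt;  // nxt[state][bit], -1 = no edge yet
    nxt.push_back({0, 0});                // state 0: absorbing answer collector
    nxt.push_back({-1, -1});              // state 1: root
    const int root = 1;

    auto add = [&](const std::string& w) {
        int p = root;
        for (int i = 0; i < n; i++) {
            int c = w[i] - '0';
            if (i == n - 1) { nxt[p][c] = 0; return; }  // pattern complete
            if (nxt[p][c] == -1) {
                nxt.push_back({-1, -1});
                nxt[p][c] = (int)nxt.size() - 1;
            }
            p = nxt[p][c];
        }
    };

    std::string w = P;
    add(w);                                // the pattern itself
    for (int i = 0; i < n; i++) {          // every one-bit variant
        w[i] = (w[i] == '0') ? '1' : '0';
        add(w);
        w[i] = P[i];
    }

    // BFS that fills fail links and missing edges; state 0 is never enqueued.
    std::vector<int> fail(nxt.size(), root);
    std::queue<int> q;
    for (int c = 0; c < 2; c++) {
        if (nxt[root][c] == -1) nxt[root][c] = root;
        else if (nxt[root][c] != 0) q.push(nxt[root][c]);
    }
    while (!q.empty()) {
        int p = q.front(); q.pop();
        for (int c = 0; c < 2; c++) {
            int v = nxt[p][c];
            if (v == -1) nxt[p][c] = nxt[fail[p]][c];
            else if (v != 0) { fail[v] = nxt[fail[p]][c]; q.push(v); }
        }
    }

    // Push dp[0][root] = 1 forward m times; the answer ends up in state 0.
    std::vector<long long> cur(nxt.size(), 0), nx;
    cur[root] = 1;
    for (int i = 0; i < m; i++) {
        nx.assign(nxt.size(), 0);
        for (std::size_t j = 0; j < nxt.size(); j++) {
            nx[nxt[j][0]] += cur[j];
            nx[nxt[j][1]] += cur[j];
        }
        cur.swap(nx);
    }
    return cur[0];
}
```

For example, `countTexts("11", 3)` counts every length-3 text except 000, because any text containing a 1 has a length-2 window within one flip of "11".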
