Training little cats

Time Limit: 2000MS Memory Limit: 65536K
Total Submissions: 16336 Accepted: 3998

Description

Facer’s pet cat just gave birth to a brood of little cats. Concerned about the health of those lovely cats, Facer decides to make the cats do some exercises. Facer has carefully designed a set of moves for his cats. He is now asking you to supervise the cats as they do his exercises. Facer’s great exercise for cats contains three different moves:
g i : Let the ith cat take a peanut.
e i : Let the ith cat eat all the peanuts it has.
s i j : Let the ith cat and the jth cat exchange their peanuts.
All the cats perform a sequence of these moves and must repeat it m times! Poor cats! Only Facer can come up with such an embarrassing idea.
You have to determine the final number of peanuts each cat has, and directly give them the exact quantity in order to save them.

Input

The input file consists of multiple test cases, ending with three zeroes “0 0 0”. For each test case, three integers n, m and k are given first, where n is the number of cats, m is the number of times the sequence is repeated, and k is the length of the move sequence. The following k lines describe the sequence.
(m≤1,000,000,000, n≤100, k≤100)

Output

For each test case, output n numbers in a single line, representing the numbers of peanuts the cats have.

Sample Input

3 1 6
g 1
g 2
g 2
s 1 2
g 3
e 2
0 0 0

Sample Output

2 0 1

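Problem summary:

We are given an integer sequence of length $n$ $(n≤10^2)$ whose entries are all initially $0$, together with $k$ $(k≤10^2)$ operations, each of one of $3$ kinds:

Operation 1: increase the value of the $i$-th number by $1$.
Operation 2: set the value of the $i$-th number to $0$.
Operation 3: exchange the values of the $i$-th and $j$-th numbers.

These $k$ operations are executed in order, and the whole sequence is repeated $m$ $(m≤10^9)$ times. Find the final value of every number in the sequence.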
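
As a baseline, the statement can be simulated directly. The sketch below is only a reference for small $m$, useful for checking a faster solution; the `Move` struct and the `simulate` function are names introduced here for illustration and do not appear in the original code.

```cpp
#include <utility>
#include <vector>

// One move of the exercise: 'g' (give), 'e' (eat) or 's' (swap).
// Cat indices are 1-based as in the statement; b is used only by 's'.
struct Move {
    char type;
    int a;
    int b;
};

// Apply the whole move sequence m times by direct simulation: O(m * k) time.
std::vector<long long> simulate(int n, long long m, const std::vector<Move>& moves) {
    std::vector<long long> peanuts(n + 1, 0);  // index 0 is unused
    for (long long round = 0; round < m; ++round) {
        for (const Move& mv : moves) {
            switch (mv.type) {
                case 'g': ++peanuts[mv.a]; break;
                case 'e': peanuts[mv.a] = 0; break;
                case 's': std::swap(peanuts[mv.a], peanuts[mv.b]); break;
            }
        }
    }
    // Drop the unused slot so the result lists cats 1..n.
    return std::vector<long long>(peanuts.begin() + 1, peanuts.end());
}
```

Calling `simulate(3, 1, moves)` with the six moves of the sample input yields `2 0 1`, matching the sample output.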

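Analysis:

First consider the simple case $m=1$: simulating the whole sequence of operations gives the result in $O(k)$ time. But $m$ can be as large as $10^9$, so plain simulation costs $O(mk)$ and cannot finish within the time limit. We now look at how to reduce the time complexity.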

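Suppose we treat the sequence as a column vector $v$ and express the change made by each operation as a left-multiplication of $v$ by a matrix $M$. The final result can then be written as the following product, in which the factors are ordered so that $M_1$, the first operation, acts on $v$ first: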

$(\prod_{i=1}^{m}{(\prod_{j=1}^{k}M_j)})·v$

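Operations $2$ and $3$ correspond to the elementary matrices $D_i{(0)}$ and $P_{i,j}$ respectively (strictly speaking, $D_i{(0)}$ is the degenerate, non-invertible case of a row-scaling matrix). Operation $1$ is harder: the initial column vector $v$ is the zero vector, and any matrix times the zero vector is still the zero vector, so we must introduce a non-zero entry into $v$. If we add an extra component $v_0=1$, operation $1$ can be expressed by the elementary matrix $T_{0,j}(1)$, which adds $1$ times component $0$ to component $j$ (the cat receiving the peanut). The final result then takes the form: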

$(\prod_{i=1}^{m}{(\prod_{j=1}^{k}M_j)})·v=(\prod_{i=1}^{m}{(\prod_{j=1}^{k}M_j)})· \left[ \begin{matrix} 1\\ 0\\ 0\\ \vdots\\ 0\\ \end{matrix} \right]$
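
The correspondence between moves and matrices can be sketched as follows. This is an illustration under the convention above (index $0$ holds the constant $1$, cats occupy indices $1..n$); the helper names `identityMatrix`, `giveMatrix`, `eatMatrix`, `swapMatrix` and `applyMatrix` are made up for this example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<long long>;
using Matrix = std::vector<Vec>;

// Identity matrix of the given size (n + 1 for n cats).
Matrix identityMatrix(int size) {
    Matrix M(size, Vec(size, 0));
    for (int i = 0; i < size; ++i) M[i][i] = 1;
    return M;
}

// 'g i': identity plus a 1 at (i, 0), so component i gains v[0] = 1.
Matrix giveMatrix(int size, int i) {
    Matrix M = identityMatrix(size);
    M[i][0] = 1;
    return M;
}

// 'e i': identity with the diagonal entry (i, i) set to 0.
Matrix eatMatrix(int size, int i) {
    Matrix M = identityMatrix(size);
    M[i][i] = 0;
    return M;
}

// 's i j': identity with rows i and j exchanged.
Matrix swapMatrix(int size, int i, int j) {
    Matrix M = identityMatrix(size);
    std::swap(M[i], M[j]);
    return M;
}

// Matrix-vector product M * v.
Vec applyMatrix(const Matrix& M, const Vec& v) {
    Vec result(v.size(), 0);
    for (std::size_t r = 0; r < M.size(); ++r)
        for (std::size_t c = 0; c < v.size(); ++c)
            result[r] += M[r][c] * v[c];
    return result;
}
```

Starting from `v = {1, 0, 0, 0}` and applying the matrices for the six sample moves in order gives `{1, 2, 0, 1}`; dropping component $0$ leaves the sample answer `2 0 1`.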

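Evaluating this expression directly takes $mk$ matrix multiplications. By the associativity of matrix multiplication, however, we can first compute $R=\prod_{j=1}^{k}M_j$ and then only need $R^m$, which fast matrix exponentiation computes with $O(\log_2{m})$ matrix multiplications.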
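
A minimal sketch of fast exponentiation with a plain dense product is shown below. It is not the optimized routine from the final code; `multiply` and `power` are illustrative names, and the matrices are assumed to be square and of equal size.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Plain O(n^3) product of two square matrices of equal size.
Matrix multiply(const Matrix& A, const Matrix& B) {
    const std::size_t n = A.size();
    Matrix C(n, std::vector<long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// base^exponent by repeated squaring: O(log exponent) calls to multiply.
Matrix power(Matrix base, long long exponent) {
    const std::size_t n = base.size();
    Matrix result(n, std::vector<long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i) result[i][i] = 1;  // start from the identity
    while (exponent > 0) {
        if (exponent & 1) result = multiply(result, base);  // this bit of the exponent is set
        base = multiply(base, base);                        // base^(2^t) -> base^(2^(t+1))
        exponent >>= 1;
    }
    return result;
}
```

For a single cat with the one move `g 1`, $R$ is `{{1, 0}, {1, 1}}`, and `power(R, m)[1][0]` equals $m$, as expected for a cat that takes one peanut per round.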

But each matrix multiplication costs $O(n^3)$, so computing $R^m$ takes $O(n^3·\log_2m)$ and computing $R$ takes as much as $O(n^3k)$. With multiple test cases per input, this is still too slow for the time limit.

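Consider the structure of $R$. Initially $R$ is the identity matrix $E_{n+1}$. Left-multiplying by $D_i{(0)}$ for operation $2$ or by $P_{i,j}$ for operation $3$ never increases the number of non-zero entries, and left-multiplying by $T_{0,j}(1)$ for operation $1$ can only add non-zero entries in column $0$ (row $0$ of $R$ always stays $(1,0,\dots,0)$, because no operation touches it). Hence the final matrix $R$ has at most $2n+1$ non-zero entries.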
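
As a worked example, take the sample input ($n=3$, moves g 1, g 2, g 2, s 1 2, g 3, e 2). Applying the six row operations to $E_4$ in order gives:

$R=\left[ \begin{matrix} 1&0&0&0\\ 2&0&1&0\\ 0&0&0&0\\ 1&0&0&1\\ \end{matrix} \right],\quad R· \left[ \begin{matrix} 1\\ 0\\ 0\\ 0\\ \end{matrix} \right]=\left[ \begin{matrix} 1\\ 2\\ 0\\ 1\\ \end{matrix} \right]$

Components $1$ to $3$ of the result are $2,0,1$, which is the sample output for $m=1$, and $R$ has $5$ non-zero entries, within the bound $2n+1=7$.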

Most entries of $R$ are $0$, so $R$ is a sparse matrix. Can this property be used to speed up matrix multiplication? It can.

Start from the formula for matrix multiplication:

$(AB)_{i,j}=\sum_{k=1}^{n}{A_{i,k}B_{k,j}}$

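Each entry $A_{i,k}$ of $A$ appears in $n$ of these products. If $A_{i,k}$ is non-zero, those $n$ multiplications contribute to the result; otherwise they are $n$ multiplications that have no effect on it.

If $A$ is a sparse matrix, many terms of this sum have no effect on the result, and skipping them speeds up the multiplication.

A non-zero entry $A_{i,k}$ of $A$ contributes only to the entries $\{(AB)_{i,j}|j≤n\}$. We can therefore change the nesting order of the loops so that only the entries of $A$ that contribute to the product are processed, and the zero entries of $A$ are skipped.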

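In this problem one can also show that every power of $R$ is still sparse: row $0$ stays $(1,0,\dots,0)$, and every other row keeps at most one non-zero entry outside column $0$. Enumerating the entries of the left-hand matrix takes $O(n^2)$ time, and $n$ operations are performed only for each non-zero entry, so the optimized multiplication costs $O(n^2+vn)$, where $v$ here denotes the number of non-zero entries of that matrix (not the vector $v$ above). Since this number is $O(n)$, the cost of one matrix multiplication drops to $O(n^2)$.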
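
The effect of skipping zero entries can be observed with a small instrumented sketch. The function names and the `multiplyAdds` counter are assumptions added for this illustration only; the counter records how many inner-loop steps actually run.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Number of non-zero entries of a matrix.
std::size_t countNonZeros(const Matrix& A) {
    std::size_t count = 0;
    for (const auto& row : A)
        for (long long x : row)
            if (x != 0) ++count;
    return count;
}

// Product A * B of square matrices with the k loop placed outside the j loop,
// so that a zero A[i][k] skips its whole inner loop.
// multiplyAdds is incremented once per inner-loop step that is executed.
Matrix sparseMultiply(const Matrix& A, const Matrix& B, long long& multiplyAdds) {
    const std::size_t n = A.size();
    Matrix C(n, std::vector<long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            if (A[i][k] == 0) continue;  // zero entries of A contribute nothing
            for (std::size_t j = 0; j < n; ++j) {
                C[i][j] += A[i][k] * B[k][j];
                ++multiplyAdds;
            }
        }
    }
    return C;
}
```

For the $4×4$ matrix $R$ of the sample input, which has $5$ non-zero entries, squaring $R$ this way executes $5·4=20$ inner steps instead of $4^3=64$, and the square has $4$ non-zero entries, so it is still sparse.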

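The total time complexity of the algorithm thus drops to $O(n^2k+n^2·\log_2{m})$.

In fact, left-multiplying by an elementary matrix changes only a constant number of rows, that is, $O(n)$ entries, so when computing $R$ it is enough to modify the affected rows directly according to the kind of elementary matrix. Computing $R$ then takes $O(nk)$, and the total complexity becomes $O(nk+n^2·\log_2{m})$, which comfortably fits the time limit.

The code follows: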

#include <cstdio>
#include <cstring>
using namespace std;

typedef long long ll;

const int MAX_N = 102;

struct Mat
{
	ll v[MAX_N][MAX_N];
};

void mult(Mat& t, const Mat& s, const int n);
void qpow(Mat& t, int n, int p);

ll tmpmem[MAX_N];

int main()
{
	Mat R;
	int n, m, k, x, y;
	char ch;

	while (~scanf("%d%d%d", &n, &m, &k) && n)
	{
		memset(&R, 0, sizeof(R));
		for (int i = 0; i <= n; ++i)
		{
			R.v[i][i] = 1;
		}
		while (k--)
		{
			scanf(" \n%c", &ch);
			switch (ch)
			{
			case 'g':
				// T: add row 0 (always 1,0,...,0) to row x, so only column 0 changes
				scanf("%d", &x);
				++R.v[x][0];
				break;
			case 'e':
				// D_x(0): clear the whole row x
				scanf("%d", &x);
				memset(R.v[x], 0, sizeof(R.v[x]));
				break;
			case 's':
				// P_{x,y}: exchange rows x and y
				scanf("%d%d", &x, &y);
				memcpy(tmpmem, R.v[x], sizeof(R.v[x]));
				memcpy(R.v[x], R.v[y], sizeof(R.v[x]));
				memcpy(R.v[y], tmpmem, sizeof(R.v[y]));
				break;
			}
		}
		qpow(R, n, m);
		for (int i = 1; i <= n; ++i)
		{
			printf("%lld%c", R.v[i][0], i == n ? '\n' : ' ');
		}
	}
	return 0;
}

// Sparse-aware matrix multiplication: t = t * s, skipping the zero entries of t
void mult(Mat& t, const Mat& s, const int n)
{
	static Mat m;

	memset(&m, 0, sizeof(m));
	for (int i = 0; i <= n; ++i)
	{
		for (int k = 0; k <= n; ++k)
		{
			if (t.v[i][k])
			{
				for (int j = 0; j <= n; ++j)
				{
					m.v[i][j] += t.v[i][k] * s.v[k][j];
				}
			}
		}
	}
	memcpy(&t, &m, sizeof(t));
}

// Replace matrix t with its p-th power (fast exponentiation)
void qpow(Mat& t, int n, int p)
{
	Mat tmp;
	memcpy(&tmp, &t, sizeof(tmp));
	memset(&t, 0, sizeof(t));
	for (int i = 0; i <= n; ++i)
	{
		t.v[i][i] = 1;
	}

	while (p)
	{
		if (p & 1)
		{
			mult(t, tmp, n);
		}
		mult(tmp, tmp, n);
		p >>= 1;
	}
}

xhxhxhxhx

POJ3735 Training little cats - fast matrix exponentiation - sparse matrix multiplication optimization