tf.nn.conv2d: The Things You Didn't Know

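When working with image datasets, each image is often stored as a single flat vector. That raises a question: when we reshape it, which dimensions should we actually pass? Let's take an example.

Suppose we have two three-channel RGB images. The pixel matrices of the three channels of the first image are shown below; the second image holds the values 48 to 95, arranged in the same way.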

$\begin{aligned} R=\begin{bmatrix} 0&1&2&3\\4&5&6&7\\8&9&10&11\\12&13&14&15 \end{bmatrix} \quad G=\begin{bmatrix} 16&17&18&19\\20&21&22&23\\24&25&26&27\\28&29&30&31 \end{bmatrix} \quad B=\begin{bmatrix} 32&33&34&35\\36&37&38&39\\40&41&42&43\\44&45&46&47 \end{bmatrix} \end{aligned}$

We store them in the following form:

x = [[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
 [48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
  72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95]]

 # each row vector represents one image

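Suppose we also have a convolution kernel of shape w=[2,2,3,1] (all weights set to 1 to keep the arithmetic simple). The first value of the convolution output should then be: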

$0*1+1*1+4*1+5*1+74+138=222$

Here 74 = 16+17+20+21 and 138 = 32+33+36+37 are the sums of the same top-left 2x2 window in the G and B channels.
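The arithmetic above can be checked with a few lines of NumPy. This is only a minimal sketch: the names R, G, B, per_channel and first_value are introduced here for illustration, and the all-ones kernel is the one assumed in the text.

```python
import numpy as np

# Channel matrices of the first image, as defined above.
R = np.arange(0, 16).reshape(4, 4)
G = np.arange(16, 32).reshape(4, 4)
B = np.arange(32, 48).reshape(4, 4)

# With an all-ones 2x2x3 kernel, the first output value is simply the sum
# of the top-left 2x2 window of every channel.
per_channel = [int(c[:2, :2].sum()) for c in (R, G, B)]
first_value = sum(per_channel)
print(per_channel, first_value)
```

The three partial sums are the R, G and B contributions to the first output value.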

When we compute the convolution with tf.nn.conv2d, the first step is to turn the flat vectors back into image tensors. So which of the following reshapes is the right one?

1. input = x.reshape(2,4,4,3)
2. input = x.reshape(2,3,4,4)

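The first instinct is to pick the first one, because that is what we do with MNIST. In this article's setting, however, it is wrong. What? So it must be the second one? Let's hold off on the answer for a moment and first use code to show that the first one is wrong.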

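Each image has dimensions 4,4,3 and there are two images, so the input shape is [2,4,4,3]. After convolving with a [2,2,3,1] kernel (no padding, stride 1), the output shape is [2,3,3,1].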
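The 3x3 spatial size of the output follows from the usual size rule for a convolution without padding. A small sketch, where valid_output_size is a hypothetical helper written for this note and not a TensorFlow function:

```python
def valid_output_size(input_size: int, kernel_size: int, stride: int = 1) -> int:
    """Spatial output size of a convolution without padding ('VALID')."""
    return (input_size - kernel_size) // stride + 1

# A 4x4 image, a 2x2 kernel and stride 1 give a 3x3 output.
print(valid_output_size(4, 2, 1))
```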

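import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph-mode API (tf.Session)

x = np.arange(0, 96, 1).reshape(2, 48)
y = x.reshape(2, 4, 4, 3)  # first candidate: treat the last axis as channels
y = tf.convert_to_tensor(y, dtype=tf.float32)
w = tf.constant(value=1, shape=[2, 2, 3, 1], dtype=tf.float32)  # all-ones kernel
conv = tf.nn.conv2d(y, w, padding='VALID', strides=[1, 1, 1, 1])
print(conv)
with tf.Session() as sess:
    print(sess.run(conv))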


Result:
Tensor("Conv2D:0", shape=(2, 3, 3, 1), dtype=float32)
[[[[ 102.]
   [ 138.]
   [ 174.]]

  [[ 246.]
   [ 282.]
   [ 318.]]

  [[ 390.]
   [ 426.]
   [ 462.]]]
# The result for the second image is omitted to save space

The convolution runs and the output shape is as expected, but the value that should be 222 has become 102. What went wrong?
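One way to see where 102 comes from is to look at the 12 values that the all-ones kernel sums for the first output position under reshape(2,4,4,3). A minimal NumPy sketch (the names wrong and window are chosen here for illustration):

```python
import numpy as np

x = np.arange(96).reshape(2, 48)
wrong = x.reshape(2, 4, 4, 3)

# Top-left 2x2 window of the first image, across what this reshape
# treats as the 3 channels.
window = wrong[0, :2, :2, :]
print(window.reshape(-1))  # the 12 values that get summed
print(int(window.sum()))
```

All of these values lie between 0 and 17, so none of the B channel (32 to 47) and only two values of the G channel contribute to the sum.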

Let's print the result of both reshapes and take a look (first image only):

y = x.reshape(2, 4, 4, 3)
print(y[0])
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]
  [18 19 20]
  [21 22 23]]

 [[24 25 26]
  [27 28 29]
  [30 31 32]
  [33 34 35]]

 [[36 37 38]
  [39 40 41]
  [42 43 44]
  [45 46 47]]]

y = x.reshape(2, 3, 4, 4)
print(y[0])

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]
  [12 13 14 15]]

 [[16 17 18 19]
  [20 21 22 23]
  [24 25 26 27]
  [28 29 30 31]]

 [[32 33 34 35]
  [36 37 38 39]
  [40 41 42 43]
  [44 45 46 47]]]

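See it? The second reshape looks like the right one: the three matrices correspond exactly to the three channels. And it is, except that tf.nn.conv2d won't take it as is. In TensorFlow, images passed to tf.nn.conv2d must, under the default data_format of NHWC, have the channel dimension last, i.e. the form [4,4,3]. So what now? This is where np.transpose jumps in and shouts: leave it to me!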

So the correct way to write it is:
input = x.reshape(2,3,4,4).transpose(0,2,3,1)

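Note: 0,1,2,3 refer to the four dimensions. Here the second dimension (index 1, the channel dimension) is moved to the end, so the channels end up in the last dimension.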
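The effect of the axis permutation can be spelled out element by element. In the sketch below, nchw and nhwc are names chosen for illustration, and np.moveaxis is shown only as an equivalent NumPy spelling, not something the original code uses.

```python
import numpy as np

x = np.arange(96).reshape(2, 48)
nchw = x.reshape(2, 3, 4, 4)       # [image, channel, row, col]
nhwc = nchw.transpose(0, 2, 3, 1)  # [image, row, col, channel]

# Output axis i takes input axis perm[i], so
# nhwc[n, h, w, c] is the same element as nchw[n, c, h, w].
n, c, h, w = 1, 2, 3, 0
print(nhwc.shape, nhwc[n, h, w, c], nchw[n, c, h, w])

# Moving axis 1 to the end is the same permutation.
same_as_moveaxis = np.array_equal(nhwc, np.moveaxis(nchw, 1, -1))
print(same_as_moveaxis)
```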

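import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph-mode API (tf.Session)

x = np.arange(0, 96, 1).reshape(2, 48)
# Recover [image, channel, row, col], then move the channel axis to the end.
y = x.reshape(2, 3, 4, 4).transpose(0, 2, 3, 1)
print(y.shape)
y = tf.convert_to_tensor(y, dtype=tf.float32)
w = tf.constant(value=1, shape=[2, 2, 3, 1], dtype=tf.float32)  # all-ones kernel
conv = tf.nn.conv2d(y, w, padding='VALID', strides=[1, 1, 1, 1])
print(conv)
with tf.Session() as sess:
    print(sess.run(conv))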

Result:

(2, 4, 4, 3)
Tensor("Conv2D:0", shape=(2, 3, 3, 1), dtype=float32)

[[[[222.]
   [234.]
   [246.]]

  [[270.]
   [282.]
   [294.]]

  [[318.]
   [330.]
   [342.]]]]

That is the correct way to reshape!
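Both printed results can be cross-checked without TensorFlow. The following is a rough NumPy reference, assuming the same all-ones kernel, stride 1 and no padding; conv2d_valid_ones is a hypothetical helper written for this check, not a library function.

```python
import numpy as np

def conv2d_valid_ones(images: np.ndarray, k: int = 2) -> np.ndarray:
    """Sum every k x k window over all channels (all-ones kernel, stride 1, no padding).

    images has shape [batch, height, width, channels].
    """
    n, h, w, _ = images.shape
    out = np.zeros((n, h - k + 1, w - k + 1, 1), dtype=images.dtype)
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            out[:, i, j, 0] = images[:, i:i + k, j:j + k, :].sum(axis=(1, 2, 3))
    return out

x = np.arange(96).reshape(2, 48)
right = conv2d_valid_ones(x.reshape(2, 3, 4, 4).transpose(0, 2, 3, 1))
wrong = conv2d_valid_ones(x.reshape(2, 4, 4, 3))
print(right[0, :, :, 0])  # starts with 222
print(wrong[0, :, :, 0])  # starts with 102
```

The explicit loops are slow but make it easy to see which pixels feed each output value.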

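Notes:

tf.reshape and tf.transpose work like np.reshape and np.transpose;
When working with MNIST (a single color channel), the first reshape is enough;
When working with CIFAR, you need the approach described in this article.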
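Why the direct reshape is safe for single-channel data can be checked on a toy example. The sketch assumes two made-up 4x4 grayscale images rather than real MNIST data; gray, direct and via_transpose are illustrative names.

```python
import numpy as np

# Two hypothetical single-channel 4x4 images, one flat vector per image.
gray = np.arange(32).reshape(2, 16)

direct = gray.reshape(2, 4, 4, 1)
via_transpose = gray.reshape(2, 1, 4, 4).transpose(0, 2, 3, 1)

# With only one channel there is nothing to interleave, so both agree.
print(np.array_equal(direct, via_transpose))
```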

By the way, a bit more analysis:

x = np.arange(0, 96, 1).reshape(2, 48)
y = x.reshape(2, 3, 4, 4)
print(y[0])

y = x.reshape(2, 3, 4, 4).transpose(0, 2, 3, 1)
print(y[0])

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]
  [12 13 14 15]]

 [[16 17 18 19]
  [20 21 22 23]
  [24 25 26 27]
  [28 29 30 31]]

 [[32 33 34 35]
  [36 37 38 39]
  [40 41 42 43]
  [44 45 46 47]]]
------------------------------------------
[[[ 0 16 32]
  [ 1 17 33]
  [ 2 18 34]
  [ 3 19 35]]

 [[ 4 20 36]
  [ 5 21 37]
  [ 6 22 38]
  [ 7 23 39]]

 [[ 8 24 40]
  [ 9 25 41]
  [10 26 42]
  [11 27 43]]

 [[12 28 44]
  [13 29 45]
  [14 30 46]
  [15 31 47]]]

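As you can see, in the first form [3,4,4] each matrix is one color channel; in the second form [4,4,3] each column is one color channel, three columns in total.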
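The same observation can be expressed as indexing: a channel is a whole matrix in one layout and a slice along the last axis in the other. A short sketch, with channels_first and channels_last as illustrative names:

```python
import numpy as np

x = np.arange(96).reshape(2, 48)
channels_first = x.reshape(2, 3, 4, 4)
channels_last = channels_first.transpose(0, 2, 3, 1)

# Channel c of image 0: channels_first[0, c] versus channels_last[0, :, :, c].
matches = all(
    np.array_equal(channels_first[0, c], channels_last[0, :, :, c])
    for c in range(3)
)
print(matches)
print(channels_last[0, :, :, 1])  # the G matrix of the first image
```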
