Notes on TensorFlow's conv1d and conv2d for Convolutional Neural Networks

1. Problem

While reading the convolutional neural network chapter of 《TensorFlow实战Google深度学习框架》, I came across TensorFlow's convolution interfaces conv1d and conv2d. I could not follow the book's example, and it did not match my hand-computed result. So in what follows I work through the source code of TensorFlow's conv1d and conv2d.

2. conv1d

2.1 source code

def conv1d(value, filters, stride, padding,
           use_cudnn_on_gpu=None, data_format=None,
           name=None):
  r"""Computes a 1-D convolution given 3-D input and filter tensors.

  Given an input tensor of shape
    [batch, in_width, in_channels]
  if data_format is "NHWC", or
    [batch, in_channels, in_width]
  if data_format is "NCHW",
  and a filter / kernel tensor of shape
  [filter_width, in_channels, out_channels], this op reshapes
  the arguments to pass them to conv2d to perform the equivalent
  convolution operation.

  Internally, this op reshapes the input tensors and invokes `tf.nn.conv2d`.
  For example, if `data_format` does not start with "NC", a tensor of shape
    [batch, in_width, in_channels]
  is reshaped to
    [batch, 1, in_width, in_channels],
  and the filter is reshaped to
    [1, filter_width, in_channels, out_channels].
  The result is then reshaped back to
    [batch, out_width, out_channels]
  \(where out_width is a function of the stride and padding as in conv2d\) and
  returned to the caller.

  Args:
    value: A 3D `Tensor`.  Must be of type `float32` or `float64`.
    filters: A 3D `Tensor`.  Must have the same type as `input`.
    stride: An `integer`.  The number of entries by which
      the filter is moved right at each step.
    padding: 'SAME' or 'VALID'
    use_cudnn_on_gpu: An optional `bool`.  Defaults to `True`.
    data_format: An optional `string` from `"NHWC", "NCHW"`.  Defaults
      to `"NHWC"`, the data is stored in the order of
      [batch, in_width, in_channels].  The `"NCHW"` format stores
      data as [batch, in_channels, in_width].
    name: A name for the operation (optional).

  Returns:
    A `Tensor`.  Has the same type as input.

  Raises:
    ValueError: if `data_format` is invalid.
  """
  with ops.name_scope(name, "conv1d", [value, filters]) as name:
    # Reshape the input tensor to [batch, 1, in_width, in_channels]
    if data_format is None or data_format == "NHWC":
      data_format = "NHWC"
      spatial_start_dim = 1
      strides = [1, 1, stride, 1]
    elif data_format == "NCHW":
      spatial_start_dim = 2
      strides = [1, 1, 1, stride]
    else:
      raise ValueError("data_format must be \"NHWC\" or \"NCHW\".")
    value = array_ops.expand_dims(value, spatial_start_dim)
    filters = array_ops.expand_dims(filters, 0)
    result = gen_nn_ops.conv2d(value, filters, strides, padding,
                               use_cudnn_on_gpu=use_cudnn_on_gpu,
                               data_format=data_format)
    return array_ops.squeeze(result, [spatial_start_dim])

In the function signature:
1) value: the input tensor, with shape [batch, in_width, in_channels]. batch is the mini-batch size; in_width is the width of the input, i.e. the number of positions (neurons) along the 1-D axis; in_channels is the number of channels, i.e. the dimensionality of the feature vector at each position. For example, a layer with 3 neurons, each with 2 channels, a = [a1, a2, a3], could be
[
[1, -1],
[-1, 2],
[0, 2]
]
2) filters: the convolution kernel, with shape [filter_width, in_channels, out_channels]. filter_width is the number of rows of the kernel matrix, i.e. how many input positions each window covers; in_channels is the number of columns and must match the input's channel count; out_channels is the number of output channels, i.e. how many independent filters are applied.
3) stride: the number of positions the filter moves at each step.
4) padding: the padding scheme, either 'SAME' or 'VALID'. 'SAME' pads the input with zeros so that every input position is covered; the zeros are split between the two ends, with the extra zero (when the total is odd) going on the Right/Bottom rather than the Left/Top. 'VALID' adds no padding and keeps only complete windows.
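Concretely, the amount and placement of the zeros added by 'SAME' can be derived from the input width, filter width, and stride. Here is a minimal sketch (the helper name same_padding_1d is my own, not a TensorFlow API):

```python
import math

def same_padding_1d(in_width, filter_width, stride):
    """How 'SAME' padding distributes zeros along a 1-D axis."""
    out_width = math.ceil(in_width / stride)
    pad_total = max((out_width - 1) * stride + filter_width - in_width, 0)
    pad_left = pad_total // 2            # the extra zero goes on the right
    pad_right = pad_total - pad_left
    return out_width, pad_left, pad_right

# Section 2.2's example: in_width=3, filter_width=2, stride=2
print(same_padding_1d(3, 2, 2))  # (2, 0, 1): one zero padded on the right
```

With these numbers, the output has width 2 and a single zero is appended after the last input position, which is exactly what the example in section 2.2 relies on.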

The parameter descriptions above already place definite shape requirements on the input tensor and the filter.
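To make those shape requirements concrete, here is a minimal NumPy sketch of conv1d's forward pass (my own reference implementation, not the TensorFlow kernel; it assumes the default [batch, in_width, in_channels] layout and computes cross-correlation with no bias):

```python
import numpy as np

def conv1d_ref(value, filters, stride, padding):
    """NumPy sketch of tf.nn.conv1d: value [batch, in_width, in_channels],
    filters [filter_width, in_channels, out_channels]."""
    batch, in_width, _ = value.shape
    filter_width, _, out_channels = filters.shape
    if padding == 'SAME':
        out_width = -(-in_width // stride)          # ceil(in_width / stride)
        pad = max((out_width - 1) * stride + filter_width - in_width, 0)
        value = np.pad(value, ((0, 0), (pad // 2, pad - pad // 2), (0, 0)))
    else:                                           # 'VALID'
        out_width = (in_width - filter_width) // stride + 1
    out = np.zeros((batch, out_width, out_channels), value.dtype)
    for i in range(out_width):
        window = value[:, i * stride:i * stride + filter_width, :]
        # Sum over filter_width and in_channels; out_channels remains.
        out[:, i, :] = np.tensordot(window, filters, axes=([1, 2], [0, 1]))
    return out
```

Running this on the data from section 2.2 below reproduces the [6, -2] result that tf.nn.conv1d returns.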

2.2 Example

import tensorflow as tf
import numpy as np

M = np.array([
        [1,-1,0],
        [-1,2,1],
        [0,2,-2]
    ])
M = np.asarray(M, dtype='float32')
M = M.reshape(1, 3, 3)
kernel = np.array([
    [1, -1, 0],
    [0, 2, 0]
])
kernel = np.asarray(kernel, dtype='float32')
kernel = kernel.reshape(2, 3, 1)
conv1d = tf.nn.conv1d(M, kernel, 2, 'SAME')

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(conv1d))

Result:

[[[ 6.]
  [-2.]]]

This matches the hand computation. First window (rows 0 and 1 of M), elementwise product with the kernel:

[[1, -1, 0],
 [-1, 2, 1]]
.*
[[1, -1, 0],
 [0, 2, 0]]

giving 1×1 + (-1)×(-1) + 0×0 + (-1)×0 + 2×2 + 1×0 = 6.

Second window (row 2 of M plus the zero row added by 'SAME' padding):

[[0, 2, -2],
 [0, 0, 0]]
.*
[[1, -1, 0],
 [0, 2, 0]]

giving 0×1 + 2×(-1) + (-2)×0 + 0×0 + 0×2 + 0×0 = -2.
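The two window computations above can be checked with plain NumPy elementwise products:

```python
import numpy as np

kernel = np.array([[1, -1, 0],
                   [0, 2, 0]], dtype=np.float32)

# First window: rows 0-1 of M.
win1 = np.array([[1, -1, 0],
                 [-1, 2, 1]], dtype=np.float32)
# Second window: row 2 of M plus the zero row from 'SAME' padding.
win2 = np.array([[0, 2, -2],
                 [0, 0, 0]], dtype=np.float32)

print(np.sum(win1 * kernel), np.sum(win2 * kernel))  # 6.0 -2.0
```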

3. conv2d

3.1 source code

def conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None,
           data_format=None, name=None):
  r"""Computes a 2-D convolution given 4-D `input` and `filter` tensors.

  Given an input tensor of shape `[batch, in_height, in_width, in_channels]`
  and a filter / kernel tensor of shape
  `[filter_height, filter_width, in_channels, out_channels]`, this op
  performs the following:

  1. Flattens the filter to a 2-D matrix with shape
     `[filter_height * filter_width * in_channels, output_channels]`.
  2. Extracts image patches from the input tensor to form a *virtual*
     tensor of shape `[batch, out_height, out_width,
     filter_height * filter_width * in_channels]`.
  3. For each patch, right-multiplies the filter matrix and the image patch
     vector.

  In detail, with the default NHWC format,

      output[b, i, j, k] =
          sum_{di, dj, q} input[b, strides[1] * i + di, strides[2] * j + dj, q] *
                          filter[di, dj, q, k]

  Must have `strides[0] = strides[3] = 1`.  For the most common case of the same
  horizontal and vertical strides, `strides = [1, stride, stride, 1]`.

  Args:
    input: A `Tensor`. Must be one of the following types: `half`, `float32`.
      A 4-D tensor. The dimension order is interpreted according to the value
      of `data_format`, see below for details.
    filter: A `Tensor`. Must have the same type as `input`.
      A 4-D tensor of shape
      `[filter_height, filter_width, in_channels, out_channels]`
    strides: A list of `ints`.
      1-D tensor of length 4.  The stride of the sliding window for each
      dimension of `input`. The dimension order is determined by the value of
        `data_format`, see below for details.
    padding: A `string` from: `"SAME", "VALID"`.
      The type of padding algorithm to use.
    use_cudnn_on_gpu: An optional `bool`. Defaults to `True`.
    data_format: An optional `string` from: `"NHWC", "NCHW"`. Defaults to `"NHWC"`.
      Specify the data format of the input and output data. With the
      default format "NHWC", the data is stored in the order of:
          [batch, height, width, channels].
      Alternatively, the format could be "NCHW", the data storage order of:
          [batch, channels, height, width].
    name: A name for the operation (optional).

  Returns:
    A `Tensor`. Has the same type as `input`.
    A 4-D tensor. The dimension order is determined by the value of
    `data_format`, see below for details.
  """
  result = _op_def_lib.apply_op("Conv2D", input=input, filter=filter,
                                strides=strides, padding=padding,
                                use_cudnn_on_gpu=use_cudnn_on_gpu,
                                data_format=data_format, name=name)
  return result

In the function signature:
1) input: a 4-D tensor holding a layer's activations, with shape [batch, in_height, in_width, in_channels]. batch is as described for conv1d; in_height is the height of the 2-D feature map, i.e. its number of rows; in_width is the width, i.e. its number of columns; in_channels is the number of channels.
2) filter: the convolution kernel for the layer, with shape [filter_height, filter_width, in_channels, out_channels]. filter_height and filter_width are the kernel's rows and columns; in_channels must match the input's channel count; out_channels is the number of output channels.
3) strides: the convolution stride, a length-4 list [strides[0], strides[1], strides[2], strides[3]]. It must have strides[0] = strides[3] = 1; most commonly the horizontal and vertical strides are equal, i.e. strides[1] = strides[2].
4) padding: see the conv1d description.
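As a quick check of how stride and padding determine the output size, here is a small sketch (the helper name conv2d_out_shape is my own, not a TensorFlow API):

```python
import math

def conv2d_out_shape(in_h, in_w, f_h, f_w, s_h, s_w, padding):
    """Spatial output size of a 2-D convolution for 'SAME' / 'VALID'."""
    if padding == 'SAME':
        # Zero-padded so every input position is covered.
        return math.ceil(in_h / s_h), math.ceil(in_w / s_w)
    # 'VALID': only complete windows, no padding.
    return (in_h - f_h) // s_h + 1, (in_w - f_w) // s_w + 1

# Section 3.2's example: 2x3 input, 2x2 kernel, stride 1, 'VALID'.
print(conv2d_out_shape(2, 3, 2, 2, 1, 1, 'VALID'))  # (1, 2)
```

The (1, 2) spatial shape matches the 1-row, 2-column output in the example below.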

3.2 Example

import tensorflow as tf
import numpy as np
M = np.array([
        [1,-1,0],
        [-1,2,1]
    ])
M = np.asarray(M, dtype='float32')
M = M.reshape(1, 2, 3, 1)

kernel = np.array([
    [1, -1],
    [0, 2]
])
kernel = np.asarray(kernel, dtype='float32')
kernel = kernel.reshape(2, 2, 1, 1)
conv2d = tf.nn.conv2d(M, kernel, [1, 1, 1, 1], 'VALID')

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(conv2d))

Result:

[[[[ 6.]
   [ 1.]]]]

The computation works the same way as conv1d; here both the horizontal and vertical strides are 1.
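The two output values can be reproduced with plain NumPy by sliding the 2×2 kernel over the two 'VALID' windows:

```python
import numpy as np

M = np.array([[1, -1, 0],
              [-1, 2, 1]], dtype=np.float32)
kernel = np.array([[1, -1],
                   [0, 2]], dtype=np.float32)

# 'VALID' with stride 1: only complete 2x2 windows, so the output
# has height 1 and width 2.
out = [float(np.sum(M[0:2, j:j + 2] * kernel)) for j in range(2)]
print(out)  # [6.0, 1.0]
```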

Reposted from blog.csdn.net/duanyuwangyuyan/article/details/108360077