CNN Convolution in TensorFlow

1. Convolution

tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)

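input is a 4-D tensor with shape (batch, in_height, in_width, in_channels), that is, [number of images in a training batch, image height, image width, number of image channels]. It must be a floating-point tensor, typically float32.

filter is the convolution kernel, also a 4-D tensor, with shape (filter_height, filter_width, in_channels, out_channels), that is, [kernel height, kernel width, number of input channels, number of kernels]. For example, a kernel with shape (7, 7, 64, 128) holds 7x7x64x128 weights in total. The kernel can be any size, and its dtype must match that of input.

strides is a 1-D vector of length 4 (a plain list of ints, not a tensor). Each element is the step size along the corresponding dimension of the input. Because striding is only applied along in_height and in_width, strides = [1, stride, stride, 1], where stride is the step size you want to set.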

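padding: must be either "SAME" or "VALID". The argument has no default in the signature above and has to be passed explicitly. VALID means no zero padding is added.

With SAME, the output size is ceil(h/s) x ceil(w/s), which depends only on the strides.

With VALID, the output size is floor((h - f_h)/s + 1) x floor((w - f_w)/s + 1), which depends on both the strides and the filter size.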

use_cudnn_on_gpu: an optional bool; cuDNN acceleration is used unless it is set to False.

Returns a 4-D tensor.
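The two output-size formulas above can be checked with a few lines of plain Python. This is only a sketch of the arithmetic, not TensorFlow code; the helper name conv_output_size is an assumption made for this illustration.

```python
import math

def conv_output_size(size, filter_size, stride, padding):
    """Output length along one spatial dimension (hypothetical helper)."""
    if padding == "SAME":
        # Depends only on the stride.
        return math.ceil(size / stride)
    if padding == "VALID":
        # Depends on both the stride and the filter size.
        return (size - filter_size) // stride + 1
    raise ValueError("padding must be 'SAME' or 'VALID'")

# 7x7 input, 3x3 filter, stride 2
same = conv_output_size(7, 3, 2, "SAME")    # ceil(7 / 2) = 4
valid = conv_output_size(7, 3, 2, "VALID")  # floor((7 - 3) / 2) + 1 = 3
print(same, valid)

# Number of weights in a filter of shape (7, 7, 64, 128)
print(7 * 7 * 64 * 128)  # 401408
```

The same helper applies to each spatial dimension separately, so a non-square input or filter just means calling it once for the height and once for the width.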

2. Pooling

tf.nn.max_pool(value, ksize, strides, padding, name=None)

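value: a 4-D tensor, the input to be pooled. A pooling layer usually follows a convolution layer, so the input is typically a feature map with shape [batch, height, width, channels]. The dtype is typically tf.float32.

ksize: the size of the pooling window, a vector rather than a tensor, usually [1, height, width, 1]. Pooling has no trainable parameters. Because we do not want to pool across batch or channels, those two dimensions are set to 1.

strides: a vector of length 4, not a tensor. As with convolution, it gives the step of the window along each dimension, and is usually [1, stride, stride, 1].

padding: as with convolution, must be either 'VALID' or 'SAME'.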

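With SAME, the output size is ceil(h/s) x ceil(w/s), which depends only on the strides.

With VALID, the output size is floor((h - f_h)/s + 1) x floor((w - f_w)/s + 1), where f_h and f_w are the height and width of the pooling window, so it depends on both the strides and the window size.

Returns a 4-D Tensor of the same dtype, whose shape still has the form [batch, height, width, channels].

As the arguments show, the number of channels of the returned tensor is the same as that of the input. With SAME padding, w and h depend only on the strides; with VALID padding they also depend on the window size.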
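To make the window and stride arithmetic concrete, here is a minimal NumPy sketch of max pooling over a single 2-D feature map with VALID padding. It illustrates the idea only and is not how tf.nn.max_pool is implemented; the function name max_pool_valid is an assumption for this example.

```python
import numpy as np

def max_pool_valid(x, k, s):
    """Max-pool a 2-D array with a k x k window and stride s (VALID padding)."""
    out_h = (x.shape[0] - k) // s + 1
    out_w = (x.shape[1] - k) // s + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Take the maximum over the window whose top-left corner is (i*s, j*s).
            out[i, j] = x[i * s:i * s + k, j * s:j * s + k].max()
    return out

x = np.arange(16, dtype=np.float32).reshape(4, 4)
pooled = max_pool_valid(x, k=2, s=2)
print(pooled)
```

For the 4x4 input holding 0..15, each 2x2 window contributes its largest value, so the result is a 2x2 map.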

3. Regularization: dropout

tf.nn.dropout(x, keep_prob, noise_shape=None, seed=None, name=None)

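tf.nn.dropout is the TensorFlow function for preventing or reducing overfitting. It is usually applied to fully connected layers; convolutional layers generally do not use dropout, and neither does the output layer. Dropout randomly discards a different subset of neurons at each training step: each neuron's activation is switched off with probability p, so for that step it takes no part in the network's computation and its weights are not updated. Its weights are kept, however (they are merely not updated this time), because the neuron may be active again for the next input. At test and validation time every neuron takes part. In the classic formulation each output is then multiplied by the keep probability 1 - p; tf.nn.dropout instead scales the kept elements by 1/keep_prob during training, so no rescaling is needed at test time.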

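x: the input tensor, of a floating-point type such as tf.float32

keep_prob: a scalar float, the probability that each element is kept

The other arguments are rarely needed.

Returns a tensor with the same shape as x.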

x=tf.nn.dropout(x,0.5)
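The scaling behavior can be illustrated with a short NumPy sketch of "inverted" dropout, where kept elements are scaled by 1/keep_prob and dropped elements become 0. This is an illustration under that assumption, not TensorFlow's implementation; inverted_dropout and the rng argument are names introduced for this example.

```python
import numpy as np

def inverted_dropout(x, keep_prob, rng):
    """Keep each element with probability keep_prob and scale it by 1/keep_prob."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = np.ones((4, 5))
y = inverted_dropout(x, keep_prob=0.5, rng=rng)
print(y)  # every entry is either 0.0 (dropped) or 2.0 (kept and scaled)
```

Because the kept values are scaled up during training, the expected value of each output equals the input, which is why nothing needs to be rescaled at test time.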

4. Adding a bias

tf.nn.bias_add(value, bias, data_format=None, name=None)

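value: a tensor of type `float`, `double`, `int64`, `int32`, `uint8`, `int16`, `int8`, `complex64`, or `complex128`.

bias: also a tensor, but 1-D, whose length equals the size of the last dimension of value. It is broadcast over the other dimensions. Its dtype must match that of value.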

data_format: A string. 'NHWC' and 'NCHW' are supported.

name: A name for the operation (optional).

Returns: A `Tensor` with the same dtype and shape as `value`.

tf.add(x,y,name=None)

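x: a tensor, with the same allowed dtypes as above

y: also a tensor, with the same dtype as x; general broadcasting is applied. tf.nn.bias_add is a special case of this function: here the shape of y does not have to match the size of the last dimension of x, it only has to be broadcast-compatible with x.

name: A name for the operation (optional).

The code below shows the difference between the two functions.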

import tensorflow as tf

a = tf.constant([[1, 1], [2, 2], [3, 3]], dtype=tf.float32)  # shape (3, 2): 2-D, last dimension has 2 elements
b = tf.constant([1, -1], dtype=tf.float32)                   # 1-D with 2 elements, matching the last dimension of a
c = tf.constant([1], dtype=tf.float32)                       # 1-D with a single element

with tf.Session() as sess:
    print('bias_add:')
    print(sess.run(tf.nn.bias_add(a, b)))
    # The next line fails, because the length of c does not equal
    # the size of the last dimension of a, which is 2.
    # print(sess.run(tf.nn.bias_add(a, c)))

    print('add:')
    print(sess.run(tf.add(a, c)))

The output is:

bias_add:
[[ 2. 0.]
 [ 3. 1.]
 [ 4. 2.]]
add:
[[ 2. 2.]
 [ 3. 3.]
 [ 4. 4.]]

tf.add_n(inputs,name=None)

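This function adds up the elements of a list. The input is a list whose elements can be vectors, matrices and so on; they must all have the same shape, because there is no broadcasting.

Example: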

import tensorflow as tf

input1 = tf.constant([1.0, 2.0, 3.0])
input2 = tf.Variable(tf.random_uniform([3]))
output = tf.add_n([input1, input2])

with tf.Session() as sess:
    # Variables must be initialized before they are read.
    sess.run(tf.global_variables_initializer())
    print(sess.run(input1 + input2))
    print(sess.run(output))
# Output (input2 is random, so the values differ from run to run)
# [ 1.30945706  2.29760814  3.81558323]
# [ 1.30945706  2.29760814  3.81558323]

5. Placeholders

A placeholder in TensorFlow is similar to a function parameter: a value must be fed in at run time.

tf.placeholder(dtype, shape=None, name=None)

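dtype: the data type. Numeric types such as tf.float32 and tf.float64 are the most common.

shape: the shape of the data. The default is None, which leaves the shape unconstrained, so a tensor of any shape can be fed. A shape can also be given explicitly, for example [2, 3]; [None, 3] means 3 columns and an unspecified number of rows.

name: the name of the operation.

Returns: a tensor

Example: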

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(1024, 1024))
y = tf.matmul(x, x)  # matrix multiplication

with tf.Session() as sess:
    # print(sess.run(y))  # ERROR if uncommented: x has not been fed a value.

    rand_array = np.random.rand(1024, 1024)
    print(sess.run(y, feed_dict={x: rand_array}))  # Will succeed.

6. Transposed convolution (deconvolution)

tf.nn.conv2d_transpose(value, filter, output_shape, strides, padding="SAME", data_format="NHWC", name=None)

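Apart from name, which names the operation, six arguments affect the computation:
Argument 1, value: the input image for the transposed convolution; it must be a Tensor.
Argument 2, filter: the convolution kernel; it must be a Tensor with shape [filter_height, filter_width, out_channels, in_channels], that is, [kernel height, kernel width, number of kernels, number of image channels].
Argument 3, output_shape: the shape of the output of the transposed convolution. Attentive readers will notice that ordinary convolution has no such argument, so what is it for here? This is explained below.
Argument 4, strides: the step along each dimension of the image during the transposed convolution; a 1-D vector of length 4.
Argument 5, padding: a string, either "SAME" or "VALID"; it selects the convolution mode.
Argument 6, data_format: a string, either 'NHWC' or 'NCHW'. It was added in newer versions of TensorFlow and describes the layout of value. 'NHWC' is TensorFlow's standard layout, [batch, height, width, in_channels]; 'NCHW' is the layout used by Theano, [batch, in_channels, height, width]. The default is 'NHWC'.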

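Make sure you understand how convolution works before going on; see my other article: http://blog.csdn.net/mao_xiao_feng/article/details/53444333
First, define a single-channel image and 3 convolution kernels.

Note: the filter argument of tf.nn.conv2d has the form [filter_height, filter_width, in_channels, out_channels], whereas the filter argument of tf.nn.conv2d_transpose has the form [filter_height, filter_width, out_channels, in_channels]. in_channels and out_channels are swapped! The two operations run in opposite directions, so input and output trade places.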

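# Convolution
x1 = tf.zeros([4, 7, 7, 3])    # input:  [batch, height, width, in_channels]
k1 = tf.zeros([3, 3, 3, 21])   # filter: [filter_height, filter_width, in_channels, out_channels]
y = tf.nn.conv2d(x1, k1, strides=[1, 2, 2, 1], padding='SAME')
# y.shape == [4, 4, 4, 21]

# Transposed convolution
k2 = tf.zeros([3, 3, 7, 21])   # filter: [filter_height, filter_width, out_channels, in_channels]
x2 = tf.nn.conv2d_transpose(y, k2, output_shape=[4, 7, 7, 7], strides=[1, 2, 2, 1], padding="SAME")
# x2.shape == [4, 7, 7, 7]
Note the last two dimensions of the transposed-convolution kernel k2: its layout is the one a kernel would have if it were used to convolve a tensor of shape output_shape.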

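At first sight, the output_shape argument of tf.nn.conv2d_transpose looks redundant: given the input, the kernel and the strides, the output size should be derivable. So why must output_shape be specified? Because images of shape [1, 6, 6, 3] and [1, 5, 5, 3] can both be convolved to the same shape, [1, 3, 3, 1] (for example with stride 2 and SAME padding). What, then, should a [1, 3, 3, 1] image become after transposed convolution? There are two possible answers, so specifying output_shape is meaningful. An arbitrary output_shape is not allowed, however, and raises an error. When the kernel is larger than the original image, VALID padding is the appropriate choice (SAME would not be meaningful).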
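The ambiguity described above is easy to reproduce with the SAME output-size formula ceil(h/s). The sketch below is plain Python arithmetic rather than TensorFlow code, and the helper name same_output_size, as well as the choice of stride 2, are assumptions made for this illustration.

```python
import math

def same_output_size(size, stride):
    """Output length along one dimension for SAME padding: ceil(size / stride)."""
    return math.ceil(size / stride)

stride = 2
target = 3  # spatial size of the convolved feature map

# Every input size that a stride-2 SAME convolution maps to the target size.
candidates = [h for h in range(1, 20) if same_output_size(h, stride) == target]
print(candidates)  # [5, 6]
```

Both 5 and 6 map to 3, so the transposed convolution cannot tell which one to produce unless output_shape says so.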
