Convolutional Neural Networks: Principles and Implementation

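A convolutional neural network (CNN) is built mainly from convolutional layers followed by pooling layers, and it works particularly well on image data.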

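Background: images are either color or grayscale. Every color is a mix of red, green, and blue (RGB), so a color image has three channels, while a grayscale image has a single channel.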
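As a small illustration of what "channels" means in practice, the NumPy sketch below builds a made-up 24*24 color image and reduces it to one channel. The random pixel values and the luma weights (the ITU-R BT.601 ones) are assumptions chosen only for the example; they are not part of the original post.

```python
import numpy as np

# Hypothetical 24x24 color image with random pixel values in [0, 255].
rng = np.random.default_rng(0)
color_image = rng.integers(0, 256, size=(24, 24, 3), dtype=np.uint8)  # height, width, RGB

# One common way to get a single-channel image: a weighted sum of R, G and B.
# The BT.601 luma weights are used here purely as an example.
weights = np.array([0.299, 0.587, 0.114])
gray_image = color_image.astype(np.float64) @ weights

print(color_image.shape)  # (24, 24, 3): three channels
print(gray_image.shape)   # (24, 24): one channel
```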

Let's use a grayscale image as the running example:

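Put simply, an image can be viewed as a matrix, say 24*24. Take a convolution kernel (a small matrix whose size you choose and whose values are initialized randomly, say 2*2) and apply it to the 24*24 image. First, line the kernel up with the 2*2 patch in the top-left corner of the image, multiply the two element by element, and add up the products; that single number becomes the top-left entry of the result matrix. The kernel then moves to the right by the stride, which you set yourself (usually 1 or 2), and the same operation is repeated as the kernel slides from the top-left corner of the image to the bottom-right corner. One full sweep is one convolution, and its result is again a matrix.

To reduce the dimensionality of that matrix, a pooling operation follows, usually average pooling or max pooling. Suppose the convolution result is 4*4 and you apply 2*2 average pooling: the mean of the top-left 2*2 block becomes the top-left entry of the output, the mean of the top-right 2*2 block becomes the top-right entry, and the bottom-left and bottom-right blocks are handled the same way. Max pooling takes the maximum of each block instead of the mean.

If the pooled result is 2*2, tf.reshape then flattens it to one dimension (2*2 becomes 1*4), and the flattened vector is fed into a fully connected network that produces the classification result.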
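The steps just described (slide the kernel, pool, flatten) can be sketched in plain NumPy. This is a minimal illustration, not the TensorFlow implementation: the helper names `conv2d_valid` and `pool2d` are hypothetical, the kernel values are fixed by hand rather than random, and a toy 5*5 image is used so that the convolution result is 4*4 as in the text. Like deep-learning libraries, it does not flip the kernel.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` with no padding; each output value is the
    sum of the element-wise product of the kernel and the window under it."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * stride:i * stride + kh,
                           j * stride:j * stride + kw]
            out[i, j] = np.sum(window * kernel)
    return out

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping size x size pooling (stride equal to the window size)."""
    out_h = feature_map.shape[0] // size
    out_w = feature_map.shape[1] // size
    trimmed = feature_map[:out_h * size, :out_w * size]
    # Split the map into (out_h, size, out_w, size) blocks, then reduce each block.
    blocks = trimmed.reshape(out_h, size, out_w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
kernel = np.array([[1.0, 0.0],
                   [0.0, 1.0]])                    # toy 2x2 kernel

conv_out = conv2d_valid(image, kernel)             # shape (4, 4)
avg_out = pool2d(conv_out, size=2, mode="avg")     # [[12, 16], [32, 36]]
max_out = pool2d(conv_out, size=2, mode="max")     # [[18, 22], [38, 42]]
flat = max_out.reshape(1, -1)                      # shape (1, 4)

print(conv_out)
print(avg_out)
print(max_out)
print(flat)
```

With stride 1 and no padding, an n*n image and a k*k kernel give an (n-k+1)*(n-k+1) result, so the 24*24 image and 2*2 kernel from the text would produce a 23*23 matrix.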

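The main parameters being learned here are the entries of the convolution kernels: backpropagation keeps updating them until the result converges or reaches a preset threshold. The description above covers only a single convolution-plus-pooling stage; more such stages can be stacked after it before the fully connected layers.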
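To make "learning the kernel" concrete, here is a toy sketch in NumPy that recovers a 2*2 kernel by plain gradient descent on a squared-error loss. Everything in it is an assumption made for illustration: the random 8*8 input, the hand-picked `true_kernel`, the learning rate, and the stopping threshold. A real CNN would backpropagate a classification loss through several layers instead.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Stride-1, no-padding sliding-window sum of element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))                 # toy input
true_kernel = np.array([[1.0, -1.0],
                        [0.5, 2.0]])            # hypothetical "ideal" kernel
target = conv2d_valid(image, true_kernel)       # output we want to reproduce

kernel = rng.normal(size=(2, 2))                # random initial kernel
learning_rate = 0.1
tolerance = 1e-8                                # preset stopping threshold

for step in range(5000):
    error = conv2d_valid(image, kernel) - target
    loss = 0.5 * np.mean(error ** 2)
    if loss < tolerance:
        break
    # dLoss/dKernel[a, b] = mean over (i, j) of error[i, j] * image[i + a, j + b],
    # which is a sliding-window product of the image with the error map.
    grad = conv2d_valid(image, error) / error.size
    kernel -= learning_rate * grad

print("stopped at step", step, "with loss", loss)
print(np.round(kernel, 3))
```

The loop stops either when the loss falls below the threshold or when the step budget runs out, mirroring the two stopping conditions mentioned in the text.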

For a more detailed explanation, see these posts:

https://blog.csdn.net/laingliang/article/details/53073591

https://blog.csdn.net/qq_33414271/article/details/79337141

Code:

# -*- coding: utf-8 -*-
"""
Description: a small CNN trained on CIFAR-10, using cifar10_input from the
TensorFlow tutorials (TensorFlow 1.x API).

Getting the data and code: git clone https://github.com/tensorflow/models.git
Work inside models/tutorials/image/cifar10 and create a .py file there
containing:

    import cifar10
    cifar10.maybe_download_and_extract()

Run that file to download the data.
"""
import cifar10_input
import tensorflow as tf
import numpy as np

batch_size = 128
data_dir = '/tmp/cifar10_data/cifar-10-batches-bin'
print("begin")
# Input pipelines: batches of 24x24x3 images and integer class labels.
images_train, labels_train = cifar10_input.inputs(eval_data=False,
                                                  data_dir=data_dir,
                                                  batch_size=batch_size)
images_test, labels_test = cifar10_input.inputs(eval_data=True,
                                                data_dir=data_dir,
                                                batch_size=batch_size)
print("begin data")


def weight_variable(shape):
    """Weights drawn from a truncated normal distribution (stddev 0.1)."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    """Biases initialized to the constant 0.1."""
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, w):
    """Stride-1 convolution; 'SAME' padding keeps the spatial size."""
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    """2x2 max pooling with stride 2; halves height and width."""
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


def avg_pool_6x6(x):
    """6x6 average pooling with stride 6."""
    return tf.nn.avg_pool(x, ksize=[1, 6, 6, 1], strides=[1, 6, 6, 1], padding='SAME')


x = tf.placeholder(tf.float32, [None, 24, 24, 3])   # input images
y = tf.placeholder(tf.float32, [None, 10])          # one-hot labels

# Layer 1: 5x5 conv, 3 -> 64 channels, ReLU, 2x2 max pool (24x24 -> 12x12).
# The bias is added before the ReLU (the usual conv -> bias -> activation order).
w_conv1 = weight_variable([5, 5, 3, 64])
b_conv1 = bias_variable([64])
x_image = tf.reshape(x, [-1, 24, 24, 3])
h_pool1 = max_pool_2x2(tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1))

# Layer 2: 5x5 conv, 64 -> 64 channels, ReLU, 2x2 max pool (12x12 -> 6x6).
w_conv2 = weight_variable([5, 5, 64, 64])
b_conv2 = bias_variable([64])
h_pool2 = max_pool_2x2(tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2))

# Layer 3: 5x5 conv, 64 -> 10 channels (one per class), ReLU, 2x2 max pool (6x6 -> 3x3).
w_conv3 = weight_variable([5, 5, 64, 10])
b_conv3 = bias_variable([10])
h_conv3 = max_pool_2x2(tf.nn.relu(conv2d(h_pool2, w_conv3) + b_conv3))

# Average pooling collapses each of the 10 feature maps to a single value.
h_pool3 = avg_pool_6x6(h_conv3)
h_pool3_flat = tf.reshape(h_pool3, [-1, 10])
y_conv = tf.nn.softmax(h_pool3_flat)

cross_entropy = -tf.reduce_sum(y * tf.log(y_conv))
# Fixed: the optimizer lives in tf.train, not tf.trainable_variables.
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

sess = tf.Session()
sess.run(tf.global_variables_initializer())
tf.train.start_queue_runners(sess=sess)

for i in range(15000):
    image_batch, label_batch = sess.run([images_train, labels_train])
    # Convert integer labels to one-hot rows by indexing an identity matrix.
    label_b = np.eye(10, dtype=float)[label_batch]
    train_step.run(feed_dict={x: image_batch, y: label_b}, session=sess)
    if i % 200 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: image_batch, y: label_b}, session=sess)
        print("step %d,training accuracy %g" % (i, train_accuracy))
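To see why the final reshape to [-1, 10] works, it helps to trace the spatial size of the feature maps through the network above. The sketch below relies on the rule that a 'SAME'-padded convolution or pooling op in TensorFlow produces ceil(input_size / stride) outputs per spatial dimension; the helper `same_out` and the layer labels are names made up for this illustration.

```python
import math

def same_out(size, stride):
    """Spatial output size of a 'SAME'-padded op: ceil(size / stride)."""
    return math.ceil(size / stride)

size = 24  # height and width of the input images
trace = [("input", size)]
for name, stride in [("conv1", 1), ("pool1", 2),
                     ("conv2", 1), ("pool2", 2),
                     ("conv3", 1), ("pool3", 2),
                     ("avg_pool_6x6", 6)]:
    size = same_out(size, stride)
    trace.append((name, size))

for name, s in trace:
    print(f"{name:>12}: {s}x{s}")
```

The sizes go 24, 24, 12, 12, 6, 6, 3, 1: the last layer leaves a 1*1 map with 10 channels per image, which is exactly the 10-element vector passed to softmax.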
