How to Do Image Recognition with TensorFlow

The weak comfort themselves with tears; the strong temper themselves with sweat.

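The previous article covered handwritten digit recognition with TensorFlow. This one is a step up: it explains how to use TensorFlow to classify images from several categories. The dataset is CIFAR-10, a classic benchmark (you can find its official site with a quick search). It contains 60,000 color images of 32x32 pixels, 50,000 for training and 10,000 for testing, spread over 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.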

The first step is to download the TensorFlow Models repository. You can download it from GitHub or clone it with git:

git clone https://github.com/tensorflow/models.git

Import the libraries, then define batch_size, the number of training steps max_steps, and the path where the CIFAR-10 data is stored.

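# Assumes the cloned models repository is importable under this package path;
# adjust the import to match where tutorials/image/cifar10 lives on your machine.
from tensorflow.models.tutorials.image.cifar10 import cifar10, cifar10_input
import tensorflow as tf
import numpy as np
import time

max_steps=3000              # number of training steps
batch_size=128              # examples per batch
data_dir='/cifar10_data'    # directory holding the CIFAR-10 data files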

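Define a function that initializes weights. It draws the initial values from a truncated normal distribution with tf.truncated_normal and attaches an L2 loss to the weight. L2 regularization penalizes large weights, which keeps any single feature from dominating and helps against overfitting. The argument w1 controls the size of the L2 loss: tf.nn.l2_loss computes the L2 loss of the weight, and tf.multiply scales it by w1 to give the final weight loss. tf.add_to_collection then stores the weight loss in a collection named losses, which is used later to compute the network's total loss.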

def variable_with_weight_loss(shape,stddev,w1):
    var = tf.Variable(tf.truncated_normal(shape,stddev=stddev))
    if w1 is not None:
        weight_loss=tf.multiply(tf.nn.l2_loss(var),w1,name='weight_loss')
        tf.add_to_collection('losses',weight_loss)
    return var
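To make the weight loss concrete, here is a minimal pure-Python sketch of the arithmetic, assuming the documented behavior of tf.nn.l2_loss (half the sum of squares). The helper names and the toy weights are made up for illustration and are not part of the tutorial code.

```python
def l2_loss(values):
    """Half the sum of squares, the quantity tf.nn.l2_loss is documented to return."""
    return sum(v * v for v in values) / 2.0


def weight_loss(values, w1):
    """Scale the L2 penalty by the coefficient w1."""
    return l2_loss(values) * w1


toy_weights = [0.5, -1.0, 2.0]  # hypothetical weights, for illustration only
penalty = weight_loss(toy_weights, w1=0.004)
no_penalty = weight_loss(toy_weights, w1=0.0)  # w1 = 0 switches the penalty off
print(penalty, no_penalty)
```

With w1 set to 0 the term still lands in the losses collection but contributes nothing, which is how the convolutional layers later opt out of regularization.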

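Use cifar10 to download the dataset, then use the distorted_inputs function in cifar10_input to produce the training data, both the features and their labels. What it returns are already-wrapped tensors; each evaluation yields batch_size examples. The function applies data augmentation: random horizontal flips, a random crop of a 24x24 patch, random brightness and contrast changes, and standardization of the data. My earlier article covers this in more detail. Because augmentation is computationally heavy, distorted_inputs creates 16 separate threads for the work and schedules them with TensorFlow queues.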

cifar10.maybe_download_and_extract()

images_train,labels_train=cifar10_input.distorted_inputs(data_dir=data_dir,batch_size=batch_size)
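The augmentation steps can be sketched in plain Python. This is only an illustration of the idea, not what distorted_inputs runs internally: it works on a single-channel image stored as a list of rows, skips the brightness and contrast changes, and the floor on the standard deviation is an assumption of this sketch to avoid dividing by zero.

```python
import random


def random_crop(image, size, rng):
    """Cut a random size x size window out of a 2-D image (list of rows)."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def random_flip_left_right(image, rng):
    """Mirror the image horizontally with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return image


def standardize(image):
    """Shift to zero mean and scale to unit variance over all pixels."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = max(var ** 0.5, 1.0 / len(pixels) ** 0.5)  # floor avoids division by zero
    return [[(p - mean) / std for p in row] for row in image]


rng = random.Random(0)
# Toy 32x32 single-channel image whose pixel value encodes its position.
image = [[float(r * 32 + c) for c in range(32)] for r in range(32)]
cropped = random_crop(image, 24, rng)
flipped = random_flip_left_right(cropped, rng)
standardized = standardize(flipped)
print(len(standardized), len(standardized[0]))
```

Each training step sees a differently cropped and flipped version of the same picture, which is why a 32x32 dataset ends up feeding a 24x24 network input.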

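Then use cifar10_input.inputs to generate the test data. Create placeholders for the features and the labels. Because batch_size is used later when defining the network, the first dimension of the data shape has to be fixed in advance; the images are 24x24 with 3 color channels.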

images_test,labels_test=cifar10_input.inputs(eval_data=True,data_dir=data_dir,batch_size=batch_size)

image_holder=tf.placeholder(tf.float32,[batch_size,24,24,3])
label_holder=tf.placeholder(tf.int32,[batch_size])

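Now build the first convolutional layer. Use the variable_with_weight_loss function defined earlier to create and initialize the kernel parameters. The layer uses 5x5 kernels over 3 color channels with 64 kernels in total, and the standard deviation for weight initialization is 0.05. No L2 regularization is applied to the first convolutional layer, so w1 is 0. tf.nn.conv2d convolves the input with stride 1 and SAME padding. The layer's biases are all initialized to 0 and added to the convolution result, and a ReLU activation provides the non-linearity. After the ReLU comes a max-pooling layer with a 3x3 window and a 2x2 stride, followed by tf.nn.lrn, which normalizes each response by the activity of neighboring channels so that relatively large responses stand out and small ones are suppressed.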

weight1=variable_with_weight_loss(shape=[5,5,3,64],stddev=5e-2,w1=0.0)
kernel1=tf.nn.conv2d(image_holder,weight1,[1,1,1,1],padding='SAME')
bias1=tf.Variable(tf.constant(0.0,shape=[64]))
conv1=tf.nn.relu(tf.nn.bias_add(kernel1,bias1))
pool1=tf.nn.max_pool(conv1,ksize=[1,3,3,1],strides=[1,2,2,1],padding='SAME')
norm1=tf.nn.lrn(pool1,4,bias=1.0,alpha=0.01/9.0,beta=0.75)
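For readers who want to see what local response normalization does to the numbers, here is a rough pure-Python sketch for the channel vector of a single pixel. It follows the formula in the tf.nn.lrn documentation (each value divided by (bias + alpha * sum of squares in a window of neighboring channels) raised to beta); the function name and the toy activations are hypothetical.

```python
def lrn_across_channels(channels, depth_radius=4, bias=1.0, alpha=0.01 / 9.0, beta=0.75):
    """Normalize each channel by the squared activity of its neighboring channels."""
    out = []
    for d, value in enumerate(channels):
        lo = max(0, d - depth_radius)
        hi = min(len(channels), d + depth_radius + 1)
        sqr_sum = sum(c * c for c in channels[lo:hi])
        out.append(value / (bias + alpha * sqr_sum) ** beta)
    return out


activations = [0.0, 1.0, 10.0, 1.0, 0.0]  # toy ReLU outputs at one pixel
normalized = lrn_across_channels(activations)
print(normalized)
```

Because the bias is 1 and alpha is positive, the denominator is never below 1, so a value is only ever scaled down, and more so when its neighbors are strongly active.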

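The second convolutional layer follows much the same steps as the first. The differences are that the biases are all initialized to 0.1 and that the order of the max-pooling layer and the LRN layer is swapped: LRN comes first, then pooling.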

weight2=variable_with_weight_loss(shape=[5,5,64,64],stddev=5e-2,w1=0.0)
kernel2=tf.nn.conv2d(norm1,weight2,[1,1,1,1],padding='SAME')
bias2=tf.Variable(tf.constant(0.1,shape=[64]))
conv2=tf.nn.relu(tf.nn.bias_add(kernel2,bias2))
norm2=tf.nn.lrn(conv2,4,bias=1.0,alpha=0.01/9.0,beta=0.75)
pool2=tf.nn.max_pool(norm2,ksize=[1,3,3,1],strides=[1,2,2,1],padding='SAME')

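Next comes a fully connected layer. First flatten the previous output: tf.reshape turns each example into a one-dimensional vector, and get_shape gives the length of the flattened data. Then initialize the fully connected layer's weights with variable_with_weight_loss. This layer has 384 hidden units, the standard deviation of the normal distribution is 0.04, and the biases are initialized to 0.1. Note that we do not want the fully connected layer to overfit, so the weight loss coefficient is set to a non-zero value, 0.004, which puts all parameters of this layer under the L2 constraint. Finally, a ReLU activation provides the non-linearity.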

reshape=tf.reshape(pool2,[batch_size,-1])
dim=reshape.get_shape()[1].value
weight3=variable_with_weight_loss(shape=[dim,384],stddev=0.04,w1=0.004)
bias3=tf.Variable(tf.constant(0.1,shape=[384]))
local3=tf.nn.relu(tf.matmul(reshape,weight3)+bias3)
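The value of dim can be worked out by hand. The sketch below assumes the usual rule for SAME padding, where the output size is the input size divided by the stride, rounded up; the helper name is made up for illustration.

```python
import math


def same_padding_output(size, stride):
    """Spatial output size under SAME padding: ceil(size / stride)."""
    return int(math.ceil(size / float(stride)))


side = 24                             # input images are 24x24
side = same_padding_output(side, 1)   # conv1, stride 1
side = same_padding_output(side, 2)   # pool1, stride 2
side = same_padding_output(side, 1)   # conv2, stride 1
side = same_padding_output(side, 2)   # pool2, stride 2
channels = 64                         # number of kernels in conv2
dim = side * side * channels
weight3_params = dim * 384            # size of the first fully connected weight matrix
print(side, dim, weight3_params)
```

So each example is flattened to a vector of length 2304, and the first fully connected layer alone holds 884,736 weights, which is one reason it gets an L2 penalty while the convolutional layers do not.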

Add another fully connected layer, halving the number of hidden units to 192.

weight4=variable_with_weight_loss(shape=[384,192],stddev=0.04,w1=0.004)
bias4=tf.Variable(tf.constant(0.1,shape=[192]))
local4=tf.nn.relu(tf.matmul(local3,weight4)+bias4)

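Create the final layer. Create the weight first, setting the standard deviation of its normal distribution to the reciprocal of the number of units in the previous hidden layer, and leave it out of the L2 regularization.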

weight5=variable_with_weight_loss(shape=[192,10],stddev=1/192.0,w1=0.0)
bias5=tf.Variable(tf.constant(0.0,shape=[10]))
logits=tf.add(tf.matmul(local4,weight5),bias5)

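Next, compute the CNN's loss. tf.nn.sparse_softmax_cross_entropy_with_logits combines the softmax and the cross-entropy computation, tf.reduce_mean averages the cross-entropy over the batch, and tf.add_to_collection adds that cross-entropy loss to the losses collection. Finally, tf.add_n sums all the losses in the collection.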

def loss(logits,labels):
    labels=tf.cast(labels,tf.int64)
    cross_entropy=tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,labels=labels,name='cross_entropy_per_example')
    cross_entropy_mean=tf.reduce_mean(cross_entropy,name='cross_entropy')
    tf.add_to_collection('losses',cross_entropy_mean)
    return tf.add_n(tf.get_collection('losses'),name='total_loss')
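As a sanity check on what this function computes, here is a small pure-Python sketch of the same arithmetic: per-example softmax cross-entropy from raw logits and an integer label, the batch mean, and the sum with the weight penalties. The toy logits and the two weight-loss values are invented for illustration.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """Cross-entropy of one example: log-sum-exp of the logits minus the true-class logit."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


batch_logits = [[0.0] * 10, [5.0] + [0.0] * 9]  # toy logits for two examples
batch_labels = [3, 0]                           # integer class ids, not one-hot
per_example = [sparse_softmax_cross_entropy(l, y)
               for l, y in zip(batch_logits, batch_labels)]
cross_entropy_mean = sum(per_example) / len(per_example)

weight_losses = [0.01, 0.02]  # hypothetical L2 penalties already in the collection
total_loss = cross_entropy_mean + sum(weight_losses)
print(per_example, total_loss)
```

An untrained network that scores all 10 classes equally has a cross-entropy of ln(10), about 2.30, so a training loss near that value means the model has not learned anything yet.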

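Pass the logits node and label_holder into the loss function to get the final loss.

The optimizer is Adam with a learning rate of 1e-3.

Use tf.nn.in_top_k to compute the top-k accuracy of the output. Here k is 1, that is, the accuracy of the class with the highest score.

Use tf.InteractiveSession to create the default session and initialize all the parameters.

Start the queue-runner threads.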

loss=loss(logits,label_holder)
train_op=tf.train.AdamOptimizer(1e-3).minimize(loss)
top_k_op=tf.nn.in_top_k(logits,label_holder,1)
sess=tf.InteractiveSession()
tf.global_variables_initializer().run()
tf.train.start_queue_runners()
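The top-k check is easy to picture for a single example. The following is a simplified sketch of the idea behind tf.nn.in_top_k, not its implementation: the target counts as a hit when fewer than k classes score strictly higher than it. The function name and scores are illustrative.

```python
def in_top_k(scores, target, k=1):
    """True when the target class is among the k highest-scoring classes."""
    target_score = scores[target]
    higher = sum(1 for s in scores if s > target_score)
    return higher < k


scores = [0.1, 0.7, 0.2]           # toy scores for three classes
hit_top1 = in_top_k(scores, 1)     # class 1 has the highest score
miss_top1 = in_top_k(scores, 2)    # class 2 is only second best
hit_top2 = in_top_k(scores, 2, k=2)
print(hit_top1, miss_top1, hit_top2)
```

Applied to a whole batch this gives one boolean per example, and summing the booleans gives the number of correct predictions.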

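Now the actual training. In each step, first call the session's run method on images_train and labels_train to get one batch of training data, then feed that batch into the computation of train_op and loss. Record the time each step takes; every 10 steps, print the loss, the training speed in examples per second, and the time spent on one batch. Without a GPU this runs fairly slowly.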

for step in range(max_steps):
    start_time=time.time()
    image_batch,label_batch=sess.run([images_train,labels_train])
    _,loss_value=sess.run([train_op,loss],feed_dict={image_holder:image_batch,label_holder:label_batch})
    duration=time.time()-start_time
    if step % 10==0:
        examples_per_sec=batch_size/duration
        sec_per_batch=float(duration)
        format_str=('step %d,loss=%.2f(%.1f examples/sec; %.3f sec/batch)')
        print(format_str%(step,loss_value,examples_per_sec,sec_per_batch))

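Next, evaluate the model's accuracy on the test set. As in training, run the test batch by batch, count the correct predictions, and finally compute and print the accuracy.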

import math

num_examples=10000
# Number of batches needed to cover the test set, rounded up.
num_iter=int(math.ceil(float(num_examples)/batch_size))
true_count=0
total_sample_count=num_iter*batch_size
step=0
while step < num_iter:
    # Fetch one batch of test images together with their labels.
    image_batch,label_batch=sess.run([images_test,labels_test])
    predictions=sess.run([top_k_op],feed_dict={image_holder:image_batch,label_holder:label_batch})
    # Each True in predictions is one correctly classified example.
    true_count+=np.sum(predictions)
    step+=1
precision=float(true_count)/total_sample_count
print('precision @ 1=%.3f'%precision)
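The batch arithmetic in the evaluation is worth spelling out, since 10,000 is not a multiple of 128. The sketch below only reproduces the counting; the number of correct predictions is a made-up figure used to show the final division, not a result from running the model.

```python
import math

num_examples = 10000
batch_size = 128

num_iter = int(math.ceil(float(num_examples) / batch_size))  # batches to run
total_sample_count = num_iter * batch_size                   # examples actually scored
extra = total_sample_count - num_examples                    # examples beyond one pass

hypothetical_true_count = 7000  # invented for illustration, not a measured result
precision = float(hypothetical_true_count) / total_sample_count
print(num_iter, total_sample_count, extra)
```

The loop therefore runs 79 batches and scores 10,112 examples, so the denominator is slightly larger than the test set; presumably the input queue wraps around and the last batch reuses 112 images from the start.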
