Hands-On AlexNet Image Recognition: Cats vs. Dogs

All of the code comes from the book OpenCV+TensorFlow 深度学习与计算机视觉实战 (Deep Learning and Computer Vision in Practice with OpenCV and TensorFlow).

My blog has already covered the architecture of many neural network models. On the principle that no amount of theory is as memorable as one round of practice, this post takes the cats-vs-dogs project from the final chapter of Wang Xiaohua's book and annotates the whole process in as much detail as possible, to help you understand how a network architecture is realized in code and to make the full modeling workflow of image recognition more concrete.

Processing the dataset
As everyone knows, although deep learning is nominally about training a better network model, in practice a large share of the work goes into processing the data. This section walks through the data-processing steps used in this demo; there are many ways to process data, and this is just one of them.

Step 1: Resizing the images

The raw images in the dataset come in all shapes and sizes, but the model expects uniform input, so the first task is to make sure every image fed to the model has the same dimensions. The code is as follows:

#import the required modules
import cv2
import os
#resize every image under a directory and rewrite it to a fixed location
def rebuild(dir):
	for root, dirs, files in os.walk(dir):
		for file in files:
			filepath = os.path.join(root, file)
			try:
				#read the file, resize it to 227x227 and write it back out
				image = cv2.imread(filepath)
				dim = (227, 227)
				resized = cv2.resize(image, dim)
				path = 'C:\\cat_and_dog\\dog_r\\' + file
				cv2.imwrite(path, resized)
			except:
				#delete files that cannot be read or resized
				print(filepath)
				os.remove(filepath)

The function takes the root directory of the image set: os.walk traverses the folder tree, the inner loop rebuilds the full path of each image file, and each resized image is written to the given destination.
Note that the read/resize/write sequence sits inside a try block, because a dataset of this size inevitably contains corrupt images. When one of them raises an exception, the simplest remedy is to skip it and keep going, so the except block deletes the offending file with os.remove.
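For completeness, a call might look like the line below; the source directory is an assumption (the destination C:\cat_and_dog\dog_r is hard-coded inside rebuild):

rebuild('C:\\cat_and_dog\\dog')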

Step 2: Converting the image dataset to TensorFlow's native format

TensorFlow's native format is the TFRecord format.
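Strictly speaking, the get_file function below only builds shuffled lists of image paths and labels; it never actually writes a TFRecord file. For reference, a minimal sketch of producing one with the TF 1.x API might look like this (write_tfrecord and its feature names are illustrative, not from the book):

import tensorflow as tf

def write_tfrecord(image_list, label_list, out_path):
	#serialize each (image, label) pair into a tf.train.Example record
	writer = tf.python_io.TFRecordWriter(out_path)
	for img_path, label in zip(image_list, label_list):
		with open(img_path, 'rb') as f:
			img_raw = f.read() #raw JPEG bytes, decoded again at training time
		example = tf.train.Example(features = tf.train.Features(feature = {
			'image_raw': tf.train.Feature(bytes_list = tf.train.BytesList(value = [img_raw])),
			'label': tf.train.Feature(int64_list = tf.train.Int64List(value = [int(label)]))}))
		writer.write(example.SerializeToString())
	writer.close()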

import os
import numpy as np
import tensorflow as tf

def get_file(file_dir):
	#two lists: images holds the image file paths, temp holds the sub-folder paths
	images = []
	temp = []
	for root, sub_folders, files in os.walk(file_dir):
		#image file paths
		for name in files:
			images.append(os.path.join(root, name))
		#sub-folder paths, one per class
		for name in sub_folders:
			temp.append(os.path.join(root, name))
		print(files)

	#assign labels based on the folder names: images under 'cat' get 0, the rest get 1
	labels = []
	for one_folder in temp:
		n_img = len(os.listdir(one_folder))
		letter = one_folder.split('\\')[-1]

		if letter == 'cat':
			labels = np.append(labels, n_img*[0])
		else:
			labels = np.append(labels, n_img*[1])

	#shuffle the image list and label list together, keeping each path paired with its label
	temp = np.array([images, labels])
	temp = temp.transpose()
	np.random.shuffle(temp)

	image_list = list(temp[:, 0])
	label_list = list(temp[:, 1])
	label_list = [int(float(i)) for i in label_list]

	return image_list, label_list

The code above first reads the locations of the dataset files, then labels each image 0 or 1 according to the folder it sits in; with more classes you would extend the labels in the same pattern. The file paths and labels are stored in arrays, and NumPy's transpose and shuffle rebuild a matrix pairing each file path with its label, which is returned as two lists.
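A hypothetical call, assuming the resized images are grouped into one sub-folder per class (the layout that the split('\\') logic above relies on):

#assumed layout:
#  c:\cat_and_dog_r\cat\*.jpg
#  c:\cat_and_dog_r\dog\*.jpg
image_list, label_list = get_file('c:\\cat_and_dog_r')
print(image_list[0], label_list[0]) #a shuffled image path and its 0/1 label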

Step 3: Converting the image-path dataset to TensorFlow's native format

For a dataset of modest size, we can convert the whole thing to TensorFlow's native format and feed it into the model for training. But if the dataset is very large, that conversion becomes a huge job that consumes enormous resources and causes a cascade of problems.
So in practice, besides converting the dataset itself, another common approach is to convert the set of file paths to the native format instead: each step reads a batch of paths from it, and the images behind those paths are then loaded inside the model to form a batch of the required size on the fly.

def get_batch(image_list, label_list, img_width, img_height, batch_size, capacity):
	image = tf.cast(image_list, tf.string)
	label = tf.cast(label_list, tf.int32)

	input_queue = tf.train.slice_input_producer([image, label])

	label = input_queue[1]
	image_contents = tf.read_file(input_queue[0])
	image = tf.image.decode_jpeg(image_contents, channels = 3)

	image = tf.image.resize_image_with_crop_or_pad(image, img_width, img_height)
	image = tf.image.per_image_standardization(image) #standardize the image
	image_batch, label_batch = tf.train.batch([image, label], batch_size = batch_size, num_threads = 64, capacity = capacity)
	label_batch = tf.reshape(label_batch, [batch_size])
	return image_batch, label_batch

The get_batch(image_list, label_list, img_width, img_height, batch_size, capacity) function takes six parameters. The first two are the image list and label list (generated as described in the code above). img_width and img_height are the dimensions of the produced images, set to whatever the model requires. batch_size and capacity are the number of images per batch and the maximum number of elements the queue holds in memory, both tuned to your hardware.
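Note that tf.train.slice_input_producer and tf.train.batch only assemble a queue pipeline; nothing actually flows until queue runners are started inside a session. A minimal sketch of pulling one batch, assuming image_list and label_list come from get_file as above:

image_batch, label_batch = get_batch(image_list, label_list, 227, 227, 200, 2048)

with tf.Session() as sess:
	sess.run(tf.global_variables_initializer())
	coord = tf.train.Coordinator()
	threads = tf.train.start_queue_runners(sess = sess, coord = coord)
	imgs, lbls = sess.run([image_batch, label_batch])
	print(imgs.shape, lbls.shape) #(200, 227, 227, 3) (200,)
	coord.request_stop()
	coord.join(threads)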

Step 4: Building and training the model

First, import the required modules:

#import the required modules
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import time
import create_and_read_TFRecord2 as reader2
import os

# the cats-vs-dogs dataset can be downloaded from http://www.kaggle.com/c/dogs-vs-cats
#this is the directory we wrote to in step 1, where every image has already been resized to 227x227
X_train, y_train = reader2.get_file('c:\\cat_and_dog_r')

image_batch, label_batch = reader2.get_batch(X_train, y_train, 227, 227, 200, 2048)

#batch_norm applies batch normalization to a layer's outputs
def batch_norm(inputs, is_training, is_conv_out = True, decay = 0.999):
	scale = tf.Variable(tf.ones([inputs.get_shape()[-1]]))
	beta = tf.Variable(tf.zeros([inputs.get_shape()[-1]]))
	pop_mean = tf.Variable(tf.zeros([inputs.get_shape()[-1]]), trainable = False)
	pop_var = tf.Variable(tf.ones([inputs.get_shape()[-1]]), trainable = False)

	if is_training:
		if is_conv_out:
			batch_mean, batch_var = tf.nn.moments(inputs,[0,1,2])
		else:
			batch_mean, batch_var = tf.nn.moments(inputs,[0])

		train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1-decay))
		train_var = tf.assign(pop_var, pop_var * decay +batch_var * (1-decay))
		with tf.control_dependencies([train_mean, train_var]):
			return tf.nn.batch_normalization(inputs, batch_mean, batch_var, beta, scale, 0.001)
	else:
		return tf.nn.batch_normalization(inputs, pop_mean, pop_var, beta, scale, 0.001)
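As far as this excerpt shows, batch_norm is defined but never actually called in the model below. If you wanted to wire it in, a sketch for the first convolutional layer might look like this (reusing names defined in the next step, with the is_training flag hard-coded for illustration):

#hypothetical placement: normalize conv1's output before the ReLU
conv1 = tf.nn.conv2d(x_image, W_conv['conv1'], strides = [1, 4, 4, 1], padding = 'VALID')
conv1 = tf.nn.bias_add(conv1, b_conv['conv1'])
conv1 = batch_norm(conv1, True) #True during training, False at inference
conv1 = tf.nn.relu(conv1)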

Next, set the model parameters:

with tf.device('/cpu:0'):
	learning_rate = 1e-4
	training_iters = 200
	batch_size = 200
	display_step = 5
	n_classes = 2
	n_fc1 = 4096
	n_fc2 = 2048

Then build the model:

	x = tf.placeholder(tf.float32, [None, 227, 227, 3])
	y = tf.placeholder(tf.float32, [None, n_classes])

	W_conv = {'conv1': tf.Variable(tf.truncated_normal([11, 11, 3, 96], stddev = 0.0001)),
		'conv2': tf.Variable(tf.truncated_normal([5, 5, 96, 256], stddev = 0.01)),
		'conv3': tf.Variable(tf.truncated_normal([3, 3, 256, 384], stddev = 0.01)),
		'conv4': tf.Variable(tf.truncated_normal([3, 3, 384, 384], stddev = 0.01)),
		'conv5': tf.Variable(tf.truncated_normal([3, 3, 384, 256], stddev = 0.01)),
		'fc1': tf.Variable(tf.truncated_normal([6 * 6 * 256, n_fc1], stddev = 0.1)),
		'fc2': tf.Variable(tf.truncated_normal([n_fc1, n_fc2], stddev = 0.1)),
		'fc3': tf.Variable(tf.truncated_normal([n_fc2, n_classes], stddev = 0.1))}

	b_conv = {'conv1': tf.Variable(tf.constant(0.0, dtype = tf.float32, shape = [96])),
			 'conv2': tf.Variable(tf.constant(0.1, dtype = tf.float32, shape = [256])),
			 'conv3': tf.Variable(tf.constant(0.1, dtype = tf.float32, shape = [384])),
			 'conv4': tf.Variable(tf.constant(0.1, dtype = tf.float32, shape = [384])),
			 'conv5': tf.Variable(tf.constant(0.1, dtype = tf.float32, shape = [256])),
			 'fc1': tf.Variable(tf.constant(0.1, dtype = tf.float32, shape = [n_fc1])),
			 'fc2': tf.Variable(tf.constant(0.1, dtype = tf.float32, shape = [n_fc2])),
			 'fc3': tf.Variable(tf.constant(0.0, dtype = tf.float32, shape = [n_classes]))}

	x_image = tf.reshape(x, [-1, 227, 227,3])

	#convolutional layer 1
	conv1 = tf.nn.conv2d(x_image, W_conv['conv1'], strides = [1, 4, 4, 1], padding = 'VALID')
	conv1 = tf.nn.bias_add(conv1, b_conv['conv1'])
	conv1 = tf.nn.relu(conv1)
	#pooling layer 1
	pool1 = tf.nn.avg_pool(conv1, ksize = [1, 3, 3, 1], strides = [1, 2, 2, 1], padding = 'VALID')
	#LRN layer (Local Response Normalization)
	norm1 = tf.nn.lrn(pool1, 5, bias = 1.0, alpha = 0.001/9.0, beta = 0.75)

	#convolutional layer 2
	conv2 = tf.nn.conv2d(norm1, W_conv['conv2'], strides = [1, 1, 1, 1], padding = 'SAME')
	conv2 = tf.nn.bias_add(conv2, b_conv['conv2'])
	conv2 = tf.nn.relu(conv2)
	#pooling layer 2
	pool2 = tf.nn.avg_pool(conv2,ksize = [1, 3, 3, 1], strides = [1, 2, 2, 1], padding = 'VALID')
	#LRN layer (Local Response Normalization)
	norm2 = tf.nn.lrn(pool2, 5, bias = 1.0, alpha = 0.001/9.0, beta = 0.75)

	#convolutional layer 3
	conv3 = tf.nn.conv2d(norm2, W_conv['conv3'], strides = [1, 1, 1, 1], padding = 'SAME')
	conv3 = tf.nn.bias_add(conv3, b_conv['conv3'])
	conv3 = tf.nn.relu(conv3)

	#convolutional layer 4
	conv4 = tf.nn.conv2d(conv3, W_conv['conv4'], strides = [1, 1, 1, 1], padding = 'SAME')
	conv4 = tf.nn.bias_add(conv4, b_conv['conv4'])
	conv4 = tf.nn.relu(conv4)
	#convolutional layer 5
	conv5 = tf.nn.conv2d(conv4, W_conv['conv5'], strides = [1, 1, 1, 1], padding = 'SAME')
	conv5 = tf.nn.bias_add(conv5, b_conv['conv5'])
	conv5 = tf.nn.relu(conv5)

	#pooling layer 5
	pool5 = tf.nn.avg_pool(conv5, ksize = [1, 3, 3, 1], strides = [1, 2, 2, 1], padding = 'VALID')
	#for a 227x227 input, pool5 comes out as 6x6x256, so the flattened feature length is 6 * 6 * 256
	reshape = tf.reshape(pool5, [-1, 6 * 6 * 256])

	#fully connected layer 1
	fc1 = tf.add(tf.matmul(reshape, W_conv['fc1']), b_conv['fc1'])
	fc1 = tf.nn.relu(fc1)
	fc1 = tf.nn.dropout(fc1, 0.5)

	#fully connected layer 2
	fc2 = tf.add(tf.matmul(fc1, W_conv['fc2']), b_conv['fc2'])
	fc2 = tf.nn.relu(fc2)
	fc2 = tf.nn.dropout(fc2, 0.5)

	#fully connected layer 3, i.e. the classification layer
	fc3 = tf.add(tf.matmul(fc2, W_conv['fc3']), b_conv['fc3'])

	#define the loss
	loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = fc3, labels = y))
	optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(loss)
	#evaluate the model
	correct_pred = tf.equal(tf.argmax(fc3, 1), tf.argmax(y, 1))
	accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

init = tf.global_variables_initializer()

#During training, the model's output pipeline is created first, and the data are then read in batches of batch_size. However the data
#are read, the labels must be converted to matrix form before entering the model, so they are one-hot encoded first.
def onehot(labels):
	'''one-hot encoding'''
	n_sample = len(labels)
	n_class = max(labels)+1
	onehot_labels = np.zeros((n_sample, n_class))
	onehot_labels[np.arange(n_sample), labels] = 1
	return onehot_labels
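A quick check of what onehot produces:

print(onehot(np.array([0, 1, 1, 0])))
#[[1. 0.]
# [0. 1.]
# [0. 1.]
# [1. 0.]]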

save_model = './/model//AlexNetModel.ckpt'
def train(epoch):
	with tf.Session() as sess:
		sess.run(init)

		train_writer = tf.summary.FileWriter('.//log', sess.graph) #write the graph and logs for TensorBoard
		saver = tf.train.Saver()

		c = []
		start_time = time.time()

		coord = tf.train.Coordinator()
		threads = tf.train.start_queue_runners(coord = coord)
		step = 0
		for i in range(epoch):
			step = i
			image, label = sess.run([image_batch, label_batch])

			labels = onehot(label)

			sess.run(optimizer, feed_dict = {x:image, y: labels})
			loss_record = sess.run(loss, feed_dict = {x: image, y: labels})
			print('now the loss is %f ' % loss_record)

			c.append(loss_record)
			end_time = time.time()
			print('time: ', (end_time - start_time))
			start_time = end_time
			print('--------------epoch %d is finished---------------' % i)
		print('Optimization Finished!')
		saver.save(sess, save_model)
		print('Model Save Finished!')

		coord.request_stop()
		coord.join(threads)
		plt.plot(c)
		plt.xlabel('Iter')
		plt.ylabel('loss')
		plt.title('lr = %f, ti = %d, bs = %d' % (learning_rate, training_iters, batch_size))
		plt.tight_layout()
		plt.savefig('cat_and_dog_AlexNet.jpg', dpi = 200)
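The excerpt does not show the training call itself; presumably it is launched with something like the line below, reusing the training_iters value set earlier:

train(training_iters) #200 iterations, as configured above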

from PIL import Image

def per_class(imagefile):

	image = Image.open(imagefile)
	image = image.resize((227, 227))
	image_array = np.array(image)

	image = tf.cast(image_array, tf.float32)
	image = tf.image.per_image_standardization(image)
	image = tf.reshape(image, [1, 227, 227, 3])

	saver = tf.train.Saver()
	with tf.Session() as sess:

		save_model = tf.train.latest_checkpoint('.//model')
		saver.restore(sess, save_model)
		image = sess.run(image)
		prediction = sess.run(fc3, feed_dict = {x: image})

		max_index = np.argmax(prediction)
		if max_index == 0:
			return 'cat'
		else:
			return 'dog'
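A hypothetical call on a single test image (the path is illustrative):

print(per_class('C:\\cat_and_dog\\test\\1.jpg')) #prints 'cat' or 'dog'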