Getting Started with Deep Learning (1): Image Recognition with a Convolutional Neural Network (Part 1)

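The convolutional neural network (Convolutional Neural Network, CNN) is an essential topic when getting started with deep learning, and an important part of deep learning theory and practice. To see how a CNN is applied, this article starts with a CNN model for image recognition.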

Application background:

This project recognizes images of handwritten "correct" and "wrong" marks, that is, the familiar "√" and "×". The training data is stored in the file 'checkData.txt'.

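Convolutional neural network structure:
This example uses 3 convolutional layers to abstract features from the input data step by step, followed by 2 fully connected layers that compute feature relationships and weights and pass the result to the output layer. The detailed structure is shown in the figure below:
[Figure: structure of the network; image not available]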
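The way the image shrinks through the three convolutional layers can be checked with a little arithmetic. The sketch below is only an illustration: the helper name `conv_output_size` is made up for this example, and the layer settings (2×2 filters, stride 1, one 'SAME' layer followed by two 'VALID' layers) are read off the code later in this article. It applies the usual TensorFlow padding rules to one spatial axis.

```python
import math


def conv_output_size(input_size, kernel_size, stride, padding):
    """Output length along one spatial axis (hypothetical helper for illustration)."""
    if padding == 'SAME':
        # SAME pads the input so that only the stride shrinks the output
        return math.ceil(input_size / stride)
    if padding == 'VALID':
        # VALID uses no padding, so the filter must fit entirely inside the input
        return (input_size - kernel_size) // stride + 1
    raise ValueError('unknown padding: %s' % padding)


# (kernel size, stride, padding) of the three convolutional layers in the script
layers = [(2, 1, 'SAME'), (2, 1, 'VALID'), (2, 1, 'VALID')]

size = 5          # the input image is 5x5
sizes = [size]
for kernel, stride, padding in layers:
    size = conv_output_size(size, kernel, stride, padding)
    sizes.append(size)

flattened = size * size   # number of values fed to the first fully connected layer

print(sizes)      # [5, 5, 4, 3]
print(flattened)  # 9
```

This is why the script reshapes the last convolution output to 9 values before the 9 → 16 → 3 fully connected layers.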
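Data format:
Each image is represented by a 5×5 two-dimensional matrix. The images are pure black and white, so every value in the matrix is either 0 or 1, where 0 is white and 1 is black. For example:
0 0 0 0 0
0 0 0 0 1
0 1 0 1 0
0 0 1 0 0   represents '√'
0 0 0 0 0

0 0 0 0 0
0 1 0 1 0
0 0 1 0 0
0 1 0 1 0   represents '×'
0 0 0 0 0

0 0 0 0 0
0 0 0 0 0
1 1 1 1 1
0 0 0 0 0   represents "unrecognizable": neither '√' nor '×'
0 0 0 0 0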

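In the file 'checkData.txt', each training record consists of 28 numbers. The first 25 numbers are the 5×5 image data, and the last 3 numbers indicate which class the image belongs to (there are only 3 classes, so the three values stand for '√', '×' and "unrecognizable" respectively).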
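To make the record layout concrete, here is a minimal sketch that splits one record into its image and its label. It assumes the values are comma-separated, which is what the default settings of `pd.read_csv` in the script below expect. The names `parse_record` and `LABELS` are invented for this example, and the sample line is simply the '√' matrix shown above followed by the label 1,0,0.

```python
# Class order as described in the text: '√', '×', unrecognizable
LABELS = ['√', '×', 'unrecognizable']


def parse_record(line):
    """Split one 28-number record into a 5x5 image and a class name."""
    values = [float(v) for v in line.split(',')]
    if len(values) != 28:
        raise ValueError('expected 28 numbers, got %d' % len(values))
    pixels, one_hot = values[:25], values[25:]
    # Rebuild the 5 rows of 5 pixels each
    image = [pixels[r * 5:(r + 1) * 5] for r in range(5)]
    # The position of the largest label value selects the class
    label = LABELS[one_hot.index(max(one_hot))]
    return image, label


# The '√' example from the text, written as one record (assumed comma-separated)
sample = ('0,0,0,0,0,'
          '0,0,0,0,1,'
          '0,1,0,1,0,'
          '0,0,1,0,0,'
          '0,0,0,0,0,'
          '1,0,0')

image, label = parse_record(sample)
for row in image:
    print(' '.join('#' if v else '.' for v in row))
print(label)
```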

The recognition code is as follows:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Written against the TensorFlow 1.x API (tf.placeholder, tf.Session).
import tensorflow as tf
import numpy as np
import pandas as pd
import sys

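# Default number of training rounds and learning rate
roundCount = 100
learnRate = 0.01

# Optional overrides from the command line: -round=N and -learnrate=X
argt = sys.argv[1:]

for v in argt:
    if v.startswith('-round='):
        roundCount = int(v[len('-round='):])
    if v.startswith('-learnrate='):
        learnRate = float(v[len('-learnrate='):])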

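# Each row of checkData.txt holds 25 pixel values followed by 3 label values
fileData = pd.read_csv('checkData.txt', dtype=np.float32, header=None)

# DataFrame.as_matrix() was removed in pandas 1.0; .values works in old and new versions
wholeData = fileData.values

rowCount = wholeData.shape[0]

print('wholeData=%s' % wholeData)
print('rowCount=%s' % rowCount)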

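# One flattened 5x5 image and its one-hot label are fed per training step
x = tf.placeholder(shape=[25], dtype=tf.float32)
yTrain = tf.placeholder(shape=[3], dtype=tf.float32)

# Convolutional layer 1: 2x2 filter, SAME padding keeps the 5x5 size
filter1T = tf.Variable(tf.ones([2, 2, 1, 1]), dtype=tf.float32)
n1 = tf.nn.conv2d(input=tf.reshape(x, [1, 5, 5, 1]), filter=filter1T, strides=[1, 1, 1, 1], padding='SAME')

# Convolutional layer 2: 2x2 filter, VALID padding, 5x5 -> 4x4
filter2T = tf.Variable(tf.ones([2, 2, 1, 1]), dtype=tf.float32)
n2 = tf.nn.conv2d(input=tf.reshape(n1, [1, 5, 5, 1]), filter=filter2T, strides=[1, 1, 1, 1], padding='VALID')

# Convolutional layer 3: 2x2 filter, VALID padding, 4x4 -> 3x3
filter3T = tf.Variable(tf.ones([2, 2, 1, 1]), dtype=tf.float32)
n3 = tf.nn.conv2d(input=tf.reshape(n2, [1, 4, 4, 1]), filter=filter3T, strides=[1, 1, 1, 1], padding='VALID')

# Flatten the 3x3 feature map into a row vector of 9 values
n3f = tf.reshape(n3, [1, 9])

# Fully connected layer 1: 9 -> 16 with tanh activation
w4 = tf.Variable(tf.random_normal([9, 16]), dtype=tf.float32)
b4 = tf.Variable(0, dtype=tf.float32)

n4 = tf.nn.tanh(tf.matmul(n3f, w4) + b4)

# Fully connected layer 2: 16 -> 3, one score per class
w5 = tf.Variable(tf.random_normal([16, 3]), dtype=tf.float32)
b5 = tf.Variable(0, dtype=tf.float32)

n5 = tf.reshape(tf.matmul(n4, w5) + b5, [-1])

# Softmax turns the 3 scores into class probabilities
y = tf.nn.softmax(n5)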

# Cross-entropy between label and prediction; clipping avoids log(0)
loss = -tf.reduce_mean(yTrain * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))
optimizer = tf.train.RMSPropOptimizer(learnRate)

train = optimizer.minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for i in range(roundCount):
    lossSum = 0.0

    # One training step per record: first 25 values are pixels, last 3 the label
    for j in range(rowCount):
        result = sess.run([train, x, yTrain, y, loss], feed_dict={x:wholeData[j][0:25], yTrain:wholeData[j][25:28]})

        lossT = float(result[-1])

        lossSum = lossSum + lossT

        # Report once per round; the average is taken over all rowCount records
        if j == (rowCount - 1):
            print('i: %d, loss: %10.10f, avgLoss: %10.10f' % (i, lossT, lossSum / rowCount))

# Three test images whose true labels are '√', '×' and "unrecognizable"
print(sess.run([y, loss], feed_dict={x: [1, 0, 0, 0, 1,  0, 1, 0, 1, 0,  0, 0, 1, 0, 0,  0, 0, 0, 0, 0,  0, 0, 0, 0, 0],
                                     yTrain: [1, 0, 0]}))
print(sess.run([y, loss], feed_dict={x: [1, 0, 0, 0, 1,  0, 1, 0, 1, 0,  0, 0, 1, 0, 0,  0, 1, 0, 1, 0,  1, 0, 0, 0, 1],
                                     yTrain: [0, 1, 0]}))
print(sess.run([y, loss], feed_dict={x: [0, 0, 0, 0, 0,  0, 0, 0, 0, 0,  0, 0, 0, 0, 0,  1, 1, 1, 1, 1,  0, 0, 0, 0, 0],
                                     yTrain: [0, 0, 1]}))

Save this code as 'conv3.py' and run it from cmd:

python conv3.py -round=10000 -learnrate=0.0001   # 10000 training rounds, learning rate 0.0001

The output is:
[Figure: screenshot of the program output; image not available]
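From the last 3 lines of the output, the largest value in the first test result is 0.8422765, so '√' has the highest probability. By the same reasoning, the second result is '×' and the third is '√'. Compare this with the true labels of the input images in the code: the first is '√', the second is '×' and the third is "unrecognizable". The third prediction is therefore wrong, which shows that the accuracy of this model still needs to be improved.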
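Reading a prediction this way amounts to taking the position of the largest probability. A minimal sketch of that step follows; the function name `decode` and the probability values are invented for illustration and are not the output of the run described above.

```python
# Class order used by the labels in the training data: '√', '×', unrecognizable
LABELS = ['√', '×', 'unrecognizable']


def decode(probabilities):
    """Return the most likely class name and its probability."""
    best = max(range(len(probabilities)), key=lambda k: probabilities[k])
    return LABELS[best], probabilities[best]


# Hypothetical softmax output, made up for this example
example = [0.7, 0.2, 0.1]
label, confidence = decode(example)
print(label, confidence)
```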
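Possible ways to improve it include:
(1) Enlarge the training dataset (this example has only 15 training samples, which is too few).
(2) Add more convolutional layers, to abstract and extract higher-level features.
(3) Increase the number of pixels per image. This example uses 5×5; a finer grid would carry more data per image.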
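For point (1), one possible way to obtain more samples without drawing new ones is to shift existing images by one pixel. This is not something the article's code does. It is a sketch under the assumption that a mark moved by one pixel still belongs to the same class, and the helper names `shift` and `augment` are made up for this example.

```python
def shift(image, dr, dc):
    """Shift a square 0/1 image by dr rows and dc columns.

    Vacated cells become white (0). Returns None if a black pixel
    would fall outside the image, so no stroke is ever cut off.
    """
    size = len(image)
    out = [[0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            if image[r][c]:
                nr, nc = r + dr, c + dc
                if not (0 <= nr < size and 0 <= nc < size):
                    return None
                out[nr][nc] = 1
    return out


def augment(image):
    """Collect the distinct one-pixel shifts of an image (the original included)."""
    variants = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            shifted = shift(image, dr, dc)
            if shifted is not None and shifted not in variants:
                variants.append(shifted)
    return variants


# The '√' example from the data format section
check = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 1],
         [0, 1, 0, 1, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0]]

variants = augment(check)
print(len(variants))
```

Each variant would keep the label of the image it came from when written back as a 28-number record.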

以卿妈妈Dpp
