CNN algorithm for handwritten digit recognition (MNIST dataset)

The basic process is shown in the figure below:

x (the feature values of the picture): Each picture is represented as a row of 28*28 = 784 values, one per pixel. Every pixel is treated as a feature of the picture, which is intuitive: each pixel affects how the picture looks and what it means, although the size of that effect differs from pixel to pixel.

W (the weights corresponding to the feature values): These values are the core of the model. Each weight expresses how strongly a feature influences the result, and the purpose of training is to find the best set of weights.

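b (bias): an offset added to the weighted sum, so that the output is not forced to be zero when the inputs are zero. (The non-linearity in this network comes from the ReLU activation, not from the bias.)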

y (predicted result): one score per digit for a single sample. For example, a possible result is [ 1.07476616 -4.54194021 2.98073649 -7.42985344 3.29253793 1.96750617 8.59438515 -6.65950203 1.68721473 -0.9658531 ], whose entries are the scores for the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 in order. These raw values are scores rather than probabilities (some are negative); a softmax turns them into probabilities. The index of the largest value is taken as the prediction, so for this array the result is 6 (8.59438515).
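As a small illustration of the step from scores to a prediction, the sketch below applies a softmax to the example scores and takes the index of the largest entry. It assumes NumPy; the helper name softmax and the variable names are chosen here for illustration and are not part of the article's code, and the ninth score is assumed to read 1.68721473.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

# Example scores for the digits 0..9 (ninth value assumed to be 1.68721473)
scores = np.array([1.07476616, -4.54194021, 2.98073649, -7.42985344, 3.29253793,
                   1.96750617, 8.59438515, -6.65950203, 1.68721473, -0.9658531])

probs = softmax(scores)
predicted_digit = int(np.argmax(probs))  # softmax keeps the ordering of the scores
print(predicted_digit)  # 6
```

Because softmax preserves the ordering of the scores, taking the argmax of the raw scores gives the same digit.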

y_ (real result): the label from the MNIST training set, that is, the true digit for each picture. For example, 1 is expressed as [0 1 0 0 0 0 0 0 0 0]. This is called a one-hot vector: a discrete class is represented by a vector that is 1 at the index of the class and 0 everywhere else.
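The one-hot label and the way it is compared with a prediction can be sketched as follows. This is a NumPy illustration with assumed helper names (one_hot, cross_entropy), not the TensorFlow code of this article: it builds the one-hot vector for a label and computes the cross-entropy between that vector and a predicted probability vector.

```python
import numpy as np

def one_hot(label, num_classes=10):
    """Return a vector that is 1.0 at index `label` and 0.0 elsewhere."""
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    # eps guards against log(0)
    return float(-np.sum(y_true * np.log(y_prob + eps)))

y_true = one_hot(1)
print(y_true)  # [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]

# Two made-up predictions: one confident in digit 1, one confident in digit 6
good = np.full(10, 0.01)
good[1] = 0.91
bad = np.full(10, 0.01)
bad[6] = 0.91

loss_good = cross_entropy(y_true, good)
loss_bad = cross_entropy(y_true, bad)
print(loss_good < loss_bad)  # True
```

Only the probability assigned to the true class contributes to the loss, so a confident wrong prediction is penalized much more than a confident correct one.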

MNIST (Modified National Institute of Standards and Technology database) is a computer vision data set and a standard benchmark for machine learning, often used as an entry-level example. It was created by Yann LeCun et al. and is very widely used in machine learning research. It contains 70,000 images of handwritten digits, each 28 pixels high and 28 pixels wide, and each image is single-channel grayscale.

It contains the following four parts:

(1) Training set images: train-images-idx3-ubyte.gz (9.9MB, containing 60,000 samples).

(2) Training set class labels: train-labels-idx1-ubyte.gz (29KB, containing 60,000 class labels).

(3) Test set images: t10k-images-idx3-ubyte.gz (1.6MB, containing 10,000 samples).

(4) Test set class labels: t10k-labels-idx1-ubyte.gz (5KB, containing 10,000 class labels).
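The four files use the IDX binary format: a 4-byte magic number (two zero bytes, a type code, and the number of dimensions), followed by one big-endian 32-bit size per dimension, followed by the raw values. A minimal reader using only the standard library might look like the sketch below; the function names parse_idx and read_idx_gz are assumptions made for this example, and the demonstration uses small synthetic byte strings rather than the real files.

```python
import gzip
import struct

def parse_idx(raw):
    """Parse the bytes of a decompressed IDX file into (shape, flat list of values)."""
    zero, dtype, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or dtype != 0x08:  # 0x08 is the type code for unsigned bytes
        raise ValueError("not an unsigned-byte IDX file")
    header_end = 4 + 4 * ndim
    shape = struct.unpack(">" + "I" * ndim, raw[4:header_end])
    data = list(raw[header_end:])
    return shape, data

def read_idx_gz(path):
    """Read a gzip-compressed IDX file such as train-labels-idx1-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())

# Synthetic "label file": 1 dimension, 3 labels
fake_labels = struct.pack(">HBBI", 0, 0x08, 1, 3) + bytes([7, 2, 1])
label_shape, label_data = parse_idx(fake_labels)
print(label_shape, label_data)  # (3,) [7, 2, 1]

# Synthetic "image file": 3 dimensions, 2 images of 2x2 pixels
fake_images = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 2) + bytes(range(8))
image_shape, image_data = parse_idx(fake_images)
print(image_shape, len(image_data))  # (2, 2, 2) 8
```

For the real training images the shape would be (60000, 28, 28), and for the training labels (60000,).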

The MNIST dataset is constructed from two US NIST datasets. The handwritten digits in the training set were written by 250 people, 50% of them high school students and the other 50% Census Bureau employees. The digits in the test set were written by high school students and Census Bureau employees in the same proportion.

Each image has a corresponding label for the number it represents. The image data information is stored in the image file, and the label data information is stored in the class label file.

The data set is divided into two parts: a training set of 60,000 rows and a test set of 10,000 rows. When the data is loaded with input_data.read_data_sets, the 60,000 training rows are further split into 55,000 rows of training data and 5,000 rows of validation data. The full training set is a tensor with a shape of [60000, 784]. The first dimension, 60000, indexes the picture; the second dimension, 784 = 28 * 28, indexes the pixel within the picture.
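The shapes described above can be checked with a short NumPy sketch. The array of zeros is only a stand-in for the real pixel data, and taking the first 5,000 rows as the validation part is an assumption made for this illustration.

```python
import numpy as np

# Stand-in for the real training images: 60000 rows of 784 pixel values
images = np.zeros((60000, 784), dtype=np.uint8)
images[0, 28 * 3 + 5] = 255  # mark the pixel at row 3, column 5 of picture 0

# Split into validation and training parts (first 5000 rows assumed to be validation)
validation, train = images[:5000], images[5000:]

# One flat row of 784 values becomes a 28x28 picture again (row-major order)
picture = images[0].reshape(28, 28)

print(train.shape, validation.shape, picture.shape)  # (55000, 784) (5000, 784) (28, 28)
print(picture[3, 5])  # 255
```

The flat index 28 * row + column is what links the second dimension of the [60000, 784] tensor to a pixel position in the picture.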

There are ten label classes, 0 to 9, indicating which digit each picture shows. Stored as a plain number, the label is a single value per picture; with one-hot encoding it becomes a ten-dimensional vector. For example, [0, 1, 0, 0, 0, 0, 0, 0, 0, 0] represents label 1, and the labels of the training set form a [60000, 10] numeric matrix.
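For a whole batch of labels, the one-hot matrix and the reverse conversion can be sketched with NumPy as below. The label and prediction values are made up for the example; the accuracy line mirrors the argmax / equal / mean pattern that the TensorFlow code in this article uses.

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])           # made-up digit labels
label_matrix = np.eye(10)[labels]            # one row of the identity matrix per label
recovered = np.argmax(label_matrix, axis=1)  # back from one-hot rows to digits

predicted = np.array([5, 0, 4, 7, 9])        # made-up predictions, one of them wrong
accuracy = float(np.mean(predicted == recovered))

print(label_matrix.shape, recovered, accuracy)  # (5, 10) [5 0 4 1 9] 0.8
```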

MNIST can be visualized: 

import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data  # this tutorial module ships with TensorFlow 1.x
mnist = input_data.read_data_sets("./MNIST_data", one_hot=False)
fig, ax_big = plt.subplots()
for i in range(10):  # read ten images
    x, y = mnist.test.next_batch(1)
    x = x.reshape([28, 28])
    ax = fig.add_subplot(2, 5, i + 1)  # lay the images out in 2 rows and 5 columns
    ax.imshow(x, cmap=plt.cm.gray)
    ax.set_xticks([])  # hide the tick marks of each subplot
    ax.set_yticks([])
ax_big.set_xticks([])  # hide the tick marks of the outer axes
ax_big.set_yticks([])
plt.show()

Visualize 10 MNIST grayscale images as follows: 

The complete code follows, with comments at the places that may be hard to understand:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
from tensorflow.examples.tutorials.mnist import input_data  # download and extract the data set automatically

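# initialize parameters
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    # shape is required and gives the dimensions of the output tensor;
    # stddev=0.1 is the standard deviation of the normal distribution before truncation
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    # every bias starts at 0.1
    return tf.Variable(initial)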

def conv2d(x, W):
    # x is the input image: a float32 or float64 tensor of shape [batch, in_height, in_width, in_channels]
    # W is the filter: a tensor of shape [filter_height, filter_width, in_channels, out_channels]
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # ksize is the size of the pooling window (2x2 here)
    # with padding='VALID' the output size would be N = (imgSize - ksize) / strides + 1;
    # with padding='SAME', as used here, it is N = ceil(imgSize / strides)
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# get the data source
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

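# tf.name_scope() and tf.variable_scope() are two kinds of scope.
# They are usually paired with the two functions that create or fetch variables,
# tf.Variable() and tf.get_variable(), to support variable sharing
# input image: 28*28 = 784 pixels
with tf.name_scope('input'):
    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder('float', [None, 10])  # y_ is the real result (the label)
# A placeholder is one way for TensorFlow to read data: the Python code supplies the values at run time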

with tf.name_scope('image'):
    x_image = tf.reshape(x, [-1, 28, 28, 1])  # -1 means this dimension is inferred: total number of values / 28 / 28 / 1 = number of images
    tf.summary.image('input_image', x_image, 8)
    # writes a summary protobuf containing images; the image summaries are tagged 'tag/image/0', 'tag/image/1', ..., e.g. input/image/0

# the first convolution layer
with tf.name_scope('conv_layer1'):
    W_conv1 = weight_variable([5, 5, 1, 32])  # convolution kernel: 5*5*1, number of kernel: 32
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)  # make convolution, output: 28*28*32

with tf.name_scope('pooling_layer'):
    h_pool1 = max_pool_2x2(h_conv1)  # make pooling, output: 14*14*32

# the second convolution layer
with tf.name_scope('conv_layer2'):
    W_conv2 = weight_variable([5, 5, 32, 64])  # convolution kernel: 5*5, depth: 32, number of kernel: 64
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)  # output: 14*14*64

with tf.name_scope('pooling_layer'):
    h_pool2 = max_pool_2x2(h_conv2)  # output: 7*7*64


# the first fully connected layer
with tf.name_scope('fc_layer3'):
    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])  # size: 1*1024
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)  # output: 1*1024

# dropout
with tf.name_scope('dropout'):
    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)


# the second fully connected layer
# train the model: y = softmax(x * w + b)
with tf.name_scope('output_fc_layer4'):
    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])  # size: 1*10

with tf.name_scope('softmax'):
    y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)  # output: 1*10

with tf.name_scope('lost'):
    cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))   # cross-entropy loss, summed over the batch
    tf.summary.scalar('lost', cross_entropy)

with tf.name_scope('train'):
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
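    # AdamOptimizer is one of TensorFlow's optimizers; 1e-4 is the learning rate
    # To minimize the loss, gradients are computed by backpropagation and applied with a
    # stochastic gradient descent style update (Adam is a variant of it)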

with tf.name_scope('accuracy'):
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
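    tf.summary.scalar('accuracy', accuracy)  # record a scalar summary
    '''
    tf.argmax(y_conv, 1) returns the index of the largest value along axis 1 (one index per row),
    tf.equal() compares two tensors element-wise and returns bool values (True or False),
    tf.cast() converts the bool values to the floats 1.0 or 0.0,
    tf.reduce_mean() computes the mean.
    '''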

merged = tf.summary.merge_all()
train_summary = tf.summary.FileWriter(r'C:\Users\12956\Anaconda3\Lib\site-packages', tf.get_default_graph())
# change the path above to the absolute path of your TensorBoard log directory

# init all variables
init = tf.global_variables_initializer()
# this creates the initialization op; until it is run, the variables are only defined

# run session
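with tf.Session() as sess:
    sess.run(init)
    # create a session and run the graph inside it
    # train data: get w and b
    for i in range(2000):  # train 2000 times
        batch = mnist.train.next_batch(50)
        # feed the network one mini-batch of data at a time

        result, _ = sess.run([merged, train_step], feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        # batch[0] holds the input images (shape [50, 784]) and batch[1] the labels (shape [50, 10])
        # note: keep_prob: 1.0 disables dropout; the commented line below uses 0.5 to apply it during training
        # train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})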

        if i % 100 == 0:
            # train_accuracy = sess.run(accuracy, feed_dict)
            train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})  # no dropout
            print('step %d, training accuracy %g' % (i, train_accuracy))

            # result = sess.run(merged, feed_dict={x: batch[0], y_: batch[1]})
            train_summary.add_summary(result, i)

    train_summary.close()

    print('test accuracy %g' % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
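The layer sizes noted in the comments of the code above (28, 14, 14, 7, and the 7*7*64 input of the first fully connected layer) follow from the padding rules. The pure-Python sketch below traces them; the helper name out_size is an assumption for this example, and the formulas are the usual ones for 'SAME' and 'VALID' padding.

```python
import math

def out_size(n, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer along one axis."""
    if padding == "SAME":
        return math.ceil(n / stride)        # kernel size does not matter
    if padding == "VALID":
        return (n - kernel) // stride + 1   # no padding is added
    raise ValueError("unknown padding: %r" % padding)

size = 28
size = out_size(size, 5, 1, "SAME")  # conv1: 5x5 kernel, stride 1 -> 28
size = out_size(size, 2, 2, "SAME")  # pool1: 2x2 window, stride 2 -> 14
size = out_size(size, 5, 1, "SAME")  # conv2: 5x5 kernel, stride 1 -> 14
size = out_size(size, 2, 2, "SAME")  # pool2: 2x2 window, stride 2 -> 7

flat = size * size * 64  # 64 feature maps after the second convolution
print(size, flat)  # 7 3136
```

With 'VALID' padding the first convolution alone would already shrink the image, since out_size(28, 5, 1, "VALID") is 24.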


 


Origin blog.csdn.net/baidu_41774120/article/details/117380864