TensorFlow 2: Handwritten Digit Recognition with a Three-Layer Fully Connected Network

0. Introduction to the MNIST dataset

The MNIST dataset comes from the National Institute of Standards and Technology (NIST). The training set consists of handwritten digits from 250 different people, 50% of whom are high school students and 50% are staff of the Census Bureau. The test set contains handwritten digit data in the same proportions.

The MNIST dataset is available from the official MNIST website and consists of four parts:

  • Training set images: train-images-idx3-ubyte.gz
    (47 MB after decompression, contains 60,000 samples)
  • Training set labels: train-labels-idx1-ubyte.gz
    (60 KB after decompression, contains 60,000 labels)
  • Test set images: t10k-images-idx3-ubyte.gz
    (7.8 MB after decompression, contains 10,000 samples)
  • Test set labels: t10k-labels-idx1-ubyte.gz
    (10 KB after decompression, contains 10,000 labels)

The images in both the training set and the test set are 28×28 grayscale images, where each pixel is a value in [0, 255].

Each value in the label set is an integer in [0, 9], indicating the digit shown in the corresponding image.

Example: the digit 3
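To get a feel for the data, here is a minimal sketch (assuming matplotlib is available; the index 7 is an arbitrary choice) that loads MNIST through Keras and displays one digit together with its label:

import matplotlib.pyplot as plt
import tensorflow as tf

# Load MNIST; images are uint8 arrays of shape (28, 28), labels are digits 0-9
(train_images, train_labels), _ = tf.keras.datasets.mnist.load_data()

idx = 7  # arbitrary sample index
plt.imshow(train_images[idx], cmap='gray')
plt.title('label = {}'.format(train_labels[idx]))
plt.show()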

 


1. Code details

import tensorflow as tf

Import the tensorflow module under the alias tf.

 


batch_size = 128
le_r = 0.2

Define the batch size and the learning rate.

 


mnist = tf.keras.datasets.mnist
(ti, tl), (vi, vl) = mnist.load_data()
print('datasets:', ti.shape, tl.shape, vi.shape, vl.shape)

Use TensorFlow's built-in loader to import the MNIST dataset and print the shapes of the data:

datasets: (60000, 28, 28) (60000,) (10000, 28, 28) (10000,)

 


def fun(a, b):
    a = tf.cast(a, dtype=tf.float32)
    b = tf.cast(b, dtype=tf.int64)
    return tf.reshape(a, [-1, 28*28])/255.0, tf.one_hot(b, depth=10)

ti, tl = fun(ti, tl)
vi, _ = fun(vi, vl)

Since ti and vi have shape [-1, 28, 28], we want to flatten each [28, 28] sample into a vector of length 784 so it can be fed into the network. The function fun performs this preprocessing on (ti, tl) and on vi (vl does not need to be converted). The inputs are simply normalized by dividing by 255.0 (the maximum value minus the minimum value of the data), and the labels are one-hot encoded.

After preprocessing, the dimensions of ti and vi are [60000, 784] and [10000, 784] respectively.

 

Example: after normalization, every pixel of the digit-3 image above lies in [0, 1], and its label 3 is expanded to the one-hot vector [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].

For a concrete look at the preprocessing, see the sketch below.
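The following sketch (assuming the fun defined above is in scope) applies the same preprocessing to a single made-up sample; the all-255 image and the label 3 are fabricated inputs used only for illustration:

import tensorflow as tf

# One fake 28x28 image whose pixels are all 255, with label 3
img = tf.fill([1, 28, 28], 255)
lbl = tf.constant([3])

x, y = fun(img, lbl)   # reuse the preprocessing function defined above
print(x.shape)         # (1, 784); every value is 1.0 after dividing by 255
print(y.numpy())       # [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]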

 


d1 = tf.data.Dataset.from_tensor_slices((ti, tl))  # ti and tl are automatically converted to tensors
d1 = d1.shuffle(10000).batch(batch_size)

The tf.data.Dataset.from_tensor_slices() call builds a dataset of (ti, tl) slices. shuffle(10000) shuffles the data with a buffer of 10,000 samples, and batch(batch_size) groups the data into batches of size batch_size, returning an iterable object that yields one batch at a time.
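A quick way to check what the pipeline yields (a sketch, assuming d1 has been built as above) is to pull one batch and print its shapes:

# Take a single batch from the shuffled, batched dataset and inspect it
for x, y in d1.take(1):
    print(x.shape, y.shape)   # (128, 784) (128, 10)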

 


w1 = tf.random.normal([784, 512])
b1 = tf.zeros([512], dtype=tf.float32)
w2 = tf.random.normal([512, 10])
b2 = tf.zeros([10], dtype=tf.float32)

Construct the weight matrices w and bias vectors b for the first and second layers: the biases are initialized to zeros and the weights are drawn from a normal distribution.

Since the input ti has already been reshaped to [-1, 784], the input layer has 784 nodes and the hidden layer has 512 nodes. Because this is a 10-class classification problem, the output layer has 10 nodes representing the digits 0–9; the larger a node's output, the more likely the sample is that digit.

A note on weight initialization, sketched below:
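tf.random.normal defaults to a standard deviation of 1.0, which can saturate the sigmoid and slow training. A common alternative (a sketch, not what the code above uses; w1_alt and w2_alt are illustrative names) is to shrink the standard deviation or use a Glorot initializer:

import tensorflow as tf

# Smaller standard deviation keeps the initial pre-activations near 0
w1_alt = tf.random.normal([784, 512], stddev=0.1)

# Or use the Glorot (Xavier) uniform initializer provided by Keras
glorot = tf.keras.initializers.GlorotUniform()
w2_alt = glorot(shape=[512, 10])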

 


Next, iterate over the dataset performing gradient descent, and periodically report the network's accuracy on the validation set.

for epoch in range(10):
    print('the {0} epoch began'.format(epoch))
    d2 = iter(d1)
    for steps, (x, y) in enumerate(d2):

        with tf.GradientTape() as tape:
            tape.watch([w1, b1, w2, b2])  # record these tensors so the tf.Variable() wrapper can be omitted
            h1 = x@w1 + b1
            a1 = tf.nn.sigmoid(h1)  

            out1 = a1 @ w2 + b2
            loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y, out1, from_logits=True))

        if steps % 100 == 0:
            print(steps, 'finished')
        grads = tape.gradient(loss, [w1, b1, w2, b2])
        w1 = w1 - le_r*grads[0]
        b1 = b1 - le_r*grads[1]
        w2 = w2 - le_r*grads[2]
        b2 = b2 - le_r*grads[3]

    c1 = tf.nn.sigmoid(vi @ w1 + b1)  
    c2 = tf.nn.softmax(c1 @ w2 + b2, axis=1)  

    out2 = tf.cast(tf.argmax(c2, axis=1), dtype=tf.int64)
    acc = tf.reduce_sum(tf.cast(tf.equal(out2, vl), tf.float32))/vl.shape[0]
    print('the {0} epoch finished and the acc ={1}'.format(epoch+1, acc))

 

Viewed separately:

for epoch in range(10):
    print('the {0} epoch began'.format(epoch))
    d2 = iter(d1)
    for steps, (x, y) in enumerate(d2):

First, run a total of 10 epochs, i.e. 10 outer loops. Each outer loop initializes d2 as an iterator over d1 and then steps through it with the inner for loop. For every batch except the last one, x has shape [128, 784] and y has shape [128, 10].

 

        with tf.GradientTape() as tape:
            tape.watch([w1, b1, w2, b2])  # record these tensors so the tf.Variable() wrapper can be omitted
            h1 = x@w1 + b1
            a1 = tf.nn.sigmoid(h1)  # shape [128, 512]

            out1 = a1 @ w2 + b2
            loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y, out1, from_logits=True))

@ denotes matrix multiplication.

TensorFlow's automatic differentiation is used here. tape.watch([w1, b1, w2, b2]) tells the tape to record operations involving w1, b1, w2 and b2, so there is no need to wrap them in tf.Variable().

a1 is the hidden-layer output, obtained by passing h1 through the sigmoid() activation function.
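Alternatively, if the parameters are created as tf.Variable, the tape tracks them automatically and tape.watch() is unnecessary. A standalone sketch of that variant (x_demo and the placeholder loss are made up for illustration, separate from the training code above):

import tensorflow as tf

# With tf.Variable, GradientTape watches trainable variables by default
w = tf.Variable(tf.random.normal([784, 512]))
b = tf.Variable(tf.zeros([512]))
x_demo = tf.random.normal([128, 784])   # stand-in batch, for illustration only

with tf.GradientTape() as tape:
    h = x_demo @ w + b
    loss = tf.reduce_mean(h)            # placeholder loss

dw, db = tape.gradient(loss, [w, b])    # no tape.watch() needed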

 

Finally, define the loss function, which measures how far the network's [-1, 10] output deviates from the [-1, 10] one-hot labels. Mean squared error could be used here, but cross entropy works better as the loss function, because it more accurately reflects the gap between two probability distributions.

Because we have not applied softmax to the output out1 to make it a probability distribution (summing to 1), we set from_logits to True, which makes the loss fall back to softmax_cross_entropy_with_logits_v2() and apply the softmax for us inside the function.
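The sketch below checks that passing raw outputs with from_logits=True gives the same loss as applying tf.nn.softmax first and passing probabilities (the logits and label here are made-up numbers):

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])   # raw network output (made-up)
label  = tf.constant([[0.0, 1.0, 0.0]])   # one-hot label

loss_a = tf.losses.categorical_crossentropy(label, logits, from_logits=True)
loss_b = tf.losses.categorical_crossentropy(label, tf.nn.softmax(logits))
print(loss_a.numpy(), loss_b.numpy())     # the two values agree (up to numerical precision)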

 

Here is the general pattern for TensorFlow's automatic differentiation; second-order derivatives can also be obtained by nesting tapes, as sketched below.

with tf.GradientTape() as tape:
    # build the computation graph
    loss = f(x)
[w_grad] = tape.gradient(loss, [w])
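A minimal sketch of the nested-tape pattern for a second-order derivative, using the made-up function y = x³ (whose first and second derivatives at x = 2 are both 12):

import tensorflow as tf

x = tf.Variable(2.0)

with tf.GradientTape() as tape2:
    with tf.GradientTape() as tape1:
        y = x ** 3
    dy_dx = tape1.gradient(y, x)        # 3 * x^2 = 12.0
d2y_dx2 = tape2.gradient(dy_dx, x)      # 6 * x = 12.0
print(dy_dx.numpy(), d2y_dx2.numpy())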

 


        if steps % 100 == 0:
            print(steps, 'finished')
        grads = tape.gradient(loss, [w1, b1, w2, b2])
        w1 = w1 - le_r*grads[0]
        b1 = b1 - le_r*grads[1]
        w2 = w2 - le_r*grads[2]
        b2 = b2 - le_r*grads[3]

grads = tape.gradient(loss, [w1, b1, w2, b2]) retrieves the partial derivatives of the loss with respect to w1, b1, w2 and b2 (grads = [dw1, db1, dw2, db2]).

The last four lines update the parameters in the negative gradient direction, i.e. gradient descent.

Every 100 batches (100 gradient-descent steps), a progress message is printed.
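The same update can be written with a Keras optimizer instead of the four manual subtractions. A standalone sketch (the parameters must be tf.Variable objects; x_demo and y_demo are stand-in data, not the article's training loop):

import tensorflow as tf

# Standalone sketch: the gradient-descent step written with a Keras optimizer
params = [tf.Variable(tf.random.normal([784, 512])),   # w1
          tf.Variable(tf.zeros([512])),                # b1
          tf.Variable(tf.random.normal([512, 10])),    # w2
          tf.Variable(tf.zeros([10]))]                 # b2

opt = tf.optimizers.SGD(learning_rate=0.2)             # same rate as le_r above

x_demo = tf.random.normal([128, 784])                           # stand-in batch
y_demo = tf.one_hot(tf.zeros([128], dtype=tf.int64), depth=10)  # stand-in labels

with tf.GradientTape() as tape:
    out = tf.nn.sigmoid(x_demo @ params[0] + params[1]) @ params[2] + params[3]
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_demo, out, from_logits=True))

grads = tape.gradient(loss, params)
opt.apply_gradients(zip(grads, params))   # replaces the four manual update lines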

 

    c1 = tf.nn.sigmoid(vi @ w1 + b1)  
    c2 = tf.nn.softmax(c1 @ w2 + b2, axis=1)  

    out2 = tf.cast(tf.argmax(c2, axis=1), dtype=tf.int64)
    acc = tf.reduce_sum(tf.cast(tf.equal(out2, vl), tf.float32))/vl.shape[0]
    print('the {0} epoch finished and the acc ={1}'.format(epoch+1, acc))

At this point one epoch has finished. c1 and c2 form the forward pass over the validation set; c2 has shape [10000, 10], i.e. each row is the network's output for one validation image. Strictly speaking, the softmax here is not required: to obtain the predicted digit we only need the index of the output node with the largest value among the 10 nodes.

The tf.argmax() function is used here to obtain the index of the largest element in each row. The resulting out2 has shape [10000,], corresponding to the predicted digits of the 10,000 validation images.

Then compare out2 with the label set vl, convert the resulting booleans to 1 or 0, and sum them to get the number of correctly predicted images; dividing by vl.shape[0], i.e. 10000, gives the prediction accuracy.
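On a toy scale, the same accuracy computation looks like this (the predictions and labels are made up; tf.reduce_mean is used as an equivalent shortcut for summing and dividing by the count):

import tensorflow as tf

probs  = tf.constant([[0.1, 0.8, 0.1],    # predicted class 1
                      [0.7, 0.2, 0.1],    # predicted class 0
                      [0.2, 0.3, 0.5]])   # predicted class 2
labels = tf.constant([1, 0, 1], dtype=tf.int64)

preds = tf.argmax(probs, axis=1)          # [1, 0, 2]
acc = tf.reduce_mean(tf.cast(tf.equal(preds, labels), tf.float32))
print(acc.numpy())                        # 2 correct out of 3 -> 0.6666667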

 

After running it, you can see that the network predicts handwritten digits with an accuracy of about 91%. A convolutional network can achieve better accuracy, as sketched below.
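As a point of comparison only (a minimal sketch, not the network used in this article), a small Keras convolutional model can push accuracy well above the roughly 91% reached here:

import tensorflow as tf

# A small convolutional network for 28x28 grayscale digits (sketch only)
model = tf.keras.Sequential([
    tf.keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
model.fit(x_train / 255.0, y_train, epochs=3,
          validation_data=(x_test / 255.0, y_test))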

 


2. Code Overview

import tensorflow as tf

batch_size = 128
le_r = 0.2

def fun(a, b):
    a = tf.cast(a, dtype=tf.float32)
    b = tf.cast(b, dtype=tf.int64)
    return tf.reshape(a, [-1, 28*28])/255.0, tf.one_hot(b, depth=10)

mnist = tf.keras.datasets.mnist
(ti, tl), (vi, vl) = mnist.load_data()
print('datasets:', ti.shape, tl.shape, vi.shape, vl.shape)

ti, tl = fun(ti, tl)
vi, _ = fun(vi, vl)

d1 = tf.data.Dataset.from_tensor_slices((ti, tl))  # ti and tl are automatically converted to tensors
d1 = d1.shuffle(10000).batch(batch_size)

w1 = tf.random.normal([784, 512])
b1 = tf.zeros([512], dtype=tf.float32)
w2 = tf.random.normal([512, 10])
b2 = tf.zeros([10], dtype=tf.float32)

for epoch in range(10):
    print('the {0} epoch began'.format(epoch))
    d2 = iter(d1)
    for steps, (x, y) in enumerate(d2):

        with tf.GradientTape() as tape:
            tape.watch([w1, b1, w2, b2])  # record these tensors so the tf.Variable() wrapper can be omitted
            h1 = x@w1 + b1
            a1 = tf.nn.sigmoid(h1)  

            out1 = a1 @ w2 + b2
            loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y, out1, from_logits=True))

        if steps % 100 == 0:
            print(steps, 'finished')
        grads = tape.gradient(loss, [w1, b1, w2, b2])
        w1 = w1 - le_r*grads[0]
        b1 = b1 - le_r*grads[1]
        w2 = w2 - le_r*grads[2]
        b2 = b2 - le_r*grads[3]

    c1 = tf.nn.sigmoid(vi @ w1 + b1)  
    c2 = tf.nn.softmax(c1 @ w2 + b2, axis=1)  

    out2 = tf.cast(tf.argmax(c2, axis=1), dtype=tf.int64)
    acc = tf.reduce_sum(tf.cast(tf.equal(out2, vl), tf.float32))/vl.shape[0]
    print('the {0} epoch finished and the acc ={1}'.format(epoch+1, acc))


Origin blog.csdn.net/weixin_43461724/article/details/101032310