TensorFlow MNIST Beginner Learning

MNIST

MNIST is an entry-level computer vision dataset that contains a large number of images of handwritten digits.

The dataset contains images and their corresponding labels. TensorFlow provides this dataset, and we can import it as follows:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
print(mnist)

 The output is as follows:

Extracting MNIST_data/train-images-idx3-ubyte.gz

Extracting MNIST_data/train-labels-idx1-ubyte.gz

Extracting MNIST_data/t10k-images-idx3-ubyte.gz

Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

Datasets(train=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x101707ef0>, validation=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x1016ae4a8>, test=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x1016f9358>)

Here the program first downloads the MNIST dataset, then unzips it into the MNIST_data/ folder, and finally prints the dataset object.

The dataset contains a 55,000-example training set (mnist.train), a 5,000-example validation set (mnist.validation), and a 10,000-example test set (mnist.test), stored in the four compressed files listed in the output above.

As mentioned earlier, each MNIST data unit consists of two parts: an image of a handwritten digit and a corresponding label. Let's call the images xs and the labels ys; both the training set and the test set contain xs and ys. For example, the training images are mnist.train.images and the training labels are mnist.train.labels. Each image is 28 × 28 pixels, i.e. 784 pixels in total, which we can flatten into a vector of length 784.

Therefore, the training images can be represented as a matrix of shape [55000, 784]: the first dimension indexes the images in the training set, and the second dimension holds the flattened pixel values of each image.
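
As a quick sanity check, we can inspect these shapes directly; this is a minimal sketch, assuming the mnist object loaded above:

print(mnist.train.images.shape)       # (55000, 784)
print(mnist.train.labels.shape)       # (55000, 10), one-hot labels
print(mnist.validation.images.shape)  # (5000, 784)
print(mnist.test.images.shape)        # (10000, 784)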

Softmax

Softmax can be regarded as an activation function or link function: it converts the output of the linear function we define into the form we want, namely a probability distribution over the 10 digit classes. Given an image, its score for each digit can thus be converted into a probability by the Softmax function, which can be defined as:

\mathrm{softmax}(x) = \mathrm{normalize}(\exp(x))

Expanding the right-hand side of this equation, we get:

\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}

For example, suppose we want to determine what animal a picture shows, with three possible results: cat, dog, and chicken, and suppose the computed scores are 3.2, 5.1, and -1.7. Softmax first exponentiates each score, giving roughly 24.5, 164.0, and 0.18, and then computes each exponentiated value's share of the total, yielding the three probabilities 0.13, 0.87, and 0.00. Exponentiation thus amplifies the differences between scores: the good gets better and the bad gets worse.

If we want to go further and compute a loss value, we can take the logarithm of the probability and negate it. Then, if the value after Softmax is closer to 1, the result is smaller, i.e. the loss is smaller; if it is farther from 1, the result, and hence the loss, is larger.
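
To make the Softmax and loss computations concrete, here is a minimal NumPy sketch (standalone, not part of the TensorFlow model) using the animal scores above:

import numpy as np

scores = np.array([3.2, 5.1, -1.7])  # cat, dog, chicken
exps = np.exp(scores)                # ~[24.5, 164.0, 0.18]
probs = exps / exps.sum()            # ~[0.13, 0.87, 0.00]
losses = -np.log(probs)              # ~[2.04, 0.14, 6.94]: closer to 1 means smaller loss
print(probs, losses)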

Implement a regression model

First, import TensorFlow as follows:

import tensorflow as tf

 
Next we specify an input, namely the sample data: a 55000 × 784 matrix for the training set, a 5000 × 784 matrix for the validation set, and a 10000 × 784 matrix for the test set. The number of rows is therefore indeterminate, but the number of columns is fixed.

So you can declare a placeholder object first:

x = tf.placeholder(tf.float32, [None, 784])

Here the first parameter specifies the type of each element in the matrix, and the second parameter specifies its shape; None means the number of rows is not fixed in advance.

Next we need to build the first layer of the network, whose expression is:

y = \mathrm{softmax}(xw + b)

Here the input x is multiplied by the weight matrix w, and a bias term b is added to produce the output. These two variables are tuned dynamically during training, so we declare them with type Variable, as shown below:

w = tf.Variable(tf.zeros([784, 10]))

b = tf.Variable(tf.zeros([10]))

What needs to be implemented next is the formula described above; we apply Softmax to the result to obtain y:

y = tf.nn.softmax(tf.matmul(x, w) + b)

With just these few lines of code, the construction of the model is complete, and its structure is very simple.

Loss function

In order to train our model, we first need to define a metric to evaluate how good the model is. In fact, in machine learning we usually define a metric for how bad a model is, called the cost or loss, and then try to minimize it. The two approaches are equivalent.

A very common and very nifty cost function is "cross-entropy". Cross-entropy originated in information theory as an information compression coding technique, but it has since become an important tool in fields ranging from game theory to machine learning. It is defined as follows:

H_{y'}(y) = -\sum_i y'_i \log(y_i)

Here y is the probability distribution of our predictions and y' (y_label in the code below) is the true distribution, i.e. the one-hot label vector. A rough way to understand it is that cross-entropy measures how inefficient our predictions are at describing the truth.

We first define y_label as a placeholder:

y_label = tf.placeholder(tf.float32, [None, 10])

Next we compute the cross-entropy between them; the code is as follows:

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_label * tf.log(y), reduction_indices=[1]))

First, the reduce_sum() method sums the elements along the dimensions given by reduction_indices, in this case dimension 1, i.e. across the 10 classes of each example.

Then reduce_mean() takes the mean of the resulting per-example values over the whole batch.
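
To make the two reductions concrete, here is a small sketch on a toy 2 × 3 matrix, standing in for a batch of 2 examples with 3 classes each:

import tensorflow as tf

t = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])
row_sums = tf.reduce_sum(t, reduction_indices=[1])  # sum along dimension 1 -> [6., 15.]
mean = tf.reduce_mean(row_sums)                     # mean of that vector -> 10.5

with tf.Session() as sess:
    print(sess.run(row_sums), sess.run(mean))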

In this way, all we need to do in the end is minimize this cross-entropy.

To do that, we define an optimization step:

train = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

The GradientDescentOptimizer is used here: we ask TensorFlow to minimize the cross-entropy using the gradient descent algorithm with a learning rate of 0.5. Gradient descent is a simple procedure in which TensorFlow moves each variable little by little in the direction that decreases the cost.
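
As a toy illustration of that idea (plain Python, separate from the model above), a single gradient descent step on f(w) = w², with the same learning rate of 0.5, looks like this:

w = 3.0
learning_rate = 0.5
grad = 2 * w                   # derivative of w**2 at the current w
w = w - learning_rate * grad   # move against the gradient
print(w)                       # 0.0, the minimum of w**2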

Run the model

Having defined all of the above, we have in effect built a computational graph, i.e. set up the model, and we can now run it in a Session:

with tf.Session() as sess:

    sess.run(tf.global_variables_initializer())

    for step in range(total_steps + 1):

        batch_x, batch_y = mnist.train.next_batch(batch_size)

        sess.run(train, feed_dict={x: batch_x, y_label: batch_y})

At each step of the loop, we randomly grab a batch of batch_size data points from the training set and run the train operation, feeding them into the placeholders defined earlier.

Two variables used above still need to be defined:

batch_size = 100

total_steps = 5000

Test the model

So how does our model perform?

First let's find out which labels are predicted correctly. tf.argmax() is a very useful function: it returns the index of the maximum value of a Tensor object along a given dimension. Since the label vector consists of 0s and 1s, the index where the maximum value 1 sits is exactly the class label. For example, tf.argmax(y, 1) returns the label the model predicts for each input x, while tf.argmax(y_label, 1) gives the correct label, and we can use tf.equal() to check whether the prediction matches the true label (the same index position means a match).

correct_prediction = tf.equal(tf.argmax(y, axis=1), tf.argmax(y_label, axis=1))

This line of code gives us a list of boolean values. To determine the proportion of correct predictions, we cast the booleans to floats and take the mean. For example, [True, False, True, True] becomes [1, 0, 1, 1], whose mean is 0.75.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
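
The cast-and-average trick is easy to verify with NumPy; this is a standalone sketch, separate from the graph:

import numpy as np

preds = np.array([True, False, True, True])
print(preds.astype(np.float32))         # [1. 0. 1. 1.]
print(preds.astype(np.float32).mean())  # 0.75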

Finally, we calculate the accuracy of the learned model on the test dataset. The following snippet belongs inside the training loop above, so that the accuracy is printed every steps_per_test steps:

steps_per_test = 100

if step % steps_per_test == 0:

    print(step, sess.run(accuracy, feed_dict={x: mnist.test.images, y_label: mnist.test.labels}))

 This final result value should be around 92%.

With training and testing complete, we now have a basic working model; we will continue to optimize it later to achieve better results.
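
For reference, here is the complete script assembled from the snippets above (TensorFlow 1.x; the exact accuracy figures will vary slightly from run to run):

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

batch_size = 100
total_steps = 5000
steps_per_test = 100

# Model: a single softmax layer, y = softmax(x * w + b)
x = tf.placeholder(tf.float32, [None, 784])
y_label = tf.placeholder(tf.float32, [None, 10])
w = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, w) + b)

# Cross-entropy loss and gradient descent training step
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_label * tf.log(y), reduction_indices=[1]))
train = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Accuracy: fraction of predictions matching the true labels
correct_prediction = tf.equal(tf.argmax(y, axis=1), tf.argmax(y_label, axis=1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(total_steps + 1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        sess.run(train, feed_dict={x: batch_x, y_label: batch_y})
        if step % steps_per_test == 0:
            print(step, sess.run(accuracy, feed_dict={x: mnist.test.images, y_label: mnist.test.labels}))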

The results are as follows:

0 0.453

100 0.8915

200 0.9026

300 0.9081

400 0.9109

500 0.9108

600 0.9175

700 0.9137

800 0.9158

900 0.9176

1000 0.9167

1100 0.9186

1200 0.9206

1300 0.9161

1400 0.9218

1500 0.9179

1600 0.916

1700 0.9196

1800 0.9222

1900 0.921

2000 0.9223

2100 0.9214

2200 0.9191

2300 0.9228

2400 0.9228

2500 0.9218

2600 0.9197

2700 0.9225

2800 0.9238

2900 0.9219

3000 0.9224

3100 0.9184

3200 0.9253

3300 0.9216

3400 0.9218

3500 0.9212

3600 0.9225

3700 0.9224

3800 0.9225

3900 0.9226

4000 0.9201

4100 0.9138

4200 0.9184

4300 0.9222

4400 0.92

4500 0.924

4600 0.9234

4700 0.9219

4800 0.923

4900 0.9254

5000 0.9218

 Epilogue

In this section, we briefly experienced the training and prediction process on real data using the MNIST dataset, but the accuracy is not yet high enough. Later we will learn to use convolution for model training, which achieves higher accuracy.

 
