MNIST
MNIST is an entry-level computer vision dataset that contains many pictures of handwritten digits, as shown in the figure:
The dataset contains images and their corresponding labels. It ships with TensorFlow, and we can import it as follows:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
print(mnist)
The output is as follows:
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Datasets(train=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x101707ef0>,
         validation=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x1016ae4a8>,
         test=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x1016f9358>)
Here the program first downloads the MNIST dataset into the MNIST_data folder (creating the folder if needed), extracts it, and then prints the dataset object.
The dataset contains 55000 training examples (mnist.train), 5000 validation examples (mnist.validation), and 10000 test examples (mnist.test). The files are as follows:
As mentioned earlier, each MNIST data point consists of two parts: an image of a handwritten digit and a corresponding label. Let's call the images xs and the labels ys. Both the training set and the test set contain xs and ys. For example, the training images are mnist.train.images and the training labels are mnist.train.labels. Each image is 28 x 28 pixels, that is, 784 pixels, which we can unroll into a vector of length 784.
Therefore, the training images form a tensor of shape [55000, 784]. The first dimension indexes the images in the training set, and the second indexes the pixels of each image.
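To make the flattening step concrete, here is a minimal sketch that unrolls one 28 x 28 image into a 784-element vector using plain Python lists. The names image and flatten are illustrative only and are not part of the MNIST loader, the pixel values are placeholders, and the row-by-row ordering is an assumption of this sketch.

```python
# Hypothetical 28 x 28 image: a list of 28 rows, each holding 28 pixel intensities.
# Zeros stand in for real pixel data here.
HEIGHT, WIDTH = 28, 28
image = [[0.0 for _ in range(WIDTH)] for _ in range(HEIGHT)]
image[3][5] = 1.0  # mark a single pixel so we can find it after flattening


def flatten(rows):
    """Concatenate the rows of a 2-D image into one flat vector, row by row."""
    return [pixel for row in rows for pixel in row]


vector = flatten(image)
print(len(vector))        # 784
print(vector.index(1.0))  # 3 * 28 + 5 = 89
```

Stacking 55000 such vectors, one per row, gives the [55000, 784] shape described above.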
Softmax
Softmax can be regarded as an activation function or link function. It converts the output of the linear function we defined into the format we want, that is, a probability distribution over the 10 digit classes. Given a picture, its fit for each digit can therefore be converted into a probability value by the Softmax function. Writing z for the vector of scores, the Softmax function can be defined as:
$$\mathrm{softmax}(z) = \mathrm{normalize}(\exp(z))$$
Expanding the right-hand side of the equation, we get:
$$\mathrm{softmax}(z)_i = \frac{\exp(z_i)}{\sum_j \exp(z_j)}$$
For example, suppose we want to determine which animal appears in a picture, and there are three possible results: cat, dog, and chicken. If their respective scores are 3.2, 5.1, and -1.7, Softmax first exponentiates each score, giving 24.5, 164.0, and 0.18, and then divides each result by their sum, giving 0.13, 0.87, and 0.00. This widens the gaps between the scores: the good get better and the bad get worse.
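The arithmetic in this example can be checked with a short sketch. The softmax helper below is written for illustration with the standard library only; it is not the TensorFlow implementation, and subtracting the maximum score is a common numerical-stability step that the text does not mention.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [3.2, 5.1, -1.7]  # cat, dog, chicken
probs = softmax(scores)
print([round(p, 2) for p in probs])  # [0.13, 0.87, 0.0]
```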
To go on and compute a loss value, take the logarithm of the Softmax output for the correct class and negate it. The closer that probability is to 1, the smaller the loss; the further it is from 1, the larger the loss.
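As a rough illustration of this negative-log step, the sketch below applies it to two of the probabilities from the animal example. The function name negative_log is made up for this sketch.

```python
import math


def negative_log(prob):
    """Loss for the probability the model assigned to the correct class."""
    return -math.log(prob)


print(round(negative_log(0.87), 2))  # 0.14, close to 1 so the loss is small
print(round(negative_log(0.13), 2))  # 2.04, far from 1 so the loss is large
```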
Implement a regression model
First import TensorFlow, the command is as follows:
import tensorflow as tf
Next we specify the input, which is the sample data. For the training set it is a 55000 x 784 matrix, for the validation set a 5000 x 784 matrix, and for the test set a 10000 x 784 matrix, so the number of rows is not fixed but the number of columns is.
So you can declare a placeholder object first:
x = tf.placeholder(tf.float32, [None, 784])
Here the first parameter specifies the data type of each element in the matrix, and the second specifies its shape. None means the first dimension, the number of rows, can be of any length.
Next we need to build the first layer of the network. The expression is as follows:
$$y = \mathrm{softmax}(xw + b)$$
Here the input x is multiplied by the weight w, and a bias term b is added to produce the output. Both w and b are adjusted during training, so we declare them as Variable objects. The code is as follows:
w = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
Next we implement the formula described above, calling Softmax on the linear output to get y:
y = tf.nn.softmax(tf.matmul(x, w) + b)
Through the above lines of code, we have completed the construction of the model, and the structure is very simple.
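To see what this model computes for a single input, here is a plain-Python sketch of the same expression, softmax(xw + b), without TensorFlow. The predict helper and the placeholder input are assumptions made for illustration. With all-zero weights and biases, as in the initialization above, every class receives the same probability.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, w, b):
    """Compute softmax(x . w + b) for a single input vector x."""
    logits = [
        sum(x[i] * w[i][j] for i in range(len(x))) + b[j]
        for j in range(len(b))
    ]
    return softmax(logits)


num_pixels, num_classes = 784, 10
w = [[0.0] * num_classes for _ in range(num_pixels)]  # zeros, like tf.zeros([784, 10])
b = [0.0] * num_classes                               # zeros, like tf.zeros([10])
x = [0.5] * num_pixels  # placeholder pixel values, not real MNIST data

y = predict(x, w, b)
print(y)  # ten equal probabilities of 0.1 each
```

Training is what moves w and b away from zero so that the probabilities start to favor the correct digit.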
Loss function
In order to train our model, we first need a metric that evaluates how good the model is. In machine learning we usually define instead an indicator of how bad the model is, called the cost or loss, and then try to minimize it. The two approaches are equivalent.
A very common and very nifty cost function is "cross-entropy". Cross-entropy originated in information compression coding in information theory, but it has since become an important tool in other fields, from game theory to machine learning. It is defined as follows:
$$H_{y_{\mathrm{label}}}(y) = -\sum_i y_{\mathrm{label},i} \log(y_i)$$
Here y is our predicted probability distribution and y_label is the actual distribution. Roughly speaking, cross-entropy measures how inefficient our predictions are at describing the truth.
We first define y_label as a placeholder for the true labels:
y_label = tf.placeholder(tf.float32, [None, 10])
Next we need to calculate the cross entropy between y and y_label. The code is as follows:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_label * tf.log(y), reduction_indices=[1]))
reduce_sum() first sums over the dimensions given by reduction_indices. With reduction_indices=[1], it sums across the 10 classes and produces one value per example.
reduce_mean() then averages those per-example values over the batch.
All that remains is to minimize this cross entropy.
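The same two-step computation, a sum over classes followed by a mean over examples, can be sketched in plain Python. The three-class labels and predictions below are made-up values chosen only to keep the example small.

```python
import math


def cross_entropy(y_label, y):
    """Mean over the batch of -sum_i y_label[i] * log(y[i])."""
    per_example = [
        -sum(t * math.log(p) for t, p in zip(label_row, pred_row))
        for label_row, pred_row in zip(y_label, y)
    ]
    return sum(per_example) / len(per_example)


y_label = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # one-hot true labels
y = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]        # predicted probabilities
loss = cross_entropy(y_label, y)
print(round(loss, 4))  # 0.2899
```

Because the labels are one-hot, only the predicted probability of the correct class contributes to each example's loss.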
Next we define the optimization step:
train = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
The GradientDescentOptimizer is used here, where we ask TensorFlow to minimize the cross-entropy with a gradient descent algorithm at a learning rate of 0.5. The gradient descent algorithm is a simple learning process where TensorFlow just moves each variable little by little in the direction of decreasing cost.
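The idea of moving a variable little by little in the direction of decreasing cost can be illustrated on a toy problem. The sketch below is not the MNIST model: it assumes a single variable w, a made-up cost (w - 3)^2, and a learning rate of 0.1 chosen for the illustration.

```python
def cost(w):
    """Toy cost with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the toy cost with respect to w."""
    return 2.0 * (w - 3.0)


learning_rate = 0.1
w = 0.0
history = [cost(w)]
for _ in range(100):
    # Step against the gradient, scaled by the learning rate.
    w -= learning_rate * gradient(w)
    history.append(cost(w))

print(w)            # very close to 3.0
print(history[:3])  # the cost shrinks at every step
```

TensorFlow applies the same kind of update to every entry of w and b at once, using gradients of the cross entropy that it derives from the graph.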
Run the model
With the above defined, we have built a computational graph, that is, we have set up the model. We can now run it in a Session:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(total_steps + 1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        sess.run(train, feed_dict={x: batch_x, y_label: batch_y})
At each step of the loop, we grab a random batch of batch_size data points from the training data, and then run train with those data points fed in for the placeholders.
The loop relies on two values that must be defined before it runs:
batch_size = 100
total_steps = 5000
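The random batch selection used in the loop can be sketched as follows. This next_batch function is a simplified stand-in written for illustration, not the actual mnist.train.next_batch implementation, and the tiny dataset is made up.

```python
import random


def next_batch(images, labels, batch_size, rng):
    """Pick batch_size random examples, keeping each image paired with its label."""
    indices = rng.sample(range(len(images)), batch_size)
    batch_x = [images[i] for i in indices]
    batch_y = [labels[i] for i in indices]
    return batch_x, batch_y


# Tiny stand-in dataset: 10 "images" of 4 pixels each, with integer labels.
images = [[float(n)] * 4 for n in range(10)]
labels = list(range(10))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
batch_x, batch_y = next_batch(images, labels, batch_size=3, rng=rng)
print(batch_y)
```

Training on small random batches instead of the full 55000 examples keeps each step cheap while still giving a useful estimate of the gradient.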
Test the model
So how does our model perform?
First let's find out which labels are correctly predicted. tf.argmax() is a very useful function that returns the index of the largest value of a Tensor along a given dimension. Since each label vector consists of 0s and a single 1, the index of that 1 is the class label. For example, tf.argmax(y, 1) returns the label the model predicts for each input x, and tf.argmax(y_label, 1) returns the correct label. We can then use tf.equal() to check whether the prediction matches the true label (the same index means a match).
correct_prediction = tf.equal(tf.argmax(y, axis=1), tf.argmax(y_label, axis=1))
This line of code gives us a list of boolean values. To determine the proportion of correct predictions, we convert the booleans to floats and take the average. For example, [True, False, True, True] becomes [1, 0, 1, 1], which averages to 0.75.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
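The same accuracy calculation can be traced by hand in plain Python. The two-class predictions and labels below are made up so that the result matches the [True, False, True, True] example, and the argmax helper is written for this sketch.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])


y = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]        # predicted probabilities
y_label = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # one-hot true labels

# Compare predicted and true class index for each example.
correct_prediction = [argmax(p) == argmax(t) for p, t in zip(y, y_label)]
# Cast booleans to floats and average them.
accuracy = sum(1.0 if c else 0.0 for c in correct_prediction) / len(correct_prediction)

print(correct_prediction)  # [True, False, True, True]
print(accuracy)            # 0.75
```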
Finally, we evaluate the accuracy of the learned model on the test dataset. The check below runs inside the training loop, once every steps_per_test steps:
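# Define this before the training loop.
steps_per_test = 100
# Place this check inside the loop, after the training step.
if step % steps_per_test == 0:
    print(step, sess.run(accuracy, feed_dict={x: mnist.test.images, y_label: mnist.test.labels}))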
This final result value should be around 92%.
In this way, we have built a basic model and completed its training and testing phases. We will continue to optimize the model to achieve better results.
The results are as follows:
0 0.453
100 0.8915
200 0.9026
300 0.9081
400 0.9109
500 0.9108
600 0.9175
700 0.9137
800 0.9158
900 0.9176
1000 0.9167
1100 0.9186
1200 0.9206
1300 0.9161
1400 0.9218
1500 0.9179
1600 0.916
1700 0.9196
1800 0.9222
1900 0.921
2000 0.9223
2100 0.9214
2200 0.9191
2300 0.9228
2400 0.9228
2500 0.9218
2600 0.9197
2700 0.9225
2800 0.9238
2900 0.9219
3000 0.9224
3100 0.9184
3200 0.9253
3300 0.9216
3400 0.9218
3500 0.9212
3600 0.9225
3700 0.9224
3800 0.9225
3900 0.9226
4000 0.9201
4100 0.9138
4200 0.9184
4300 0.9222
4400 0.92
4500 0.924
4600 0.9234
4700 0.9219
4800 0.923
4900 0.9254
5000 0.9218
Epilogue
In this section, we briefly walked through training and prediction on real data using the MNIST dataset, but the accuracy is not yet high enough. Later, we will learn to train models with convolution, which gives higher accuracy.