TensorFlow study notes (six) - MNIST - Getting Started

MNIST for Machine Learning Beginners

Audience: This tutorial is intended for readers who are new to both machine learning and TensorFlow. If you already know what MNIST is and are familiar with softmax regression, you may prefer to read the quick start tutorial instead.

When we learn to program, the first thing we usually do is print "Hello World". Just as programming has Hello World, machine learning has MNIST.

MNIST is an entry-level computer vision dataset that consists of images of handwritten digits:

It also includes a label for each image that tells us which digit it shows. For example, the labels for the four images above are 5, 0, 4, and 1.

In this tutorial, we will train a machine learning model to predict the digit shown in an image. Our goal is not to design a sophisticated, state-of-the-art model (although we will give you code for one later), but to introduce how to use TensorFlow. So we start with a very simple mathematical model called softmax regression.

The code for this tutorial is very short, and all the interesting parts fit in just three lines. However, it is very important to understand the ideas behind it: both how TensorFlow works and the core machine learning concepts. For that reason, this tutorial walks through the code in great detail.

MNIST data set

The MNIST dataset is hosted on Yann LeCun's website. For convenience, we provide some Python code that downloads and installs the dataset automatically. You can either download that code and import it as shown below, or simply copy and paste it into your own code file.

import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

The downloaded dataset is split into two parts: 60,000 training examples (mnist.train) and 10,000 test examples (mnist.test). This split is very important. In machine learning it is essential to keep a separate test set that is never used for training and is used only to evaluate the model, so that we can check that what the model has learned actually generalizes to other data.

As mentioned earlier, every MNIST data point has two parts: an image of a handwritten digit and a corresponding label. We call the images "xs" and the labels "ys". Both the training set and the test set contain xs and ys. For example, the training images are mnist.train.images and the training labels are mnist.train.labels.

Each image is 28 pixels by 28 pixels. We can represent such an image as an array of numbers:

We can flatten this array into a vector of 28x28 = 784 numbers. It does not matter how we flatten the array (that is, in what order the numbers end up), as long as every image is flattened the same way. From this perspective, the MNIST images are just a collection of points in a 784-dimensional vector space, with a rather complex structure (note: visualizing this kind of data is computationally intensive).
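As a small illustration of flattening, the sketch below unrolls a tiny 3x3 stand-in image row by row. The pixel values and the helper name `flatten` are made up for this example; row-by-row order is just one possible choice, and what matters is that every image uses the same one.

```python
# A tiny 3x3 "image" stands in for a 28x28 MNIST digit (hypothetical values).
image = [
    [0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.0],
]


def flatten(rows):
    """Unroll a 2-D image row by row into one flat list of pixels."""
    return [pixel for row in rows for pixel in row]


vector = flatten(image)
print(len(vector))  # 3 * 3 = 9 entries; a 28x28 image would give 784
```

Applying the same function to a 28x28 array would produce the 784-entry vector described above.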

Flattening the data throws away information about the two-dimensional structure of the image. That is clearly not ideal: the best computer vision methods do exploit this structure, and we will cover them in later tutorials. The simple model used here, softmax regression, does not use this structural information.

As a result, mnist.train.images is a tensor with shape [60000, 784]. The first dimension indexes the images and the second dimension indexes the pixels in each image. Each entry in the tensor is the intensity of one pixel in one image, a value between 0 and 1.

The labels in MNIST are digits from 0 to 9 that say which digit each image shows. For this tutorial we want the labels to be "one-hot vectors". A one-hot vector is 0 in every dimension except one, where it is 1. Here, the digit n is represented as a 10-dimensional vector that is 1 in the n-th dimension (counting from zero). For example, 0 is represented as [1,0,0,0,0,0,0,0,0,0]. Consequently, mnist.train.labels is a [60000, 10] array of numbers.

Now we are ready to build our model!
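The one-hot encoding can be sketched in a few lines of plain Python. The function name `one_hot` is an assumption made for this example; in the tutorial the encoding is produced by passing one_hot=True when loading the data.

```python
def one_hot(digit, num_classes=10):
    """Return a list that is 1 at position `digit` and 0 everywhere else."""
    vec = [0] * num_classes
    vec[digit] = 1
    return vec


print(one_hot(0))  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(one_hot(3))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```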

Introduction to Softmax Regression

We know that every image in MNIST shows a digit from 0 to 9. We want to look at an image and give the probability that it shows each digit. For example, our model might look at a picture of a 9 and be 80% sure it is a 9, give a 5% chance to it being an 8 (because both digits have a small loop at the top), and assign smaller probabilities to all the other digits.

This is a classic case for a softmax regression model. Softmax can be used to assign probabilities to an object being one of several different things. Even later on, when we train more sophisticated models, the final step will still use softmax to assign probabilities.

Softmax regression has two steps: first we add up the evidence that our input belongs to each class, and then we convert that evidence into probabilities.

To tally the evidence that a given image belongs to a particular digit class, we take a weighted sum of the pixel intensities. The weight is negative if a high intensity at that pixel is evidence against the image being in that class, and positive if it is evidence in favor.

The following image shows the weights one model learned for each of the digit classes. Red represents negative weights and blue represents positive weights.

We also need to add an extra term called the bias, which captures effects that do not depend on the input pixels. The evidence that a given input image x shows digit i can then be expressed as

$$\text{evidence}_i = \sum_j W_{i,j} x_j + b_i$$

where W_i is the weights and b_i is the bias for class i, and j is an index for summing over the pixels of the input image x. We then convert the evidence into our predicted probabilities y using the softmax function:

$$y = \text{softmax}(\text{evidence})$$

Here softmax serves as an activation function or link function: it shapes the output of our linear function into the form we want, in this case a probability distribution over the 10 digit classes. You can think of it as converting how well an image matches each digit into a probability. The softmax function is defined as:

$$\text{softmax}(x) = \text{normalize}(\exp(x))$$

If you expand the right-hand side of that equation, you get:

$$\text{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

But it is often more helpful to think of softmax in the first form: exponentiate the inputs and then normalize the results. Exponentiation means that one more unit of evidence multiplies the weight given to a hypothesis, and conversely one less unit of evidence means the hypothesis gets a smaller multiplier. No hypothesis ever has zero or negative weight. Softmax then normalizes these weights so that they add up to 1, forming a valid probability distribution. (For more on the softmax function, see the section on it in Michael Nielsen's book, which includes an interactive visualization.)
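The "exponentiate, then normalize" view of softmax can be sketched in plain Python as follows. The function name and the sample evidence values are made up for this example. Subtracting the maximum before exponentiating is a common numerical-stability step that is not part of the tutorial; it leaves the result unchanged.

```python
import math


def softmax(evidence):
    """Exponentiate each entry, then normalize so the results sum to 1."""
    # Shifting by the maximum avoids overflow and does not change the output.
    shift = max(evidence)
    exps = [math.exp(e - shift) for e in evidence]
    total = sum(exps)
    return [v / total for v in exps]


probs = softmax([2.0, 1.0, -1.0])  # hypothetical evidence for three classes
print(probs)
print(sum(probs))
```

More evidence always yields a larger probability, and even strongly negative evidence still yields a small positive probability.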

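The softmax regression model can be pictured as the following diagram: we compute a weighted sum of the xs, add a bias to each, and then feed the results into the softmax function:

If we write that out as equations (for example, with three inputs and three classes), we get:

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \text{softmax} \begin{bmatrix} W_{1,1} x_1 + W_{1,2} x_2 + W_{1,3} x_3 + b_1 \\ W_{2,1} x_1 + W_{2,2} x_2 + W_{2,3} x_3 + b_2 \\ W_{3,1} x_1 + W_{3,2} x_2 + W_{3,3} x_3 + b_3 \end{bmatrix}$$

We can also express this calculation with vectors, as a matrix multiplication followed by a vector addition. This helps computational efficiency (and is also a useful way to think about it):

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \text{softmax} \left( \begin{bmatrix} W_{1,1} & W_{1,2} & W_{1,3} \\ W_{2,1} & W_{2,2} & W_{2,3} \\ W_{3,1} & W_{3,2} & W_{3,3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \right)$$

Furthermore, it can be written in a more compact way:

$$y = \text{softmax}(Wx + b)$$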
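The compact form, y = softmax(Wx + b), can be sketched for a single input in plain Python. The tiny weights, bias, and input below are made-up values, and `predict` is a hypothetical helper name; W[i][j] is the weight of pixel j for class i, matching the indices used in the evidence formula.

```python
import math


def softmax(z):
    """Exponentiate and normalize a list of evidence values."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W, b, x):
    """Compute softmax(Wx + b) for one flattened input x."""
    evidence = [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(W, b)
    ]
    return softmax(evidence)


# Hypothetical model with 3 classes and 2 "pixels".
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x = [1.0, 0.0]

y = predict(W, b, x)
print(y)  # class 0 has the most evidence, so it gets the largest probability
```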

Implementing the Regression Model

To do efficient numerical computing in Python, we typically use libraries like NumPy that carry out expensive operations such as matrix multiplication outside Python, using highly efficient code written in another language. Unfortunately, there can still be a lot of overhead from switching back to Python after every operation. This overhead is even greater if you run computations on GPUs, and in a distributed setting even more resources are spent transferring data.

TensorFlow also does its heavy lifting outside Python, but it goes a step further to avoid this overhead. Instead of running a single expensive operation independently, TensorFlow lets us first describe a graph of interacting operations and then runs all of them together outside Python. (A similar approach can be seen in many machine learning libraries.)

Before using TensorFlow, we first need to import it:

import tensorflow as tf

We describe these interacting operations by manipulating symbolic variables. Let's create one:

x = tf.placeholder("float", [None, 784])

x is not a specific value but a placeholder: a value that we supply when we ask TensorFlow to run a computation. We want to be able to input any number of MNIST images, each flattened into a 784-dimensional vector. We represent this as a 2-D tensor of floating-point numbers with shape [None, 784]. (Here None means that the dimension can be of any length.)

Our model also needs weights and biases. We could treat them as additional inputs (placeholders), but TensorFlow has a better way to represent them: Variable. A Variable is a modifiable tensor that lives in TensorFlow's graph of interacting operations. It can be used by the computation and can also be modified by it. In machine learning applications, the model parameters are generally represented as Variables.

W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

We create each Variable by giving tf.Variable its initial value: here, we initialize both W and b as tensors full of zeros. Since we are going to learn W and b, their initial values can be chosen freely.

Notice that W has shape [784, 10] because we want to multiply 784-dimensional image vectors by it to produce 10-dimensional vectors of evidence, one entry per digit class. b has shape [10], so we can add it directly to that output.

We can now implement our model. It takes only one line of code!

y = tf.nn.softmax(tf.matmul(x,W) + b)

First, we multiply x by W with the expression tf.matmul(x, W). This corresponds to the Wx in the earlier equation, with the order flipped because x here is a 2-D tensor that holds multiple inputs. We then add b, and finally feed the result into tf.nn.softmax.

That's it. After a couple of short lines of setup, it took us only one line of code to define our model. TensorFlow not only makes softmax regression particularly easy; it is also a very flexible way to describe many other kinds of numerical computation, from machine learning models to physics simulations. And once defined, our model can be run on different devices: your computer's CPU, GPUs, and even phones!
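To make the shape convention concrete, here is a minimal plain-Python sketch of the batched computation xW + b, with one image per row of x. The helpers `matmul` and `add_bias` and all numbers are assumptions made for this example; they only mimic what tf.matmul and the addition of b do on small lists.

```python
def matmul(A, B):
    """Multiply an [n, k] matrix by a [k, m] matrix, both given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add_bias(M, b):
    """Add the bias vector b to every row of M."""
    return [[m + b_k for m, b_k in zip(row, b)] for row in M]


# 3 images with 2 pixels each, and 3 classes (hypothetical values).
x = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]  # shape [3, 2]: one image per row
W = [[1.0, 0.0, -1.0], [0.5, 1.0, 0.0]]   # shape [2, 3]: [pixels, classes]
b = [0.1, 0.2, 0.3]                       # shape [3]: one bias per class

evidence = add_bias(matmul(x, W), b)      # shape [3, 3]: one row per image
for row in evidence:
    print(row)
```

With real MNIST data the shapes would be [None, 784] for x, [784, 10] for W, and [10] for b, giving one 10-entry row of evidence per image.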

Training the Model

To train our model, we first need to define what it means for the model to be good. Actually, in machine learning we typically define what it means for a model to be bad, called the cost or loss, and then try to minimize it. The two approaches are equivalent.

One very common, very nice cost function is "cross-entropy". Cross-entropy originated in the study of information compression in information theory, but it has since become an important tool in many fields, from game theory to machine learning. It is defined as:

$$H_{y'}(y) = -\sum_i y'_i \log(y_i)$$

where y is our predicted probability distribution and y' is the true distribution (the one-hot vector we input). Roughly speaking, cross-entropy measures how inefficient our predictions are for describing the truth. A more detailed explanation of cross-entropy is beyond the scope of this tutorial, but it is well worth understanding.

To compute the cross-entropy, we first need to add a new placeholder to input the correct answers:

y_ = tf.placeholder("float", [None,10])

Then we can compute the cross-entropy:

cross_entropy = -tf.reduce_sum(y_*tf.log(y))

First, tf.log computes the logarithm of each element of y. Next, we multiply each element of y_ by the corresponding element of tf.log(y). Finally, tf.reduce_sum adds up all the elements of the tensor. (Note that this is not the cross-entropy of a single prediction against its true value, but the sum of the cross-entropies of all 100 images in a batch. How well we do on 100 data points describes the quality of our model much better than a single data point does.)
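The cross-entropy calculation can be sketched in plain Python as follows. The function names and the example probabilities are made up for this illustration; the batch version simply adds up the per-example values, in the spirit of the tf.reduce_sum call above.

```python
import math


def cross_entropy(y_true, y_pred):
    """Cross-entropy between a one-hot label and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))


def batch_cross_entropy(labels, predictions):
    """Sum of the cross-entropies over every example in a batch."""
    return sum(cross_entropy(t, p) for t, p in zip(labels, predictions))


label = [0, 1, 0]               # the true class is 1
confident = [0.05, 0.90, 0.05]  # hypothetical good prediction
unsure = [0.40, 0.30, 0.30]     # hypothetical poor prediction

print(cross_entropy(label, confident))  # small loss
print(cross_entropy(label, unsure))     # larger loss
print(batch_cross_entropy([label, label], [confident, unsure]))
```

Because the label is one-hot, only the predicted probability of the true class contributes: the loss is the negative logarithm of that probability.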

Now that we know what we want our model to do, it is very easy to have TensorFlow train it. Because TensorFlow knows the entire graph of your computations, it can automatically use the backpropagation algorithm to efficiently determine how your variables affect the cost you ask it to minimize. It can then apply the optimization algorithm of your choice to keep modifying the variables and reduce the cost.

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

Here we ask TensorFlow to minimize the cross-entropy using the gradient descent algorithm with a learning rate of 0.01. Gradient descent is a simple learning procedure in which TensorFlow shifts each variable a little bit in the direction that reduces the cost. TensorFlow also provides many other optimization algorithms; using one is as simple as changing this one line of code.

What TensorFlow actually does here, behind the scenes, is add a new set of operations to your graph that implement backpropagation and gradient descent. It then gives you back a single operation which, when run, performs a step of gradient descent training, slightly adjusting your variables to reduce the cost.

Now we have our model set up. Before running the computation, we need to add an operation that initializes the variables we created:
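To give a feel for what one such training operation does, here is a hand-written sketch of gradient descent for softmax regression on a single made-up example. It relies on the standard result that the gradient of the cross-entropy with respect to the evidence is (y - y'). All names and values are hypothetical, and W is stored as [classes][pixels]; this is not how TensorFlow implements its optimizer internally.

```python
import math


def softmax(z):
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(W, b, x):
    """Predicted probabilities softmax(Wx + b) for one input."""
    return softmax([sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)])


def loss(W, b, x, target):
    """Cross-entropy between the one-hot target and the prediction."""
    return -sum(t * math.log(p) for t, p in zip(target, forward(W, b, x)))


def gradient_step(W, b, x, target, learning_rate):
    """Move W and b a small step against the gradient of the loss."""
    delta = [p - t for p, t in zip(forward(W, b, x), target)]  # dLoss/dEvidence
    new_W = [[w - learning_rate * d * xj for w, xj in zip(row, x)]
             for row, d in zip(W, delta)]
    new_b = [bi - learning_rate * d for bi, d in zip(b, delta)]
    return new_W, new_b


W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # 3 classes, 2 pixels, all zeros
b = [0.0, 0.0, 0.0]
x = [1.0, 0.5]
target = [0, 1, 0]

before = loss(W, b, x, target)
for _ in range(100):
    W, b = gradient_step(W, b, x, target, learning_rate=0.01)
after = loss(W, b, x, target)
print(before, after)  # the loss goes down as the steps are applied
```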

init = tf.initialize_all_variables()

We can now launch our model in a Session and run the operation that initializes the variables:

sess = tf.Session()
sess.run(init)

Then we start training. Here we run the training step 1000 times:

for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

At each step of the loop, we fetch a batch of 100 random data points from the training set. We then run train_step, feeding in the batch data to replace the placeholders.

Training on small batches of random data is called stochastic training; more precisely, here it is stochastic gradient descent. Ideally, we would like to use all of our data for every training step, because that would give better training results, but this is computationally expensive. So instead we use a different subset of the data at each step. This reduces the computational cost while still learning as much as possible about the overall characteristics of the dataset.
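The idea of drawing a random batch can be sketched in plain Python as below. This is only an illustration under simple assumptions: the function `next_batch`, the toy data, and the sampling strategy are made up here, and the real mnist.train.next_batch may select its batches differently.

```python
import random


def next_batch(xs, ys, batch_size, rng):
    """Pick `batch_size` random examples, keeping each x paired with its y."""
    indices = rng.sample(range(len(xs)), batch_size)
    return [xs[i] for i in indices], [ys[i] for i in indices]


# Toy dataset: 10 one-pixel "images" whose label is the parity of the pixel value.
xs = [[float(i)] for i in range(10)]
ys = [i % 2 for i in range(10)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
batch_xs, batch_ys = next_batch(xs, ys, 4, rng)
print(batch_xs, batch_ys)
```

Each training step would then see only the selected subset instead of the whole dataset.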

Evaluating Our Model

So how well does our model do?

First, let's figure out where we predicted the correct label. tf.argmax is a very useful function that gives the index of the largest entry in a tensor along a given dimension. Because the labels are one-hot vectors made of 0s and 1s, the index of the largest entry (the 1) is the class label. For example, tf.argmax(y,1) returns the label our model predicts for each input x, while tf.argmax(y_,1) is the correct label. We can use tf.equal to check whether our prediction matches the true label (they match when the two index positions are equal).

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

This line gives us a list of booleans. To determine what fraction of the predictions are correct, we cast the booleans to floating-point numbers and then take the mean. For example, [True, False, True, True] becomes [1,0,1,1], which averages to 0.75.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
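The same argmax, compare, and average steps can be sketched in plain Python. The helper names and the four made-up predictions below are assumptions for this example; they are chosen so that the comparison yields [True, False, True, True].

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def correct_flags(predictions, labels):
    """True wherever the predicted class equals the labeled class."""
    return [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]


def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    flags = correct_flags(predictions, labels)
    return sum(float(f) for f in flags) / len(flags)


# Hypothetical model outputs and one-hot labels for four inputs, three classes.
predictions = [[0.8, 0.1, 0.1], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
labels = [[1, 0, 0], [0, 0, 1], [0, 0, 1], [1, 0, 0]]

print(correct_flags(predictions, labels))  # [True, False, True, True]
print(accuracy(predictions, labels))       # 0.75
```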

Finally, we compute the accuracy of our trained model on the test dataset.

print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

The final result should be about 91%.

Is that good? Well, not really. In fact, it is quite poor, because we are using a very simple model. With some small improvements, we can reach 97% accuracy. The best models can even exceed 99.7% accuracy! (For more information, see this list comparing the performance of various models.)

What matters more than the result is what we learned from designing this model. Still, if you are a little disappointed with these results, see the next tutorial, where you will learn how to build more sophisticated models with TensorFlow and get better performance!

Complete program

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Download (if necessary) and load MNIST with one-hot labels.
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Model: y = softmax(xW + b)
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Loss: cross-entropy summed over the batch, matching the walkthrough above.
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

# Define the accuracy ops once, outside the loop, so the graph does not grow.
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    # Report the test accuracy after each training step.
    print(i, sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

sess.close()

 


Origin blog.csdn.net/guoyunfei123/article/details/82852786