TensorFlow learning: 005 - Training and testing the MNIST model

Foreword

In the last lecture, we chose the softmax regression model and built it with TensorFlow. In this lecture, we will train that model and then test it.

Trainer

In order to train our model, we first need to define an indicator of how good the model is. In fact, in machine learning we usually define an indicator of how *bad* the model is instead; this indicator is known as the cost or loss, and we then try to minimize it.

In other words, the tool used for measuring the quality of a model is called the cost function.

A very common and very nice cost function is the "cross-entropy". Cross-entropy originated in information theory as a tool for analyzing compression, but it later became an important technique in fields ranging from game theory to machine learning. It is defined as follows:
\[ H_{y'}(y) = -\sum_i y'_i \log(y_i) \]

Here y is our predicted probability distribution and y' is the true distribution (the one-hot vector we feed in). Roughly speaking, the cross-entropy measures how inefficient our predictions are at describing the truth. A more detailed explanation of cross-entropy is beyond the scope of this tutorial, but it is well worth understanding.
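To make the definition concrete, here is a minimal pure-Python sketch (not TensorFlow code; the function name and example values are illustrative) that evaluates the formula above for a one-hot label:

```python
import math

def cross_entropy(y_true, y_pred):
    # H_{y'}(y) = -sum_i y'_i * log(y_i)
    # y_true: the real distribution y' (a one-hot label vector)
    # y_pred: our predicted distribution y
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

# One-hot label for class 2, and a prediction placing 70% mass on class 2:
label = [0.0, 0.0, 1.0, 0.0]
pred = [0.1, 0.1, 0.7, 0.1]
print(cross_entropy(label, pred))  # -log(0.7), about 0.357
```

Note that only the term for the true class survives with a one-hot label, so a higher predicted probability on the correct class gives a lower cost.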

To calculate the cross-entropy, we first need to add a new placeholder to enter the correct value:

y_ = tf.placeholder("float", [None,10])

Then we can calculate the cross-entropy with the following equation:
\[ -\sum_i y'_i \log(y_i) \]

cross_entropy = -tf.reduce_sum(y_*tf.log(y))

First, tf.log computes the logarithm of each element of y.
Next, we multiply each element of y_ by the corresponding element of tf.log(y).
Finally, tf.reduce_sum sums all the elements of the resulting tensor.

Note that the cross-entropy here does not just measure a single prediction against a single true value; it is the sum of the cross-entropies of all 100 images in the batch.
The prediction performance over 100 data points describes our model's performance better than the performance on a single data point.

Now that we know what we want our model to do, it is very easy to train it with TensorFlow. Because TensorFlow has a graph describing each of your computing units, it can automatically use the backpropagation algorithm to efficiently determine how your variables affect the cost you want to minimize. Then TensorFlow applies the optimization algorithm of your choice to keep modifying the variables and reduce the cost.

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

Here, we ask TensorFlow to minimize the cross-entropy using the gradient descent algorithm with a learning rate of 0.01. Gradient descent is a simple procedure: TensorFlow just moves each variable a little bit in the direction that makes the cost keep decreasing. Of course, TensorFlow also provides many other optimization algorithms: you can switch to another one simply by adjusting a single line of code.

What TensorFlow actually does here is add a new set of operations to your computation graph that implement backpropagation and gradient descent. It then returns a single operation which, when run, applies one step of gradient descent training, fine-tuning your variables and driving the cost down.
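The core idea, nudging each variable a little in the direction that lowers the cost, can be illustrated with a toy one-dimensional example in plain Python (this is a sketch of the principle, not TensorFlow's actual machinery):

```python
def gradient_descent(grad, x0, learning_rate=0.01, steps=1000):
    # Repeatedly move x a small step against the gradient of the cost.
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# Minimize cost(x) = (x - 3)^2, whose gradient is 2 * (x - 3):
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 3))  # converges to 3.0
```

Each step shrinks the distance to the minimum by a constant factor here; in the real model the same update is applied to every weight and bias at once.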

Now we have set up our model. Before running the computation, we need to add an operation to initialize the variables we created:

init = tf.initialize_all_variables()

Now we can launch our model in a Session and initialize the variables:

sess = tf.Session()
sess.run(init)

Then we start training the model; here we run the training step 1000 times in a loop:

for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

In each step of the loop, we randomly fetch a batch of 100 data points from the training data, then run train_step, feeding those data points in place of the placeholders as before.

Training on small random subsets of the data is called stochastic training; here, more precisely, stochastic gradient descent. Ideally we would use all of our data for every step of training, because that would give better results, but it obviously requires a lot of computational overhead. So instead, each training step uses a different subset of the data; this reduces the computational overhead while still learning the overall characteristics of the data set.
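The batching done by mnist.train.next_batch can be pictured with a simple sketch (a hypothetical helper, assuming the data fits in Python lists):

```python
import random

def next_batch(images, labels, batch_size):
    # Draw a random subset of the training data, keeping each image
    # paired with its label, as stochastic training requires.
    indices = random.sample(range(len(images)), batch_size)
    return [images[i] for i in indices], [labels[i] for i in indices]

images = list(range(10))  # stand-ins for image vectors
labels = list(range(10))  # matching labels
batch_xs, batch_ys = next_batch(images, labels, 4)
print(len(batch_xs))  # 4
```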

Assess our model

So how well does our model perform?

First, let's find out which labels we predicted correctly. tf.argmax is a very useful function: it gives the index of the maximum value of a tensor along a given dimension. Since each label is a one-hot vector of 0s and 1s, the index of the position holding the maximum value is the category label. For example, tf.argmax(y,1) returns the label our model predicts for each input x, while tf.argmax(y_,1) is the correct label. We can then use tf.equal to check whether our prediction matches the true label (matching index positions count as a match).

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
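The argmax step above can be seen in a tiny pure-Python sketch (illustrative only, operating on a single vector rather than a batch):

```python
def argmax(vec):
    # Index of the largest entry; for a one-hot label this is the class id.
    return max(range(len(vec)), key=lambda i: vec[i])

one_hot_label = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # one-hot encoding of class 3
print(argmax(one_hot_label))  # 3
```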

This line gives us a list of Boolean values. To determine the fraction of correct predictions, we cast the Booleans to floating point numbers and take the mean.
For example, [True, False, True, True] becomes [1,0,1,1], which averages to 0.75.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
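The cast-then-average step behaves like this plain-Python sketch (the function name is illustrative):

```python
def accuracy(correct_prediction):
    # Cast each Boolean to a float (True -> 1.0, False -> 0.0), then average.
    values = [float(c) for c in correct_prediction]
    return sum(values) / len(values)

print(accuracy([True, False, True, True]))  # 0.75
```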

Finally, we compute the accuracy of the learned model on the test data set:

print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

The final result should be about 91%.

Is this result good? Well, not really. In fact, it is quite poor, because we used only a very simple model. With some small improvements, however, we can reach an accuracy of 97%, and the best models can even exceed 99.7% accuracy! (For more information, you can look at this list comparing the performance of various models.)

More important than the result are the design ideas we learned from building this model. However, if you are still a little disappointed with the results here, you can look at the next tutorial, where you will learn how to build more complex models with TensorFlow for better performance!


Origin www.cnblogs.com/schips/p/12159154.html