TensorFlow MNIST Handwritten Digit Recognition Code Notes (3)

tf.train.GradientDescentOptimizer

Code

cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
tf.train.GradientDescentOptimizer(learning_rate, use_locking=False, name='GradientDescent')

Parameters

  • learning_rate: A Tensor or a floating point value. The learning rate to use.
  • use_locking: If True, use locks for the update operations.
  • name: Optional name for the operation; defaults to "GradientDescent".

Example

import tensorflow as tf

x = tf.Variable(2, name='x', dtype=tf.float32)
log_x = tf.log(x)
log_x_squared = tf.square(log_x)

optimizer = tf.train.GradientDescentOptimizer(0.5)

# minimize() handles both the gradient computation and the variable update
train = optimizer.minimize(log_x_squared)
# initialize_all_variables() is the older, deprecated name for global_variables_initializer()
init = tf.initialize_all_variables()

with tf.Session() as session:
    session.run(init)
    print("starting at", "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))
    for step in range(10):
        session.run(train)
        print("step", step, "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))

Other optimizers
The optimizer source code lives in C:\Users\yeping\Anaconda3\Lib\site-packages\tensorflow\train\__init__.py. A quick look shows the following optimizers; I am noting them here to come back to later.

  • AdadeltaOptimizer
  • AdagradOptimizer
  • AdagradDAOptimizer
  • AdamOptimizer
  • FtrlOptimizer
  • GradientDescentOptimizer
  • MomentumOptimizer
  • Optimizer
  • ProximalAdagradOptimizer
  • ProximalGradientDescentOptimizer
  • RMSPropOptimizer
  • SyncReplicasOptimizer

A search online turned up a blog post that discusses the various optimizers in considerable detail; I leave it here as a reference: SanFanCSgo, 《机器学习:各种优化器Optimizer的总结与比较》 (Machine Learning: A Summary and Comparison of Various Optimizers).
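
All of these share the minimize() interface of the base Optimizer class, so switching the training code from one to another is usually a one-line change. A minimal sketch, with a toy loss and untuned learning rates, just to show the interchangeable interface:

import tensorflow as tf

# Toy loss; the learning rates below are placeholders, not tuned values.
w = tf.Variable(3.0)
loss = tf.square(w - 1.0)

train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
# Swapping the optimizer is a one-line change:
# train_op = tf.train.MomentumOptimizer(0.1, momentum=0.9).minimize(loss)
# train_op = tf.train.AdamOptimizer(0.01).minimize(loss)
# train_op = tf.train.RMSPropOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(20):
        sess.run(train_op)
    print(sess.run(w))   # moves from 3.0 toward 1.0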

tf.global_variables_initializer()

Variables must be initialized before they are used. The idiomatic way is to use tf.global_variables_initializer(), which groups the initializers of all global variables into a single op that can then be run once.

init = tf.global_variables_initializer()
with tf.Session() as sess:
	sess.run(init)
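
A small sketch of what this buys you (TF 1.x behaviour): evaluating a variable before running the initializer raises a FailedPreconditionError, and works normally afterwards.

import tensorflow as tf

w = tf.Variable(tf.zeros([3]), name='w')
init = tf.global_variables_initializer()

with tf.Session() as sess:
    try:
        sess.run(w)                       # not initialized yet
    except tf.errors.FailedPreconditionError as e:
        print("before init:", type(e).__name__)
    sess.run(init)                        # runs all the grouped initializers
    print("after init:", sess.run(w))     # [0. 0. 0.]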

mnist.train.next_batch()

mnist.train.next_batch is a helper provided specifically for TensorFlow's MNIST tutorial. It works by shuffling the training image/label pairs at the start and then returning the next batch_size images on each call. Once the end of the data is reached, the pairs are shuffled again and the process repeats; the whole dataset is only reshuffled and reused after every available pair has been consumed.
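
A rough sketch of that behaviour in plain NumPy, purely for illustration (SimpleDataSet here is a made-up class, much simpler than the real DataSet in tensorflow.examples.tutorials.mnist):

import numpy as np

class SimpleDataSet(object):
    def __init__(self, images, labels):
        self._images = images
        self._labels = labels
        self._num_examples = images.shape[0]
        self._index = 0
        self._shuffle()

    def _shuffle(self):
        # Randomize the image/label pairs together
        perm = np.random.permutation(self._num_examples)
        self._images = self._images[perm]
        self._labels = self._labels[perm]

    def next_batch(self, batch_size):
        # When the remaining examples cannot fill a batch, reshuffle and restart
        if self._index + batch_size > self._num_examples:
            self._shuffle()
            self._index = 0
        start, end = self._index, self._index + batch_size
        self._index = end
        return self._images[start:end], self._labels[start:end]

# Usage (hypothetical arrays standing in for mnist.train):
train = SimpleDataSet(np.zeros((55000, 784)), np.zeros((55000, 10)))
batch_xs, batch_ys = train.next_batch(100)
print(batch_xs.shape, batch_ys.shape)   # (100, 784) (100, 10)

In the real code, next_batch is used inside the training loop below: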

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the MNIST data (the path is just an example location)
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

tf.reset_default_graph()
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
W = tf.Variable(tf.random_normal([784, 10]))
b = tf.Variable(tf.zeros([10]))

learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1  # print the cost every epoch
total_batch = int(mnist.train.num_examples/batch_size)

pred = tf.nn.softmax(tf.matmul(x, W) + b)
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(training_epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs, y: batch_ys})
            avg_cost += c / total_batch
        if (epoch+1) % display_step == 0:
            print("Epoch:", "%04d" % (epoch+1), "cost =", "{:.9f}".format(avg_cost))
    print("finished!")

sess.run

_, c = sess.run([optimizer, cost], feed_dict = {x: batch_xs, y: batch_ys})

This line first shows that a Python function can return multiple values; what is really happening is the unpacking assignment below:

p, q = [1, 2]
# equivalent to
[p, q] = [1, 2]

Second, the definitions above reveal the dependencies among optimizer, cost, x, and y: cost is computed from x and y, and the optimizer step is then computed from cost. The optimizer op itself does not return a value, which is why its slot is discarded into _.
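
A quick way to see this with a toy graph (not the MNIST one): when sess.run is given a list, the slot holding an Operation comes back as None, while the slot holding a Tensor comes back as its evaluated value.

import tensorflow as tf

w = tf.Variable(2.0)
loss = tf.square(w)
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    result = sess.run([train_op, loss])
    print(result)          # e.g. [None, 4.0]: the op yields None, the tensor yields a value
    _, c = sess.run([train_op, loss])
    print(c)               # only the loss value is kept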

The cost definition, revisited

It is worth going over the definition of cost once more.

x = tf.placeholder( tf.float32, [None, 784] )
y = tf.placeholder( tf.float32, [None, 10] )
W = tf.Variable( tf.random_normal([784, 10]) )
b = tf.Variable( tf.zeros([10]) )
pred = tf.nn.softmax( tf.matmul(x, W) + b )
cost = tf.reduce_mean( -tf.reduce_sum( y*tf.log(pred), reduction_indices = 1 ) )

Size of the training set
The definitions of x and y use None, which means the number of examples is left open. That is what makes the later code possible:

_, c = sess.run([optimizer, cost], feed_dict = {x: batch_xs, y: batch_ys})

Here feed_dict = {x: batch_xs, y: batch_ys} feeds x and y with concrete data, a batch of batch_size = 100 examples.
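
A small check of the None dimension (a toy graph, with zeros standing in for real images): the same placeholder accepts feeds of different batch sizes.

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
doubled = 2.0 * x

with tf.Session() as sess:
    for batch_size in (1, 100, 256):
        batch = np.zeros((batch_size, 784), dtype=np.float32)
        out = sess.run(doubled, feed_dict={x: batch})
        print(batch_size, out.shape)   # the None dimension takes the size of the feed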

How cost is computed
y and pred are both 100 x 10 tensors, so y*tf.log(pred) is also a 100 x 10 tensor. tf.reduce_sum( y*tf.log(pred), reduction_indices = 1 ) collapses dimension 1 by summing over it, leaving a length-100 vector with one value per example.

r = y*tf.log(pred)
s = -tf.reduce_sum( r, reduction_indices = 1 )
cost = tf.reduce_mean( s )


$$r = \left[ \begin{matrix} r_{1,1} & r_{1,2} & \dots & r_{1,10} \\ r_{2,1} & r_{2,2} & \dots & r_{2,10} \\ \vdots & \vdots & & \vdots \\ r_{100,1} & r_{100,2} & \dots & r_{100,10} \end{matrix} \right]$$

$$s = \left[ \begin{matrix} s_1 \\ s_2 \\ \vdots \\ s_{100} \end{matrix} \right] = -\left[ \begin{matrix} r_{1,1} + r_{1,2} + \dots + r_{1,10} \\ r_{2,1} + r_{2,2} + \dots + r_{2,10} \\ \vdots \\ r_{100,1} + r_{100,2} + \dots + r_{100,10} \end{matrix} \right]$$

Finally,
$$cost = \frac{s_1 + s_2 + \dots + s_{100}}{100}$$
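
The same arithmetic can be checked numerically with NumPy on a made-up batch of 2 examples and 3 classes instead of 100 x 10:

import numpy as np

y = np.array([[1., 0., 0.],
              [0., 1., 0.]])
pred = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1]])

r = y * np.log(pred)      # elementwise product, shape (2, 3)
s = -np.sum(r, axis=1)    # one cross-entropy value per example, shape (2,)
cost = np.mean(s)         # average over the batch

print(s)      # [0.35667494 0.22314355]
print(cost)   # about 0.2899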



Reposted from blog.csdn.net/quicmous/article/details/103652725