TensorFlow Notes (XII): Optimizer-related functions and features in tf.train

Table of Contents

1. Preamble

2. TensorFlow functions

2.1 Training

█ Optimizers

█ Slots

█ Coordinator and QueueRunner


1. Preamble

This article mainly covers the functions listed below. The training functions add operations that minimize a loss via gradient descent. During training, first construct an optimizer instance, for example tf.train.GradientDescentOptimizer, parameterized with a learning rate:

optimizer = tf.train.GradientDescentOptimizer(learning_rate)

Then a variable can be provided to record the global training step. The minimize() operation not only updates the trainable model parameters, it can also increment this global step counter. Like other TensorFlow operations, these training ops must be run inside a tf.Session:

global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)
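
As a minimal sketch of how these pieces fit together (the single variable and quadratic loss below are illustrative assumptions, not from the original post):

import tensorflow as tf

# A toy model: one variable and a quadratic loss (illustrative only).
w = tf.Variable(5.0, name='w')
loss = tf.square(w - 3.0)

global_step = tf.Variable(0, name='global_step', trainable=False)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        # Each run applies one gradient update and increments global_step.
        _, step, cur_loss = sess.run([train_op, global_step, loss])
    print(step, cur_loss)  # global_step reaches 100, loss is close to 0
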
Operation group: Operations
Training: Optimizers, Gradient Computation, Gradient Clipping, Distributed execution
Testing: Unit tests, Utilities, Gradient checking

2. TensorFlow functions

2.1 Training

A TFRecords file stores a sequence of strings. The format does not support random access, so it is best suited to streaming large volumes of data and less suited to workloads that need fast partitioning or other non-sequential access.

█ Optimizers

The tf.train module provides a set of classes that compute gradients for a loss function and apply them to variables, covering the classic optimization algorithms such as GradientDescent and Adagrad.

▶▶ class tf.train.Optimizer

Operation: Description

class tf.train.Optimizer
    The base optimizer class. It is rarely used directly; its subclasses, such as GradientDescentOptimizer, AdagradOptimizer or MomentumOptimizer, are used instead.

tf.train.Optimizer.__init__(use_locking, name)
    Creates a new optimizer. This constructor must be called by the constructors of subclasses.

tf.train.Optimizer.minimize(loss, global_step=None, var_list=None, gate_gradients=1, aggregation_method=None, colocate_gradients_with_ops=False, name=None, grad_loss=None)
    Adds operations to minimize loss by updating var_list. This method simply combines compute_gradients() and apply_gradients(). It returns an operation that updates var_list; if global_step is not None, the operation also increments global_step.

tf.train.Optimizer.compute_gradients(loss, var_list=None, gate_gradients=1, aggregation_method=None, colocate_gradients_with_ops=False, grad_loss=None)
    Computes the gradients of loss with respect to the variables in var_list. This is the first half of minimize(); it returns a list of (gradient, variable) tuples.

tf.train.Optimizer.apply_gradients(grads_and_vars, global_step=None, name=None)
    Applies the computed gradients to the variables. This is the second half of minimize(); it returns an Operation that applies the gradients and, if global_step is given, increments it.

tf.train.Optimizer.get_name()
    Returns the name of the optimizer.

▷ class tf.train.Optimizer usage

# Create an optimizer with the desired parameters.
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# Add Ops to the graph to minimize a cost by updating a list of variables.
# "cost" is a Tensor, and the list of variables contains tf.Variable objects.
opt_op = opt.minimize(cost, var_list=<list of variables>)
# Execute opt_op to do one step of training:
opt_op.run()
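
Note that opt_op.run() only works under a default session, and if var_list is omitted minimize() falls back to the graph's trainable variables (GraphKeys.TRAINABLE_VARIABLES). An equivalent explicit form (a sketch, assuming the graph above has been built) is:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # One training step; same effect as opt_op.run() with sess as the default session.
    sess.run(opt_op)
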

▶▶ Processing gradients before applying them
The minimize() operation both computes the gradients and applies them to the variables. If you want to process the gradients before they are applied, use the optimizer in three steps:

1. Compute the gradients with compute_gradients()
2. Process the gradients as you wish
3. Apply the processed gradients with apply_gradients()

For example:

# Create an optimizer.
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# Compute the gradients for <list of variables>.
grads_and_vars = opt.compute_gradients(loss, <list of variables>)

# grads_and_vars is a list of (gradient, variable) tuples.
# Process the gradients as needed, for example by capping them.
capped_grads_and_vars = [(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]

# Ask the optimizer to apply the capped gradients.
opt.apply_gradients(capped_grads_and_vars)
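
A concrete version of the capping step, using tf.clip_by_global_norm instead of the hypothetical MyCapper (a sketch; loss is assumed to be defined, and the clip_norm value of 5.0 is illustrative):

import tensorflow as tf

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# Compute gradients for all trainable variables (var_list defaults to them).
grads_and_vars = opt.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)

# Rescale the gradients so that their global norm does not exceed 5.0.
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)

# Apply the clipped gradients; this returns the training op.
train_op = opt.apply_gradients(list(zip(clipped_grads, variables)))
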

█ Slots (and other optimizers)

Some optimizer subclasses, such as MomentumOptimizer and AdagradOptimizer, allocate and manage additional variables that are used during training. These variables are called 'slots'; each slot has a name, and you can ask the optimizer for a slot by its name. Exposing the slots is useful for logging and debugging the state of the training algorithm (see the sketch after the table below).

Operation: Description

tf.train.Optimizer.get_slot_names()
    Returns the list of names of the slots created by the Optimizer.

tf.train.Optimizer.get_slot(var, name)
    Returns the slot named name that the Optimizer created for var. var is a variable that was passed to minimize() or apply_gradients().

class tf.train.GradientDescentOptimizer
    Optimizer that implements the gradient descent algorithm.

tf.train.GradientDescentOptimizer.__init__(learning_rate, use_locking=False, name='GradientDescent')
    Constructs a new gradient descent optimizer.

class tf.train.AdadeltaOptimizer
    Optimizer that implements the Adadelta algorithm.

tf.train.AdadeltaOptimizer.__init__(learning_rate=0.001, rho=0.95, epsilon=1e-08, use_locking=False, name='Adadelta')
    Creates an Adadelta optimizer.

class tf.train.AdagradOptimizer
    Optimizer that implements the Adagrad algorithm.

tf.train.AdagradOptimizer.__init__(learning_rate, initial_accumulator_value=0.1, use_locking=False, name='Adagrad')
    Creates an Adagrad optimizer.

class tf.train.MomentumOptimizer
    Optimizer that implements the Momentum algorithm.

tf.train.MomentumOptimizer.__init__(learning_rate, momentum, use_locking=False, name='Momentum', use_nesterov=False)
    Creates a Momentum optimizer. momentum: the momentum, a Tensor or a floating-point value.

class tf.train.AdamOptimizer
    Optimizer that implements the Adam algorithm.

tf.train.AdamOptimizer.__init__(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
    Creates an Adam optimizer.

class tf.train.FtrlOptimizer
    Optimizer that implements the FTRL algorithm.

tf.train.FtrlOptimizer.__init__(learning_rate, learning_rate_power=-0.5, initial_accumulator_value=0.1, l1_regularization_strength=0.0, l2_regularization_strength=0.0, use_locking=False, name='Ftrl')
    Creates an FTRL optimizer.

class tf.train.RMSPropOptimizer
    Optimizer that implements the RMSProp algorithm.

tf.train.RMSPropOptimizer.__init__(learning_rate, decay=0.9, momentum=0.0, epsilon=1e-10, use_locking=False, name='RMSProp')
    Creates an RMSProp optimizer.

Notes on the algorithms above [2] (a short constructor sketch follows these notes):

  • Momentum helps SGD navigate along the relevant directions and dampens irrelevant oscillations. It simply adds a fraction of the previous step's direction to the current step, which amplifies the speed in the correct direction and softens oscillation in the wrong directions. The fraction is usually in the (0, 1) range. Using adaptive momentum also makes sense: at the start of training a large momentum only hinders progress, so a value around 0.01 is reasonable, and once all the large gradients have disappeared you can switch to a larger momentum. Momentum has one problem of its own: when we get very close to the goal, the momentum is usually very high and it does not know that it should slow down, which can make it miss the minimum or oscillate around it.
  • Nesterov accelerated gradient (NAG) overcomes this problem by starting to slow down early. With plain momentum we first compute the gradient and then make a jump in that direction, amplified by whatever momentum we had previously. NAG does the same things but in the other order: first we make a big jump based on the stored information, then we compute the gradient and make a small correction. This seemingly irrelevant change gives a significant practical speedup.
  • AdaGrad, or adaptive gradient, lets the learning rate adapt to the parameters. It performs larger updates for infrequent parameters and smaller updates for frequent ones, which makes it well suited to sparse data (NLP or image recognition). Another advantage is that it largely removes the need to tune the learning rate: every parameter has its own learning rate which, by the nature of the algorithm, decreases monotonically. This also causes its biggest problem: at some point the learning rate becomes so small that the system stops learning.
  • AdaDelta fixes the monotonically decreasing learning rate of AdaGrad. In AdaGrad the learning rate is roughly computed as one divided by the square root of a running sum of squared gradients; at every step another squared gradient is added to the sum, so the denominator keeps growing and the learning rate keeps shrinking. AdaDelta, instead of summing over all past squared gradients, uses a sliding window that allows the sum to decrease. RMSprop is very similar to AdaDelta.
  • Adam, or adaptive momentum, is an algorithm similar to AdaDelta, but in addition to storing a learning rate for each parameter it also stores a separate momentum estimate for each parameter.

A few visualizations of these algorithms were shown in the original post (images not reproduced here).
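
To connect these notes back to the constructors listed in the table above, here is a short sketch (the hyperparameter values are illustrative, not recommendations):

import tensorflow as tf

# Plain momentum, as in the first note.
momentum_opt = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)

# Nesterov accelerated gradient: the same class with use_nesterov=True.
nag_opt = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9, use_nesterov=True)

# Per-parameter adaptive learning rates.
adagrad_opt = tf.train.AdagradOptimizer(learning_rate=0.01)
adadelta_opt = tf.train.AdadeltaOptimizer(learning_rate=1.0, rho=0.95)
rmsprop_opt = tf.train.RMSPropOptimizer(learning_rate=0.001, decay=0.9)
adam_opt = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999)
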

                                

                             

█ Coordinator and QueueRunner

See the queue-related content in the Queue section for how queues run in TensorFlow.

Operation: Description

class tf.train.Coordinator
    A coordinator for threads.

tf.train.Coordinator.clear_stop()
    Clears the stop flag.

tf.train.Coordinator.join(threads=None, stop_grace_period_secs=120)
    Waits for the threads to terminate. threads: a list of threading.Thread objects; these started threads are joined in addition to the registered threads.

tf.train.Coordinator.register_thread(thread)
    Registers a thread to be joined.

tf.train.Coordinator.request_stop(ex=None)
    Requests that the threads stop.

tf.train.Coordinator.should_stop()
    Checks whether a stop has been requested.

tf.train.Coordinator.stop_on_exception()
    A context manager that requests a stop when an exception is raised.

tf.train.Coordinator.wait_for_stop(timeout=None)
    Waits for the Coordinator to signal a stop.

class tf.train.QueueRunner
    Holds a list of enqueue operations for a queue, to be run in threads. queue: a queue; enqueue_ops: the list of enqueue operations to run in threads.

tf.train.QueueRunner.create_threads(sess, coord=None, daemon=False, start=False)
    Creates the threads that run the enqueue operations and returns the list of threads.

tf.train.QueueRunner.from_proto(queue_runner_def)
    Returns the QueueRunner object created from queue_runner_def.

tf.train.add_queue_runner(qr, collection='queue_runners')
    Adds a QueueRunner to a collection in the graph.

tf.train.start_queue_runners(sess=None, coord=None, daemon=True, start=True, collection='queue_runners')
    Starts all the queue runners collected in the graph.

▷ class tf.train.Coordinator

# Using a Coordinator to coordinate multiple threads.
try:
  ...
  coord = Coordinator()
  # Start a number of threads, passing the coordinator to each of them.
  ...start thread 1...(coord, ...)
  ...start thread N...(coord, ...)
  # Wait for all the threads to terminate, give them 10s grace period
  coord.join(threads, stop_grace_period_secs=10)
except RuntimeError:
  ...one of the threads took more than 10s to stop after request_stop()
  ...was called.
except Exception:
  ...exception that was passed to coord.request_stop()
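
A runnable variant of this pattern with real Python threads (a minimal sketch; the worker function and its stopping condition are illustrative):

import threading
import tensorflow as tf

def worker(coord, worker_id):
    step = 0
    # Loop until any thread (or the main program) requests a stop.
    while not coord.should_stop():
        step += 1
        if step >= 5:            # illustrative stopping condition
            coord.request_stop()
    print('worker %d stopped at step %d' % (worker_id, step))

coord = tf.train.Coordinator()
threads = [threading.Thread(target=worker, args=(coord, i)) for i in range(3)]
for t in threads:
    t.start()
coord.join(threads)
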

▷ tf.train.Coordinator.stop_on_exception()

with coord.stop_on_exception():
  # Any exception raised in the body of the with
  # clause is reported to the coordinator before terminating
  # the execution of the body.
  ...body...
# Equivalent to:
try:
  ...body...
except Exception as ex:
  coord.request_stop(ex)
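
For completeness, a minimal QueueRunner sketch combining the table entries above (the FIFOQueue, its capacity, and the constant being enqueued are illustrative assumptions):

import tensorflow as tf

# A FIFO queue plus an op that enqueues a constant value.
queue = tf.FIFOQueue(capacity=10, dtypes=tf.float32)
enqueue_op = queue.enqueue(1.0)
dequeue_op = queue.dequeue()

# A QueueRunner that keeps two threads running the enqueue op,
# registered in the default 'queue_runners' collection.
qr = tf.train.QueueRunner(queue, enqueue_ops=[enqueue_op] * 2)
tf.train.add_queue_runner(qr)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    # Start every queue runner collected in the graph.
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    for _ in range(3):
        print(sess.run(dequeue_op))
    coord.request_stop()
    coord.join(threads)
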

References:

[1] Some commonly used TensorFlow concepts and functions (IV)

[2] How to set an adaptive learning rate for GradientDescentOptimizer?


Source: blog.csdn.net/qq_37764129/article/details/94214768