TensorFlow high-level API introduction: Estimator

1. Estimator architecture

2. Steps to use Estimator

3. Understand Estimator from the source code

(1) The source code of Estimator is as follows:

(2) Build model_fn

(3) What is tf.estimator.EstimatorSpec

(4) Specify config


1. Estimator architecture

            

    In TensorFlow's API stack, Estimator is a high-level API. The mid-level APIs beneath it are:

  • Layers: used to build the network structure
  • Datasets: used to build a data reading pipeline
  • Metrics: used to evaluate network performance

    With Estimator, we only need to focus on these three parts rather than on low-level details, and we no longer have to manage a Session by hand.

2. Steps to use Estimator

         

 

  • Create one or more input functions (input_fn);
  • Define the model's feature columns (feature_columns);
  • Instantiate the Estimator, specifying the feature columns and the hyperparameters;
  • Call one or more methods on the Estimator object (train, evaluate, predict), passing the appropriate input function as the data source.

 3. Understand Estimator from the source code

(1) The source code of Estimator is as follows:

class Estimator(object):
  def __init__(self, 
               model_fn, 
               model_dir=None, 
               config=None, 
               params=None, 
               warm_start_from=None):
    ...

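- model_dir: the directory where checkpoints and other logs are stored.
- model_fn: the user-defined model function; it is described in detail below.
- config: controls internal behavior, checkpointing, and so on. If model_fn also declares a config argument, this config is passed on to model_fn.
- params: the value of this argument is passed to model_fn.
- warm_start_from: the path of a checkpoint to load before training starts.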

(2) Build model_fn

def model_fn(
    features,     # Batch of features from input_fn: a `Tensor` or a dict of `Tensor`s
    labels,       # Batch of labels from input_fn
    mode,         # A tf.estimator.ModeKeys value
    params,       # Additional configuration (hyperparameters)
    config=None   # Optional tf.estimator.RunConfig
):
    ...

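- features and labels are the batches of features and labels returned by the input function; in other words, they are the data the model will use.
- params is a dict that can carry any number of values used to build the network or define how it is trained. For example, params['n_classes'] can set the number of output nodes.
- config is usually used to control checkpointing or distributed training; it is not covered in depth here.
- mode indicates whether the caller is requesting training, evaluation, or prediction, expressed as tf.estimator.ModeKeys.TRAIN, EVAL, or PREDICT.
    The source code of DNNClassifier also shows that mode does not have to be passed in manually, because the Estimator sets it automatically.
    For example, when you call estimator.train(...), mode is set to tf.estimator.ModeKeys.TRAIN.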
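The dispatch described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not TensorFlow's implementation: `ToyEstimator` and `my_model_fn` are hypothetical names, the mode constants are plain-string stand-ins for tf.estimator.ModeKeys, and features and labels are passed in directly, whereas a real Estimator obtains them from input_fn.

```python
import inspect

# Hypothetical stand-ins for the tf.estimator.ModeKeys values.
TRAIN, EVAL, PREDICT = "train", "eval", "infer"


class ToyEstimator:
    """Toy illustration only; not TensorFlow's Estimator."""

    def __init__(self, model_fn, params=None, config=None):
        self._model_fn = model_fn
        self._params = params or {}
        self._config = config
        # Names of the arguments that model_fn declares.
        self._fn_args = set(inspect.signature(model_fn).parameters)

    def _call_model_fn(self, features, labels, mode):
        kwargs = {"features": features}
        # Optional arguments are passed only if model_fn declares them.
        if "labels" in self._fn_args:
            kwargs["labels"] = labels
        if "mode" in self._fn_args:
            kwargs["mode"] = mode
        if "params" in self._fn_args:
            kwargs["params"] = self._params
        if "config" in self._fn_args:
            kwargs["config"] = self._config
        return self._model_fn(**kwargs)

    # Each public method fixes the mode; the caller never passes it.
    def train(self, features, labels):
        return self._call_model_fn(features, labels, TRAIN)

    def evaluate(self, features, labels):
        return self._call_model_fn(features, labels, EVAL)

    def predict(self, features):
        return self._call_model_fn(features, None, PREDICT)


def my_model_fn(features, labels, mode, params):
    # Report what was received so the dispatch is visible.
    return {
        "mode": mode,
        "n_classes": params["n_classes"],
        "has_labels": labels is not None,
    }


est = ToyEstimator(my_model_fn, params={"n_classes": 3})
print(est.train([1.0], [0]))
print(est.predict([1.0]))
```

Because the arguments are matched by name, a model_fn that omits params or config is still called correctly in this sketch.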

     model_fn must handle each mode differently, and in every case it must return an instance of tf.estimator.EstimatorSpec. Put simply, a model goes through three stages (training, evaluation, and prediction), and the data is processed differently in each one. In the training phase, we feed data to the model, the model produces predictions from the input, we compute the loss from the predictions and the true labels, and finally we use the loss to update the network parameters. In the evaluation phase, we still compute the loss, but we do not backpropagate or update the parameters. In the prediction phase, there are no labels, so the model only outputs predictions. In other words, model_fn needs a separate branch of code for each of the three modes. As for the return value, Estimator requires model_fn to return a tf.estimator.EstimatorSpec so that all three cases can be handled uniformly.
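To make the three branches concrete, here is a minimal sketch in plain Python with a one-weight linear model. It is an illustration under stated assumptions rather than TensorFlow code: `toy_model_fn`, the plain-dict return values, and the in-place update of `params["w"]` are all hypothetical simplifications of what an EstimatorSpec and an optimizer would do.

```python
# Toy one-weight linear model y = w * x in plain Python.
# Everything here is a hypothetical stand-in, not TensorFlow code.
TRAIN, EVAL, PREDICT = "train", "eval", "infer"


def toy_model_fn(features, labels, mode, params):
    w = params["w"]
    # Forward pass: shared by all three modes.
    predictions = [w * x for x in features]
    if mode == PREDICT:
        return {"predictions": predictions}

    # Mean squared error: needed for both evaluation and training.
    n = len(features)
    loss = sum((p - y) ** 2 for p, y in zip(predictions, labels)) / n
    if mode == EVAL:
        # No parameter update during evaluation.
        return {"loss": loss}

    # TRAIN: one gradient-descent step on w.
    grad = sum(2 * (p - y) * x
               for p, y, x in zip(predictions, labels, features)) / n
    params["w"] = w - params["learning_rate"] * grad
    return {"loss": loss}


params = {"w": 0.0, "learning_rate": 0.1}
xs, ys = [1.0, 2.0], [2.0, 4.0]
loss_before = toy_model_fn(xs, ys, EVAL, params)["loss"]
toy_model_fn(xs, ys, TRAIN, params)
loss_after = toy_model_fn(xs, ys, EVAL, params)["loss"]
print(loss_before, loss_after)
```

Only the TRAIN branch changes the weight; calling the function in EVAL or PREDICT mode leaves `params["w"]` untouched.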

(3) What is tf.estimator.EstimatorSpec

    EstimatorSpec is a class provided by TensorFlow. model_fn constructs an instance of it and returns it, and the Estimator uses that instance to run training, evaluation, or prediction. Its constructor is as follows:

class EstimatorSpec():
  def __new__(cls,
              mode,
              predictions=None,
              loss=None,
              train_op=None,
              eval_metric_ops=None,
              export_outputs=None,
              training_chief_hooks=None,
              training_hooks=None,
              scaffold=None,
              evaluation_hooks=None,
              prediction_hooks=None):
    ...

- mode: one of the ModeKeys values, specifying training, evaluation, or prediction.
- predictions: predictions `Tensor` or dict of `Tensor`.
- loss: training loss `Tensor`. Must be either a scalar or have shape [1].
- train_op: the op that performs one training step.
- eval_metric_ops: dict of metric results keyed by name. Each value of the dict can be one of the following:
  (1) An instance of the Metric class.
  (2) The result of calling a metric function, namely a (metric_tensor, update_op) tuple.
      metric_tensor should be evaluated without any impact on state (typically it is a pure computation based on variables).
      For example, it should not trigger the update_op or require any input fetching.

    Different modes require different fields:

  • For mode == ModeKeys.TRAIN: the required fields are loss and train_op.
  • For mode == ModeKeys.EVAL: the required field is loss.
  • For mode == ModeKeys.PREDICT: the required field is predictions.
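The per-mode requirements listed above can be expressed as a small validation helper. This is a sketch in plain Python, not the checks TensorFlow actually performs: `check_spec` and `REQUIRED_FIELDS` are hypothetical names, and the mode constants are plain-string stand-ins for tf.estimator.ModeKeys.

```python
# Hypothetical stand-ins for the tf.estimator.ModeKeys values.
TRAIN, EVAL, PREDICT = "train", "eval", "infer"

# Required fields per mode, as listed above.
REQUIRED_FIELDS = {
    TRAIN: ("loss", "train_op"),
    EVAL: ("loss",),
    PREDICT: ("predictions",),
}


def check_spec(mode, **fields):
    """Raise ValueError if a field required for `mode` is missing."""
    if mode not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown mode: {mode!r}")
    missing = [name for name in REQUIRED_FIELDS[mode]
               if fields.get(name) is None]
    if missing:
        raise ValueError(f"Missing for mode {mode!r}: {', '.join(missing)}")
    return True


check_spec(PREDICT, predictions=[0.1, 0.9])          # passes
check_spec(EVAL, loss=0.3, train_op="ignored here")  # extra fields are fine
try:
    check_spec(TRAIN, loss=0.3)                      # train_op is missing
except ValueError as err:
    print(err)
```

Extra fields are accepted in this sketch, which matches the idea that a model_fn may fill in more fields than a given mode needs.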

  1) The simplest case, prediction: only mode and predictions need to be passed in.

predicted_classes = tf.argmax(logits, 1)
if mode == tf.estimator.ModeKeys.PREDICT:
    predictions = {
        'class_ids': predicted_classes[:, tf.newaxis],
        'probabilities': tf.nn.softmax(logits),
        'logits': logits,
    }
    return tf.estimator.EstimatorSpec(mode, predictions=predictions)

  2) Evaluation mode: pass in mode and loss, and optionally eval_metric_ops

    When you call the Estimator's evaluate method, model_fn receives mode = ModeKeys.EVAL. In this case, the model function must return a tf.estimator.EstimatorSpec containing the model's loss and, optionally, one or more metrics.

# Compute loss.
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
# Compute evaluation metrics.
accuracy = tf.metrics.accuracy(
              labels=labels,
              predictions=predicted_classes,
              name='acc_op')
metrics = {'accuracy': accuracy}
if mode == tf.estimator.ModeKeys.EVAL:
    return tf.estimator.EstimatorSpec(
              mode, 
              loss=loss, 
              eval_metric_ops=metrics)

  3) Training mode: pass in mode, loss, and train_op

# Compute loss.
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

  4) General case: model_fn can fill in all fields regardless of the mode, and the Estimator ignores the ones that do not apply. For example, train_op is ignored in eval and predict modes:

def model_fn(mode, features, labels):
  predictions = ...
  loss = ...
  train_op = ...
  return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions=predictions,
      loss=loss,
      train_op=train_op)

(4) Specify config

    The config argument must be a tf.estimator.RunConfig instance. Its constructor is as follows:

class RunConfig(object):
  """This class specifies the configurations for an `Estimator` run."""

  def __init__(self,
               model_dir=None,
               tf_random_seed=None,
               save_summary_steps=100,
               save_checkpoints_steps=_USE_DEFAULT,
               save_checkpoints_secs=_USE_DEFAULT,
               session_config=None,
               keep_checkpoint_max=5,
               keep_checkpoint_every_n_hours=10000,
               log_step_count_steps=100,
               train_distribute=None,
               device_fn=None,
               protocol=None,
               eval_distribute=None,
               experimental_distribute=None,
               experimental_max_worker_delay_secs=None,
               session_creation_timeout_secs=7200):
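    ...

- model_dir: the directory where model parameters, the graph, and so on are stored.
- save_summary_steps: save summaries every this many steps.
- save_checkpoints_steps: save a checkpoint every this many steps.
- save_checkpoints_secs: save a checkpoint every this many seconds. It cannot be specified together with save_checkpoints_steps.
    If neither is specified, the default is used: a checkpoint is saved every 600 seconds. If both are set to None, no checkpoints are saved.
- keep_checkpoint_max: the maximum number of recent checkpoints to keep; once this number is exceeded, older checkpoints are deleted.
    If set to None or 0, all checkpoints are kept.
- keep_checkpoint_every_n_hours: the number of hours between checkpoints that are kept permanently, in addition to the most recent ones. The default of 10000 hours effectively disables this.
- log_step_count_steps: how often, in global steps, the loss is logged during training.
    The global steps per second are logged as well, which shows how fast the model is training.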
 


Origin blog.csdn.net/MOU_IT/article/details/103822448