Estimators

tf.estimator is a high-level TensorFlow API. An Estimator encapsulates the following operations:

Training
Evaluation
Prediction
Export for serving

You can use the pre-made Estimators that TensorFlow provides or write your own custom Estimators. Every Estimator, pre-made or custom, is a class based on the tf.estimator.Estimator class.

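Estimator capabilities

Estimators offer the following advantages:

You can run Estimator-based models on a local host or in a distributed multi-server environment without changing the model. You can also run them on CPUs, GPUs, or TPUs without recoding the model.
Estimators provide a safe distributed training loop that controls how and when to:
- Load data
- Handle exceptions
- Create checkpoint files and recover from failures
- Save summaries for TensorBoard

When you write an application with Estimators, you must separate the data input pipeline from the model. This separation simplifies experiments with different datasets.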

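Pre-made Estimators

Pre-made Estimators let you work at a much higher conceptual level than the base TensorFlow APIs. You no longer have to worry about creating the computational graph or sessions, because Estimators handle all the "plumbing" for you. Pre-made Estimators also let you try different model architectures with only minimal code changes. For example, tf.estimator.DNNClassifier is a pre-made Estimator class that trains classification models based on dense, feed-forward neural networks.

Structure of a program that uses a pre-made Estimator

A TensorFlow program that relies on a pre-made Estimator typically consists of the following four steps:

1. Write one or more dataset importing functions.

For example, you might create one function to import the training set and another to import the test set. Each dataset importing function must return two objects:

A dictionary in which the keys are feature names and the values are Tensors (or SparseTensors) containing the corresponding feature data
A Tensor containing one or more labels

For example, the following code illustrates the basic skeleton of an input function: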

def input_fn(dataset):
    ...  # manipulate dataset, extracting the feature dict and the label
    return feature_dict, label
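
As a rough sketch of the contract described above, the function below returns a (feature dict, labels) pair built from in-memory rows. Plain Python lists stand in for tensors so that the shape of the return value is easy to inspect. The row layout, the sample values, and the parameters `rows`, `feature_names`, and `label_name` are assumptions made for this illustration and are not part of tf.estimator.

```python
def input_fn(rows, feature_names, label_name):
    """Split row dicts into a {feature name: values} dict and a label list."""
    feature_dict = {name: [row[name] for row in rows] for name in feature_names}
    label = [row[label_name] for row in rows]
    return feature_dict, label


# Hypothetical sample data, made up for this sketch.
rows = [
    {"population": 1200.0, "crime_rate": 0.03, "label": 0},
    {"population": 5400.0, "crime_rate": 0.11, "label": 1},
]
features, labels = input_fn(rows, ["population", "crime_rate"], "label")
print(sorted(features))  # feature names become the dictionary keys
print(labels)
```

In a real Estimator program the values would be tensors (for example, produced by a tf.data pipeline) rather than lists, but the two-part return value has the same structure.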

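2. Define the feature columns.

Each tf.feature_column identifies a feature name, its type, and any input preprocessing. For example, the following snippet creates three feature columns that hold integer or floating-point data. The first two feature columns simply identify the feature's name and type. The third feature column also specifies a lambda that the program invokes to transform the raw data (here, centering it by subtracting a mean):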

# Define three numeric feature columns.
population = tf.feature_column.numeric_column('population')
crime_rate = tf.feature_column.numeric_column('crime_rate')
median_education = tf.feature_column.numeric_column(
  'median_education',
  normalizer_fn=lambda x: x - global_education_mean)
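
To make the effect of the third column's normalizer_fn concrete, the sketch below applies the same lambda to plain Python numbers instead of tensors. The value of `global_education_mean` and the sample inputs are assumptions chosen for illustration; in a real program the mean would be computed from the dataset.

```python
# Assumed value, for illustration only.
global_education_mean = 12.0

# Same transformation as the normalizer_fn above: subtract the mean.
normalizer_fn = lambda x: x - global_education_mean

raw_values = [10.0, 12.0, 16.0]
centered = [normalizer_fn(v) for v in raw_values]
print(centered)  # [-2.0, 0.0, 4.0]
```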

3. Instantiate the relevant pre-made Estimator.

For example, here is a sample instantiation of the pre-made Estimator named LinearClassifier:

# Instantiate an estimator, passing the feature columns.
estimator = tf.estimator.LinearClassifier(
  feature_columns=[population, crime_rate, median_education])

4. Call a training, evaluation, or inference method.

For example, all Estimators provide a train method, which trains a model.

# `my_training_set` is the input function created in Step 1.
estimator.train(input_fn=my_training_set, steps=2000)

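Benefits of pre-made Estimators

Pre-made Estimators encode best practices and provide the following benefits:

Best practices for determining where different parts of the computational graph should run, implementing strategies on a single machine or on a cluster.
Best practices for event (summary) writing and universally useful summaries.

If you don't use pre-made Estimators, you must implement the preceding features yourself.

Custom Estimators

The heart of every Estimator, pre-made or custom, is its model function, a method that builds graphs for training, evaluation, and prediction. When you use a pre-made Estimator, someone else has already implemented the model function. When you rely on a custom Estimator, you must write the model function yourself.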
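
The role of the model function can be sketched without TensorFlow. The toy class below is not tf.estimator.Estimator. It is a simplified stand-in (the name `ToyEstimator`, the `params` dict, and the "train"/"eval"/"predict" mode strings are all assumptions for this sketch) that illustrates the general idea: train, evaluate, and predict all call the same model function and pass it a different mode.

```python
def model_fn(features, labels, mode, params):
    """Toy linear model y = w * x + b, with behavior selected by `mode`."""
    preds = [params["w"] * x + params["b"] for x in features["x"]]
    if mode == "predict":
        return {"predictions": preds}
    loss = sum((p - y) ** 2 for p, y in zip(preds, labels))
    if mode == "eval":
        return {"loss": loss}
    # mode == "train": one gradient-descent step on the squared-error loss.
    lr = params["learning_rate"]
    grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, labels, features["x"]))
    grad_b = sum(2 * (p - y) for p, y in zip(preds, labels))
    params["w"] -= lr * grad_w
    params["b"] -= lr * grad_b
    return {"loss": loss}


class ToyEstimator:
    """Simplified stand-in that routes every call through one model_fn."""

    def __init__(self, model_fn, params):
        self.model_fn = model_fn
        self.params = params

    def train(self, input_fn, steps):
        for _ in range(steps):
            features, labels = input_fn()
            self.model_fn(features, labels, "train", self.params)
        return self

    def evaluate(self, input_fn):
        features, labels = input_fn()
        return self.model_fn(features, labels, "eval", self.params)

    def predict(self, input_fn):
        features, _ = input_fn()
        return self.model_fn(features, None, "predict", self.params)["predictions"]


def toy_input_fn():
    # Made-up data that lies exactly on the line y = 2x + 1.
    return {"x": [0.0, 1.0, 2.0]}, [1.0, 3.0, 5.0]


est = ToyEstimator(model_fn, {"w": 0.0, "b": 0.0, "learning_rate": 0.05})
loss_before = est.evaluate(toy_input_fn)["loss"]
est.train(toy_input_fn, steps=500)
loss_after = est.evaluate(toy_input_fn)["loss"]
print(loss_before, loss_after)
print(est.predict(toy_input_fn))
```

The real Estimator class additionally manages graphs, sessions, checkpoints, and summaries. The point of the sketch is only that the model-specific logic lives in one function, which is why swapping the model function is what turns a pre-made Estimator into a custom one.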

Creating an Estimator from a Keras model

You can convert existing Keras models to Estimators with tf.keras.estimator.model_to_estimator. Doing so lets your Keras model benefit from Estimator strengths such as distributed training.

Instantiate a Keras MobileNet V2 model and compile it with the optimizer, loss, and metrics to train with:

import tensorflow as tf

keras_mobilenet_v2 = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False)
keras_mobilenet_v2.trainable = False

estimator_model = tf.keras.Sequential([
    keras_mobilenet_v2,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1)
])

# Compile the model
estimator_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=['accuracy'])

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_160_no_top.h5
9412608/9406464 [==============================] - 1s 0us/step

Create an Estimator from the compiled Keras model. The initial model state of the Keras model is preserved in the created Estimator:

est_mobilenet_v2 = tf.keras.estimator.model_to_estimator(keras_model=estimator_model)

Treat the derived Estimator as you would any other Estimator.

import tensorflow_datasets as tfds

IMG_SIZE = 160  # All images will be resized to 160x160

def preprocess(image, label):
  image = tf.cast(image, tf.float32)
  image = (image/127.5) - 1
  image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
  return image, label

def train_input_fn(batch_size):
  data = tfds.load('cats_vs_dogs', as_supervised=True)
  train_data = data['train']
  train_data = train_data.map(preprocess).shuffle(500).batch(batch_size)
  return train_data
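
The arithmetic in preprocess, (image/127.5) - 1, maps 8-bit pixel values from the range [0, 255] to [-1, 1]. The short check below reproduces it on plain Python numbers; the helper name `scale_pixel` is an assumption for this sketch.

```python
def scale_pixel(value):
    """Map a pixel value in [0, 255] to [-1, 1], as preprocess does."""
    return (value / 127.5) - 1


print(scale_pixel(0), scale_pixel(127.5), scale_pixel(255))  # -1.0 0.0 1.0
```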

To train, call the Estimator's train function:

est_mobilenet_v2.train(input_fn=lambda: train_input_fn(32), steps=500)

WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
Downloading and preparing dataset cats_vs_dogs/4.0.0 (download: 786.68 MiB, generated: Unknown size, total: 786.68 MiB) to /home/kbuilder/tensorflow_datasets/cats_vs_dogs/4.0.0...

Warning:absl:1738 images were corrupted and were skipped

Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/cats_vs_dogs/4.0.0.incompleteY3OG6H/cats_vs_dogs-train.tfrecord
Dataset cats_vs_dogs downloaded and prepared to /home/kbuilder/tensorflow_datasets/cats_vs_dogs/4.0.0. Subsequent calls will reuse this data.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmp3kl6ql4q/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: /tmp/tmp3kl6ql4q/keras/keras_model.ckpt
INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.
INFO:tensorflow:Warm-started 158 variables.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp3kl6ql4q/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 0.7249289, step = 0
INFO:tensorflow:global_step/sec: 20.5614
INFO:tensorflow:loss = 0.67807716, step = 100 (4.865 sec)
INFO:tensorflow:global_step/sec: 22.1484
INFO:tensorflow:loss = 0.6722798, step = 200 (4.515 sec)
INFO:tensorflow:global_step/sec: 21.9558
INFO:tensorflow:loss = 0.60414714, step = 300 (4.555 sec)
INFO:tensorflow:global_step/sec: 21.9324
INFO:tensorflow:loss = 0.7141589, step = 400 (4.559 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 500...
INFO:tensorflow:Saving checkpoints for 500 into /tmp/tmp3kl6ql4q/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 500...
INFO:tensorflow:Loss for final step: 0.6571169.

<tensorflow_estimator.python.estimator.estimator.EstimatorV2 at 0x7f7aa444ef60>

Similarly, to evaluate, call the Estimator's evaluate function:

est_mobilenet_v2.evaluate(input_fn=lambda: train_input_fn(32), steps=10)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-07-23T01:29:37Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp3kl6ql4q/model.ckpt-500
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/10]
INFO:tensorflow:Evaluation [2/10]
INFO:tensorflow:Evaluation [3/10]
INFO:tensorflow:Evaluation [4/10]
INFO:tensorflow:Evaluation [5/10]
INFO:tensorflow:Evaluation [6/10]
INFO:tensorflow:Evaluation [7/10]
INFO:tensorflow:Evaluation [8/10]
INFO:tensorflow:Evaluation [9/10]
INFO:tensorflow:Evaluation [10/10]
INFO:tensorflow:Inference Time : 2.13840s
INFO:tensorflow:Finished evaluation at 2020-07-23-01:29:39
INFO:tensorflow:Saving dict for global step 500: accuracy = 0.60625, global_step = 500, loss = 0.63060856
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 500: /tmp/tmp3kl6ql4q/model.ckpt-500

{'accuracy': 0.60625, 'loss': 0.63060856, 'global_step': 500}

Worked examples:

1. Pre-made Estimator

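# Note: this example uses the TensorFlow 1.x API. Under TensorFlow 2.x,
# tf.estimator.inputs.numpy_input_fn is available as
# tf.compat.v1.estimator.inputs.numpy_input_fn.
# NumPy holds the training and evaluation data.
import numpy as np
import tensorflow as tf

# Create the list of feature columns. It contains a single numeric column
# named "x" whose values are real-valued arrays with one element each.
# More complex feature columns can be defined in the same way.
feature_columns = [tf.feature_column.numeric_column("x", shape=[1])]

# Create a LinearRegressor and pass it the feature columns.
estimator = tf.estimator.LinearRegressor(feature_columns=feature_columns)

# Training data.
x_train = np.array([1., 2., 3., 6., 8.])
y_train = np.array([4.8, 8.5, 10.4, 21.0, 25.3])

# Evaluation data.
x_eval = np.array([2., 5., 7., 9.])
y_eval = np.array([7.6, 17.2, 23.6, 28.8])

# Build the input function used for training.
# Argument 1: the features fed to the linear regression model.
# Argument 2: the labels that the loss is computed against.
# batch_size: the number of examples per batch.
# num_epochs: the number of passes over the data (one epoch is one full pass);
#   None repeats the data indefinitely, so training stops only at `steps`.
# shuffle: whether to read the data in random order instead of sequentially.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=2, num_epochs=None, shuffle=True)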

# Build a second input function over the training data, used for evaluation.
train_input_fn_2 = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=2, num_epochs=1000, shuffle=False)

# Build an input function over the evaluation data, used for evaluation.
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_eval}, y_eval, batch_size=2, num_epochs=1000, shuffle=False)

# Train for 1000 steps on the training data.
estimator.train(input_fn=train_input_fn, steps=1000)

# Evaluate on the original training data to see how well the model fit it.
train_metrics = estimator.evaluate(input_fn=train_input_fn_2)
print("train metrics: %s" % train_metrics)

# Evaluate on the evaluation data to check how well the model generalizes.
eval_metrics = estimator.evaluate(input_fn=eval_input_fn)
print("eval metrics: %s" % eval_metrics)

# Output:
# train metrics: {'average_loss': 0.5415543, 'label/mean': 13.999993, 'loss': 1.0831085, 'prediction/mean': 14.167501,
#                   'global_step': 1000}
# eval metrics: {'average_loss': 0.3701312, 'label/mean': 19.299805, 'loss': 0.7402624, 'prediction/mean': 19.188004,
#                   'global_step': 1000}
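
As a sanity check on the example above, the same training data can be fitted in closed form with ordinary least squares, which gives the line that a linear regressor trained on squared error would be expected to approach given enough training. This is a plain-Python sketch that does not use TensorFlow; the variable names are chosen for illustration.

```python
x_train = [1., 2., 3., 6., 8.]
y_train = [4.8, 8.5, 10.4, 21.0, 25.3]

n = len(x_train)
mean_x = sum(x_train) / n
mean_y = sum(y_train) / n

# Ordinary least squares for the line y = w * x + b.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train))
s_xx = sum((x - mean_x) ** 2 for x in x_train)
w = s_xy / s_xx
b = mean_y - w * mean_x

print("w = %.4f, b = %.4f" % (w, b))  # w = 2.9824, b = 2.0706
```

The label mean of the training data is 14.0, which matches the 'label/mean' value reported in the training metrics above.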

2. Custom Estimator

import numpy as np
import tensorflow as tf


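# Note: this example uses the TensorFlow 1.x API (tf.get_variable,
# tf.train.GradientDescentOptimizer, tf.estimator.inputs.numpy_input_fn).
# Define the model function. It also determines which features the model
# reads (here, the single feature "x").
def model_fn(features, labels, mode):
    # Build the linear model.
    W = tf.get_variable("W", [1], dtype=tf.float64)
    b = tf.get_variable("b", [1], dtype=tf.float64)
    y = W * features['x'] + b

    # Build the loss (sum of squared errors).
    loss = tf.reduce_sum(tf.square(y - labels))

    # Build the training subgraph.
    global_step = tf.train.get_global_step()
    optimizer = tf.train.GradientDescentOptimizer(0.01)
    # tf.assign_add returns an op that increments global_step by 1.
    train = tf.group(optimizer.minimize(loss),
                     tf.assign_add(global_step, 1))

    # Use EstimatorSpec to specify the predictions, the loss, and the training op.
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=y,
        loss=loss,
        train_op=train)


# Create the custom Estimator.
estimator = tf.estimator.Estimator(model_fn=model_fn)

# The rest of the training logic is the same as with LinearRegressor.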
x_train = np.array([1., 2., 3., 6., 8.])
y_train = np.array([4.8, 8.5, 10.4, 21.0, 25.3])

x_eval = np.array([2., 5., 7., 9.])
y_eval = np.array([7.6, 17.2, 23.6, 28.8])

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=2, num_epochs=None, shuffle=True)

train_input_fn_2 = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=2, num_epochs=1000, shuffle=False)

eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_eval}, y_eval, batch_size=2, num_epochs=1000, shuffle=False)

estimator.train(input_fn=train_input_fn, steps=1000)

train_metrics = estimator.evaluate(input_fn=train_input_fn_2)
print("train metrics: %s" % train_metrics)

eval_metrics = estimator.evaluate(input_fn=eval_input_fn)
print("eval metrics: %s" % eval_metrics)

# Output:
# train metrics: {'loss': 1.8509237, 'global_step': 1000}
# eval metrics: {'loss': 1.5535184, 'global_step': 1000}
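
To see what a single training step of the model function above does, the sketch below works through one gradient-descent update by hand in plain Python. It assumes that W and b start at 0 and that the batch is the first two training examples; both are assumptions made for this illustration, since the Estimator initializes the variables itself and shuffles the batches.

```python
learning_rate = 0.01
W, b = 0.0, 0.0          # assumed starting values
x_batch = [1.0, 2.0]     # assumed batch: first two training examples
y_batch = [4.8, 8.5]

# Forward pass and sum-of-squares loss, as in model_fn.
preds = [W * x + b for x in x_batch]
loss = sum((p - y) ** 2 for p, y in zip(preds, y_batch))

# Gradients of the loss with respect to W and b.
grad_W = sum(2 * (p - y) * x for p, y, x in zip(preds, y_batch, x_batch))
grad_b = sum(2 * (p - y) for p, y in zip(preds, y_batch))

# One gradient-descent update.
W -= learning_rate * grad_W
b -= learning_rate * grad_b
print(loss, W, b)
```

With these assumed values the loss is about 95.29, and the update moves W to about 0.436 and b to about 0.266, both in the direction of the fitted line.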
