Saving and Restoring TensorFlow Network Models (save and restore)

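Building and training a deep learning network usually takes a long time. Once training has finished and the results are good, we want to save the model so that it can be deployed and tested in practice, or reused for transfer learning.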

Below, drawing on my own experience, I describe how to save and restore TensorFlow models.

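There are two main approaches:

(1) The traditional approach: save a ckpt model, then rewrite the whole network structure in code before restoring it;

(2) With newer versions of tf: save a checkpoint, then use the meta file to import the trained graph directly.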

A concrete example:

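In my project, I extracted 598 features from each image in order to classify the images into two classes. I now feed the extracted feature vectors into a BP neural network for training. The BP network consists of an input layer (a 598-dimensional vector), a first hidden layer (32 neurons), a second hidden layer (16 neurons), and an output layer (2 neurons).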
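
To make these layer sizes concrete, here is a minimal sketch in plain Python (no TensorFlow needed) that counts the weights and biases in each fully connected layer, which is roughly what the checkpoint has to store. The helper name count_parameters is made up for illustration; the sizes come from the description above.

```python
# Layer widths from the text: 598 inputs, hidden layers of 32 and 16, 2 outputs.
layer_sizes = [598, 32, 16, 2]


def count_parameters(sizes):
    """Return a (weights, biases) pair for each fully connected layer."""
    return [(n_in * n_out, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]


per_layer = count_parameters(layer_sizes)
total = sum(weights + biases for weights, biases in per_layer)

print(per_layer)  # [(19136, 32), (512, 16), (32, 2)]
print(total)      # 19730
```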

Training code: network construction + training and optimization

The code:

import tensorflow as tf  # TensorFlow 1.x API

trainData = DataSet(trainx, trainy)  # set up the training data to suit your own case

n_nodes_hl1 = 32
n_nodes_hl2 = 16

n_classes = 2
batch_size = 50

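def weight_variable(shape, name):
    # Note: `name` is attached to the initial-value op, not to the Variable,
    # so the Variable itself gets a default name such as "inference/Variable".
    initial = tf.truncated_normal(shape, stddev=0.1, name=name)
    return tf.Variable(initial)

def bias_variable(shape, name):
    # As above: `name` names the constant, not the Variable.
    initial = tf.constant(0.1, shape=shape, name=name)
    return tf.Variable(initial)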

x = tf.placeholder(tf.float32, [None, 598], name='inputx')
y = tf.placeholder(tf.float32, [None, 2], name='outy')

with tf.variable_scope("inference"):
    w_h1 = weight_variable([598, 32], name="weight_hidden1")
    b_h1 = bias_variable([32], name="bias_hidden1")

    w_h2 = weight_variable([32, 16], name="weight_hidden2")
    b_h2 = bias_variable([16], name="bias_hidden2")

    w_out = weight_variable([16, 2], name="weight_out")
    b_out = bias_variable([2], name="bias_out")

    l1 = tf.add(tf.matmul(x, w_h1), b_h1, name='layer1')
    l1 = tf.nn.relu(l1, name='layer1_out')

    l2 = tf.add(tf.matmul(l1, w_h2), b_h2, name='layer2')
    l2 = tf.nn.relu(l2, name='layer2_out')

    y_out = tf.nn.softmax(tf.matmul(l2, w_out)+b_out, name="probres")

cost = tf.reduce_mean(tf.reduce_sum(tf.square(y-y_out), reduction_indices=[1]), name="squarecost")
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

correct = tf.equal(tf.argmax(y_out, 1), tf.argmax(y, 1), name="correct")
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name="accuracy")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    loss = []
    saver = tf.train.Saver()  # saver used to write checkpoints
    tf.add_to_collection('pred_network', y_out)  # store the output tensor so it can be retrieved after importing the meta file
    for i in range(100001):
        batch = trainData.next_batch(20)
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={x: batch[0], y: batch[1]})
            print("step %d, training accuracy is %g" % (i, train_accuracy))
            loss1 = sess.run(cost, feed_dict={x: batch[0], y: batch[1]})
            print("loss:", loss1)
            loss.append(loss1)
            testacc = accuracy.eval({x: testx, y: testy})
            trainacc = accuracy.eval({x: trainx, y: trainy})
            print("Test Accuracy:", testacc)
            print("Train Accuracy:", trainacc)
            print("======================")
            if testacc >= 0.9 and trainacc > 0.87:  # thresholds depend on your own case
                saver.save(sess, "logs/" + "grayglcmlbpmodel.ckpt", i)  # save the model; i is appended as the step number
                break
        optimizer.run(feed_dict={x: batch[0], y: batch[1]})
    print("Accuracy:", accuracy.eval({x: testx, y: testy}))
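
The cost and accuracy defined in the training code can be checked by hand on a tiny batch. The sketch below re-implements both in plain Python purely for illustration; the function names and the two sample predictions are made up and are not part of the training script.

```python
def squared_error_cost(y_true, y_pred):
    """Mean over the batch of the per-sample sum of squared differences."""
    per_sample = [
        sum((t - p) ** 2 for t, p in zip(true_row, pred_row))
        for true_row, pred_row in zip(y_true, y_pred)
    ]
    return sum(per_sample) / len(per_sample)


def accuracy(y_true, y_pred):
    """Fraction of samples whose arg-max class matches the one-hot label."""
    hits = [
        true_row.index(max(true_row)) == pred_row.index(max(pred_row))
        for true_row, pred_row in zip(y_true, y_pred)
    ]
    return sum(hits) / len(hits)


# Two one-hot labels and two made-up softmax outputs.
y_true = [[1.0, 0.0], [0.0, 1.0]]
y_pred = [[0.8, 0.2], [0.6, 0.4]]

# Sample 1: 0.2^2 + 0.2^2 = 0.08; sample 2: 0.6^2 + 0.6^2 = 0.72; mean is about 0.40.
print(round(squared_error_cost(y_true, y_pred), 6))
# Only the first sample is classified correctly.
print(accuracy(y_true, y_pred))
```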

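A small tip for writing TensorFlow models: it is best to give every variable, that is, every Variable, a name, by passing name= to tf.Variable itself.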

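Now let's import the model using the first approach described above and test it.

First, the network-construction part of the code has to be written out again: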

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

n_nodes_hl1 = 32
n_nodes_hl2 = 16

n_classes = 2
batch_size = 50

def weight_variable(shape, name):
    initial = tf.truncated_normal(shape, stddev=0.1, name=name)
    return tf.Variable(initial)

def bias_variable(shape, name):
    initial = tf.constant(0.1, shape=shape, name=name)
    return tf.Variable(initial)

def inference(x):
    with tf.variable_scope("inference"):
        w_h1 = weight_variable([598, 32], name="weight_hidden1")
        b_h1 = bias_variable([32], name="bias_hidden1")

        w_h2 = weight_variable([32, 16], name="weight_hidden2")
        b_h2 = bias_variable([16], name="bias_hidden2")

        w_out = weight_variable([16, 2], name="weight_out")
        b_out = bias_variable([2], name="bias_out")

        l1 = tf.add(tf.matmul(x, w_h1), b_h1, name='layer1')
        l1 = tf.nn.relu(l1, name='layer1_out')

        l2 = tf.add(tf.matmul(l1, w_h2), b_h2, name='layer2')
        l2 = tf.nn.relu(l2, name='layer2_out')

        y_out = tf.nn.softmax(tf.matmul(l2, w_out) + b_out, name="probres")
    return y_out

Then load the saved model and run the test samples through it in a loop:

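x = tf.placeholder(tf.float32, [None, 598], name='inputx')
pred = inference(x)
pred_class = tf.argmax(pred, 1)  # build the argmax op once, outside the loop

sess = tf.Session()
saver = tf.train.Saver()

sess.run(tf.global_variables_initializer())

ckpt = tf.train.get_checkpoint_state("logs/")
if ckpt and ckpt.model_checkpoint_path:  # load the saved model
    saver.restore(sess, ckpt.model_checkpoint_path)
else:
    # Without this check the loop below would silently use random weights.
    raise IOError("no checkpoint found in logs/")

for i in range(testx.shape[0]):
    s = np.argmax(testy[i])  # ground-truth class, for comparison
    tt = np.reshape(testx[i], (1, 598))
    res = sess.run(pred_class, feed_dict={x: tt})[0]  # res is the predicted class
    print(res)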

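The second approach uses the meta file to import the network structure, and then restores the weights from one particular saved checkpoint.

Implementation: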

sess = tf.Session()
new_saver = tf.train.import_meta_graph('logs/grayglcmlbpmodel.ckpt-5200.meta')
new_saver.restore(sess, 'logs/grayglcmlbpmodel.ckpt-5200')
y = tf.get_collection('pred_network')[0]   # 'pred_network' must match the collection name used during training
graph = tf.get_default_graph()
x = graph.get_operation_by_name('inputx').outputs[0]
pred_class = tf.argmax(y, 1)   # build the argmax op once, not on every iteration
for i in range(testx.shape[0]):   # testx holds the test inputs
    tt = np.reshape(testx[i], (1, 598))
    res = sess.run(pred_class, feed_dict={x: tt})[0]   # res is the predicted class
    print(res)
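
The step number in the file name (5200 here) depends on when training happened to stop, so hard-coding it is easy to get wrong. TensorFlow 1.x provides tf.train.latest_checkpoint for looking it up. The plain-Python sketch below only illustrates the idea by scanning a directory for .meta files and picking the largest step. The helper name latest_checkpoint_prefix and the throwaway demo directory are assumptions for illustration, not part of the original project.

```python
import os
import re
import tempfile


def latest_checkpoint_prefix(directory):
    """Return the '<name>.ckpt-<step>' path with the largest step, or None."""
    pattern = re.compile(r"^(?P<prefix>.+\.ckpt-(?P<step>\d+))\.meta$")
    best = None
    for filename in os.listdir(directory):
        match = pattern.match(filename)
        if match is None:
            continue
        step = int(match.group("step"))  # compare steps as numbers, not strings
        if best is None or step > best[0]:
            best = (step, os.path.join(directory, match.group("prefix")))
    return None if best is None else best[1]


# Demo on a throwaway directory holding two empty placeholder .meta files.
demo_dir = tempfile.mkdtemp()
for step in (900, 5200):
    open(os.path.join(demo_dir, "grayglcmlbpmodel.ckpt-%d.meta" % step), "w").close()

latest = latest_checkpoint_prefix(demo_dir)
print(os.path.basename(latest))  # grayglcmlbpmodel.ckpt-5200
```

The returned prefix is the string that would be passed to restore, and the same prefix with ".meta" appended is what import_meta_graph expects.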

Problems that may occur when importing a model

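(1) When running the imported model, before a single sample has been predicted/classified, an error similar to the following is raised: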

tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value inference/Variable_3

Cause: the model was never actually loaded!

Solution: check that the model path is written correctly, and check that the saver has actually called restore.
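
One quick way to check the path is to see whether the files that belong to a checkpoint prefix are really on disk. The sketch below assumes the file layout written by the default Saver in TensorFlow 1.x (an .index file, a .meta file and one or more .data-... shards). The helper name missing_checkpoint_files is hypothetical, and the demo uses empty placeholder files rather than a real checkpoint.

```python
import os
import tempfile


def missing_checkpoint_files(prefix):
    """List the files expected for checkpoint `prefix` that are not on disk.

    Assumes the TF 1.x default layout: <prefix>.index, <prefix>.meta and
    one or more <prefix>.data-XXXXX-of-XXXXX shards.
    """
    missing = [prefix + ext for ext in (".index", ".meta")
               if not os.path.isfile(prefix + ext)]
    directory = os.path.dirname(prefix) or "."
    shard_start = os.path.basename(prefix) + ".data-"
    names = os.listdir(directory) if os.path.isdir(directory) else []
    if not any(name.startswith(shard_start) for name in names):
        missing.append(prefix + ".data-*")
    return missing


# Demo on a throwaway directory with empty placeholder files.
demo_dir = tempfile.mkdtemp()
demo_prefix = os.path.join(demo_dir, "grayglcmlbpmodel.ckpt-5200")
for ext in (".index", ".meta", ".data-00000-of-00001"):
    open(demo_prefix + ext, "w").close()

print(missing_checkpoint_files(demo_prefix))             # [] : nothing is missing
print(len(missing_checkpoint_files(demo_prefix + "0")))  # 3 : a mistyped step finds no files
```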

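(2) When running the imported model, only one test sample is predicted/classified; batch prediction/classification fails with an error similar to the following: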

tensorflow.python.framework.errors_impl.NotFoundError: Key inference_1/Variable not found in checkpoint

NotFoundError (see above for traceback): Key inference_1/Variable not found in checkpoint

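Cause: this error may look a little strange, because the message says that a variable cannot be found in the saved checkpoint, yet the saved model should contain every variable! The real cause is that the saved model was imported more than once. Each time the network is rebuilt in the same graph, TensorFlow creates another copy of the variables under a new scope name (inference_1), and those names were never saved. Make sure the model is imported only once: build the network and start a single session outside the loop, rather than creating the session inside the loop on every iteration.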
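
The mechanism can be imitated without TensorFlow. The sketch below is a pure-Python toy model, not TensorFlow code: ToyGraph, build_inference and restore are made-up stand-ins that only mimic how names are made unique within one graph and how a restore looks variables up by name.

```python
class ToyGraph:
    """Hands out unique names, loosely imitating a TF 1.x graph (toy model)."""

    def __init__(self):
        self._uses = {}

    def unique(self, name):
        count = self._uses.get(name, 0)
        self._uses[name] = count + 1
        return name if count == 0 else "%s_%d" % (name, count)


def build_inference(graph, n_variables=6):
    """Create the names of the 6 weights/biases inside an 'inference' scope."""
    scope = graph.unique("inference")
    return [graph.unique(scope + "/Variable") for _ in range(n_variables)]


def restore(checkpoint, variable_names):
    """Toy stand-in for a restore: every graph variable is looked up by name."""
    for name in variable_names:
        if name not in checkpoint:
            raise KeyError("Key %s not found in checkpoint" % name)
    return {name: checkpoint[name] for name in variable_names}


graph = ToyGraph()
first = build_inference(graph)              # inference/Variable ... inference/Variable_5
checkpoint = {name: 0.0 for name in first}  # what training saved

restore(checkpoint, first)                  # network built once: every key is found

second = build_inference(graph)             # rebuilt in the same graph: inference_1/...
try:
    restore(checkpoint, first + second)
    error = None
except KeyError as exc:
    error = exc.args[0]
print(error)  # Key inference_1/Variable not found in checkpoint
```

In the toy model, as in the real case, the fix is to call the network-building code exactly once and keep only the prediction call inside the loop.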

Conclusion

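Although I have been using TensorFlow for a while, I feel I still have not mastered many of its details. While training a network recently I ran into the second problem described above, and it puzzled me for some time, which prompted me to write this post and share my experience. TensorFlow can also save a model in the .pb format, which can be used across platforms after compilation; interested readers may want to look into it.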
